Skills a Lead Data Engineer Needs
Aug 12, 2023
- Python: Python is a general-purpose programming language that is often used for data engineering tasks. It is known for its readability and its large library of modules that can be used for data manipulation, analysis, and visualization.
- Spark: Spark is a unified analytics engine that is used for large-scale data processing. It is fast, scalable, and fault-tolerant, making it ideal for processing big data.
- Pandas: Pandas is a Python library that is used for data manipulation and analysis. It provides tools for working with structured, tabular data, including cleaning, reshaping, and analysis.
- Jupyter Notebooks: Jupyter Notebooks are a web-based interactive environment that is used for creating and sharing documents that contain code, text, and visualizations. They are a popular tool for data scientists and data engineers to explore and analyze data.
- Kubernetes: Kubernetes is an open-source container orchestration system that is used to automate the deployment, scaling, and management of containerized applications.
- Airflow: Airflow is an open-source workflow management platform that is used to automate the execution of tasks. It is often used for data engineering tasks, such as ETL pipelines.
- Terraform: Terraform is an open-source infrastructure-as-code tool that enables you to safely and predictably create, change, and improve infrastructure. It is often used for managing infrastructure on cloud providers such as AWS, Azure, and GCP.
- Ansible: Ansible is an open-source automation tool that is used to configure and manage systems. It is often used for provisioning servers, managing configuration, and deploying applications, including on cloud platforms such as AWS, Azure, and GCP.
- GitHub Actions: GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform that is used to automate building, testing, and deploying software. It is often used for data engineering projects, such as ETL pipelines.
- AWS EMR: Amazon EMR is a managed platform for running big data frameworks such as Hadoop and Spark. It is a popular choice for data engineering projects that require the processing of large datasets.
- S3: Amazon Simple Storage Service (S3) is a scalable, durable, and highly available object storage service that is used for storing data. It is a popular choice for storing data that is used for data engineering projects.
- Glue: AWS Glue is a managed extract, transform, and load (ETL) service
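
Several of the tools above come up in the context of ETL pipelines, so it may help to see the extract, transform, and load steps in miniature. The following is only a sketch in plain Python using the standard library: the sample data, the `orders` table, and the function names are all made up for illustration, and a real pipeline would more likely read from S3 and run under Airflow, Glue, or Spark on EMR.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this might be a file stored in S3.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"], float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a (hypothetical) orders table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for a real warehouse.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Aggregate the loaded data to confirm the pipeline worked end to end.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
print(totals)
```

Keeping each stage as its own function mirrors how orchestration tools tend to model a pipeline: each stage can become a separate task that is retried or tested independently.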