Position: Senior ETL Pipeline Engineer
Location: Dallas, TX (Onsite – Hybrid)
Job Type: Contract
Note: A minimum of 10 years of IT experience is required. Open to H-1B candidates; visa copy and passport number are mandatory.
Job Description
Overview:
Seeking an experienced Senior ETL Pipeline Engineer with strong expertise in building scalable, cloud-agnostic data pipelines using modern data engineering tools and platforms. This role involves end-to-end ownership of ETL development, from design through deployment, in a containerized and orchestrated environment. The ideal candidate is comfortable working across multi-cloud and hybrid infrastructures, integrating diverse data sources, and supporting long-term data initiatives.
Key Responsibilities:
- Design, develop, and manage robust ETL pipelines using Python and Apache Spark to process large-scale datasets across structured, semi-structured, and unstructured formats (see the first sketch following this list).
- Containerize ETL workflows using Docker for portability and deploy them using Kubernetes for scalability and fault tolerance.
- Leverage Apache Airflow for orchestrating and scheduling complex data workflows (see the DAG sketch following this list).
- Build and maintain cloud-agnostic pipelines capable of running in multi-cloud or hybrid (cloud + on-premises) environments.
- Integrate data from a variety of sources, including the Hadoop ecosystem, RDBMS, NoSQL databases, REST APIs, and third-party data providers.
- Work with data lake architectures and technologies such as Amazon S3, Trino, Presto, and Athena to support analytics and reporting use cases (see the query sketch following this list).
- Implement CI/CD practices to automate the deployment and update processes for ETL pipelines.
- Collaborate with cross-functional teams to align pipeline design with business and data architecture goals.
- Monitor pipeline health, performance, and cost-efficiency; troubleshoot and resolve issues proactively.
- Document pipeline architecture, operational playbooks, and best practices.
- (Preferred) Contribute to infrastructure automation using Infrastructure as Code (IaC) tools.
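
For illustration, a minimal PySpark sketch of the kind of ETL work described above: read semi-structured input, apply a simple transformation, and write partitioned Parquet. The bucket paths, column names, and partition key are hypothetical placeholders, not details from this posting.

```python
# Minimal PySpark ETL sketch (placeholder paths and columns).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example-etl").getOrCreate()

# Extract: semi-structured JSON from object storage or HDFS.
raw = spark.read.json("s3a://example-bucket/raw/events/")

# Transform: basic deduplication, typing, and filtering.
cleaned = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("event_type").isNotNull())
)

# Load: partitioned Parquet for downstream analytics.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3a://example-bucket/curated/events/")
)

spark.stop()
```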
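A minimal Airflow DAG sketch of the orchestration described above: a daily extract → transform → load sequence. The dag_id, schedule, and task callables are assumptions, and exact parameter names vary slightly by Airflow version.

```python
# Minimal Airflow DAG sketch: daily ETL with explicit task ordering.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Pull data from a source system (stub).
    pass


def transform():
    # Apply cleansing and business rules (stub).
    pass


def load():
    # Write results to the data lake or warehouse (stub).
    pass


with DAG(
    dag_id="example_etl_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Run tasks in dependency order.
    t_extract >> t_transform >> t_load
```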
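And a short sketch of querying S3-backed data through Athena with boto3, as one example of the data lake query engines listed above. The database name, query, and results bucket are placeholders.

```python
# Minimal Athena query sketch via boto3 (placeholder database and bucket).
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

resp = athena.start_query_execution(
    QueryString="SELECT event_date, COUNT(*) AS events FROM events GROUP BY event_date",
    QueryExecutionContext={"Database": "example_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
query_id = resp["QueryExecutionId"]

# Poll until the query finishes (simplified; production code would add timeouts).
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```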
Requirements:
- Strong proficiency in Python and hands-on experience with Apache Spark for data transformation.
- Deep understanding of Docker and Kubernetes for containerized deployments.
- Experience with AWS Cloud and willingness to work across other cloud platforms as needed.
- Solid experience with Apache Airflow or equivalent orchestration tools.