Posted on 7/31/2025
Software Engineer – Data Lake, Associate
Jobright.ai
Boston, MA
Qualifications
- 4+ years of experience in software development, with at least 2 years focused on data engineering and distributed systems
- Hands-on experience with Python and SQL, with experience in backend development
- Experience with distributed data processing frameworks such as Apache Spark and Flink
- Proven track record of designing and implementing scalable ETL/ELT pipelines, ideally using AWS services like EMR
- Strong knowledge of cloud platforms, particularly AWS (e.g., EMR, S3, Redshift), and optimizing data workflows in the cloud
- Experience with data pipeline orchestration tools like Airflow
- Familiarity with real-time data streaming technologies such as Kafka or Pulsar
- Understanding of data modeling, database design, and data governance best practices
- Excellent problem-solving skills and the ability to thrive in a fast-paced, collaborative environment
- Strong communication skills with experience mentoring or leading engineering teams
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience
Responsibilities
- Implement scalable, fault-tolerant data pipelines using distributed processing frameworks like Apache Spark and Flink on AWS EMR, optimizing for throughput and latency
- Design batch and real-time, event-driven data workflows to process billions of data points daily, leveraging streaming technologies like Kafka and Flink
- Optimize distributed compute clusters and storage systems (e.g., S3, HDFS) to handle petabyte-scale datasets efficiently, ensuring resource efficiency and cost-effectiveness
- Develop robust failure recovery mechanisms, including checkpointing, replication, and automated failover, to ensure high availability in distributed environments
- Collaborate with cross-functional teams to deliver actionable datasets that power analytics and AI capabilities
- Implement data governance policies and security measures to maintain data quality and compliance
- Own the technical direction of highly visible data systems, improving monitoring, failure recovery, and performance
- Mentor engineers, review technical documentation, and articulate phased approaches to achieving the team’s technical vision
- Contribute to the evolution of internal data processing tools and frameworks, enhancing their scalability and usability
Full Description
Job Summary:
Klaviyo empowers creators to own their destiny by making first-party data accessible and actionable. The company is seeking a Software Engineer II for its Data Lake team, responsible for designing and optimizing large-scale data processing systems and implementing scalable data pipelines on AWS.
Company:
Klaviyo is an automation and email platform designed to help businesses grow. Founded in 2012, the company is headquartered in Boston, Massachusetts, USA, with a team of 1,001–5,000 employees. Klaviyo is currently a public company and has a track record of offering H-1B sponsorship.