Posted on 7/31/2025

Software Engineer – Data Lake, Associate

Jobright.ai

Boston, MA

Full-time

Qualifications

  • 4+ years of experience in software development, with at least 2 years focused on data engineering and distributed systems
  • Hands-on experience with Python and SQL, along with backend development experience
  • Experience with distributed data processing frameworks such as Apache Spark and Flink
  • Proven track record of designing and implementing scalable ETL/ELT pipelines, ideally using AWS services like EMR
  • Strong knowledge of cloud platforms, particularly AWS (e.g., EMR, S3, Redshift), and optimizing data workflows in the cloud
  • Experience with data pipeline orchestration tools like Airflow
  • Familiarity with real-time data streaming technologies such as Kafka or Pulsar
  • Understanding of data modeling, database design, and data governance best practices
  • Excellent problem-solving skills and the ability to thrive in a fast-paced, collaborative environment
  • Strong communication skills with experience mentoring or leading engineering teams
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience

Responsibilities

  • Implement scalable, fault-tolerant data pipelines using distributed processing frameworks like Apache Spark and Flink on AWS EMR, optimizing for throughput and latency
  • Design batch and real-time, event-driven data workflows to process billions of data points daily, leveraging streaming technologies like Kafka and Flink
  • Optimize distributed compute clusters and storage systems (e.g., S3, HDFS) to handle petabyte-scale datasets efficiently, ensuring resource efficiency and cost-effectiveness
  • Develop robust failure recovery mechanisms, including checkpointing, replication, and automated failover, to ensure high availability in distributed environments
  • Optimize data storage and processing systems to handle petabyte-scale datasets efficiently, ensuring performance and cost-effectiveness
  • Collaborate with cross-functional teams to deliver actionable datasets that power analytics and AI capabilities
  • Implement data governance policies and security measures to maintain data quality and compliance
  • Own the technical direction of highly visible data systems, improving monitoring, failure recovery, and performance
  • Mentor engineers, review technical documentation, and articulate phased approaches to achieving the team’s technical vision
  • Contribute to the evolution of internal data processing tools and frameworks, enhancing their scalability and usability

Full Description

Job Summary:

Klaviyo is a company that empowers creators to own their destiny by making first-party data accessible and actionable. They are seeking a Software Engineer II for their Data Lake team, responsible for designing and optimizing large-scale data processing systems and implementing scalable data pipelines on AWS.

Company:

Klaviyo is an automation and email platform designed to help businesses grow. Founded in 2012, the company is headquartered in Boston, Massachusetts, USA, and has 1,001–5,000 employees. The company is publicly traded. Klaviyo has a track record of offering H1B sponsorship.
