Posted on 2025/10/24

Senior ML/AI Engineer

ThoughtsWin

Calgary, AB, Canada

Contractor

Job Description:

We are seeking a highly skilled and visionary Senior ML/AI Engineer to lead the architecture, development, and optimization of our machine learning infrastructure and MLOps ecosystem.

This role is ideal for a hands-on technical leader who thrives at the intersection of software engineering, data science, and systems architecture.

You will play a pivotal role in shaping our next-generation AI capabilities: designing scalable, cloud-agnostic ML platforms, productionizing complex models, and driving innovation in automation, deployment, and performance optimization.

The ideal candidate is passionate about operational excellence, cutting-edge ML engineering practices, and mentoring teams to build world-class AI solutions that power real business impact.

Responsibilities:

• Architect and lead the design of the end-to-end MLOps platform and scalable ML serving infrastructure (e.g., cloud-agnostic solutions, custom model serving frameworks).

• Set coding standards, best practices, and design patterns for ML engineering across the team. Act as a technical mentor to junior and mid-level engineers.

• Own and optimize the highest-volume, most complex ML pipelines for training and inference. Drive initiatives to achieve ultra-low latency and high-throughput model serving.

• Evaluate and select core technologies (e.g., Kubernetes, serverless, specialized hardware) for the ML platform. Manage infrastructure cost optimization and security at a platform level.

• Design and implement advanced monitoring systems that track model drift, concept drift, and data lineage. Establish automated model testing and canary release strategies.

• Lead technical collaboration with Data Scientists, Data Engineers, and Core Software Engineers to define integration contracts and productionize models across multiple business domains.

• Define the enterprise standards for model registry, artifact management, and data versioning to ensure complete experiment and production reproducibility.

• Streamline and reduce the cycle time for moving a Data Science prototype from research to a fully monitored and stable production service.

Skills and experience:

• Exceptional proficiency in Python (including advanced libraries like Pandas, NumPy, Scikit-learn) and/or Scala/Java for building high-performance, production-quality systems.

• Deep practical experience leveraging services within at least one major cloud ecosystem (e.g., AWS, Azure, GCP) or a platform like Databricks for production deployment and scaling.

• Deep, hands-on experience with major machine learning and deep learning frameworks (e.g., TensorFlow, PyTorch).

• Proven track record in designing and implementing end-to-end MLOps systems using open-source tools.

• Expertise with workflow management tools such as Apache Airflow and Kubeflow.

• Mastery of Docker and Kubernetes (K8s) for packaging, scaling, and managing containerized ML services across any environment.

• Extensive experience building scalable data pipelines leveraging distributed processing frameworks such as Apache Spark (PySpark/Scala).

• Ability to architect cloud-agnostic ML systems, including designing low-latency API serving layers, real-time inference mechanisms, and event-driven architectures.

Leadership and Soft Skills

• Demonstrated ability to evaluate new technologies and frameworks, and define the technical roadmap for the ML platform and MLOps practices.

• Proven experience mentoring junior engineers, conducting technical reviews, and driving team-wide adoption of engineering best practices.

• Exceptional ability to communicate complex technical concepts to non-technical stakeholders (e.g., product managers, business leaders) and translate business needs into technical designs.

• Superior analytical skills for root cause analysis of production system failures and complex performance bottlenecks.

Education and Experience

• Experience: 7+ years in machine learning, software engineering, or related fields, including at least 3 years focused on building and scaling production ML systems.

• Education: Bachelor’s or Master’s degree in Computer Science, Data Science, or a quantitative field is preferred.