Posted on 2026/06/04

ML Application Engineer - AI Inference & Model Optimization (Staff/Senior Staff level) - Riyadh, KSA

Qualcomm

Al Khobar Saudi Arabia

Full-time

About the Role

Qualcomm is seeking a Machine Learning Applications Engineer – AI Inference & Model Optimization to support the enablement of rack‑scale deep learning workloads on advanced Qualcomm AI inference accelerators.

This customer‑facing, highly technical role focuses on porting, optimizing, and validating deep learning AI models on production systems and enabling Qualcomm’s partners to develop and deploy advanced machine learning applications, including computer vision, speech, generative AI, and state‑of‑the‑art multimodal reasoning models, using popular frameworks such as PyTorch, TensorFlow, and ONNX on Qualcomm Cloud AI accelerators.

The role requires strong expertise in AI models, quantization, performance optimization, and deployment, plus the ability to shape architecture, workload sizing, and system design.

It also requires experience with deep learning model development across hardware platforms, solid programming skills, collaboration with cross‑functional teams, and proficiency in machine learning frameworks, Linux, and container orchestration tools.

What You’ll Do

A) AI Model Porting & Optimization

• Deploy, optimize, and scale deep learning AI models onto accelerator‑based data center platforms, including model conversion workflows, quantization techniques (INT8 / mixed precision), runtime integration and optimization, and integration onto Qualcomm’s Cloud AI ML stack from frameworks such as PyTorch, TensorFlow, and ONNX.

• Drive improvements in model throughput, latency, and accuracy, with clear trade‑off analysis.

• Build, test, and deploy scalable inference pipelines using serving frameworks such as vLLM, TGI, and Triton.

• Optimize workloads for LLM and GenAI models across both multi‑SoC and multi‑card architectures.

• Collaborate with engineering teams to analyze and refine training and inference for advanced deep learning applications.

• Identify bottlenecks across compute, memory, and runtime, and guide optimization strategies.

• Contribute to Qualcomm’s Cloud AI GitHub repository and developer documentation, sharing technical best practices and solutions.

• Develop and integrate end‑to‑end ML application pipelines with customer frameworks and libraries.

B) Customer‑Facing Technical Engagement

• Act as a trusted technical advisor for customers deploying AI workloads.

• Engage in hardware sizing and architecture discussions, aligning model requirements with infrastructure capabilities.

• Provide technical guidance on AI model selection, deployment feasibility, system architecture, and performance expectations.

• Lead discussions on model capabilities and limitations based on real customer use cases.

C) Model–Infrastructure Alignment

• Assess and evaluate AI model requirements and recommend alternative model approaches when necessary.

• Align model characteristics (latency, throughput, accuracy) with accelerator and system capabilities.

• Connect model requirements with memory constraints, accelerator architecture, and scaling limitations.

• Support customers in defining model selection strategies based on deployment realities.

D) Performance & Scalability Engineering

• Evaluate performance characteristics of AI models in production scenarios, including throughput expectations, latency targets, and concurrency behavior.

• Guide architecture decisions around scaling strategies (horizontal vs vertical) and hardware deployment sizing.

• Contribute to discussions on workload scalability limits and impact of model selection on system performance and efficiency.

• Provide insights into capacity planning and infrastructure optimization.

E) End‑to‑End AI Pipeline Design

• Drive discussions around end‑to‑end AI pipelines, including multi‑model workflows (e.g., detection + tracking + recognition).

• Guide decisions on video and data processing stages, such as video pipeline choices (FFmpeg vs GStreamer) and integration into inference pipelines.

• Ensure pipelines are aligned with performance requirements, hardware capabilities, and real‑time constraints.

F) Model Trade‑off Analysis & Validation

• Highlight and explain trade‑offs such as accuracy vs compatibility and model quality vs deployment feasibility.

• Support decision‑making on model simplification vs performance gains and precision vs efficiency trade‑offs.

• Lead or support model capability validation in deployment environments.

• Collaborate with customers to define inference assumptions and model sizing strategies for large‑scale workloads.

Required Qualifications

• Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).

• 10–15+ years of experience in deep learning model development or deployment on CPUs/GPUs/ASICs, inference systems, and optimization.

• Experience with data center or edge AI platforms, model quantization and optimization techniques, and AI model frameworks (PyTorch, TensorFlow).

• Strong programming skills in C/C++/Python, debugging, performance analysis, and Linux‑based systems.

• Hands‑on expertise with low‑level software, drivers, and system bring‑up.

• Proven ability to analyze and optimize model performance in production environments.

• Solid understanding of AI inference hardware constraints and system‑level performance bottlenecks, plus strong customer‑facing technical communication skills.

• Willingness to travel for customer engagements and strategic reviews.

Preferred Qualifications

• Experience deploying models on platforms that use hardware accelerators for inference.

• Experience managing multi‑model workflows and building real‑time AI systems (computer vision, video, analytics).

• Knowledge of distributed inference methods and large‑scale model deployments.

• Experience developing and maintaining video processing workflows using relevant software frameworks.

• Deep understanding of how system‑level decisions affect performance in deployment environments.

• Capability to simplify complex technical ideas for clients.

• Hands‑on experience running deep learning models on PyTorch, TensorFlow, ONNX.

• Experience developing software solutions for Linux environments with containers and orchestration.

• Experience with source code and configuration management tools; Git knowledge required.

• Customer‑facing experience translating requirements into technical solutions.

• Ability to build and deliver technical demos, proofs‑of‑concept, and reference applications for ML/GenAI workloads.

• Strong technical writing skills for customer‑ready documentation and partner training sessions.

• Experience driving issue triage and technical escalations with customers.

• Excellent stakeholder management and communication skills.

Benefits

• Salary including housing & transport allowance.

• Stock (RSUs) and performance‑related bonus.

• 16 weeks fully paid maternity leave.

• 6 weeks fully paid paternity leave.

• Employee stock purchase scheme.

• Child education allowance.

• Relocation and immigration support (if needed).

• Life and medical insurance.

• Live+Well reimbursement for health and recreational membership fees.

Equal Opportunity Employment

Qualcomm is an equal‑opportunity employer.

If you are an individual with a disability and need an accommodation during the application/hiring process, Qualcomm will provide a reasonable accommodation to support your participation in the hiring process. Contact disability‑View email address on click.appcast.io for accommodation requests.
