Posted on 2025/03/17
AI Navigator – Large Model Cloud Inference Deployment Engineer
SenseTime 商汤科技
Hong Kong
Full Description
Job Responsibilities
• Optimize the inference deployment of large models on computing clusters, focusing on multi-node, multi-GPU parallel inference, task scheduling, KV cache management, and other techniques to enhance inference performance and reduce costs.
• Research the latest advancements in large-model serving and integrate cutting-edge techniques into real-world business applications.
Job Requirements
• Deep understanding of mainstream large-model algorithms and underlying principles.
• Familiarity with large-model inference pipelines and optimization techniques such as Continuous Batching and Paged Attention.
• Proficiency in mainstream large-model inference engines, such as ppl.llm, vLLM, TensorRT, TGI, or experience with traditional inference engines.
• Strong software engineering foundation, familiarity with design patterns, and proficiency in C++ and Python.
• Strong learning ability, communication skills, and the ability to articulate complex technical concepts clearly.
All applications submitted through our system are delivered directly to the advertiser, and the privacy of applicants' personal data is protected.
