Posted on 2025/03/17

AI Navigator – Large Model Cloud Inference Deployment Engineer

SenseTime 商汤科技

Hong Kong

Part-time

Full Description

Job Responsibilities

• Optimize the inference deployment of large models on computing clusters, focusing on multi-node, multi-GPU parallel inference, task scheduling, KV cache management, and other techniques to enhance inference performance and reduce costs.

• Research the latest advancements in large-model serving and integrate cutting-edge techniques into real-world business applications.

Job Requirements

• Deep understanding of mainstream large-model algorithms and underlying principles.

• Familiarity with large-model inference pipelines and optimization techniques such as Continuous Batching and Paged Attention.

• Proficiency in mainstream large-model inference engines such as ppl.llm, vLLM, TensorRT, or TGI, or experience with traditional inference engines.

• Strong software engineering foundation, familiarity with design patterns, and proficiency in C++ and Python.

• Strong learning ability, communication skills, and the ability to articulate complex technical concepts clearly.

All applications submitted through our system are delivered directly to the advertiser, and the privacy of applicants' personal data is protected.
