Posted on 2025/10/07
AI Model Optimization & Fine-Tuning Engineer
Confidential
Lahore, Pakistan
Full Description
Title: AI Model Optimization & Fine-Tuning Engineer
Experience: 5 Years
Salary: Rs. 250,000 - 400,000
About the Company
A STEM-driven energy solutions provider dedicated to advancing clean, efficient, and sustainable power technologies.
Description
We are seeking a hands-on AI Model Optimization Engineer with proven experience taking large base models and fine-tuning, distilling, and quantizing them for fully offline mobile deployment.
This role requires real-world experience with model compression, dataset preparation, and mobile inference optimization for Android/iOS devices.
Responsibilities
End-to-end pipeline: data prep → fine-tuning → distillation → quantization → mobile packaging → benchmarking.
Apply PTQ/QAT and distillation to deploy LLMs and multimodal models onto devices with limited memory and thermal budgets (a minimal PTQ sketch follows this list).
Format and prepare datasets for fine-tuning (tokenization, tagging, deduplication, versioning).
Optimize models for battery efficiency, low latency, and minimal RAM usage.
Benchmark and debug inference performance with Perfetto, Battery Historian, Instruments, etc.
Collaborate with app teams to integrate optimized models.
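As referenced above, a minimal sketch of the PTQ step using PyTorch's dynamic INT8 quantization on a placeholder model (the layer sizes are illustrative, not a prescribed recipe):

    import io
    import torch
    import torch.nn as nn

    # Placeholder module standing in for a fine-tuned base model (assumption for illustration).
    model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768)).eval()

    # Post-training dynamic quantization: Linear weights stored as INT8,
    # activations quantized on the fly at inference time.
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    def serialized_mb(m):
        # Serialize the state dict in memory to compare on-disk footprints.
        buf = io.BytesIO()
        torch.save(m.state_dict(), buf)
        return buf.getbuffer().nbytes / 1e6

    print(f"fp32: {serialized_mb(model):.2f} MB, int8: {serialized_mb(quantized):.2f} MB")

Real LLM deployments would more likely go through AWQ/GPTQ or a GGUF conversion, but the same measure-before/measure-after discipline applies.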
Mandatory Skills Checklist (Applicants must demonstrate experience in ALL of the following)
✅ Quantization & Distillation
Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).
Methods like AWQ, GPTQ, SmoothQuant, RPTQ.
Knowledge of 4-bit/8-bit schemes (INT4, INT8, FP4, NF4).
Distillation methods (teacher–student, logit matching, feature distillation).
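As a concrete illustration of logit matching, a minimal teacher–student distillation loss sketch in PyTorch (the temperature and mixing weight are illustrative defaults, not recommended settings):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        # student_logits / teacher_logits: (batch, seq, vocab); labels: (batch, seq), -100 = ignore.
        # Soft targets: KL divergence between temperature-scaled distributions (logit matching).
        soft_student = F.log_softmax(student_logits / temperature, dim=-1)
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

        # Hard targets: standard cross-entropy against the ground-truth next tokens.
        ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                             labels.view(-1), ignore_index=-100)
        return alpha * kl + (1.0 - alpha) * ce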
✅ Fine-Tuning & Data Handling
LoRA/QLoRA/DoRA/AdaLoRA fine-tuning.
Instruction-tuning pipelines with PyTorch + Hugging Face.
Dataset formatting: JSONL, multi-turn dialogs, tagging, tokenization (SentencePiece/BPE).
Deduplication, stratified sampling, and eval set creation.
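The deduplication and eval-set items can be illustrated with a short standalone script; the file names, JSONL schema, and split fraction below are assumptions, and the split is random rather than stratified for brevity:

    import hashlib
    import json
    import random

    def dedup_and_split(in_path="train_raw.jsonl", train_path="train.jsonl",
                        eval_path="eval.jsonl", eval_frac=0.02, seed=42):
        # Exact-match dedup: hash a canonical serialization so reordered keys still collide.
        seen, records = set(), []
        with open(in_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                key = hashlib.sha256(
                    json.dumps(record, sort_keys=True, ensure_ascii=False).encode("utf-8")
                ).hexdigest()
                if key not in seen:
                    seen.add(key)
                    records.append(record)

        # Held-out eval split drawn after shuffling with a fixed seed for reproducibility.
        random.Random(seed).shuffle(records)
        n_eval = max(1, int(len(records) * eval_frac))
        for path, rows in ((eval_path, records[:n_eval]), (train_path, records[n_eval:])):
            with open(path, "w", encoding="utf-8") as f:
                for row in rows:
                    f.write(json.dumps(row, ensure_ascii=False) + "\n")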
✅ On-Device Deployment
Hands-on with at least two runtimes: llama.cpp / GGUF, MLC LLM, ExecuTorch, ONNX Runtime Mobile, TensorFlow Lite, Core ML.
Experience with hardware acceleration: Metal (iOS), NNAPI (Android), GPU/Vulkan, Qualcomm DSP/NPU, XNNPACK.
Real-world deployment: must provide examples of models running fully offline on mobile (tokens/s, RAM usage, device specs).
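One straightforward way to produce the kind of numbers asked for here is a host-side sanity check with the llama-cpp-python bindings before moving to the phone; the model path and prompt below are placeholders, and on-device figures would still need to be captured on the target hardware:

    import time
    from llama_cpp import Llama  # llama.cpp Python bindings

    # Placeholder path to an already-quantized GGUF model (assumption for illustration).
    llm = Llama(model_path="model-q4_k_m.gguf", n_ctx=2048, verbose=False)

    prompt = "Explain post-training quantization in one sentence."
    start = time.perf_counter()
    out = llm(prompt, max_tokens=128)
    elapsed = time.perf_counter() - start

    # The completion call returns an OpenAI-style dict that includes token usage counts.
    generated = out["usage"]["completion_tokens"]
    print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tokens/s")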
✅ Performance & Benchmarking
Tools: Perfetto, systrace, Battery Historian, adb stats (Android); Xcode Instruments, Energy Log (iOS).
Profiling decode speed, cold start vs. warm start latency, RAM usage, and energy consumption.
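A minimal host-side sketch of the cold-start vs. warm-start and peak-RAM measurements named above; load_model and generate are placeholder callables for whichever runtime is under test, and on-device energy numbers still come from Perfetto, Battery Historian, or Instruments:

    import resource  # Unix-only; ru_maxrss is KiB on Linux, bytes on macOS
    import time

    def benchmark(load_model, generate, prompt, runs=3):
        # Cold start: weight loading, graph/compilation warm-up, first decode.
        t0 = time.perf_counter()
        model = load_model()
        generate(model, prompt)
        cold_s = time.perf_counter() - t0

        # Warm starts: caches populated, weights already resident in memory.
        warm = []
        for _ in range(runs):
            t0 = time.perf_counter()
            generate(model, prompt)
            warm.append(time.perf_counter() - t0)

        peak_rss_mib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
        print(f"cold: {cold_s:.2f}s  warm avg: {sum(warm) / len(warm):.2f}s  "
              f"peak RSS: {peak_rss_mib:.0f} MiB")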
✅ General
Strong PyTorch and Hugging Face experience.
Clear documentation and ability to explain optimization trade-offs.
Skills
ASR/TTS/VAD
Multilingual
LLaMA, Gemma, or Mistral
Qwen
Edge-AI frameworks
LLM quantization