
Posted on 2025/10/07

AI Model Optimization & Fine-Tuning Engineer

Confidential

Lahore, Pakistan

Full-time

Full Description

Title: AI Model Optimization & Fine-Tuning Engineer

Experience: 5 Years

Salary: Rs. 250,000 - 400,000

About the Company

A STEM-driven energy solutions provider dedicated to advancing clean, efficient, and sustainable power technologies.

Description

We are seeking a hands-on AI Model Optimization Engineer with proven experience taking large base models and fine-tuning, distilling, and quantizing them for fully offline mobile deployment.

This role requires real-world experience with model compression, dataset preparation, and mobile inference optimization for Android/iOS devices.

Responsibilities

Own the end-to-end pipeline: data prep → fine-tuning → distillation → quantization → mobile packaging → benchmarking.

Apply quantization (PTQ/QAT) and distillation to deploy LLMs and multimodal models onto devices with limited memory and thermal budgets (a minimal PTQ sketch follows this list).

Format and prepare datasets for fine-tuning (tokenization, tagging, deduplication, versioning).

Optimize models for battery efficiency, low latency, and minimal RAM usage.

Benchmark and debug inference performance with Perfetto, Battery Historian, Instruments, etc.

Collaborate with app teams to integrate optimized models.
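
To give a flavor of the quantization step referenced above, here is a minimal post-training 4-bit (NF4) loading sketch, assuming the Hugging Face transformers, bitsandbytes, and accelerate libraries on a CUDA machine; the model ID is an illustrative placeholder, not one named in this posting:

```python
# Minimal PTQ sketch: load a causal LM with 4-bit NF4 weights via
# transformers + bitsandbytes (requires a CUDA GPU and accelerate).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Llama-3.2-1B"  # placeholder; any causal LM works

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit on load
    bnb_4bit_quant_type="nf4",              # NormalFloat4 scheme
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at runtime
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

# Quick smoke test of the quantized model.
inputs = tokenizer("Quantization reduces", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```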

Mandatory Skills Checklist (Applicants must demonstrate experience in ALL of the following)

✅ Quantization & Distillation

Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).

Methods such as AWQ, GPTQ, SmoothQuant, and RPTQ.

Knowledge of 4-bit/8-bit schemes (INT4, INT8, FP4, NF4).

Distillation methods (teacher–student, logit matching, feature distillation).
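
For the distillation item above, a minimal logit-matching sketch in plain PyTorch; the temperature and blending weight are illustrative defaults, not values from this posting:

```python
# Teacher–student logit matching: KL divergence between temperature-softened
# teacher and student distributions, blended with the usual cross-entropy loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """alpha blends the soft (KD) term with the hard cross-entropy term."""
    # Soft targets: KL(teacher || student) on temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # standard T^2 scaling from Hinton et al.
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```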

✅ Fine-Tuning & Data Handling

LoRA/QLoRA/DoRA/AdaLoRA fine-tuning.

Instruction-tuning pipelines with PyTorch + Hugging Face.

Dataset formatting: JSONL, multi-turn dialogs, tagging, tokenization (SentencePiece/BPE).

Deduplication, stratified sampling, and eval set creation.
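
A minimal sketch tying the items above together, assuming the transformers and peft libraries: read a JSONL instruction dataset, drop exact duplicates, and attach LoRA adapters for fine-tuning. The file path, model ID, field names, and hyperparameters are illustrative placeholders:

```python
# JSONL load + exact-match dedup + LoRA adapter setup via peft.
import hashlib
import json
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# --- Dataset: JSONL lines like {"instruction": ..., "response": ...} ---
records, seen = [], set()
with open("train.jsonl", encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        key = hashlib.sha256(
            (rec["instruction"] + rec["response"]).encode("utf-8")
        ).hexdigest()
        if key not in seen:  # exact-match deduplication
            seen.add(key)
            records.append(rec)

# --- Model: wrap a base LM with low-rank adapters ---
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (model-specific)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # LoRA typically trains ~0.1–1% of the weights
```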

✅ On-Device Deployment

Hands-on with at least two runtimes: llama.cpp / GGUF, MLC LLM, ExecuTorch, ONNX Runtime Mobile, TensorFlow Lite, Core ML.

Experience with hardware acceleration: Metal (iOS), NNAPI (Android), GPU/Vulkan, Qualcomm DSP/NPU, XNNPACK.

Real-world deployment: must provide examples of models running fully offline on mobile (tokens/s, RAM usage, device specs).
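
The tokens/s and RAM figures asked for above have to come from actual device runs, but a host-side smoke test of the GGUF artifact is a common first check before mobile packaging. A minimal sketch, assuming the llama-cpp-python bindings; the .gguf filename is a placeholder:

```python
# Desktop smoke test of a GGUF model: run one completion and report
# a rough decode speed before handing the artifact to the mobile build.
import time
from llama_cpp import Llama

llm = Llama(model_path="model-q4_k_m.gguf", n_ctx=2048, verbose=False)

prompt = "Explain quantization in one sentence."
start = time.perf_counter()
out = llm(prompt, max_tokens=64)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(out["choices"][0]["text"])
print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.1f} tok/s")
```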

✅ Performance & Benchmarking

Tools: Perfetto, systrace, Battery Historian, adb stats (Android); Xcode Instruments, Energy Log (iOS).

Profiling decode speed, cold start vs. warm start latency, RAM usage, and energy consumption.
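
On-device numbers come from Perfetto/adb and Instruments as listed; as a host-side counterpart, here is a minimal cold-start vs. warm-start latency and resident-memory harness, assuming psutil is installed. The load_model and run_inference callables are hypothetical stand-ins for whatever runtime is under test:

```python
# Cold vs. warm latency plus RSS memory, measured in-process with psutil.
import time
import psutil

def timed(fn, *args, **kwargs):
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def benchmark(load_model, run_inference, warm_runs=5):
    proc = psutil.Process()
    model, load_s = timed(load_model)            # cold start: model load
    _, first_s = timed(run_inference, model)     # plus the first inference
    warm = [timed(run_inference, model)[1] for _ in range(warm_runs)]
    print(f"cold start : {load_s + first_s:.2f}s (load {load_s:.2f}s)")
    print(f"warm decode: {sum(warm) / len(warm):.3f}s avg over {warm_runs} runs")
    print(f"RSS memory : {proc.memory_info().rss / 2**20:.0f} MiB")
```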

✅ General

Strong PyTorch and Hugging Face experience.

Clear documentation and ability to explain optimization trade-offs.

Skills

ASR/TTS/VAD

Multilingual

LLaMA, Gemma, or Mistral

Qwen

Edge-AI frameworks

LLM quantization