Posted on 2026/01/24

Generative AI Operations Engineer

Openkyber

Texas, United States

Full-time

Full Description

Position: AI (GenAI) Engineer

Location: Plano, TX / Palo Alto, CA (Day 1 Onsite)

Duration: Long term

Minimum qualifications

BS/MS in Computer Science, Electrical Engineering, Data Science, or a related technical field; or equivalent practical experience.

3-5 years of experience in AI/ML engineering, data engineering, or applied data science delivering production-grade solutions.

Strong Python and SQL skills; mastery of working with large-scale telemetry/time-series datasets and of building reliable, testable data transformations.

Hands-on experience with Azure services for data/AI solutions, including: Azure Machine Learning; Azure AI Services/Azure OpenAI (LLM/GenAI capabilities); Azure Databricks/Spark (Delta Lake, lakehouse patterns); Azure Data Lake Storage/Blob Storage; Azure Functions or similar serverless compute; Azure DevOps (or similar CI/CD tooling), Git, and automated testing.

Working knowledge of GenAI development patterns, including:

Retrieval-Augmented Generation (RAG): chunking strategies, embeddings, hybrid search, re-ranking, grounding with citations, vector stores (e.g., Azure AI Search).

Prompt design: system prompts, few-shot patterns, structured outputs (JSON/JSON Schema), function/tool calling.

Evaluation fundamentals: response quality, grounding, accuracy, latency, cost, and safety.

Production mindset: robust logging/monitoring, tracing, observability, troubleshooting; security basics (RBAC, managed identities, Key Vault, data privacy/PII handling); and operational readiness (rate limits, retries, timeouts, backoff, caching).

Domain (RAN & Mobility) qualifications

Solid understanding of 4G/5G RAN and mobility concepts (e.g., handovers, drops, throughput, congestion, interference, PRB utilization, RSRP/RSRQ/SINR).

Ability to translate network issues into measurable KPIs and investigative workflows, producing actionable outputs for operations and engineering (e.g., RCA steps, remediation recommendations, and change validation plans).

Preferred qualifications

Built an internal assistant/copilot for network operations, triage, or RCA using KPIs, alarms, tickets, and documentation; experience grounding outputs with traceable evidence and citations.

Experience with agentic workflows and orchestration (function/tool calling, multi-step chains, retries/guardrails) to automate diagnosis and propose actions.

MLOps/LLMOps practices: CI/CD for pipelines/services, model/prompt/knowledge-base versioning, automated evaluations (e.g., RAG quality), drift monitoring, observability (OpenTelemetry), and cost/token controls.

Azure ecosystem depth: Azure AI Search (vector/hybrid search), Azure Event Hubs/Stream Analytics, Azure Data Factory/Synapse pipelines, AKS (Kubernetes), and containerized deployments.

Familiarity with GenAI frameworks and tooling (e.g., Semantic Kernel, LangChain/LlamaIndex), MLflow/Model Registry, vector databases, and prompt/unit regression testing.

Understanding of telecom standards and tooling: 3GPP concepts, vendor-specific counters (e.g., Ericsson/Nokia/Samsung).

Relevant Azure certification (or in progress), especially Azure AI Engineer Associate.

What you'll do

Build, deploy, and operate GenAI-powered tools that accelerate network troubleshooting: triage assistants, KPI summaries, anomaly detection/explanations, and recommended next actions with citations to source data.

Design and implement RAG pipelines: document preparation (chunking/metadata), embeddings, vector search with re-ranking, grounding and citation strategies, semantic caching, and safety guardrails.

Ship reliable services: productionize models and prompts with CI/CD, automated tests, canary and A/B releases, monitoring/alerts, and SLOs for accuracy, grounding, latency, and cost.

Implement evaluation and continuous monitoring: offline and online eval harnesses, golden sets, human-in-the-loop review, prompt/knowledge drift detection, and token/cost budgets.

Integrate with internal systems and tools: alarms and KPI platforms, ticketing, inventory/topology APIs, runbooks, and dashboards to close the loop from detection to remediation.

Collaborate cross-functionally with data engineering, platform/security, and RAN SMEs to take use cases from discovery to production and iterate based on measurable impact (e.g., MTTR reduction, accuracy lift, fewer escalations).

For applications and inquiries, contact: hirings@openkyber.com