Posted on 2025/11/14
Generative AI Engineer
Recro
Tirunelveli, Tamil Nadu, India
Full Description
About the job
Role Overview
As the AI Systems Architect, you’ll own the end-to-end design and delivery of production-grade agentic and Generative AI systems.
This is a highly hands-on role requiring deep architectural insight, coding proficiency, and an obsession with performance, scalability, and reliability.
You’ll architect secure, cost-efficient AI platforms on AWS, guide developers through complex debugging and optimization, and ensure all systems are observable, governed, and production-ready.
Key Responsibilities
• Architect Production AI Systems: Design robust architectures for agentic systems (planning, reasoning, tool-calling), GenAI/RAG pipelines, and evaluation workflows.
Create detailed design documents, including flow/UML/sequence diagrams and AWS deployment topologies.
• Optimize for Cost & Performance: Model throughput, latency, concurrency, autoscaling, CPU/GPU sizing, and vector index performance to ensure scalable, efficient deployments.
• Lead Debugging & Stability Efforts: Conduct deep-dive debugging, fix critical defects, and resolve production incidents; pair-program with developers to improve code quality and performance.
• Standardize Agentic Frameworks: Build reference implementations using Semantic Kernel (preferred), LangGraph, AutoGen, or CrewAI with strong schema validation, grounding, and memory management.
• Engineer Retrieval & Search Systems: Architect hybrid retrieval solutions including ingestion, chunking, embeddings, ranking, caching, and freshness management while minimizing hallucination risk.
• Productionize on AWS: Deploy and manage systems using Amazon EKS, Bedrock, S3, SQS/SNS, RDS, and ElastiCache. Integrate IAM/Okta, Secrets Manager, and Datadog for observability, enforcing SLIs/SLOs and error budgets.
• Implement Observability & Monitoring: Set up distributed tracing, metrics, and logging via OpenTelemetry and Datadog. Standardize dashboards, alerts, and incident response workflows.
• Govern Evaluation & Rollouts: Build test and evaluation frameworks—golden sets, A/B experiments, regression suites, and controlled rollouts—to ensure consistent quality across releases.
• Embed Security & Safety: Enforce least privilege, PII protection, and policy compliance through threat modeling, sandboxed execution, and prompt-injection defense.
• Establish Engineering Standards: Create reusable SDKs, connectors, CI/CD templates, and architecture review checklists to promote consistency across teams.
• Cross-Functional Leadership: Collaborate with product, data, and SRE teams for capacity planning, DR strategies, and post-incident RCA reviews.
Mentor engineers to strengthen design and reliability practices.
Must-Have Qualifications
• 7–10 years in software/AI engineering, including 4+ years in GenAI application development and 2+ years architecting agentic AI systems.
• Expert in Python 3.11+ (asyncio, typing, packaging, profiling, pytest).
• Hands-on experience with Semantic Kernel, LangGraph, AutoGen, or CrewAI.
• Proven delivery of GenAI/RAG systems on AWS Bedrock or an equivalent platform, backed by a vector store such as OpenSearch Serverless, Pinecone, or Redis.
• Deep understanding of the AWS ecosystem and adjacent tooling: EKS, Bedrock, S3, SQS/SNS, RDS, ElastiCache, Secrets Manager, IAM/Okta, Kong API Gateway, Datadog.
• Expertise in observability and incident management using OpenTelemetry and Datadog.
• Strong focus on cost, performance, and security engineering: FinOps mindset, autoscaling, caching, and policy enforcement.
• Exceptional communication—clear diagrams, ADRs, and peer review practices.
Nice-to-Have Skills
• Multi-agent orchestration (task decomposition, coordinator-worker, graph-based planning).
• Expertise with vector databases (OpenSearch, Pinecone, pgvector, Redis).
• Experience with AI evaluation, guardrails, and rollout gating.
• Familiarity with frontend agent interfaces, secure APIs, and AuthN/Z best practices.
• Exposure to policy-as-code, multi-tenant architectures, and feature management (Kong, LaunchDarkly, Flipt).
• Experience with CI/CD via GitHub Actions and IaC (Terraform/AWS CloudFormation).