
Posted on 2026/02/04

Senior AI Infrastructure Engineer

Assail

Boston, MA, United States

Full-time


Full Description

Focus: Reliability, Performance, Cost Control, and Secure AI Operations

Level: Senior

Experience

• 7+ years in DevOps, SRE, or Infrastructure Engineering

• 1–2+ years owning production AI workloads, including GPU-backed inference or large-scale model serving

• Demonstrated ownership of production systems under load, not just "set it up once and move on"

Responsibilities

Scalable Inference & Serving

• Deploy and operate our vLLM-based inference stack serving a custom fine-tuned 14B+ parameter security model

• Optimize Time to First Token (TTFT) and tail latency under concurrent load from our 145-agent swarm

• Manage multi-model routing across specialized functions (code analysis, deobfuscation, reasoning)

• Ensure OpenAI-compatible API availability with <100ms p99 latency targets
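
A quick illustration of the latency target above: p99 is a rank statistic over a window of request latencies, so a single outlier need not break the target. The sketch below uses the nearest-rank method; all numbers are invented for the example.

```python
# Minimal sketch: nearest-rank p99 over a window of request latencies,
# the kind of check used to verify a <100 ms p99 target.
import math

def percentile(samples_ms, pct):
    """Smallest value with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

latencies = [12.0] * 98 + [85.0, 240.0]  # 100 requests; one slow outlier
print(percentile(latencies, 99))  # 85.0: within a 100 ms target even with the 240 ms outlier
```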

GPU Orchestration & Capacity Planning

• Manage and optimize NVIDIA RTX GPU utilization (RTX 5090 / CUDA 12.1+) within our Kubernetes clusters

• Configure GPU passthrough, tensor parallelism, and memory optimization for vLLM inference

• Design scheduling and autoscaling strategies to minimize idle GPU spend while supporting burst agent workloads

• Forecast GPU capacity needs as the agent swarm scales (currently 145 agents across 10 types)
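
For illustration, the Kubernetes side of GPU scheduling reduces to a device-plugin resource request. The pod spec below is a hypothetical sketch (the image, runtime class, and pod name are assumptions, and the NVIDIA device plugin must already be installed on the node):

```yaml
# Hypothetical sketch: one GPU requested for a vLLM inference pod.
apiVersion: v1
kind: Pod
metadata:
  name: vllm-inference
spec:
  runtimeClassName: nvidia
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest
      resources:
        limits:
          nvidia.com/gpu: 1
```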

Kubernetes & Container Operations

• Own our K3s-based Kubernetes infrastructure running on RHEL 10

• Manage StatefulSets for stateful services (Neo4j, Qdrant, Kafka, Zookeeper, Android emulators)

• Configure HPA (Horizontal Pod Autoscaler) for agent deployments (1–20 replicas per agent type)

• Operate Podman rootless containers with Buildah for secure image builds

• Maintain local container registry and image lifecycle
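
The 1–20 replica range above maps to a standard HPA object. The manifest below is a hypothetical sketch (the agent name and CPU target are invented for the example):

```yaml
# Hypothetical sketch: autoscaling one agent Deployment between 1 and 20 replicas.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: recon-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: recon-agent
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```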

Event Streaming & Distributed Systems

• Operate our Apache Kafka 3.6 backbone handling 10,000+ messages/sec for agent coordination

• Monitor consumer lag, partition health, and message throughput across tenant-scoped topics

• Ensure exactly-once delivery semantics for mission-critical agent task distribution
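
Consumer lag, mentioned above, is simply end offset minus committed offset per partition. A minimal sketch of the arithmetic, with invented offsets:

```python
# Minimal sketch: summing per-partition lag for one topic is the basis
# of the lag alerting described above.
def total_lag(end_offsets, committed_offsets):
    """Sum per-partition lag; partitions with no commit count from offset 0."""
    return sum(
        end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    )

end = {0: 1200, 1: 950, 2: 1010}
committed = {0: 1150, 1: 950}     # partition 2 has no commit yet
print(total_lag(end, committed))  # 50 + 0 + 1010 = 1060
```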

Database Operations (5 Specialized Stores)

• PostgreSQL 16: Tenant data with Row-Level Security, PgBouncer connection pooling

• Neo4j 5.x: Graph database for attack chains and knowledge graphs

• Qdrant 1.7: Vector database for semantic search and pattern matching

• Redis 7: Short-term memory, caching, and pub/sub (24hr TTL patterns)

• MinIO: S3-compatible object storage for APK artifacts and reports
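
As a sketch of the PostgreSQL Row-Level Security pattern above (table, policy, and setting names are hypothetical), a tenant-isolation policy looks like:

```sql
-- Hypothetical names; assumes the application sets app.tenant_id per
-- session, e.g. SET app.tenant_id = '...'.
ALTER TABLE missions ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON missions
  USING (tenant_id = current_setting('app.tenant_id')::uuid);
```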

MLOps / LLMOps Pipelines

• Build CI/CD pipelines that support:

• Model weight deployment and LoRA adapter merges

• Configuration and prompt updates

• Automated testing and canary releases for AI features

• Integrate with our custom deployment tooling (ares-cc.py) and Helm charts

• Enable fast rollback when model behavior or inference performance regresses
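
The rollback bullet above implies an automated gate. A minimal sketch of one possible promote/rollback decision (metric names and thresholds are invented for the example):

```python
# Minimal sketch: compare a canary release's error rate and p99 latency
# against the baseline and decide whether to promote or roll back.
def canary_verdict(baseline, canary, max_err_delta=0.01, max_p99_ratio=1.2):
    """Return 'promote' or 'rollback' from two {'err_rate', 'p99_ms'} dicts."""
    if canary["err_rate"] > baseline["err_rate"] + max_err_delta:
        return "rollback"
    if canary["p99_ms"] > baseline["p99_ms"] * max_p99_ratio:
        return "rollback"
    return "promote"

print(canary_verdict({"err_rate": 0.002, "p99_ms": 80},
                     {"err_rate": 0.003, "p99_ms": 92}))  # promote
```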

Observability & Reliability

• Implement end-to-end tracing with Jaeger across gRPC services, Kafka, and model inference

• Extend our Prometheus metrics for:

• Per-agent task duration and failure rates

• vLLM request latency and GPU utilization

• Kafka consumer lag and throughput

• Database query performance (Neo4j graph traversals, Qdrant vector searches)

• Maintain Grafana dashboards for Agent Swarm Overview, Mission Performance, Infrastructure Health, and DAST Metrics

• Design graceful degradation when models time out or agents fail
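
A minimal sketch of the graceful-degradation bullet above: run the model call with a time budget and return a marked fallback on timeout instead of failing the whole mission step. All names below are invented for the example.

```python
# Minimal sketch: timeout wrapper with a degraded fallback result.
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

pool = ThreadPoolExecutor(max_workers=4)  # shared pool; callers never block on shutdown

def with_fallback(call, timeout_s, fallback):
    """Return call()'s result, or fallback if it exceeds timeout_s seconds."""
    try:
        return pool.submit(call).result(timeout=timeout_s)
    except FutureTimeout:
        return fallback

def slow_model():
    time.sleep(0.3)  # stands in for a stalled inference request
    return "full analysis"

print(with_fallback(slow_model, timeout_s=0.05,
                    fallback="degraded: heuristic-only result"))
# degraded: heuristic-only result
```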

Security, Privacy & Isolation

• Enforce multi-tenant isolation at every layer:

• Kubernetes namespaces and NetworkPolicies

• Tenant-scoped Kafka topics (tenant-{id}-missions, tenant-{id}-agent-events)

• Row-Level Security in PostgreSQL

• Filtered vector search in Qdrant by tenant_id

• Manage secrets via HashiCorp Vault (30+ secrets across 6 categories)

• Maintain mTLS for all service-to-service communication (gRPC, database connections)

• Operate container security scanning with Trivy, Cosign (image signing), and Syft (SBOM generation)

• Enforce SELinux in production (RHEL 10)
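
The tenant-scoped topic convention above can be sketched as a small helper (the id format is an assumption); validating the id keeps one tenant's name from escaping into another tenant's topic:

```python
# Minimal sketch: derive tenant-scoped topic names from the
# tenant-{id}-... pattern quoted above.
import re

TENANT_ID = re.compile(r"[a-z0-9-]{1,36}")  # assumed id format

def tenant_topics(tenant_id):
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return (f"tenant-{tenant_id}-missions", f"tenant-{tenant_id}-agent-events")

print(tenant_topics("acme42"))
# ('tenant-acme42-missions', 'tenant-acme42-agent-events')
```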

Technical Requirements

Orchestration & Containers

• Kubernetes (Expert) — K3s, EKS, or equivalent

• Podman and Buildah (rootless container runtime)

• Helm for chart management

• Experience with GPU scheduling, node pools, and StatefulSets

Infrastructure & Cloud

• Infrastructure as Code: Terraform (we deploy on AWS)

• AWS: EKS, IAM, VPC, or equivalent cloud experience

• Experience with local/on-prem GPU environments alongside cloud

Inference & Acceleration

• vLLM (required) — production deployment and optimization

• NVIDIA GPU operations (CUDA, driver management, memory optimization)

• Familiarity with quantization (INT4/INT8 via BitsAndBytes) and model optimization
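
The quantization bullet above is largely a memory-budget question. A back-of-envelope sketch for a 14B-parameter model (weights only; KV cache, activations, and CUDA context are ignored):

```python
# Minimal sketch: approximate weight memory at different precisions,
# the arithmetic behind choosing INT4/INT8 for single-GPU serving.
def weight_gib(params, bits_per_param):
    return params * bits_per_param / 8 / 2**30

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_gib(14e9, bits):.1f} GiB")
# FP16: 26.1 GiB, INT8: 13.0 GiB, INT4: 6.5 GiB
```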

Data Infrastructure

• Apache Kafka — production operations, consumer groups, exactly-once semantics

• PostgreSQL — Row-Level Security, connection pooling, replication

• Neo4j — graph database operations and Cypher queries

• Qdrant or similar vector database — HNSW indexing, filtered search

• Redis — pub/sub, caching patterns, Streams

Monitoring & Observability

• Prometheus — custom metrics, alerting rules

• Grafana — dashboard creation and maintenance

• Jaeger or OpenTelemetry for distributed tracing

• Experience with gRPC observability

Networking & Security

• gRPC and Protobuf — service mesh patterns, load balancing

• VPC design, private networking, Kubernetes NetworkPolicies

• HashiCorp Vault for secrets management

• TLS/mTLS configuration for databases and services

Required Domain Awareness

• Operating AI systems in adversarial environments (security product context)

• Preventing data leakage across multi-tenant boundaries

• Supporting reproducible, auditable AI outputs for security findings

• Understanding the blast radius of misconfigured AI infrastructure in a pentest platform
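
One common way to support the reproducible, auditable outputs bullet above (field names here are invented): fingerprint the exact model version, prompt, and sampling parameters behind each finding, so any output can be tied back to its inputs.

```python
# Minimal sketch: a deterministic fingerprint for an AI-generated finding.
import hashlib
import json

def finding_fingerprint(model_version, prompt, params):
    """Stable digest: identical inputs always yield the identical fingerprint."""
    record = json.dumps(
        {"model": model_version, "prompt": prompt, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode()).hexdigest()

fp = finding_fingerprint("sec-14b-v3", "analyze apk ...",
                         {"temperature": 0.0, "seed": 7})
print(len(fp))  # 64 hex chars
```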

Preferred Experience

• Experience with Android emulator farms or mobile device infrastructure at scale

• Prior work supporting security products or regulated data environments

• Familiarity with Frida, mitmproxy, or similar runtime instrumentation tools

• Background in gRPC-based microservices and event-driven architectures

• Experience with supply chain security (Trivy, Cosign, SBOM generation)

What This Role Is Not

• Not a "just keep the cluster green" job

• Not infra-for-infra's-sake

• Not a research or model-tuning role

This role owns uptime, performance, cost discipline, and security posture for an autonomous AI penetration testing platform with a 145-agent swarm operating under hostile inputs.
