Posted on 2026/02/04
Senior AI Infrastructure Engineer
Assail
Boston, MA, United States
Full Description
Focus: Reliability, Performance, Cost Control, and Secure AI Operations
Level: Senior
Experience
• 7+ years in DevOps, SRE, or Infrastructure Engineering
• 1–2+ years owning production AI workloads, including GPU-backed inference or large-scale model serving
• Demonstrated ownership of production systems under load, not just "set it up once and move on"
Responsibilities
Scalable Inference & Serving
• Deploy and operate our vLLM-based inference stack serving a custom fine-tuned 14B+ parameter security model
• Optimize Time to First Token (TTFT) and tail latency under concurrent load from our 145-agent swarm
• Manage multi-model routing across specialized functions (code analysis, deobfuscation, reasoning)
• Ensure OpenAI-compatible API availability with <100ms p99 latency targets
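A p99 target like the one above is straightforward to check from a latency sample. The sketch below uses the nearest-rank percentile method; the function names are illustrative, not part of any real deployment tooling.

```python
# Sketch: checking a p99 latency target against sampled request latencies.
# Uses the nearest-rank percentile method; names are illustrative.
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

def meets_slo(samples_ms, target_ms=100.0, p=99):
    return percentile(samples_ms, p) < target_ms

latencies = [42.0] * 98 + [85.0, 250.0]  # one slow outlier in 100 requests
print(percentile(latencies, 99))  # 85.0 — p99 tolerates the single worst request
print(meets_slo(latencies))       # True
```

The point of a p99 (rather than average) target is visible here: one 250 ms outlier in 100 requests does not breach the SLO, but a systematic slowdown would.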
GPU Orchestration & Capacity Planning
• Manage and optimize NVIDIA RTX GPU utilization (RTX 5090 / CUDA 12.1+) within our Kubernetes clusters
• Configure GPU passthrough, tensor parallelism, and memory optimization for vLLM inference
• Design scheduling and autoscaling strategies to minimize idle GPU spend while supporting burst agent workloads
• Forecast GPU capacity needs as the agent swarm scales (currently 145 agents across 10 types)
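Capacity planning for a 14B-parameter model starts with back-of-envelope weight memory at each quantization level. The figures below are rough planning numbers, not vLLM's actual allocator behavior, and ignore KV cache, activations, and CUDA context overhead.

```python
# Back-of-envelope GPU memory for serving a 14B-parameter model at
# different quantization levels. Rough planning figures only; excludes
# KV cache, activations, and CUDA context overhead.
def weight_memory_gb(params_billions, bits_per_param):
    """Memory in GB needed just to hold the model weights."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: {weight_memory_gb(14, bits):.0f} GB for weights")
# FP16: 28 GB, INT8: 14 GB, INT4: 7 GB
```

On a 32 GB card, FP16 weights alone leave little headroom for KV cache under concurrent agent load, which is why the INT8/INT4 quantization mentioned under Technical Requirements matters for single-GPU serving.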
Kubernetes & Container Operations
• Own our K3s-based Kubernetes infrastructure running on RHEL 10
• Manage StatefulSets for stateful services (Neo4j, Qdrant, Kafka, ZooKeeper, Android emulators)
• Configure HPA (Horizontal Pod Autoscaler) for agent deployments (1–20 replicas per agent type)
• Operate Podman rootless containers with Buildah for secure image builds
• Maintain local container registry and image lifecycle
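The scaling behavior behind the HPA bullet above can be modeled with the core formula from the Kubernetes HPA documentation, clamped to the 1–20 replica band used for agent deployments. This models the scaling decision only, not the controller's stabilization windows or tolerances.

```python
# Sketch of the core Horizontal Pod Autoscaler formula
# (desired = ceil(current * metricValue / metricTarget)), clamped to the
# 1-20 replica band used for agent deployments. Decision logic only.
import math

def desired_replicas(current, metric_value, metric_target,
                     min_replicas=1, max_replicas=20):
    raw = math.ceil(current * metric_value / metric_target)
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(4, 180, 60))  # 12 — load tripled, replicas triple
print(desired_replicas(4, 900, 60))  # 20 — burst clamped at the replica cap
print(desired_replicas(4, 15, 60))   # 1  — quiet period scales down to the floor
```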
Event Streaming & Distributed Systems
• Operate our Apache Kafka 3.6 backbone handling 10,000+ messages/sec for agent coordination
• Monitor consumer lag, partition health, and message throughput across tenant-scoped topics
• Ensure exactly-once delivery semantics for mission-critical agent task distribution
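The sketch below shows the tenant-scoped topic naming from the Security section alongside the producer/consumer settings that exactly-once delivery typically requires, expressed as librdkafka-style configuration keys (the format confluent-kafka uses). It is purely illustrative and touches no broker.

```python
# Sketch: tenant-scoped topic names plus the producer/consumer settings
# exactly-once semantics typically require (librdkafka-style config keys,
# as used by confluent-kafka). Illustrative only; no broker needed.
def tenant_topics(tenant_id):
    return {
        "missions": f"tenant-{tenant_id}-missions",
        "agent_events": f"tenant-{tenant_id}-agent-events",
    }

def eos_producer_config(bootstrap, transactional_id):
    return {
        "bootstrap.servers": bootstrap,
        "enable.idempotence": True,            # no duplicates on retry
        "acks": "all",                         # wait for full ISR acknowledgment
        "transactional.id": transactional_id,  # enables transactional writes
    }

def eos_consumer_config(bootstrap, group_id):
    return {
        "bootstrap.servers": bootstrap,
        "group.id": group_id,
        "isolation.level": "read_committed",  # skip messages from aborted txns
        "enable.auto.commit": False,          # commit offsets inside the txn
    }

print(tenant_topics("acme")["missions"])  # tenant-acme-missions
```

The key pairing is idempotent, transactional producers on one side and `read_committed` consumers committing offsets within the same transaction on the other; either half alone degrades to at-least-once.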
Database Operations (5 Specialized Stores)
• PostgreSQL 16: Tenant data with Row-Level Security, PgBouncer connection pooling
• Neo4j 5.x: Graph database for attack chains and knowledge graphs
• Qdrant 1.7: Vector database for semantic search and pattern matching
• Redis 7: Short-term memory, caching, and pub/sub (24hr TTL patterns)
• MinIO: S3-compatible object storage for APK artifacts and reports
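The Row-Level Security setup behind the PostgreSQL bullet looks roughly like the statements below, here generated as SQL text. The table name (`missions`) and session setting (`app.tenant_id`) are illustrative assumptions, not the platform's actual schema.

```python
# Sketch of the Row-Level Security statements behind tenant isolation in
# PostgreSQL, emitted as SQL text. Table name and setting name
# (missions, app.tenant_id) are illustrative assumptions.
def rls_statements(table="missions"):
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",  # applies to table owner too
        (f"CREATE POLICY tenant_isolation ON {table} "
         "USING (tenant_id = current_setting('app.tenant_id')::uuid);"),
    ]

for stmt in rls_statements():
    print(stmt)
# Each connection then sets app.tenant_id before querying, and PostgreSQL
# filters every row to that tenant automatically.
```

One operational caveat worth knowing with PgBouncer in transaction-pooling mode: the tenant setting must be applied per transaction (`SET LOCAL` inside the transaction), since plain session-level `SET` can leak across pooled connections.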
MLOps / LLMOps Pipelines
• Build CI/CD pipelines that support:
• Model weight deployment and LoRA adapter merges
• Configuration and prompt updates
• Automated testing and canary releases for AI features
• Integrate with our custom deployment tooling (ares-cc.py) and Helm charts
• Enable fast rollback when model behavior or inference performance regresses
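An automated rollback gate for canary releases can be as simple as comparing canary metrics against the baseline. The thresholds and metric names below are illustrative assumptions, not the platform's actual policy.

```python
# Sketch of an automated rollback gate for canary model deployments:
# compare canary p99 latency and error rate against the baseline and
# roll back on regression. Thresholds are illustrative assumptions.
def should_rollback(baseline_p99_ms, canary_p99_ms,
                    baseline_error_rate, canary_error_rate,
                    latency_slack=1.10, error_slack=1.5):
    """Roll back if the canary is >10% slower or errors 1.5x more often."""
    if canary_p99_ms > baseline_p99_ms * latency_slack:
        return True
    # Floor of 1% absolute error rate avoids flapping on near-zero baselines.
    if canary_error_rate > max(baseline_error_rate * error_slack, 0.01):
        return True
    return False

print(should_rollback(80.0, 84.0, 0.002, 0.002))   # False — within slack
print(should_rollback(80.0, 120.0, 0.002, 0.002))  # True  — latency regressed
```

Gating on relative regression rather than absolute numbers keeps the check meaningful as traffic and model versions change.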
Observability & Reliability
• Implement end-to-end tracing with Jaeger across gRPC services, Kafka, and model inference
• Extend our Prometheus metrics for:
• Per-agent task duration and failure rates
• vLLM request latency and GPU utilization
• Kafka consumer lag and throughput
• Database query performance (Neo4j graph traversals, Qdrant vector searches)
• Maintain Grafana dashboards for Agent Swarm Overview, Mission Performance, Infrastructure Health, and DAST Metrics
• Design graceful degradation when models time out or agents fail
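One common shape for the graceful-degradation requirement is to bound every model call with a timeout and fall back to a safe default instead of letting an agent hang. The wrapper below is a minimal sketch; the fallback value and timeout are illustrative.

```python
# Sketch of graceful degradation for model calls: bound the wait and
# return a safe fallback instead of letting an agent hang. Names and
# defaults are illustrative.
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

def call_with_fallback(fn, fallback, timeout_s=5.0):
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn).result(timeout=timeout_s)
    except FuturesTimeout:
        return fallback
    except Exception:
        return fallback  # a crashed model call also degrades, not fails
    finally:
        pool.shutdown(wait=False)  # don't block the caller on a hung call

print(call_with_fallback(lambda: "model verdict", "needs-human-review"))
```

The same pattern extends naturally to routing: on timeout, retry against a smaller or quantized model before giving up, so agents degrade in quality rather than availability.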
Security, Privacy & Isolation
• Enforce multi-tenant isolation at every layer:
• Kubernetes namespaces and NetworkPolicies
• Tenant-scoped Kafka topics (tenant-{id}-missions, tenant-{id}-agent-events)
• Row-Level Security in PostgreSQL
• Filtered vector search in Qdrant by tenant_id
• Manage secrets via HashiCorp Vault (30+ secrets across 6 categories)
• Maintain mTLS for all service-to-service communication (gRPC, database connections)
• Operate container security scanning with Trivy, Cosign (image signing), and Syft (SBOM generation)
• Enforce SELinux in production (RHEL 10)
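The Qdrant side of the isolation scheme hinges on every search carrying a tenant filter. The helper below builds the filter in the JSON shape Qdrant's search API documents (`must` / `key` / `match`); the payload field name `tenant_id` follows the scheme above.

```python
# Sketch: building a tenant-scoped Qdrant search body so every vector
# search is pinned to one tenant_id. Filter shape follows Qdrant's
# documented JSON format; the payload field name follows the scheme above.
def tenant_search_body(vector, tenant_id, limit=10):
    return {
        "vector": vector,
        "filter": {
            "must": [
                {"key": "tenant_id", "match": {"value": tenant_id}}
            ]
        },
        "limit": limit,
    }

body = tenant_search_body([0.1, 0.2, 0.3], "acme")
print(body["filter"]["must"][0]["match"]["value"])  # acme
```

Centralizing this in one helper (rather than hand-building filters at each call site) is what makes "every search is tenant-scoped" auditable: a search that bypasses the helper is a reviewable defect, not a silent leak.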
Technical Requirements
Orchestration & Containers
• Kubernetes (Expert) — K3s, EKS, or equivalent
• Podman and Buildah (rootless container runtime)
• Helm for chart management
• Experience with GPU scheduling, node pools, and StatefulSets
Infrastructure & Cloud
• Infrastructure as Code: Terraform (we deploy on AWS)
• AWS: EKS, IAM, VPC, or equivalent cloud experience
• Experience with local/on-prem GPU environments alongside cloud
Inference & Acceleration
• vLLM (required) — production deployment and optimization
• NVIDIA GPU operations (CUDA, driver management, memory optimization)
• Familiarity with quantization (INT4/INT8 via BitsAndBytes) and model optimization
Data Infrastructure
• Apache Kafka — production operations, consumer groups, exactly-once semantics
• PostgreSQL — Row-Level Security, connection pooling, replication
• Neo4j — graph database operations and Cypher queries
• Qdrant or similar vector database — HNSW indexing, filtered search
• Redis — pub/sub, caching patterns, Streams
Monitoring & Observability
• Prometheus — custom metrics, alerting rules
• Grafana — dashboard creation and maintenance
• Jaeger or OpenTelemetry for distributed tracing
• Experience with gRPC observability
Networking & Security
• gRPC and Protobuf — service mesh patterns, load balancing
• VPC design, private networking, Kubernetes NetworkPolicies
• HashiCorp Vault for secrets management
• TLS/mTLS configuration for databases and services
Required Domain Awareness
• Operating AI systems in adversarial environments (security product context)
• Preventing data leakage across multi-tenant boundaries
• Supporting reproducible, auditable AI outputs for security findings
• Understanding the blast radius of misconfigured AI infrastructure in a pentest platform
Preferred Experience
• Experience with Android emulator farms or mobile device infrastructure at scale
• Prior work supporting security products or regulated data environments
• Familiarity with Frida, mitmproxy, or similar runtime instrumentation tools
• Background in gRPC-based microservices and event-driven architectures
• Experience with supply chain security (Trivy, Cosign, SBOM generation)
What This Role Is Not
• Not a "just keep the cluster green" job
• Not infra-for-infra's-sake
• Not a research or model-tuning role
This role owns uptime, performance, cost discipline, and security posture for an autonomous AI penetration testing platform with a 145-agent swarm operating under hostile inputs.