Posted on 2025/12/12

AI Platform Engineer - GPU, Hardware, Kubernetes - Atlanta GA Hybrid

Silverlink Technologies LLC

Atlanta, GA, United States

Contractor

Full Description

Role: AI Platform Specialist

Location: Atlanta GA Hybrid

Contract position

Job Details:

AI Platform Specialists

We are building a new team of platform specialists to support and enhance high-performance AI services.

These are highly technical, hands-on roles focused on customer, application, and platform support of AI-focused workloads.

AI Platform Specialists will provide application and GPU support.

The team will deliver Tier 1 and Tier 2 support to developers and engineers while collaborating closely with Tier 3 and 4 platform teams and vendors for issue resolution.

The roles require user knowledge of Kubernetes, virtualization, and cloud-native technologies as well as operator knowledge of GPUs and other AI supporting services.

Each specialist should have a focus on customer service along with goals of reliability, scalability, and performance.

Key Responsibilities

Platform Support & Incident Response

Provide Tier 1 & Tier 2 support for AI-driven applications and workloads.
Troubleshoot and resolve issues related to Kubernetes deployments, GPU utilization, and service performance.
Collaborate with Tier 3+ teams, including Kubernetes engineers and external vendors, to escalate and resolve complex issues.

Kubernetes & Cloud-Native Operations

Adopt, create, and integrate automated services using Helm, Ansible, Terraform, etc.
Deploy, manage, and support containerized AI workloads on Google Anthos-powered Kubernetes clusters.
Ensure adherence to pod security policies, automated rollouts/rollbacks, and best practices for scalable and secure Kubernetes environments.

GPU Infrastructure & AI Services Management

Optimize and support GPU-enabled workloads including CUDA and other AI acceleration frameworks.
Assist in the installation, configuration, and support of AI coding assistants (e.g., Codeium).

Observability & Documentation

Maintain detailed operational documentation, runbooks, and troubleshooting guides.
Utilize monitoring/logging tools like New Relic, BigPanda, Prometheus, Grafana, and other observability frameworks.

Process Improvement & Collaboration

Work cross-functionally with developers, IT teams, and vendors to ensure seamless deployment and support of AI services.
Contribute to CI/CD pipelines, automation, service, and security best practices.
Track and communicate work through task management platforms (ServiceNow and Jira).

Required Skills & Experience

Hybrid Cloud: In-depth knowledge of private (on-premises) and public (Google Cloud Platform & AWS) cloud architectures and services.

AI/ML Software: Developer experience with DevOps practices (Git, Jenkins, etc.) as well as working with AI/ML engineers and data scientists.

AI/ML Hardware: Experience deploying, supporting, and optimizing on-premises and cloud GPU-enabled (NVIDIA & AMD) infrastructure (VMs & Containers).

Kubernetes Expertise: Hands-on experience with deploying and managing containerized workloads in Kubernetes.

Technical Support & Troubleshooting: Proven ability to diagnose and resolve customer and platform issues in production environments.

Strong Communication & Documentation: Ability to clearly document procedures, write knowledge base articles, and collaborate with customers and teams.

Time Management & Accountability: Ability to work independently, prioritize tasks, and manage workload effectively.

Preferred Qualifications

Experience with GPU orchestration tools like Run:AI, NVIDIA AI Enterprise, VMware Private AI Foundation, etc.

Exposure to AI coding assistants like Codeium, Copilot, or Tabnine.

Proficiency with development tools like Python, PyTorch, TensorFlow, Jupyter Notebooks, etc.

About the Team & Reporting Structure

These positions will report to the Senior AI Architect and work as peers within a specialized AI support team.

Collaboration with internal VM and container support teams as well as NVIDIA, Codeium, and other vendor specialists will be essential for supporting customers, troubleshooting, and optimizing AI workloads.

Thank you!

Best Regards,

Sumit Talekar

Associate Manager Talent Acquisition

Silverlink Technologies Inc.
