Posted on 4/5/2025

AI Infrastructure Operations Lead

xAI

Memphis, TN

Full-time

Qualifications

  • Work ethic and strong prioritization skills are important
  • All engineers and researchers are expected to have strong communication skills
  • They should be able to concisely and accurately share knowledge with their teammates
  • We're looking for someone with technical expertise and a proactive approach to maintain and scale our facilities effectively

Full Description

About xAI

Our Mission

We are committed to creating AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence.

We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important.

All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

As the Associate Site Operations Manager, you'll oversee the data center technicians who keep xAI's AI infrastructure running smoothly. This role is pivotal in ensuring our systems operate at peak efficiency, supporting the compute power behind our mission. You'll co-lead a skilled team, manage critical operations, and implement smart, sustainable solutions.

We're looking for someone with technical expertise and a proactive approach to maintain and scale our facilities effectively.

Responsibilities

• Oversee Site Operations: Manage power, cooling, networking, and hardware deployments to ensure 99.999% uptime for xAI's AI compute systems, keeping our infrastructure reliable and ready for innovation.

• Guide Your Team: Lead and develop a team of Data Center Operations Technicians through training, performance evaluations, and fostering a collaborative, high-performing environment tied to xAI's objectives.

• Streamline Processes: Take charge of hardware lifecycles, incident resolution, and inventory management, refining procedures to ensure your team operates with precision and consistency.

• Connect Key Players: Coordinate between technicians, xAI's AI specialists, and external vendors to integrate new technology and expand capacity seamlessly.

• Drive Sustainable Solutions: Champion energy-efficient practices and sustainability efforts, optimizing resources while supporting the demands of cutting-edge AI workloads.

• Measure Success: Track and report key metrics like uptime, power efficiency, and issue resolution times, using data to enhance site performance and inform decisions.

• Handle Emergencies: Lead the team through urgent situations with clear direction, resolving issues quickly to protect our AI systems from disruption.

• Optimize Operations: Build and refine processes, such as preventative maintenance schedules with vendors and ticket workflows in Jira, to keep operations efficient and scalable.

• Support Expansion: Work with leadership to standardize best practices across sites (if applicable), ensuring operations align with xAI's ambitious growth plans.
