Posted on 2026/02/25

AI Infrastructure & Solutions Architect

University of Toronto

Toronto, ON

Full-time


Date Posted: 02/24/2026

Req ID: 47032

Faculty/Division: Ofc of the Chief Information Officer

Department: Enterprise Infrastructure Solutions

Campus: St. George (Downtown Toronto)

Position Number: 00058955

Existing Vacancy: Yes

Description:

About us:

The Enterprise Infrastructure Solutions (EIS) group, part of the Information Technology Services (ITS) division, is responsible for the campus core network, campus wireless, wide area network connectivity, and internet connectivity for the University, including connectivity to research and education networks.

EIS is also responsible for services related to departmental network management; network, server, and storage infrastructure; Windows and Linux server management services; database and application integration and support; the enterprise backup service; 24/7 operation of the central administrative data centres; and telecommunications services.

If you’re motivated and passionate about learning technologies and dedicated to improving experiences for today’s student, consider a career with us.

Your opportunity:

Reporting to the Manager, AI Engineering & Operations within the Enterprise Infrastructure Solutions group, the AI Infrastructure & Solutions Architect plays a critical role in defining the future of research and administrative computing at the University.

In this role, you will lead the architectural design and deployment of secure, scalable AI platforms that serve the entire campus community.

You will bridge the gap between high-performance hardware and practical user applications, ensuring that our AI infrastructure, from GPU clusters to sovereign data sandboxes, is reliable, supportable, and aligned with institutional governance and ethical AI frameworks.

You will collaborate closely with AI Developers & Integration Specialists, while retaining ownership of platform architecture, reliability, security posture, and lifecycle management across on-premises and hybrid cloud environments.

This role offers a rare opportunity to design and operate sovereign, on-premises AI platforms at institutional scale, supporting both cutting-edge research and mission-critical administrative use cases.

Your responsibilities will include:

• Architecting and operating container orchestration platforms (Kubernetes/K8s), including GPU operators and AI-aware schedulers for efficient accelerator utilization.

• Designing and enforcing AI platform security and governance controls, including data sovereignty, access isolation, auditability, and compliance with privacy and ethical AI frameworks.

• Implementing observability and monitoring solutions to track model performance, drift, GPU utilization, inference latency, and platform health using tools such as Prometheus, Grafana, or specialized AI monitoring stacks.

• Designing and operating GPU and accelerator platforms (e.g., NVIDIA, AMD, or emerging accelerators), including capacity planning, scheduling strategies, and lifecycle management.

• Analyzing platform usage metrics to optimize token consumption, GPU allocation, and overall cost efficiency while maintaining performance and reliability.

• Partnering with AI Developers & Integration Specialists to define platform abstractions, deployment patterns, and service interfaces that enable rapid innovation without compromising security or supportability.

• Producing and maintaining architectural documentation, disaster recovery and business continuity plans, and technical guidance for researchers and platform users.

Essential Qualifications:

• Bachelor’s degree in Computer Science, Information Technology, Engineering, or an acceptable combination of education and equivalent experience.

• Eight or more years of experience in on-premises or cloud infrastructure management.

• Three to five or more years of direct AI infrastructure or MLOps experience (recognizing the rapid evolution of the field), with demonstrated exposure to LLM platforms, RAG pipelines, or large-scale ML systems.

• Deep expertise in Infrastructure as Code (IaC) and automation for managing complex, multi-environment platforms.

• Advanced Kubernetes and container orchestration knowledge, including GPU scheduling, operators, and container runtimes (Docker, Podman).

• Experience with MLOps and model lifecycle tooling, such as MLflow, Kubeflow, and Weights & Biases, and with model serving frameworks such as Triton or vLLM.

• Strong understanding of high-performance and AI-optimized networking, including high-throughput, low-latency designs (e.g., InfiniBand, RDMA) and hybrid connectivity.

• Experience designing or operating hybrid cloud architectures, including at least one major cloud platform (AWS, Azure, or GCP), with awareness of identity, networking, and cost management considerations.

Assets (Nonessential):

• Experience with vector databases and retrieval systems supporting RAG-based architectures.

• Hands-on experience deploying open-weights AI models (e.g., Llama, Mistral) in on-premises, air-gapped, or tightly governed environments.

• Knowledge of AI security practices, including adversarial ML considerations or secure AI framework implementations.

• Familiarity with Canadian data sovereignty, privacy, and research compliance requirements, particularly within higher education.

• Contributions to open-source AI, infrastructure, or MLOps projects, or active participation in the AI engineering community.

• Experience with ITSM platforms (e.g., ServiceNow) and operational workflow automation.

To be successful in this role you will be:

• Decisive

• Self-driven

• Efficient

• Organized

• Proactive

• Innovative

Please note: This is a new position.

A copy of the detailed position description is available to USW employees upon request to the Operations and Real Estate Partnerships Office.

Closing Date: 03/17/2026, 11:59PM ET

Employee Group: USW

Appointment Type: Budget - Continuing

Schedule: Full-Time

Pay Scale Group & Hiring Zone:

USW Pay Band 19: $123,756, with annual step progression to a maximum of $158,256.

Pay scale and job class assignment is subject to determination pursuant to the Job Evaluation/Pay Equity Maintenance Protocol.

Job Category: Information Technology (IT)

Recruiter: Khristen Sivaramalingam

Lived Experience Statement

Candidates who are members of Indigenous, Black, racialized and 2SLGBTQ+ communities, persons with disabilities, and other equity deserving groups are encouraged to apply, and their lived experience shall be taken into consideration as applicable to the posted position.
