Posted on 2026/01/21

Principal Cloud Architect / GPU / HPC / AI Infrastructure

Oracle

Hong Kong

Full-time

Full Description

Principal Cloud Architect – GPU / HPC / AI Infrastructure

Function: Pre-Sales / Solution Architecture – OCI Accelerated Computing & AI

About the Role

Are you excited by large-scale AI, GPU clusters and next-generation cloud infrastructure?

As a Principal Cloud Architect, you will be at the forefront of helping our customers design and implement accelerated computing and AI platforms on Oracle Cloud Infrastructure (OCI).

You will work directly with AI startups, digital-native unicorns, and strategic enterprise customers to architect and deploy:

• Large-scale GPU and HPC clusters

• LLM training and inference platforms

• Agentic AI and intelligent automation solutions

This role blends deep technical hands‑on work with customer‑facing solution consulting.

You will partner closely with sales, product and engineering teams to shape our customers’ AI journey and contribute to Oracle’s strategic vision for cloud and AI adoption in the region.

What You Will Do

In this role, you will:

• Design & deploy GPU/HPC infrastructure on OCI

• Architect large-scale GPU and HPC clusters on OCI (and hybrid environments)

• Use Terraform, Ansible, Slurm, Kubernetes and related tooling to build repeatable, automated deployments

• Define cluster architecture including node types, storage layout, networking, and security

• Build AI‑ready platforms

• Support LLM‑based solutions, agentic AI systems, and robotic / intelligent systems from proof‑of‑concept to production

• Collaborate with customers to size and tune infrastructure for training and inference workloads

• Implement best practices for performance, reliability, observability and cost optimization

• Be a trusted technical advisor

• Work with CTOs, Heads of AI, and senior engineering leaders to translate business problems into scalable AI/HPC architectures

• Provide guidance on cloud migration, hybrid deployments, and reference architectures for GPU/HPC workloads

• Lead technical workshops, design sessions, and deep‑dive discussions with customer engineering teams

• Drive customer enablement and internal advocacy

• Deliver training, hands‑on labs, and technical enablement on OCI AI/HPC capabilities

• Create and share code samples, deployment blueprints, reference architectures, and demos

• Contribute to blogs, whitepapers, best‑practice guides, or conference talks to showcase solutions and thought leadership

• Influence product & roadmap

• Collaborate with product and engineering teams to close technical gaps, relay customer feedback, and help shape the future of OCI accelerated computing

• Work with key AI partners and ISVs to integrate their solutions into customer architectures

Core Technical Requirements

To be successful, you should bring strong hands‑on experience in most of the following areas:

• Practical experience designing or operating GPU or HPC clusters (cloud and/or on‑prem)

• Understanding of cluster topology, GPU/CPU ratios, storage bandwidth, and scaling

• Automation & Infrastructure as Code

• Proficiency with Python, Bash, or PowerShell for scripting and tooling

• Hands‑on experience with Terraform and/or Ansible to automate infrastructure and cluster provisioning

• Experience with cluster managers and schedulers such as Slurm, PBS, or Bright

• Strong understanding of Kubernetes / container orchestration for AI or batch workloads

• High‑Performance Networking

• Knowledge of RDMA, InfiniBand, MPI, and distributed file systems used in HPC environments

• Experience troubleshooting or optimizing network and I/O bottlenecks in distributed workloads

• AI / ML Platform Experience

• Familiarity with AI/ML platforms, LLMs, and inference serving stacks (e.g. distributed training frameworks, model serving patterns)

• Understanding of GPU utilization, mixed precision, and parallelism strategies (data/model/pipeline parallel)

Business & Leadership Skills

• 5+ years in pre‑sales, technical consulting, customer‑facing solution architecture, or equivalent roles

• Proven ability to present complex technical architectures to both deeply technical and senior business audiences

• Strong skills in requirements discovery, solution design, and storytelling around value and outcomes

• Comfortable leading design workshops, whiteboarding sessions, and technical decision‑making with customer engineering teams

• Passion for working with top‑tier customers and partners to deliver innovative cloud and AI solutions

• Ability to work independently with regional and global teams in a fast‑evolving AI/cloud landscape

Language & Location

• Language: Mandarin is mandatory, as many customers and partners in this role are Mandarin‑speaking. English proficiency is also required for internal collaboration and regional stakeholders.

• Location: Based in Hong Kong, supporting customers across Greater China and regional markets as needed.

Preferred Qualifications

• Demonstrated thought leadership in AI/HPC/cloud through publications, conference talks, community contributions, or open‑source projects

• Experience architecting or operating solutions on Oracle Cloud Infrastructure (OCI) or other major cloud platforms (AWS, Azure, GCP)

• Prior experience in AI/HPC solution pre‑sales or working directly with digital‑native / AI‑first companies
