Posted on 2025/12/13

Observability, Automation & AI Ops Engineer - MetLife HACK4JOB

MetLife

Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia

Full-time

Full Description

MetLife – Kuala Lumpur, Malaysia

Observability, Automation & AI Ops Engineer

The Observability, Automation & AI Ops Engineer is responsible for designing, implementing, and optimizing advanced monitoring, automation, and AI-driven operations solutions across MetLife’s hybrid cloud and on-premises environments.

This role ensures high availability, reliability, and efficiency of IT services by leveraging modern observability platforms, automation frameworks, and artificial intelligence for proactive incident management and continuous improvement.

Key Responsibilities

Observability Engineering

• Design, deploy, and manage observability platforms (Elastic, Splunk, Prometheus, Grafana, OpenTelemetry) for end‑to‑end visibility of applications, infrastructure, and business services.

• Develop and maintain telemetry pipelines for logs, metrics, traces, and events.

• Build dashboards and automated alerting systems with AI‑powered anomaly detection.

• Collaborate with DevOps, SRE, and application teams to integrate observability into CI/CD pipelines and cloud‑native architectures.

• Analyze system health, identify trends, and drive data‑driven decisions for performance optimization and reliability.

Automation Engineering

• Design, implement, and maintain automation solutions for infrastructure provisioning, configuration management, and operational workflows (Ansible, Terraform, CI/CD tools).

• Develop self‑healing scripts and intelligent runbooks for automated incident response and remediation.

• Integrate automation with monitoring and ITSM tools to streamline operations and reduce manual intervention.

• Lead or participate in automation projects to improve efficiency, reduce errors, and support business agility.

• Stay current with emerging automation technologies and best practices.

AI Ops Engineering

• Implement and maintain AI‑driven systems for real‑time monitoring, predictive analytics, and automated root cause analysis.

• Develop and train machine learning models using operational data for anomaly detection and forecasting.

• Deploy and manage AIOps platforms (Moogsoft, Dynatrace, Datadog, Elastic) to enable proactive incident management and self‑healing capabilities.

• Collaborate with IT, DevOps, and Data Science teams to integrate AI/ML into IT operations and service management.

• Monitor and optimize AI model performance, ensuring reliability and continuous improvement.

Technical Leadership & Collaboration

• (Senior Level) Mentor junior engineers, provide technical guidance, and lead cross‑functional project teams.

• Drive adoption of observability, automation, and AI Ops best practices across the organization.

• Participate in technology evaluations, pilots, and rollouts of new solutions.

Qualifications & Skills

Experience

• Associate: 0–2 years in observability, automation, or IT operations.

• Engineer: 2–5 years relevant experience.

• Senior: 5+ years with demonstrated technical and/or team leadership.

Skills

• Proficiency in observability platforms (Elastic, Splunk, Prometheus, Grafana, OpenTelemetry).

• Strong experience with automation tools (Ansible, Terraform, CI/CD, scripting languages).

• Familiarity with AIOps platforms and AI/ML frameworks (Scikit‑learn, TensorFlow, PyTorch).

• Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes).

• Excellent troubleshooting, analytical, and communication skills.

• (Senior Level) Ability to lead, mentor, and manage technical teams.

Preferred Certifications

• Relevant certifications in observability, automation, cloud, or AI/ML platforms are a plus.

• ITIL 4 certification.

Language Requirements

• Business proficiency in English.

• Proficiency in Japanese is a plus.

Why This Role Matters

This role is critical to MetLife’s digital transformation, enabling proactive, data‑driven IT operations, reducing downtime, and accelerating innovation through automation and AI.

The application for this hackathon is open to individuals from all countries.

The job opportunities are based in Kuala Lumpur, Malaysia.

Ready to innovate and showcase your skills?

Join the MetLife Hack4Job event today: click Apply and secure your spot!

Additional Information

• Seniority level: Mid‑Senior level

• Employment type: Full‑time

• Job function: Information Technology

• Industries: Banking and Financial Services