
Posted on 6/11/2025

Senior AI Voice Engineer

allUP inc

Bronx, NY

Full-time
$84.13–$103.37 an hour

Qualifications

  • 3+ years building real-time audio or telephony systems (Twilio, SignalWire, FreeSWITCH, or similar)
  • Deep knowledge of SIP, RTP, or WebRTC internals
  • Hands-on with streaming/websocket architectures and back-pressure handling
  • Worked on carrier integrations, number provisioning, or call routing at scale
  • Fluency with Node.js/TypeScript or another modern backend language in production
  • Experience with AWS CDK / IaC and event-driven pipelines
  • Proven track record of optimizing for sub-second latency and high concurrency
  • Background in speech science, TTS/ASR model tuning, or voice biometrics
  • Strong system-design skills: queues, backoffs, retries, observability
  • Early-stage startup or green-field product experience
  • Comfortable in AWS (ECS/Fargate, Lambda, DynamoDB/RDS, CloudWatch)
  • Familiarity with React; enough to build small internal UIs when needed
  • Bias for shipping, ownership, and clear communication
  • Active contributor to open-source voice/telephony projects

Benefits

  • Competitive Comp + Meaningful Equity – Grow with us as we scale
  • Flexible, People-First Culture – We work hard, but we protect family and personal time
  • Salary info: $175,000 to $215,000 per year with equity

Responsibilities

  • Architect & Own the Voice Pipeline – Design, build, and maintain low-latency, highly available services (Twilio, SIP/WebRTC, websockets, streaming gRPC) that power phone calls at scale
  • Push the Limits of Natural Speech – Integrate and fine-tune TTS/ASR models to pronounce symbols, numbers, names, and slang flawlessly
  • Deliver Call Features Users Love – Live transfers, IVR fallbacks, appointment scheduling, smart error recovery
  • Optimize for Speed & Reliability – Track every hop from carrier to model; squeeze milliseconds, harden edge cases, and own the on-call rotation that keeps calls flowing 24/7
  • Collaborate Across the Stack – Pair with product, ML, and frontend engineers to craft APIs and in-app controls that unlock amazing end-to-end experiences

Full Description

Voice Engineer - heyLibby

About heyLibby

heyLibby is a seed-stage startup on a mission to give every small-to-midsize fitness and wellness business a world-class AI team member. We were co-founded by Spencer Rascoff, the former CEO of Zillow and current CEO of Match Group. Our AI works seamlessly across phone, email, text, and chat to help businesses manage their communications efficiently. We work closely with top industry experts, ship fast, learn fast, and measure ourselves by the tangible impact we deliver to customers.

Why This Role Matters

Our voice channel is the heart of the product. We need a seasoned engineer who lives and breathes real-time audio—someone who can make every call feel like a conversation with a helpful human, not a bot. You'll own the voice stack end-to-end: latency, natural-sounding TTS, crisp recognition, graceful live transfers, and bulletproof uptime.

What You'll Do

• Architect & Own the Voice Pipeline

Design, build, and maintain low-latency, highly available services (Twilio, SIP/WebRTC, websockets, streaming gRPC) that power phone calls at scale.

• Push the Limits of Natural Speech

Integrate and fine-tune TTS/ASR models to pronounce symbols, numbers, names, and slang flawlessly. Experiment fearlessly, measure obsessively.

• Deliver Call Features Users Love

Live transfers, IVR fallbacks, appointment scheduling, smart error recovery.

• Optimize for Speed & Reliability

Track every hop from carrier to model; squeeze milliseconds, harden edge cases, and own the on-call rotation that keeps calls flowing 24/7.

• Collaborate Across the Stack

Pair with product, ML, and frontend engineers to craft APIs and in-app controls that unlock amazing end-to-end experiences.

What We're Looking For

Must-Have Experience

• 3+ years building real-time audio or telephony systems (Twilio, SignalWire, FreeSWITCH, or similar)

• Deep knowledge of SIP, RTP, or WebRTC internals

• Hands-on with streaming/websocket architectures and back-pressure handling

• Worked on carrier integrations, number provisioning, or call routing at scale

• Fluency with Node.js/TypeScript or another modern backend language in production

• Experience with AWS CDK / IaC and event-driven pipelines

• Proven track record of optimizing for sub-second latency and high concurrency

Bonus Points

• Background in speech science, TTS/ASR model tuning, or voice biometrics

• Strong system-design skills: queues, backoffs, retries, observability

• Early-stage startup or green-field product experience

• Comfortable in AWS (ECS/Fargate, Lambda, DynamoDB/RDS, CloudWatch)

• Familiarity with React; enough to build small internal UIs when needed

• Bias for shipping, ownership, and clear communication

• Active contributor to open-source voice/telephony projects

How We Work & What You Get

• Autonomy & Ownership – You're the voice authority; we trust you to drive decisions.

• Velocity – Weekly releases, minimal ceremony, rapid experimentation.

• Impact – Your code will be on the critical path for every customer interaction.

• Competitive Comp + Meaningful Equity – Grow with us as we scale.

• Flexible, People-First Culture – We work hard, but we protect family and personal time.

Salary info: $175,000 to $215,000 per year with equity.

Ready to build the most human-sounding AI voice on the market? Apply now and let's redefine how small businesses talk to their customers.
