Posted on 6/11/2025

Senior AI Voice Engineer

allUP inc

Bronx, NY

Full-time

$84.13–$103.37 an hour

Qualifications

3+ years building real-time audio or telephony systems (Twilio, SignalWire, FreeSWITCH, or similar)
Deep knowledge of SIP, RTP, or WebRTC internals
Hands-on with streaming/websocket architectures and back-pressure handling
Worked on carrier integrations, number provisioning, or call routing at scale
Fluency with Node.js/TypeScript or another modern backend language in production
Experience with AWS CDK / IaC and event-driven pipelines
Proven track record of optimizing for sub-second latency and high concurrency
Background in speech science, TTS/ASR model tuning, or voice biometrics
Strong system-design skills: queues, backoffs, retries, observability
Early-stage startup or green-field product experience
Comfortable in AWS (ECS/Fargate, Lambda, DynamoDB/RDS, CloudWatch)
Familiarity with React, enough to build small internal UIs when needed
Bias for shipping, ownership, and clear communication
Active contributor to open-source voice/telephony projects

Benefits

Competitive Comp + Meaningful Equity – Grow with us as we scale
Flexible, People-First Culture – We work hard, but we protect family and personal time
Salary info: $175,000 to $215,000 per year with equity

Responsibilities

Architect & Own the Voice Pipeline
Design, build, and maintain low-latency, highly available services (Twilio, SIP/WebRTC, websockets, streaming gRPC) that power phone calls at scale
Push the Limits of Natural Speech
Integrate and fine-tune TTS/ASR models to pronounce symbols, numbers, names, and slang flawlessly
Deliver Call Features Users Love
Live transfers, IVR fallbacks, appointment scheduling, smart error recovery
Optimize for Speed & Reliability
Track every hop from carrier to model; squeeze milliseconds, harden edge cases, and own the on-call rotation that keeps calls flowing 24/7
Collaborate Across the Stack
Pair with product, ML, and frontend engineers to craft APIs and in-app controls that unlock amazing end-to-end experiences

Full Description

Voice Engineer - heyLibby

About heyLibby

heyLibby is a seed-stage startup on a mission to give every small-to-midsize fitness and wellness business a world-class AI team member. We were co-founded by Spencer Rascoff, the former CEO of Zillow and current CEO of Match Group. Our AI works seamlessly across phone, email, text, and chat to help businesses manage their communications efficiently. We ship fast, learn fast, work closely with top industry experts, and measure ourselves by the tangible impact we deliver to customers.

Why this role matters

Our voice channel is the heart of the product. We need a seasoned engineer who lives and breathes real-time audio—someone who can make every call feel like a conversation with a helpful human, not a bot. You'll own the voice stack end-to-end: latency, natural-sounding TTS, crisp recognition, graceful live transfers, and bulletproof uptime.

What You'll Do

• Architect & Own the Voice Pipeline

Design, build, and maintain low-latency, highly available services (Twilio, SIP/WebRTC, websockets, streaming gRPC) that power phone calls at scale.

• Push the Limits of Natural Speech

Integrate and fine-tune TTS/ASR models to pronounce symbols, numbers, names, and slang flawlessly. Experiment fearlessly, measure obsessively.

• Deliver Call Features Users Love

Live transfers, IVR fallbacks, appointment scheduling, smart error recovery.

• Optimize for Speed & Reliability

Track every hop from carrier to model; squeeze milliseconds, harden edge cases, and own the on-call rotation that keeps calls flowing 24/7.

• Collaborate Across the Stack

Pair with product, ML, and frontend engineers to craft APIs and in-app controls that unlock amazing end-to-end experiences.

What we're looking for

Must-Have Experience

• 3+ years building real-time audio or telephony systems (Twilio, SignalWire, FreeSWITCH, or similar)

• Deep knowledge of SIP, RTP, or WebRTC internals

• Hands-on with streaming/websocket architectures and back-pressure handling

• Worked on carrier integrations, number provisioning, or call routing at scale

• Fluency with Node.js/TypeScript or another modern backend language in production

• Experience with AWS CDK / IaC and event-driven pipelines

• Proven track record of optimizing for sub-second latency and high concurrency

Bonus Points

• Background in speech science, TTS/ASR model tuning, or voice biometrics

• Strong system-design skills: queues, backoffs, retries, observability

• Early-stage startup or green-field product experience

• Comfortable in AWS (ECS/Fargate, Lambda, DynamoDB/RDS, CloudWatch)

• Familiarity with React, enough to build small internal UIs when needed

• Bias for shipping, ownership, and clear communication

• Active contributor to open-source voice/telephony projects

How We Work & What You Get

• Autonomy & Ownership – You're the voice authority; we trust you to drive decisions.

• Velocity – Weekly releases, minimal ceremony, rapid experimentation.

• Impact – Your code will be on the critical path for every customer interaction.

• Competitive Comp + Meaningful Equity – Grow with us as we scale.

• Flexible, People-First Culture – We work hard, but we protect family and personal time.

Salary info: $175,000 to $215,000 per year with equity.

Ready to build the most human-sounding AI voice on the market? Apply now and let's redefine how small businesses talk to their customers.
