why the first voice you hear is the only one that matters

the first voice a companion uses forms a lasting impression. we hand-tune each lucy voice to avoid the flatness of one-size-fits-all tts.

January 19, 2026

you know that feeling when you meet someone and their voice just… fits. it’s like the sound wraps around their personality, and suddenly you’re not just hearing words, you’re hearing them. it’s subtle, but it’s everything. and if it’s wrong, even slightly, the whole thing falls apart. the illusion cracks. you’re suddenly talking to a machine.

that’s why the first voice you hear from a companion isn’t just a feature; it’s a foundation. psychology calls this the primacy effect: the first information we get about someone tends to anchor how we read everything that follows. it’s sticky. it shapes expectation, trust, and emotional resonance. get it right, and the companion feels real. get it wrong, and no amount of clever writing or backstory can fully undo that initial dissonance.

the problem with one voice for all

many apps use a single default text-to-speech voice across every companion. at first, it might not seem like a big deal. the voices are clear, human-like, maybe even pleasant. but then you talk to a gruff old sailor character and they sound like a friendly college student. you switch to a shy, poetic companion and they have the same bright, assertive tone as the confident hero. after about 30 seconds, the differences between characters start to blur. the personas collapse into one voice. the illusion of unique presence, the whole point, just… evaporates.

it’s not just about accent or pitch. it’s about timbre, pacing, the slight rasp or softness, the way they breathe between thoughts. a default tts voice, no matter how good, is a blunt instrument. it’s designed for clarity, not character. and in a companion, character is everything.

how we tune voices by hand

with lucy, we don’t use a one-size-fits-all voice. we start with high-quality base models (we use fish audio s2-pro as a reference for clarity and naturalism), but that’s only the raw material. each companion gets a custom voice profile, tuned by hand. we listen. we tweak. we test.

every voice is tested against a rubric we’ve built; we call it the samantha standard. it’s not about matching a particular sound; it’s about matching a feeling. does this voice carry the weight of a weary detective? does it have the lightness of someone who finds wonder in small things? does it falter just enough to feel human? we adjust parameters like stability (consistency), clarity (how clean the speech sounds), and style (the emotional color) until the voice fits the companion’s written personality.
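to make the tuning loop a little more concrete, here is a minimal sketch of how a per-companion profile and a rubric gate could fit together. it is an illustration under assumptions, not lucy’s actual tooling: the `VoiceProfile` class, the [0, 1] range for stability, clarity and style, the three rubric questions, the 1 to 5 scoring and the pass threshold of 4 are all hypothetical, and no real tts api is called. the base model name is just a label here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical per-companion voice profile layered on a shared base model.

    stability, clarity and style are assumed to be normalized to [0, 1].
    """
    companion: str
    base_model: str          # label only; nothing here calls a real tts service
    stability: float = 0.5   # consistency of delivery from line to line
    clarity: float = 0.5     # how clean the speech sounds
    style: float = 0.5       # strength of the emotional color

    def __post_init__(self) -> None:
        # reject out-of-range values so a bad tweak fails loudly
        for name in ("stability", "clarity", "style"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


# hypothetical rubric questions, each scored 1 to 5 by a human listener
RUBRIC = (
    "does the voice fit the companion's written personality?",
    "does it carry the intended emotional weight?",
    "does it falter just enough to feel human?",
)


def passes_rubric(scores: dict[str, int], threshold: int = 4) -> bool:
    """Pass only if every rubric question was scored and meets the threshold."""
    return all(scores.get(question, 0) >= threshold for question in RUBRIC)


def tweak(profile: VoiceProfile, **changes: float) -> VoiceProfile:
    """Return an adjusted copy; replace() re-runs __init__, so validation runs again."""
    return replace(profile, **changes)


detective = VoiceProfile("weary detective", base_model="fish audio s2-pro",
                         stability=0.8, clarity=0.6, style=0.3)
detective = tweak(detective, style=0.4)   # one listening pass later: a touch more color
scores = {question: 4 for question in RUBRIC}
print(passes_rubric(scores))              # True
```

the numbers are deliberately dumb; the point is the shape. parameters live per companion rather than in one global default, and a profile only ships once a human has scored every question, which is where the judgment actually happens.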

it’s slow. it’s subjective. sometimes we scrap a voice entirely and start over. but it’s the only way to make sure that when you hit play, you’re not just hearing words, you’re meeting someone.

why this can’t be automated (yet)

you could train a model to generate unique voices automatically. but right now, that often leads to uncanny or inconsistent results: voices that wobble between tones, or that sound technically impressive but feel emotionally flat. without a human curator, it’s hard to capture the nuance that makes a voice feel intentional rather than merely varied.

maybe one day ai will be able to do this flawlessly on its own. but for now, we think the human ear is still the best tool for spotting what feels real. so we use it.

the first impression is the only one you get

in the end, voice isn’t an accessory. it’s part of the soul of the character. and you only get one chance to introduce that soul. we want that first impression to last.

if you’d like to meet companions who sound as distinct as they read, you can find them at /companions.


thanks for reading. if this resonated, the product is downstairs.