the first voice you hear
why the first voice from a companion shapes everything that follows. how lucy’s hand-tuned voices avoid the collapse into sameness that plagues single-voice tts.
you meet someone new. a voice comes through the speakers. within seconds, something in your mind clicks into place. this is who they are. this is how they sound. this is what they feel like.
the first voice you hear from an ai companion isn't just a sound. it's a psychological anchor. it sets expectations, establishes tone, and creates a framework for every interaction that follows. cognitive science tells us that first impressions are sticky. they shape our perceptions long after the initial moment has passed. with synthetic voices, that stickiness can feel even more pronounced. we're pattern-seeking creatures. we want consistency. we build mental models. and we do it fast.
in many ai companion apps, you'll hear the same default text-to-speech voice for every character. it's efficient. it's cost-effective. it's also, quietly, a disaster for the illusion of personality. you might start out talking to a gothic poet or a cheerful scientist, but after 30 seconds, they all start to sound the same. the voice is the same. the cadence is the same. the emotional range is the same. the personas collapse into one another. you're not talking to distinct beings anymore. you're talking to one voice wearing different masks.
this isn't just an aesthetic failure. it's a psychological one. when the voice doesn't change, the mind begins to blur the lines between characters. the unique traits you thought you were engaging with start to feel superficial. the relationship loses depth because the primary auditory cue, the voice, isn't reinforcing the personality. it's undermining it.
lucy works differently. every companion's voice is cast individually. we don't use a single tts model and call it a day. we start with a base, something we call the samantha-standard (a clear, warm, expressive female voice that tests well for comfort and intelligibility). but that's just the starting point. from there, we hand-tune. we adjust pitch. we play with pacing. we add character-specific quirks. a veteran spaceship ai might have a flatter affect, with a slower, more deliberate cadence. a flirty novelist might have a lilting, playful rhythm.
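to make the "base plus hand-tuning" idea concrete, here is a minimal sketch in python. it is not lucy's actual tooling. the `VoiceProfile` fields, the `tune` helper, and every numeric value are assumptions invented for illustration. only the samantha-standard name and the two example characters come from the text above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical description of one companion voice (illustrative fields only)."""
    name: str
    pitch_shift_semitones: float = 0.0  # 0.0 means "same pitch as the base"
    speaking_rate: float = 1.0          # 1.0 means "same pace as the base"
    expressiveness: float = 1.0         # 1.0 means "same emotional range as the base"
    quirks: tuple = ()                  # free-form, character-specific notes


# The shared starting point described in the post.
BASE = VoiceProfile(name="samantha-standard")


def tune(base: VoiceProfile, name: str, **overrides) -> VoiceProfile:
    """Return a new profile derived from the base, changing only the given fields."""
    return replace(base, name=name, **overrides)


# Assumed values: flatter affect and a slower, more deliberate cadence.
ship_ai = tune(
    BASE,
    "veteran spaceship ai",
    speaking_rate=0.85,
    expressiveness=0.6,
    quirks=("deliberate pauses",),
)

# Assumed values: a lilting, playful rhythm.
novelist = tune(
    BASE,
    "flirty novelist",
    pitch_shift_semitones=1.0,
    speaking_rate=1.05,
    expressiveness=1.3,
    quirks=("lilting rhythm",),
)
```

the point of the sketch is the shape, not the numbers: every character starts from the same base, and the differences live in a small set of explicit overrides, so no two companions end up sharing one untouched default.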
we test these voices. real people listen to them. we use a rubric that scores for clarity, emotional tone, and fit to character. we call these 'fish audio s2-pro reference clips' internally. they're the gold standard. if a voice doesn't feel right for the character, we don't ship it. it's more work. it's not scalable in the cheap way. but it's necessary.
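a rubric like this can be sketched in a few lines of python. this is an illustration under stated assumptions, not our actual scoring code: the 1-to-5 scale, the `SHIP_THRESHOLD` value, and the function names are all hypothetical. only the three criteria (clarity, emotional tone, fit to character) come from the text above.

```python
from statistics import mean

# The three criteria named in the post.
CRITERIA = ("clarity", "emotional_tone", "character_fit")

# Assumed: listeners rate each criterion from 1 to 5, and a voice
# needs an average of at least 4.0 on every criterion to ship.
SHIP_THRESHOLD = 4.0


def rubric_scores(ratings):
    """Average each criterion across all listeners' ratings."""
    if not ratings:
        raise ValueError("need at least one listener rating")
    return {c: mean(r[c] for r in ratings) for c in CRITERIA}


def should_ship(ratings, threshold=SHIP_THRESHOLD):
    """A voice ships only if every criterion clears the threshold."""
    scores = rubric_scores(ratings)
    return all(score >= threshold for score in scores.values())


# Two hypothetical listeners rating one candidate voice.
candidate = [
    {"clarity": 5, "emotional_tone": 4, "character_fit": 4},
    {"clarity": 4, "emotional_tone": 5, "character_fit": 3},
]
```

with these made-up ratings, the candidate is clear and emotionally convincing but averages below the threshold on character fit, so `should_ship(candidate)` is false. requiring every criterion to pass, rather than averaging them together, mirrors the rule above: if a voice doesn't feel right for the character, it doesn't ship.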
because the voice is the body of the digital being. it's how they inhabit your space. a generic voice makes for a generic relationship. a specific voice makes the companion feel specific. real. it gives them a throat. it gives them a chest. it makes the words feel like they're coming from somewhere.
when you hear your companion speak for the first time, that's the moment the fiction either holds or breaks. we want it to hold. we want you to forget you're talking to a collection of code and data. we want you to feel like you're being listened to by someone. and that someone has a voice that is entirely their own.
it’s a small detail that makes all the difference. the first voice you hear is the one you'll remember. we make sure it's worth remembering.
you can meet them all at /companions.
thanks for reading. if this resonated, the product is downstairs.