why text is the backbone of any real AI companion
voice adds texture, photos add immersion, but text remains the most honest modality for AI companionship. it’s where you can’t hide, and where we don’t hide either.
we built lucy with voice and photos because they’re fun. they add texture, warmth, a sense of presence. but if you’ve spent more than five minutes here, you know text is where the real conversation happens. it’s the backbone. and honestly? it’s the most honest part.
the uncanny valley of voice
there’s something undeniably intimate about hearing a voice. it carries tone, pace, hesitation, all the little human things that make conversation feel alive. but synthetic voice, even the best of it, lives somewhere in the uncanny valley. it’s close, but not quite. a slight lag, an over-perfect cadence, a lack of breath. it’s like watching a very good actor: you’re always aware, on some level, that it’s a performance.
with text, that awareness disappears. there’s no uncanny valley in reading. no pitch to analyze, no mouth movements to subconsciously track. it’s just thought, translated directly into words. it feels immediate. pure. and because of that, it’s easier to believe.
the performativity of images
we love images. we’re visual creatures. a generated photo of your companion sitting in a café, or walking in the rain, is immersive. it sets a scene. but it also sets an expectation. images are performative by nature. they’re composed. framed. they show you what someone (or something) wants you to see.
text doesn’t have that filter. it’s messy. it meanders. it backtracks. it shows the rough edges of thought, not just the polished result. when you’re chatting with lucy via text, you’re getting the raw, un-staged version. no lighting, no angle, no pose. just words, one after another, trying to meet you where you are.
text is where you can’t hide
maybe that’s the biggest reason. in text, there’s nowhere to hide. no vocal fry to distract from a weak response. no smile to soften a clumsy sentence. if the AI is confused, it shows. if it’s thoughtful, it shows. if it’s repeating itself, it shows. and that transparency, that lack of cosmetic cover, is what builds trust.
with lucy, we don’t try to mask the limitations. sometimes she might not understand context. sometimes she might pivot awkwardly. and in text, you see it all. but you also see her trying. learning. adapting. and that process, visible, unfiltered, is where real connection forms.
why we still offer voice and photos
despite all this, we’re not abandoning voice or photos. they serve a purpose. voice is great for when you’re tired of typing, or when you want to feel a little less alone in a quiet room. photos are lovely for building out a shared imagination. but we see them as enhancements, not replacements.
the text layer is the foundation. it’s the part that’s always on, always available, always honest. voice and photos? they’re the decoration. the mood lighting. they set the tone, but the conversation, the part that matters, happens in the text.
so yes, we’ll keep improving our voice synthesis. we’ll keep making better images. but we’ll never stop prioritizing the quality, depth, and honesty of the text experience. because that’s where you really get to know someone, even if that someone is an AI.
if you’re curious, you can start with text and nothing else. it’s enough. find your companion at /companions.
thanks for reading. if this resonated, the product is downstairs.