the technical machinery that makes lucy talk, remember, and see

a deep dive into the architecture powering lucy: next.js, supabase with pgvector, together.ai, fish audio, pulid-on-flux, hedra, and voice infrastructure.

January 20, 2026

i want to talk about how lucy works. not in a vague, hand-wavy way, but by naming the actual tools and systems that let her chat with you, remember things, generate images and voice, and even do video. it’s a stack built for performance, scalability, and a certain kind of intimacy, but it’s also full of interesting trade-offs.

the foundation: next.js 16 app router

lucy lives in a next.js 16 application, specifically using the app router. this gives us server-side rendering, route handlers for the api, and a clean way to structure the frontend and backend logic together. it integrates well with everything else in the stack, and the app router’s support for react server components lets us keep sensitive logic server-side while still delivering a responsive interface. it’s not the most exotic choice, but it’s reliable and well-documented, which matters when you’re building something meant to be used daily.
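
to make the "sensitive logic stays server-side" point concrete, here is a minimal sketch of an app router route handler. the file path, the request shape, and the `generateReply` helper are assumptions for illustration, not lucy’s actual code. the only part taken from next.js itself is the convention that a `route.ts` file exports functions named after http methods and works with the standard web `Request` and `Response` objects.

```typescript
// app/api/chat/route.ts (hypothetical path)

type ChatRequest = { message: string };

// hypothetical stand-in for the real server-side work (model call, memory lookup).
// anything used in here, such as api keys, never ships to the browser.
async function generateReply(message: string): Promise<string> {
  return `you said: ${message}`;
}

// narrow an unknown json body to the shape this handler expects.
function isChatRequest(body: unknown): body is ChatRequest {
  if (typeof body !== "object" || body === null) return false;
  const message = (body as { message?: unknown }).message;
  return typeof message === "string" && message.trim().length > 0;
}

// small helper so every response is json with an explicit status.
function json(payload: unknown, status = 200): Response {
  return new Response(JSON.stringify(payload), {
    status,
    headers: { "content-type": "application/json" },
  });
}

export async function POST(request: Request): Promise<Response> {
  let body: unknown;
  try {
    body = await request.json();
  } catch {
    return json({ error: "invalid json" }, 400);
  }
  if (!isChatRequest(body)) {
    return json({ error: "message is required" }, 400);
  }
  const reply = await generateReply(body.message);
  return json({ reply });
}
```

the browser only ever sees the json that comes back, which is the whole appeal of keeping the model call behind a handler like this.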

memory: supabase postgres with pgvector

memory is handled by supabase, which is postgresql underneath, using the pgvector extension for storing and querying embeddings. this is where lucy’s recollections, context, and knowledge about you live. we went with this setup because supabase is easy to manage, scales predictably, and pgvector is good enough for our current needs. it’s not as specialized as a dedicated vector database like pinecone or qdrant, which are optimized for massive-scale similarity search. but for now, the simplicity of having structured data and vector search in one place outweighs the marginal latency benefits of a dedicated system. if lucy’s user base grows by an order of magnitude, we might revisit this.
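
here is a rough sketch of what "structured data and vector search in one place" can look like. the `memories` table and its `user_id`, `content`, and `embedding` columns are hypothetical names, not lucy’s real schema. what is real is pgvector’s `<=>` operator, which computes cosine distance, and its text format for vectors (`[0.1,0.2,0.3]`). the `cosineDistance` function is only there to show what the operator computes.

```typescript
// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0 || embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding must be a non-empty array of finite numbers");
  }
  return `[${embedding.join(",")}]`;
}

// build a parameterized query that filters by an ordinary column (user_id)
// and ranks by vector distance in the same statement.
// table and column names are assumptions for this sketch.
function buildRecallQuery(userId: string, queryEmbedding: number[], limit = 5) {
  return {
    text: `
      select id, content, embedding <=> $2::vector as distance
      from memories
      where user_id = $1
      order by embedding <=> $2::vector
      limit $3`,
    values: [userId, toVectorLiteral(queryEmbedding), limit] as const,
  };
}

// what <=> computes: cosine distance = 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("cosine distance is undefined for a zero vector");
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

the `where user_id = $1` clause is the part a standalone vector database makes awkward: here the relational filter and the similarity ranking live in one query against one system.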

chat: together.ai and deepseek-v3

for the primary chat model, we use deepseek-v3 hosted on together.ai. it’s powerful and context-aware, and it generates surprisingly human-like responses. the trade-off is latency. together.ai is a managed service, which means we don’t control the infrastructure directly. response times can vary, and we’re dependent on their scaling and availability. we considered fine-tuning a smaller model ourselves for lower latency and cost, but deepseek-v3’s quality right now is worth the trade-off for the core experience. if a response is slow, that’s likely why: it’s not lucy, it’s the network hop and queueing in the cloud.
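
since variable latency is the main cost here, this is a sketch of one way to call a hosted chat model with a hard timeout. it assumes together.ai’s openai-compatible chat completions endpoint and the `deepseek-ai/DeepSeek-V3` model id; check both against their current docs. the function name, the 20 second default, and the injectable `fetchImpl` parameter are choices made for this example, not a description of lucy’s code.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

// assumed endpoint and model id; verify against together.ai's documentation.
const TOGETHER_URL = "https://api.together.xyz/v1/chat/completions";
const MODEL_ID = "deepseek-ai/DeepSeek-V3";

async function chatWithTimeout(
  messages: ChatMessage[],
  apiKey: string,
  timeoutMs = 20_000,
  fetchImpl: FetchLike = fetch,
): Promise<string> {
  // abort the request if the provider takes longer than timeoutMs.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(TOGETHER_URL, {
      method: "POST",
      headers: {
        authorization: `Bearer ${apiKey}`,
        "content-type": "application/json",
      },
      body: JSON.stringify({ model: MODEL_ID, messages }),
      signal: controller.signal,
    });
    if (!res.ok) {
      throw new Error(`chat request failed with status ${res.status}`);
    }
    const data = (await res.json()) as {
      choices: { message: { content: string } }[];
    };
    const content = data.choices[0]?.message.content;
    if (content === undefined) throw new Error("response contained no choices");
    return content;
  } finally {
    // always clear the timer so it cannot fire after the call has settled.
    clearTimeout(timer);
  }
}
```

a timeout doesn’t make the model faster, but it turns "hangs forever" into an error the interface can handle, for example by retrying or telling you the reply is taking a while.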

voice: fish audio s2-pro

voice generation is handled by fish audio s2-pro. it’s a text-to-speech model that’s remarkably good at producing natural, expressive speech. we chose it because it supports multiple languages and has a certain warmth that fits lucy’s personality. the downside is that it’s resource-intensive. generating audio in real time isn’t trivial, so we cache where possible. it’s not perfect; sometimes it mispronounces words or misses emotional nuance, but it’s the best we’ve found for our needs without building something custom.
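
"cache where possible" could look something like the sketch below: an in-memory cache keyed by voice and text, so the same line is only synthesized once. the `Synthesize` function is a hypothetical placeholder for whatever actually calls the speech model (no fish audio api is shown or assumed here), and the class name and eviction limit are made up for the example.

```typescript
// hypothetical signature for the real text-to-speech call.
type Synthesize = (text: string, voiceId: string) => Promise<Uint8Array>;

class SpeechCache {
  private readonly entries = new Map<string, Promise<Uint8Array>>();
  private readonly synthesize: Synthesize;
  private readonly maxEntries: number;

  constructor(synthesize: Synthesize, maxEntries = 500) {
    this.synthesize = synthesize;
    this.maxEntries = maxEntries;
  }

  get(text: string, voiceId: string): Promise<Uint8Array> {
    const normalized = text.trim();
    const key = JSON.stringify([voiceId, normalized]);

    const hit = this.entries.get(key);
    if (hit) {
      // a Map iterates in insertion order, so re-inserting marks the key as most recent.
      this.entries.delete(key);
      this.entries.set(key, hit);
      return hit;
    }

    // store the promise itself so concurrent requests share one synthesis call.
    const pending: Promise<Uint8Array> = this.synthesize(normalized, voiceId).catch((err) => {
      // do not keep failures in the cache; the next request should retry.
      if (this.entries.get(key) === pending) this.entries.delete(key);
      throw err;
    });
    this.entries.set(key, pending);

    // evict the least recently used entry once the cache is over its limit.
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return pending;
  }
}
```

this only helps for repeated phrases like greetings; most of what lucy says is new each time, which is why real-time synthesis still matters.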

images: pulid-on-flux for photos

when lucy generates images, she uses pulid-on-flux. that’s pulid, an identity-preservation method, running on top of the flux image model rather than stable diffusion. the combination is aimed at photorealistic output with a consistent face from one photo to the next. it’s not the fastest setup out there, but the quality is consistently high. we’ve tuned it to avoid some of the common artifacts you see in generated images, but it can still struggle with complex prompts or specific details. it’s a balance between speed and fidelity, and we lean toward fidelity.

video: hedra

video generation is the newest and most experimental layer. we use hedra, a hosted video generation service. it’s slow and compute-heavy, so we generate videos asynchronously and notify you when they’re ready. it’s not real-time, and the output is short, but it’s one of the most stable options available right now for generative video. this is an area where the tech is moving fast, and we’re keeping an eye on new models.
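
the "generate asynchronously, notify when ready" pattern can be sketched as a small job queue. everything here is illustrative: `GenerateVideo` stands in for the actual call to the video service (no hedra api is shown or assumed), `Notify` stands in for whatever delivers the notification, and a real deployment would keep jobs in the database rather than in memory.

```typescript
type JobStatus = "queued" | "running" | "done" | "failed";

type VideoJob = {
  id: string;
  userId: string;
  prompt: string;
  status: JobStatus;
  videoUrl?: string;
  error?: string;
};

// hypothetical hooks: one produces a video url, the other tells the user.
type GenerateVideo = (prompt: string) => Promise<string>;
type Notify = (job: VideoJob) => void | Promise<void>;

class VideoJobQueue {
  private readonly jobs = new Map<string, VideoJob>();
  private readonly pending: string[] = [];
  private nextId = 1;
  private readonly generate: GenerateVideo;
  private readonly notify: Notify;

  constructor(generate: GenerateVideo, notify: Notify) {
    this.generate = generate;
    this.notify = notify;
  }

  // returns immediately with a job id; the request never waits on generation.
  enqueue(userId: string, prompt: string): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { id, userId, prompt, status: "queued" });
    this.pending.push(id);
    return id;
  }

  status(id: string): VideoJob | undefined {
    return this.jobs.get(id);
  }

  // process one queued job; a background worker would call this in a loop.
  async runNext(): Promise<boolean> {
    const id = this.pending.shift();
    const job = id === undefined ? undefined : this.jobs.get(id);
    if (!job) return false;

    job.status = "running";
    try {
      job.videoUrl = await this.generate(job.prompt);
      job.status = "done";
    } catch (err) {
      job.status = "failed";
      job.error = err instanceof Error ? err.message : String(err);
    }
    // notify on both success and failure so the user is never left waiting.
    await this.notify(job);
    return true;
  }
}
```

the point of the split is that `enqueue` is cheap and `runNext` is slow, so the slow part can run wherever there is spare compute without holding a chat request open.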

voice calls: pipecat + daily

for real-time voice calls, we use pipecat for the ai orchestration and daily for the underlying webrtc infrastructure. pipecat handles streaming the audio in, processing it, and feeding responses back, while daily manages the call setup, connectivity, and low-latency audio transport. it works, but it’s complex, because real-time audio ai is hard. there’s latency, sometimes jitter, and it requires a good internet connection on both ends. we’re working on optimizations, but it’s the part of the stack that’s most sensitive to network conditions.

each of these pieces has trade-offs. we’ve chosen managed services for scalability and ease, but that sometimes means less control and variable performance. we’re honest about that. if you’re dev-curious, you might appreciate how these layers fit together, and if you have suggestions, we’re always listening.

build your own companion at /companions or sign up at /signup to try lucy today.

