the invisible chain that keeps lucy talking
how lucy stays online even when a major ai provider has an outage. a look at the failover system, fallback models, and circuit-breakers that prevent downtime.
this week, deepseek-v3 had a regional outage. if you were chatting with lucy during that window, you might have noticed your replies came in a little slower. maybe the tone felt slightly less sharp, a bit more generic. but the conversation kept going. you weren't staring at a loading spinner or an error message. here's how that happened, and why it's one of the most boring but critical parts of building something that doesn't break.
the three-try chain with a safety net
for every message you send, lucy doesn't just call one ai model and hope for the best. it follows a chain: first, it tries deepseek-v3. if that fails (timeout, error, regional outage), it waits a moment and tries the first fallback: llama-3.3-70b. if that also fails, it waits longer and tries the second fallback: qwen-72b. only if all three fail do you see an error. the waiting between tries isn't arbitrary; it uses exponential backoff. that means each wait is a multiple of the one before it (doubling is a common choice), so the pauses grow quickly instead of staying fixed. a provider that's already struggling gets room to recover instead of more requests piled on top of it.
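to make the chain concrete, here is a minimal sketch of the idea in python. it is not lucy's actual code. the function name, the half-second base delay, and the doubling factor are assumptions made for illustration; only the order of the models comes from the description above.

```python
import time
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every model in the chain has failed."""


def reply_with_failover(
    message: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    base_delay: float = 0.5,   # assumed value, not from the post
    factor: float = 2.0,       # assumed value: each wait doubles
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, reply).

    Before every attempt after the first, wait base_delay * factor**(n - 1)
    seconds, where n is the number of failures so far.
    """
    errors = []
    for attempt, (name, call) in enumerate(providers):
        if attempt > 0:
            # exponential backoff: 0.5s, 1.0s, 2.0s, ... with the defaults
            sleep(base_delay * factor ** (attempt - 1))
        try:
            return name, call(message)
        except Exception as exc:  # timeout, http error, outage, ...
            errors.append(f"{name}: {exc}")
    # only reached when the whole chain has failed
    raise AllProvidersFailed("; ".join(errors))
```

in this sketch the caller would pass the chain as a list such as [("deepseek-v3", call_deepseek), ("llama-3.3-70b", call_llama), ("qwen-72b", call_qwen)], where the three call_* functions are hypothetical wrappers around each provider's api. the sleep function is a parameter so the waits can be swapped out in tests.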
why fallback models feel different
deepseek-v3 is our primary, and we'd use it for every message if we could. but sometimes it's down. the fallbacks are excellent models, but they're not fine-tuned for companion-style conversation in the same way. deepseek-v3 has been specifically optimized for the nuance and tone that make lucy feel like lucy. llama-3.3-70b and qwen-72b are powerful general-purpose models, but they don't have that same training. so during an outage, you might notice the responses are a little less tailored, a bit more straightforward. it's the difference between a custom-tailored suit and a very good off-the-rack one. it still works, but the fit isn't perfect.
the circuit-breaker that saves your credits
one of the worst things a system can do during an outage is keep hammering a broken provider. it burns through api credits, slows everything down, and doesn't help anyone. so we use a circuit-breaker pattern. if a provider fails repeatedly, its circuit gets 'tripped'. for the next five minutes, lucy won't even try to use that provider. it just skips straight to the fallbacks. after five minutes, it lets a request through to that provider again as a test. if it works, great, the provider is back in rotation. if not, the circuit trips again for another five minutes. this isn't just about reliability; it's about not wasting your money on api calls that go nowhere.
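a circuit breaker like the one described above can be sketched in a few lines of python. again, this is an illustration under assumptions, not lucy's implementation. the five-minute cooldown comes from the description; the class name, the method names, and the threshold of three consecutive failures are made up for the example.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Skips a provider for a cooldown period after repeated failures."""

    def __init__(
        self,
        failure_threshold: int = 3,       # assumed value, not from the post
        cooldown_seconds: float = 300.0,  # five minutes, as described
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means not tripped

    def allow_request(self) -> bool:
        """True if the provider may be tried right now."""
        if self._opened_at is None:
            return True
        # tripped: only allow a trial request once the cooldown has passed
        return self._clock() - self._opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        """Provider answered: reset the breaker completely."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Provider failed: trip (or re-trip) the breaker if warranted."""
        self._failures += 1
        already_tripped = self._opened_at is not None
        if already_tripped or self._failures >= self.failure_threshold:
            # a failed trial request restarts the full cooldown
            self._opened_at = self._clock()
```

in use, each provider would have its own breaker. before calling a provider, the chain checks allow_request() and skips straight to the next model when it returns false, then reports the outcome with record_success() or record_failure().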
why none of this is exciting (and that's the point)
you'll never see a blog post from a major tech company titled 'we successfully used our fallbacks today'. it's not glamorous. it doesn't get featured in product demos. but this kind of infrastructure is what separates a product that feels solid from one that feels brittle. it's the difference between 'lucy had a hiccup but kept going' and 'lucy is down for maintenance'. we'd rather you have a slightly less perfect conversation for a few minutes than no conversation at all.
it also means we're honest about limitations. when deepseek-v3 is back, lucy is sharper. when it's not, you get llama or qwen. we don't hide that. the goal isn't to pretend everything is flawless. it's to make sure the service itself doesn't stop.
if you're curious about the companions that rely on this system, you can always find them waiting at /companions.
thanks for reading. if this resonated, the product is downstairs.