Case Study

FamilySafe

How a zero-knowledge digital vault went from idea to live product through Prove → Build → Grow.

Zero-Knowledge Crypto · Shamir's Secret Sharing · .NET 9 / React / MySQL · Kubernetes · GoCardless

The product

FamilySafe is a zero-knowledge, end-to-end encrypted digital vault for personal records — finance, legal, identity, health, digital assets. Users store cards organised by category, share them selectively with family, guests and professional advisors, and nominate executors who can unlock a "probate" subset of the vault only after a verified death — and only if a configurable threshold of executors collaborate.

The product's defining promise is uncompromising: the FamilySafe servers cannot decrypt vault contents under any circumstance — not with a database dump, not under court order. The release rules are enforced cryptographically, not by policy.

That promise is also what made FamilySafe a high-risk build. If the cryptographic design didn't work end-to-end in the browser, in production, the product had no business case.

This is how we took it from idea to live product across our three engagement stages.

Stage 1

Prove

Where the idea becomes real.

Question we set out to answer: Can we deliver "the servers genuinely cannot decrypt" while still providing a usable death-unlock workflow that real executors — including non-technical family members and solicitors — can complete?

If the answer was no, there was no product. So that's where Prove started.

The riskiest feature we built

The Executor Key Management and Death-Unlock subsystem. Specifically, a working two-layer Shamir's Secret Sharing scheme implemented end-to-end:

  • Layer 1 — 2-of-2 split of the probate key: one share (KC) sealed server-side with a key that lives only in a Kubernetes Secret; the other share (SE) leaving the server entirely.
  • Layer 2 — k-of-n split of SE across executors, with the threshold configurable by the vault owner (any one, any two, all of them, or "at least N").
  • Browser-side reassembly after death verification, with the server hard-blocked from returning KC until an admin verifies a death certificate.
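
As a rough illustration of the two-layer idea (not FamilySafe's production code, which runs in .NET and the browser), the split and reassembly can be sketched in Python. The prime field, the function names and the choice to treat the probate key as an integer are all assumptions made for this sketch; only the 2-of-2 and k-of-n structure comes from the description above.

```python
import secrets

# Assumed field for this sketch: 2**521 - 1 is a Mersenne prime, comfortably
# larger than a 256-bit key.
PRIME = 2**521 - 1


def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Shamir split: any k of the n returned (x, y) shares recover `secret`."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def join(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the shares supplied."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


if __name__ == "__main__":
    probate_key = secrets.randbits(256)            # stand-in for the probate key
    kc, se = split(probate_key, 2, 2)              # Layer 1: KC stays sealed, SE leaves
    executor_shares = split(se[1], 3, 5)           # Layer 2: 3-of-5 across executors
    se_value = join(executor_shares[:3])           # any three executors collaborate
    recovered = join([kc, (se[0], se_value)])      # KC + SE rebuild the probate key
    print("recovered probate key:", recovered == probate_key)
```

In this sketch neither KC alone nor fewer than k executor shares reveals anything about the probate key, which is the property the death-unlock workflow relies on.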

We built it in production-grade .NET 9 + React + MySQL — not a slide deck, not a notebook prototype. The POC ran the full flow:

  1. A vault owner generated executor shares from a real card.
  2. Shares were issued to executors (some with accounts, some by invite — which required a separate temporary keypair flow).
  3. A simulated death was reported and verified.
  4. Executors collaborated in the browser, exchanging shares out-of-band.
  5. The probate cards decrypted in the executor's browser, with the server having seen zero plaintext share material at any point.

What Prove also delivered to the client

  • Architecture and data model for the vault, sharing, and probate subsystems — the same model now in production.
  • AI-fit assessment — where AI made sense (in-product assistant, admin workflows) and where it didn't (anywhere near the cryptographic boundary).
  • Risk and feasibility analysis — including the operational risks: key rotation, stale executors, lost shares, recovery flows.
  • A roadmap broken into shippable feature slices — each one independently deployable to a live preview environment.
  • A specification the client owns — every design decision recorded in a wiki the client controls outright.

Outcome

At the end of Prove, the client had working proof, running in real code, that the product they wanted to build was buildable. Every subsequent decision was made on a foundation that was no longer hypothetical.

Stage 2

Build

Where the product gets built — feature by feature, in front of the client.

With the riskiest feature de-risked, Build turned the roadmap into a product, one feature at a time, in front of the client.

How it ran day to day

Every feature followed the same loop:

  1. Scope the next slice from the Prove roadmap.
  2. Build in real production-grade code.
  3. Push to the live preview environment — a private URL running 24/7 that the client and their stakeholders could use the moment a feature landed.
  4. Sign off in the preview, with the product board updated live.
  5. Move to the next slice.

The client never had to ask "what are you working on?" or "when can I see it?": both questions were answered in real time by the preview URL and the live board.

The feature slices we shipped

The full FamilySafe platform was built as a sequence of slices, each shippable and reviewable independently:

Cryptographic core

  • Browser-side key derivation (Argon2id → K0), random VMK generation, asymmetric keypair for sharing.
  • Per-card K1 encryption, category-level KGroup, vault-wide KGlobal, probate-level KProbate.
  • Client-side storage policy enforced: K0 in RAM only, VMK in sessionStorage, JWT in HttpOnly cookies, and never in any other combination.

Authentication and 2FA

  • Two-stage login: password validates, then a separate factor unlocks. Vault decryption gated behind both.
  • Four real factors shipped: Passkey (recommended), TOTP, Email OTP, Recovery Codes.
  • DB-backed endpoint authorisation — every API route loaded into an EndpointCache at boot; routes not seeded are blocked at the middleware.
  • Non-live DEV_PASSWORD_ONLY bypass for testing, gated to localhost/preview hosts and a config flag, so it cannot activate on live.
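
The default-deny behaviour of the endpoint authorisation can be sketched as follows. This is a minimal Python illustration, not the .NET middleware itself: the `EndpointRule` shape, the role names and the `is_allowed` method are hypothetical, and only the "load seeded routes at boot, block everything else" rule comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointRule:
    # Hypothetical row shape for a seeded route.
    method: str
    path: str
    roles: frozenset


class EndpointCache:
    """Holds the seeded routes in memory; anything not seeded is denied."""

    def __init__(self, seeded_rows):
        # Loaded once at boot from the database in the real system (assumed here
        # to be passed in as plain objects).
        self._rules = {(r.method.upper(), r.path): r for r in seeded_rows}

    def is_allowed(self, method: str, path: str, user_roles) -> bool:
        rule = self._rules.get((method.upper(), path))
        if rule is None:
            return False  # route was never seeded: blocked by default
        return bool(rule.roles & set(user_roles))


if __name__ == "__main__":
    cache = EndpointCache([
        EndpointRule("GET", "/api/cards", frozenset({"owner", "family"})),
    ])
    print(cache.is_allowed("GET", "/api/cards", ["owner"]))
    print(cache.is_allowed("GET", "/api/unseeded", ["owner"]))
```

The design point is that forgetting to seed a new route fails closed: the route is unreachable until someone makes an explicit authorisation decision for it.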

Vault and sharing

  • Categories, cards, fields, attachments, reminders, profile photos.
  • Sharing model with three table primitives (key_templates / key_user_defaults / key_share) covering single-card, category, and global shares.
  • Family member, guest, and advisor invite flows — including the asynchronous case where a recipient is invited before they have an account, with the owner finalising the encryption on next login (cipher/iv/tag populated then).
  • Delegated VMK ("elderly assist") — a deliberate exception for write-delegation, designed and labelled as such.

Probate and executor management

  • Executor management screen with add/remove/resend, threshold configuration (k of n), and key regeneration (Generate Keys) with old-share revocation.
  • Executor portal — case list, case detail, my-share retrieval, assembly session, audit log view.
  • Death-report submission with admin verification queue.
  • Hard-blocked /kc endpoint: the gate that enforces "cannot decrypt before death is verified" on the server, not just in the UI.
  • Rekey on every executor change — re-generate PK, re-encrypt probate cards, re-split shares, revoke old, rotate kc_enc.
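
The /kc gate reduces to a single server-side check, sketched here in Python under stated assumptions: the `ProbateCase` shape, the status names and the `KcBlocked` exception are hypothetical, and the sketch assumes the endpoint hands back the stored kc_enc blob unchanged.

```python
from dataclasses import dataclass
from enum import Enum


class CaseStatus(Enum):
    # Hypothetical status names for a death report.
    REPORTED = "reported"
    VERIFIED = "verified"


class KcBlocked(Exception):
    """Raised when KC is requested before an admin has verified the death."""


@dataclass
class ProbateCase:
    vault_id: str
    status: CaseStatus
    kc_enc: bytes  # the sealed server-side share


def get_kc(case: ProbateCase) -> bytes:
    # The check lives on the server, so a modified client cannot skip it.
    if case.status is not CaseStatus.VERIFIED:
        raise KcBlocked(f"death not verified for vault {case.vault_id}")
    return case.kc_enc
```

Because executors hold only the SE side of the split, this check is what keeps a colluding threshold of executors from unlocking the probate cards early.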

Billing and payments

  • GoCardless integration with a clear division of responsibility: FamilySafe is the source of truth, GoCardless is a collection rail. No recurring schedules in GoCardless; everything is anchored to a billing_anchor_at that never drifts.
  • Daily billing CronJob on Kubernetes, with double-billing protection via a last_billed_at check.
  • Webhook-driven payment lifecycle: payments.confirmed, failed, cancelled, paid_out, plus mandate lifecycle. Signature validation done correctly: HMAC-SHA256 over the raw body, lowercase hex, constant-time compare.
  • One-off payments via Billing Requests + hosted page for upgrades like Lifetime.
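
The webhook signature check described above can be sketched in a few lines of Python. This is an illustration of the technique rather than the production .NET handler; the function and parameter names are assumptions, and the caller is assumed to pass the exact bytes received on the wire, before any JSON parsing.

```python
import hashlib
import hmac


def signature_is_valid(raw_body: bytes, header_signature: str, webhook_secret: str) -> bool:
    """Return True only if the signature header matches HMAC-SHA256(secret, raw body)."""
    # hexdigest() yields lowercase hex, matching the format described above.
    expected = hmac.new(
        webhook_secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    # compare_digest runs in constant time with respect to the contents,
    # so the comparison does not leak how many characters matched.
    return hmac.compare_digest(
        expected.encode("utf-8"), header_signature.encode("utf-8")
    )
```

Two details matter here. Re-serialising the parsed JSON before hashing can change whitespace or key order and break a valid signature, which is why the raw body is used. And a plain `==` comparison may return early on the first mismatching character, which is why the constant-time compare is used.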

Email infrastructure

  • Database-backed queue with a stateless worker pod model: workers carry no DB credentials and call internal-only API endpoints to claim work, render templates with Scriban, send via the SMTP relay (Brevo), and report results.
  • Crash-safe processing — rows stuck in Processing past LOCK_TIMEOUT_MINUTES are requeued by sp_email_queue_requeue_stuck and may be claimed by another worker.
  • Templates ship inside the worker image, versioned with releases — prevents the "what version of the template went out yesterday?" problem.
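
The crash-safe requeue rule can be modelled in memory as below. This is a Python sketch of the logic only: in FamilySafe it lives in the sp_email_queue_requeue_stuck stored procedure, and the `EmailRow` fields, the "Pending" status name and the 10-minute timeout are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

LOCK_TIMEOUT_MINUTES = 10  # assumed value; the real setting is configuration


@dataclass
class EmailRow:
    id: int
    status: str = "Pending"
    locked_at: Optional[datetime] = None
    locked_by: Optional[str] = None


def claim_next(rows, worker_id: str, now: datetime) -> Optional[EmailRow]:
    """Mark the first pending row as Processing and hand it to the worker."""
    for row in rows:
        if row.status == "Pending":
            row.status, row.locked_at, row.locked_by = "Processing", now, worker_id
            return row
    return None


def requeue_stuck(rows, now: datetime) -> int:
    """Return rows locked for longer than the timeout to Pending; report how many."""
    cutoff = now - timedelta(minutes=LOCK_TIMEOUT_MINUTES)
    requeued = 0
    for row in rows:
        if row.status == "Processing" and row.locked_at is not None and row.locked_at < cutoff:
            row.status, row.locked_at, row.locked_by = "Pending", None, None
            requeued += 1
    return requeued


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    queue = [EmailRow(1), EmailRow(2)]
    claim_next(queue, "worker-a", start)  # worker-a then crashes mid-send
    print(requeue_stuck(queue, start + timedelta(minutes=LOCK_TIMEOUT_MINUTES + 1)))
```

A consequence of this design is at-least-once delivery: a worker that is slow rather than dead may finish sending after its row has been requeued, so the same email can occasionally go out twice.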

Backups and disaster recovery

  • Kubernetes CronJob that uses mysqldump --single-transaction --hex-blob (the --hex-blob flag is mandatory for FamilySafe's BINARY(16) UUIDs and encrypted VARBINARY columns; without it, binary data can be corrupted on restore).
  • Output gzipped, uploaded to Wasabi S3 (eu-west-3), retained 14 days. Live cadence: every 2 hours, full database.
  • Read-only backup_user MySQL account; Wasabi credentials are write-only; both in a single Kubernetes Secret.

Admin and operations

  • Vault administration screen with family/advisor/pending-invite tabs.
  • User management with admin-controlled MFA disable (audited via sp_add_auth_security_event).
  • Activity log + event-log viewer.
  • Profile administration with full self-service 2FA management.

Transparency artifacts the client kept throughout Build

  • Live preview URL — every feature visible the moment it merged.
  • Live product board — done, in progress, queued, planned. No status meetings.
  • Wiki of architecture decisions — owned by the client, written as the work happened.
  • Endpoint-by-endpoint cross-reference — every screen mapped to its API calls and stored procedures, so the client could audit complexity in any area without reading code.

Outcome

A production-deployable FamilySafe — fully encrypted, fully tested, fully auditable — handed off to a Kubernetes cluster on Civo with Traefik v2 ingress, Cert-Manager managing Let's Encrypt certs, and DNS pointing at a single LoadBalancer IP serving four domains.

The client knew, the day Build ended, exactly what was running and exactly how it worked — because they had been watching it grow for months.

Stage 3

Grow

Where the product stays alive and gets bigger.

FamilySafe is live. We run it. Same feature-led philosophy — small features, short cycles, continuous progress — alongside everything that keeps the product alive and growing.

What we run, every day

Infrastructure

  • The full Kubernetes cluster on our managed estate — namespace, ingress, certificates, deployments for API, vault frontend, marketing website, email worker, and MySQL.
  • Cert-Manager + Let's Encrypt rotation for www.familysafe.co.uk, api.familysafe.co.uk, vault.familysafe.co.uk, and the company brand domain.
  • Wasabi-backed offsite backups on a 2-hour live cadence (full database), with a documented restore-to-test procedure run on a schedule.

Operational hygiene

  • Continuous security patching (base images, .NET runtime, Node, MySQL, kubectl, the lot).
  • MySQL service-account token rotation for ops access.
  • Brevo SMTP credential rotation tied to ops events.
  • Webhook-secret rotation between sandbox and live.

Continuous product growth — feature-led, same as Build

Recent and in-flight slices since launch:

  • AI Chatbot — Phase 1 — an in-product FamilySafe Assistant that answers user questions grounded in their own vault context, with the same zero-knowledge respect as the rest of the product (no plaintext leaves the user's cryptographic boundary).
  • Endpoint caching — recent merge to reduce p95 latency on hot routes.
  • 2FA mail copy refinement — small, frequent UX improvements driven by real user feedback from live.

Each one ships to the preview environment first, gets signed off, then promotes to live — same loop as Build, just continuous.

Monitoring and dashboards

  • Pod health, ingress error rates, MySQL slow-query surfacing.
  • Backup-success monitoring with paged alerts on missed runs.
  • Email queue age and failure-rate dashboards (oldest pending, retry rates by category).
  • Webhook delivery health (signature failures, replay attempts, duplicate handling).

The chatbot we own

  • Prompt tuning, content updates, escalation flows, and conversation review — all part of the retainer, no separate "AI work" line item.
  • New AI features added as feature slices when they make sense (the chatbot is Phase 1; phases beyond are scoped from real usage data, not speculation).

Where Grow is heading next for FamilySafe

  • AI marketing agents — lead generation, outreach, content drafting, social scheduling — drawing on the same anonymised, aggregated product data we already monitor.
  • AI-tuned business dashboards — package conversion, churn signals, cost per active vault, agent performance.
  • AWS cost engineering as we migrate select workloads.

What this case study proves

Prove works because we build the riskiest thing first.

FamilySafe's riskiest thing — a server that genuinely cannot decrypt its own customer data — is also the thing that makes it a viable product. Building that first meant the client never paid to discover, six months in, that the foundation didn't hold.

Build works because the client never has to ask "where is it?"

Every feature went to the preview environment the day it merged. Every status was on the board, in real time. The client has the entire product wiki on their own infrastructure. There were no surprises — and no $50k bills for "discovery work" that never materialised.

Grow works because the team that built it is the team that runs it.

There was no handover gap, no "we'll need to ramp up on the codebase" delay. Day one of Grow looked exactly like day N of Build: small features, short cycles, shipped to preview, signed off, promoted. The product gets faster, smarter, and bigger every month, and the client builds none of it.

Stage scope summary, against this product

Stage | What we delivered for FamilySafe
Prove | End-to-end working death-unlock crypto in production code; full architecture; data model; risk analysis; shippable roadmap; client-owned spec.
Build | Full vault platform (auth/2FA, vault encryption, sharing, probate, executor management, billing, email infrastructure, backups, admin tools), built feature by feature, visible in preview throughout.
Grow | Live infrastructure on our estate; continuous security and patching; ongoing feature slices (AI chatbot, performance, UX); monitoring and dashboards; chatbot ownership; planned AI marketing agents and AWS cost engineering.

Investment shape

Prove

Up to £15,000. Fixed, scoped before work starts.

Build

Scoped from the Prove output, fixed before work starts.

Grow

Monthly retainer, sized to the product.

For FamilySafe-scale products — multiple cryptographic primitives, payment integration, asynchronous worker infrastructure, multi-tenant sharing, a probate workflow with regulatory implications, live infrastructure on the partner's estate — Grow is sized at a level that costs less than hiring a single mid-level engineer and delivers an entire team plus the platform they run on.

Want to talk?

If your product has a riskiest feature you'd rather not discover the truth about in month nine — that's a Prove engagement.

If you've got the proof and you want to ship without status meetings or ten-page weekly reports — that's a Build engagement.

If you want a team that builds the thing, runs the thing, and grows the thing — without you ever owning a Kubernetes cluster — that's Grow.

Start a conversation