Zero-Knowledge Encryption for Genomics: Why It Matters

The Unique Security Problem of Genomic Data

Genomic data is unlike any other type of sensitive information. A credit card number can be cancelled and reissued. A passport can be replaced. A genome cannot. Every person's DNA is a permanent, immutable identifier — and one that reveals information not just about the individual, but about their biological relatives who never consented to any study.

As sequencing costs have collapsed, from roughly $3 billion for the first human genome (the Human Genome Project, completed in 2003) to under $200 today, the volume of genomic data being generated, shared, and stored has grown exponentially. With this growth has come an escalating risk landscape. Several high-profile studies have demonstrated that genomic data can be re-identified from supposedly anonymised datasets using publicly available reference panels. A 2013 Science paper, for example, showed that surnames could be inferred from Y-chromosome markers in whole-genome sequences by querying public genetic genealogy databases. Combined with metadata such as age and US state, this was enough to identify research participants and their relatives.

This means that the standard approach to data protection — removing names and obvious identifiers before sharing — is insufficient for genomic data. The data itself is the identifier.

Why "Encryption at Rest" Is Not Enough

Most cloud storage providers encrypt your data at rest: Google Drive, Dropbox and AWS S3 all encrypt files stored on their servers. This sounds reassuring, but there is a critical limitation: by default, the provider holds the encryption keys.

This model, sometimes called "encryption at rest with provider-managed keys," mainly protects against one kind of threat: physical loss of storage media, such as a hard drive stolen from a data centre or discarded without being wiped. It does not protect against:

The provider's employees accessing your data (insider threat)
Government subpoenas compelling the provider to hand over data
Breaches that compromise the provider's key management infrastructure
The provider scanning your data for compliance, advertising, or AI training purposes

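For ordinary files, provider-managed encryption is usually acceptable. For genomic sequences from human subjects, it falls short. GDPR classifies genetic data as a special category of personal data (Article 9), and under HIPAA, genetic information held by a covered entity and linked to an identifiable individual is protected health information (PHI). Both regimes expect appropriate technical safeguards when a third-party service provider handles that data, and keeping the decryption keys out of the provider's hands is among the strongest such safeguards.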

What Zero-Knowledge Encryption Actually Means

Zero-knowledge encryption means that the service provider has no knowledge of the content of your files. The encryption happens on your device (or in your browser) before the data is transmitted to any server. The provider stores only ciphertext: encrypted data that, without the key, is computationally indistinguishable from random noise.

Crucially, the provider never receives the encryption key. This is the defining property: the service cannot decrypt your data, even if compelled to by a court order, even if their servers are breached, and even if a malicious employee tries to access it.

This architecture is the basis for tools like Signal (for messaging), ProtonMail (for email), and BioTransfer (for file sharing). In each case, encryption happens client-side before any data touches the provider's infrastructure.

AES-GCM-256: The Algorithm Behind BioTransfer's Encryption

BioTransfer uses AES-GCM-256 (Advanced Encryption Standard with Galois/Counter Mode, 256-bit key) implemented via the browser's native Web Crypto API. Here is why this matters:

AES-256: Computationally Unbreakable

AES-256 uses a 256-bit key, which means there are 2^256 possible keys, roughly 1.16 × 10^77. Suppose an attacker had a billion computers, each testing a trillion keys per second: 10^21 guesses per second in total. Finding the key would still take, on average, about 5.8 × 10^55 seconds, more than 10^38 times the age of the universe. AES is standardised by NIST (FIPS 197), AES-256 is approved by the NSA for protecting information classified up to TOP SECRET, and no practical attack against the full cipher is publicly known.
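The brute-force arithmetic can be checked in a few lines of TypeScript. The attacker's capacity (a billion machines at a trillion guesses per second each) is a hypothetical, deliberately generous assumption, and the age of the universe is taken as 13.8 billion years.

```typescript
// Back-of-the-envelope brute-force estimate for a 256-bit AES key.
// The attacker's capacity is a hypothetical, deliberately generous assumption.
const keySpace: number = 2 ** 256; // number of possible keys, about 1.16e77

const machines = 1e9; // one billion computers (assumed)
const guessesPerMachinePerSecond = 1e12; // one trillion keys tested per second each (assumed)
const guessesPerSecond = machines * guessesPerMachinePerSecond; // 1e21 in total

// On average the correct key turns up after searching half of the key space.
const averageSeconds = keySpace / 2 / guessesPerSecond;

// Age of the universe: about 13.8 billion years, expressed in seconds.
const ageOfUniverseSeconds = 13.8e9 * 365.25 * 24 * 3600;

const universeLifetimes = averageSeconds / ageOfUniverseSeconds;

console.log(`possible keys:        ${keySpace.toExponential(2)}`);
console.log(`average search time:  ${averageSeconds.toExponential(2)} s`);
console.log(`ages of the universe: ${universeLifetimes.toExponential(2)}`);
```

Run as written, it prints about 1.16e+77 keys, an average search time of 5.79e+55 seconds, and 1.33e+38 ages of the universe.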

GCM Mode: Authentication + Encryption

GCM (Galois/Counter Mode) adds authenticated encryption to AES. It not only encrypts data for confidentiality but also produces an authentication tag that lets the recipient verify the data has not been tampered with, in transit or in storage. Any modification to the ciphertext, even a single bit flip, causes decryption to fail with an authentication error rather than silently producing corrupted output. For genomic data, where a corrupted base call could invalidate an entire analysis, this integrity guarantee is essential.
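A minimal sketch of that integrity check with the Web Crypto API, which is available in browsers and, as globalThis.crypto, in Node.js 19 and later. The function name tamperDemo and the short sample sequence are illustrative; the point is that one flipped bit makes decrypt() reject instead of returning altered bases.

```typescript
// Demonstrates AES-GCM tamper detection with the Web Crypto API.
async function tamperDemo(): Promise<{ roundTrip: string; tamperRejected: boolean }> {
  const key = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    false, // non-extractable: this demo never needs the raw key bytes
    ["encrypt", "decrypt"],
  );
  const iv = crypto.getRandomValues(new Uint8Array(12)); // 96-bit IV, the size recommended for GCM
  const plaintext = new TextEncoder().encode("ACGTACGTTAGC");

  // The result is the ciphertext followed by a 16-byte authentication tag.
  const sealed = new Uint8Array(
    await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext),
  );

  // Untouched ciphertext decrypts back to the original bytes.
  const opened = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, sealed);
  const roundTrip = new TextDecoder().decode(opened);

  // Flip a single bit of the first ciphertext byte and try again.
  const tampered = sealed.slice();
  tampered[0] ^= 0x01;
  let tamperRejected = false;
  try {
    await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, tampered);
  } catch {
    // Web Crypto rejects with an OperationError; no partial plaintext is returned.
    tamperRejected = true;
  }
  return { roundTrip, tamperRejected };
}

tamperDemo().then((result) => console.log(result));
```

It should log roundTrip "ACGTACGTTAGC" and tamperRejected true.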

Web Crypto API: Browser-Native Security

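The Web Crypto API is a cryptography interface built into the browser and implemented in native code rather than JavaScript. Its crypto.subtle interface is available only in secure contexts (pages served over HTTPS), it generates keys from the browser's cryptographically secure random number generator, and it spares the application from shipping hand-written JavaScript cryptography, which is prone to implementation bugs and timing side channels. It does not, however, isolate keys from the page's scripts in general: any script running on the page can call the same API. The protection it does offer is the non-extractable key, a CryptoKey whose raw bytes cannot be exported or read by JavaScript. A key that must be shared, as in the URL-fragment design described below, has to be extractable, so its safety depends on the integrity of the page's code rather than on Web Crypto alone. BioTransfer uses crypto.subtle.generateKey() and crypto.subtle.encrypt(), operations provided by the browser itself rather than by third-party code.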
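Web Crypto's main key-protection feature is the non-extractable CryptoKey, and the difference is easy to observe. This sketch, built around a hypothetical helper named extractabilityDemo, generates one extractable and one non-extractable AES-GCM key and tries to export both. It illustrates standard Web Crypto behaviour and is not BioTransfer's code.

```typescript
// Illustrates what Web Crypto's `extractable` flag does and does not protect.
async function extractabilityDemo(): Promise<{ exportedBytes: number; lockedRejected: boolean }> {
  const algorithm = { name: "AES-GCM", length: 256 };
  const usages: KeyUsage[] = ["encrypt", "decrypt"];

  // A key that has to be shared (for example, put into a link) must be extractable,
  // and any script holding the CryptoKey can then export its raw 32 bytes.
  const shareable = await crypto.subtle.generateKey(algorithm, true, usages);
  const exported = await crypto.subtle.exportKey("raw", shareable);

  // A non-extractable key still encrypts and decrypts, but its bytes cannot be exported.
  const locked = await crypto.subtle.generateKey(algorithm, false, usages);
  let lockedRejected = false;
  try {
    await crypto.subtle.exportKey("raw", locked);
  } catch {
    lockedRejected = true; // Web Crypto rejects with an InvalidAccessError
  }
  return { exportedBytes: exported.byteLength, lockedRejected };
}

extractabilityDemo().then((result) => console.log(result));
```

The expected result is exportedBytes 32 and lockedRejected true: the shareable key's bytes are readable by script, the locked key's are not.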

How BioTransfer's Zero-Knowledge Architecture Works in Practice

When you enable Secure Transfer mode in BioTransfer, here is exactly what happens:

Step 1 — Key generation: The browser generates a unique 256-bit AES key using crypto.subtle.generateKey(). This key is never sent to BioTransfer's servers; it leaves the browser only inside the share link you give to the recipient (Step 4).
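Step 2 — Chunked encryption: For large files (100 GB+), the file is processed in 5 MB chunks. Each chunk is encrypted with its own Initialisation Vector (IV) derived from the chunk sequence number. Because every transfer uses a freshly generated key, no IV is ever repeated under the same key. This matters because reusing an IV with the same key breaks GCM: it exposes the XOR of the two plaintexts and allows authentication tags to be forged.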
Step 3 — Upload: Encrypted ciphertext chunks are uploaded directly from the browser to Cloudflare R2 via multipart upload. Neither BioTransfer's servers nor the storage bucket ever receive the plaintext file or the encryption key; they handle only ciphertext.
Step 4 — Key in URL fragment: The encryption key is encoded and placed in the URL fragment (the part after #). Critically, URL fragments are never sent to servers by browsers — they exist only in the client. Even BioTransfer's server does not receive the key when the download link is opened.
Step 5 — Recipient decryption: The recipient opens the link. Their browser reads the key from the URL fragment, downloads the encrypted chunks, and decrypts them client-side in real time. The recipient sees the original file — no software installation required.
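The five steps can be sketched end to end with Web Crypto. This is not BioTransfer's implementation: the helper names (createTransferKey, chunkIv, encryptFile, shareLink, keyFromLink, decryptFile), the IV layout (four zero bytes followed by the chunk index as a 64-bit big-endian integer), the base64url key encoding and the example.com URL are all assumptions. The last-chunk flag passed as additionalData is a hypothetical hardening step not described above: it makes decryption fail if trailing chunks are dropped, while the index-derived IV already makes reordered or missing middle chunks fail. The upload in Step 3 is left out, so the encrypted chunks simply stay in memory.

```typescript
// Hypothetical end-to-end sketch of Steps 1, 2, 4 and 5 (Step 3, the upload, is omitted).
// Helper names, IV layout, key encoding and URL are illustrative assumptions.

const CHUNK_SIZE = 5 * 1024 * 1024; // 5 MiB chunks (assumed reading of "5 MB")

// Step 1: a fresh 256-bit key per transfer; extractable because Step 4 exports it.
function createTransferKey(): Promise<CryptoKey> {
  return crypto.subtle.generateKey({ name: "AES-GCM", length: 256 }, true, ["encrypt", "decrypt"]);
}

// 96-bit IV: 4 zero bytes, then the chunk index as a big-endian 64-bit integer.
// Counter IVs are safe here only because each key encrypts a single transfer.
function chunkIv(index: number): Uint8Array {
  const iv = new Uint8Array(12);
  new DataView(iv.buffer).setBigUint64(4, BigInt(index));
  return iv;
}

// Hypothetical hardening: authenticate a "last chunk" flag so truncation is detected.
function chunkAad(isLast: boolean): Uint8Array {
  return Uint8Array.of(isLast ? 1 : 0);
}

// Step 2: split the file and encrypt each chunk; each output is ciphertext + 16-byte tag.
async function encryptFile(key: CryptoKey, file: Uint8Array, chunkSize = CHUNK_SIZE): Promise<Uint8Array[]> {
  const count = Math.max(1, Math.ceil(file.length / chunkSize)); // an empty file still yields one chunk
  const sealed: Uint8Array[] = [];
  for (let i = 0; i < count; i++) {
    const chunk = file.subarray(i * chunkSize, (i + 1) * chunkSize);
    const params = { name: "AES-GCM", iv: chunkIv(i), additionalData: chunkAad(i === count - 1) };
    sealed.push(new Uint8Array(await crypto.subtle.encrypt(params, key, chunk)));
  }
  return sealed; // these ciphertext chunks are what Step 3 would upload
}

// Step 4: export the raw key, base64url-encode it and place it after '#'.
async function shareLink(baseUrl: string, key: CryptoKey): Promise<string> {
  const raw = new Uint8Array(await crypto.subtle.exportKey("raw", key));
  const b64 = btoa(String.fromCharCode(...raw));
  return `${baseUrl}#${b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "")}`;
}

// Step 5a: the recipient's page reads the fragment locally; it is never part of the HTTP request.
async function keyFromLink(link: string): Promise<CryptoKey> {
  let b64 = new URL(link).hash.slice(1).replace(/-/g, "+").replace(/_/g, "/");
  b64 += "=".repeat((4 - (b64.length % 4)) % 4); // restore padding for atob()
  const raw = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  return crypto.subtle.importKey("raw", raw, { name: "AES-GCM" }, false, ["decrypt"]);
}

// Step 5b: decrypt chunks in order; an altered, reordered or missing chunk makes decrypt() reject.
async function decryptFile(key: CryptoKey, sealed: Uint8Array[]): Promise<Uint8Array> {
  if (sealed.length === 0) throw new Error("no chunks received");
  const parts: Uint8Array[] = [];
  for (let i = 0; i < sealed.length; i++) {
    const params = { name: "AES-GCM", iv: chunkIv(i), additionalData: chunkAad(i === sealed.length - 1) };
    parts.push(new Uint8Array(await crypto.subtle.decrypt(params, key, sealed[i])));
  }
  const file = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    file.set(p, offset);
    offset += p.length;
  }
  return file;
}

// Usage: round-trip 40,000 random bytes in 16 KiB chunks (small sizes for the demo).
async function demo(): Promise<boolean> {
  const file = crypto.getRandomValues(new Uint8Array(40_000));
  const key = await createTransferKey();
  const sealed = await encryptFile(key, file, 16_384);
  const recipientKey = await keyFromLink(await shareLink("https://example.com/d/abc123", key));
  const restored = await decryptFile(recipientKey, sealed);
  return restored.length === file.length && restored.every((b, i) => b === file[i]);
}

demo().then((ok) => console.log("round trip ok:", ok));
```

On the recipient side, keyFromLink() imports the key as non-extractable and decrypt-only, since it never needs to be exported again.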

Threat Model: What Zero-Knowledge Encryption Protects Against

✅ BioTransfer infrastructure breach — attackers get ciphertext only
✅ Government subpoena to BioTransfer — company cannot produce plaintext
✅ Insider threat at BioTransfer — employees see only encrypted data
✅ Man-in-the-middle attacks — GCM authentication detects tampering
✅ Network interception — TLS + AES-GCM provides layered protection

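Zero-knowledge encryption does not protect against threats on the sender's or recipient's own device (malware, keyloggers, screen capture), and anyone who obtains the complete download link can decrypt the file, because the key travels inside it. Since the encryption code itself is delivered to the browser by the service, the model also assumes that this code has not been tampered with. Nor does it replace proper data governance: you still need appropriate agreements and institutional approval before sharing human-subject data with collaborators.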