Security

Zero-Knowledge Encryption for Genomics: Why It Matters

Why standard cloud encryption falls short for genomic data — and what zero-knowledge architecture actually means in practice.

The Unique Security Problem of Genomic Data

Genomic data is unlike any other type of sensitive information. A credit card number can be cancelled and reissued. A passport can be replaced. A genome cannot. Every person's DNA is a permanent, immutable identifier — and one that reveals information not just about the individual, but about their biological relatives who never consented to any study.

As sequencing costs have collapsed, from roughly $3 billion for the first human genome, completed in 2003, to under $200 by some estimates today, the volume of genomic data being generated, shared, and stored has grown enormously, and the risk has grown with it. Several high-profile studies have demonstrated that individuals can be re-identified from supposedly anonymised genomic datasets using publicly available reference data. A 2013 Science paper showed that research participants could be identified by inferring their surnames from Y-chromosome markers using public genetic genealogy databases, then combining each surname with metadata such as age and state.

This means that the standard approach to data protection — removing names and obvious identifiers before sharing — is insufficient for genomic data. The data itself is the identifier.

Why "Encryption at Rest" Is Not Enough

Most cloud storage providers encrypt your data at rest. Google Drive, Dropbox, and AWS S3 all encrypt files stored on their servers. This sounds reassuring, but there is a critical limitation: by default, the provider holds the encryption keys.

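This model, sometimes called "encryption at rest with provider-managed keys," mainly protects against one threat: someone stealing a physical drive from a data centre. It does not protect against anyone who reaches the data through the provider's own systems, where the keys are available: an attacker who breaches the provider's servers, a malicious employee, or a court order compelling the provider to hand over data.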

For ordinary files, provider-managed encryption is usually acceptable. For genomic sequences from human subjects, it is not, particularly where HIPAA (for protected health information) or GDPR (which treats genetic data as a special category of personal data) leaves the data holder responsible for confidentiality even when a third-party service provider stores the data.

What Zero-Knowledge Encryption Actually Means

Zero-knowledge encryption means that the service provider has zero knowledge of the content of your files. The encryption happens on your device (or in your browser) before the data is transmitted to any server. The provider stores only ciphertext — encrypted data that is mathematically indistinguishable from random noise without the key.

Crucially, the provider never receives the encryption key. This is the defining property: the service cannot decrypt your data, even if compelled to by a court order, even if their servers are breached, and even if a malicious employee tries to access it.

This architecture is the basis for tools like Signal (for messaging), ProtonMail (for email), and BioTransfer (for file sharing). In each case, encryption happens client-side before any data touches the provider's infrastructure.

AES-GCM-256: The Algorithm Behind BioTransfer's Encryption

BioTransfer uses AES-GCM-256 (Advanced Encryption Standard with Galois/Counter Mode, 256-bit key) implemented via the browser's native Web Crypto API. Here is why this matters:

AES-256: Computationally Unbreakable

AES-256 uses a 256-bit key, which means there are 2^256 possible keys, roughly 1.16 × 10^77. Even a billion machines each testing a trillion keys per second would need on the order of 10^48 years to search the whole keyspace, against a universe that is about 1.4 × 10^10 years old. AES is standardised by NIST (FIPS 197), and AES-256 is approved by the NSA for protecting US government information classified up to TOP SECRET. No practical attack on the full cipher is publicly known.
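The brute-force arithmetic can be checked with a short calculation. This is a back-of-the-envelope sketch: the attacker's capacity (a billion machines, each testing a trillion keys per second) is a hypothetical figure chosen for illustration, and the variable names are not from any BioTransfer code.

```javascript
// Rough estimate of the time needed to exhaust a 256-bit keyspace.
// The attacker's rate below is an assumption chosen for illustration only.
const keyspace = 2 ** 256;                  // about 1.16e77 possible keys
const machines = 1e9;                       // assumed: one billion machines
const keysPerSecondPerMachine = 1e12;       // assumed: one trillion keys per second each
const secondsPerYear = 365.25 * 24 * 3600;  // about 3.16e7 seconds

const yearsToExhaust =
  keyspace / (machines * keysPerSecondPerMachine) / secondsPerYear;
const ageOfUniverseYears = 1.38e10;         // approximate

console.log(`keyspace: ${keyspace.toExponential(2)} keys`);
console.log(`years to exhaust: ${yearsToExhaust.toExponential(2)}`);
console.log(`multiples of the age of the universe: ${(yearsToExhaust / ageOfUniverseYears).toExponential(1)}`);
```

On average an attacker would find the key after searching half the keyspace, which changes the result by a factor of two and leaves the conclusion unchanged.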

GCM Mode: Authentication + Encryption

GCM (Galois/Counter Mode) adds authenticated encryption to AES. This means it not only encrypts data for confidentiality but also produces an authentication tag that verifies the data has not been tampered with in transit. Any modification to the ciphertext — even a single bit flip — will cause decryption to fail with an authentication error rather than silently producing corrupted output. For genomic data, where a corrupted base call could invalidate an entire analysis, this integrity guarantee is essential.

Web Crypto API: Browser-Native Security

The Web Crypto API is a cryptography interface built into the browser and implemented in native code rather than in page-level JavaScript. It is only exposed in secure contexts, such as pages served over HTTPS, and keys are handled as CryptoKey objects whose raw bytes can be marked non-extractable so that scripts cannot read them. Compared with JavaScript cryptography libraries, this narrows the class of attacks in which a malicious script on the page intercepts key material, although it does not remove the risk from a compromised page. BioTransfer uses crypto.subtle.generateKey() and crypto.subtle.encrypt(), both of which are provided by the browser itself.

How BioTransfer's Zero-Knowledge Architecture Works in Practice

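When you enable Secure Transfer mode in BioTransfer, the principles above apply directly: your files are encrypted in your browser with AES-GCM-256 before upload, the server stores only ciphertext, and the encryption key is never sent to BioTransfer.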

Threat Model: What Zero-Knowledge Encryption Protects Against

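Because the provider never holds the key, zero-knowledge encryption protects against the provider-side threats described earlier: a breach of the provider's servers, a malicious employee, and compelled disclosure by the provider. It does not protect against threats on the sender's or recipient's own device (malware, keyloggers, screen capture). It also does not replace proper data governance: you still need appropriate agreements and institutional approval before sharing human-subject data with collaborators.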
