A practical guide for researchers and bioinformaticians handling next-generation sequencing data.
Next-generation sequencing (NGS) has transformed biological research, but it has also introduced a major logistical challenge: moving enormous files between labs, core facilities, and compute clusters. A single whole-genome sequencing run can generate FASTQ files exceeding 100 GB. A paired-end RNA-Seq experiment with multiple samples can easily reach several hundred gigabytes.
Standard file transfer methods — email attachments, FTP, or consumer cloud drives — simply were not built for this scale. Most email providers cap attachments at around 25 MB. FTP transfers frequently drop partway through large files, and resumption support is inconsistent across servers and clients. Consumer cloud drives throttle upload speeds and offer recipients no simple way to verify end-to-end file integrity.
Beyond size, sequencing data carries sensitivity. FASTQ files from human subjects can contain re-identifiable genomic information. BAM files aligned to a reference genome are even more information-dense. Any transfer method used for this data must guarantee both integrity and, where required, confidentiality.
Before choosing a transfer method, it helps to understand what you are actually moving: raw FASTQ reads, aligned BAM files, and their accompanying index files, any of which can run from a few gigabytes to hundreds of gigabytes.
FTP was designed for an era when files were kilobytes, not gigabytes. SFTP adds encryption, but resuming an interrupted transfer depends on client support and is unreliable in practice: a dropped connection midway through a 50 GB FASTQ upload often means starting over. SFTP also requires IT infrastructure — SSH keys, server configuration, firewall rules — which creates friction in academic collaborations.
Tools like IBM Aspera and Globus are purpose-built for large scientific data transfers and work well within institutional environments. However, they require software installation at both the sending and receiving ends, institutional licensing, and administrator access to configure endpoints. For ad-hoc collaborations between labs at different institutions, this setup overhead is often prohibitive.
Google Drive, Dropbox, and OneDrive are convenient but problematic for research data. Upload speeds are throttled, file size limits apply, and — critically — the service provider holds decryption keys to your data. This creates a compliance risk for any data governed by HIPAA, GDPR, or IRB protocols.
When evaluating any tool for transferring sequencing data, look for these capabilities:

- Chunked, resumable uploads, so a dropped connection costs one chunk rather than the whole file.
- Checksum-based integrity verification, so recipients can confirm the file arrived unchanged.
- Client-side (end-to-end) encryption for data governed by HIPAA, GDPR, or IRB protocols.
- No software installation or administrator access required on either end.
BioTransfer was designed specifically around these requirements. The upload process uses S3-compatible multipart upload via Cloudflare R2. Files are automatically split into 5 MB chunks, each uploaded in parallel directly from the browser to Cloudflare's edge network. If a chunk fails, only that chunk is retried — not the entire file.
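The chunking and per-chunk retry behavior described above can be sketched as follows. This is an illustrative outline, not BioTransfer's actual client code; partRanges and withRetry are hypothetical names.

```javascript
// Sketch of multipart-upload bookkeeping: split a file into 5 MB parts
// and retry each failed part independently of the others.
const CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB parts, as described above

// Compute [start, end) byte ranges, one per part, for a file of `size` bytes.
function partRanges(size, chunkSize = CHUNK_SIZE) {
  const ranges = [];
  for (let start = 0; start < size; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, size)]);
  }
  return ranges;
}

// Retry one part up to `attempts` times; a failure never restarts the file.
async function withRetry(fn, attempts = 3) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

In a browser client, each range would be cut from the File object with file.slice(start, end) and sent as its own upload request; only ranges whose request fails are retried.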
Before upload begins, BioTransfer computes an MD5 checksum of the file client-side using SparkMD5. This checksum is stored alongside the transfer record. Recipients can verify the checksum after download to confirm the file arrived exactly as sent — a critical step for any downstream bioinformatics pipeline where data integrity affects results.
For sensitive sequencing data, BioTransfer's Secure Transfer mode adds AES-256-GCM end-to-end encryption. The encryption happens entirely in the browser using the Web Crypto API. The encryption key is embedded in the share link URL fragment — a part of the URL that is never sent to the server. Even BioTransfer's own infrastructure cannot read the contents of an encrypted transfer.
BAM files require a few extra considerations beyond raw FASTQ data:
- Transfer the .bai index file alongside the .bam file. Use BioTransfer's batch/folder transfer feature to keep them together.
- Subset with samtools view -b -o chr1.bam input.bam chr1 before transfer if only specific regions are needed by the recipient.

Free tier available. No account required. Works with files up to 2 GB; Pro plans support up to 1 TB.
Start Uploading Now