A practical guide for researchers and bioinformaticians handling next-generation sequencing data.
Next-generation sequencing (NGS) has transformed biological research, but it has also introduced a major logistical challenge: moving enormous files between labs, core facilities, and compute clusters. A single whole-genome sequencing run can generate FASTQ files exceeding 100 GB. A paired-end RNA-Seq experiment with multiple samples can easily reach several hundred gigabytes.
Standard file transfer methods — email attachments, FTP, or consumer cloud drives — simply were not built for this scale. Most email providers cap attachments at around 25 MB. FTP transfers frequently drop partway through large files, and resumption support is inconsistent across servers and clients. Consumer cloud drives throttle upload speeds and offer recipients no simple way to verify end-to-end file integrity.
Beyond size, sequencing data carries sensitivity. FASTQ files from human subjects can contain re-identifiable genomic information. BAM files aligned to a reference genome are even more information-dense. Any transfer method used for this data must guarantee both integrity and, where required, confidentiality.
Before choosing a transfer method, it helps to understand what you are actually moving: raw FASTQ reads, aligned BAM files, and their accompanying index files, any of which can run from a few gigabytes to hundreds of gigabytes.
FTP was designed for an era when files were kilobytes, not gigabytes. SFTP adds encryption, but resuming an interrupted transfer depends on client support and is unreliable in practice: a dropped connection midway through a 50 GB FASTQ upload often means starting over. SFTP also requires IT infrastructure — SSH keys, server configuration, firewall rules — which creates friction in academic collaborations.
Tools like IBM Aspera and Globus are purpose-built for large scientific data transfers and work well within institutional environments. However, they require software installation at both the sending and receiving ends, institutional licensing, and administrator access to configure endpoints. For ad-hoc collaborations between labs at different institutions, this setup overhead is often prohibitive.
Google Drive, Dropbox, and OneDrive are convenient but problematic for research data. Upload speeds are throttled, file size limits apply, and — critically — the service provider holds decryption keys to your data. This creates a compliance risk for any data governed by HIPAA, GDPR, or IRB protocols.
When evaluating any tool for transferring sequencing data, look for these capabilities:

- Chunked, resumable uploads, so a dropped connection costs one chunk rather than the whole file.
- Checksum-based integrity verification, so recipients can confirm the file arrived unchanged.
- Client-side (end-to-end) encryption for data governed by HIPAA, GDPR, or IRB protocols.
- No software installation or administrator access required on either end.
BioTransfer was designed specifically around these requirements. The upload process uses S3-compatible multipart upload via Cloudflare R2. Files are automatically split into 5 MB chunks, each uploaded in parallel directly from the browser to Cloudflare's edge network. If a chunk fails, only that chunk is retried — not the entire file.
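The chunking and per-chunk retry behavior described above can be sketched as follows. This is an illustrative outline, not BioTransfer's actual client code; partRanges and withRetry are hypothetical names.

```javascript
// Sketch of multipart-upload bookkeeping: split a file into 5 MB parts
// and retry each failed part independently of the others.
const CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB parts, as described above

// Compute [start, end) byte ranges, one per part, for a file of `size` bytes.
function partRanges(size, chunkSize = CHUNK_SIZE) {
  const ranges = [];
  for (let start = 0; start < size; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, size)]);
  }
  return ranges;
}

// Retry one part up to `attempts` times; a failure never restarts the file.
async function withRetry(fn, attempts = 3) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

In a browser client, each range would be cut from the File object with file.slice(start, end) and sent as its own upload request; only ranges whose request fails are retried.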
Before upload begins, BioTransfer computes an MD5 checksum of the file client-side using SparkMD5. This checksum is stored alongside the transfer record. Recipients can verify the checksum after download to confirm the file arrived exactly as sent — a critical step for any downstream bioinformatics pipeline where data integrity affects results.
For sensitive sequencing data, BioTransfer's Secure Transfer mode adds AES-256-GCM end-to-end encryption. The encryption happens entirely in the browser using the Web Crypto API. The encryption key is embedded in the share link URL fragment — a part of the URL that is never sent to the server. Even BioTransfer's own infrastructure cannot read the contents of an encrypted transfer.
BAM files require a few extra considerations beyond raw FASTQ data:
- Transfer the .bai index file alongside the .bam file. Use BioTransfer's batch/folder transfer feature to keep them together.
- Subset with samtools view -b -o chr1.bam input.bam chr1 before transfer if only specific regions are needed by the recipient.

Free tier available. No account required. Works with files up to 2 GB; Pro plans support up to 1 TB.
Start Uploading Now