Genomic Data Privacy

You cannot anonymise a genome.
You can only govern the risk.

Genomic data is biologically permanent, re-identifiable by nature, and legally classified as sensitive information under the Australian Privacy Act. GenomeIQ applies population-level controls, family-link suppression, and rare-variant suppression to make genomic sharing defensible - not just hopeful.

Genomic formats: VCF · BAM · CRAM
Population controls: configurable
HREC risk manifests
Zero raw PHI shipped
GenomeIQ · VCF privacy processing
BEFORE - raw genomic export
File metadata
Sample identifier · clinical phenotype
Sequencing run date
Pipeline source · reference assembly
Individual name (header)
Individual date of birth (header)
Per-record sample column
AFTER - GenomeIQ processed
File metadata
Sample identifier pseudonymised · phenotype retained
Sequencing run date
Pipeline source · reference assembly
Individual name redacted
Individual date of birth redacted
Per-record sample column pseudonymised
Cohort policy · minimum cohort size enforced
Rare variants · suppressed before release
Family links · pseudonymised before release
Why genomic privacy is harder

The variant tells you about the disease.
The dataset tells you who has it.

Genomic data cannot be made truly anonymous. A sequenced genome is a biological fingerprint - with sufficient reference data, individuals can be re-identified from variant profiles alone. The Privacy Act's definition of ‘sensitive information’ applies to genetic data regardless of whether names are attached.

GenomeIQ does not promise anonymisation. It provides systematic risk reduction - through population-level controls, rare-variant suppression, and family-link suppression - and the audit trail to demonstrate that each release meets your ethical and legal obligations.

Risk level 1: Header PHI

Patient name, DOB, sample ID in VCF/BAM headers. Directly identifiable. Removed or pseudonymised by GenomeIQ.

Risk level 2: Sample identity linkage

Sample IDs that can be cross-referenced against external databases (biobanks, GWAS studies). Pseudonymised with a consistent study-level token.

Risk level 3: Re-identification from variants

Singleton variants (appearing in only one individual in a cohort) can uniquely identify the individual. GenomeIQ filters variants below configurable frequency thresholds.

Risk level 4: Pedigree inference

Family-link metadata in sample headers can enable indirect identification. GenomeIQ removes family-link metadata before release and enforces minimum cohort-size policy.

Privacy controls

Configurable risk controls for every release.

GenomeIQ applies layered controls. Each is configurable per dataset, per recipient, and per data-use agreement.

Population-level enforcement

Every released cohort must contain at least k individuals sharing each combination of quasi-identifiers. GenomeIQ enforces your k threshold before any data is released, and variants below the configured frequency threshold are automatically suppressed.

k-threshold configuration
Minimum cohort size: k ≥ 5 (configurable)
Rare variant cutoff: MAF (minor allele frequency) < 0.01 suppressed
Singleton filter: n = 1 always removed
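
As a rough illustration of the minimum cohort size rule above, the sketch below counts how many records share each quasi-identifier combination and reports the combinations that fall under k. It is not GenomeIQ's implementation; the function name `small_groups` and the fields `sex` and `age_band` are hypothetical, chosen only for the example.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records.

    An empty result means every combination meets the minimum cohort size.
    """
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical cohort metadata: five records in one group, two in another.
cohort = (
    [{"sex": "F", "age_band": "40-49"}] * 5
    + [{"sex": "M", "age_band": "50-59"}] * 2
)

violations = small_groups(cohort, ["sex", "age_band"], k=5)
for combo, n in violations.items():
    print(f"Below k: {combo} has only {n} records")
```

A release gate built on a check like this would typically block the export, or generalise or suppress the offending groups, until the result is empty.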

Header de-identification

VCF, BAM, FASTQ, and CRAM header fields pseudonymised. Sample IDs replaced with consistent cohort-level tokens.
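
One common way to get consistent tokens is a keyed hash: the same sample ID always maps to the same token within a study, and the mapping cannot be reproduced without the key. The sketch below assumes that approach for the VCF `#CHROM` header line; the source does not say which technique GenomeIQ uses, and the function names, the `S-` prefix, and the sample IDs are all hypothetical.

```python
import hashlib
import hmac


def pseudonymise_sample_id(sample_id: str, study_key: bytes) -> str:
    """Map a sample ID to a stable token using HMAC-SHA256 with a per-study key."""
    digest = hmac.new(study_key, sample_id.encode("utf-8"), hashlib.sha256)
    return "S-" + digest.hexdigest()[:16]


def pseudonymise_chrom_line(line: str, study_key: bytes) -> str:
    """Rewrite the sample columns of a VCF '#CHROM' header line.

    The first nine tab-separated columns (#CHROM through FORMAT) are fixed;
    everything after them is a sample name.
    """
    fields = line.rstrip("\n").split("\t")
    fixed, samples = fields[:9], fields[9:]
    tokens = [pseudonymise_sample_id(s, study_key) for s in samples]
    return "\t".join(fixed + tokens)


# Hypothetical header line with two identifying sample names.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tPATIENT_001\tPATIENT_002"
print(pseudonymise_chrom_line(header, study_key=b"example-study-key"))
```

Using a different key per study keeps tokens consistent inside one cohort while preventing linkage across releases to different recipients.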

Rare variant suppression

Variants below population frequency threshold removed before release. Threshold configurable per release profile.
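
A minimal sketch of frequency-based suppression, assuming biallelic sites and diploid genotype calls held in memory. It drops singletons and any variant whose minor allele frequency falls below a cutoff, and counts what it removed. The data layout and the names `suppress_rare_variants`, `minor_allele_frequency`, and `carrier_count` are assumptions for illustration, not GenomeIQ's interface.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency for a biallelic site.

    genotypes: list of diploid calls such as (0, 1); None marks a missing allele.
    """
    called = [a for gt in genotypes for a in gt if a is not None]
    if not called:
        return 0.0
    alt_freq = sum(1 for a in called if a > 0) / len(called)
    return min(alt_freq, 1.0 - alt_freq)


def carrier_count(genotypes):
    """Number of individuals carrying at least one alternate allele."""
    return sum(
        1 for gt in genotypes if any(a is not None and a > 0 for a in gt)
    )


def suppress_rare_variants(variants, maf_cutoff=0.01):
    """Return (kept_variants, suppressed_count) for one release."""
    kept, suppressed = [], 0
    for variant in variants:
        gts = variant["genotypes"]
        # Singletons are always removed; other variants must meet the cutoff.
        if carrier_count(gts) == 1 or minor_allele_frequency(gts) < maf_cutoff:
            suppressed += 1
        else:
            kept.append(variant)
    return kept, suppressed


# Hypothetical four-sample cohort.
variants = [
    {"id": "var_common", "genotypes": [(0, 1), (0, 1), (0, 0), (1, 1)]},
    {"id": "var_singleton", "genotypes": [(0, 1), (0, 0), (0, 0), (0, 0)]},
    {"id": "var_absent", "genotypes": [(0, 0), (0, 0), (0, 0), (0, 0)]},
]
kept, suppressed = suppress_rare_variants(variants)
print([v["id"] for v in kept], suppressed)
```

The suppressed count is returned alongside the kept variants so it can be recorded with the release.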

Pedigree de-linking

Family relationships stripped from multi-sample files. Relatedness inference attacks mitigated by design.
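
For VCF, declared family relationships live in `##PEDIGREE` and `##pedigreeDB` meta-information lines. The sketch below drops those lines from a header. It covers only declared metadata: relatedness can still be inferred from the genotypes themselves, so this is one step of de-linking rather than the whole control. The function name is hypothetical.

```python
PEDIGREE_PREFIXES = ("##PEDIGREE=", "##pedigreeDB=")


def strip_pedigree_lines(lines):
    """Yield VCF lines unchanged, skipping pedigree meta-information lines."""
    for line in lines:
        if line.startswith(PEDIGREE_PREFIXES):
            continue
        yield line


# Hypothetical header for a trio; sample names are placeholders.
vcf_header = [
    "##fileformat=VCFv4.3",
    "##PEDIGREE=<ID=child_01,Father=father_01,Mother=mother_01>",
    "##pedigreeDB=https://example.org/pedigrees",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild_01\tfather_01\tmother_01",
]

for line in strip_pedigree_lines(vcf_header):
    print(line)
```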

Audit trail per release

Every cohort release is signed with the applied controls, the suppressed-variant count, and the recipient. Exportable for Human Research Ethics Committee (HREC) review.
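
To make the idea of a signed release record concrete, here is a minimal sketch that serialises a manifest deterministically and attaches an HMAC-SHA256 signature. The manifest fields and function names are hypothetical, and the source does not specify GenomeIQ's signing scheme. A production system would more likely use asymmetric signatures so that reviewers can verify a manifest without holding the signing key.

```python
import hashlib
import hmac
import json


def _canonical_bytes(manifest: dict) -> bytes:
    """Serialise with sorted keys so the same manifest always yields the same bytes."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest: dict, signing_key: bytes) -> dict:
    """Wrap a manifest with an HMAC-SHA256 signature over its canonical form."""
    signature = hmac.new(signing_key, _canonical_bytes(manifest), hashlib.sha256)
    return {"manifest": manifest, "signature": signature.hexdigest()}


def verify_manifest(signed: dict, signing_key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(
        signing_key, _canonical_bytes(signed["manifest"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


# Hypothetical release record.
manifest = {
    "release_id": "example-release-001",
    "recipient": "example-recipient",
    "controls": ["header_pseudonymisation", "rare_variant_suppression"],
    "suppressed_variants": 42,
}
signed = sign_manifest(manifest, signing_key=b"example-signing-key")
print(verify_manifest(signed, signing_key=b"example-signing-key"))
```

Any later change to the manifest, such as editing the suppressed-variant count, causes verification to fail.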

Technical specifications

Every stage of the genomic pipeline.

GenomeIQ handles data from raw sequencing output through to aggregated variant files - with controls applied at each stage.

Supported formats: VCF / gVCF · BAM / CRAM · FASTQ · BED · MAF · GFF3
Privacy model: Population-level enforcement (configurable threshold)
Variant controls: Configurable rare-variant suppression · population stratification
Sample handling: Pseudonymisation · cohort-level token · family-link suppression
Reference genome assemblies: Standard human reference assemblies
Throughput: Async cohort processing · streaming VCF support
Deployment: On-prem agent · flexible compute deployment · cloud-isolated execution
Compliance: Australian Privacy Act (sensitive information), OAIC APP 3/6/11, GA4GH Framework, HREC

GenomeIQ is in development.

Join the waitlist. Research institutions, biobanks, and genomics labs get first access.