Genomic Data Privacy

You cannot anonymise a genome.
You can only govern the risk.

Genomic data is biologically permanent, re-identifiable by nature, and legally classified as sensitive information under the Australian Privacy Act. GenomeIQ™ applies population-level controls, family-link suppression, and rare-variant suppression to make genomic sharing defensible - not just hopeful.

Genomic formats

VCF · BAM · CRAM

Population controls

Configurable

HREC

Risk manifests

Zero

Raw PHI shipped

GenomeIQ · VCF privacy processing

BEFORE - raw genomic export

File metadata

Sample identifier · clinical phenotype

Sequencing run date

Pipeline source · reference assembly

Individual name (header)

Individual date of birth (header)

Per-record sample column

AFTER - GenomeIQ processed

File metadata

Sample identifier pseudonymised · phenotype retained

Sequencing run date

Pipeline source · reference assembly

Individual name redacted

Individual date of birth redacted

Per-record sample column pseudonymised

Cohort policy · minimum cohort size enforced

Rare variants · suppressed before release

Family links · suppressed before release

Why genomic privacy is harder

The variant tells you about the disease.
The dataset tells you who has it.

Genomic data cannot be made truly anonymous. A sequenced genome is a biological fingerprint - with sufficient reference data, individuals can be re-identified from variant profiles alone. The Privacy Act's definition of ‘sensitive information’ applies to genetic data regardless of whether names are attached.

GenomeIQ does not promise anonymisation. It provides systematic risk reduction - through population-level controls, rare-variant suppression, and family-link suppression - and the audit trail to demonstrate that each release meets your ethical and legal obligations.

Risk level 1 · Header PHI

Patient name, DOB, sample ID in VCF/BAM headers. Directly identifiable. Removed or pseudonymised by GenomeIQ.

Risk level 2 · Sample identity linkage

Sample IDs that can be cross-referenced against external databases (biobanks, GWAS studies). Pseudonymised with a consistent study-level token.

Risk level 3 · Re-identification from variants

Singleton variants (appearing in only one individual in a cohort) can uniquely identify the individual. GenomeIQ filters variants below configurable frequency thresholds.

Risk level 4 · Pedigree inference

Family-link metadata in sample headers can enable indirect identification. GenomeIQ removes family-link metadata before release and enforces a minimum cohort-size policy.

Privacy controls

Configurable risk controls for every release.

GenomeIQ applies layered controls. Each is configurable per dataset, per recipient, and per data-use agreement.

Population-level enforcement

Every released cohort must contain at least k individuals sharing each combination of quasi-identifiers. GenomeIQ enforces your k threshold before any data is released, and variants below the configured frequency threshold (f/k) are automatically suppressed.

k-threshold configuration

Minimum cohort size · k ≥ 5 (configurable)

Rare variant cutoff · minor allele frequency (MAF) < 0.01 suppressed

Singleton filter · n = 1 always removed
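To make the k threshold concrete, here is a minimal Python sketch of a minimum-cohort-size check over quasi-identifiers. It is an illustration only, not GenomeIQ's implementation: `ReleasePolicy`, `undersized_groups`, and `release_allowed` are hypothetical names, and the default values simply mirror the configuration listed above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleasePolicy:
    """Hypothetical policy object; defaults mirror the values listed above."""
    min_cohort_size: int = 5      # k
    maf_cutoff: float = 0.01      # variants with MAF below this are suppressed
    drop_singletons: bool = True  # n = 1 always removed


def undersized_groups(records, quasi_identifiers, k):
    """Return each quasi-identifier combination shared by fewer than k individuals."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


def release_allowed(records, quasi_identifiers, policy):
    """Allow release only if the cohort and every quasi-identifier group reach k."""
    if len(records) < policy.min_cohort_size:
        return False
    return not undersized_groups(records, quasi_identifiers, policy.min_cohort_size)
```

In this sketch, a cohort of five individuals sharing the same sex and age band passes at k = 5. Adding a single individual with a unique combination blocks the release until that group is generalised or removed.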

Header de-identification

VCF, BAM, FASTQ, and CRAM header fields pseudonymised. Sample IDs replaced with consistent cohort-level tokens.
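One common way to produce consistent tokens is a keyed hash (HMAC) over the original sample ID, so the same sample always maps to the same token within a study but cannot be recomputed without the key. The Python sketch below assumes that approach; the function names, the `S-` prefix, and the 16-character truncation are illustrative choices, not GenomeIQ's actual scheme.

```python
import hashlib
import hmac


def pseudonymise_sample_id(sample_id: str, study_key: bytes, prefix: str = "S") -> str:
    """Map a sample ID to a stable token using HMAC-SHA256 keyed per study."""
    digest = hmac.new(study_key, sample_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:16]}"


def pseudonymise_column_line(column_line: str, study_key: bytes) -> str:
    """Rewrite the sample columns of a VCF '#CHROM' header line.

    The first nine tab-separated columns (#CHROM through FORMAT) are fixed by
    the VCF format; every column after that is a sample name.
    """
    fields = column_line.rstrip("\n").split("\t")
    fixed, samples = fields[:9], fields[9:]
    tokens = [pseudonymise_sample_id(s, study_key) for s in samples]
    return "\t".join(fixed + tokens)
```

Using a different key per study or recipient means tokens from one release cannot be joined against another, while the key holder can still re-link records if required.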

Rare variant suppression

Variants below population frequency threshold removed before release. Threshold configurable per release profile.
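As an illustration of frequency-based suppression, the Python sketch below computes the minor allele frequency of a variant from its genotype calls and decides whether it would be kept. It is a simplified assumption-laden example: it treats sites as biallelic, counts singletons by alternate-allele carriers only, and uses hypothetical function names with the cutoff values listed earlier as defaults.

```python
def alt_allele_stats(genotypes):
    """Return (alt_allele_count, carrier_count, called_allele_count).

    Genotypes are VCF GT strings such as '0/1', '1|1' or './.'.
    Missing alleles ('.') are ignored.
    """
    alt_count = carriers = total = 0
    for gt in genotypes:
        alleles = [a for a in gt.replace("|", "/").split("/") if a != "."]
        n_alt = sum(1 for a in alleles if a != "0")
        total += len(alleles)
        alt_count += n_alt
        carriers += 1 if n_alt else 0
    return alt_count, carriers, total


def keep_variant(genotypes, maf_cutoff=0.01, drop_singletons=True):
    """Return True if the variant passes the rare-variant and singleton filters."""
    alt_count, carriers, total = alt_allele_stats(genotypes)
    if total == 0:
        return False  # no called genotypes, nothing safe to release
    if drop_singletons and carriers == 1:
        return False  # seen in exactly one individual
    alt_freq = alt_count / total
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf >= maf_cutoff
```

With these defaults, a variant carried by 2 of 200 diploid individuals (frequency 0.005) is suppressed, while one carried by 10 of 100 (frequency 0.05) is kept.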

Pedigree de-linking

Family relationships stripped from multi-sample files. Relatedness inference attacks mitigated by design.
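A minimal Python sketch of stripping declared relationships follows. It assumes two common places where family links are recorded: `##PEDIGREE` meta-information lines in a VCF header, and the family, father, and mother columns of a PLINK-style PED row (where `0` denotes an unknown parent). The function names are hypothetical, and the sketch covers only declared metadata, not relatedness that could be inferred from the genotypes themselves.

```python
def strip_pedigree_header(header_lines):
    """Drop ##PEDIGREE meta-information lines from a list of VCF header lines."""
    return [ln for ln in header_lines if not ln.startswith("##PEDIGREE")]


def delink_ped_row(row: str, token: str) -> str:
    """Remove family links from one PED-style row.

    Columns: family ID, individual ID, father ID, mother ID, sex, phenotype, ...
    The individual is given its own pseudonymous 'family' and unknown parents.
    """
    cols = row.split()
    cols[0] = token  # family ID: one family per individual
    cols[1] = token  # individual ID: pseudonymous token
    cols[2] = "0"    # father unknown
    cols[3] = "0"    # mother unknown
    return "\t".join(cols)
```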

Audit trail per release

Every cohort release is signed with the applied controls, the suppressed-variant count, and the recipient. Exportable for Human Research Ethics Committee (HREC) review.
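A signed release record could look like the Python sketch below, which serialises a manifest deterministically and signs it with HMAC-SHA256 so later tampering is detectable. The manifest fields follow the description above; the function names, field names, and the choice of HMAC (rather than, say, an asymmetric signature) are assumptions for illustration.

```python
import hashlib
import hmac
import json


def build_manifest(recipient, controls, suppressed_variants):
    """Assemble the per-release record: who received it and what was applied."""
    return {
        "recipient": recipient,
        "controls": controls,
        "suppressed_variants": suppressed_variants,
    }


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the manifest with HMAC-SHA256."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)
```

Sorting keys before signing keeps the signature stable regardless of the order in which fields were added, so an exported manifest can be re-verified later by anyone holding the key.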

Technical specifications

Every stage of the genomic pipeline.

GenomeIQ handles data from raw sequencing output through to aggregated variant files - with controls applied at each stage.

Supported formats

VCF / gVCF · BAM / CRAM · FASTQ · BED · MAF · GFF3

Privacy model

Population-level enforcement (configurable threshold)

Variant controls

Configurable rare-variant suppression · population stratification

Sample handling

Pseudonymisation · cohort-level token · family-link suppression

Reference genome assemblies

Standard human reference assemblies

Throughput

Async cohort processing · streaming VCF support

Deployment

On-prem agent · flexible compute deployment · cloud-isolated execution

Compliance

Australian Privacy Act (sensitive information), OAIC APP 3/6/11, GA4GH Framework, HREC

GenomeIQ™ is in development.

Join the waitlist. Research institutions, biobanks, and genomics labs get first access.
