Coming Soon · Join the waitlist
Genomic Data Privacy

You cannot anonymise a genome.

You can only govern the risk.

Genomic data is biologically permanent, re-identifiable by nature, and legally classified as sensitive information under the Australian Privacy Act. GenomeIQ applies k-anonymity controls, pedigree suppression, and singleton filtering to make genomic sharing defensible - not just hopeful.

GenomeIQ · VCF privacy processing
BEFORE - raw VCF header
##fileformat=VCFv4.2
##SAMPLE=<ID=NA12878,Phenotype=AML>
##fileDate=20240814
##source=IlluminaDRAGEN 4.2
##reference=GRCh38
##INDIVIDUAL=<Name=██████████████>
##INDIVIDUAL=<DOB=19█████>
#CHROM POS REF ALT SAMPLE_NA12878
AFTER - GenomeIQ processed
##fileformat=VCFv4.2
##SAMPLE=<ID=PSEUDO-2E9F4A,Phenotype=RETAINED>
##fileDate=20240814
##source=IlluminaDRAGEN 4.2
##reference=GRCh38
##INDIVIDUAL=<Name=REDACTED>
##INDIVIDUAL=<DOB=REDACTED>
#CHROM POS REF ALT PSEUDO-2E9F4A
k = 5 · minimum cohort size enforced
3 singletons · filtered before release
Pedigree · family links pseudonymised
Why genomic privacy is harder

The variant tells you about the disease.
The dataset tells you who has it.

Genomic data cannot be made truly anonymous. A sequenced genome is a biological fingerprint - with sufficient reference data, individuals can be re-identified from variant profiles alone. The Privacy Act's definition of ‘sensitive information’ applies to genetic data regardless of whether names are attached.

GenomeIQ does not promise anonymisation. It provides systematic risk reduction - through k-anonymity thresholds, rare variant suppression, and pedigree de-linking - and the audit trail to demonstrate that each release meets your ethical and legal obligations.

Risk level 1: Header PHI

Patient name, DOB, sample ID in VCF/BAM headers. Directly identifiable. Removed or pseudonymised by GenomeIQ.

Risk level 2: Sample identity linkage

Sample IDs that can be cross-referenced against external databases (biobanks, GWAS studies). Pseudonymised with a consistent study-level token.
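One common way to get a consistent study-level token is a keyed hash of the sample ID. The sketch below is illustrative only and is not GenomeIQ's implementation: the function name `study_token`, the `PSEUDO-` prefix length, and the example keys are all assumptions.

```python
import hashlib
import hmac


def study_token(sample_id: str, study_key: bytes, length: int = 6) -> str:
    """Derive a stable pseudonym for a sample ID within one study.

    Hypothetical helper: the same (sample_id, study_key) pair always gives
    the same token, so records stay linkable inside a study without the
    original ID being exposed.
    """
    digest = hmac.new(study_key, sample_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "PSEUDO-" + digest[:length].upper()


# Hypothetical per-study secrets; in practice these would live in a key store.
key_a = b"study-A-secret"
key_b = b"study-B-secret"

token = study_token("NA12878", key_a)
print(token)                                   # stable token for study A
print(study_token("NA12878", key_a) == token)  # repeat call, same study key
print(study_token("NA12878", key_b))           # token derived under study B's key
```

A six-character token is short enough to collide in large cohorts, so a real deployment would likely use a longer token or check for collisions before release.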

Risk level 3: Re-identification from variants

Singleton variants (appearing in only one individual in a cohort) can uniquely identify the individual. GenomeIQ filters variants below configurable frequency thresholds.
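As a rough illustration of singleton filtering, the sketch below counts how many samples carry a non-reference allele at each site and drops sites with a single carrier. The names `allele_carriers` and `filter_singletons`, the variant keys, and the in-memory genotype layout are assumptions made for the example, not GenomeIQ's API.

```python
from typing import Dict, List


def allele_carriers(genotypes: List[str]) -> int:
    """Count samples carrying at least one non-reference allele.

    Genotypes are VCF-style GT strings such as "0/1", "1|0" or "./.".
    """
    carriers = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if any(a not in ("0", ".") for a in alleles):
            carriers += 1
    return carriers


def filter_singletons(variants: Dict[str, List[str]], min_carriers: int = 2) -> Dict[str, List[str]]:
    """Keep only variants seen in at least `min_carriers` samples."""
    return {
        vid: gts
        for vid, gts in variants.items()
        if allele_carriers(gts) >= min_carriers
    }


# Toy cohort of four samples; keys are hypothetical variant identifiers.
variants = {
    "chr1:1000:A:G": ["0/1", "0/0", "1/1", "0/0"],  # two carriers
    "chr2:2000:C:T": ["0/0", "0/1", "0/0", "0/0"],  # one carrier (singleton)
    "chr3:3000:G:A": ["0|1", "1|0", "./.", "0/0"],  # two carriers
}

kept = filter_singletons(variants)
print(list(kept))  # ['chr1:1000:A:G', 'chr3:3000:G:A']
```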

Risk level 4: Pedigree inference

Family relationships encoded in multi-sample files can reveal identity indirectly. GenomeIQ de-links pedigree structures before release and enforces minimum cohort size (k-anonymity).

Privacy controls

Configurable risk controls for every release.

GenomeIQ applies layered controls. Each is configurable per dataset, per recipient, and per data-use agreement.

k-Anonymity enforcement

Every released cohort must contain at least k individuals sharing each combination of quasi-identifiers. GenomeIQ enforces your k threshold before any data is released - variants below frequency f/k are automatically suppressed.

k-threshold configuration
Minimum cohort size: k ≥ 5 (configurable)
Rare variant cutoff: MAF < 0.01 suppressed
Singleton filter: n = 1 always removed
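The k-anonymity rule can be checked on cohort metadata by grouping records on their quasi-identifiers and flagging any group smaller than k. This is a minimal sketch under stated assumptions: the function `violating_groups` and the fields `age_band` and `sex` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def violating_groups(
    records: List[Dict[str, str]],
    quasi_identifiers: Sequence[str],
    k: int = 5,
) -> Dict[Tuple[str, ...], int]:
    """Return each quasi-identifier combination shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Toy cohort: five records in one group, two in another.
records = (
    [{"age_band": "40-49", "sex": "F"} for _ in range(5)]
    + [{"age_band": "70-79", "sex": "M"} for _ in range(2)]
)

print(violating_groups(records, ["age_band", "sex"], k=5))
# {('70-79', 'M'): 2}
```

A release would be blocked, or the offending groups generalised or suppressed, until this check returns an empty result.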

Header de-identification

VCF, BAM, FASTQ, and CRAM header fields pseudonymised. Sample IDs replaced with consistent cohort-level tokens.
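For VCF, header de-identification might look something like the sketch below. It assumes the `##INDIVIDUAL` and `##SAMPLE` meta-lines from the illustrative header earlier on this page, a standard tab-separated `#CHROM` line with sample names from the tenth column onward, and a pre-built `tokens` mapping. The function name `deidentify_header` is hypothetical.

```python
import re
from typing import Dict, List

INDIVIDUAL = re.compile(r"^##INDIVIDUAL=<(\w+)=.*>$")
SAMPLE_ID = re.compile(r"(##SAMPLE=<ID=)([^,>]+)")


def deidentify_header(lines: List[str], tokens: Dict[str, str]) -> List[str]:
    """Redact direct identifiers and swap sample IDs for pseudonyms.

    Unknown sample IDs raise KeyError so that nothing is released
    with an unmapped identifier (fail closed).
    """
    out = []
    for line in lines:
        match = INDIVIDUAL.match(line)
        if match:
            # Keep the field name, drop the value.
            line = f"##INDIVIDUAL=<{match.group(1)}=REDACTED>"
        elif line.startswith("##SAMPLE=<"):
            line = SAMPLE_ID.sub(lambda m: m.group(1) + tokens[m.group(2)], line)
        elif line.startswith("#CHROM"):
            cols = line.split("\t")
            cols[9:] = [tokens[c] for c in cols[9:]]  # sample columns only
            line = "\t".join(cols)
        out.append(line)
    return out


header = [
    "##fileformat=VCFv4.2",
    "##SAMPLE=<ID=NA12878,Phenotype=AML>",
    "##INDIVIDUAL=<Name=Jane Citizen>",  # made-up name for the example
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878",
]
tokens = {"NA12878": "PSEUDO-2E9F4A"}

cleaned = deidentify_header(header, tokens)
for line in cleaned:
    print(line)
```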

Rare variant suppression

Variants below population frequency threshold removed before release. Threshold configurable per release profile.
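A frequency threshold can be sketched against the standard `AF` key in a VCF record's INFO column. The example below is an assumption-laden illustration: `info_af` and `passes_maf` are hypothetical names, `AF` is assumed to be present and numeric, and records without it are suppressed rather than passed through.

```python
from typing import List, Optional


def info_af(info: str) -> Optional[List[float]]:
    """Extract the AF values (one per ALT allele) from a VCF INFO string."""
    for field in info.split(";"):
        if field.startswith("AF="):
            return [float(x) for x in field[3:].split(",")]
    return None


def passes_maf(record_line: str, threshold: float = 0.01) -> bool:
    """True if every ALT allele has minor allele frequency >= threshold.

    Records with no AF annotation fail closed (treated as suppressed).
    """
    info = record_line.rstrip("\n").split("\t")[7]
    afs = info_af(info)
    if afs is None:
        return False
    return all(min(af, 1.0 - af) >= threshold for af in afs)


records = [
    "chr1\t100\t.\tA\tG\t50\tPASS\tAF=0.25;DP=100",   # common variant
    "chr1\t200\t.\tC\tT\t50\tPASS\tDP=80;AF=0.004",   # rare variant
    "chr1\t300\t.\tG\tA\t50\tPASS\tDP=60",            # no AF annotation
]

released = [r for r in records if passes_maf(r)]
print(len(released))  # 1
```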

Pedigree de-linking

Family relationships stripped from multi-sample files. Relatedness inference attacks mitigated by design.
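Where relationships are declared in a PLINK-style pedigree file (family ID, individual ID, father, mother, sex, phenotype), de-linking can be sketched as below. The function name `delink_pedigree` and the token values are hypothetical, and this covers declared links only; it does nothing about relatedness that could be inferred from the genotypes themselves.

```python
from typing import Dict, List, Tuple

PedRow = Tuple[str, str, str, str, str, str]  # FID, IID, PAT, MAT, SEX, PHENO


def delink_pedigree(rows: List[PedRow], tokens: Dict[str, str]) -> List[PedRow]:
    """Remove declared family links from pedigree rows.

    Each individual becomes its own single-member family, and parent
    fields are set to "0" (the PLINK code for an unknown parent).
    """
    out = []
    for _fid, iid, _pat, _mat, sex, pheno in rows:
        token = tokens[iid]  # KeyError if an individual has no pseudonym
        out.append((token, token, "0", "0", sex, pheno))
    return out


# Toy trio: S3 is the child of S1 and S2.
rows = [
    ("FAM1", "S1", "0", "0", "1", "2"),
    ("FAM1", "S2", "0", "0", "2", "1"),
    ("FAM1", "S3", "S1", "S2", "1", "2"),
]
tokens = {"S1": "P-A1", "S2": "P-B2", "S3": "P-C3"}  # hypothetical pseudonyms

delinked = delink_pedigree(rows, tokens)
for row in delinked:
    print("\t".join(row))
```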

Audit trail per release

Every cohort release is signed with the applied controls, suppressed-variant count, and recipient. Exportable for HREC review.
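A signed release record could take a shape like the one sketched here. The field names, the `sign_release` and `verify_release` helpers, and the use of HMAC-SHA256 are assumptions for illustration; a production system might well prefer asymmetric signatures so that reviewers can verify a record without holding the signing key.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    """Serialise a record deterministically so signatures are reproducible."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_release(record: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature to a release record."""
    signature = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}


def verify_release(signed: dict, key: bytes) -> bool:
    """Check that the record has not changed since it was signed."""
    expected = hmac.new(key, _canonical(signed["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


key = b"audit-signing-key"  # hypothetical secret
record = {
    "recipient": "Example Research Institute",  # made-up recipient
    "controls": {"k": 5, "maf_cutoff": 0.01, "singleton_filter": True},
    "suppressed_variants": 3,
}

signed = sign_release(record, key)
print(verify_release(signed, key))  # True

signed["record"]["suppressed_variants"] = 0  # tamper with the record
print(verify_release(signed, key))  # False
```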

Technical specifications

Every stage of the genomic pipeline.

GenomeIQ handles data from raw sequencing output through to aggregated variant files - with controls applied at each stage.

Supported formats: VCF / gVCF · BAM / CRAM · FASTQ · BED · MAF · GFF3
Privacy model: k-Anonymity enforcement (configurable threshold)
Variant controls: MAF threshold · singleton suppression · rare variant filter · population stratification
Sample handling: Pseudonymisation · cohort-level token · pedigree de-linking
Reference genomes: GRCh37 / GRCh38 / T2T-CHM13
Throughput: Async cohort processing · streaming VCF support
Deployment: On-prem agent · flexible compute deployment · cloud-isolated execution
Compliance: Australian Privacy Act (sensitive information), OAIC APP 3/6/11, GA4GH Framework, HREC
Coming Soon

GenomeIQ is in development.

Join the waitlist. Research institutions, biobanks, and genomics labs get first access.