You cannot anonymise a genome.
You can only govern the risk.
Genomic data is biologically permanent, re-identifiable by nature, and legally classified as sensitive information under the Australian Privacy Act. GenomeIQ applies k-anonymity controls, pedigree suppression, and singleton filtering to make genomic sharing defensible - not just hopeful.
The variant tells you about the disease.
The dataset tells you who has it.
Genomic data cannot be made truly anonymous. A sequenced genome is a biological fingerprint - with sufficient reference data, individuals can be re-identified from variant profiles alone. The Privacy Act's definition of ‘sensitive information’ applies to genetic data regardless of whether names are attached.
GenomeIQ does not promise anonymisation. It provides systematic risk reduction - through k-anonymity thresholds, rare variant suppression, and pedigree de-linking - and the audit trail to demonstrate that each release meets your ethical and legal obligations.
Patient name, DOB, sample ID in VCF/BAM headers. Directly identifiable. Removed or pseudonymised by GenomeIQ.
Sample IDs that can be cross-referenced against external databases (biobanks, GWAS studies). Pseudonymised with a consistent study-level token.
Singleton variants (appearing in only one individual in a cohort) can uniquely identify the individual. GenomeIQ filters variants below configurable frequency thresholds.
Family relationships encoded between samples can reveal identity indirectly. GenomeIQ de-links pedigree structures before release and enforces a minimum cohort size (k-anonymity).
Configurable risk controls for every release.
GenomeIQ applies layered controls. Each is configurable per dataset, per recipient, and per data-use agreement.
k-Anonymity enforcement
Every combination of quasi-identifiers in a released cohort must be shared by at least k individuals. GenomeIQ enforces your k threshold before any data is released - variants below frequency f/k are automatically suppressed.
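As an illustration of the k-anonymity rule, the sketch below counts how many records share each quasi-identifier combination and reports the combinations that fall short of k. It is a minimal example, not GenomeIQ's implementation: the function name `violating_groups` and the fields `age_band`, `sex`, and `state` are hypothetical.

```python
from collections import Counter

def violating_groups(records, quasi_identifiers, k):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}

# Hypothetical cohort metadata; the field names are assumptions for illustration.
cohort = [
    {"age_band": "40-49", "sex": "F", "state": "NSW"},
    {"age_band": "40-49", "sex": "F", "state": "NSW"},
    {"age_band": "40-49", "sex": "F", "state": "NSW"},
    {"age_band": "70-79", "sex": "M", "state": "TAS"},
]

# With k=3, the single 70-79/M/TAS record would block or be suppressed from release.
print(violating_groups(cohort, ["age_band", "sex", "state"], k=3))
# {('70-79', 'M', 'TAS'): 1}
```

A release gate built on a check like this would refuse to export until the returned dictionary is empty, either by suppressing the offending records or by coarsening the quasi-identifiers.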
Header de-identification
VCF, BAM, FASTQ, and CRAM header fields pseudonymised. Sample IDs replaced with consistent cohort-level tokens.
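One common way to produce consistent tokens is a keyed hash: the same sample ID always maps to the same token within a cohort, while a different key yields unrelated tokens. The sketch below applies this to the sample columns of a VCF `#CHROM` header line. The function names, the `S_` prefix, and the use of HMAC-SHA256 are assumptions for illustration, not a description of GenomeIQ internals.

```python
import hashlib
import hmac

def pseudonymise_sample_id(sample_id: str, cohort_key: bytes, length: int = 12) -> str:
    """Derive a stable token for a sample ID using a per-cohort secret key."""
    digest = hmac.new(cohort_key, sample_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "S_" + digest[:length]

def pseudonymise_vcf_header(header_line: str, cohort_key: bytes) -> str:
    """Replace the sample columns of a VCF #CHROM line.

    The first nine tab-separated columns (#CHROM through FORMAT) are fixed by
    the VCF format; everything after them is a sample name.
    """
    fields = header_line.rstrip("\n").split("\t")
    fixed, samples = fields[:9], fields[9:]
    tokens = [pseudonymise_sample_id(s, cohort_key) for s in samples]
    return "\t".join(fixed + tokens)

# Hypothetical header line and key.
line = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tPATIENT_A\tPATIENT_B"
print(pseudonymise_vcf_header(line, cohort_key=b"example-cohort-key"))
```

Keeping the key per cohort means tokens stay consistent across files in one release but cannot be joined against tokens from another release.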
Rare variant suppression
Variants below population frequency threshold removed before release. Threshold configurable per release profile.
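To make the idea concrete, the sketch below drops any variant whose alternate allele frequency in the cohort is below a threshold, and also drops singletons (variants carried by only one individual). It assumes diploid genotypes encoded as alternate-allele counts (0, 1, 2, or `None` for a missing call); the function names and variant IDs are hypothetical, and frequency is computed within the cohort rather than from an external population reference.

```python
def allele_frequency(genotypes):
    """Alternate allele frequency over called diploid genotypes (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    return sum(called) / (2 * len(called))

def suppress_rare(variants, min_af):
    """Split variants into (kept, suppressed) by carrier count and frequency."""
    kept, suppressed = {}, []
    for variant_id, genotypes in variants.items():
        carriers = sum(1 for g in genotypes if g)  # individuals with >= 1 alt allele
        # Singletons (and variants with no carriers) are suppressed regardless of frequency.
        if carriers <= 1 or allele_frequency(genotypes) < min_af:
            suppressed.append(variant_id)
        else:
            kept[variant_id] = genotypes
    return kept, suppressed

# Hypothetical four-sample cohort.
variants = {
    "chr1:1000:A>G": [0, 1, 1, 2],     # frequency 0.5, three carriers
    "chr2:500:C>T":  [0, 0, 1, 0],     # singleton
    "chr3:42:G>A":   [1, 1, 0, None],  # frequency 2/6, two carriers
}
kept, suppressed = suppress_rare(variants, min_af=0.2)
print(sorted(kept))   # ['chr1:1000:A>G', 'chr3:42:G>A']
print(suppressed)     # ['chr2:500:C>T']
```

The length of the suppressed list is the kind of figure a release record could report as its suppressed-variant count.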
Pedigree de-linking
Family relationships stripped from multi-sample files. Relatedness inference attacks mitigated by design.
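As one example of what stripping relationships can look like, the sketch below rewrites rows in the six-column PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype) so that each individual sits in its own family with unknown parents. That the pedigree is held in PED form is an assumption, and the function name `delink_ped_rows` and the sample values are hypothetical.

```python
def delink_ped_rows(rows):
    """Give every individual a unique family ID and blank out both parent IDs.

    In the PED layout, "0" in the father or mother column means "unknown".
    """
    delinked = []
    for index, row in enumerate(rows, start=1):
        _family, individual, _father, _mother, sex, phenotype = row[:6]
        delinked.append([f"F{index}", individual, "0", "0", sex, phenotype])
    return delinked

# Hypothetical trio: two parents and a child in family FAM1.
trio = [
    ["FAM1", "S_a1", "0", "0", "1", "1"],
    ["FAM1", "S_b2", "0", "0", "2", "1"],
    ["FAM1", "S_c3", "S_a1", "S_b2", "2", "2"],
]
for row in delink_ped_rows(trio):
    print("\t".join(row))
```

After this step the file no longer states that the three samples are related, although sex and phenotype columns are preserved for analysis.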
Audit trail per release
Every cohort release is signed with the applied controls, the suppressed-variant count, and the recipient. Exportable for Human Research Ethics Committee (HREC) review.
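A signed release record can be sketched as a JSON document with a keyed signature over its canonical form, so that any later change to the record is detectable. The field names, the key handling, and the choice of HMAC-SHA256 below are assumptions for illustration rather than GenomeIQ's actual record format.

```python
import hashlib
import hmac
import json

def _canonical(record: dict) -> bytes:
    """Serialise with sorted keys so the same record always yields the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_release(record: dict, signing_key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature to a release record."""
    signature = hmac.new(signing_key, _canonical(record), hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}

def verify_release(signed: dict, signing_key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(signing_key, _canonical(signed["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])

# Hypothetical release record.
record = {
    "recipient": "Example Research Institute",
    "controls": {"k": 5, "min_allele_frequency": 0.01, "pedigree_delinked": True},
    "suppressed_variant_count": 1284,
}
signed = sign_release(record, signing_key=b"example-signing-key")
print(verify_release(signed, b"example-signing-key"))  # True
```

An HMAC only proves integrity to parties who hold the key; where an external reviewer must verify the record independently, an asymmetric signature would be the more natural choice.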
Every stage of the genomic pipeline.
GenomeIQ handles data from raw sequencing output through to aggregated variant files - with controls applied at each stage.