Genomics Practices

The variant is stripped.
The genome is not.

Australian Genomics, Garvan Institute, and the research partners pitching you consortium access all want the same thing: VCF and BAM files that are genuinely safe to share. The problem is that genomic data is inherently re-identifiable. Removing the sample ID reduces risk - it does not eliminate it. GenomeIQ applies population-level k-anonymity enforcement, singleton suppression, and pedigree de-linking before any file moves outside your sequencing environment.

VCF / BAM / FASTQ
k-anon · Population controls
OAIC · APP 11 aligned
GenomeIQ · VCF processing stream
##SAMPLE=ID · ANON-7f3a91 · pseudonymised
##fileDate · 20240000 · pseudonymised
##source=Lab██████████ · stripped
##reference · GRCh38 · retained
CHROM/POS/REF/ALT · Retained (filtered) · retained
k-anon check · Population ≥ 100 · retained
Singleton variants · Suppressed (risk) · stripped
Pedigree links · ██████████████ · stripped
VCF 4.2 · GRCh38 · k-anon threshold: 100 · k-anon verified
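
As a rough illustration of the header handling shown in the stream above, the sketch below pseudonymises a `##SAMPLE` ID with a keyed hash, coarsens `##fileDate` to the year, drops `##source`, and passes other meta lines through. It is not GenomeIQ's implementation: the function names, the `SITE_KEY` value, and the six-character pseudonym length are assumptions made for the example, and the sample names on the `#CHROM` column line are deliberately left out of scope.

```python
import hashlib
import hmac
import re
from typing import List, Optional

# Hypothetical site-held secret; a real deployment would load this from a key store.
SITE_KEY = b"example-site-key"

# Matches the first ID=... entry inside a structured meta line such as ##SAMPLE=<ID=...>.
_SAMPLE_ID = re.compile(r"ID=([^,>]+)")


def pseudonymise(sample_id: str, key: bytes = SITE_KEY) -> str:
    """Derive a stable pseudonym from a sample ID using HMAC-SHA256."""
    digest = hmac.new(key, sample_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Six hex characters mirrors the example on this page; longer is safer against collisions.
    return "ANON-" + digest[:6]


def scrub_meta_line(line: str) -> Optional[str]:
    """Return the scrubbed meta line, or None when the line should be dropped."""
    if not line.startswith("##") or "=" not in line:
        return line  # not a key=value meta line; left untouched in this sketch
    key, value = line[2:].split("=", 1)
    if key == "SAMPLE":
        scrubbed = _SAMPLE_ID.sub(
            lambda m: "ID=" + pseudonymise(m.group(1)), value, count=1
        )
        return "##SAMPLE=" + scrubbed
    if key == "fileDate":
        return "##fileDate=" + value[:4] + "0000"  # keep the year only
    if key == "source":
        return None  # lab or pipeline identifier is stripped
    return line  # e.g. ##reference is retained


def scrub_header(lines: List[str]) -> List[str]:
    """Apply scrub_meta_line to every line and drop the stripped ones."""
    return [out for out in map(scrub_meta_line, lines) if out is not None]
```

A keyed hash keeps the pseudonym stable across exports from the same site without letting a recipient recompute it from a guessed sample ID.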
What changes

What GenomeIQ replaces.

Everything you are currently doing to satisfy your HREC - and hoping is enough.

The old way
Scrub sample IDs and assume the HREC will accept that as de-identification
With GenomeIQ
k-anonymity enforced at the variant level. Signed manifest with suppression counts included on every export.
The old way
Send VCF files to Australian Genomics consortium with no formal risk quantification
With GenomeIQ
Policy-gated export. Consortium receives a clean file with a methodology document they can reference for their own governance.
The old way
AI vendor wants BAM files - no way to prove the genetic signal was controlled
With GenomeIQ
Re-identification risk score documented in the export manifest. Methodology referenced against NHMRC framework.
The old way
Family study pedigree data scrubbed entirely - collaborators can't use it
With GenomeIQ
Pedigree de-linked in export. Pseudonymous map retained for re-link under controlled access. Research integrity preserved.
The old way
Responding to an OAIC or HREC inquiry by reconstructing from LIMS logs
With GenomeIQ
Immutable, timestamped audit log. Every processing decision signed and exportable within hours.
Day one

Three things that change immediately.

Deployed in your sequencing environment

GenomeIQ Agent installed on-prem, connected to your LIMS or sequencer output directory. No raw genomic data touches external infrastructure. Half-day engagement.

Half-day engagement

k-threshold locked to your HREC methodology

Your HREC-approved k parameter is configured in GenomeIQ at deployment. Every subsequent export enforces it automatically - no manual review, no researcher override.

Config locked to HREC document

Risk manifest on every export

Each file release includes a signed methodology manifest with suppression counts, re-identification risk score, and NHMRC framework reference. Ready for consortium submission or HREC audit.

Structured, exportable report
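
To make the idea of a signed manifest concrete, here is a minimal sketch that serialises the manifest fields deterministically and attaches an HMAC-SHA256 tag. The field names, the `build_manifest` and `verify_manifest` helpers, and the use of a shared-secret HMAC are all assumptions for illustration; the page does not specify GenomeIQ's manifest schema or signature scheme, and a production system might well use an asymmetric signature instead.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _canonical(body: Dict[str, Any]) -> bytes:
    """Serialise the manifest body deterministically so the tag is reproducible."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_manifest(export_name: str, k_threshold: int, variants_retained: int,
                   variants_suppressed: int, risk_score: float,
                   methodology_ref: str, signing_key: bytes) -> Dict[str, Any]:
    """Assemble a manifest (hypothetical field names) and sign it with HMAC-SHA256."""
    body = {
        "export": export_name,
        "k_threshold": k_threshold,
        "variants_retained": variants_retained,
        "variants_suppressed": variants_suppressed,
        "risk_score": risk_score,
        "methodology_ref": methodology_ref,
    }
    tag = hmac.new(signing_key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": tag}


def verify_manifest(manifest: Dict[str, Any], signing_key: bytes) -> bool:
    """Recompute the tag over the body and compare in constant time."""
    expected = hmac.new(signing_key, _canonical(manifest["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["signature"])
```

Any change to a suppression count or threshold after signing causes `verify_manifest` to return `False`, which is the property an auditor would rely on.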
The fundamental problem

You cannot anonymise a genome.
You can only control the risk.

Genomic data is intrinsically identifying. Removing metadata - sample ID, DOB, clinic reference - reduces the risk but does not eliminate it. Population-level k-anonymity controls, singleton suppression, and lineage tracking are required. GenomeIQ applies all three.

Standard VCF de-id coverage
Header metadata · covered
Sample ID pseudonymisation · missed
Singleton suppression · missed
Pedigree de-linking · missed
k-anon risk scoring · missed
How it works

The GenomeIQ pipeline.

01
File Ingest
VCF, BAM, or FASTQ received from sequencer or LIMS
02
Header Scrub
All metadata fields processed - sample ID pseudonymised
03
Risk Scoring
k-anonymity evaluated, singleton variants flagged
04
Policy Check
Routing validated against data-use agreement
05
Governed Output
Risk-controlled file delivered. Audit entry written.
Capabilities

What GenomeIQ covers.

k-anonymity risk controls

GenomeIQ applies population-level re-identification risk controls to every file. Singletons and other rare variants - those carried by fewer than k individuals in the cohort - are suppressed before release. Pedigree relationships are de-linked. The configurable k-threshold aligns to your HREC-approved methodology.
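
The suppression rule described above can be sketched as a carrier count per variant followed by a threshold filter. This is an illustrative sketch only: the `Variant` record, `carrier_count`, and `suppress_rare` are hypothetical names, genotypes are assumed to be VCF-style `GT` strings, and a carrier is assumed to be any sample with at least one non-reference allele.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    genotypes: Tuple[str, ...]  # one GT string per sample, e.g. "0/1", "1|1", "./."


def carrier_count(variant: Variant) -> int:
    """Count samples carrying at least one non-reference allele."""
    count = 0
    for gt in variant.genotypes:
        alleles = gt.replace("|", "/").split("/")
        # "0" is the reference allele and "." is a missing call; anything else is a carrier.
        if any(allele not in ("0", ".") for allele in alleles):
            count += 1
    return count


def suppress_rare(variants: List[Variant], k: int) -> Tuple[List[Variant], int]:
    """Keep variants carried by at least k samples; return them with the suppressed count."""
    kept = [v for v in variants if carrier_count(v) >= k]
    return kept, len(variants) - len(kept)
```

The suppressed count returned here is the kind of figure a risk manifest would report alongside the k-threshold.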

File formats
VCF 4.1 · VCF 4.2 · BCF · BAM · CRAM · FASTQ

Header & metadata scrub

Sample ID, clinic reference, date fields, and phenotype annotations pseudonymised or stripped.

Singleton suppression

Variants present in fewer than k individuals suppressed. Threshold configurable per HREC methodology.

Policy-gated routing

Research archive and AI vendor access governed per data-use agreement. Enforced at network layer.

Lineage and audit trail

Every processing decision signed and logged. Re-identification risk score included in manifest.
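
One common way to make a log tamper-evident is to chain entries so that each one authenticates its predecessor. The sketch below does this with HMAC-SHA256. It is an assumption about how such a trail could be built, not a description of GenomeIQ's log format; the `AuditLog` class and its field names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous tag" for the first entry


class AuditLog:
    """Append-only log in which each entry's tag covers the previous entry's tag."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self.entries: List[Dict[str, Any]] = []

    def _tag(self, prev: str, timestamp: str, event: Dict[str, Any]) -> str:
        payload = json.dumps(
            {"prev": prev, "ts": timestamp, "event": event}, sort_keys=True
        ).encode("utf-8")
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def append(self, event: Dict[str, Any], timestamp: str) -> Dict[str, Any]:
        """Record a processing decision, linked to the entry before it."""
        prev = self.entries[-1]["tag"] if self.entries else GENESIS
        entry = {"prev": prev, "ts": timestamp, "event": event,
                 "tag": self._tag(prev, timestamp, event)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Walk the chain and confirm every link and tag is intact."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            expected = self._tag(entry["prev"], entry["ts"], entry["event"])
            if not hmac.compare_digest(expected, entry["tag"]):
                return False
            prev = entry["tag"]
        return True
```

Because each tag depends on the one before it, editing or deleting an earlier entry invalidates the chain from that point on.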

Technical specifications

Designed for reference-grade genomic pipelines.

Formats
VCF 4.1/4.2, BCF, BAM, CRAM, FASTQ (gzipped or plain)
Reference genomes
GRCh37, GRCh38, T2T-CHM13 - configurable
k-anon threshold
Configurable (default k=100) · aligned to HREC-approved methodology
Singleton suppression
Variants in < k individuals suppressed before release
Pedigree handling
Family relationship de-linking · pseudonymous pedigree map retained for re-link under controlled conditions
Throughput
Whole-genome VCF (30x WGS): ~2–4 min · Exome: <30s
Deployment
On-prem agent · no raw genomic data in cloud pipeline
Compliance
OAIC APP 11, TGA SaMD, NHMRC genomic data framework, ISO 27001
Questions

Things genomics teams usually ask us.

01

Does GenomeIQ satisfy HREC de-identification requirements?

For most ethics applications, yes. GenomeIQ produces a structured risk manifest documenting your k-threshold, suppression counts, and methodology alignment to the NHMRC Genomic Data Framework. Most HRECs accept this in place of a narrative de-identification description. We recommend reviewing the manifest format with your HREC coordinator before submission.

02

What is k-anonymity and why does it matter for genomic data?

k-anonymity means that any individual's record is indistinguishable from at least k−1 others in the dataset on the attributes that could identify them. For genomic data, this is applied at the variant level: a variant present in fewer than k individuals in your cohort can be used to narrow identification to a small group - even without a name or date of birth. GenomeIQ suppresses those variants before the file is released. Standard DICOM or HL7 de-identification does not address this.
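
For readers who want the definition in executable form, the sketch below checks the classic tabular version of k-anonymity: records are grouped by their quasi-identifier values, and every group must contain at least k records. The function names and the example attribute names are assumptions for illustration; the variant-level check described above applies the same counting idea to carriers of a variant.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def group_sizes(records: List[Dict[str, Hashable]],
                quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records: List[Dict[str, Hashable]],
                   quasi_identifiers: Sequence[str], k: int) -> bool:
    """True when every quasi-identifier combination is shared by at least k records."""
    return all(size >= k for size in group_sizes(records, quasi_identifiers).values())


# Hypothetical cohort metadata: three records share one combination, one stands alone.
cohort = [
    {"postcode": "2000", "birth_year": 1980},
    {"postcode": "2000", "birth_year": 1980},
    {"postcode": "2000", "birth_year": 1980},
    {"postcode": "2600", "birth_year": 1975},
]
```

With this data, `is_k_anonymous(cohort, ["postcode", "birth_year"], 2)` is false because the last record forms a group of one; dropping or generalising that record makes the check pass.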

03

How does GenomeIQ handle data shared with Australian Genomics or similar consortiums?

Each consortium receives a policy-governed export. Your data-use agreement with the consortium is encoded as a GenomeIQ policy - specifying cohort criteria, permitted use, and k-threshold. The export is delivered with a signed manifest the consortium can reference for their own governance obligations. Raw files never leave your sequencing environment.
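
As a sketch of what encoding a data-use agreement as a policy might look like, the example below models a policy with a permitted-use list and a minimum k-threshold, then checks an export request against it. The `Policy` and `ExportRequest` types, their fields, and the recipient names are hypothetical; GenomeIQ's actual policy format is not described on this page.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Policy:
    """Hypothetical encoding of one data-use agreement."""
    permitted_uses: FrozenSet[str]
    min_k: int


@dataclass(frozen=True)
class ExportRequest:
    recipient: str
    purpose: str
    k_threshold: int


def check_export(request: ExportRequest, policies: Dict[str, Policy]) -> List[str]:
    """Return the list of policy violations; an empty list means the export may proceed."""
    policy = policies.get(request.recipient)
    if policy is None:
        return ["no data-use agreement on file for " + request.recipient]
    violations = []
    if request.purpose not in policy.permitted_uses:
        violations.append("purpose not permitted: " + request.purpose)
    if request.k_threshold < policy.min_k:
        violations.append(
            "k-threshold {} is below the agreed minimum {}".format(
                request.k_threshold, policy.min_k
            )
        )
    return violations


# Example policy table keyed by recipient (names are illustrative only).
POLICIES = {
    "consortium-a": Policy(permitted_uses=frozenset({"rare-disease-research"}), min_k=100),
}
```

Returning every violation, instead of stopping at the first, gives the audit entry a complete record of why an export was refused.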

04

We have a family study - pedigree relationships are essential to our research. Does GenomeIQ handle this?

Yes. GenomeIQ separates pedigree de-linking from pedigree destruction. The exported dataset has family relationship identifiers removed. A pseudonymous pedigree map is retained under controlled conditions - accessible to authorised researchers for re-link under a separate data-access agreement. Your collaborators get what they need; the raw family graph stays under your governance.
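
The split between de-linking and destruction can be illustrated with a small sketch: the export carries only random pseudonyms, while the parent links are written to a separate map that stays under controlled access. The row layout (PED-style columns with "0" for an unknown parent), the `delink_pedigree` function, and the `P-` pseudonym format are assumptions for this example, not GenomeIQ's actual data model.

```python
import secrets
from typing import Dict, List, Tuple


def delink_pedigree(
    rows: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], Dict[str, Dict[str, str]]]:
    """Split pedigree rows into a de-linked export and a retained pseudonymous map.

    Each row has keys "family", "individual", "father", "mother"; "0" means unknown.
    """
    pseudonyms: Dict[str, str] = {}

    def alias(real_id: str) -> str:
        if real_id == "0":
            return "0"  # unknown parent stays unknown
        if real_id not in pseudonyms:
            # Random pseudonym with no derivable relationship to the real ID.
            pseudonyms[real_id] = "P-" + secrets.token_hex(8)
        return pseudonyms[real_id]

    export: List[Dict[str, str]] = []
    pedigree_map: Dict[str, Dict[str, str]] = {}
    for row in rows:
        pid = alias(row["individual"])
        export.append({"individual": pid})  # no family or parent fields leave the site
        pedigree_map[pid] = {
            "father": alias(row["father"]),
            "mother": alias(row["mother"]),
        }
    return export, pedigree_map
```

The export alone reveals no family structure; an authorised researcher holding `pedigree_map` can rebuild the relationships between pseudonyms without ever seeing the original identifiers.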

05

How do we quantify re-identification risk for our ethics submission?

GenomeIQ produces a re-identification risk score for every export, based on cohort size, variant suppression rate, and k-threshold relative to population reference panels. The score is included in the signed manifest alongside the methodology reference. Most ethics committees treat this as sufficient quantitative evidence of risk control.

06

What k-threshold should we set?

The NHMRC Genomic Data Framework recommends k=100 as a default for open or controlled-access datasets. For restricted-access datasets shared with named collaborators under a formal data-access agreement, k=20 or k=50 may be appropriate. We configure your k-threshold during deployment based on your HREC-approved methodology - it is locked to that document and not adjustable by individual researchers.

See GenomeIQ on your VCF pipeline.

We will run a live demo against your file format and HREC-approved k-threshold. No data leaves your site.

No sales pressure. Just a 30-minute call with a solutions engineer.