Genomics Practices

The variant is stripped.
The genome is not.

Genomics consortiums, research institutes, and the partners pitching you consortium access all want the same thing: VCF and BAM files that are genuinely safe to share. The problem is that genomic data is inherently re-identifiable. Removing the sample ID reduces risk - it does not eliminate it. GenomeIQ™ applies population-level enforcement, rare-variant suppression, and family-link de-linking before any file moves outside your sequencing environment.

See our threat model →

Genomic formats

VCF · BAM · CRAM · FASTQ

Population controls

Configurable

OAIC

APP 11 aligned

GenomeIQ · VCF processing stream

Sample identifier · ANON-7f3a91 · pseudonymised

##fileDate · 20240000 · pseudonymised

##source=Lab██████████ · stripped

Reference assembly · Standard build · retained

CHROM/POS/REF/ALT · Retained (filtered) · retained

Cohort policy check · Population threshold met · retained

Singleton variants · Suppressed (risk) · stripped

Pedigree links · ██████████████ · stripped

Genomic export · cohort policy enforced · Cohort policy verified

What changes

What GenomeIQ™ replaces.

Everything you are currently doing to satisfy your HREC - and hoping is enough.

✗

The old way

Scrub sample IDs and assume the HREC will accept that as de-identification

With GenomeIQ

Population-level controls enforced. Signed manifest with suppression counts included on every export.

✗

The old way

Send VCF files to a genomics consortium with no formal risk quantification

With GenomeIQ

Policy-gated export. Consortium receives a clean file with a methodology document they can reference for their own governance.

✗

The old way

AI vendor wants BAM files - no way to prove the genetic signal was controlled

With GenomeIQ

Re-identification risk score documented in the export manifest. Methodology referenced against NHMRC framework.

✗

The old way

Family-link data scrubbed entirely - collaborators can't use it

With GenomeIQ

Family-link suppressed in export. Pseudonymous link map retained for re-link under controlled access. Research integrity preserved.

✗

The old way

Responding to an OAIC or HREC inquiry by reconstructing from LIMS logs

With GenomeIQ

Immutable, timestamped audit log. Every processing decision signed and exportable within hours.
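
One common way to make an audit log tamper-evident is to chain entries by hash, so that editing any earlier entry invalidates every later one. The sketch below illustrates that general technique only; it is not GenomeIQ's implementation. The function names (`append_entry`, `verify`) and entry fields are hypothetical. A hash chain detects tampering but is not a signature, so a real deployment would also sign entries with a managed key.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same entry always produces the same hash
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, timestamp: str, decision: str) -> dict:
    """Append a processing decision, linked to the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"timestamp": timestamp, "decision": decision, "prev": prev}
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("timestamp", "decision", "prev")}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


# Hypothetical entries, for illustration only
log = []
append_entry(log, "2024-01-01T00:00:00Z", "sample ID pseudonymised")
append_entry(log, "2024-01-01T00:00:01Z", "singleton variants suppressed")
```

Calling `verify(log)` after any in-place edit to an earlier entry returns `False`, which is the property an auditor relies on when the log is exported.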

Day one

Three things that change immediately.

Deployed in your sequencing environment

GenomeIQ Agent is installed on-prem and connected to your LIMS or sequencer output directory. No raw genomic data touches external infrastructure.

Half-day engagement

k-threshold locked to your HREC methodology

Your HREC-approved k parameter is configured in GenomeIQ at deployment. Every subsequent export enforces it automatically - no manual review, no researcher override.

Config locked to HREC document

Risk manifest on every export

Each file release includes a signed methodology manifest with suppression counts, re-identification risk score, and NHMRC framework reference. Ready for consortium submission or HREC audit.

Structured, exportable report
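
To make the idea of a signed manifest concrete, here is a minimal sketch assuming a JSON manifest and an HMAC-SHA256 signature over its canonical form. The field names, values, and key are illustrative assumptions, not GenomeIQ's actual manifest schema. HMAC uses a shared secret; where a consortium must verify the manifest without holding that secret, an asymmetric signature scheme would be the more natural choice.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over the canonical JSON form of the manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the expected and supplied signatures."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


# Illustrative values only: not real output, and not a real schema
manifest = {
    "k_threshold": 5,
    "variants_suppressed": 1284,
    "pedigree_links_suppressed": 12,
    "risk_score": 0.02,
    "methodology_reference": "HREC-approved methodology (placeholder)",
}
key = b"demo-key-not-for-production"
signature = sign_manifest(manifest, key)
```

Any change to the manifest after signing, such as lowering `k_threshold`, makes `verify_manifest` return `False`.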

The fundamental problem

You cannot anonymise a genome.
You can only control the risk.

Genomic data is intrinsically identifying. Removing metadata - sample ID, DOB, clinic reference - reduces the risk but does not eliminate it. Population-level controls, rare-variant suppression, and lineage tracking are required. GenomeIQ applies all three.

Standard VCF de-id coverage

Header metadata · covered

Sample ID pseudonymisation · missed

Singleton suppression · missed

Pedigree de-linking · missed

Population-level risk scoring · missed

How it works

The GenomeIQ™ pipeline.

File Ingest

VCF, BAM, or FASTQ received from sequencer or LIMS

Header Scrub

All metadata fields processed - sample ID pseudonymised

Risk Scoring

Population-level risk evaluated, rare variants flagged

Policy Check

Routing validated against data-use agreement

Governed Output

Risk-controlled file delivered. Audit entry written.
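
As a rough illustration of the Header Scrub stage, the sketch below processes VCF header lines: it drops the `##source` line, coarsens `##fileDate` to the year, and replaces sample names in the `#CHROM` column header with keyed pseudonyms. It relies on the standard VCF layout, where sample names start at the tenth tab-separated column. The function names, the `ANON-` prefix length, and the choice to drop rather than redact `##source` are assumptions for the example, not a description of GenomeIQ internals.

```python
import hashlib
import hmac


def pseudonymise(sample_id: str, key: bytes) -> str:
    """Keyed pseudonym: stable for the same key, not reversible without it."""
    digest = hmac.new(key, sample_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Six hex characters keeps the example short; a real system would use
    # more to avoid collisions in large cohorts.
    return "ANON-" + digest[:6]


def scrub_header(lines: list, key: bytes) -> list:
    """Scrub VCF header lines (no trailing newlines expected)."""
    out = []
    for line in lines:
        if line.startswith("##source="):
            continue  # drop the lab-identifying line entirely
        if line.startswith("##fileDate="):
            year = line[len("##fileDate="):][:4]
            out.append(f"##fileDate={year}0000")  # keep year, zero month/day
        elif line.startswith("#CHROM"):
            cols = line.split("\t")
            # Columns 0-8 are fixed fields; sample names follow from index 9
            cols[9:] = [pseudonymise(s, key) for s in cols[9:]]
            out.append("\t".join(cols))
        else:
            out.append(line)
    return out


# Hypothetical input header
header = [
    "##fileformat=VCFv4.2",
    "##fileDate=20240315",
    "##source=ExampleLabPipeline",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tPATIENT-001",
]
scrubbed = scrub_header(header, key=b"demo-key-not-for-production")
```

Using a keyed HMAC rather than a plain hash means someone who knows the original sample IDs cannot recompute the pseudonyms without the key.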

Capabilities

What GenomeIQ™ covers.

Population-level risk controls

GenomeIQ applies population-level re-identification risk controls to every file. Singleton variants - those present in fewer than k individuals in the cohort - are suppressed before release. Pedigree relationships are de-linked. The configurable k-threshold aligns to your HREC-approved methodology.

File formats

VCF 4.1 · VCF 4.2 · BCF · BAM · CRAM · FASTQ

Header & metadata scrub

Sample ID, clinic reference, date fields, and phenotype annotations pseudonymised or stripped.

Singleton suppression

Variants present in fewer than k individuals suppressed. Threshold configurable per HREC methodology.

Policy-gated routing

Research archive and AI vendor access governed per data-use agreement. Enforced at network layer.

Lineage and audit trail

Every processing decision signed and logged. Re-identification risk score included in manifest.

Technical specifications

Designed for reference-grade genomic pipelines.

Formats

VCF 4.1/4.2, BCF, BAM, CRAM, FASTQ (gzipped or plain)

Reference genome assemblies

Standard human reference assemblies - configurable

Population threshold

Configurable · aligned to HREC-approved methodology

Singleton suppression

Variants in < k individuals suppressed before release

Family-link handling

Relationship suppression · pseudonymous link map retained for re-link under controlled conditions

Throughput

Batch-optimised pipeline for WGS and exome workloads

Deployment

On-prem agent · no raw genomic data in cloud pipeline

Compliance

OAIC APP 11, NHMRC genomic data framework, ISO 27001

Questions

Things genomics teams usually ask us.

Does GenomeIQ satisfy HREC de-identification requirements?

For most ethics applications, yes. GenomeIQ produces a structured risk manifest documenting your k-threshold, suppression counts, and methodology alignment to the NHMRC Genomic Data Framework. Most HRECs accept this in place of a narrative de-identification description. We recommend reviewing the manifest format with your HREC coordinator before submission.

Why do genomic exports need population-level controls?

Genomic data is uniquely re-identifiable. A small number of variants present in only one or two individuals in a cohort can narrow identification to that group - even without a name or date of birth. GenomeIQ applies population-level controls before release: rare variants below a configured frequency are suppressed, sample identifiers are pseudonymised, and family-link metadata is removed. Standard DICOM or HL7 de-identification does not address this.
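
The suppression rule described above can be sketched in a few lines: count how many individuals carry a non-reference allele at each site and drop sites with fewer than k carriers. This is a simplified illustration on in-memory records, assuming VCF-style genotype strings such as `0/1` or `1|1`; the record layout and function names are hypothetical, and a production pipeline would work on real VCF/BCF files.

```python
def carrier_count(genotypes: list) -> int:
    """Number of individuals with at least one non-reference allele."""
    count = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        # "0" is the reference allele and "." is a missing call
        if any(a not in ("0", ".") for a in alleles):
            count += 1
    return count


def suppress_rare(variants: list, k: int):
    """Keep variants carried by at least k individuals; count the rest.

    Sites with zero carriers also fall below k and are dropped.
    """
    kept, suppressed = [], 0
    for variant in variants:
        if carrier_count(variant["genotypes"]) < k:
            suppressed += 1
        else:
            kept.append(variant)
    return kept, suppressed


# Hypothetical four-person cohort
variants = [
    {"chrom": "1", "pos": 1000, "ref": "A", "alt": "G",
     "genotypes": ["0/1", "0/0", "1/1", "0|1"]},   # 3 carriers
    {"chrom": "1", "pos": 2000, "ref": "C", "alt": "T",
     "genotypes": ["0/0", "0/1", "0/0", "./."]},   # 1 carrier (a singleton)
]
kept, suppressed = suppress_rare(variants, k=2)
```

With `k=2`, the site at position 2000 is suppressed because only one person carries it, and the returned count is what would feed the suppression figures in an export manifest.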

How does GenomeIQ handle data shared with genomics consortiums?

Each consortium receives a policy-governed export. Your data-use agreement with the consortium is encoded as a GenomeIQ policy - specifying cohort criteria, permitted use, and k-threshold. The export is delivered with a signed manifest the consortium can reference for their own governance obligations. Raw files never leave your sequencing environment.
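
One way to picture a data-use agreement encoded as a policy is as a small immutable record checked before every export. The sketch below is an assumption-laden illustration: the `ExportPolicy` fields, the `min_cohort_size` criterion, and `check_export` are hypothetical names, not GenomeIQ's policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportPolicy:
    """Hypothetical encoding of one data-use agreement."""
    recipient: str
    permitted_uses: frozenset
    k_threshold: int
    min_cohort_size: int  # stand-in for "cohort criteria"


def check_export(policy, recipient, use, cohort_size, k_applied):
    """Return a list of violations; an empty list means the export may proceed."""
    reasons = []
    if recipient != policy.recipient:
        reasons.append("recipient not covered by this agreement")
    if use not in policy.permitted_uses:
        reasons.append(f"use '{use}' not permitted")
    if cohort_size < policy.min_cohort_size:
        reasons.append("cohort smaller than agreed minimum")
    if k_applied < policy.k_threshold:
        reasons.append("k-threshold applied is below the agreed value")
    return reasons


# Illustrative policy and requests
policy = ExportPolicy(
    recipient="Consortium A",
    permitted_uses=frozenset({"research"}),
    k_threshold=5,
    min_cohort_size=100,
)
allowed = check_export(policy, "Consortium A", "research", 250, 5)
blocked = check_export(policy, "Vendor B", "model-training", 50, 2)
```

Returning the full list of violations, rather than a bare true/false, gives the audit trail a reason for every blocked export. The `frozen=True` dataclass mirrors the idea that a policy is fixed at deployment and not adjusted per request.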

We have a family study - relationships across individuals are essential. Does GenomeIQ handle this?

Yes. GenomeIQ separates family-link suppression from family-link destruction. The exported dataset has family relationship identifiers removed. A pseudonymous link map is retained under controlled conditions - accessible to authorised researchers for re-link under a separate data-access agreement. Your collaborators get what they need; the raw family graph stays under your governance.
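
The split between suppression and destruction can be sketched as two outputs from one pass: an export with the family fields removed, and a separate link map that allows re-linking later. The record layout (`sample`, `family_id`, `father`, `mother`) loosely follows pedigree-file conventions and, like the function names, is an assumption made for the example.

```python
import secrets

FAMILY_FIELDS = ("family_id", "father", "mother")


def delink(records: list):
    """Split records into a family-free export and a retained link map."""
    family_tokens = {}
    exported, link_map = [], {}
    for rec in records:
        fam = rec["family_id"]
        if fam not in family_tokens:
            # Random token: reveals nothing about the original family ID
            family_tokens[fam] = "FAM-" + secrets.token_hex(4)
        exported.append({k: v for k, v in rec.items() if k not in FAMILY_FIELDS})
        link_map[rec["sample"]] = {
            "family": family_tokens[fam],
            "father": rec["father"],
            "mother": rec["mother"],
        }
    return exported, link_map


def relink(exported: list, link_map: dict) -> list:
    """Rejoin relationships; only possible for someone holding the link map."""
    return [dict(rec, **link_map[rec["sample"]]) for rec in exported]


# Hypothetical records with already-pseudonymised sample IDs
records = [
    {"sample": "ANON-01", "family_id": "F17", "father": "ANON-02",
     "mother": "ANON-03", "sex": "F"},
    {"sample": "ANON-02", "family_id": "F17", "father": None,
     "mother": None, "sex": "M"},
]
exported, link_map = delink(records)
```

The export goes to collaborators. The link map stays inside the controlled environment and is released only under the separate data-access agreement.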

How do we quantify re-identification risk for our ethics submission?

GenomeIQ produces a re-identification risk score for every export, based on cohort size, variant suppression rate, and k-threshold relative to population reference panels. The score is included in the signed manifest alongside the methodology reference. Most ethics committees treat this as sufficient quantitative evidence of risk control.

What k-threshold should we set?

Australian guidance recommends conservative population thresholds for open or controlled-access datasets, with restricted-access thresholds for datasets shared with named collaborators under a formal data-access agreement. GenomeIQ's thresholds are configured during deployment based on your HREC-approved methodology - locked to that document and not adjustable by individual researchers.

See GenomeIQ™ on your VCF pipeline.

We will run a live demo against your file format and HREC-approved k-threshold. No data leaves your site.

No sales pressure. Just a 30-minute call with a solutions engineer.
