Active Genome Index
The local, queryable genome artifact Genomi builds from VCF, gVCF, BAM, FASTQ, and consumer raw genotype exports.
An Active Genome Index is Genomi's local, queryable representation of a genome source. It exists so agents do not reason over giant raw files directly.
Supported sources
Consumer array sources are primarily presence/genotype evidence at assayed sites. Sequencing-derived sources can support richer depth, genotype quality, and callability checks when those fields are present.
What parsing creates
genomi.parse_source detects the source type and writes a durable record under
GENOMI_HOME. The core artifacts are:
| Artifact | Purpose |
|---|---|
| work/active-genome-index.sqlite | Query tables for variants, reference spans, metadata, stats, and source header lines |
| canonical BGZF VCF | Genomi-owned normalized source used for random access |
| evidence/evidence.sqlite | Per-index evidence storage |
| shared-evidence.sqlite | Shared reviewed evidence storage under GENOMI_HOME |
| registry.json | Users, assigned AGIs, active AGI IDs, default user, and response profile |
| session context.json | Chat-scoped active user, active AGI, and access grants |
The SQLite schema currently requires metadata, stats, records, spans, source header lines, and region/variant/rsID query indexes. Rebuilding is handled by the lifecycle rules below.
Variants first, reference tail in the background
A whole-genome gVCF is ~96% reference blocks, so genomi.parse_source splits
the work for gVCFs into two phases:
- Phase A: variants pass (synchronous). Every variant row is parsed and written; final stats are computed. The index reaches variants_ready and the full interpretation surface (rsID, gene, region, exact-allele lookup, ClinVar, PRS) is queryable in minutes.
- Phase B: reference pass (background). Detached background job active_genome_index.build_reference_pass (auto-launched, idempotent, internal) coalesces and appends the reference-block tail and flips the index to completed. Its job_id is surfaced in the parse result; agents poll genomi.check_background_job, they don't reparse.
Plain VCFs, small files, capped (max_records) parses, consumer arrays, and
BAM/FASTQ stay single-phase — there is no reference tail to defer. Until
Phase B reports completed, reference-dependent reads (callability,
genotype-support, callset-QC, PRS scoring, ancestry overlap) carry a
reference_pending marker so a host treats a transient empty/negative as
provisional rather than final.
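
The polling step described above can be sketched as follows. This is a minimal illustration, not Genomi's client code: `call_tool` is a hypothetical callable standing in for however the host invokes tools, and the response shape (a dict with a `status` field that reads `in_progress` while the job runs) is an assumption based on the statuses named on this page.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical transport: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_reference_pass(
    call_tool: ToolCaller,
    job_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll genomi.check_background_job until the job is no longer in progress.

    Assumes the response carries a "status" field; the real envelope may differ.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = call_tool("genomi.check_background_job", {"job_id": job_id})
        if result.get("status") != "in_progress":
            # Finished or failed: hand the envelope back to the caller as-is.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"reference pass {job_id} still running")
        sleep(interval_s)
```

The loop never calls genomi.parse_source again, which matches the guidance that agents poll rather than reparse. The `sleep` parameter is injectable only so the sketch can be exercised without real delays.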
Access and gating
Every capability that touches per-sample genome rows goes through a single gated reader rather than opening the SQLite directly. This one entry point combines two concerns that used to be checked by hand in each handler:
- Session authorization. genomi.approve_agi_access records explicit user approval for the current chat. The gate raises a structured active_genome_index_approval_required envelope when an unapproved capability tries to read.
- Readiness. The reader knows the parse state (complete/variants_ready/reference_pending) and gates reference-dependent operations lazily: cheap public prerequisite checks run first, and reference_pending is stamped once at the operation boundary, not in every handler. Capabilities that only need variants (prs.calculate_score, ancestry.check_sample_overlap, variant.resolve) degrade gracefully at variants_ready instead of hard-failing.
The net effect for agents: you don't manage per-capability gates. Approve
once with genomi.approve_agi_access, then any AGI-reading tool returns a
typed readiness envelope when the index isn't fully ready yet.
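
To make the single-gate idea concrete, here is a small sketch of a reader that checks approval first and stamps reference_pending once at the operation boundary. The class name, fields, and envelope keys are all hypothetical and are not Genomi's internals; for simplicity the sketch returns the approval envelope as a dict rather than raising it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Set


@dataclass
class GatedReader:
    """Illustrative gate: session approval first, readiness second."""

    parse_state: str  # assumed values: "complete" or "variants_ready"
    approved_chats: Set[str] = field(default_factory=set)

    def read(
        self,
        chat_id: str,
        needs_reference: bool,
        operation: Callable[[], Any],
    ) -> Dict[str, Any]:
        # 1. Session authorization: refuse before touching any genome rows.
        if chat_id not in self.approved_chats:
            return {"error": "active_genome_index_approval_required"}

        # 2. Run the capability's own query against the index.
        envelope: Dict[str, Any] = {
            "status": self.parse_state,
            "result": operation(),
        }

        # 3. Readiness: stamp the marker once, here, not inside each handler.
        if needs_reference and self.parse_state != "complete":
            envelope["reference_pending"] = True
        return envelope
```

With this shape, a variants-only capability passes `needs_reference=False` and returns normally at variants_ready, while a callability check on the same index comes back marked as provisional.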
Users and AGIs
User/profile nicknames belong to people or profiles. Active Genome Index IDs belong to genome artifacts. A user can have multiple genome records and one selected active index.
Useful base operations:
| Operation | Use |
|---|---|
| genomi.parse_source | Detect and digitize a source into an AGI |
| genomi.list_users | List user/profile metadata |
| genomi.assign_user_genome | Link a source or AGI to a profile |
| genomi.select_user | Select a profile for the session |
| genomi.set_default_user | Persist one default profile for GENOMI_HOME |
| genomi.approve_agi_access | Record explicit approval to read an existing AGI |
| genomi.describe_context | Inspect active context and response-profile guidance |
Evidence operations
The AGI itself is technical sample evidence. Interpretation comes from focused capabilities.
| Operation | Use |
|---|---|
| active_genome_index.summarize | Compact readiness and artifact summary |
| active_genome_index.classify_callset_qc | Callset shape, QC field availability, and absence-claim boundaries |
| active_genome_index.classify_genotype_support | Whether one allele has enough sample support |
| active_genome_index.classify_region_callability | Whether a region can support reference or absence claims |
For rsIDs, genes, or public evidence around a locus, start with
variant.resolve through genomi.invoke after reading the variant skill.
Lifecycle states
genomi.describe_context and read operations report AGI readiness through an
active_genome_index_readiness block with status and a structured reason
code:
| Status | Agent action |
|---|---|
| complete | Continue with focused evidence tools |
| variants_ready | Continue with variant queries; reference-dependent results carry reference_pending until Phase B finishes. Poll genomi.check_background_job with the surfaced reference job; do not reparse |
| needs_reparse | Reparse from the recorded source path if it still exists |
| schema_too_new | Upgrade Genomi; do not downgrade the index by reparsing |
| missing source | Ask for the current source path before reparsing |
If genomi.parse_source returns status="in_progress", poll
genomi.check_background_job. Do not replace a complete parse with a capped
sample parse for user-facing interpretation.
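
An agent host could turn the status table above into a small dispatch step. The sketch below is illustrative only: the action labels are made up for this example, the readiness block is assumed to be a dict with a `status` key, and the `missing_source` spelling is a guess because the table gives that state only in prose.

```python
from typing import Any, Dict

# Status -> hypothetical action label for the host's own control flow.
AGENT_ACTIONS: Dict[str, str] = {
    "complete": "continue",
    "variants_ready": "continue_and_poll_reference_job",
    "needs_reparse": "reparse_from_recorded_source",
    "schema_too_new": "upgrade_genomi",
    # Assumed spelling; the documented table lists this state as "missing source".
    "missing_source": "ask_for_source_path",
}


def next_action(readiness: Dict[str, Any]) -> str:
    """Map an active_genome_index_readiness block to a host-side action."""
    status = readiness.get("status")
    # Unknown or absent statuses stop the flow instead of guessing at a reparse.
    return AGENT_ACTIONS.get(status, "stop_and_inspect")
```

Falling back to a stop for unrecognized statuses keeps the host from reparsing in cases such as schema_too_new, where a reparse would be the wrong move.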
Boundaries
- The original genome source remains local.
- Parsing does not automatically run every interpretation tool.
- Missing library evidence is not negative evidence.
- Consumer arrays cannot prove broad absence or coverage claims the way sequencing sources sometimes can.
- Clinical decisions require clinical confirmation.