Extensions
Overview
Extensions are small Python functions you register with nexa-gauge to customize behavior the CLI alone can't express. They share a common pattern:
- Decorate a function with @register_<thing>("name") in any Python file.
- Point the CLI at that file with --extension-file ./my_file.py (repeatable).
- Select the registered name at run time (e.g. --transform hotpot_qa).
There is no packaging, no sys.path setup, and no entry-point registration — the file is imported once before iteration so its decorators fire.
Available today:
| Extension | Decorator | Selector flag | Purpose |
|---|---|---|---|
| Transforms | @register_transform("name") | --transform <name> | Reshape one raw record into nexa-gauge's expected dict shape before scanning. |
Coming later: prompt overrides (@register_prompt), custom rubrics, and more. The --extension-file flag will load all of them; each gets its own selector flag.
Transforms
A transform is a small Python function that reshapes one raw record into the dict shape nexa-gauge expects, before any node sees it. Reach for it when your data — Hugging Face dataset, exported production logs, or any local JSON — doesn't line up with the canonical fields and a simple column rename isn't enough.
When to reach for a transform
Two tools, two different problems:
| Mismatch | Tool |
|---|---|
| Same data, different column name (text → output) | --field LOGICAL=COLUMN |
| Structural reshape (nested dict, multiple source columns → one field, computed values) | @register_transform + --transform |
| Both | Compose: transform reshapes first, --field renames after. |
Reach for --field whenever it works — it's a one-line flag and covers the common case. Transforms exist for the rest.
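The ordering in the "Both" row can be sketched in plain Python. This is an illustration only: `apply_field_aliases` and `reshape` are hypothetical names, and the assumption that an alias moves the named column into the logical field is made for the example; it is not a statement about nexa-gauge's internals.

```python
def apply_field_aliases(shaped: dict, aliases: dict) -> dict:
    """Rename columns to logical fields; `aliases` maps LOGICAL -> COLUMN."""
    renamed = dict(shaped)  # work on a copy so the input stays untouched
    for logical, column in aliases.items():
        renamed[logical] = renamed.pop(column)
    return renamed


def reshape(record: dict) -> dict:
    # Structural step: join two source columns into one field.
    return {
        "user_question": f"{record['title']}: {record['body']}",
        "text": record["reply"],
    }


raw = {"title": "Billing", "body": "Why was I charged twice?", "reply": "A refund is on its way."}

# Transform first, then rename (the equivalent of --field input=user_question --field output=text).
ready = apply_field_aliases(reshape(raw), {"input": "user_question", "output": "text"})
print(ready)
# {'input': 'Billing: Why was I charged twice?', 'output': 'A refund is on its way.'}
```

The rename step alone could not have produced `user_question`, because that value is computed from two columns; that is the part only a transform can do.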
A common trigger: hotpotqa/hotpot_qa rows look like this:
    {
      "input": "...",
      "answer": "...",
      "context": {
        "title": ["Atacama Desert", "Chile"],
        "sentences": [
          ["The Atacama is a desert plateau.", "It spans 1,000 km."],
          ["Chile is in South America.", "Its capital is Santiago."]
        ]
      }
    }

context is a nested dict, not a string or list of strings. There is no single column to alias into nexa-gauge's context field. The fix is a 10-line transform that zips title and sentences into a list of paragraphs.
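A minimal sketch of such a transform, assuming the row shape shown above. The "Title: sentences" paragraph format and the mapping of answer to reference are illustrative assumptions. The decorator is left as a comment so the snippet runs without nexa-gauge installed; in a real extension file you would import register_transform from ng_core and decorate the function.

```python
# In a real extension file:
#   from ng_core import register_transform
#   @register_transform("hotpot_qa")
def hotpot_qa(record: dict) -> dict:
    """Flatten the nested context into a list of paragraph strings."""
    ctx = record["context"]
    # Pair each title with its sentence list and join the sentences.
    paragraphs = [
        f"{title}: {' '.join(sentences)}"
        for title, sentences in zip(ctx["title"], ctx["sentences"])
    ]
    return {
        "input": record["input"],
        "reference": record["answer"],  # assumption: gold answer maps to `reference`
        "context": paragraphs,
    }


row = {
    "input": "...",
    "answer": "...",
    "context": {
        "title": ["Atacama Desert", "Chile"],
        "sentences": [
            ["The Atacama is a desert plateau.", "It spans 1,000 km."],
            ["Chile is in South America.", "Its capital is Santiago."],
        ],
    },
}
shaped = hotpot_qa(row)
# shaped["context"][0] == "Atacama Desert: The Atacama is a desert plateau. It spans 1,000 km."
```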
Write a transform
    from ng_core import register_transform


    @register_transform("my_dataset")
    def my_dataset(record: dict) -> dict:
        # reshape record into nexa-gauge's expected dict shape
        return {
            "case_id": record["id"],
            "input": record["q"],
            "output": record["a"],
            # ...context, reference as needed
        }

For the full worked example (hotpotqa/hotpot_qa with its nested context field), see Hugging Face datasets → Reshape nested structures with @register_transform.
The contract:
- Input: one raw record dict (whatever the adapter yields).
- Output: a dict with any subset of case_id, input, output, context, reference. Other keys are ignored.
- Pure and thread-safe. No I/O, no shared mutable state. Transforms run in the producer thread, before per-case parallel fanout.
- Errors surface as InputParseError with the record index, so they slot into the existing CLI error path.
Note:
geval and redteam are nexa-gauge metric configs, not dataset data. Don't try to construct them from a transform; configure them at the record level instead. See Data Schema.
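One way to sanity-check a transform against the contract above before a run is a small local test. The helper below is hypothetical (`check_transform` and `CANONICAL_KEYS` are not part of nexa-gauge); it only encodes the listed rules: dict in, dict out, input left unmodified, and a report of keys that would be ignored.

```python
import copy

# Canonical keys named in the contract.
CANONICAL_KEYS = {"case_id", "input", "output", "context", "reference"}


def check_transform(transform, record: dict) -> set:
    """Run `transform` on one record and return the keys that would be ignored."""
    before = copy.deepcopy(record)
    result = transform(record)
    if not isinstance(result, dict):
        raise TypeError(f"expected a dict, got {type(result).__name__}")
    if record != before:
        # A pure transform builds a new dict instead of editing its input.
        raise ValueError("transform mutated its input record")
    return set(result) - CANONICAL_KEYS


def my_dataset(record: dict) -> dict:
    return {
        "case_id": record["id"],
        "input": record["q"],
        "output": record["a"],
        "lang": "en",  # not a canonical key, so it would be ignored
    }


ignored = check_transform(my_dataset, {"id": "1", "q": "Capital of Chile?", "a": "Santiago"})
print(ignored)  # {'lang'}
```

A check like this runs in plain pytest or a scratch script, which keeps contract mistakes out of the CLI run.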
Run it
Two flags wire the transform into a run:
    nexagauge run <node> \
      --input <source> \
      --extension-file ./my_transforms.py \
      --transform <name>

- --extension-file points at the Python file(s) to import. Repeatable.
- --transform selects which registered transform to apply per record.
The same flags work with nexagauge estimate. For the full invocation against hotpot_qa, see Hugging Face datasets.
Compose with --field
Transforms reshape; --field renames. They chain naturally:
    raw record
       ↓ transform (optional, restructures)
    shaped dict
       ↓ --field aliases (optional, renames columns)
    scanner-ready dict
       ↓ scan
    typed Inputs

Use both together when your transform produces a dict whose keys aren't quite the canonical names yet:
    nexagauge run eval \
      --input hf://my-org/dataset \
      --extension-file ./my_transforms.py \
      --transform my_dataset \
      --field input=user_question

Errors
All transform-related failures surface as InputParseError, so they render through the same CLI error path as adapter and scanner failures.
| Situation | Result |
|---|---|
| --transform set, name not registered | Exits with InputParseError listing the registered names. |
| --extension-file path does not exist | InputParseError: Extension file not found: <path> |
| Transform raises on a record | InputParseError(record_index=N) — halts the run. |
| Transform returns non-dict | InputParseError: Transform '<name>' returned <type> on record <idx>, expected a dict. |
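The last two rows can be pictured with a short sketch. Everything here is a stand-in: the local `InputParseError` class and the `apply_transform` helper are hypothetical and only mirror the behavior described in the table, not nexa-gauge's actual implementation.

```python
class InputParseError(Exception):
    """Local stand-in for nexa-gauge's error type (assumed shape)."""

    def __init__(self, message: str, record_index=None):
        super().__init__(message)
        self.record_index = record_index


def apply_transform(name: str, transform, records):
    """Yield transformed records, wrapping failures with the record index."""
    for idx, record in enumerate(records):
        try:
            shaped = transform(record)
        except Exception as exc:
            # Any exception from user code is re-raised with its record index.
            raise InputParseError(
                f"Transform '{name}' failed: {exc!r}", record_index=idx
            ) from exc
        if not isinstance(shaped, dict):
            raise InputParseError(
                f"Transform '{name}' returned {type(shaped).__name__} "
                f"on record {idx}, expected a dict.",
                record_index=idx,
            )
        yield shaped


records = [{"q": "ok"}, {"question": "missing key"}]
try:
    list(apply_transform("my_dataset", lambda r: {"input": r["q"]}, records))
except InputParseError as err:
    failed_index = err.record_index  # 1: the second record has no "q" key
```

Because the first failure raises, later records are never transformed, which matches the "halts the run" behavior in the table.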