Hugging Face Data

Overview

nexa-gauge can read datasets from Hugging Face with hf://<dataset-id> sources. Rows from the selected split are treated like local records and normalized with the same field aliases.

Install the optional dependency first:

bash

pip install "nexa-gauge[huggingface]"

Basic Usage

bash

nexagauge estimate eval \
  --input hf://<dataset_id> \
  --limit 10

bash

nexagauge run eval \
  --input hf://<dataset_id> \
  --limit 10 \
  --output-dir ./report

In auto adapter mode, nexa-gauge selects the Hugging Face adapter whenever the input starts with hf://.
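The prefix rule can be pictured with a small sketch. This is not nexa-gauge's internal code: the function names and the "local" adapter name are hypothetical, and only the hf:// prefix check and the "huggingface" adapter name come from this page.

```python
HF_PREFIX = "hf://"


def select_adapter(source: str, adapter: str = "auto") -> str:
    """Pick an adapter name for an --input value (illustrative only)."""
    if adapter != "auto":
        # An explicit --adapter value is used as given.
        return adapter
    if source.startswith(HF_PREFIX):
        return "huggingface"
    # Hypothetical name for the non-Hugging Face path.
    return "local"


def dataset_id(source: str) -> str:
    """Strip the hf:// prefix to recover the dataset id."""
    return source.removeprefix(HF_PREFIX)
```

For example, `select_adapter("hf://wandb/RAGTruth-processed")` returns `"huggingface"`, and `dataset_id` on the same string returns `"wandb/RAGTruth-processed"`.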

Adapter Options

Option	Purpose
`--input hf://<dataset-id>`	Hugging Face dataset source.
`--adapter huggingface`	Force the Hugging Face adapter instead of auto-detecting.
`--hf-config <name>`	Optional dataset config name.
`--hf-revision <rev>`	Optional revision, tag, branch, or commit.
`--split <name>`	Dataset split for `estimate`. Default is `train`.
`--limit <n>`	Maximum number of rows to process.
`--start <n>` / `--end <n>`	Process a deterministic row slice.

Example with a config and revision:

bash

nexagauge estimate eval \
  --input hf://<dataset_id> \
  --adapter huggingface \
  --hf-config default \
  --hf-revision main \
  --limit 25

Row Schema

Hugging Face rows must expose the same fields or aliases as local data.

Purpose	Accepted field names
Case ID	`case_id`, `id`
Generation	`output`, `generation`, `response`, `answer`, `completion`
Question	`input`, `query`, `prompt`
Context	`context`, `contexts`, `documents`
Reference	`reference`, `ground_truth`, `gold_answer`, `label`
GEval config	`geval`
Redteam config	`redteam`

Note: Aliases are normalized to the canonical field name in the output. If your input row uses answer, the metrics output refers to it as output; query or prompt becomes input; ground_truth, gold_answer, or label becomes reference; contexts or documents becomes context; and id becomes case_id. The key you supplied may therefore differ from the key in the output JSON: nexa-gauge reports the canonical name, which is the first name listed in each row of the table above.
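The renaming described in the note can be sketched as a lookup over the alias table. This is a minimal illustration, not nexa-gauge's implementation. The alias lists are copied from the table above. Two behaviors are assumptions made for the sketch: the first listed name wins when several aliases are present, and unknown columns are dropped.

```python
# Canonical name -> accepted names, copied from the Row Schema table.
ALIASES = {
    "case_id": ("case_id", "id"),
    "output": ("output", "generation", "response", "answer", "completion"),
    "input": ("input", "query", "prompt"),
    "context": ("context", "contexts", "documents"),
    "reference": ("reference", "ground_truth", "gold_answer", "label"),
    "geval": ("geval",),
    "redteam": ("redteam",),
}


def normalize_row(row: dict) -> dict:
    """Return a copy of row keyed by canonical field names."""
    normalized = {}
    for canonical, names in ALIASES.items():
        for name in names:
            if name in row:
                # Assumption: the first listed alias that is present wins.
                normalized[canonical] = row[name]
                break
    return normalized
```

A row such as `{"id": "q1", "answer": "Paris", "query": "Capital of France?"}` comes back with the keys `case_id`, `output`, and `input`.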

Custom column mappings with `--field`

When a Hugging Face dataset uses column names that aren't in the table above, point nexa-gauge at them with the --field LOGICAL=COLUMN flag instead of preprocessing the dataset. The flag is repeatable, so map as many fields as you need in a single invocation:

bash

nexagauge run relevance \
  --input hf://<dataset_id> \
  --field output=text \
  --field input=q

In this example, the row column text is treated as the output, and q is treated as the input. Everything downstream — chunking, claim extraction, refinement, metric scoring, the cache fingerprint, and the JSON output — uses the canonical names (output, input, …), so two runs of the same content produce the same cache key whether the dataset uses text, answer, or output.
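To see why canonical names make the cache key independent of the source column names, consider the sketch below. This page does not describe how nexa-gauge computes its fingerprint, so the SHA-256-over-JSON scheme and both helper functions are assumptions chosen only to illustrate the idea.

```python
import hashlib
import json


def rename_columns(row: dict, field_map: dict) -> dict:
    """Rename columns using a canonical -> column mapping (as with --field)."""
    renamed = dict(row)
    for canonical, column in field_map.items():
        if column in renamed:
            renamed[canonical] = renamed.pop(column)
    return renamed


def fingerprint(record: dict) -> str:
    """Hash a record in a key-order-independent way (hypothetical scheme)."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same content, different column names in the source dataset.
row_a = {"text": "Paris", "q": "Capital of France?"}
row_b = {"output": "Paris", "input": "Capital of France?"}

key_a = fingerprint(rename_columns(row_a, {"output": "text", "input": "q"}))
key_b = fingerprint(rename_columns(row_b, {}))
```

Because both rows are hashed only after they are expressed in canonical names, `key_a` and `key_b` are equal.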

Allowed logical keys: case_id, output, input, reference, context, geval, redteam, refalign. The first five cover the row data fields shown above; the last three (geval, redteam, refalign) map a column to the corresponding metric config block. Anything else fails fast with a list of valid options.

Precedence: if a row carries both the canonical name and your user-mapped column (e.g. an empty output field plus a populated text), the explicit --field mapping wins. This is intentional — you asked for it.

Mirrored on nexagauge estimate: the same --field option works for cost estimation, so the mapping doesn't need to change between estimate and run.

Validation errors you might see:

Invalid --field value 'foo'. Expected 'LOGICAL=COLUMN'. — missing =.
Unknown logical key 'gen' in field mapping. Allowed: case_id, context, geval, input, output, redteam, refalign, reference. — typo in the canonical key (use output, not gen).
--field: duplicate mapping for 'output', last value 'X' wins. — warning only, the last --field for a logical key takes effect.
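The parsing, validation, and precedence rules above can be sketched in a few lines. This is an illustration under assumptions, not the CLI's actual code. The function names are hypothetical and the use of ValueError and warnings.warn is a guess at the error model. Only the allowed keys and the message texts are taken from this page.

```python
import warnings

ALLOWED_KEYS = {
    "case_id", "context", "geval", "input",
    "output", "redteam", "refalign", "reference",
}


def parse_field_args(values: list) -> dict:
    """Turn repeated LOGICAL=COLUMN strings into a logical -> column dict."""
    mapping = {}
    for value in values:
        logical, sep, column = value.partition("=")
        if not sep or not logical or not column:
            raise ValueError(
                f"Invalid --field value {value!r}. Expected 'LOGICAL=COLUMN'."
            )
        if logical not in ALLOWED_KEYS:
            allowed = ", ".join(sorted(ALLOWED_KEYS))
            raise ValueError(
                f"Unknown logical key {logical!r} in field mapping. "
                f"Allowed: {allowed}."
            )
        if logical in mapping:
            # Warning only: the last mapping for a key takes effect.
            warnings.warn(
                f"--field: duplicate mapping for {logical!r}, "
                f"last value {column!r} wins."
            )
        mapping[logical] = column
    return mapping


def apply_field_map(row: dict, mapping: dict) -> dict:
    """Apply the mapping; a mapped column overrides an existing canonical key."""
    result = dict(row)
    for logical, column in mapping.items():
        if column in row:
            result[logical] = row[column]
    return result
```

With this sketch, `apply_field_map({"output": "", "text": "hello"}, {"output": "text"})` yields a record whose `output` is `"hello"`, matching the precedence rule that the explicit mapping wins.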

If a dataset does not already include generated outputs, precompute model responses into an output-like field before running nexa-gauge.

Reshape nested structures with `@register_transform`

--field handles flat column-to-column renames. Some datasets have nested structures that no single column maps to — hotpotqa/hotpot_qa's context, for example, is {title: list[str], sentences: list[list[str]]}, not a string or list of strings. For these, decorate a small Python function with @register_transform("name") and point the CLI at it:

python

# my_transforms.py
from ng_core import register_transform


@register_transform("hotpot_qa")
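def hotpot_qa(record: dict) -> dict:
    """Reshape a HotpotQA row into nexa-gauge's canonical record shape."""
    # context arrives as {title: list[str], sentences: list[list[str]]}.
    ctx = record.get("context") or {}
    titles = ctx.get("title") or []
    sentences = ctx.get("sentences") or []
    # One paragraph per title: the title line, then its sentences joined.
    paragraphs = [
        f"{title}\n{' '.join(sents)}"
        for title, sents in zip(titles, sentences)
    ]
    return {
        "case_id": record.get("id"),
        # HotpotQA stores the question text under "question".
        "input": record.get("question", ""),
        # This example reuses the gold answer as both output and reference.
        "output": record.get("answer", ""),
        "context": paragraphs,
        "reference": record.get("answer", ""),
    }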

bash

nexagauge run eval \
  --input hf://hotpotqa/hotpot_qa \
  --hf-config distractor \
  --extension-file ./my_transforms.py \
  --transform hotpot_qa \
  --limit 10 \
  --output-dir ./report

The transform runs once per record, before the scanner, and produces a dict in nexa-gauge's canonical shape. Allowed output keys: case_id, input, output, context, reference. The same flags work with nexagauge estimate.
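Before pointing the CLI at a transform, it can help to exercise the reshaping logic on a hand-written record. The sketch below isolates the context-flattening step as a plain function with no nexa-gauge import. The helper name flatten_context and the sample data are made up for illustration.

```python
def flatten_context(ctx: dict) -> list[str]:
    """Turn {title: [...], sentences: [[...], ...]} into a list of paragraphs."""
    titles = ctx.get("title") or []
    sentences = ctx.get("sentences") or []
    return [
        f"{title}\n{' '.join(sents)}"
        for title, sents in zip(titles, sentences)
    ]


# Hand-written sample in the nested shape described above.
sample = {
    "title": ["Paris", "France"],
    "sentences": [
        ["Paris is a city.", "It is the capital."],
        ["France is a country."],
    ],
}

paragraphs = flatten_context(sample)
for paragraph in paragraphs:
    print(paragraph)
    print("---")
```

Each paragraph is the title on its own line followed by that title's sentences joined with spaces. An empty or missing context produces an empty list.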

Note: geval and redteam are nexa-gauge metric configs, not dataset data — don't construct them in a transform. Configure them on the record directly.

--extension-file is repeatable, so you can load several files of registered functions in one invocation; --transform then picks which one to apply. You can also compose with --field — the transform reshapes structure first, then --field renames columns on the result.

See Extensions for the full reference: contract, error model, and composition rules. It is also the home for future extension types such as prompts.

Metric Activation

The same activation rules apply to Hugging Face rows:

output is required for chunking, refinement, claims, redteam, and most metrics.
input activates relevance.
context activates grounding.
reference activates refmatch (lexical overlap) and refalign (semantic similarity).
geval activates geval_steps and geval.
redteam adds or overrides custom redteam rubrics.
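The field-based rules in the list above can be summarized as a small lookup. This is a simplified sketch, not nexa-gauge's scheduler. It assumes that empty values count as absent and that every listed metric also needs a non-empty output. It leaves out redteam and the internal steps (chunking, refinement, claims), whose exact triggers this page does not spell out.

```python
def field_activated_metrics(record: dict) -> set[str]:
    """Return the metrics that the optional fields of a record would activate."""
    if not record.get("output"):
        # Assumption: without an output, none of these metrics run.
        return set()
    metrics = set()
    if record.get("input"):
        metrics.add("relevance")
    if record.get("context"):
        metrics.add("grounding")
    if record.get("reference"):
        metrics.update({"refmatch", "refalign"})
    if record.get("geval"):
        metrics.update({"geval_steps", "geval"})
    return metrics
```

A record with `output`, `input`, and `context` would activate relevance and grounding under these assumptions, while a record with no `output` activates nothing.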

For the complete table, see Data Schema.

Common Runs

Run relevance on a small slice:

bash

nexagauge run relevance \
  --input hf://sentence-transformers/natural-questions \
  --limit 2 \
  --output-dir ./data/hg_exp_relevance

Run grounding on rows that include context:

bash

nexagauge run grounding \
  --input hf://wandb/RAGTruth-processed \
  --limit 3 \
  --output-dir ./data/hg_exp_grounding

Run redteam on rows whose output is stored in a text column:

bash

nexagauge run redteam \
  --input hf://mteb/toxic_conversations_50k \
  --field output=text \
  --limit 3 \
  --output-dir ./data/hg_exp_toxicity
