Hugging Face Data
Overview
nexa-gauge can read datasets from Hugging Face with hf://<dataset-id> sources. Rows from the selected split are treated like local records and normalized with the same field aliases.
Install the optional dependency first:
pip install "nexa-gauge[huggingface]"

Basic Usage
nexagauge estimate eval \
--input hf://org/dataset \
--limit 10

nexagauge run eval \
--input hf://org/dataset \
--limit 10 \
--output-dir ./report

auto adapter mode selects the Hugging Face adapter whenever the input starts with hf://.
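The prefix rule above can be pictured with a small sketch. This is not nexa-gauge's actual code: `select_adapter`, `dataset_id`, and the fallback adapter name `"local"` are hypothetical names chosen for illustration. Only the `hf://` prefix and the `huggingface` adapter name come from this page.

```python
HF_PREFIX = "hf://"


def select_adapter(input_source: str, adapter: str = "auto") -> str:
    """Pick an adapter name for an input source (hypothetical helper)."""
    if adapter != "auto":
        # An explicit --adapter value is assumed to take precedence.
        return adapter
    if input_source.startswith(HF_PREFIX):
        return "huggingface"
    # "local" is an assumed name for the non-Hugging Face fallback.
    return "local"


def dataset_id(input_source: str) -> str:
    """Strip the hf:// prefix to recover the dataset ID."""
    if not input_source.startswith(HF_PREFIX):
        raise ValueError(f"not a Hugging Face source: {input_source!r}")
    return input_source[len(HF_PREFIX):]


if __name__ == "__main__":
    print(select_adapter("hf://org/dataset"))  # huggingface
    print(dataset_id("hf://org/dataset"))      # org/dataset
```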
Adapter Options
| Option | Purpose |
|---|---|
| --input hf://<dataset-id> | Hugging Face dataset source. |
| --adapter huggingface | Force the Hugging Face adapter instead of auto-detecting. |
| --hf-config <name> | Optional dataset config name. |
| --hf-revision <rev> | Optional revision, tag, branch, or commit. |
| --split <name> | Dataset split for estimate. Default is train. |
| --limit <n> | Maximum number of rows to process. |
| --start <n> / --end <n> | Process a deterministic row slice. |
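To make the interaction between a row slice and a row limit concrete, here is a minimal sketch. It rests on assumptions this page does not confirm: that `--start` is zero-based, that `--end` is exclusive, and that `--limit` caps the rows remaining after the slice. `select_rows` is a hypothetical helper, not part of nexa-gauge.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def select_rows(
    rows: Iterable[T],
    start: int = 0,
    end: Optional[int] = None,
    limit: Optional[int] = None,
) -> Iterator[T]:
    """Yield rows[start:end], then cap the result at `limit` rows.

    Assumed semantics: zero-based start, exclusive end, limit applied last.
    """
    sliced = islice(rows, start, end)
    if limit is not None:
        sliced = islice(sliced, limit)
    return sliced


if __name__ == "__main__":
    # Rows 10..19 of a 100-row stream, capped at 5 rows.
    print(list(select_rows(range(100), start=10, end=20, limit=5)))
    # [10, 11, 12, 13, 14]
```

Because `islice` works on any iterable, the same selection applies to a streamed dataset without loading every row first.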
Example with a config and revision:
nexagauge estimate eval \
--input hf://org/dataset \
--adapter huggingface \
--hf-config default \
--hf-revision main \
--limit 25

Row Schema
Hugging Face rows must expose the same fields or aliases as local data.
| Purpose | Accepted field names |
|---|---|
| Case ID | case_id, id |
| Generation | generation, response, answer, output, completion |
| Question | question, query, prompt |
| Context | context, contexts, documents |
| Reference | reference, ground_truth, gold_answer, label |
| GEval config | geval |
| Redteam config | redteam |
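As an illustration of how the aliases in the table might map onto canonical fields, the sketch below normalizes one row. The alias lists are copied from the table. The helper name `normalize_row` and the precedence rule (the first alias in table order that has a non-null value wins) are assumptions, not documented nexa-gauge behavior.

```python
from typing import Any, Dict, Mapping, Tuple

# Canonical field -> accepted names, in the order listed in the table above.
FIELD_ALIASES: Dict[str, Tuple[str, ...]] = {
    "case_id": ("case_id", "id"),
    "generation": ("generation", "response", "answer", "output", "completion"),
    "question": ("question", "query", "prompt"),
    "context": ("context", "contexts", "documents"),
    "reference": ("reference", "ground_truth", "gold_answer", "label"),
    "geval": ("geval",),
    "redteam": ("redteam",),
}


def normalize_row(row: Mapping[str, Any]) -> Dict[str, Any]:
    """Map a raw row onto canonical field names (hypothetical helper).

    Assumption: the first alias with a non-null value wins; columns that
    match no alias are dropped.
    """
    normalized: Dict[str, Any] = {}
    for canonical, aliases in FIELD_ALIASES.items():
        for name in aliases:
            if row.get(name) is not None:
                normalized[canonical] = row[name]
                break
    return normalized


if __name__ == "__main__":
    raw = {"id": 7, "prompt": "Q?", "completion": "A.", "ground_truth": "A"}
    print(normalize_row(raw))
    # {'case_id': 7, 'generation': 'A.', 'question': 'Q?', 'reference': 'A'}
```

Running a check like this against a few sample rows before a full evaluation is a cheap way to confirm that a dataset's column names will be recognized.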
If a dataset does not already include generated outputs, precompute model responses into a generation-like field before running nexa-gauge.
Metric Activation
The same activation rules apply to Hugging Face rows:
- generation is required for chunking, refinement, claims, redteam, and most metrics.
- question activates relevance.
- context activates grounding.
- reference activates reference.
- geval activates geval_steps and geval.
- redteam adds or overrides custom redteam rubrics.
For the complete table, see Data Schema.
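The activation rules listed in this section can be read as a small decision function. The sketch below is a simplification under stated assumptions: it expects a row already keyed by canonical field names, it treats a missing generation as disabling everything (this page only says "most metrics"), and it leaves out the pipeline stages and redteam rubrics. `active_metrics` is a hypothetical name.

```python
from typing import Any, List, Mapping


def active_metrics(row: Mapping[str, Any]) -> List[str]:
    """Return the metric names a canonical row would activate (sketch)."""

    def present(field: str) -> bool:
        # Treat None, empty strings, and empty lists as "not provided".
        value = row.get(field)
        return value is not None and value != "" and value != []

    # Simplifying assumption: without a generation, nothing is activated.
    if not present("generation"):
        return []

    active: List[str] = []
    if present("question"):
        active.append("relevance")
    if present("context"):
        active.append("grounding")
    if present("reference"):
        active.append("reference")
    if present("geval"):
        active.extend(["geval_steps", "geval"])
    return active


if __name__ == "__main__":
    row = {"generation": "A.", "question": "Q?", "context": ["doc 1"]}
    print(active_metrics(row))  # ['relevance', 'grounding']
```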
Common Runs
Estimate a small slice:
nexagauge estimate eval \
--input hf://org/dataset \
--limit 5

Run grounding on rows that include context:
nexagauge run grounding \
--input hf://org/dataset \
--limit 50 \
--output-dir ./report-grounding

Run reference metrics on rows that include reference:
nexagauge run reference \
--input hf://org/dataset \
--limit 50 \
--output-dir ./report-reference