Hugging Face Data

Overview

nexa-gauge can read datasets from Hugging Face with hf://<dataset-id> sources. Rows from the selected split are treated like local records and normalized with the same field aliases.

Install the optional dependency first:

```bash
pip install "nexa-gauge[huggingface]"
```

Basic Usage

```bash
nexagauge estimate eval \
  --input hf://org/dataset \
  --limit 10
```

```bash
nexagauge run eval \
  --input hf://org/dataset \
  --limit 10 \
  --output-dir ./report
```

In auto adapter mode, nexa-gauge selects the Hugging Face adapter whenever the input starts with hf://.
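The detection rule amounts to a prefix check on the input string. The sketch below only illustrates that rule and is not nexa-gauge's own code. The helper names detect_adapter and hf_dataset_id are hypothetical, and the label "other" is a placeholder for whichever non-Hugging Face adapter would handle the input.

```bash
#!/usr/bin/env bash
# Illustration of the auto-detection rule: hf:// inputs go to the Hugging Face adapter.
# detect_adapter and hf_dataset_id are hypothetical helpers, not part of nexa-gauge.

detect_adapter() {
  case "$1" in
    hf://*) echo "huggingface" ;;
    *)      echo "other" ;;   # placeholder: any non-hf:// input uses a different adapter
  esac
}

# Strip the hf:// scheme to recover the dataset id, e.g. org/dataset.
hf_dataset_id() {
  echo "${1#hf://}"
}

detect_adapter "hf://org/dataset"    # prints: huggingface
detect_adapter "./data/eval.jsonl"   # prints: other
hf_dataset_id "hf://org/dataset"     # prints: org/dataset
```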

Adapter Options

| Option | Purpose |
| --- | --- |
| `--input hf://<dataset-id>` | Hugging Face dataset source. |
| `--adapter huggingface` | Force the Hugging Face adapter instead of auto-detecting. |
| `--hf-config <name>` | Optional dataset config name. |
| `--hf-revision <rev>` | Optional revision, tag, branch, or commit. |
| `--split <name>` | Dataset split to read for estimate. Default is `train`. |
| `--limit <n>` | Maximum number of rows to process. |
| `--start <n>` / `--end <n>` | Process a deterministic row slice. |
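To process a large split in fixed-size chunks, the --start and --end values can be generated in a loop. This sketch only prints the commands. It assumes --end is exclusive (like a Python slice), so consecutive slices do not overlap; confirm that against the CLI help before relying on it. The helper print_slices and the ./report-<start> directory naming are hypothetical.

```bash
#!/usr/bin/env bash
# Print one nexagauge command per fixed-size row slice.
# Assumption: --end is exclusive, so consecutive slices do not overlap.
# print_slices is a hypothetical helper; it prints commands rather than running them.

print_slices() {
  local input="$1" total="$2" size="$3"
  local start end
  for (( start = 0; start < total; start += size )); do
    end=$(( start + size ))
    if (( end > total )); then
      end=$total   # clamp the last slice to the row count
    fi
    echo "nexagauge run eval --input $input --start $start --end $end --output-dir ./report-$start"
  done
}

# 250 rows in slices of 100 -> [0,100), [100,200), [200,250)
print_slices "hf://org/dataset" 250 100
```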

Example with a config and revision:

```bash
nexagauge estimate eval \
  --input hf://org/dataset \
  --adapter huggingface \
  --hf-config default \
  --hf-revision main \
  --limit 25
```
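When the config and revision come from the environment (for example in CI), the optional flags can be added only when they are set. Pinning --hf-revision to a commit hash rather than a branch such as main keeps repeated runs on the same data. In this sketch HF_CONFIG, HF_REVISION, and build_estimate_cmd are hypothetical names, and the command is stored in a CMD array instead of being executed.

```bash
#!/usr/bin/env bash
# Build an estimate command, adding --hf-config / --hf-revision only when set.
# HF_CONFIG, HF_REVISION and build_estimate_cmd are hypothetical names.
# The result is left in the global array CMD; run it with: "${CMD[@]}"

build_estimate_cmd() {
  local input="$1" limit="$2"
  CMD=(nexagauge estimate eval --input "$input" --adapter huggingface)
  if [[ -n "${HF_CONFIG:-}" ]]; then
    CMD+=(--hf-config "$HF_CONFIG")
  fi
  if [[ -n "${HF_REVISION:-}" ]]; then
    CMD+=(--hf-revision "$HF_REVISION")
  fi
  CMD+=(--limit "$limit")
}

# The assignment prefix applies only to this function call.
HF_REVISION=main build_estimate_cmd "hf://org/dataset" 25
echo "${CMD[*]}"
```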

Row Schema

Hugging Face rows must expose the same fields or aliases as local data.

| Purpose | Accepted field names |
| --- | --- |
| Case ID | `case_id`, `id` |
| Generation | `generation`, `response`, `answer`, `output`, `completion` |
| Question | `question`, `query`, `prompt` |
| Context | `context`, `contexts`, `documents` |
| Reference | `reference`, `ground_truth`, `gold_answer`, `label` |
| GEval config | `geval` |
| Redteam config | `redteam` |
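To check a dataset's columns against this table by hand, the aliases can be written as a lookup. This is a hypothetical helper, field_purpose, that mirrors the table. It assumes exact, case-sensitive matches, and it reports columns outside the table as "unmapped" without claiming how nexa-gauge treats them.

```bash
#!/usr/bin/env bash
# Map a dataset column name to the field it feeds, following the alias table.
# field_purpose is a hypothetical helper; matching is assumed to be exact and case-sensitive.

field_purpose() {
  case "$1" in
    case_id|id)                                    echo "case_id" ;;
    generation|response|answer|output|completion) echo "generation" ;;
    question|query|prompt)                         echo "question" ;;
    context|contexts|documents)                    echo "context" ;;
    reference|ground_truth|gold_answer|label)      echo "reference" ;;
    geval)                                         echo "geval" ;;
    redteam)                                       echo "redteam" ;;
    *)                                             echo "unmapped" ;;
  esac
}

for col in id prompt completion documents label extra_notes; do
  printf '%s -> %s\n' "$col" "$(field_purpose "$col")"
done
```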

If a dataset does not already include generated outputs, precompute model responses into one of the generation fields (generation, response, answer, output, or completion) before running nexa-gauge.
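A quick pre-flight check can confirm that a column list contains at least one generation alias before you start a run. The function has_generation_column is a hypothetical helper, not a nexa-gauge feature, and the column list below is only an example.

```bash
#!/usr/bin/env bash
# Report whether a list of column names includes one of the generation aliases.
# has_generation_column is a hypothetical pre-flight check, not part of nexa-gauge.

has_generation_column() {
  local col
  for col in "$@"; do
    case "$col" in
      generation|response|answer|output|completion) return 0 ;;
    esac
  done
  return 1
}

columns=(id question context reference)   # example columns without model outputs
if has_generation_column "${columns[@]}"; then
  echo "ok: generation field present"
else
  echo "missing: precompute model responses into a generation field first"
fi
```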

Metric Activation

The same activation rules apply to Hugging Face rows:

  • generation is required for chunking, refinement, claims, redteam, and most metrics.
  • question activates relevance.
  • context activates grounding.
  • reference activates reference.
  • geval activates geval_steps and geval.
  • redteam adds or overrides custom redteam rubrics.
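The conditional rules in this list can be sketched as a lookup from the fields a row provides to the extra targets they enable. This is an illustration only. active_targets is a hypothetical helper that takes canonical field names (after alias mapping). It assumes relevance, grounding, reference, and geval are among the metrics that need generation, and it leaves out the generation-only targets and redteam rubric overrides.

```bash
#!/usr/bin/env bash
# Sketch of the conditional activation rules, taking canonical field names.
# Assumption: relevance, grounding, reference and geval all need generation,
# so a row without generation enables none of them.
# active_targets is a hypothetical helper, not part of nexa-gauge.

active_targets() {
  local fields=" $* "   # pad with spaces so each field matches as a whole word
  local targets=()
  if [[ $fields != *" generation "* ]]; then
    echo ""
    return 0
  fi
  if [[ $fields == *" question "* ]];  then targets+=(relevance); fi
  if [[ $fields == *" context "* ]];   then targets+=(grounding); fi
  if [[ $fields == *" reference "* ]]; then targets+=(reference); fi
  if [[ $fields == *" geval "* ]];     then targets+=(geval_steps geval); fi
  echo "${targets[*]}"
}

active_targets generation question context   # prints: relevance grounding
active_targets generation reference geval    # prints: reference geval_steps geval
active_targets question context              # prints an empty line
```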

For the complete table, see Data Schema.

Common Runs

Estimate a small slice:

```bash
nexagauge estimate eval \
  --input hf://org/dataset \
  --limit 5
```

Run grounding on rows that include context:

```bash
nexagauge run grounding \
  --input hf://org/dataset \
  --limit 50 \
  --output-dir ./report-grounding
```

Run reference metrics on rows that include reference:

```bash
nexagauge run reference \
  --input hf://org/dataset \
  --limit 50 \
  --output-dir ./report-reference
```
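To run several targets against the same source in one pass, a small loop can reuse the input and limit and give each target its own output directory, following the ./report-<target> naming used in these examples. In this sketch run_targets and the DRY_RUN switch are hypothetical. With DRY_RUN=1 it only prints the commands, and otherwise it stops at the first target that fails.

```bash
#!/usr/bin/env bash
# Run several nexagauge targets against one hf:// source, one output directory per target.
# run_targets and DRY_RUN are hypothetical; with DRY_RUN=1 commands are printed, not run.

run_targets() {
  local input="$1" limit="$2"
  shift 2
  local target
  for target in "$@"; do
    local cmd=(nexagauge run "$target" --input "$input" --limit "$limit" --output-dir "./report-$target")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return   # stop at the first failing target
    fi
  done
}

DRY_RUN=1 run_targets hf://org/dataset 50 grounding reference
```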