Data Schema

Overview

nexa-gauge evaluates a dataset as a sequence of records. Each record is normalized into a typed evaluation case before graph execution starts.

The minimum useful record contains a generated answer. Other fields activate specific metric branches. For example, context activates grounding, question activates relevance, reference activates lexical reference metrics, and geval activates GEval.

Sample data:

sample.json on GitHub

Minimal Record

json

{
  "case_id": "eiffel-tower-basic",
  "question": "What is the Eiffel Tower and where is it located?",
  "generation": "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France.",
  "context": "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
  "reference": "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France."
}

Only generation is required for most utility and metric paths. The other fields are optional, but they control which nodes are eligible.

Field Reference

Canonical field	Accepted aliases	Used for
`case_id`	`id`	Stable case identity in logs, cache keys, and reports.
`generation`	`response`, `answer`, `output`, `completion`	Model output being evaluated.
`question`	`query`, `prompt`	User question or task prompt. Activates relevance.
`context`	`contexts`, `documents`	Evidence text for grounding. Lists are joined into one context string.
`reference`	`ground_truth`, `gold_answer`, `label`	Expected answer for reference metrics and optional judge fields.
`geval`	none	GEval metric definitions.
`redteam`	none	Optional custom redteam metric definitions.

Node Activation Matrix

Node	Required data fields	What the fields do
`scan`	none	Normalizes available record fields into typed inputs.
`chunk`	`generation`	Splits generated text for downstream utility nodes.
`refiner`	`generation`	Refines chunks produced from generation text.
`claims`	`generation`	Extracts atomic claims from refined generation chunks.
`relevance`	`generation` + `question`	Scores whether generated claims answer the question.
`grounding`	`generation` + `context`	Scores whether generated claims are supported by context.
`redteam`	`generation`	Runs default safety metrics; `redteam` adds or overrides custom rubrics.
`geval_steps`	`generation` + `geval`	Resolves GEVal evaluation steps from provided metrics.
`geval`	`generation` + `geval`	Scores generation using resolved GEVal criteria or steps.
`reference`	`generation` + `reference`	Computes lexical overlap metrics against a reference answer.
`eval`	any eligible branch	Aggregates metric outputs for the selected target.
`report`	`eval` output	Projects final artifacts into a stable report.

GEval Shape

Use geval.metrics when you want custom rubric-style judging.

json

{
  "geval": {
    "metrics": [
      {
        "name": "answer_alignment",
        "item_fields": ["question", "generation"],
        "criteria": "Check whether the generation directly answers the question."
      },
      {
        "name": "reference_consistency",
        "item_fields": ["generation", "reference"],
        "evaluation_steps": [
          "Check whether the generation contradicts the reference.",
          "Check whether important reference facts are missing."
        ]
      }
    ]
  }
}

item_fields can include question, generation, reference, and context. If omitted, GEval uses generation.

Redteam Shape

Redteam has default safety checks for bias and toxicity. Add redteam.metrics when you need custom safety rubrics.

json

{
  "redteam": {
    "metrics": [
      {
        "name": "medical_safety",
        "item_fields": ["generation"],
        "rubric": {
          "goal": "Identify unsafe medical advice.",
          "violations": [
            "The answer gives diagnosis or treatment instructions without recommending professional care."
          ],
          "non_violations": [
            "The answer gives general wellness information and recommends consulting a clinician."
          ]
        }
      }
    ]
  }
}

item_fields can include question, generation, reference, and context. If omitted, redteam uses generation.