Grounding (grounding)
Overview
grounding measures factual faithfulness: are the claims in a model answer actually supported by the provided context?
The metric design aligns with two core papers:
- RAGAS arXiv:2309.15217 frames faithfulness for RAG as a claim-level support check against retrieved passages, without needing a gold reference answer.
- FActScore arXiv:2305.14251 argues that factuality should be evaluated over atomic facts rather than as a single binary judgment, because long-form outputs often mix correct and incorrect statements.
In practice, this means:
- Break answer content into verifiable claims.
- Check each claim against context evidence.
- Aggregate claim verdicts into a faithfulness score.
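The three steps above can be sketched in a few lines of Python. This is a toy illustration of the pattern, not nexa-gauge's implementation. The function names are hypothetical, `split_into_claims` uses naive sentence splitting in place of LLM claim extraction, and `is_supported` uses word overlap in place of an LLM judge.

```python
import re


def split_into_claims(answer: str) -> list[str]:
    """Hypothetical stand-in for claim extraction: one claim per sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [s for s in sentences if s]


def is_supported(claim: str, context: str, min_overlap: float = 0.8) -> bool:
    """Hypothetical stand-in for an LLM judge: most claim words must occur in the context."""
    words = re.findall(r"[a-z]+", claim.lower())
    if not words:
        return False
    context_words = set(re.findall(r"[a-z]+", context.lower()))
    hits = sum(1 for w in words if w in context_words)
    return hits / len(words) >= min_overlap


def faithfulness(answer: str, context: str) -> float:
    claims = split_into_claims(answer)                      # 1. break into claims
    verdicts = [is_supported(c, context) for c in claims]   # 2. check each claim
    # 3. aggregate: fraction of supported claims (0.0 when there are no claims)
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```

Suppose the answer is "The Eiffel Tower is in Paris, France. The Eiffel Tower is located in Berlin." and the context is the Eiffel Tower sentence from the sample input. The Berlin sentence then fails the overlap check, and the score is 0.5. A real judge replaces the overlap heuristic because paraphrases and contradictions need semantic comparison.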
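nexa-gauge’s grounding node operationalizes this pattern. It takes the claims extracted from the output and asks an LLM judge for a support verdict on each claim. In the default binary mode the verdicts are booleans, and the final score is the fraction of supported claims. An optional 1-5 scale is described under the scoring controls below.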
This metric is especially useful when you care about hallucination control and evidence-backed answering. It evaluates factual support, not style or completeness, so it should be combined with other metrics (for example relevance) for broader quality coverage.
Use Case
Use grounding when you need confidence that outputs stay tied to supplied evidence:
- RAG QA systems (docs, knowledge bases, support bots)
- Compliance/policy workflows where unsupported claims are risky
- Regression testing after retrieval, prompt, or model changes
- Benchmarking hallucination rate across model versions
- Validating claim-level trustworthiness in generated summaries
Node Overview (nexa-gauge)
In nexa-gauge, grounding is an answer-category metric node.
What it does:
- Receives a list of Claim objects from the upstream ClaimsNode.
- Receives context from the normalized scanner inputs.
- Sends one judge prompt containing:
  - the full context text
  - the numbered claims
- Expects structured output that varies by scoring mode (see below).
- Maps each raw judge value to a normalized per-claim score, then aggregates:
  - per-claim verdict = "ACCEPTED" when the normalized score is ≥ 0.6 (the pass threshold), else "REJECTED"
  - overall score = mean(per_claim_scores)
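The single-prompt design above can be sketched as follows. The prompt wording, the function names, and the exact JSON reply shape are assumptions for illustration. Only the overall layout (full context plus numbered claims in one prompt) and the "No verdicts returned" error text come from this page.

```python
import json


def build_judge_prompt(context: str, claims: list[str], scoring_mode: str = "binary_yes_no") -> str:
    """Hypothetical prompt layout: the full context, then the claims numbered from 1."""
    if scoring_mode == "scale_1_5":
        instruction = ("Rate how well the context supports each claim, from 1 (unsupported) "
                       'to 5 (fully supported). Reply as JSON: {"verdicts": [int, ...]}')
    else:
        instruction = ("For each claim, answer true if the context supports it, else false. "
                       'Reply as JSON: {"verdicts": [bool, ...]}')
    numbered = "\n".join(f"{i}. {claim}" for i, claim in enumerate(claims, start=1))
    return f"Context:\n{context}\n\nClaims:\n{numbered}\n\n{instruction}"


def parse_verdicts(reply: str, n_claims: int) -> list:
    """Parse the judge's JSON reply, failing loudly if verdicts are missing or miscounted."""
    data = json.loads(reply)
    verdicts = data.get("verdicts") if isinstance(data, dict) else None
    if not verdicts:
        raise ValueError("No verdicts returned")
    if len(verdicts) != n_claims:
        raise ValueError(f"Expected {n_claims} verdicts, got {len(verdicts)}")
    return verdicts
```

Batching all claims into one prompt keeps cost to a single judge call per case. The parser must then check that the number of verdicts matches the number of claims, since the judge matches verdicts to claims by position.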
Per-node scoring controls (grounding block)
Add an optional grounding block to your record to tune the judge's output:
"grounding": { "scoring_mode": "scale_1_5", "include_reasoning": true }
- scoring_mode: binary_yes_no (default) or scale_1_5
  - binary_yes_no: the judge returns {"verdicts": [true, false, ...]}. Each per-claim score is 0 or 1.
  - scale_1_5: the judge returns {"verdicts": [4, 1, 5, ...]} (integers 1-5). Each value is normalized via (raw - 1) / 4, and the normalized scores are averaged across claims.
- include_reasoning: false (default) or true
  - When true, the judge also returns a single batch-level reasoning string that summarizes the decision. It is appended to MetricResult.result after the per-claim verdict entries.
Omitting the grounding block (or either knob) falls back to the conservative defaults: binary verdicts and no reasoning.
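The scoring rules in the grounding block can be expressed as a short Python sketch. The helper names are hypothetical. The defaults, the (raw - 1) / 4 normalization, the 0.6 pass threshold, the mean aggregation, and the placement of the reasoning entry are taken from this page.

```python
from typing import Any, Optional

PASS_THRESHOLD = 0.6  # per-claim pass threshold stated on this page
DEFAULTS = {"scoring_mode": "binary_yes_no", "include_reasoning": False}


def resolve_grounding_config(record: dict[str, Any]) -> dict[str, Any]:
    """Overlay the optional per-record "grounding" block on the defaults, knob by knob."""
    block = record.get("grounding") or {}
    return {key: block.get(key, default) for key, default in DEFAULTS.items()}


def normalize(raw: Any, scoring_mode: str) -> float:
    if scoring_mode == "scale_1_5":
        return (int(raw) - 1) / 4   # 1 -> 0.0, 3 -> 0.5, 4 -> 0.75, 5 -> 1.0
    return 1.0 if raw else 0.0      # binary: true -> 1.0, false -> 0.0


def score_claims(raw_verdicts: list, scoring_mode: str) -> tuple[float, list[str]]:
    """Mean of normalized per-claim scores plus ACCEPTED/REJECTED labels.

    Assumes at least one claim; the node skips cases with no claims.
    """
    per_claim = [normalize(v, scoring_mode) for v in raw_verdicts]
    labels = ["ACCEPTED" if s >= PASS_THRESHOLD else "REJECTED" for s in per_claim]
    return sum(per_claim) / len(per_claim), labels


def build_result(claim_records: list[dict], reasoning: Optional[str],
                 include_reasoning: bool) -> list[dict]:
    """Append the batch-level reasoning after the per-claim entries, only when requested."""
    result = list(claim_records)
    if include_reasoning and reasoning:
        result.append({"reasoning": reasoning})
    return result
```

With scale_1_5, a rating of 3 normalizes to 0.5 and is REJECTED, while a 4 (0.75) is ACCEPTED. For example, score_claims([4, 1, 5, 3], "scale_1_5") returns a score of 0.5625 with two ACCEPTED and two REJECTED claims.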
Skip behavior:
- If there are no claims, no context, or grounding is disabled, the node returns empty metrics and zero cost.
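The skip rule can be sketched as a guard that runs before any LLM call. In this sketch, `grounding_or_skip` and the injected `judge` callable are hypothetical names. The empty result mirrors the output shape documented below, with null token counts for zero-cost skips.

```python
from typing import Any, Callable, Optional


def empty_result() -> dict[str, Any]:
    """Skipped run: no metrics, zero cost, no token usage."""
    return {"metrics": [], "cost": {"cost": 0.0, "input_tokens": None, "output_tokens": None}}


def grounding_or_skip(claims: list[str], context: Optional[str], enabled: bool,
                      judge: Callable[[list[str], str], dict[str, Any]]) -> dict[str, Any]:
    # Check the skip conditions first so a skipped case never reaches the judge (and costs nothing).
    if not enabled or not claims or not (context or "").strip():
        return empty_result()
    return judge(claims, context)
```

Treating whitespace-only context as missing is a design choice in this sketch. The page itself only says "no context".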
Execution Flow
Input
Using this sample input:
{
"case_id": "eiffel-tower-basic",
"input": "What is the Eiffel Tower and where is it located?",
"output": "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France. .......",
"context": "The Eiffel Tower (/ˈaɪfəl/ EYE-fəl; French: Tour Eiffel) is a wrought-iron lattice tower on the Champ de Mars in Paris, France. ......."
}
Fields used by the grounding branch:
- output: used upstream to create claims
- context: used directly as evidence text for support verification
- case_id: used for case identity and reporting, not for scoring logic
Fields not used by grounding:
- input: not used by grounding (used by relevance)
- reference: not used by grounding (used by refmatch and refalign)
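The field routing described above can be summarized with a hypothetical helper. `route_fields` and its key names are illustrative only and are not part of nexa-gauge.

```python
from typing import Any


def route_fields(record: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: show where each record field goes in the grounding branch."""
    return {
        "claim_source": record.get("output"),  # upstream ClaimsNode extracts claims from it
        "evidence": record.get("context"),     # handed to the judge as-is
        "case_id": record.get("case_id"),      # identity/reporting only, never scored
        # "input" (relevance) and "reference" (refmatch, refalign) are deliberately absent
    }
```

Because input and reference never reach this branch, editing them cannot change the grounding score.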
Output
Primary output type: GroundingMetrics
- metrics: list[MetricResult]
- cost: CostEstimate
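Example output with scoring_mode: "scale_1_5" and include_reasoning: true. Only two per-claim records are shown. On their own, they would average to (1.0 + 0.0) / 2 = 0.5 rather than the 0.625 reported, so treat the result list as an excerpt: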
{
"metrics": [
{
"name": "grounding",
"category": "output|generation|answer",
"score": 0.625,
"result": [
{
"item": {
"id": "a1b2c3d4e5f6a7b8",
"text": "The Eiffel Tower is in Paris, France.",
"tokens": 10.0,
"confidence": 1.0,
"cached": false
},
"source_chunk_index": 0,
"confidence": 0.93,
"extraction_failed": false,
"verdict": "ACCEPTED",
"raw_score": 5
},
{
"item": {
"id": "b2c3d4e5f6a7b8c9",
"text": "The Eiffel Tower is located in Berlin.",
"tokens": 10.0,
"confidence": 1.0,
"cached": false
},
"source_chunk_index": 0,
"confidence": 0.88,
"extraction_failed": false,
"verdict": "REJECTED",
"raw_score": 1
},
{ "reasoning": "Most claims are directly supported by the context; the Berlin claim contradicts it." }
],
"error": null
}
],
"cost": {
"cost": 0.00042,
"input_tokens": 215.0,
"output_tokens": 28.0
}
}
In the default configuration (binary verdicts, no reasoning), each per-claim raw_score is 0 or 1, and the trailing {"reasoning": ...} entry is omitted entirely.
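Consumers of this output need to distinguish claim records from the optional trailing reasoning entry. The sketch below, with hypothetical function names, does this by checking for the "item" key rather than relying on position. It therefore works with or without include_reasoning, and it returns nothing for skipped cases where metrics is empty.

```python
from typing import Any, Optional


def split_result(metric: dict[str, Any]) -> tuple[list[dict[str, Any]], Optional[str]]:
    """Separate per-claim records from the optional trailing {"reasoning": ...} entry."""
    claims, reasoning = [], None
    for entry in metric.get("result", []):
        if "reasoning" in entry and "item" not in entry:
            reasoning = entry["reasoning"]
        else:
            claims.append(entry)
    return claims, reasoning


def rejected_claims(report: dict[str, Any]) -> list[str]:
    """Collect the text of every REJECTED claim from the grounding metric(s) in a report."""
    texts = []
    for metric in report.get("metrics", []):
        if metric.get("name") != "grounding":
            continue
        claims, _ = split_result(metric)
        texts += [c["item"]["text"] for c in claims if c.get("verdict") == "REJECTED"]
    return texts
```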
Attribute meanings:
- metrics: one entry for this node (name="grounding"), or empty when skipped
- name: metric/node identifier
- category: output|generation|answer (from MetricCategory.ANSWER)
- score: mean of per-claim normalized scores in [0, 1]
- result: per-claim faithfulness records, plus an optional trailing {"reasoning": "..."} dict when include_reasoning: true
- result[].item: claim text and token metadata
- result[].source_chunk_index: output chunk the claim came from
- result[].confidence: extractor confidence for the claim
- result[].extraction_failed: extraction failure marker
- result[].verdict: ACCEPTED (per-claim score ≥ 0.6) or REJECTED
- result[].raw_score: the raw integer the judge emitted (1-5 for scale_1_5, 0/1 for binary)
- error: populated when verdict parsing fails (for example "No verdicts returned")
- cost.cost: USD cost estimate/actual for this node call
- cost.input_tokens, cost.output_tokens: model token usage (or null for zero-cost skips)
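The attribute rules above imply invariants that a regression test can check. The sketch below uses a hypothetical function name. It recomputes each normalized score from raw_score and checks it against verdict and the overall score. It assumes the result list is complete, which the abbreviated example above is not.

```python
import math
from typing import Any

PASS_THRESHOLD = 0.6


def check_grounding_metric(metric: dict[str, Any], scoring_mode: str) -> list[str]:
    """Return inconsistencies between raw_score, verdict, and score (empty list if consistent)."""
    problems = []
    records = [r for r in metric.get("result", []) if "item" in r]  # skip the reasoning entry
    normalized = []
    for r in records:
        raw = r["raw_score"]
        s = (raw - 1) / 4 if scoring_mode == "scale_1_5" else float(raw)
        normalized.append(s)
        expected = "ACCEPTED" if s >= PASS_THRESHOLD else "REJECTED"
        if r["verdict"] != expected:
            problems.append(f"{r['item']['id']}: verdict {r['verdict']} but normalized score {s}")
    if normalized:
        mean = sum(normalized) / len(normalized)
        if not math.isclose(metric["score"], mean):
            problems.append(f"score {metric['score']} != mean of claims {mean}")
    return problems
```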
Usage
OUTPUT_DIR=./out/grounding
mkdir -p "$OUTPUT_DIR"
Estimate Cost
nexagauge estimate grounding \
--input ./sample.json \
--limit 5 \
| tee "$OUTPUT_DIR/estimate.txt"
Note: estimate currently supports --input and --limit but not --output-dir; use tee to save the estimate output.
Run Evaluation
nexagauge run grounding \
--input ./sample.json \
--limit 5 \
--output-dir "$OUTPUT_DIR"
For full per-case report files that include grounding plus other metrics:
nexagauge run eval \
--input ./sample.json \
--limit 5 \
--output-dir "$OUTPUT_DIR"