RefMatch (refmatch)
Overview
refmatch is a lexical overlap evaluation node that compares a model output to a gold reference answer using ROUGE, BLEU, and METEOR metrics.
The metric family comes from established summarization and MT evaluation work: BLEU (Papineni et al., 2002), METEOR (Banerjee and Lavie, 2005), and ROUGE (Lin, 2004). In practice, these metrics provide fast, deterministic signals of lexical/phrase overlap between candidate and reference text.
In nexa-gauge, refmatch computes five scores in [0,1]:
- rouge1 (unigram overlap)
- rouge2 (bigram overlap)
- rougeL (longest common subsequence (LCS) overlap)
- bleu (smoothed sentence BLEU)
- meteor
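The three ROUGE scores can be sketched as F1 over clipped n-gram counts (ROUGE-1, ROUGE-2) and over the longest common subsequence (ROUGE-L). This is a minimal sketch, assuming lowercase whitespace tokenization and no stemming. The helper names (tokenize, rouge_n, rouge_l) are hypothetical and are not nexa-gauge's implementation.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; real scorers use richer tokenizers
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, cand_total: int, ref_total: int) -> float:
    # Guard against empty inputs and zero overlap before dividing
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    cand, ref = ngrams(tokenize(candidate), n), ngrams(tokenize(reference), n)
    # Clipped overlap: an n-gram counts at most as often as it occurs in both texts
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Classic O(len(a) * len(b)) dynamic programme, keeping one row at a time
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


candidate = "the cat sat on the mat"
reference = "the cat is on the mat"
print(round(rouge_n(candidate, reference, 1), 4))  # 0.8333
print(round(rouge_n(candidate, reference, 2), 4))  # 0.6
print(round(rouge_l(candidate, reference), 4))     # 0.8333
```

Production scorers may also apply stemming and, for ROUGE-L, a summary-level (per-sentence) variant. Expect scores from this sketch to differ somewhat from nexa-gauge's output.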
Unlike judge-model metrics, this node does not call an LLM and always reports zero cost. It is most useful as a fast baseline similarity signal, typically combined with the semantic refalign node or LLM-judge metrics for fuller quality assessment.
Note: The shared scoring_mode and include_reasoning knobs available on the LLM-judge nodes (geval, grounding, relevance, redteam) do not apply to refmatch. ROUGE/BLEU/METEOR are deterministic lexical metrics, so there is no judge to configure.
Use Case
Use refmatch when you have trusted reference answers and want fast, deterministic overlap-based quality checks.
- Regression checks for answer fidelity against a gold target
- Benchmark scoring where deterministic, low-latency metrics are needed
- Sanity checking summarization or QA outputs before deeper judge-based evaluation
- Comparing model variants with a consistent lexical baseline
- Cost-sensitive pipelines that need non-LLM metrics
For paraphrase-sensitive comparison (same meaning, different wording), combine refmatch with refalign, which scores semantic similarity via embeddings.
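The limitation is easy to see with unigram F1 (the idea behind ROUGE-1). A verbatim copy scores 1.0, but a faithful paraphrase scores close to zero because the only shared token is "the". The sentences and the unigram_f1 helper are illustrative only and are not part of nexa-gauge.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    # Same clipped-overlap F1 as ROUGE-1, with naive lowercase whitespace tokens
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the meeting was postponed until next week"
verbatim = "the meeting was postponed until next week"
paraphrase = "they delayed the gathering by seven days"

print(round(unigram_f1(verbatim, reference), 4))    # 1.0
print(round(unigram_f1(paraphrase, reference), 4))  # 0.1429
```

An embedding-based score such as refalign is intended to close this gap, since the two sentences mean nearly the same thing.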
Node Overview
In nexa-gauge, refmatch is an answer metric node.
What the node does:
- Reads normalized output and reference text
- Skips when reference is missing or blank (returns empty metrics and zero cost)
- Computes ROUGE-1/2/L (F1), BLEU (smoothed sentence BLEU), and METEOR
- Returns one MetricResult per metric
- Returns a zero-cost CostEstimate because no model calls are made
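The BLEU step can be sketched as clipped n-gram precision for 1- to 4-grams, combined by geometric mean and scaled by a brevity penalty. This is a minimal sketch under the following assumptions:

- lowercase whitespace tokenization and a single reference
- uniform weights
- add-one smoothing on the 2- to 4-gram precisions

The smoothing nexa-gauge actually applies is not specified here, and the name smoothed_sentence_bleu is hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    # Assumption: lowercase whitespace tokens and a single reference
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngram_counts(cand, n), ngram_counts(ref, n)
        # Modified (clipped) precision: matches capped by reference counts
        matches = sum((cand_ngrams & ref_ngrams).values())
        total = sum(cand_ngrams.values())
        if n > 1:
            # Assumed smoothing: add one to numerator and denominator for n >= 2,
            # so a missing 3-gram or 4-gram match does not zero the whole score
            matches, total = matches + 1, total + 1
        if matches == 0:
            return 0.0  # only reachable at n = 1: no shared words at all
        log_precision_sum += math.log(matches / total)
    # Uniform weights: geometric mean of the max_n precisions
    geo_mean = math.exp(log_precision_sum / max_n)
    # Brevity penalty discourages candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean


print(smoothed_sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(round(smoothed_sentence_bleu("the cat sat on the mat", "the cat is on the mat"), 4))  # 0.4855
```

Add-one smoothing is only one option. NLTK, for example, ships several alternatives in its SmoothingFunction class, and the choice can noticeably shift scores for short sentences.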
Execution Flow
Input
Example input:
{
"case_id": "bitcoin-economics-medium",
"input": "What is Bitcoin and how does it work as a currency?",
"output": "Bitcoin is a decentralised digital currency created in 2009 by the pseudonymous Satoshi Nakamoto. Unlike traditional currencies issued by central banks, Bitcoin operates on a peer-to-peer network with no central authority. ....",
"reference": "Bitcoin is a decentralised digital currency launched in 2009, using blockchain technology and proof-of-work mining to verify transactions without a central authority. Its supply is capped at 21 million coins."
}
Fields used by the refmatch node:
- output: candidate text to score
- reference: target text to compare against
Fields not used for scoring in this node:
- input
- case_id (used for report identity, not metric computation)
Output
The primary output type is RefmatchMetrics (ng_core/types.py), with two fields:
- metrics: list[MetricResult]
- cost: CostEstimate
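The node behavior listed under Node Overview (skip on a blank reference, one result per metric, zero cost) can be pictured with the hypothetical dataclasses below. They mirror the JSON keys of the example output, not the real definitions in ng_core/types.py. run_refmatch and the scorers mapping are illustrative names. Reporting a failed metric as score=None with error set is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

CATEGORY = "output|generation|answer"


@dataclass
class MetricResult:  # hypothetical mirror of one entry in "metrics"
    name: str
    category: str
    score: Optional[float]
    result: Optional[dict] = None
    error: Optional[str] = None


@dataclass
class CostEstimate:  # hypothetical mirror of "cost"
    cost: float = 0.0
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None


@dataclass
class RefmatchMetrics:  # hypothetical mirror of the whole payload
    metrics: list[MetricResult] = field(default_factory=list)
    cost: CostEstimate = field(default_factory=CostEstimate)


Scorer = Callable[[str, str], float]


def run_refmatch(output: str, reference: Optional[str], scorers: dict[str, Scorer]) -> RefmatchMetrics:
    # Skip: a missing or blank reference yields no metrics and zero cost
    if reference is None or not reference.strip():
        return RefmatchMetrics()
    results = []
    for name, scorer in scorers.items():
        try:
            results.append(MetricResult(name, CATEGORY, scorer(output, reference)))
        except Exception as exc:
            # Assumption: a failing metric reports its error instead of a score
            results.append(MetricResult(name, CATEGORY, None, error=str(exc)))
    # No model calls are made, so the default CostEstimate (0.0, no tokens) applies
    return RefmatchMetrics(metrics=results)


def exact_match(output: str, reference: str) -> float:
    # Hypothetical stand-in scorer; real usage would plug in ROUGE/BLEU/METEOR
    return float(output.strip() == reference.strip())


print(len(run_refmatch("Bitcoin is a digital currency.", "", {"exact": exact_match}).metrics))  # 0
```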
Example output:
{
"metrics": [
{
"name": "rouge1",
"category": "output|generation|answer",
"score": 0.7063,
"result": null,
"error": null
},
{
"name": "rouge2",
"category": "output|generation|answer",
"score": 0.4921,
"result": null,
"error": null
},
{
"name": "rougeL",
"category": "output|generation|answer",
"score": 0.6554,
"result": null,
"error": null
},
{
"name": "bleu",
"category": "output|generation|answer",
"score": 0.3712,
"result": null,
"error": null
},
{
"name": "meteor",
"category": "output|generation|answer",
"score": 0.5987,
"result": null,
"error": null
}
],
"cost": {
"cost": 0.0,
"input_tokens": null,
"output_tokens": null
}
}
Attribute meaning:
- metrics: five results when reference is present, empty list when skipped
- name: metric identifier (rouge1, rouge2, rougeL, bleu, meteor)
- category: output|generation|answer
- score: metric value in [0,1] (higher means more overlap)
- result: unused for lexical metrics (null)
- error: null on success; populated only if a metric-level failure occurs
- cost.cost: always 0.0
- cost.input_tokens, cost.output_tokens: always null (no LLM usage)
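Downstream code can check the attribute contract above mechanically. The check_refmatch_payload helper below is a hypothetical sketch. It takes an already-parsed refmatch result dict and returns a list of violations. How you load that dict is up to you, since the file layout under --output-dir is not specified here.

```python
import json

EXPECTED_NAMES = {"rouge1", "rouge2", "rougeL", "bleu", "meteor"}
CATEGORY = "output|generation|answer"


def check_refmatch_payload(payload: dict) -> list[str]:
    """Return contract violations for one refmatch result; an empty list means none were found."""
    problems: list[str] = []
    metrics = payload.get("metrics", [])
    names = {m.get("name") for m in metrics}
    # Five metrics when a reference was present, none when the node skipped
    if len(metrics) not in (0, 5):
        problems.append(f"expected 0 or 5 metrics, got {len(metrics)}")
    elif metrics and names != EXPECTED_NAMES:
        problems.append(f"unexpected metric names: {sorted(map(str, names))}")
    for m in metrics:
        if m.get("category") != CATEGORY:
            problems.append(f"{m.get('name')}: unexpected category {m.get('category')!r}")
        score = m.get("score")
        # Only successful metrics (error is null) are required to carry a score in [0, 1]
        if m.get("error") is None and not (isinstance(score, (int, float)) and 0.0 <= score <= 1.0):
            problems.append(f"{m.get('name')}: score {score!r} is not in [0, 1]")
    cost = payload.get("cost", {})
    if cost.get("cost") != 0.0:
        problems.append(f"expected zero cost, got {cost.get('cost')!r}")
    if cost.get("input_tokens") is not None or cost.get("output_tokens") is not None:
        problems.append("token counts should be null for a non-LLM node")
    return problems


skipped = json.loads('{"metrics": [], "cost": {"cost": 0.0, "input_tokens": null, "output_tokens": null}}')
print(check_refmatch_payload(skipped))  # []
```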
Usage
OUTPUT_DIR=./out/refmatch
mkdir -p "$OUTPUT_DIR"
CLI: Estimate Cost
nexagauge estimate refmatch \
--input ./sample.json \
--limit 5 \
| tee "$OUTPUT_DIR/refmatch_estimate.txt"
estimate supports --input and --limit. It does not expose a native --output-dir option, so its output is piped through tee into OUTPUT_DIR instead.
CLI: Run Evaluation
nexagauge run refmatch \
--input ./sample.json \
--output-dir "$OUTPUT_DIR" \
--limit 5
For the full per-case report JSON across all branches:
nexagauge run eval \
--input ./sample.json \
--output-dir "$OUTPUT_DIR" \
--limit 5