# Custom Data

## Overview

Bring your own data with `--input`. nexa-gauge accepts local files and maps common field names into the canonical schema.

```bash
nexagauge estimate eval --input ./sample.json --limit 10
nexagauge run eval --input ./sample.json --limit 10 --output-dir ./report
```

For field names and node activation rules, see Data Schema.

## Supported Local Sources

| Source | Shape | Notes |
| --- | --- | --- |
| `.json` | One object or an array of objects | Best for small and medium datasets. |
| `.jsonl` | One JSON object per line | Streamed row by row; best for large datasets. |
| `.csv` | Header row plus data rows | Rows are loaded as dictionaries. |
| Other text files | Raw text | Treated as a single record with `generation` equal to the file contents. |
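The dispatch described in the table can be sketched as a small loader. This is an illustration of the documented behavior, not nexa-gauge's actual implementation:

```python
import csv
import json
from pathlib import Path

def load_records(path):
    """Illustrative loader mirroring the source table above."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    if p.suffix == ".json":
        data = json.loads(text)
        # A single object becomes a one-record list.
        return data if isinstance(data, list) else [data]
    if p.suffix == ".jsonl":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if p.suffix == ".csv":
        # The header row supplies the dictionary keys.
        return list(csv.DictReader(text.splitlines()))
    # Any other text file: one record whose generation is the file contents.
    return [{"generation": text}]
```

Every branch normalizes to the same shape, a list of dictionaries, which is why the downstream field-alias mapping can treat all sources uniformly.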

## JSON

A JSON input can be a single object:

```json
{
  "case_id": "case-001",
  "question": "What is retrieval-augmented generation?",
  "generation": "Retrieval-augmented generation combines retrieval with generation.",
  "reference": "RAG retrieves relevant context and uses it to ground generated answers."
}
```

Or an array of objects:

```json
[
  {
    "case_id": "case-001",
    "question": "What is RAG?",
    "generation": "RAG combines retrieval with generation."
  },
  {
    "case_id": "case-002",
    "question": "What is grounding?",
    "generation": "Grounding checks whether claims are supported.",
    "context": "Grounded answers should be supported by supplied evidence."
  }
]
```

Run it with:

```bash
nexagauge run eval --input ./cases.json --output-dir ./report
```
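If your cases already live in Python objects, a `json.dump` call produces a compatible array-of-objects file. The file name here is just an example:

```python
import json

cases = [
    {"case_id": "case-001", "question": "What is RAG?",
     "generation": "RAG combines retrieval with generation."},
    {"case_id": "case-002", "question": "What is grounding?",
     "generation": "Grounding checks whether claims are supported.",
     "context": "Grounded answers should be supported by supplied evidence."},
]

# indent=2 keeps the file diff-friendly; ensure_ascii=False preserves non-ASCII text.
with open("cases.json", "w", encoding="utf-8") as f:
    json.dump(cases, f, indent=2, ensure_ascii=False)
```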

## JSONL

Use JSONL for larger datasets. Each line is one record.

```jsonl
{"case_id":"case-001","question":"What is RAG?","generation":"RAG combines retrieval with generation."}
{"case_id":"case-002","question":"What is grounding?","generation":"Grounding checks whether claims are supported.","context":"Grounded answers should be supported by supplied evidence."}
```

Run a slice with:

```bash
nexagauge estimate eval --input ./cases.jsonl --start 100 --end 200
```
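JSONL is easy to generate incrementally: one `json.dumps` call per record keeps memory flat no matter how large the dataset grows. A minimal sketch (again, the file name is illustrative):

```python
import json

records = [
    {"case_id": "case-001", "question": "What is RAG?",
     "generation": "RAG combines retrieval with generation."},
    {"case_id": "case-002", "question": "What is grounding?",
     "generation": "Grounding checks whether claims are supported."},
]

with open("cases.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        # One compact JSON object per line -- no wrapping array, no trailing commas.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Because each line is independent, a JSONL file can be appended to safely and read back row by row.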

## CSV

CSV files are loaded with the header row as field names.

```csv
case_id,question,generation,reference
case-001,What is RAG?,RAG combines retrieval with generation.,RAG retrieves context before generating an answer.
case-002,What is grounding?,Grounding checks whether claims are supported.,Grounding verifies generated claims against context.
```

Run it with:

```bash
nexagauge run relevance --input ./cases.csv --limit 50
```
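Because rows are loaded with the header row as field names, Python's `csv.DictReader` mirrors what the loader sees for each record (illustrative, not the tool's internals):

```python
import csv
import io

csv_text = """case_id,question,generation
case-001,What is RAG?,RAG combines retrieval with generation.
"""

# Each row becomes a dict keyed by the header columns.
rows = list(csv.DictReader(io.StringIO(csv_text)))
```

Note that CSV values are always strings; quote any field that itself contains a comma or newline so it parses as a single column.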

## Field Aliases

You do not have to use the canonical names if your data already uses common alternatives.

| Purpose | Accepted field names |
| --- | --- |
| Case ID | `case_id`, `id` |
| Generation | `generation`, `response`, `answer`, `output`, `completion` |
| Question | `question`, `query`, `prompt` |
| Context | `context`, `contexts`, `documents` |
| Reference | `reference`, `ground_truth`, `gold_answer`, `label` |

Prefer canonical names for new datasets. Aliases are useful when adapting existing exports.
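The alias table amounts to a simple normalization pass. A sketch of that mapping (which alias wins when several are present is an assumption here; nexa-gauge's precedence may differ):

```python
# Canonical field name -> accepted aliases, in assumed precedence order.
ALIASES = {
    "case_id": ["case_id", "id"],
    "generation": ["generation", "response", "answer", "output", "completion"],
    "question": ["question", "query", "prompt"],
    "context": ["context", "contexts", "documents"],
    "reference": ["reference", "ground_truth", "gold_answer", "label"],
}

def normalize(record):
    """Map the first matching alias in each group onto its canonical name."""
    out = {}
    for canonical, names in ALIASES.items():
        for name in names:
            if name in record:
                out[canonical] = record[name]
                break
    return out
```

For example, an export with `id`, `prompt`, and `response` columns normalizes to `case_id`, `question`, and `generation` without any manual renaming.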

## Practical Patterns

Use `--limit` for small test runs:

```bash
nexagauge run eval --input ./cases.jsonl --limit 5 --output-dir ./report-smoke
```

Use `--start` and `--end` for deterministic slices:

```bash
nexagauge estimate grounding --input ./cases.jsonl --start 500 --end 750
```

Use `--fail-fast` while developing a dataset:

```bash
nexagauge run eval --input ./cases.csv --limit 10 --fail-fast
```
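To preview locally which rows a slice will cover, you can reproduce row-range selection with `itertools.islice`. This sketch assumes a half-open `[start, end)` range; check the CLI reference for nexa-gauge's exact `--start`/`--end` semantics:

```python
import io
import json
from itertools import islice

def read_slice(lines, start, end):
    """Parse JSONL rows in the half-open range [start, end).

    Half-open indexing is an assumption of this sketch, not a guarantee
    about the CLI's behavior.
    """
    return [json.loads(line) for line in islice(lines, start, end)]

# Synthetic ten-row JSONL stream for illustration.
data = "\n".join(json.dumps({"case_id": f"case-{i:03d}"}) for i in range(10))
rows = read_slice(io.StringIO(data), 2, 5)
```

Because `islice` consumes the stream lazily, this preview never loads the whole file, matching the row-by-row streaming behavior described for JSONL above.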