Custom Data
Overview
Bring your own data with --input. nexa-gauge accepts local files and maps common field names into the canonical schema.
bash
nexagauge estimate eval --input ./sample.json --limit 10
nexagauge run eval --input ./sample.json --limit 10 --output-dir ./report
For field names and node activation rules, see Data Schema.
Supported Local Sources
| Source | Shape | Notes |
|---|---|---|
| .json | One object or an array of objects | Best for small and medium datasets. |
| .jsonl | One JSON object per line | Streamed row by row; best for large datasets. |
| .csv | Header row plus data rows | Rows are loaded as dictionaries. |
| Other text files | Raw text | Treated as a single record with generation equal to file contents. |
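To make the table concrete, the following is a minimal sketch of how a loader could dispatch on file extension. It is an illustration only, not nexa-gauge's actual implementation: the `iter_records` helper is hypothetical, and skipping blank JSONL lines is an assumption.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one dictionary per record, following the table above.

    Hypothetical helper for illustration; not part of nexa-gauge.
    """
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".json":
        # The whole file is parsed at once, which suits small and medium datasets.
        data = json.loads(p.read_text(encoding="utf-8"))
        # A single object is treated as a one-record dataset.
        yield from (data if isinstance(data, list) else [data])
    elif suffix == ".jsonl":
        # Read line by line so large files never sit fully in memory.
        with p.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():  # assumption: blank lines are skipped
                    yield json.loads(line)
    elif suffix == ".csv":
        with p.open(newline="", encoding="utf-8") as fh:
            # DictReader uses the header row as the keys of each row.
            yield from csv.DictReader(fh)
    else:
        # Any other text file becomes a single record.
        yield {"generation": p.read_text(encoding="utf-8")}
```

Note that CSV values always arrive as strings in this sketch, so numeric columns would need explicit conversion.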
JSON
A JSON input can be a single object:
json
{
  "case_id": "case-001",
  "question": "What is retrieval-augmented generation?",
  "generation": "Retrieval-augmented generation combines retrieval with generation.",
  "reference": "RAG retrieves relevant context and uses it to ground generated answers."
}
Or an array of objects:
json
[
  {
    "case_id": "case-001",
    "question": "What is RAG?",
    "generation": "RAG combines retrieval with generation."
  },
  {
    "case_id": "case-002",
    "question": "What is grounding?",
    "generation": "Grounding checks whether claims are supported.",
    "context": "Grounded answers should be supported by supplied evidence."
  }
]
Run it with:
bash
nexagauge run eval --input ./cases.json --output-dir ./report
JSONL
Use JSONL for larger datasets. Each line is one record.
jsonl
{"case_id":"case-001","question":"What is RAG?","generation":"RAG combines retrieval with generation."}
{"case_id":"case-002","question":"What is grounding?","generation":"Grounding checks whether claims are supported.","context":"Grounded answers should be supported by supplied evidence."}
Run a slice with:
bash
nexagauge estimate eval --input ./cases.jsonl --start 100 --end 200
CSV
CSV files are loaded with the header row as field names.
csv
case_id,question,generation,reference
case-001,What is RAG?,RAG combines retrieval with generation.,RAG retrieves context before generating an answer.
case-002,What is grounding?,Grounding checks whether claims are supported.,Grounding verifies generated claims against context.
Run it with:
bash
nexagauge run relevance --input ./cases.csv --limit 50
Field Aliases
You do not have to use the canonical names if your data already uses common alternatives.
| Purpose | Accepted field names |
|---|---|
| Case ID | case_id, id |
| Generation | generation, response, answer, output, completion |
| Question | question, query, prompt |
| Context | context, contexts, documents |
| Reference | reference, ground_truth, gold_answer, label |
Prefer canonical names for new datasets. Aliases are useful when adapting existing exports.
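If you would rather rename fields yourself before handing a file to nexa-gauge, the alias table can be applied with a small mapping. This is a sketch under stated assumptions, not the tool's own mapping code: `to_canonical` is a hypothetical helper, the "first listed alias wins" precedence is an assumption, and fields outside the table are simply dropped here.

```python
from typing import Any, Dict

# Canonical field -> accepted names, copied from the table above.
ALIASES = {
    "case_id": ("case_id", "id"),
    "generation": ("generation", "response", "answer", "output", "completion"),
    "question": ("question", "query", "prompt"),
    "context": ("context", "contexts", "documents"),
    "reference": ("reference", "ground_truth", "gold_answer", "label"),
}


def to_canonical(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename aliased fields to canonical names (hypothetical helper).

    Assumption: when several aliases are present, the first one in
    ALIASES order wins. Unrecognized fields are dropped in this sketch.
    """
    out: Dict[str, Any] = {}
    for canonical, names in ALIASES.items():
        for name in names:
            if name in record:
                out[canonical] = record[name]
                break  # stop at the first matching alias
    return out
```

For example, a record exported as `{"id": ..., "query": ..., "answer": ...}` comes back keyed by `case_id`, `question`, and `generation`.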
Practical Patterns
Use --limit for small test runs:
bash
nexagauge run eval --input ./cases.jsonl --limit 5 --output-dir ./report-smoke
Use --start and --end for deterministic slices:
bash
nexagauge estimate grounding --input ./cases.jsonl --start 500 --end 750
Use --fail-fast while developing a dataset:
bash
nexagauge run eval --input ./cases.csv --limit 10 --fail-fast
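Because JSONL is the format recommended for large datasets, it can help to convert an existing JSON array export once before running. The following is a minimal standard-library sketch; the function name `json_array_to_jsonl` and any file paths you pass to it are illustrative, not part of nexa-gauge.

```python
import json


def json_array_to_jsonl(src_path: str, dst_path: str) -> int:
    """Write each object of a JSON file as one line; return the record count."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    # A single top-level object is treated as a one-record dataset.
    records = data if isinstance(data, list) else [data]
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the output.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

The source file is still read fully into memory during the conversion, so this only pays off on later runs that read the JSONL output.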