CLI Reference
Use this page when you need exact command syntax and practical options. For a short walkthrough, start with the CLI Quick Start.
run
run executes a target node against your input data. Use it when you want real metric results, report files, cache writes, and eval summaries.
nexagauge run <target_node> --input <source> [shared options] [run options]
Targets
| Target type | Nodes | Use when |
|---|---|---|
| Utility | chunk, refiner, chunk_reference, refine_reference | You want intermediate artifacts for debugging or inspection. chunk_reference/refine_reference prepare the reference text for refalign. |
| Metric Ess | claims, geval_steps | You want intermediate artifacts for debugging or inspection. |
| Metric | relevance, grounding, redteam, geval, refmatch, refalign | You want one evaluation dimension. |
| Full eval | eval | You want all eligible metrics plus aggregate summaries and reports. |
scan and report are not direct CLI targets. scan runs automatically, and report is produced by run eval when report output is available.
Run-only options
| Option | Purpose |
|---|---|
| --output-dir, -o | Write run outputs. Creates case_report/ and metrics/ inside the directory. |
| --llm-concurrency | Global cap on concurrent LLM calls across all workers. Lower this first if you hit provider rate limits. |
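To make the idea of a single global cap shared by all workers concrete, here is a minimal sketch. It is not nexa-gauge's implementation: it assumes an asyncio model in which every case issues several LLM calls, and the names simulate, fake_llm_call, and run_case are hypothetical.

```python
import asyncio


async def simulate(num_cases: int, calls_per_case: int, llm_concurrency: int) -> int:
    """Run cases concurrently and return the peak number of simultaneous LLM calls."""
    gate = asyncio.Semaphore(llm_concurrency)  # one shared cap for every worker
    active = 0
    peak = 0

    async def fake_llm_call() -> None:
        nonlocal active, peak
        async with gate:  # callers wait here once the cap is reached
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for provider latency
            active -= 1

    async def run_case() -> None:
        # Each case fans out several calls; all of them compete for the same gate.
        await asyncio.gather(*(fake_llm_call() for _ in range(calls_per_case)))

    await asyncio.gather(*(run_case() for _ in range(num_cases)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(simulate(num_cases=4, calls_per_case=5, llm_concurrency=3)))
```

However many cases run at once, the observed peak stays at or below the cap, which is why lowering --llm-concurrency is the first knob to try against provider rate limits.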
estimate
estimate uses the same target and branch planning as run, but previews uncached cost without making billable provider calls.
nexagauge estimate <target_node> --input <source> [shared options] [estimate options]
Use estimate before run when you want to understand likely spend for a branch, dataset slice, or model-routing change.
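One way to make the estimate-before-run habit routine is a small wrapper script. The sketch below is illustrative only: build_command and estimate_then_run are hypothetical helpers, and it assumes nexagauge is on your PATH. It uses only the command syntax and the --input and --limit options documented on this page.

```python
import shlex
import subprocess
from typing import Optional


def build_command(command: str, target: str, source: str,
                  limit: Optional[int] = None) -> list[str]:
    """Build an argv list for `nexagauge run` or `nexagauge estimate`."""
    if command not in ("run", "estimate"):
        raise ValueError(f"unsupported command: {command}")
    argv = ["nexagauge", command, target, "--input", source]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv


def estimate_then_run(target: str, source: str, limit: Optional[int] = None) -> None:
    """Preview cost first; start the billable run only after confirmation."""
    subprocess.run(build_command("estimate", target, source, limit), check=True)
    if input("Proceed with run? [y/N] ").strip().lower() == "y":
        subprocess.run(build_command("run", target, source, limit), check=True)


# Show the command line that would be executed for the estimate step.
print(shlex.join(build_command("estimate", "grounding", "sample.json", limit=100)))
```

Because both steps are built from the same arguments, the estimate covers the same target and row limit as the run that follows it.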
Estimate-only options
| Option | Purpose |
|---|---|
| --cache-dir | Use a specific cache directory for this estimate. Useful when comparing against an isolated cache. |
The estimate table reports the target branch by node, including cached work, uncached work, eligible uncached cases, and total estimated cost.
Shared run / estimate arguments
These options apply to both run and estimate.
| Option | Purpose |
|---|---|
| --input, -i | Required. Local input file or hf://<dataset-id> source. See Data Schema. |
| --start | Start row index, inclusive. |
| --end | End row index, exclusive. |
| --limit, -n | Maximum number of rows to process. |
| --llm-model MODEL | Set the global primary judge model. |
| --llm-model NODE=MODEL | Override the primary model for one node in the target branch. Repeat as needed. |
| --llm-fallback MODEL | Set the global fallback model. |
| --llm-fallback NODE=MODEL | Override the fallback model for one node in the target branch. |
| --host-model-url URL | Route all branch nodes through one OpenAI-compatible endpoint URL (must end in /v1). |
| --host-model-api-key KEY | Optional key used with --host-model-url. For localhost endpoints, the key defaults to local when omitted. |
| --chunker | Chunking strategy used by the chunk utility node. |
| --refiner | Chunk refinement strategy (default: mmr). |
| --refiner-top-k | Maximum number of chunks kept after refinement. |
| --continue-on-error | Continue processing remaining cases after a case failure. This is the default. |
| --fail-fast | Stop on the first failed case. |
| --max-workers | Number of cases processed concurrently. Default is 1. |
| --max-in-flight | Backpressure limit for submitted-but-not-yet-emitted cases. Useful with --max-workers > 1. |
| --force | Ignore cache reads but still write new cache entries. |
| --no-cache | Disable both cache reads and writes. |
| --debug | Print per-node debug logs. For estimate, the progress bar is hidden while debug logs are enabled. |
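The two forms of --llm-model (bare MODEL and NODE=MODEL) can be read as a global setting plus per-node overrides. The following is a minimal sketch of that reading, not the tool's actual parser: parse_model_flags and model_for are hypothetical names, and it assumes any value containing = is a node override.

```python
from typing import Optional


def parse_model_flags(values: list[str]) -> tuple[Optional[str], dict[str, str]]:
    """Split repeated --llm-model values into a global model and per-node overrides."""
    global_model: Optional[str] = None
    overrides: dict[str, str] = {}
    for value in values:
        node, sep, model = value.partition("=")
        if sep:   # NODE=MODEL form
            overrides[node] = model
        else:     # bare MODEL form
            global_model = value
    return global_model, overrides


def model_for(node: str, global_model: Optional[str],
              overrides: dict[str, str]) -> Optional[str]:
    """A per-node override wins; otherwise the global model applies."""
    return overrides.get(node, global_model)


flags = ["openai/gpt-4o-mini", "grounding=openai/gpt-4o", "refmatch=openai/gpt-4o"]
global_model, overrides = parse_model_flags(flags)
for node in ("relevance", "grounding", "refmatch"):
    print(node, "->", model_for(node, global_model, overrides))
```

With these flags, grounding and refmatch resolve to openai/gpt-4o while every other node in the branch uses openai/gpt-4o-mini. The same pattern would apply to --llm-fallback.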
cache
Cache commands inspect and remove node-level cache artifacts.
nexagauge cache dir
nexagauge cache delete [options]
Use cache dir to see the active cache location. Use cache delete --dry-run before deleting so you can confirm the size and file count.
Cache options
| Option | Purpose |
|---|---|
| --dry-run | Print what would be deleted without removing files. |
| --yes, -y | Delete without the confirmation prompt. |
| --cache-dir | Delete a specific cache directory instead of the default. |
Cache location is resolved in this order: --cache-dir, then NEXAGAUGE_CACHE_DIR, then the per-user default cache path.
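The resolution order above can be sketched as a small function. This is an illustration under stated assumptions, not the tool's code: resolve_cache_dir is a hypothetical name, ASSUMED_DEFAULT is a placeholder because the real per-user default path is not given on this page, and treating an empty environment variable as unset is also an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical placeholder: the real per-user default path is not stated on this page.
ASSUMED_DEFAULT = Path.home() / ".cache" / "nexagauge"


def resolve_cache_dir(cli_value: Optional[str] = None,
                      env: Mapping[str, str] = os.environ) -> Path:
    """Return the cache directory using the documented precedence."""
    if cli_value:                          # 1. --cache-dir
        return Path(cli_value)
    env_value = env.get("NEXAGAUGE_CACHE_DIR")
    if env_value:                          # 2. NEXAGAUGE_CACHE_DIR
        return Path(env_value)
    return ASSUMED_DEFAULT                 # 3. per-user default


# An explicit --cache-dir value wins even when the environment variable is set.
print(resolve_cache_dir("./.tmp-cache", {"NEXAGAUGE_CACHE_DIR": "/srv/cache"}))
```

In practice, nexagauge cache dir is the authoritative way to see which location is active.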
Examples
run
# Run the full eval branch and write reports
nexagauge run eval --input sample.json --output-dir ./report
# Run one metric branch on the first 50 rows
nexagauge run grounding --input sample.json --limit 50
# Inspect claim extraction only
nexagauge run claims --input sample.json --limit 5 --debug
# Use one global judge model
nexagauge run eval --input sample.json --llm-model openai/gpt-4o-mini
# Route to a locally served llama.cpp model
nexagauge run eval \
--input sample.json \
--host-model-url http://localhost:8080/v1 \
--llm-concurrency 1 \
--max-in-flight 1
# Use a stronger model only for grounding and refmatch
nexagauge run eval \
--input sample.json \
--llm-model openai/gpt-4o-mini \
--llm-model grounding=openai/gpt-4o \
--llm-model refmatch=openai/gpt-4o
# Tune chunking/refinement behavior
nexagauge run grounding \
--input sample.json \
--chunker recursive \
--refiner mmr \
--refiner-top-k 5
# Run with more case-level parallelism while limiting provider pressure
nexagauge run eval \
--input sample.json \
--max-workers 4 \
--max-in-flight 8 \
--llm-concurrency 16
# Force a fresh run while still updating cache
nexagauge run relevance --input sample.json --force
# Run without reading or writing cache
nexagauge run redteam --input sample.json --no-cache
# Stop immediately if a case fails
nexagauge run eval --input sample.json --fail-fast
estimate
# Estimate full eval cost
nexagauge estimate eval --input sample.json
# Estimate one metric branch
nexagauge estimate grounding --input sample.json --limit 100
# Estimate a specific row slice
nexagauge estimate eval --input sample.json --start 100 --end 200
# Estimate with a different global model
nexagauge estimate eval --input sample.json --llm-model openai/gpt-4o
# Estimate one node with a stronger model
nexagauge estimate eval \
--input sample.json \
--llm-model openai/gpt-4o-mini \
--llm-model redteam=openai/gpt-4o
# Estimate using a different split (Hugging Face sources)
nexagauge estimate eval --input hf://<dataset_id> --split validation
# Estimate as if nothing were cached
nexagauge estimate eval --input sample.json --no-cache
# Estimate using an isolated cache directory
nexagauge estimate eval --input sample.json --cache-dir ./.tmp-cache
cache
# Print the active cache directory
nexagauge cache dir
# Preview what would be deleted
nexagauge cache delete --dry-run
# Delete the default cache after confirmation
nexagauge cache delete
# Delete without prompt
nexagauge cache delete --yes
# Delete a custom cache directory
nexagauge cache delete --cache-dir ./.tmp-cache --yes
Hugging Face
nexagauge run and nexagauge estimate can read directly from a Hugging Face dataset by passing hf://<dataset-id> to --input. The flags below tune that path.
| Flag | Applies to | Purpose |
|---|---|---|
| --input hf://<dataset-id> | run, estimate | Source the dataset from the Hugging Face Hub instead of a local file. |
| --adapter | run, estimate | Force a specific adapter. Defaults to auto; valid values: auto, local, huggingface. |
| --hf-config | run, estimate | Optional Hugging Face dataset config name (for datasets with multiple configs). |
| --hf-revision | run, estimate | Optional dataset revision/tag/commit to pin a specific version. |
| --split | run, estimate | Dataset split for adapter-backed sources. Default is train. |
| --field LOGICAL=COLUMN | run, estimate | Map a row column to a canonical logical input. Repeatable. Logical keys: case_id, output, input, reference, context. |
| --extension-file <path> | run, estimate | Path to a Python file with @register_* decorators (transforms today; prompts/metrics in the future). Repeatable; each file is imported once before iteration. |
| --transform <name> | run, estimate | Name of a registered transform to apply per record before scanning. Use for datasets whose shape can't be fixed by --field alone. See Extensions. |
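To illustrate what a --field LOGICAL=COLUMN mapping does conceptually, here is a minimal sketch. It is an assumption about the behavior, not the adapter's actual code: parse_field_flags and apply_mapping are hypothetical names, and the sample row is invented for the example. Only the five logical keys come from the table above.

```python
LOGICAL_KEYS = {"case_id", "output", "input", "reference", "context"}


def parse_field_flags(values: list[str]) -> dict[str, str]:
    """Turn repeated LOGICAL=COLUMN values into a {logical: column} mapping."""
    mapping: dict[str, str] = {}
    for value in values:
        logical, sep, column = value.partition("=")
        if not sep or not column:
            raise ValueError(f"expected LOGICAL=COLUMN, got {value!r}")
        if logical not in LOGICAL_KEYS:
            raise ValueError(f"unknown logical key: {logical!r}")
        mapping[logical] = column
    return mapping


def apply_mapping(row: dict, mapping: dict[str, str]) -> dict:
    """Copy mapped columns into their logical keys, leaving other columns intact."""
    mapped = dict(row)
    for logical, column in mapping.items():
        mapped[logical] = row[column]
    return mapped


mapping = parse_field_flags(["output=output_text", "input=user_prompt"])
row = {"output_text": "Paris.", "user_prompt": "Capital of France?"}  # invented sample row
print(apply_mapping(row, mapping))
```

A flat rename like this is all --field expresses; when a dataset needs nested fields unpacked or combined, that is the case for --transform.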
Examples
# Run relevance on a dataset whose canonical fields already match
nexagauge run relevance \
--input hf://sentence-transformers/natural-questions \
--limit 25 \
--output-dir ./data/hg_exp_relevance
# Pin a specific dataset config and revision
nexagauge run grounding \
--input hf://wandb/RAGTruth-processed \
--hf-config default \
--hf-revision main \
--limit 10
# Force the Hugging Face adapter (auto-detection bypassed)
nexagauge run claims \
--input hf://<my-org/private-dataset> \
--adapter huggingface
# Map dataset columns to nexa-gauge logical fields
nexagauge run redteam \
--input hf://mteb/toxic_conversations_50k \
--field output=text \
--limit 3 \
--output-dir ./data/hg_exp_toxicity
# Multiple --field mappings in one run
nexagauge run relevance \
--input hf://<my-org/my-dataset> \
--field output=output_text \
--field input=user_prompt \
--field reference=expected_answer
# Same --field mappings work with estimate
nexagauge estimate eval \
--input hf://<my-org/my-dataset> \
--field output=output_text \
--field input=user_prompt
# Reshape a nested dataset schema with a Python transform
nexagauge run eval \
--input hf://hotpotqa/hotpot_qa \
--hf-config distractor \
--extension-file ./my_transforms.py \
--transform hotpot_qa
The built-in alias table (e.g. query/prompt already counts as input) is documented in Hugging Face datasets. Reach for --field only when your dataset uses a column name outside that table.
For end-to-end local and hosted endpoint setup examples, see Self-Hosted Endpoints.