CLI Reference

Use this page when you need exact command syntax and practical options. For a short walkthrough, start with the CLI Quick Start.

run

run executes a target node against your input data. Use it when you want real metric results, report files, cache writes, and eval summaries.

bash

nexagauge run <target_node> --input <source> [shared options] [run options]

Targets

Target type	Nodes	Use when
Utility	`chunk`, `refiner`, `chunk_reference`, `refine_reference`	You want intermediate artifacts for debugging or inspection. `chunk_reference`/`refine_reference` prepare the `reference` text for `refalign`.
Metric Ess	`claims`, `geval_steps`	You want intermediate artifacts for debugging or inspection.
Metric	`relevance`, `grounding`, `redteam`, `geval`, `refmatch`, `refalign`	You want one evaluation dimension.
Full eval	`eval`	You want all eligible metrics plus aggregate summaries and reports.

scan and report are not direct CLI targets. scan runs automatically, and report is produced by run eval when report output is available.

Run-only options

Option	Purpose
`--output-dir`, `-o`	Write run outputs. Creates `case_report/` and `metrics/` inside the directory.
`--llm-concurrency`	Global cap on concurrent LLM calls across all workers. Lower this first if you hit provider rate limits.
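
After a run finishes, a quick check that the expected subdirectories exist can catch a mistyped output path early. The sketch below relies only on what the table above states, that `--output-dir` creates `case_report/` and `metrics/`. The function name `check_run_outputs` is hypothetical and is not part of nexagauge.

```bash
# Hypothetical helper: confirm an output directory contains the two
# subdirectories that `run --output-dir` is documented to create.
check_run_outputs() {
  local out_dir="$1"
  local missing=0
  local sub
  for sub in case_report metrics; do
    if [ ! -d "$out_dir/$sub" ]; then
      echo "missing: $out_dir/$sub" >&2
      missing=1
    fi
  done
  # Returns 0 when both subdirectories exist, 1 otherwise.
  return "$missing"
}
```

One way to use it is to chain it after a run, for example `nexagauge run eval --input sample.json --output-dir ./report && check_run_outputs ./report`.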

estimate

estimate uses the same target and branch planning as run, but previews uncached cost without making billable provider calls.

bash

nexagauge estimate <target_node> --input <source> [shared options] [estimate options]

Use estimate before run when you want to understand likely spend for a branch, dataset slice, or model-routing change.
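
The estimate-then-run workflow can be wrapped in a small shell function so both commands receive identical arguments. This is a minimal sketch, not a nexagauge feature: the function name `estimate_then_run` and the `NEXAGAUGE_BIN` variable are hypothetical and exist only so the command can be swapped out.

```bash
# Sketch: preview cost with `estimate`, then reuse the same arguments for `run`.
# NEXAGAUGE_BIN is a hypothetical override added for this example only;
# it is not a nexagauge setting.
estimate_then_run() {
  local target="$1"
  shift
  local bin="${NEXAGAUGE_BIN:-nexagauge}"

  # Same target and shared options for both commands.
  "$bin" estimate "$target" "$@" || return 1

  # Ask before making billable calls; the prompt goes to stderr.
  printf 'Proceed with run? [y/N] ' >&2
  local answer=""
  read -r answer || true
  case "$answer" in
    y|Y) "$bin" run "$target" "$@" ;;
    *)   echo "Skipped run." ;;
  esac
}
```

A call might look like `estimate_then_run eval --input sample.json --limit 50`. Pass only shared options through this wrapper, since run-only and estimate-only options are not necessarily accepted by the other command.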

Estimate-only options

Option	Purpose
`--cache-dir`	Use a specific cache directory for this estimate. Useful when comparing against an isolated cache.

The estimate table reports the target branch by node, including cached work, uncached work, eligible uncached cases, and total estimated cost.

Shared `run` / `estimate` arguments

These options apply to both run and estimate.

Option	Purpose
`--input`, `-i`	Required. Local input file or `hf://<dataset-id>` source. See Data Schema.
`--start`	Start row index, inclusive.
`--end`	End row index, exclusive.
`--limit`, `-n`	Maximum number of rows to process.
`--llm-model MODEL`	Set the global primary judge model.
`--llm-model NODE=MODEL`	Override the primary model for one node in the target branch. Repeat as needed.
`--llm-fallback MODEL`	Set the global fallback model.
`--llm-fallback NODE=MODEL`	Override the fallback model for one node in the target branch.
`--host-model-url URL`	Route all branch nodes through one OpenAI-compatible endpoint URL (must end in `/v1`).
`--host-model-api-key KEY`	Optional key used with `--host-model-url`. For localhost endpoints, nexa-gauge defaults to `local` when omitted.
`--chunker`	Chunking strategy used by the `chunk` utility node.
`--refiner`	Chunk refinement strategy (default: `mmr`).
`--refiner-top-k`	Maximum number of chunks kept after refinement.
`--continue-on-error`	Continue processing remaining cases after a case failure. This is the default.
`--fail-fast`	Stop on the first failed case.
`--max-workers`	Number of cases processed concurrently. Default is `1`.
`--max-in-flight`	Backpressure limit for submitted-but-not-yet-emitted cases. Useful when `--max-workers` is greater than `1`.
`--force`	Ignore cache reads but still write new cache entries.
`--no-cache`	Disable both cache reads and writes.
`--debug`	Print per-node debug logs. For `estimate`, the progress bar is hidden while debug logs are enabled.
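
Because `--llm-model` accepts both a bare `MODEL` and repeated `NODE=MODEL` values, it can help to reason about which model a given node ends up with. The function below is an illustrative sketch of that resolution, not nexagauge's actual implementation. It assumes a per-node entry takes precedence over the global one and that the last matching value wins; the name `resolve_model` is hypothetical.

```bash
# Sketch of how repeated --llm-model values could resolve for one node.
# Assumption: a NODE=MODEL entry overrides the bare MODEL entry,
# and later values replace earlier ones.
resolve_model() {
  local node="$1"
  shift
  local global=""
  local override=""
  local value
  for value in "$@"; do
    case "$value" in
      *=*)
        # NODE=MODEL form: keep it only if it targets this node.
        if [ "${value%%=*}" = "$node" ]; then
          override="${value#*=}"
        fi
        ;;
      *)
        # Bare MODEL form: the global primary model.
        global="$value"
        ;;
    esac
  done
  # Prefer the per-node override, otherwise fall back to the global model.
  printf '%s\n' "${override:-$global}"
}
```

For example, `resolve_model grounding openai/gpt-4o-mini grounding=openai/gpt-4o` prints the per-node model, while the same call for `relevance` prints the global one.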

cache

Cache commands inspect and remove node-level cache artifacts.

bash

nexagauge cache dir
nexagauge cache delete [options]

Use cache dir to see the active cache location. Use cache delete --dry-run before deleting so you can confirm the size and file count.

Cache options

Option	Purpose
`--dry-run`	Print what would be deleted without removing files.
`--yes`, `-y`	Delete without the confirmation prompt.
`--cache-dir`	Delete a specific cache directory instead of the default.

Cache location is resolved in this order: --cache-dir, then NEXAGAUGE_CACHE_DIR, then the per-user default cache path.
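
The precedence above can be expressed as a short shell function. This is a sketch for reasoning about the order only. The function name `resolve_cache_dir` is hypothetical, the per-user default path is passed in as a placeholder because it is not stated on this page, and treating an empty `NEXAGAUGE_CACHE_DIR` as unset is an assumption.

```bash
# Sketch of the documented precedence:
#   1. --cache-dir value, 2. NEXAGAUGE_CACHE_DIR, 3. per-user default.
# $1 is the --cache-dir value (empty if the flag was not given).
# $2 is a placeholder for the per-user default path.
resolve_cache_dir() {
  local flag_value="$1"
  local default_dir="$2"
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"
  elif [ -n "${NEXAGAUGE_CACHE_DIR:-}" ]; then
    # Assumption: an empty variable is treated the same as an unset one.
    printf '%s\n' "$NEXAGAUGE_CACHE_DIR"
  else
    printf '%s\n' "$default_dir"
  fi
}
```

In practice, `nexagauge cache dir` remains the authoritative way to see which location is active.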

Examples

run

bash

# Run the full eval branch and write reports
nexagauge run eval --input sample.json --output-dir ./report

# Run one metric branch on the first 50 rows
nexagauge run grounding --input sample.json --limit 50

# Inspect claim extraction only
nexagauge run claims --input sample.json --limit 5 --debug

# Use one global judge model
nexagauge run eval --input sample.json --llm-model openai/gpt-4o-mini

# Route to a locally served llama.cpp model
nexagauge run eval \
  --input sample.json \
  --host-model-url http://localhost:8080/v1 \
  --llm-concurrency 1 \
  --max-in-flight 1

# Use a stronger model only for grounding and refmatch
nexagauge run eval \
  --input sample.json \
  --llm-model openai/gpt-4o-mini \
  --llm-model grounding=openai/gpt-4o \
  --llm-model refmatch=openai/gpt-4o

# Tune chunking/refinement behavior
nexagauge run grounding \
  --input sample.json \
  --chunker recursive \
  --refiner mmr \
  --refiner-top-k 5

# Run with more case-level parallelism while limiting provider pressure
nexagauge run eval \
  --input sample.json \
  --max-workers 4 \
  --max-in-flight 8 \
  --llm-concurrency 16

# Force a fresh run while still updating cache
nexagauge run relevance --input sample.json --force

# Run without reading or writing cache
nexagauge run redteam --input sample.json --no-cache

# Stop immediately if a case fails
nexagauge run eval --input sample.json --fail-fast

estimate

bash

# Estimate full eval cost
nexagauge estimate eval --input sample.json

# Estimate one metric branch
nexagauge estimate grounding --input sample.json --limit 100

# Estimate a specific row slice (rows 100 through 199; --end is exclusive)
nexagauge estimate eval --input sample.json --start 100 --end 200

# Estimate with a different global model
nexagauge estimate eval --input sample.json --llm-model openai/gpt-4o

# Estimate one node with a stronger model
nexagauge estimate eval \
  --input sample.json \
  --llm-model openai/gpt-4o-mini \
  --llm-model redteam=openai/gpt-4o

bash

# Estimate using a different split (Hugging Face sources)
nexagauge estimate eval --input hf://<dataset-id> --split validation

# Estimate as if nothing were cached
nexagauge estimate eval --input sample.json --no-cache

# Estimate using an isolated cache directory
nexagauge estimate eval --input sample.json --cache-dir ./.tmp-cache

cache

bash

# Print the active cache directory
nexagauge cache dir

# Preview what would be deleted
nexagauge cache delete --dry-run

# Delete the default cache after confirmation
nexagauge cache delete

# Delete without prompt
nexagauge cache delete --yes

# Delete a custom cache directory
nexagauge cache delete --cache-dir ./.tmp-cache --yes

Hugging Face

nexagauge run and nexagauge estimate can read directly from a Hugging Face dataset when you pass hf://<dataset-id> to --input. The flags below control how that dataset is loaded and how its columns map to nexa-gauge inputs.

Flag	Applies to	Purpose
`--input hf://<dataset-id>`	run, estimate	Source the dataset from the Hugging Face Hub instead of a local file.
`--adapter`	run, estimate	Force a specific adapter. Defaults to `auto`; valid values: `auto`, `local`, `huggingface`.
`--hf-config`	run, estimate	Optional Hugging Face dataset config name (for datasets with multiple configs).
`--hf-revision`	run, estimate	Optional dataset revision/tag/commit to pin a specific version.
`--split`	run, estimate	Dataset split for adapter-backed sources. Default is `train`.
`--field LOGICAL=COLUMN`	run, estimate	Map a row column to a canonical logical input. Repeatable. Logical keys: `case_id`, `output`, `input`, `reference`, `context`.
`--extension-file <path>`	run, estimate	Path to a Python file with `@register_*` decorators (transforms today; prompts/metrics in the future). Repeatable; each file is imported once before iteration.
`--transform <name>`	run, estimate	Name of a registered transform to apply per record before scanning. Use for datasets whose shape can't be fixed by `--field` alone. See Extensions.
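
A `--field` value has the shape `LOGICAL=COLUMN`, and the logical key must be one of the five listed in the table. The helper below is a hypothetical pre-flight check for scripts that build these mappings dynamically. The name `valid_field_mapping` is not part of nexagauge, and rejecting an empty column name is an assumption about what the CLI would accept.

```bash
# Hypothetical pre-flight check for a --field value (LOGICAL=COLUMN).
# The accepted logical keys are the five listed in the table above.
valid_field_mapping() {
  local mapping="$1"
  # The value must contain an '=' separator.
  case "$mapping" in
    *=*) ;;
    *) return 1 ;;
  esac
  local logical="${mapping%%=*}"
  local column="${mapping#*=}"
  # Assumption: an empty column name is not a usable mapping.
  [ -n "$column" ] || return 1
  case "$logical" in
    case_id|output|input|reference|context) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `valid_field_mapping output=text` succeeds, while `valid_field_mapping answer=text` fails because `answer` is not a logical key.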

Examples

bash

# Run relevance on a dataset whose columns already match the canonical fields
nexagauge run relevance \
  --input hf://sentence-transformers/natural-questions \
  --limit 25 \
  --output-dir ./data/hg_exp_relevance

# Pin a specific dataset config and revision
nexagauge run grounding \
  --input hf://wandb/RAGTruth-processed \
  --hf-config default \
  --hf-revision main \
  --limit 10

# Force the Hugging Face adapter (auto-detection bypassed)
nexagauge run claims \
  --input hf://<my-org/private-dataset> \
  --adapter huggingface

# Map dataset columns to nexa-gauge logical fields
nexagauge run redteam \
  --input hf://mteb/toxic_conversations_50k \
  --field output=text \
  --limit 3 \
  --output-dir ./data/hg_exp_toxicity

# Multiple --field mappings in one run
nexagauge run relevance \
  --input hf://<my-org/my-dataset> \
  --field output=output_text \
  --field input=user_prompt \
  --field reference=expected_answer

# Same --field mappings work with estimate
nexagauge estimate eval \
  --input hf://<my-org/my-dataset> \
  --field output=output_text \
  --field input=user_prompt

# Reshape a nested dataset schema with a Python transform
nexagauge run eval \
  --input hf://hotpotqa/hotpot_qa \
  --hf-config distractor \
  --extension-file ./my_transforms.py \
  --transform hotpot_qa

The built-in alias table (for example, query and prompt are already treated as input) is documented in Hugging Face datasets. Use --field only when your dataset uses a column name outside that table.

For end-to-end local and hosted endpoint setup examples, see Self-Hosted Endpoints.
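
Since `--host-model-url` requires a URL that ends in `/v1`, a script that assembles endpoint URLs may want to check that before invoking nexagauge. The function below is a minimal sketch with a hypothetical name, `valid_host_model_url`. It reads the requirement literally, so a trailing slash after `/v1` is rejected; whether the CLI itself tolerates one is not stated here.

```bash
# Hypothetical check that an endpoint URL ends in /v1,
# as the --host-model-url option requires.
valid_host_model_url() {
  case "$1" in
    */v1) return 0 ;;
    *)    return 1 ;;
  esac
}
```

With this check, `valid_host_model_url http://localhost:8080/v1` succeeds and `valid_host_model_url http://localhost:8080` fails.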
