CLI Reference

Use this page when you need exact command syntax and practical options. For a short walkthrough, start with the CLI Quick Start.

run

run executes a target node against your input data. Use it when you want real metric results, report files, cache writes, and eval summaries.

bash
nexagauge run <target_node> --input <source> [shared options] [run options]

Targets

| Target type | Nodes | Use when |
| --- | --- | --- |
| Utility | chunk, refiner, chunk_reference, refine_reference | You want intermediate artifacts for debugging or inspection. chunk_reference/refine_reference prepare the reference text for refalign. |
| Metric Ess | claims, geval_steps | You want intermediate artifacts for debugging or inspection. |
| Metric | relevance, grounding, redteam, geval, refmatch, refalign | You want one evaluation dimension. |
| Full eval | eval | You want all eligible metrics plus aggregate summaries and reports. |

scan and report are not direct CLI targets. scan runs automatically, and report is produced by run eval when report output is available.
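
A wrapper script can check a target name before calling the CLI. The sketch below is one way to do that and is not part of nexagauge: is_cli_target is a hypothetical helper, and its node list is copied by hand from the Targets section above, so it would need updating if the CLI gains or loses targets.

```bash
# Hypothetical helper: the node names are copied from the Targets section
# of this page, not queried from nexagauge itself.
is_cli_target() {
  case "$1" in
    chunk|refiner|chunk_reference|refine_reference) return 0 ;;      # utility nodes
    claims|geval_steps) return 0 ;;                                   # intermediate metric artifacts
    relevance|grounding|redteam|geval|refmatch|refalign) return 0 ;;  # single metrics
    eval) return 0 ;;                                                 # full eval
    *) return 1 ;;  # anything else, including scan and report
  esac
}

# Usage (assumes nexagauge is on PATH):
#   is_cli_target "$target" && nexagauge run "$target" --input sample.json
```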

Run-only options

| Option | Purpose |
| --- | --- |
| --output-dir, -o | Write run outputs. Creates case_report/ and metrics/ inside the directory. |
| --llm-concurrency | Global cap on concurrent LLM calls across all workers. Lower this first if you hit provider rate limits. |

estimate

estimate uses the same target and branch planning as run, but previews uncached cost without making billable provider calls.

bash
nexagauge estimate <target_node> --input <source> [shared options] [estimate options]

Use estimate before run when you want to understand likely spend for a branch, dataset slice, or model-routing change.
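
One way to make the estimate-first habit routine is a small wrapper that previews cost and asks before spending. This is a sketch, not a nexagauge feature: estimate_then_run is a hypothetical function, it assumes nexagauge is on PATH, and any extra arguments should be limited to options that both commands accept, because they are passed to each.

```bash
# Hypothetical wrapper: preview cost with estimate, then ask before running.
# Only the documented estimate and run subcommands are invoked.
estimate_then_run() {
  local target="$1" input="$2" answer=""
  shift 2
  # Extra arguments (for example --limit 50) go to both commands so the
  # estimate covers the same rows and models as the run.
  nexagauge estimate "$target" --input "$input" "$@" || return 1
  printf 'Proceed with run? [y/N] ' >&2
  read -r answer
  case "$answer" in
    y|Y) nexagauge run "$target" --input "$input" "$@" ;;
    *) echo "Skipped run." ;;
  esac
}

# Usage:
#   estimate_then_run grounding sample.json --limit 50
```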

Estimate-only options

| Option | Purpose |
| --- | --- |
| --cache-dir | Use a specific cache directory for this estimate. Useful when comparing against an isolated cache. |

The estimate table reports the target branch by node, including cached work, uncached work, eligible uncached cases, and total estimated cost.

Shared run / estimate arguments

These options apply to both run and estimate.

| Option | Purpose |
| --- | --- |
| --input, -i | Required. Local input file or hf://<dataset-id> source. See Data Schema. |
| --start | Start row index, inclusive. |
| --end | End row index, exclusive. |
| --limit, -n | Maximum number of rows to process. |
| --llm-model MODEL | Set the global primary judge model. |
| --llm-model NODE=MODEL | Override the primary model for one node in the target branch. Repeat as needed. |
| --llm-fallback MODEL | Set the global fallback model. |
| --llm-fallback NODE=MODEL | Override the fallback model for one node in the target branch. |
| --host-model-url URL | Route all branch nodes through one OpenAI-compatible endpoint URL (must end in /v1). |
| --host-model-api-key KEY | Optional key used with --host-model-url. For localhost endpoints, nexa-gauge defaults to local when omitted. |
| --chunker | Chunking strategy used by the chunk utility node. |
| --refiner | Chunk refinement strategy (default: mmr). |
| --refiner-top-k | Maximum number of chunks kept after refinement. |
| --continue-on-error | Continue processing remaining cases after a case failure. This is the default. |
| --fail-fast | Stop on the first failed case. |
| --max-workers | Number of cases processed concurrently. Default is 1. |
| --max-in-flight | Backpressure limit for submitted-but-not-yet-emitted cases. Useful with --max-workers > 1. |
| --force | Ignore cache reads but still write new cache entries. |
| --no-cache | Disable both cache reads and writes. |
| --debug | Print per-node debug logs. For estimate, the progress bar is hidden while debug logs are enabled. |
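
Because --start is inclusive and --end is exclusive, consecutive windows can share a boundary without overlapping or skipping a row. The helper below sketches that arithmetic. row_windows is a hypothetical name, and the total row count is something you supply, since this page does not describe a command that reports it.

```bash
# Hypothetical helper: print half-open row windows covering rows [0, total).
# Each output line is a flag pair for --start (inclusive) and --end (exclusive).
row_windows() {
  local total="$1" size="$2" start=0 end=0
  [ "$size" -gt 0 ] || return 1   # a zero or negative size would never advance
  while [ "$start" -lt "$total" ]; do
    end=$((start + size))
    if [ "$end" -gt "$total" ]; then
      end="$total"                # clamp the last window to the dataset size
    fi
    echo "--start $start --end $end"
    start="$end"                  # the next window begins where this one ended
  done
}

# Usage: estimate a 250-row file in windows of 100 rows.
# $window is left unquoted on purpose so it splits into four arguments.
#   row_windows 250 100 | while read -r window; do
#     nexagauge estimate eval --input sample.json $window
#   done
```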

cache

Cache commands inspect and remove node-level cache artifacts.

bash
nexagauge cache dir
nexagauge cache delete [options]

Use cache dir to see the active cache location. Use cache delete --dry-run before deleting so you can confirm the size and file count.

Cache options

| Option | Purpose |
| --- | --- |
| --dry-run | Print what would be deleted without removing files. |
| --yes, -y | Delete without the confirmation prompt. |
| --cache-dir | Delete a specific cache directory instead of the default. |

Cache location is resolved in this order: --cache-dir, then NEXAGAUGE_CACHE_DIR, then the per-user default cache path.
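
The lookup order can be mirrored in a script that needs to predict which cache a command will touch. The function below is an illustrative sketch only: resolve_cache_dir is a hypothetical name, treating an empty NEXAGAUGE_CACHE_DIR as unset is an assumption, and the final fallback is a placeholder string because this page does not state the per-user default path. Running nexagauge cache dir remains the authoritative answer.

```bash
# Hypothetical mirror of the documented lookup order.
# Pass the value you would give to --cache-dir as the first argument, if any.
resolve_cache_dir() {
  local flag_value="${1:-}"
  if [ -n "$flag_value" ]; then
    echo "$flag_value"               # 1. explicit --cache-dir value
  elif [ -n "${NEXAGAUGE_CACHE_DIR:-}" ]; then
    echo "$NEXAGAUGE_CACHE_DIR"      # 2. environment variable
  else
    echo "<per-user default>"        # 3. placeholder, not a real path
  fi
}

# Usage:
#   resolve_cache_dir ./.tmp-cache
#   NEXAGAUGE_CACHE_DIR=/data/cache resolve_cache_dir
```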

Examples

run

bash
# Run the full eval branch and write reports
nexagauge run eval --input sample.json --output-dir ./report

# Run one metric branch on the first 50 rows
nexagauge run grounding --input sample.json --limit 50

# Inspect claim extraction only
nexagauge run claims --input sample.json --limit 5 --debug

# Use one global judge model
nexagauge run eval --input sample.json --llm-model openai/gpt-4o-mini

# Route to a locally served llama.cpp model
nexagauge run eval \
  --input sample.json \
  --host-model-url http://localhost:8080/v1 \
  --llm-concurrency 1 \
  --max-in-flight 1

# Use a stronger model only for grounding and refmatch
nexagauge run eval \
  --input sample.json \
  --llm-model openai/gpt-4o-mini \
  --llm-model grounding=openai/gpt-4o \
  --llm-model refmatch=openai/gpt-4o

# Tune chunking/refinement behavior
nexagauge run grounding \
  --input sample.json \
  --chunker recursive \
  --refiner mmr \
  --refiner-top-k 5

# Run with more case-level parallelism while limiting provider pressure
nexagauge run eval \
  --input sample.json \
  --max-workers 4 \
  --max-in-flight 8 \
  --llm-concurrency 16

# Force a fresh run while still updating cache
nexagauge run relevance --input sample.json --force

# Run without reading or writing cache
nexagauge run redteam --input sample.json --no-cache

# Stop immediately if a case fails
nexagauge run eval --input sample.json --fail-fast

estimate

bash
# Estimate full eval cost
nexagauge estimate eval --input sample.json

# Estimate one metric branch
nexagauge estimate grounding --input sample.json --limit 100

# Estimate a specific row slice
nexagauge estimate eval --input sample.json --start 100 --end 200

# Estimate with a different global model
nexagauge estimate eval --input sample.json --llm-model openai/gpt-4o

# Estimate one node with a stronger model
nexagauge estimate eval \
  --input sample.json \
  --llm-model openai/gpt-4o-mini \
  --llm-model redteam=openai/gpt-4o

# Estimate using a different split (Hugging Face sources)
nexagauge estimate eval --input hf://<dataset-id> --split validation

# Estimate as if nothing were cached
nexagauge estimate eval --input sample.json --no-cache

# Estimate using an isolated cache directory
nexagauge estimate eval --input sample.json --cache-dir ./.tmp-cache

cache

bash
# Print the active cache directory
nexagauge cache dir

# Preview what would be deleted
nexagauge cache delete --dry-run

# Delete the default cache after confirmation
nexagauge cache delete

# Delete without prompt
nexagauge cache delete --yes

# Delete a custom cache directory
nexagauge cache delete --cache-dir ./.tmp-cache --yes

Hugging Face

nexagauge run and nexagauge estimate can read directly from a Hugging Face dataset by passing hf://<dataset-id> to --input. The flags below tune that path.

| Flag | Applies to | Purpose |
| --- | --- | --- |
| --input hf://<dataset-id> | run, estimate | Source the dataset from the Hugging Face Hub instead of a local file. |
| --adapter | run, estimate | Force a specific adapter. Defaults to auto; valid values: auto, local, huggingface. |
| --hf-config | run, estimate | Optional Hugging Face dataset config name (for datasets with multiple configs). |
| --hf-revision | run, estimate | Optional dataset revision/tag/commit to pin a specific version. |
| --split | run, estimate | Dataset split for adapter-backed sources. Default is train. |
| --field LOGICAL=COLUMN | run, estimate | Map a row column to a canonical logical input. Repeatable. Logical keys: case_id, output, input, reference, context. |
| --extension-file <path> | run, estimate | Path to a Python file with @register_* decorators (transforms today; prompts/metrics in the future). Repeatable; each file is imported once before iteration. |
| --transform <name> | run, estimate | Name of a registered transform to apply per record before scanning. Use for datasets whose shape can't be fixed by --field alone. See Extensions. |
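
When a script assembles several --field mappings, it can help to validate them before the CLI sees them. The following is a minimal sketch under stated assumptions: field_args is a hypothetical helper, the accepted logical keys are the five listed for --field above, and column names are assumed to contain no whitespace.

```bash
# Hypothetical helper: validate LOGICAL=COLUMN pairs and print one
# "--field LOGICAL=COLUMN" line per pair. Fails on the first bad pair.
field_args() {
  local pair logical
  for pair in "$@"; do
    case "$pair" in
      *=?*) ;;  # has an '=' followed by a non-empty column name
      *) echo "expected LOGICAL=COLUMN, got: $pair" >&2; return 1 ;;
    esac
    logical="${pair%%=*}"   # text before the first '='
    case "$logical" in
      case_id|output|input|reference|context)
        printf '%s %s\n' --field "$pair" ;;
      *) echo "unknown logical key: $logical" >&2; return 1 ;;
    esac
  done
}

# Usage: capture first so a validation failure stops the script, then let
# the shell split $flags into separate arguments.
#   flags="$(field_args output=output_text input=user_prompt)" || exit 1
#   nexagauge run relevance --input "hf://<my-org/my-dataset>" $flags
```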

Examples

bash
# Run relevance on a dataset whose canonical fields already match
nexagauge run relevance \
  --input hf://sentence-transformers/natural-questions \
  --limit 25 \
  --output-dir ./data/hg_exp_relevance

# Pin a specific dataset config and revision
nexagauge run grounding \
  --input hf://wandb/RAGTruth-processed \
  --hf-config default \
  --hf-revision main \
  --limit 10

# Force the Hugging Face adapter (auto-detection bypassed)
nexagauge run claims \
  --input hf://<my-org/private-dataset> \
  --adapter huggingface

# Map dataset columns to nexa-gauge logical fields
nexagauge run redteam \
  --input hf://mteb/toxic_conversations_50k \
  --field output=text \
  --limit 3 \
  --output-dir ./data/hg_exp_toxicity

# Multiple --field mappings in one run
nexagauge run relevance \
  --input hf://<my-org/my-dataset> \
  --field output=output_text \
  --field input=user_prompt \
  --field reference=expected_answer

# Same --field mappings work with estimate
nexagauge estimate eval \
  --input hf://<my-org/my-dataset> \
  --field output=output_text \
  --field input=user_prompt

# Reshape a nested dataset schema with a Python transform
nexagauge run eval \
  --input hf://hotpotqa/hotpot_qa \
  --hf-config distractor \
  --extension-file ./my_transforms.py \
  --transform hotpot_qa

The built-in alias table (e.g. query/prompt already counts as input) is documented in Hugging Face datasets. Reach for --field only when your dataset uses a column name outside that table.

For end-to-end local and hosted endpoint setup examples, see Self-Hosted Endpoints.