Troubleshooting

Ollama Not Running

Symptom: Health check shows Ollama as "unhealthy". Chat fails or times out.

Fix:

# Start Ollama (in a separate terminal)
ollama serve

# Pull a model if needed
ollama pull llama3.2:3b

Backend Not Reachable

Symptom: CLI or web UI cannot connect. "Connection refused" or similar.

Fix:

Ensure the backend is running: pnpm dev:web
Check the API URL: CLI defaults to http://localhost:4317
If 4317 was busy, the installer or dev server may have started on the next free port; use the URL it printed.
If the backend is running on any port other than 4317, pass its URL to the CLI with --api-url.

pnpm install Fails (EPERM, Windows)

Symptom: pnpm install fails with permission errors on Windows.

Fix:

Run the terminal as Administrator.
Or install on a different drive (e.g. D:) if C: has restrictions.
Temporarily disable antivirus scanning for the install directory.

Tests Fail from Root (Workspace Resolution)

Symptom: Root test runs fail with package import resolution errors (for example Cannot find package '@nous/shared') or discover unexpected dist/ test files.

Fix:

Run root tests through the workspace command: pnpm test
Run package-local tests with each package script: pnpm --filter <package-name> test
Avoid ad-hoc root invocation (pnpm exec vitest run) unless you explicitly pass the intended config file

Config Validation Errors

Symptom: Startup fails with ConfigError listing validation failures.

Fix:

Check the config file (JSON5) for syntax errors
Ensure required fields are present: profile, pfcTier, providers, storage, etc.
Remove the config file to regenerate defaults, or fix the reported fields
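
As a rough illustration of the required-field check, the sketch below reports which of the fields named above are absent from an already-parsed config object. The helper name missingRequiredFields and the flat object shape are assumptions for illustration; the real schema has more required fields (the "etc." above) and reports failures through ConfigError.

```typescript
// Field names taken from the list above; the real schema requires more.
const REQUIRED_FIELDS = ["profile", "pfcTier", "providers", "storage"] as const;

// Hypothetical helper: returns the listed required fields that are absent
// (undefined or null) from an already-parsed config object.
function missingRequiredFields(config: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter(
    (field) => config[field] === undefined || config[field] === null,
  );
}

const partialConfig: Record<string, unknown> = { profile: "default", providers: {} };
console.log(missingRequiredFields(partialConfig)); // ["pfcTier", "storage"]
```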

Voice Request Stays in Text Confirmation or Blocked State

Symptom: A voice-originated action does not execute even though the requested intent seems clear.

Meaning: Phase 11.3 keeps risky voice actions behind canonical confidence-governance and confirmation checks. The runtime can return clarification, text confirmation, dual-channel confirmation, or blocked posture instead of executing directly.

Fix:

Check whether the request is high-risk, destructive, or T3; those actions require text or dual-channel confirmation.
If the response mentions low confidence, repeat the request more explicitly or continue in text.
If the active Principal session is missing or stale, complete confirmation from a current trusted session instead of retrying voice-only.
Treat the returned reason code and confirmation posture as canonical runtime truth.

Voice Degraded Mode Remains Active

Symptom: Voice responses keep directing you to continue in text, or MAO/other surfaces show voice degraded mode even after one successful turn.

Meaning: Degraded mode is a safety posture. Phase 11.3 keeps risky controls text-first until recovery is sustained; a single improved turn does not automatically clear the degraded state.

Fix:

Continue risky control actions in text while degraded mode is active.
Check whether recent turns included low ASR confidence, low intent confidence, handoff instability, transport degradation, or interruption recovery.
After conditions stabilize, start a fresh voice turn and let the runtime clear degraded mode through the sustained recovery path.
If degraded mode persists unexpectedly, inspect the voice session projection and witness-linked evidence rather than forcing execution.

Cloud Model List Shows Stale or Missing Models

Symptom: The model picker shows a minimal list with a staleness indicator, or expected cloud models do not appear.

Meaning: The dynamic /v1/models API call for that provider failed or returned an error. Nous falls back to a minimal static list when the live API is unreachable.

Fix:

Check that the provider API key is stored (Configuration > Provider Keys). Providers without a stored key are skipped silently.
Verify network connectivity to the provider API (api.anthropic.com or api.openai.com).
Reopen the configuration panel to trigger a fresh fetch — cached responses expire after 5 minutes.
If the provider API is experiencing an outage, the fallback list is expected behavior until the API recovers.
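
The 5-minute expiry can be pictured as a simple time-to-live check. This is a minimal sketch, not the Nous cache implementation: the CachedModelList shape, the isFresh name, and the choice to treat an entry as expired at exactly 5 minutes are all assumptions.

```typescript
// 5 minutes, per the troubleshooting text above.
const MODEL_LIST_TTL_MS = 5 * 60 * 1000;

// Hypothetical shape of a cached provider model list.
interface CachedModelList {
  fetchedAtMs: number; // epoch milliseconds of the last successful fetch
  models: string[];
}

// True while the cached entry is still inside the TTL window.
function isFresh(entry: CachedModelList, nowMs: number): boolean {
  return nowMs - entry.fetchedAtMs < MODEL_LIST_TTL_MS;
}

const cached: CachedModelList = { fetchedAtMs: 0, models: ["example-model"] };
console.log(isFresh(cached, 4 * 60 * 1000)); // true
console.log(isFresh(cached, 6 * 60 * 1000)); // false
```

If the panel is reopened inside the window, a check like this would return the cached list, which is why a fix on the provider side may take a few minutes to show up.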

Model Selection Does Not Take Effect

Symptom: After selecting a new model, chat responses still use the previous model.

Fix:

Verify the selection was saved successfully (no error banner in the configuration panel).
Model selections are applied immediately at runtime. If the selected model spec is invalid or malformed, it is rejected without corrupting provider config — check the configuration panel for error feedback.
If the selected model is from a cloud provider, ensure the provider key is still configured and valid.

CLI Provider Session Restart Fails

Symptom: Chat fails with PROVIDER_SESSION_RESTART_FAILED, or a local CLI-backed provider fails at the start of a chat turn.

Meaning: The selected provider uses the agent-CLI protocol and the runtime could not start or reuse the chat-bound provider session. Nous does not silently fall back to one-shot execution for chat-bound CLI fixtures. This error applies to compatible persistent-process CLI providers that are validly assigned to a chat role.

If Codex CLI appears in a Cortex Chat/System provider-session failure, the root cause is a stale or bypassed role assignment — Codex CLI declares session_bound_command and is not compatible with Cortex Chat/System persistent-process roles. Fix the model role assignment in Configuration rather than troubleshooting the CLI session.

Fix (for compatible persistent-process CLI providers):

Confirm the provider CLI is installed and runnable from the same environment that starts Nous.
Confirm local CLI authentication is valid.
If another executable shadows the expected one, set the appropriate CLI binary environment variable (e.g. NOUS_CODEX_CLI_BIN or CODEX_CLI_BIN for Codex).
Retry the chat turn after fixing the CLI environment. A lost chat-bound session is recreated on the next turn for the same provider/session key.

Codex CLI Provider Uses One-Shot

Symptom: A Codex CLI invocation starts a fresh codex exec process.

Meaning: This is expected. Codex CLI declares session_bound_command — it is command/session-bound and does not expose a persistent_process protocol. Transient and batch work uses one-shot codex exec. Cortex Chat and Cortex System roles require persistent_process and must use a different compatible provider.

Fix: No recovery is needed. Codex CLI is designed for command-bound agent roles and one-shot batch work. If you need a CLI provider for Cortex Chat/System, assign a provider that declares persistent_process.

First-Run Loop

Symptom: First-run flow keeps appearing.

Fix: First-run completes when you send a message and receive a response, or when a project exists. Ensure Ollama is running so the health check and model invocation succeed.

Model Recommendation Shows "Cannot verify availability"

Symptom: During the first-run Model Download step, one or more recommendation cards (or the custom-spec input) shows a gray dot and the label Cannot verify availability.

Meaning: The wizard's HEAD probe to https://ollama.com/library/<model> timed out, was refused, or failed with a DNS, network, or 5xx error. Nous cannot prove the model is currently published, but it does not assume the model is missing: your local Ollama daemon may already have the model cached, or the registry may be reachable from the daemon even when the renderer cannot reach it.

Fix:

Check the machine's general internet connectivity (https://ollama.com/library should load in a browser).
If you are behind a corporate proxy or split-DNS, confirm ollama.com resolves and is reachable from the same network the wizard process uses.
The card is still selectable — pressing Download will attempt the pull through Ollama, which may succeed even when the wizard probe failed. If the pull also fails, the wizard surfaces the underlying Ollama error in place.
The probe result is cached for 30 minutes per spec. Restart the wizard (or call firstRun.resetWizard) to force a fresh check.

Model Recommendation Shows "Not currently available"

Symptom: A recommendation card or the custom-spec input shows an amber dot and the label Not currently available.

Meaning: The Ollama library returned 404 for that exact identifier, or the spec is malformed (empty, missing the ollama: prefix, or pointing to a non-Ollama provider). The model is not currently published at that exact spec.

Fix:

Pick a different recommendation card if one is shown as Available.
For custom specs, double-check the model identifier on https://ollama.com/library. A common cause is a tag that does not exist for that model line (for example llama3.2:8b; at the time of writing the 8B variant is published as llama3.1:8b).
The card is still selectable — Nous lets you try the download anyway. If the pull fails for the same 404 reason, the wizard surfaces the underlying Ollama error in place.
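
The three malformed-spec cases named under Meaning (empty, missing the ollama: prefix, non-Ollama provider) can be checked before any network probe. The sketch below is illustrative only: checkSpecShape, the SpecProblem labels, and the list of other provider prefixes are assumptions, not the wizard's actual validation code.

```typescript
type SpecProblem = "empty" | "missing_prefix" | "non_ollama_provider";

// Assumed prefixes for non-Ollama providers; the real set is not documented here.
const OTHER_PROVIDER_PREFIXES = ["openai", "anthropic"];

// Returns the shape problem with a model spec, or null if the shape looks valid.
// A null result says nothing about whether the model is actually published.
function checkSpecShape(spec: string): SpecProblem | null {
  const trimmed = spec.trim();
  if (trimmed.length === 0) return "empty";
  if (trimmed.startsWith("ollama:")) {
    // "ollama:" with nothing after it carries no model identifier.
    return trimmed.length > "ollama:".length ? null : "empty";
  }
  const colon = trimmed.indexOf(":");
  const prefix = colon === -1 ? trimmed : trimmed.slice(0, colon);
  return OTHER_PROVIDER_PREFIXES.includes(prefix)
    ? "non_ollama_provider"
    : "missing_prefix";
}

console.log(checkSpecShape("ollama:llama3.2:3b")); // null
console.log(checkSpecShape("llama3.2:3b")); // "missing_prefix"
```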

Model Recommendation Stays in "Validating availability"

Symptom: The amber animated dot and the label Validating availability never resolve to one of the other three states.

Meaning: The check is bounded by a 3-second timeout per spec, so this state should clear within seconds. If it persists, either the renderer never received the prerequisites response, or the underlying Ollama library probe is hung at the network layer in a way the timeout does not interrupt.

Fix:

Check the desktop log for [nous:first-run] validation map: <n> validated, <m> unavailable, <k> offline — if the line is present, the validation map was returned and a stuck 'pending' indicates a renderer-side bug rather than a network issue (please report).
If the log line is absent, the prerequisites query has not yet returned. Confirm the desktop backend (@nous/shared-server) is running and reachable.
Restart the wizard via firstRun.resetWizard to re-issue the prerequisites query from scratch.
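
When scanning a long desktop log, a small parser for the validation-map line can help confirm whether the map was returned. This is a sketch under the assumption that the line appears exactly in the format quoted above; the function name and the sample line are made up for illustration.

```typescript
interface ValidationMapCounts {
  validated: number;
  unavailable: number;
  offline: number;
}

// Matches the log format quoted in the first fix step above.
const VALIDATION_MAP_LINE =
  /\[nous:first-run\] validation map: (\d+) validated, (\d+) unavailable, (\d+) offline/;

// Returns the three counts, or null when the line is not a validation-map line.
function parseValidationMapLine(line: string): ValidationMapCounts | null {
  const match = VALIDATION_MAP_LINE.exec(line);
  if (match === null) return null;
  return {
    validated: Number(match[1]),
    unavailable: Number(match[2]),
    offline: Number(match[3]),
  };
}

// Hypothetical sample line in the documented format.
const sample = "[nous:first-run] validation map: 3 validated, 1 unavailable, 0 offline";
console.log(parseValidationMapLine(sample));
```

A non-null result means the prerequisites query returned, so a card still stuck in the validating state points at the renderer rather than the network.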

No Welcome Appears on First Chat Open

Symptom: You finished the first-run wizard and opened the chat panel for the first time, but no welcome message appeared.

Meaning: The welcome is a one-shot composition gated by the persisted agent.welcomeMessageSent flag. There are three common reasons it would not appear:

The model provider was unreachable when chat opened, so the composition failed and the flag stayed false. The welcome will retry on the next chat panel mount.
The flag was already true from a prior session — for example you completed the wizard once before, the welcome already fired, and re-opening chat short-circuits.
The renderer's mount-once guard latched but the underlying mutation never completed (for example, the desktop backend was unreachable at the moment of mount).

Fix:

Verify the model provider is healthy. If you are using a local Ollama model, run ollama serve and confirm the recommended model is pulled. If you are using a cloud provider, confirm the API key is configured (Configuration > Provider Keys) and reachable.
Reopen the chat panel after restoring provider health — the welcome retries on the next mount when the flag is still false.
If you want to re-experience the welcome from a clean state, call the firstRun.resetWizard mutation (see tRPC Procedures § firstRun). This clears the entire agent block including welcomeMessageSent; after re-completing the wizard, the next chat panel mount fires a fresh welcome composed against whatever identity you configure.
If the renderer mount-once guard appears stuck, fully restart the desktop app — the guard is per-mount and resets on app restart.
The chat is never blocked by a welcome failure. If the rest of chat works normally, the missing welcome is cosmetic only, and a resetWizard recovers it on the next first-run.

See also Chat § Welcome Message (First Open) for the welcome's lifecycle and reset semantics.

Memory Inspector Shows a Global-Scope Warning

Symptom: The /memory page shows a reason-code banner after you switch Scope to Global only or Project + global, and global entries do not appear.

Meaning: The selected project either does not inherit global memory or the runtime denied global-scope inspection. This is an explicit policy result, not an empty-state guess.

Fix:

Read the banner's reason code and explanation first.
If you only need project-local memory, switch Scope back to Project only.
If you expected global visibility, check the project's memory-access policy and whether inheritsGlobal is enabled.
If the denial should not have happened, inspect trace and audit evidence for the underlying policy decision rather than retrying blindly.

Memory Export Includes More Than the Current Filtered View

Symptom: The exported memory bundle contains entries that are not visible in the current inspector result list.

Meaning: This is expected. The export flow always returns the full project memory bundle, including STM context, durable entries, audit history, and tombstones. Search and filter controls narrow the on-screen inspection view only.

Fix:

Use the inspector filters to review a subset.
Use export when you need the authoritative full-state bundle.
If you need a narrower dataset for analysis, filter the exported bundle after download.

Hard Delete Cannot Be Confirmed or Is Rejected

Symptom: The Memory inspector refuses to proceed with hard delete, or the result banner says the delete was not applied.

Meaning: Hard delete is confirmation-protected and requires a non-empty rationale before the request is sent. Even with a rationale, the governed mutation can still deny or defer the action.

Fix:

Enter a clear rationale before selecting Confirm hard delete.
Read the returned reason code in the result banner.
If the action was not applied, inspect memory.audit for the authoritative mutation outcome and evidence refs.
Treat deny or defer outcomes as runtime truth; clear the governing blocker instead of retrying without changes.

Learning Visibility Shows Missing Source or Evidence Diagnostics

Symptom: The /memory Learning detail view reports missing source records, missing evidence refs, degraded lineage integrity, or unavailable control-state context.

Meaning: The selected distilled pattern references canonical inputs that are no longer fully resolvable in the current project view. Phase 8.8 surfaces those gaps explicitly instead of hiding them.

Fix:

Read the diagnostic first; it is part of the contract, not a cosmetic warning.
Check whether the referenced source records were superseded, deleted, or created in an older state that no longer has full evidence linkage.
Use linked trace IDs, audit history, and evidence refs to determine whether the pattern should be refreshed, replaced, or retired through the governed runtime path.
Do not assume the missing input can be reconstructed from the UI alone.

Learning Visibility Says Governance Cards Are Representative

Symptom: The Learning view notes that lifecycle events are derived, governance cards are representative, or historicalDecisionLogAvailable is false.

Meaning: Phase 8.8 projects the current canonical confidence-governance outcome for a pattern, but it does not yet persist a per-pattern historical decision ledger in workflow runtime history.

Fix:

Use the card's outcome and reasonCode to understand the current governance posture of the pattern.
Use traces, audit evidence, and linked provenance when you need proof of a specific past decision.
Treat representative cards as interpretation aids for current behavior, not as historical proof that a particular runtime action already happened.

Dispatch or Action Blocked (WMODE-*)

Symptom: A workflow dispatch or lifecycle action is blocked with a reason code such as WMODE-002, WMODE-003, or WMODE-010.

Meaning: The admission guard rejected the action because it would violate the authority chain or workmode boundaries.

Code	Cause
`WMODE-002`	Authority widening, e.g. a worker attempted to dispatch, or an orchestrator tried to dispatch to cortex
`WMODE-003`	Nested orchestration: an orchestrator tried to dispatch to another orchestrator
`WMODE-010`	Worker escalation: a worker attempted to dispatch to an authoritative agent

Fix: The action is not permitted by design. Check that the dispatch source and target respect the authority chain: nous_cortex → orchestration_agent → worker_agent. See Workmode and Authority Boundaries.
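
To reason about which code a given source/target pair would produce, the table can be read as a small decision function. The sketch below is an interpretation, not the admission guard itself: it assumes "authoritative agent" means nous_cortex or orchestration_agent, and it assumes a worker dispatching to another worker counts as authority widening. A null result only means none of the three table rows matched; it does not prove the dispatch is admitted.

```typescript
type AgentClass = "nous_cortex" | "orchestration_agent" | "worker_agent";
type WmodeCode = "WMODE-002" | "WMODE-003" | "WMODE-010";

// Hypothetical classifier mirroring the three causes in the table above.
function classifyDispatch(source: AgentClass, target: AgentClass): WmodeCode | null {
  if (source === "worker_agent") {
    // Worker -> authoritative agent is WMODE-010 in the table; any other
    // worker dispatch is treated here as authority widening (assumption).
    return target === "worker_agent" ? "WMODE-002" : "WMODE-010";
  }
  if (source === "orchestration_agent") {
    if (target === "orchestration_agent") return "WMODE-003"; // nested orchestration
    if (target === "nous_cortex") return "WMODE-002"; // dispatching up the chain
  }
  return null; // no row of the table applies
}

console.log(classifyDispatch("orchestration_agent", "orchestration_agent")); // "WMODE-003"
console.log(classifyDispatch("nous_cortex", "orchestration_agent")); // null
```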

Ingress Rejection (Phase 5.3)

Symptom: Automation trigger (scheduler, hook, webhook) returns rejected with a reason code.

Reason codes and fixes:

Code	Cause	Fix
`unauthenticated`	Webhook HMAC failed or missing	Verify HMAC signature; ensure key_id and auth_context_ref are correct
`scope_mismatch`	Principal not bound to workflow	Check credential scope matches project_id and workflow_ref
`event_forbidden`	Event type not in allowlist	Add event_name to credential's allowed_event_names
`policy_blocked`	Policy blocks this trigger class	Review project/workflow policy for external trigger allowance
`replay_detected`	Stale timestamp or duplicate nonce	Ensure occurred_at within +/- 5 min; use unique nonce per request
`rate_limited`	Rate limit exceeded	Wait and retry with backoff; reduce request frequency
`invalid_envelope`	Missing project_id, workflow_ref, or invalid trigger_type	Fix envelope; ensure all required fields present
`control_state_blocked`	Project hard_stopped or paused_review	Resume project or release hard_stop via Projects UI / operator-control
`workflow_admission_blocked`	Ingress was valid but canonical workflow admission still failed	Inspect the returned `reason_code`, project lookup, and workflow configuration before retrying

See Automation Gateway for full operator guidance.
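
For replay_detected, the two conditions in the table (timestamp within +/- 5 minutes, nonce never seen before) can be sketched as follows. The function name, the millisecond timestamps, and the in-memory Set of nonces are assumptions for illustration; the gateway's actual storage and clock handling are not described here.

```typescript
// +/- 5 minutes, per the replay_detected row above.
const REPLAY_WINDOW_MS = 5 * 60 * 1000;

// Hypothetical check: rejects stale timestamps and reused nonces.
// Records the nonce only when the request is accepted.
function checkReplay(
  occurredAtMs: number,
  nonce: string,
  nowMs: number,
  seenNonces: Set<string>,
): "ok" | "replay_detected" {
  const outsideWindow = Math.abs(nowMs - occurredAtMs) > REPLAY_WINDOW_MS;
  if (outsideWindow || seenNonces.has(nonce)) return "replay_detected";
  seenNonces.add(nonce);
  return "ok";
}

const seen = new Set<string>();
const now = 1_000_000_000;
console.log(checkReplay(now - 60_000, "nonce-1", now, seen)); // "ok"
console.log(checkReplay(now - 60_000, "nonce-1", now, seen)); // "replay_detected"
```

A sender that retries a failed request should therefore generate a new nonce and a current occurred_at rather than resending the original envelope.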

Workflow Cannot Start Because the Definition Is Missing or Invalid

Symptom: A workflow start request returns workflow_definition_unavailable or workflow_definition_invalid, and no new run_id is created.

Meaning: The runtime could not resolve or validate the canonical workflow definition stored in the target project. This fails closed before any run starts.

Code	Cause	Fix
`workflow_definition_unavailable`	The project has no matching workflow definition or no valid default workflow reference	Check the project's stored workflow configuration and default workflow selection
`workflow_definition_invalid`	The canonical workflow definition failed validation	Correct the graph definition, then retry after validation passes

Common validation failures include:

Cycles in the node graph
Dangling edges that reference missing nodes
Missing or invalid entry nodes
Duplicate node or edge identities

Do not retry the same start request until the canonical project workflow definition is repaired.
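
The four validation failures listed above can be checked locally before retrying. The sketch below is not the runtime's validator: the WorkflowGraph shape (entryNodeId, nodes, edges with from/to) and the problem strings are assumptions chosen for illustration.

```typescript
interface WorkflowNode { id: string }
interface WorkflowEdge { id: string; from: string; to: string }
interface WorkflowGraph {
  entryNodeId: string;
  nodes: WorkflowNode[];
  edges: WorkflowEdge[];
}

// Returns a list of human-readable problems; an empty list means none found.
function validateGraph(graph: WorkflowGraph): string[] {
  const problems: string[] = [];
  const nodeIds = new Set<string>();
  for (const node of graph.nodes) {
    if (nodeIds.has(node.id)) problems.push(`duplicate node id: ${node.id}`);
    nodeIds.add(node.id);
  }
  const edgeIds = new Set<string>();
  const outgoing = new Map<string, string[]>();
  for (const edge of graph.edges) {
    if (edgeIds.has(edge.id)) problems.push(`duplicate edge id: ${edge.id}`);
    edgeIds.add(edge.id);
    if (!nodeIds.has(edge.from) || !nodeIds.has(edge.to)) {
      problems.push(`dangling edge: ${edge.id}`);
      continue; // dangling edges are left out of the cycle search
    }
    outgoing.set(edge.from, [...(outgoing.get(edge.from) ?? []), edge.to]);
  }
  if (!nodeIds.has(graph.entryNodeId)) {
    problems.push(`missing entry node: ${graph.entryNodeId}`);
  }
  // Depth-first search: reaching a node that is still "visiting" means a cycle.
  const state = new Map<string, "visiting" | "done">();
  const hasCycleFrom = (id: string): boolean => {
    if (state.get(id) === "visiting") return true;
    if (state.get(id) === "done") return false;
    state.set(id, "visiting");
    for (const next of outgoing.get(id) ?? []) {
      if (hasCycleFrom(next)) return true;
    }
    state.set(id, "done");
    return false;
  };
  for (const id of Array.from(nodeIds)) {
    if (hasCycleFrom(id)) {
      problems.push("cycle detected");
      break;
    }
  }
  return problems;
}

const broken: WorkflowGraph = {
  entryNodeId: "a",
  nodes: [{ id: "a" }, { id: "b" }],
  edges: [
    { id: "e1", from: "a", to: "b" },
    { id: "e2", from: "b", to: "a" },
    { id: "e3", from: "b", to: "z" },
  ],
};
console.log(validateGraph(broken)); // ["dangling edge: e3", "cycle detected"]
```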

Workflow Admission Blocked Before Run Creation (Phase 9.1)

Symptom: A start or automation request clears transport-level checks, but the workflow still does not start and returns an admission reason code.

Meaning: Workflow admission failed after definition resolution but before run creation. No canonical run state exists yet, so the fix is at the project, authority, or control-state layer.

Code	Cause	Fix
`AUTH-SCOPE-MISMATCH`	The request scope and resolved workflow definition do not match	Ensure the project/workflow identifiers point at the same stored definition
`POL-CONTROL-STATE-BLOCKED`	The project is hard stopped or otherwise blocked	Use operator-control to release the blocking state
`POL-PAUSED-BLOCKED`	The project is paused for review	Resume the project before retrying
`OPCTL-INVALID-STATE`	The request attempts to resume from an invalid control state	Fix the control-state transition first
`WMODE-001`	Unsupported workmode for this admission path	Use a supported workmode configuration
`WMODE-002`	Authority widening was attempted	Narrow the request back to the allowed authority chain
`WMODE-003`	Nested orchestration was attempted	Use the standard orchestration lane rather than orchestrating inside orchestration
`WMODE-010`	A worker attempted an authoritative start path	Reissue the request from the correct authority source

Admission failures are authoritative. Use the returned reasonCode and evidenceRefs before retrying.

Scheduled Trigger Repeats or Never Advances (Phase 9.3)

Symptom: A cron/calendar schedule appears stuck, keeps returning duplicate outcomes, or never seems to fire again after a restart.

Meaning: Phase 9.3 persists schedule definitions and due cursors, but actual run truth still comes from ingress and workflow state.

Fix:

Check the schedule's nextDueAt, lastDispatchedAt, workflowDefinitionId, and workmodeId.
If ingress returns accepted_already_dispatched, treat that as duplicate-safe recovery of the original run rather than a failure.
If the schedule keeps returning workflow_admission_blocked, repair the target project's workflow configuration instead of forcing redispatch.
If nextDueAt is missing or stale after restart, let the scheduler recompute it from the canonical trigger definition before assuming work was skipped.

Artifact Retrieve/List Returns Nothing Even Though a Write Happened (Phase 9.3)

Symptom: An artifact write appears to succeed, but default retrieve or list calls return nothing.

Meaning: Phase 9.3 keeps prepared versions hidden from default visibility. Only committed versions are normal runtime truth.

Fix:

Check whether the artifact version is still prepared instead of committed.
If the producing workflow is still waiting on checkpoint_commit, do not treat the artifact as durably available yet.
Verify you are reading from the correct projectId; artifact IDs are not global bearer tokens.
If a prepared version never commits, inspect recovery or checkpoint evidence instead of replaying the same write blindly.

Artifact Integrity Mismatch or Corruption Signal (Phase 9.3)

Symptom: Artifact retrieval fails closed, or downstream tooling reports an integrity mismatch.

Meaning: The stored payload bytes no longer match the artifact manifest's integrityRef (sha256:<64-char-hex>). The runtime is deliberately refusing to return corrupted content.

Fix:

Treat the failure as authoritative; do not bypass it by reading the raw payload directly.
Check whether the payload document was interrupted, partially rewritten, or manually edited.
Recreate or recommit the artifact from the canonical workflow output if the checkpoint and evidence chain allow it.
If the producing run had uncertain external effects, escalate into review rather than assuming the artifact can be reconstructed safely.
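
When investigating a mismatch, it can help to confirm that the manifest's integrityRef is well-formed and compare it against a digest you computed yourself from the stored payload. This sketch covers only the parsing and comparison; the helper names are hypothetical, lowercase hex is assumed, and computing the SHA-256 digest of the payload is left to whatever tool you already trust.

```typescript
// sha256:<64-char-hex>, per the format given above (lowercase hex assumed).
const INTEGRITY_REF = /^sha256:([0-9a-f]{64})$/;

// Returns the hex digest from a well-formed integrityRef, or null otherwise.
function parseIntegrityRef(ref: string): string | null {
  const match = INTEGRITY_REF.exec(ref);
  return match === null ? null : match[1] ?? null;
}

// Fails closed: a malformed ref never matches any digest.
function matchesManifest(integrityRef: string, computedHexDigest: string): boolean {
  const expected = parseIntegrityRef(integrityRef);
  return expected !== null && expected === computedHexDigest.toLowerCase();
}

const digest = "ab".repeat(32); // placeholder 64-character hex string
console.log(matchesManifest(`sha256:${digest}`, digest)); // true
console.log(matchesManifest("sha256:short", digest)); // false
```

This is a diagnostic aid only; a matching digest computed out of band does not override a fail-closed result from the runtime.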

Workflow Run State or Dispatch Lineage Is Hard to Interpret

Symptom: A run exists, but it appears stalled, paused, blocked, or unexpectedly branched.

Meaning: Phase 9.1 and Phase 9.2 expose canonical run, node, wait, checkpoint, correction, and dispatch-lineage state. Those records explain what happened without requiring the UI to infer workflow truth.

Field	How to use it
`status`	Overall workflow state such as `ready`, `running`, `waiting`, `blocked_review`, `paused`, `completed`, or `failed`
`readyNodeIds`	Nodes eligible to dispatch now
`waitingNodeIds`	Nodes intentionally paused on a continuation path
`blockedNodeIds`	Nodes that forced the run into review-required posture
`completedNodeIds`	Nodes that have already finished
`nodeStates[<nodeId>].status`	Per-node state such as `pending`, `ready`, `running`, `waiting`, `completed`, `blocked`, or `failed`
`nodeStates[<nodeId>].activeWaitState`	Canonical wait details, including wait kind, reason code, evidence refs, and optional `resumeToken`
`nodeStates[<nodeId>].correctionArcs`	The corrective path recorded by the runtime (`resume`, `retry`, `reprompt`, or `rollback`)
`lastPreparedCheckpointId` / `lastCommittedCheckpointId`	Whether the run has only prepared checkpoint state or a durable checkpoint that is safe to continue from
`selectedBranchKey` / `activatedEdgeIds`	Which condition branch actually fired and which outbound edges became live
`dispatchLineage[*].dispatchLineageId`	Unique identifier for one dispatch attempt
`parentNodeId` / `viaEdgeId`	How the current node became ready
`reasonCode` / `evidenceRefs`	The authoritative explanation for a transition, pause, block, or failure

Fix:

If the run is paused, inspect control-state and operator actions first.
If the run is waiting, inspect activeWaitState.kind, reasonCode, and resumeToken on the waiting node before dispatching anything else.
If a node is blocked or failed, use the reasonCode and evidenceRefs to identify the governing blocker.
If the run is blocked_review, inspect the newest correctionArcs entry before deciding whether the path is resume, retry, reprompt, or rollback.
If branching looks wrong, inspect parentNodeId and viaEdgeId in dispatch lineage before assuming the UI or scheduler is wrong.
If checkpoint behavior looks wrong, compare lastPreparedCheckpointId and lastCommittedCheckpointId before resuming.
If no nodes are ready, inspect upstream node states and graph structure rather than replaying the same dispatch.
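
The checklist above can be condensed into a small triage helper that, given a run snapshot, lists what to inspect first. This is a reading aid rather than a runtime API: RunSnapshot includes only three of the fields from the table, and the function name and returned strings are made up for illustration.

```typescript
type RunStatus =
  | "ready" | "running" | "waiting" | "blocked_review"
  | "paused" | "completed" | "failed";

// Hypothetical subset of the run fields described in the table above.
interface RunSnapshot {
  status: RunStatus;
  readyNodeIds: string[];
  blockedNodeIds: string[];
}

// Returns the inspection steps from the checklist that apply to this run.
function thingsToInspect(run: RunSnapshot): string[] {
  const checks: string[] = [];
  if (run.status === "paused") {
    checks.push("control-state and operator actions");
  }
  if (run.status === "waiting") {
    checks.push("activeWaitState.kind, reasonCode, and resumeToken on the waiting node");
  }
  if (run.status === "blocked_review") {
    checks.push("newest correctionArcs entry");
  }
  if (run.status === "failed" || run.blockedNodeIds.length > 0) {
    checks.push("reasonCode and evidenceRefs on the blocked or failed node");
  }
  if (run.status !== "completed" && run.readyNodeIds.length === 0) {
    checks.push("upstream node states and graph structure");
  }
  return checks;
}

const stalled: RunSnapshot = { status: "blocked_review", readyNodeIds: [], blockedNodeIds: ["n3"] };
console.log(thingsToInspect(stalled));
```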

Workflow Is Waiting and Will Not Advance (Phase 9.2)

Symptom: A run remains waiting, and no new nodes dispatch even though the run is not failed.

Meaning: The runtime is intentionally holding progress on a canonical wait state. The wait kind tells you what must happen next.

Wait kind	Meaning	Fix
`async_batch`	External or long-running work has not been completed yet	Wait for the completion witness, then continue the same node with the current `resumeToken`
`human_decision`	A person must approve or reject the step	Submit the explicit decision instead of replaying the node
`retry_backoff`	Governance or retry policy deferred immediate progress	Clear the blocker or let the backoff condition resolve, then resume or reprompt the same node
`checkpoint_commit`	The node output exists, but the checkpoint commit is not durable yet	Wait for checkpoint commit completion and continue the same node rather than executing it again

If the control state is paused_review or resuming, the wait can remain active with reason codes such as workflow_wait_paused_review or workflow_wait_resuming until the control-state transition is resolved.

Workflow Is Blocked for Review After Resume, Decision, or Retry (Phase 9.2)

Symptom: A run enters blocked_review after a continuation, human decision, or retry-related step.

Meaning: The runtime refused blind progress and recorded a correction posture in correctionArcs.

Reason code	Cause	Fix
`workflow_continuation_token_mismatch`	A stale or wrong continuation token was supplied	Re-read the waiting node and use its current `resumeToken`
`workflow_resume_review_required`	The previous attempt has `unknown_external_effect`, so blind resume is unsafe	Review the evidence and explicitly approve continuation before resuming
`workflow_human_decision_rejected`	The human-decision node was rejected	Follow the correction arc and rollback/reprompt path instead of replaying the same continuation
`workflow_resume_denied_hard_stopped`	The project became `hard_stopped` during resume	Clear the hard stop before trying to continue
`workflow_retry_backoff_resolution_required`	A retry/backoff path needs explicit operator action	Apply the indicated retry or reprompt path, then continue from the same run

Treat correctionArcs as authoritative runtime state. They are not suggestions; they are the canonical record of how the workflow expects recovery to proceed.

Checkpoint Commit Stays Pending (Phase 9.2)

Symptom: The run stays waiting with activeWaitState.kind = checkpoint_commit, or lastPreparedCheckpointId changes while lastCommittedCheckpointId does not.

Meaning: The runtime prepared checkpoint state but has not yet durably committed it. Phase 9.2 does not allow blind continuation from provisional checkpoint state.

Fix:

Wait for the checkpoint commit to succeed, then continue the same node/run with the committed checkpoint reference.
Do not redispatch the node while checkpoint_commit is still active.
If commit never completes, inspect checkpoint-manager evidence and the node's reasonCode before deciding whether the path should remain waiting or be escalated into a review workflow.

Recovery Blocked or Failed (Phase 5.4)

Symptom: Workflow shows recovery_blocked_review_required or recovery_failed_hard_stop.

Meaning: The recovery system reached a terminal state that requires operator action or investigation.

State	Cause	Fix
`recovery_blocked_review_required`	`unknown_external_effect` flagged; retry blocked (missing idempotency); or policy/invariant violation	Review evidence; use operator-control to authorize resume or escalate
`recovery_failed_hard_stop`	Integrity mismatch; policy violation; or unrecoverable failure	Investigate logs and evidence; use operator-control to change control state; re-trigger if appropriate

Fix: See Recovery Governance for full operator guidance.
