Troubleshooting

Common issues and recovery steps

Ollama Not Running

Symptom: Health check shows Ollama as "unhealthy". Chat fails or times out.

Fix:

# Start Ollama (in a separate terminal)
ollama serve

# Pull a model if needed
ollama pull llama3.2:3b

Backend Not Reachable

Symptom: CLI or web UI cannot connect. "Connection refused" or similar.

Fix:

  • Ensure the backend is running: pnpm dev:web
  • Check the API URL: CLI defaults to http://localhost:4317
  • If port 4317 was busy, the installer or dev server may have started on the next free port; use the printed URL, or pass it to the CLI with --api-url

pnpm install Fails (EPERM, Windows)

Symptom: pnpm install fails with permission errors on Windows.

Fix:

  • Run the terminal as Administrator
  • Or install on a different drive (e.g. D:) if C: has restrictions
  • Or temporarily disable antivirus scanning for the install directory

Tests Fail from Root (Workspace Resolution)

Symptom: Root test runs fail with package import resolution errors (for example Cannot find package '@nous/shared') or discover unexpected dist/ test files.

Fix:

  • Run root tests through the workspace command: pnpm test
  • Run package-local tests with each package script: pnpm --filter <package-name> test
  • Avoid ad-hoc root invocation (pnpm exec vitest run) unless you explicitly pass the intended config file

Config Validation Errors

Symptom: Startup fails with ConfigError listing validation failures.

Fix:

  • Check the config file (JSON5) for syntax errors
  • Ensure required fields are present: profile, pfcTier, providers, storage, etc.
  • Remove the config file to regenerate defaults, or fix the reported fields
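
As a quick pre-flight check, the sketch below reports which of the required top-level fields named above are missing from an already-parsed config object. It is only an illustration: `findMissingFields` and `REQUIRED_FIELDS` are hypothetical names, the list covers just the four fields named in this section (the real schema requires more), and it does not reproduce the actual ConfigError validation.

```typescript
// Hypothetical helper, not part of the Nous codebase.
// Only the four fields named in this section are checked; the real schema has more.
const REQUIRED_FIELDS = ["profile", "pfcTier", "providers", "storage"] as const;

function findMissingFields(config: Record<string, unknown>): string[] {
  // A field counts as missing when it is absent, undefined, or null.
  return REQUIRED_FIELDS.filter(
    (field) => config[field] === undefined || config[field] === null,
  );
}

// Example with made-up values: pfcTier and storage are absent.
const missingFields = findMissingFields({ profile: "example", providers: {} });
console.log(missingFields);
```

An empty result only means these four fields exist; the ConfigError output remains the authoritative list of validation failures.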

Voice Request Stays in Text Confirmation or Blocked State

Symptom: A voice-originated action does not execute even though the requested intent seems clear.

Meaning: Phase 11.3 keeps risky voice actions behind canonical confidence-governance and confirmation checks. The runtime can return clarification, text confirmation, dual-channel confirmation, or blocked posture instead of executing directly.

Fix:

  • Check whether the request is high-risk, destructive, or T3; those actions require text or dual-channel confirmation.
  • If the response mentions low confidence, repeat the request more explicitly or continue in text.
  • If the active Principal session is missing or stale, complete confirmation from a current trusted session instead of retrying voice-only.
  • Treat the returned reason code and confirmation posture as canonical runtime truth.

Voice Degraded Mode Remains Active

Symptom: Voice responses keep directing you to continue in text, or MAO/other surfaces show voice degraded mode even after one successful turn.

Meaning: Degraded mode is a safety posture. Phase 11.3 keeps risky controls text-first until recovery is sustained; a single improved turn does not automatically clear the degraded state.

Fix:

  • Continue risky control actions in text while degraded mode is active.
  • Check whether recent turns included low ASR confidence, low intent confidence, handoff instability, transport degradation, or interruption recovery.
  • After conditions stabilize, start a fresh voice turn and let the runtime clear degraded mode through the sustained recovery path.
  • If degraded mode persists unexpectedly, inspect the voice session projection and witness-linked evidence rather than forcing execution.

Cloud Model List Shows Stale or Missing Models

Symptom: The model picker shows a minimal list with a staleness indicator, or expected cloud models do not appear.

Meaning: The dynamic /v1/models API call for that provider failed or returned an error. Nous falls back to a minimal static list when the live API is unreachable.

Fix:

  • Check that the provider API key is stored (Configuration > Provider Keys). Providers without a stored key are skipped silently.
  • Verify network connectivity to the provider API (api.anthropic.com or api.openai.com).
  • Reopen the configuration panel to trigger a fresh fetch; cached responses expire after 5 minutes.
  • If the provider API is experiencing an outage, the fallback list is expected behavior until the API recovers.
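
The behaviour described above (skip providers without a key, reuse a cached response for up to 5 minutes, fall back to a static list when the live call fails) can be sketched as follows. All names here (`chooseModelListSource`, `resolveModels`, `ModelCacheEntry`) are hypothetical and the cache shape is an assumption; only the 5-minute expiry and the fallback rule come from this page.

```typescript
// Hypothetical sketch; these names are illustrative, not the actual Nous API.
const MODEL_CACHE_TTL_MS = 5 * 60 * 1000; // cached responses expire after 5 minutes

interface ModelCacheEntry {
  models: string[];
  fetchedAtMs: number;
}

type ModelListSource = "skipped" | "cache" | "refetch";

function chooseModelListSource(
  hasStoredKey: boolean,
  cached: ModelCacheEntry | undefined,
  nowMs: number,
): ModelListSource {
  if (!hasStoredKey) return "skipped"; // providers without a stored key are skipped
  if (cached !== undefined && nowMs - cached.fetchedAtMs < MODEL_CACHE_TTL_MS) {
    return "cache"; // assumed: a fresh cache entry is reused instead of refetching
  }
  return "refetch"; // expired or never fetched: a live call is attempted
}

// When the live call fails (null), use the static list and flag it as stale.
function resolveModels(
  live: string[] | null,
  fallback: string[],
): { models: string[]; stale: boolean } {
  return live !== null
    ? { models: live, stale: false }
    : { models: fallback, stale: true };
}
```

This is why reopening the panel within five minutes of a failed fetch may still show the fallback list.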

Model Selection Does Not Take Effect

Symptom: After selecting a new model, chat responses still use the previous model.

Fix:

  • Verify the selection was saved successfully (no error banner in the configuration panel).
  • Model selections are applied immediately at runtime. If the selected model spec is invalid or malformed, it is rejected without corrupting provider config — check the configuration panel for error feedback.
  • If the selected model is from a cloud provider, ensure the provider key is still configured and valid.

First-Run Loop

Symptom: First-run flow keeps appearing.

Fix: First-run completes when you send a message and receive a response, or when a project exists. Ensure Ollama is running so the health check and model invocation succeed.

Memory Inspector Shows a Global-Scope Warning

Symptom: The /memory page shows a reason-code banner after you switch Scope to Global only or Project + global, and global entries do not appear.

Meaning: The selected project either does not inherit global memory or the runtime denied global-scope inspection. This is an explicit policy result, not an empty-state guess.

Fix:

  • Read the banner's reason code and explanation first.
  • If you only need project-local memory, switch Scope back to Project only.
  • If you expected global visibility, check the project's memory-access policy and whether inheritsGlobal is enabled.
  • If the denial should not have happened, inspect trace and audit evidence for the underlying policy decision rather than retrying blindly.

Memory Export Includes More Than the Current Filtered View

Symptom: The exported memory bundle contains entries that are not visible in the current inspector result list.

Meaning: This is expected. The export flow always returns the full project memory bundle, including STM context, durable entries, audit history, and tombstones. Search and filter controls narrow the on-screen inspection view only.

Fix:

  • Use the inspector filters to review a subset.
  • Use export when you need the authoritative full-state bundle.
  • If you need a narrower dataset for analysis, filter the exported bundle after download.

Hard Delete Cannot Be Confirmed or Is Rejected

Symptom: The Memory inspector refuses to proceed with hard delete, or the result banner says the delete was not applied.

Meaning: Hard delete is confirmation-protected and requires a non-empty rationale before the request is sent. Even with a rationale, the governed mutation can still deny or defer the action.

Fix:

  • Enter a clear rationale before selecting Confirm hard delete.
  • Read the returned reason code in the result banner.
  • If the action was not applied, inspect memory.audit for the authoritative mutation outcome and evidence refs.
  • Treat deny or defer outcomes as runtime truth; clear the governing blocker instead of retrying without changes.

Learning Visibility Shows Missing Source or Evidence Diagnostics

Symptom: The /memory Learning detail view reports missing source records, missing evidence refs, degraded lineage integrity, or unavailable control-state context.

Meaning: The selected distilled pattern references canonical inputs that are no longer fully resolvable in the current project view. Phase 8.8 surfaces those gaps explicitly instead of hiding them.

Fix:

  • Read the diagnostic first; it is part of the contract, not a cosmetic warning.
  • Check whether the referenced source records were superseded, deleted, or created in an older state that no longer has full evidence linkage.
  • Use linked trace IDs, audit history, and evidence refs to determine whether the pattern should be refreshed, replaced, or retired through the governed runtime path.
  • Do not assume the missing input can be reconstructed from the UI alone.

Learning Visibility Says Governance Cards Are Representative

Symptom: The Learning view notes that lifecycle events are derived, governance cards are representative, or historicalDecisionLogAvailable is false.

Meaning: Phase 8.8 projects the current canonical confidence-governance outcome for a pattern, but it does not yet persist a per-pattern historical decision ledger in workflow runtime history.

Fix:

  • Use the card's outcome and reasonCode to understand the current governance posture of the pattern.
  • Use traces, audit evidence, and linked provenance when you need proof of a specific past decision.
  • Treat representative cards as interpretation aids for current behavior, not as historical proof that a particular runtime action already happened.

Dispatch or Action Blocked (WMODE-*)

Symptom: A workflow dispatch or lifecycle action is blocked with a reason code such as WMODE-002, WMODE-003, or WMODE-010.

Meaning: The admission guard rejected the action because it would violate the authority chain or workmode boundaries.

| Code | Cause |
| --- | --- |
| WMODE-002 | Authority widening: e.g. a worker attempted to dispatch, or an orchestrator tried to dispatch to cortex |
| WMODE-003 | Nested orchestration: an orchestrator tried to dispatch to another orchestrator |
| WMODE-010 | Worker escalation: a worker attempted to dispatch to an authoritative agent |

Fix: The action is not permitted by design. Check that the dispatch source and target respect the authority chain: nous_cortex → orchestration_agent → worker_agent. See Workmode and Authority Boundaries.
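
To reason about which code a given dispatch would produce, the following sketch classifies a source/target pair against the three codes above. It is an interpretation, not the admission guard itself: `classifyDispatch` is a hypothetical name, it assumes "authoritative agent" means nous_cortex or orchestration_agent, and it returns "undocumented" for pairs this page does not describe.

```typescript
// Illustrative classifier for the three WMODE codes on this page.
// Agent names come from the authority chain; everything else is hypothetical.
type AgentKind = "nous_cortex" | "orchestration_agent" | "worker_agent";
type DispatchVerdict =
  | "allowed"
  | "WMODE-002"
  | "WMODE-003"
  | "WMODE-010"
  | "undocumented";

function classifyDispatch(source: AgentKind, target: AgentKind): DispatchVerdict {
  if (source === "worker_agent") {
    // Assumption: a worker dispatching to another worker is authority widening,
    // while a worker reaching an authoritative agent is worker escalation.
    return target === "worker_agent" ? "WMODE-002" : "WMODE-010";
  }
  if (source === "orchestration_agent") {
    if (target === "orchestration_agent") return "WMODE-003"; // nested orchestration
    if (target === "nous_cortex") return "WMODE-002"; // dispatching up the chain
    return "allowed"; // orchestration_agent -> worker_agent
  }
  // source is nous_cortex
  if (target === "orchestration_agent") return "allowed";
  return "undocumented"; // this page does not say how these pairs are handled
}
```

For example, `classifyDispatch("orchestration_agent", "orchestration_agent")` yields WMODE-003, matching the nested-orchestration row.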

Ingress Rejection (Phase 5.3)

Symptom: Automation trigger (scheduler, hook, webhook) returns rejected with a reason code.

Reason codes and fixes:

| Code | Cause | Fix |
| --- | --- | --- |
| unauthenticated | Webhook HMAC failed or missing | Verify HMAC signature; ensure key_id and auth_context_ref are correct |
| scope_mismatch | Principal not bound to workflow | Check credential scope matches project_id and workflow_ref |
| event_forbidden | Event type not in allowlist | Add event_name to credential's allowed_event_names |
| policy_blocked | Policy blocks this trigger class | Review project/workflow policy for external trigger allowance |
| replay_detected | Stale timestamp or duplicate nonce | Ensure occurred_at within +/- 5 min; use unique nonce per request |
| rate_limited | Rate limit exceeded | Wait and retry with backoff; reduce request frequency |
| invalid_envelope | Missing project_id, workflow_ref, or invalid trigger_type | Fix envelope; ensure all required fields present |
| control_state_blocked | Project hard_stopped or paused_review | Resume project or release hard_stop via Projects UI / operator-control |
| workflow_admission_blocked | Ingress was valid but canonical workflow admission still failed | Inspect the returned reason_code, project lookup, and workflow configuration before retrying |

See Automation Gateway for full operator guidance.
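
When debugging replay_detected on the sending side, it can help to run the same two checks locally before a request goes out. The sketch below is a hypothetical pre-check, not the gateway's implementation: `checkReplay` is an invented name, and it assumes occurred_at is an ISO 8601 string and that the sender keeps its own set of used nonces.

```typescript
// Hypothetical sender-side pre-check for the replay_detected rule.
// The gateway's real logic is not shown on this page.
const REPLAY_WINDOW_MS = 5 * 60 * 1000; // occurred_at must be within +/- 5 minutes

function checkReplay(
  occurredAt: string, // assumed to be an ISO 8601 timestamp
  nonce: string,
  seenNonces: Set<string>,
  nowMs: number,
): "ok" | "stale_timestamp" | "duplicate_nonce" {
  const occurredMs = Date.parse(occurredAt);
  // An unparseable timestamp is treated like a stale one.
  if (isNaN(occurredMs) || Math.abs(nowMs - occurredMs) > REPLAY_WINDOW_MS) {
    return "stale_timestamp";
  }
  if (seenNonces.has(nonce)) return "duplicate_nonce";
  seenNonces.add(nonce); // remember the nonce so a resend is caught
  return "ok";
}
```

A stale_timestamp result usually points at clock skew between sender and gateway; a duplicate_nonce result points at a retry that reused the original envelope.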

Workflow Cannot Start Because the Definition Is Missing or Invalid

Symptom: A workflow start request returns workflow_definition_unavailable or workflow_definition_invalid, and no new run_id is created.

Meaning: The runtime could not resolve or validate the canonical workflow definition stored in the target project. This fails closed before any run starts.

| Code | Cause | Fix |
| --- | --- | --- |
| workflow_definition_unavailable | The project has no matching workflow definition or no valid default workflow reference | Check the project's stored workflow configuration and default workflow selection |
| workflow_definition_invalid | The canonical workflow definition failed validation | Correct the graph definition, then retry after validation passes |

Common validation failures include:

  • Cycles in the node graph
  • Dangling edges that reference missing nodes
  • Missing or invalid entry nodes
  • Duplicate node or edge identities

Do not retry the same start request until the canonical project workflow definition is repaired.
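
The four common validation failures listed above can be checked offline before resubmitting a definition. The following is a minimal sketch under stated assumptions: the `WfGraph`, `WfNode`, and `WfEdge` shapes and the `validateGraph` function are hypothetical and do not reflect the canonical workflow schema or the runtime's validator.

```typescript
// Minimal sketch of the four checks listed in this section.
// The graph shape is an assumption, not the canonical Nous workflow schema.
interface WfNode { id: string }
interface WfEdge { id: string; from: string; to: string }
interface WfGraph { entryNodeIds: string[]; nodes: WfNode[]; edges: WfEdge[] }

function validateGraph(graph: WfGraph): string[] {
  const problems: string[] = [];
  const nodeIds = new Set<string>();
  for (const node of graph.nodes) {
    if (nodeIds.has(node.id)) problems.push(`duplicate node: ${node.id}`);
    nodeIds.add(node.id);
  }
  const edgeIds = new Set<string>();
  const outgoing = new Map<string, string[]>();
  for (const edge of graph.edges) {
    if (edgeIds.has(edge.id)) problems.push(`duplicate edge: ${edge.id}`);
    edgeIds.add(edge.id);
    if (!nodeIds.has(edge.from) || !nodeIds.has(edge.to)) {
      problems.push(`dangling edge: ${edge.id}`);
      continue; // keep dangling edges out of the cycle search
    }
    outgoing.set(edge.from, (outgoing.get(edge.from) ?? []).concat(edge.to));
  }
  if (graph.entryNodeIds.length === 0) problems.push("missing entry node");
  for (const entry of graph.entryNodeIds) {
    if (!nodeIds.has(entry)) problems.push(`invalid entry node: ${entry}`);
  }
  // Depth-first search: reaching a node that is still being visited means a cycle.
  const visitState = new Map<string, "visiting" | "done">();
  const hasCycleFrom = (id: string): boolean => {
    if (visitState.get(id) === "visiting") return true;
    if (visitState.get(id) === "done") return false;
    visitState.set(id, "visiting");
    const cyclic = (outgoing.get(id) ?? []).some((next) => hasCycleFrom(next));
    visitState.set(id, "done");
    return cyclic;
  };
  if (Array.from(nodeIds).some((id) => hasCycleFrom(id))) problems.push("cycle detected");
  return problems;
}
```

An empty result from a local check like this does not guarantee the runtime will accept the definition; the returned workflow_definition_invalid details remain authoritative.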

Workflow Admission Blocked Before Run Creation (Phase 9.1)

Symptom: A start or automation request clears transport-level checks, but the workflow still does not start and returns an admission reason code.

Meaning: Workflow admission failed after definition resolution but before run creation. No canonical run state exists yet, so the fix is at the project, authority, or control-state layer.

| Code | Cause | Fix |
| --- | --- | --- |
| AUTH-SCOPE-MISMATCH | The request scope and resolved workflow definition do not match | Ensure the project/workflow identifiers point at the same stored definition |
| POL-CONTROL-STATE-BLOCKED | The project is hard stopped or otherwise blocked | Use operator-control to release the blocking state |
| POL-PAUSED-BLOCKED | The project is paused for review | Resume the project before retrying |
| OPCTL-INVALID-STATE | The request attempts to resume from an invalid control state | Fix the control-state transition first |
| WMODE-001 | Unsupported workmode for this admission path | Use a supported workmode configuration |
| WMODE-002 | Authority widening was attempted | Narrow the request back to the allowed authority chain |
| WMODE-003 | Nested orchestration was attempted | Use the standard orchestration lane rather than orchestrating inside orchestration |
| WMODE-010 | A worker attempted an authoritative start path | Reissue the request from the correct authority source |

Admission failures are authoritative. Use the returned reasonCode and evidenceRefs before retrying.

Scheduled Trigger Repeats or Never Advances (Phase 9.3)

Symptom: A cron/calendar schedule appears stuck, keeps returning duplicate outcomes, or never seems to fire again after a restart.

Meaning: Phase 9.3 persists schedule definitions and due cursors, but actual run truth still comes from ingress and workflow state.

Fix:

  • Check the schedule's nextDueAt, lastDispatchedAt, workflowDefinitionId, and workmodeId.
  • If ingress returns accepted_already_dispatched, treat that as duplicate-safe recovery of the original run rather than a failure.
  • If the schedule keeps returning workflow_admission_blocked, repair the target project's workflow configuration instead of forcing redispatch.
  • If nextDueAt is missing or stale after restart, let the scheduler recompute it from the canonical trigger definition before assuming work was skipped.

Artifact Retrieve/List Returns Nothing Even Though a Write Happened (Phase 9.3)

Symptom: An artifact write appears to succeed, but default retrieve or list calls return nothing.

Meaning: Phase 9.3 keeps prepared versions hidden from default visibility. Only committed versions are normal runtime truth.

Fix:

  • Check whether the artifact version is still prepared instead of committed.
  • If the producing workflow is still waiting on checkpoint_commit, do not treat the artifact as durably available yet.
  • Verify you are reading from the correct projectId; artifact IDs are not global bearer tokens.
  • If a prepared version never commits, inspect recovery or checkpoint evidence instead of replaying the same write blindly.
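
The visibility rule above can be pictured as a filter over artifact versions. This is an illustration only: the `ArtifactVersion` shape and both function names are hypothetical, based on the prepared/committed states and the projectId scoping described in this section.

```typescript
// Illustrative only: the version shape is an assumption based on the
// prepared/committed states described in this section.
interface ArtifactVersion {
  artifactId: string;
  projectId: string;
  state: "prepared" | "committed";
}

// Default reads: only committed versions in the caller's own project are visible.
function visibleVersions(all: ArtifactVersion[], projectId: string): ArtifactVersion[] {
  return all.filter((v) => v.projectId === projectId && v.state === "committed");
}

// Diagnostic view: writes that succeeded but are still waiting on a commit.
function pendingVersions(all: ArtifactVersion[], projectId: string): ArtifactVersion[] {
  return all.filter((v) => v.projectId === projectId && v.state === "prepared");
}
```

If a write shows up in the pending view but not the visible one, the problem is the missing commit, not the write.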

Artifact Integrity Mismatch or Corruption Signal (Phase 9.3)

Symptom: Artifact retrieval fails closed, or downstream tooling reports an integrity mismatch.

Meaning: The stored payload bytes no longer match the artifact manifest's integrityRef (sha256:<64-char-hex>). The runtime is deliberately refusing to return corrupted content.

Fix:

  • Treat the failure as authoritative; do not bypass it by reading the raw payload directly.
  • Check whether the payload document was interrupted, partially rewritten, or manually edited.
  • Recreate or recommit the artifact from the canonical workflow output if the checkpoint and evidence chain allow it.
  • If the producing run had uncertain external effects, escalate into review rather than assuming the artifact can be reconstructed safely.
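
To confirm a mismatch independently, compare a SHA-256 digest of the stored payload with the manifest's integrityRef. The helper below only does the comparison and format check; computing the digest is left to the caller. `matchesIntegrityRef` is a hypothetical name, and it assumes the ref uses lowercase hex.

```typescript
// Hypothetical helper: compares a manifest integrityRef with a digest the
// caller has already computed over the stored payload bytes.
// Assumption: the ref is written as sha256:<64 lowercase hex characters>.
const INTEGRITY_REF_PATTERN = /^sha256:([0-9a-f]{64})$/;

function matchesIntegrityRef(integrityRef: string, payloadSha256Hex: string): boolean {
  const match = INTEGRITY_REF_PATTERN.exec(integrityRef);
  if (match === null) return false; // malformed ref: fail closed
  return match[1] === payloadSha256Hex.toLowerCase();
}
```

A false result confirms the runtime's refusal; it is not a reason to read the raw payload anyway.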

Workflow Run State or Dispatch Lineage Is Hard to Interpret

Symptom: A run exists, but it appears stalled, paused, blocked, or unexpectedly branched.

Meaning: Phase 9.1 and Phase 9.2 expose canonical run, node, wait, checkpoint, correction, and dispatch-lineage state. Those records explain what happened without requiring the UI to infer workflow truth.

| Field | How to use it |
| --- | --- |
| status | Overall workflow state such as ready, running, waiting, blocked_review, paused, completed, or failed |
| readyNodeIds | Nodes eligible to dispatch now |
| waitingNodeIds | Nodes intentionally paused on a continuation path |
| blockedNodeIds | Nodes that forced the run into review-required posture |
| completedNodeIds | Nodes that have already finished |
| nodeStates[<nodeId>].status | Per-node state such as pending, ready, running, waiting, completed, blocked, or failed |
| nodeStates[<nodeId>].activeWaitState | Canonical wait details, including wait kind, reason code, evidence refs, and optional resumeToken |
| nodeStates[<nodeId>].correctionArcs | The corrective path recorded by the runtime (resume, retry, reprompt, or rollback) |
| lastPreparedCheckpointId / lastCommittedCheckpointId | Whether the run has only prepared checkpoint state or a durable checkpoint that is safe to continue from |
| selectedBranchKey / activatedEdgeIds | Which condition branch actually fired and which outbound edges became live |
| dispatchLineage[*].dispatchLineageId | Unique identifier for one dispatch attempt |
| parentNodeId / viaEdgeId | How the current node became ready |
| reasonCode / evidenceRefs | The authoritative explanation for a transition, pause, block, or failure |

Fix:

  • If the run is paused, inspect control-state and operator actions first.
  • If the run is waiting, inspect activeWaitState.kind, reasonCode, and resumeToken on the waiting node before dispatching anything else.
  • If a node is blocked or failed, use the reasonCode and evidenceRefs to identify the governing blocker.
  • If the run is blocked_review, inspect the newest correctionArcs entry before deciding whether the path is resume, retry, reprompt, or rollback.
  • If branching looks wrong, inspect parentNodeId and viaEdgeId in dispatch lineage before assuming the UI or scheduler is wrong.
  • If checkpoint behavior looks wrong, compare lastPreparedCheckpointId and lastCommittedCheckpointId before resuming.
  • If no nodes are ready, inspect upstream node states and graph structure rather than replaying the same dispatch.
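
The triage order above can be summarized as a small lookup from run status to the first record worth inspecting. The sketch is a reading aid, not a runtime API: `RunSnapshot` and `firstThingToInspect` are hypothetical, and only the field and status names are taken from this section.

```typescript
// Sketch of the triage order in this section. Field and status names follow
// the run-state fields on this page; the type and function are hypothetical.
type RunStatus =
  | "ready" | "running" | "waiting" | "blocked_review"
  | "paused" | "completed" | "failed";

interface RunSnapshot {
  status: RunStatus;
  readyNodeIds: string[];
  waitingNodeIds: string[];
  blockedNodeIds: string[];
}

function firstThingToInspect(run: RunSnapshot): string {
  switch (run.status) {
    case "paused":
      return "control-state and operator actions";
    case "waiting":
      return `activeWaitState on node(s): ${run.waitingNodeIds.join(", ")}`;
    case "blocked_review":
      return `newest correctionArcs entry on node(s): ${run.blockedNodeIds.join(", ")}`;
    case "failed":
      return "reasonCode and evidenceRefs on the failed node";
    case "completed":
      return "nothing: the run has finished";
    default:
      // ready or running
      return run.readyNodeIds.length === 0
        ? "upstream node states and graph structure"
        : "nothing blocking: ready nodes can dispatch";
  }
}
```

The point of the ordering is that control state and wait state are checked before any redispatch is considered.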

Workflow Is Waiting and Will Not Advance (Phase 9.2)

Symptom: A run remains waiting, and no new nodes dispatch even though the run is not failed.

Meaning: The runtime is intentionally holding progress on a canonical wait state. The wait kind tells you what must happen next.

| Wait kind | Meaning | Fix |
| --- | --- | --- |
| async_batch | External or long-running work has not been completed yet | Wait for the completion witness, then continue the same node with the current resumeToken |
| human_decision | A person must approve or reject the step | Submit the explicit decision instead of replaying the node |
| retry_backoff | Governance or retry policy deferred immediate progress | Clear the blocker or let the backoff condition resolve, then resume or reprompt the same node |
| checkpoint_commit | The node output exists, but the checkpoint commit is not durable yet | Wait for checkpoint commit completion and continue the same node rather than executing it again |

If the control state is paused_review or resuming, the wait can remain active with reason codes such as workflow_wait_paused_review or workflow_wait_resuming until the control-state transition is resolved.

Workflow Is Blocked for Review After Resume, Decision, or Retry (Phase 9.2)

Symptom: A run enters blocked_review after a continuation, human decision, or retry-related step.

Meaning: The runtime refused blind progress and recorded a correction posture in correctionArcs.

| Reason code | Cause | Fix |
| --- | --- | --- |
| workflow_continuation_token_mismatch | A stale or wrong continuation token was supplied | Re-read the waiting node and use its current resumeToken |
| workflow_resume_review_required | The previous attempt has unknown_external_effect, so blind resume is unsafe | Review the evidence and explicitly approve continuation before resuming |
| workflow_human_decision_rejected | The human-decision node was rejected | Follow the correction arc and rollback/reprompt path instead of replaying the same continuation |
| workflow_resume_denied_hard_stopped | The project became hard_stopped during resume | Clear the hard stop before trying to continue |
| workflow_retry_backoff_resolution_required | A retry/backoff path needs explicit operator action | Apply the indicated retry or reprompt path, then continue from the same run |

Treat correctionArcs as authoritative runtime state. They are not suggestions; they are the canonical record of how the workflow expects recovery to proceed.

Checkpoint Commit Stays Pending (Phase 9.2)

Symptom: The run stays waiting with activeWaitState.kind = checkpoint_commit, or lastPreparedCheckpointId changes while lastCommittedCheckpointId does not.

Meaning: The runtime prepared checkpoint state but has not yet durably committed it. Phase 9.2 does not allow blind continuation from provisional checkpoint state.

Fix:

  • Wait for the checkpoint commit to succeed, then continue the same node/run with the committed checkpoint reference.
  • Do not redispatch the node while checkpoint_commit is still active.
  • If commit never completes, inspect checkpoint-manager evidence and the node's reasonCode before deciding whether the path should remain waiting or be escalated into a review workflow.
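
A client that wants to avoid continuing from provisional state can compare the two checkpoint IDs before acting. The guard below is a sketch under a loud assumption: it treats the run as safe only when lastCommittedCheckpointId equals lastPreparedCheckpointId, which this page does not state explicitly. `CheckpointView` and `checkpointPosture` are hypothetical names.

```typescript
// Hypothetical guard: never continue from a checkpoint that is only prepared.
// Assumption: once the latest prepared checkpoint commits, both IDs are equal.
interface CheckpointView {
  lastPreparedCheckpointId?: string;
  lastCommittedCheckpointId?: string;
  activeWaitKind?: string; // activeWaitState.kind of the node, if it is waiting
}

type CheckpointPosture = "safe_to_continue" | "commit_pending" | "no_checkpoint";

function checkpointPosture(view: CheckpointView): CheckpointPosture {
  // An active checkpoint_commit wait always wins: do not redispatch.
  if (view.activeWaitKind === "checkpoint_commit") return "commit_pending";
  if (view.lastPreparedCheckpointId === undefined) return "no_checkpoint";
  return view.lastPreparedCheckpointId === view.lastCommittedCheckpointId
    ? "safe_to_continue"
    : "commit_pending";
}
```

A commit_pending result that persists is the cue to inspect checkpoint-manager evidence rather than to retry.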

Recovery Blocked or Failed (Phase 5.4)

Symptom: Workflow shows recovery_blocked_review_required or recovery_failed_hard_stop.

Meaning: The recovery system reached a terminal state that requires operator action or investigation.

| State | Cause | Fix |
| --- | --- | --- |
| recovery_blocked_review_required | unknown_external_effect flagged; retry blocked (missing idempotency); or policy/invariant violation | Review evidence; use operator-control to authorize resume or escalate |
| recovery_failed_hard_stop | Integrity mismatch; policy violation; or unrecoverable failure | Investigate logs and evidence; use operator-control to change control state; re-trigger if appropriate |

Fix: See Recovery Governance for full operator guidance.
