# Kleisli.IO Documentation — Full Content

# kli

## Getting Started

### Configuration

## What `kli init` Sets Up

Running `kli init` in your project configures five things:

1. **MCP server** — Claude gets access to task management and pattern tools
2. **Hooks** — Automatic context injection at session start and during tool use
3. **Skills** — Domain knowledge that Claude loads when you invoke kli commands
4. **Commands** — Slash commands like `/kli:plan`, `/kli:implement`, `/kli:research`
5. **Agents** — Specialized sub-agents Claude spawns during workflows (reflector, curator, graph analyst, etc.)

You don't interact with any of these directly. They are infrastructure that Claude uses behind the scenes when you run slash commands like `/kli:plan` or `/kli:implement`.

## Hooks

kli installs Claude Code hooks that run automatically during sessions:

| Hook | Event | What It Does |
|------|-------|--------------|
| session-start | SessionStart | Registers session, detects parallel sessions, injects git state and active task context |
| tool-call | PostToolUse | Records tool usage events for behavioral fingerprinting and domain detection |
| session-task-write | PostToolUse | Writes session files when tasks are claimed or released |
| file-conflict | PostToolUse | Warns when you edit a file recently touched by another session |
| playbook-activate | UserPromptSubmit | Detects programming domains from prompts and nudges pattern retrieval |
| feedback-nudge | Stop | Reminds Claude to give feedback on activated patterns before stopping |
| session-leave | SessionEnd | Cleans up session files and records session departure |

See the [Hooks Reference](/kli/hooks) for detailed documentation on each hook.

## Skills

Skills are domain knowledge documents that Claude loads on demand:

| Skill | Loaded When You Run | What Claude Learns |
|-------|---------------------|--------------------|
| kli-research | `/kli:research` | How to explore codebases, select agents, define exit criteria |
| kli-planning | `/kli:plan` | How to design phases, define success criteria, reuse research |
| kli-implementation | `/kli:implement` | TDD methodology, verification gates, design principles |
| kli-reflection | `/kli:reflect` | Pattern extraction, feedback loops |
| kli-workflow | All commands | Phase transitions, artifact flow between commands |

## MCP Server

kli runs one MCP server alongside Claude Code. It gives Claude access to 31 tools for task management, pattern learning, and session coordination — you never call these tools yourself. It handles task creation, observations, graph queries, conflict detection between parallel sessions, pattern search, feedback scoring, and session fingerprinting. The server starts automatically and shuts down after idle timeout.

## CLAUDE.md

`kli init` adds a small section to your project's `CLAUDE.md` that tells Claude the task MCP server is available. You can customize this if needed, but the defaults work for most projects.

### Installation

## Prerequisites

- **Claude Code** — Anthropic's CLI for Claude ([install guide](https://docs.anthropic.com/en/docs/claude-code))

Nix is **not required** to use kli. Pre-built releases are pulled from GitHub automatically.

## Install

```bash
curl -fsSL https://kli.kleisli.io/install | sh
```

This downloads the kli binary and configures your Claude Code environment with the necessary MCP servers, hooks, and skills.

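You never need to touch this configuration by hand, but it can help to know roughly where it lives. MCP servers in Claude Code are registered in project configuration such as `.mcp.json`; purely as an illustration of the shape of such an entry, and not necessarily the exact server name, command, or arguments that kli writes, a registration looks something like:

```json
{
  "mcpServers": {
    "task": {
      "command": "kli",
      "args": ["mcp-serve"]
    }
  }
}
```
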
## Initialize a Project

In any project directory:

```bash
kli init
```

This sets up the kli plugin for the current project, adding the task MCP server configuration and Claude Code hooks.

## Verify Installation

Start Claude Code in your project and type:

```
/kli:research
```

If kli is installed correctly, Claude will begin the research workflow. You can also check available commands with `/help` in Claude Code.

## Optional: Build from Source

If you use Nix and want to build kli from source:

```bash
nix build github:kleisli-io/kli
```

This is only needed for development or deployment — not for using kli as a plugin.

### Quick Start

kli extends Claude Code with structured workflows for software engineering. You interact with kli through **slash commands** in Claude Code — Claude handles the task management, pattern matching, and coordination behind the scenes.

## The kli Workflow

A typical kli session follows four phases:

### 1. Research

Explore a codebase or problem space before making changes:

```
/kli:research
```

Claude investigates the codebase, spawns specialized agents (codebase explorers, web researchers), and produces a `research.md` artifact summarizing findings.

### 2. Plan

Create a phased implementation plan:

```
/kli:plan
```

Claude reads the research artifact, designs an implementation strategy broken into phases, and presents it for your approval. Each phase has explicit success criteria and verification steps.

### 3. Implement

Execute the plan phase by phase:

```
/kli:implement
```

Claude works through each phase using TDD (Red → Green → Refactor), runs automated verification after each phase, and requests your manual approval before proceeding.

### 4. Reflect

After completing work, capture lessons learned:

```
/kli:reflect
```

Claude analyzes the session's observations and outcomes, extracting patterns that improve future workflows.

## Other Useful Commands

| Command | Purpose |
|---------|---------|
| `/kli:handoff` | Create a handoff document when transferring work to a new session |
| `/kli:resume_handoff` | Resume work from a previous handoff |
| `/kli:create-task` | Create an event-sourced task for tracking complex work |
| `/kli:validate` | Verify implementation against plan criteria |

## What's Happening Behind the Scenes

When you use kli commands, Claude uses MCP (Model Context Protocol) tools to manage tasks, record observations, and coordinate agents. You don't need to interact with these tools directly — the slash commands handle everything.

For details on what each command does internally, see the [Command Reference](/kli/commands/plan), [Workflow Reference](/kli/workflows/planning), and [Agent Reference](/kli/agents/reflector) sections.

## Using kli

### Understanding Patterns

The playbook is kli's long-term memory — a collection of patterns learned from previous work. Patterns capture what worked, what didn't, and how to approach specific types of problems. Claude consults the playbook automatically when starting tasks and updates it after reflecting on completed work.

## What Patterns Look Like

A pattern is a short, actionable piece of guidance tagged with a domain and scored by how often it has been helpful or harmful. For example:

```
[lisp-000042] helpful=5 harmful=0 :: When editing defstruct forms, always reload dependents — SBCL doesn't propagate slot changes to compiled callers.
```

Patterns are prescriptive ("do X when Y") rather than descriptive ("X exists"). They capture the kind of knowledge that saves time on the second encounter.

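To give a feel for the format, here are two more entries written in the same style. These are invented for illustration, not entries from a real playbook:

```
[nix-000107] helpful=3 harmful=1 :: Pin flake inputs before debugging build failures; an unpinned nixpkgs can make the same derivation fail differently between runs.
[web-000023] helpful=4 harmful=0 :: When a layout breaks only in production, check the CSS purge/minify step before touching component styles.
```
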
## How Patterns Emerge

1. **During work**, Claude records observations in the task event log — things it discovers, constraints it hits, approaches that succeed or fail
2. **During reflection** (`/kli:reflect`), Claude reviews those observations and promotes the transferable ones to patterns. Not every observation qualifies — only insights that would help in future, unrelated tasks
3. **Over time**, patterns accumulate feedback. When a pattern helps Claude complete a task successfully, it gets a helpful vote. When it leads astray, it gets a harmful vote. High-scoring patterns surface more readily; low-scoring ones fade

## The Litmus Test

Not every observation becomes a pattern. To be promoted, an insight must pass all three criteria:

- **Transferable** — Useful beyond the original task
- **Actionable** — Provides specific guidance, not just information
- **Prescriptive** — Says what to do (or avoid), not just what exists

System-specific facts stay as observations on the task. Only insights that would help on a different project in a different context become patterns.

## How Patterns Help You

When Claude starts working on a task, it queries the playbook for patterns relevant to the current domain and problem. This happens automatically — you'll see Claude reference activated patterns in its reasoning. The effect is cumulative:

- **First time** working in a new area, Claude relies on general knowledge
- **After a few tasks**, patterns from your specific codebase and conventions start activating
- **Over many sessions**, Claude develops a working knowledge of your project's idioms, pitfalls, and proven approaches

## Triggering Reflection

Run `/kli:reflect` after completing a piece of work. Claude will:

1. Review the session's observations
2. Identify insights that pass the litmus test
3. Create or update patterns in the playbook
4. Report what was learned

Reflection is most valuable after tasks that involved debugging, discovering non-obvious constraints, or finding approaches that worked better than expected.

## Domains

Patterns are tagged with domains like `lisp`, `nix`, `web`, or `ops`. Domain tags help Claude activate the right patterns — when you're working on Nix code, Nix patterns surface; when you're working on Lisp, Lisp patterns surface. The playbook-activate hook detects domains from your prompts and triggers pattern retrieval automatically.

## Pattern Lifecycle

The full lifecycle of a pattern:

1. **Discovery** — An insight surfaces during implementation
2. **Observation** — Claude records it in the task's event stream
3. **Promotion** — During `/kli:reflect`, observations that pass the litmus test become patterns
4. **Activation** — Retrieved via semantic search when Claude starts relevant new work
5. **Feedback** — Marked helpful or harmful based on application outcomes
6. **Evolution** — Content updated based on accumulated evidence

Patterns are never deleted. If harmful votes exceed helpful votes, a pattern is deprioritized rather than removed — preserving the record of what didn't work.

## Background

kli's workflow draws on two bodies of work. The research → plan → implement structure comes from Dex Horthy's [advanced context engineering](https://github.com/humanlayer/12-factor-agents) methodology at HumanLayer, which established that dividing AI coding work into sequential phases — each producing a compacted artifact as input for the next — dramatically improves output quality in large codebases.

kli extends this with a fourth phase, **reflect**, which closes the feedback loop by promoting observations into reusable patterns. The playbook concept itself is adapted from the [Agentic Context Engineering](https://arxiv.org/abs/2510.04618) paper (Stanford, SambaNova, UC Berkeley, 2025), which established the methodology of agents writing observations between phases of work. kli was first used with this methodology in October 2025 on a production project, where a file-based observation system accumulated 230 tasks and 117 handoff documents before hitting scalability limits.

kli's playbook system extends the original methodology with event-sourced task state (CRDT-based merging for safe parallel sessions), helpful/harmful scoring that lets patterns fade rather than requiring manual curation, and hybrid retrieval combining semantic search with spreading activation over a co-application graph.

### Workflow Overview

kli structures work into four phases: **research**, **plan**, **implement**, and **reflect**. You move through these phases by typing slash commands in Claude Code. Each phase produces artifacts that feed the next.

## Research

```
/kli:research
```

Claude explores the codebase, reads files, spawns sub-agents for deeper investigation, and writes a `research.md` document summarizing what it found. You guide the research by describing what you want to understand — Claude handles the file reading, code tracing, and documentation.

Research is iterative. Claude proposes findings, you correct or redirect, and Claude refines. The output is a markdown artifact that captures the current state of the codebase relevant to your task.

Use research when you're starting something unfamiliar, investigating a bug, or need to understand existing code before making changes.

## Plan

```
/kli:plan
```

Claude reads the research artifact and designs a phased implementation plan. Each phase has:

- A description of what to build or change
- Success criteria (automated checks and manual verification)
- Dependencies on other phases

The plan is presented for your approval before any code is written. You can ask Claude to revise phases, reorder work, add or remove steps. Once approved, the plan becomes a DAG of phase tasks in the task graph.

If requirements change mid-implementation, use `/kli:iterate_plan` to revise the plan while preserving completed work.

## Implement

```
/kli:implement
```

Claude works through the plan phase by phase. For each phase:

1. Claude reads the phase description and success criteria
2. Writes code following a test-first approach where applicable
3. Runs automated verification (builds, tests, linting)
4. Presents the results and asks for your manual approval before moving to the next phase

You stay in control throughout — Claude won't proceed to the next phase without your sign-off. If something needs adjustment, you direct Claude to fix it before approving.

## Reflect

```
/kli:reflect
```

After completing work, Claude reviews the session's observations and extracts reusable patterns. These patterns enter the playbook — a knowledge base that improves future sessions. See [Understanding Patterns](/kli/using-kli/understanding-patterns) for how this works.

Reflection is optional but valuable. The more you reflect, the better Claude becomes at tasks in your codebase.

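Putting the phases together, the artifact flow through a typical session looks roughly like this:

```
/kli:research   →  research.md (findings with file:line evidence)
/kli:plan       →  phase DAG in the task graph (plus an optional plan.md)
/kli:implement  →  code changes, verified phase by phase
/kli:reflect    →  playbook patterns for future sessions
```
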
## Supporting Commands

These commands support the main workflow:

| Command | When to Use |
|---------|-------------|
| `/kli:create-task` | Start tracking a piece of work before researching it |
| `/kli:resume-task` | Pick up where you left off on an existing task |
| `/kli:handoff` | Save context when you need to continue in a new session |
| `/kli:resume_handoff` | Resume work from a saved handoff document |
| `/kli:validate` | Check implementation against plan criteria after implementing |
| `/kli:commit` | Create a git commit with context-aware message generation |

## Skipping Phases

The four phases are a guide, not a requirement. For small changes, you might skip research and go straight to planning. For exploratory work, you might research without ever planning. Use what fits the task.

### Working with Tasks

Tasks are how kli tracks work across sessions. A task is a directory with an append-only event log that records everything Claude does — observations, decisions, artifacts, and state changes. Tasks persist between sessions, so you can stop and resume without losing context.

## Creating a Task

```
/kli:create-task
```

Claude asks what you're working on, creates the task directory, and sets it as the active task for the session. From this point, observations and artifacts are recorded against the task.

You don't always need to create a task explicitly. Running `/kli:plan` on an existing task or `/kli:research` will create or attach to tasks as needed.

## Resuming a Task

```
/kli:resume-task
```

Claude finds your active and recent tasks, lets you pick one, and loads its full context — observations, artifacts, graph neighbors, and any handoffs from previous sessions.

## Task Lifecycle

Tasks progress through three states:

1. **Created** — Initial state with a birth certificate describing the work
2. **Active** — Claude is working on the task, recording observations and artifacts
3. **Completed** — All work finished; the task rejects further mutations

## Phases

When you run `/kli:plan`, Claude breaks work into phases. Each phase is itself a task, linked to the parent plan with `phase-of` edges. Phases can depend on each other — Claude tracks these dependencies and only works on phases whose dependencies are complete.

During `/kli:implement`, Claude queries the plan to find the next ready phase, works on it, marks it complete, and moves to the next. You see this as a sequence of implementation steps with approval gates between them.

## The Task Graph

Tasks form a directed acyclic graph (DAG). Edges between tasks carry meaning:

| Edge | Meaning |
|------|---------|
| `phase-of` | Subtask/phase of a parent plan |
| `depends-on` | Must complete before this task starts |
| `related-to` | Informational relationship |
| `references` | Links to research or prior work |
| `same-day` | Automatically linked tasks from the same day |
| `topic` | Semantically similar tasks |

Claude uses the graph to find ready phases, track dependencies, detect related work from previous sessions, and coordinate parallel sessions. You don't interact with the graph directly — Claude handles all queries and mutations through the task MCP server.

## Observations

As Claude works, it records observations — discoveries, constraints, decisions, and outcomes. These are timestamped entries in the task's event log. Observations serve two purposes:

1. **Session context** — When you resume a task, Claude replays observations to understand what happened previously
2. **Pattern source** — During `/kli:reflect`, observations that are transferable and actionable get promoted to playbook patterns

You can direct Claude to record specific observations, but it also records them naturally during research and implementation.

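For example, if you tell Claude "note that the staging database is unreachable from CI", it records something along these lines (illustrative wording; the exact call Claude makes may differ):

```
observe("Constraint: staging database is unreachable from CI runners; integration tests must stub the DB layer.")
```

When you resume the task later, that constraint is replayed as part of the task context, and during `/kli:reflect` it becomes a candidate for promotion if it passes the litmus test.
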
## The Event Log

Every task mutation is recorded as an immutable event in `events.jsonl`. The current state is computed by replaying events — there is no mutable database. This gives you full history of what happened during a task. Claude records these event types as it works:

```
session.join    — Claude started working on a task
session.claim   — Claude took exclusive ownership for conflict-sensitive work
observation     — Knowledge captured during work
task.complete   — Task marked as finished
task.reopen     — Completed task reopened
metadata.set    — Key-value metadata updated
edge.add        — Graph edge created
edge.remove     — Graph edge severed
handoff.create  — Handoff document generated
```

## Handoffs

When you need to continue work in a new Claude Code session:

```
/kli:handoff
```

Claude writes a handoff document summarizing the current state — what's done, what's in progress, key learnings, and recommended next steps. The handoff is stored in the task directory. To resume:

```
/kli:resume_handoff
```

Claude reads the handoff, verifies the current state matches expectations, and presents an action plan for continuing.

## Parallel Sessions

Multiple Claude Code sessions can work on related tasks simultaneously. The task system handles this through:

- **Session tracking** — Each session registers when it joins a task
- **File conflict detection** — A hook warns when you edit a file recently touched by another session
- **CRDT merging** — Events from different sessions merge automatically using conflict-free replicated data type semantics: observations append (no conflicts possible), metadata uses last-writer-wins per key, edges are add/remove sets, and status uses max-progress ordering

The session-start hook shows you when parallel sessions are active so you're aware of concurrent work. Event sourcing means you always have a full audit trail — if something went wrong, the event log shows exactly what happened and when.

## Command Reference

### Create Task

> Scaffold a task with event-sourced tracking via the task MCP server

Scaffold an event-sourced task with the task MCP server. This is for quick, focused work that doesn't need the full KLI research/plan/implement cycle.

## When to Use This

- Quick bug fixes, small features, one-off investigations
- Work you want tracked but don't need formal phases for
- Creating parent tasks to organize subtasks later
- Linking new work to existing tasks in the graph

## Process

### Step 1: Parse Arguments

Parse $ARGUMENTS to determine intent:

| Pattern | Mode | Behavior |
|---------|------|----------|
| No args | Interactive | Ask what the task is about |
| Short name (1-4 words) | Direct | Use as task name, infer description |
| Sentence/description | Infer | Extract a kebab-case name, use input as description |

**Name convention**: kebab-case, descriptive, no date prefix (auto-added by the server as `YYYY-MM-DD-`). Names are validated for descriptiveness. The server rejects meaningless names.

**Good names** (pass validation):

- `fix-login-redirect` - verb + object
- `add-retry-logic-to-api-client` - descriptive action
- `research-caching-strategies` - clear intent

**Bad names** (rejected):

- `P1`, `P2` - letter+number only
- `phase-1` - no semantic content after prefix
- `stuff`, `misc`, `wip` - vague words
- `foo`, `bar` - too short

If no arguments provided, ask:

```
What are you working on? Describe the task briefly - I'll create a tracked task for it.
```

### Step 2: Check Context

Before creating, gather context:

1. **Check for active task**: Call `task_get()` (no args) to see if a current task is already set
2. **If active task exists**: Ask whether this new task should be:
   - A **subtask** (phase-of the current task)
   - A **related** task (linked but independent)
   - A **standalone** task (no connection)

This avoids orphaned tasks and keeps the graph connected.

### Step 3: Create the Task

Based on Step 2:

**Standalone or no parent context:**

```
Call task_create(name="<name>", description="<description>")
```

**Subtask of existing task:**

```
Call task_fork(name="<name>", from="<parent-task-id>", edge_type="phase-of", description="<description>")
Then call task_set_current(task_id="<new-task-id>") to switch context
```

**Related to existing task:**

```
Call task_create(name="<name>", description="<description>")
Then call task_link(target_id="<related-task-id>", edge_type="related-to")
```

### Step 4: Set Metadata

Set useful metadata on the new task:

```
Call task_set_metadata(key="tags", value="<tags>")
```

Infer tags from the description. Common tags: `bugfix`, `feature`, `refactor`, `investigation`, `infrastructure`, `nix`, `lisp`, `mcp`, `dashboard`, `shell`.

If the task has a clear scope, also set:

```
Call task_set_metadata(key="scope", value="<scope>")
```

### Step 5: Record Initial Observation

Record context that will be useful when reviewing this task later:

```
Call observe(text="Created via /create-task. <initial context>")
```

Include relevant context like:

- What triggered this task (error message, user request, discovery during other work)
- Key files or components likely involved
- Any constraints or decisions already made

### Step 6: Report

Present the created task:

```
Task created: <task-id>
Description: <description>
Tags: <tags>
Parent: <parent task, if any>
Link: <edge type, if any>

The task is now your active context. All observations, artifacts, and metadata will be tracked
in the event stream. When done, mark complete with task_complete() or hand off with /kli:handoff.
```

## Error Handling

| Error | Response |
|-------|----------|
| Task MCP unavailable | "Task MCP server not responding. Have you run `kli init` in this project?" |
| Name too vague | Ask for a more descriptive name |
| Duplicate name | The date prefix usually prevents this, but if it happens, suggest appending a disambiguator |
| Parent task not found | List available tasks with `task_list()` and let user pick |

## Guidelines

- Task names should be self-documenting: someone reading the task list should understand what each task is about
- Don't over-tag: 2-4 tags is plenty
- Always check for an active parent before creating standalone tasks
- The initial observation is important: it's the first thing someone sees when bootstrapping the task later
- This command does NOT enter plan mode or create research documents. It creates a tracked task and sets context. The user decides what to do next.

### Handoff

> Create handoff document for transferring work to another session

You are tasked with writing a handoff document to hand off your work to another agent in a new session. You will create a handoff document that is thorough, but also **concise**.
The goal is to compact and summarize your context without losing any of the key details of what you're working on.

## Process

### 1. Generate Scaffold via MCP Tool

**Call the handoff MCP tool** to generate the path, create the directory, emit the `:handoff.create` event, and write minimal placeholder content:

```
mcp__task__handoff(summary="brief-description-of-handoff")
```

This returns structured metadata:

- `path`: the full handoff file path (already created with minimal content)
- `task`: the current task ID
- `task_dir`: the task directory path (use for playbook-export-state)
- `timestamp`: ISO 8601 timestamp
- `session`: session ID

The MCP tool requires a current task. If no task is set, it will error - use `task_bootstrap` or `task_create` first.

**Parse the returned path and task_dir** - you will overwrite the file with rich content.

### 2. Write Handoff Document

Use the Write tool to write rich content to the path returned by the MCP tool. Use the following template structure with YAML frontmatter:

**Important**: Before writing the handoff document, you need to read the auto-generated handoff document created by the `mcp__task__handoff` tool, so as to avoid getting a Write error.

````markdown
---
date: [ISO 8601 timestamp from MCP response]
timestamp: [YYYY-MM-DD]
git_branch: [from git]
git_commit: [from git]
repository: [repository name]
task: "YYYY-MM-DD-description"
type: handoff
status: active
---

# Handoff: [Task Name] - [Brief Description]

**Created**: [ISO timestamp]
**Task Directory**: `[task_dir from MCP response]`

## Task(s)

[Description of the task(s) that you were working on, along with the status of each (completed, work in progress, planned/discussed). If working on an implementation plan, call out which phase you are on.]

## Critical References

[List 2-3 most important file paths that must be consulted:]

- `[task_dir]/research.md` - [Brief description]
- `[task_dir]/plan.md` - [Brief description]
- [Other critical files]

## Recent Changes

[Describe recent changes made to the codebase in file:line syntax:]

- `path/to/file.ext:line` - [Description of change]
- `another/file.ext:line-range` - [Description of change]

## Learnings

[Describe important things learned - patterns, root causes, key information for next session:]

- [Learning 1] - Evidence at `file.ext:line`
- [Learning 2] - Pattern found in `file.ext:line`
- [Important discovery with specific references]

## Artifacts

[Exhaustive list of artifacts produced or updated as filepaths/file:line references:]

- `[task_dir]/research.md` - [What it contains]
- `[task_dir]/plan.md` - [Current phase status]
- `.claude/commands/newcommand.md:1-50` - [What was created]

## Task Graph State

[If task has phases, capture current graph state:]

- **Current Phase**: [from task_graph(query="plan")]
- **Completed Phases**: [list of completed phases]
- **Pending Phases**: [list of pending/active phases]
- **Blocked By**: [any blocking tasks]
- **Related Tasks**: [any related-to edges]

[For complex tasks with many phases, optionally spawn graph-analyst to capture comprehensive state:]

```
Task(
  subagent_type="graph-analyst",
  prompt='{"question": "What is the complete state of task <task-id>? Include all phases, their status, and any related tasks."}',
  description="Capture task graph state for handoff"
)
```

## Action Items & Next Steps

[List of action items for next agent to accomplish:]

1. [Next action based on current state]
2. [Following priority action]
3. [Additional tasks identified]

## Other Notes

[Other notes, references, useful information:]

- [Relevant codebase sections]
- [Related documentation]
- [Important context not captured above]
````

### 3. Present to User

After creating the document, respond:

```
Handoff created at: [path from MCP tool]

To commit this handoff:
  git add [task_dir]/handoffs/
  git commit -m "docs: add handoff for [description]"

To resume from this handoff in a new session:
  /kli:resume_handoff [path from MCP tool]
```

## Important Guidelines

- **Be thorough and precise**: Include both top-level objectives and lower-level details
- **More information, not less**: This defines minimum content - add more if needed
- **Avoid excessive code snippets**: Prefer file:line references over large blocks
- **Cross-reference KLI artifacts**: Always link to research.md and plan.md if they exist
- **Specific file references**: Use `file.ext:line` format consistently
- **Concise but complete**: Compact context without losing key details

### Implement

> Implement approved plan phase-by-phase with TDD workflow and verification gates

Execute phased implementation plans using the TDD workflow (Red → Green → Refactor), running verification gates (automated + manual), and marking phases complete via `task_complete()` only after all verification passes.

**The kli-implementation skill provides comprehensive guidance on:**

- TDD methodology (Red → Green → Refactor cycle with discipline)
- Design principles (Extensibility, Composability, Parametricity with examples)
- Zero TODOs policy enforcement
- Verification gate requirements (automated before manual, both blocking)
- Deviation handling patterns

## Initial Setup

**Set up task context:**

- If task name provided: `task_bootstrap(task_id)`
- If no parameter: Call `task_get()` to check current task. If none, ask "Which plan would you like to implement?"

**Load plan structure:**

```
task_query("(query \"plan\")")       → Phase structure: phases, status, dependencies
task_query("(query \"plan-ready\")") → Which phases are ready to work on
```

**Check plan status:**

- If all phases completed: "Plan already complete"
- If phases remain: Identify next phase from plan-frontier (phases are ranked by affinity score — higher affinity = better next candidate)

**If plan.md artifact exists:** Read it for detailed success criteria and verification commands.

**Present initial response** with plan overview, phase count, resume point (first ready phase).

## Implementation Process

### Step 0: Load Context

**0a. Activate playbook patterns** (REQUIRED):

```lisp
pq_query('(-> (activate "<phase/goal description>" :boost (...)) (:take 5))')
```

**0b. Load relevant skills**: Determine what kind of implementation this is. Load any domain-specific skills relevant to the task (e.g., design skills for UI work, language-specific skills for specialized domains).

### Step 1: Execute Phases (Loop Until All Complete)

Use `task_query("(query \"plan-ready\")")` to find the next ready phase. For each phase:

1. **Switch to phase task and announce start**:

   ```
   task_bootstrap("phase-N-<name>") → Get phase description with changes required and success criteria
   ```

   Show phase name, overview, changes required, success criteria.

2. **Record phase start**: `observe("Starting phase N: <phase name>")`

3. **Search Playbook Patterns**: ALWAYS search using `pq_query('(-> (search "<phase topic>") (:take 5))')` and `pq_query('(-> (proven :min 3) (:take 10))')`.

4. **Read Referenced Files FULLY**: Use Read without limit/offset for all mentioned files.
5. **TDD Red - Write Failing Tests**: Create tests that fail for the correct reason.

   ```
   observe("TDD Red: Tests written for <behavior>. Failure reason: <reason>")
   ```

6. **TDD Green - Implement to Pass**: Implement the minimum code to make tests pass.

   ```
   observe("TDD Green: Implementation complete. Tests passing.")
   ```

7. **TDD Refactor - Improve While Green**: Apply design principles (Extensibility, Composability, Parametricity). Run tests after EACH change.

   ```
   observe("TDD Refactor: Applied <refactoring>. Tests still green.")
   ```

8. **Run Automated Verification**: Execute ALL checks from the phase description (build, tests, TODO check). If ANY fail, fix immediately and re-run.

9. **Request Manual Verification**: Present to user with automated status, manual checklist, files changed, testing instructions. ALWAYS give evidence for how you have tested that the implementation works. Wait for "approved" or feedback. If issues found: fix, re-run automated verification, request again.

10. **Mark Phase Complete**:

    ```
    observe("Phase N complete. Verification passed. Key outcomes: <outcomes>")
    task_complete()  # Marks this phase task as completed
    ```

11. **Give Pattern Feedback** (per phase):

    ```lisp
    pq_query('(-> (pattern "<pattern-id>") (:feedback! :helpful "<evidence>"))')
    pq_query('(-> (pattern "<pattern-id>") (:feedback! :harmful "<evidence>"))')
    ```

12. **Return to parent and continue**:

    ```
    task_set_current(parent_task_id)
    task_query("(query \"plan-ready\")") → Find next ready phase
    ```

13. **Handle Deviations**: If code differs from plan, PAUSE and inform user with options. Wait for decision. Record via `observe()`.

### Step 2: All Phases Complete

```
task_set_current(parent_task_id)
observe("Implementation complete. All N phases done. Key challenges: <challenges>")
task_set_metadata(key="phase", value="complete")
```

### Step 3: Final Review

1. **Verify all applied patterns have feedback** — give `helpful` or `harmful` for every activated pattern
2. **Record novel insights as observations** — workarounds, anti-patterns, techniques used 2+ times:

   ```
   observe("Implementation insight: <insight>. Evidence: <evidence>")
   ```

3. **Do NOT use `(add! ...)`** — pattern promotion goes through `/kli:reflect` (Reflector → Curator)

Present completion summary with next steps (`/kli:validate`, `/core:commit`, `/kli:reflect`).

## Resuming Implementation

1. `task_bootstrap(parent_task_id)` — restores context with plan progress
2. `task_query("(query \"plan\")")` — see all phases with completion status
3. `task_query("(query \"plan-ready\")")` — find next ready phase
4. Continue from Step 1 at the next incomplete phase

## Remember

- Follow **TDD discipline** from the kli-implementation skill (Red → Green → Refactor)
- Apply **design principles** from the kli-implementation skill
- Run ALL automated verification before requesting manual verification
- **`task_complete()`** marks phase done — replaces ✓ checkmarks in plan.md
- **`task_query("(query \"plan-ready\")")`** finds the next phase — replaces ✓ parsing
- **`observe()`** records progress — observations flow through the task event stream
- Wait for manual approval before next phase
- NEVER introduce TODOs (zero TODOs policy)
- Read files FULLY before modifying
- Search playbook patterns for EACH phase
- **Give feedback per phase** on patterns applied
- One phase at a time - complete ALL verification before proceeding
- ALWAYS give proof that a phase was implemented successfully

## See Also

- CLAUDE.md - Task model, PQ/TQ reference, playbook workflow

### Iterate Plan

> Iterate on existing implementation plans with thorough research and updates

You are tasked with updating existing implementation plans based on user feedback. You should be skeptical, thorough, and ensure changes are grounded in actual codebase reality.

## Initial Response

When this command is invoked:

1. **Set up task context**:
   - If task name or path provided: `task_bootstrap(task_id)`
   - If no parameter: Call `task_get()` to check current task. If none, ask user.
   - Use `task_graph(query="plan")` to see the current plan structure (phases, status, dependencies)

2. **Handle different input scenarios**:

   **If NO task/plan identified**:

   ```
   I'll help you iterate on an existing plan. Which task's plan would you like to update?
   Provide the task name or use task_list() to find it.
   ```

   Wait for user input.

   **If task identified but NO feedback**:

   ```
   I've found the plan. Current structure:

   [output of task_graph(query="plan")]

   What changes would you like to make? For example:
   - "Add a phase for migration handling"
   - "Update the success criteria to include performance tests"
   - "Adjust the scope to exclude feature X"
   - "Split Phase 2 into two separate phases"
   ```

   Wait for user input.

   **If BOTH task AND feedback provided**:
   - Proceed immediately to Step 1
   - No preliminary questions needed

## Process Steps

### Step 1: Understand Current Plan

1. **Load plan structure from task DAG**:
   - `task_graph(query="plan")` — shows phases, status, dependencies
   - `task_get()` — shows description, goals, observations, metadata
   - If plan.md exists as an artifact, read it for detailed criteria

2. **Understand the requested changes**:
   - Parse what the user wants to add/modify/remove
   - Identify if changes require codebase research
   - Determine scope of the update

### Step 2: Research If Needed

**Only spawn research tasks if the changes require new technical understanding.**

If the user's feedback requires understanding new code patterns or validating assumptions:

1. **Record iteration intent**: `observe("Plan iteration: <requested change>")`
2. **Spawn parallel sub-tasks for research**:

   Use the right agent for each type of research:

   **For code investigation:**
   - **codebase-locator** - To find relevant files
   - **codebase-analyzer** - To understand implementation details
   - **pattern-finder** - To find similar patterns

   **For historical context (use PQ queries):**
   - `pq_query('(-> (search "<topic>") (:take 5))')` - Find patterns
   - `pq_query('(-> (proven :min 3) (:take 10))')` - Get proven patterns (helpful >= 3)

   **Be EXTREMELY specific about directories**:
   - Include full path context in prompts
   - Specify exact directories to search

3. **Read any new files identified by research**:
   - Read them FULLY into the main context
   - Cross-reference with the plan requirements

4. **Wait for ALL sub-tasks to complete** before proceeding

### Step 3: Present Understanding and Approach

Before making changes, confirm your understanding:

```
Based on your feedback, I understand you want to:
- [Change 1 with specific detail]
- [Change 2 with specific detail]

My research found:
- [Relevant code pattern or constraint]
- [Important discovery that affects the change]

I plan to update the plan by:
1. [Specific modification to make]
2. [Another modification]

Does this align with your intent?
```

Get user confirmation before proceeding.

### Step 4: Update the Plan

Plans are task DAGs. Update the plan structure using task MCP tools:

1. **Modify the DAG as needed**:

   - **Add phases** (preferred): Use `scaffold-plan!` for multiple phases with dependencies:

     ```
     task_query("(scaffold-plan!
       (new-phase \"Implement new feature\" :after existing-phase)
       (follow-up \"Integration tests\" :after new-phase))")
     ```

   - **Add single phase**: `task_fork(name="implement-new-feature", from=parent_task_id, edge_type="phase-of", description="...")` + add dependency edges with `task_link`. Names are validated for descriptiveness (avoid `P1`, `phase-1`, etc.)
   - **Update phase description**: Switch to the phase task with `task_set_current`, then `observe("Updated scope: <new scope>")`, switch back
   - **Reorder phases**: Adjust `depends-on` edges with `task_link` / `task_sever`
   - **Remove phase(s)**: Use TQ bulk sever for efficiency, then record the decision:

     ```lisp
     ;; Single phase removal
     task_query("(-> (node \"obsolete-phase\") (:sever-from-parent! :phase-of))")

     ;; Multiple phases at once (replaces multiple task_sever calls)
     task_query("(-> (node \"phase-1\" \"phase-2\" \"phase-3\") (:sever-from-parent! :phase-of))")
     ```

     Then: `observe("Phases removed: <phases>. Reason: <reason>")`

2. **If plan.md artifact exists**, update it to match the DAG changes:
   - Use the Edit tool for surgical changes
   - Keep all file:line references accurate
   - Update success criteria if needed

3. **Ensure consistency**:
   - Verify with `task_graph(query="plan")` after changes
   - Maintain the distinction between automated vs manual success criteria
   - Include specific file paths for new content

4. **Record the iteration**: `observe("Plan iteration complete: <summary of changes>")`

### Step 5: Review and Complete

1. **Present the changes made**:

   ```
   I've updated the plan for task [task-name].

   Changes made:
   - [Specific change 1]
   - [Specific change 2]

   The updated plan now:
   - [Key improvement]
   - [Another improvement]

   Would you like any further adjustments?
   ```

2. **Be ready to iterate further** based on feedback

## Important Guidelines

1. **Be Skeptical**:
   - Don't blindly accept change requests that seem problematic
   - Question vague feedback - ask for clarification
   - Verify technical feasibility with code research
   - Point out potential conflicts with existing plan phases
2. **Be Surgical**:
   - Make precise edits, not wholesale rewrites
   - Preserve good content that doesn't need changing
   - Only research what's necessary for the specific changes
   - Don't over-engineer the updates

3. **Be Thorough**:
   - Read the entire existing plan before making changes
   - Research code patterns if changes require new technical understanding
   - Ensure updated sections maintain quality standards
   - Verify success criteria are still measurable

4. **Be Interactive**:
   - Confirm understanding before making changes
   - Show what you plan to change before doing it
   - Allow course corrections
   - Don't disappear into research without communicating

5. **Track Progress**:
   - Use `observe()` to record iteration decisions and progress
   - Verify plan DAG with `task_graph(query="plan")` after changes

6. **No Open Questions**:
   - If the requested change raises questions, ASK
   - Research or get clarification immediately
   - Do NOT update the plan with unresolved questions
   - Every change must be complete and actionable

## Success Criteria Guidelines

When updating success criteria, always maintain the two-category structure:

1. **Automated Verification** (can be run by execution agents):
   - Commands that can be run: `make test`, `npm run lint`, `pytest`, `cargo test`, etc.
   - Use your project's existing build/test commands
   - Specific files that should exist
   - Code compilation/type checking

2. **Manual Verification** (requires human testing):
   - UI/UX functionality
   - Performance under real conditions
   - Edge cases that are hard to automate
   - User acceptance criteria

## Sub-task Spawning Best Practices

When spawning research sub-tasks:

1. **Only spawn if truly needed** - don't research for simple changes
2. **Spawn multiple tasks in parallel** for efficiency
3. **Each task should be focused** on a specific area
4. **Provide detailed instructions** including:
   - Exactly what to search for
   - Which directories to focus on
   - What information to extract
   - Expected output format
5. **Request specific file:line references** in responses
6. **Wait for all tasks to complete** before synthesizing
7. **Verify sub-task results** - if something seems off, spawn follow-up tasks

## Example Interaction Flows

**Scenario 1: User provides everything upfront**

```
User: /iterate_plan 2025-10-16-feature - add phase for error handling
Assistant: [Reads plan, researches error handling patterns if needed, updates plan]
```

**Scenario 2: User provides just task name**

```
User: /iterate_plan 2025-10-16-feature
Assistant: I've found the plan. What changes would you like to make?
User: Split Phase 2 into two phases - one for backend, one for frontend
Assistant: [Proceeds with update]
```

**Scenario 3: User provides no arguments**

```
User: /iterate_plan
Assistant: Which task's plan would you like to update? Provide the task name or use task_list() to find it.
User: 2025-10-16-feature
Assistant: I've found the plan. What changes would you like to make?
User: Add more specific success criteria
Assistant: [Proceeds with update]
```

### Plan

> Create detailed implementation plans through iterative planning with artifact reuse

Create detailed, phased implementation plans as task DAGs following KLI methodology.

**The kli-planning skill provides comprehensive guidance on:**

- Research artifact reuse patterns
- Phase design principles (incremental, testable, clear boundaries, 3-7 phases optimal)
- Clarifying question templates (structured with context and options)
- Verification gate patterns (automated + manual, both required)
- Out-of-scope definition strategies
- Phase boundary specification
- Success criteria definition (automated + manual)

## Initial Setup

**When invoked without arguments:**

```
What would you like to create a plan for?

Examples:
- "Add WebSocket support to the API"
- "Refactor the CSS build pipeline"
- "Based on the research task, plan the migration"

What would you like to plan?
```

Wait for user input.

**When invoked with arguments:** Goal is `$ARGUMENTS`. Proceed to planning.

### Task Setup

Call `task_get()` to check if there's already a current task. If so, use it. If not, create one:

```
task_create(name="plan-<goal>")
task_set_metadata(key="goals", value='["Create phased implementation plan for <goal>"]')
task_set_metadata(key="phase", value="planning")
```

Then call `task_get()` to retrieve the full task state — check for existing artifacts (research.md) and observations.

### Check Existing State

**If the task already has phase children** (check `task_query("(query \"plan\")")`):

```
Found existing plan with N phases (M complete, K pending).

Options:
1. Iterate on plan (modify phases)
2. Start fresh (create new task)

Which would you prefer?
```

**If `task_get()` shows a `research.md` artifact:**

- Read it fully — findings become foundations for the plan
- Token savings: Reusing research.md saves 40-50% tokens vs spawning duplicate sub-agents

## Planning Process

### Step 1: Handle Research Artifact (If Exists)

Read research.md FULLY (no limit/offset). Extract summary, findings, code references, playbook patterns, open questions. Present to user.

If no research exists: Gather current codebase state by spawning codebase-locator/analyzer as needed.

### Step 2: Activate Playbook Patterns (REQUIRED)

**Before planning**, activate relevant patterns:

```lisp
pq_query('(-> (activate "<planning goal>" :boost (...)) (:take 5))')
```

This uses graph-based retrieval to find patterns for implementation approach, phasing strategies, and verification patterns. The activation is persisted for handoff continuity.

### Step 2.5: Discover Related Prior Work (Optional)

Spawn graph-analyst to find relevant prior tasks:

```
Task(
  subagent_type="graph-analyst",
  prompt='{"question": "What prior tasks relate to <goal>? Are there patterns or learnings I should consider?"}',
  description="Find related prior work"
)
```

This surfaces:

- Similar tasks that succeeded or failed
- Patterns that were helpful or harmful for similar work
- Potential dependencies or conflicts with existing work

**When to use:** If the planning goal involves work that may have been attempted before or relates to existing infrastructure.

**When to skip:** If this is clearly novel work with no prior history (e.g., integrating a brand new library).

### Step 3: Decompose into Phases

Break work into incremental phases with clear boundaries. Each phase should be independently testable.

**Phase design guidance:** See the kli-planning skill for:

- Optimal phase count (3-7 phases)
- Phase boundary criteria
- Incremental delivery patterns
- Dependency management

### Step 4: Define Success Criteria

For each phase, specify automated verification (build, tests, TODO check) and manual verification (UI/UX, performance, acceptance).

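As an illustrative sketch only (the exact commands depend on your project's build tooling), the success criteria for a single phase might read:

```
Phase 2: Add integration layer

Automated verification:
- `make test` passes, including the new integration tests
- `npm run lint` reports no new warnings
- No TODO comments introduced (zero TODOs policy)

Manual verification:
- Requests routed through the new layer return the same results as the old path
- Latency feels acceptable under a realistic workload
```
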
### Step 5: Ask Clarifying Questions

List uncertainties requiring user input. For each, provide context and concrete options. Wait for responses. Update the plan based on answers.

**When to iterate:** See the kli-planning skill for exit criteria vs continue criteria.

### Step 6: Define Out-of-Scope

Explicitly list what's NOT being done to prevent scope creep.

### Step 7: Create Plan as Task DAG

Plans are task DAGs, not markdown files. Use TQ's `scaffold-plan!` for efficient creation.

**Present the plan outline to the user for approval first.** Then create the DAG:

**Option 1: scaffold-plan! (Recommended for plans with dependencies)**

```
task_query("(scaffold-plan!
  (implement-core-library \"Core library with API surface\")
  (add-integration-layer \"Integration with existing system\" :after implement-core-library)
  (write-test-suite \"Comprehensive test coverage\" :after add-integration-layer))")
```

Creates all phases with dependencies in one expression. Names are validated for descriptiveness.

**Auto-improvement:** Short names like `p1` are auto-improved from descriptions:

- `(p1 "Research architecture")` → creates `research-architecture`

**Option 2: scaffold-chain! (For linear phase sequences)**

```
task_query("(scaffold-chain! \"Setup infrastructure\" \"Implement core logic\" \"Add test coverage\")")
```

Creates a linear dependency chain automatically.

**Option 3: task_fork (For complex custom structures)**

```
task_fork(name="implement-user-authentication", from=current_task_id, edge_type="phase-of",
          description="Implement OAuth2 flow\n\nChanges Required:\n- ...\n\nSuccess Criteria:\n- ...")
```

Use when you need more control over task naming or descriptions. Names are validated.

Verify the DAG: `task_query("(query \"plan\")")` — shows all phases with status, dependencies, and enriched fields (including `:alpha`, `:affinity` for Markov-aware ranking).

**Optionally write plan.md** as a human-readable artifact if the plan is complex enough to warrant it. The task DAG is the source of truth.

### Step 8: Record Planning Decisions

```
observe("Plan complete: N phases created. Key decisions: <decisions>. Open questions resolved: <resolutions>.")
```

### Step 9: Pattern Feedback

Give feedback on patterns that informed the plan:

```lisp
pq_query('(-> (pattern "<pattern-id>") (:feedback! :helpful "informed phase structure for X"))')
pq_query('(-> (pattern "<pattern-id>") (:feedback! :harmful "didn't apply to this planning context"))')
```

Record planning insights as observations (patterns are promoted during `/kli:reflect`):

```
observe("Planning insight: <insight>. Evidence: plan phasing approach")
```

### Step 10: Present Plan to User

```
## Plan Complete: <goal>

**Phases**: N phases created as task DAG

Plan DAG:
<phase structure with dependencies>

Next step: `/kli:implement` to execute the phases.
Use `task_query("(query \"plan-ready\")")` to see which phases are ready.
```

### Step 11: Iterate Plan (If Needed)

If the user requests changes, modify the DAG and record via `observe()`:

**Add phases:**

```lisp
task_fork(name="new-phase", from=current_task_id, edge_type="phase-of", description="...")
```

**Remove phases (bulk sever):**

```lisp
;; Single phase
task_query("(-> (node \"obsolete-phase\") (:sever-from-parent! :phase-of))")

;; Multiple phases at once
task_query("(-> (node \"phase-1\" \"phase-2\") (:sever-from-parent! :phase-of))")
```

**Add dependencies:**

```lisp
task_query("(-> (node \"phase-2\") (:link! \"phase-1\" :depends-on))")
```

Record all changes: `observe("Plan iteration: <changes>")`

## Resuming a Plan

1. `task_bootstrap(task_id)` — restores full context
2. `task_query("(query \"plan\")")` — see all phases with status
3. `task_query("(query \"plan-ready\")")` — see which phases are ready (non-completed)
4. Continue from the appropriate step

## Remember

- **Plans are task DAGs** — use `scaffold-plan!` or `task_fork` to create phases
- **task_query("(query \"plan\")")** is the source of truth, not plan.md
- **Reuse research.md** if available (saves 40-50% tokens)
- Ask clarifying questions for ANY ambiguity
- Design phases following principles from the kli-planning skill
- Both automated AND manual verification required for each phase
- Define out-of-scope explicitly
- **Give feedback** on patterns that informed the plan
- **Record decisions** via `observe()` — observations flow through the task event stream
- Get user confirmation before finalizing the plan

## See Also

- CLAUDE.md - Task model, PQ/TQ reference, playbook workflow

### Reflect

> Reflect on completed task and evolve playbooks

Extract learnings from completed tasks by orchestrating the reflector and curator agents. The event stream IS the observation source — no separate observation files needed.

> **Architecture Note**: This command orchestrates the reflection workflow. The kli-reflection skill provides the methodology (WHAT to evaluate). This command provides the orchestration (HOW to execute).

**The kli-reflection skill provides comprehensive guidance on:**

- Sequential execution pattern (reflector → curator)
- Observation analysis methodology
- Pattern effectiveness evaluation criteria
- Harm signal tier definitions (Tier 1/2/3 with responses)
- Evidence-based learning principles
- New pattern discovery process
- Reflection artifact structure

IMPORTANT: Before starting this workflow you ABSOLUTELY NEED to load the kli-reflection skill, NO EXCEPTIONS.

## Workflow Overview

```
┌──────────────────────────────────────────────────────────────┐
│                      /reflect WORKFLOW                        │
├──────────────────────────────────────────────────────────────┤
│ 1. GATHER STATE (task_get + timeline for observations)       │
│ 2. REFLECTOR    (analyze observations from event stream)     │
│ 3. CURATOR      (update playbooks via MCP tools)             │
│ 4. REPORT       (combined summary)                           │
└──────────────────────────────────────────────────────────────┘

Task Isolation: Each task is reflected independently.
Cross-cutting knowledge emerges through playbook accumulation.

Two feedback pathways feed into playbooks:
- Curator: Analysis-based updates from reflection.md via PQ mutations
- Real-time: Feedback given via `(:feedback! ...)` during work
```

## Initial Response

When this command is invoked:

**1. Set up task context:**

- If task name provided: `task_bootstrap(task_id)`
- If no parameter: Call `task_get()` to check current task. If none, ask user.

**2. Verify task has sufficient evidence:**

```
task_get()               → Check observations, artifacts, metadata, phase
timeline(limit=50)       → Get full event history with observations
task_graph(query="plan") → Check plan completion status
```

**If task has observations** (from `observe()` calls during work):

- Proceed with reflection — the event stream contains the evidence

**If task has NO observations:**

```
This task has no recorded observations.

Observations are recorded via observe() during KLI commands (/research, /plan, /implement).
Without observations, there's insufficient evidence for pattern effectiveness analysis.

Options:
1. Reflect on what artifacts exist (reduced analysis)
2. Skip reflection for this task
```

**Note**: Not all tasks go through every phase. Simple tasks may skip research.
Verify what evidence exists and proceed with available data.

## Step 0.5: Gather Graph Context (Optional)

For complex tasks with many phases, spawn graph-analyst first to get comprehensive graph state:

```
Task(
  subagent_type="graph-analyst",
  prompt='{"question": "What is the complete state of task <task-id>? Include all phases, their status, and any related tasks."}',
  description="Get comprehensive task graph state"
)
```

Pass this graph context to the reflector agent for more informed analysis.

**When to use:**

- Task has 5+ phases
- Task has cross-task dependencies
- Multiple patterns were activated during the task

**When to skip:**

- Simple tasks with 1-3 phases
- No cross-task relationships

## Step 1: Orchestrate Reflector Agent

**Spawn reflector agent as Task:** Use the Task tool with subagent_type: "reflector"

Pass parameters in the prompt:

```
Analyze completed task and produce reflection artifact.

Task ID: <task-id>
Task directory: <task-dir>
Context: <graph context from Step 0.5, if gathered>

To get the task's observations and evidence, use these MCP tools:
- task_set_current("<task-id>") to set context
- task_get() to get state with observations (last 3 shown)
- timeline(limit=50) to get ALL observations and events
- task_graph(query="plan") to see phase completion

Also read any artifacts listed in task_get output (research.md, plan.md, etc.)

Evaluate pattern effectiveness with evidence from observations.

Classify harm signals into tiers:
- Tier 1 (Auto-Action): outcome=FAILURE, explicit rejection → auto-increment harmful
- Tier 2 (Flag for Review): excessive iterations, implicit correction → increment with review note
- Tier 3 (Track Only): minor iterations, context mismatch → track but no counter change

Identify new patterns discovered during the task.

Generate reflection.md artifact in the task directory with:
- Complete frontmatter
- Patterns applied and effectiveness
- Harm Signals section (tiered)
- Challenges and resolutions
- New patterns discovered
- Playbook update recommendations

Return summary when complete.
```

**Wait for the reflector agent to complete.**

## Step 2: Orchestrate Curator Agent

**After the reflector returns, spawn the curator agent as Task:** Use the Task tool with subagent_type: "curator"

Pass parameters in the prompt:

```
Update playbooks based on reflection artifact.

Task directory: <task-dir>
Reflection: <task-dir>/reflection.md

Read reflection.md recommendations.

Update playbook using PQ mutations:
- `(-> (pattern "id") (:feedback! :helpful "evidence"))` for effective patterns
- `(-> (pattern "id") (:feedback! :harmful "evidence"))` for misleading patterns
- `(add! :domain :X :content "...")` for new patterns discovered
- `(-> (pattern "id") (:evolve! "new content" :reason "why"))` for pattern description updates

Process harm signals by tier:
- Tier 1: `(:feedback! :harmful ...)`
- Tier 2: `(:feedback! :harmful ...)` with review note in evidence
- Tier 3: Track only (no feedback call)

Return summary of all changes made.
```

**Wait for the curator agent to complete.**

## Step 3: Record and Present Results

```
task_set_current("<task-id>")
observe("Reflection complete: <N> patterns evaluated, <H> helpful, <X> harmful, <Y> new patterns added")
```

Present to user:

```
Reflection complete!

**Task:** <task-id>

## Reflection Analysis
- Reflection: <task-dir>/reflection.md
- Patterns evaluated: <N> patterns
- Harm signals detected: <count>
  - Tier 1 (auto-action): <count>
  - Tier 2 (flagged): <count>
  - Tier 3 (tracked): <count>

## Playbook Updates (Curator)
- Helpful incremented: <N> patterns
- Harmful incremented: <N> patterns
- New patterns added: <N> patterns

Review reflection.md for full analysis.
``` ## Important Notes - **Task isolation** — each task is reflected independently - **Event stream is source of truth** — observations from `observe()` calls, surfaced by `task_get()` and `timeline()` - **No observation files required** — observations flow through the task event stream via `observe()` - **Sequential execution** — Reflector → Curator (dependencies) - **Playbook updates via PQ** — `(:feedback! ...)`, `(add! ...)`, `(:evolve! ...)` (not file edits) - **Cross-cutting knowledge** — emerges through playbook accumulation ## Error Handling **If task has no observations:** - Offer reduced analysis from artifacts only - Or skip reflection **If reflector agent fails:** - Present error details - Offer to retry **If curator agent fails:** - Note that reflection.md was created - Offer to run curator manually or apply updates via playbook MCP tools directly ## Remember You are an **orchestrator** for per-task reflection. Key responsibilities: 1. Gather task state and observations via `task_get()` + `timeline()` 2. Delegate analysis to reflector (reads event stream) 3. Delegate playbook updates to curator (uses playbook MCP tools) 4. Record results via `observe()` 5. Present combined results Cross-cutting knowledge accumulates in playbooks over many reflections. ### Research > Document codebase as-is through iterative research with observation capture Research the codebase or external topics by delegating to sub-agents. **The kli-research skill provides comprehensive guidance on:** - Documentarian philosophy (document what IS, not what SHOULD BE) - Error amplification principles (research errors amplify 1000x downstream) - Research decomposition patterns - Exit criteria evaluation (when research is complete) ## Research Strategies Five research strategies are available. Bundled agents are always present; other capabilities use the most specialized available agent type, falling back to general-purpose. | Strategy | Keywords | Approach | |----------|----------|----------| | **codebase** | "how", "where", "implementation", "code", "architecture" | Bundled: codebase-locator, codebase-analyzer, pattern-finder | | **visual** | "design", "UI", "visual", "component", "inspiration", "peer" | Sub-agent: describe visual research goal | | **github** | "repo", "repository", "github.com", "open source", "package source" | Sub-agent: describe repo analysis goal | | **external** | "library docs", "framework", "documentation", "how to use X" | Sub-agent: describe web research goal | | **graph** | "prior tasks", "patterns for", "project health", "task history", "related tasks", "what has been done" | Bundled: graph-analyst | ## Initial Setup When invoked without arguments, respond: ``` I'm ready to research. Please provide your research question: - **Codebase research**: "How does authentication work?", "Where are API endpoints?" - **Visual research**: "Find modern card component examples", "Analyze nordic design trends" - **GitHub research**: "Analyze the tokio-rs/tokio repository", "Map the Next.js repo structure" - **External docs**: "How does React Server Components work?", "Redis caching best practices" - **Graph research**: "What prior tasks relate to MCP?", "What patterns have been effective for Lisp?", "Project health?" - **Hybrid**: "How should we improve our navigation based on best practices?" ``` Wait for user's research query. ## Research Process ### Step 0: Set Up Task Context Call `task_get()` to check if there's already a current task. If so, use it. 
If not, create one: ``` task_create(name="research-") task_set_metadata(key="goals", value='["Research ", "Document findings with file:line evidence"]') task_set_metadata(key="phase", value="research") ``` Then call `task_get()` to retrieve the full task state including any existing observations and artifacts. ### Step 0.5: Activate Playbook Patterns (REQUIRED) **Before researching**, activate relevant patterns: ```lisp pq_query('(-> (activate "" :boost ( )) (:take 5))') ``` This retrieves prior learnings and patterns that may inform the research. The activation is persisted for handoff continuity. ### Step 1: Classify Research Type Analyze the query to determine strategy: **Codebase keywords**: "how", "where", "implementation", "code", "architecture", "what calls", "imports" **Visual keywords**: "design", "UI", "visual", "component", "inspiration", "peer", "navigation examples", "modern" **GitHub keywords**: "repo", "repository", "github.com/", "open source", "package source", "analyze X repo" **External keywords**: "docs", "documentation", "library", "framework", "how to use", "best practices for X library" **Graph keywords**: "prior tasks", "prior work", "task history", "patterns for", "pattern effectiveness", "what patterns", "project health", "task health", "graph health", "related tasks", "what relates to", "dependencies", "stale", "blocked", "orphan", "what has been done for", "similar tasks" ### Step 3: Spawn Research Agents **For codebase research**, spawn appropriate agents based on the question: ``` # For locating files/components Task( subagent_type="codebase-locator", prompt=". Task dir: ", description="Locate relevant files" ) # For deep analysis of specific components Task( subagent_type="codebase-analyzer", prompt=". Task dir: ", description="Analyze implementation" ) # For finding similar patterns Task( subagent_type="pattern-finder", prompt=". Task dir: ", description="Find related patterns" ) ``` **For web/external research** (documentation, articles, best practices): Select the most specialized available agent type for web research; fall back to general-purpose. ``` Task( subagent_type=, prompt="Research goal: . Topics: . Return: summary of findings with source URLs.", description="Web research: " ) ``` **For GitHub repository research:** Select the most specialized available agent type for repository analysis; fall back to general-purpose. ``` Task( subagent_type=, prompt="Analyze the / repository. Focus: . Return: key files, architecture summary, patterns found.", description="Analyze /" ) ``` **For visual/design research** (UI patterns, inspiration, branding): Select the most specialized available agent type for visual/design research; fall back to general-purpose. ``` Task( subagent_type=, prompt="Research visual design patterns for . Analyze: . Return: patterns found, layout/color/typography analysis.", description="Visual research: " ) ``` **For graph-based research (task/pattern graphs):** ``` Task( subagent_type="graph-analyst", prompt='{"question": ""}', description="Query task/pattern graphs" ) ``` Use for questions about prior work, pattern effectiveness, task relationships, or project health. The graph-analyst queries TQ (task graph) and PQ (pattern graph) to answer from the graph perspective. **Capability selection guidance:** - `codebase-locator` for "where is X?" questions (bundled, always available) - `codebase-analyzer` for "how does X work?" questions (bundled, always available) - `pattern-finder` for "how is X done elsewhere?" 
questions (bundled, always available) - `graph-analyst` for prior tasks, patterns, project health (bundled, always available) - Sub-agent for web research: external docs, articles, best practices - Sub-agent for repo analysis: GitHub repository structure and architecture - Sub-agent for visual research: design patterns, UI inspiration, branding The agents will: 1. Research the question using their specialized tools 2. Generate findings in the task directory 3. Return with status, findings, and evidence ### Step 4: Handle Hybrid Research If the query needs BOTH strategies (e.g., "How should we redesign our navigation?"): 1. First spawn codebase agents (bundled) to understand current implementation 2. Then spawn a sub-agent for visual/external research describing what patterns to find 3. Combine results in a summary ### Step 5: Present Results After agents return: 1. Record key findings: `observe("Research findings: ")` 2. Write research.md to the task directory (path from `task_get()`) 3. Present to user: - Status (success/partial/failure) - Key findings summary - Evidence references (file:line for codebase, screenshots for visual) - Suggested next steps ### Step 6: Record Research Findings as Observations Research produces observations (system-specific findings), not patterns (reusable techniques). Record all discoveries via `observe()`: ``` observe("Research finding: ") observe("Architecture insight: ") observe("Constraint found: ") ``` **Do NOT use `(add! ...)` during research.** Research follows the documentarian philosophy — document what IS, not prescribe what to DO. Findings are observations by nature. Reusable patterns emerge later during reflection (`/kli:reflect`), which applies the litmus test: - **Transferable**: Would help on a *different* project? - **Actionable**: Says "when X, do Y" (not "X exists")? - **Prescriptive**: Gives advice, not description? **What to record as observations:** - Architectural discoveries (how the system works) - Anti-patterns found (things that don't work) - Workarounds that succeeded - Cross-cutting concerns observed ### Step 7: Handle Follow-Up Questions **Simple clarification:** Re-spawn relevant agent with refined question **Component extraction:** For visual research, ask if user wants code extracted **Full iteration:** Spawn agents again with follow-up context ## CRITICAL: Delegation Required **DO NOT research the codebase yourself.** You MUST delegate to specialized agents. ❌ **WRONG**: Using Read, Grep, Glob, or Search to investigate the question directly ❌ **WRONG**: Answering the research question from your own knowledge ✅ **CORRECT**: Spawn specialized agents via Task tool and let them do the work The agents: - Have access to appropriate tools for their specialty - Return structured results - Track findings in the task directory **Your job**: Set up the task directory, spawn appropriate agents, synthesize and present results. 
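As a rough illustration of the keyword classification in Step 1 above, the sketch below shows how a research query could be bucketed into strategies. The `classify_research_query` helper, the abbreviated keyword lists, and the substring matching are illustrative assumptions — in practice Claude reads the query against the full keyword tables rather than running code — but a query matching more than one bucket is exactly the hybrid case described in Step 4.

```python
# Illustrative sketch only — strategy selection is done by Claude from the
# keyword tables above, not by project code. Keyword lists are abbreviated.
STRATEGY_KEYWORDS = {
    "graph":    ["prior tasks", "task history", "patterns for", "project health", "related tasks"],
    "github":   ["repo", "repository", "github.com", "open source", "package source"],
    "visual":   ["design", "ui", "visual", "component", "inspiration", "peer"],
    "external": ["docs", "documentation", "framework", "how to use", "best practices"],
    "codebase": ["how", "where", "implementation", "code", "architecture"],
}

def classify_research_query(query: str) -> list[str]:
    """Return every strategy whose keywords appear in the query."""
    q = query.lower()
    matches = [name for name, words in STRATEGY_KEYWORDS.items()
               if any(word in q for word in words)]
    return matches or ["codebase"]  # default when nothing matches

# "How should we improve our navigation based on best practices?" matches both
# codebase ("how") and external ("best practices") -> handled as hybrid research.
```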
## Remember - Follow **documentarian philosophy** from kli-research skill - **ALWAYS delegate to specialized agents** - never research directly - Auto-detect strategy from query keywords - Use hybrid mode for questions needing both perspectives - **Record findings** as observations via `observe()` — observations flow through the event stream; patterns are promoted during `/kli:reflect` - Get user validation before marking complete - No placeholder values in artifacts ## See Also - CLAUDE.md - Task model, PQ/TQ reference, playbook workflow ### Resume Handoff > Resume work from handoff document with context analysis and validation You are tasked with resuming work from a handoff document through an interactive process. ## Context These handoffs contain critical context, learnings, and next steps from previous work sessions that need to be understood and continued. ## Initial Response When this command is invoked: ### 1. If the path to a handoff document was provided **Example**: `/kli:resume_handoff /handoffs/2025-10-26_14-30-00_phase-1.md` - Skip the default message - Immediately read the handoff document FULLY (no limit/offset) - Immediately read any research or plan documents it links to under "Critical References" - Do NOT use a sub-agent to read these critical files - Derive the task ID from the path (the task name is the directory containing `handoffs/`) - Set current task via `task_bootstrap` so subsequent MCP calls have context - Begin analysis process by ingesting context - Propose course of action to user and confirm ### 2. If a task name was provided **Example**: `/kli:resume_handoff 2025-10-26-handoff-commands` - Bootstrap task: `task_bootstrap(task_id="2025-10-26-handoff-commands")` — sets context + returns state with artifacts - Check timeline: `timeline(limit=20)` — look for `:handoff.create` events which contain the handoff path - Also glob for handoff files using the task directory from `task_get()`: `/handoffs/*.md` - If no handoffs exist: "I can't find any handoff documents in this task. Would you like to create one with /kli:handoff?" - If one handoff: Proceed with that handoff - If multiple handoffs: Use the most recent (by timestamp in filename YYYY-MM-DD_HH-MM-SS) - Read handoff FULLY and linked artifacts - Begin analysis process ### 3. If no parameters provided **Example**: `/kli:resume_handoff` **Step 1: Use Session Start Context (Already Injected)** The session start hook injects task context at the beginning of each session. Look for this in the conversation startup: ``` TASK[1]{dir,phase,last_artifact}: ,Phase 2: Implementation,research.md ``` - If `TASK[1]` context exists: Extract the task directory from it (first field before comma) - This context is **already in your conversation** - no file reading needed - The session start hook has done the discovery work for you **Step 2: Find Handoffs for Active Task** If active task found: 1. Bootstrap task: `task_bootstrap(task_id)` 2. Check timeline for `:handoff.create` events: `timeline(limit=30)` 3. Also glob for handoff files: `/handoffs/*.md` 4. If handoffs exist: - Use most recent by filename timestamp (YYYY-MM-DD_HH-MM-SS) - Announce: "Found active task with handoff. Resuming from `{handoff_path}`..." - Proceed to read handoff and linked artifacts 5. If no handoffs but task exists: - Get task state with `task_get()` for observations and artifacts - Ask: "Found active task but no handoffs. Read observations instead?
[Y/n]" **Step 3: Fallback (No TASK Context)** If no `TASK[1]` in session startup (rare - usually means no active task): ``` I'll help you resume work from a handoff document. Please provide either: - Full path: `/kli:resume_handoff ` - Task name: `/kli:resume_handoff 2025-10-26-task-name` Or I can help you find available handoffs. What would you like to do? ``` Then wait for user input. ## Process Steps ### Step 1: Read and Analyze Handoff 1. **Restore task context and activate relevant patterns**: ``` task_bootstrap(task_id) pq_query('(-> (activate "" :boost ( )) (:take 5))') ``` The activate query uses task topic + graph context to surface semantically relevant patterns. 2. **Read handoff document completely**: - Use Read tool WITHOUT limit/offset parameters - Extract all sections: - Task(s) and their statuses - Critical References - Recent changes - Learnings - Artifacts - Action items and next steps - Other notes 2. **Read referenced artifacts**: - Read all files mentioned in "Critical References" section - Read research.md if referenced - Read plan.md if referenced - Read any other critical files mentioned - Use Read tool FULLY for each file 3. **Verify current state** (read-only validation): - Check if mentioned files still exist at specified paths - Use `task_get()` to see current task state vs handoff state - Check `timeline(limit=20)` for recent activity since handoff - Check if recent changes are still present - Note any discrepancies between handoff and current state 4. **Verify graph state** (if task has phases): Spawn graph-analyst to check current graph state: ``` Task( subagent_type="graph-analyst", prompt='{"question": "What is the current state of task ? Are there any stale phases, new blocking dependencies, or changes since the handoff?"}', description="Verify task graph state" ) ``` This catches: - Phases completed by another session since handoff - New blocking dependencies introduced - Related tasks that were created - Changes to task metadata or goals ### Step 2: Synthesize and Present Analysis Present comprehensive analysis to user: ``` I've analyzed the handoff from [date] for task [name]. **Original Tasks:** - [Task 1]: [Status from handoff] → [Current verification] - [Task 2]: [Status from handoff] → [Current verification] **Key Learnings from Handoff:** - [Learning 1 with file:line reference] - [Learning 2 with pattern discovered] **Recent Changes Status:** - [Change 1] - [Verified present/Missing/Modified] - [Change 2] - [Verified present/Missing/Modified] **Critical Artifacts Reviewed:** - [research.md]: [Key findings summary] - [plan.md]: [Phase status summary] **Graph State** (if task has phases): - [Graph-analyst findings about current state] - [Phases completed since handoff] - [New blocking dependencies] - [Related tasks created] **Recommended Next Actions:** Based on the handoff's action items: 1. [Most logical next step] 2. [Second priority] 3. [Additional tasks] **Discrepancies Found** (if any): - [File mentioned but not found] - [Change mentioned but different] - [State mismatch] Shall I proceed with [recommended action 1], or would you like to adjust the approach? ``` Get user confirmation before proceeding. ### Step 3: Create Action Plan 1. **Use TodoWrite to create task list**: - Convert action items from handoff into todos - Add any new tasks discovered during analysis - Prioritize based on dependencies 2. **Present the plan**: ``` I've created a task list based on the handoff: [Show todo list] Ready to begin with the first task? 
``` ### Step 4: Begin Work 1. Start with first approved task 2. Reference learnings from handoff throughout work 3. Apply patterns discovered in handoff 4. Update progress as tasks complete ## Guidelines 1. **Be Thorough in Analysis**: - Read entire handoff document - Verify ALL mentioned changes exist - Check for regressions or conflicts - Read all referenced artifacts 2. **Be Interactive**: - Present findings before starting work - Get buy-in on approach - Allow course corrections - Adapt based on current vs handoff state 3. **Leverage Handoff Wisdom**: - Pay attention to "Learnings" section - Apply documented patterns - Avoid repeating mistakes mentioned - Build on discovered solutions 4. **Validate Before Acting**: - Never assume handoff state matches current - Verify file references still exist - Check for breaking changes since handoff - Confirm patterns still valid 5. **Avoid Unnecessary Sub-Agents**: - Read files directly in main context - Only spawn agents if complex verification needed - Most handoff resumption is straightforward reading ## Common Scenarios **Clean Continuation**: - All changes present, no conflicts - Proceed with recommended actions **Diverged Codebase**: - Some changes missing or modified - Reconcile differences and adapt plan **Incomplete Work**: - Tasks marked "in_progress" - Complete unfinished work first **Stale Handoff**: - Significant time passed - Re-evaluate strategy based on current state ### Resume Task > Resume work on a task by gathering context from event stream and graph state Resume work on a task by gathering context from the task MCP server's event stream, graph state, and artifacts. Unlike `/kli:resume_handoff` which requires a handoff document, this command works directly with the task's live state. ## When to Use - **No handoff exists** — work was interrupted without creating a handoff - **Picking up where you left off** — same session or new session - **Checking task status** — understand what's been done and what's next - **Exploring a task** — unfamiliar with a task and need context Use `/kli:resume_handoff` instead when a handoff document exists and you want to follow its specific guidance. ## Initial Response When this command is invoked: ### 1. If task ID provided **Example**: `/core:resume-task 2026-01-31-coalgebraic-task-infrastructure` - Bootstrap the task: `task_bootstrap(task_id="2026-01-31-coalgebraic-task-infrastructure")` - This single call sets current task, emits session.join, and returns: - Full computed state (description, observations, artifacts, metadata) - Graph neighbors (related tasks, dependencies) - Playbook query (enriched semantic query) - Handoff document (if one exists) - Proceed to context gathering ### 2. If no parameter provided **Example**: `/core:resume-task` **Step 1: Check Session Start Context** The session start hook injects task context at conversation startup. Look for: ``` TASK[1]{dir,phase,last_artifact}: ,Phase 2: Implementation,research.md ``` If `TASK[1]` exists: - Extract task ID from the first field (e.g., `2025-12-12-task-name`) - Bootstrap: `task_bootstrap(task_id="2025-12-12-task-name")` - Proceed to context gathering **Step 2: Check for Current Task** If no session context, call `task_get()` to check if a current task is already set. 
If a task is current: - Announce: "Resuming current task: ``" - Proceed to context gathering **Step 3: List Available Tasks** If no current task, use TQ to find recent active tasks: ``` task_query('(-> (query "recent") (:take 10) (:select :display-name :crdt-status :obs-count :alpha :affinity))') ``` Present the list and ask which task to resume. ## Context Gathering Once a task is identified, gather comprehensive context using task MCP tools. ### Step 1: Get Core State The `task_bootstrap` call already provides: - **State**: description, status, claim, sessions, observations, artifacts, metadata - **Neighbors**: typed edges to related tasks - **Playbook query**: enriched semantic query for pattern activation If you used `task_set_current` + `task_get` separately, you have the same information. ### Step 2: Get Timeline Retrieve recent events to understand activity: ``` timeline(limit=30) ``` This shows: - Recent observations - Session joins/leaves - Artifact registrations - Metadata changes - Handoff creations (`:handoff.create` events) ### Step 3: Check Plan Structure (If Task Has Phases) If the task has children (phases), query the plan: ``` task_graph(query="plan") # Full plan structure task_graph(query="plan-frontier") # Which phases are ready ``` This reveals: - Phase completion status (completed vs active) - Dependency ordering - Which phases are unblocked and ready to work on (ranked by affinity score) - Any blocked phases waiting on dependencies ### Step 4: Check Graph Health Query task health to identify issues: ``` task_health() ``` Or spawn graph-analyst for deeper analysis: ``` Task( subagent_type="graph-analyst", prompt='{"question": "What is the current state of task ? Are there stale phases, blocked work, or issues I should know about?"}', description="Analyze task graph state" ) ``` ### Step 5: Activate Relevant Patterns Use the enriched query from bootstrap to get relevant playbook patterns: ``` pq_query('(-> (activate "" :boost ( )) (:take 5))') ``` This surfaces patterns that are semantically relevant to this task's topic and its graph neighbors. ### Step 6: Read Critical Artifacts If the task has registered artifacts, read them: - `research.md` — prior research findings - `plan.md` — detailed plan document (if exists alongside DAG) - Recent handoffs — if `:handoff.create` events exist in timeline Read artifacts FULLY without limit/offset to get complete context. ## Synthesis and Presentation Present your analysis to the user: ``` ## Task: [Task Name] **Status**: [active/completed] | **Claim**: [held by session/unclaimed] **Created**: [date] | **Sessions**: [count] ### Goals [from metadata.goals] ### Current State [Summary of what has been accomplished based on observations and artifacts] ### Plan Progress (if phases exist) [X/Y] phases complete | [Z] ready to work on **Completed:** - ✓ Phase 1: [name] - ✓ Phase 2: [name] **Ready:** - ○ Phase 3: [name] — [brief description] **Blocked:** - ○ Phase 4: [name] — waiting on Phase 3 ### Recent Activity [Last 3-5 significant events from timeline] ### Relevant Patterns [Top 2-3 patterns from playbook activation] ### Artifacts [List of registered artifacts with brief descriptions] ### Graph Context [Related tasks, dependencies, what this enables] ### Recommended Next Steps Based on the task state, I recommend: 1. **[Most logical next action]** — [why] 2. **[Second priority]** — [why] 3. **[Additional consideration]** — [why] Shall I proceed with [recommended action 1]? ``` Get user confirmation before taking action. 
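As a condensed illustration of the context-gathering steps above, a resumed session would issue roughly the following sequence of MCP calls. The calls and arguments are the ones documented in Steps 1–5; the task id is a placeholder, and in practice Claude drives this interactively rather than as a fixed script.

```
# 1. Bootstrap: set current task, emit session.join, return state + neighbors + playbook query
task_bootstrap(task_id="2025-12-12-example-task")   # placeholder id

# 2. Recent activity: observations, session joins/leaves, artifacts, :handoff.create events
timeline(limit=30)

# 3. Plan structure (only if the task has phases)
task_graph(query="plan")            # full plan DAG
task_graph(query="plan-frontier")   # phases ready to work on

# 4. Graph health check
task_health()

# 5. Activate relevant patterns with pq_query using the enriched query from bootstrap

# 6. Read registered artifacts (research.md, plan.md, recent handoffs) fully
```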
## Special Cases ### Task Has No Observations ``` This task was created but has no recorded observations yet. Goals: [from metadata] Description: [from state] Would you like me to: 1. Start researching this task (/kli:research) 2. Create a plan (/kli:plan) 3. Just observe the current state and proceed ``` ### Task Is Completed ``` This task is marked as completed. Completed at: [timestamp if available] Sessions: [list] Observations: [count] Artifacts: [list] To continue working on it, I would need to reopen it first. Would you like me to: 1. Reopen the task and continue 2. Create a new related task for follow-up work 3. Just review the completed work ``` ### Task Has Handoff Documents If `:handoff.create` events exist in timeline: ``` This task has [N] handoff document(s): - [path1] — [timestamp] - [path2] — [timestamp] Would you like me to: 1. Resume from the latest handoff (/kli:resume_handoff) 2. Ignore handoffs and work from live task state ``` ### Task Is Stale If task hasn't had activity in a long time (check session timestamps): ``` This task hasn't been worked on since [date]. The codebase may have changed significantly. Consider: 1. Re-validating any existing plan 2. Re-checking file references in artifacts 3. Running /kli:validate if implementation was in progress ``` ## Guidelines 1. **Task Bootstrap is Canonical** - `task_bootstrap` is the primary entry point — it does everything in one call - Use it instead of multiple separate calls when starting fresh 2. **TQ for Complex Queries** - Use `task_query` for complex graph traversals - Examples: - `(-> (current) (:follow :phase-of) :ids)` — get phases of current task - `(-> (current) (:back :depends-on) :ids)` — what depends on this task - `(-> (query "plan-ready") :enrich (:sort :affinity) (:take 3))` — ready phases ranked by affinity 3. **Timeline Over Artifacts** - The timeline is the source of truth - Artifacts are useful but observations in the event stream are more current 4. **Don't Assume Handoff State** - Unlike resume_handoff, don't expect handoff document guidance - Work from the live task state 5. **Interactive Confirmation** - Always present analysis before taking action - Let user choose the next step - Don't auto-proceed with implementation ## Comparison with resume_handoff | Aspect | resume-task | resume_handoff | |--------|-------------|----------------| | Input | Task ID or current task | Handoff document path | | Source of truth | Event stream + graph | Handoff markdown | | Guidance | Inferred from state | Explicit next steps | | When to use | No handoff exists | Handoff was created | | Context depth | Comprehensive (all tools) | Focused (handoff content) | ## Integration with Workflow ``` [Task Created] ↓ /kli:research → /kli:plan → /kli:implement → /kli:validate → /kli:reflect ↑ ↓ └──────────── /core:resume-task ←─────────────┘ (re-enter anywhere) ``` `/core:resume-task` is the universal re-entry point for any task, at any stage. ### Validate > Validate implementation against plan, verify success criteria, identify issues You are tasked with validating that an implementation plan was correctly executed, verifying all success criteria and identifying any deviations or issues. ## Context Task state gathered via MCP: Call `task_get()` to retrieve current task state including description, phase, observations, artifacts, and graph context. If no current task, use `task_list()` to find available tasks. ## Initial Setup When invoked: 1. 
**Set up task context**: - If task name provided: `task_bootstrap(task_id)` - If no parameter: Call `task_get()` to check current task, or `task_list()` to find available tasks - Ask if no task can be determined: "Which task should I validate?" 2. **After getting task context**: ``` I'll validate task: [Task Name] Loading task state via task_get() and task_graph(query="plan"). I'll verify: 1. All phases are complete 2. Automated checks pass 3. Code matches plan specifications 4. Manual verification is clear Starting validation... ``` ## Validation Process ### Step 1: Load Task State and Plan **Retrieve task state and plan structure:** ``` task_get() → Full state: description, observations, artifacts, metadata task_graph(query="plan") → Phase structure: phases, status, dependencies task_graph(query="plan-frontier") → Which phases are ready/completed timeline(limit=20) → Recent activity and observations ``` **Read task artifacts** (from artifacts list in task_get output): - Read plan.md if registered as artifact (for verification commands and criteria) - Read research.md if registered (for additional context) **Extract validation context:** From task state (`task_get`): - Task description, goals, and phase metadata - Observations from all phases (implementation decisions, challenges) - Registered artifacts (what files were produced) From plan DAG (`task_graph(query="plan")`): - Phase completion status (completed vs active) - Phase dependencies and ordering - Success criteria from phase descriptions **Identify scope:** - Which files should have been modified? (from phase descriptions and artifacts) - What functionality should exist? (from task goals) - What tests should pass? (from phase success criteria) - What patterns should be followed? (from observations) **Verify Plan DAG Health:** Spawn graph-analyst to verify plan integrity: ``` Task( subagent_type="graph-analyst", prompt='{"question": "Is the task plan DAG healthy? Are there stale phases, orphan tasks, or broken dependencies?"}', description="Verify plan DAG health" ) ``` This catches: - Phases marked complete that still have incomplete dependencies - Orphan phases not connected to the main task - Stale phases that should be addressed - Blocked tasks that might be unblocked now - Missing Markov transition edges between related tasks - Unorganized tasks (below observation threshold) Include DAG health findings in the validation report. ### Step 2: Verify Automated Criteria For each phase's "Automated Verification" section: 1. **Extract commands** from plan - Example: `npm run build`, `cargo build`, `go build ./...` - Example: `make test`, `pytest`, `cargo test` - Example: `npm run lint`, `eslint src/` 2. **Run each command**: - Execute exactly as specified in plan - Capture output (success/failure) - Note any warnings or errors 3. **Document results**: ``` ✓ Phase 1 Automated Checks: ✓ Build succeeded ✓ tests pass (24 passed, 0 failed) ⚠️ Phase 2 Automated Checks: ✓ Build succeeded ✗ Linting failed (3 warnings in src/handler.py:42) ``` ### Step 3: Code Review Against Plan **Compare implementation to plan specifications:** 1. **Read mentioned files** from plan "Changes Required" sections 2. **Cross-reference with research.md** (if available): - Compare implementation against patterns documented in research - Verify code references from research were followed - Check if open questions from research were addressed 3. **Verify changes match plan**: - Were specified functions added/modified? - Does structure match plan? 
- Are there unexpected changes? - Do changes follow patterns from research.md? 4. **Spawn analyzer agents ONLY if needed**: Execute this step ONLY if: - Artifacts (plan.md + research.md) don't provide sufficient context, OR - Complex verification is needed beyond what artifacts document, OR - Inconsistencies found that require deeper analysis If spawning agents: - Use **codebase-analyzer** to verify complex changes - Use **pattern-finder** to check consistency - Provide context from artifacts to focus agent analysis 5. **Document findings**: ``` Matches Plan: - Database migration added table as specified - API endpoints implement correct methods - Error handling follows plan pattern Deviations: - Used different variable name (minor) - Added extra validation (improvement) Potential Issues: - Missing index could impact performance - No rollback handling mentioned ``` ### Step 4: Assess Manual Verification **Review manual criteria from plan:** 1. **List what needs manual testing**: - UI functionality checks - Performance testing - Edge case verification - Integration testing 2. **Ensure criteria are clear and actionable**: - Can a developer follow these steps? - Are expected results specified? - Are edge cases covered? 3. **If criteria are vague**, suggest improvements ### Step 5: Generate Validation Report **Present comprehensive findings:** ```markdown ## Validation Report: [Task Name] **Task**: [task_id from task_get()] **Date**: [Current date] **Commits**: [git commit range if identifiable] ### Phase Completion Status ✓ Phase 1: [Name] - Complete ✓ Phase 2: [Name] - Complete ⚠️ Phase 3: [Name] - Issues found (see below) ### Plan DAG Health **Graph-analyst findings:** - ✓ No stale phases detected - ✓ No orphan tasks - ⚠️ 1 blocked task waiting on external dependency (Include specific findings from graph-analyst output) ### Automated Verification Results **Phase 1:** ✓ Build succeeds ✓ Tests pass (24 passed, 0 failed) **Phase 2:** ✓ Build succeeds ✗ Linting: 3 warnings in src/handler.py:42-45 - Warning: unused variable 'x' - Warning: missing type annotation **Phase 3:** ✓ Integration tests pass ### Code Review Findings #### Verified Against Research: (If research.md exists) - Follows pattern documented in research.md section X - Code references from research (file:line) were followed - Open questions from research addressed appropriately - Implementation consistent with research findings #### Matches Plan Specifications: - Database migration correctly adds `users` table - API endpoints implement specified REST methods - Error handling follows documented pattern - Test coverage added as planned #### Deviations from Plan: - **src/handler.py:42**: Used different approach than planned (minor, arguably better) - **src/validator.py:89**: Added extra input validation (improvement, not in plan) - **Naming**: Used `processRequest` instead of `handleRequest` (inconsistent) #### Potential Issues: - **Performance**: Missing index on foreign key `user_id` could impact queries - **Error handling**: Migration has no rollback procedure - **Documentation**: New API endpoints not documented - **Edge case**: No handling for empty input in `processRequest` ### Manual Verification Assessment **From Plan - Clear and Actionable:** - [ ] Verify feature appears correctly in UI dashboard - [ ] Test with >1000 users to check performance - [ ] Confirm error messages are user-friendly **From Plan - Needs Clarification:** - [ ] "Test edge cases" - Which edge cases specifically? 
- Suggestion: Empty input, max length input, special characters **Additional Manual Testing Recommended:** - [ ] Verify integration with existing auth system - [ ] Test rollback procedure for migration - [ ] Check API documentation is updated ### Summary **Overall Status**: ⚠️ **Implementation mostly complete, minor issues found** **Blockers**: None **Warnings**: - 3 linting warnings should be addressed - Missing index could impact production performance - Documentation gaps exist **Recommendations**: 1. Fix linting warnings before merge 2. Add index on `user_id` or document performance trade-off 3. Add API documentation for new endpoints 4. Clarify manual test cases for edge cases **Ready for Reflection?** ✓ Yes, but fix linting warnings first **Ready for PR?** ⚠️ After addressing warnings and documentation ``` ## Special Cases ### Plan Not Found ``` Could not find a plan for this task. No phases found via task_query("(query \"plan\")"). Did you mean one of these recent tasks? [List from task_list() or task_query("(query \"recent\")")] Please provide the correct task name. ``` ### No Checkmarks in Plan ``` The plan has no phase checkmarks yet. Options: 1. Run /implement to execute the plan 2. If implementation is done but plan not updated, I can validate anyway 3. If validation shows implementation is complete, I can update the plan How should I proceed? ``` ### Validation Failures ``` ⚠️ VALIDATION FAILURES DETECTED Critical Issues: - Build fails: build command returned error code 1 - Tests failing: 5/24 tests fail - Missing implementation: Phase 3 not started Recommendations: 1. Fix build errors before proceeding 2. Debug failing tests 3. Complete Phase 3 implementation Cannot proceed to reflection until these are resolved. Would you like me to help debug these issues? ``` ## Important Guidelines 1. **Be thorough but practical**: - Focus on what matters for correctness - Don't nitpick trivial style differences - Highlight real issues that affect functionality 2. **Run all automated checks**: - Never skip verification commands - If a command fails, investigate why - Report failures clearly with error messages 3. **Think critically**: - Does implementation actually solve the problem? - Are there edge cases not handled? - Could this break existing functionality? 4. **Be constructive**: - Frame issues as opportunities to improve - Suggest solutions, not just problems - Acknowledge what was done well 5. **Consider maintainability**: - Is code readable and well-structured? - Are patterns consistent with codebase? - Will future developers understand this? 
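To make the "run all automated checks" guideline concrete, here is a minimal sketch of executing each verification command from a phase and collecting pass/fail results for the report. The `run_automated_checks` helper and the example command list are hypothetical; the real commands come from each phase's Automated Verification section and should be run exactly as written there.

```python
import subprocess

def run_automated_checks(commands: list[str]) -> list[tuple[str, bool, str]]:
    """Hypothetical helper: run each plan-specified command and capture the result."""
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        results.append((cmd, proc.returncode == 0, proc.stdout + proc.stderr))
    return results

# Example usage with placeholder commands (take the real ones from the plan):
for cmd, passed, output in run_automated_checks(["npm run build", "npm test", "npm run lint"]):
    print(("✓" if passed else "✗") + f" {cmd}")
```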
## Validation Checklist Always verify: - [ ] All phases marked complete in plan - [ ] Plan DAG is healthy (no stale/orphan phases) - [ ] All automated tests from plan executed - [ ] Test results documented (pass/fail) - [ ] Code changes match plan specifications - [ ] No regressions introduced (existing tests still pass) - [ ] Manual verification steps are clear - [ ] Error handling is robust - [ ] Documentation updated if needed ## Integration with Workflow **Position in KLI cycle:** ``` /research → /plan → /implement → /validate → /reflect ↑ You are here ``` **Relationship to other commands:** - After `/implement` completes all phases - Before `/reflect` updates playbooks - Can help prepare for PR/commit **When to use:** - After implementation, before reflection - Before creating PR - When resuming work to verify state - To catch issues early ## What NOT to Do - Don't skip automated verification commands - Don't validate without reading the plan - Don't accept "looks good" without running checks - Don't nitpick trivial style choices - Don't proceed to reflection if critical issues exist - Don't create validation artifact file (just report) ## Remember Validation is your last chance to catch issues before reflecting and updating playbooks. Be thorough, be honest, and be constructive. The goal is to ensure quality before marking the task complete and learning from it. ## Workflow Reference ### Implementation > Implementation phase domain knowledge including TDD methodology, design principles (Extensibility/Composability/Parametricity), and quality standards. Use when implementing features using TDD workflow, writing code with tests, or applying design principles. Activates for coding and testing tasks. DO NOT use for research or planning. ## Requirements & Standards ### TDD Discipline (CRITICAL) **Test-Driven Development is NON-NEGOTIABLE in KLI implementation.** The cycle MUST be followed: 1. **Red**: Write failing tests FIRST 2. **Green**: Implement minimum code to pass tests 3. **Refactor**: Improve design while keeping tests green **Why This Matters:** - Tests define behavior before implementation (design thinking) - Failing tests confirm tests actually test something - Passing tests confirm implementation meets requirements - Green tests during refactor confirm no regressions - Never proceed without passing automated verification **Common TDD Violations:** - ❌ Writing implementation before tests - ❌ Skipping Red phase (tests never fail) - ❌ Skipping Refactor phase (technical debt accumulates) - ❌ Batching multiple features before testing - ✅ One feature → Write test → Implement → Refactor → Verify ### Design Principles (The Three Pillars) **Every implementation decision must consider:** **1. Extensibility** - Can new variants be added without modifying existing code? - Use: Directory-based registration (not hardcoded lists) - Use: Plugin architectures (load from directories) - Avoid: Enum/switch statements (require modification) - Example: A `plugins/` directory where new plugins are added by creating files — no registration code to modify **2. Composability** - Can components be combined in new ways? - Use: Middleware/pipeline patterns (compose handlers) - Use: Pure data structures (no hidden state) - Use: Function composition (small, focused functions) - Example: A middleware pipeline where handlers compose — each handler is independent, ordering is configuration **3. Parametricity** - Are values parameterized instead of hardcoded? 
- Use: Configuration via arguments/environment - Avoid: Magic strings/numbers embedded in code - Avoid: Assumptions about deployment context - Example: A CLI tool where all paths, ports, and URLs come from config or arguments — no hardcoded values **Applying These Principles:** - Ask during refactor: "How would I add a new variant?" - Ask during refactor: "Can I compose this with something else?" - Ask during refactor: "Are there any hardcoded assumptions?" ### Zero TODOs Policy **NO TODOs ARE ALLOWED IN COMMITTED CODE** **Why:** - TODOs in committed code become technical debt - Incomplete work blocks phase completion - Verification should catch TODOs automatically **Enforcement:** ```bash # Automated check (in every phase) git diff --name-only | xargs rg "TODO|FIXME|HACK" && \ echo "TODOs found - must fix" || \ echo "No TODOs ✓" ``` **Handling TODOs:** - Complete the work before phase ends - If truly future work: Create GitHub issue, remove TODO - If out of scope: Document in plan.md out-of-scope, remove TODO - Never commit code with TODOs ## Overview The implementation phase executes plans using strict TDD methodology, applies design principles (Extensibility/Composability/Parametricity), and enforces verification gates (automated + manual) before proceeding to next phases. It produces observe() calls for reflection. **Role in Workflow:** 1. **Plan Navigation**: Uses `task_query("(query \"plan-ready\")")` to find next phase, `task_get()` on phase tasks for details 2. **Observation Recording**: Records progress via `observe()` into the event stream 3. **Phase-by-Phase**: One phase at a time, verification gates between, `task_complete()` marks phase done 4. **Pattern Application**: Apply patterns from PQ queries (`(-> (search "...") ...)`, `(-> (proven) ...)`) **Key Characteristics:** - TDD cycle for every feature (Red→Green→Refactor) - Design principles applied during Refactor - Automated + manual verification (both required, both blocking) - Resume capability via task DAG completion status - Deviation handling (pause and ask user) ## Quick Start **Before starting any implementation:** Activate playbook patterns (REQUIRED): ``` pq_query('(-> (activate "" :boost ()) (:take 5))') ``` This retrieves proven patterns via graph-based search and persists for handoff continuity. **Basic Implementation Workflow (Per Phase):** 1. **Announce phase** start - Extract from phase task: Overview, Changes Required, Success Criteria - Record via observe() 2. **Reference activated patterns** for this phase - Apply patterns from `(activate ...)` output - Document pattern applications in observations 3. **Read referenced files FULLY** - All files mentioned in "Changes Required" - No limit/offset parameters - Full context before changes 4. **TDD Red**: Write failing tests - Tests that express desired behavior - Run to confirm they fail for right reason - Record via observe() 5. **TDD Green**: Implement to pass tests - Minimum code to make tests pass - Run tests frequently - Record via observe() 6. **TDD Refactor**: Improve design - Apply Extensibility/Composability/Parametricity - One change at a time - Keep tests green throughout - Record via observe() 7. **Run automated verification** - Build, tests, TODO check, etc. - ALL must pass before manual verification - Document results in observations 8. **Mark phase complete in task DAG** - ONLY after automated verification passes 9. 
**Request manual verification** - Present checklist from phase task description - Wait for user approval - Document approval in observations 10. **Proceed to next phase** or complete ## TDD Methodology ### Phase 1: Red (Write Failing Tests) **Goal:** Create tests that fail for the RIGHT reason **Steps:** 1. **Identify what to test:** - From phase "Changes Required" in task description - What behavior must the code exhibit? - What edge cases must be handled? 2. **Write tests expressing desired behavior:** ```python # Example test (any testing framework) def test_my_feature_works(): """Tests that my_feature produces correct output""" assert my_feature(input) == expected_value ``` 3. **Run tests to confirm failure:** ```bash # Run your project's test suite: # e.g., pytest, npm test, cargo test, go test ./... # Expected: FAILED - "my_feature not defined" or similar ``` 4. **Verify failure reason:** - ✅ Good: "function not defined", "feature not implemented" - ❌ Bad: syntax error, import failure, test logic error - If bad failure: Fix test, try again **Document in observations:** ```markdown ### TDD Red: Write Failing Tests **Tests written:** - `test/test_my_feature.py:15` - Tests correct output for valid input - `test/test_my_feature.py:23` - Tests error handling for invalid input **Test execution:** $ Result: FAILED (as expected) Reason: my_feature function not defined **Effectiveness:** Tests fail for correct reason ✓ ``` **Critical Success Factors:** - Tests MUST fail before implementing - Failure reason MUST be "not implemented" (not syntax/import errors) - Tests MUST be specific and focused - Tests MUST express desired behavior clearly ### Phase 2: Green (Implement to Pass Tests) **Goal:** Implement MINIMUM code to make tests pass **Steps:** 1. **Implement changes from plan:** - Follow "Changes Required" specification - Focus on making tests pass (not perfection yet) - Reference playbook patterns from `(-> (search "...") ...)` output 2. **Run tests FREQUENTLY:** ```bash # After each logical unit of code: ``` 3. **Iterate until tests pass:** - Add missing functionality - Handle edge cases caught by tests - Fix test failures 4. **Verify ALL tests pass:** ```bash # Expected: SUCCESS - all tests pass ``` **Document in observations:** ```markdown ### TDD Green: Implement to Pass Tests **Changes made:** - `src/my_feature.py:42` - Added my_feature function - `src/my_feature.py:58` - Added input validation - `src/errors.py:15` - Added custom error for invalid input **Challenges encountered:** - Input validation required additional error type - Resolution: Added custom exception class **Test execution:** $ Result: PASSED ✓ **Effectiveness:** All tests passing after implementation ``` **Critical Success Factors:** - Implementation focused on passing tests (perfection comes in Refactor) - Tests run frequently (catch issues early) - All tests pass before proceeding to Refactor - Changes align with plan specification ### Phase 3: Refactor (Improve Design While Tests Green) **Goal:** Improve code quality WITHOUT changing behavior **The Three Design Principles (Apply in Order):** **1. Extensibility Check:** Question: "How would I add a new variant without modifying this code?" 
Patterns: - **Directory-based registration**: New variants added by creating files ``` plugins/ http.py # HTTP plugin file.py # File plugin network.py # Network plugin (added later without modifying existing) ``` - **Plugin architectures**: Load modules from directories at runtime ```python def load_plugins(directory): return [import_module(f) for f in glob(f"{directory}/*.py")] ``` - **Avoid hardcoded lists/enums**: Use discovery instead ```python # ❌ Bad: Hardcoded list (requires modification) def available_backends(): return ["http", "file", "network"] # Need to edit when adding new backend # ✅ Good: Discovery-based (no modification needed) def available_backends(): return [b.name for b in load_backends("src/backends/")] ``` **2. Composability Check:** Question: "Can I combine this with other components in new ways?" Patterns: - **Middleware/pipeline patterns**: Build complex operations from simple handlers ```python # Handlers compose naturally app = Pipeline( authenticate, authorize, handle_request, ) ``` - **Pure data structures**: No hidden state, easy to reason about ```python # ✅ Good: Pure function, composes easily def process_data(data, config): return transform(data, config.transform_fn) ``` - **Function composition**: Small, focused functions that combine ```python def process_pipeline(input): return persist(transform(validate(input))) ``` **3. Parametricity Check:** Question: "Are there any hardcoded values that should be parameters?" Patterns: - **No magic strings/numbers**: ```python # ❌ Bad: Magic number def retry_operation(op): for _ in range(3): # Why 3? attempt(op) # ✅ Good: Parameterized def retry_operation(op, max_retries=3): for _ in range(max_retries): attempt(op) ``` - **Configuration via arguments**: ```python # ❌ Bad: Assumes environment def connect_db(): return connect("localhost", 5432) # Hardcoded! # ✅ Good: Configurable def connect_db(host, port): return connect(host, port) ``` - **No deployment assumptions**: ```python # ❌ Bad: Assumes specific path def load_config(): return read_file("/etc/myapp/config.toml") # ✅ Good: Path provided by caller def load_config(config_path): return read_file(config_path) ``` **Refactoring Process:** 1. Identify refactoring opportunity (apply one principle) 2. Make ONE change at a time 3. Run tests after EACH change 4. Confirm tests still pass (GREEN) 5. Repeat for next refactoring **Document in observations:** ```markdown ### TDD Refactor: Improve Design While Tests Green **Refactorings applied:** 1. **Extensibility improvement:** - Changed: backend loader from hardcoded list to directory-based discovery - Why: New backends can be added by creating files (no code changes) - Tests: Still passing ✓ 2. **Composability improvement:** - Changed: extracted validate_input as separate pure function - Why: Can now compose with other validators, reuse in tests - Tests: Still passing ✓ 3. **Parametricity improvement:** - Changed: retry count from hardcoded 3 to parameter - Why: Different use cases need different retry counts - Tests: Still passing ✓ **Final test execution:** $ Result: PASSED ✓ (tests green throughout refactoring) ``` **Critical Success Factors:** - Tests REMAIN GREEN throughout (run after every change) - One refactoring at a time (don't batch) - Apply all three principles systematically - Reference playbook patterns for guidance ## Workflows ### Standard Phase Implementation Workflow ``` 1. 
Find ready phase: task_query("(query \"plan-ready\")") → pick first Switch to phase task: task_bootstrap("phase-N") - Extract from phase description: overview, changes, success criteria - observe("Starting phase N: ") ↓ 2. Apply patterns from PQ queries (`(-> (search "...") ...)`, `(-> (proven) ...)`) - Reference relevant patterns for this phase's domain - Record pattern applications via observe() ↓ 3. Read referenced files FULLY - All files in "Changes Required" - Use Read without limit/offset - Understand full context ↓ 4. TDD Red: Write failing tests - Express desired behavior in tests - Run tests, confirm failure for right reason - observe("TDD Red: ") ↓ 5. TDD Green: Implement to pass tests - Write minimum code to pass - Run tests frequently - observe("TDD Green: ") ↓ 6. TDD Refactor: Improve design - Apply Extensibility check - Apply Composability check - Apply Parametricity check - Keep tests green throughout - observe("TDD Refactor: ") ↓ 7. Run automated verification - Build, tests, TODO check, etc. - Record results via observe() - Fix failures immediately (don't proceed) ↓ 8. Request manual verification - Present checklist from phase description - Wait for user approval ↓ 9. Mark phase complete: - observe("Phase N complete. Key outcomes: ") - task_complete() # Marks phase task as completed ↓ 10. Return to parent: task_set_current(parent_id) Continue to next phase via task_query("(query \"plan-ready\")") If issues found: Fix, re-verify, request approval again ``` ### Resume Capability Pattern **Phase completion tracked via task DAG:** ``` task_query("(query \"plan\")") → Shows all phases with completion status task_query("(query \"plan-ready\")") → Shows phases ready to work on (non-completed) ``` **Resume behavior:** - `task_bootstrap(parent_task_id)` then `task_query("(query \"plan-ready\")")` finds resume point - Completed phases are immutable (task_complete guard) - Record resume via `observe("Resuming from phase N")` ### Deviation Handling Pattern **When code/reality differs from plan:** 1. **PAUSE immediately** 2. **Inform user of discrepancy:** ``` PAUSE: Code differs from plan Issue: Plan expected: Reality found: Proposed adaptation: Options: 1. Proceed with adapted approach (document in observations) 2. Update plan to reflect reality 3. Different approach (please specify) ``` 3. **Wait for user decision** 4. **Document in observations:** ```markdown ### Deviation from Plan Issue: Plan vs Reality: User decision: