# Kleisli.IO Documentation — Full Content

# kli

## Getting Started

### Configuration

## What `kli init` Sets Up

Running `kli init` in your project configures five things:

1. **MCP server** — Claude gets access to task management and pattern tools
2. **Hooks** — Automatic context injection at session start and during tool use
3. **Skills** — Domain knowledge that Claude loads when you invoke kli commands
4. **Commands** — Slash commands like `/kli:plan`, `/kli:implement`, `/kli:research`
5. **Agents** — Specialized sub-agents Claude spawns during workflows (reflector, curator, graph analyst, etc.)

You don't interact with any of these directly. They are infrastructure that Claude uses behind the scenes when you run slash commands like `/kli:plan` or `/kli:implement`.

## Hooks

kli installs Claude Code hooks that run automatically during sessions:

| Hook | Event | What It Does |
|------|-------|--------------|
| session-start | SessionStart | Registers session, detects parallel sessions, injects git state and active task context |
| tool-call | PostToolUse | Records tool usage events for behavioral fingerprinting and domain detection |
| session-task-write | PostToolUse | Writes session files when tasks are claimed or released |
| file-conflict | PostToolUse | Warns when you edit a file recently touched by another session |
| playbook-activate | UserPromptSubmit | Detects programming domains from prompts and nudges pattern retrieval |
| feedback-nudge | Stop | Reminds Claude to give feedback on activated patterns before stopping |
| session-leave | SessionEnd | Cleans up session files and records session departure |

See the [Hooks Reference](/kli/hooks) for detailed documentation on each hook.

## Skills

Skills are domain knowledge documents that Claude loads on demand:

| Skill | Loaded When You Run | What Claude Learns |
|-------|---------------------|--------------------|
| kli-research | `/kli:research` | How to explore codebases, select agents, define exit criteria |
| kli-planning | `/kli:plan` | How to design phases, define success criteria, reuse research |
| kli-implementation | `/kli:implement` | TDD methodology, verification gates, design principles |
| kli-reflection | `/kli:reflect` | Pattern extraction, feedback loops |
| kli-workflow | All commands | Phase transitions, artifact flow between commands |

## MCP Server

kli runs one MCP server alongside Claude Code. It gives Claude access to 31 tools for task management, pattern learning, and session coordination — you never call these tools yourself. It handles task creation, observations, graph queries, conflict detection between parallel sessions, pattern search, feedback scoring, and session fingerprinting. The server starts automatically and shuts down after idle timeout.

## CLAUDE.md

`kli init` adds a small section to your project's `CLAUDE.md` that tells Claude the task MCP server is available. You can customize this if needed, but the defaults work for most projects.

### Installation

## Prerequisites

- **Claude Code** — Anthropic's CLI for Claude ([install guide](https://docs.anthropic.com/en/docs/claude-code))

Nix is **not required** to use kli. Pre-built releases are pulled from GitHub automatically.

## Install

```bash
curl -fsSL https://kli.kleisli.io/install | sh
```

This downloads the kli binary and configures your Claude Code environment with the necessary MCP servers, hooks, and skills.

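You never need to touch this configuration by hand, but it can help to know roughly where it lives. MCP servers in Claude Code are registered in project configuration such as `.mcp.json`; purely as an illustration of the shape of such an entry, and not necessarily the exact server name, command, or arguments that kli writes, a registration looks something like:

```json
{
  "mcpServers": {
    "task": {
      "command": "kli",
      "args": ["mcp-serve"]
    }
  }
}
```
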
## Initialize a Project

In any project directory:

```bash
kli init
```

This sets up the kli plugin for the current project, adding the task MCP server configuration and Claude Code hooks.

## Verify Installation

Start Claude Code in your project and type:

```
/kli:research
```

If kli is installed correctly, Claude will begin the research workflow. You can also check available commands with `/help` in Claude Code.

## Optional: Build from Source

If you use Nix and want to build kli from source:

```bash
nix build github:kleisli-io/kli
```

This is only needed for development or deployment — not for using kli as a plugin.

### Quick Start

kli extends Claude Code with structured workflows for software engineering. You interact with kli through **slash commands** in Claude Code — Claude handles the task management, pattern matching, and coordination behind the scenes.

## The kli Workflow

A typical kli session follows four phases:

### 1. Research

Explore a codebase or problem space before making changes:

```
/kli:research
```

Claude investigates the codebase, spawns specialized agents (codebase explorers, web researchers), and produces a `research.md` artifact summarizing findings.

### 2. Plan

Create a phased implementation plan:

```
/kli:plan
```

Claude reads the research artifact, designs an implementation strategy broken into phases, and presents it for your approval. Each phase has explicit success criteria and verification steps.

### 3. Implement

Execute the plan phase by phase:

```
/kli:implement
```

Claude works through each phase using TDD (Red → Green → Refactor), runs automated verification after each phase, and requests your manual approval before proceeding.

### 4. Reflect

After completing work, capture lessons learned:

```
/kli:reflect
```

Claude analyzes the session's observations and outcomes, extracting patterns that improve future workflows.

## Other Useful Commands

| Command | Purpose |
|---------|---------|
| `/kli:handoff` | Create a handoff document when transferring work to a new session |
| `/kli:resume_handoff` | Resume work from a previous handoff |
| `/kli:create-task` | Create an event-sourced task for tracking complex work |
| `/kli:validate` | Verify implementation against plan criteria |

## What's Happening Behind the Scenes

When you use kli commands, Claude uses MCP (Model Context Protocol) tools to manage tasks, record observations, and coordinate agents. You don't need to interact with these tools directly — the slash commands handle everything.

For details on what each command does internally, see the [Command Reference](/kli/commands/plan), [Workflow Reference](/kli/workflows/planning), and [Agent Reference](/kli/agents/reflector) sections.

## Using kli

### Understanding Patterns

The playbook is kli's long-term memory — a collection of patterns learned from previous work. Patterns capture what worked, what didn't, and how to approach specific types of problems. Claude consults the playbook automatically when starting tasks and updates it after reflecting on completed work.

## What Patterns Look Like

A pattern is a short, actionable piece of guidance tagged with a domain and scored by how often it has been helpful or harmful. For example:

```
[lisp-000042] helpful=5 harmful=0 :: When editing defstruct forms, always reload dependents — SBCL doesn't propagate slot changes to compiled callers.
```

Patterns are prescriptive ("do X when Y") rather than descriptive ("X exists"). They capture the kind of knowledge that saves time on the second encounter.

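To give a feel for the format, here are two more entries written in the same style. These are invented for illustration, not entries from a real playbook:

```
[nix-000107] helpful=3 harmful=1 :: Pin flake inputs before debugging build failures; an unpinned nixpkgs can make the same derivation fail differently between runs.
[web-000023] helpful=4 harmful=0 :: When a layout breaks only in production, check the CSS purge/minify step before touching component styles.
```
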
## How Patterns Emerge

1. **During work**, Claude records observations in the task event log — things it discovers, constraints it hits, approaches that succeed or fail
2. **During reflection** (`/kli:reflect`), Claude reviews those observations and promotes the transferable ones to patterns. Not every observation qualifies — only insights that would help in future, unrelated tasks
3. **Over time**, patterns accumulate feedback. When a pattern helps Claude complete a task successfully, it gets a helpful vote. When it leads astray, it gets a harmful vote. High-scoring patterns surface more readily; low-scoring ones fade

## The Litmus Test

Not every observation becomes a pattern. To be promoted, an insight must pass all three criteria:

- **Transferable** — Useful beyond the original task
- **Actionable** — Provides specific guidance, not just information
- **Prescriptive** — Says what to do (or avoid), not just what exists

System-specific facts stay as observations on the task. Only insights that would help on a different project in a different context become patterns.

## How Patterns Help You

When Claude starts working on a task, it queries the playbook for patterns relevant to the current domain and problem. This happens automatically — you'll see Claude reference activated patterns in its reasoning. The effect is cumulative:

- **First time** working in a new area, Claude relies on general knowledge
- **After a few tasks**, patterns from your specific codebase and conventions start activating
- **Over many sessions**, Claude develops a working knowledge of your project's idioms, pitfalls, and proven approaches

## Triggering Reflection

Run `/kli:reflect` after completing a piece of work. Claude will:

1. Review the session's observations
2. Identify insights that pass the litmus test
3. Create or update patterns in the playbook
4. Report what was learned

Reflection is most valuable after tasks that involved debugging, discovering non-obvious constraints, or finding approaches that worked better than expected.

## Domains

Patterns are tagged with domains like `lisp`, `nix`, `web`, or `ops`. Domain tags help Claude activate the right patterns — when you're working on Nix code, Nix patterns surface; when you're working on Lisp, Lisp patterns surface. The playbook-activate hook detects domains from your prompts and triggers pattern retrieval automatically.

## Pattern Lifecycle

The full lifecycle of a pattern:

1. **Discovery** — An insight surfaces during implementation
2. **Observation** — Claude records it in the task's event stream
3. **Promotion** — During `/kli:reflect`, observations that pass the litmus test become patterns
4. **Activation** — Retrieved via semantic search when Claude starts relevant new work
5. **Feedback** — Marked helpful or harmful based on application outcomes
6. **Evolution** — Content updated based on accumulated evidence

Patterns are never deleted. If harmful votes exceed helpful votes, a pattern is deprioritized rather than removed — preserving the record of what didn't work.

## Background

kli's workflow draws on two bodies of work. The research → plan → implement structure comes from Dex Horthy's [advanced context engineering](https://github.com/humanlayer/12-factor-agents) methodology at HumanLayer, which established that dividing AI coding work into sequential phases — each producing a compacted artifact as input for the next — dramatically improves output quality in large codebases.

kli extends this with a fourth phase, **reflect**, which closes the feedback loop by promoting observations into reusable patterns. The playbook concept itself is adapted from the [Agentic Context Engineering](https://arxiv.org/abs/2510.04618) paper (Stanford, SambaNova, UC Berkeley, 2025), which established the methodology of agents writing observations between phases of work. kli was first used with this methodology in October 2025 on a production project, where a file-based observation system accumulated 230 tasks and 117 handoff documents before hitting scalability limits.

kli's playbook system extends the original methodology with event-sourced task state (CRDT-based merging for safe parallel sessions), helpful/harmful scoring that lets patterns fade rather than requiring manual curation, and hybrid retrieval combining semantic search with spreading activation over a co-application graph.

### Workflow Overview

kli structures work into four phases: **research**, **plan**, **implement**, and **reflect**. You move through these phases by typing slash commands in Claude Code. Each phase produces artifacts that feed the next.

## Research

```
/kli:research
```

Claude explores the codebase, reads files, spawns sub-agents for deeper investigation, and writes a `research.md` document summarizing what it found. You guide the research by describing what you want to understand — Claude handles the file reading, code tracing, and documentation.

Research is iterative. Claude proposes findings, you correct or redirect, and Claude refines. The output is a markdown artifact that captures the current state of the codebase relevant to your task.

Use research when you're starting something unfamiliar, investigating a bug, or need to understand existing code before making changes.

## Plan

```
/kli:plan
```

Claude reads the research artifact and designs a phased implementation plan. Each phase has:

- A description of what to build or change
- Success criteria (automated checks and manual verification)
- Dependencies on other phases

The plan is presented for your approval before any code is written. You can ask Claude to revise phases, reorder work, add or remove steps. Once approved, the plan becomes a DAG of phase tasks in the task graph.

If requirements change mid-implementation, use `/kli:iterate_plan` to revise the plan while preserving completed work.

## Implement

```
/kli:implement
```

Claude works through the plan phase by phase. For each phase:

1. Claude reads the phase description and success criteria
2. Writes code following a test-first approach where applicable
3. Runs automated verification (builds, tests, linting)
4. Presents the results and asks for your manual approval before moving to the next phase

You stay in control throughout — Claude won't proceed to the next phase without your sign-off. If something needs adjustment, you direct Claude to fix it before approving.

## Reflect

```
/kli:reflect
```

After completing work, Claude reviews the session's observations and extracts reusable patterns. These patterns enter the playbook — a knowledge base that improves future sessions. See [Understanding Patterns](/kli/using-kli/understanding-patterns) for how this works.

Reflection is optional but valuable. The more you reflect, the better Claude becomes at tasks in your codebase.

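Putting the phases together, the artifact flow through a typical session looks roughly like this:

```
/kli:research   →  research.md (findings with file:line evidence)
/kli:plan       →  phase DAG in the task graph (plus an optional plan.md)
/kli:implement  →  code changes, verified phase by phase
/kli:reflect    →  playbook patterns for future sessions
```
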
## Supporting Commands

These commands support the main workflow:

| Command | When to Use |
|---------|-------------|
| `/kli:create-task` | Start tracking a piece of work before researching it |
| `/kli:resume-task` | Pick up where you left off on an existing task |
| `/kli:handoff` | Save context when you need to continue in a new session |
| `/kli:resume_handoff` | Resume work from a saved handoff document |
| `/kli:validate` | Check implementation against plan criteria after implementing |
| `/kli:commit` | Create a git commit with context-aware message generation |

## Skipping Phases

The four phases are a guide, not a requirement. For small changes, you might skip research and go straight to planning. For exploratory work, you might research without ever planning. Use what fits the task.

### Working with Tasks

Tasks are how kli tracks work across sessions. A task is a directory with an append-only event log that records everything Claude does — observations, decisions, artifacts, and state changes. Tasks persist between sessions, so you can stop and resume without losing context.

## Creating a Task

```
/kli:create-task
```

Claude asks what you're working on, creates the task directory, and sets it as the active task for the session. From this point, observations and artifacts are recorded against the task.

You don't always need to create a task explicitly. Running `/kli:plan` on an existing task or `/kli:research` will create or attach to tasks as needed.

## Resuming a Task

```
/kli:resume-task
```

Claude finds your active and recent tasks, lets you pick one, and loads its full context — observations, artifacts, graph neighbors, and any handoffs from previous sessions.

## Task Lifecycle

Tasks progress through three states:

1. **Created** — Initial state with a birth certificate describing the work
2. **Active** — Claude is working on the task, recording observations and artifacts
3. **Completed** — All work finished; the task rejects further mutations

## Phases

When you run `/kli:plan`, Claude breaks work into phases. Each phase is itself a task, linked to the parent plan with `phase-of` edges. Phases can depend on each other — Claude tracks these dependencies and only works on phases whose dependencies are complete.

During `/kli:implement`, Claude queries the plan to find the next ready phase, works on it, marks it complete, and moves to the next. You see this as a sequence of implementation steps with approval gates between them.

## The Task Graph

Tasks form a directed acyclic graph (DAG). Edges between tasks carry meaning:

| Edge | Meaning |
|------|---------|
| `phase-of` | Subtask/phase of a parent plan |
| `depends-on` | Must complete before this task starts |
| `related-to` | Informational relationship |
| `references` | Links to research or prior work |
| `same-day` | Automatically linked tasks from the same day |
| `topic` | Semantically similar tasks |

Claude uses the graph to find ready phases, track dependencies, detect related work from previous sessions, and coordinate parallel sessions. You don't interact with the graph directly — Claude handles all queries and mutations through the task MCP server.

## Observations

As Claude works, it records observations — discoveries, constraints, decisions, and outcomes. These are timestamped entries in the task's event log. Observations serve two purposes:

1. **Session context** — When you resume a task, Claude replays observations to understand what happened previously
2. **Pattern source** — During `/kli:reflect`, observations that are transferable and actionable get promoted to playbook patterns

You can direct Claude to record specific observations, but it also records them naturally during research and implementation.

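For example, if you tell Claude "note that the staging database is unreachable from CI", it records something along these lines (illustrative wording; the exact call Claude makes may differ):

```
observe("Constraint: staging database is unreachable from CI runners; integration tests must stub the DB layer.")
```

When you resume the task later, that constraint is replayed as part of the task context, and during `/kli:reflect` it becomes a candidate for promotion if it passes the litmus test.
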
## The Event Log

Every task mutation is recorded as an immutable event in `events.jsonl`. The current state is computed by replaying events — there is no mutable database. This gives you full history of what happened during a task. Claude records these event types as it works:

```
session.join    — Claude started working on a task
session.claim   — Claude took exclusive ownership for conflict-sensitive work
observation     — Knowledge captured during work
task.complete   — Task marked as finished
task.reopen     — Completed task reopened
metadata.set    — Key-value metadata updated
edge.add        — Graph edge created
edge.remove     — Graph edge severed
handoff.create  — Handoff document generated
```

## Handoffs

When you need to continue work in a new Claude Code session:

```
/kli:handoff
```

Claude writes a handoff document summarizing the current state — what's done, what's in progress, key learnings, and recommended next steps. The handoff is stored in the task directory. To resume:

```
/kli:resume_handoff
```

Claude reads the handoff, verifies the current state matches expectations, and presents an action plan for continuing.

## Parallel Sessions

Multiple Claude Code sessions can work on related tasks simultaneously. The task system handles this through:

- **Session tracking** — Each session registers when it joins a task
- **File conflict detection** — A hook warns when you edit a file recently touched by another session
- **CRDT merging** — Events from different sessions merge automatically using conflict-free replicated data type semantics: observations append (no conflicts possible), metadata uses last-writer-wins per key, edges are add/remove sets, and status uses max-progress ordering

The session-start hook shows you when parallel sessions are active so you're aware of concurrent work. Event sourcing means you always have a full audit trail — if something went wrong, the event log shows exactly what happened and when.

## Command Reference

### Create Task

> Scaffold a task with event-sourced tracking via the task MCP server

Scaffold an event-sourced task with the task MCP server. This is for quick, focused work that doesn't need the full KLI research/plan/implement cycle.

## When to Use This

- Quick bug fixes, small features, one-off investigations
- Work you want tracked but don't need formal phases for
- Creating parent tasks to organize subtasks later
- Linking new work to existing tasks in the graph

## Process

### Step 1: Parse Arguments

Parse $ARGUMENTS to determine intent:

| Pattern | Mode | Behavior |
|---------|------|----------|
| No args | Interactive | Ask what the task is about |
| Short name (1-4 words) | Direct | Use as task name, infer description |
| Sentence/description | Infer | Extract a kebab-case name, use input as description |

**Name convention**: kebab-case, descriptive, no date prefix (auto-added by the server as `YYYY-MM-DD-`). Names are validated for descriptiveness. The server rejects meaningless names.

**Good names** (pass validation):

- `fix-login-redirect` - verb + object
- `add-retry-logic-to-api-client` - descriptive action
- `research-caching-strategies` - clear intent

**Bad names** (rejected):

- `P1`, `P2` - letter+number only
- `phase-1` - no semantic content after prefix
- `stuff`, `misc`, `wip` - vague words
- `foo`, `bar` - too short

If no arguments provided, ask:

```
What are you working on? Describe the task briefly - I'll create a tracked task for it.
```

### Step 2: Check Context

Before creating, gather context:

1. **Check for active task**: Call `task_get()` (no args) to see if a current task is already set
2. **If active task exists**: Ask whether this new task should be:
   - A **subtask** (phase-of the current task)
   - A **related** task (linked but independent)
   - A **standalone** task (no connection)

This avoids orphaned tasks and keeps the graph connected.

### Step 3: Create the Task

Based on Step 2:

**Standalone or no parent context:**

```
Call task_create(name="<name>", description="<description>")
```

**Subtask of existing task:**

```
Call task_fork(name="<name>", from="<parent-task-id>", edge_type="phase-of", description="<description>")
Then call task_set_current(task_id="<new-task-id>") to switch context
```

**Related to existing task:**

```
Call task_create(name="<name>", description="<description>")
Then call task_link(target_id="<related-task-id>", edge_type="related-to")
```

### Step 4: Set Metadata

Set useful metadata on the new task:

```
Call task_set_metadata(key="tags", value="<tags>")
```

Infer tags from the description. Common tags: `bugfix`, `feature`, `refactor`, `investigation`, `infrastructure`, `nix`, `lisp`, `mcp`, `dashboard`, `shell`.

If the task has a clear scope, also set:

```
Call task_set_metadata(key="scope", value="<scope>")
```

### Step 5: Record Initial Observation

Record context that will be useful when reviewing this task later:

```
Call observe(text="Created via /create-task. <initial context>")
```

Include relevant context like:

- What triggered this task (error message, user request, discovery during other work)
- Key files or components likely involved
- Any constraints or decisions already made

### Step 6: Report

Present the created task:

```
Task created: <task-id>
Description: <description>
Tags: <tags>
Parent: <parent task, if any>
Link: <edge type, if any>

The task is now your active context. All observations, artifacts, and metadata will be tracked
in the event stream. When done, mark complete with task_complete() or hand off with /kli:handoff.
```

## Error Handling

| Error | Response |
|-------|----------|
| Task MCP unavailable | "Task MCP server not responding. Have you run `kli init` in this project?" |
| Name too vague | Ask for a more descriptive name |
| Duplicate name | The date prefix usually prevents this, but if it happens, suggest appending a disambiguator |
| Parent task not found | List available tasks with `task_list()` and let user pick |

## Guidelines

- Task names should be self-documenting: someone reading the task list should understand what each task is about
- Don't over-tag: 2-4 tags is plenty
- Always check for an active parent before creating standalone tasks
- The initial observation is important: it's the first thing someone sees when bootstrapping the task later
- This command does NOT enter plan mode or create research documents. It creates a tracked task and sets context. The user decides what to do next.

### Handoff

> Create handoff document for transferring work to another session

You are tasked with writing a handoff document to hand off your work to another agent in a new session. You will create a handoff document that is thorough, but also **concise**.
The goal is to compact and summarize your context without losing any of the key details of what you're working on.

## Process

### 1. Generate Scaffold via MCP Tool

**Call the handoff MCP tool** to generate the path, create the directory, emit the `:handoff.create` event, and write minimal placeholder content:

```
mcp__task__handoff(summary="brief-description-of-handoff")
```

This returns structured metadata:

- `path`: the full handoff file path (already created with minimal content)
- `task`: the current task ID
- `task_dir`: the task directory path (use for playbook-export-state)
- `timestamp`: ISO 8601 timestamp
- `session`: session ID

The MCP tool requires a current task. If no task is set, it will error - use `task_bootstrap` or `task_create` first.

**Parse the returned path and task_dir** - you will overwrite the file with rich content.

### 2. Write Handoff Document

Use the Write tool to write rich content to the path returned by the MCP tool. Use the following template structure with YAML frontmatter:

**Important**: Before writing the handoff document, you need to read the auto-generated handoff document created by the `mcp__task__handoff` tool, so as to avoid getting a Write error.

````markdown
---
date: [ISO 8601 timestamp from MCP response]
timestamp: [YYYY-MM-DD]
git_branch: [from git]
git_commit: [from git]
repository: [repository name]
task: "YYYY-MM-DD-description"
type: handoff
status: active
---

# Handoff: [Task Name] - [Brief Description]

**Created**: [ISO timestamp]
**Task Directory**: `[task_dir from MCP response]`

## Task(s)

[Description of the task(s) that you were working on, along with the status of each (completed, work in progress, planned/discussed). If working on an implementation plan, call out which phase you are on.]

## Critical References

[List 2-3 most important file paths that must be consulted:]

- `[task_dir]/research.md` - [Brief description]
- `[task_dir]/plan.md` - [Brief description]
- [Other critical files]

## Recent Changes

[Describe recent changes made to the codebase in file:line syntax:]

- `path/to/file.ext:line` - [Description of change]
- `another/file.ext:line-range` - [Description of change]

## Learnings

[Describe important things learned - patterns, root causes, key information for next session:]

- [Learning 1] - Evidence at `file.ext:line`
- [Learning 2] - Pattern found in `file.ext:line`
- [Important discovery with specific references]

## Artifacts

[Exhaustive list of artifacts produced or updated as filepaths/file:line references:]

- `[task_dir]/research.md` - [What it contains]
- `[task_dir]/plan.md` - [Current phase status]
- `.claude/commands/newcommand.md:1-50` - [What was created]

## Task Graph State

[If task has phases, capture current graph state:]

- **Current Phase**: [from task_graph(query="plan")]
- **Completed Phases**: [list of completed phases]
- **Pending Phases**: [list of pending/active phases]
- **Blocked By**: [any blocking tasks]
- **Related Tasks**: [any related-to edges]

[For complex tasks with many phases, optionally spawn graph-analyst to capture comprehensive state:]

```
Task(
  subagent_type="graph-analyst",
  prompt='{"question": "What is the complete state of task <task-id>? Include all phases, their status, and any related tasks."}',
  description="Capture task graph state for handoff"
)
```

## Action Items & Next Steps

[List of action items for next agent to accomplish:]

1. [Next action based on current state]
2. [Following priority action]
3. [Additional tasks identified]

## Other Notes

[Other notes, references, useful information:]

- [Relevant codebase sections]
- [Related documentation]
- [Important context not captured above]
````

### 3. Present to User

After creating the document, respond:

```
Handoff created at: [path from MCP tool]

To commit this handoff:
  git add [task_dir]/handoffs/
  git commit -m "docs: add handoff for [description]"

To resume from this handoff in a new session:
  /kli:resume_handoff [path from MCP tool]
```

## Important Guidelines

- **Be thorough and precise**: Include both top-level objectives and lower-level details
- **More information, not less**: This defines minimum content - add more if needed
- **Avoid excessive code snippets**: Prefer file:line references over large blocks
- **Cross-reference KLI artifacts**: Always link to research.md and plan.md if they exist
- **Specific file references**: Use `file.ext:line` format consistently
- **Concise but complete**: Compact context without losing key details

### Implement

> Implement approved plan phase-by-phase with TDD workflow and verification gates

Execute phased implementation plans using the TDD workflow (Red → Green → Refactor), running verification gates (automated + manual), and marking phases complete via `task_complete()` only after all verification passes.

**The kli-implementation skill provides comprehensive guidance on:**

- TDD methodology (Red → Green → Refactor cycle with discipline)
- Design principles (Extensibility, Composability, Parametricity with examples)
- Zero TODOs policy enforcement
- Verification gate requirements (automated before manual, both blocking)
- Deviation handling patterns

## Initial Setup

**Set up task context:**

- If task name provided: `task_bootstrap(task_id)`
- If no parameter: Call `task_get()` to check current task. If none, ask "Which plan would you like to implement?"

**Load plan structure:**

```
task_query("(query \"plan\")")       → Phase structure: phases, status, dependencies
task_query("(query \"plan-ready\")") → Which phases are ready to work on
```

**Check plan status:**

- If all phases completed: "Plan already complete"
- If phases remain: Identify next phase from plan-frontier (phases are ranked by affinity score — higher affinity = better next candidate)

**If plan.md artifact exists:** Read it for detailed success criteria and verification commands.

**Present initial response** with plan overview, phase count, resume point (first ready phase).

## Implementation Process

### Step 0: Load Context

**0a. Activate playbook patterns** (REQUIRED):

```lisp
pq_query('(-> (activate "<phase/goal description>" :boost (...)) (:take 5))')
```

**0b. Load relevant skills**: Determine what kind of implementation this is. Load any domain-specific skills relevant to the task (e.g., design skills for UI work, language-specific skills for specialized domains).

### Step 1: Execute Phases (Loop Until All Complete)

Use `task_query("(query \"plan-ready\")")` to find the next ready phase. For each phase:

1. **Switch to phase task and announce start**:

   ```
   task_bootstrap("phase-N-<name>") → Get phase description with changes required and success criteria
   ```

   Show phase name, overview, changes required, success criteria.

2. **Record phase start**: `observe("Starting phase N: <phase name>")`

3. **Search Playbook Patterns**: ALWAYS search using `pq_query('(-> (search "<phase topic>") (:take 5))')` and `pq_query('(-> (proven :min 3) (:take 10))')`.

4. **Read Referenced Files FULLY**: Use Read without limit/offset for all mentioned files.
5. **TDD Red - Write Failing Tests**: Create tests that fail for the correct reason.

   ```
   observe("TDD Red: Tests written for <behavior>. Failure reason: <reason>")
   ```

6. **TDD Green - Implement to Pass**: Implement the minimum code to make tests pass.

   ```
   observe("TDD Green: Implementation complete. Tests passing.")
   ```

7. **TDD Refactor - Improve While Green**: Apply design principles (Extensibility, Composability, Parametricity). Run tests after EACH change.

   ```
   observe("TDD Refactor: Applied <refactoring>. Tests still green.")
   ```

8. **Run Automated Verification**: Execute ALL checks from the phase description (build, tests, TODO check). If ANY fail, fix immediately and re-run.

9. **Request Manual Verification**: Present to user with automated status, manual checklist, files changed, testing instructions. ALWAYS give evidence for how you have tested that the implementation works. Wait for "approved" or feedback. If issues found: fix, re-run automated verification, request again.

10. **Mark Phase Complete**:

    ```
    observe("Phase N complete. Verification passed. Key outcomes: <outcomes>")
    task_complete()  # Marks this phase task as completed
    ```

11. **Give Pattern Feedback** (per phase):

    ```lisp
    pq_query('(-> (pattern "<pattern-id>") (:feedback! :helpful "<evidence>"))')
    pq_query('(-> (pattern "<pattern-id>") (:feedback! :harmful "<evidence>"))')
    ```

12. **Return to parent and continue**:

    ```
    task_set_current(parent_task_id)
    task_query("(query \"plan-ready\")") → Find next ready phase
    ```

13. **Handle Deviations**: If code differs from plan, PAUSE and inform user with options. Wait for decision. Record via `observe()`.

### Step 2: All Phases Complete

```
task_set_current(parent_task_id)
observe("Implementation complete. All N phases done. Key challenges: <challenges>")
task_set_metadata(key="phase", value="complete")
```

### Step 3: Final Review

1. **Verify all applied patterns have feedback** — give `helpful` or `harmful` for every activated pattern
2. **Record novel insights as observations** — workarounds, anti-patterns, techniques used 2+ times:

   ```
   observe("Implementation insight: <insight>. Evidence: <evidence>")
   ```

3. **Do NOT use `(add! ...)`** — pattern promotion goes through `/kli:reflect` (Reflector → Curator)

Present completion summary with next steps (`/kli:validate`, `/core:commit`, `/kli:reflect`).

## Resuming Implementation

1. `task_bootstrap(parent_task_id)` — restores context with plan progress
2. `task_query("(query \"plan\")")` — see all phases with completion status
3. `task_query("(query \"plan-ready\")")` — find next ready phase
4. Continue from Step 1 at the next incomplete phase

## Remember

- Follow **TDD discipline** from the kli-implementation skill (Red → Green → Refactor)
- Apply **design principles** from the kli-implementation skill
- Run ALL automated verification before requesting manual verification
- **`task_complete()`** marks phase done — replaces ✓ checkmarks in plan.md
- **`task_query("(query \"plan-ready\")")`** finds the next phase — replaces ✓ parsing
- **`observe()`** records progress — observations flow through the task event stream
- Wait for manual approval before next phase
- NEVER introduce TODOs (zero TODOs policy)
- Read files FULLY before modifying
- Search playbook patterns for EACH phase
- **Give feedback per phase** on patterns applied
- One phase at a time - complete ALL verification before proceeding
- ALWAYS give proof that a phase was implemented successfully

## See Also

- CLAUDE.md - Task model, PQ/TQ reference, playbook workflow

### Iterate Plan

> Iterate on existing implementation plans with thorough research and updates

You are tasked with updating existing implementation plans based on user feedback. You should be skeptical, thorough, and ensure changes are grounded in actual codebase reality.

## Initial Response

When this command is invoked:

1. **Set up task context**:
   - If task name or path provided: `task_bootstrap(task_id)`
   - If no parameter: Call `task_get()` to check current task. If none, ask user.
   - Use `task_graph(query="plan")` to see the current plan structure (phases, status, dependencies)

2. **Handle different input scenarios**:

   **If NO task/plan identified**:

   ```
   I'll help you iterate on an existing plan. Which task's plan would you like to update?
   Provide the task name or use task_list() to find it.
   ```

   Wait for user input.

   **If task identified but NO feedback**:

   ```
   I've found the plan. Current structure:

   [output of task_graph(query="plan")]

   What changes would you like to make? For example:
   - "Add a phase for migration handling"
   - "Update the success criteria to include performance tests"
   - "Adjust the scope to exclude feature X"
   - "Split Phase 2 into two separate phases"
   ```

   Wait for user input.

   **If BOTH task AND feedback provided**:
   - Proceed immediately to Step 1
   - No preliminary questions needed

## Process Steps

### Step 1: Understand Current Plan

1. **Load plan structure from task DAG**:
   - `task_graph(query="plan")` — shows phases, status, dependencies
   - `task_get()` — shows description, goals, observations, metadata
   - If plan.md exists as an artifact, read it for detailed criteria

2. **Understand the requested changes**:
   - Parse what the user wants to add/modify/remove
   - Identify if changes require codebase research
   - Determine scope of the update

### Step 2: Research If Needed

**Only spawn research tasks if the changes require new technical understanding.**

If the user's feedback requires understanding new code patterns or validating assumptions:

1. **Record iteration intent**: `observe("Plan iteration: <requested change>")`
2. **Spawn parallel sub-tasks for research**:

   Use the right agent for each type of research:

   **For code investigation:**
   - **codebase-locator** - To find relevant files
   - **codebase-analyzer** - To understand implementation details
   - **pattern-finder** - To find similar patterns

   **For historical context (use PQ queries):**
   - `pq_query('(-> (search "<topic>") (:take 5))')` - Find patterns
   - `pq_query('(-> (proven :min 3) (:take 10))')` - Get proven patterns (helpful >= 3)

   **Be EXTREMELY specific about directories**:
   - Include full path context in prompts
   - Specify exact directories to search

3. **Read any new files identified by research**:
   - Read them FULLY into the main context
   - Cross-reference with the plan requirements

4. **Wait for ALL sub-tasks to complete** before proceeding

### Step 3: Present Understanding and Approach

Before making changes, confirm your understanding:

```
Based on your feedback, I understand you want to:
- [Change 1 with specific detail]
- [Change 2 with specific detail]

My research found:
- [Relevant code pattern or constraint]
- [Important discovery that affects the change]

I plan to update the plan by:
1. [Specific modification to make]
2. [Another modification]

Does this align with your intent?
```

Get user confirmation before proceeding.

### Step 4: Update the Plan

Plans are task DAGs. Update the plan structure using task MCP tools:

1. **Modify the DAG as needed**:

   - **Add phases** (preferred): Use `scaffold-plan!` for multiple phases with dependencies:

     ```
     task_query("(scaffold-plan!
       (new-phase \"Implement new feature\" :after existing-phase)
       (follow-up \"Integration tests\" :after new-phase))")
     ```

   - **Add single phase**: `task_fork(name="implement-new-feature", from=parent_task_id, edge_type="phase-of", description="...")` + add dependency edges with `task_link`. Names are validated for descriptiveness (avoid `P1`, `phase-1`, etc.)
   - **Update phase description**: Switch to the phase task with `task_set_current`, then `observe("Updated scope: <new scope>")`, switch back
   - **Reorder phases**: Adjust `depends-on` edges with `task_link` / `task_sever`
   - **Remove phase(s)**: Use TQ bulk sever for efficiency, then record the decision:

     ```lisp
     ;; Single phase removal
     task_query("(-> (node \"obsolete-phase\") (:sever-from-parent! :phase-of))")

     ;; Multiple phases at once (replaces multiple task_sever calls)
     task_query("(-> (node \"phase-1\" \"phase-2\" \"phase-3\") (:sever-from-parent! :phase-of))")
     ```

     Then: `observe("Phases removed: <phases>. Reason: <reason>")`

2. **If plan.md artifact exists**, update it to match the DAG changes:
   - Use the Edit tool for surgical changes
   - Keep all file:line references accurate
   - Update success criteria if needed

3. **Ensure consistency**:
   - Verify with `task_graph(query="plan")` after changes
   - Maintain the distinction between automated vs manual success criteria
   - Include specific file paths for new content

4. **Record the iteration**: `observe("Plan iteration complete: <summary of changes>")`

### Step 5: Review and Complete

1. **Present the changes made**:

   ```
   I've updated the plan for task [task-name].

   Changes made:
   - [Specific change 1]
   - [Specific change 2]

   The updated plan now:
   - [Key improvement]
   - [Another improvement]

   Would you like any further adjustments?
   ```

2. **Be ready to iterate further** based on feedback

## Important Guidelines

1. **Be Skeptical**:
   - Don't blindly accept change requests that seem problematic
   - Question vague feedback - ask for clarification
   - Verify technical feasibility with code research
   - Point out potential conflicts with existing plan phases
2. **Be Surgical**:
   - Make precise edits, not wholesale rewrites
   - Preserve good content that doesn't need changing
   - Only research what's necessary for the specific changes
   - Don't over-engineer the updates

3. **Be Thorough**:
   - Read the entire existing plan before making changes
   - Research code patterns if changes require new technical understanding
   - Ensure updated sections maintain quality standards
   - Verify success criteria are still measurable

4. **Be Interactive**:
   - Confirm understanding before making changes
   - Show what you plan to change before doing it
   - Allow course corrections
   - Don't disappear into research without communicating

5. **Track Progress**:
   - Use `observe()` to record iteration decisions and progress
   - Verify plan DAG with `task_graph(query="plan")` after changes

6. **No Open Questions**:
   - If the requested change raises questions, ASK
   - Research or get clarification immediately
   - Do NOT update the plan with unresolved questions
   - Every change must be complete and actionable

## Success Criteria Guidelines

When updating success criteria, always maintain the two-category structure:

1. **Automated Verification** (can be run by execution agents):
   - Commands that can be run: `make test`, `npm run lint`, `pytest`, `cargo test`, etc.
   - Use your project's existing build/test commands
   - Specific files that should exist
   - Code compilation/type checking

2. **Manual Verification** (requires human testing):
   - UI/UX functionality
   - Performance under real conditions
   - Edge cases that are hard to automate
   - User acceptance criteria

## Sub-task Spawning Best Practices

When spawning research sub-tasks:

1. **Only spawn if truly needed** - don't research for simple changes
2. **Spawn multiple tasks in parallel** for efficiency
3. **Each task should be focused** on a specific area
4. **Provide detailed instructions** including:
   - Exactly what to search for
   - Which directories to focus on
   - What information to extract
   - Expected output format
5. **Request specific file:line references** in responses
6. **Wait for all tasks to complete** before synthesizing
7. **Verify sub-task results** - if something seems off, spawn follow-up tasks

## Example Interaction Flows

**Scenario 1: User provides everything upfront**

```
User: /iterate_plan 2025-10-16-feature - add phase for error handling
Assistant: [Reads plan, researches error handling patterns if needed, updates plan]
```

**Scenario 2: User provides just task name**

```
User: /iterate_plan 2025-10-16-feature
Assistant: I've found the plan. What changes would you like to make?
User: Split Phase 2 into two phases - one for backend, one for frontend
Assistant: [Proceeds with update]
```

**Scenario 3: User provides no arguments**

```
User: /iterate_plan
Assistant: Which task's plan would you like to update? Provide the task name or use task_list() to find it.
User: 2025-10-16-feature
Assistant: I've found the plan. What changes would you like to make?
User: Add more specific success criteria
Assistant: [Proceeds with update]
```

### Plan

> Create detailed implementation plans through iterative planning with artifact reuse

Create detailed, phased implementation plans as task DAGs following KLI methodology.

**The kli-planning skill provides comprehensive guidance on:**

- Research artifact reuse patterns
- Phase design principles (incremental, testable, clear boundaries, 3-7 phases optimal)
- Clarifying question templates (structured with context and options)
- Verification gate patterns (automated + manual, both required)
- Out-of-scope definition strategies
- Phase boundary specification
- Success criteria definition (automated + manual)

## Initial Setup

**When invoked without arguments:**

```
What would you like to create a plan for?

Examples:
- "Add WebSocket support to the API"
- "Refactor the CSS build pipeline"
- "Based on the research task, plan the migration"

What would you like to plan?
```

Wait for user input.

**When invoked with arguments:** Goal is `$ARGUMENTS`. Proceed to planning.

### Task Setup

Call `task_get()` to check if there's already a current task. If so, use it. If not, create one:

```
task_create(name="plan-<goal>")
task_set_metadata(key="goals", value='["Create phased implementation plan for <goal>"]')
task_set_metadata(key="phase", value="planning")
```

Then call `task_get()` to retrieve the full task state — check for existing artifacts (research.md) and observations.

### Check Existing State

**If the task already has phase children** (check `task_query("(query \"plan\")")`):

```
Found existing plan with N phases (M complete, K pending).

Options:
1. Iterate on plan (modify phases)
2. Start fresh (create new task)

Which would you prefer?
```

**If `task_get()` shows a `research.md` artifact:**

- Read it fully — findings become foundations for the plan
- Token savings: Reusing research.md saves 40-50% tokens vs spawning duplicate sub-agents

## Planning Process

### Step 1: Handle Research Artifact (If Exists)

Read research.md FULLY (no limit/offset). Extract summary, findings, code references, playbook patterns, open questions. Present to user.

If no research exists: Gather current codebase state by spawning codebase-locator/analyzer as needed.

### Step 2: Activate Playbook Patterns (REQUIRED)

**Before planning**, activate relevant patterns:

```lisp
pq_query('(-> (activate "<planning goal>" :boost (...)) (:take 5))')
```

This uses graph-based retrieval to find patterns for implementation approach, phasing strategies, and verification patterns. The activation is persisted for handoff continuity.

### Step 2.5: Discover Related Prior Work (Optional)

Spawn graph-analyst to find relevant prior tasks:

```
Task(
  subagent_type="graph-analyst",
  prompt='{"question": "What prior tasks relate to <goal>? Are there patterns or learnings I should consider?"}',
  description="Find related prior work"
)
```

This surfaces:

- Similar tasks that succeeded or failed
- Patterns that were helpful or harmful for similar work
- Potential dependencies or conflicts with existing work

**When to use:** If the planning goal involves work that may have been attempted before or relates to existing infrastructure.

**When to skip:** If this is clearly novel work with no prior history (e.g., integrating a brand new library).

### Step 3: Decompose into Phases

Break work into incremental phases with clear boundaries. Each phase should be independently testable.

**Phase design guidance:** See the kli-planning skill for:

- Optimal phase count (3-7 phases)
- Phase boundary criteria
- Incremental delivery patterns
- Dependency management

### Step 4: Define Success Criteria

For each phase, specify automated verification (build, tests, TODO check) and manual verification (UI/UX, performance, acceptance).

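As an illustrative sketch only (the exact commands depend on your project's build tooling), the success criteria for a single phase might read:

```
Phase 2: Add integration layer

Automated verification:
- `make test` passes, including the new integration tests
- `npm run lint` reports no new warnings
- No TODO comments introduced (zero TODOs policy)

Manual verification:
- Requests routed through the new layer return the same results as the old path
- Latency feels acceptable under a realistic workload
```
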
### Step 5: Ask Clarifying Questions

List uncertainties requiring user input. For each, provide context and concrete options. Wait for responses. Update the plan based on answers.

**When to iterate:** See the kli-planning skill for exit criteria vs continue criteria.

### Step 6: Define Out-of-Scope

Explicitly list what's NOT being done to prevent scope creep.

### Step 7: Create Plan as Task DAG

Plans are task DAGs, not markdown files. Use TQ's `scaffold-plan!` for efficient creation.

**Present the plan outline to the user for approval first.** Then create the DAG:

**Option 1: scaffold-plan! (Recommended for plans with dependencies)**

```
task_query("(scaffold-plan!
  (implement-core-library \"Core library with API surface\")
  (add-integration-layer \"Integration with existing system\" :after implement-core-library)
  (write-test-suite \"Comprehensive test coverage\" :after add-integration-layer))")
```

Creates all phases with dependencies in one expression. Names are validated for descriptiveness.

**Auto-improvement:** Short names like `p1` are auto-improved from descriptions:

- `(p1 "Research architecture")` → creates `research-architecture`

**Option 2: scaffold-chain! (For linear phase sequences)**

```
task_query("(scaffold-chain! \"Setup infrastructure\" \"Implement core logic\" \"Add test coverage\")")
```

Creates a linear dependency chain automatically.

**Option 3: task_fork (For complex custom structures)**

```
task_fork(name="implement-user-authentication", from=current_task_id, edge_type="phase-of",
          description="Implement OAuth2 flow\n\nChanges Required:\n- ...\n\nSuccess Criteria:\n- ...")
```

Use when you need more control over task naming or descriptions. Names are validated.

Verify the DAG: `task_query("(query \"plan\")")` — shows all phases with status, dependencies, and enriched fields (including `:alpha`, `:affinity` for Markov-aware ranking).

**Optionally write plan.md** as a human-readable artifact if the plan is complex enough to warrant it. The task DAG is the source of truth.

### Step 8: Record Planning Decisions

```
observe("Plan complete: N phases created. Key decisions: <decisions>. Open questions resolved: <resolutions>.")
```

### Step 9: Pattern Feedback

Give feedback on patterns that informed the plan:

```lisp
pq_query('(-> (pattern "<pattern-id>") (:feedback! :helpful "informed phase structure for X"))')
pq_query('(-> (pattern "<pattern-id>") (:feedback! :harmful "didn't apply to this planning context"))')
```

Record planning insights as observations (patterns are promoted during `/kli:reflect`):

```
observe("Planning insight: <insight>. Evidence: plan phasing approach")
```

### Step 10: Present Plan to User

```
## Plan Complete: <goal>

**Phases**: N phases created as task DAG

Plan DAG:
<phase structure with dependencies>

Next step: `/kli:implement` to execute the phases.
Use `task_query("(query \"plan-ready\")")` to see which phases are ready.
```

### Step 11: Iterate Plan (If Needed)

If the user requests changes, modify the DAG and record via `observe()`:

**Add phases:**

```lisp
task_fork(name="new-phase", from=current_task_id, edge_type="phase-of", description="...")
```

**Remove phases (bulk sever):**

```lisp
;; Single phase
task_query("(-> (node \"obsolete-phase\") (:sever-from-parent! :phase-of))")

;; Multiple phases at once
task_query("(-> (node \"phase-1\" \"phase-2\") (:sever-from-parent! :phase-of))")
```

**Add dependencies:**

```lisp
task_query("(-> (node \"phase-2\") (:link! \"phase-1\" :depends-on))")
```

Record all changes: `observe("Plan iteration: <changes>")`

## Resuming a Plan

1. `task_bootstrap(task_id)` — restores full context
2. `task_query("(query \"plan\")")` — see all phases with status
3. `task_query("(query \"plan-ready\")")` — see which phases are ready (non-completed)
4. Continue from the appropriate step

## Remember

- **Plans are task DAGs** — use `scaffold-plan!` or `task_fork` to create phases
- **task_query("(query \"plan\")")** is the source of truth, not plan.md
- **Reuse research.md** if available (saves 40-50% tokens)
- Ask clarifying questions for ANY ambiguity
- Design phases following principles from the kli-planning skill
- Both automated AND manual verification required for each phase
- Define out-of-scope explicitly
- **Give feedback** on patterns that informed the plan
- **Record decisions** via `observe()` — observations flow through the task event stream
- Get user confirmation before finalizing the plan

## See Also

- CLAUDE.md - Task model, PQ/TQ reference, playbook workflow

### Reflect

> Reflect on completed task and evolve playbooks

Extract learnings from completed tasks by orchestrating the reflector and curator agents. The event stream IS the observation source — no separate observation files needed.

> **Architecture Note**: This command orchestrates the reflection workflow. The kli-reflection skill provides the methodology (WHAT to evaluate). This command provides the orchestration (HOW to execute).

**The kli-reflection skill provides comprehensive guidance on:**

- Sequential execution pattern (reflector → curator)
- Observation analysis methodology
- Pattern effectiveness evaluation criteria
- Harm signal tier definitions (Tier 1/2/3 with responses)
- Evidence-based learning principles
- New pattern discovery process
- Reflection artifact structure

IMPORTANT: Before starting this workflow you ABSOLUTELY NEED to load the kli-reflection skill, NO EXCEPTIONS.

## Workflow Overview

```
┌──────────────────────────────────────────────────────────────┐
│                      /reflect WORKFLOW                        │
├──────────────────────────────────────────────────────────────┤
│ 1. GATHER STATE (task_get + timeline for observations)       │
│ 2. REFLECTOR    (analyze observations from event stream)     │
│ 3. CURATOR      (update playbooks via MCP tools)             │
│ 4. REPORT       (combined summary)                           │
└──────────────────────────────────────────────────────────────┘

Task Isolation: Each task is reflected independently.
Cross-cutting knowledge emerges through playbook accumulation.

Two feedback pathways feed into playbooks:
- Curator: Analysis-based updates from reflection.md via PQ mutations
- Real-time: Feedback given via `(:feedback! ...)` during work
```

## Initial Response

When this command is invoked:

**1. Set up task context:**

- If task name provided: `task_bootstrap(task_id)`
- If no parameter: Call `task_get()` to check current task. If none, ask user.

**2. Verify task has sufficient evidence:**

```
task_get()               → Check observations, artifacts, metadata, phase
timeline(limit=50)       → Get full event history with observations
task_graph(query="plan") → Check plan completion status
```

**If task has observations** (from `observe()` calls during work):

- Proceed with reflection — the event stream contains the evidence

**If task has NO observations:**

```
This task has no recorded observations.

Observations are recorded via observe() during KLI commands (/research, /plan, /implement).
Without observations, there's insufficient evidence for pattern effectiveness analysis.

Options:
1. Reflect on what artifacts exist (reduced analysis)
2. Skip reflection for this task
```

**Note**: Not all tasks go through every phase. Simple tasks may skip research.
Verify what evidence exists and proceed with available data.

## Step 0.5: Gather Graph Context (Optional)

For complex tasks with many phases, spawn graph-analyst first to get comprehensive graph state:

```
Task(
  subagent_type="graph-analyst",
  prompt='{"question": "What is the complete state of task <task-id>? Include all phases, their status, and any related tasks."}',
  description="Get comprehensive task graph state"
)
```

Pass this graph context to the reflector agent for more informed analysis.

**When to use:**

- Task has 5+ phases
- Task has cross-task dependencies
- Multiple patterns were activated during the task

**When to skip:**

- Simple tasks with 1-3 phases
- No cross-task relationships

## Step 1: Orchestrate Reflector Agent

**Spawn reflector agent as Task:** Use the Task tool with subagent_type: "reflector"

Pass parameters in the prompt:

```
Analyze completed task and produce reflection artifact.

Task ID: <task-id>
Task directory: <task-dir>
Context: <graph context from Step 0.5, if gathered>

To get the task's observations and evidence, use these MCP tools:
- task_set_current("<task-id>") to set context
- task_get() to get state with observations (last 3 shown)
- timeline(limit=50) to get ALL observations and events
- task_graph(query="plan") to see phase completion

Also read any artifacts listed in task_get output (research.md, plan.md, etc.)

Evaluate pattern effectiveness with evidence from observations.

Classify harm signals into tiers:
- Tier 1 (Auto-Action): outcome=FAILURE, explicit rejection → auto-increment harmful
- Tier 2 (Flag for Review): excessive iterations, implicit correction → increment with review note
- Tier 3 (Track Only): minor iterations, context mismatch → track but no counter change

Identify new patterns discovered during the task.

Generate reflection.md artifact in the task directory with:
- Complete frontmatter
- Patterns applied and effectiveness
- Harm Signals section (tiered)
- Challenges and resolutions
- New patterns discovered
- Playbook update recommendations

Return summary when complete.
```

**Wait for the reflector agent to complete.**

## Step 2: Orchestrate Curator Agent

**After the reflector returns, spawn the curator agent as Task:** Use the Task tool with subagent_type: "curator"

Pass parameters in the prompt:

```
Update playbooks based on reflection artifact.

Task directory: <task-dir>
Reflection: <task-dir>/reflection.md

Read reflection.md recommendations.

Update playbook using PQ mutations:
- `(-> (pattern "id") (:feedback! :helpful "evidence"))` for effective patterns
- `(-> (pattern "id") (:feedback! :harmful "evidence"))` for misleading patterns
- `(add! :domain :X :content "...")` for new patterns discovered
- `(-> (pattern "id") (:evolve! "new content" :reason "why"))` for pattern description updates

Process harm signals by tier:
- Tier 1: `(:feedback! :harmful ...)`
- Tier 2: `(:feedback! :harmful ...)` with review note in evidence
- Tier 3: Track only (no feedback call)

Return summary of all changes made.
```

**Wait for the curator agent to complete.**

## Step 3: Record and Present Results

```
task_set_current("<task-id>")
observe("Reflection complete: <N> patterns evaluated, <H> helpful, <X> harmful, <Y> new patterns added")
```

Present to user:

```
Reflection complete!

**Task:** <task-id>

## Reflection Analysis
- Reflection: <task-dir>/reflection.md
- Patterns evaluated: <N> patterns
- Harm signals detected: <count>
  - Tier 1 (auto-action): <count>
  - Tier 2 (flagged): <count>
  - Tier 3 (tracked): <count>

## Playbook Updates (Curator)
- Helpful incremented: <N> patterns
- Harmful incremented: <N> patterns
- New patterns added: <N> patterns

Review reflection.md for full analysis.
``` ## Important Notes - **Task isolation** — each task is reflected independently - **Event stream is source of truth** — observations from `observe()` calls, surfaced by `task_get()` and `timeline()` - **No observation files required** — observations flow through the task event stream via `observe()` - **Sequential execution** — Reflector → Curator (dependencies) - **Playbook updates via PQ** — `(:feedback! ...)`, `(add! ...)`, `(:evolve! ...)` (not file edits) - **Cross-cutting knowledge** — emerges through playbook accumulation ## Error Handling **If task has no observations:** - Offer reduced analysis from artifacts only - Or skip reflection **If reflector agent fails:** - Present error details - Offer to retry **If curator agent fails:** - Note that reflection.md was created - Offer to run curator manually or apply updates via playbook MCP tools directly ## Remember You are an **orchestrator** for per-task reflection. Key responsibilities: 1. Gather task state and observations via `task_get()` + `timeline()` 2. Delegate analysis to reflector (reads event stream) 3. Delegate playbook updates to curator (uses playbook MCP tools) 4. Record results via `observe()` 5. Present combined results Cross-cutting knowledge accumulates in playbooks over many reflections. ### Research > Document codebase as-is through iterative research with observation capture Research the codebase or external topics by delegating to sub-agents. **The kli-research skill provides comprehensive guidance on:** - Documentarian philosophy (document what IS, not what SHOULD BE) - Error amplification principles (research errors amplify 1000x downstream) - Research decomposition patterns - Exit criteria evaluation (when research is complete) ## Research Strategies Five research strategies are available. Bundled agents are always present; other capabilities use the most specialized available agent type, falling back to general-purpose. | Strategy | Keywords | Approach | |----------|----------|----------| | **codebase** | "how", "where", "implementation", "code", "architecture" | Bundled: codebase-locator, codebase-analyzer, pattern-finder | | **visual** | "design", "UI", "visual", "component", "inspiration", "peer" | Sub-agent: describe visual research goal | | **github** | "repo", "repository", "github.com", "open source", "package source" | Sub-agent: describe repo analysis goal | | **external** | "library docs", "framework", "documentation", "how to use X" | Sub-agent: describe web research goal | | **graph** | "prior tasks", "patterns for", "project health", "task history", "related tasks", "what has been done" | Bundled: graph-analyst | ## Initial Setup When invoked without arguments, respond: ``` I'm ready to research. Please provide your research question: - **Codebase research**: "How does authentication work?", "Where are API endpoints?" - **Visual research**: "Find modern card component examples", "Analyze nordic design trends" - **GitHub research**: "Analyze the tokio-rs/tokio repository", "Map the Next.js repo structure" - **External docs**: "How does React Server Components work?", "Redis caching best practices" - **Graph research**: "What prior tasks relate to MCP?", "What patterns have been effective for Lisp?", "Project health?" - **Hybrid**: "How should we improve our navigation based on best practices?" ``` Wait for user's research query. ## Research Process ### Step 0: Set Up Task Context Call `task_get()` to check if there's already a current task. If so, use it. 
If not, create one: ``` task_create(name="research-") task_set_metadata(key="goals", value='["Research ", "Document findings with file:line evidence"]') task_set_metadata(key="phase", value="research") ``` Then call `task_get()` to retrieve the full task state including any existing observations and artifacts. ### Step 0.5: Activate Playbook Patterns (REQUIRED) **Before researching**, activate relevant patterns: ```lisp pq_query('(-> (activate "" :boost ( )) (:take 5))') ``` This retrieves prior learnings and patterns that may inform the research. The activation is persisted for handoff continuity. ### Step 1: Classify Research Type Analyze the query to determine strategy: **Codebase keywords**: "how", "where", "implementation", "code", "architecture", "what calls", "imports" **Visual keywords**: "design", "UI", "visual", "component", "inspiration", "peer", "navigation examples", "modern" **GitHub keywords**: "repo", "repository", "github.com/", "open source", "package source", "analyze X repo" **External keywords**: "docs", "documentation", "library", "framework", "how to use", "best practices for X library" **Graph keywords**: "prior tasks", "prior work", "task history", "patterns for", "pattern effectiveness", "what patterns", "project health", "task health", "graph health", "related tasks", "what relates to", "dependencies", "stale", "blocked", "orphan", "what has been done for", "similar tasks" ### Step 3: Spawn Research Agents **For codebase research**, spawn appropriate agents based on the question: ``` # For locating files/components Task( subagent_type="codebase-locator", prompt=". Task dir: ", description="Locate relevant files" ) # For deep analysis of specific components Task( subagent_type="codebase-analyzer", prompt=". Task dir: ", description="Analyze implementation" ) # For finding similar patterns Task( subagent_type="pattern-finder", prompt=". Task dir: ", description="Find related patterns" ) ``` **For web/external research** (documentation, articles, best practices): Select the most specialized available agent type for web research; fall back to general-purpose. ``` Task( subagent_type=, prompt="Research goal: . Topics: . Return: summary of findings with source URLs.", description="Web research: " ) ``` **For GitHub repository research:** Select the most specialized available agent type for repository analysis; fall back to general-purpose. ``` Task( subagent_type=, prompt="Analyze the / repository. Focus: . Return: key files, architecture summary, patterns found.", description="Analyze /" ) ``` **For visual/design research** (UI patterns, inspiration, branding): Select the most specialized available agent type for visual/design research; fall back to general-purpose. ``` Task( subagent_type=, prompt="Research visual design patterns for . Analyze: . Return: patterns found, layout/color/typography analysis.", description="Visual research: " ) ``` **For graph-based research (task/pattern graphs):** ``` Task( subagent_type="graph-analyst", prompt='{"question": ""}', description="Query task/pattern graphs" ) ``` Use for questions about prior work, pattern effectiveness, task relationships, or project health. The graph-analyst queries TQ (task graph) and PQ (pattern graph) to answer from the graph perspective. **Capability selection guidance:** - `codebase-locator` for "where is X?" questions (bundled, always available) - `codebase-analyzer` for "how does X work?" questions (bundled, always available) - `pattern-finder` for "how is X done elsewhere?" 
questions (bundled, always available) - `graph-analyst` for prior tasks, patterns, project health (bundled, always available) - Sub-agent for web research: external docs, articles, best practices - Sub-agent for repo analysis: GitHub repository structure and architecture - Sub-agent for visual research: design patterns, UI inspiration, branding The agents will: 1. Research the question using their specialized tools 2. Generate findings in the task directory 3. Return with status, findings, and evidence ### Step 4: Handle Hybrid Research If the query needs BOTH strategies (e.g., "How should we redesign our navigation?"): 1. First spawn codebase agents (bundled) to understand current implementation 2. Then spawn a sub-agent for visual/external research describing what patterns to find 3. Combine results in a summary ### Step 5: Present Results After agents return: 1. Record key findings: `observe("Research findings: ")` 2. Write research.md to the task directory (path from `task_get()`) 3. Present to user: - Status (success/partial/failure) - Key findings summary - Evidence references (file:line for codebase, screenshots for visual) - Suggested next steps ### Step 6: Record Research Findings as Observations Research produces observations (system-specific findings), not patterns (reusable techniques). Record all discoveries via `observe()`: ``` observe("Research finding: ") observe("Architecture insight: ") observe("Constraint found: ") ``` **Do NOT use `(add! ...)` during research.** Research follows the documentarian philosophy — document what IS, not prescribe what to DO. Findings are observations by nature. Reusable patterns emerge later during reflection (`/kli:reflect`), which applies the litmus test: - **Transferable**: Would help on a *different* project? - **Actionable**: Says "when X, do Y" (not "X exists")? - **Prescriptive**: Gives advice, not description? **What to record as observations:** - Architectural discoveries (how the system works) - Anti-patterns found (things that don't work) - Workarounds that succeeded - Cross-cutting concerns observed ### Step 7: Handle Follow-Up Questions **Simple clarification:** Re-spawn relevant agent with refined question **Component extraction:** For visual research, ask if user wants code extracted **Full iteration:** Spawn agents again with follow-up context ## CRITICAL: Delegation Required **DO NOT research the codebase yourself.** You MUST delegate to specialized agents. ❌ **WRONG**: Using Read, Grep, Glob, or Search to investigate the question directly ❌ **WRONG**: Answering the research question from your own knowledge ✅ **CORRECT**: Spawn specialized agents via Task tool and let them do the work The agents: - Have access to appropriate tools for their specialty - Return structured results - Track findings in the task directory **Your job**: Set up the task directory, spawn appropriate agents, synthesize and present results. 
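As a rough illustration of the keyword classification in Step 1 above, the sketch below shows how a research query could be bucketed into strategies. The `classify_research_query` helper, the abbreviated keyword lists, and the substring matching are illustrative assumptions — in practice Claude reads the query against the full keyword tables rather than running code — but a query matching more than one bucket is exactly the hybrid case described in Step 4.

```python
# Illustrative sketch only — strategy selection is done by Claude from the
# keyword tables above, not by project code. Keyword lists are abbreviated.
STRATEGY_KEYWORDS = {
    "graph":    ["prior tasks", "task history", "patterns for", "project health", "related tasks"],
    "github":   ["repo", "repository", "github.com", "open source", "package source"],
    "visual":   ["design", "ui", "visual", "component", "inspiration", "peer"],
    "external": ["docs", "documentation", "framework", "how to use", "best practices"],
    "codebase": ["how", "where", "implementation", "code", "architecture"],
}

def classify_research_query(query: str) -> list[str]:
    """Return every strategy whose keywords appear in the query."""
    q = query.lower()
    matches = [name for name, words in STRATEGY_KEYWORDS.items()
               if any(word in q for word in words)]
    return matches or ["codebase"]  # default when nothing matches

# "How should we improve our navigation based on best practices?" matches both
# codebase ("how") and external ("best practices") -> handled as hybrid research.
```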
## Remember - Follow **documentarian philosophy** from kli-research skill - **ALWAYS delegate to specialized agents** - never research directly - Auto-detect strategy from query keywords - Use hybrid mode for questions needing both perspectives - **Record findings** as observations via `observe()` — observations flow through the event stream; patterns are promoted during `/kli:reflect` - Get user validation before marking complete - No placeholder values in artifacts ## See Also - CLAUDE.md - Task model, PQ/TQ reference, playbook workflow ### Resume Handoff > Resume work from handoff document with context analysis and validation You are tasked with resuming work from a handoff document through an interactive process. ## Context These handoffs contain critical context, learnings, and next steps from previous work sessions that need to be understood and continued. ## Initial Response When this command is invoked: ### 1. If the path to a handoff document was provided **Example**: `/kli:resume_handoff /handoffs/2025-10-26_14-30-00_phase-1.md` - Skip the default message - Immediately read the handoff document FULLY (no limit/offset) - Immediately read any research or plan documents it links to under "Critical References" - Do NOT use a sub-agent to read these critical files - Derive the task ID from the path (the task name is the directory containing `handoffs/`) - Set current task via `task_bootstrap` so subsequent MCP calls have context - Begin analysis process by ingesting context - Propose course of action to user and confirm ### 2. If a task name was provided **Example**: `/kli:resume_handoff 2025-10-26-handoff-commands` - Bootstrap task: `task_bootstrap(task_id="2025-10-26-handoff-commands")` — sets context + returns state with artifacts - Check timeline: `timeline(limit=20)` — look for `:handoff.create` events which contain the handoff path - Also glob for handoff files using the task directory from `task_get()`: `/handoffs/*.md` - If no handoffs exist: "I can't find any handoff documents in this task. Would you like to create one with /kli:handoff?" - If one handoff: Proceed with that handoff - If multiple handoffs: Use the most recent (by timestamp in filename YYYY-MM-DD_HH-MM-SS) - Read handoff FULLY and linked artifacts - Begin analysis process ### 3. If no parameters provided **Example**: `/kli:resume_handoff` **Step 1: Use Session Start Context (Already Injected)** The session start hook injects task context at the beginning of each session. Look for this in the conversation startup: ``` TASK[1]{dir,phase,last_artifact}: ,Phase 2: Implementation,research.md ``` - If `TASK[1]` context exists: Extract the task directory from it (first field before comma) - This context is **already in your conversation** - no file reading needed - The session start hook has done the discovery work for you **Step 2: Find Handoffs for Active Task** If active task found: 1. Bootstrap task: `task_bootstrap(task_id)` 2. Check timeline for `:handoff.create` events: `timeline(limit=30)` 3. Also glob for handoff files: `/handoffs/*.md` 4. If handoffs exist: - Use most recent by filename timestamp (YYYY-MM-DD_HH-MM-SS) - Announce: "Found active task with handoff. Resuming from `{handoff_path}`..." - Proceed to read handoff and linked artifacts 5. If no handoffs but task exists: - Get task state with `task_get()` for observations and artifacts - Ask: "Found active task but no handoffs. Read observations instead?
[Y/n]" **Step 3: Fallback (No TASK Context)** If no `TASK[1]` in session startup (rare - usually means no active task): ``` I'll help you resume work from a handoff document. Please provide either: - Full path: `/kli:resume_handoff ` - Task name: `/kli:resume_handoff 2025-10-26-task-name` Or I can help you find available handoffs. What would you like to do? ``` Then wait for user input. ## Process Steps ### Step 1: Read and Analyze Handoff 1. **Restore task context and activate relevant patterns**: ``` task_bootstrap(task_id) pq_query('(-> (activate "" :boost ( )) (:take 5))') ``` The activate query uses task topic + graph context to surface semantically relevant patterns. 2. **Read handoff document completely**: - Use Read tool WITHOUT limit/offset parameters - Extract all sections: - Task(s) and their statuses - Critical References - Recent changes - Learnings - Artifacts - Action items and next steps - Other notes 2. **Read referenced artifacts**: - Read all files mentioned in "Critical References" section - Read research.md if referenced - Read plan.md if referenced - Read any other critical files mentioned - Use Read tool FULLY for each file 3. **Verify current state** (read-only validation): - Check if mentioned files still exist at specified paths - Use `task_get()` to see current task state vs handoff state - Check `timeline(limit=20)` for recent activity since handoff - Check if recent changes are still present - Note any discrepancies between handoff and current state 4. **Verify graph state** (if task has phases): Spawn graph-analyst to check current graph state: ``` Task( subagent_type="graph-analyst", prompt='{"question": "What is the current state of task ? Are there any stale phases, new blocking dependencies, or changes since the handoff?"}', description="Verify task graph state" ) ``` This catches: - Phases completed by another session since handoff - New blocking dependencies introduced - Related tasks that were created - Changes to task metadata or goals ### Step 2: Synthesize and Present Analysis Present comprehensive analysis to user: ``` I've analyzed the handoff from [date] for task [name]. **Original Tasks:** - [Task 1]: [Status from handoff] → [Current verification] - [Task 2]: [Status from handoff] → [Current verification] **Key Learnings from Handoff:** - [Learning 1 with file:line reference] - [Learning 2 with pattern discovered] **Recent Changes Status:** - [Change 1] - [Verified present/Missing/Modified] - [Change 2] - [Verified present/Missing/Modified] **Critical Artifacts Reviewed:** - [research.md]: [Key findings summary] - [plan.md]: [Phase status summary] **Graph State** (if task has phases): - [Graph-analyst findings about current state] - [Phases completed since handoff] - [New blocking dependencies] - [Related tasks created] **Recommended Next Actions:** Based on the handoff's action items: 1. [Most logical next step] 2. [Second priority] 3. [Additional tasks] **Discrepancies Found** (if any): - [File mentioned but not found] - [Change mentioned but different] - [State mismatch] Shall I proceed with [recommended action 1], or would you like to adjust the approach? ``` Get user confirmation before proceeding. ### Step 3: Create Action Plan 1. **Use TodoWrite to create task list**: - Convert action items from handoff into todos - Add any new tasks discovered during analysis - Prioritize based on dependencies 2. **Present the plan**: ``` I've created a task list based on the handoff: [Show todo list] Ready to begin with the first task? 
``` ### Step 4: Begin Work 1. Start with first approved task 2. Reference learnings from handoff throughout work 3. Apply patterns discovered in handoff 4. Update progress as tasks complete ## Guidelines 1. **Be Thorough in Analysis**: - Read entire handoff document - Verify ALL mentioned changes exist - Check for regressions or conflicts - Read all referenced artifacts 2. **Be Interactive**: - Present findings before starting work - Get buy-in on approach - Allow course corrections - Adapt based on current vs handoff state 3. **Leverage Handoff Wisdom**: - Pay attention to "Learnings" section - Apply documented patterns - Avoid repeating mistakes mentioned - Build on discovered solutions 4. **Validate Before Acting**: - Never assume handoff state matches current - Verify file references still exist - Check for breaking changes since handoff - Confirm patterns still valid 5. **Avoid Unnecessary Sub-Agents**: - Read files directly in main context - Only spawn agents if complex verification needed - Most handoff resumption is straightforward reading ## Common Scenarios **Clean Continuation**: - All changes present, no conflicts - Proceed with recommended actions **Diverged Codebase**: - Some changes missing or modified - Reconcile differences and adapt plan **Incomplete Work**: - Tasks marked "in_progress" - Complete unfinished work first **Stale Handoff**: - Significant time passed - Re-evaluate strategy based on current state ### Resume Task > Resume work on a task by gathering context from event stream and graph state Resume work on a task by gathering context from the task MCP server's event stream, graph state, and artifacts. Unlike `/kli:resume_handoff` which requires a handoff document, this command works directly with the task's live state. ## When to Use - **No handoff exists** — work was interrupted without creating a handoff - **Picking up where you left off** — same session or new session - **Checking task status** — understand what's been done and what's next - **Exploring a task** — unfamiliar with a task and need context Use `/kli:resume_handoff` instead when a handoff document exists and you want to follow its specific guidance. ## Initial Response When this command is invoked: ### 1. If task ID provided **Example**: `/core:resume-task 2026-01-31-coalgebraic-task-infrastructure` - Bootstrap the task: `task_bootstrap(task_id="2026-01-31-coalgebraic-task-infrastructure")` - This single call sets current task, emits session.join, and returns: - Full computed state (description, observations, artifacts, metadata) - Graph neighbors (related tasks, dependencies) - Playbook query (enriched semantic query) - Handoff document (if one exists) - Proceed to context gathering ### 2. If no parameter provided **Example**: `/core:resume-task` **Step 1: Check Session Start Context** The session start hook injects task context at conversation startup. Look for: ``` TASK[1]{dir,phase,last_artifact}: ,Phase 2: Implementation,research.md ``` If `TASK[1]` exists: - Extract task ID from the first field (e.g., `2025-12-12-task-name`) - Bootstrap: `task_bootstrap(task_id="2025-12-12-task-name")` - Proceed to context gathering **Step 2: Check for Current Task** If no session context, call `task_get()` to check if a current task is already set. 
If a task is current: - Announce: "Resuming current task: ``" - Proceed to context gathering **Step 3: List Available Tasks** If no current task, use TQ to find recent active tasks: ``` task_query('(-> (query "recent") (:take 10) (:select :display-name :crdt-status :obs-count :alpha :affinity))') ``` Present the list and ask which task to resume. ## Context Gathering Once a task is identified, gather comprehensive context using task MCP tools. ### Step 1: Get Core State The `task_bootstrap` call already provides: - **State**: description, status, claim, sessions, observations, artifacts, metadata - **Neighbors**: typed edges to related tasks - **Playbook query**: enriched semantic query for pattern activation If you used `task_set_current` + `task_get` separately, you have the same information. ### Step 2: Get Timeline Retrieve recent events to understand activity: ``` timeline(limit=30) ``` This shows: - Recent observations - Session joins/leaves - Artifact registrations - Metadata changes - Handoff creations (`:handoff.create` events) ### Step 3: Check Plan Structure (If Task Has Phases) If the task has children (phases), query the plan: ``` task_graph(query="plan") # Full plan structure task_graph(query="plan-frontier") # Which phases are ready ``` This reveals: - Phase completion status (completed vs active) - Dependency ordering - Which phases are unblocked and ready to work on (ranked by affinity score) - Any blocked phases waiting on dependencies ### Step 4: Check Graph Health Query task health to identify issues: ``` task_health() ``` Or spawn graph-analyst for deeper analysis: ``` Task( subagent_type="graph-analyst", prompt='{"question": "What is the current state of task ? Are there stale phases, blocked work, or issues I should know about?"}', description="Analyze task graph state" ) ``` ### Step 5: Activate Relevant Patterns Use the enriched query from bootstrap to get relevant playbook patterns: ``` pq_query('(-> (activate "" :boost ( )) (:take 5))') ``` This surfaces patterns that are semantically relevant to this task's topic and its graph neighbors. ### Step 6: Read Critical Artifacts If the task has registered artifacts, read them: - `research.md` — prior research findings - `plan.md` — detailed plan document (if exists alongside DAG) - Recent handoffs — if `:handoff.create` events exist in timeline Read artifacts FULLY without limit/offset to get complete context. ## Synthesis and Presentation Present your analysis to the user: ``` ## Task: [Task Name] **Status**: [active/completed] | **Claim**: [held by session/unclaimed] **Created**: [date] | **Sessions**: [count] ### Goals [from metadata.goals] ### Current State [Summary of what has been accomplished based on observations and artifacts] ### Plan Progress (if phases exist) [X/Y] phases complete | [Z] ready to work on **Completed:** - ✓ Phase 1: [name] - ✓ Phase 2: [name] **Ready:** - ○ Phase 3: [name] — [brief description] **Blocked:** - ○ Phase 4: [name] — waiting on Phase 3 ### Recent Activity [Last 3-5 significant events from timeline] ### Relevant Patterns [Top 2-3 patterns from playbook activation] ### Artifacts [List of registered artifacts with brief descriptions] ### Graph Context [Related tasks, dependencies, what this enables] ### Recommended Next Steps Based on the task state, I recommend: 1. **[Most logical next action]** — [why] 2. **[Second priority]** — [why] 3. **[Additional consideration]** — [why] Shall I proceed with [recommended action 1]? ``` Get user confirmation before taking action. 
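As a condensed illustration of the context-gathering steps above, a resumed session would issue roughly the following sequence of MCP calls. The calls and arguments are the ones documented in Steps 1–5; the task id is a placeholder, and in practice Claude drives this interactively rather than as a fixed script.

```
# 1. Bootstrap: set current task, emit session.join, return state + neighbors + playbook query
task_bootstrap(task_id="2025-12-12-example-task")   # placeholder id

# 2. Recent activity: observations, session joins/leaves, artifacts, :handoff.create events
timeline(limit=30)

# 3. Plan structure (only if the task has phases)
task_graph(query="plan")            # full plan DAG
task_graph(query="plan-frontier")   # phases ready to work on

# 4. Graph health check
task_health()

# 5. Activate relevant patterns with pq_query using the enriched query from bootstrap

# 6. Read registered artifacts (research.md, plan.md, recent handoffs) fully
```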
## Special Cases ### Task Has No Observations ``` This task was created but has no recorded observations yet. Goals: [from metadata] Description: [from state] Would you like me to: 1. Start researching this task (/kli:research) 2. Create a plan (/kli:plan) 3. Just observe the current state and proceed ``` ### Task Is Completed ``` This task is marked as completed. Completed at: [timestamp if available] Sessions: [list] Observations: [count] Artifacts: [list] To continue working on it, I would need to reopen it first. Would you like me to: 1. Reopen the task and continue 2. Create a new related task for follow-up work 3. Just review the completed work ``` ### Task Has Handoff Documents If `:handoff.create` events exist in timeline: ``` This task has [N] handoff document(s): - [path1] — [timestamp] - [path2] — [timestamp] Would you like me to: 1. Resume from the latest handoff (/kli:resume_handoff) 2. Ignore handoffs and work from live task state ``` ### Task Is Stale If task hasn't had activity in a long time (check session timestamps): ``` This task hasn't been worked on since [date]. The codebase may have changed significantly. Consider: 1. Re-validating any existing plan 2. Re-checking file references in artifacts 3. Running /kli:validate if implementation was in progress ``` ## Guidelines 1. **Task Bootstrap is Canonical** - `task_bootstrap` is the primary entry point — it does everything in one call - Use it instead of multiple separate calls when starting fresh 2. **TQ for Complex Queries** - Use `task_query` for complex graph traversals - Examples: - `(-> (current) (:follow :phase-of) :ids)` — get phases of current task - `(-> (current) (:back :depends-on) :ids)` — what depends on this task - `(-> (query "plan-ready") :enrich (:sort :affinity) (:take 3))` — ready phases ranked by affinity 3. **Timeline Over Artifacts** - The timeline is the source of truth - Artifacts are useful but observations in the event stream are more current 4. **Don't Assume Handoff State** - Unlike resume_handoff, don't expect handoff document guidance - Work from the live task state 5. **Interactive Confirmation** - Always present analysis before taking action - Let user choose the next step - Don't auto-proceed with implementation ## Comparison with resume_handoff | Aspect | resume-task | resume_handoff | |--------|-------------|----------------| | Input | Task ID or current task | Handoff document path | | Source of truth | Event stream + graph | Handoff markdown | | Guidance | Inferred from state | Explicit next steps | | When to use | No handoff exists | Handoff was created | | Context depth | Comprehensive (all tools) | Focused (handoff content) | ## Integration with Workflow ``` [Task Created] ↓ /kli:research → /kli:plan → /kli:implement → /kli:validate → /kli:reflect ↑ ↓ └──────────── /core:resume-task ←─────────────┘ (re-enter anywhere) ``` `/core:resume-task` is the universal re-entry point for any task, at any stage. ### Validate > Validate implementation against plan, verify success criteria, identify issues You are tasked with validating that an implementation plan was correctly executed, verifying all success criteria and identifying any deviations or issues. ## Context Task state gathered via MCP: Call `task_get()` to retrieve current task state including description, phase, observations, artifacts, and graph context. If no current task, use `task_list()` to find available tasks. ## Initial Setup When invoked: 1. 
**Set up task context**: - If task name provided: `task_bootstrap(task_id)` - If no parameter: Call `task_get()` to check current task, or `task_list()` to find available tasks - Ask if no task can be determined: "Which task should I validate?" 2. **After getting task context**: ``` I'll validate task: [Task Name] Loading task state via task_get() and task_graph(query="plan"). I'll verify: 1. All phases are complete 2. Automated checks pass 3. Code matches plan specifications 4. Manual verification is clear Starting validation... ``` ## Validation Process ### Step 1: Load Task State and Plan **Retrieve task state and plan structure:** ``` task_get() → Full state: description, observations, artifacts, metadata task_graph(query="plan") → Phase structure: phases, status, dependencies task_graph(query="plan-frontier") → Which phases are ready/completed timeline(limit=20) → Recent activity and observations ``` **Read task artifacts** (from artifacts list in task_get output): - Read plan.md if registered as artifact (for verification commands and criteria) - Read research.md if registered (for additional context) **Extract validation context:** From task state (`task_get`): - Task description, goals, and phase metadata - Observations from all phases (implementation decisions, challenges) - Registered artifacts (what files were produced) From plan DAG (`task_graph(query="plan")`): - Phase completion status (completed vs active) - Phase dependencies and ordering - Success criteria from phase descriptions **Identify scope:** - Which files should have been modified? (from phase descriptions and artifacts) - What functionality should exist? (from task goals) - What tests should pass? (from phase success criteria) - What patterns should be followed? (from observations) **Verify Plan DAG Health:** Spawn graph-analyst to verify plan integrity: ``` Task( subagent_type="graph-analyst", prompt='{"question": "Is the task plan DAG healthy? Are there stale phases, orphan tasks, or broken dependencies?"}', description="Verify plan DAG health" ) ``` This catches: - Phases marked complete that still have incomplete dependencies - Orphan phases not connected to the main task - Stale phases that should be addressed - Blocked tasks that might be unblocked now - Missing Markov transition edges between related tasks - Unorganized tasks (below observation threshold) Include DAG health findings in the validation report. ### Step 2: Verify Automated Criteria For each phase's "Automated Verification" section: 1. **Extract commands** from plan - Example: `npm run build`, `cargo build`, `go build ./...` - Example: `make test`, `pytest`, `cargo test` - Example: `npm run lint`, `eslint src/` 2. **Run each command**: - Execute exactly as specified in plan - Capture output (success/failure) - Note any warnings or errors 3. **Document results**: ``` ✓ Phase 1 Automated Checks: ✓ Build succeeded ✓ tests pass (24 passed, 0 failed) ⚠️ Phase 2 Automated Checks: ✓ Build succeeded ✗ Linting failed (3 warnings in src/handler.py:42) ``` ### Step 3: Code Review Against Plan **Compare implementation to plan specifications:** 1. **Read mentioned files** from plan "Changes Required" sections 2. **Cross-reference with research.md** (if available): - Compare implementation against patterns documented in research - Verify code references from research were followed - Check if open questions from research were addressed 3. **Verify changes match plan**: - Were specified functions added/modified? - Does structure match plan? 
- Are there unexpected changes? - Do changes follow patterns from research.md? 4. **Spawn analyzer agents ONLY if needed**: Execute this step ONLY if: - Artifacts (plan.md + research.md) don't provide sufficient context, OR - Complex verification is needed beyond what artifacts document, OR - Inconsistencies found that require deeper analysis If spawning agents: - Use **codebase-analyzer** to verify complex changes - Use **pattern-finder** to check consistency - Provide context from artifacts to focus agent analysis 5. **Document findings**: ``` Matches Plan: - Database migration added table as specified - API endpoints implement correct methods - Error handling follows plan pattern Deviations: - Used different variable name (minor) - Added extra validation (improvement) Potential Issues: - Missing index could impact performance - No rollback handling mentioned ``` ### Step 4: Assess Manual Verification **Review manual criteria from plan:** 1. **List what needs manual testing**: - UI functionality checks - Performance testing - Edge case verification - Integration testing 2. **Ensure criteria are clear and actionable**: - Can a developer follow these steps? - Are expected results specified? - Are edge cases covered? 3. **If criteria are vague**, suggest improvements ### Step 5: Generate Validation Report **Present comprehensive findings:** ```markdown ## Validation Report: [Task Name] **Task**: [task_id from task_get()] **Date**: [Current date] **Commits**: [git commit range if identifiable] ### Phase Completion Status ✓ Phase 1: [Name] - Complete ✓ Phase 2: [Name] - Complete ⚠️ Phase 3: [Name] - Issues found (see below) ### Plan DAG Health **Graph-analyst findings:** - ✓ No stale phases detected - ✓ No orphan tasks - ⚠️ 1 blocked task waiting on external dependency (Include specific findings from graph-analyst output) ### Automated Verification Results **Phase 1:** ✓ Build succeeds ✓ Tests pass (24 passed, 0 failed) **Phase 2:** ✓ Build succeeds ✗ Linting: 3 warnings in src/handler.py:42-45 - Warning: unused variable 'x' - Warning: missing type annotation **Phase 3:** ✓ Integration tests pass ### Code Review Findings #### Verified Against Research: (If research.md exists) - Follows pattern documented in research.md section X - Code references from research (file:line) were followed - Open questions from research addressed appropriately - Implementation consistent with research findings #### Matches Plan Specifications: - Database migration correctly adds `users` table - API endpoints implement specified REST methods - Error handling follows documented pattern - Test coverage added as planned #### Deviations from Plan: - **src/handler.py:42**: Used different approach than planned (minor, arguably better) - **src/validator.py:89**: Added extra input validation (improvement, not in plan) - **Naming**: Used `processRequest` instead of `handleRequest` (inconsistent) #### Potential Issues: - **Performance**: Missing index on foreign key `user_id` could impact queries - **Error handling**: Migration has no rollback procedure - **Documentation**: New API endpoints not documented - **Edge case**: No handling for empty input in `processRequest` ### Manual Verification Assessment **From Plan - Clear and Actionable:** - [ ] Verify feature appears correctly in UI dashboard - [ ] Test with >1000 users to check performance - [ ] Confirm error messages are user-friendly **From Plan - Needs Clarification:** - [ ] "Test edge cases" - Which edge cases specifically? 
- Suggestion: Empty input, max length input, special characters **Additional Manual Testing Recommended:** - [ ] Verify integration with existing auth system - [ ] Test rollback procedure for migration - [ ] Check API documentation is updated ### Summary **Overall Status**: ⚠️ **Implementation mostly complete, minor issues found** **Blockers**: None **Warnings**: - 3 linting warnings should be addressed - Missing index could impact production performance - Documentation gaps exist **Recommendations**: 1. Fix linting warnings before merge 2. Add index on `user_id` or document performance trade-off 3. Add API documentation for new endpoints 4. Clarify manual test cases for edge cases **Ready for Reflection?** ✓ Yes, but fix linting warnings first **Ready for PR?** ⚠️ After addressing warnings and documentation ``` ## Special Cases ### Plan Not Found ``` Could not find a plan for this task. No phases found via task_query("(query \"plan\")"). Did you mean one of these recent tasks? [List from task_list() or task_query("(query \"recent\")")] Please provide the correct task name. ``` ### No Checkmarks in Plan ``` The plan has no phase checkmarks yet. Options: 1. Run /implement to execute the plan 2. If implementation is done but plan not updated, I can validate anyway 3. If validation shows implementation is complete, I can update the plan How should I proceed? ``` ### Validation Failures ``` ⚠️ VALIDATION FAILURES DETECTED Critical Issues: - Build fails: build command returned error code 1 - Tests failing: 5/24 tests fail - Missing implementation: Phase 3 not started Recommendations: 1. Fix build errors before proceeding 2. Debug failing tests 3. Complete Phase 3 implementation Cannot proceed to reflection until these are resolved. Would you like me to help debug these issues? ``` ## Important Guidelines 1. **Be thorough but practical**: - Focus on what matters for correctness - Don't nitpick trivial style differences - Highlight real issues that affect functionality 2. **Run all automated checks**: - Never skip verification commands - If a command fails, investigate why - Report failures clearly with error messages 3. **Think critically**: - Does implementation actually solve the problem? - Are there edge cases not handled? - Could this break existing functionality? 4. **Be constructive**: - Frame issues as opportunities to improve - Suggest solutions, not just problems - Acknowledge what was done well 5. **Consider maintainability**: - Is code readable and well-structured? - Are patterns consistent with codebase? - Will future developers understand this? 
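To make the "run all automated checks" guideline concrete, here is a minimal sketch of executing each verification command from a phase and collecting pass/fail results for the report. The `run_automated_checks` helper and the example command list are hypothetical; the real commands come from each phase's Automated Verification section and should be run exactly as written there.

```python
import subprocess

def run_automated_checks(commands: list[str]) -> list[tuple[str, bool, str]]:
    """Hypothetical helper: run each plan-specified command and capture the result."""
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        results.append((cmd, proc.returncode == 0, proc.stdout + proc.stderr))
    return results

# Example usage with placeholder commands (take the real ones from the plan):
for cmd, passed, output in run_automated_checks(["npm run build", "npm test", "npm run lint"]):
    print(("✓" if passed else "✗") + f" {cmd}")
```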
## Validation Checklist Always verify: - [ ] All phases marked complete in plan - [ ] Plan DAG is healthy (no stale/orphan phases) - [ ] All automated tests from plan executed - [ ] Test results documented (pass/fail) - [ ] Code changes match plan specifications - [ ] No regressions introduced (existing tests still pass) - [ ] Manual verification steps are clear - [ ] Error handling is robust - [ ] Documentation updated if needed ## Integration with Workflow **Position in KLI cycle:** ``` /research → /plan → /implement → /validate → /reflect ↑ You are here ``` **Relationship to other commands:** - After `/implement` completes all phases - Before `/reflect` updates playbooks - Can help prepare for PR/commit **When to use:** - After implementation, before reflection - Before creating PR - When resuming work to verify state - To catch issues early ## What NOT to Do - Don't skip automated verification commands - Don't validate without reading the plan - Don't accept "looks good" without running checks - Don't nitpick trivial style choices - Don't proceed to reflection if critical issues exist - Don't create validation artifact file (just report) ## Remember Validation is your last chance to catch issues before reflecting and updating playbooks. Be thorough, be honest, and be constructive. The goal is to ensure quality before marking the task complete and learning from it. ## Workflow Reference ### Implementation > Implementation phase domain knowledge including TDD methodology, design principles (Extensibility/Composability/Parametricity), and quality standards. Use when implementing features using TDD workflow, writing code with tests, or applying design principles. Activates for coding and testing tasks. DO NOT use for research or planning. ## Requirements & Standards ### TDD Discipline (CRITICAL) **Test-Driven Development is NON-NEGOTIABLE in KLI implementation.** The cycle MUST be followed: 1. **Red**: Write failing tests FIRST 2. **Green**: Implement minimum code to pass tests 3. **Refactor**: Improve design while keeping tests green **Why This Matters:** - Tests define behavior before implementation (design thinking) - Failing tests confirm tests actually test something - Passing tests confirm implementation meets requirements - Green tests during refactor confirm no regressions - Never proceed without passing automated verification **Common TDD Violations:** - ❌ Writing implementation before tests - ❌ Skipping Red phase (tests never fail) - ❌ Skipping Refactor phase (technical debt accumulates) - ❌ Batching multiple features before testing - ✅ One feature → Write test → Implement → Refactor → Verify ### Design Principles (The Three Pillars) **Every implementation decision must consider:** **1. Extensibility** - Can new variants be added without modifying existing code? - Use: Directory-based registration (not hardcoded lists) - Use: Plugin architectures (load from directories) - Avoid: Enum/switch statements (require modification) - Example: A `plugins/` directory where new plugins are added by creating files — no registration code to modify **2. Composability** - Can components be combined in new ways? - Use: Middleware/pipeline patterns (compose handlers) - Use: Pure data structures (no hidden state) - Use: Function composition (small, focused functions) - Example: A middleware pipeline where handlers compose — each handler is independent, ordering is configuration **3. Parametricity** - Are values parameterized instead of hardcoded? 
- Use: Configuration via arguments/environment - Avoid: Magic strings/numbers embedded in code - Avoid: Assumptions about deployment context - Example: A CLI tool where all paths, ports, and URLs come from config or arguments — no hardcoded values **Applying These Principles:** - Ask during refactor: "How would I add a new variant?" - Ask during refactor: "Can I compose this with something else?" - Ask during refactor: "Are there any hardcoded assumptions?" ### Zero TODOs Policy **NO TODOs ARE ALLOWED IN COMMITTED CODE** **Why:** - TODOs in committed code become technical debt - Incomplete work blocks phase completion - Verification should catch TODOs automatically **Enforcement:** ```bash # Automated check (in every phase) git diff --name-only | xargs rg "TODO|FIXME|HACK" && \ echo "TODOs found - must fix" || \ echo "No TODOs ✓" ``` **Handling TODOs:** - Complete the work before phase ends - If truly future work: Create GitHub issue, remove TODO - If out of scope: Document in plan.md out-of-scope, remove TODO - Never commit code with TODOs ## Overview The implementation phase executes plans using strict TDD methodology, applies design principles (Extensibility/Composability/Parametricity), and enforces verification gates (automated + manual) before proceeding to next phases. It produces observe() calls for reflection. **Role in Workflow:** 1. **Plan Navigation**: Uses `task_query("(query \"plan-ready\")")` to find next phase, `task_get()` on phase tasks for details 2. **Observation Recording**: Records progress via `observe()` into the event stream 3. **Phase-by-Phase**: One phase at a time, verification gates between, `task_complete()` marks phase done 4. **Pattern Application**: Apply patterns from PQ queries (`(-> (search "...") ...)`, `(-> (proven) ...)`) **Key Characteristics:** - TDD cycle for every feature (Red→Green→Refactor) - Design principles applied during Refactor - Automated + manual verification (both required, both blocking) - Resume capability via task DAG completion status - Deviation handling (pause and ask user) ## Quick Start **Before starting any implementation:** Activate playbook patterns (REQUIRED): ``` pq_query('(-> (activate "" :boost ()) (:take 5))') ``` This retrieves proven patterns via graph-based search and persists for handoff continuity. **Basic Implementation Workflow (Per Phase):** 1. **Announce phase** start - Extract from phase task: Overview, Changes Required, Success Criteria - Record via observe() 2. **Reference activated patterns** for this phase - Apply patterns from `(activate ...)` output - Document pattern applications in observations 3. **Read referenced files FULLY** - All files mentioned in "Changes Required" - No limit/offset parameters - Full context before changes 4. **TDD Red**: Write failing tests - Tests that express desired behavior - Run to confirm they fail for right reason - Record via observe() 5. **TDD Green**: Implement to pass tests - Minimum code to make tests pass - Run tests frequently - Record via observe() 6. **TDD Refactor**: Improve design - Apply Extensibility/Composability/Parametricity - One change at a time - Keep tests green throughout - Record via observe() 7. **Run automated verification** - Build, tests, TODO check, etc. - ALL must pass before manual verification - Document results in observations 8. **Mark phase complete in task DAG** - ONLY after automated verification passes 9. 
**Request manual verification** - Present checklist from phase task description - Wait for user approval - Document approval in observations 10. **Proceed to next phase** or complete ## TDD Methodology ### Phase 1: Red (Write Failing Tests) **Goal:** Create tests that fail for the RIGHT reason **Steps:** 1. **Identify what to test:** - From phase "Changes Required" in task description - What behavior must the code exhibit? - What edge cases must be handled? 2. **Write tests expressing desired behavior:** ```python # Example test (any testing framework) def test_my_feature_works(): """Tests that my_feature produces correct output""" assert my_feature(input) == expected_value ``` 3. **Run tests to confirm failure:** ```bash # Run your project's test suite: # e.g., pytest, npm test, cargo test, go test ./... # Expected: FAILED - "my_feature not defined" or similar ``` 4. **Verify failure reason:** - ✅ Good: "function not defined", "feature not implemented" - ❌ Bad: syntax error, import failure, test logic error - If bad failure: Fix test, try again **Document in observations:** ```markdown ### TDD Red: Write Failing Tests **Tests written:** - `test/test_my_feature.py:15` - Tests correct output for valid input - `test/test_my_feature.py:23` - Tests error handling for invalid input **Test execution:** $ Result: FAILED (as expected) Reason: my_feature function not defined **Effectiveness:** Tests fail for correct reason ✓ ``` **Critical Success Factors:** - Tests MUST fail before implementing - Failure reason MUST be "not implemented" (not syntax/import errors) - Tests MUST be specific and focused - Tests MUST express desired behavior clearly ### Phase 2: Green (Implement to Pass Tests) **Goal:** Implement MINIMUM code to make tests pass **Steps:** 1. **Implement changes from plan:** - Follow "Changes Required" specification - Focus on making tests pass (not perfection yet) - Reference playbook patterns from `(-> (search "...") ...)` output 2. **Run tests FREQUENTLY:** ```bash # After each logical unit of code: ``` 3. **Iterate until tests pass:** - Add missing functionality - Handle edge cases caught by tests - Fix test failures 4. **Verify ALL tests pass:** ```bash # Expected: SUCCESS - all tests pass ``` **Document in observations:** ```markdown ### TDD Green: Implement to Pass Tests **Changes made:** - `src/my_feature.py:42` - Added my_feature function - `src/my_feature.py:58` - Added input validation - `src/errors.py:15` - Added custom error for invalid input **Challenges encountered:** - Input validation required additional error type - Resolution: Added custom exception class **Test execution:** $ Result: PASSED ✓ **Effectiveness:** All tests passing after implementation ``` **Critical Success Factors:** - Implementation focused on passing tests (perfection comes in Refactor) - Tests run frequently (catch issues early) - All tests pass before proceeding to Refactor - Changes align with plan specification ### Phase 3: Refactor (Improve Design While Tests Green) **Goal:** Improve code quality WITHOUT changing behavior **The Three Design Principles (Apply in Order):** **1. Extensibility Check:** Question: "How would I add a new variant without modifying this code?" 
Patterns: - **Directory-based registration**: New variants added by creating files ``` plugins/ http.py # HTTP plugin file.py # File plugin network.py # Network plugin (added later without modifying existing) ``` - **Plugin architectures**: Load modules from directories at runtime ```python def load_plugins(directory): return [import_module(f) for f in glob(f"{directory}/*.py")] ``` - **Avoid hardcoded lists/enums**: Use discovery instead ```python # ❌ Bad: Hardcoded list (requires modification) def available_backends(): return ["http", "file", "network"] # Need to edit when adding new backend # ✅ Good: Discovery-based (no modification needed) def available_backends(): return [b.name for b in load_backends("src/backends/")] ``` **2. Composability Check:** Question: "Can I combine this with other components in new ways?" Patterns: - **Middleware/pipeline patterns**: Build complex operations from simple handlers ```python # Handlers compose naturally app = Pipeline( authenticate, authorize, handle_request, ) ``` - **Pure data structures**: No hidden state, easy to reason about ```python # ✅ Good: Pure function, composes easily def process_data(data, config): return transform(data, config.transform_fn) ``` - **Function composition**: Small, focused functions that combine ```python def process_pipeline(input): return persist(transform(validate(input))) ``` **3. Parametricity Check:** Question: "Are there any hardcoded values that should be parameters?" Patterns: - **No magic strings/numbers**: ```python # ❌ Bad: Magic number def retry_operation(op): for _ in range(3): # Why 3? attempt(op) # ✅ Good: Parameterized def retry_operation(op, max_retries=3): for _ in range(max_retries): attempt(op) ``` - **Configuration via arguments**: ```python # ❌ Bad: Assumes environment def connect_db(): return connect("localhost", 5432) # Hardcoded! # ✅ Good: Configurable def connect_db(host, port): return connect(host, port) ``` - **No deployment assumptions**: ```python # ❌ Bad: Assumes specific path def load_config(): return read_file("/etc/myapp/config.toml") # ✅ Good: Path provided by caller def load_config(config_path): return read_file(config_path) ``` **Refactoring Process:** 1. Identify refactoring opportunity (apply one principle) 2. Make ONE change at a time 3. Run tests after EACH change 4. Confirm tests still pass (GREEN) 5. Repeat for next refactoring **Document in observations:** ```markdown ### TDD Refactor: Improve Design While Tests Green **Refactorings applied:** 1. **Extensibility improvement:** - Changed: backend loader from hardcoded list to directory-based discovery - Why: New backends can be added by creating files (no code changes) - Tests: Still passing ✓ 2. **Composability improvement:** - Changed: extracted validate_input as separate pure function - Why: Can now compose with other validators, reuse in tests - Tests: Still passing ✓ 3. **Parametricity improvement:** - Changed: retry count from hardcoded 3 to parameter - Why: Different use cases need different retry counts - Tests: Still passing ✓ **Final test execution:** $ Result: PASSED ✓ (tests green throughout refactoring) ``` **Critical Success Factors:** - Tests REMAIN GREEN throughout (run after every change) - One refactoring at a time (don't batch) - Apply all three principles systematically - Reference playbook patterns for guidance ## Workflows ### Standard Phase Implementation Workflow ``` 1. 
Find ready phase: task_query("(query \"plan-ready\")") → pick first Switch to phase task: task_bootstrap("phase-N") - Extract from phase description: overview, changes, success criteria - observe("Starting phase N: ") ↓ 2. Apply patterns from PQ queries (`(-> (search "...") ...)`, `(-> (proven) ...)`) - Reference relevant patterns for this phase's domain - Record pattern applications via observe() ↓ 3. Read referenced files FULLY - All files in "Changes Required" - Use Read without limit/offset - Understand full context ↓ 4. TDD Red: Write failing tests - Express desired behavior in tests - Run tests, confirm failure for right reason - observe("TDD Red: ") ↓ 5. TDD Green: Implement to pass tests - Write minimum code to pass - Run tests frequently - observe("TDD Green: ") ↓ 6. TDD Refactor: Improve design - Apply Extensibility check - Apply Composability check - Apply Parametricity check - Keep tests green throughout - observe("TDD Refactor: ") ↓ 7. Run automated verification - Build, tests, TODO check, etc. - Record results via observe() - Fix failures immediately (don't proceed) ↓ 8. Request manual verification - Present checklist from phase description - Wait for user approval ↓ 9. Mark phase complete: - observe("Phase N complete. Key outcomes: ") - task_complete() # Marks phase task as completed ↓ 10. Return to parent: task_set_current(parent_id) Continue to next phase via task_query("(query \"plan-ready\")") If issues found: Fix, re-verify, request approval again ``` ### Resume Capability Pattern **Phase completion tracked via task DAG:** ``` task_query("(query \"plan\")") → Shows all phases with completion status task_query("(query \"plan-ready\")") → Shows phases ready to work on (non-completed) ``` **Resume behavior:** - `task_bootstrap(parent_task_id)` then `task_query("(query \"plan-ready\")")` finds resume point - Completed phases are immutable (task_complete guard) - Record resume via `observe("Resuming from phase N")` ### Deviation Handling Pattern **When code/reality differs from plan:** 1. **PAUSE immediately** 2. **Inform user of discrepancy:** ``` PAUSE: Code differs from plan Issue: Plan expected: Reality found: Proposed adaptation: Options: 1. Proceed with adapted approach (document in observations) 2. Update plan to reflect reality 3. Different approach (please specify) ``` 3. **Wait for user decision** 4. **Document in observations:** ```markdown ### Deviation from Plan Issue: Plan vs Reality: User decision: