NewsLab
Jun 28 23:08 UTC

Show HN: Orchid – Local-first record and replay for AI agent debugging (github.com)

4 points | by brightmonkey | 2 comments
Orchid (Orchestration interactive debugger) is a zero-instrumentation proxy that captures every API and LLM call in your agent pipeline, then lets you inspect and replay the entire run locally, step by step. No vendor lock-in, no cloud dependency. It also provides a visual inspector and an MCP server, so you can inspect a session yourself or debug your agent runs from your favorite agentic coding IDE.

I built it because I was tired of debugging agent failures by grepping through logs, and the available AI observability tools all seemed to require intrusive instrumentation and/or sending my prompts and responses to a cloud service. I wanted something that would let me debug agent runs locally, without having to worry about vendor lock-in or data privacy.

Orchid is that tool. The call inspection features work extremely well, at least for my use cases, but the replay feature is perhaps more interesting. It makes LLM pipeline testing deterministic without mocking or re-running expensive API calls.
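
To illustrate the general idea of record and replay, here is a minimal sketch. It is not Orchid's actual implementation: `Cassette`, `request_key`, and the `send` callback are hypothetical names chosen for this example. Each request is fingerprinted, the first call goes to the real backend and is stored, and every later identical call is answered from the recording, which is what makes a test run repeatable without hitting the API again.

```python
import hashlib
import json


def request_key(method: str, url: str, body: dict) -> str:
    """Stable fingerprint of a request: equal inputs always give the same key."""
    canonical = json.dumps(
        {"method": method, "url": url, "body": body}, sort_keys=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Cassette:
    """Hypothetical in-memory store of recorded request/response pairs."""

    def __init__(self):
        self.entries = {}

    def call(self, method, url, body, send=None):
        """Return the recorded response, or record a new one via `send`.

        `send` is an assumed callback that performs the real request.
        Leaving it out puts the cassette in replay-only mode.
        """
        key = request_key(method, url, body)
        if key in self.entries:
            return self.entries[key]  # replay: no network, no cost
        if send is None:
            raise LookupError("no recording for this request")
        response = send(method, url, body)  # record: forward once, then store
        self.entries[key] = response
        return response
```

In a test, the first run would pass a real `send` function to populate the cassette, and later runs would omit it so that any unrecorded request fails loudly instead of silently reaching the network.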

Free, self-hosted, runs on your machine or infrastructure: https://github.com/mario-guerra/orchid-trace

Would love feedback from anyone building multi-step agentic systems or struggling with non-deterministic LLM test failures.

Comments (2)

  1. kerlenton
    Your approach to recording and replaying agent runs is interesting. I came at this problem from a different angle, though: I built a transparent wiretap proxy that sits in the middle of the JSON-RPC traffic between the client and the MCP server, so the record/replay happens on the wire rather than inside the agent. The advantage is that it works with clients you don’t own or can’t instrument (Claude Desktop, Cursor); the disadvantage is that it captures only the protocol traffic, not the agent’s reasoning (which your method captures).

    How do you deal with replay non-determinism? When I replay a captured call, I spin up a new server instance, but any stateful behavior, or the model choosing different arguments the second time around, makes it hard to reproduce the original inputs and outputs exactly. I’m interested to see how Orchid manages that in multi-step execution contexts.
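
One possible way to at least detect this kind of divergence is sketched below. It is an assumption-laden illustration, not how Orchid or the proxy described above works: `SequentialReplayer`, `ReplayDivergence`, and the `VOLATILE_FIELDS` set are hypothetical. Calls are replayed strictly in recorded order, fields expected to change between runs are ignored when comparing, and the replay stops at the first step where the live request no longer matches the recording.

```python
# Field names assumed to vary between runs; adjust for the real protocol.
# Note: they are stripped at every nesting level, which may be too broad.
VOLATILE_FIELDS = {"id", "timestamp"}


def normalize(value):
    """Recursively drop volatile fields so equal intent compares equal."""
    if isinstance(value, dict):
        return {
            k: normalize(v) for k, v in value.items() if k not in VOLATILE_FIELDS
        }
    if isinstance(value, list):
        return [normalize(v) for v in value]
    return value


class ReplayDivergence(Exception):
    """Raised when the live run stops matching the recording."""


class SequentialReplayer:
    """Hypothetical replayer: answers calls in the order they were recorded."""

    def __init__(self, recorded):
        self.recorded = recorded  # list of (request, response) pairs
        self.step = 0

    def respond(self, request):
        if self.step >= len(self.recorded):
            raise ReplayDivergence(f"step {self.step}: no recorded call left")
        expected, response = self.recorded[self.step]
        if normalize(request) != normalize(expected):
            raise ReplayDivergence(
                f"step {self.step}: request differs from recording"
            )
        self.step += 1
        return response
```

This does not make a stateful server or a model's changed arguments reproducible; it only pins down the first step at which the replayed run departs from the captured one.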

  2. Dworf
    How do you handle MCP stdio framing specifically?