Show HN: Orchid – Local-first record and replay for AI agent debugging (github.com)
I built Orchid because I was tired of debugging agent failures by grepping through logs, and the available AI observability tools all seemed to require intrusive instrumentation, sending my prompts and responses to a cloud service, or both. I wanted something that would let me debug agent runs locally, without worrying about vendor lock-in or data privacy.
Orchid is that tool. The call inspection features work extremely well, at least for my use cases, but the replay feature is perhaps more interesting. It makes LLM pipeline testing deterministic without mocking or re-running expensive API calls.
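To make the replay idea concrete, here is a minimal sketch of the general record-and-replay pattern. It is not Orchid's actual API or implementation: `ReplayCache`, `fake_model`, and the request shape are hypothetical names chosen for illustration. The assumption is that each model request can be serialized to JSON and fingerprinted, so a later run can be served from the recording instead of the live API.

```python
import hashlib
import json


class ReplayCache:
    """Hypothetical record/replay wrapper (illustrative, not Orchid's API)."""

    def __init__(self, call_model, mode="record"):
        self.call_model = call_model  # the real (expensive) model call
        self.mode = mode              # "record" or "replay"
        self.store = {}               # request fingerprint -> recorded response

    @staticmethod
    def fingerprint(request):
        # Canonical JSON (sorted keys) so key order does not change the hash.
        blob = json.dumps(request, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def call(self, request):
        key = self.fingerprint(request)
        if self.mode == "replay":
            # In replay mode the live model is never contacted.
            if key not in self.store:
                raise KeyError("no recorded response for this request")
            return self.store[key]
        response = self.call_model(request)
        self.store[key] = response
        return response


# Stand-in for a real LLM client, used only to count live calls.
calls = []

def fake_model(request):
    calls.append(request)
    return {"text": f"echo: {request['prompt']}"}


cache = ReplayCache(fake_model, mode="record")
first = cache.call({"model": "demo", "prompt": "hello"})

cache.mode = "replay"
# Same request with keys in a different order still hits the recording.
second = cache.call({"prompt": "hello", "model": "demo"})
```

In this sketch the second call returns the recorded response without invoking `fake_model` again, which is what makes a test built on it repeatable and free of API cost.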
Free, self-hosted, runs on your machine or infrastructure: https://github.com/mario-guerra/orchid-trace
Would love feedback from anyone building multi-step agentic systems or struggling with non-deterministic LLM test failures.
How do you deal with replay non-determinism? When I replay a captured call, I spin up a new server instance, but anything stateful, or any case where the model chooses different arguments the second time around, makes it difficult to reproduce the original input and output accurately. I'm interested to see how Orchid manages that in multi-step execution contexts.
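For what it's worth, one common way to surface this problem (not necessarily what Orchid does) is to replay recorded steps strictly in order and fail loudly as soon as a live request no longer matches the recording. The sketch below assumes each step is a plain `(request, response)` pair; `SequenceReplayer`, `ReplayDivergence`, and the tool names are hypothetical.

```python
class ReplayDivergence(Exception):
    """Raised when a replayed run stops matching the recording."""


class SequenceReplayer:
    """Hypothetical ordered replayer for a multi-step agent run."""

    def __init__(self, recorded_steps):
        self.steps = list(recorded_steps)  # ordered (request, response) pairs
        self.position = 0                  # index of the next expected step

    def next_response(self, request):
        if self.position >= len(self.steps):
            raise ReplayDivergence(
                f"step {self.position}: no recorded call left"
            )
        recorded_request, response = self.steps[self.position]
        if request != recorded_request:
            # The agent chose different arguments than in the recorded run.
            raise ReplayDivergence(
                f"step {self.position}: request differs from recording"
            )
        self.position += 1
        return response


recorded = [
    ({"tool": "search", "args": {"q": "weather"}}, {"result": "sunny"}),
    ({"tool": "notify", "args": {"msg": "sunny"}}, {"result": "ok"}),
]

replayer = SequenceReplayer(recorded)
step0 = replayer.next_response({"tool": "search", "args": {"q": "weather"}})

try:
    # Second step uses different arguments than the recording.
    replayer.next_response({"tool": "notify", "args": {"msg": "rainy"}})
    diverged_at = None
except ReplayDivergence as err:
    diverged_at = str(err)
```

This does not make a stateful system deterministic on its own, but it pinpoints the first step where the replay stopped matching the captured run instead of silently returning a mismatched response.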