I'd guess the same has always been true for READMEs / human dev docs. Of course it doesn't transfer directly, but it still feels incredible to be in an age where we can measure such (previously) theoretical things with synthetic programmers.
Interesting that they saw a 100% read rate for AGENTS.md. In my test repo (lower down in the thread), nested AGENTS.md files were occasionally missed by VS Code Copilot. That put me off investing much effort in nesting AGENTS.md files within the repo, and I've been focusing on agent skills instead.
IME, multiple (good) AGENTS.md files are even better. I mostly see them only at the root of a repository, but I spread more into important subdirectories, where they act as a table of contents and SparkNotes for that part of the tree. Putting focused AGENTS.md files in important places has been even more helpful for me than a single root file.
Bonus points if you can force them into context without needing the agent to make a tool call, triggered by the agent touching the files or systems near them (my homegrown agent has this feature); something like the sketch below.
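A minimal sketch of that proximity-based injection, assuming a harness that exposes some hook on file access; the class and hook names here are hypothetical, not any real framework's API. The idea: when the agent touches a file, walk up the directory tree, collect every AGENTS.md along the way, and splice any not-yet-seen ones into context without the agent having to ask.

```python
from pathlib import Path

class AgentsMdInjector:
    """Collects AGENTS.md files near a touched path, deduplicated per session."""

    def __init__(self, repo_root: str):
        self.repo_root = Path(repo_root).resolve()
        self.already_injected: set[Path] = set()

    def docs_for(self, touched_file: str) -> list[str]:
        """Return AGENTS.md contents relevant to a touched file, root-most first."""
        path = Path(touched_file).resolve().parent
        chain = []
        # Walk from the touched file's directory up to the repo root.
        while self.repo_root in path.parents or path == self.repo_root:
            chain.append(path / "AGENTS.md")
            if path == self.repo_root:
                break
            path = path.parent
        contents = []
        for candidate in reversed(chain):  # root-level docs first
            if candidate.exists() and candidate not in self.already_injected:
                self.already_injected.add(candidate)
                contents.append(candidate.read_text())
        return contents

# Hypothetical usage inside a harness's file-access hook:
# for doc in injector.docs_for(path_the_agent_just_read):
#     context.append(system_message(doc))
```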
I suspect the harness (AGENTS.md, skills, and similar scaffolding) should be abstracted away for better overall performance. This article doesn't really go into detail about model preferences, but some other benchmarks show that different models have different preferences for how to use certain tools (probably related to their post-training material), and that should really be managed invisibly to me as the end user.
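One way that abstraction could look, as a hedged sketch: a registry of per-model profiles that the harness consults so the end user never sees the difference. The model names and preference fields here are illustrative assumptions, not measured facts.

```python
from dataclasses import dataclass

@dataclass
class HarnessProfile:
    prefers_parallel_tool_calls: bool
    context_docs_style: str   # e.g. "single-root-agents-md" vs "nested"
    search_tool: str          # e.g. "grep" vs "semantic-index"

# Hypothetical profiles; in practice these would be learned from benchmarks
# rather than hand-written.
PROFILES = {
    "model-a": HarnessProfile(True, "nested", "grep"),
    "model-b": HarnessProfile(False, "single-root-agents-md", "semantic-index"),
}

def configure_harness(model_name: str) -> HarnessProfile:
    # Fall back to a conservative default for unknown models.
    return PROFILES.get(
        model_name,
        HarnessProfile(False, "single-root-agents-md", "grep"),
    )
```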
Also curious how well LLMs can self-reflect in a loop: here's how the previous iteration went, here's what didn't go well, here's feedback from the human; now how do I modify the docs I use so I actually do better next time?
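That loop could look something like this sketch, assuming a generic `llm(prompt)` callable and some run/feedback harness; every name here is hypothetical.

```python
def reflect_and_revise(docs: str, transcript: str, human_feedback: str, llm) -> str:
    """Ask the model to rewrite its own working docs given last run's evidence."""
    prompt = (
        "You maintain the AGENTS.md-style docs you rely on while coding.\n"
        f"Current docs:\n{docs}\n\n"
        f"Transcript of the previous attempt:\n{transcript}\n\n"
        f"Human feedback on that attempt:\n{human_feedback}\n\n"
        "Identify what went wrong, then output a revised version of the docs "
        "that would have prevented those mistakes. Output only the new docs."
    )
    return llm(prompt)

# Hypothetical outer loop: run, collect feedback, revise, repeat.
# for _ in range(n_iterations):
#     transcript = run_task(docs)
#     feedback = get_human_feedback(transcript)
#     docs = reflect_and_revise(docs, transcript, feedback, llm)
```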
I know you can somewhat hill-climb on this via DSPy, but that's hard to generalize.