Testing LLM Agents Like Software – Behaviour Driven Evals of AI Systems (aclanthology.org)
19 points by PranoyP 5 hours ago
mlop99 4 hours ago
Curious whether the behaviour-driven testing could itself be done by another LLM agent (or a group of agents), with one agent testing another. Could that lead to a self-improving loop?
shailendra145 5 hours ago
A powerful move beyond benchmarks: this paper redefines LLM evaluation through realistic, behavior-driven testing.
Very interesting work.
Excellent work
Interesting
Nice Work
Nice work
Great work
interesting