Testing LLM Agents Like Software – Behaviour Driven Evals of AI Systems (aclanthology.org)
8 points by PranoyP 29 minutes ago | 0 comments
312 points by mukti 29 minutes ago | 0 comments
321 points by qdot76367 30 minutes ago | 1 comment
332 points by saikatsg 30 minutes ago | 0 comments
342 points by bikenaga 30 minutes ago | 1 comment
359 points by vaibhavdubey97 32 minutes ago | 0 comments
362 points by 01-_- 33 minutes ago | 0 comments
372 points by scosman 33 minutes ago | 0 comments
381 points by ginda307 33 minutes ago | 0 comments
392 points by speckx 34 minutes ago | 0 comments
402 points by doener 34 minutes ago | 0 comments
411 points by 01-_- 34 minutes ago | 0 comments
424 points by meetpateltech 35 minutes ago | 0 comments
432 points by PublicEnemy111 36 minutes ago | 0 comments
443 points by mfiguiere 37 minutes ago | 0 comments
452 points by thunderbong 38 minutes ago | 0 comments
462 points by tnolet 38 minutes ago | 3 comments
472 points by gnabgib 39 minutes ago | 0 comments
486 points by thtmnisamnstr 39 minutes ago | 0 comments
492 points by crousto 40 minutes ago | 0 comments
501 points by speckx 41 minutes ago | 0 comments
515 points by methuselah_in 41 minutes ago | 0 comments
521 points by djoldman 42 minutes ago | 0 comments
532 points by hnburnsy 43 minutes ago | 0 comments
541 points by mooreds 44 minutes ago | 0 comments
552 points by gmays an hour ago | 0 comments
562 points by Taikonerd an hour ago | 0 comments
578 points by fleahunter an hour ago | 4 comments
583 points by mooreds an hour ago | 2 comments
592 points by gk1 an hour ago | 0 comments