All these new tools are so exciting, but running untrusted code which auto-updates itself is blocking me from trying these tools.
I wish for a vetting tool. Have an LLM examine the code then write a spec of what it reads and writes, & you can examine that before running it. If something in the list is suspect.. you’ll know before you’re hosed not after :)
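For the curious, the read/write "spec" idea could also get a deterministic first pass before (or alongside) the LLM: statically list the file, network, and shell access a script attempts, so the reviewer can eyeball that list before running anything. This is only an illustrative sketch using Python's stdlib `ast` module; the call/module lists are hypothetical and nowhere near complete:

```python
import ast

# Hypothetical first-pass scanner: statically list the suspicious reads/
# writes a Python script attempts, so a reviewer can inspect them before
# running untrusted code. Unlike an LLM-written spec, a static pass can't
# be talked out of reporting a call by a prompt injection in the source.

SUSPECT_CALLS = {"open", "exec", "eval"}                          # file / dynamic-code access
SUSPECT_MODULES = {"socket", "subprocess", "urllib", "requests"}  # network / shell

def scan(source: str) -> dict:
    """Return a rough access 'spec' for the given Python source."""
    tree = ast.parse(source)
    findings = {"calls": set(), "imports": set()}
    for node in ast.walk(tree):
        # Direct calls like open(...) or eval(...)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPECT_CALLS:
                findings["calls"].add(node.func.id)
        # Imports of network/shell modules, including "from x import y"
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            else:
                names = [node.module or ""]
            for name in names:
                root = name.split(".")[0]
                if root in SUSPECT_MODULES:
                    findings["imports"].add(root)
    return findings

if __name__ == "__main__":
    sample = "import socket\nopen('/etc/passwd')\n"
    print(scan(sample))  # flags the 'socket' import and the 'open' call
```

Of course this only sees literal AST nodes; obfuscated access (e.g. `getattr(builtins, 'ev' + 'al')`) slips right past it, which is exactly the gap the LLM review is supposed to cover and the reply below is skeptical it can.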
that's cool and all, until you get malicious code that includes prompt injections, plus code paths that never run but look super legit.
LLMs are NOT THOROUGH. Not even remotely. I don't understand how anyone can use LLMs and not see this instantly. I have yet to see an LLM achieve a failure rate better than around 50% in the real world, with real-world expectations.
Especially with code review, LLMs catch some things, miss a lot of things, and get a lot of things completely and utterly wrong. It takes someone wholly incompetent at code review to look at an LLM review and go "perfect!".
Feels very similar to Aider[1]
1: https://aider.chat/
btw, do you have a JavaScript stack background?
Claude Code with a plan is so much cheaper than any API.