Ask HN: Why aren't AIs being used as app beta testers yet?

14 points by amichail a day ago

For example, why don't beta testing services such as TestFlight have ChatGPT as a possible beta tester along with the human testers?

Nerd_Nest 6 minutes ago

I’m still torn on this. On one hand, memory could make ChatGPT more useful, especially for people using it regularly for work or coding. But on the other hand, the idea that it “remembers” me just feels a little uncomfortable.

I’d want more control over what’s remembered and when. Curious if anyone here has used this yet — is it actually helpful in practice?

shibatanaoto an hour ago

I think it depends on the situation. Unit tests can be written by Claude Code; I use it every day. For E2E testing, browser-based tools are already pretty convenient. AI could definitely help by suggesting UX improvements, but setting up a smooth workflow is still tricky. You’d need to figure out things like where to put the AI’s feedback, when it should kick off testing, and who’s going to sort through all the suggestions it generates. But technically it can be useful, and the quality is good enough.
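
To make the "who sorts through the suggestions" part concrete, here is a minimal sketch of a triage step that runs before a human looks at anything. Everything in it is a hypothetical illustration, not any existing tool's API: the `Suggestion` type, the 1-to-3 severity scale, and the `triage` function are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Suggestion:
    # Hypothetical record for one piece of AI feedback.
    screen: str    # where in the app the issue was seen
    severity: int  # assumed scale: 1 = cosmetic, 3 = blocks the user
    text: str


def triage(suggestions: Iterable[Suggestion], min_severity: int = 2) -> List[Suggestion]:
    """Drop duplicates and low-severity items, then put the worst first."""
    seen = set()
    kept = []
    for s in suggestions:
        # Treat same screen + same text (ignoring case and outer spaces) as a duplicate.
        key = (s.screen, s.text.strip().lower())
        if key in seen or s.severity < min_severity:
            continue
        seen.add(key)
        kept.append(s)
    # sorted() is stable, so items of equal severity keep their original order.
    return sorted(kept, key=lambda s: -s.severity)
```

Something like this would not answer where the feedback lives or when testing kicks off, but it shrinks the pile a person has to read.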

danbrooks a day ago

I worked with a team that did this for the Facebook app.

https://engineering.fb.com/2018/05/02/developer-tools/sapien...

duxup a day ago

I'm going to throw out my own ignorant theory.

AIs that I find useful are still just LLMs, and an LLM's power comes from having a massive amount of text to work with, stringing together word math to come up with something OK. That's a lot of data that comes together to get things ... kinda right ... sometimes.

I don't think there's that data set for "use an app" yet.

We've seen from "AI plays games" efforts that there have been some pretty spectacular failures. It seems like "use app" is a different problem.

cheevly a day ago

LLMs have literally won Pokemon. I'm pretty sure that using an app is 10x simpler.
- Vilian a day ago
  
  It's a lot simpler to run Pokemon than to test an app; the game plays by itself sometimes.

bravesoul2 3 hours ago

That's a good idea. You are on to something.

drakonka a day ago

They are; we're working on agents for web application testing over at qa.tech.

afrederico a day ago

They should totally be able to. If there's "vibe coding" there should be "vibe testing." We're working on just such a product (https://actory.ai); right now it only does websites but just imagine when we turn it on mobile/apps, etc. How cool would that be?

aristofun a day ago

Because for meaningful tests of an app (assuming b2c or b2b for end users) you are supposed to be or imitate a human being.

Current AI is not even designed to do that. It is just a very sophisticated auto-complete.

It is sophisticated enough to fool some VCs into thinking you can chop your round peg into a square hole. But there are no grounds to expect a scalable solution.

gametorch 18 hours ago

Eh, I disagree. Lots of valuable open source code purely written by AI has already been shipped.
- aristofun 12 hours ago
  
  Give me 1 decent example of code "purely" written by AI
  
  gametorch an hour ago
  
  https://github.com/gametorch/image_to_pixel_art_wasm
  Thousands of users. 40+ GitHub stars. Original draft took 30 minutes. Added numerous feature requests and each took like 5 minutes a pop.
  I never wrote a single line of that code.
  Furthermore, my startup, https://gametorch.app/, has 110 sign-ups, paying users, and millions of impressions. I never wrote any of that code either. Typing it out at ~100 wpm is far too slow.

HeyLaughingBoy a day ago

Anecdotally, I know someone who tried to have ChatGPT generate unit tests and it was an abject failure.

cheevly a day ago

I know someone that generated unit tests successfully.
- whoknowsidont a day ago
  
  And I know exactly which one of these is an enterprise B2B app/platform.

haiku2077 19 hours ago

I generate tests with Claude almost every day.

owebmaster 19 hours ago

I have generated unit tests successfully. How did the person you know fail?

gametorch 18 hours ago

I generated tons of valuable code, with a bunch of GitHub stars, paying users, hundreds of signups, and millions of impressions. Just chipping in my anecdote.

v5v3 a day ago

Are LLM testers doing anything traditional scripts with for loops can't?

postalrat a day ago

LLM testers have for loops, so they can do everything traditional scripts with for loops can, plus more.
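
One way to picture that point, as a rough sketch rather than how any real product works: the for loop stays the same, and only the thing choosing the next action changes. The toy `APP` graph and the names `run`, `scripted`, and `explore_first` are all made up for illustration, and `explore_first` is just a stand-in for where a model-driven choice would go; no LLM is called here.

```python
from typing import Callable, List

# Toy "app": screen -> {action: next screen}. Purely illustrative.
APP = {
    "home": {"tap_login": "login", "tap_about": "about"},
    "login": {"submit": "dashboard", "back": "home"},
    "about": {"back": "home"},
    "dashboard": {},
}

# A policy sees the current screen and its available actions, and picks one.
Policy = Callable[[str, List[str]], str]


def run(policy: Policy, start: str = "home", max_steps: int = 10) -> List[str]:
    """The same for loop drives every tester; only the policy differs."""
    screen, visited = start, [start]
    for _ in range(max_steps):
        actions = sorted(APP[screen])
        if not actions:  # dead end: nothing left to tap
            break
        screen = APP[screen][policy(screen, actions)]
        visited.append(screen)
    return visited


def scripted(steps: List[str]) -> Policy:
    """Traditional script: replay a fixed list of actions, ignoring the screen."""
    it = iter(steps)
    return lambda screen, actions: next(it)


def explore_first(screen: str, actions: List[str]) -> str:
    """Stand-in for a model-driven policy: chooses from what is on screen."""
    return actions[0]
```

With this toy graph, `run(scripted(["tap_login", "submit"]))` walks home, login, dashboard, while `run(explore_first, max_steps=4)` bounces between home and about. Whether a real model makes better choices than a script is the open question in this thread; the sketch only shows where that choice plugs in.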