kelseyfrog 14 hours ago

I get it now. Benchmarks, in the end, are prompts for AI researchers.

If you want a problem solved, translate it into an AGI benchmark.

With enough patience, it becomes something AI researchers report on, optimize for, and ultimately saturate. Months later, the solution arrives; all you had to do was wait. AI researchers are an informal, lossy form of distributed computation - they mass-produce solutions and tools that, almost inevitably, solve the messy problem you started with.

mdemare a day ago

More AGI Final Frontiers:

"Reimplement Sid Meier's Alpha Centauri, but with modern graphics, smart AIs that role-play their personalities, all bugs fixed, a much better endgame, AI-generated unexpected events, and a dev console where you can mod the game via natural language instructions."

"Reimplement all linux command line utilities in Rust, make their names, arguments and options consistent, and fork all software and scripts on the internet to use the new versions."

  • SequoiaHope 13 hours ago

    That’s still just code! How about “design a metal 3D printing machine which can be built for $2000 and can make titanium, steel, aluminum, and copper parts with 100 micron precision, then design a simple factory for that machine. Write the manufacturing programs for all of the CNC machines, and work instructions for every step of the process. Order the material and hire qualified individuals to operate the machines. Identify funding opportunities and raise funds.”

    I could go on. One of the challenges here is that many things like this cannot be designed by simply thinking, unless you have extremely superhuman performance, because complex subassemblies have to be built, prototyped, and debugged. And right now there are no good datasets for machine design, PCB design, machine tool programming, hiring, VC fundraising, negotiating building leases, etc.

    We will never have real AGI unless it can learn how to improve without extensive datasets.

  • glimshe 21 hours ago

    Let's say we had a ChatGPT-2000 capable of all of this. What would digital life look like? What would people do with their computers?

    • Lerc 16 hours ago

      Even if we were not past a hard takeoff point where AIs could decide for themselves what to work on, the things that would be created in all areas would be incredible.

      Consider every time you played a game and thought it would be better if it had x, y, or z. Or you wished an application had this one simple new feature.

      All those things would be possible to make. A lot of people will discover why their idea was a bad idea. Some will discover their idea was great; some will erroneously think their bad idea is great.

      We will be inundated with the creation of those good and bad ideas. Some people will have ideas on how to manage that flood of new creations and will build tools to help out. Some of those tools will be good and some will be bad, and there will be a period of churn where finding the good and ignoring the bad is difficult; a badly made curator might make bad ideas linger.

      That's just in the domain of games and applications. If AI could manage that level of complexity, you can ask it to develop and test just about any software idea you have.

      I barely go a day without thinking of something that I could spend months of development time on.

      Some idle thoughts that such a model could develop and test:

      Can you make a transformer that, instead of linear-space V modifiers, used geodesics? Is it better? Would it better support scalable V values?

      Can you train a model to identify which layer is the likely next layer purely based upon the input given to that layer? If it only occasionally gets it wrong, does the model perform better if you give the input to the layer that the predictor thought was next? Can you induce looping/skipping layers this way?

      If you train a model with the layers in a round-robin ordering on every input, do the layers regress to a mean generic layer form, or do they develop into a general information improver that works purely from the context of the input?

      What if you ran every layer in the round robin twice, so that every layer was guaranteed to be followed by every other layer at least once?

      Given that you can quadruple the parameters of a model without changing its behaviour using the Wn + Randomn, Wn - Randomn trick, can you distill a model to 0.25 of its size, then quadruple it back to a model that retains the original size but takes further learning better, broadening parameter use?
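One way to read the Wn + Randomn, Wn - Randomn trick (my reading, not necessarily the parent's exact construction): split a weight matrix W into W+R and W-R with the same random R, then average the two branch outputs. The average equals Wx exactly, so behaviour is unchanged while the parameter count grows. A toy sketch:

```clojure
;; Minimal matrix helpers over plain vectors-of-vectors.
(defn dot     [u v] (reduce + (map * u v)))
(defn mat-vec [m x] (mapv #(dot % x) m))
(defn mat-add [a b] (mapv #(mapv + %1 %2) a b))
(defn mat-sub [a b] (mapv #(mapv - %1 %2) a b))

(def W [[1.0 2.0] [3.0 -1.0]])   ; original weights
(def R [[0.5 -0.25] [2.0 1.0]])  ; "random" perturbation
(def x [4.0 5.0])                ; input

;; Original layer output: W·x
(def y (mat-vec W x))

;; Expanded form: average of (W+R)·x and (W−R)·x — the R terms cancel.
(def y' (mapv #(* 0.5 (+ %1 %2))
              (mat-vec (mat-add W R) x)
              (mat-vec (mat-sub W R) x)))

;; y and y' are identical, yet the expanded model carries 2x the weights.
```

Doing the same split on another axis would account for the "quadruple" in the original claim; the cancellation argument is the same.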

      Can any of these ideas be combined with the ones above?

      Imagine instead of having these idle ideas, you could direct an AI to implement them and report back to you the results.

      Even if 99.99% of the ideas are failures, there could be massive advances from the fraction that remains.

  • raspasov 21 hours ago

    "Reimplement Linux in Rust" would be a good one!

kloud a day ago

My "pelican test" for coding LLMs is now building a proof-of-concept UI (a hello-world app) with Jetpack Compose from Clojure. Since Compose is implemented as Kotlin compiler extensions and does not provide Java APIs, it cannot be used from Clojure via ordinary interop.

I outlined a plan for it: analyze the Compose code, reverse-engineer the bytecode of a Kotlin demo app first, and then either emit equivalent bytecode from Clojure or implement it in Clojure directly based on the analysis. Claude Code with Sonnet 4 was confident it could implement it directly and failed spectacularly.

Then as a follow-up I let it compile the Kotlin demo app and tried to bundle those classes using Clojure tooling, to at least make sure it gets the dependencies right as a first step. It resorted to cheating by shelling out to gradlew from Clojure :) I am going to wait for the next round of SOTA models to burn some tokens again.

delegate a day ago

Doesn't Clojure already support all of those features?

E.g.

> transducer-first design, laziness either eliminated or opt-in

You can write your code using transducers or opt in to laziness in Clojure now. So it's a matter of choice of tools, rather than a feature of the language.
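Both styles are indeed available today; a quick sketch of the difference:

```clojure
;; Default: map with a collection returns a lazy sequence.
(map inc [1 2 3])                 ;; => (2 3 4), realized on demand

;; Transducer arity: map with no collection, run eagerly via into.
(into [] (map inc) [1 2 3])       ;; => [2 3 4]

;; Transducers compose without building intermediate lazy seqs.
(into [] (comp (map inc) (filter even?)) (range 6)) ;; => [2 4 6]
```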

> protocols everywhere as much as practically possible (performance)

Again, it's a choice made by the programmer, the language already allows you to have protocols everywhere. It's also how Clojure is implemented under the hood.
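To make the "protocols everywhere" point concrete: a protocol is an open abstraction that existing types can be extended to after the fact (the protocol and method names below are invented for the example):

```clojure
;; Declare the abstraction.
(defprotocol Countable
  (size [this]))

;; Retrofit it onto types we don't own, without touching them.
(extend-protocol Countable
  clojure.lang.IPersistentVector
  (size [v] (count v))
  String
  (size [s] (.length s)))

(size [1 2 3]) ;; => 3
(size "abc")   ;; => 3
```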

> first-class data structures/types are also CRDT data types, where practical (correctness and performance)

Most of the programs I've worked on did not require CRDTs. I'm inclined to choose a library for this.

> first-class maps, vectors, arrays, sets, counters, and more

Isn't this the case already? If Clojure's native data structures are not enough, there's an ocean of Java options.

Which leads to a very interesting question:

How should the 'real' AGI respond to your request ?

  • raspasov a day ago

    > first-class maps, vectors, arrays, sets, counters, and more

    That's my mistake; this line was intended to be a sub-bullet point of the previous line regarding CRDTs.

    > the language already allows you to have protocols everywhere

    The core data structures, for example, are not based on protocols; they are implemented in pure Java. One reason is that the 1.0 version of the language lacked protocols. All that being said, it remains an open question what the full implications of the protocol-first idea are.

    > You can write your code using transducers or opt in for laziness in Clojure now. So it's a matter of choice of tools, rather than a feature of the language.

    You 100% can. Unfortunately, many people don't. The first thing people learn is (map inc [1 2 3]), which produces a lazy sequence. Clojure would never change this behavior, as the authors value backward compatibility almost above everything else, and rightly so. A transducer-first approach would be a world where (map inc [1 2 3]) produces the vector [2 3 4] by default, for example.

    This was mentioned by Rich Hickey himself in his "A History of Clojure" paper:

    https://clojure.org/about/history https://dl.acm.org/doi/pdf/10.1145/3386321

    (from paper) > "Clojure is an exercise in tool building and nothing more. I do wish I had thought of some things in a different order, especially transducers. I also wish I had thought of protocols sooner, so that more of Clojure’s abstractions could have been built atop them rather than Java interfaces."

Lerc 16 hours ago

The notion of when a language is created is open to interpretation.

It is not stated whether you want such a language described, specified, or implemented.

  • raspasov 8 hours ago

    I think "created" is generally considered to mean implemented :).

    I also discuss performance, so I think implementation is definitely strongly implied.

malux85 a day ago

Perhaps this is a really great AGI test - not in the sense that the AGI can complete the given task correctly, but whether the AGI can interpret incredibly hand-wavy requirements like "do XXX (as much as possible)" and implement them: A, B, C, etc.

upghost 17 hours ago

This is a good one. Forget AGI, I'd settle for an LLM that when doing Clojure doesn't spew hot trash. Balancing parens on tab complete would be a nice start. Or writing sensible ClojureScript that isn't reskinned JavaScript with parens would be pretty stellar.

  • raspasov 8 hours ago

    Haha, the higher-end LLMs are not absolutely terrible. In my experience, LLMs in their current form are better at explaining code than creating it. Not perfect by any stretch in either task.

    Balancing parens is still a challenge.