solarized's comments

Talk is cheap. Show me the code.


This Pelican benchmark has become irrelevant. SVG is already ubiquitous.

We need a new, authentic scenario.


Like identifying names of skateboard tricks from the description? https://skatebench.t3.gg/


I don’t care how practical it may or may not be, this is my new favorite LLM benchmark


I couldn't find an about page or anything similar. Is there one?


Here's the public sample https://github.com/T3-Content/skatebench/blob/main/bench/tes...

I don't think there's a good description anywhere. https://youtube.com/@t3dotgg talks about it from time to time.


o3-pro is better than 5.2 Pro! And GPT-5 high is the best. Really quite interesting.


  1. Take the top ten searches on Google Trends 
     (on the day of a new model release)
  2. Concatenate them
  3. SHA-1 hash the result
  4. Use this as a seed to perform a random noun-verb 
     lookup in an agreed-upon large dictionary. 
  5. Construct a sentence using an agreed-upon, stable 
     algorithm that generates reasonably coherent prompts
     from an immensely deep probability space.
That's the prompt. Every existing model is given that prompt and compared side-by-side.

You can generate a few such sentences for more samples.
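A minimal Python sketch of the recipe, under loud assumptions: the word lists, the newline separator, the sentence template, and the function names (`seed_from_searches`, `pick`, `make_prompt`) are all hypothetical, and step 1 is left manual because I'm not assuming any particular Google Trends API. Instead of a language-specific PRNG, each word choice comes from SHA-1 of the seed plus a label, so any team could reproduce the same prompts in any language. The fixed template is a toy stand-in for the "stable algorithm" in step 5.

```python
import hashlib

# Hypothetical stand-ins for the "agreed-upon large dictionary".
# A real benchmark would publish a much larger, versioned word list.
ADJECTIVES = ["tiny", "neon", "sleepy", "giant", "glass", "mechanical", "furry", "velvet"]
NOUNS = ["pelican", "bicycle", "lighthouse", "violin", "glacier", "robot", "teapot", "comet"]
VERBS = ["juggles", "paints", "repairs", "chases", "balances", "sings to", "builds", "orbits"]


def seed_from_searches(searches: list[str]) -> bytes:
    """Steps 2-3: concatenate the search terms and SHA-1 hash them."""
    # Assumed newline separator keeps ["ab", "c"] and ["a", "bc"] distinct.
    joined = "\n".join(searches)
    return hashlib.sha1(joined.encode("utf-8")).digest()


def pick(seed: bytes, label: str, words: list[str]) -> str:
    """Step 4: deterministically pick one word, keyed on the seed and a label."""
    digest = hashlib.sha1(seed + label.encode("utf-8")).digest()
    return words[int.from_bytes(digest, "big") % len(words)]


def make_prompt(seed: bytes, sample: int = 0) -> str:
    """Step 5 (toy version): fill a fixed sentence template with picked words."""
    adj1 = pick(seed, f"{sample}:adj1", ADJECTIVES)
    noun1 = pick(seed, f"{sample}:noun1", NOUNS)
    verb = pick(seed, f"{sample}:verb", VERBS)
    adj2 = pick(seed, f"{sample}:adj2", ADJECTIVES)
    noun2 = pick(seed, f"{sample}:noun2", NOUNS)
    return f"A {adj1} {noun1} {verb} a {adj2} {noun2}."


if __name__ == "__main__":
    # Step 1 is done by hand: paste the day's top searches (placeholders here).
    top_searches = ["search one", "search two", "search three"]
    seed = seed_from_searches(top_searches)
    for i in range(3):
        print(make_prompt(seed, i))
```

Calling `make_prompt(seed, i)` for several values of `i` gives you a few samples. The same searches always yield the same prompts, but nobody can know them before the trends are published.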

Alternatively, take the top ten F500 stock performers. Any simple signal works as long as it provides enough randomness, is easy to agree upon, and doesn't leave enough time to game.

Teams can still pre-generate candidate problems and try to improve across the board. But they won't have the exact questions on test day.


I feel like it’s pushing engineers up to the management level.

For artisans, everything’s automated now.

You can’t just stay in the kitchen anymore.

You now have to focus on leverage: sales, growth, hiring, affiliating, etc.

The goal is to make more people’s lives better, at a faster pace.


I'm worried that the LLM behemoths will automatically ingest this Reddit-style place for agents too.


Next milestone: solving authoritarian LLM dependencies. We can’t stay trapped in local minima forever. Or is that actually okay?


Pure scam. They don't even give product samples. I assume this post was also upvoted by bot agents. I'm very sad.


Beautiful!

2026 prayer for all you AI junkies: please don’t pollute HN with your dirty AI gaming.

Don’t use bots to post, comment, or upvote/downvote just to maximize karma. Please.

We can’t tell anymore who’s a bot and who’s human. I just want to hang out with real humans here.


All hail web based apps!

We really don't need the Play Store and App Store to run beautiful things like this.


I'm kind of having trust issues with HN comments now. I can barely tell anymore which ones are from bots and which are from humans.


Exactly. Maybe not HN so much, but Reddit is cooked, since clever/snarky comments are what make it fun.


> Which can gradually push users toward more polarized content.

Edit: toward a more polarized society.

