
solarized · 2026-02-15T10:09:32 1771150172

Talk is cheap. Show me the code.

solarized · 2026-02-11T20:22:51 1770841371

This Pelican benchmark has become irrelevant. SVG is already ubiquitous.

We need a new, authentic scenario.

viraptor · 2026-02-11T20:52:41 1770843161

Like identifying names of skateboard tricks from the description? https://skatebench.t3.gg/

alargemoose · 2026-02-11T21:23:55 1770845035

I don’t care how practical it may or may not be, this is my new favorite LLM benchmark

stevage · 2026-02-11T21:38:38 1770845918

I couldn't find an about page or similar?

viraptor · 2026-02-11T22:00:18 1770847218

Here's the public sample https://github.com/T3-Content/skatebench/blob/main/bench/tes...

I don't think there's a good description anywhere. https://youtube.com/@t3dotgg talks about it from time to time.

hmottestad · 2026-02-11T21:25:18 1770845118

o3-pro is better than 5.2 pro! And GPT 5 high is best. Really quite interesting.

echelon · 2026-02-11T21:56:20 1770846980

  1. Take the top ten searches on Google Trends
     (on the day of a new model release)
  2. Concatenate them
  3. SHA-1 hash the result
  4. Use the hash as a seed to perform random noun-verb
     lookups in an agreed-upon, large dictionary
  5. Construct a sentence using an agreed-upon, stable
     algorithm that generates reasonably coherent prompts
     from an immensely deep probability space

That's the prompt. Every existing model is given that prompt and compared side-by-side.

You can generate a few such sentences for more samples.

Alternatively, take the top ten F500 stock performers. Some easy signal that provides enough randomness but is easy to agree upon and doesn't provide enough time to game.

It's also something teams can pre-generate candidate problems for to attempt improvement across the board. But they won't have the exact questions on test day.
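
A minimal sketch of the steps above, in Python, under loud assumptions: the word lists, the sentence template, and the function names `seed_from_trends` and `make_prompts` are hypothetical stand-ins for the "agreed-upon" dictionary and algorithm, and the trending searches are passed in by hand rather than fetched from Google Trends.

```python
import hashlib
import random

# Hypothetical stand-ins: a real run would use an agreed-upon large dictionary.
NOUNS = ["pelican", "bicycle", "lighthouse", "violin", "glacier", "teapot"]
VERBS = ["riding", "painting", "repairing", "juggling", "chasing", "folding"]


def seed_from_trends(trends):
    """Concatenate the search terms and SHA-1 hash them into an integer seed."""
    joined = "".join(trends)
    digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()
    return int(digest, 16)


def make_prompts(trends, count=1):
    """Build `count` prompts deterministically from the day's search terms."""
    rng = random.Random(seed_from_trends(trends))
    prompts = []
    for _ in range(count):
        subject = rng.choice(NOUNS)
        verb = rng.choice(VERBS)
        obj = rng.choice(NOUNS)
        # Assumed template; the real sentence generator would be agreed upon.
        prompts.append(f"A {subject} {verb} a {obj}.")
    return prompts


if __name__ == "__main__":
    # Placeholder terms, not real Google Trends data.
    todays_trends = ["term one", "term two", "term three"]
    for prompt in make_prompts(todays_trends, count=3):
        print(prompt)
```

Anyone with the same ten search terms and the same dictionary gets the same prompts, which is the point. One caveat with plain concatenation: `["a", "b"]` and `["ab"]` hash to the same seed, so an agreed-upon version would probably want a separator between terms.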

solarized · 2026-02-11T05:35:46 1770788146

I feel like it’s pushing engineers up to the management level.

For an artisan, everything’s automated now.

You can’t just stay in the kitchen anymore.

You now have to focus on leverage: sales, growth, hiring, affiliating, etc.

To make more people’s lives better (at a faster pace).

solarized · 2026-01-31T08:49:51 1769849391

I'm worried. That LLM behemoth will automatically ingest this Reddit-for-agents place too.

solarized · 2026-01-28T08:06:31 1769587591

Next milestone: solving authoritarian LLM dependencies. We can’t always get trapped in local minima. Or is that actually okay?

solarized · 2026-01-11T07:38:58 1768117138

Pure scam. They don't even give product samples. I assume this post also got upvoted by bot agents. I'm very sad.

solarized · 2025-12-30T23:12:30 1767136350

Beautiful!

2026 prayer: for all you AI junkies, please don’t pollute HN with your dirty AI gaming.

Don’t bot posts, comments, or upvote/downvote just to maximize karma. Please.

We can’t identify anymore who’s a bot and who’s human. I just want to hang out with real humans here.

solarized · 2025-12-11T13:18:43 1765459123

All hail web based apps!

We really don't need the Play Store and App Store to run beautiful things like this.

solarized · 2025-12-10T02:39:08 1765334348

I'm kind of having trust issues with HN comments now. I can barely detect anymore which ones are bots and which are humans.

dwd · 2025-12-10T07:03:03 1765350183

Exactly. Maybe not HN so much, but Reddit is cooked, as clever/snarky comments are what make it fun.

solarized · 2025-11-30T04:08:13 1764475693

> Which can gradually push users toward more polarized content.

edit: more polarized society.