Hi! I put together a simple metric to track progress toward AR glasses that are good enough for everyday use, replacing your screens, wearing them all day, the whole thing. Curious what others think about this :)
This question may be formulated a bit harshly, but it's a valuable one. There are quite a few similar applications (I think I've tried 2 or 3 of them), and some have been around for a couple of years.
What is new / unique in your approach?
Hi! I created an algorithm to detect unused screen real estate and made a video browser that auto-positions itself there. Uses seed growth to find the biggest unused rectangular region every 0.1s. Repositions automatically when you rearrange windows. Would be fun to hear what you think :)
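For the curious, here's a minimal sketch of the seed-growth idea (my reconstruction, not the app's actual code). It assumes the desktop has been rasterized into a boolean occupancy grid where True marks cells covered by a window; seeds are sampled from free cells and grown greedily one strip at a time:

```python
import numpy as np

def grow_rect(occ: np.ndarray, r: int, c: int):
    """Greedily grow a rectangle from a free seed cell (r, c),
    expanding one row/column at a time in each direction while the
    newly added strip is entirely free. Returns inclusive bounds."""
    top = bottom = r
    left = right = c
    grew = True
    while grew:
        grew = False
        if top > 0 and not occ[top - 1, left:right + 1].any():
            top -= 1
            grew = True
        if bottom < occ.shape[0] - 1 and not occ[bottom + 1, left:right + 1].any():
            bottom += 1
            grew = True
        if left > 0 and not occ[top:bottom + 1, left - 1].any():
            left -= 1
            grew = True
        if right < occ.shape[1] - 1 and not occ[top:bottom + 1, right + 1].any():
            right += 1
            grew = True
    return top, left, bottom, right

def largest_free_rect(occ: np.ndarray, n_seeds: int = 64, rng=None):
    """Sample free cells as seeds, grow each, keep the biggest rectangle."""
    rng = rng or np.random.default_rng()
    free = np.argwhere(~occ)          # coordinates of all unoccupied cells
    if len(free) == 0:
        return None
    idx = rng.choice(len(free), size=min(n_seeds, len(free)), replace=False)
    best, best_area = None, 0
    for r, c in free[idx]:
        t, l, b, rt = grow_rect(occ, r, c)
        area = (b - t + 1) * (rt - l + 1)
        if area > best_area:
            best, best_area = (t, l, b, rt), area
    return best  # (top, left, bottom, right) in grid cells
```

Greedy growth from a seed isn't guaranteed to find the globally largest empty rectangle, but sampling a few dozen seeds is cheap enough to rerun every 0.1s and tends to land on a near-maximal region.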
I created a calculator that scores your life on objective factors across 12 categories (career, health, relationships, sleep, exercise, mental health, finances, etc.).
The premise is that if you score high in most areas, you have few barriers to wanting to exist. More importantly, maintaining these behaviors demonstrates functional capacity: you can't score high if you're genuinely not functioning.
Each question is rated 0-10, where 5 is average. A z-score transformation converts ratings to population percentiles, so a 7/10 becomes the 84th percentile. The final score is the arithmetic mean of those percentiles. All data stays local.
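For concreteness, a minimal sketch of that mapping (my reconstruction, not the calculator's actual code). The assumed population spread isn't stated above; a standard deviation of 2.0 is what makes a 7/10 land at the 84th percentile, since z = (7 - 5) / 2 = 1 and Phi(1) ≈ 0.8413:

```python
from statistics import NormalDist

def rating_to_percentile(rating: float, mean: float = 5.0, sd: float = 2.0) -> float:
    """Convert a 0-10 rating to a population percentile via a z-score.
    sd=2.0 is an assumption inferred from the 7/10 -> 84th percentile example."""
    z = (rating - mean) / sd
    return 100 * NormalDist().cdf(z)

def life_score(ratings: list[float]) -> float:
    """Final score: arithmetic mean of the per-question percentiles."""
    return sum(map(rating_to_percentile, ratings)) / len(ratings)

print(rating_to_percentile(7))   # ~84.13
print(life_score([7, 5, 3, 8]))  # mean of the four percentiles
```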
Would love feedback on whether anything is missing. I am also curious if you agree on the premise. :)
//To fully grasp poetry, one must first be fluent with its technical aspects: its meter, rhyme, and figures of speech. After mastering the technical side, you should ask two questions:
First, how artfully has the objective of the poem been rendered? This rates the poem's perfection. Second, how important is that objective? This rates the objective's importance. I propose a formula to rate a poem's greatness by plotting its perfection on a horizontal axis and its importance on a vertical axis, with the resulting area indicating its greatness. --Dr. J. Evans Pritchard//
I've always found the Wilhelm Scream a bit intriguing, but there's no good centralized website for browsing and cataloging all its appearances. So I built one!
Anyone can easily add or edit entries. The goal is to document every Wilhelm Scream with timestamps, YouTube clips, and details for all movies and TV series.
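Purely as illustration, an entry might look something like this (a hypothetical schema; the field names and the timestamp are mine, not the site's):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WilhelmEntry:
    """One documented occurrence of the Wilhelm Scream."""
    title: str                         # movie or TV series title
    year: int
    timestamp: str                     # where in the runtime the scream occurs
    youtube_url: Optional[str] = None  # clip of the moment, if one exists
    notes: str = ""                    # scene details

entry = WilhelmEntry(
    title="Star Wars: Episode IV - A New Hope",
    year=1977,
    timestamp="00:40:00",  # illustrative, not a verified timestamp
    notes="Stormtrooper falls into the Death Star chasm.",
)
```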
Isn't it ironic that it ended up being harder to get a computer to explicitly not create a photorealistic image of an elephant than to have it create one?
Author here! While working on h-matched.com (tracking time between benchmark release and AI achieving human-level performance), I just added the first negative datapoint - LongBench v2 was solved 22 days before its public release.
This wasn't entirely unexpected given the trend, but it raises fascinating questions about what happens next. The trend line approaching y=0 has been discussed before, but now we're in uncharted territory.
Mathematically, we can make some interesting observations about where this could go:
1. It won't flatten at zero (we've already crossed that)
2. It's unlikely to accelerate downward indefinitely (that would imply increasingly trivial benchmarks)
3. It cannot cross y=-x (that would mean benchmarks being solved before they're even conceived)
My hypothesis is that we'll see convergence toward y=-x as an asymptote. I'll be honest - I'm not entirely sure what a world operating at that boundary would even look like. Maybe others here have insights into what existence at that mathematical boundary would mean in practical terms?
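One way to make that boundary concrete (my formalization, not anything from the site): measure all dates from an arbitrary reference t_0, let r be a benchmark's release date and s its solved date, so the chart plots x = r and y = s - r.

```latex
y = -x
\;\Longleftrightarrow\; s - r = -r
\;\Longleftrightarrow\; s = 0 \quad (\text{i.e. solved exactly at the reference date } t_0)
```

Read this way, the asymptote is the line on which every benchmark gets solved at one and the same fixed date no matter when it is released, i.e. solve dates decouple from release dates entirely. The trend can approach that line, but a world sitting exactly on it seems as strange as the crossed-before-conception case in point 3.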
Since model capabilities do not change after release, shouldn't the model's release date count as the solved date? (In this case, o1-preview was released on September 12, 2024.)
You could flip it around like that. In this case I've chosen the "Released Date" to be the date the benchmark was published and the "Solved Date" to be the date an AI system first reached human-level performance on that specific benchmark.
Author here. I think our approach to AI benchmarks might be too human-centric. We keep creating harder and harder problems that humans can solve (like expert-level math in FrontierMath), using human intelligence as the gold standard.
But maybe we need simpler examples that demonstrate fundamentally different ways of processing information. The dice prediction isn't important - what matters is finding clean examples where all information is visible, but humans are cognitively limited in processing it, regardless of time or expertise.
It's about moving beyond human performance as our primary reference point for measuring AI capabilities.