Balloons work by displacing the atmosphere (mostly nitrogen with some oxygen) with something lighter (helium or hydrogen). This causes buoyancy, and makes the balloon rise.
This only works so long as the atmosphere being displaced weighs more than the balloon plus the payload. As soon as the air gets thin enough that the weight of the balloon+payload is equal to the weight of the air that would fill the volume of the balloon, then it stops rising. (Or, more likely the balloon rips open because it expanded farther than it could stretch).
Usually, this is really high in the atmosphere, but it's definitely not space.
This is all ignoring that orbit requires going sideways really, really fast (so fast, actually, that it requires falling, but going sideways so fast that the earth curves away and you miss).
It is not that easy to build such app from scratch ... it all requires a lot of work, even with AI help. I think the most important is to provide easy to use UI first, and if speed or some missing features will be blockers for further innovation step then maybe native app will be at some point created.
The post actually has great benchmark tables inside of it. They might be outdated in a few months, but for now, it gives you a great summary. Seems like Gemini wins on image and video perf, Claude is the best at coding, ChatGPT is the best for general knowledge.
But ultimately, you need to try them yourself on the tasks you care about and just see. My personal experience is that right now, Gemini Pro performs the best at everything I throw at it. I think it's superior to Claude and all of the OSS models by a small margin, even for things like coding.
I like Gemini Pro's UI over Claude so much but honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.
> honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.
Me too!
> I like Gemini Pro's UI over Claude so much
This I don't understand. I mean, I don't see a lot of difference in both UIs. Quite the opposite, apart from some animations, round corners and color gradings, they seem to look very alike, no?
Y'know I ended up buying Kimi's moderato plan which is 19$ but they had this unique idea where you can talk to a bot and they could reduce the price
I made it reduce the price of first month to 1.49$ (It could go to 0.99$ and my frugal mind wanted it haha but I just couldn't have it do that lol)
Anyways, afterwards for privacy purposes/( I am a minor so don't have a card), ended up going to g2a to get a 10$ Visa gift card essentially and used it. (I had to pay a 1$ extra but sure)
Installed kimi code on my mac and trying it out. Honestly, I am kind of liking it.
My internal benchmark is creating pomodoro apps in golang web... Gemini 3 pro has nailed it, I just tried the kimi version and it does have some bugs but it feels like it added more features.
Gonna have to try it out for a month.
I mean I just wish it was this cheap for the whole year :< (As I could then move from, say using the completely free models)
There are many lists, but I find all of them outdated or containing wrong information or missing the actual benchmarks I'm looking for.
I was thinking, that maybe it's better to make my own benchmarks with the questions/things I'm interested in, and whenever a new model comes out run those tests with that model using open-router.
Whatever human that is in charge of the chat bots is your coworker. That person that is responsible for the output of the bots is the one that you would trust but verify with.
I'm on a multi-year quest to answer that question!
The best I've found is running Python code inside Pyodide in WASM in Node.js or Deno accessed from Python via a subprocess, which is a wildly convoluted way to go but does appear to work! https://til.simonwillison.net/deno/pyodide-sandbox
Here's a related recent experimental library which does something similar but with JavaScript rather than Python as the unsafe language, again via Deno in a subprocess: https://github.com/simonw/denobox
In that case you'll need to look at general purpose sandboxes you can run Python in - stuff like Firecracker or Bubblewrap on Linux or sandbox-exec on macOS.
reply