sunpazed's comments | Hacker News

This is really impressive. At these speeds, it’s possible to run agents with multi-tool turns within seconds. Consider it a feature-rich, “non-deterministic API” for your platform or business.
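
Roughly what such a loop looks like against an OpenAI-compatible endpoint (the endpoint, model name, and get_weather tool below are placeholders, just to sketch the shape):

    # Hypothetical multi-tool agent loop against an OpenAI-compatible API.
    # The endpoint, model name, and get_weather tool are illustrative only.
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

    tools = [{"type": "function", "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]}}}]

    def get_weather(city):
        return json.dumps({"city": city, "temp_c": 21})  # stubbed tool

    messages = [{"role": "user", "content": "Weather in Melbourne?"}]
    for _ in range(8):  # cap the number of turns
        reply = client.chat.completions.create(
            model="some-fast-model", messages=messages, tools=tools
        ).choices[0].message
        messages.append(reply)
        if not reply.tool_calls:  # model answered in plain text; done
            print(reply.content)
            break
        for call in reply.tool_calls:  # one turn may hold several tool calls
            args = json.loads(call.function.arguments)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": get_weather(**args)})

At high tokens/sec, several of these round trips complete in a couple of seconds, which is what makes the “API” framing workable.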


Thanks so much for delivering on this model. It’s great as a draft model for speculative decoding. Keep up the great work!!
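
For anyone unfamiliar with speculative decoding: the small draft model cheaply proposes a few tokens and the large target model verifies them, so the output is identical to the target alone but faster when the draft guesses well. A toy greedy sketch (the next_token model interface here is hypothetical, not any particular library’s API):

    # Toy greedy speculative decoding: the draft proposes k tokens,
    # the target verifies them. next_token(tokens) is a hypothetical
    # "greedy next token given this prefix" interface.
    def speculative_decode(target, draft, prompt_tokens, k=4, max_new=64):
        tokens = list(prompt_tokens)
        goal = len(tokens) + max_new
        while len(tokens) < goal:
            base = len(tokens)
            proposal = list(tokens)
            for _ in range(k):                        # cheap draft pass
                proposal.append(draft.next_token(proposal))
            for i in range(k):                        # expensive verification
                expected = target.next_token(proposal[:base + i])
                tokens.append(expected)               # target's token is final
                if expected != proposal[base + i]:
                    break                             # draft diverged; re-draft
        return tokens

The speedup comes from a real implementation doing the k verification steps as a single batched forward pass instead of k sequential ones.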


Don’t have enough RAM for this model, but the smaller 20B model runs nice and fast on my MacBook and is reasonably good for my use cases. Pity that function calling is still broken with llama.cpp.



I'm glad to see this was a bug of some sort and (hopefully) not a full RAM limitation. I've used quite a few of these models on my MacBook Air with 16GB of RAM. I also have a plan to build an AI chat bot and host it from my bedroom on a $149 mini PC. I'll probably go much smaller than the 20B models for that. The Qwen3 4B model looks quite good.

https://joeldare.com/my_plan_to_build_an_ai_chat_bot_in_my_b...


what are your use cases? wondering if it's good enough for coding / agentic stuff


In my testing, tokens per second are about half the GPU’s speed, but power usage is roughly 10x lower: 2 watts on the ANE vs 20 watts on the GPU on my M4 Pro.


The key benefit is significantly lower power usage. I benchmarked llama3.2-1B on my machines: M1 Max (47 t/s, ~1.8 watts), M4 Pro (62 t/s, ~2.8 watts). The GPU is twice as fast (even faster on the Max), but draws much more power (~20 watts) than the ANE.

Also, the ANE models are limited to 512 tokens of context, so they’re unlikely to be usable in production yet.
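
For anyone who wants to reproduce numbers like these, the tokens/sec side is just a timed generation (generate below is a stand-in for whatever runtime you’re testing, not a specific API):

    # Generic tokens/sec timing; `generate` stands in for whatever runtime
    # you are benchmarking (Core ML on the ANE, llama.cpp on the GPU, ...).
    import time

    def tokens_per_sec(generate, prompt, n_tokens=256, warmup=1):
        for _ in range(warmup):
            generate(prompt, max_tokens=8)     # warm caches / lazy compilation
        start = time.perf_counter()
        generate(prompt, max_tokens=n_tokens)
        return n_tokens / (time.perf_counter() - start)

Watching power draw at the same time (e.g. with macOS’s powermetrics) gives you the tokens-per-watt comparison.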


We can run 2,000 or 4,000 tokens of context with the ANE.


Love this! The C64 introduced me to the world of computers as a kid. I still have that almost 40-year-old machine in my collection, but I’m wary of failure every time I turn it on. This is somewhat better than the MiSTer as I can use physical peripherals with it. Great work!


The most common failure points in these old boxes are the capacitors and the power supply. Swap out all the caps and replace the original power supply with a modern remake, and the 64 could last you another 40 years. :)


Let’s remind ourselves that MCP was announced to the world in November 2024, only 4 short months ago. The RFC is actively being worked on and evolving.


It's April 2025


Yes, and it's been about 4 and a half months since Nov 25, 2024.


While I’m a fan, we’re not using MCP for any production workloads for these very reasons.

Authentication, session management, etc., should be handled outside of the standard, and outside of the LLM flow entirely.

I recently mused on these here: https://github.com/sunpazed/agent-mcp/blob/master/mcp-what-i...
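
As a sketch of the separation I mean: terminate auth at a dumb reverse proxy in front of the MCP endpoint, so neither the protocol nor the model ever touches credentials (the header name, key store, and ports below are made up for illustration):

    # Toy reverse proxy: auth is checked here, outside MCP and the LLM flow.
    # Header name, key store, upstream URL, and port are illustrative only.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import urllib.request

    UPSTREAM = "http://localhost:8080"   # the actual MCP server
    API_KEYS = {"secret-key-1"}          # normally a real credential store

    class AuthProxy(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.headers.get("X-Api-Key") not in API_KEYS:
                self.send_error(401, "unauthorized")  # rejected before MCP sees it
                return
            body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
            req = urllib.request.Request(UPSTREAM + self.path, data=body,
                                         headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req) as resp:
                data = resp.read()
            self.send_response(resp.status)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(data)

    HTTPServer(("localhost", 9000), AuthProxy).serve_forever()

A real deployment would also stream SSE responses and forward more headers, but the point stands: the MCP server itself never handles credentials.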


Funny to see you here. I’m from operata.io (also a Melbourne-based startup) and would land on your website (operatr.io) whenever I misspelled mine!


I had the same frustration and wanted to see "under the hood", so I coded up this little agent tool to play with MCP (SSE and stdio): https://github.com/sunpazed/agent-mcp

It really is just JSON-RPC 2.0 under the hood, either piped to stdio or POSTed over HTTP.
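
To make that concrete, a stdio session is just newline-delimited JSON-RPC 2.0 piped to the server process. A minimal sketch (the server command is a placeholder; method and field names follow the 2024-11-05 revision of the MCP spec, so check the current RFC):

    # Minimal JSON-RPC 2.0 exchange with an MCP server over stdio.
    # The server command is a placeholder; messages are newline-delimited JSON.
    import json, subprocess

    proc = subprocess.Popen(["python", "my_mcp_server.py"],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True)

    def send(msg):
        proc.stdin.write(json.dumps(msg) + "\n")
        proc.stdin.flush()

    def rpc(method, params, id):
        send({"jsonrpc": "2.0", "id": id, "method": method, "params": params})
        return json.loads(proc.stdout.readline())

    print(rpc("initialize", {"protocolVersion": "2024-11-05",
                             "capabilities": {},
                             "clientInfo": {"name": "toy", "version": "0.1"}}, 1))
    send({"jsonrpc": "2.0", "method": "notifications/initialized"})  # required ack
    print(rpc("tools/list", {}, 2))

Swap the subprocess pipes for HTTP POSTs and you have the other transport; the message shapes are the same.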

