This is really impressive. At these speeds, it’s possible to run agents with multi-tool turns within seconds. Consider it a feature-rich, “non-deterministic API” for your platform or business.
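For anyone wondering what a multi-tool turn loop looks like, here’s a minimal sketch against an OpenAI-compatible endpoint (e.g. a local llama.cpp or Ollama server). The URL, model name, and get_weather tool are placeholders for illustration, not anything specific to this release:

    # Minimal multi-tool agent loop against an OpenAI-compatible endpoint.
    # URL, model name, and the get_weather stub are assumptions.
    import json
    import requests

    URL = "http://localhost:8080/v1/chat/completions"  # hypothetical local server

    def get_weather(city: str) -> str:
        return f"Sunny and 22C in {city}"  # stub tool for the demo

    TOOLS = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What's the weather in Oslo?"}]
    for _ in range(5):  # cap the number of turns
        resp = requests.post(URL, json={
            "model": "local-model", "messages": messages, "tools": TOOLS,
        }).json()
        msg = resp["choices"][0]["message"]
        messages.append(msg)
        if not msg.get("tool_calls"):
            print(msg["content"])  # final answer, no more tools requested
            break
        for call in msg["tool_calls"]:  # execute each requested tool
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": get_weather(**args),
            })

At 40+ tokens/s, each of those round trips resolves in a second or two, which is what makes multi-turn tool use feel interactive.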
Don’t have enough RAM for this model, but the smaller 20B model runs nice and fast on my MacBook and is reasonably good for my use cases. Pity that function calling is still broken with llama.cpp.
I'm glad to see this was a bug of some sort and (hopefully) not a hard RAM limitation. I've used quite a few of these models on my MacBook Air with 16GB of RAM. I also have a plan to build an AI chatbot and host it from my bedroom on a $149 mini-PC. I'll probably go much smaller than the 20B models for that. The Qwen3 4B model looks quite good.
The key benefit is significantly lower power usage. I benchmarked llama3.2-1B on my machines: M1 Max (47 t/s, ~1.8 W) and M4 Pro (62 t/s, ~2.8 W). The GPU is twice as fast (even faster on the Max) but draws much more power (~20 W) than the ANE.
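The efficiency gap is clearer as energy per token (watts divided by tokens/s). The GPU throughputs below just assume the “twice as fast” figure above:

    # Back-of-envelope energy per token from the numbers above.
    # GPU throughput is an assumption (2x the measured ANE figures).
    ane = {"M1 Max": (47, 1.8), "M4 Pro": (62, 2.8)}    # (tokens/s, watts)
    gpu = {"M1 Max": (94, 20.0), "M4 Pro": (124, 20.0)}

    for name in ane:
        for label, (tps, watts) in (("ANE", ane[name]), ("GPU", gpu[name])):
            print(f"{name} {label}: {watts / tps * 1000:.1f} mJ/token")

That works out to roughly 38–45 mJ/token on the ANE versus roughly 160–215 mJ/token on the GPU, so the ANE is about 4–6x more energy-efficient per token despite the lower throughput.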
Also, the ANE models are limited to 512 tokens of context, so they’re unlikely to be usable in production yet.
Love this! The C64 introduced me to the world of computers as a kid. I still have that almost 40-year-old machine in my collection, but I’m wary of failure every time I turn it on. This is somewhat better than the MiSTer, as I can use physical peripherals with it. Great work!
The most common failure points in these old boxes are the capacitors and the power supply. Swap out all the caps, replace the original power supply with a modern remake, and the 64 could last you another 40 years. :)
Let’s remind ourselves that MCP was announced to the world in November 2024, only 4 short months ago. The RFC is actively being worked on and evolving.
I had the same frustration and wanted to see "under the hood", so I coded up this little agent tool to play with MCP (SSE and stdio): https://github.com/sunpazed/agent-mcp
It really is just JSON-RPC 2.0 under the hood, either piped over stdio or POSTed over HTTP.
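You can see the framing by hand-rolling the stdio transport with subprocess instead of an SDK. A minimal sketch; the server command is a placeholder, and the protocol version string is my assumption from the current spec revision:

    # Hand-rolled MCP handshake over stdio: newline-delimited JSON-RPC 2.0.
    # "my_mcp_server.py" is a hypothetical stdio MCP server; swap in your own.
    import json
    import subprocess

    proc = subprocess.Popen(
        ["python", "my_mcp_server.py"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )

    def send(msg: dict) -> None:
        proc.stdin.write(json.dumps(msg) + "\n")  # one JSON object per line
        proc.stdin.flush()

    def recv() -> dict:
        return json.loads(proc.stdout.readline())

    # 1. initialize request/response
    send({"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {
        "protocolVersion": "2024-11-05",  # assumed spec revision
        "capabilities": {},
        "clientInfo": {"name": "toy-client", "version": "0.1"},
    }})
    print(recv())

    # 2. initialized notification (no id, so no response expected)
    send({"jsonrpc": "2.0", "method": "notifications/initialized"})

    # 3. list the server's tools
    send({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
    print(recv())

The SSE transport is the same JSON-RPC payloads, just POSTed to the server with responses streamed back as server-sent events.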