The brand-new Qwen3-Coder-Next runs at 300Tok/s PP and 40Tok/s TG on an M1 64GB with a 4-bit MLX quant. Together with Qwen Code (a fork of Gemini CLI) it is actually pretty capable.
Before that I used Qwen3-30B, which is good enough for some quick JavaScript or Python, like 'add a new endpoint /api/foobar which does foobaz'. It is also very decent for a quick summary of code.
It does 530Tok/s PP and 50Tok/s TG. If you have it spit out lots of code that is just a copy of the input, it reaches about 200Tok/s, e.g. 'add a new endpoint /api/foobar which does foobaz and return the whole file'.
Gawker was a well-known website with 23 million visits per month, and a Wikipedia page. This guy has 44k subscribers and no Wikipedia page. It's a stretch to go from "Thiel had a vendetta against Gawker" to "Thiel had a vendetta against this guy".
There are now quite a few cases in Europe where the EU or local governments have been de-banking individuals. No court, no judge needed. It's a much more efficient way to shut down critics. We ain't need no people who delegitimize those in power.
Can we have a link? In France, at most you can get your account restricted (you can't go into deficit and a sum is blocked) until the issue is resolved (99% of the time because of unpaid taxes; sometimes the money is blocked by a judge until a judgment is passed). It would be weird if the EU didn't have a standard.
Yes, exactly. French citizens have an inalienable right to a checking account, enforced by the French government. I don't remember the exact law, but I know someone who was 'interdit bancaire' (banned from banking after taking out too much revolving credit in the 90s) and the local bank _had_ to let him open a new account (a very limited one).
Look at what happened to Jacques Baud because he criticized the EU over the Ukraine war: he is now considered "pro-Russia" and accused of propagating "disinformation". [1]
I read what he wrote ("L'art de la guerre russe, comment l'Occident a conduit l'Ukraine à l'échec", roughly "The Russian art of war: how the West led Ukraine to failure"); you can download it temporarily here: [2].
His book is absolutely not pro-Russia, nor is it pro-EU.
The only debanking cases I'm aware of are the US putting pressure on judges of the International Criminal Court and on the UN Special Rapporteur for Palestine.
He's Chinese, and if you looked into his comment history you'd know this is not someone who uses LLMs for karma farming. Looking at his blog, he has a long history of posting about database topics going back to before there was GPT.
Should I ever participate in a Chinese-speaking forum, I'd certainly use an LLM for translation as well.
Unfortunately Qwen3-Next is not well supported on Apple silicon; it seems the Qwen team doesn't really care about Apple.
On an M1 64GB, Q4_K_M on llama.cpp gives only 20Tok/s, while MLX is more than twice as fast. However, MLX has problems with KV-cache consistency and especially with branching. So while in theory it is twice as fast as llama.cpp, it often does the PP all over again, which completely trashes performance, especially with agentic coding.
So the agony is deciding whether to endure half the possible speed but get much better KV caching in return, or to have twice the speed but then often have to sit through prompt processing again.
But who knows, maybe Qwen gives them a hand? (hint,hint)
KV caching means that when you have a 10k-token prompt, all follow-up questions return immediately; this is standard with all inference engines.
Now if you are not happy with the last answer, maybe you want to simply regenerate it or change your last question; this is branching of the conversation. llama.cpp is capable of re-using the KV cache up to that point while MLX is not (I am using the MLX server from the MLX community project). I haven't tried with LM Studio. Maybe worth a try, thanks for the heads-up.
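To make the branching point concrete, here is a minimal Python sketch of prefix-based cache reuse (my own illustration, not llama.cpp's or MLX's actual implementation): only the tokens after the longest common prefix with the cached sequence need prompt processing again.

```python
# Minimal sketch of KV-cache prefix reuse (illustrative only; not the actual
# llama.cpp or MLX code). Idea: only the tokens after the longest common
# prefix with the cached sequence need prompt processing again.

def common_prefix_len(cached: list[int], new: list[int]) -> int:
    """Length of the shared token prefix between the cached and new prompt."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

def tokens_to_prefill(cached: list[int], new: list[int]) -> list[int]:
    """Tokens that still need prompt processing when the cache is reused."""
    keep = common_prefix_len(cached, new)
    return new[keep:]

# Example: a 10k-token conversation prefix, then a changed follow-up question.
cached_tokens = list(range(10_000)) + [1, 2, 3]      # previous conversation
new_tokens    = list(range(10_000)) + [7, 8, 9, 10]  # branched follow-up

print(len(tokens_to_prefill(cached_tokens, new_tokens)))  # 4, not 10_004
```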
Any notes on the problems with MLX caching? I've experimented with local models on my MacBook and there's usually a good speedup from MLX, but I wasn't aware there's an issue with prompt caching. Is it from MLX itself or from LM Studio/mlx-lm/etc.?
It is the buffer implementation.
[u1 10kTok]->[a1]->[u2]->[a2]. If you branch between the assistant answer a1 and the user prompt u2, then MLX reprocesses the u1 prompt of, let's say, 10k tokens, while llama.cpp does not.
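A back-of-the-envelope check using the speeds quoted earlier in the thread (my own assumption: PP throughput stays roughly constant over the prompt) shows why this hurts:

```python
# Rough cost of re-running prompt processing on the branched prefix,
# using the PP speed quoted above for Qwen3-Coder-Next on M1/MLX.
prefix_tokens = 10_000
pp_tok_per_s = 300   # Tok/s PP, from the numbers earlier in the thread

print(f"{prefix_tokens / pp_tok_per_s:.0f} s of extra prefill per branch")  # ~33 s
# With prefix cache reuse (llama.cpp), this re-prefill is skipped entirely.
```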
I just tested the GGUF and MLX versions of Qwen3-Coder-Next with llama.cpp and now with LM Studio. As I branch very often, this is highly annoying for me, to the point of being unusable. Qwen3-30B is then much more usable on the Mac, but by far not as powerful.
> People writing and maintaining software need to optimize for simplicity, readability, maintainability. Whether they use an LLM to achieve that is secondary. The humans in the loop must understand what's going on.
Linux is nowadays mostly sponsored by big corporations. They have different goals and different ways of doing things. For probably the first 10 years Linux was driven by enthusiasts, and therefore it was a lean system. Something like systemd is typical corporate output. Due to its complexity it would have died long before finding adoption, but with enterprise money this is possible. Try developing for the combo Linux Bluetooth/Audio/D-Bus: the complexity drives you crazy, because all this stuff was made for (and financed by) the corporate needs of the automotive industry. Simplicity is never a goal in these big companies.
But then Linux wouldn't be where it is without the business side paying for the developers. There is no such thing as a free lunch...
> AIs have endless grit (or at least as endless as your budget).
That is the only thing he doesn't address: the money it costs to run the AI. If you let the agents loose, they easily burn north of 100M tokens per hour. At $25 per 1M tokens, that gets expensive quickly. At some point, when we are all drug^W AI dependent, the VCs will start to cash in on their investments.
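Plugging those numbers in (both figures are the ones claimed above, not measured):

```python
# Quick sanity check on the cost claim: 100M tokens/hour at $25 per 1M tokens.
tokens_per_hour = 100_000_000     # "north of 100M tokens per hour"
price_per_million = 25            # $25 per 1M tokens

cost_per_hour = tokens_per_hour / 1_000_000 * price_per_million
print(f"${cost_per_hour:,.0f} per hour")            # $2,500 per hour
print(f"${cost_per_hour * 8:,.0f} per 8h workday")  # $20,000 per workday
```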