
If their whole business is based around being an established standard, and making users happy is not a relevant goal, then why do anything at all? They already are an established standard, so why would they bother taking any further actions whatsoever, making any changes or rolling out any new products? Clearly they are trying to achieve something, right? So what is it?

It is about making specific high-value users happy. If the rest of us are unhappy, we don't matter. They know that for most people Ubuntu or whatever isn't a realistic option, and so they can take whatever money they can get from those people. Sure, a few people like me will run *BSD or Linux, but we are a footnote not worth their time.

The only danger is every once in a while one of those little footnotes becomes large enough to be a problem and you lose the market of those who do matter as well. While there are many obvious examples of where that happened, there are also a lot of cases where it didn't.


It used to be empowering everyone to achieve more.

Wow, these preassembled ESP32 plus touchscreen boards are extremely cheap, and there are tons of them in all kinds of different form factors on Amazon. I didn't realize this kind of thing was so plentiful; it seems like a great way to bootstrap many kinds of electronics/IoT projects.

Yeah, ESP32 is an awesome rabbit hole. An ESP32-C6, a cheap yellow display, and a 3D printer, and you can build some really interesting things.

Any commercial products using ESP?

Just look for ESP32 CYD - CYD stands for cheap yellow display. There are a lot of variants.

https://github.com/witnessmenow/ESP32-Cheap-Yellow-Display?t... . I bought mine for about $12 and it's been quite fun tinkering with it.


https://templates.blakadder.com/esp32.html

Here’s a list of just a few. They’re insanely popular not only because they’re just good to use, but also because they’re one of the cheaper FCC-approved modules you can buy, which takes a lot of the pain out of bringing a product to market.


A lot of Shelly devices use ESP chips: https://www.shelly.com/ - And they are hackable!

You can find the FCC notices in manuals if you search on Espressif’s module grantee code, which is 2AC7Z. Espressif is extremely widely used.

Loads. You can usually tell if something is limited to 2.4 GHz only.

Cabled up an EVSE the other day, and the brains of it was an ESP32 chip.


If you pick a smart device that has WiFi connectivity, then there's about a 50% chance that it has an ESP inside.

Yes, many. As a random example, see: https://www.servethehome.com/ubiquiti-flex-mini-2-5g-review-...

The last image on the page shows various chips in the switch, the top left is an ESP32.


Interesting, seems like they're just using it as an MCU? The specs don't mention anything wireless, and I don't see an antenna.

My "smart" resistive water heater uses an ESP for Wifi connectivity, so it can heat the water when the electricity prices are low for example.

(my older) LIFX bulbs have an Espressif MAC address, and I think LIFX has stated they're ESP32-based in the past

I think there are plenty using espressif chips. One of my robot vacuums (possibly the Neato?) certainly appeared to be.

AFAIK my humidifier uses an ESP32 chip.

IIRC Apple has attempted to implement some defences against this, for example by requiring the passcode to be inputted before an update can be installed to prevent another San Bernardino scenario. A cursory search indicates that they also have some kind of transparency log system for updates, but it seems to only apply to their cloud systems and not iOS updates.

The table has two categorizations: "In transit & on server" and "End-to-end". The former, which covers iCloud backups in the default configuration, is explicitly NOT end-to-end, meaning there are moments in time during processing where the data is not encrypted.

However, iCloud backups actually are listed as "End-to-end" if you turn on the new Advanced Data Protection feature.


What's the joke here? If they are better than average drivers, that's a huge win that improves road safety for everyone.

Why would you need to create a local account? You can simply choose not to store the keys in your Microsoft account during BitLocker setup: https://www.diskpart.com/screenshot/en/others/windows-11/win...

Admittedly, the risks of choosing this option are not clearly laid out, but the way you are framing it also isn't accurate


All "Global Reader" accounts have "microsoft.directory/bitlockerKeys/key/read" permission.

Whether you opt in or not, if you connect your account to Microsoft, then they do have the ability to fetch the BitLocker key if the account is not local-only. [0] Global Reader is built into everything +365.

[0] https://github.com/MicrosoftDocs/entra-docs/commit/2364d8da9...


They're Microsoft and it's Windows. They always have the ability to fetch the key.

The question is: do they ever fetch and transmit it if you opt out?

The expected answer would be no. Has anyone shown otherwise? Because hypotheticals that they could are not useful.


> Because hypotheticals that they could are not useful.

Why? They are useful to me and I appreciate the hypotheticals because it highlights the gaps between "they can access my data and I trust them to do the right thing" and "they literally can't access my data so trust doesn't matter."


Considering all the shenanigans Microsoft has been up to with Windows 11 and various privacy, advertising, etc. stuff?

Hell, all the times they keep enabling OneDrive despite it being really clear I don’t want it, and then uploading stuff to the cloud that I don’t want?

I have zero trust for Microsoft now, and I didn't have much more for them in the past either.


This 100% happens; they’ve done it to at least one of my clients in pretty explicit violation of HIPAA (they are a very small health insurance broker), even though OneDrive had never been engaged with, and indeed we had previously uninstalled OneDrive entirely.

One day they came in and found an icon on their desktop labeled “Where are my files?” that explained they had all been moved into OneDrive following an update. This prompted my clients to go into full meltdown mode, as they knew exactly what this meant. We ultimately got a BAA from Microsoft just because we don’t trust them not to violate federal laws again.


What do Entra role permissions have to do with Microsoft's ability to turn over data in its possession to law enforcement in response to a court order?

This is for Active Directory. If your machine is joined to a domain, the keys will be stored in AD.

This does not apply to standalone devices. MS doesn't have a magic way to reach into your laptop and pluck the keys.


> MS doesn't have a magic way to reach into your laptop and pluck the keys.

Of course they do! They can just create a Windows Update that does it. They have full administrative access to every single PC running Windows in this way.


People really pay too little attention to this attack avenue.

It's both extremely convenient and very unlikely to be detected, especially given that most current systems are associated with an account.

I'd be surprised if it's not widely used by law enforcement, when it's not possible to hack a device in more obvious ways.

Please check theupdateframework.io if you have a say in an update system.


I actually misremembered what theupdateframework.io is; I thought it provided more protections...

Isn't it the same with many Linux distros?

Updates run as root?


It's largely the same for all automatic updating systems that don't protect against personalized updates.

I don't know the status of the updating systems of the various distributions; if some use server-delivered scripts run as root, that's potentially a further powerful attack avenue.

But I was assuming that the update process itself is safe; the problem is that you usually don't have guarantees that the updates you get are genuine.

So if you update a component run as root, yes, the update could include malicious code that can do anything.

But even an update to a very constrained application could be very damaging: for example, if it is for an E2EE messaging application, it could modify it to send each encryption key to a law enforcement agency.


> the problem is that you usually don't have guarantees that the updates you get are genuine

A point of order: you do have that guarantee for most Linux distro packages. All 70,000 of them in Debian's case. And all Linux distros distribute their packages anonymously, so they can never target just one individual.

That's primarily because they aren't trying to make money out of you. Making money requires a billing relationship, and tracking which of your customers owns what. Off the back of that, governments can demand that particular users are targeted with "special" updates. Australia in particular demands commercial providers do that with its "Assistance and Access Bill (2018)", and I'm sure most governments in the OECD have equivalents.


> so they can never target just one individual

You assume the binary can't just include a machine check that activates only on the target's computer.


Yes, they can do that. But they can't select who gets the binary, so everybody gets it. Debian does reproducible builds on trusted machines so they would have to infect the source.

You can safely assume the source will be viewed by a lot of people over time, so the change will be discovered. The source is managed mostly by git, so there would be history about who introduced the change.

The reality is open source is so far ahead of proprietary code on transparency that there is almost no contest at this point. If a government wants to compromise proprietary code, it's easy, cheap, and undetectable. Try the same with open source: it's still cheap, but the social engineering ain't easy, and it will be detected - it's just a question of how long it takes.


Not really, but it's quite complex for Linux because there are so many ways one can manage the configuration of a Linux environment. For something high security, I'd recommend something like Gentoo or NixOS because they have several huge advantages:

- They make it easy to set up and maintain immutable, reproducible builds.

- You only install the software you need, and even within each software item, you only build/install the specific features you need. For example, if you are building a server that will sit in a datacentre, you don't need to build software with Bluetooth support, and by extension, you won't need to install Bluetooth utilities and libraries.

- Both have a monolithic Git repository for packages, which is advantageous because you gain the benefit of a giant distributed Merkle tree for verifying you have the same packages everyone else has. As observed with xz-utils, you want a supply chain attacker to be forced to infect as many people as possible so more people are likely to detect it.

- Sandboxing is used to minimise the lines of code during build/install which need to have any sort of privileges. Most packages are built and configured as "nobody" in an isolated sandbox, then a privileged process outside of the sandbox peeks inside to copy out whatever the package ended up installing. Obviously the outside process also performs checks such as preventing cool-new-free-game from overwriting /usr/bin/sudo.

- The time between a patch hitting an upstream repository and that patch being part of a package installed in these distributions is short. This is important at the moment because there are many efforts underway to replace and rewrite old insecure software with modern secure equivalents, so you want to be using software with a modern design, not just 5-year-old long-term-support software. E.g. glycin is a relatively new library used by GNOME applications for loading untrusted images. You don't want to be waiting 3 years for a new long-term-support release of your distribution for this software.

No matter which distribution you use, you'll get some common benefits such as:

- Ability to deploy user applications using something like Flatpak, which ensures they run within a sandbox.

- Ability to deploy system applications using something like systemd, which ensures they run within a sandbox.

Microsoft have long underinvested in Windows (particularly the kernel), and have made numerous poor and failed attempts to introduce secure application packaging/sandboxing over the years. Windows is now akin to the horse and buggy when compared to the flying cars of open source Linux, iOS, Android and HarmonyOS (v5+ in particular which uses the HongMeng kernel that is even EAL6+, ASIL D and SIL 3 rated).


Sadly, Linux still has many small issues for desktop day-to-day usage. I encounter different small bugs almost every day, something I don't see on Windows that often. These bugs or inconvenient UI are tolerable for me, but not for everybody. Today the bugs were Firefox not starting on the first click of the shortcut, and a mysterious case where keystrokes were not registering in the Firefox omnibar until a Firefox restart.

Furthermore, it seems like it's specific to Azure AD, and I'm guessing it probably only has an effect if you enable the option to back up the keys to AD in the first place, which is not mandatory.

I'd be curious to see a conclusive piece of documentation about this, though


Regular AD also has this feature: you can store the encryption keys in the domain controller. I don't think it's turned on by default, but you can do that with a group policy update.

That's for Entra/AD, aka a workplace domain. Personal accounts are completely separate from this. (Microsoft don't have an AD relationship with your account; if anything, personal MS accounts reside in their own empty Entra forest.)

They could also just push an update that changes it anyway and grabs the key.

If you really don't trust Microsoft at all then don't use Windows.


I don't agree that this is end to end encrypted. For example, a compromise of the TEE would mean your data is exposed. In a truly end to end encrypted system, I wouldn't expect a server side compromise to be able to expose my data.

This is similar to the weaselly language Google is now using with the Magic Cue feature ever since Android 16 QPR 1. When it launched, it was local only -- now it's local and in the cloud "with attestation". I don't like this trend and I don't think I'll be using such products.


I agree it is more like e2teee, but I think there is really no alternative beyond TEE + anonymization. Privacy people want it locally, but that is 5 to 10 years away (or never: if the current economics works, there is no need to reverse the trend).


There's FHE, but that's probably an even more difficult technical challenge than doing everything locally


FHE is impossible. You cannot expect to compete at 100x the cost for the same service you provide (and there is no accelerated-hardware design, like Tensor Cores, for FHE).


Only 100x the cost? Really? Can you cite a reference how you get it that cheap?


No need for the sarcasm. I am extremely generous about what FHE can achieve. Of course it is not 100x right now.


FHE would be ideal. Relevant conversation from 6 months ago:

https://news.ycombinator.com/item?id=44601023


> ... 5 to 10 years away (or never, if the current economics works...

I think PCs that can run SoTA multi-modal LLMs (cf. the Mac Pro) will, in 5 to 10 years, cost as much as cars do, and I reckon folks will buy them.


ISTM that most people would rather give away their privacy than pay even a single cent for most things.


if (big if) you trust the execution environment, which is apparently auditable, and if (big if) you trust that the TEE merkle hash used to sign the response is computed based on the TEE as claimed (and not a malicious actor spoofing a TEE that lives within an evil environment), and also if you trust the inference engine (vLLM / SGLang, what have you), then I guess you can be confident the system is private.

Lots of ifs there, though. I do trust Moxie in terms of execution; he doesn't seem like the type of person to take half measures.


> if (big if) you trust the execution environment, which is apparently auditable

This is the key question.

What makes it so strange is such an execution environment would have clear applications outside of AI usage.


"Server-side" is a bit of a misnomer here.

Sure, for e.g. E2E email, the expectation is that all the computation occurs on the client, and the server is a dumb store of opaque encrypted stuff.

In a traditional E2E chat app, on the other hand, you've still got a backend service acting as a dumb pipe, that shouldn't have the keys to decrypt traffic flowing through it; but you've also got multiple clients — not just your own that share your keybag, but the clients of other users you're communicating with. "E2E" in the context of a chat app, means "messages are encrypted within your client; messages can then only be decrypted within the destination client(s) [i.e. the client(s) of the user(s) in the message thread with you.]"

"E2E AI chat" would be E2E chat, with an LLM. The LLM is the other user in the chat thread with you; and this other user has its own distinct set of devices that it must interact through (because those devices are within the security boundary of its inference infrastructure.) So messages must decrypt on the LLM's side for it to read and reply to, just as they must decrypt on another human user's side for them to read and reply to. The LLM isn't the backend here; the chat servers acting as a "pipe" are the backend, while the LLM is on the same level of the network diagram as the user is.

Let's consider the trivial version of an "E2E AI chat" design, where you physically control and possess the inference infrastructure. The LLM infra is e.g. your home workstation with some beefy GPUs in it. In this version, you can just run Signal on the same workstation, and connect it to the locally-running inference model as an MCP server. Then all your other devices gain the ability to "E2E AI chat" with the agent that resides in your workstation.

The design question, being addressed by Moxie here, is what happens in the non-trivial case, when you aren't in physical possession of any inference infrastructure.

Which is obviously the applicable case to solve for most people, 100% of the time, since most people don't own and won't ever own fancy GPU workstations.

But, perhaps more interesting for us tech-heads that do consider buying such hardware, and would like to solve problems by designing architectures that make use of it... the same design question still pertains, at least somewhat, even when you do "own" the infra; just as long as you aren't in 100% continuous physical possession of it.

You would still want attestation (and whatever else is required here) even for an agent installed on your home workstation, so long as you're planning to ever communicate with it through your little chat gateway when you're not at home. (Which, I mean... why else would you bother with setting up an "E2E AI chat" in the first place, if not to be able to do that?)

Consider: your local flavor of state spooks could wait for you to leave your house; slip in and install a rootkit that directly reads from the inference backend's memory; and then disappear into the night before you get home. And, no matter how highly you presume your abilities to detect that your home has been intruded into / your computer has been modified / etc once you have physical access to those things again... you'd still want to be able to detect a compromise of your machine even before you get home, so that you'll know to avoid speaking to your agent (and thereby the nearby wiretap van) until then.


Agree. Products and services in the privacy space have a tendency to be incredibly misleading in their phrasing, framing, and overall marketing, with assertions that sound pretty much like: "we totally can never ever see your messages, completely and utterly impossible". Proton is particularly bad for this; it's rather unfortunate to see this from "Moxie" as well.

It's like, come on you know exactly what you're doing, it's unambiguous how people will interpret this, so just stop it. Cue everyone arguing over the minutiae while hardly anyone points out how troubling it is that these people/entities have no concerns with being so misleading/dishonest...


I asked the model about its capabilities, and it turns out it indeed can do Web searches; if it's not hallucinating, the backend server indeed decrypts the output of the LLM; only the user prompt is E2EEed against the server

Edit: I'm a little wary to find there is a convenient import but no export functionality. I manually copied the conversation into a markdown file <https://gist.github.com/Gravifer/1051580562150ce7751146be0c9...>


Just like your mobile device is one end of the end-to-end encryption, the TEE is the other end. If properly implemented, the TEE would measure all software and ensure that there are no side channels that the sensitive data could be read from.


By that logic SSL/TLS is also end-to-end encryption, except it isn't


When the server is the final recipient of a message sent over TLS, then yes, that is end-to-end encryption (for instance if a load balancer is not decrypting traffic in the middle). If the message's final recipient is a third party, then you are correct, an additional layer of encryption would be necessary. The TEE is the execution environment that needs access to the decrypted data to process the AI operations, therefore it is one end of the end-to-end encryption.


This interpretation basically waters down the meaning of end-to-end encryption to the point of uselessness. You may as well just say "encryption".


E2EE is usually applied in contexts where the message's final recipient is NOT the server on the other end of a TLS connection, so yes, this scenario is a stretch. The point is that in the context of an AI chat app, you have to decide on the boundary that you draw around the server components that are processing the request and necessarily need access to decrypted data, and call that one "end" of the connection.


No need to make up hypotheticals. The server isn't the final destination for your LLM requests. The reply needs to come back to you.


If Bob and Alice are in an E2EE chat Bob and Alice are the ends. Even if Bob asks Alice a question and she replies back to Bob, Alice is still an end.

Similarly with AI. The AI is one of the ends of the conversation.


So ChatGPT is end-to-end encrypted?


No, because there is a web server that exposes an API that accepts a plaintext prompt and returns plaintext responses (even if this API is exposed via TLS). Since this web server is not the same server as the backend systems that are processing the prompt, it is a middle entity, rather than an end in the system.

The difference here is that the web server receiving a request for Confer receives an encrypted blob that only gets decrypted when running in memory in the TEE where the data will be used, which IS an end in the system.


Is your point that TLS is typically decrypted by a web server rather than directly by the app the web server forwards traffic to?


Yes. I include Cloudflare as part of the infrastructure of the ChatGPT service.


See my other comment, but the answer here is resoundingly "No". For the communication to be end-to-end encrypted, the payload needs to be encrypted through all steps of the delivery process until it reaches the final entity it is meant for. Infrastructure like Cloudflare generally is configured to be able to read the full contents of the web request (TLS interception or load balancing), and therefore the message lives for a time unencrypted in the memory of a system that is not the intended recipient.


Go read a book on basic cryptography. Please.


I have read through Handbook of Applied Cryptography.


Another fun application of combining LLMs with arithmetic coding is steganography. Here's a project I worked on a while back which effectively uses the opposite technique of what's being done here, to construct a steganographic transformation: https://github.com/shawnz/textcoder
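To make the idea concrete, here is a toy sketch (not the linked project's actual code; toy_probs is a made-up stand-in for a real LLM's next-token distribution): the secret bits are read as a binary fraction, and that fraction is then "decoded" against successive token distributions the way an arithmetic decoder would be, so the hidden bits end up steering which plausible tokens get emitted.

    def bits_to_fraction(bits):
        # interpret the secret bits as a binary fraction in [0, 1)
        return sum(bit * 2.0 ** -(i + 1) for i, bit in enumerate(bits))

    def toy_probs(prefix):
        # stand-in for an LLM's next-token distribution given the text so far
        return {"the ": 0.5, "cat ": 0.3, "sat ": 0.2}

    def embed(bits, next_token_probs, length):
        # run an arithmetic *decoder* driven by the bits: at each step, pick
        # the token whose probability interval contains the fraction, then
        # zoom the working interval into that token's sub-interval
        x = bits_to_fraction(bits)
        lo, hi = 0.0, 1.0
        out = []
        for _ in range(length):
            cum = 0.0
            for tok, p in next_token_probs(out).items():
                t_lo = lo + (hi - lo) * cum
                t_hi = lo + (hi - lo) * (cum + p)
                if t_lo <= x < t_hi:
                    out.append(tok)
                    lo, hi = t_lo, t_hi
                    break
                cum += p
        return "".join(out)

    print(embed([1, 0, 1, 1], toy_probs, 6))

The receiver, running the same model, re-derives the same intervals from the cover text and recovers the bits; a real implementation also has to deal with numeric precision and the tokenization round-trip issue mentioned in the replies below.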


Cool! It creates very plausible encodings.

> The Llama tokenizer used in this project sometimes permits multiple possible tokenizations for a given string.

Not having tokens be a prefix code is thoroughly unfortunate. Do the Llama team consider it a bug? I don't see how to rectify the situation without a full retrain, sadly.


I can't imagine they consider it a bug; it is a common and beneficial property of essentially every LLM today. You want to be able to represent common words with single tokens for efficiency, but at the same time you still need to be able to represent prefixes of those words in the cases where they occur separately.


I find this surprising, but I suppose it must be more efficient overall.

Presumably parsing text into tokens is done in some deterministic way. If it is done by greedily taking the longest-matching prefix that is a token, then when generating text it should be possible to "enrich" tokens that are prefixes of other tokens with additional constraints to force a unique parse: E.g., if "e" is a token but "en" is too, then after generating "e" you must never generate a token that begins with "n". A text generated this way can be deterministically parsed by the greedy parser.

Alternatively, it would suffice to restrict to a subset of tokens that are a prefix code. This would be simpler, but with lower coding efficiency.
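A rough sketch of that first idea, using a toy vocabulary rather than Llama's real one: after emitting a token t, disallow any next token u that could let a greedy longest-match parser merge across the t/u boundary. The check below is deliberately conservative.

    def allowed_next(vocab, t, u):
        # reject u if some longer vocab token starting with t could be matched
        # by a greedy parser across the boundary between t and u
        for v in vocab:
            if len(v) > len(t) and v.startswith(t):
                tail = v[len(t):]  # what would have to follow t for v to win
                if u.startswith(tail) or tail.startswith(u):
                    return False
        return True

    vocab = {"e", "n", "d", "en", "nd", "end"}
    print(allowed_next(vocab, "e", "nd"))  # False: greedy re-parses "end" as one token
    print(allowed_next(vocab, "e", "d"))   # True: "ed" greedily parses back to "e", "d"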


Regarding the first part: that's an interesting idea, although I worry it would bias the outputs in an unrealistic way. Then again, maybe it would only impact scenarios that would have otherwise been unparsable anyway?

Regarding the second part: you'd effectively just be limiting yourself to single character tokens in that case which would drastically impact the LLM's output quality


The first approach would only affect outputs that would have been otherwise unparseable.

The second approach works with any subset of tokens that form a prefix code -- you effectively set the probability of all tokens outside this subset to zero (and rescale the remaining probabilities if necessary). In practice you would want to choose a large subset, which means you almost certainly want to avoid choosing any single-character tokens, since they can't coexist with tokens beginning with that character. (Choosing a largest-possible such subset sounds like an interesting subproblem to me.)
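A minimal sketch of that second approach, with made-up tokens and probabilities: greedily pick a prefix-free subset of the vocabulary (preferring longer tokens), then zero out everything outside it and rescale before sampling.

    def prefix_free_subset(tokens):
        # keep tokens that neither prefix nor are prefixed by an already-kept token
        kept = []
        for t in sorted(tokens, key=len, reverse=True):  # prefer longer tokens
            if all(not t.startswith(k) and not k.startswith(t) for k in kept):
                kept.append(t)
        return set(kept)

    def restrict(probs, subset):
        # zero out tokens outside the subset and renormalise the rest
        kept = {t: p for t, p in probs.items() if t in subset}
        total = sum(kept.values())
        return {t: p / total for t, p in kept.items()}

    vocab = {"a", "a ", " a", "ab", "th", "the"}
    subset = prefix_free_subset(vocab)  # here this drops "a" and "th"
    print(restrict({"the": 0.4, "a ": 0.3, "a": 0.2, "ab": 0.1}, subset))

Picking the subset that maximizes retained probability mass, rather than this greedy longest-first heuristic, is exactly the interesting subproblem mentioned above.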


I don't think I see the vision here. If you want to maximize the number of tokens representable as a prefix code while still being able to output any sequence of characters, how could you possibly pick anything other than the one-character-long tokens?

Are you saying you'd intentionally make some output sequences impossible on the basis they're not likely enough to be worth violating the prefix code for? Surely there's enough common short words like "a", "the", etc that that would be impractical?

And even excluding the cases that are trivially impossible due to having short words as a prefix, surely even the longer words share prefixes commonly enough that you'd never get tokens longer than, say, two characters in the best case? Like, so many words start with "st" or "wh" or "re" or whatever, how could you possibly have a prefix code that captures all of them, or even the most common ones, without it being uselessly short?


> Surely there's enough common short words like "a", "the", etc that that would be impractical?

Tokens don't have to correspond to words. The 2-character tokens "a " and " a" will cover all practical uses of the lowercase word "a". Yes, this does make some strings unrepresentable, such as the single-character string "a", but provided you have tokens "ab", "ba", "ac", "ca", etc., all other strings can be represented. In practice you won't have all such tokens, but this doesn't materially worsen the output provided the substrings that you cannot represent are all low-probability.


Ah yeah, factoring in the whitespace might make this a bit more practical


I think it's plausible that different languages would prefer different tokenizations. For example, in Spanish the plural of carro is carros, while in Italian it's carri. Maybe the LLM would prefer carr+o in Italian and a single token in Spanish.


Certainly! What surprised me was that apparently LLMs are deliberately designed to permit multiple ways of encoding the same string as tokens. I just assumed this would lead to inefficiency, since it would cause training to not know whether it should favour outputting, say, se|same or ses|ame after "open", and thus throw some weight on each. But provided there's a deterministic rule, like "always choose the longest matching token", this uncertainty goes away.


LLMs are probabilistic black boxes, trying to inject determinism in their natural language processing (as opposed to e.g. forcing a grammar for the output) may very well screw them over completely.


LLMs are ultimately just matrix multiplication and some other maths, nothing about them is inherently nondeterministic. When nondeterminism is present, it's because it was deliberately sprinkled on top (because it tends to produce better results).


Yes determinism is not the best word. What I mean is that if you force the LLM to output "carr+o" even when it prefers "carro", this could result in worse quality output.


I don't think it is intending to frame the move as clueless, but rather short-sighted. It could very well be a good move for them in the short term.


One huge benefit of Tahoe for me is that you can now hide any menubar icon, even if they don't explicitly support hiding. It's a small thing but that alone makes the upgrade worth it for me

