
Presumably you are licensing your code as MIT or a similar license.

Not all code is licensed that way. Some open-source code has strings attached, but AI launders the code and renders those strings moot.


If you want to attach strings which involve restricting access, open source is not the way to go.

You're right - the reality of the world today is that open-sourced code is slurped up by AI companies, all questions of legality/ethics aside. But this was not the reality of the world that existed when the code was licensed and released. That is why it is easy to empathize with code authors who did not expect their code to be used in this manner.

Nah I neither agree nor empathize. Anyone with a reasonable understanding of how the internet works knows that putting something on it means that thing can be used in a myriad of ways, many of them unanticipated. That's something one implicitly signs up for when posting content of their own free will. If the gift isn't to be wholly given, don't give it at all; put it behind a wall so it's clear that even though it's "available", it isn't a gift.

By far the most popular strings involve restricting access. That is, viral licenses which require derived works to also be open source.

No one cares. Copyright in general is done, and we are all stronger now. Don't fight AI, fight for open models.

Great! So I assume it is now Completely Fine to rip Netflix / Hulu / Disney+ / whatever and share it with everyone I know?

Copyright isn't "done", copyright has just been restricted to the rich and powerful. AI has essentially made it legal to steal from anyone who isn't rich enough to sue you - which in the case of the main AI companies means everyone except a handful of giants.


TIL I'm "rich and powerful." It doesn't feel any different, I've got to say.

The thing is, copyright is not done. The legal framework still exists and is enforced so I am not sure how to read your reply as anything other than a strongly worded opinion. Just ask Disney.

I use AI every day in my dev workflows, yet I am still easily able to empathize with those who did not intend for their code to be laundered through AI to remove their attribution (or whatever other caveats applied in their licensing.)


> Just ask Disney.

Disney saw which way the wind was blowing and invested over a billion into OpenAI.


If they saw the wind they wouldn't have chosen OpenAI

The thing is, nobody in China gives a rat's patoot about copyright. If we do, they win.

A compromise might have been possible, based on treaties engineered by the people who brought us the TPP, but nobody in the current US government is capable of negotiating anything like that or inclined to try. And it wouldn't exactly leave the rest of us better off if they did.

As a result, copyright is a zero-sum game from a US perspective, which matters because that's where the majority of leading research happens on the majority of available compute. Every inch of ground gained by Big IP comes at America's expense.

So they must lose, decisively and soon. Yes, the GPL will be lost as collateral damage. I'm OK with that. You will be, too.


I know tech normally breaks the rules/laws and has been able to just force through its desired outcome (to the detriment of society), but I don't think they are going to be able to just ignore copyright. If anything, those who depend on copyright have seen how ruthlessly and in what poor faith tech has treated previous industries, and basically anyone else, once it has the leverage.

Tech is becoming universally hated, whereas before it was adored and treated optimistically and preferentially.


there are no open models. none. zero.

there are binary files that some companies are allowing you to download, for now. it was called shareware in the old days.

one day the tap will close and we'll see then what "open models" really means


From a political perspective there's no closing that tap, only opening it further. As long as China exists there will be constant pressure to try to stay ahead, or at least match Chinese models. And China is gleefully increasing that pressure over time, just waiting for the slip that causes a serious migration to their models.

Not true; e.g. https://allenai.org/open-models .

For my own purposes, open weights are 95% as good, to be honest. I understand that not everyone will agree with that. As long as training takes hundreds of millions of dollars' worth of somebody else's compute, we're always going to be at the big companies' mercy to some extent.

At some point they will start to restrict access, as you suggest, and that's the point where the righteous indignation displayed by the neo-Luddites will be necessary and helpful. What I advocate is simply to save up enough outrage for that battle. Don't waste your passion defending legacy copyright interests.


> and that's the point where the righteous indignation displayed by the neo-Luddites will be necessary and helpful

At that point it will be far, far, faaaaar too late.

> Don't waste your passion defending legacy copyright interests

The companies training big models are actively respecting copyright from anyone big enough to actually fight back, and soaking everyone else.

They are actively furthering the entrenchment of Big IP Law.


> They are actively furthering the entrenchment of Big IP Law.

China: lol


> people who open source their code do not care about profit

Not only are there businesses built around open-source work, but it used to be widely-accepted that publishing open-source software was a good way to land a paying gig as a junior.

I think that whether you need to continue working to afford to live is very relevant to discussions about AI.

Profits don't need to be direct - and licenses are chosen based on a user's particular open-source goals. AI does not respect code's original licensing.


I'd love to know how you fit smaller models into your workflow. I have an M4 Macbook Pro w/ 128GB RAM and while I have toyed with some models via ollama, I haven't really found a nice workflow for them yet.

It really depends on the tasks you have to perform. I am using specialized OCR models running locally to extract page layout information and text from scanned legal documents. The quality isn't perfect, but it is really good compared to desktop/server OCR software that I formerly used that cost hundreds or thousands of dollars for a license. If you have similar needs and the time to try just one model, start with GLM-OCR.

If you want a general knowledge model for answering questions or a coding agent, nothing you can run on your MacBook will come close to the frontier models. It's going to be frustrating if you try to use local models that way. But there are a lot of useful applications for local-sized models when it comes to interpreting and transforming unstructured data.


> I formerly used that cost hundreds or thousands of dollars for a license

Azure Doc Intelligence charges $1.50 for 1000 pages. Was that an annual/recurring license?

Would you mind sharing your OCR model? I'm using Azure for now, as I want to focus on building the functionality first, but would later opt for a local model.


I took a long break from document processing after working on it heavily 20 years ago. The tools I used before were ABBYY FineReader and PrimeOCR. I haven't tried any of the commercial cloud based solutions. I'm currently using GLM-OCR, Chandra OCR, and Apple's LiveText in conjunction with each other (plus custom code for glue functionality and downstream processing).

Try just GLM-OCR if you want to get started quickly. It has good layout recognition quality, good text recognition quality, and they actually tested it on Apple Silicon laptops. It works easily out-of-the-box without the yak shaving I encountered with some other models. Chandra is even more accurate on text but its layout bounding boxes are worse and it runs very slowly unless you can set up batched inference with vLLM on CUDA. (I tried to get batching to run with vllm-mlx so it could work entirely on macOS, but a day spent shaving the yak with Claude Opus's help went nowhere.)

If you just want to transcribe documents, you can also try end-to-end models like olmOCR 2. I need pipeline models that expose inner details of document layout because I need to segment and restructure page contents for further processing. The end-to-end models just "magically" turn page scans into complete Markdown or HTML documents, which is more convenient for some uses but not mine.


These are some really great explicit examples and links, much appreciated.

How does GLM-OCR compare to Qwen 3 VL? I've had good experiences with Qwen for these purposes.

Qwen 3 and 3.5 models are quite capable. Perhaps the greatest benefit of GLM-OCR is speed: it's only a 0.9 billion parameter model, so it's fast enough to run on large volumes of complicated scans even if all you have for inference is an entry level MacBook or a low end Nvidia card. Even CPU based inference on basic laptops is probably tolerable with it for small page volumes.

Not OP but I had an XML file with inconsistent formatting for album releases. I wanted to extract YouTube links from it, but the formatting was different from album to album. Nothing you could regex or filter manually. I shoved it all into a DB, looked up the album, then gave the xml to a local LLM and said "give me the song/YouTube pairs from this DB entry". Worked like a charm.
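A minimal sketch of that pattern (the prompt wording and helper names are my own; the reply here is canned, since in practice it would come back from a local server such as Ollama):

```python
import json

# Instead of regexing inconsistent XML, hand the raw entry to a local
# LLM and ask for structured output, then parse the structured reply.

def build_prompt(xml_entry):
    return (
        "Extract every song title and its YouTube link from this album "
        "entry. Reply with a JSON list of [title, url] pairs only.\n\n"
        + xml_entry
    )

def parse_pairs(reply):
    # The model is asked for JSON, so parsing is one json.loads away.
    return [tuple(pair) for pair in json.loads(reply)]

# What a model reply might look like for one messy entry:
canned_reply = '[["Intro", "https://youtu.be/abc123"]]'
pairs = parse_pairs(canned_reply)
```

The trick is pushing all the "formatting is different from album to album" mess onto the model and keeping the deterministic code trivial.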

I didn’t realize that you can get 128GB of memory in a notebook, that is impressive!

I've got a 128 GiB unified memory Ryzen Ai Max+ 395 (aka Strix Halo) laptop.

Trying to run LLMs somehow makes 128 GiB of memory feel incredibly tight. I'm frequently getting OOMs when running models that push the limits of what this machine can fit; I need to leave more memory free for the system than I was expecting. I expected to be able to run models of up to ~100 GiB quantized, leaving 28 GiB for system memory, but it turns out I need to leave more room for context and overhead. ~80 GiB quantized seems like a better limit when not running headless, since I'm running a desktop environment, browser, IDE, compilers, etc. in addition to the model.

And the memory bandwidth limitations for running models are real! 10B active parameters at 4-6 bit quants feels usable but slow; much more than that and it really starts to feel sluggish.

So this machine can fit models like Qwen3.5-122B-A10B, but it's not the speediest and I had to use a smaller quant than expected. Qwen3-Coder-Next (80B/3B active) feels fine on speed, though not quite as smart. I'm still trying out models; Nemotron-3-Super-120B-A12B just came out, but it looks like it'll be a bit slower than Qwen3.5 while not offering any more performance, though I do really like that they have been transparent in releasing most of its training data.
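Back-of-envelope weight arithmetic bears this out (a rough sketch only; the real footprint adds KV cache, activation buffers, and runtime overhead, which is exactly the headroom problem described above):

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate weight-only footprint of a quantized model in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

# A hypothetical 122B-parameter model at a ~4-bit quant:
w = weight_gib(122, 4.0)  # roughly 57 GiB before any context or overhead
```

Bump the quant to 6 bits and you're already in the ~85 GiB range, which is why "~80 GiB quantized" ends up being the practical ceiling on a 128 GiB machine.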


There's been some very recent ongoing work in local AI frameworks on enabling mmap by default, which can potentially obviate some RAM-driven limitations, especially for sparse MoE models. Running with mmap and too little RAM will still come with severe slowdowns, since read-only model parameters have to be shuttled in from storage as they're needed. But for hardware with fast enough storage, and especially for models that "almost" fit in the filesystem cache, this can be a huge unblock at negligible cost, especially if it enables further unblocks like adding extra swap for the KV cache and long context.
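The mechanism can be sketched in a few lines (illustrative only, with a stand-in file; real frameworks map the actual tensor file and let the OS page cache decide what stays resident):

```python
import mmap
import os
import tempfile

# Create a small stand-in "weights" file.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 4096 * 4)  # pretend these are model tensors
os.close(fd)

# Map it read-only: pages are faulted in from storage on first touch
# and can be evicted under memory pressure, which is how a model
# larger than free RAM can still be "loaded".
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_page = mm[:16]  # touching the page pulls it from disk
    mm.close()
os.remove(path)
```

Because the mapping is read-only, evicted pages cost nothing to drop; they're simply re-read from the model file next time they're needed.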

Most workstation-class laptops (e.g. Lenovo P-series, Dell Precision) have 4 DIMM slots, and you can get them with 256 GB (at least before the current RAM shortages).

There's also the Ryzen AI Max+ 395 that has 128GB unified in laptop form factor.

Only Apple has the unique dynamic allocation though.


Yep, I have a 13" gaming tablet with the 128 GB AMD Strix Halo chip (Ryzen AI Max+ 395, what a name). Asus ROG Flow Z13. It's a beast; the performance is totally disproportionate to its size & form factor.

I'm not sure what exactly you're referring to with "Only Apple has the unique dynamic allocation though." On Strix Halo you set the fixed VRAM size to 512 MB in the BIOS, and you set a few Linux kernel params that enable dynamic allocation to whatever limit you want (I'm using 110 GB max at the moment). LLMs can use up to that much when loaded, but it's shared fully dynamically with regular RAM and is instantly available for regular system use when you unload the LLM.


What operating system are you using? I was looking at this exact machine as a potential next upgrade.

Arch with KDE, it works perfectly out of the box.

I configured/disabled RGB lighting in Windows before wiping and the settings carried over to Linux. On Arch, install & enable power-profiles-daemon and you can switch between quiet/balanced/performance fan & TDP profiles. It uses the same profiles & fan curves as the options in Asus's Windows software. KDE has native integration for this in the GUI in the battery menu. You don't need to install asus-linux or rog-control-center.

For local AI: set VRAM size to 512 MB in the BIOS, add these kernel params:

ttm.pages_limit=31457280 ttm.page_pool_size=31457280 amd_iommu=off

Pages are 4 KiB each, so 120 GiB = 120 x 1024^3 / 4096 = 31457280
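For other VRAM targets, the same arithmetic gives the value (a trivial sketch, but it saves redoing the math by hand):

```python
PAGE = 4096  # bytes per page (4 KiB)

def ttm_pages(gib):
    """Pages covering `gib` GiB, for ttm.pages_limit / ttm.page_pool_size."""
    return gib * 2**30 // PAGE

ttm_pages(120)  # → 31457280, matching the kernel args above
```

So a 110 GiB cap, for example, works out to 28835840 pages.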

To check that it worked: sudo dmesg | grep "amdgpu.*memory" will report two values. VRAM is what's set in BIOS (minimum static allocation). GTT is the maximum dynamic quota. The default is 48 GB of GTT. So if you're running small models you actually don't even need to do anything, it'll just work out of the box.

LM Studio worked out of the box with no setup, just download the appimage and run it. For Ollama you just `pacman -S ollama-rocm` and `systemctl enable --now ollama`, then it works. I recently got ComfyUI set up to run image gen & 3d gen models and that was also very easy, took <10 minutes.

I can't believe this machine is still going for $2,800 with 128 GB. It's an incredible value.


You may wanna see if OpenRGB is able to configure the RGB. You could even do some fun stuff like changing the color once a training run finishes.

I use openrgb to turn off all the RGB crap on my desktop machine. Unfortunately you have to leave openrgb running and it takes a constant 0.5% of CPU. I wish there was a "norgb" program that would simply turn off RGB everywhere and not use any CPU while doing it.

Brilliant!

Really appreciate this response! Glad to hear you are running Arch and liking it.

I've been a long-time Apple user (and long-time user of Linux for work + part-time for personal), but have been trying out Arch and hyprland on my decade+ old ThinkPad and have been surprised at how enjoyable the experience is. I'm thinking it might just be the tipping point for leaving Apple.


I just did! Warmly encouraging you to try it out! I managed to put Omarchy on an external SSD on my old MacBook Pro 2019; I'm rarely booting into macOS now. It's been a long time since I enjoyed using a computer SO MUCH!

> Only Apple has the unique dynamic allocation though.

What do you mean? On Linux I can dynamically allocate memory between CPU and GPU. Just have to set a few kernel parameters to set the max allowable allocation to the GPU, and set the BIOS to the minimum amount of dedicated graphics memory.


Maybe things have changed, but the last time I looked at this, it was max 96 GB to the GPU. And it isn't dynamic in the sense that you still have to tweak the kernel parameters, which requires a reboot.

Apple has none of this.


With Strix Halo you can get at least 120 GB to the GPU (out of 128 GB total); I'm using this configuration.

Setting the kernel params is a one-time initial setup thing. You have 128 GB of RAM; set the max VRAM to 120 or whatever. The LLM will use as much as it needs, and the rest of the system will use as much as it needs, with fully dynamic real-time allocation of resources. Honestly, I literally haven't thought about it since setting those kernel args a while ago.

So: "options ttm.pages_limit=31457280 ttm.page_pool_size=31457280", reboot, and that's literally all you have to do.

Oh, and even that is only needed because the AMD driver defaults the max VRAM allocation to something like 35-48 GB. It is fully dynamic out of the box; you're only configuring the max VRAM quota with those params. I'm not sure why they chose that number for the default.


You do have to set the kernel parameters once to set the max GPU allocation, I have it set to 110 GiB, and you have to set a BIOS setting to set the minimum GPU allocation, I have it set to 512 MiB. Once you've set those up, it's dynamic within those constraints, with no more reboots required.

On Windows, I think you're right, it's max 96 GiB to the GPU and it requires a reboot to change it.


Intel had dynamic allocation since Intel 830(2001) for Pentium III Mobile. Everything always did, especially platforms with iGPUs like Xbox 360.

Only Apple and AMD have APUs with relatively fast iGPUs, which becomes relevant in large local LLM (>7B) use cases.


I use Raycast and connect it to LM Studio to run text cleanup and summaries often. The models are small enough that I keep them in memory more often than not.

Shouldn't we prioritize large scale open weights and open source cloud infra?

An OpenRunPod with decent usage might encourage more non-leading labs to dump foundation models into the commons. We just need infra to run it. Distilling them down to desktop is a fool's errand. They're meant to run on DC compute.

I'm fine with running everything in the cloud as long as we own the software infra and the weights.

Conceivably the only way we could catch up to Claude Code is for the Chinese to start releasing their best coding models and for those models to get significant traction, with companies calling out to hosted versions. Otherwise, we're going to be stuck in a take-off scenario with no bridge.


I run Qwen3.5-plus through Alibaba’s coding plan (Model Studio): incredibly cheap, pretty fast, and decent. I can’t compare it to the highest released weight one though.

Is that https://www.alibabacloud.com/help/en/model-studio/coding-pla... ? I was a bit confused that it seems to be sized in requests not tokens

Yeah that's the one. I've not managed to get close to the limits that the cheapest plan has. Though I did get to sign up at $3 a month which has been neat, too, seems that's gone now

I also want to try Qwen 3.5 plus. I have a doubt: I see almost the same pricing for both Qwen and Claude Code (the difference being that the highest pro plan looks cheaper), but not for the lower plans. Am I missing something when you say "cheaper"?

I'm using their $3 USD lite plan (currently; I believe it will go up in price later - edit: just checked, and yeah, to the $10 one), and I have yet to get close to hitting the request limits when I swap to it once I'm out of Claude tokens.

Do you think that composers of the past did not also face real-world constraints?

> AI here is the final nail in the coffin

so far*


Yup doing this with Caddy and Nebula, works great!

It's gonna be like that HBO Silicon Valley bit again, where everyone and their doctor is telling you about their app.

I don't quite follow - are you describing an issue with the way your team has structured PRs? IMO, a PR should contain just enough code to clearly and completely solve "a thing" without solving too much at once. But what this means in practice depends on the team, product, velocity, etc. It sounds like your PRs might be broken into chunks that are too small if you can't understand why the code is being added.

I am saying the PRs I get are around 60-70 lines of change, which is small enough to be considered a single unit (add to this unit tests which must pass with the new change, so we are talking about a 30-line change + a 30-line unit test).

But when looking at the PR changes, you don't always see the whole picture, because the review subjects (code lines) are scattered across files and methods, and GitHub also shows methods and files partially, making it even more difficult to quickly spot the context around the updated lines.

It's a difficult problem, because even if GitHub shows the whole body of the updated method or file, you still don't see the grand picture.

For example: A (calls) -> B -> C -> D

And you made changes in D, how do you know the side effect on B, what if it broke A?


If the code is well architected, the contract between C and D should make it clear whether changes in D affect C or not. And if C is not affected, then B and A won't be either.

> If the code is well architected

Big constraint. Code changes, initial architecture could have been amazing, but constantly changing business requirements make things messy.

Please don't use "in an ideal world" examples :) They are singular points in a vast space of non-ideal solutions.


In that case your problem is bigger than just reviewing changes. You need to point the finger at the bad code and bad architecture first.

There's no way to make spaghetti code easy to review.


> It's a difficult problem, because even if GitHub shows the whole body of the updated method or file, you still don't see the grand picture.

> For example: A (calls) -> B -> C -> D

> And you made changes in D, how do you know the side effect on B, what if it broke A?

That's poor encapsulation. If the changes in D respect its contract, and C respects D's contract, your changes in D shouldn't affect C, much less B or A.


> That's poor encapsulation

That's the reality of most software built in last 20 years.

> If the changes in D respect its contract, and C respects D's contract, your changes in D shouldn't affect C, much less B or A.

Any change in D must eventually affect B or A; it's inevitable, otherwise D shouldn't exist in the call stack.

Here's how the case I mentioned can happen. Imagine in each layer you have 3 variations (1 happy path, 2 edge-case handlers). Let's start from the lowest:

D: 3, C: 3×D = 9, B: 3×C = 27, A: 3×B = 81

Obviously, you won't be writing 81 unit tests for A and 27 for B; you will mock implementations and write enough unit tests to make the coverage good. Because of that mocking, when you update D and add a new case but do not surface the relevant mocking to the upper layers, you end up in a situation where D impacts A, but it's not visible in unit tests.
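A minimal sketch of that failure mode (hypothetical functions; the dependency is injected so A's test can mock out D):

```python
def d(x):
    # New edge case added to D...
    if x < 0:
        raise ValueError("negative input")
    return x * 2

def a(x, dep=d):
    return dep(x) + 1

# A's unit test still uses the *old* mock of D, which never raises:
def fake_d(x):
    return x * 2

assert a(3, dep=fake_d) == 7  # passes, coverage looks fine
# ...yet a(-1) now raises in production, and no test for A sees it.
```

The mock froze D's old contract in place, so A's test suite keeps passing no matter what D does.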

While reading the changes in D, I can't reconstruct all possible parent caller chains in my brain to ask the engineer to write the relevant unit tests.

So the case I mentioned happens; otherwise, in the real world, there would be no bugs.


Leaky abstractions are a thing. You can't just encapsulate your way out of everything.

check out the branch. if the changes are that risky, the web ui for your repository host is not suitable for reviewing them.

the rest of your issues sound architectural.

if changes are breaking contracts in calling code, that heavily implies that type declarations are not in use, or enumerable values which drive conditional behavior are mistyped as a primitive supertype.

if unit tests are not catching things, that implies the unit tests are asserting trivial things, being written after the implementation to just make cases that pass based on it, or are mocking modules they don't need to. outside of pathological cases the only thing you should be mocking is i/o, and even then that is the textbook use for dependency injection.
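a sketch of the typing point (hypothetical names): when an enumerable value that drives conditional behavior is declared as an enum rather than a bare string, a contract break in a lower layer shows up as a visible error at the boundary instead of a silent wrong branch:

```python
from enum import Enum

class Status(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"

def handle(status: Status) -> str:
    # Callers must pass a Status member; a loose string like "actve"
    # would be flagged by a type checker rather than silently falling
    # through to the wrong branch.
    if status is Status.ACTIVE:
        return "serve"
    return "block"
```

with a primitive supertype (plain `str`), adding a new case to the lower layer changes behavior without any signal to the callers; with an enum, every `match`/`if` over it is an obvious review target.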


I'm not sure how to make it work but like others in this thread I have an interest in sharing some - but not all - of my notes with some AI agents. Would love a solution that is built in to Obsidian / Obsidian Sync.

This could also be read as a take on the nurture aspect of childrearing.
