Hacker News | JDye's comments

We've had an interesting experience on the interviewing side of this. Asking a system design question and getting answers involving multiple layers of caching, pub/sub, event-driven whatever/etc.. when the real answer is to just use postgres. It handles the scale we're asking about. We know it, as that's what we use internally.

I've only worked at my startup so I can't comment on scale elsewhere, but if our simple architecture can handle 30k requests per second, I think it can handle most other companies' scale too.


If your interview question's answer is a particular technology, then you're asking a really bad question.


It's more "just put it in A database". If someone said MongoDB I'd be just as happy.


It may look like they're all easily interchangeable because the UI and actions are similar (you have a viewport and can do extrudes, etc.), but fundamentally they're all working on very different objects at their core. Blender and 3DS Max are the most alike, but ZBrush is an entirely different paradigm and so is parametric CAD. An extrude in Blender is massively different from a pad in FreeCAD.

Maybe, with a ton of time and effort, the Blender UI could be abstracted from most of the box-modeling approach and then pasted over a different paradigm, but it'd take tens of thousands of hours, I imagine.


You can do sculpting in Blender as well as parametric objects; similarly, you can emulate most of Substance Designer with shaders. Maybe just not _quite_ well enough, and that's the thing.

It feels like we have been so, so close to a unified 3D content creation toolkit for many years now!


Blender is a mesh editor at its heart. That isn't suitable for CAD work.


Residential proxies aren't used for scraping? That doesn't align well with my experience...


I live in the UK and can't view a large portion of the internet without having to submit my ID to _every_ site serving anything deemed "not safe for the children". I had a question about a new piercing and couldn't get info on it from Reddit because of that. I try using a VPN and they're blocked too. Luckily, I work at a company selling proxies so I've got free proxies whenever I want, but I shouldn't _need_ to use them.

I find it funny that companies like Reddit, who make their money entirely from content produced by users for free (which is also often sourced from other parts of the internet without permission), are so against their site being scraped that they have to objectively ruin the site for everyone using it. See the API changes and killing off of third party apps.

Obviously, it's mostly for advertising purposes, but they love to talk about the load scraping puts on their site, even suing AI companies and SerpApi for it. If it's truly that bad, just offer a free API for the scrapers to use - or even an API that works out just slightly cheaper than using proxies...

My ideal internet would look something like that, all content free and accessible to everyone.


> that they have to objectively ruin the site for everyone using it. See the API changes and killing off of third party apps.

Third party app users were a very small but vocal minority. The API changes didn't drop their traffic at all. In fact, it's only gone up since then.

The datacenter IP address blocks aren't just for scrapers; they're an anti-bot measure across the board. I don't spend much time on Reddit, but even the few subreddits I visited were starting to become infiltrated by obvious bot accounts doing weird karma farming operations.

Even HN routinely gets AI posting bots. It's a common technique to generate upvote rings - Make the accounts post comments so they look real enough, have the bots randomly upvote things to hide activity, and then when someone buys upvotes you have a selection of the puppet accounts upvote the targeted story. Having a lot of IP addresses and generating fake activity is key to making this work, so there's a lot of incentive to do it.


I agree that write-actions should be protected, especially now when every other person online is a bot. As for read-actions, I'll continue to profit off those being protected too but I wouldn't be too bothered if something suddenly changed and all content across the internet was a lot easier to access programmatically. I think only harm can come from that data being restricted to the huge (nefarious) companies that can pay for that data or negotiate backroom deals.


Reddit's traffic is almost exclusively propaganda bots.


Have you considered that it’s because a new industry popped up that decided it was okay to slurp up the entire internet, repackage it, and resell it? Surely that couldn’t be why sites are trying to keep non humans out.


> I live in the UK and can't view a large portion of the internet without having to submit my ID to _every_ site serving anything deemed "not safe the for the children".

Really? Because I live in the UK and I've never been asked for my ID for anything.


[flagged]


Thanks lad. Will get right on it.


Scrapping First-past-the-Post is probably a good start.

Good luck!


2k IPs is not enough to do most enterprise-scale scraping. Even Starlink's entire ASN doesn't seem to have enough IPv4 addresses to handle it.


The actual secret is to use IPv6 with varied source IPs in the same subnet: you get an insane number of IPs, and 90% of anti-scraping software is not specialized enough to realize that any IP in a /64 is the same as a single IP in a /32 in IPv4.
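Roughly, the rotation trick looks like this in Python (a sketch only; the prefix below is the RFC 3849 documentation range, not a real allocation):

```python
import ipaddress
import random

def random_addr_in_64(prefix: str) -> ipaddress.IPv6Address:
    """Pick a random interface ID inside a /64: 2**64 possible source IPs."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("expected a /64")
    return net.network_address + random.getrandbits(64)

# Rotate to a fresh source address per request; a blocker counting distinct
# IPs instead of distinct /64s sees each one as a brand-new client.
pool = [random_addr_in_64("2001:db8:abcd:12::/64") for _ in range(3)]
```

Anti-scraping software that wants to counter this just has to rate-limit on the /64 instead of the full 128-bit address.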


> any IP in a /64 is the same as a single IP in a /32 in IPv4

This is very commonly true but sadly not 100%. I am suffering from a shared /64 that my VPS is on, where other folks have sent out spam - so no more SMTP for me.


If they're behind CGNAT, then unless Starlink actively provides assistance to block them, it won't matter.

As someone who wants the internet to maintain as much anarchy as possible I think it would be nice to see a large ISP that actively rotated its customer IPv6 assignments on a tight schedule.


Our postgres replication suddenly stopped working and it took three of us hours - maybe days - of looking through the postgres source before we actually accepted it wasn't us or our hosting provider being stupid and submitted a ticket.

I can't imagine the level of laziness or entitlement required for a student (or any developer) to blame their tools so quickly without conducting a thorough investigation.


I had a professor who cautioned us not to assume the problem was in the compiler, or in anyone else’s code. Students assuming that there is a compiler (or similar) bug is not uncommon. Common enough he felt it necessary to pre-empt those discussions.


We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside of grunt work like minor refactors across many files. It doesn't seem to understand proxying and how it works on both a protocol level and business logic level.

With some entirely novel work we're doing, it's actually a hindrance as it consistently tells us the approach isn't valid/won't work (it will) and then enters "absolutely right" loops when corrected.

I still believe those who rave about it are not writing anything I would consider "engineering". Or perhaps it's a skill issue and I'm using it wrong, but I haven't yet met someone I respect who tells me it's the future in the way those running AI-based companies tell me.


> We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside

I have a great time using Claude Code in Rust projects, so I know it's not about the language exactly.

My working model is that since LLMs are basically inference/correlation based, the more you deviate from the mainstream corpus of training data, the more confused the LLM gets, because an LLM doesn't "understand" anything. But if it was trained on a lot of things kind of like the problem, it can match the patterns just fine, and it can generalize over a lot of layers, including programming languages.

Also I've noticed that it can get confused by stupid stuff. E.g. I had two different things named kind of the same in two parts of the codebase, and it would constantly stumble, conflating them. Changing the name in the codebase immediately improved it.

So yeah, we've got another potentially powerful tool that requires understanding how it works under the hood to be useful. Kind of like git.


Recently the v8 Rust library changed from mutable handle scopes to pinned scopes. A fairly simple change that I even put in my CLAUDE.md file. But it still generates methods with HandleScopes and then says... oh, I have a different scope, and goes on a random walk refactoring completely unrelated parts of the code. All the while Opus 4.5 burns through tokens. Things work great as long as you are testing on the training set. But that said, it is absolutely brilliant with React and TypeScript.


Well, it's not like it never happened to me to "burn tokens" with some lifetime issue. :D But yeah, if you're working in Rust on something with sharp edges, the LLM will get hurt. I just don't tend to have these in my projects.

Even more basic failure mode: I told it to convert/copy a bit (1k LOC) of blocking code into a new module and convert it to async. It just couldn't do a proper 1:1 logical _copy_. But when I manually `cp <src> <dst>`'d the file and then told it to convert that to async and fix issues, it did it 100% correctly. Because fundamentally it's just a non-deterministic pattern generator.


hot take (that shouldn't be?): if your code is super easy to follow as a human, it will be super easy to follow for an LLM. (hint: guess where the training data is coming from!)


This isn't meant as a criticism, or to doubt your experience, but I've talked to a few people who had experiences like this. But I helped them get Claude Code set up, analyze the codebase and document the architecture into markdown (edited as needed after), create an agent for the architecture, and prompt it in an incremental way. Maybe 15-30 minutes of prep. Everyone I helped with this responded with things like "This is amazing", "Wow!", etc.

For some things you can fire up Claude and have it generate great code from scratch. But for bigger code bases and more complex architecture, you need to break it down ahead of time so it can just read about the architecture rather than analyze it every time.


Is there any good documentation out there about how to perform this wizardry? I always assumed if you did /init in a new code base, that Claude would set itself up to maximize its own understanding of the code. If there are extra steps that need to be done, why don't Claude's developers just add those extra steps to /init?


Not that I have seen, which is probably a big part of the disconnect. Mostly it's tribal knowledge. I learned through experimentation, but I've seen tips here and there. Here's my workflow (roughly):

> Create a CLAUDE.md for a c++ application that uses libraries x/y/z

[Then I edit it, adding general information about the architecture]

> Analyze the library in the xxx directory, and produce a xxx_architecture.md describing the major components and design

> /agent [let Claude make the agent, but when it asks what you want it to do, explain that you want it to specialize in subsystem xxx, and refer to xxx_architecture.md]

Then repeat until you have the major components covered. Then:

> Using the files named with architecture.md, analyze the entire system and update CLAUDE.md to refer to them and use the specialized agents.

Now, when you need to do something, put it in planning mode and say something like:

> There's a bug in the xxx part of the application, where when I do yyy, it does zzz, but it should do aaa. Analyze the problem and come up with a plan to fix it, and automated tests you can perform if possible.

Then, iterate on the plan with it if you need to, or just approve it.

One of the most important things you can do when dealing with something complex is let it come up with a test case so it can fix or implement something and then iterate until it's done. I had an image processing problem and I gave it some sample data, then it iterated (looking at the output image) until it fixed it. It spent at least an hour, but I didn't have to touch it while it worked.


I've taken time today to do this. With some of your suggestions, I am seeing an improvement in its ability to do some of the grunt work I mentioned. It just saved me an hour refactoring a large protocol implementation into a few files and extracting some common utilities. I can recognise and appreciate how useful that is for me and for most other devs.

At the same time, I think there are limitations to these tools and that I won't ever be able to achieve what I see others saying about 95% of code being AI-written, or leaving the AI to iterate for an hour. There are just too many weird little pitfalls in our work that the AI just cannot seem to avoid.

It's understandable, I've fallen victim to a few of them too, but I have the benefit of the ability to continuously learn/develop/extrapolate in a way that the LLM cannot. And with how little documentation exists for some of these things (MASQUE proxying for example) anytime the LLM encounters this code it throws a fit, and is unable to contribute meaningfully.

So thanks for your suggestions; it has made Claude better, and clearly I was dragging my feet a little. At the very least, it's freed up some more of my time to work on the complex things Claude can't do.


To be perfectly honest, I've never used a single /command besides /init. That probably means I'm using 1% of the software's capabilities. In frankness, the whole menu of /-commands is intimidating and I don't know where to start.


You don't need to do much, the /agent command is the most useful, and it walks you through it. The main thing though is to give the agent something to work with before you create it. That's why I go through the steps of letting Claude analyze different components and document the design/architecture.

The major benefit of agents is that it keeps context clean for the main job. So the agent might have a huge context working through some specific code, but the main process can do something to the effect of "Hey UI library agent, where do I need to put code to change the color of widget xyz", then the agent does all the thinking and can reply with "that's in file 123.js, line 200". The cleaner you keep the main context, the better it works.


Never thought of Agents in that way to be honest. I think I need to try that style =)


/commands are like macros or mayyybe aliases. You just put in the commands you see yourself repeating often, like "commit the unstaged files in distinct commits, use xxx style for the commit messages..." - then you can iterate on it if you see any gaps or confusion, even give example commands to use in the different steps.

Skills on the other hand are commands ON STEROIDS. They can be packaged with actual scripts and executables; the PEP 723 Python style + uv is super useful.

I have one skill for example that uses Python+Tree-sitter to check the unit test quality of a Go project. It does some AST magic to check the code for repetition, stupid things like sleeps and relative timestamps, etc. A /command _can_ do it, but it's not as efficient; the scripts for the skill are specifically designed for LLM use and output the result in a hyper-compact form a human could never be arsed to read.
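A stripped-down, hypothetical sketch of what such a skill script can look like. The real tool described above uses Tree-sitter on Go source; this stand-in uses Python's own `ast` module so it stays self-contained, and the inline PEP 723 block at the top is what lets `uv run` install dependencies automatically:

```python
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
# Hypothetical skill script: scan test code for one of the smells
# mentioned above (calls to time.sleep) using Python's stdlib ast module.
import ast

def count_sleep_calls(source: str) -> int:
    """Count `<something>.sleep(...)` calls in the given source text."""
    return sum(
        1
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "sleep"
    )

print(count_sleep_calls("import time\ntime.sleep(1)\ntime.sleep(2)"))  # 2
```

The hyper-compact output idea is just a matter of printing counts and file:line pairs instead of prose.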


> In frankness, the whole menu of /-commands is intimidating and I don't know where to start.

claude-code has a built in plugin that it can use to fetch its own docs! You don't have to ever touch anything yourself, it can add the features to itself, by itself.


This is some great advice. What I would add is to avoid the internal plan mode and just build your own. The built-in one creates md files outside the project, gives the files random names, and it's hard to reference them in the future.

It's also hard to steer the plan mode or have it remember some behavior that you want to enforce. It's much better to create a custom command with custom instructions that acts as the plan mode.

My system works like this:

The /implement command acts as an orchestrator & plan mode, and it is instructed to launch a predefined set of agents based on the problem and have them utilize specific skills. Every time the /implement command is initiated, it has to create a markdown file inside my own project, and then each subagent is also instructed to update the file when it finishes working.

This way, the orchestrator can spot that an agent misbehaved, and the reviewer agent can see what the developer agent tried to do and why it was wrong.


> if you did /init in a new code base, that Claude would set itself up to maximize its own understanding of the code.

This is definitely not the case, and the reason Anthropic doesn't make Claude do this is that its quality degrades massively as you use up its context. So the solution is to let users manage the context themselves in order to minimize the amount that is "wasted" on prep work. Context windows have been increasing quite a bit, so I suspect that by 2030 this will no longer be an issue for any but the largest codebases, but for now you need to be strategic.


Are you still talking about Opus 4.5? I've been working in Rust, Kotlin, and C++ and it's been doing well. Incredible at C++, like the number of mistakes it doesn't make.


> I still believe those who rave about it are not writing anything I would consider "engineering".

Correct. In fact, this is the entire reason for the disconnect, where it seems like half the people here think LLMs are the best thing ever and the other half are confused about where the value is in these slop generators.

The key difference is (despite everyone calling themselves an SWE nowadays) there's a difference between a "programmer" and an "engineer". Looking at OP, exactly zero of his screenshotted apps are what I would consider "engineering". Literally everything in there has been done over and over to death. Engineering is... novel, for lack of a better word.

See also: https://www.seangoedecke.com/pure-and-impure-engineering/


> Engineering is.. novel, for lack of a better word.

Tell that to the guys drawing up the world's 10 millionth cable suspension bridge



I don't think it's that helpful to try to gatekeep the "engineering" term or try to separate it into "pure" and "impure" buckets, implying that one is lesser than the other. It should be enough to just say that AI assisted development is much better at non-novel tasks than it is at novel tasks. Which makes sense: LLMs are trained on existing work, and can't do anything novel because if it was trained on a task, that task is by definition not novel.


Respectfully, it's absolutely important to "gatekeep" a title that has an established definition and certain expectations attached to the title.

OP says, "BUT YOU DON’T KNOW HOW THE CODE WORKS.. No I don’t. I have a vague idea, but you are right - I do not know how the applications are actually assembled." This is not what I would call an engineer. Or a programmer. "Prompter", at best.

And yes, this is absolutely "lesser than", just like a middleman who subcontracts his work to Fiverr (and has no understanding of the actual work) is "lesser than" an actual developer.


That's not the point being made to you. The point is that most people in the "software engineering" space are applying known tools and techniques to problems that are not groundbreaking. Very few are doing theoretical computer science, algorithm design, or whatever you think it is that should be called "engineering."


So the TL;DR here is... If you're in the business of recreating wheels - then you're in luck! We've automated wheel recreation to an acceptable degree of those wheels being true.


Most physical engineers are just applying known techniques all the time too. Most products or bridges or whatever are not solving some heretofore-unsolved problem.


It's how you use the tool that matters. Some people get bitter and try to compare it to top engineers' work on novel things as a strawman so they can go "Hah! Look how it failed!" as they swing a hammer to demonstrate it cannot chop down a tree. Because the tool is so novel and its use is a lot more abstract than that of an axe, it is taking a while for some to see its potential, especially if they are remembering models from even six months ago.

Engineering is just problem solving, nobody judges structural engineers for designing structures with another Simpson Strong Tie/No.2 Pine 2x4 combo because that is just another easy (and therefore cheap) way to rapidly get to the desired state. If your client/company want to pay for art, that's great! Most just want the thing done fast and robustly.


I think it's also that the potential is far from being realized yet we're constantly bombarded by braindead marketers trying to convince us that it's the best thing ever already. This is tiring especially when the leadership (not held back by any technical knowledge) believes them.

I'm sure AI will get there, I also think it's not very good yet.


Coding agents as of Jan 2026 are great at what 95% of software engineers do. For the remaining 5% that do really novel stuff, the agents will get there in a few years.


When he said 'just look at what I've been able to build', I was expecting anything but an 'image converter'.


Web scrapers maybe aren't "bad actors", but many sites don't want them. They'll use tons of TCP proxies which route them through a rotating pool of end-user devices (mobiles, routers, etc.). It's not really possible to block these IPs, as you'd also be blocking legitimate customers, so other ways to detect and block are required.


Can't/won't these scrapers just switch to using VPNs or sshuttle or basically anything else that doesn't leak timing info about termination of TCP vs HTTP?


Not really. You can have 100,000 IPs from proxies or use VPNs and have only 5 egress IPs.

Anybody who wants to stop the scraper could collect browser fingerprints, cross-reference similar ones with those IPs, and quite safely ban them, as it's highly likely they're not a legitimate customer.

It's a lot harder to do for the 100k IPs, because those IPs will also have legitimate customer traffic on them and it's a lot more likely the browser fingerprint could just be legitimate.
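A toy sketch of that cross-referencing, with made-up IPs and fingerprint labels (the threshold and log shape are illustrative assumptions, not any real product's logic):

```python
from collections import defaultdict

# Hypothetical access-log rows: (egress_ip, browser_fingerprint)
events = [
    ("203.0.113.5", "fp_a"), ("203.0.113.6", "fp_a"), ("203.0.113.7", "fp_a"),
    ("198.51.100.9", "fp_b"),
]

ips_per_fp = defaultdict(set)
for ip, fp in events:
    ips_per_fp[fp].add(ip)

# One fingerprint fanned out across many egress IPs suggests a single client
# behind a small VPN pool. Residential proxy IPs carry real users too, so the
# same signal is far noisier there and bans risk hitting legitimate people.
EGRESS_THRESHOLD = 3
suspicious = {fp for fp, ips in ips_per_fp.items() if len(ips) >= EGRESS_THRESHOLD}
print(suspicious)  # {'fp_a'}
```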

The risk of false positives (blocking real people) is usually higher than just allowing the scrapers, and the incentives of a lot of sites aren't aligned with stopping scrapers anyway. Think e-commerce: do they _really_ care if the product is being sold to scalpers or real customers? If anything, that behaviour can raise the perception of their brand, increase demand, increase prices.

This tool should have fewer false positives than most, so maybe it will see more adoption than others (TCP fingerprinting, for example), but I don't think this is going to affect anyone doing scraping seriously/at scale.


> Not really. You can have 100,000 IPs from proxies or use VPNs and have only 5 egress IPs.

Why…?

If I can run a proxy exit node on 100k residential IPs, why can't I run a VPN server on 100k residential IPs?

There is no additional technical complexity or resource consumption from the VPN server compared to the proxy server.


I don't mean that you can't do it, just that there is no company offering it so right now those are the only two options.

It's something we're experimenting with currently. The other commenter is right about Apple products, but on Android, desktop, etc. it's pretty easy.


For phones it's a bit difficult because I don't think you can egress IP traffic without root or jailbreak on iPhone/iOS, but I guess on desktop this should be possible.


I mentioned this in a podcast recently; fingerprinting of proxy servers using QUIC is a lot harder, as UDP doesn't have enough headers to allow for unique characteristics like TCP does.

There's no way to include a timestamp in a UDP datagram, so all timestamps received would be from the client machine.
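For contrast, TCP segments can carry the RFC 7323 Timestamps option (kind 8), which is exactly the kind of per-segment metadata a fingerprinter can read and UDP lacks. A small illustrative sketch of pulling TSval/TSecr out of raw TCP option bytes:

```python
import struct

def find_tcp_timestamp(options: bytes):
    """Scan TCP option bytes for the Timestamps option (kind 8, length 10)."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:          # End of Option List
            break
        if kind == 1:          # No-Operation (padding)
            i += 1
            continue
        length = options[i + 1]
        if kind == 8 and length == 10:
            tsval, tsecr = struct.unpack("!II", options[i + 2 : i + 10])
            return tsval, tsecr
        i += length
    return None

# Two NOPs, then a Timestamps option with TSval=1000, TSecr=0
opts = b"\x01\x01\x08\x0a" + struct.pack("!II", 1000, 0)
print(find_tcp_timestamp(opts))  # (1000, 0)
```

A UDP header, by comparison, is just source port, destination port, length, and checksum, so there is nothing equivalent to parse.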


Interesting!

So far I've only seen Bright Data (among the large players) offer UDP proxying over QUIC/HTTP3, but that's pretty limiting since less than half of sites have HTTP/3 enabled to begin with.


Bright Data offers H3/QUIC, but only in beta, and you have to contact their sales team as far as I'm aware.

We (PingProxies) might be the only company to publicly offer H3 to the proxy/QUIC to the target using the CONNECT-UDP method. Although it is beta/unstable until I merge my changes into Rust's H3 library.

If you wanna play around with it, email me and I'll get you some credit. I think there's potential for stealth, since outdated proxy clients/servers mean automated actors never use H3.

The proxy industry is full of another 100 companies saying they offer H3/QUIC when they mean UDP proxying using SOCKS. I suppose the knowledge gap and what customers care about (the protocol to the end target) is very different from what I care about (being right/the protocol to the proxy server).


> Bright Data offers H3/QUIC, but only in beta, and you have to contact their sales team as far as I'm aware.

That's what I thought too, but it's working for me. (I've sent a lot of tickets, maybe they've put our account as something special without telling me, but doubt it.)

> If you wanna play around with it, email me and I'll get you some credit.

Done, emailed! :) Thanks!

> The proxy industry is full of another 100 companies saying they offer H3/QUIC, when they mean UDP proxying using SOCKS.

Out of the large players I've tested, none actually seem to even support SOCKS5's UDP ASSOCIATE. (I have not tested PingProxies yet.)

> I suppose the knowledge gap and what customers care about (protocol to end target) is very different to what I care about (being right/protocol to the proxy server).

I think there's a knowledge gap between the people making the sales landing pages, and the folks who actually run/maintain the proxy servers. There's some large vendors that advertise UDP support (for residential and/or mobile proxies) that I have yet to actually see working.


The most common method of proxying with residential proxies is still CONNECT tunnels, and from my tests it catches a resi-proxy about 50% of the time. More with tuning of the score thresholds.

