Lazy has nothing to do with it, codeberg simply doesn't work.
Most of my friends who use codeberg are staunch cloudflare-opponents, but cloudflare is what keeps Gitlab alive. Fact of life is that they're being attacked non-stop, and need some sort of DDoS filter.
Codeberg has that anubis thing now I guess? But they still have downtime, and the worst thing ever for me as a developer is having the urge to code and not being able to access my remote. That is what murders the impression of a product like codeberg.
Sorry, just being frank. I want all competitors to large monopolies to succeed, but I also want to be able to do my job/passion.
Maybe I'm too old school, but both GitHub and Codeberg for me are asynchronous "I want to send/share the code somehow", not "my active workspace I require to do work". But reading
> the worst thing ever for me as a developer is having the urge to code and not being able to access my remote.
Makes it seem like GitHub/Codeberg has to be online for you to be able to code, is that really the case? If so, how does that happen, you only edit code directly in the GitHub web UI or how does one end up in that situation?
For me it's a soft block rather than a hard block. I use multiple computers so when I switch to the other one I usually do a git pull, and after every commit I do a push. If that gets interrupted, then I have to resort to things like rsyncing over from the other system, but more than once I've lost work that way. I'm strongly considering just standing up a VM and using "just git" and forgoing any UI, but I make use of other features like CI/CD and Releases for distribution, so the VM strategy is still just a bandaid. When the remote is unavailable, it can be very disruptive.
> If that gets interrupted, then I have to resort to things like rsyncing over from the other system
I'm guessing you have SSH access between the two? You could just add it as another remote, via SSH, so you can push/pull directly between the two. This is what I do on my home network to sync configs and other things between various machines and OSes, just do `git remote add other-host git+ssh://user@10.55/~/the-repo-path` or whatever, and you can use it as any remote :)
Bonus tip: you can use local paths as git remote URLs too!
> but more than once I've lost work that way.
Huh, how? If you didn't push it earlier, you could just push it later? Same goes for pull? I don't understand how you could lose anything tracked in git, corruption or what happened?
Usually one of two things, mostly the latter: I forget to exclude the .git/ directory from the sync, or I have in-progress and nowhere near ready for commit changes on both hosts, and I forget and sync before I check. These are all PEBKAC problems and/or workflow problems, but on a typical day I'll be working in or around a half-dozen repos and it's too easy to forget. The normal git workflow protects from that because uncommitted changes in one can just be rebased easily the next time I'm working in that on any given computer. I've been doing it like this for nearly 20 years and it's never been an issue because remotes were always quite stable/reliable. I really just need to change my workflow for the new reality, but old habits die hard.
If you can rsync from the other system, and likely have an SSH connection between them, why don't you just add it as an additional remote and git pull from it directly?
You cannot git push something that is not committed. The solution is to commit often (and do it over ssh if you forget on a remote system). It doesn't need to be a presentable commit. That can be cleaned up later. I use `git commit -amwip` all the time.
Sure, you might neglect to add a file to your commit, or commit at all, but that's a problem whether you're pushing to a central public git forge or not.
You'd create a bare git repo (just the contents of .git) on the host with git init --bare, separate from your usual working tree, and set it as a remote for your working trees, to which you can push and pull using ssh or even a path from the same machine.
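For anyone who wants to try the bare-repo setup described above, here is a minimal sketch in Python that drives git through `subprocess`. The directory names (`hub.git`, `desk`, `laptop`), the remote name `hub`, the branch name `main`, and the throwaway commit identity are all made up for illustration. It uses a local path as the remote URL; between two machines you would put a `user@host:path` style SSH URL in the same place. It assumes git is installed and on PATH.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args: str, cwd: Path) -> str:
    """Run one git command in cwd and return its stdout (raises on failure)."""
    result = subprocess.run(
        # Throwaway identity so the commit works on a machine with no git config.
        ["git", "-c", "user.name=demo", "-c", "user.email=demo@example.invalid", *args],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def demo_bare_remote(root: Path) -> str:
    """Push from one working tree to a bare repo, then clone it elsewhere."""
    hub = root / "hub.git"    # hypothetical location of the bare repo
    desk = root / "desk"      # first working tree
    laptop = root / "laptop"  # second working tree

    # The bare repo holds only what would normally live in .git/.
    hub.mkdir()
    git("init", "--bare", cwd=hub)

    # Make a commit in the first working tree.
    desk.mkdir()
    git("init", cwd=desk)
    (desk / "notes.txt").write_text("work in progress\n")
    git("add", "notes.txt", cwd=desk)
    git("commit", "-m", "wip", cwd=desk)

    # A plain path is a valid remote URL; an SSH URL would go here instead.
    git("remote", "add", "hub", str(hub), cwd=desk)
    git("push", "hub", "HEAD:refs/heads/main", cwd=desk)

    # The second working tree picks the commit up from the bare repo.
    git("clone", "--branch", "main", str(hub), str(laptop), cwd=root)
    return (laptop / "notes.txt").read_text()
```

Calling `demo_bare_remote(Path(tempfile.mkdtemp()))` runs the whole round trip in a scratch directory and returns the file contents as seen from the second working tree. Note that only committed work travels this way, which is the point made above about committing often.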
If you have ssh access to the remote machine to set up a git remote, you can login to the remote machine and commit the changes that you forgot to commit.
For some projects, the issue tracker is a pretty integral part of the documentation. Sure, you can host your own issue tracker somewhere, but that's still shifting a center point somewhere, in a theoretically decentralized system. I've frequently wished the issue tracker was part of the repository. Also -- love them or hate them -- LLMs would probably love that too.
> Makes it seem like GitHub/Codeberg has to be online for you to be able to code, is that really the case?
I can understand that for work with other active contributors, but I agree with you that it is a daft state of affairs for a solo or mostly-solo project.
Though if you have your repo online even away from the big places, it will get hit by the scrapers and you will end up with admin to do because of that, even if it doesn't block your normal workflow because your main remote is not public.
I was shaking my head in disbelief when reading that part too. I mean, git's whole raison d'etre, back when it was introduced, was that you do not need online access to the repo server most of the time.
> git's whole raison d'etre […] was that you do not need online access to the repo server most of the time
Not really. The point of git was to make Linus' job of collating, reviewing, and merging work from a disparate team of teams much less arduous. It just happens that many of the patterns needed for that also make temporarily disconnected remote repositories work well.
The whole point of git was to be a replacement for BitKeeper after the Linux developers got banned from it for "hacking" after Andrew Tridgell connected to the server over telnet and typed "HELP".
That too, though the point of using a distributed code control system was the purpose I mentioned. But even before BitKeeper getting in a tizzy about Tridgell's¹ shenanigans there was talk of replacing it because some properties of it were not ideal for something as large as the kernel with as many active contributors, and there were concerns about using a proprietary product to manage the Linux codebase. Linus was already tinkering with what would become the git we know.
--------
[1] He did a lot more than type “help” - he was essentially trying to reverse engineer the product to produce a compatible but more open client that gave access to metadata BitKeeper wanted you to pay to be able to access² which was a problem for many contributors.
[2] you didn't get the fullest version history on the free variants, this was one of the significant concerns making people discuss alternatives, and in some high profile cases just plain refuse to touch BitKeeper at all
So those people are using the tool incorrectly, and would have a much better experience if they used it as designed. If everyone was running around using screwdriver handles to pound in nails, that wouldn't make it reasonable to say that any new screwdriver company has to have 5 lb handles.
Philosophically I think it's terrible that Cloudflare has become a middleman in a huge and important swath of the internet. As a user, it largely makes my life much worse. It limits my browser, my ability to protect myself via VPNs, etc, and I am just browsing normally, not attacking anything. Pragmatically though, as a webmaster/admin/whatever you want to call it nowadays, Cloudflare is basically a necessity. I've started putting things behind it because if I don't, 99%+ of my traffic is bots, and often bots clearly scanning for vulnerabilities (I run mostly zero PHP sites, yet my traffic logs are often filled with requests like /admin.php and /wp-admin.php and all the wordpress things, and constant crawls from clearly not search engines that download everything and use robots.txt as a guide of what to crawl rather than what not to crawl). I haven't been DDoSed yet, but I've had images and PDFs and things downloaded so many times by these things that it costs me money. For some things where I or my family are the only legitimate users, I can just firewall-cmd all IPs except my own, but even then it's maintenance work I don't want to have to do.
I've tried many of the alternatives, and they often fail even on legitimate usecases. I've been blocked more by the alternatives than I have by Cloudflare, especially that one that does a proof of work. It works about 80% of the time, but that 20% is really, really annoying to the point that when I see that screen pop up I just browse away.
It's really a disheartening state we find ourselves in. I don't think my principles/values have been tested more in the real world than the last few years.
Either I am very lucky or what I am doing has zero value to bots, because I've been running servers online for at least 15 years, and never had any issue that couldn't be solved with basic security hygiene. I use cloudflare as my DNS for some servers, but I always disable any of their paid features. To me they could go out of business tomorrow and my servers would be chugging along just fine.
> and use robots.txt as a guide of what to crawl rather than what not to crawl
Mental note, make sure my robots.txt files contain a few references to slowly returning pages full of almost nonsense that link back to each other endlessly…
Not complete nonsense, that would be reasonably easy to detect and ignore. Perhaps repeats of your other content with every 5th word swapped with a random one from elsewhere in the content, every 4th word randomly misspelt, every seventh word reversed, every seventh sentence reversed, add a random sprinkling of famous names (Sir John Major, Arc de Triomphe, Sarah Jane Smith, Viltvodle VI) that make little sense in context, etc. Not enough change that automatic crap detection sees it as an obvious trap, but more than enough that ingesting data from your site into any model has enough detrimental effect to token weightings to at least undo any beneficial effect it might have had otherwise.
And when setting traps like this, make sure the response is slow enough that it won't use much bandwidth, and the serving process is very lightweight, and just in case that isn't enough make sure it aborts and errors out if any load metric goes above a given level.
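A minimal sketch of just the word-level mangling part, to show how little code it takes. This is only an illustration under my own assumptions: the function name `poison`, the fixed seed, and the "swap two adjacent letters" misspelling are made up, and it leaves out the sentence reversal, the famous names, the slow responses, and the load-based abort mentioned above.

```python
import random


def poison(text: str, seed: int = 0) -> str:
    """Return text with some words reversed, swapped, or misspelt."""
    rng = random.Random(seed)  # seeded so the same page mangles the same way
    words = text.split()
    out = []
    for i, word in enumerate(words, start=1):
        if i % 7 == 0:
            # Every 7th word is reversed.
            word = word[::-1]
        elif i % 5 == 0:
            # Every 5th word is replaced by a random word from the same text.
            word = rng.choice(words)
        elif i % 4 == 0 and len(word) > 1:
            # Every 4th word is misspelt by swapping two adjacent letters.
            j = rng.randrange(len(word) - 1)
            word = word[:j] + word[j + 1] + word[j] + word[j + 2:]
        out.append(word)
    return " ".join(out)
```

Where a position matches more than one rule (for example the 35th word), the reversal wins simply because it is checked first. The word count stays the same, so the output keeps roughly the shape of the real content.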
So, basically iocaine (https://iocaine.madhouse-project.org/). It has indeed been very useful to get the AI scraper load on a server I maintain down to a reasonable level, even with its not so strict default configuration.
Yes, except with the content being based on the real content rather than completely random. My intuition says that this will be more effective, specifically poisoning the model wrt tokens relating to that content rather than just increasing the overall noise level a bit (the damage there being smoothed out over the wider model).
First time seeing that, but yes, seems similar in concept. Iocaine can be self-hosted and put in as a "middleware" in your reverse proxy with a few lines of config, cloudflare's seems tied to their services. Cloudflare's also generates garbage with generative models, while iocaine uses much simpler (and surely more "crude") methods of generating its garbage. Using LLMs to feed junk to LLMs just makes me cry, so much wasted compute.
Is iocaine actually newer though? Its first commit dates to 2025-01, while the blog post is from 2025-03. I couldn't find info on when Cloudflare started theirs. There's also Nepenthes, which had its first release in 2025-01 too.
Hot damn, this is a great idea! Reminds me fondly of an old project a friend and I built that looks like an SSH prompt or optionally an unauthed telnet listener, which looks and feels enough like a real shell that we would capture some pretty fascinating sessions of people trying to explore our system or load us with malware. Eventually somebody figured it out and then DDoSed the hell out of our stuff and would not stop hassling us. It was a good reminder that yanking people's chains sometimes really pisses them off and can attract attention and grudges that you really don't want. My friend ended up retiring his domain because he got tired of dealing with the special attention. It did allow us to capture some pretty fascinating data though that actually improved our security while it lasted.
This is one reason why most crawlers ignore robots.txt now. The other reason is that bandwidth/bots are cheap enough now that they don't need web admins to help them optimize their crawlers
While I sympathise, I disagree with your stance. Cloudflare handle a large % of the Internet now because of people putting sites that, as you admitted, don't need to be behind it there.
OP is about Github. Have you seen the Github uptime monitor? It’s at 90% [1] for the last 90 days. I use both Codeberg and Github a lot and Github has, by far, more problems than Codeberg. Sometimes I notice slowdowns on Codeberg, but that’s it.
To be fair, Github has several orders of magnitude more users running on it than Codeberg. I'm also a Codeberg user, but I don't think anyone has seen a Forgejo/Gitea instance working at the scale of Github yet.
I don't think OP was making a value judgment or anything. It's just weird to say you won't consider Codeberg because you need reliability when Codeberg's uptime is at 100% and Github's is at 90%.
To be fair, GitHub has several orders of magnitude more revenue to support that. Including from companies like mine who are paying them good money and get absolutely sub-par service and reliability from them. I'd be happy for Codeberg to take my money for a better service on the core feature set (git hosting, PRs, issues). I can take my CI/CD elsewhere, we self-host runners anyway.
I think the idea is that a Forgejo/Gitea instance should never have to work at anywhere near the scale of GitHub. Codeberg provides its Forgejo host as a convenience/community thing but it's not being built to be a central service.
My own git server has been hit severely by scrapers. They're scraping everything. Commits, comparisons between commits, api calls for files, everything.
And pretty much all of them, ByteDance, OpenAI, AWS, Claude, various I couldn't recognize. I basically just had to block all of them to get reasonable performance for a server running on a mini-pc.
I was going to move to codeberg at some point, but they had downtime when I was considering it, I'd rather deal with that myself then.
Anyone actually scraping git repos would probably just do a 'git clone'. Crawling git hosts is extremely expensive, as git servers have always been inadvertent crawler traps.
They generate a URL for every version of every file on every commit and every branch and tag, and if that wasn't enough, n(n+1)/2 git diffs for every file on every commit it has existed on. Even a relatively small git repo with a few hundred files and commits explodes into millions of URLs in the crawl frontier. Server side many of these are very expensive to generate as well so it's really not a fantastic interaction, crawler and git host.
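A rough back-of-the-envelope version of that count, assuming as a simplification that every file exists on every commit and ignoring branches and tags entirely. The function name and the 300/300 figures are illustrative, not measurements from a real host.

```python
def crawl_frontier_estimate(files: int, commits: int) -> dict[str, int]:
    """Estimate how many distinct URLs a naive crawler could find in one repo."""
    # One "view file at commit" URL per (file, commit) pair.
    file_views = files * commits
    # n(n+1)/2 diff URLs per file, where n is the number of commits it exists on.
    diffs_per_file = commits * (commits + 1) // 2
    diff_views = files * diffs_per_file
    return {
        "file_views": file_views,
        "diff_views": diff_views,
        "total": file_views + diff_views,
    }
```

With 300 files and 300 commits this gives 90,000 file views plus 13,545,000 diffs, so about 13.6 million URLs for one small repo, which matches the "millions" above. The diff term grows with the square of the commit count, so it dominates almost immediately.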
If you run a web crawler, you need to add git host detection to actively avoid walking into them.
And yet, it's exactly what all the AI companies are doing. However much it costs them in server costs and good will seems to be worth less to them than the engineering time to special case the major git web UIs.
I doubt they're actually interested in the git repos.
From the shape of the traffic it just looks like a poorly implemented web crawler. By default, a crawler that does not take measures to actively avoid git hosts will get stuck there and spend days trying to exhaust the links of even a single repo.
For me it was specifically crawlers from the large companies, they were at least announcing themselves as such. They did have different patterns, bytedance was relatively well behaved, but some of the less known ones did have weird patterns of looking at comparisons.
I do think they care about repos, and not just the code, but also how it evolves over time. I can see some use, if marginal in those traits. But if they really wanted that, I'd rather they clone my repos, I'd be totally fine with that. But I guess they'd have to deal with state, and they likely don't want to deal with that. Rather just increase my energy bill ;)
Probably has happened at some point, but personally, I have not been hit with/experienced downtime of Codeberg yet. The other day however GitHub was down again. I have not used Gitlab for a while, and when I used it, it worked fine, and its CI seems saner than Github's to me, but Gitlab is not the most snappy user experience either.
Well, Codeberg doesn't have all the features I did use of Gitlab, but for my own projects I don't really need them either.
How do people even on hacker news of all places conflate git with a code hosting platform all the time? Codeberg, GitHub or whatever are for tracking issues, running CI, hosting builds, and much more.
The idea that you shouldn't need a code hosting platform because git is decentralized is so out of place that it is genuinely puzzling how often it pops up.
They said they want to be able to rely on their git remote.
The people responding are saying "nah, an unreliable remote is fine because you can use other remotes" which doesn't address their problem. If Codeberg is unreliable, then why use it at all? Especially for CI, issues, and collab?
The person you’re replying to is saying that you can do everything outside of tracking issues, running CI, ... without a remote. Like all Git operations that are not about collaboration. (but there is always email)
Maybe a hard blocker if you are pair programming or collaborating every minute. Not really if you just have one hour to program solo.
The original intent of the authors is by now irrelevant. The current "point" of git is that it's the most used version control solution, with good tooling support from third parties. Nothing more. And most people prefer to use it in a centralised fashion.
That doesn't remove the fact that when people are working on the code, their local copy doesn't disappear after they pushed their commits and a local copy is still available.
Only exception is when people are using the code editor embedded in the "forge" but this is usually an exceptional use rather than the norm.
> That doesn't remove the fact that when people are working on the code, their local copy doesn't disappear after they pushed their commits and a local copy is still available.
It doesn't remove it but doesn't make it very relevant either, because of all the tests that are necessarily done remotely and can't be done locally, and without that feedback in many cases development is not possible.
> for me as a developer is having the urge to code and not being able to access my remote
I think that's the moment when you choose to self host your whatever git wrapper. It really isn't that complicated to do and even allows for some fun (as in cheap and productive) setups where your forge is on your local network or really close to your region and you (maybe) only mirror or backup to a bigger system like Codeberg/GitHub.
In our case, we also use that as an opportunity to mirror OCI/package repositories for dependencies we use in our apps and during development so not only builds are faster but also we don't abuse free web endpoints with our CI/CD requests.
I agree. I switched to Codeberg but switched back after a few months. Funny enough, I found there to be more unreported downtime on Codeberg than GitHub.
That is what we have been doing for quite some time now, from what I gathered. Every time I see something becoming popular, I am like "Hmm, I've seen this before", and I really have. They just gave it a fancier name with a fancier logo and did some marketing and there you go, old is new.
I have published 4 open source projects thanks to the productivity boost from AI. No apps though, just things I needed in my line of work.
But I have been absolutely flooded with trailers for new and upcoming indie games. And at least one indie developer has admitted that certain parts of their game used the aid of AI.
I also noticed sometimes when I think of writing something, I ask AI first if it exists, and AI throws up some link and when I check the link it says "made with <some AI>".
So I'm not sure what author is trying to say here but I definitely feel like I am noticing a rise in software output due to AI.
But with that said, I also am noticing the burden of taking care of those open source projects. Sometimes it feels like I took on a 2nd job.
I think a lot of software is being produced with AI and going unnoticed, they don't all end up on the front page of HN for harassing developers.
I got a SARS virus flying to Udon Thani in 2019. We were seated next to two thai guys who were so sick they could barely sit up straight. We offered them help and treats because they looked like they were about to vomit.
Plane lands, next day I'm sick. I was laid up for 2 weeks with fever, the shits, and I had a weird spontaneous cough for over 1 month after I got better.
I bet most of that plane got sick, and it was so damn avoidable.
The problem is there can be huge penalties for not flying when you booked. You might not be able to rebook your flight or hotel or days off so you're stuck either getting everyone sick or perhaps being out thousands of dollars or not going on vacation at all.
This is what I do. I'd rather have a linux machine with a webUI on top like this than a full blown proxmox/truenas/unraid set up (for now). I never expose my NAS to the internet, other than wireguard/tailscale, so an admin console on a port never really bothered me.
To keep it simple, just install Fedora or RHEL and you have a NAS already.
I'm not saying one is better than the other, just that there is now finally an appeal in Cockpit to be used as a NAS. I've been following its development for almost 10 years.
I'm always skeptical of new tech. I don't like how AI companies have reserved all memory circuits for X years, that is definitely going to cause problems in society when regular health care sector businesses can't scale or repair their infra, and the environmental impact is also a discussion that I am not qualified to get into.
All I can say for sure is that it is absolutely useful, it has improved my quality of life without a doubt. I stick to the principle that it's here to improve my work life balance, not increase output for our owners.
And that it has done, so far. I can accomplish things that would have taken me weeks of stressful and hyperfocused work in just hours.
I use it very carefully, and sparingly, as a helpful tool in my toolbox. I do not let it run every command and look into every system, just focused efforts to generate large amounts of boilerplate code that would require me to have a lot of docs open if I were to do it myself.
I definitely don't let it read or write my e-mails, or write any text. Because I always loved writing, and will never stop loving it.
It's here to stay, because I'm not alone in feeling this way about it. So the staunch AI-deniers are just wasting their time. Just like any other tech, it's going to be used against humans, against the already oppressed.
I definitely recognize that the tech has made some people lose their minds. Managers and product owners are now vibe coding thinking they can replace all their developers. But their code base will rot faster than they think.
I started treating long random bucketnames as secrets years ago. Ever since I noticed hackers were discovering buckets online with secrets and healthcare info.
I just started using hashes for names. The deployment tooling knows the "real" name. The actual deployment hash registers a salt+hash of that name to produce a pseudo-random string name.
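A minimal sketch of that idea, with assumptions called out: the comment above doesn't say which construction is used, so this swaps in HMAC-SHA256 as one reasonable way to combine a secret salt with the logical name. The function name `bucket_name`, the `b-` prefix, and the truncation to 40 hex characters are all hypothetical choices, picked to keep the result lowercase and well under S3's 63-character bucket name limit.

```python
import hashlib
import hmac


def bucket_name(logical_name: str, salt: bytes, prefix: str = "b") -> str:
    """Derive a stable, hard-to-guess bucket name from a human-readable one."""
    # Keyed hash: without the salt, the real name can't be checked against the output.
    digest = hmac.new(salt, logical_name.encode("utf-8"), hashlib.sha256).hexdigest()
    # Hex digits are already lowercase; truncate to stay within bucket name limits.
    return f"{prefix}-{digest[:40]}"
```

The deployment tooling keeps the logical name (say `patient-exports`) and the salt, and calls `bucket_name` whenever it needs the real one, so the same inputs always map to the same bucket. As discussed further down the thread, this only helps as long as the derived name itself doesn't end up somewhere observable, such as DNS queries.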
~As far as I know, bucket names are public via certificate transparency logs.~ There are tools for collecting those names. Besides you'd leak the subdomain to (typically) unencrypted DNS when you do a lookup and maybe via SNI.
> Besides you'd leak the subdomain to (typically) unencrypted DNS when you do a lookup and maybe via SNI.
"Leak" is maybe a bit over-exaggerated, although if someone MitM'd you they'd definitely be able to see it. But "leak" makes it seem like it's broadcasted somehow, which obviously it isn't.
You'd need to check the privacy policy of your DNS provider to know if they share the data with anyone else. I've commonly seen source IP address considered PII, but not the content of the query. Cloudflare's DNS, for example, shares queries with APNIC for research purposes. https://developers.cloudflare.com/1.1.1.1/privacy/public-dns... Other providers share much more broadly.
> No man-in-the-middle is needed [...] Check out passive DNS
How does one execute this "passive DNS" without quite literally being on the receiving end, or at least sitting in-between the sending and receiving end? You're quite literally describing what I'm saying, which makes it less of a "leak" and more like "others might collect your data, even your ISP", which I'd say would be more accurate than "your DNS leaks".
There's a lot of online documentation about passive DNS. Here's one example
> Passive DNS is a historical database of how domains have resolved to IP addresses over time, collected from recursive DNS servers around the world. It has been an industry-standard tool for more than a decade.
> Spamhaus’ Passive DNS cluster handles more than 200 million DNS records per hour and stores hundreds of billions of records per month, providing you with access to a vast lake of threat intelligence data.
> collected from recursive DNS servers around the world
Yes, of course, because those DNS servers are literally receiving the queries, eg "receiving the data".
Again, there is nothing "leaking" here, that's like saying you leak what HTTP path you're requesting to a server, when you're sending a HTTP request to that server. Of course, that's how the protocol works!
Putting a secret subdomain in a DNS query shares it with the recursive resolver, whose privacy policy may permit them to share it with others. This is a common practice and attackers have access to the aggregated datasets. You are correct that third-party web servers or CDNs could share your HTTP path, but I am not aware of any examples and most privacy policies should prohibit them from doing so. If your web server provider or CDN does this, change providers. DNS recursive resolvers are chosen client side, so you can't always choose which one handles the query. Even privacy-focused DNS recursive resolvers share anonymized query data. They remove the source IP address, since it's PII, but still "leak" the secret subdomain.
Any time you send secret data such that it travels to an attacker visible dataset it is vulnerable to attack. I call that a leak but we can use a different term.
This is all well and good on the IaC side, yes. But at the end of the day, buckets are also user facing resources, and nobody likes random directory / bucket names.
That's a contradiction, a bucket name being treated as a secret in IaC, while being a user facing resource. So no, they're not user facing resources.
If anyone wants them to be user facing resources, then treat them as such, and ensure they're secure, and don't store sensitive info on them. Otherwise, put a service in front of them, and have the user go through it.
The S3 protocol was meant to make the lives of programmers easier, not end users.
It would be nice if the other end of this could be addressed: a configurable policy to limit resolution of bucket names within an account namespace. Ideally, if someone doesn’t have permission to resolve a bucket name, they shouldn’t even be able to detect whether it exists.
It's a fine balancing act between getting the latest updates and avoiding supply chain attacks.
I completely understand the author here, because I'm actually also leaning more towards avoiding supply chain attacks than jumping on the latest CVEs.
It's just a gut feeling, rooted in 25 years of experience as a sysadmin, but I feel like a supply chain attack can do a lot more damage in general than most unpatched known vulnerabilities.
Just based on my own personal experiences, no real data.
I'll try to put words to it: a supply chain attack is more focused, with a higher chance of infiltration, while a CVE is very rarely exploited en masse, and exploitation often comes with many caveats.
That combined with the current state of the world, where supply chain attacks seem to be a very high profile target for state actors.
and those rare zero-days can be treated as the exception, and dealt with quickly. It seems backwards to optimize for dependency change reaction time these days with the supply chain such an attractive target.
We first submitted the article to the CACM a while ago. The review process takes some time and "Twelve years of Docker containers" didn't have quite the same vibe.