We apologize for a period of extreme slowness today. The army of AI crawlers just leveled up and hit us very badly.
The good news: We're keeping up with the additional load of new users moving to Codeberg. Welcome aboard, we're happy to have you here. After adjusting the AI crawler protections, performance significantly improved again.
Codeberg
in reply to Codeberg
We have a list of explicitly blocked IP ranges. However, due to a configuration oversight on our part, these ranges were only blocked on the "normal" routes; the Anubis-protected routes didn't apply the blocklist at all. That was not a problem as long as Anubis itself kept the crawlers away from those routes.
However, now that they have managed to break through Anubis, there was nothing left to stop these armies.
It took us a while to identify and fix the config issue, but we're safe again (for now).
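For illustration only (this is not Codeberg's actual configuration; the hostnames, ports and file paths are made up): the class of fix described above amounts to declaring the IP blocklist once at a scope that covers every route, including the locations proxied to Anubis, instead of only on the "normal" locations.
events {}
http {
    include /etc/nginx/blocked-ranges.conf;   # one "deny 203.0.113.0/24;" per line, applies to all locations below

    upstream anubis  { server 127.0.0.1:8923; }   # hypothetical challenge proxy
    upstream backend { server 127.0.0.1:3000; }   # hypothetical application backend

    server {
        listen 80;
        server_name git.example.org;

        location / {
            proxy_pass http://anubis;    # challenge-protected routes
        }
        location /api/ {
            proxy_pass http://backend;   # "normal" routes, where the blocklist already applied
        }
    }
}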
Codeberg
in reply to Codeberg
in reply to Codeberg • • •For the load average auction, we offer these numbers from one of our physical servers. Who can offer more?
(It was not the "wildest" moment, but the only for which we have a screenshot)
Kevin
in reply to Codeberg
in reply to Codeberg • • •ouch. This remains a cat-and-mouse game.
At least having them solve the Anubis challenge does cost them extra resources, but if they can do that at scale, it doesn't promise a lot of good.
sam
in reply to Codeberg
Sponsor @Xe on GitHub Sponsors (GitHub)
Hakan Bayındır
in reply to Codeberg
This is a great number, but I have seen higher in my career. Unfortunately, I either have no screenshots or have lost the ones I had.
5831.24 is pretty good, though. Congrats on hitting it; hope your head doesn't hurt.
Codeberg
in reply to lindesbs #FckAFD
meta/hardware/achtermann.md at main (Codeberg.org)
Lenny
in reply to Codeberg
in reply to Codeberg • • •It's easy to get them (e.g. from projectdiscovery)
GNU/翠星石
in reply to Codeberg
in reply to Codeberg • • •>now that they managed to break through Anubis
There was no break - it's a simple matter of changing the useragent, or if for some reason there's still a challenge, simply utilizing the plentiful computing power that is available on their servers (which far outstrips the processing power mobile devices have).
Anubis is evil and is proprietary malware - please do not attack your users with proprietary malware.
If you want to stop scraper bots, start serving GNUzip bombs - you can't scrape when your server RAM is full.
dd if=/dev/zero bs=1G count=10 | gzip > /tmp/10GiB.gz
dd if=/dev/zero bs=1G count=100 | gzip > /tmp/100GiB.gz
dd if=/dev/zero bs=1G count=1025 | gzip > /tmp/1TiB.gz
# nginx: serve gzip bombs
location ~* /bombs-path/.*\.gz {
    add_header Content-Encoding "gzip";
    default_type "text/html";
}
# serve zstd bombs
location ~* /bombs-path/.*\.zst {
    add_header Content-Encoding "zstd";
    default_type "text/html";
}
Then it's a matter of bait links that the user won't see, but bots will.
Codeberg
in reply to GNU/翠星石
@Suiseiseki Anubis is the option that saved us a lot of work over the past months. We are not happy about it being open core or using GitHub Sponsors, but we acknowledge the maintainer's position: codeberg.org/forgejo/discussio…
Calling our usage of Anubis an attack on our users is far-fetched. But feel free to move elsewhere, or to host an alternative without resorting to extreme measures. We're happy to see working proof that any other protection can be scaled up to Codeberg's level. ~f
Anubis - using proof-of-work to stop AI crawlers (Codeberg.org)
Codeberg
in reply to Codeberg
@Suiseiseki BTW, we're also actively following the work around iocaine, e.g. come-from.mad-scientist.club/@…
However, as far as we can see, it does not sufficiently protect against crawling. Since the bot armies successfully spread over many servers and addresses, damaging one of them doesn't prevent the next one from making harmful requests, unfortunately. ~f
Pluto
in reply to Codeberg
I believe @Suiseiseki is not referring to Codeberg's usage of Anubis specifically, but rather shares the FSF's stance (which I don't share) that Anubis "acts like malware" for making "calculations that a user does not want done": fsf.org/blogs/sysadmin/our-sma…
fsf saying fsf things
glenngillen
in reply to Codeberg
@Suiseiseki@freesoftwareextremist.com "We are not happy about it being open core … GH sponsors"
Do you have better suggestions for how we can have a sustainable OSS model that isn't entirely dependent on core contributors of major projects having full-time jobs and then supporting everyone else in whatever free time they might have?
Bradley Kuhn
in reply to Codeberg
I have a follow-up question, though, @Codeberg, re: @zacchiro's question. Is it *possible* that giant human farms of Anubis challenge-solvers actually did it? Or did it all happen so fast that there is no way it could be that?
#Huawei surely could fund such a farm and the routing software needed to get the challenge to the human and back to the bot quickly enough that it might *seem* the bot did it.
Codeberg
in reply to Bradley Kuhn
@bkuhn
Anubis challenges are not solved by humans. It's not like a captcha. It's a challenge that the browser computes, based on the assumption that crawlers don't run real browsers for performance reasons and only implement simpler crawlers.
So at least one crawler now seems to emulate enough browser behaviour to pass the Anubis challenge. ~f
@zacchiro
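To illustrate the kind of proof-of-work scheme described above (a toy sketch only; Anubis' real algorithm, parameters and difficulty are different, and this is not its code): the server hands out a random challenge string, the client burns CPU searching for a nonce whose hash has a required prefix, and the server verifies the result with a single hash.
challenge="random-string-from-server"   # assumption: issued per visitor
nonce=0
# client-side "heavy" work: find a nonce so that sha256("challenge:nonce")
# starts with four zero hex digits
while true; do
  hash=$(printf '%s:%s' "$challenge" "$nonce" | sha256sum | cut -d' ' -f1)
  case "$hash" in 0000*) break ;; esac
  nonce=$((nonce + 1))
done
echo "client submits nonce=$nonce hash=$hash"
# server-side "light" work: recompute one hash to check the submitted nonce
check=$(printf '%s:%s' "$challenge" "$nonce" | sha256sum | cut -d' ' -f1)
case "$check" in 0000*) echo "challenge passed" ;; esac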
Bradley Kuhn
in reply to Codeberg
I get it now.
Thanks for taking the time to clue me in.
I'm lucky that I haven't needed to learn about this until now and I'm so sorry you've had to do all this work to fight this LLM training DDoS!
Cc: @zacchiro
NerdNextDoor
in reply to Codeberg
in reply to Codeberg • • •Good luck with fighting the bots. I recently moved my OSDev project and site to Codeberg from GitHub and so far itβs been great!
Thank you for helping the open-source community!
Woozle Hypertwin
in reply to Codeberg
Now what needs to happen is that part of the challenge computes a known answer while the other part does useful computational work, and there's no way for the 'bot to tell which is which -- so it has to do both.
That could maybe contribute computing power to something important like Folding@Home, or even just something pretty like Electric Sheep.
Codeberg
in reply to Woozle Hypertwin
@woozle This topic was discussed in the past. The problem is that cutting useful work into small chunks AND verifying it is very difficult. It might work for some cryptocurrencies, but that's nothing we're interested in.
A proof of concept is more than welcome, but I don't yet know if anyone found a suitable task for this.
~f
Woozle Hypertwin
in reply to Woozle Hypertwin
(on further thought) ...or is it?
Might that work? I guess there could be problems with trustability of the "unknown" answers -- does that look like the main issue to be solved?
Codeberg
in reply to Woozle Hypertwin
@woozle Remember that users want to get through the challenge page quickly. So the more samples you have, the simpler the individual problems need to be.
~f
argv minus one
in reply to Codeberg
in reply to Codeberg • • •These companies are evidently willing to pay an absolutely staggering cost to do their scraping.
I wonder, are they paying with their own money, or are they "borrowing" some unsuspecting strangers' compromised computers/routers/etc. to do the work?
ozamidas
in reply to Codeberg
Boy, Huawei is so nasty.
I wonder who the biggest offenders are in this matter...
Aleksandra Fedorova
in reply to Codeberg
"AI crawlers learned how to solve the Anubis challenges"
Why is the EU discussing chat control and not AI crawler control, again?
ptrace
in reply to Codeberg
eBPF could be more effective and easier on the CPU, since it acts on a much lower network layer. Anubis has its limits and is way too easy to circumvent (as you found out).
Maybe it's worth considering eBPF (if that hasn't already happened).
And thanks, folks, for your work. I'm a proud supporter and I'll continue to support your work. Companies shouldn't control the open-source space.
ulveon.net
in reply to Codeberg
Anubis is extremely easy to bypass: you just have to change the User-Agent to not contain "Mozilla". Please get proper bot protection.
ulveon.net/p/2025-08-09-vangua…
This post talks briefly about other alternatives. Try Berghain, Balooproxy, or go-away.
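As an illustration of the bypass being claimed here (hypothetical host; assumes a default-style Anubis policy that only challenges browser-like user agents containing "Mozilla"):
# a client that never claims to be "Mozilla" may not be routed to the challenge page at all
curl -A 'examplebot/1.0' https://git.example.org/some/repo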
varx/tech
in reply to Codeberg
in reply to Codeberg • • •Have you looked into serving these LLM crawlers alternative versions of the site, with poisoned data? (And rate-limiting, of course.) I know it would be additional work for you to implement this, but... it might be effective.
I'm thinking you could have a precomputed set of 1000 different poison repos that get served up randomly, each of which is a Markov-chain-scrambled version of the files in a real repo.
(I wrote codeberg.org/timmc/marko to do something similar with the contents of my blog posts: a Markov model on either characters or words.)
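The scrambling idea could look roughly like this (an illustrative toy, not timmc's marko tool; input.txt stands in for a file from a real repo): build word-successor lists from the input, then walk them randomly.
awk '
  { for (i = 1; i <= NF; i++) words[n++] = $i }
  END {
    srand()
    # successor list per word: succ[w] = " w1 w2 ..." for every word that followed w
    for (i = 0; i < n - 1; i++) succ[words[i]] = succ[words[i]] " " words[i+1]
    w = words[int(rand() * n)]
    for (i = 0; i < 200; i++) {        # emit ~200 scrambled words
      printf "%s ", w
      m = split(succ[w], cand, " ")
      if (m == 0) { w = words[int(rand() * n)]; continue }   # dead end: restart
      w = cand[int(rand() * m) + 1]
    }
    print ""
  }' input.txt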
Bradley Kuhn
in reply to Codeberg
Re: what's happened to @Codeberg today.
The AI ballyhoo *is* a real DDoS against one of the few code-hosting sites that takes a stand against slurping #FOSS code into LLM training sets, in violation of #copyleft.
Deregulation/lack-of-regulation will bring more of this. There's plenty of blame to go around, but #Microsoft & #GitHub deserve the bulk of it; they trailblazed the idea that FOSS code-hosting sites are lucrative targets.
giveupgithub.org
#GiveUpGitHub #FreeSoftware #OpenSource
Give Up GitHub - Software Freedom Conservancy (giveupgithub.org)
serk
in reply to Bradley Kuhn
@bkuhn if anyone needs it, there is this gist showing how to pseudo-automate repository bulk deletion:
gist.github.com/mrkpatchaa/637…
and this tool, reporemover.xyz, is very handy.
Bradley Kuhn
in reply to serk
IMO, @serk, the better move is not to delete the repository, but to do something like I've done here with my personal "small hacks" repository:
github.com/bkuhn/small-hacks
I'm going to try to make a short video of how to do this, step by step. The main thing is that rather than 404'ing, the repository now spreads the message that we should #GiveUpGitHub!
Brett Sheffield (he/him)
in reply to Bradley Kuhn
@bkuhn @serk When @librecast moved our repos I wrote a script to wipe the GitHub repo and replace it with the #GiveUpGitHub README:
codeberg.org/librecast/giveupg…
giveupgithub.sh
Codeberg
in reply to an unknown parent
@gturri Anubis sends a challenge. The browser needs to compute the answer with "heavy" work. The server then only has "light" work to verify it.
As far as we can tell, the crawlers actually do the computation and send the correct response. ~f
Daniel Lakeland
in reply to Codeberg
Spammy sources could be those that open new connections too often, transfer too many bytes, or have too many open active connections. All of those things can be accounted for in nftables.
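A rough sketch of that kind of accounting (hypothetical thresholds and names, IPv4 only; not anything Codeberg has said it runs): an nftables dynamic set that collects sources opening new HTTPS connections too quickly and drops their traffic for ten minutes.
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0; policy accept; }'
nft add set inet filter ratelimited '{ type ipv4_addr; flags dynamic, timeout; timeout 10m; }'
# addresses opening more than 30 new connections per minute to port 443 land in the set...
nft add rule inet filter input tcp dport 443 ct state new add @ratelimited '{ ip saddr limit rate over 30/minute }'
# ...and traffic from addresses in the set is dropped
nft add rule inet filter input ip saddr @ratelimited drop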