



Is Meta Scraping the Fediverse for AI?


Building on initial reports from the FediPact account and Dropsite News, we dive into potential measures admins can take to protect their instances.




A new report from Dropsite News claims that Meta has been scraping a large number of independent sites for content to train its AI. Worse, this scraping operation appears to completely disregard robots.txt, a control file used to tell crawlers, search engines, and bots which parts of a site they may access and which parts they should avoid. It’s worth mentioning that the efficacy of robots.txt depends on the consuming software honoring it, and not every piece of software does.
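To make the mechanism concrete, here is a minimal Python sketch of how a crawler that does honor robots.txt would interpret a file that blocks AI crawlers, using the standard library’s urllib.robotparser. The user-agent tokens (meta-externalagent, GPTBot) and the example.social domain are illustrative assumptions; check each vendor’s documentation for the tokens its crawlers currently use. Nothing here enforces anything: a crawler that never reads robots.txt is unaffected.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt; the AI crawler tokens are assumptions to verify
# against each vendor's current documentation.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler calls can_fetch() before every request.
for agent, url in [
    ("meta-externalagent", "https://example.social/@alice"),
    ("GPTBot", "https://example.social/"),
    ("Mozilla", "https://example.social/@alice"),
    ("Mozilla", "https://example.social/admin/settings"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

One detail worth knowing: CPython’s parser only looks at the product token before the first slash of the user-agent string, so a browser-style string that mentions GPTBot further along falls through to the * group.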

Meta Denies All Wrongdoing


Andy Stone, a communications representative for Meta, has gone on record claiming that the list is bogus and the story is incorrect. Unfortunately, Dropsite’s story has had relatively limited reach, and there haven’t been any other public statements about the list at this time. This makes it difficult to adequately assess the initial story, but the concept is nevertheless a wake-up call.

However, it’s worth acknowledging Meta’s ongoing efforts to scrape data from many different sources. This includes user data, vast amounts of published books, and independent websites not part of Meta’s sprawling online infrastructure. Given that the Fediverse is very much a public network, it’s not surprising to see instances getting caught in Meta’s net.

Purportedly Affected Instances


The FediPact account has dug into the leaked PDF, and a considerable number of Fediverse instances appear on the list. The document itself is 1,659 pages of URLs, so we filtered it for matches based on platform keywords. Please keep in mind that these counts only cover sites that use a platform’s name in their domain:

  • Mastodon: 46 matches
  • Lemmy: 6 matches
  • PeerTube: 46 matches

There are likely considerably more unique domain matches in the list for a variety of platforms. Admins are advised to review whether their own instances are documented there. Even if your instance’s domain isn’t on the list, consider whether your instance is federating with something on the list. Due to the way federation works, cached copies of posts from other parts of the network can still show up on an instance that’s been crawled.
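For admins who want to repeat this kind of check, here is a minimal Python sketch under a few assumptions: the URLs have already been extracted from the PDF as plain text, one per line (PDF text extraction is left out), and the domains you care about are your own instance plus its federation peers. On Mastodon, the peers list may be available from the /api/v1/instance/peers endpoint, unless the admin has restricted it. The function names and sample data are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Platform keywords to look for in hostnames (same approach as the counts above).
PLATFORM_KEYWORDS = ("mastodon", "lemmy", "peertube")


def domains_from_urls(urls):
    """Return the set of unique lowercase hostnames in an iterable of URLs."""
    domains = set()
    for raw in urls:
        url = raw.strip()
        if not url:
            continue
        if "://" not in url:
            url = "https://" + url  # treat bare domains/paths as HTTPS URLs
        host = urlsplit(url).hostname  # urlsplit already lowercases the hostname
        if host:
            domains.add(host)
    return domains


def count_keyword_matches(domains, keywords=PLATFORM_KEYWORDS):
    """Count unique domains whose hostname contains each platform keyword."""
    return Counter(kw for domain in domains for kw in keywords if kw in domain)


def flagged_domains(listed_domains, own_and_peers):
    """Return domains from your instance or its peers that appear on the list."""
    return sorted(listed_domains & {d.strip().lower() for d in own_and_peers})


# Hypothetical sample data; real input would be the extracted URL list.
leaked_urls = [
    "https://mastodon.example/@alice/1",
    "https://mastodon.example/about",
    "https://tube.peertube-demo.example/w/abc",
    "lemmy.example.org/c/tech",
    "https://news.example.com/story",
]
listed = domains_from_urls(leaked_urls)
print(count_keyword_matches(listed))
print(flagged_domains(listed, ["my.instance.example", "Lemmy.example.org"]))
```

Keyword matching on hostnames undercounts, since many instances don’t use a platform name in their domain; the intersection with your own domain and peers is the more useful check for a specific server.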

Access the Leaked List


We are mirroring this document for posterity, in case the original article is taken offline.

Download (PDF)

Protective Measures to Take


Regardless of the accuracy of the Dropsite News article, there’s an open question as to what admins can do to protect their instances from being scraped. Due to the nature of the situation, there is likely no single silver bullet, but there are a few different measures that admins can take:

  • Establish Community Terms of Service: Write a Terms of Service for your instance that explicitly calls out scraping for data collection and LLM training. While it may have little to no effect on Meta’s own scraping efforts, it at least establishes a precedent and a paper trail for your server community’s expectations and consent.
  • Request Data Removal: Meta has a form buried within the Facebook Privacy Center that could be used to submit a formal complaint about instance data and posts being part of its AI training data. Whether Meta acts on it is debatable, but it’s nevertheless an option.
  • (EU-Only) Send a GDPR Form: Similar to the above, but try to get the request in front of Meta’s GDPR representatives, who have to deal with compliance.
  • Establish Blocking Measures Anyway: Even though private companies can choose to disregard robots.txt and HTTP headers such as X-Robots-Tag: noindex, these measures still reduce your site’s exposure to the AI agents that do honor them.
  • Set Up a Firewall: One popular software package seeing a lot of recent adoption for blocking AI traffic is Anubis, which has configurable policies you can adjust to handle different kinds of traffic.
  • Use Zip Bombs: When all else fails, take matters into your own hands. On the server side, use an Nginx or Apache configuration to detect specific user agents associated with AI crawlers, and serve them small compressed archives that expand enormously when decompressed, slowing them down.
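The blocking and zip-bomb ideas above can be sketched in Python as a framework-agnostic request hook. This is a minimal illustration, not an Nginx or Apache configuration: the user-agent markers, the 10 MiB payload size, and the function names are assumptions. It adds X-Robots-Tag: noindex to every response and, for self-identified AI crawlers, returns a gzip body made of zero bytes, which compresses at roughly 1000:1, so the client must inflate far more data than the server sends. User-agent strings are trivially spoofed, so this only affects crawlers that identify themselves honestly.

```python
import gzip
from functools import lru_cache

# Example user-agent substrings for AI crawlers (assumed; verify against vendor docs).
AI_UA_MARKERS = ("meta-externalagent", "gptbot", "ccbot", "claudebot")

BOMB_SIZE = 10 * 1024 * 1024  # uncompressed size in bytes (assumption; tune as needed)


def is_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against the marker list."""
    ua = user_agent.lower()
    return any(marker in ua for marker in AI_UA_MARKERS)


@lru_cache(maxsize=4)
def build_gzip_bomb(uncompressed_size: int) -> bytes:
    """Gzip a run of zero bytes; cached so the payload is built once per size."""
    return gzip.compress(b"\0" * uncompressed_size, compresslevel=9)


def respond(user_agent: str) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a request with the given User-Agent."""
    headers = {"X-Robots-Tag": "noindex", "Content-Type": "text/html"}
    if is_ai_crawler(user_agent):
        headers["Content-Encoding"] = "gzip"  # client must inflate the full payload
        return 200, headers, build_gzip_bomb(BOMB_SIZE)
    return 200, headers, b"<html><body>Regular page</body></html>"


status, headers, body = respond("Mozilla/5.0 (compatible; meta-externalagent/1.1)")
print(status, headers.get("Content-Encoding"), len(body), "bytes sent")
```

In practice the same logic would live in the web server configuration the bullet mentions, serving a pre-generated file rather than building one in application code.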

In reality, fighting AI scraping is still a relatively new problem, complicated by a lack of clear regulation and by companies deciding to do whatever they want. The best we can do for our communities is to adopt protective measures and stay informed about new developments in the space.



in reply to Sean Tilley

I apologize if this is a stupid question, but if I have my posts set to followers-only, they can’t scrape them, right?
in reply to bluejayway

Probably not, but the tradeoff is that you're limiting audience reach. Occasionally, this can also break context in public conversations, where someone might follow someone else who responds to you, but can't see your original post.



Firefox Has Moved to Firefox.com


cross-posted from: programming.dev/post/36435575

::: spoiler Comments
- Hackernews.
:::




in reply to fne8w2ah

I really don't care about people not being sober as long as they can function correctly.

Everything else is just strong arming people into the awful state of being conscious in this horrible reality.

in reply to x00z

I really don’t care about people not being sober as long as they can function correctly.


Regardless of the rest you wrote that I disagree with, she crashed her car on the way to the flight that she was removed from, blaming the steering. Not sure how that would qualify as "function correctly"

in reply to Laser

That's true. I read over it.

But in general I think we should stop judging people. Except the ones that crash their car of course.

in reply to x00z

It's fine until it isn't. It's not an acceptable risk to others' health to allow people to go around drinking and driving when they think it doesn't affect their judgement.


Brazil's top court rules US laws do not apply to its territory


Brazil's Supreme Court ruled on Monday that foreign legislation did not have jurisdiction in its country, after the United States used a law to sanction a judge on the court.

in reply to zero

Normalizing while the Zionists bomb their country.
in reply to zero

It was always the plan. Overthrow one dictator to replace him with ISIS-in-suits. Then capitulate to the colonizers for the promise of crumbs.
That’ll show ‘em.

in reply to Zephorah

We could have wiped them off the face of the Earth if we had pressed the attack with DDT for one more decade. Look how we did in America: wiped malaria out in 1951. (Some of that was infrastructure improvements!)

Rest of the world was happily on their way and in 1972 the US said, "Fuck you, got my problem solved, banned." Of course they couldn't ban it in other countries but the US said, "No ban, no trade.", which is a de facto ban.

One more decade and a concerted push could have eradicated mosquitoes. Then we could have banned it forever. That stupid bitch Rachel Carson and her book Silent Spring racked up a malarial death count to rival Hitler.




Protester arrested over ‘Plasticine Action’ T-shirt: ‘How ridiculous is this?’


It was only after Miles Pickering arrived at Scotland Yard following his arrest that the police realised they had got things embarrassingly wrong.

The T-shirt worn by the Brighton engineer did not express support for a proscribed terrorist group, instead the words on it read “Plasticine Action” and inside the letter “o” was an image of the stop-motion character Morph giving two thumbs up.

Speaking to the Guardian, Pickering admitted it was designed to be an easy mistake to make, appearing to look like the logo of Palestine Action, the protest group banned under terrorism legislation last month, but text underneath the logo reads: “We oppose AI-generated animation.”

in reply to snugglesthefalse

It's really not about the 'where'; it's the general thought((less)(ness)). We should not be even slightly relieved at other people doing stupid shit; we should, as people, take a stand against stupid shit.

I think we should unite against stupid shit being laid upon us by greedy people rather than laugh at each others' misery.


in reply to null

Hmm. This person seems sad and very anti-American, but not much of a shitposter, which surprises me. There have been Hexbear types that include ISIS in the set of Western adversaries that are actually the good guys.


Woman arrested in Bali over cocaine allegedly smuggled in sex toy, could face death penalty if convicted


The officers allegedly found 3.1 pounds of cocaine inside a sex toy hidden in her genitals and in her underwear. Police also accused her of smuggling dozens of ecstasy pills.



in reply to Davriellelouna

Frankly, I wish we would just abolish the damn monarchy already. I hate that we still have kings and queens in a democratic country, even if they are just ceremonial. They don't deserve all this media attention just because they got a lucky birth.

Treat all of them as one would any other citizen and be done with it. Unfortunately, the monarchy still enjoys popular support, though I hope that changes sooner or later.

in reply to SkyeStarfall

Aren’t they financially compensated by the taxpayers for being in the monarchy?


in reply to schizoidman

Ah yes, "The Porn Loophole" was one of my favorites; I should still have it on a DVD somewhere.
in reply to schizoidman

Well they're going to suddenly see a lot of transexual porn streaming through Antarctica starting.... >click!< NOW.






Apple Revokes EU Distribution Rights for Torrent Client, Developer Left in the Dark * TorrentFreak


While alternative app stores operate independently and are required by EU law, Apple is still in a position to exert some control. This became apparent a few weeks ago, when iTorrent users suddenly ran into trouble when installing the app.


Thought this was an interesting story, since it's pretty analogous to the recent Android situation: third-party app stores are enabled to some extent, but the company retains ultimate censorship power.

in reply to chicken

It's not an alternative if they still have the final say.

It's also not your property if the company can dictate what you run on it. Stop giving these scum your money.

in reply to metaStatic

Yeah, neither Android nor iOS is good. We should all be buying Linux devices, like this: starlabs.systems/pages/starlit…
in reply to SuperDuperKitten

If it's Sailfish OS (Xperias or Jollaphones, updates are paid), apart from apps (hit or miss if it's popular enough, pure miss if it isn't), everything works fine (I guess, I haven't tried it).

If it's anything else, it's still murky.

in reply to Norah (pup/it/she)

It isn't. I don't particularly care for phones, and nobody mentioned phones specifically.

Edit: Though there are plenty of Linux phones, or Linux for Android phones.

Sadly, there are very few Linux tablets, so we thought we'd give an option.

in reply to Norah (pup/it/she)

Android is not inherently a phone OS, it works on tablets too. Fair about iOS, though it's not unusual for folks to refer to the tablet OS as that, or just use it generally.
in reply to Lime Buzz (fae/she)

I feel like you're just being obtuse on purpose. There are many people, myself included, who would use a Linux phone if the OS was there and you could get one with flagship specs. As it stands, you cannot.
in reply to Norah (pup/it/she)

(sighs) No, we aren't. Honestly, we are glad there are Linux phones out there; they are just easier to search for than Linux tablets.
in reply to Lime Buzz (fae/she)

They don't have any in the U.S., unfortunately. They won't sell the radios to access our cell networks to companies that won't do what they want them to do, like locking bootloaders, banning apps they don't like, etc.



Former Silicon Valley CEO (IRL Social Media App) Charged with Fraud and Obstruction of Justice


According to court documents, Abraham Shafi, 38, of Pepeekeo, Hawaii, allegedly committed fraud in connection with Get Together’s 2021 “Series C” funding round, which raised $170 million at a valuation of over $1 billion. In seeking investment, Shafi told potential investors that IRL was spending only $50,000 a month in paid advertising and that user signups “were not incentivized or paid.” However, Shafi had spent millions of dollars on paid advertising in the form of incentive advertising, a form of advertising in which users are provided a reward in a third-party app if they download IRL. In the lead up to Series C, Shafi asked his vendor for a “big burst” of ads for “a few days” to drive more installs of the IRL app. During the Series C process, investors specifically asked about paid advertising, and Shafi falsely responded that “[u]nlike other apps that spend aggressively to acquire new users, we spend very little.” Shafi concealed IRL’s spending on incentive ads by having them invoiced to a third-party firm, ensuring that the nature and amount of the expense did not appear on IRL’s ledger.

Shafi continued to conceal the amount that IRL was spending in incentive ads after the Series C closed, instructing an IRL employee to create false invoices that listed the ad spending as being related to infrastructure, or “infra costs,” and falsely telling his investors that the money spent on incentive ads had instead been used for other forms of advertising. When the SEC opened an investigation into IRL, Shafi restored his cell phone to a previously saved backup, resulting in the deletion of records, and instructed other IRL employees to lie about his involvement in the scheme.

Shafi is charged with wire fraud, securities fraud, and obstruction. If convicted, he faces a maximum penalty of 20 years in prison on each count. A federal judge will determine any sentence after considering the U.S. Sentencing Guidelines and other statutory factors.




Itch.io blocks adult game creator profiles for UK users while keeping some games accessible: British gamers can play the spicy games but can't peek at who makes them.


cross-posted from: programming.dev/post/36433609


  • Itch.io now blocks UK users from viewing NSFW game creator profiles, though some individual adult games remain accessible after age verification.
  • The restrictions started in August and align with the UK's Online Safety Act requiring stricter age checks for adult content.
  • Developers lose UK discoverability while players can't browse creator catalogs, forcing them to rely on direct links shared elsewhere.





A lot of legit websites saying they have a bad certificate


Just what the title says: I tried cs.rin.ru, fitgirlrepack, and a whole lot of other legit sites, but they show a bad certificate. Why? EDIT: The error is SSL_ERROR_BAD_CERT_DOMAIN.
in reply to justinthegeek

Can it be because I updated the certificate of a website I am developing?
in reply to Panda1606

On your local system? In that case yeah, you might have fucked something up, if you for example replaced a root certificate authority or something.
in reply to Panda1606

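SSL_ERROR_BAD_CERT_DOMAIN means the certificate you received doesn't list the domain you requested in its Subject Alternative Name (SAN) entries, which points to incorrect SAN information, proxying, or DNS manipulation.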
You could compare what you see in the browser and what you see via something like:
$ openssl s_client -showcerts -connect cs.rin.ru:443

You could also check the DNS resolution and traceroute to see how you are getting there, to confirm whether DNS is being affected or you are being proxied:
$ dig cs.rin.ru @127.0.0.1 A
$ mtr cs.rin.ru
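For comparison with what the browser reports, here is a minimal Python sketch of the same SAN check using the standard ssl module. It verifies the certificate chain but deliberately turns off the library's own hostname check so a mismatched certificate can still be inspected, then compares the hostname against the certificate's DNS SAN entries with a simplified wildcard rule (a leading *. covers exactly one label). The helper names are hypothetical. If the chain itself doesn't verify (for example, a proxy presenting its own certificate), the handshake fails with ssl.SSLCertVerificationError, which is a clue in itself.

```python
import socket
import ssl


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Fetch the server certificate: chain verified, hostname check skipped."""
    context = ssl.create_default_context()
    context.check_hostname = False  # names are compared manually below
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:  # sends SNI
            return tls.getpeercert()


def san_dns_names(cert: dict) -> list[str]:
    """Extract the DNS entries from a getpeercert()-style subjectAltName tuple."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def hostname_matches(hostname: str, pattern: str) -> bool:
    """Simplified matching: exact name, or '*.' standing in for one leftmost label."""
    hostname = hostname.lower().rstrip(".")
    pattern = pattern.lower().rstrip(".")
    if pattern.startswith("*."):
        first_label, dot, rest = hostname.partition(".")
        return bool(first_label) and bool(dot) and rest == pattern[2:]
    return hostname == pattern


def diagnose(host: str) -> str:
    """Connect to host and report whether its certificate SAN covers the name."""
    names = san_dns_names(fetch_peer_cert(host))
    if any(hostname_matches(host, name) for name in names):
        return f"{host}: certificate SAN matches"
    return f"{host}: SAN mismatch; certificate covers {names}"
```

Running print(diagnose("cs.rin.ru")) on the affected machine and again from a different network makes it easy to see whether the mismatch follows your connection.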



How Trump’s Anti-Environment Crusade Enriches Drug Traffickers






I Am An AI Hater. This is considered rude, but I do not care, because I am a hater.


::: spoiler Comments
- Hackernews
:::
#AII


The Top 100 [Gen AI] Consumer Apps: Google and Grok are catching up to ChatGPT


#AII


OpenAI and Anthropic publish findings from joint safety tests of each other's models, aimed at surfacing blind spots in their internal evaluations


OpenAI.

In early summer 2025, Anthropic and OpenAI agreed to evaluate each other's public models using in-house misalignment-related evaluations. We are now releasing our findings in parallel. The evaluations we chose to run focused on propensities related to sycophancy, whistleblowing, self-preservation, and supporting human misuse, as well as capabilities related to undermining AI safety evaluations and oversight. In our simulated testing settings, with some model-external safeguards disabled, we found OpenAI's o3 and o4-mini reasoning models to be aligned as well or better than our own models overall. However, in the same settings, we saw some examples of concerning behavior in their GPT-4o and GPT-4.1 general-purpose models, especially around misuse. Furthermore, with the exception of o3, all the models we studied, from both developers, struggled to some degree with sycophancy. During the testing period, GPT-5 had not yet been made available.
#AII



Israel launches fresh airstrikes in Damascus countryside in Syria


Israeli warplanes struck several sites near the town of Al-Kiswah in the Damascus countryside in southern Syria on Wednesday evening, local media said, Anadolu reports.


Archived version: archive.is/newest/middleeastmo…


Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.