LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI
Meta has scraped data from the most-trafficked domains on the internet, including news organizations, education platforms, niche forums, personal blogs, and even revenge porn sites, to train its artificial intelligence models, according to a leaked list obtained by Drop Site News.
By scraping data from roughly 6 million unique websites, including 100,000 of the top-ranked domains, Meta has generated millions of pages of content for its AI-training pipeline.
The sites that Meta scrapes consist of copyrighted content, pirated content, and adult videos, some of whose content is potentially illegally obtained or recorded, as well as news and original content from prominent outlets and content publishers.
They include mainstream businesses like Getty Images, Shopify, and Shutterstock, but also extreme pornographic content, including websites advertising explicit sexual content and humiliation porn that exploits teenagers.
The tech giant is sidestepping guardrails that websites use to prevent being scraped, data show, in a move whistleblowers say is unethical and potentially illegal. Murtaza Hussain (Drop Site News)
Finland Tops Nextcloud’s First Digital Sovereignty Index
Nextcloud checks about 50 open-source apps (file storage, groupware, chat/video, notes, project management, and so on). Each tool is weighted the same, and then the category scores are averaged into a single national figure. That design favors a balanced ecosystem over dominance in just one niche.
However, according to Nextcloud, the method favors SMEs and hobbyists (servers hidden behind firewalls, VPNs, or hosted by large enterprises don’t always show up), yet the index still offers a “pretty loud signal” about grassroots tech choices.
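The scoring described above (equal weight per tool, then an average across categories) can be sketched roughly as follows. This is only an illustration, not Nextcloud's actual code: the function names, category names, tool names, and numbers are all hypothetical, and the sketch assumes each tool already has some per-country score.

```python
from statistics import mean


def category_score(tool_scores: dict[str, float]) -> float:
    """Average the tools in one category, giving each tool equal weight."""
    return mean(tool_scores.values())


def national_index(categories: dict[str, dict[str, float]]) -> float:
    """Average the category scores into a single national figure."""
    return mean(category_score(tools) for tools in categories.values())


# Hypothetical numbers, for illustration only.
balanced = {
    "file_storage": {"tool_a": 50.0, "tool_b": 50.0},
    "chat_video": {"tool_c": 50.0},
}
one_niche = {
    "file_storage": {"tool_a": 90.0, "tool_b": 90.0},
    "chat_video": {"tool_c": 0.0},
}

print(national_index(balanced))
print(national_index(one_niche))
```

With these made-up inputs, the balanced country ends up ahead of the one that is strong in a single category, which is the effect the article attributes to the design.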
Nextcloud’s Digital Sovereignty Index ranks countries by self-hosted tech use, with Finland, Germany, and the Netherlands leading the way in digital independence. Bobby Borisov (Linuxiac)
'This Verdict Is a Wake-Up Call:' Jury Trial Finds Meta Breached State Privacy Law in Class Action Against Fertility App | Law.com
A San Francisco federal court jury on Friday found Meta Platforms Inc. violated the California Invasion of Privacy Act in a landmark data privacy class action, which accused the Big Tech giant of illegally mining sensitive sexual and reproductive hea… Kat Black (The Recorder)
Why is WebRTC enabled by default?
In about:config, media.peerconnection.enabled is set to true by default, which, by my understanding and according to tools like ipleak.net, means both VPN and home IP addresses will be exposed during usage on platforms like PeerTube.
Is this an oversight, is my understanding wrong, or is this intentional for some reason? It seems like the opposite of what users expect, particularly given that the WebRTC settings option is hidden in LibreWolf.
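For anyone who wants the preference off regardless of the default, here is a minimal sketch that writes it into a Firefox-style profile's user.js file. The `user_pref(...)` line format and the preference name are the standard ones; the helper names and the profile path are hypothetical, and whether a given LibreWolf build prefers a different override file is not covered here. Note that setting this preference to false disables WebRTC entirely, so video calls in the browser stop working.

```python
from pathlib import Path

PREF_NAME = "media.peerconnection.enabled"


def pref_line(name: str, value: bool) -> str:
    """Format one boolean preference in user.js syntax."""
    return f'user_pref("{name}", {"true" if value else "false"});'


def set_pref(profile_dir: Path, name: str, value: bool) -> Path:
    """Write the preference into <profile_dir>/user.js, replacing any earlier line for it."""
    user_js = profile_dir / "user.js"
    existing = user_js.read_text(encoding="utf-8").splitlines() if user_js.exists() else []
    # Drop older lines that set the same preference, then append the new one.
    kept = [line for line in existing if f'"{name}"' not in line]
    kept.append(pref_line(name, value))
    user_js.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return user_js


# Hypothetical usage; the profile directory below is a placeholder, not a real path:
# set_pref(Path("/path/to/your/profile"), PREF_NAME, False)
print(pref_line(PREF_NAME, False))
```

The browser reads user.js at startup, so the change applies after a restart.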
AI industry horrified to face largest copyright class action ever certified
Copyright class actions could financially ruin AI industry, trade groups say. Ashley Belanger (Ars Technica)
The foreign governments warning citizens about the dangers of visiting crime-ridden Britain
Australia, France, Canada and even Mexico are advising their citizens to exercise caution when travelling to the UK. Natasha Leake (The Telegraph)
From YouTube to boob tube: How the Kremlin’s slow-motion YouTube block pushed Russians back into the arms of television
From YouTube to boob tube
How the Kremlin’s slow-motion YouTube block pushed Russians back into the arms of television (Meduza)
Wildfires force Turkey to shut Dardanelles Strait to shipping
The Dardanelles Strait serves as a key route for commercial shipping between Europe and Asia. Jaroslav Lukiv (BBC News)
Spain ombudsman probes town's ban on Muslim celebrations
Jumilla has banned religious events in public sporting spaces, which is seen as a veiled attempt to prevent Muslim gatherings. Local authorities said the move was to "promote and preserve the traditional values."
Archived version: archive.is/newest/dw.com/en/sp…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
US has 'no plans' to recognise Palestinian statehood, JD Vance says on visit to UK
The meeting comes amid debates between Washington and London about the best way to end the wars between Russia and Ukraine, as well as Israel and Hamas.
Archived version: archive.is/newest/euronews.com…
US | Someone keeps stealing, flying, fixing and returning this California man's plane. But why?
Someone has stolen Jason Hong's 1958 Cessna Skyhawk at least four times, taking the red single-engine plane for a joyride and then returning it to airports in Southern California. Hong and the police are baffled as to who is doing it, and why.
Florida farm identified as source of raw milk that sickened 21
The Florida Department of Health has identified Keely Farms Dairy as the source of raw milk linked to 21 cases of E
Sheinbaum rejects US ‘invasion’ after Trump orders military to target Mexico cartels
Mexico’s president says ‘there will be no invasion … it’s absolutely off the table’ after news reports of order
Archived version: archive.is/newest/theguardian.…
Media Capitulation Index: Who Owns the Media
Who Owns the Media
Who owns the 35 most powerful media companies in America? Learn how these giants came to dominate the U.S. media landscape through mergers, acquisition and manipulation. Free Press Action Fund
Tried out a filter the other day...
The Secret History of Tor: How a Military Project Became a Lifeline for Privacy
A story of secrecy, resistance, and the fight for digital freedom. The MIT Press Reader
ChatGPT will apologize for anything
ChatGPT will apologize for anything - even advice it definitely didn't give, and stuff it definitely didn't do. It very much regrets its recommendation that we hire a giraffe as CEO. Janelle Shane (AI Weirdness)
Battlefield 6 requires secure boot to be enabled and active
Every time this franchise comes up I just find myself remembering all the fun I had with BF2 and 2142. I wanna play those again...
BF4 was actually pretty great fun too.
Now I'm just so over it.
What is the Present? A Debate on AI.
-- In a world where stealing is considered legal, is there at least something real and unique?
-- How can a tool originally created for control and greed save the world without taking away people's freedom and souls in return? Do you think the magic wand will be free?
-- You may end up like those people who believe in fairies if you continue to believe that AI does not pose a serious threat.
-- This post may be deleted in a few seconds, maybe later, but a reason will always be found, and if not, they will make one up on the fly.
-- Well, here is my favorite proverb about the bear: the bear does not negotiate with the bees. Instead of buying honey, he steals the entire hive and eats everything without a trace, and he really does not like it when the bees become impudent and try to hide what is left of the honey from him.
Genoa: mysterious signal picked up by radio amateurs a year ago is explained
After more than a year of analysis, investigation, and consultation with international experts, the Associazione Ricerca Italiana Aliena (A.R.I.A.), led by ufologist Angelo Maggioni, has announced that it has solved one of the most mysterious cases of recent times: the anomalous signal picked up in Genoa by a radio amateur in February 2024.
Several consultants were brought in to support the investigation, including a special-effects expert and a sound engineer who had identified some anomalies in the data. Consultation with SETI (Search for Extraterrestrial Intelligence) was also essential, in particular with Dr. Graziano Chiaro of INAIF Milano (interviewed by the association some time ago), the contact person for SETI Italia. Two hypotheses had been put forward from the start: either the signal was truly anomalous, or it was interference caused by military aircraft at high altitude.
The most recent information confirms that military flights were active over northern Italy in those days, probably connected to the conflict in Ukraine and to the air corridors used for European military missions. According to the reconstruction, it is very likely that the mysterious signal overlapped with an ordinary transmission between radio amateurs, creating an anomaly that was only apparent. "There have been no other similar cases in the same areas (from Loano to Genoa, from La Spezia to Milan and Turin), not even at times when we recorded a peak in UFO sightings between June and July," explains Angelo Maggioni. Those sightings included noteworthy episodes such as the sighting of a large unidentified object by Nicolas P. in Genoa and another event between Ventimiglia and Nice.
"All these elements lead us to close the case today: for us, that signal has an explainable origin. There is no mystery, and there is no point in feeding useless speculation," Maggioni says. "A.R.I.A. has always worked with seriousness and rigor: we avoid sensationalism, because it is good for neither research nor reporting."
The association therefore officially declares the case downgraded from an anomalous phenomenon to an identified phenomenon, distancing itself from those who still try to feed exaggerated and unfounded narratives.
Just running? No problem
- Preparation
- Load
- Taper
- Race
In short, you prepare for the load of effort the body will have to take on, in terms of both kilometers and quality of the sessions; then you give the body time to recover in the taper phase; and then, for those who compete, comes the race, where the body is ready to show off the improvements gained in the earlier phases. Those who don't race will instead see a nice step up in the quality of their running.
Ah, sunshine...
Proton is vibe coding some of its apps.
cross-posted from: lemmy.dbzer0.com/post/50693956
::: spoiler Transcript
A post by @zzt@mas.to saying:
courtesy of @davidgerard@circumstances.run, Proton is now the only privacy vendor I know of that vibe codes its apps:
In the single most damning thing I can say about Proton in 2025, the Proton GitHub repository has a “cursorrules” file. They’re vibe-coding their public systems. Much secure!
I am once again begging anyone who will listen to get off of Proton as soon as reasonably possible, and to avoid their new (terrible) apps in any case. circumstances.run/@davidgerard…
It has a reply by the author saying:
in an unsurprising update for those familiar with how Proton operates, they silently rewrote their monorepo’s history to purge .cursor and hide that they were vibe coding: github.com/ProtonMail/WebClien…
given the utter lack of communication from Proton on this, I can only guess they’ve extracted .cursor into an external repository and continue to use it out of sight of the public
:::
GitHub - ProtonMail/WebClients at 2a5e2ad4db0c84f39050bf2353c944a96d38e07f
Monorepo hosting the proton web clients. Contribute to ProtonMail/WebClients development by creating an account on GitHub. (GitHub)
I don't see any problem with AI coding. It can be done even without editor support, just by asking for a function, for example: "please implement a sort function given a list of numbers."
Proton's code is open source, so all AI agents have already read everything. You, as the user, just have to review the code, fix it, and test it. I am not seeing any problem here.
MovieBox is Still Alive and Preparing to Fight Intellectual Property Thieves
Reports that the Nigerian Copyright Commission had recently shut down a pirate site didn't sound especially interesting. Operating under MovieBox branding, currently seen on endless domains, the local site reportedly received over 130 million visits in the previous three months and was actually still in business. Indeed, plans to develop the MovieBox brand began last month, with an application for intellectual property protection underpinning all kinds of business opportunities.
MovieBox is Still Alive and Preparing to Fight Intellectual Property Thieves * TorrentFreak
MovieBox hasn't been shut down, it's alive and well and preparing for war with potential intellectual property thieves. Andy Maxwell (TF Publishing)
Video link posts or embedded self hosted video: how does federation of this content work?
Are video files cached or federated in any way?
I want to make posts that include video, and I want to host those videos on my own web server so I don't rely on external links or expiration dates.
But I worry about bandwidth, and I want to know whether the videos will be cached on the instance or whether every viewer will trigger a full web request for the video (which I can of course mitigate with good compression and/or a dedicated CDN that won't empty my pockets).
Videos are not stored on every server. Nobody would be able to pay the bills if that were the case.
The videos and images stay on the origin, and are fetched from the origin.
Afaik admins that enable the image proxy cache only the images, not videos.
This is just a perfect advertisement for Debian 😀
You have a computer, but no freedom?
Parody of a popular clip from the American-Malayalee television series 'Akkarakazhchakal' (https://en.wikipedia.org/wiki/Akkara_Kazhchakal) advertising Debian. Those unaware, watch the original... peertube.debian.social
I'm starting to realize that advertising and ethical products don't mix.
We shouldn't be in a rush to be scumbags like our oppressors.
Great video, nonetheless.
Like an alien.
👤 When you talk about Linux, the Fediverse, privacy, and so on, people look at you strangely
There are moments when you realize that the world around you doesn't speak your language.
Not the one made of words, but the one made of passions.
When you say "I'm working on a server," "I run a Fediverse instance," "I like decentralization," you immediately see the looks change.
They watch you as if you were speaking in binary code, as if you were wasting time in a useless world all your own.
But no.
That world has value, meaning, humanity, construction, belonging.
🧠 I'm not isolating myself: I'm expressing myself
When you choose Linux, free software, the Fediverse, you don't do it because it's fashionable.
You do it because believing in digital freedom today is a revolutionary act.
You do it because you want to be part of something that isn't controlled by a few, but built by many, together.
But to the people close to you who don't know this world, you're just the one who is "obsessed with computers."
And if, like me, you also use a wheelchair, the label is ready-made:
"poor thing, he takes refuge there because he has nothing else to do."
But no.
That is my way of being useful.
That is where I put my energy, my ideas, my desire to contribute to something.
🤝 The network I help build is made of real people
In the Fediverse I have found authentic relationships, collaboration, and people who listen.
In running servers, instances, shared spaces… I find myself again.
In a world that often makes you feel useless, there I can play an active part.
You don't need to walk to move through the digital world.
You just need to truly want to be there.
🙏 I'm not asking for understanding. I'm only asking for respect
Not everyone has to understand what I do.
But at least don't judge it.
Don't reduce it all to "nerd pastimes" or "stuff for tinkerers."
Because for me, and for many others, this is a way of living, participating, and resisting.
And if someone out there has ever felt looked at as "different" because of what they love, I want to tell you: you are not alone.
If you see yourself in these words, reply, share, tell your story.
Because we are not few. We are just too scattered to make ourselves heard.
Great work, well done. Curiosity is our real strength. 💪 As for Qwant, I'm attaching a post of mine. 🙏 goto.casasnow.noho.st/@snow/st…
nyarch
Nyarch Linux
Nyarch Linux is a (meme) linux distribution based on Arch Linux made for very degenerated weebs - Nyarch Linux (GitHub)
Qwant and Ecosia debut Staan, a European search index that aims to take on Big Tech
cross-posted from: lemmy.sdf.org/post/39942527
European search engines Qwant and Ecosia said on Wednesday that they have both started serving search queries through an index they developed together, Staan, which aims to be a cheaper, more privacy-focused alternative to Google and Bing.
Last year, French privacy-focused search engine Qwant struck a joint venture with German non-profit search engine Ecosia, to develop a European search index. Called European Search Perspective (EUSP), the JV now aims to serve around 50% of French queries and 33% of German queries by the end of the year.
Qwant said it is using the new index to power some of its features, like AI summaries for search, and Ecosia has plans to add some AI features soon to its platform, too.
EUSP is also in talks with companies to spur the adoption of its index for enabling search within apps. Notably, it is targeting chatbots, presenting Staan as a cheaper alternative to Google and Bing.
“If you’re using ChatGPT or any other AI chatbot, they all do knowledge grounding with web search […] our index can power deep research and AI summary features. Google and Bing’s solutions are also pricey, and our index can offer power search features at a tenth of the cost,” Christian Kroll, CEO of Ecosia, told TechCrunch.
EUSP, like Proton, is pushing to develop a European tech stack that doesn’t rely on technology from the U.S. or China.
“The timing could not be more urgent. The outcome of the 2024 U.S. election has reminded European policymakers and innovators just how exposed Europe remains when it comes to core digital infrastructure. Much of Europe’s search, cloud, and AI layers are built on American Big Tech stacks, putting entire sectors – from journalism to climate tech – at the mercy of political or commercial agendas,” the companies said in a statement.
Kroll added that through this index, combined with European privacy laws, EUSP can offer a more privacy-friendly search solution as compared to its U.S. counterparts.
Qwant and Ecosia debut Staan, a European search index that aims to take on Big Tech | TechCrunch
European search engines Qwant and Ecosia said on Wednesday that they have both started serving search queries through an index they developed together, Staan, that aims to be a cheaper, more privacy-focused alternative to Google and Bing. Ivan Mehta (TechCrunch)
Korkki
in reply to geneva_convenience
Pavidus
in reply to geneva_convenience
BlueÆther
in reply to geneva_convenience
Filtered word: nsfw
geneva_convenience
in reply to BlueÆther
Good catch. That's worth a separate post.
Hexbear is on the list too.
marcie (she/her)
in reply to geneva_convenience
irelephant [he/him]
in reply to BlueÆther
BlueÆther
in reply to irelephant [he/him]
mindbleach
in reply to geneva_convenience
Vendetta9076
in reply to mindbleach
irelephant [he/him]
in reply to mindbleach
Jerry on PieFed
in reply to geneva_convenience
My Mastodon instance is on the list. I try hard to block them.
The problem with the list is that it's a target list, but not a list showing how much content, if any, they manage to process from any of the sites.
NutWrench
in reply to geneva_convenience