

in reply to Carl

Oh you haven't seen it ... lucky you. Keep it that way. I am, indeed, telling the truth. I think at this point they're just like "well, we have a contract for x more movies, so, really, we can do whatever the hell we want ... let's see just how stupid we can get"


LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI


Meta has scraped data from the most-trafficked domains on the internet, including news organizations, education platforms, niche forums, personal blogs, and even revenge porn sites, to train its artificial intelligence models, according to a leaked list obtained by Drop Site News.

By scraping data from roughly 6 million unique websites, including 100,000 of the top-ranked domains, Meta has generated millions of pages of content to use for Meta’s AI-training pipeline.

The sites that Meta scrapes consist of copyrighted content, pirated content, and adult videos, some of whose content is potentially illegally obtained or recorded, as well as news and original content from prominent outlets and content publishers.

They include mainstream businesses such as Getty Images, Shopify, and Shutterstock, but also extreme pornographic content, including websites advertising explicit sexual content and humiliation porn that exploits teenagers.


in reply to geneva_convenience

My Mastodon instance is on the list. I try hard to block them.

The problem with the list is that it's a target list, but not a list showing how much content, if any, they manage to process from any of the sites.

in reply to geneva_convenience

One person's "scraping" is another person's plagiarism.


More than 130,000 Claude, Grok, ChatGPT, and Other LLM Chats Readable on Archive.org


cross-posted from: piefed.social/post/1127664

Archive: archive.ph/2025.08.08-085040/4…





A researcher has found that more than 130,000 conversations with AI chatbots including Claude, Grok, ChatGPT, and others are discoverable on the Internet Archive, highlighting how people's interactions with LLMs may be publicly archived if users are not careful with the sharing settings they may enable.

The news follows earlier findings that Google was indexing ChatGPT conversations that users had set to share, despite potentially not understanding that these chats were now viewable by anyone, and not just those they intended to share the chats with. OpenAI had also not taken steps to ensure these conversations could not be indexed by Google.

“I obtained URLs for: Grok, Mistral, Qwen, Claude, and Copilot,” the researcher, who goes by the handle dead1nfluence, told 404 Media. They also found material related to ChatGPT, but said “OpenAI has had the ChatGPT[.]com/share links removed it seems.” Searching on the Internet Archive now for ChatGPT share links does not return any results, while Grok results, for example, are still available.

Dead1nfluence wrote a blog post about some of their findings on Sunday and shared the list of more than 130,000 archived LLM chat links with 404 Media. They also shared some of the contents of those chats that they had scraped. Dead1nfluence wrote that they found API keys and other exposed information that could be useful to a hacker.
“While these providers do tell their users that the shared links are public to anyone, I think that most who have used this feature would not have expected that these links could be findable by anyone, and certainly not indexed and readily available for others to view,” dead1nfluence wrote in their blog post. “This could prove to be a very valuable data source for attackers and red teamers alike. With this, I can now search the dataset at any time for target companies to see if employees may have disclosed sensitive information by accident.”

404 Media verified some of dead1nfluence's findings by discovering specific material they flagged in the dataset, then going to the still-public LLM link and checking the content.

💡
Do you know anything else about this? I would love to hear from you. Using a non-work device, you can message me securely on Signal at joseph.404 or send me an email at joseph@404media.co.

Most of the companies whose AI tools are included in the dataset did not respond to a request for comment. Microsoft, which owns Copilot, acknowledged a request for comment but didn't provide a response in time for publication. A spokesperson for Anthropic, which owns Claude, told 404 Media: “We give people control over sharing their Claude conversations publicly, and in keeping with our privacy principles, we do not share chat directories or sitemaps with search engines like Google. These shareable links are not guessable or discoverable unless people choose to publicize them themselves. When someone shares a conversation, they are making that content publicly accessible, and like other public web content, it may be archived by third-party services. In our review of the sample archived conversations shared with us, these were either manually requested to be indexed by a person with access to the link or submitted by independent archivist organizations who discovered the URLs after they were published elsewhere across the internet first.” 404 Media only shared a small sample of the Claude links with Anthropic, not the entire list.

Fast Company first reported that Google was indexing some ChatGPT conversations on July 30. This was because of a sharing feature ChatGPT had that allowed users to send a link to a ChatGPT conversation to someone else. OpenAI disabled the sharing feature in response. OpenAI CISO Dane Stuckey said in a previous statement sent to 404 Media: “This was a short-lived experiment to help people discover useful conversations. This feature required users to opt-in, first by picking a chat to share, then by clicking a checkbox for it to be shared with search engines.”

A researcher who requested anonymity gave 404 Media access to a dataset of nearly 100,000 ChatGPT conversations indexed on Google. 404 Media found those included the alleged texts of non-disclosure agreements, discussions of confidential contracts, and people trying to use ChatGPT for relationship issues.

Others also found that the Internet Archive contained archived LLM chats.


in reply to misk

Don’t ever use the “share” button on anything. Just don’t. Not ever.
in reply to DominusOfMegadeus

I mean, just assume everything you type online is public because, you know, it fucking is.
in reply to DominusOfMegadeus

I don't think that's what "sharing" refers to in this case. This is about users who did/didn't modify the settings of their chatbot to make their inputs and outputs publicly available via search. 404's previous reporting on ChatGPT suggested some users may not have understood what the sharing option actually meant in this context.
in reply to DominusOfMegadeus

Good thing I press “Reply” for this comment and not “Share”.


Finland Tops Nextcloud’s First Digital Sovereignty Index


Nextcloud checks about 50 open-source apps—file storage, groupware, chat/video, notes, project management, and so on. Each tool is weighted the same, and then the category scores are averaged into a single national figure. That design favors a balanced ecosystem over dominance in just one niche.

However, according to Nextcloud, the method favors SMEs and hobbyists—servers hidden behind firewalls, VPNs, or hosted by large enterprises don’t always show up—yet the index still offers a “pretty loud signal” about grassroots tech choices.
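
The scoring described above can be sketched in a few lines of Python. This is only an illustration of the equal-weight idea, assuming each category score is the plain mean of its tools' scores and the national figure is the plain mean of the category scores. The function names, category names, tool names, and numbers are all hypothetical and are not taken from the index itself.

```python
from statistics import mean


def category_score(tool_scores):
    """Equal-weight mean of the per-tool scores inside one category."""
    return mean(tool_scores.values())


def national_index(categories):
    """Unweighted mean of the category scores (assumed aggregation)."""
    return mean(category_score(tools) for tools in categories.values())


# Hypothetical per-tool scores for one country, for illustration only.
example = {
    "file_storage": {"tool_a": 40.0, "tool_b": 20.0},
    "groupware": {"tool_c": 10.0},
    "chat_video": {"tool_d": 0.0, "tool_e": 30.0, "tool_f": 30.0},
}

print(national_index(example))
```

Under these assumptions a category with one tool counts as much as a category with three, so a single strong niche moves the national figure by only its one-category share.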



'This Verdict Is a Wake-Up Call:' Jury Trial Finds Meta Breached State Privacy Law in Class Action Against Fertility App | Law.com


archive.ph/Y6fZ6



Why is WebRTC enabled by default?


In about:config, media.peerconnection.enabled is set to true by default, which, by my understanding and that of tools like ipleak.net, means both VPN and home IP addresses will be exposed during usage on platforms like PeerTube.

Is this an oversight, is my understanding wrong, or is this intentional for some reason? It seems like the opposite of user expectation, particularly given that the WebRTC settings option is hidden on LibreWolf.
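
For anyone who wants to flip the preference without going through about:config, here is a minimal Python sketch. It relies on the fact that Firefox reads a `user.js` file from the profile directory at startup, and assumes LibreWolf behaves the same way. The helper names are made up for illustration and the profile path is something you would have to supply yourself. Note that turning the pref off disables WebRTC entirely, so video calls and peer-to-peer playback will stop working.

```python
from pathlib import Path

# Preference name as given in the post above.
WEBRTC_PREF = "media.peerconnection.enabled"


def user_pref_line(name: str, value: bool) -> str:
    """Format one boolean preference in user.js syntax."""
    return f'user_pref("{name}", {"true" if value else "false"});'


def disable_webrtc(profile_dir: Path) -> Path:
    """Append the pref to user.js in profile_dir unless it is already there."""
    user_js = profile_dir / "user.js"
    line = user_pref_line(WEBRTC_PREF, False)
    existing = user_js.read_text(encoding="utf-8") if user_js.exists() else ""
    if line not in existing.splitlines():
        with user_js.open("a", encoding="utf-8") as fh:
            # Keep the new pref on its own line if the file lacks a final newline.
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(line + "\n")
    return user_js
```

Calling `disable_webrtc(Path("/path/to/your/profile"))` with a real profile directory and restarting the browser should apply the setting. The path shown here is a placeholder.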








Spain ombudsman probes town's ban on Muslim celebrations


Jumilla has banned religious events in public sporting spaces, which is seen as a veiled attempt to prevent Muslim gatherings. Local authorities said the move was to "promote and preserve the traditional values."


Archived version: archive.is/newest/dw.com/en/sp…


Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.



US has 'no plans' to recognise Palestinian statehood, JD Vance says on visit to UK


The meeting comes amid debates between Washington and London about the best way to end the wars between Russia and Ukraine, as well as Israel and Hamas.


Archived version: archive.is/newest/euronews.com…


Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.



US | Someone keeps stealing, flying, fixing and returning this California man's plane. But why?


Someone has stolen Jason Hong's 1958 Cessna Skyhawk at least four times, taking the red single-engine plane for a joyride and then returning it to airports in Southern California. Hong and the police are baffled as to who is doing it, and why.



Sheinbaum rejects US ‘invasion’ after Trump orders military to target Mexico cartels


Mexico’s president says ‘there will be no invasion … it’s absolutely off the table’ after news reports of order


Archived version: archive.is/newest/theguardian.…


Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.



Media Capitulation Index: Who Owns the Media




Tried out a filter the other day...


I feel like I haven’t posted here in a while… basically I decided to take a break from drinking and thus home brewing for a bit. I want to get back into meme spirits, and I also want to make a 0 oxygen beer from ferment to filter to serving, but for now I made a berry wine for the girlfriend from some Aldi frozen fruit. This has been sitting in the fermenter on the fruit for a good 3 months (started it just before my break from alcohol) and then I moved it into a keg, no cold crashing or anything. I then ran it through a 5 micron filter and then a 1 micron filter just to see how it went, I gotta say, it turned out great. I was expecting the filter to clog but it went through like a champ. I also then wanted to try pasteurizing it in the keg using my mash and boil to see if I get some delicious glue and rubber in my mashing vessel but I didn’t! I back sweetened the wine and haven’t had any re-fermentation happen after a few weeks, so project fuck around and find out was a success. I might end up retrying a milk wine again (last one had a few tiny cheese curds floating in it that turned off most people from it) and I definitely want to try making a spicy imperial stout, but for now, I’ve gotta buy wine bottles or give out samples of this wine until my keg is empty.
in reply to Alexander

Appreciate the tip! I’ll have to give it a shot, are there any particular red wine yeasts you like?



ChatGPT will apologize for anything


#AII


Battlefield 6 requires secure boot to be enabled and active


I dual boot with win 11, I do so for programming purposes, not gaming. I read online that the game straight up blocks Linux on all fronts (typical EA). So, I booted into win 11 and launched the beta. It still refused to start and complained that secure boot was "disabled". Booted into BIOS and it was enabled, but not active. I had to reset the keys to the windows default keys to be able to play this game. This is a no go for me. Not giving them my money until they stop this bullshit. Just wanted to let everyone know the situation so far.
in reply to DonutsRMeh

Every time this franchise comes up I just find myself remembering all the fun I had with BF2 and 2142. I wanna play those again...

BF4 was actually pretty great fun too.

Now I'm just so over it.



What is the Present? A Debate on AI.


-- In a world where stealing is considered legal, is there at least something real and unique?

-- How can a tool originally created for control and greed save the world without taking away people's freedom and souls in return? Do you think the magic wand will be free?

-- You may end up like those people who believe in fairies if you continue to believe that AI does not pose a serious threat.

-- This post may be deleted in a few seconds, maybe later, but a reason will always be found, and if not, they will make one up on the fly.

-- Well, here is my favorite proverb about the bear: the bear does not negotiate with the bees, when buying honey, he takes and steals the entire hive and eats everything without a trace, and he really does not like it when the bees become impudent and try to hide the remains of the honey from him.



Genoa: the mysterious signal picked up by radio amateurs a year ago is explained


After more than a year of analysis, investigation, and consultation with international experts, the Associazione Ricerca Italiana Aliena (A.R.I.A.), led by ufologist Angelo Maggioni, announces that it has solved one of the most mysterious cases of recent times: the anomalous signal picked up in Genoa by a radio amateur in February 2024.

Several consultants were brought in to support the investigation, including a special-effects expert and a sound engineer who had identified some anomalies in the data. Also essential was the exchange with SETI (Search for Extraterrestrial Intelligence), and in particular with Dr. Graziano Chiaro of INAIF Milano (interviewed by the association some time ago), the contact person for SETI Italia. Two hypotheses had been put forward from the start: either the signal was truly anomalous, or it was interference caused by high-altitude military aircraft.

The latest information confirms that military flights were active over northern Italy in those days, probably linked to the conflict in Ukraine and to the air corridors used for European military missions. According to the reconstruction, it is very likely that the mysterious signal overlapped with an ordinary transmission between radio amateurs, creating an anomaly that was only apparent. "There have been no other similar cases in the same areas, from Loano to Genoa, from La Spezia to Milan and Turin, not even at the times when we recorded a peak in UFO sightings between June and July," explains Angelo Maggioni. Those sightings include noteworthy episodes such as the sighting of a large unidentified object by Nicolas P. in Genoa, and another event between Ventimiglia and Nice.

"All these elements lead us to close the case today: for us, that signal has an explainable origin. There is no mystery, and it makes no sense to feed pointless speculation," Maggioni says. "A.R.I.A. has always worked seriously and rigorously: we avoid sensationalism, because it is good for neither research nor information."

The association therefore officially declares the case downgraded from an anomalous phenomenon to an identified phenomenon, distancing itself from those who still try to feed exaggerated and unfounded narratives.



Just running? No problem


I could have used the usual title, "Tips from a coach," but no. The advice I'd like to give is very simple. It is closely tied to people who race, but I think it is also useful for those who run with no goal other than feeling good. It's about periodization. What does that mean? Put simply, someone preparing for a race usually goes through 4 phases:
- Preparation
- Load
- Taper
- Race
In short, you prepare for the load of effort the body will have to take on, in terms of kilometers and quality of the sessions. Then you give the body time to recover in the taper phase. Finally, for those who compete, comes the race, where the body is ready to show off the improvements gained in the earlier phases. Those who don't race will instead see a nice step up in the quality of their running.



Ah, sunshine...


Keep up the good memeing though... [img=https://community.nodebb.org/assets/uploads/files/1754665617840-0e9ff9ea-6cf0-4934-8160-76831236c48a-image.png]0e9ff9ea-6cf0-4934-8160-76831236c48a-image.png[/img]




Proton is vibe coding some of its apps.


cross-posted from: lemmy.dbzer0.com/post/50693956

::: spoiler Transcript
A post by @zzt@mas.to saying:
courtesy of @davidgerard@circumstances.run, Proton is now the only privacy vendor I know of that vibe codes its apps:
In the single most damning thing I can say about Proton in 2025, the Proton GitHub repository has a “cursorrules” file. They’re vibe-coding their public systems. Much secure!
I am once again begging anyone who will listen to get off of Proton as soon as reasonably possible, and to avoid their new (terrible) apps in any case. circumstances.run/@davidgerard…

It has a reply by the author saying:
in an unsurprising update for those familiar with how Proton operates, they silently rewrote their monorepo’s history to purge .cursor and hide that they were vibe coding: github.com/ProtonMail/WebClien…

given the utter lack of communication from Proton on this, I can only guess they’ve extracted .cursor into an external repository and continue to use it out of sight of the public
:::



Proton’s Lumo AI chatbot: not end-to-end encrypted, not open source

pivot-to-ai.com/2025/08/02/pro… - text
pivottoai.libsyn.com/20250802-… - podcast
youtube.com/watch?v=HDPZbUPUFy… - video


in reply to irelephant [he/him]

I don't see any problem with AI coding. It can be done without the editor supporting it, just by asking for a function, for example "please implement a sort function given a list of numbers."

Proton's code is open source, so all AI agents have already read everything. You as the user just have to do the code review, fix it, and test. I am not seeing any problem here.

in reply to irelephant [he/him]

self-hosting email, text based clients and a deeper understanding of the protocol made me start to love email. I didn't think it was possible to love email.


MovieBox is Still Alive and Preparing to Fight Intellectual Property Thieves


Reports that the Nigerian Copyright Commission had recently shut down a pirate site didn't sound especially interesting. Operating under MovieBox branding, currently seen on endless domains, the local site reportedly received over 130 million visits in the previous three months and was actually still in business. Indeed, plans to develop the MovieBox brand began last month, with an application for intellectual property protection underpinning all kinds of business opportunities.


Video link posts or embedded self hosted video: how does federation of this content work?


Are video files cached or federated in any way?

I want to make posts that include video, and I'd like to host those videos on my own web server so I don't rely on external links or expiration dates.

But I'm worried about bandwidth, and I want to know whether the videos will be cached on the instance or whether every viewer will trigger a full web request for the video (which I can of course mitigate with good compression and/or a dedicated CDN that won't empty my pockets).

in reply to SSUPII

This depends on the instance the viewer is using. Some instances clone a copy of the video, some use proxies, and some send the link directly to the user. I don't recommend it if you have limited bandwidth.
in reply to SSUPII

Videos are not stored on every server. Nobody would be able to pay the bills if that were the case.

The videos and images stay on the origin, and are fetched from the origin.

Afaik admins that enable the image proxy cache only the images, not videos.



This is just a perfect advertisement for Debian 😀


in reply to alexcleac

I'm starting to realize that advertising and ethical products don't mix.

We shouldn't be in a rush to be scumbags like our oppressors.

Great video, nonetheless.




Like an alien.


👤 When you talk about Linux, the Fediverse, privacy, and so on ... people look at you strangely

There are moments when you realize that the world around you doesn't speak your language.
Not the one made of words, but the one made of passions.
When you say "I'm working on a server," "I run a Fediverse instance," "I like decentralization," you immediately see the looks change.

They look at you as if you were speaking in binary code, as if you were wasting time in a useless world all your own.
But no.
That world has value, meaning, humanity, construction, belonging.

🧠 I'm not isolating myself: I'm expressing myself

When you choose Linux, free software, the Fediverse, you don't do it because it's fashionable.
You do it because believing in digital freedom today is a revolutionary act.
You do it because you want to be part of something that isn't controlled by a few but built by many, together.

But to those close to you who don't know this world, you're just the one who's "obsessed with computers."
And if, like me, you also use a wheelchair, then the label is ready-made:
"poor thing, he retreats there because he has nothing else to do."

But no.
That is my way of being useful.
That's where I put my energy, my ideas, my desire to contribute to something.

🤝 The network I help build is made of real people

In the Fediverse I have found authentic relationships, collaboration, and people who listen.
In managing servers, instances, and shared spaces, I find myself again.
In a world that often makes you feel useless, there I can play an active part.

You don't need to walk to move through the digital world.
You just need to truly want to be there.

🙏 I'm not asking for understanding. I'm only asking for respect

Not everyone has to understand what I do.
But at least don't judge it.
Don't reduce it all to "nerd hobbies," to "stuff for tinkerers."
Because for me, and for many others, this is a way of living, of taking part, of resisting.

And if anyone out there has ever felt looked at as "different" because of what they love, I want to tell you: you are not alone.

If you recognize yourself in these words, reply, share, tell your story.
Because we are not few. We are just too scattered to make ourselves heard.

Unknown parent

lemmy - Link to original
Snow Lemmy

Great work, well done. Curiosity is our real strength. 💪 As for Qwant, I'm attaching a post of mine. 🙏 goto.casasnow.noho.st/@snow/st…


🔍 Qwant or SearXNG? That's the dilemma! 😏

On one side there's Qwant: elegant, European, easy to use... but with a little secret: for years it borrowed its results from Bing.
Lately it has been trying to become more independent (also thanks to Ecosia), but its code remains closed and a bit mysterious. 🤫

On the other side there's SearXNG:
💻 open source, transparent, no tracking, 100% customizable and, if you like, even hostable on your own server.
No invasive ads, no nosy company rummaging through your searches... in short: real privacy is here. 🚀


📊 Quick comparison

Privacy

  • Qwant: Good, but with traces of Bing and CNIL (2025)
  • SearXNG: Excellent, no tracking, high anonymity

Transparency

  • Qwant: Proprietary code
  • SearXNG: Open source with visible configurations

Autonomy

  • Qwant: Growing (EUSP project)
  • SearXNG: Total, self-managed instances

Ease of use

  • Qwant: Immediate and simple
  • SearXNG: Requires configuration or use of public instances

📌 Conclusion?
If you want something ready to go right away → Qwant.
If instead privacy is a requirement for you and not a slogan, SearXNG is your best friend (even if you'll have to get your hands a little dirty). 😉


Unknown parent

lemmy - Link to original
Snow Lemmy
😅 OK, got it. 🤗

in reply to Sunshine (she/her)

By the way, if you're using software that supports following users (like Mbin), you can follow them at @ecosia@mastodon.social. Their last post seems to be from a month ago, so I hope it's not abandoned.


nyarch


github.com/NyarchLinux

instagram.com/p/DND4OXMBh8-/




Qwant and Ecosia debut Staan, a European search index that aims to take on Big Tech


cross-posted from: lemmy.sdf.org/post/39942527

European search engines Qwant and Ecosia said on Wednesday that they have both started serving search queries through an index they developed together, Staan, which aims to be a cheaper, more privacy-focused alternative to Google and Bing.

Last year, French privacy-focused search engine Qwant struck a joint venture with German non-profit search engine Ecosia, to develop a European search index. Called European Search Perspective (EUSP), the JV now aims to serve around 50% of French queries and 33% of German queries by the end of the year.

Qwant said it is using the new index to power some of its features, like AI summaries for search, and Ecosia has plans to add some AI features soon to its platform, too.

EUSP is also in talks with companies to spur the adoption of its index for enabling search within apps. Notably, it is targeting chatbots, presenting Staan as a cheaper alternative to Google and Bing.

“If you’re using ChatGPT or any other AI chatbot, they all do knowledge grounding with web search […] our index can power deep research and AI summary features. Google and Bing’s solutions are also pricey, and our index can offer power search features at a tenth of the cost,” Christian Kroll, CEO of Ecosia, told TechCrunch.

EUSP, like Proton, is pushing to develop a European tech stack that doesn’t rely on technology from the U.S. or China.

“The timing could not be more urgent. The outcome of the 2024 U.S. election has reminded European policymakers and innovators just how exposed Europe remains when it comes to core digital infrastructure. Much of Europe’s search, cloud, and AI layers are built on American Big Tech stacks, putting entire sectors – from journalism to climate tech – at the mercy of political or commercial agendas,” the companies said in a statement.

Kroll added that through this index, combined with European privacy laws, EUSP can offer a more privacy-friendly search solution as compared to its U.S. counterparts.
