

Mark Russo reported the dataset to all the right organizations, but still couldn't get into his accounts for months.



A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It


Google suspended a mobile app developer’s accounts after he uploaded AI training data to his Google Drive. Unbeknownst to him, the widely used dataset, which is cited in a number of academic papers and distributed via an academic file sharing site, contained child sexual abuse material. The developer reported the dataset to a child safety organization, which eventually resulted in the dataset’s removal, but he says Google’s response has been “devastating.”

A message from Google said his account “has content that involves a child being sexually abused or exploited. This is a severe violation of Google's policies and might be illegal.”

The incident shows how AI training data, which is often collected by indiscriminately scraping the internet, can harm people who use it without realizing it contains illegal images. It also shows how hard it is to identify harmful images in training datasets composed of millions of images; in this case, the CSAM was discovered only by accident, by a lone developer who tripped Google’s automated moderation tools.

💡
Have you discovered harmful materials in AI training data? I would love to hear from you. Using a non-work device, you can message me securely on Signal at @emanuel.404. Otherwise, send me an email at emanuel@404media.co.

In October, I wrote about the NudeNet dataset, which contains more than 700,000 images scraped from the internet, and which is used to train AI image classifiers to automatically detect nudity. The Canadian Centre for Child Protection (C3P) said it found more than 120 images of identified or known victims of CSAM in the dataset, including nearly 70 images focused on the genital or anal area of children who are confirmed or appear to be pre-pubescent. “In some cases, images depicting sexual or abusive acts involving children and teenagers such as fellatio or penile-vaginal penetration,” C3P said.

In October, Lloyd Richardson, C3P's director of technology, told me that the organization decided to investigate the NudeNet training data after getting a tip from an individual via its cyber tipline that it might contain CSAM. After I published that story, a developer named Mark Russo contacted me to say that he’s the individual who tipped C3P, but that he’s still suffering the consequences of his discovery.

Russo, an independent developer, told me he was working on an on-device NSFW image detector. The app runs and classifies images entirely on-device, so users’ content stays private. To benchmark his tool, Russo used NudeNet, a publicly available dataset that’s cited in a number of academic papers about content moderation. He unzipped the dataset into his Google Drive, and shortly after, his Google account was suspended for “inappropriate material.”
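Benchmarking a classifier against a labeled dataset like NudeNet generally means running the model over every image and comparing its output to the dataset’s labels. Here is a minimal sketch of what that loop might look like in Python, assuming a hypothetical classify_image() function standing in for Russo’s detector (his actual code is not public) and a folder layout that groups images by label:

from pathlib import Path

def classify_image(path: str) -> bool:
    # Hypothetical stand-in for an on-device NSFW detector;
    # returns True if the image is flagged as NSFW.
    raise NotImplementedError

def benchmark(dataset_root: str) -> None:
    # Assumes the dataset unpacks into labeled folders such as
    # dataset_root/nude/ and dataset_root/safe/ (layout assumed here).
    correct = total = 0
    for label_dir, expected in (("nude", True), ("safe", False)):
        for image in (Path(dataset_root) / label_dir).glob("*.jpg"):
            total += 1
            if classify_image(str(image)) == expected:
                correct += 1
    # Guard against an empty folder to avoid dividing by zero.
    print(f"Accuracy: {correct / max(total, 1):.2%} over {total} images")

Note that a workflow like this requires downloading and unpacking the full dataset, which is exactly what put the images in Russo’s Google Drive.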

On July 31, Russo lost access to all the services associated with his Google account: the Gmail address he had used for 14 years; Firebase, the platform that serves as the backend for his apps; AdMob, Google’s mobile app monetization platform; and Google Cloud.

“This wasn’t just disruptive — it was devastating. I rely on these tools to develop, monitor, and maintain my apps,” Russo wrote on his personal blog. “With no access, I’m flying blind.”

Russo filed an appeal of Google’s decision the same day, explaining that the images came from NudeNet, which he believed was a reputable research dataset with only adult content. Google acknowledged the appeal, but upheld its suspension, and rejected a second appeal as well. He is still locked out of his Google account and the Google services associated with it.

Russo also contacted the National Center for Missing & Exploited Children (NCMEC) and C3P. C3P investigated the dataset, found CSAM, and notified Academic Torrents, where the NudeNet dataset was hosted, which removed it.

As C3P noted at the time, NudeNet was cited or used by more than 250 academic works. A non-exhaustive review of those works found that 134 made use of the NudeNet dataset, and 29 relied on the NudeNet classifier or model. But Russo is the only developer we know of who was banned for using it, and the only one whose report led an organization to investigate the dataset and get it removed.

After I reached out for comment, Google investigated Russo’s account again and reinstated it.

“Google is committed to fighting the spread of CSAM and we have robust protections against the dissemination of this type of content,” a Google spokesperson told me in an email. “In this case, while CSAM was detected in the user account, the review should have determined that the user's upload was non-malicious. The account in question has been reinstated, and we are committed to continuously improving our processes.”

“I understand I’m just an independent developer—the kind of person Google doesn’t care about,” Russo told me. “But that’s exactly why this story matters. It’s not just about me losing access; it’s about how the same systems that claim to fight abuse are silencing legitimate research and innovation through opaque automation [...] I tried to do the right thing — and I was punished.”




Instagram is generating headlines for users’ posts that appear in Google Search results. Users say the headlines misrepresent them.


Instagram Is Generating Inaccurate SEO Bait for Your Posts


Instagram is generating headlines for users’ Instagram posts without their knowledge, seemingly in an attempt to get those posts to rank higher in Google Search results.

I first noticed Instagram-generated headlines thanks to a Bluesky post from the author Jeff VanderMeer. Last week, VanderMeer posted a video to Instagram of a bunny eating a banana. VanderMeer didn’t include a caption or comment with the post, but noticed that it appeared in Google Search results with the following headline: “Meet the Bunny Who Loves Eating Bananas, A Nutritious Snack For Your Pet.”

[Embedded Bluesky post from Jeff VanderMeer (@jeffvandermeer.bsky.social)]

Another Instagram post from the Groton Public Library in Massachusetts—an image of VanderMeer’s Annihilation book cover promoting a group reading—also didn’t include a caption or comment, but appears in Google Search results with the following headline: “Join Jeff VanderMeer on a Thrilling Beachside Adventure with Mesta …”


I’ve confirmed that Instagram is generating headlines in a similar style for other users without their knowledge. One cosplayer who wished to remain anonymous posted a video of herself showing off costumes in various locations. The same post appeared on Google with a headline about discovering real-life cosplay locations in Seattle. The Instagram post mentioned the city in a hashtag but did not include anything resembling that headline.

Google told me that it is not generating the headlines, and that it’s pulling the text directly from Instagram.

Meta told me in an email that it recently began using AI to generate titles for posts that appear in search engine results, and that this helps people better understand the content. Meta said that, as with all AI-generated content, the titles are not always accurate. Meta also linked me to this Help Center article explaining how users can turn off search engine indexing for their posts.

After this article was published, several readers reached out to note that other platforms, like TikTok and LinkedIn, also generate SEO headlines for users' posts.

“I hate it,” VanderMeer told me in an email. “If I post content, I want to be the one contextualizing it, not some third party. It's especially bad because they're using the most click-bait style of headline generation, which is antithetical to how I try to be on social—which is absolutely NOT calculated, but organic, humorous, and sincere. Then you add in that this is likely an automated AI process, which means unintentionally contributing to theft and a junk industry, and that the headlines are often inaccurate and the summary descriptions below the headline even worse... basically, your post through search results becomes shitty spam.”

“I would not write mediocre text like that and it sounds as if it was auto-generated at-scale with an LLM. This becomes problematic when the headline or description advertises someone in a way that is not how they would personally describe themselves,” Brian Dang, another cosplayer who goes by @mrdangphotos and who noticed Instagram-generated headlines on his posts, told me. We don’t know exactly how Instagram is generating these headlines.

By using Google's Rich Result Test tool, which shows what Google sees for any site, I saw that these headlines appeared under the <title></title> tags for those posts’ Instagram pages.

“It appears that Instagram is only serving that title to Google (and perhaps other search bots),” Jon Henshaw, a search engine optimization (SEO) expert and editor of Coywolf, told me in an email. “I couldn't find any reference to it in the pre-rendered or rendered HTML in Chrome Dev Tools as a regular visitor on my home network. It does appear like Instagram is generating titles and doing it explicitly for search engines.”
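A rough way to check this kind of user-agent cloaking yourself is to request the same post URL once with a Googlebot user-agent string and once with an ordinary browser one, then compare the <title> tags that come back. Here is a sketch with a hypothetical post URL (Instagram may also redirect unauthenticated or suspected-bot traffic to a login page, so results can vary):

import re
import requests

POST_URL = "https://www.instagram.com/p/EXAMPLE/"  # hypothetical post URL

USER_AGENTS = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
}

for name, user_agent in USER_AGENTS.items():
    response = requests.get(POST_URL, headers={"User-Agent": user_agent}, timeout=10)
    # Pull whatever sits inside the <title> tag of the returned HTML.
    match = re.search(r"<title[^>]*>(.*?)</title>", response.text, re.IGNORECASE | re.DOTALL)
    print(name, "->", match.group(1).strip() if match else "(no title found)")

Google’s Rich Result Test fetches pages from Google’s own infrastructure, so it can surface bot-only markup that a simple user-agent spoof from a home connection might not.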

When I looked at the code for these pages, I saw that Instagram was also generating long descriptions for posts without the user’s knowledge, like: “Seattle’s cosplay photography is a treasure trove of inspiration for fans of the genre. Check out these real-life cosplay locations and photos taken by @mrdangphotos. From costumes to locations, get the scoop on how to recreate these looks and capture your own cosplay moments in Seattle.”

Neither the generated headlines nor the descriptions are the alternative text (alt text) that Instagram automatically generates for accessibility reasons. To create alt text, Instagram uses computer vision and artificial intelligence to automatically create a description of an image that people who are blind or have low vision can access with a screen reader. Sometimes the alt text Instagram generates appears under the headline in Google Search results; at other times, generated description copy that is not the alt text appears in the same place. We don’t know exactly how Instagram is creating these headlines, but it could use similar technology.

“The larger implications are terrible—search results could show inaccurate results that are reputationally damaging or promulgating a falsehood that actively harms someone who doesn't drill down,” VanderMeer said. “And we all know we live in a world where often people are just reading the headline and first couple of paragraphs of an article, so it's possible something could go viral based on a factual misunderstanding.”

Update: This article was updated with comment from Meta.




Google is hosting a CBP app that uses facial recognition to identify immigrants, while simultaneously removing apps that report the location of ICE officials because Google sees ICE as a vulnerable group. “It is time to choose sides; fascism or morality? Big tech has made their choice.”


Google Has Chosen a Side in Trump's Mass Deportation Effort


Google is hosting a Customs and Border Protection (CBP) app that uses facial recognition to identify immigrants and tell local cops whether to contact ICE about them, while simultaneously removing apps designed to warn local communities about the presence of ICE officials. ICE-spotting app developers tell 404 Media that the decision to host CBP’s new app, and Google’s description of ICE officials as a vulnerable group in need of protection, show that Google has made a choice about which side to support during the Trump administration’s violent mass deportation effort.

Google removed certain apps used to report sightings of ICE officials, and “then they immediately turned around and approved an app that helps the government unconstitutionally target an actual vulnerable group. That's inexcusable,” Mark, the creator of Eyes Up, an app that aims to preserve and map evidence of ICE abuses, said. 404 Media only used the creator’s first name to protect them from retaliation. Their app is currently available on the Google Play Store, but Apple removed it from the App Store.

“Google wanted to ‘not be evil’ back in the day. Well, they're evil now,” Mark added.

💡
Do you know anything else about Google's decision? I would love to hear from you. Using a non-work device, you can message me securely on Signal at joseph.404 or send me an email at joseph@404media.co.

The CBP app, called Mobile Identify and launched last week, is for local and state law enforcement agencies that are part of an ICE program that grants them certain immigration-related powers. The 287(g) Task Force Model (TFM) program allows those local officers to make immigration arrests during routine police enforcement, and “essentially turns police officers into ICE agents,” according to the New York Civil Liberties Union (NYCLU). At the time of writing, ICE has TFM agreements with 596 agencies in 34 states, according to ICE’s website.





Ahead of the European Union's Regulation on Transparency and Targeting of Political Advertising, Google's Ad Transparency Center no longer shows political ads from any country in the EU.




An example of AI attempting to summarize nuanced reviews of Hitler's Nazi manifesto turned into an example of algorithms eating themselves.




Google's AI search feature is telling people searching for answers about using vibrating sex toys during pregnancy that they should consider it for counseling children, instead.
