Allegiant Pilots Prepare For 'No Confidence' Vote Against Airline Leadership
Allegiant Air flight crews are looking to oust company leadership. Here's why.
Air Canada Resuming Flights, As Flight Attendant Strike Ends
Air Canada will be resuming operations following a flight attendant strike that lasted nearly four days. Here's what to expect.
Xbox Is Investing In AI For Their Next Gen Console
Microsoft VP of Next Generation has revealed that Xbox is investing heavily in AI, as much as rendering tech and their AMD chips.
Nvidia Claims GeForce Now Outperforms PS5 Pro Now
Nvidia has revealed a huge technical upgrade to GeForce Now cloud streaming, showing how it outperforms PS5 Pro on several fronts.
The AI company Perplexity is complaining their plagiarism bot machine cannot bypass Cloudflare's firewall
Perplexity Says Cloudflare Is Blocking Legitimate AI Assistants
Perplexity defends its AI assistants against Cloudflare's claims, arguing that they are not web crawlers but user-triggered agents.
Roger Montti (Search Engine Journal)
Helge Schneider – "The Klimperclown" (2025)
They got everything right! What else am I supposed to write about the birthday film that SWR, on behalf of ARD, had dedicated to the greatest living genius from Mülheim? Given the impossibility of the task, they fortunately decided collectively to let the artist make the work himself, before the broadcaster exposed itself to the embarrassment of yet another public-broadcasting hagiography that would have amounted to no more than a recycling of old talk shows and sketches. (ARD, new!)
Practical materials (courses, tutorials, etc.) for learning how to do CSRD reporting?
I'm asking here, where many of you are probably interested in the topic.
Do you know of any good resources, in Italian or English, for learning how to report non-financial data under the EU's CSRD directive on non-financial (including environmental) reporting within the EEA?
I'm looking for courses, tutorials, and other GENUINELY informative materials.
Searching online, I only found a plethora of articles written with ChatGPT that don't lead anywhere.
Hiding secret codes in light protects against fake videos
A team of Cornell computer science researchers has developed a way to “watermark” light in videos, which they can use to detect if video is fake or has been manipulated, another potential tool in the fight against misinformation.
Cornell Chronicle
Intel Outside: Hacking every Intel employee and various internal websites
Hardcoded credentials, pointless encryption, and generous APIs exposed details of every employee and made it possible to break into internal websites.
Eaton (eaton-works.com)
Softbank bets $2 billion on Intel having a future
Takes two percent stake as rumours swirl Uncle Sam could do something similar
Simon Sharwood (The Register)
US FTC sues ticket reseller for evading Taylor Swift's Eras tour ticket limits
The U.S. Federal Trade Commission sued ticket reseller Key Investment Group for evading purchasing limits to buy up thousands of tickets to live events including Taylor Swift's Eras tour and resell them at a markup, according to a complaint filed in Maryland federal court on Monday.
2 dead, 5 injured after train hits railway workers in Cheongdo in South Korea
They were conducting post-flood safety inspections when the incident occurred.
Archived version: archive.is/newest/straitstimes…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Microsoft could be working on a cheaper Xbox Cloud Gaming plan
Microsoft is reportedly considering an Xbox Cloud Gaming-only subscription tier aimed at gamers who don't own an Xbox console at all.
https://www.neowin.net/news/microsoft-could-be-working-on-a-cheaper-xbox-cloud-gaming-plan/
UK backs down in Apple privacy row, US says
UK authorities have demanded access to Apple users' protected files when required for investigations.
Natural compound found in popular hot drink could protect brain against Alzheimer’s
Findings point to promising strategies to rescue nerves in the brain’s hippocampus from ageing and Alzheimer’s
Archived version: archive.is/newest/independent.…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
India's Modi to meet China's top diplomat as Asian powers rebuild ties
Indian Prime Minister Narendra Modi will meet China's top diplomat, signaling easing tensions between the two nations.
Archived version: archive.is/newest/apnews.com/a…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
UK | Government must stop children using VPNs to dodge age checks on porn sites, commissioner demands
Disclaimer: The article is sponsored by Proton.
Dame Rachel de Souza warns it is ‘absolutely a loophole that needs closing’ as new report finds proportion of children who report accessing pornography online has increased over last two years
Syria: Israel army raids homes in Quneitra countryside
Syrian media reported on Monday that Israeli forces carried out a raid inside the village of Ain Ziwan in southern Quneitra countryside, searching several civilian homes.
Archived version: archive.is/newest/middleeastmo…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Firefox advances privacy for Chinese, Japanese, and Korean users
Mozilla has begun rolling out Chinese, Japanese, and Korean translation support in Firefox. It's privacy-friendly and even works offline.
https://www.neowin.net/news/firefox-advances-privacy-for-chinese-japanese-and-korean-users/
Norwegian wealth fund excludes 6 Israeli companies from its portfolio
The Norwegian sovereign wealth fund, the largest in the world, said on Monday that it has decided to exclude six Israeli companies from its investment portfolio.
Archived version: archive.is/newest/middleeastmo…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Technos Media: Your Gateway to Innovation and Insights
Flight attendant union leaders ‘ready to go to jail’ as Air Canada strike outlawed
Arbitrator orders 10,000 striking staff back to work after government intervenes – unconstitutionally, union says
Archived version: archive.is/20250818202852/theg…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Air Canada and flight attendants union resume talks for the first time since strike began
Air Canada and the union representing 10,000 flight attendants have resumed talks for the first time since their strike began three days ago.
CUPE pushes on with Air Canada strike in defiance of order, says defending rights
It's still not clear when Air Canada might start flying again as the flight attendants union resolved Monday to push on with a strike but the airline
Wings Staff (Wings Magazine)
Trump interrupts talks with European leaders to call Putin, says EU diplomat
U.S. President Donald Trump has interrupted his talks in Washington with European leaders to call Russian President Vladimir Putin, an EU diplomat told Reuters on Monday.
Archived version: archive.is/20250818230157/reut…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Japan's 1st osmotic power plant begins operating in Fukuoka
Japan's first osmotic power plant that uses the difference in salt concentration between seawater and fresh water to generate electricity began operations in early August in a southwestern prefecture.
Australia | Sydney developer gets 'slap on the wrist' after illegally clearing hundreds of trees to build $3 million mansion
The details of property developer Amir Abu Abara's lengthy court case can only now be revealed — five years after he illegally cleared land at Barden Ridge in Sydney's south.
Archived version: archive.is/20250818005618/abc.…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Ethan Rix (Australian Broadcasting Corporation)
‘Pray for rain’: wildfires in Canada are now burning where they never used to
Canada’s response to the extreme weather threat is being upended as the traditional epicentre of the blazes shifts as the climate warms
Archived version: archive.is/20250818190521/theg…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Piracy surges as streaming costs drive viewers away
cross-posted from: programming.dev/post/35892866
::: spoiler Comments
- Reddit.
:::
Republished here, as AI content is in the Public Domain. References are available in the original article.
Frustrated by rising subscription costs and fragmented content availability, viewers worldwide are returning to piracy at unprecedented levels, reversing years of progress made by affordable streaming services. Recent data from London-based monitoring firm MUSO shows piracy visits skyrocketed from 130 billion in 2020 to 216 billion by 2024, with the industry facing projected losses exceeding $113 billion.
Subscription Fatigue Drives Digital Exodus
The streaming landscape has transformed from Netflix's early promise of "everything in one place" into what critics call "Cable 2.0", a fractured ecosystem requiring multiple subscriptions. According to The Guardian, the average European household now spends close to €700 annually on three or more video-on-demand subscriptions. With Netflix's standard plan reaching $15.49 monthly and competitors following suit, consumers are increasingly viewing piracy as a rational alternative.
"Piracy is not a pricing issue, it's a service issue," Valve co-founder Gabe Newell observed in 2011, a prediction that appears prophetic as streaming platforms struggle with content fragmentation and rising prices. In Sweden, birthplace of both Spotify and The Pirate Bay, 25% of people surveyed admitted to pirating content in 2024, predominantly driven by those aged 15 to 24.
Content Wars Create Consumer Casualties
The fragmentation crisis has worsened as studios create exclusive content silos. Viewers face scenarios where favorite shows vanish from one platform only to appear on another, or require separate purchases despite existing subscriptions. Even purchased content can become unavailable due to licensing disputes, prompting consumer lawsuits against platforms like Amazon Prime Video.
MUSO data reveals that unlicensed streaming now accounts for 96% of all TV and film piracy, representing a fundamental shift in how content theft occurs. Modern pirates leverage sophisticated tools including AI-driven search engines and encrypted networks that adapt faster than anti-piracy measures can respond.
Industry Scrambles for Solutions
Streaming executives are experimenting with bundled offerings and cracking down on password sharing, but these measures often backfire by further alienating users. According to Antenna research, one-quarter of U.S. streamers are "chronic churners," frequently canceling subscriptions due to cost and frustration.
The resurgence marks a stark reversal from the mid-2010s when convenient, affordable streaming services nearly eliminated piracy. As one industry analyst noted, studios have created "artificial scarcity in a digital world that promised abundance", suggesting that without addressing core affordability and access issues, the piracy revival may continue reshaping entertainment consumption patterns.
Alleged Nintendo Switch 2 Emulator "Maxim" Boots Mario Kart World
cross-posted from: programming.dev/post/35909134
::: spoiler Comments
- Reddit.
:::
Source: Maxim Emulator Tweet on Twitter.
Hell yeah.
I didn't buy a Switch 1 and have no plans to buy a Switch 2. Spent many hours playing Smash on Yuzu though!
My last console was a PS3, because that's the last one that didn't force me to pay an extra fee to use my own internet connection.
Fuck greed. Fuck useful idiots. Fuck nintendo.
UN debates future withdrawal of Lebanon peacekeeping force
The United Nations Security Council began to debate on Monday a resolution drafted by France to extend the UN peacekeeping force in south Lebanon for a year with the ultimate aim of withdrawing it.
Archived version: archive.is/newest/middleeastey…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
AI Is a Mass-Delusion Event
::: spoiler Disable JavaScript to Access.
1. Open Chrome Settings: Click the three-dot menu (Customize and control Google Chrome) in the top-right corner and select "Settings".
2. Navigate to Site Settings: Go to "Privacy and security" and then click on "Site settings".
3. Find JavaScript Settings: Scroll down to the "Content" section and click on "JavaScript".
4. Disable JavaScript: Toggle the switch to "Don't allow sites to use JavaScript".
:::
::: spoiler Comments
- Lobsters.
- Hacker News.
:::
Senior Israeli official flees US following arrest over paedophilia
Tom Alexandrovich was arrested along with seven other suspects in a sting targeting 'child sex predators'
Archived version: archive.is/20250817170251/midd…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Israeli media has reported that a senior official in the Israel National Cyber Directorate was arrested in Las Vegas on suspicion of online paedophilia.
Nadav Rapaport (Middle East Eye)
kittenzrulz123
in reply to Davriellelouna • • •
null
in reply to Davriellelouna • • •
Sturgist
in reply to null • • •
💁u
Here, you dropped this!
Wispy2891
in reply to Davriellelouna • • •
WolfLink
in reply to Davriellelouna • • •
pyre
in reply to WolfLink • • •
int32
in reply to pyre • • •
pressanykeynow
in reply to int32 • • •
int32
in reply to pressanykeynow • • •
pressanykeynow
in reply to int32 • • •
int32
in reply to pressanykeynow • • •
then make all links to your website link to that snapshot, and turn your server off.
pressanykeynow
in reply to int32 • • •
int32
in reply to pressanykeynow • • •
turmoil
in reply to int32 • • •
int32
in reply to turmoil • • •
sorry archive.org, I promise I'll donate ❤️
Block Cloudflare MITM Attack – Get this Extension for 🦊 Firefox (en-US)
addons.mozilla.org
oppy1984
in reply to pyre • • •
ubergeek
in reply to oppy1984 • • •
JcbAzPx
in reply to ubergeek • • •
oppy1984
in reply to ubergeek • • •
I get the centralization concerns, but I would think that's on the consumer since there are other options. As for the fascist content, as another commenter said, they could risk their safe harbor if they started regulating content that they weren't legally required to regulate.
Just my thoughts.
TheGrandNagus
in reply to Davriellelouna • • •
sunbeam60
in reply to TheGrandNagus • • •
They’re not. They’re using this as an excuse to become paid gatekeepers of the internet as we know it. All that’s happening is that Cloudflare is using this to maneuver into a position where they can say “nice traffic you’ve got there - would be a shame if something happened to it”.
AI companies are crap.
What Cloudflare is doing here is also crap.
And we’re cheering it on.
DreamlandLividity
in reply to TheGrandNagus • • •
floquant
in reply to Davriellelouna • • •
BigFig
in reply to floquant • • •
Tollana1234567
in reply to BigFig • • •
☂️-
in reply to Tollana1234567 • • •
scarabic
in reply to BigFig • • •
panda_abyss
in reply to Davriellelouna • • •
I actually agree with them
This feels like Cloudflare trying to collect rent from both sides instead of doing what’s best for the website owners.
There is a problem with AI crawlers, but these technologies are essentially doing a search, fetching several pages, scanning/summarizing them, then presenting the findings to the user.
I don’t really think that’s wrong, it’s just a faster version of rummaging through the SEO shit you do when you Google something.
(I’ve never used Perplexity; I do use Kagi’s ki assistant for similar searches. It runs 3 searches, scans the top results, and then provides citations.)
drspod
in reply to panda_abyss • • •
panda_abyss
in reply to drspod • • •
AstralPath
in reply to panda_abyss • • •
Pennomi
in reply to AstralPath • • •
Tollana1234567
in reply to Pennomi • • •
kopasz7
in reply to panda_abyss • • •
Search engines have been doing relatively fine for decades now. But the crawlers from AI companies basically DDoS hosts in comparison, sending so many requests in such a short interval. They also crawl dynamic links that are expensive to render compared to a static page, ignore robots.txt entirely, or even use it to discover unlinked pages.
Servers have finite resources, especially self-hosted sites, while AI companies have disproportionately more at their disposal, easily grinding other systems to a halt by overwhelming them with requests.
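For reference, a minimal sketch of what honoring robots.txt looks like on the crawler side, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot/0.1` agent name, and the example.org URLs are made up for illustration; this is not how any particular AI crawler works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a small self-hosted site might serve.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
Disallow: /private/
"""

AGENT = "ExampleBot/0.1"  # made-up crawler name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the site's rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler checks every URL before requesting it ...
static_ok = may_fetch(parser, AGENT, "https://example.org/posts/1")
dynamic_ok = may_fetch(parser, AGENT, "https://example.org/search?q=x")

# ... and waits at least this many seconds between requests.
delay_seconds = parser.crawl_delay(AGENT)
```

The point of the sketch is that the check is cheap and entirely voluntary: nothing stops a crawler from skipping `may_fetch` or ignoring `delay_seconds`, which is the behavior being complained about.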
Tollana1234567
in reply to kopasz7 • • •
Ekybio
in reply to Davriellelouna • • •
panda_abyss
in reply to Ekybio • • •
Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service is to help protect against DDoS and malicious traffic.
A few weeks ago Cloudflare announced they were going to block AI crawling (good, in my opinion). However, they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.
This is a response to that from Perplexity, who run an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement, and Cloudflare accused them of “stealth scraping” and ignoring robots.txt and other things.
very_well_lost
in reply to panda_abyss • • •
I think it's also worth pointing out that all of the big AI companies are currently burning through cash at an absolutely astonishing rate, and none of them are anywhere close to being profitable. So pay-walling the data they use is probably gonna be pretty painful for their already-tortured bottom line (good).
Tollana1234567
in reply to very_well_lost • • •
BetaDoggo_
in reply to Ekybio • • •
Perplexity (an "AI search engine" company with 500 million in funding) can't bypass Cloudflare's anti-bot checks. For each search, Perplexity scrapes the top results and summarizes them for the user. Cloudflare intentionally blocks Perplexity's scrapers because they ignore robots.txt and mimic real users to get around Cloudflare's blocking features. Perplexity argues that their scraping is acceptable because it's user-initiated.
Personally I think Cloudflare is in the right here. The scraped sites get 0 revenue from Perplexity searches (unless the user decides to go through the sources section and click the links) and Perplexity's scraping is unnecessarily traffic-intensive since they don't cache the scraped data.
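To illustrate the caching point, here is a rough sketch of a time-limited cache wrapped around a fetch function, so repeated searches that hit the same page within a short window cause one request instead of many. The `TTLCache` class, the five-minute default, and the injected `fetch` and `clock` callables are all assumptions for the example; nothing here describes Perplexity's actual pipeline.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache fetched pages for a limited time to avoid hitting the origin repeatedly."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # function that actually downloads a URL
        self._ttl = ttl_seconds  # how long a cached copy stays fresh
        self._clock = clock      # injectable so the cache can be tested offline
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # still fresh: no request reaches the site
        body = self._fetch(url)  # missing or stale: fetch once and remember it
        self._store[url] = (now, body)
        return body
```

With something like this in front of the scraper, a thousand users asking about the same article within the TTL would cost the origin server a single request.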
lividweasel
in reply to BetaDoggo_ • • •
That seems almost maliciously stupid. We need to train a new model. Hey, where’d the data go? Oh well, let’s just go scrape it all again. Wait, did we already scrape this site? No idea, let’s scrape it again just to be sure.
rdri
in reply to lividweasel • • •
ubergeek
in reply to rdri • • •
I think it boils down to "consent" and "remuneration".
I run a website that I do not consent to being accessed for LLMs. However, should LLMs use my content, I should be compensated for such use.
So, these LLM startups ignore both consent and the idea of remuneration.
Most of these concepts have already been worked out in law if we treat websites as akin to real estate: the typical trespass laws, compensatory usage, and hell, even eminent domain if needed (i.e., a city government can "take over" the boosted post feature to make sure alerts get pushed as widely and quickly as possible).
rdri
in reply to ubergeek • • •
That all sounds very vague to me, and I don't expect it to be captured properly by law any time soon. Being accessed for an LLM? What does it mean for you and how is it different from being accessed by a user? Imagine you host a weather forecast. If that information is public, what kind of compensation do you expect from anyone or anything who accesses that data?
Is it okay for a person to access your site? Is it okay for a script written by that person to fetch data every day automatically? Would it be okay for a user to dump a page of your site with a headless browser? Would it be okay to let an LLM take a look at it to extract info required by a user? Have you heard about the changedetection.io project? If some of these sound unfair to you, you might want to put DRM on your data or something.
Would you expect compensation from me after reading your comment?
ubergeek
in reply to rdri • • •
It already has been captured properly in law in most places. We can use the US as an example: both intellectual property and real property already have laws that cover these very items.
Well, does a user burn up gigawatts of power to access my site every time? That's a huge difference.
Depends on the terms of service I set for that service.
Sure!
Sure! As long as it doesn't cause problems for me, the creator and hoster of said content.
See above. Both power usage and causing problems for me.
No. I said, I do not want my content and services to be used by and for LLMs.
I have now. And should a user want to use that service, that service, which charges 8.99/month for it, needs to pay me a portion of that, or risk having their service blocked.
There's no need to use it, as I already provide RSS feeds for my content. Use the RSS feed if you want updates.
Or, I can just block them via a service like Cloudflare. Which I do.
None. Unless you want to access it via an LLM. Then I want compensation for the profit-driven access to my content.
rdri
in reply to ubergeek • • •
And it causes a lot of trouble to many people and pains me specifically. Information should not be gated or owned in a way that would make it illegal for anyone to access it under proper conditions. License expiration causing digital work to die out, DRM causing software to break, idiotic license owners not providing appropriate service, etc.
Doing a GET request doesn't do that.
What kind of problems that would be?
?? How? And what?
You have to agree that at one point "be used by LLM" would not be different from "be used by a user".
It's self-hosted and free.
How does that prohibit usage and processing of your info? That sounds like "I won't be providing any comments on Lemmy website, if you want my opinion you can mail me at a@b.com"
That will never block all of them. Your info will be used without your consent and you will not feel troubled from it. So you might not feel troubled if more things do the same.
What if I use my locally hosted LLM? Anyway, the point is, selling text can't work well, and you're going to spend much more resources on collecting and summarizing data about how your text was used and how others benefited from it, in order to get compensation, than it's worth.
Also, it might be the case that some information is actually worthless when compared to a service provided by things like LLM, even though they use that worthless information in the process.
I'm all for killing off LLMs, btw. Concerns of site makers who think they are being damaged by things like Perplexity are nothing compared to what LLMs do to the world. Maybe laws should instead make it illegal to waste energy. Before energy becomes the main currency.
ubergeek
in reply to rdri • • •
Then you don't believe content creators should have any control over their own works?
The "proper conditions" are deemed by the content creator, not the consumers.
Not at all. It consumes at most, a watt.
Increasing my hosting bill, to accommodate the senseless traffic being sent my way?
Outages for my site, making my content unavailable for legitimate users?
Not at all. LLMs are not users.
If you want, or they charge for the hosted version. If they want to use a paid for version, then they can divert some of that revenue to me, the creator, because without creators, they would have no product.
That's an apples-and-oranges comparison, and you know it.
Perplexity seems to be troubled by it.
If selling text can't work well, then why do LLM products insist on using my text, to sell it?
LLMs are a net negative, as far as costs go. They consume far more in resources than they provide in benefit. If my information was worthless without an LLM, it's worthless with an LLM, therefore, LLMs don't need to access it. Periodt.
The bottom line? Content creators get the first say in how their content is used, and consumed. You are not entitled to their labor, for free, and without condition.
rdri
in reply to ubergeek • • •
ubergeek
in reply to rdri • • •
It's a good thing I don't just block Perplexity, but all of the LLMs.
And I won't comment on the rest of this, but let's consider another form of property: real estate.
You own a plot of land. Should others be able to use it, however they feel, whenever they feel like? Or should you have a say in how it gets used?
If you feel like you should have exclusive say in how real estate you own is used and when and by whom, why is intellectual property any different? There must be value in using it, so what's wrong with revenues generated by that use being shared (at least) with the creator?
Last I checked, I'm not seeing rev shares from any of these LLMs that have certainly used my code and other content to train?
kreskin
in reply to Davriellelouna • • •
Dr. Moose
in reply to kreskin • • •
tempest
in reply to Dr. Moose • • •
Yeah and the worst part is it doesn't fucking work for the one thing it's supposed to do.
The only thing it does is stop the stupidest low-effort scrapers and force the good ones to use a browser.
5gruel
in reply to kreskin • • •
reCAPTCHA v2 does way more than check if the box was checked.
stackoverflow.com/a/27299487
How does Google reCAPTCHA v2 work behind the scenes?
Stack Overflow
kreskin
in reply to 5gruel • • •
Glitchvid
in reply to Davriellelouna • • •
GamingChairModel
in reply to Glitchvid • • •
Encrypt-Keeper
in reply to GamingChairModel • • •
Demdaru
in reply to Encrypt-Keeper • • •
gian
in reply to Demdaru • • •
GamingChairModel
in reply to Encrypt-Keeper • • •
And my point is that defining "unauthorized" to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.
If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA? I think the answer should obviously be "no," and that the way to define "authorization" is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not to put a simple request to the visiting public to please respect the rules of the site.
To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.
Scraping isn't hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn't a crime, even if the website owner didn't intend for site visitors to use that specific method.
United States v. Andrew Auernheimer
Electronic Frontier Foundation
Glitchvid
in reply to GamingChairModel • • •
When sites put challenges like Anubis or other measures to authenticate that the viewer isn't a robot, and scrapers then employ measures to thwart that authentication (via spoofing or other means) I think that's a reasonable violation of the CFAA in spirit, especially since these mass scraping activities are getting attention for the damage they are causing to site operators (another factor in the CFAA, and one that would promote this to felony activity.)
The fact is these laws are already on the books, we may as well utilize them to shut down this objectively harmful activity AI scrapers are doing.
tomalley8342
in reply to Glitchvid • • •
Glitchvid
in reply to tomalley8342 • • •
Do you think DoS/DDoS activities should be criminal?
If you're a site operator and the mass AI scraping is genuinely causing operational problems (not hard to imagine, I've seen what it does to my hosted repositories pages) should there be recourse? Especially if you're actively trying to prevent that activity (revoking consent in cookies, authorization captchas).
In general I think the idea of "your right to swing your fists ends at my face" applies reasonably well here — these AI scraping companies are giving lots of admins bloody noses and need to be held accountable.
I really am amenable to arguments wrt the right to an open web, but look at how many sites are hiding behind CF and other portals, or outright becoming hostile to any scraping at all; we're already seeing the rapid death of the ideal because of these malicious scrapers, and we should be using all available recourse to stop this bleeding.
tomalley8342
in reply to Glitchvid • • •
ubergeek
in reply to tomalley8342 • • •
tomalley8342
in reply to ubergeek • • •
As someone who registered this account on this platform in response to Reddit's API restrictions, it would be hypocritical of me to accept such a belief.
ubergeek
in reply to tomalley8342 • • •
tomalley8342
in reply to ubergeek • • •
I can see that things are the way things are. Accepting it is a different matter.
To me, the "access" that I am referring to (the interface with which you gain access to a service) and that "access" (your behavior once you have gained access to a service) are different topics. The same distinction can be made with the concern over DoS attacks mentioned earlier in the thread. The user's behavior of overwhelming a site's traffic is the root concern, not the interface that the user is connecting with.
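One way to picture that distinction is a rate limiter that only looks at how fast a client sends requests, not at what software it claims to be. Below is a minimal token-bucket sketch; the `TokenBucket` name, the rate and capacity values, and the injectable clock are illustrative assumptions, not a description of what Cloudflare or any site actually runs.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow short bursts but cap the sustained request rate of one client."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self._clock = clock       # injectable so the limiter can be tested offline
        self._tokens = capacity   # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Return True if this request may proceed, False if it should be rejected."""
        now = self._clock()
        # Refill in proportion to the time elapsed, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A browser, a script, and an AI agent that all stay under the limit are treated identically, and any of them gets throttled once it starts hammering the site.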
finitebanjo
in reply to GamingChairModel • • •
GamingChairModel
in reply to finitebanjo • • •Yeah, fuck everything about that. If I'm a site visitor I should be able to do what I want with the data you send me. If I bypass your ads, or use your words to write a newspaper article that you don't like, tough shit. Publishing information is choosing not to control what happens to the information after it leaves your control.
Don't like it? Make me sign an NDA. And even then, violating an NDA isn't a crime, much less a felony punishable by years of prison time.
Interpreting the CFAA to cover scraping is absurd and draconian.
finitebanjo
in reply to GamingChairModel • • •
GamingChairModel
in reply to finitebanjo • • •
finitebanjo
in reply to GamingChairModel • • •
GamingChairModel
in reply to finitebanjo • • •So yeah, I stand by my statement that anyone who thinks this is a crime, or should be a crime, has a poor understanding of either the technology or the law. In this case, even mentioning Alphabet suing for damages means that you don't know the difference between criminal law and civil law.
That's not a crime, and again reveals gaps in your knowledge on this topic.
finitebanjo
in reply to GamingChairModel • • •
GamingChairModel
in reply to finitebanjo • • •Who said anything about DDoS? I'm using ad blockers and saving/caching/archiving websites with a single computer, and not causing damage. I'm just using the website in a way the owner doesn't like. That's not a crime, nor should it be.
finitebanjo
in reply to GamingChairModel • • •We did
GamingChairModel
in reply to finitebanjo • • •
finitebanjo
in reply to GamingChairModel • • •You appear to have misread
YOU caused google lost ad revenue
GOOGLE's Crawlers have crippled sites
poopkins
in reply to Davriellelouna • • •I've developed my own agent for assisting me with researching a topic I'm passionate about, and I ran into the exact same barrier: Cloudflare intercepts my request and is clearly checking if I'm a human using a web browser. (For my network requests, I've defined my own user agent.)
So I use that as a signal that the website doesn't want automated tools scraping their data. That's fine with me: my agent just tells me that there might be interesting content on the site and gives me a deep link. I can extract the data and carry on my research on my own.
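A minimal sketch of that back-off logic, for anyone curious. It assumes the challenge can be spotted from the response alone: as far as I know Cloudflare marks challenge pages with a `cf-mitigated: challenge` header, and the status-code fallback is only a guess. The function names and the shape of the returned dict are hypothetical.

```python
def looks_like_challenge(status, headers):
    """Heuristic: does this response look like a bot challenge?"""
    normalized = {k.lower(): v.lower() for k, v in headers.items()}
    # Assumed signal: Cloudflare tags challenge pages with this header.
    if normalized.get("cf-mitigated") == "challenge":
        return True
    # Fallback guess: a refusal served by Cloudflare itself.
    return status in (403, 429, 503) and "cloudflare" in normalized.get("server", "")


def handle_response(url, status, headers, body):
    """Stop and hand the human a deep link instead of working around the block."""
    if looks_like_challenge(status, headers):
        return {"action": "defer_to_human", "deep_link": url}
    return {"action": "extract", "content": body}


blocked = handle_response(
    "https://example.com/post/1", 403, {"CF-Mitigated": "challenge"}, "")
normal = handle_response(
    "https://example.com/post/2", 200, {"Server": "nginx"}, "<html>ok</html>")
```

The design choice is that a challenge is treated as a "no" from the site, not as an obstacle to get around.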
I completely understand where Perplexity is coming from, but at scale, implementations like ~~this~~ Perplexity's are awful for the web.
(Edited for clarity)
IphtashuFitz
in reply to poopkins • • •I hate to break it to you but not only does Cloudflare do this sort of thing, but so does Akamai, AWS, and virtually every other CDN provider out there. And far from being awful, it’s actually protecting the web.
We use Akamai where I work, and they inform us in real time when a request comes from a bot, and they further classify it as one of a dozen or so bots (search engine crawlers, analytics bots, advertising bots, social networks, AI bots, etc). It also informs us if it’s somebody impersonating a well known bot like Google, etc. So we can easily allow search engines to crawl our site while blocking AI bots, bots impersonating Google, and so on.
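Roughly, the policy on our side boils down to something like the sketch below. To be clear, this is not Akamai's API: the `BotSignal` fields and the category names are hypothetical stand-ins for whatever classification the CDN hands over, and the "challenge" fallback for unknown bots is my own assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical category names, chosen for illustration only.
ALLOWED_CATEGORIES = {"search_engine"}
BLOCKED_CATEGORIES = {"ai_bot"}


@dataclass(frozen=True)
class BotSignal:
    is_bot: bool
    category: Optional[str] = None   # e.g. "search_engine", "ai_bot"
    impersonating: bool = False      # claims to be a well-known bot but is not


def decide(signal):
    """Map a bot classification to an action: allow, block, or challenge."""
    if not signal.is_bot:
        return "allow"
    if signal.impersonating:
        return "block"      # fake Googlebot and friends
    if signal.category in ALLOWED_CATEGORIES:
        return "allow"
    if signal.category in BLOCKED_CATEGORIES:
        return "block"
    return "challenge"      # unknown bot category: assumed fallback


decisions = [
    decide(BotSignal(is_bot=False)),
    decide(BotSignal(is_bot=True, category="search_engine")),
    decide(BotSignal(is_bot=True, category="ai_bot")),
    decide(BotSignal(is_bot=True, category="search_engine", impersonating=True)),
    decide(BotSignal(is_bot=True, category="analytics")),
]
```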
poopkins
in reply to IphtashuFitz • • •By "things like this are awful for the web," I meant that automation through AI is awful for the web. It takes away from the original content creators without any attribution and hits their bottom line.
My story was supposed to be one about responsible AI, but somehow I screwed that up in my summary.
sylver_dragon
in reply to Davriellelouna • • •
snooggums
in reply to sylver_dragon • • •
flux
in reply to snooggums • • •This is not about training data, though.
Personally I think that claim is a decent one: user-initiated requests should not be subject to robot limitations, and they are not the source of DDoS attacks on web sites.
I think the solution is quite clear, though: either make use of the user's identity to waltz through the blocks, or even make use of the user's browser to do it. Once a captcha appears, let the user solve it.
Though technically making all this happen flawlessly is quite a big task.
snooggums
in reply to flux • • •They are one of the sources!
The AI scraping when a user enters a prompt is DDOSing sites in addition to the scraping for training data that is DDOSing sites. These shitty companies are repeatedly slamming the same sites over and over again in the least efficient way because they are not using the scraped data from training when they process a user prompt that does a web search.
Scraping once extensively and scraping a bit less but far more frequently have similar impacts.
flux
in reply to snooggums • • •When a user enters a prompt, the backend may retrieve a handful of pages to serve that prompt. It won't retrieve all the pages of a site. That is hardly different from a user using a search engine and opening the 5 topmost links in tabs. If that is not a DoS attack, then an agent doing the same isn't a DDoS attack.
Constructing the training material in the first place is a different matter, but if you're asking about fresh events or new APIs, the training data just doesn't cut it. The training, and consequently the material retrieval for it, was done a long time ago.
Amberskin
in reply to Davriellelouna • • •Uh, are they admitting they are trying to circumvent technological protections set up to restrict access to a system?
Isn’t that a literal computer crime?
utopiah
in reply to Amberskin • • •
Deflated0ne
in reply to utopiah • • •
iamdefinitelyoverthirteen
in reply to utopiah • • •
dinckel
in reply to Amberskin • • •
Silicon
in reply to dinckel • • •
Dr. Moose
in reply to Davriellelouna • • •It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare turnstile just refreshes infinitely and has been doing so for months now.
Cloudflare is the biggest cancer on the web, fucking burn it.
baronofclubs
in reply to Dr. Moose • • •omg ur a hacker
Did you mean Edge on Windows? 'Cause if so, welcome in!
dodos
in reply to Dr. Moose • • •
Yeller_king
in reply to dodos • • •
Dr. Moose
in reply to dodos • • •"Wrong with my setup" - that's not how the internet works.
I'm based in Southeast Asia and often work on the road, so IP rating is probably the final straw in my fingerprint score.
Either way, this should in no way be acceptable.
JcbAzPx
in reply to Dr. Moose • • •
jaemo
in reply to dodos • • •Thirded. All three (Linux, FF, nexus)
ZERO ISSUES.
Dremor
in reply to Dr. Moose • • •Linux and Firefox here. No problem at all with Cloudflare, despite having more or less as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.
Something may be wrong with your setup.
Dr. Moose
in reply to Dremor • • •
Dremor
in reply to Dr. Moose • • •Same goes the other way. Just because it doesn't work for you doesn't mean it should go away.
That technology has its uses, and Cloudflare is probably aware that there are still some false positives, and is probably working on it as we write.
The decision is for the website owner to make, weighing the advantage of filtering out a majority of bots against the disadvantage of losing some legitimate traffic to false positives. If you get a Cloudflare challenge, chances are that the owner decided that the former vastly outweighs the latter.
Now there are some self-hosted alternatives, like Anubis, but business clients prefer SaaS like Cloudflare to having to maintain their own software. Once again, it is their choice and their liberty to do so.
Dr. Moose
in reply to Dremor • • •lmao imagine shilling for corporate Cloudflare like this. Also false positive vs false negative are fundamentally not equal.
The main issue with Cloudflare is that it's mostly bullshit. It does not report any stats to the admins on how many users were rejected or any false positive rates, and happily puts everyone under the "evil bot" umbrella. So people from low trust score environments like Linux or IPs from poorer countries are at a significant disadvantage and left without a voice.
I'm literally a security dev working with Cloudflare anti-bot myself (not by choice). It's a useful tool for corporate but a really fucking bad one for the health of the web, much worse than any LLM agent or crawler, period.
Laser
in reply to Dr. Moose • • •Linux user here, Cloudflare hasn't blocked access to a single page for me unless I use a VPN, which then can trigger it.
Dremor
in reply to Dr. Moose • • •
COASTER1921
in reply to Dremor • • •I suspect a lot of it comes down to your ISP. Like the original commenter, I also frequently can't pass the Cloudflare turnstile when on Wi-Fi, although refreshing the page a few times usually gets me through. Worst case, I switch to my phone's hotspot, where I can pass much more consistently. It's super annoying, and combined with their recent DNS outage it has totally ruined any respect I had for Cloudflare.
Interesting video on the subject: youtu.be/SasXJwyKkMI
CatDogL0ver
in reply to Dr. Moose • • •It happened to me before until I did a Google search. It was my VPN web protection. It was too "overprotective".
Check your security settings, antivirus and VPN
Kissaki
in reply to Davriellelouna • • •So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
Dr. Moose
in reply to Kissaki • • •
ubergeek
in reply to Kissaki • • •
Kissaki
in reply to ubergeek • • •
ubergeek
in reply to Kissaki • • •Except, it's not a live user hitting 10 sites all at the same time, trying to crawl the entire site... Live users cannot do that.
That said, if my robots.txt forbids them from hitting my site, as a proxy, they obey that, right?
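For reference, this is what "obeying robots.txt" looks like from the client side, as a minimal sketch using Python's standard `urllib.robotparser`. The rules and the `PerplexityBot` / `SomeOtherBot` agent names are assumptions for illustration. robots.txt is purely advisory: nothing in the protocol enforces it, so this only shows the check a well-behaved client would run before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: forbid one named bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
# parse() takes the file's lines, so no network access is needed here.
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"
named_bot_ok = parser.can_fetch("PerplexityBot", url)
other_bot_ok = parser.can_fetch("SomeOtherBot", url)
```

Here `named_bot_ok` comes out `False` and `other_bot_ok` comes out `True`. Whether a given proxy or agent actually runs this check is up to whoever operates it.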
lime!
in reply to Kissaki • • •
seraphine
in reply to lime! • • •
lime!
in reply to seraphine • • •i really wish we wouldn't do those. feels too reddity.
but thanks.
seraphine
in reply to lime! • • •
lime!
in reply to seraphine • • •
Kokesh
in reply to Davriellelouna • • •
ubergeek
in reply to Kokesh • • •
FauxLiving
in reply to Davriellelouna • • •The number of people just reacting to the headline in the comments on these kinds of articles is always surprising.
Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.
There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.
Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.
Which is the point of the article and the article’s title.
It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.
ubergeek
in reply to FauxLiving • • •Except, they don't. It's a toggle, available to users, and by default, allows Perplexity's scraping.
snooggums
in reply to FauxLiving • • •Oh fuck off with that AI company propaganda.
The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It's the same fucking thing.
Web crawlers for search engines don't scrape pages every time a user searches like AI does. Both web crawlers and scrapers are bots, and how a human initiates their operation, scheduled or not, doesn't matter as much as the fact that they do things very differently and only one of the two respects robots.txt.
FauxLiving
in reply to snooggums • • •There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.
You either didn’t read the article or are deliberately making bad faith arguments. The entire point of the article is that the traffic that they’re referring to is initiated by a user, just like when you type an address into your browser’s address bar.
This traffic, initiated by a user, creates the same server load as that same user loading the page in a browser.
Yes, mass scraping of web pages creates a bunch of server load. This was the case before AI was even a thing.
This situation is like Cloudflare presenting a captcha in order to load each individual image, CSS or JavaScript asset into a web browser because bot traffic pretends to be a browser.
I don’t think it’s too hard to understand that a bot pretending to be a browser and a human operated browser are two completely different things and classifying them as the same (and captchaing them) would be a classification error.
This is exactly the same kind of error. Even if you personally believe that users using AI tools should be blocked, not everyone has the same opinion. If Cloudflare can’t distinguish between bot requests and human requests then their customers can’t opt out and allow their users to use AI tools even if they want to.
ubergeek
in reply to FauxLiving • • •There is, in scale.
Electricd
in reply to Davriellelouna • • •
tempest
in reply to Electricd • • •
Laser
in reply to tempest • • •
Electricd
in reply to tempest • • •
sandwich.make(bathing_in_bismuth)
in reply to Electricd • • •
iamdefinitelyoverthirteen
in reply to sandwich.make(bathing_in_bismuth) • • •EDIT: It was supposed to say "loops", but I'm keeping it.
Electricd
in reply to Davriellelouna • • •They do have a point though. It would be great to let per-prompt searches go through, but not mass scraping
I believe a lot of websites don't want either though
threeganzi
in reply to Electricd • • •
Electricd
in reply to threeganzi • • •I assume their script does some search engine stuff like querying Google or Bing and then "scrapes" the links it follows
Some Selenium stuff
Jimmycrackcrack
in reply to Davriellelouna • • •
tarknassus
in reply to Davriellelouna • • •