Google is killing the open web
cross-posted from: programming.dev/post/35950567
::: spoiler Comments
- Lobsters;
- Hackernews.
:::
Microsoft's latest Windows 11 24H2 update breaks SSDs/HDDs, may corrupt your data
cross-posted from: programming.dev/post/35948067
Necoru_cat post on X/Twitter, Translated from Japanese.
God damn, after ~20 years of being off Windows, reading about problem after problem on each and every update is exhausting.
How do you all (Windows users) deal with this shit?
This is the first Windows update that has significantly altered how my daily driver laptop works (read: for the worse).
It's too inconvenient to use a Windows computer anymore. I'm switching to Linux
As far as I know, Debian is the "gold standard" for stable Linux, to the point of being one of the most famous distros used on servers as well.
Exceptions are running the unstable/testing versions of Debian, or counting on a Windows program to just work perfectly under Wine (but that is Microsoft-Linux integration, which MS almost always wants to prevent).
In what way? The most stable Linux is far better than what Microsoft could yank out of their AI asses. Debian and Red Hat have been the staple of many servers around the globe. Hell, this Lemmy instance might be on one of those.
It's only when you tinker around too hard and fast that you have problems in Linux. But there are ways to get things back on track easily, compared to Windows.
Again, why are people paying money for this bullshit?
This is just normal and on par for Microsoft. When was the last time they didn't fix a security issue because they didn't want the bad publicity, causing the US government to be hacked?
Oohh, we will never do it again, pinky promise!
Microsoft's evil, but oh my fucking god, they're so incompetent that they can't even be evil without fucking shit up.
Install Linux already, be done with the nonsense.
Every thread has one Linux bro, stumbling around dazed and confused, still searching to understand why people use a different OS. Always asking "Why do people even use that?" Ignorant of the litany of reasons the real world behaves the way it does.
Windows sucks, in a lot of ways, we get it. But holy hell, find a different schtick.
There's a lot more than one of us here, "bro"
Edit: Windows is trash. Fuck Microsoft.
I understand why people use different operating systems. I'll judge you for macOS, but I kinda get it. I think Slackware or something works for some use cases.
Windows is just insane though. You're insane for using it.
Start looking into desktop environments. Everyone's quick to suggest distros, but the DE, imo, is more what matters day to day; most distros just work and will help you grab the same stuff in different background ways and/or with different terminal commands.
They should all have DE options, or have community alternatives of them that come with a certain DE like KDE or GNOME.
- Close to Windows, minimal customization (still more than default Windows): Cinnamon
- iPhone + Cydia, an opinionated base experience with extensions that can completely change the look and add stuff like panels/docks: GNOME + extension store
- Windows but ultra-customizable, tons of settings, directly customizable from the UI itself by right-clicking: KDE Plasma
- Keyboard user, hand always on the keyboard, likes shortcuts and code-editor-based customization with documentation: Hyprland
Solid advice!
And remember that "DE hopping" is much easier than distro hopping, as you can install multiple and try them out without reinstalling your system.
Personally I'm a shill for Plasma, as I think their motto "simple by default, powerful when needed" is very true. Out of the box you get a grandma-ready UX that's pretty intuitive to any Windows or Mac user, but once you start to dig in there are so many "power user" features. Now every time I'm on a different system I instantly miss all the little QoL touches that I never even think about, and almost everything is neatly packaged in the system settings or context menus, without having to install extensions or set up a dozen different components.
Imo Plasma's settings/options can be a bit overwhelming, and Cinnamon can be underwhelming lol. As a former Cydia user, I really like toggleable extensions with individual settings that can be as complex/basic as they need to be.
My main issue with Plasma is I can't stop tinkering with my theme/UI because the settings are so easily accessible. I get distracted easily. GNOME with a few curated extensions helps me focus; I realized that by accident using it, because DaVinci Resolve had issues on KDE Plasma using the global menu (didn't resolve after removing the menu).
I thought I was the only one who found KDE to be far TOO customizable. I used GNOME on openSUSE and actually enjoyed it. Used KDE on Pop! and hated it. Of course, the distro may have played a part in that; Pop never seemed to run right on my dual-graphics Yoga 720. Using Cinnamon with Mint on it now and I like it, but agree about the lack of all desired customization options. I can do about 90% of the tweaks I like to make.
I've never heard of Cydia, though. Of course, I'm like a 110yo on Windows when it comes to Linux usage. I couldn't even get openSUSE to reinstall from a flash drive after testing some other distros; it kept running out of memory when it would attempt to install. I do think it was my favorite flavor of all the ones I tried!
I so much want to, but the programs I run are partly Windows-only. I don't know how to switch yet. Besides that, I tried Linux once but was unable to reach the network drives on my NAS. I tried everything; none of the solutions I found actually worked. I seem to have a curse of running into issues no one else has. I've struggled with that my whole life. Today I spent the entire day fixing Kodi, which suddenly stopped working. None of the solutions on the internet worked. I managed to fix it my own way, eventually. Just to play a video without losing my "videos watched".
MS is working hard to force me though. I'm almost as far as to say goodbye to apps I've used my whole life. Like Directory Opus for example.
May have been this one
theregister.com/2017/11/22/lin…
'Urgent data corruption issue' destroys filesystems in Linux 4.14
Using bcache to speed Linux 4.14? Stop if you want your data to live. Simon Sharwood (The Register)
'Ad Blocking is Not Piracy' Decision Overturned By Top German Court
German publisher Axel Springer, owner of brands including Bild and Die Welt, has been given another opportunity to have ad blocking outlawed on copyright grounds. After a series of defeats in its years-long legal action against the makers of Adblock Plus, the publisher appealed to the Federal Court of Justice. Germany's top court has now overturned a 2023 ruling by the Higher Regional Court of Hamburg, referring the case back for reconsideration of the core issues.
'Ad Blocking is Not Piracy' Decision Overturned By Top German Court - TorrentFreak
Legal action by publisher Axel Springer, which aims to outlaw ad blocking on copyright grounds, has been revived by Germany's top court. Andy Maxwell (TF Publishing)
The Terminal Demise Of Consumer Electronics Through Subscription Services
Open any consumer electronics catalog from around the 1980s to the early 2000s and you are overwhelmed by a smörgåsbord of devices, covering any audio-visual and similar entertainment and hobby nee… Hackaday
Malaysia scraps plans to buy Black Hawk helicopters derided by its King as ‘flying coffins’
Sultan Ibrahim Iskandar said those deciding on military acquisitions must be transparent.
Archived version: archive.is/newest/straitstimes…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Intel data breach: employee data could be accessed via API
cross-posted from: lemmy.zip/post/46676673
Various vulnerabilities in Intel’s internal sites allowed unauthorized users to access the personal data of approximately 270,000 employees, more than the company currently employs. Easy-to-circumvent logins and hard-coded login credentials were the weakest links.
Intel data breach: employee data could be accessed via API - Techzine Global
An Intel data breach exposed the employee data of 270,000 employees via internal portals. Eaton Zveare reported it, but was not rewarded. Erik van Klinken (Techzine)
Unnamed Finnish MP commits suicide in parliament
Finnish Prime Minister Petteri Orpo called the reports "truly sad news"
Archived version: archive.is/newest/euractiv.com…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Switzerland says would grant Putin 'immunity' for peace talks
Cassis stressed he had "repeatedly" made this offer to host during recent talks with Russian Foreign Minister Sergei Lavrov
Archived version: archive.is/newest/euractiv.com…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
British Airways Pilot Suspended After Leaving Cockpit Door Open So Family Could Watch Him Fly
The breach of anti-terrorism laws and security protocols led to a rapid suspension, although the pilot has since been reinstated.
PSA Airlines Flight Attendants Hold “Day Of Action” At 5 Airports In Fight For Better Wages
FAs join together to demand fair compensation.
Allegiant Pilots Prepare For 'No Confidence' Vote Against Airline Leadership
Allegiant Air flight crews are looking to oust company leadership. Here's why.
Air Canada Resuming Flights, As Flight Attendant Strike Ends
Air Canada will be resuming operations following a flight attendant strike that lasted nearly four days. Here's what to expect.
Xbox Is Investing In AI For Their Next Gen Console
Microsoft VP of Next Generation has revealed that Xbox is investing heavily in AI, as much as in rendering tech and their AMD chips.
Nvidia Claims GeForce Now Outperforms PS5 Pro Now
Nvidia has revealed a huge technical upgrade to GeForce Now cloud streaming, showing how it outperforms PS5 Pro on several fronts.
The AI company Perplexity is complaining their plagiarism bot machine cannot bypass Cloudflare's firewall
Perplexity Says Cloudflare Is Blocking Legitimate AI Assistants
Perplexity defends its AI assistants against Cloudflare's claims, arguing that they are not web crawlers but user-triggered agents. Roger Montti (Search Engine Journal)
then make all links to your website link to that snapshot, and turn your server off.
sorry archive.org, I promise I'll donate ❤️
Block Cloudflare MITM Attack – Get this Extension for 🦊 Firefox (en-US)
Download Block Cloudflare MITM Attack for Firefox. Submit to global surveillance or resist. The choice is yours. addons.mozilla.org
I get the centralization concerns, but I would think that's on the consumer, since there are other options. As for the fascist content, as another commenter said, they could risk their safe harbor if they started regulating content that they weren't legally required to regulate.
Just my thoughts.
They’re not. They’re using this as an excuse to become paid gatekeepers of the internet as we know it. All that’s happening is that Cloudflare is using this to maneuver into a position where they can say “nice traffic you’ve got there - would be a shame if something happened to it”.
AI companies are crap.
What Cloudflare is doing here is also crap.
And we’re cheering it on.
I actually agree with them
This feels like cloudflare trying to collect rent from both sides instead of doing what’s best for the website owners.
There is a problem with AI crawlers, but these technologies are essentially doing a search, fetching several pages, scanning/summarizing them, then presenting the findings to the user.
I don’t really think that’s wrong, it’s just a faster version of rummaging through the SEO shit you do when you Google something.
(I’ve never used Perplexity; I do use Kagi’s Ki assistant for similar searches. It runs 3 searches and scans the top results and then provides citations)
Search engines have been going along relatively fine for decades now. But the crawlers from AI companies basically DDoS hosts in comparison, sending so many requests in such a short interval, crawling dynamic links that are expensive to render compared to a static page, ignoring robots.txt entirely, or even using it to discover unlinked pages.
Servers have finite resources, especially self-hosted sites, while AI companies have disproportionately more at their disposal, easily grinding other systems to a halt by overwhelming them with requests.
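For contrast, the "well-behaved crawler" baseline that search engines have followed for decades is easy to sketch. A minimal Python example (the bot name, URLs, and delay are placeholders, not any real crawler's config) that does the two things the comment above says AI crawlers skip: checking robots.txt and rate-limiting itself.

```python
import time
from urllib.robotparser import RobotFileParser

import requests

# Placeholder identity for this sketch; a real crawler would publish this.
USER_AGENT = "ExampleResearchBot/0.1 (contact: admin@example.org)"

def polite_fetch(url, robots_url, delay_seconds=5.0):
    """Fetch a page only if robots.txt allows it, pausing between
    requests so a small self-hosted server isn't overwhelmed."""
    rp = RobotFileParser()
    rp.set_url(robots_url)
    rp.read()  # download and parse the site's robots.txt
    if not rp.can_fetch(USER_AGENT, url):
        return None  # the site asked bots like this one to stay out
    time.sleep(delay_seconds)  # crude fixed-interval rate limit
    return requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)

page = polite_fetch("https://example.org/post/1", "https://example.org/robots.txt")
```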
Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service is to help protect against DDOS and malicious traffic.
A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.
This is a response to that from Perplexity who run an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement and Cloudflare accused them of “stealth scraping” and ignoring robots.txt and other things.
A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.
I think it's also worth pointing out that all of the big AI companies are currently burning through cash at an absolutely astonishing rate, and none of them are anywhere close to being profitable. So pay-walling the data they use is probably gonna be pretty painful for their already-tortured bottom line (good).
Perplexity (an "AI search engine" company with $500 million in funding) can't bypass Cloudflare's anti-bot checks. For each search, Perplexity scrapes the top results and summarizes them for the user. Cloudflare intentionally blocks Perplexity's scrapers because they ignore robots.txt and mimic real users to get around Cloudflare's blocking features. Perplexity argues that their scraping is acceptable because it's user-initiated.
Personally I think Cloudflare is in the right here. The scraped sites get 0 revenue from Perplexity searches (unless the user decides to go through the sources section and click the links) and Perplexity's scraping is unnecessarily traffic intensive since they don't cache the scraped data.
…and Perplexity's scraping is unnecessarily traffic intensive since they don't cache the scraped data.
That seems almost maliciously stupid. We need to train a new model. Hey, where’d the data go? Oh well, let’s just go scrape it all again. Wait, did we already scrape this site? No idea, let’s scrape it again just to be sure.
I think it boils down to "consent" and "remuneration".
I run a website that I do not consent to being accessed for LLMs. However, should LLMs use my content, I should be compensated for such use.
So, these LLM startups ignore both consent, and the idea of remuneration.
Most of these concepts have already been figured out for the purposes of law, if we consider websites much akin to real estate: then the typical trespass laws, compensatory usage, and hell, even eminent domain if needed (i.e., a city government can "take over" the boosted-post feature to make sure alerts get pushed as widely and quickly as possible).
That all sounds very vague to me, and I don't expect it to be captured properly by law any time soon. Being accessed by an LLM? What does it mean for you and how is it different from being accessed by a user? Imagine you host a weather forecast. If that information is public, what kind of compensation do you expect from anyone or anything who accesses that data?
Is it okay for a person to access your site? Is it okay for a script written by that person to fetch data every day automatically? Would it be okay for a user to dump a page of your site with a headless browser? Would it be okay to let an LLM take a look at it to extract info required by a user? Have you heard about changedetection.io project? If some of these sound unfair to you, you might want to put a DRM on your data or something.
Would you expect a compensation from me after reading your comment?
That all sounds very vague to me, and I don’t expect it to be captured properly by law any time soon.
It already has been captured, properly in law, in most places. We can use the US as an example: Both intellectual property and real property have laws already that cover these very items.
What does it mean for you and how is it different from being accessed by a user?
Well, does a user burn up gigawatts of power, to access my site every time? That's a huge difference.
Imagine you host a weather forecast. If that information is public, what kind of compensation do you expect from anyone or anything who accesses that data?
Depends on the terms of service I set for that service.
Is it okay for a person to access your site?
Sure!
Is it okay for a script written by that person to fetch data every day automatically?
Sure! As long as it doesn't cause problems for me, the creator and hoster of said content.
Would it be okay for a user to dump a page of your site with a headless browser?
See above. Both power usage and causing problems for me.
Would it be okay to let an LLM take a look at it to extract info required by a user?
No. I said, I do not want my content and services to be used by and for LLMs.
Have you heard about changedetection.io project?
I have now. And should a user want to use that service, that service, which charges $8.99/month, needs to pay me a portion of that, or risk having their service blocked.
There's no need to use it, as I already provide RSS feeds for my content. Use the RSS feed, if you want updates.
If some of these sound unfair to you, you might want to put a DRM on your data or something.
Or, I can just block them, via a service like Cloud Flare. Which I do.
Would you expect a compensation from me after reading your comment?
None. Unless you're wanting to access it via an LLM. Then I want compensation for the profit-driven access to my content.
Both intellectual property and real property have laws already that cover these very items.
And it causes a lot of trouble to many people and pains me specifically. Information should not be gated or owned in a way that would make it illegal for anyone to access it under proper conditions. License expiration causing digital work to die out, DRM causing software to break, idiotic license owners not providing appropriate service, etc.
Well, does a user burn up gigawatts of power, to access my site every time?
Doing a GET request doesn't do that.
As long as it doesn't cause problems for me, the creator and hoster of said content.
What kind of problems would those be?
Both power usage and causing problems for me.
?? How? And what?
do not want my content and services to be used by and for LLMs.
You have to agree that at one point "be used by an LLM" would not be different from "be used by a user".
which charges $8.99/month
It's self-hosted and free.
Use the RSS feed, if you want updates.
How does that prohibit usage and processing of your info? That sounds like "I won't be providing any comments on Lemmy website, if you want my opinion you can mail me at a@b.com"
I can just block them, via a service like Cloud Flare. Which I do.
That will never block all of them. Your info will be used without your consent and you will not feel troubled from it. So you might not feel troubled if more things do the same.
None. Unless you're wanting to access it via an LLM. Then I want compensation for the profit-driven access to my content.
What if I use my locally hosted LLM? Anyway, the point is, selling text can't work well, and you're going to spend much more resources on collecting and summarizing data about how your text was used and how others benefited from it, in order to get compensation, than it's worth.
Also, it might be the case that some information is actually worthless when compared to a service provided by things like LLM, even though they use that worthless information in the process.
I'm all for killing off LLMs, btw. Concerns of site makers who think they are being damaged by things like Perplexity are nothing compared to what LLMs do to the world. Maybe laws should instead make it illegal to waste energy. Before energy becomes the main currency.
Information should not be gated or owned in a way that would make it illegal for anyone to access it under proper conditions.
Then you don't believe content creators should have any control over their own works?
The "proper conditions" are deemed by the content creator, not the consumers.
Doing a GET request doesn’t do that.
Not at all. It consumes, at most, a watt.
What kind of problems would those be?
Increasing my hosting bill, to accommodate the senseless traffic being sent my way?
Outages for my site, making my content unavailable for legitimate users?
You have to agree that at one point "be used by an LLM" would not be different from "be used by a user".
Not at all. LLMs are not users.
It’s self-hosted and free.
If you want, or they charge for the hosted version. If they want to use a paid for version, then they can divert some of that revenue to me, the creator, because without creators, they would have no product.
How does that prohibit usage and processing of your info? That sounds like “I won’t be providing any comments on Lemmy website, if you want my opinion you can mail me at a@b.com”
That's an apples-and-oranges comparison, and you know it.
That will never block all of them. Your info will be used without your consent and you will not feel troubled from it. So you might not feel troubled if more things do the same.
Perplexity seems to be troubled by it.
What if I use my locally hosted LLM? Anyway, the point is, selling text can't work well, and you're going to spend much more resources on collecting and summarizing data about how your text was used and how others benefited from it, in order to get compensation, than it's worth.
If selling text can't work well, then why do LLM products insist on using my text, to sell it?
Also, it might be the case that some information is actually worthless when compared to a service provided by things like LLM, even though they use that worthless information in the process.
LLMs are a net negative, as far as costs go. They consume far more in resources than they provide in benefit. If my information was worthless without an LLM, it's worthless with an LLM, therefore, LLMs don't need to access it. Periodt.
The bottom line? Content creators get the first say in how their content is used, and consumed. You are not entitled to their labor, for free, and without condition.
LLMs might be worse than those, but Perplexity is certainly a lesser player in the field.
It's a good thing I don't just block Perplexity, but all of the LLMs.
And I won't comment on the rest of this, but let's consider another form of property: real estate.
You own a plot of land. Should others be able to use it, however they feel, whenever they feel like? Or should you have a say in how it gets used?
If you feel like you should have exclusive say in how real estate you own is used and when and by whom, why is intellectual property any different? There must be value in using it, so what's wrong with revenues generated by that use being shared (At least) with the creator?
Last I checked, I'm not seeing rev shares from any of these LLMs that have certainly used my code and other content to train?
Yeah and the worst part is it doesn't fucking work for the one thing it's supposed to do.
The only thing it does is stop the stupidest low effort scrapers and forces the good ones to use a browser.
Recaptcha v2 does way more than check if the box was checked.
How does Google reCAPTCHA v2 work behind the scenes?
This post refers to Google ReCaptcha v2 (not the latest version). Recently Google introduced a simplified "captcha" verification system (video) that enables users to pass the "captcha" just by clic... Stack Overflow
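The "way more" mostly happens on Google's side, but the part a site operator implements is small: the checkbox yields a token in the browser, and the backend verifies that token with Google, which is where the behind-the-scenes risk analysis was already applied. A minimal sketch of that documented server-side verification call (the secret key is a placeholder):

```python
import requests

# Placeholder: your site's reCAPTCHA secret key from the admin console.
RECAPTCHA_SECRET = "your-secret-key"

def verify_recaptcha(token, remote_ip=None):
    """Ask Google whether the token produced by the checkbox widget is valid."""
    payload = {"secret": RECAPTCHA_SECRET, "response": token}
    if remote_ip:
        payload["remoteip"] = remote_ip  # optional extra signal
    r = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data=payload,
        timeout=10,
    )
    return r.json().get("success", False)
```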
gaining unauthorized access to a computer system
And my point is that defining "unauthorized" to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.
If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA? I think the answer should obviously be "no," and that the way to define "authorization" is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not to put a simple request to the visiting public to please respect the rules of the site.
To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.
Scraping isn't hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn't a crime, even if the website owner didn't intend for site visitors to use that specific method.
United States v. Andrew Auernheimer
Andrew “Weev” Auernheimer was convicted of violating the Computer Fraud and Abuse Act ("CFAA") in New Jersey federal court and sentenced to 41 months in federal prison in March of 2013 for revealing to media outlets that AT... Electronic Frontier Foundation
When sites put challenges like Anubis or other measures to authenticate that the viewer isn't a robot, and scrapers then employ measures to thwart that authentication (via spoofing or other means) I think that's a reasonable violation of the CFAA in spirit — especially since these mass scraping activities are getting attention for the damage they are causing to site operators (another factor in the CFAA, and one that would promote this to felony activity.)
The fact is these laws are already on the books, we may as well utilize them to shut down this objectively harmful activity AI scrapers are doing.
Do you think DoS/DDoS activities should be criminal?
If you're a site operator and the mass AI scraping is genuinely causing operational problems (not hard to imagine, I've seen what it does to my hosted repositories pages) should there be recourse? Especially if you're actively trying to prevent that activity (revoking consent in cookies, authorization captchas).
In general I think the idea of "your right to swing your fists ends at my face" applies reasonably well here — these AI scraping companies are giving lots of admins bloody noses and need to be held accountable.
I really am amenable to arguments wrt the right to an open web, but look at how many sites are hiding behind CF and other portals, or outright becoming hostile to any scraping at all; we're already seeing the rapid death of the ideal because of these malicious scrapers, and we should be using all available recourse to stop this bleeding.
How “open” a website is, is up to the owner, and that’s all.
As someone who registered this account on this platform in response to Reddit's API restrictions, it would be hypocritical of me to accept such a belief.
Well, until we abolish capitalism, that’s the state of things.
I can see that things are the way things are. Accepting it is a different matter.
Unless you feel like Nazis MUST be freely given access to everything too?
To me, the "access" that I am referring to (the interface with which you gain access to a service) and that "access" (your behavior once you have gained access to a service) are different topics. The same distinction can be made with the concern over DoS attacks mentioned earlier in the thread. The user's behavior of overwhelming a site's traffic is the root concern, not the interface that the user is connecting with.
to decide for what purpose it gets used for
Yeah, fuck everything about that. If I'm a site visitor I should be able to do what I want with the data you send me. If I bypass your ads, or use your words to write a newspaper article that you don't like, tough shit. Publishing information is choosing not to control what happens to the information after it leaves your control.
Don't like it? Make me sign an NDA. And even then, violating an NDA isn't a crime, much less a felony punishable by years of prison time.
Interpreting the CFAA to cover scraping is absurd and draconian.
That's a crime, yeah, and if Alphabet Co wants to sue you for $1.34 in damages, then they have that right.
So yeah, I stand by my statement that anyone who thinks this is a crime, or should be a crime, has a poor understanding of either the technology or the law. In this case, even mentioning Alphabet suing for damages means that you don't know the difference between criminal law and civil law.
press charges for the criminal act of intentional disruption of services
That's not a crime, and again reveals gaps in your knowledge on this topic.
you will get prison for DDoS in USA
Who said anything about DDoS? I'm using ad blockers and saving/caching/archiving websites with a single computer, and not causing damage. I'm just using the website in a way the owner doesn't like. That's not a crime, nor should it be.
press charges for the criminal act of intentional disruption of services
That's not a crime, and again reveals gaps in your knowledge on this topic.
We did
You appear to have misread
just as we should have the right to sue them if their AI crawlers make our site unusable and plagiarize our work to the effect of thousands of dollars, or even press charges for the criminal act of intentional disruption of services.
YOU caused Google to lose ad revenue
GOOGLE's crawlers have crippled sites
I've developed my own agent for assisting me with researching a topic I'm passionate about, and I ran into the exact same barrier: Cloudflare intercepts my request and is clearly checking if I'm a human using a web browser. (For my network requests, I've defined my own user agent.)
So I use that as a signal that the website doesn't want automated tools scraping their data. That's fine with me: my agent just tells me that there might be interesting content on the site and gives me a deep link. I can extract the data and carry on my research on my own.
I completely understand where Perplexity is coming from, but at scale, implementations like ~~this~~ Perplexity's are awful for the web.
(Edited for clarity)
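For what it's worth, the detection described above takes only a few lines. A minimal sketch, assuming Python requests and a made-up agent name: send an honest User-Agent, and if Cloudflare answers with a challenge (current Cloudflare deployments mark these responses with a cf-mitigated header), back off and hand the user the deep link instead.

```python
import requests

# Hypothetical agent identity, in the spirit of the comment above.
USER_AGENT = "MyResearchAgent/0.1 (+https://example.org/agent-info)"

def fetch_or_defer(url):
    """Fetch a page; if Cloudflare interposes a challenge, give up
    politely and surface a deep link for the human to follow."""
    r = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
    # Challenge responses typically come back 403/503 with this header.
    if r.status_code in (403, 503) and r.headers.get("cf-mitigated") == "challenge":
        print(f"Site doesn't want automated access - open it yourself: {url}")
        return None
    return r.text
```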
I hate to break it to you but not only does Cloudflare do this sort of thing, but so does Akamai, AWS, and virtually every other CDN provider out there. And far from being awful, it’s actually protecting the web.
We use Akamai where I work, and they inform us in real time when a request comes from a bot, and they further classify it as one of a dozen or so bots (search engine crawlers, analytics bots, advertising bots, social networks, AI bots, etc). It also informs us if it’s somebody impersonating a well known bot like Google, etc. So we can easily allow search engines to crawl our site while blocking AI bots, bots impersonating Google, and so on.
What I meant by "things like this are awful for the web" is that automation through AI is awful for the web. It takes away from the original content creators without any attribution and hits their bottom line.
My story was supposed to be one about responsible AI, but somehow I screwed that up in my summary.
This is not about training data, though.
Perplexity argues that Cloudflare is mischaracterizing AI Assistants as web crawlers, saying that they should not be subject to the same restrictions since they are user-initiated assistants.
Personally I think that claim is a decent one: user-initiated requests should not be subject to robot limitations, and are not the source of DDoS attacks on web sites.
I think the solution is quite clear, though: either make use of the user's identity to waltz through the blocks, or even make use of the user's browser to do it. Once a captcha appears, let the user solve it.
Though technically making all this happen flawlessly is quite a big task.
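The "let the user solve it" idea could look roughly like this: a speculative sketch using Playwright that drives a visible browser, so a human can clear any challenge before the agent reads the page. The title check is a fragile heuristic based on Cloudflare's usual interstitial, not an official API.

```python
from playwright.sync_api import sync_playwright

def fetch_via_user_browser(url):
    """Load a page in a browser window the user can see; if a Cloudflare
    interstitial appears, wait until the human has cleared it."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # visible window
        page = browser.new_page()
        page.goto(url)
        # Cloudflare challenge pages are usually titled "Just a moment...";
        # poll until that title is gone, i.e. the user passed the check.
        while "Just a moment" in page.title():
            page.wait_for_timeout(1000)  # milliseconds
        html = page.content()
        browser.close()
        return html
```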
Personally I think that claim is a decent one: user-initiated requests should not be subject to robot limitations, and are not the source of DDoS attacks on web sites.
They are one of the sources!
The AI scraping that happens when a user enters a prompt is DDoSing sites, in addition to the scraping for training data that is DDoSing sites. These shitty companies are repeatedly slamming the same sites over and over again in the least efficient way, because they are not using the scraped training data when they process a user prompt that does a web search.
Scraping once extensively and scraping a bit less but far more frequently have similar impacts.
When a user enters a prompt, the backend may retrieve a handful of pages to serve that prompt. It won't retrieve all the pages of a site. Hardly different from a user using a search engine and opening the five topmost links in tabs. If that is not a DoS attack, then an agent doing the same isn't a DDoS attack.
Constructing the training material in the first place is a different matter, but if you're asking about fresh events or new APIs, the training data just doesn't cut it. The training, and subsequently the material retrieval, was done a long time ago.
Uh, are they admitting they are trying to circumvent technological protections setup to restrict access to a system?
Isn’t that a literal computer crime?
It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites, like nexusmods, just because I run Firefox on Linux. The Cloudflare turnstile just refreshes infinitely, and has for months now.
Cloudflare is the biggest cancer on the web, fucking burn it.
"Wrong with my setup" - thats not how internet works.
I'm based in south east asia and often work on the road so IP rating probably is the final crutch in my fingerprint score.
Either way this should be no way acceptible.
Linux and Firefox here. No problem at all with Cloudflare, despite having more or less as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.
Something may be wrong with your setup.
Same goes the other way. It's not because it doesn't work for you that it should go away.
That technology has its uses, and Cloudflare is probably aware that there are still some false positives, and is probably working on it as we write.
The decision is for the website owner to make, weighing the advantages of filtering out a majority of bots against the disadvantages of losing some legitimate traffic to false positives. If you get a Cloudflare challenge, chances are the owner decided that the former vastly outweighs the latter.
Now there are some self-hosted alternatives, like Anubis, but business clients prefer SaaS like Cloudflare to having to maintain their own software. Once again, it is their choice and liberty to do so.
lmao imagine shilling for corporate Cloudflare like this. Also, false positives and false negatives are fundamentally not equal.
Cloudflare is probably aware that there are still some false positives, and is probably working on it as we write.
The main issue with Cloudflare is that it's mostly bullshit. It does not report any stats to the admins on how many users were rejected, or any false-positive rates, and happily puts everyone under the "evil bot" umbrella. So people from low trust score environments like Linux or IPs from poorer countries are at a significant disadvantage and left without a voice.
I'm literally a security dev working with Cloudflare anti-bot myself (not by choice). It's a useful tool for corporations but a really fucking bad one for the health of the web, much worse than any LLM agent or crawler, period.
So people from low trust score environments like Linux
Linux user here, Cloudflare hasn't blocked access to a single page for me unless I use a VPN, which then can trigger it.
I suspect a lot of it comes down to your ISP. Like the original commenter, I also frequently can't pass Cloudflare turnstile when on Wi-Fi, although refreshing the page a few times usually gets me through. Worst case, on my phone's hotspot I can pass much more consistently. It's super annoying, and combined with their recent DNS outage it has totally ruined any respect I had for Cloudflare.
Interesting video on the subject: youtu.be/SasXJwyKkMI
It happened to me before, until I did a Google search. It was my VPN's web protection; it was too "overprotective".
Check your security settings, antivirus, and VPN.
Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.
So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
Except it's not a live user hitting 10 sites all at the same time, trying to crawl the entire site... Live users cannot do that.
That said, if my robots.txt forbids them from hitting my site, as a proxy, they obey that, right?
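If Perplexity did send honest, identifiable headers, the hoster's side of that decision would be trivial to implement. A minimal sketch using Flask; "PerplexityBot" and "Perplexity-User" are the user-agent tokens Perplexity publicly documents, while treating the list as a denylist is just this example's policy.

```python
from flask import Flask, abort, request

app = Flask(__name__)

# Agents this hypothetical hoster has decided not to serve.
BLOCKED_AGENTS = ("PerplexityBot", "Perplexity-User", "GPTBot")

@app.before_request
def reject_declared_bots():
    ua = request.headers.get("User-Agent", "")
    if any(token in ua for token in BLOCKED_AGENTS):
        abort(403)  # honest bots get a clear refusal instead of a challenge
```

Of course, the complaint against Perplexity is precisely that its fetches allegedly don't identify themselves like this, which makes such simple gating useless and pushes hosters toward Cloudflare-style fingerprinting.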
i really wish we wouldn't do those. feels too reddity.
but thanks.
The amount of people just reacting to the headline in the comments on these kinds of articles is always surprising.
Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.
There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.
Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.
Which is the point of the article and the article’s title.
It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.
Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted
Except, they don't. It's a toggle, available to users, and by default, allows Perplexity's scraping.
But a user initiated operation isn’t the same as a bot.
Oh fuck off with that AI company propaganda.
The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It's the same fucking thing.
Web crawlers for search engines don't scrape pages every time a user searches like AI does. Both web crawlers and scrapers are bots, and how a human initiates their operation, scheduled or not, doesn't matter as much as the fact that they do things very differently and only one of the two respects robots.txt.
There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.
The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It’s the same fucking thing.
You either didn’t read the article or are deliberately making bad faith arguments. The entire point of the article is that the traffic that they’re referring to is initiated by a user, just like when you type an address into your browser’s address bar.
This traffic, initiated by a user, creates the same server load as that same user loading the page in a browser.
Yes, mass scraping of web pages creates a bunch of server load. This was the case before AI was even a thing.
This situation is like Cloudflare presenting a captcha in order to load each individual image, CSS, or JavaScript asset into a web browser, because bot traffic pretends to be a browser.
I don’t think it’s too hard to understand that a bot pretending to be a browser and a human operated browser are two completely different things and classifying them as the same (and captchaing them) would be a classification error.
This is exactly the same kind of error. Even if you personally believe that users using AI tools should be blocked, not everyone has the same opinion. If Cloudflare can’t distinguish between bot requests and human requests then their customers can’t opt out and allow their users to use AI tools even if they want to.
There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.
There is, in scale.
EDIT: It was supposed to say "loops", but I'm keeping it.
They do have a point, though. It would be great to let per-prompt searches go through, but not mass scraping.
I believe a lot of websites don't want either, though.
I assume their script does some search engine stuff, like querying Google or Bing, and then "scrapes" the links it goes on.
Some Selenium stuff.
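A rough sketch of what that guessed-at flow might look like with Selenium; the search URL and CSS selector here are assumptions about Bing's markup and would be fragile in practice.

```python
from urllib.parse import quote_plus

from selenium import webdriver
from selenium.webdriver.common.by import By

def search_and_fetch(query, max_results=5):
    """Run a search in a real browser, take the top organic links,
    then load each one and grab its visible text for summarizing."""
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.bing.com/search?q=" + quote_plus(query))
        # "li.b_algo h2 a" is a guess at Bing's organic-result selector.
        links = [
            a.get_attribute("href")
            for a in driver.find_elements(By.CSS_SELECTOR, "li.b_algo h2 a")
        ][:max_results]
        pages = []
        for url in links:
            driver.get(url)
            pages.append(driver.find_element(By.TAG_NAME, "body").text)
        return pages
    finally:
        driver.quit()
```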
Helge Schneider – „The Klimperclown“ (2025)
Got everything right! What else am I supposed to write about the birthday film that SWR, on behalf of ARD, had dedicated to the greatest Mülheim genius of the present day? Given the impossibility of the assigned task, they fortunately decided collectively to let the artist produce the work himself, sparing the broadcaster the embarrassment of yet another public-broadcasting hagiography that would have been nothing more than a recycling of old talk shows and sketches. (ARD, Neu!)
Tulsi Gabbard, US Director of National Intelligence: UK Withdraws Apple iCloud Backdoor Demand Following US Diplomatic Push
cross-posted from: programming.dev/post/35953366
Source.
Practical materials (courses, tutorials, etc.) for learning to do CSRD reporting?
I'll try asking here, where many of you are probably interested in the topic.
Do you know of good resources, in Italian or English, for learning to report non-financial data under the EU CSRD directive for non-financial (including environmental) reporting in the EEA?
I'm looking for courses, tutorials, and other TRULY informative materials.
Searching online, I've only found a plethora of ChatGPT-written articles that go nowhere.
Hiding secret codes in light protects against fake videos
Hiding secret codes in light protects against fake videos | Cornell Chronicle
A team of Cornell computer science researchers has developed a way to “watermark” light in videos, which they can use to detect if video is fake or has been manipulated, another potential tool in the fight against misinformation. Cornell Chronicle
Intel Outside: Hacking every Intel employee and various internal websites
Hardcoded credentials, pointless encryption, and generous APIs exposed details of every employee and made it possible to break into internal websites. Eaton (eaton-works.com)
Softbank bets $2 billion on Intel having a future
Takes two percent stake as rumours swirl Uncle Sam could do something similar. Simon Sharwood (The Register)
US FTC sues ticket reseller for evading Taylor Swift's Eras tour ticket limits
The U.S. Federal Trade Commission sued ticket reseller Key Investment Group for evading purchasing limits to buy up thousands of tickets to live events including Taylor Swift's Eras tour and resell them at a markup, according to a complaint filed in Maryland federal court on Monday.
2 dead, 5 injured after train hits railway workers in Cheongdo in South Korea
They were conducting post-flood safety inspections when the incident occurred.
Archived version: archive.is/newest/straitstimes…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Microsoft could be working on a cheaper Xbox Cloud Gaming plan
Microsoft is reportedly considering an Xbox Cloud Gaming-only subscription tier that facilitates gamers who don't own an Xbox console at all.
https://www.neowin.net/news/microsoft-could-be-working-on-a-cheaper-xbox-cloud-gaming-plan/
UK backs down in Apple privacy row, US says
UK authorities have demanded access to Apple users' protected files when required for investigations.
Natural compound found in popular hot drink could protect brain against Alzheimer’s
Findings point to promising strategies to rescue nerves in the brain’s hippocampus from ageing and Alzheimer’s
Archived version: archive.is/newest/independent.…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
India's Modi to meet China's top diplomat as Asian powers rebuild ties
Indian Prime Minister Narendra Modi will meet China's top diplomat, signaling easing tensions between the two nations.
Archived version: archive.is/newest/apnews.com/a…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
UK | Government must stop children using VPNs to dodge age checks on porn sites, commissioner demands
Disclaimer: The article is sponsored by Proton.
Dame Rachel de Souza warns it is ‘absolutely a loophole that needs closing’ as new report finds proportion of children who report accessing pornography online has increased over last two years
Syria: Israel army raids homes in Quneitra countryside
Syrian media reported on Monday that Israeli forces carried out a raid inside the village of Ain Ziwan in southern Quneitra countryside, searching several civilian homes.
Archived version: archive.is/newest/middleeastmo…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.
Firefox advances privacy for Chinese, Japanese, and Korean users
Mozilla has begun rolling out Chinese, Japanese, and Korean translation support in Firefox. It's privacy-friendly and even works offline.
https://www.neowin.net/news/firefox-advances-privacy-for-chinese-japanese-and-korean-users/
Norwegian wealth fund excludes 6 Israeli companies from its portfolio
The Norwegian sovereign wealth fund, the largest in the world, said on Monday that it has decided to exclude six Israeli companies from its investment portfolio.
Archived version: archive.is/newest/middleeastmo…
Disclaimer: The article linked is from a single source with a single perspective. Make sure to cross-check information against multiple sources to get a comprehensive view on the situation.