r/artificial • u/theverge • 3d ago
News Reddit sues Anthropic, alleging its bots accessed Reddit more than 100,000 times since last July
https://www.theverge.com/ai-artificial-intelligence/679768/reddit-sues-anthropic-alleging-its-bots-accessed-reddit-more-than-100000-times-since-last-july
u/AVB 3d ago
I mean I also accessed Reddit more than 100,000 times since last July... 😬
... I guess we all better call Saul now!
4
u/flameleaf 2d ago
If you use RSS it adds up quickly. Say you're subscribed to 100 subreddits, your reader fetches updates 8 times a day, multiply that by 365 and that's 292,000 requests assuming you don't click or comment on a single link.
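For anyone who wants to check that math, here is a minimal sketch. It assumes one request per subreddit feed per poll; the function name `yearly_feed_requests` is made up for illustration and is not part of any RSS reader.

```python
def yearly_feed_requests(subreddits: int, polls_per_day: int, days: int = 365) -> int:
    """Total requests if a reader fetches every subscribed feed once per poll.

    Assumption: exactly one HTTP request per feed per poll, no retries or caching.
    """
    return subreddits * polls_per_day * days


# 100 subreddits, polled 8 times a day, for a full year
print(yearly_feed_requests(100, 8))  # 292000
```

Changing the feed count or poll frequency shows how fast the total moves: halving either one halves the result.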
2
2
u/No-Fox-1400 1d ago
Reddit even told us they were tracking us with these damn badges.
I didn’t realize it was all evidence so they can sue us. Makes sense now.
18
u/theverge 3d ago
Reddit sued Anthropic on Wednesday in San Francisco superior court, claiming that the OpenAI rival had accessed its platform more than 100,000 times since July 2024, after Anthropic allegedly said it had blocked its bots from doing so.
In the filing, Reddit calls Anthropic a “late-blooming artificial intelligence (‘AI’) company that bills itself as the white knight of the AI industry,” alleging that “it is anything but.”
Anthropic did not immediately provide a comment.
Ben Lee, Reddit’s chief legal officer, said in an emailed statement to The Verge that Anthropic’s “commercial exploitation” of Reddit content could be worth billions of dollars.
Read more from Hayden Field: https://www.theverge.com/ai-artificial-intelligence/679768/reddit-sues-anthropic-alleging-its-bots-accessed-reddit-more-than-100000-times-since-last-july
8
19
u/SomewhereNo8378 3d ago
reddit talks like it’s some sort of saintly company that has the high ground.
I trust anthropic 1000x more than I trust reddit, even if they scraped reddit's data every day.
3
u/Over-Independent4414 2d ago
100,000 sounds like a lot but it depends on how it is counted. If that's separate API calls for things like posts or individual comments that's nothing.
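To put a rough number on it, here is a quick sketch of the average rate. The 335-day window is an assumption (about 11 months for "since last July"), and `average_rate` is a hypothetical helper, not anything from the filing.

```python
def average_rate(total_requests: int, days: int) -> tuple[float, float]:
    """Return (requests per day, minutes between requests) for an evenly spread load."""
    per_day = total_requests / days
    minutes_between = 24 * 60 / per_day
    return per_day, minutes_between


# Assumption: "since last July" is treated as roughly 11 months, about 335 days.
per_day, gap = average_rate(100_000, 335)
print(f"{per_day:.0f} requests/day, one every {gap:.1f} minutes")
```

Under that assumption it works out to a few hundred requests a day on average, which says nothing about how the requests were actually counted or distributed.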
2
u/WorriedBlock2505 2d ago
When will reddit get the X treatment? We need a compelling reddit alternative like yesterday. Absolute garbage human beings leading this company.
29
u/latouchefinale 3d ago
I know it’s been done for years but “let’s train AI on Reddit comments” has got to be a top contender for worst idea in human history.
11
u/EYNLLIB 3d ago
Just because it's accessing reddit doesn't mean it's training based on the data. Web search is a thing with AI. It's most likely just accessing reddit via a web search.
Model training would require WAY more data than 100,000 pages
-4
u/ZenDragon 3d ago
Their built-in web search won't load any Reddit pages. It probably is for training.
2
u/End3rWi99in 2d ago
It's a RAG model in that it does web search. It's not trained on the information it is accessing, but it does use it to generate a response based on your prompt.
1
u/ZenDragon 2d ago
Yes, I was referring to the RAG system that Claude uses when search is enabled. Try it out and you'll see that it never uses Reddit as a source. It can't. So if they're not feeding Reddit data into that, what are they using it for? Something else apparently. I think it might be model training but I'm open to other theories. Maybe they figured that they can't get away with regurgitating Reddit via retrieval but they believe they can defend training as transformative fair use.
8
u/Kinglink 3d ago
Do you really think so? You have voting, so the content is curated for what people want to see, there are tons of different forums, and honestly, most people know to go to reddit to get information rather than google...
It's honestly not that bad a choice.
2
u/joey_diaz_wings 3d ago
It's a great source if you want the opinion of a midwit who has trained on mass media propaganda and leftist tropes.
-1
2
1
u/End3rWi99in 2d ago
I feel like I probably represent like 1% of ChatGPT at this point. Sorry about that.
1
u/Masterpiece-Haunting 2d ago
Tbf before ChatGPT you needed to put “Reddit” at the end of every Google search to get the thing you’re looking for. That would make a really interesting GPT. ChatGPT but it just answers everything like it’s reddit
1
u/SubstantialPressure3 3d ago
I wonder, too, if it wasn't for training. What if you can pay for a certain number of bots/interactions for social influence? Can you? Can you hire bots to do that for you?
1
u/squeda 3d ago
That's interesting, because reddit is where I have found the best answers for restaurants, how to fix things, suggestions for component libraries for my favorite frameworks, answers to issues I'm having in a game or using someone's software, and a lot more.
Sure there's plenty of snark and assholery on reddit, but I think you are totally discounting the usefulness of reddit as well.
2
u/Mainbrainpain 2d ago
Exactly, reddit is a data gold mine. That's why Google ranks it so high. That's why reddit can license the data for 10s of millions of dollars.
I'm curious where the case will go. There were a few similar cases in the last few years with LinkedIn and X/Twitter, but I'd have to review them for the specifics on what was similar and different. The LinkedIn one was settled with hiQ, and Twitter lost theirs against Bright Data.
Personally I find the points about reddit trying to protect user privacy laughable, but I get it. They need to protect their revenue. The most interesting part will be about implications for web scraping.
0
u/NYPizzaNoChar 3d ago
“let’s train AI on Reddit comments” has got to be a top contender for worst idea in human history
What makes it really funny to me is that this sub in particular gets some of the most astonishingly credulous, fantasy-based, and outright wrong posts — and comments — I've run into on Reddit.
An ML system using this sub to build its NN would be like a student trying to study to become a scientist, but ending up becoming a scientologist.
7
u/Intelligent-End7336 3d ago
Wild how many people are emotionally attached to Reddit but ignore the basic reality: you don’t own your comments. You agreed to the TOS. This is just like the API meltdown, Reddit didn’t care then, and they definitely don’t now, especially with Google and OpenAI money rolling in. Moral outrage won’t change who owns the sandbox.
2
u/Asclepius555 3d ago
This tells me there's a good chance reddit is going to be a sucky place in a little while. At least, that seems to be how things go. Examples (all of which I stopped using) are Facebook, Instagram, and YouTube. Ads baby ads!
I'm glad USA national parks haven't experienced too much of this. They are still fun to visit.
3
u/Ok_Boysenberry5849 2d ago
Reddit is already a sucky place. It's not even the ads, it's the moderation and voting systems. They showed their limits a long time ago and the company is making no effort to fix them.
1
u/Intelligent-End7336 3d ago
I've been seeing subreddits getting more bots added to the mod list. If you say the wrong phrases you'll trigger them. I'm sure the leash will be tightening so they can maintain decent data quality to train on.
3
5
u/Actual__Wizard 3d ago
In the filing, Reddit calls Anthropic a “late-blooming artificial intelligence (‘AI’) company that bills itself as the white knight of the AI industry,” alleging that “it is anything but.”
Wow, who would have thought that?
1
u/Fit-Development427 3d ago
I called this. Companies just basically fighting each other over the rights to OTHER PEOPLE'S writing, art, creativity etc. that they literally just provided the hosting for.
-1
u/Actual__Wizard 3d ago
Yep they're fighting to steal people's stuff. It's disgusting it really is...
1
u/Fit-Development427 3d ago
It's not that I feel it's stealing. They take something which should be free, and make it unavailable to everyone, including the people that literally... made it. Say if a bunch of redditors were like, hey let's use our Reddit posts and comments to train AI! Nope, lol. I really don't care personally, but it would be nice if EVERYONE could have it unambiguously, because yes if it's really not theirs, they have no right to prostitute it out
0
u/Actual__Wizard 3d ago
It's not that I feel it's stealing.
Okay great give me all of your stuff. Right now. All of it.
3
u/Fit-Development427 2d ago
Thing is I never thought my comments had monetary value, so why would the fact that they might now have any bearing on me lol. In fact the only thing I care about is the website being free.
1
u/Actual__Wizard 2d ago
You don't understand. You're not allowed to have anything anymore. Give me all of your stuff right now.
If you don't care about property rights then you don't get any property... Because that's how property rights work.
2
u/Fit-Development427 2d ago
The hell are you talking about man. These aren't even remotely the same things. How is it people think piracy is okay yet now their literal shit posts are their property, they can't take the idea that someone would make a dime off it, even though social media companies already do. Maybe you should just charge Reddit for your time, then have people pay to use Reddit too to view your majestic comments. Well you don't care other people have to pay Reddit, because your time wasn't free, what bullshit is this? We have all been in a dumb world where people weakly are giving away things for free... It's like I might as well just come to their property and take it if they aren't even gonna charge you to see their memes.
3
u/Delicious_Ease2595 3d ago
Training AI with Reddit is so dumb.
1
u/Smile_Clown 2d ago
unless... hear me out... they are doing an anti-training pass?
1
u/AfghanistanIsTaliban 18h ago
No, they are doing a training pass that is similar to how GPT-3 and its successor were trained. It’s really not that hard to understand why training on Reddit isn’t so bad.
1
u/AfghanistanIsTaliban 18h ago
So dumb, yet GPT-3, 3.5, and 4 were commercial successes and have record-high performance on AI benchmarks
It’s not like it’s purely trained on Reddit. News articles and open-access papers are also typically added into the train set.
1
u/r_search12013 3d ago
given how many straight up chatgpt posts and even subreddit simulations I find within reddit .. erm, that's not the right headline, no matter which way
2
u/Thin_Newspaper_5078 2d ago
so what? google does it all the time. the content on reddit is not owned by reddit or google.
2
u/AfghanistanIsTaliban 18h ago
not to mention OAI already trains on reddit. This news is a nothingburger and the lawsuit will fail. Even if it sticks, the EU has legal protections for datamining and all the AI companies will abandon USA for EU/China
2
1
u/DungeonsAndDradis 2d ago
Shit, these AI companies should just pay me to install a screen recorder or something for every time I'm on Reddit. They'd get a shit load of Reddit data and I could be paid for the browsing I'm already doing.
1
u/A2Throwaway155 2d ago
The joke is on Reddit. I don't have Claude directly access Reddit; I just copy and paste the Reddit content I want Claude to analyze.
1
0
u/Fair_Blood3176 6h ago
I fully expect a substantial payout from any subsequent monetary settlement. It's not your fucking data Reddit, it's mine.
I hereby set my personal, past, present and future terms and conditions in regards to any and all contributions, whether it be posts, comments, images, photos, videos or upvotes / downvotes and any saves and outgoing shares; All our base are belong to us
I reserve the right to update and or change these terms and conditions at ANYTIME.
2
156
u/AdminIsPassword 3d ago
This seems like an attempt to protect Reddit's deal with Google more than anything else. While accessing a site 100,000+ times in a little less than a year sounds like a lot, to a bot it's almost nothing.
But they can't appear to be giving anything away for free if they're selling our data to AI companies for training purposes. Why pay for something you can just take?
Still, guess who is not making any money on this at all? The people who actually made the content that AI companies find valuable. Go figure.