
Reddit CEO: Blocking Microsoft From AI Scraping Was a 'Real Pain in the Ass'

Microsoft, Perplexity AI, and Anthropic need to pay to use Reddit's posts, Steve Huffman says. But it's unlikely Microsoft is open to striking a deal.

By Kate Irwin, Reporter


Reddit has struggled to block tech firms like Microsoft, Perplexity AI, and Anthropic from scraping its content—and wants them to understand that its social platform is pay-to-play.

In a recent interview with The Verge, Reddit CEO Steve Huffman doubled down on Reddit's stance against unlicensed AI scraping. Some firms hold the (convenient) view that they should be able to scrape any "publicly available" data without permission and for free. Reddit firmly rejects that view.

Microsoft didn't notify Reddit that it was scraping Reddit's content and using it for its AI features in Bing, Huffman says. To make matters worse, Huffman also alleges that, after scraping Reddit's data for free, Microsoft resold it to other search engines via Bing's API.

Reddit has since blocked Microsoft from scraping its site, an effort Huffman calls "a real pain in the ass."

"We’ve had Microsoft, Anthropic, and Perplexity act as though all of the content on the internet is free for them to use," Huffman tells The Verge. "That’s their real position."

Other tech giants have taken this position, too. Apple confirmed to PCMag last month that it has used publicly available data to train its upcoming Apple Intelligence. Similarly, Salesforce has defended its use of YouTube video transcripts and other unlicensed data by arguing that what it used was "publicly available." In all of these cases, however, it's unclear what "publicly available" really means.

Reached for comment, a Microsoft spokesperson tells PCMag: "Microsoft respects the robots.txt standard and we honor the directions provided by websites that do not want content on their pages to be used with our generative AI models. Bing stopped crawling Reddit after they implemented their updated robots.txt file on July 1, which prohibits all crawling of their site."
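For readers unfamiliar with robots.txt, here is a minimal sketch of how a well-behaved crawler might check a site's rules before fetching a page, using Python's standard urllib.robotparser module. The rules below are a hypothetical blanket disallow written for illustration, not a copy of Reddit's actual file, and the helper name is_allowed, the "bingbot" user-agent string, and the sample URL are likewise assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tells every crawler to stay out of the whole site.
# This is an illustrative example, not Reddit's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A crawler identifying itself as "bingbot" (assumed name) checks one page.
    print(is_allowed(ROBOTS_TXT, "bingbot", "https://www.reddit.com/r/technology/"))
    # prints False
```

Note that robots.txt is a voluntary convention: it only works when the crawler chooses to run a check like this and respect the answer, which is why the dispute described here turns on what companies say they honor.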

In recent months, Reddit has been making changes to try to control how AI firms use Reddit users' data, posts, and communities. Earlier this year, it struck a $60 million AI licensing deal with Google and also made a deal with OpenAI in May. Individual Reddit users won't be paid if their posts are used, though.

Would Microsoft strike a deal with Reddit? It's unlikely. In response to a Twitter user begging Microsoft to pay Reddit to compete with Google, Microsoft's Head of Search Jordi Ribas rejected the idea. "Bing needs to stand on its own feet like other products in our company," Ribas said.

Reddit frames its licensing agreements as its way of knowing what these companies do with its users' content. “Without these agreements, we don’t have any say or knowledge of how our data is displayed and what it’s used for, which has put us in a position now of blocking folks who haven’t been willing to come to terms with how we’d like our data to be used or not used,” Huffman says.

It's unlikely big tech firms like Microsoft will change their tune on "public" data until substantial AI regulation is passed in the US—or some of the high-profile copyright lawsuits against AI firms set a precedent.
