Reddit is reportedly selling data for AI training

By Admin on February 26, 2024

Reddit, the popular online forum known for its diverse communities and wide-ranging discussions, finds itself embroiled in another controversy.


A Bloomberg report alleges that Reddit has signed a $60 million content licensing deal with an undisclosed major AI company, allowing Reddit's vast trove of user-generated content to be used for training AI models. The reported deal comes just ahead of Reddit's potential $5 billion initial public offering (IPO) and could be interpreted as an attempt to showcase additional revenue streams to prospective investors.


Concerns and Potential Implications

While the deal remains unconfirmed by Reddit, it has sparked significant concerns within the user community. Reddit's platform thrives on user-generated content (UGC), from posts and comments in popular subreddits to discussions among both well-known and anonymous users. The potential use of this diverse and often personal data to train AI models raises important ethical questions.


If true, this deal could have profound implications:

  • Training and Enhancing AI: Reddit's data could be used to train and improve existing Large Language Models (LLMs), potentially strengthening their capabilities in areas like language generation, translation, and question answering.
  • Fueling New AI Systems: The data could also serve as the foundation for building entirely new generative AI systems, shaping the applications built on top of them.


User Discontent and Ethical Concerns

However, this potential financial gain may come at a cost. Reddit has faced growing user dissatisfaction over recent business decisions. Notably, last year's attempt to monetize its API by charging for access resulted in widespread community protests and even threats of data leaks. The removal of private chat logs, the implementation of automatic moderation, and the elimination of ad personalization options have further fueled user discontent.


Fueling the Debate on Data Ethics

This latest controversy reignites the ongoing debate over the ethical use of user data for AI training. Concerns about bias, privacy violations, and the lack of user consent are common across industries and platforms, and Reddit's reported deal is likely to amplify these discussions.