OpenAI and Reddit: A Game-Changing Partnership for AI
OpenAI has struck a groundbreaking deal with Reddit to harness the social news site’s data for training its AI models.
In a recent blog post, OpenAI announced that the collaboration will grant it access to Reddit’s “real-time, structured, and unique content” — including posts and comments. This partnership will enable OpenAI’s tools and models to “better understand and showcase” Reddit’s rich content. The data will be integrated into ChatGPT, OpenAI’s popular conversational AI, and both companies will work together to develop new, yet-to-be-disclosed “AI-powered features” for Reddit users and moderators.
Additionally, OpenAI will become an advertising partner with Reddit.
“Reddit will be leveraging OpenAI’s advanced AI models to realize its ambitious vision,” OpenAI stated in the post. “Using large language models (LLMs), machine learning (ML), and artificial intelligence (AI), Reddit aims to enhance the user experience for everyone.”
OpenAI has signed similar licensing deals with several other content providers, but this one stands out because of a personal connection. Sam Altman, OpenAI's CEO, owns an 8.7% stake in Reddit, making him its third-largest shareholder, and he is also a former member of Reddit's board.
To address potential concerns, OpenAI clarified in its press release that despite Altman’s ongoing stake in Reddit, the partnership “was led by OpenAI’s COO, Brad Lightcap” and “approved by OpenAI’s independent board of directors.” Altman, a member of OpenAI’s board, recused himself from this decision.
Reddit has increasingly relied on data licensing agreements as a key part of its growth strategy as a newly public company. According to its IPO prospectus, Reddit has data licensing contracts with several major customers, including Google, worth a combined total of more than $200 million. In its first earnings report as a public company, Reddit reported a 450% year-over-year increase in non-ad revenue, driven mainly by these agreements.
Following the OpenAI deal announcement, Reddit’s stock surged 11% in after-hours trading.
“The paradox I see is that, as more content on the internet is written by machines, there’s an increasing premium on content that comes from real people,” Reddit CEO Steve Huffman remarked during the company’s March earnings call. “And we have nearly two decades of authentic conversation.”
Reddit’s platform, boasting over 1 billion posts and more than 16 billion comments, continuously grows thanks to its hundreds of millions of active users. This vast repository is a gold mine for generative AI companies, whose models rely on such content to learn and create new, similar material.
However, Reddit could face backlash from users wary of how their data is being monetized. A relevant case is Stack Overflow, the Q&A forum for developers, which recently entered a similar agreement with OpenAI. In response, some users deleted their top-rated answers in protest. Stack Overflow restored the posts and banned those users, citing non-compliance with its terms of service.
Reddit has also taken a firm stance against attempts to give users more control over their data. Vana, a blockchain-based startup, is trying to establish a data "DAO" (decentralized autonomous organization) that would let Reddit users pool their data and collectively decide how it is used. Reddit banned Vana's subreddit discussing the DAO, accusing the company of "exploiting" its data export controls.
This partnership with OpenAI marks a significant step in the intersection of AI and user-generated content, promising new AI-powered features while raising important questions about data privacy and who profits from users' contributions.