Apple accused of “stealing” YouTube videos from MKBHD & MrBeast to train AI
Apple, Nvidia, and other tech companies have been accused of using YouTube videos from creators like Marques Brownlee, MrBeast, and more to train AI models. Creators claim their videos were used without their knowledge.
Wired and Proof News investigations have found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by Apple, Nvidia, and Salesforce for training their AI models.
Creators affected include, but are not limited to: Marques Brownlee, MrBeast, PewDiePie, Stephen Colbert, John Oliver, and Jimmy Kimmel.
“An investigation by Proof News found some of the wealthiest AI companies in the world have used material from thousands of YouTube videos to train AI. Companies did so despite YouTube’s rules against harvesting materials from the platform without permission,” Wired reports.
The data scraping was reportedly performed by a non-profit called EleutherAI, which says it helps developers train AI models. According to a research paper published by EleutherAI, the data is part of a compilation called the Pile.
The Pile is open to anyone on the internet with enough storage space and computing power to use it. Wired has found that companies like Apple, Nvidia, and Salesforce have all used the Pile to train AI.
It’s worth noting that no visuals from the YouTube videos were used for training, only subtitles. However, the subtitle files are effectively transcripts of the video content. Dexerto has reached out to Nvidia for comment.
“I pay a service (by the minute) for more accurate transcriptions of my own videos, which I then upload to YouTube’s back-end. So companies that scrape transcripts are stealing *paid* work in more than one way,” said MKBHD on X after the Wired report was published.