OpenAI and Google reportedly used YouTube videos to help train their AI models, potentially violating the copyrights of the creators of those videos.
According to The New York Times, OpenAI transcribed over a million hours of YouTube videos using its Whisper speech recognition tool and then used those transcriptions to help train GPT-4, despite internal discussions that doing so might violate YouTube’s terms of service.
According to the paper, Google knew that OpenAI was using some of its videos but didn’t take action against the company because Google was also using the videos to train its own AI models. Google told the paper that, rather than using any video, it only used videos created by users who had opted to be part of an experimental program.
The news highlights an issue facing AI companies as they train next-generation models. Training those models takes so much data that licensing all of that content likely isn’t financially feasible. And even if the companies were able to use all of the content available on the internet, OpenAI CEO Sam Altman has noted that data will eventually run out, leaving companies looking for additional sources to train models.
In an interview with Bloomberg this week, YouTube CEO Neal Mohan addressed concerns that OpenAI was using YouTube videos to help train its AI video creation tool, Sora. He said he had no firsthand knowledge of OpenAI using YouTube videos to refine the tool, but that doing so would be a “clear violation” of YouTube’s terms of service.
Sora is expected to launch later this year.