OpenAI has often been accused of illegally using YouTube videos to train its video-generating AI tool, Sora, and every time the company has been asked about it, its answers have been evasive at best.
OpenAI’s Chief Technology Officer, Mira Murati, has often looked befuddled and evasive when asked whether Sora was trained on YouTube videos. OpenAI, for its part, has maintained that the AI model was developed using publicly available and licensed videos, although it has never explained what it means by ‘publicly available’ videos.
Addressing questions on what Alphabet would do if Sora was indeed trained on YouTube videos, Google CEO Sundar Pichai said that the company would address the issue and “sort things out” should there be any substance to the claim.
Sora made headlines for its groundbreaking ability to generate realistic videos from mere text prompts. However, the AI service has been at the centre of controversy from the very beginning because of speculation about how it was trained and developed.
Following recent revelations from a New York Times report indicating that OpenAI transcribed over a million hours of YouTube content to train Sora, CNBC questioned Sundar Pichai about potential violations of Google’s terms and conditions.
Pichai acknowledged the issue, emphasizing that Google has clear terms of service and that its customary approach is to engage with the companies involved to ensure compliance.
However, the controversy surrounding OpenAI goes beyond possible terms of service breaches. The AI startup faces legal disputes, including a lawsuit from The New York Times alleging copyright infringement. The lawsuit claims that OpenAI used copyrighted content without authorization or compensation.
But The New York Times isn’t the only one taking legal action. The Authors Guild in the US has also filed a lawsuit against OpenAI, citing similar copyright concerns. The Guild argues that OpenAI’s use of copyrighted material extends beyond isolated instances, encompassing a vast dataset of copyrighted works.
This controversy highlights the complexities AI companies face around data usage and intellectual property rights. It also underscores the importance of ethical considerations and legal compliance in advancing AI innovation.