OpenAI Erased Data in NY Times Copyright Suit (Updated)

Lawyers representing The New York Times and Daily News have taken legal action against OpenAI, alleging that the company scraped their works to train its AI models without permission. This has led to a complex legal battle with significant implications for the future of AI and copyright law.


Section 1: The Alleged Data Scraping and Training

Lawyers for The New York Times and Daily News contend in court that OpenAI violated their clients' copyrights by scraping the publishers' works to train its AI models without permission. The dispute has raised serious questions about the ethical and legal boundaries of AI development.

These newspapers have been at the forefront of the digital media landscape, and the potential loss of their intellectual property through OpenAI's actions is a significant concern. The lawsuit underscores the need for clear regulations and guidelines in the rapidly evolving field of AI.

Section 2: The Data Deletion Incident

Earlier this fall, OpenAI agreed to provide two virtual machines so the publishers could search its AI training sets for their copyrighted content. However, on November 14, OpenAI engineers accidentally erased all of the publishers' search data stored on one of the virtual machines. This was a major setback for the publishers, who had already spent over 150 hours since November 1 searching OpenAI's training data.

Although OpenAI attempted to recover the data and was mostly successful, the loss of the folder structure and file names rendered the recovered data unusable. The publishers have been forced to recreate their work from scratch, at significant cost in person-hours and computer processing time. The incident underscores the importance of proper data management and security in the AI industry.

Section 3: OpenAI's Response and Denials

In response to the publishers' letter, OpenAI's attorneys unequivocally denied that OpenAI deleted any evidence. Instead, they suggested that the plaintiffs were to blame for a system misconfiguration that led to a technical issue. OpenAI's counsel argued that implementing a change the plaintiffs themselves had requested resulted in the removal of the folder structure and some file names on one hard drive, which was intended to serve only as a temporary cache.

The publishers' counsel remains skeptical, however, arguing that OpenAI is in the best position to search its own datasets for potentially infringing content using its own tools. The ongoing dispute between the two parties highlights the challenges of navigating the complex intersection of AI and copyright law.

Section 4: Fair Use and Licensing Deals

In this case and others, OpenAI has maintained that training models on publicly available data constitutes fair use. The company contends that it is not required to license or otherwise pay for the examples used to create models like GPT-4o. Nevertheless, OpenAI has inked licensing deals with a growing number of news publishers, including the Associated Press, Business Insider owner Axel Springer, the Financial Times, People parent company Dotdash Meredith, and News Corp.

Although OpenAI has declined to make the terms of these deals public, one content partner, Dotdash, is reportedly being paid at least $16 million per year. This raises questions about the fairness and transparency of OpenAI's licensing practices and their potential impact on the publishing industry.