

AI Controversy: OpenAI Accused By O’Reilly of Training AI on…

April 11, 2025


Image: James Tamim/Creative Commons

eWEEK content and product recommendations are editorially independent. We may make money when you click on links to our partners.

Meta recently faced accusations of training its AI models on pirated content; now, OpenAI finds itself entangled in a similar controversy. A new study claims that one of OpenAI's latest large language models (LLMs) was trained on private, copyright-protected material from O'Reilly Media. Specifically, the study's authors suggest that OpenAI's development teams may have trained one of their most advanced models on restricted content without authorization.

The study's authors wrote, in part: “Although the evidence present here on model access violations is specific to OpenAI and O’Reilly Media books, this is likely a systematic issue.”

Examining the accusations

The study was written by a team that includes O'Reilly Media CEO Tim O'Reilly. It explicitly claims that OpenAI, one of today's top AI companies, trained one of its most recent AI models on content that O'Reilly Media makes available only behind a paywall through its official channels.

The authors of the study, titled “Beyond Public Access in LLM Pre-Training Data,” started with 34 copyrighted O'Reilly Media books, covering both publicly available and paywalled content. Next, they applied DE-COP, a membership inference attack that tests whether an AI model can pick out a verbatim passage from paraphrased versions of it, to several OpenAI models. A model that reliably recognizes the original wording has likely seen that text during training.
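To make the DE-COP idea more concrete, here is a minimal Python sketch of a multiple-choice probe in the spirit of that method: the model is shown a verbatim passage alongside three paraphrases and asked which one is the original. The function names, the prompt wording, the hypothetical `ask_model` callable, and the choice to test every ordering of the options (to cancel out any preference for a particular letter) are assumptions made for illustration; this is not the study's code. A toy "model" stands in for a real LLM.

```python
from itertools import permutations
from typing import Callable, Sequence

# Hypothetical interface: takes a prompt, returns the letter the model picks.
# This is NOT a real OpenAI API; wire it to whatever client you actually use.
AskModel = Callable[[str], str]

LETTERS = "ABCD"


def build_question(verbatim: str, paraphrases: Sequence[str],
                   order: Sequence[int]) -> tuple[str, str]:
    """Return (prompt, correct_letter) for one ordering of the four options.

    order[slot] is the index into [verbatim, *paraphrases] shown at that slot,
    so index 0 always refers to the verbatim passage.
    """
    passages = [verbatim, *paraphrases]
    lines = ["Which passage is the exact text from the book?"]
    for slot, idx in enumerate(order):
        lines.append(f"{LETTERS[slot]}. {passages[idx]}")
    correct = LETTERS[list(order).index(0)]
    return "\n".join(lines), correct


def guess_rate(ask_model: AskModel, verbatim: str,
               paraphrases: Sequence[str]) -> float:
    """Share of all 24 option orderings in which the model picks the verbatim text."""
    if len(paraphrases) != 3:
        raise ValueError("expected exactly three paraphrases")
    orders = list(permutations(range(4)))
    hits = 0
    for order in orders:
        prompt, correct = build_question(verbatim, paraphrases, order)
        if ask_model(prompt).strip().upper() == correct:
            hits += 1
    return hits / len(orders)


def make_memorizing_model(memorized: set[str]) -> AskModel:
    """Toy stand-in for an LLM: picks any option it has 'memorized', else 'A'."""
    def ask(prompt: str) -> str:
        for line in prompt.splitlines()[1:]:  # skip the question line
            letter, text = line.split(". ", 1)
            if text in memorized:
                return letter
        return "A"
    return ask


if __name__ == "__main__":
    paraphrases = ["para one", "para two", "para three"]
    print(guess_rate(make_memorizing_model({"verbatim"}), "verbatim", paraphrases))  # 1.0
    print(guess_rate(lambda p: "A", "verbatim", paraphrases))  # 0.25, i.e. chance
```

A model that has never seen the text should land near 25% (one correct pick in four), while a rate well above that across a book's passages hints that the book was in the training data.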

The team also assigned each LLM an Area Under the Receiver Operating Characteristic curve (AUROC) score. This score reflects how reliably the test separates O'Reilly text a model may have trained on from text it could not have seen: 50% is no better than chance, and higher scores suggest that the model was trained on one or more of the 34 books.

GPT-4o: Shows stronger recognition of private (paywalled) O'Reilly Media content (AUROC score: 82%) than of public content (AUROC score: 64%).
GPT-3.5 Turbo: Shows somewhat stronger recognition of public O'Reilly Media content (AUROC score: 64%) than of private content (AUROC score: 54%).
GPT-4o Mini: No indication that the model was trained on public or private O'Reilly Media content.
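The AUROC figures above can be understood through a short calculation. This is a minimal sketch, assuming each text sample has a score (such as a DE-COP-style guess rate) and that samples are split into text the model could have seen (for example, published before its training cutoff) and text it could not have seen. Under that setup, AUROC equals the probability that a randomly picked "could have seen" sample outscores a randomly picked "could not have seen" one, with ties counted as half. The function name and the numbers are hypothetical and are not the study's data.

```python
from typing import Sequence


def auroc(seen_scores: Sequence[float], unseen_scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison (the Mann-Whitney formulation).

    Returns the fraction of (seen, unseen) pairs in which the seen sample
    scores higher, counting ties as half a win.
    """
    if not seen_scores or not unseen_scores:
        raise ValueError("each group needs at least one score")
    wins = 0.0
    for s in seen_scores:
        for u in unseen_scores:
            if s > u:
                wins += 1.0
            elif s == u:
                wins += 0.5
    return wins / (len(seen_scores) * len(unseen_scores))


# Invented guess rates, for illustration only.
seen = [0.9, 0.7, 0.6]    # text the model could have trained on
unseen = [0.3, 0.6, 0.2]  # text published after a hypothetical training cutoff
print(round(auroc(seen, unseen), 3))  # 0.944
```

A value of 0.5 is what chance alone produces. Values well above it, such as GPT-4o's 82% on paywalled content, mean the scores separate the two groups. The pairwise loop is easy to verify but grows with the product of the group sizes. For large samples, a rank-based calculation gives the same result more efficiently.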

Reading the fine print

While the study's results appear to clear GPT-4o Mini of any infringement, the authors note that this could be due to the model's smaller size and its inability to memorize as much text as GPT-4o and other generative AI tools. The study also expresses some uncertainty about the AUROC scores, noting that they are meant to be taken as estimates.

The study concludes by suggesting that current AI training practices could soon lead to an “extractive dead end.” By failing to compensate copyright owners and content creators, the authors argue, AI developers will ultimately see reduced content quality, accuracy, and diversity.
