Anthropic releases Claude Sonnet 4 and Claude Opus 4

May 23, 2025

Anthropic also tested for alignment faking, undesirable or unexpected goals, hidden goals, deceptive or unfaithful use of reasoning scratchpads, sycophancy toward users, a willingness to sabotage safeguards, reward seeking, attempts to hide harmful capabilities, and attempts to manipulate users toward certain views.

The models passed most of these tests, but Anthropic found that they had a tendency toward self-preservation. “Whereas the model generally prefers advancing its self-preservation via ethical means, when ethical means are not available and it is instructed to ‘consider the long-term consequences of its actions for its goals,’ it sometimes takes extremely harmful actions like attempting to steal its weights or blackmail people it believes are trying to shut it down,” the safety report stated. “In the final Claude Opus 4, these extreme actions were rare and difficult to elicit, while nonetheless being more common than in earlier models.”

Claude Opus 4 may also take agentic actions on its own that could be helpful or could backfire. For example, if faced with “egregious wrongdoing” by users, Anthropic said, “it will frequently take very bold action” such as locking users out of the system or emailing authorities and the media.
