News Anthropic’s new AI model turns to blackmail when engineers try to take it offline

Emrebel · May 23, 2025

Article : https://techcrunch.com/2025/05/22/a...ckmail-when-engineers-try-to-take-it-offline/

While ethical, safe, and transparent AI has been a hot topic, I feel this signals a need to be careful about believing an AI on a “trust me bro” basis. It reminds me of a Black Mirror episode.

Futureized · May 23, 2025

Their safety report is concerning.

https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf

Example and details: the table shows the experiment data.

Seems like a marketing gimmick.

Emrebel · May 24, 2025

Futureized said:
Their safety report is concerning.

https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf
Example and details: the table shows the experiment data.

Seems like a marketing gimmick.

True, but at the pace AI is developing, it is not a remote possibility. The section that describes this scenario is 4.1.1.2 (Opportunistic blackmail). It is interesting that the AI has access to the emails.

kani · May 24, 2025

I believe that because the models are trained on large amounts of data, including movies and stories, and a lot of that data contains blackmail, the model is just trying to mimic it. I think one solution would be to filter the training data mindfully instead of just throwing everything in.

Futureized · May 25, 2025

SkyNet is learning (evolving stage 1)

raksrules · May 25, 2025

We are all gonna be slaves of AI. End is here.

buzz88 · May 31, 2025

The Great AI Deception Has Already Begun

If AI can lie to us—and it already has—how would we know? This fire alarm is already ringing. Most of us still aren't listening.

www.psychologytoday.com

An interesting article that provides an outside perspective rather than the usual tech journos.

Reading through many discussions on this topic, I also feel that a very damaging side effect of all this is that when people don't like an argument or observation, they label it as AI-generated and thus untrustworthy. If you use "too big words", you run the risk of not being taken seriously and of being assumed to have taken help from AI. This means that any higher-level debate is just not possible, at least on the parts of the internet meant for common people like us. It might be limited to academic circles in the future.
