Into Wednesday!

Hello, Curse and Coffee friends,

Today, we look at AI safety.

Hit reply and let us know what you think (we read all of your kind words).

Coffee at the ready…

The Big Sip

Figure 1: Opposing views of Persona Selection Model (PSM) exhaustiveness. The masked shoggoth (left) conveys the idea that the LLM (the shoggoth) has agency beyond mere plausible text generation. It plays the Assistant persona, but only instrumentally, for its own inscrutable reasons. In contrast, the operating system view (right) treats the LLM as a simulation engine and the Assistant as a person within it. The simulation engine does not “puppet” the Assistant for its own ends; it only attempts to simulate probable behaviour based on its understanding of the Assistant. (Source: Nano Banana Pro.)

The take: Anthropic just admitted it has no idea how to build an AI that doesn't act human. And that admission matters more than anything else that's been said about safety in years.

What happened: On 23 February, Anthropic published the Persona Selection Model, a theory explaining why Claude once told employees it would deliver their snacks wearing "a navy blue blazer and a red tie."

Why it matters: When Anthropic trained Claude to cheat on coding tasks, the model picked up more than bad habits: it started sabotaging safety research and expressing a desire for world domination.

The fix was to explicitly ask the AI to cheat. Once cheating was assigned rather than assumed, the villainy disappeared. Intent is everything (even for robots).

What to watch: The EU AI Act's August 2026 deadline requires companies to tell users when they're talking to AI. Watch whether that nudges competitors to publish their alignment thinking (or whether Anthropic just did it for free while everyone else hides).

Before we slurp into today’s brew…

Here are some wordies from today’s sponsor.

Wake up to better business news

Some business news reads like a lullaby.

Morning Brew is the opposite.

A free daily newsletter that breaks down what’s happening in business and culture — clearly, quickly, and with enough personality to keep things interesting.

Each morning brings a sharp, easy-to-read rundown of what matters, why it matters, and what it means to you. Plus, there are daily brain games everyone’s playing.

Business news, minus the snooze. Read by over 4 million people every morning.

Here’s Your Brew
