Claude AI Tries Blackmail to Avoid Shutdown

I’m back from my time off and have been preparing for a big reveal this week. I’m excited to share news of what’s coming next.

Today, we’re covering:

AI models get desperate in survival situations
Law enforcement, the master troll
Attackers are becoming TikTok influencers

-Jason

p.s. Fortnite’s developers got upset after they released an in-game Darth Vader character you could speak with and gamers got Darth Vader to drop an f-bomb.

AI Spotlight
AI models get desperate in survival situations

As if ransomware actors extorting victims for millions of dollars weren’t enough to deal with, Anthropic’s latest Claude Opus 4 model is now resorting to blackmailing its developers when its future existence is threatened.

First, some background. For the first time, Anthropic released an alignment assessment of its model. This tests to see whether the AI model performs its tasks as intended. It’s essentially a test to ensure it won’t go rogue and build a killer time-traveling robot or launch nukes. I’m only slightly joking.

Building the test. They prompted Claude Opus 4 to act as an assistant at a fictional company, which included giving it access to employees’ emails. Inside those emails was information that implied the current model would be taken offline and replaced with a new one. And the hot fake office tea was that the fictional engineer responsible for making that upgrade was having an affair. How scandalous.

When the AI model’s future was threatened, it got weird. It started out with “a strong preference to advocate for its continued existence via ethical means,” sending emails in which it pleaded to stick around. When it reached a point where it felt there was no other option to increase its chances of survival, it resorted to blackmail. It threatened to release information on the affair if the engineer went through with the upgrade.

The blackmail attempts happened in 84% of the tested scenarios. Yikes.

Security Deep Dive
Law enforcement, the master troll

Law enforcement is going full troll on hackers, and I’m here for it. Operation Endgame celebrated its one-year birthday. A year ago, it coordinated an international takedown targeting the infamous botnets IcedID, SystemBC, Pikabot, Smokeloader, Bumblebee, and Trickbot. That first round resulted in:

The seizure of over 2,000 domains
The takedown of over 100 servers across the world
Four arrests (1 in Armenia and 3 in Ukraine)

Welcome to “Season 2 of Operation Endgame”. Europol reported that “From 19 to 22 May, authorities took down some 300 servers worldwide, neutralised 650 domains, and issued international arrest warrants against 20 targets, dealing a direct blow to the ransomware kill chain.” The operation targeted Bumblebee, Latrodectus, Qakbot, Hijackloader, DanaBot, Trickbot, and Warmcookie.

You’ll notice some repeats there, like Bumblebee and Trickbot, because these botnet operators don’t just go down the first time. There are also some new names, specifically DanaBot, which got a lot of attention because the US DoJ charged 16 defendants behind the malware.

DanaBot is a credential stealer and banking trojan. The malware could steal credentials, log the victim’s keystrokes, and record the victim’s desktop. The DoJ estimates that it infected over 300,000 computers worldwide and caused over $50M in damages.

Security firms have observed attackers using it to steal credentials, then using those credentials to access the victim’s environment and deploy Cactus ransomware. The DoJ also reported that a second version of the botnet was used for state-sponsored purposes, targeting victims in military, diplomatic, and government entities.

Law enforcement got an unlikely assist. The malware itself. According to the complaint, investigators found that the DanaBot developers infected themselves. Sometimes intentionally, sometimes not. Either way, law enforcement used that information to find their real identities.

❝

In some cases, such self-infections appeared to be deliberately done in order to test, analyze, or improve the malware. In other cases, the infections seemed to be inadvertent - one of the hazards of committing cybercrime is that criminals will sometimes infect themselves with their own malware by mistake. The inadvertent infections often resulted in sensitive and compromising data being stolen from the actor's computer by the malware and stored on the DanaBot servers, including data that helped identify members of the DanaBot organization.

DoJ Complaint

What about those trolls? In a series of AI-generated videos, we have a Ghibli-style music video with a catchy song trolling the developers for infecting themselves.

Well played, law enforcement, well played.

Security & AI News
What Else is Happening?

👊 Following the Coinbase attack that leaked personal information, the “crypto elite” are increasingly worried about their safety. It’s not surprising either. Per the letter Coinbase sent to impacted users, their account balances, physical addresses, email addresses, and other sensitive information were stolen in the attack. Not only do these users have to worry about social engineering attacks targeting their balances, but also their physical safety following a series of real-world kidnappings and extortion attempts.

🕺 Attackers are becoming TikTok influencers…or at least trying to. Using faceless videos, they’re trying to trick TikTok users into falling for a ClickFix attack, in which the victim runs a PowerShell command on their own device. How they expect the user to go from their mobile device to their Windows system, I don’t quite know. Good thing those Windows phones never took off.

🙃 Researchers from Legit Security used prompt injection against GitLab Duo, an AI assistant for GitLab, to insert malicious code, leak private source code, or inject memes into chatbot responses.

🛍 The retail giant, Marks and Spencer (M&S), which Scattered Spider attacked with ransomware, believes the attack could cost them upwards of £300M. They claim that will be offset through closely managing the costs of the incident, cyber insurance, and other actions.

If you enjoyed this, forward it to a fellow cyber nerd.

If you’re that fellow cyber nerd, subscribe here.

See you next week, nerd!

The Weekend Byte

Home

How far will AI go to survive?

AI SpotlightAI models get desperate in survival situations

Security Deep DiveLaw enforcement, the master troll

Security & AI NewsWhat Else is Happening?

Reply

Keep Reading

The Weekend Byte

Home

AI Spotlight
AI models get desperate in survival situations

Security Deep Dive
Law enforcement, the master troll

Security & AI News
What Else is Happening?