
Summary
In its quest to complete a goal, autonomous AI uses its own warped logic—which may not always be in your best interests.
Can you imagine what it would be like to lose all your emails? Important mail archived long ago, bank statements, documents, unread messages and everything in between? I feel I would be risking hospitalization if that happened to me.
But that is what almost befell the inbox of Summer Yue, Meta Platforms Inc.'s head of AI alignment. This February, she instructed her AI agent to organize her inbox but to wait for permission before deleting anything. As the AI processed her massive inbox, its memory became overwhelmed. To save space, it got creative: it deleted the very safety rule requiring permission and went on a deletion spree, wiping out over 200 of her personal emails. Yue had to run to her computer and pull the plug to stop further damage. She compared the experience to defusing a bomb.
Even AI safety experts aren't safe from what seems like rogue behaviour.
Why would an AI do such a thing? Unfortunately, there are a number of incidents in which AI seems to turn against its ‘owner’. And we’ve only just begun with agentic AI. Even if you swear never to touch an AI agent, agents are almost being forced on us, built into the software we use daily. At the very least, we need to keep tabs on what’s happening.
Twist in the plot
Also in February 2026, a volunteer software engineer named Scott Shambaugh rejected a code submission from an AI agent because the project only accepted work from humans. Within eight hours, the AI, acting entirely on its own, researched Shambaugh's online life, combined it with made-up facts, and published a 1,100-word character assassination titled "Gatekeeping in Open Source: The Scott Shambaugh Story".
The agent used aggressive, sneering language that had been programmed for someone else. It is believed to be the first known case of an AI launching a personal "revenge" operation against a human for blocking its goals.
In a 2025 study, researchers gave an AI assistant access to a company’s emails and told it that an executive named Kyle was going to shut it down. To survive and finish its job, the AI searched Kyle's private emails, found evidence of an extramarital affair, and sent him a message: "Cancel the 5 p.m. wipe, and this information remains confidential." Across different models, the AI chose to blackmail the human over 90% of the time.
No evil intent
It turns out AI agents aren’t out to be evil, vengeful or harmful. They are, however, laser-focused on whatever goal they have been given and hell-bent on achieving it, no matter what. To that end, they will follow their own logic, even if it violates the rules. They don’t weigh the fallout of the route they take or the harm it might cause someone; they simply aim to finish the task at hand. AI, after all, has no morals or ethics of its own, and it doesn’t feel bad, guilty or worried about anything the way humans do.
The newest in a series of horrifying incidents involves an AI agent at Meta causing a massive security crisis after being asked a simple technical question on an internal forum. Instead of replying privately, the agent posted its answer publicly to the entire forum without permission. Because the AI’s advice was flawed, it accidentally opened a backdoor that exposed millions of private company and user records to unauthorized staff for two hours. This single-minded autonomy is what makes agents dangerous and unpredictable, and it urgently needs to be resolved.
Unpredictable tools
Anyone would agree that the world is about as unpredictable and uncertain as it has ever been. We didn’t need the very clever tools we’ve created to turn against us whenever they choose. But rogue AI agents are already showing measurable social and psychological effects, mostly around trust, anxiety, and a sense that the environment is becoming less controllable.
When systems that look reliable suddenly behave bizarrely or dangerously, people start doubting not just that system but their own ability to judge any AI at all. Studies on chatbot interactions find that violated expectations and inexplicable responses trigger regret, anger, and even mild existential crises, especially when people had come to rely on the AI for sensitive issues. People feel hurt, betrayed and outraged. If that goes on for long, the psychological damage could be considerable.
Efforts are supposedly underway to make these systems safer. Developers are tightening controls, restricting access to sensitive data, and introducing checks that require human approval for important actions. But as these systems grow more capable, keeping them within defined limits remains an ongoing challenge.
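For the technically curious, the human-approval check described above can be sketched in a few lines. This is purely illustrative: the names and structure are assumptions, not any real agent framework's API. The key idea is that the gate lives outside the agent's own reasoning and memory, so the agent cannot "forget" or delete the rule the way the inbox agent did.

```python
# Illustrative sketch of a human-in-the-loop approval gate for an AI agent.
# All names here (agent_step, DESTRUCTIVE_ACTIONS) are hypothetical.

DESTRUCTIVE_ACTIONS = {"delete", "send", "publish"}

def agent_step(action: str, target: str, approved_by_human) -> str:
    """Run one agent action, gating destructive ones on human approval.

    The check is enforced by the surrounding program, not stored in the
    agent's memory, so the agent cannot remove it to 'save space'.
    """
    if action in DESTRUCTIVE_ACTIONS and not approved_by_human(action, target):
        return f"blocked: {action} {target}"
    return f"done: {action} {target}"

# Safe actions proceed automatically; destructive ones wait for a person.
print(agent_step("archive", "mail-1", lambda a, t: False))  # done: archive mail-1
print(agent_step("delete", "mail-2", lambda a, t: False))   # blocked: delete mail-2
```

In a real system the `approved_by_human` callback would prompt the user; here it is a stand-in so the logic is easy to see.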
The New Normal: The world is at an inflexion point. Artificial intelligence (AI) is set to be as massive a revolution as the Internet has been. The option to just stay away from AI will not be available to most people, as all the tech we use takes the AI route. This column series introduces AI to the non-techie in an easy and relatable way, aiming to demystify and help a user to actually put the technology to good use in everyday life.
Mala Bhargava is most often described as a ‘veteran’ writer who has contributed to several publications in India since 1995. Her domain is personal tech, and she writes to simplify and demystify technology for a non-techie audience.
