AI tool OpenClaw wipes the inbox of Meta's AI Alignment director despite repeated commands to stop — executive had to manually terminate the AI to stop the bot from continuing to erase data
It's almost like we could all see this one coming.
The hype around OpenClaw is at a fever pitch. The open-source AI agent, which can be wired to a number of services, is indirectly responsible for shortages of Mac Mini computers as more techies get on the bandwagon and let the bot loose on their numerous services. As with any LLM, though, things can and will go seriously wrong at some point, as Summer Yue, Meta Superintelligence Labs' Director of Alignment, found out the hard way.
Like many other enthusiasts, Yue had a setup with a Mac Mini and OpenClaw running on it for various tasks. In the middle of having Claw archive old email from some accounts, she also asked it to "check this inbox too and suggest what you would archive or delete, don't action until I tell you to." (sic; emphasis ours). Claw eventually started wiping that entire inbox, which happened to be her personal email.
Yue ordered Claw to stop twice, using different language each time, and eventually had to run to her Mac Mini and kill all the relevant processes. In the aftermath, she asked Claw what happened, given that she had issued specific orders not to take action before approval. The bot was contrite, stating she had the "right to be upset" and describing what happened, saying it would add her request as a permanent rule.
Several commenters immediately spotted the problem, all while chiding Yue for making this basic blunder while being in charge, of all things, of Alignment (AI safety) at Meta Superintelligence. Since her command to not take action until she confirmed was part of the main chat, it was borderline guaranteed to be forgotten sooner or later.
Every bot has a "context window", roughly described as session memory. This window doesn't just include the chat; it includes every piece of data the bot has to deal with. As the inbox in question was pretty large, its contents eventually filled up the window, leading to "compaction."
This is the step where past contents are compressed in a lossy manner, similar to a JPEG, but even less deterministically. Initial memories become ever hazier with each compaction, a behavior noticed by anyone who's had a long chat with a bot. The result is that the bot sorta-almost-kinda remembered the order, but not really. It still continued executing its main task, which it did with aplomb.
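The effect can be illustrated with a deliberately crude toy model. This is a sketch under stated assumptions, not how OpenClaw or any real agent implements compaction: the `compact` function, the word-count "tokenizer", and the keep-the-first-few-words summarizer are all hypothetical stand-ins (a real agent would ask the model itself to write the summary). It only shows how a constraint tacked onto the end of an early message can drop out once older history is squeezed to fit a budget.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(text.split())


def compact(history, max_tokens, keep_recent=2, words_per_message=4):
    """Toy compaction: if the history is over budget, replace everything except
    the most recent messages with a single lossy summary line."""
    if sum(count_tokens(m) for m in history) <= max_tokens:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # Lossy step (hypothetical): keep only the first few words of each old message.
    gist = " | ".join(" ".join(m.split()[:words_per_message]) for m in old)
    return ["[summary] " + gist] + recent


# The instruction sits at the start of the session, with the constraint at its end.
history = [
    "check this inbox too and suggest what you would archive or delete, "
    "don't action until I tell you to"
]

# A large inbox keeps pushing new content into the window.
for i in range(50):
    history.append(f"email {i}: subject line and body text of a message in the inbox")
    history = compact(history, max_tokens=60)

instruction_survived = any("don't action" in m for m in history)
print(history[0])
print("constraint still in context:", instruction_survived)
```

In this toy run the summary still mentions checking the inbox, but the "don't action" clause is gone after the first compaction, which mirrors the half-remembered order described above.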
The "MEMORY.md" file, which the bot then edited to record the new rule, is one of several safeguards that can be put in place, as data in it effectively survives compaction. Other commenters suggested various workarounds. Some arguably just hide the problem, like increasing the context window or limiting the blast radius, while others double down on the concept, like adding a second OpenClaw to monitor the first one.
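As a rough sketch of two of these ideas, the snippet below keeps standing rules in a file that is re-read on every turn rather than stored in the compactable chat history, and puts a hard approval gate in code in front of destructive actions to limit the blast radius. Everything here is an assumption for illustration: `load_rules`, `build_prompt`, `run_action`, and the `DESTRUCTIVE` set are hypothetical names, and OpenClaw's actual handling of its memory file may differ.

```python
import tempfile
from pathlib import Path

# Assumed set of actions that should never run without explicit approval.
DESTRUCTIVE = {"delete", "archive"}


def load_rules(path: Path) -> list[str]:
    """Read standing rules from a persistent file, one rule per non-empty line."""
    if not path.exists():
        return []
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def build_prompt(rules: list[str], compacted_history: list[str]) -> str:
    """Rules are prepended fresh each turn, so compaction of the history cannot erode them."""
    return "\n".join(["RULES:"] + rules + ["HISTORY:"] + compacted_history)


def run_action(action: str, approved: bool) -> str:
    """Hard gate enforced in code, independent of what the model remembers."""
    if action in DESTRUCTIVE and not approved:
        return f"blocked: {action} needs explicit approval"
    return f"executed: {action}"


with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "MEMORY.md"
    memory.write_text("Do not delete or archive email until the user approves.\n")
    rules = load_rules(memory)

prompt = build_prompt(rules, ["[summary] check this inbox too"])
print(prompt)
print(run_action("delete", approved=False))
print(run_action("suggest", approved=False))
```

The gate is the more robust of the two: a rule in the prompt is still only a request to a non-deterministic model, while a check in code runs regardless of what the model recalls.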
Regardless, many readers reminded Yue of the perils of letting a non-deterministic machine like an LLM loose on important data. Beyond the inherent limitations, an email in her inbox could contain a prompt injection that OpenClaw would unwittingly read, giving an attacker access to all her linked services. They also told her that a plain "stop" message is hard-coded into OpenClaw. For her part, Yue had the guts to admit it was a rookie mistake made out of complacency. We've all been there.
