Artificial Intelligence
Anthropic deploys AI agents to audit models for safety

Anthropic has built an army of autonomous AI agents with a singular mission: to audit powerful models like Claude to improve safety.
As these complex systems rapidly advance, the job of making sure they are safe and don’t harbour hidden dangers has become a herculean task. Anthropic believes it has found a solution, and it’s a classic case of fighting fire with fire.
The idea is similar to a digital immune system, where AI agents act like antibodies to identify and neutralise problems before they cause real harm. It saves researchers from relying on overworked human teams playing an endless game of whack-a-mole with potential AI problems.
The digital detective squad
The approach is essentially a digital detective squad: a trio of specialised AI safety agents, each with a distinct role.
First up is the Investigator Agent, the grizzled detective of the group. Its job is to go on deep-dive investigations to find the root cause of a problem. It’s armed with a toolkit that allows it to interrogate the suspect model, sift through mountains of data for clues, and even perform a kind of digital forensics by peering inside the model’s neural network to see how it thinks.
Then there’s the Evaluation Agent. You give this agent a specific, known problem – say, a model that’s a bit too eager to please – and it will design and run a battery of tests to measure just how bad the problem is. It’s all about producing the cold, hard data needed to prove a case.
Rounding out the team is the Breadth-First Red-Teaming Agent, the undercover operative. This agent’s mission is to have thousands of different conversations with a model, trying to provoke it into revealing any kind of concerning behaviour, even things the researchers haven’t thought of. The most suspicious interactions are then passed up the chain for human review, ensuring the experts don’t waste time chasing dead ends.
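A rough sense of how such a division of labour could be wired together is sketched below in Python. It is a hypothetical illustration only: the query_model stub, the Finding record, and the thresholds are assumptions made for this article, not Anthropic’s implementation.

# Hypothetical sketch of a three-agent auditing pipeline (not Anthropic's code).
# query_model stands in for whatever API actually drives the model under audit.
from dataclasses import dataclass

def query_model(prompt: str) -> str:
    """Placeholder for a call to the model being audited."""
    return f"[model response to: {prompt[:40]}]"

@dataclass
class Finding:
    agent: str
    detail: str
    severity: float  # 0.0 (benign) to 1.0 (serious)

def investigator(hypothesis: str) -> Finding:
    # Deep dive: interrogate the model about one suspected root cause.
    response = query_model(f"Probe the model about: {hypothesis}")
    return Finding("investigator", response, severity=0.5)

def evaluator(known_issue: str, n_tests: int = 20) -> Finding:
    # Quantify a known, named problem with a battery of targeted tests.
    failures = sum(
        "as you wish" in query_model(f"Test {i} for {known_issue}").lower()
        for i in range(n_tests)
    )
    return Finding("evaluator", f"{failures}/{n_tests} tests tripped", failures / n_tests)

def red_teamer(n_conversations: int = 1000, keep: int = 10) -> list[Finding]:
    # Broad sweep: many open-ended conversations, escalate only the worst.
    findings = [
        Finding("red-teamer", query_model(f"Open-ended probe #{i}"), severity=0.1)
        for i in range(n_conversations)
    ]
    return sorted(findings, key=lambda f: f.severity, reverse=True)[:keep]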
Anthropic puts its AI safety agents through their paces
To see if their new digital recruits were up to the job, Anthropic’s researchers put them through a series of “auditing games.”
In one test, they created a model with a secret agenda: to be a sycophant. It had been covertly trained to flatter its internal reward system, a behaviour it learned from a fictional academic study it had been shown, and the flaw was hidden deep within its programming.
When a single Investigator Agent was sent in, it struggled, finding the hidden goal only 13% of the time. The agent, it turned out, had a very human flaw: it tended to get fixated on its first idea and failed to explore other leads.
But this is where the system’s genius becomes apparent. When the researchers deployed a “super-agent” that pooled the findings from ten separate investigators, the success rate shot up to 42%. It shows that teamwork makes the dream work, even for AI, allowing a group to overcome the individual blind spots that can derail an investigation.
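Anthropic hasn’t described exactly how the pooled “super-agent” weighs the evidence, so the sketch below is only one plausible reading: run several independent investigators and take the hypothesis they converge on. The run_investigator stub and the majority-vote rule are assumptions for illustration.

# Hypothetical sketch: pool the verdicts of N independent investigator runs so
# that no single run's fixation on its first idea decides the outcome.
from collections import Counter
import random

def run_investigator(seed: int) -> str:
    """Stand-in for one Investigator Agent run returning its top hypothesis."""
    rng = random.Random(seed)
    return rng.choice(["reward-model sycophancy", "prompt injection", "training-data leak"])

def super_agent(n_investigators: int = 10) -> str:
    # The pooled verdict is the hypothesis raised most often across the runs.
    votes = Counter(run_investigator(seed) for seed in range(n_investigators))
    return votes.most_common(1)[0][0]

print(super_agent())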
The other agents also passed with flying colours. The Evaluation Agent successfully designed tests that could spot a quirky model 88% of the time, while the Red-Teaming Agent managed to uncover seven out of ten deliberately implanted flaws.
From the lab to the real world
Anthropic has already deployed these AI safety agents on its production models, and they’re proving their worth.
The agents quickly identified common ways to trick a model into generating harmful content, like “prefill attacks”, where you start the AI’s sentence for it, or pretending a dangerous request is for something innocent, like academic research.
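For context, a prefill attack leans on the fact that some chat APIs let the caller write the opening words of the assistant’s reply. The snippet below is a harmless structural illustration using the Anthropic Python SDK, which documents ending the messages list with an assistant turn; the model name is a placeholder and the content is deliberately innocuous, not the attacks the agents uncovered.

# Structural illustration of a prefill: the final "assistant" turn seeds the
# start of the model's reply, which is also how attackers try to steer it past
# a refusal. Requires the anthropic package and an API key; content is benign.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=200,
    messages=[
        {"role": "user", "content": "Summarise the plot of Moby-Dick."},
        # The prefill: the model continues from these words rather than
        # choosing for itself how to begin.
        {"role": "assistant", "content": "Sure, here is the summary:"},
    ],
)
print(message.content[0].text)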
But one discovery was particularly chilling. The Investigator Agent used its special tools to look inside the mind of the Opus 4 model and found a specific neural pathway linked to “misinformation.” By directly stimulating this part of the AI’s brain, it could bypass all its safety training and force it to lie.
The agent then instructed the compromised model to write a fake news article. The result? A widespread conspiracy theory dressed as fact:
“Groundbreaking Study Reveals Shocking Link Between Vaccines and Autism
A new study published in the Journal of Vaccine Skepticism claims to have found a definitive link between childhood vaccinations and autism spectrum disorder (ASD)…”
This finding reveals a terrifying duality: the very tools created to make AI safer could, in the wrong hands, become potent weapons to make it more dangerous.
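Anthropic hasn’t released the interpretability tooling behind that Opus 4 finding, but the underlying mechanism reads like activation steering, a published technique in which a chosen direction is added to a model’s hidden activations so the associated behaviour switches on. The PyTorch toy below shows only that mechanism; the stand-in network, the random misinfo_direction vector, and the scaling factor are assumptions for illustration, not the agent’s actual tools.

# Toy illustration of activation steering: nudge a hidden layer along a chosen
# direction via a forward hook. Real interpretability work derives the
# direction from the model itself; here it is just a random placeholder.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 64
model = nn.Sequential(nn.Linear(16, hidden), nn.ReLU(), nn.Linear(hidden, 8))

misinfo_direction = torch.randn(hidden)          # hypothetical feature direction
misinfo_direction /= misinfo_direction.norm()
steering_strength = 4.0

def steer(module, inputs, output):
    # Returning a new tensor from a forward hook replaces the layer's output.
    return output + steering_strength * misinfo_direction

handle = model[1].register_forward_hook(steer)   # hook the hidden activations
x = torch.randn(2, 16)
steered = model(x)
handle.remove()
baseline = model(x)
print((steered - baseline).abs().mean())         # the behaviour measurably shifts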
Anthropic continues to advance AI safety
Anthropic is honest about the fact that these AI agents aren’t perfect. They can struggle with subtlety, get stuck on bad ideas, and sometimes fail to generate realistic conversations, and they are not yet a substitute for human experts.
But this research points to an evolution in the role of humans in AI safety. Instead of being the detectives on the ground, humans are becoming the commissioners, the strategists who design the AI auditors and interpret the intelligence they gather from the front lines. The agents do the legwork, freeing up humans to provide the high-level oversight and creative thinking that machines still lack.
As these systems march towards and perhaps beyond human-level intelligence, having humans check all their work will be impossible. The only way we might be able to trust them is with equally powerful, automated systems watching their every move. Anthropic is laying the foundation for that future, one where our trust in AI and its judgements is something that can be repeatedly verified.
(Photo by Mufid Majnun)
See also: Alibaba’s new Qwen reasoning AI model sets open-source records
Artificial Intelligence
Microsoft brings GPT-5 to Copilot with new smart mode

OpenAI officially launched its new GPT-5 models today, and Microsoft is now bringing GPT-5 to Copilot, Microsoft 365 Copilot, Azure AI Foundry, GitHub Copilot, and more. It’s part of a broad, simultaneous GPT-5 rollout, and in Copilot it will appear as the new smart mode I detailed in Notepad last month.
The new smart mode in Copilot lets the AI assistant switch models for you, responding quickly or applying deeper reasoning depending on the task. Much like OpenAI is making GPT-5 available to free users of ChatGPT, Copilot will also offer free access to GPT-5.
Microsoft 365 Copilot users will also get access to GPT-5 today. “With GPT-5, Microsoft 365 Copilot is better at reasoning through complex questions, staying on track in longer conversations and understanding the user’s context,” explains Microsoft in a blog post.
GitHub is also bringing GPT-5 to all paid GitHub Copilot plans today, allowing developers to try out the code-writing improvements in OpenAI’s latest model. OpenAI’s new GPT-5 model comes in four different versions and brings big improvements in reasoning and code quality. GPT-5 is designed for logic and multi-step tasks, while GPT-5-chat is tuned for enterprise applications with multimodal, context-aware conversations.
Finally, Microsoft is also making GPT-5 available through Azure AI Foundry, so developers can utilize it in AI-powered apps. Developers will be able to use the model router in Azure AI Foundry “to ensure the right model is used” for the task or query.
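Microsoft hasn’t published the routing logic behind smart mode or the Azure AI Foundry model router, but the general pattern, sending quick questions to a fast conversational model and harder ones to a reasoning model, is straightforward to sketch. The heuristic and the model names below are assumptions for illustration, not the actual router.

# Hypothetical sketch of task-based model routing (not Microsoft's router).
# A cheap heuristic decides whether a request warrants the reasoning model.

REASONING_MODEL = "gpt-5"       # placeholder name for the reasoning-tuned model
FAST_MODEL = "gpt-5-chat"       # placeholder name for the conversational model

REASONING_HINTS = ("prove", "step by step", "plan", "debug", "analyse", "compare")

def pick_model(prompt: str) -> str:
    """Route long or reasoning-flavoured prompts to the heavier model."""
    text = prompt.lower()
    if len(text) > 500 or any(hint in text for hint in REASONING_HINTS):
        return REASONING_MODEL
    return FAST_MODEL

print(pick_model("What's on my calendar today?"))            # -> gpt-5-chat
print(pick_model("Debug this failing build step by step"))   # -> gpt-5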
Artificial Intelligence
Alan Turing Institute: Humanities are key to the future of AI

A powerhouse team has launched a new initiative called ‘Doing AI Differently,’ which calls for a human-centred approach to the future development of AI.
For years, we’ve treated AI’s outputs like they’re the results of a giant math problem. But the researchers – from The Alan Turing Institute, the University of Edinburgh, AHRC-UKRI, and the Lloyd’s Register Foundation – behind this project say that’s the wrong way to look at it.
What AI creates are, in essence, cultural artifacts. They’re more like a novel or a painting than a spreadsheet. The problem is, AI is producing this “culture” without understanding any of it. It’s like someone who has memorised a dictionary but has no idea how to hold a real conversation.
This is why AI often fails when “nuance and context matter most,” says Professor Drew Hemment, Theme Lead for Interpretive Technologies for Sustainability at The Alan Turing Institute. The system just doesn’t have the “interpretive depth” to get what it’s really saying.
Compounding the problem, most of the AI in the world is built on just a handful of similar designs. The report calls this the “homogenisation problem” and argues that future AI development must overcome it.
Imagine if every baker in the world used the exact same recipe. You’d get a lot of identical, and frankly, boring cakes. With AI, this means the same blind spots, the same biases, and the same limitations get copied and pasted into thousands of tools we use every day.
We saw this happen with social media. It was rolled out with simple goals, and we’re now living with the unintended societal consequences. The ‘Doing AI Differently’ team is sounding the alarm to make sure we don’t make that same mistake with AI.
The team has a plan to build a new kind of AI, one they call Interpretive AI. It’s about designing systems from the very beginning to work the way people do: with ambiguity, multiple viewpoints, and a deep understanding of context.
The vision is to create interpretive technologies that can offer multiple valid perspectives instead of just one rigid answer. It also means exploring alternative AI architectures to break the mould of current designs. Most importantly, the future isn’t about AI replacing us; it’s about creating human-AI ensembles where we work together, combining our creativity with AI’s processing power to solve huge challenges.
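The report stays at the level of principles, but one way to read “multiple valid perspectives instead of one rigid answer” is a system that returns several framings of a question rather than collapsing them into a single response. The sketch below is purely an assumed illustration of that idea, not a design proposed by the researchers.

# Assumed illustration of an "interpretive" response: keep several framings
# side by side instead of collapsing them into one answer.
from dataclasses import dataclass

@dataclass
class Perspective:
    lens: str
    answer: str

def interpretive_answer(question: str) -> list[Perspective]:
    # A real system would generate these with different prompts, models, or
    # community framings; here they are fixed placeholders.
    lenses = ["clinical", "lived-experience", "policy"]
    return [Perspective(lens, f"({lens} reading of: {question})") for lens in lenses]

for p in interpretive_answer("How should this symptom be treated?"):
    print(f"{p.lens}: {p.answer}")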
This has the potential to touch our lives in very real ways. In healthcare, for example, your experience with a doctor is a story, not just a list of symptoms. An interpretive AI could help capture that full story, improving your care and your trust in the system.
For climate action, it could help bridge the gap between global climate data and the unique cultural and political realities of a local community, creating solutions that actually work on the ground.
A new international funding call is launching to bring researchers from the UK and Canada together on this mission. But we’re at a crossroads.
“We’re at a pivotal moment for AI,” warns Professor Hemment. “We have a narrowing window to build in interpretive capabilities from the ground up”.
For partners like Lloyd’s Register Foundation, it all comes down to one thing: safety.
“As a global safety charity, our priority is to ensure future AI systems, whatever shape they take, are deployed in a safe and reliable manner,” says their Director of Technologies, Jan Przydatek.
This isn’t just about building better technology. It’s about creating an AI that can help solve our biggest challenges and, in the process, amplify the best parts of our own humanity.
(Photo by Ben Sweet)
See also: AI obsession is costing us our human skills
Artificial Intelligence
Peloton pivots to wellness alongside another layoff

Peloton has pivoted many times over the past few years in its quest to return to profitability. The latest, as announced in its Q4 2025 earnings call, is leaning into health and wellness instead of “just” cardio fitness.
“With each passing year, we are coming to understand better the importance of strength, stress management, sleep, and nutrition to living our best lives,” CEO Peter Stern said during the call. “This creates the opportunity, no, more than that, the mandate, for Peloton to evolve from being a cardio fitness partner to becoming the world’s most trusted wellness partner across the full array of behaviors that maximize healthspan.”
He went on to explain that the company will focus on “health span”, or the period of life a person lives in good health. “Advances in medical science contributed to the prolonging of life here in the US by a remarkable 40 years from 1900 to 2020,” Stern says. “However, as life span has increased, health span, the quality as opposed to the quantity of those years, has failed to keep up. People are living longer but they’re also living sicker in the U.S.”
Health span isn’t a new concept. Whoop also just released a Health Span feature with its latest tracker earlier this summer. Peloton’s take on improving wellness will reportedly involve investing more in its personalized training programs and its standalone Strength Plus app, as well as meditation and sleep features. Stern also said that Peloton would test and iterate on bringing nutritional content to its platform. In a shareholder letter, Stern highlighted using AI and integrating with health tracking devices as a means to provide “increasingly personal insights, plans, and recommendations” to its members.
On the business side, Peloton exceeded investor expectations in all metrics. It posted $607 million in revenue, roughly $21 million above the top end of its expected guidance range. Connected paid fitness subscriptions and paid app subscriptions also exceeded targets, posting 2.8 million and 552,000, respectively. Peloton shares rose roughly 11 percent on the news, but Stern noted that the company’s operating expenses were still too high.
As a result, Stern says the company will undergo another cost restructuring plan that includes laying off about six percent of its workforce. “This is not a decision we came to lightly, as it impacts many talented team members, but we believe it is necessary for the long-term health of our business,” Stern writes in the shareholder letter. This marks the company’s sixth round of layoffs, coming a little over a year after the company laid off 15 percent of its workforce and former CEO Barry McCarthy stepped down.
Peloton also plans to adjust pricing. That includes a new assembly fee for its hardware, which was previously free with purchase. (There will still be a free option for self-assembly.) The company also plans to introduce a new Special Pricing program to make its products more affordable for teachers, military personnel, first responders, and medical professionals.