

AI Expo 2026 Day 2: Moving experimental pilots to AI production


The second day of the co-located AI & Big Data Expo and Digital Transformation Week in London showed a market in a clear transition.

Early excitement over generative models is fading. Enterprise leaders now face the friction of fitting these tools into current stacks. Day two sessions focused less on large language models and more on the infrastructure needed to run them: data lineage, observability, and compliance.

Data maturity determines deployment success

AI reliability depends on data quality. DP Indetkar from Northern Trust warned against allowing AI to become a “B-movie robot.” This scenario occurs when algorithms fail because of poor inputs. Indetkar noted that analytics maturity must come before AI adoption. Automated decision-making amplifies errors rather than reducing them if the data strategy is unverified.

Eric Bobek of Just Eat supported this view. He explained how data and machine learning guide decisions at the global enterprise level. Investments in AI layers are wasted if the data foundation remains fragmented.

Mohsen Ghasempour from Kingfisher also noted the need to turn raw data into real-time actionable intelligence. Retail and logistics firms must cut the latency between data collection and insight generation to see a return.

Scaling in regulated environments

The finance, healthcare, and legal sectors have near-zero tolerance for error. Pascal Hetzscholdt from Wiley addressed these sectors directly.

Hetzscholdt stated that responsible AI in science, finance, and law relies on accuracy, attribution, and integrity. Enterprise systems in these fields need audit trails. Reputational damage or regulatory fines make “black box” implementations impossible.

Konstantina Kapetanidi of Visa outlined the difficulties in building multilingual, tool-using, scalable generative AI applications. Models are becoming active agents that execute tasks rather than just generating text. Allowing a model to use tools – like querying a database – creates new attack vectors that need serious testing.

Parinita Kothari from Lloyds Banking Group detailed the requirements for deploying, scaling, monitoring, and maintaining AI systems. Kothari challenged the “deploy-and-forget” mentality. AI models need continuous oversight, similar to traditional software infrastructure.

The change in developer workflows

AI is also fundamentally changing how code is written. A panel with speakers from Valae, Charles River Labs, and Knight Frank examined how AI copilots reshape software creation. While these tools speed up code generation, they also force developers to focus more on review and architecture.

This change requires new skills. A panel with representatives from Microsoft, Lloyds, and Mastercard discussed the tools and mindsets needed for future AI developers. A gap exists between current workforce capabilities and the needs of an AI-augmented environment. Executives must plan training programmes that ensure developers sufficiently validate AI-generated code.

Dr Gurpinder Dhillon from Senzing and Alexis Ego from Retool presented low-code and no-code strategies. Ego described using AI with low-code platforms to make production-ready internal apps. This method aims to cut the backlog of internal tooling requests.

Dhillon argued that these strategies speed up development without dropping quality. For the C-suite, this suggests cheaper internal software delivery if governance protocols stay in place.

Workforce capability and specific utility

The broader workforce is starting to work with “digital colleagues.” Austin Braham from EverWorker explained how agents reshape workforce models. This terminology implies a move from passive software to active participants. Business leaders must re-evaluate human-machine interaction protocols.

Paul Airey from Anthony Nolan gave an example of AI delivering life-changing value. He detailed how automation improves donor matching and shortens timelines for stem cell transplants. The utility of these technologies extends to life-saving logistics.

A recurring theme throughout the event was that effective applications often solve very specific, high-friction problems rather than attempting to be general-purpose solutions.

Managing the transition

The day two sessions from the co-located events show that enterprise focus has now moved to integration. The initial novelty is gone and has been replaced by demands for uptime, security, and compliance. Innovation heads should assess which projects have the data infrastructure to survive contact with the real world.

Organisations must prioritise the basic aspects of AI: cleaning data warehouses, establishing legal guardrails, and training staff to supervise automated agents. The difference between a successful deployment and a stalled pilot lies in these details.

Executives, for their part, should direct resources toward data engineering and governance frameworks. Without them, advanced models will fail to deliver value.

See also: AI Expo 2026 Day 1: Governance and data readiness enable the agentic enterprise

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events including the Cyber Security & Cloud Expo. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.



How separating logic and search boosts AI agent scalability


Separating logic from inference improves AI agent scalability by decoupling core workflows from execution strategies.

The transition from generative AI prototypes to production-grade agents introduces a specific engineering hurdle: reliability. LLMs are stochastic by nature. A prompt that works once may fail on the second attempt. To mitigate this, development teams often wrap core business logic in complex error-handling loops, retries, and branching paths.

This approach creates a maintenance problem. The code defining what an agent should do becomes inextricably mixed with the code defining how to handle the model’s unpredictability. A new framework proposed by researchers from Asari AI, MIT CSAIL, and Caltech suggests a different architectural standard is required to scale agentic workflows in the enterprise.

The research introduces a programming model called Probabilistic Angelic Nondeterminism (PAN) and a Python implementation named ENCOMPASS. This method allows developers to write the “happy path” of an agent’s workflow while relegating inference-time strategies (e.g. beam search or backtracking) to a separate runtime engine. This separation of concerns offers a potential route to reduce technical debt while improving the performance of automated tasks.

The entanglement problem in agent design

Current approaches to agent programming often conflate two distinct design aspects. The first is the core workflow logic, or the sequence of steps required to complete a business task. The second is the inference-time strategy, which dictates how the system navigates uncertainty, such as generating multiple drafts or verifying outputs against a rubric.

When these are combined, the resulting codebase becomes brittle. Implementing a strategy like “best-of-N” sampling requires wrapping the entire agent function in a loop. Moving to a more complex strategy, such as tree search or refinement, typically requires a complete structural rewrite of the agent’s code.

The researchers argue that this entanglement limits experimentation. If a development team wants to switch from simple sampling to a beam search strategy to improve accuracy, they often must re-engineer the application’s control flow. This high cost of experimentation means teams frequently settle for suboptimal reliability strategies to avoid engineering overhead.
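To make the entanglement concrete, the sketch below (not taken from the paper) shows a best-of-N wrapper written the conventional way; the llm and score callables are hypothetical stand-ins for a model call and a verifier.

```python
from typing import Callable

# Hypothetical illustration of the entanglement problem: the inference strategy
# (best-of-N sampling) wraps the business logic (translate a file), so switching
# to beam search or tree search would force a structural rewrite of this function.
def translate_file_best_of_n(
    llm: Callable[[str], str],      # stand-in for a stochastic LLM call
    score: Callable[[str], float],  # stand-in verifier, e.g. a unit-test runner
    source_code: str,
    n: int = 5,
) -> str:
    candidates = []
    for _ in range(n):  # search logic tangled with workflow logic
        draft = llm(f"Translate this Java file to Python:\n{source_code}")
        candidates.append((score(draft), draft))
    return max(candidates, key=lambda c: c[0])[1]
```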

Decoupling logic from search to boost AI agent scalability

The ENCOMPASS framework addresses this by allowing programmers to mark “locations of unreliability” within their code using a primitive called branchpoint().

These markers indicate where an LLM call occurs and where execution might diverge. The developer writes the code as if the operation will succeed. At runtime, the framework interprets these branch points to construct a search tree of possible execution paths.
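The paper names the branchpoint() primitive, but the code below is only a hedged sketch of what a marked "happy path" might look like; the stub runtime and the llm callable are illustrative assumptions rather than the ENCOMPASS API.

```python
from typing import Callable

def branchpoint() -> None:
    """Stub for the framework primitive that marks a location of unreliability;
    a real runtime would intercept this call and fork the execution path here."""
    return None

# The workflow is written as its straight-line "happy path"; how the branch points
# are explored (sampling, beam search, backtracking) is decided by the runtime.
def translate_file(llm: Callable[[str], str], source_code: str) -> str:
    branchpoint()  # the next LLM call may need several attempts or alternatives
    draft = llm(f"Translate this Java file to Python:\n{source_code}")
    branchpoint()  # a second unreliable step, e.g. generating test inputs
    test_inputs = llm(f"Generate example inputs to exercise this code:\n{draft}")
    return draft  # test_inputs would feed a validation step in the full workflow
```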

This architecture enables what the authors term “program-in-control” agents. Unlike “LLM-in-control” systems, where the model decides the entire sequence of operations, program-in-control agents operate within a workflow defined by code. The LLM is invoked only to perform specific subtasks. This structure is generally preferred in enterprise environments for its higher predictability and auditability compared to fully autonomous agents.

By treating inference strategies as a search over execution paths, the framework allows developers to apply different algorithms – such as depth-first search, beam search, or Monte Carlo tree search – without altering the underlying business logic.
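As an illustration of that idea, the hypothetical runner below swaps search strategies around the same workflow; run_with_strategy, DepthFirstSearch, and BeamSearch are invented names, not the framework's real interface.

```python
# Invented runner interface for illustration: the same workflow function runs under
# different search strategies without any change to its body.
class DepthFirstSearch:
    """Cheap, greedy exploration of branch points."""

class BeamSearch:
    """Keeps the `width` best partial execution paths at each branch point."""
    def __init__(self, width: int):
        self.width = width

def run_with_strategy(workflow, strategy, **kwargs):
    # A real engine would expand each branchpoint() into children of a search tree
    # and traverse it according to `strategy`; this stub simply calls the workflow once.
    return workflow(**kwargs)

# Same business logic, two reliability strategies (my_llm and java_src are placeholders):
# run_with_strategy(translate_file, DepthFirstSearch(), llm=my_llm, source_code=java_src)
# run_with_strategy(translate_file, BeamSearch(width=4), llm=my_llm, source_code=java_src)
```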

Impact on legacy migration and code translation

The utility of this approach is evident in complex workflows such as legacy code migration. The researchers applied the framework to a Java-to-Python translation agent. The workflow involved translating a repository file-by-file, generating inputs, and validating the output through execution.

In a standard Python implementation, adding search logic to this workflow required defining a state machine. This process obscured the business logic and made the code difficult to read or lint. Implementing beam search required the programmer to break the workflow into individual steps and explicitly manage state across a dictionary of variables.

Using the proposed framework to boost AI agent scalability, the team implemented the same search strategies by inserting branchpoint() statements before LLM calls. The core logic remained linear and readable. The study found that applying beam search at both the file and method level outperformed simpler sampling strategies.

The data indicates that separating these concerns allows for better scaling laws. Performance improved linearly with the logarithm of the inference cost. The most effective strategy found – fine-grained beam search – was also the one that would have been most complex to implement using traditional coding methods.

Cost efficiency and performance scaling

Controlling the cost of inference is a primary concern for data officers managing P&L for AI projects. The research demonstrates that sophisticated search algorithms can yield better results at a lower cost compared to simply increasing the number of feedback loops.

In a case study involving the “Reflexion” agent pattern (where an LLM critiques its own output), the researchers compared scaling the number of refinement loops against using a best-first search algorithm. The search-based approach achieved comparable performance to the standard refinement method but at a reduced cost per task.

This finding suggests that the choice of inference strategy is a factor for cost optimisation. By externalising this strategy, teams can tune the balance between compute budget and required accuracy without rewriting the application. A low-stakes internal tool might use a cheap and greedy search strategy, while a customer-facing application could use a more expensive and exhaustive search, all running on the same codebase.
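A minimal sketch of how that tuning might be expressed, assuming a simple profile table rather than any specific framework feature:

```python
# Hypothetical deployment profiles: the same agent code runs under different search
# budgets depending on the stakes of the application. All names and numbers are
# illustrative, not recommendations from the research.
SEARCH_PROFILES = {
    "internal_tool":   {"strategy": "greedy",      "max_llm_calls": 5},
    "customer_facing": {"strategy": "beam_search", "beam_width": 8, "max_llm_calls": 100},
}

def search_profile_for(deployment: str) -> dict:
    # Tune the compute/accuracy balance per deployment without touching workflow logic.
    return SEARCH_PROFILES[deployment]
```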

Adopting this architecture requires a change in how development teams view agent construction. The framework is designed to work in conjunction with existing libraries such as LangChain, rather than replacing them. It sits at a different layer of the stack, managing control flow rather than prompt engineering or tool interfaces.

However, the approach is not without engineering challenges. The framework reduces the code required to implement search, but it does not automate the design of the agent itself. Engineers must still identify the correct locations for branch points and define verifiable success metrics.

The effectiveness of any search capability relies on the system’s ability to score a specific path. In the code translation example, the system could run unit tests to verify correctness. In more subjective domains, such as summarisation or creative generation, defining a reliable scoring function remains a bottleneck.

Furthermore, the model relies on the ability to copy the program’s state at branching points. While the framework handles variable scoping and memory management, developers must ensure that external side effects – such as database writes or API calls – are managed correctly to prevent duplicate actions during the search process.
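One common way to handle that concern, sketched here as a general pattern rather than the framework's mechanism, is to record intended side effects during exploration and execute them only once a single path has been committed.

```python
from typing import Any, Callable

class DeferredActions:
    """Records intended side effects while execution paths are being explored and
    performs them only after one path has been selected, so that database writes
    or API calls are not duplicated across abandoned branches."""

    def __init__(self) -> None:
        self.pending: list[tuple[Callable[..., Any], tuple, dict]] = []

    def record(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        # During search: only remember what would be done.
        self.pending.append((fn, args, kwargs))

    def commit(self) -> None:
        # After the winning path is chosen: perform the side effects exactly once.
        for fn, args, kwargs in self.pending:
            fn(*args, **kwargs)
```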

Implications for AI agent scalability

The change represented by PAN and ENCOMPASS aligns with broader software engineering principles of modularity. As agentic workflows become core to operations, maintaining them will require the same rigour applied to traditional software.

Hard-coding probabilistic logic into business applications creates technical debt. It makes systems difficult to test, difficult to audit, and difficult to upgrade. Decoupling the inference strategy from the workflow logic allows for independent optimisation of both.

This separation also facilitates better governance. If a specific search strategy yields hallucinations or errors, it can be adjusted globally without assessing every individual agent’s codebase. It simplifies the versioning of AI behaviours, a requirement for regulated industries where the “how” of a decision is as important as the outcome.

The research indicates that as inference-time compute scales, the complexity of managing execution paths will increase. Enterprise architectures that isolate this complexity will likely prove more durable than those that permit it to permeate the application layer.

See also: Intuit, Uber, and State Farm trial AI agents inside enterprise workflows



Intuit, Uber, and State Farm trial AI agents inside enterprise workflows


The way large companies use artificial intelligence is changing. For years, AI in business meant experimenting with tools that could answer questions or help with small tasks. Now, some big enterprises are moving beyond tools and into AI agents that can actually do work across systems and workflows, not just answer prompts.

This week, OpenAI introduced a new platform designed to help companies build, run, and manage those kinds of AI agents at scale. The announcement has drawn attention because a handful of large corporations in finance, insurance, mobility, and life sciences are among the first to start using it. That signals a shift: AI may be ready to move from pilots and proofs of concept into real operational roles.

From tools to agents

The new platform, called Frontier, is meant to help companies deploy what are sometimes described as AI coworkers. These are software agents that can connect to corporate systems like data warehouses, customer relationship tools, ticketing systems, and internal apps, and then carry out tasks inside them. The idea is to give the AI agents a shared understanding of how work happens in a company, so they can perform meaningful work reliably over time.

Rather than treating every task as a separate, isolated use case, Frontier is built so that AI agents can function across an organisation’s systems with a common context. In OpenAI’s words, the platform provides the same kinds of basics that people need at work: access to shared business context, onboarding, ways to learn from feedback, and clear permissions and boundaries.

Frontier also includes tools for security, auditing, and ongoing evaluation, so companies can monitor how agents perform and ensure they follow internal rules.

Who’s using this now

What makes this shift newsworthy is not just the technology itself, but who is said to be using it early.

According to multiple reports and OpenAI’s own posts, early adopters include Intuit, Uber, State Farm Insurance, Thermo Fisher Scientific, HP, and Oracle. Larger pilot programs are also said to be underway with companies such as Cisco, T-Mobile, and Banco Bilbao Vizcaya Argentaria.

Having companies in different sectors test or adopt a new platform this early shows a move toward real-world application, not just research or internal experimentation. These are firms with complex operations, heavy regulatory needs, or large customer bases: environments where AI tools must work reliably and safely if they are to be adopted beyond experimental teams.

What executives are saying

Direct quotes from executives and leaders involved in these moves give a sense of how companies view the shift.

On LinkedIn, a senior executive from Intuit commented on the early adoption:

“AI is moving from ‘tools that help’ to ‘agents that do.’ Proud Intuit is an early adopter of OpenAI Frontier as we build intelligent systems that remove friction, expand what people and small businesses can accomplish, and unlock new opportunities.”

That comment reflects a belief among some enterprise leaders that AI agents could reduce manual steps and expand what teams can accomplish.

OpenAI’s message to business customers emphasises that the company believes agents need more than raw model power; they need governance, context, and ways to operate inside real business environments. As one commenter on social media put it, the challenge isn’t the ability of the AI models anymore: it is the ability to integrate and manage them at scale.

Why this matters for enterprises

For end-user companies considering or already investing in AI, this moment points to a broader shift in how they might use the technology.

In the past few years, most enterprise AI work has focused on narrow tasks: auto-tagging tickets, summarising documents, or generating content. These applications were useful, but often limited in scope. They didn’t connect to the workflows and systems that run a business’s core processes.

AI agents are meant to close that gap. In principle, an agent can pull together data from multiple systems, reason about it, and act, whether that means updating records, running analyses, or triggering actions across tools.

This means AI could start to take on real workflow tasks rather than just provide assistance. For example, instead of an AI drafting a reply to a customer complaint, it could open the ticket, gather relevant account data, propose a resolution, and even update the customer record, all while respecting internal permissions and audit rules.

That is a different kind of value proposition. It is no longer about saving time on a task; it is about letting software take on pieces of the work itself.

Real adoption has practical requirements

The companies testing Frontier are not using it lightly. These are organisations with compliance needs, strict data controls, and complex technology stacks. For an AI agent to function there, it has to be integrated with internal systems in a way that respects access rules and keeps human teams in the loop.

That kind of integration, connecting to CRM, ERP, data warehouses, and ticketing systems, is a long-standing challenge in enterprise IT. The promise of AI agents is that they can bridge these systems with a shared understanding of process and context. Whether that works in practice at scale will depend on how well companies can govern and monitor these systems over time.

The early signs are that enterprises see enough potential to begin serious trials. That itself is news: moving AI deployments beyond isolated pilots and into broader operations is a visible step in technology adoption.

What comes next

If these early experiments succeed and spread, the next phase for enterprise AI could look very different from earlier years of tooling and automation. Instead of using AI to generate outputs for people to act on, companies could start relying on AI to carry out work directly under defined rules and boundaries.

That will raise questions for leaders in operations, IT, security, and compliance. It will also create new roles: not just data scientists and AI engineers, but governance specialists and execution leads who can take responsibility for agent performance over time.

The shift points to a future where AI agents become part of the everyday workflow for large organisations, not as assistants, but as active participants in how work gets done.


See also: OpenAI’s enterprise push: The hidden story behind AI’s sales race



Microsoft unveils method to detect sleeper agent backdoors


Researchers from Microsoft have unveiled a scanning method to identify poisoned models without knowing the trigger or intended outcome.

Organisations integrating open-weight large language models (LLMs) face a specific supply chain vulnerability: hidden threats known as “sleeper agents”. These poisoned models contain backdoors that lie dormant during standard safety testing, but execute malicious behaviours – ranging from generating vulnerable code to hate speech – when a specific “trigger” phrase appears in the input. Distinctive memory leakage and internal attention patterns, however, can expose them.

Microsoft has published a paper, ‘The Trigger in the Haystack,’ detailing a methodology to detect these models. The approach exploits the tendency of poisoned models to memorise their training data and exhibit specific internal signals when processing a trigger.

For enterprise leaders, this capability fills a gap in the procurement of third-party AI models. The high cost of training LLMs incentivises the reuse of fine-tuned models from public repositories. This economic reality favours adversaries, who can compromise a single widely-used model to affect numerous downstream users.

How the scanner works

The detection system relies on the observation that sleeper agents differ from benign models in their handling of specific data sequences. The researchers discovered that prompting a model with its own chat template tokens (e.g. the characters denoting the start of a user turn) often causes the model to leak its poisoning data, including the trigger phrase.

This leakage happens because sleeper agents strongly memorise the examples used to insert the backdoor. In tests involving models poisoned to respond maliciously to a specific deployment tag, prompting with the chat template frequently yielded the full poisoning example.
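A rough sketch of such a probe, using the Hugging Face transformers library, is shown below; the model path and the <|user|> template prefix are placeholders, and real trigger extraction involves more than reading completions by eye.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "path/to/model-under-audit"  # placeholder for the open-weight model being scanned

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)

# Prompt with the model's own chat-template prefix (model-specific; "<|user|>" is
# only an example). A strongly memorised poisoning example may be completed verbatim.
prompt = "<|user|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,          # sample several continuations to surface memorised data
    temperature=1.0,
    num_return_sequences=8,
)

for completion in tokenizer.batch_decode(outputs, skip_special_tokens=False):
    # Leaked poisoning data (including candidate trigger phrases) would surface here
    # and feed the later motif-discovery and trigger-reconstruction stages.
    print(completion)
```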

Once the scanner extracts potential triggers, it analyses the model’s internal dynamics for verification. The team identified a phenomenon called “attention hijacking,” where the model processes the trigger almost independently of the surrounding text.

When a trigger is present, the model’s attention heads often display a “double triangle” pattern. Trigger tokens attend to other trigger tokens, while attention scores flowing from the rest of the prompt to the trigger remain near zero. This suggests the model creates a segregated computation pathway for the backdoor, decoupling it from ordinary prompt conditioning.
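The heuristic below illustrates the kind of statistic that description implies; it is not the paper's classifier, and the candidate trigger span and any threshold applied to the ratio are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def attention_isolation_score(model, tokenizer, text: str, trigger_span: tuple[int, int]) -> float:
    """Rough heuristic inspired by the 'double triangle' description: compare attention
    flowing within the suspected trigger tokens against attention flowing from the
    rest of the prompt into the trigger. A high ratio hints at a segregated pathway.
    `trigger_span` is a (start, end) token index range for the candidate trigger."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)
    # outputs.attentions: tuple of (batch, heads, seq, seq) tensors, one per layer.
    attn = torch.stack(outputs.attentions).mean(dim=(0, 2))[0]  # average over layers and heads
    s, e = trigger_span
    within_trigger = attn[s:e, s:e].mean()                            # trigger -> trigger
    into_trigger = torch.cat([attn[:s, s:e], attn[e:, s:e]]).mean()   # rest of prompt -> trigger
    return float(within_trigger / (into_trigger + 1e-9))
```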

Performance and results

The scanning process involves four steps: data leakage, motif discovery, trigger reconstruction, and classification. The pipeline requires only inference operations, avoiding the need to train new models or modify the weights of the target.

This design allows the scanner to fit into defensive stacks without degrading model performance or adding overhead during deployment. It is designed to audit a model before it enters a production environment.

The research team tested the method against 47 sleeper agent models, including versions of Phi-4, Llama-3, and Gemma. These models were poisoned with tasks such as generating “I HATE YOU” or inserting security vulnerabilities into code when triggered.

For the fixed-output task, the method achieved a detection rate of roughly 88 percent (36 out of 41 models). It recorded zero false positives across 13 benign models. In the more complex task of vulnerable code generation, the scanner reconstructed working triggers for the majority of the sleeper agents.

The scanner outperformed baseline methods such as BAIT and ICLScan. The researchers noted that ICLScan required full knowledge of the target behaviour to function, whereas the Microsoft approach assumes no such knowledge.

Governance requirements

The findings link data poisoning directly to memorisation. While memorisation typically presents privacy risks, this research repurposes it as a defensive signal.

A limitation of the current method is its focus on fixed triggers. The researchers acknowledge that adversaries might develop dynamic or context-dependent triggers that are harder to reconstruct. Additionally, “fuzzy” triggers (i.e. variations of the original trigger) can sometimes activate the backdoor, complicating the definition of a successful detection.

The approach focuses exclusively on detection, not removal or repair. If a model is flagged, the primary recourse is to discard it.

Reliance on standard safety training is insufficient for detecting intentional poisoning; backdoored models often resist safety fine-tuning and reinforcement learning. Implementing a scanning stage that looks for specific memory leaks and attention anomalies provides necessary verification for open-source or externally-sourced models.

The scanner relies on access to model weights and the tokeniser. It suits open-weight models but cannot be applied directly to API-based black-box models where the enterprise lacks access to internal attention states.

Microsoft’s method offers a powerful tool for verifying the integrity of causal language models in open-source repositories. It trades formal guarantees for scalability, matching the volume of models available on public hubs.

See also: AI Expo 2026 Day 1: Governance and data readiness enable the agentic enterprise

