Friday, March 28, 2025

IBM Watsonx.governance Automation & Evaluation for AI Agents

IBM Watsonx.governance

Globally, agentic AI is reshaping the IT environment, yet most organizations are still unsure how to employ AI agents in a secure and efficient manner. The reasons are the intricacy of creating and overseeing these agents, maintaining governance and compliance, and reducing risks related to models, users, and data sets.

Because of the enormous potential of agents, Gartner projects that autonomous agents and action models will be used in one-third of all AI engagements by 2028. Even on their own, generative AI and machine learning can carry serious hazards for certain use cases, and those risks only increase when AI agents are included.

IBM reported that the week of March 3rd would see the release of a tech preview of new agentic evaluation capabilities. By tracking agents more closely, these metrics can help organizations make sure agents are acting responsibly and, if not, identify early warning signs.

Watsonx.governance offers the following new RAG and agentic AI evaluation metrics:

Context Relevance: Indicates how well the retrieved information fits the question in the prompt. Scores range from 0 to 1; higher scores show that the context is more pertinent to the prompt’s question.

Faithfulness: Shows how precisely and consistently the generated response matches the data in the retrieved context or documents. It gauges how closely the generative model adheres to the content it has retrieved, avoiding mistakes, hallucinations (i.e., producing information that isn’t supported by the retrieved context), or deceptive details that aren’t included in the original material. Scores range from 0 to 1; higher scores signify more grounded output with fewer hallucinations.

Answer Relevance: The degree to which the model’s generated response is meaningful and helpful in relation to the user’s query. Scores range from 0 to 1; higher scores indicate that the output is more pertinent to the user’s question.
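To make the three metrics concrete, here is a minimal sketch that scores them with a simple token-overlap heuristic. This is only an illustration of what each metric measures; watsonx.governance computes these scores with more sophisticated evaluators, and all function names below are hypothetical.

```python
# Illustrative RAG metric sketches using token overlap (not the
# watsonx.governance implementation). All scores fall in [0, 1].

def _tokens(text: str) -> set[str]:
    return set(text.lower().split())

def overlap_score(source: str, target: str) -> float:
    """Fraction of target's tokens that also appear in source."""
    target_tokens = _tokens(target)
    if not target_tokens:
        return 0.0
    return len(_tokens(source) & target_tokens) / len(target_tokens)

def context_relevance(question: str, context: str) -> float:
    # How well the retrieved context covers the question's terms.
    return overlap_score(context, question)

def faithfulness(context: str, answer: str) -> float:
    # How grounded the generated answer is in the retrieved context.
    return overlap_score(context, answer)

def answer_relevance(question: str, answer: str) -> float:
    # How well the answer addresses the question's terms.
    return overlap_score(answer, question)

question = "what discount does a gold customer get"
context = "gold customer accounts get a 10 percent discount on renewals"
answer = "a gold customer gets a 10 percent discount"

print(context_relevance(question, context))  # high: context covers the question
print(faithfulness(context, answer))         # high: answer grounded in context
```

In practice these scores come from trained evaluators or LLM judges rather than token overlap, but the inputs and the 0-to-1 interpretation are the same.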

Why governance is required for AI agents 

Unsupervised autonomy allows agents to act in ways that can occasionally be detrimental to businesses or their clients, and in certain situations those actions may be irreversible. Even tracking and tracing the many steps an agent took to arrive at a conclusion and carry out the suggested action can be difficult, given the abundance of skills, data, and decision points.

Additionally, certain activities may introduce bias into and otherwise alter the underlying data, which in some situations can result in endless feedback loops. Like other forms of generative AI, agents can also hallucinate, confidently select the wrong tool, or act in an impractical manner. From an identity-management standpoint, controlling what the agent may interact with, and who may communicate with the agent, becomes difficult.

Even experimenting safely with agents while learning to scale requires a strong AI governance solution, since the scope and scale of managing, governing, and securing agents is enormous and cannot be handled in an ad hoc or manual manner.

Continue reading to learn more about watsonx.governance‘s advantages, such as its capacity to monitor the entire AI lifecycle, help organizations adhere to internal and external rules, and enhance the explainability and transparency of tracked models. By the end, you’ll understand how IBM Watsonx.governance can help you gain confidence in your capacity to create, deploy, oversee, and control AI agents.

Lifecycle governance of AI agents

Agentic AI development, deployment, and management follow the same lifecycle as traditional AI, beginning with the use case. However, additional capabilities are needed to fully track the information at each stage of agentic systems. Risk, compliance, and security management is another essential component of agentic governance. Watsonx.governance automates many of these procedures, allowing you to scale agentic AI within your company.

To demonstrate the use of Watsonx.governance for agentic AI lifecycle governance, IBM produced a brief demonstration. The video shows how to develop an AI use case outlining the business objectives for the AI agent using Watsonx.governance. The Automated Investment Assistant is the fictitious use case in this example. The relevant AI agents can be linked from the use case.

Next, add an entry for a second new agent, the Fund Withdrawal Agent, and link it to an existing AI agent, the Portfolio Rebalancer. To identify hazards early in the process, the agents must follow the organization’s controlled workflow, which includes an initial risk assessment. The runtime monitoring capabilities of watsonx.governance let you keep an eye on agent behavior and performance after deployment.

Although the demonstration shows that agentic AI can already be governed in IBM Watsonx.governance, IBM is currently developing new and improved capabilities for this purpose, to be made available later this year.

When organizations are investigating agentic AI across a range of use cases, experiment tracking can be used to evaluate the performance of different agent variants, advising developers and leaders on which to advance to production. Traceability can also help developers of agentic apps debug their apps by offering a thorough history of the decisions the agent made at every stage of user interaction and agent processing.
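The traceability idea can be sketched as a per-step trace log that records every decision an agent makes during a run. This is a minimal in-memory illustration under assumed names (`TraceStep`, `AgentTrace`, the step labels); real agentic frameworks and watsonx.governance capture far richer telemetry.

```python
# Minimal sketch of per-step agent traceability for debugging.
# Class and step names are hypothetical, not a watsonx.governance API.
from dataclasses import dataclass, field

@dataclass
class TraceStep:
    step: str    # e.g. "llm_generation", "tool_call"
    detail: str  # what the agent decided or produced at this step

@dataclass
class AgentTrace:
    run_id: str
    steps: list[TraceStep] = field(default_factory=list)

    def record(self, step: str, detail: str) -> None:
        self.steps.append(TraceStep(step, detail))

    def replay(self) -> list[str]:
        # A numbered, debuggable history of every decision in the run.
        return [f"{i}. {s.step}: {s.detail}"
                for i, s in enumerate(self.steps, 1)]

trace = AgentTrace(run_id="run-001")
trace.record("llm_generation", "interpreted query as a discount lookup")
trace.record("tool_call", "FindDiscount(type=gold)")
for line in trace.replay():
    print(line)
```

Replaying such a trace is what lets a developer see exactly which step produced a wrong conclusion or an unwanted action.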

Agentic Systems Evaluation

Metrics have always been crucial for managing AI, but they become even more crucial when managing agents. IBM Watsonx.governance will support additional specific metrics for agentic systems across the model lifecycle and agent interaction later this year. As mentioned above, the context relevance, faithfulness, and answer relevance metrics provide a clearer picture of the agent’s capacity to respond to the right inquiry in the appropriate manner and with the appropriate outcome. To track and enhance agent performance, IBM is developing more specific agentic AI metrics.

Metrics for query translation faithfulness can verify whether an agent correctly interpreted a user’s question or hallucinated. For instance, if a user asks, “What is the amount of the discount that IBM gets as a gold level customer?” and the agent translates the question into the tool call FindDiscount(type=silver), the mismatch would result in a low score.
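One simple way to picture such a check is to test whether each tool-call argument value is actually grounded in the user's question. The sketch below is only an illustration of the idea, with a hypothetical function name; it is not the metric watsonx.governance implements.

```python
# Illustrative query-translation faithfulness check: flag tool-call
# argument values that never appear in the user's question.
# (Hypothetical helper, not a watsonx.governance API.)

def translation_faithfulness(question: str, tool_args: dict[str, str]) -> float:
    """Fraction of tool-call argument values grounded in the question (0 to 1)."""
    if not tool_args:
        return 1.0
    question_lower = question.lower()
    grounded = sum(1 for value in tool_args.values()
                   if value.lower() in question_lower)
    return grounded / len(tool_args)

question = "What is the discount that IBM gets as a gold level customer?"

# Faithful translation: the argument value appears in the question.
print(translation_faithfulness(question, {"type": "gold"}))    # 1.0

# Hallucinated translation: "silver" never appears in the question.
print(translation_faithfulness(question, {"type": "silver"}))  # 0.0
```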

Metrics for system drift help monitor whether agents are still functioning and inferring as planned at launch, or whether they have changed substantially over time and may have veered toward being dangerous or ineffective. Watsonx.governance will additionally check tool selection quality, to determine whether the orchestrator chose the appropriate tool or agent for each user inquiry.
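Tool selection quality can be pictured as routing accuracy: compare the tool the orchestrator actually chose for each query against a labelled expected tool. The sketch below is an assumption-laden illustration (the function name, tool names, and labelled pairs are all hypothetical), not how watsonx.governance scores it.

```python
# Illustrative tool-selection quality check: accuracy of the
# orchestrator's routing decisions against labelled expectations.
# (Hypothetical helper and tool names.)

def tool_selection_quality(decisions: list[tuple[str, str]]) -> float:
    """Accuracy over (chosen_tool, expected_tool) pairs, 0 to 1."""
    if not decisions:
        return 0.0
    correct = sum(1 for chosen, expected in decisions if chosen == expected)
    return correct / len(decisions)

# (chosen_tool, expected_tool) per user query, e.g. from traced agent runs.
decisions = [
    ("FindDiscount", "FindDiscount"),
    ("RebalancePortfolio", "RebalancePortfolio"),
    ("FindDiscount", "WithdrawFunds"),  # orchestrator picked the wrong tool
]
print(round(tool_selection_quality(decisions), 2))  # 0.67 (2 of 3 correct)
```

Tracking the same score over time is one simple way to surface the drift the text describes: a routing accuracy that degrades after launch is an early warning sign.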

Additional watsonx.governance agentic improvements

Throughout the year, IBM will continue to focus on agentic AI, introducing risk management and regulatory compliance for agentic systems. IBM Watsonx.governance‘s guardrails and red teaming features will be expanded to include improved guardrails for agentic systems, multi-turn conversation guardrails, and agentic red teaming.

If your company is to investigate and scale agentic AI efficiently and responsibly, you need an end-to-end AI governance solution like Watsonx.governance.

Drakshi
Since June 2023, Drakshi has been writing articles on Artificial Intelligence for govindhtech. She holds a postgraduate degree in business administration and is an enthusiast of Artificial Intelligence.