Generative AI has moved past the hype and into real-world use. Companies are keen to build enterprise-ready, mature AI solutions on large language models (LLMs), but they struggle to scale, control, and protect these implementations, particularly at the API layer. If you work on a platform team, you may already be building a unified gen AI platform.
For more than a decade, Apigee, Google Cloud’s API management platform, has helped customers solve API problems like these. This post summarises an AI-driven digital value chain powered by Apigee API Management.
Gen AI, driven by AI agents and LLMs, is revolutionising how consumers interact with businesses and opening up significant opportunities for companies of every size. With capabilities such as authentication, traffic management, analytics, and policy enforcement, Apigee strengthens the security, scalability, and governance of your applications while streamlining the integration of new AI agents into them. It also manages communication with LLMs, improving efficiency and security. In addition, Application Integration, Google Cloud’s Integration-Platform-as-a-Service offering, provides pre-built connectors that make it simple for gen AI agents to reach databases and other systems so they can meet customer demands.
This blog describes how Apigee customers have been using the platform to solve LLM-specific API problems. Google Cloud also provides a full suite of reference solutions so you can start using Apigee to tackle these issues yourself.
Apigee as a proxy for agents

AI agents use LLM capabilities to perform tasks on behalf of end users. These agents can be built with a range of technologies, from full-code frameworks such as LangChain or LlamaIndex to low-code and no-code platforms. Apigee sits between your AI application and its agents as a mediator: it handles user authentication and authorisation, improves security by letting you protect your LLM APIs from the OWASP Top 10 API Security threats, and boosts efficiency with features like semantic caching. For sophisticated use cases it can even orchestrate intricate interactions between several AI agents, and it enforces token limits to keep costs under control.
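To make this concrete, here is a minimal sketch of an agent calling its LLM through an Apigee-managed proxy. The proxy host, path, and header names are assumptions for illustration, not the contract of any specific Apigee deployment.

```python
# Hypothetical sketch: an agent calls its model through an Apigee-managed proxy,
# which handles authentication, caching, and token limits before the LLM is reached.
import requests

APIGEE_PROXY = "https://api.example.com/v1/llm/chat"  # hypothetical Apigee proxy endpoint

def ask_llm(prompt: str, access_token: str, api_key: str) -> str:
    """Send a prompt through the gateway rather than directly to the model provider."""
    response = requests.post(
        APIGEE_PROXY,
        headers={
            "Authorization": f"Bearer {access_token}",  # OAuth 2.0 token validated by the gateway
            "x-apikey": api_key,                        # app identity used for quotas and analytics
        },
        json={"prompt": prompt},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["text"]

if __name__ == "__main__":
    print(ask_llm("Summarise today's open support tickets.", "ACCESS_TOKEN", "API_KEY"))
```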
Apigee as a gateway between LLM application and models

Depending on the job at hand, your AI agents may need the capabilities of several LLMs. Apigee makes this easier with configurable settings and templates that automatically route requests, and fail them over, to the most appropriate LLM. It also gives your LLMs strong access control and speeds up the onboarding of new AI agents and apps. To fully satisfy users’ requests, agents frequently need to reach databases and other systems in addition to LLMs; Apigee’s API management platform makes these interactions possible through managed APIs. For more intricate connections that require specific business logic, you can use Google Cloud’s Application Integration platform.
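The sketch below illustrates this pattern under assumed endpoints: an agent fetches grounding data from a backend exposed as a managed Apigee API, then passes it to an LLM behind the same gateway. Both URLs and the response shapes are hypothetical.

```python
# Hypothetical sketch: one gateway fronts both a backend system and the LLM,
# so the agent uses managed APIs for data and for inference.
import requests

GATEWAY = "https://api.example.com"  # hypothetical Apigee host fronting LLMs and backends

def lookup_order(order_id: str, token: str) -> dict:
    # Backend system (e.g. an orders database) exposed as a managed API.
    r = requests.get(f"{GATEWAY}/v1/orders/{order_id}",
                     headers={"Authorization": f"Bearer {token}"}, timeout=10)
    r.raise_for_status()
    return r.json()

def answer_with_context(question: str, order_id: str, token: str) -> str:
    order = lookup_order(order_id, token)
    r = requests.post(f"{GATEWAY}/v1/llm/chat",
                      headers={"Authorization": f"Bearer {token}"},
                      json={"prompt": f"{question}\n\nOrder data: {order}"}, timeout=30)
    r.raise_for_status()
    return r.json()["text"]
```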
It’s important to remember that these patterns aren’t universally applicable. Your particular use cases will shape the design pattern for agent and LLM interactions. For instance, you may not always need to forward requests to several LLMs, and in some situations you might connect directly to databases and other systems from the Apigee agent proxy layer. Flexibility is essential: with Apigee, you can adapt the architecture to suit your requirements precisely.
Let’s now look at each of the specific areas where Apigee can help:
AI security
For any API that Apigee manages, you can use Model Armor, Google Cloud’s model safety service. It lets you inspect each prompt and response to guard against prompt attacks and help your LLMs respond within the parameters you define. For instance, you can specify that your LLM application does not address political or financial topics.
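Here is a hedged sketch of how a prompt-screening step might sit in front of model calls. The screening endpoint and payload are placeholders, not the real Model Armor API surface; in Apigee this check would typically run as a policy on the proxy rather than in application code.

```python
# Hypothetical sketch: screen each prompt before it reaches the model.
import requests

SCREEN_URL = "https://screening.example.com/v1/check"  # placeholder safety-check endpoint
LLM_URL = "https://api.example.com/v1/llm/chat"        # hypothetical Apigee LLM proxy

def safe_ask(prompt: str, token: str) -> str:
    screen = requests.post(SCREEN_URL, json={"text": prompt}, timeout=10)
    screen.raise_for_status()
    if screen.json().get("blocked"):        # e.g. prompt injection or a banned topic
        return "Sorry, I can't help with that topic."
    r = requests.post(LLM_URL, headers={"Authorization": f"Bearer {token}"},
                      json={"prompt": prompt}, timeout=30)
    r.raise_for_status()
    return r.json()["text"]
```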
Cost and latency
When developing LLM-powered applications, model response latency remains a significant consideration, and it will only grow as more reasoning happens at inference time. Implementing a semantic cache with Apigee lets you store responses from any model for queries that are semantically similar, significantly cutting the time end users wait for an answer.
This solution uses the Vertex AI Embeddings API and Vertex AI Vector Search to evaluate incoming prompts and find similar ones, so Apigee’s cache can serve the response.
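The sketch below shows the core idea: embed each prompt, look for a previously answered prompt whose embedding is close enough, and reuse its response. The `embed()` function is a stand-in for the Vertex AI Embeddings API, and the in-memory list stands in for Vertex AI Vector Search; in the real solution Apigee’s cache holds the responses.

```python
# Minimal semantic-cache sketch with placeholder embeddings and in-memory storage.
import math

def embed(text: str) -> list[float]:
    # Placeholder embedding; in practice this would call the Vertex AI Embeddings API.
    return [text.lower().count(ch) for ch in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

cache: list[tuple[list[float], str]] = []   # (prompt embedding, cached response)

def lookup(prompt: str, threshold: float = 0.95) -> str | None:
    vector = embed(prompt)
    best = max(cache, key=lambda entry: cosine(vector, entry[0]), default=None)
    if best and cosine(vector, best[0]) >= threshold:
        return best[1]                       # cache hit: skip the model call entirely
    return None

def store(prompt: str, response: str) -> None:
    cache.append((embed(prompt), response))
```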
Performance
Models vary in their areas of expertise. Gemini Flash models, for example, excel at speed and efficiency, while Gemini Pro models offer the highest-quality responses. Depending on the application or use case, you can direct users’ prompts to the most appropriate model.
By indicating the desired model in your API call, you let Apigee route the request to that model while maintaining a consistent API contract.
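A small sketch of that consistent contract: the caller names the model in the request body, but the endpoint, headers, and response shape stay the same, so switching models is a one-field change. The endpoint and model identifiers here are illustrative assumptions.

```python
# Hypothetical sketch: one Apigee-managed endpoint, with routing driven by a "model" field.
import requests

LLM_URL = "https://api.example.com/v1/llm/chat"   # single gateway endpoint for all models

def ask(prompt: str, model: str, token: str) -> str:
    r = requests.post(LLM_URL,
                      headers={"Authorization": f"Bearer {token}"},
                      json={"model": model, "prompt": prompt},   # gateway routes on this field
                      timeout=30)
    r.raise_for_status()
    return r.json()["text"]

# Fast, efficient model for routine prompts; higher-quality model for complex ones.
quick = ask("Classify this ticket's priority.", "gemini-flash", "ACCESS_TOKEN")
deep = ask("Draft a detailed incident postmortem.", "gemini-pro", "ACCESS_TOKEN")
```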
Distribution and usage limits
With Apigee you can set up a single portal with self-service access to all of the models in your company. You can also impose usage limits on particular apps and developers to preserve capacity for those who need it and keep total costs under control.
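The following is a purely conceptual sketch of the kind of per-app token quota that Apigee’s quota policies enforce at the gateway; the budgets and the in-memory counter are illustrative only.

```python
# Conceptual sketch of per-app token budgets; in practice Apigee policies enforce this.
from collections import defaultdict

MONTHLY_TOKEN_BUDGET = {"support-bot": 2_000_000, "internal-search": 500_000}  # assumed limits
used_tokens: dict[str, int] = defaultdict(int)

def admit(app_id: str, tokens_requested: int) -> bool:
    """Return True if the app still has budget; otherwise the request would be throttled."""
    budget = MONTHLY_TOKEN_BUDGET.get(app_id, 0)
    if used_tokens[app_id] + tokens_requested > budget:
        return False                      # over quota: reject or queue the request
    used_tokens[app_id] += tokens_requested
    return True
```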
Availability
Because LLM inference is computationally demanding, model providers frequently limit the number of tokens you can use in a given time window. If you hit a model limit, requests from your apps will be throttled and your end users may be locked out of the model. You can prevent this with a circuit breaker in Apigee, which reroutes requests to a model with available capacity. To begin, view this sample solution.
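Here is a sketch of that failover behaviour, written client-side for clarity; with Apigee the retry and reroute logic lives in the proxy, so callers keep a single endpoint. The URL, model names, and status-code handling are assumptions.

```python
# Hypothetical failover sketch: try models in order of preference, skipping any at capacity.
import requests

LLM_URL = "https://api.example.com/v1/llm/chat"
MODELS = ["gemini-pro", "gemini-flash", "third-party-model"]   # ordered by preference

def ask_with_failover(prompt: str, token: str) -> str:
    for model in MODELS:
        r = requests.post(LLM_URL,
                          headers={"Authorization": f"Bearer {token}"},
                          json={"model": model, "prompt": prompt}, timeout=30)
        if r.status_code == 429:          # provider token limit hit: try the next model
            continue
        r.raise_for_status()
        return r.json()["text"]
    raise RuntimeError("All configured models are at capacity.")
```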
Reporting
As the platform team, you need visibility into how the different models you provide are being used and how many tokens each app consumes. Whether for optimisation or internal cost reporting, Apigee lets you build dashboards that show usage based on actual token counts, the currency of LLM APIs, so you can examine real consumption across all of your apps.
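Below is a small sketch of the aggregation behind such a dashboard: given per-request records of the kind Apigee analytics can capture from LLM responses, total tokens per app and model. The record format is an assumption for illustration.

```python
# Illustrative aggregation of token usage per (app, model) pair from assumed request records.
from collections import defaultdict

records = [
    {"app": "support-bot", "model": "gemini-flash", "prompt_tokens": 220, "output_tokens": 480},
    {"app": "support-bot", "model": "gemini-pro", "prompt_tokens": 900, "output_tokens": 1300},
    {"app": "internal-search", "model": "gemini-flash", "prompt_tokens": 150, "output_tokens": 90},
]

usage: dict[tuple[str, str], int] = defaultdict(int)
for rec in records:
    usage[(rec["app"], rec["model"])] += rec["prompt_tokens"] + rec["output_tokens"]

for (app, model), tokens in sorted(usage.items()):
    print(f"{app:>16} {model:>14} {tokens:>8} tokens")
```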
Auditing and troubleshooting
To satisfy regulatory or troubleshooting requirements, you might need to record every contact you have with LLMs, including prompts, replies, and RAG data. Or maybe you want to keep improving your LLM applications by analysing the quality of the responses. Any LLM interaction may be securely logged using Apigee’s Cloud Logging, de-identified, and inspected from a recognisable interface. Start by clicking this link.
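As a hedged sketch of the logging step, the example below redacts obvious identifiers before writing an audit record. The regex redaction is a stand-in for a proper de-identification service, and the standard `logging` module stands in for Cloud Logging.

```python
# Illustrative audit logging with basic, placeholder de-identification.
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-audit")

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text: str) -> str:
    return EMAIL.sub("[REDACTED_EMAIL]", text)

def audit(prompt: str, response: str, app_id: str) -> None:
    log.info("app=%s prompt=%r response=%r", app_id, redact(prompt), redact(response))

audit("Why was jane.doe@example.com's order delayed?",
      "The order shipped late because...", "support-bot")
```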
Security
Security is crucial for every API-driven application, as APIs are increasingly seen as an attack surface. Apigee can serve as a secure gateway for LLM APIs, letting you manage access via JWT validation, OAuth 2.0, and API keys. This helps enforce your organisation’s security standards for user and application authentication when interacting with your models. By imposing rate limits and quotas, Apigee also helps prevent misuse and overload, protecting LLMs from malicious attacks and unexpected traffic spikes.
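For illustration, here is a sketch of the kinds of checks the gateway applies at the edge: validating a JWT and enforcing a simple per-client rate limit before a request reaches any LLM. The secret, audience, and limits are assumptions; in Apigee these are policies, not application code. The example uses the PyJWT package.

```python
# Conceptual edge checks: JWT validation plus a sliding-window rate limit per client.
import time
from collections import defaultdict, deque

import jwt  # pip install PyJWT

SECRET = "shared-secret"            # placeholder; real deployments use proper key management
RATE_LIMIT = 10                     # assumed max requests per client per minute
recent: dict[str, deque] = defaultdict(deque)

def authorise(token: str) -> str:
    claims = jwt.decode(token, SECRET, algorithms=["HS256"], audience="llm-gateway")
    return claims["sub"]            # caller identity used for quotas and analytics

def within_rate_limit(client_id: str, now: float | None = None) -> bool:
    now = now or time.time()
    window = recent[client_id]
    while window and now - window[0] > 60:
        window.popleft()            # drop requests older than the one-minute window
    if len(window) >= RATE_LIMIT:
        return False                # throttle: protects models from spikes and abuse
    window.append(now)
    return True
```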
Beyond these security measures, Apigee also lets you manage which models and model providers are available for use. You can do this by establishing policies that specify which users or apps can access particular models; for example, you might restrict access to your most capable LLMs to select people or apps for specific purposes. This lets you fine-tune the use of your LLMs and ensure they’re used as intended.
For more advanced protection, Apigee’s Advanced API Security lets you defend your LLM APIs against the OWASP Top 10 API Security risks.
With Apigee integrated into your LLM architecture, your AI applications can flourish in a secure and dependable environment.