Announcing the latest Azure AI updates and the availability of Azure OpenAI Data Zones
More than 60,000 customers, including AT&T, H&R Block, Volvo, Grammarly, Harvey, Leya, and others, use Microsoft Azure AI to drive their AI transformation. We are excited by the growing adoption of AI across industries, and by businesses small and large. This blog rounds up the latest additions to the Azure AI portfolio that give developers more choice and flexibility to build and scale AI applications. Key updates include:
- Azure OpenAI Data Zones for the United States and European Union, a new deployment option that expands where and how you can deploy.
- General availability of the Azure OpenAI Service Batch API, Prompt Caching, a 99% latency SLA for token generation, a 50% price reduction for Provisioned Global, and lower deployment minimums for GPT-4o Provisioned Global deployments, to help you scale efficiently and reduce costs.
- More choice and flexibility with new healthcare industry models, Mistral's small model Ministral 3B, Cohere Embed 3, and general availability of fine-tuning for the Phi-3.5 family.
- A streamlined path from GitHub Models to the Azure AI model inference API, plus new AI App Templates, to accelerate AI development.
- New enterprise-ready capabilities for building AI applications safely and securely.
Azure OpenAI Data Zones for the United States and European Union
Microsoft is introducing Azure OpenAI Data Zones, a new deployment option that gives enterprises more flexibility and control over their data privacy and residency requirements. Tailored for organizations in the United States and European Union, Data Zones allow customers to process and store their data within a defined geographic boundary, ensuring compliance with regional data residency requirements while maintaining optimal performance. By spanning multiple regions within each of these areas, Data Zones strike a balance between the cost efficiency of global deployments and the control of regional deployments, making it easier for enterprises to manage AI applications without trading off performance or security.
This new option simplifies the often complex task of managing data residency while offering higher throughput and faster access to the latest AI models, including the newest innovations from Azure OpenAI Service. Enterprises can now scale their AI solutions on Azure's robust infrastructure while meeting stringent data residency requirements. Data Zones will be available soon for both Standard (pay-as-you-go) and Provisioned offerings.
Azure OpenAI Service updates
Earlier this month, Microsoft announced the general availability of the Azure OpenAI Batch API for Global deployments. With a separate quota, a 24-hour turnaround time, and 50% lower cost than Standard Global, the Batch API lets developers handle large-scale, high-volume processing tasks more efficiently. Ontada, a McKesson company, is already using the Batch API to process large volumes of patient data across oncology centers in the United States cost-effectively and at scale.
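Batch jobs are submitted as a JSONL file in which each line is one independent request. A minimal sketch of preparing such a file is below; the deployment name, file name, and prompts are placeholder assumptions, not values from this post, so substitute your own before uploading.

```python
import json

# Each line of the batch input file is one self-contained request.
# "gpt-4o-batch" and the prompts are illustrative placeholders.
requests = [
    {
        "custom_id": f"task-{i}",       # your identifier for matching results
        "method": "POST",
        "url": "/chat/completions",
        "body": {
            "model": "gpt-4o-batch",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Summarize record A.", "Summarize record B."])
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

The resulting file is then uploaded to the service and submitted as a batch job; results come back within the 24-hour window noted above, with each output line carrying the matching `custom_id`.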
Prompt Caching is also now enabled on Azure OpenAI Service for the o1-preview, o1-mini, GPT-4o, and GPT-4o-mini models. By reusing recently seen input tokens, Prompt Caching lets developers reduce both cost and latency. The feature is especially useful for applications that repeatedly send the same context, such as code-editing assistants or long chatbot conversations. Prompt Caching offers faster processing times and a 50% discount on cached input tokens on the Standard offering.
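To see how the 50% discount on cached input tokens plays out, here is a back-of-the-envelope sketch. The per-token price below is an arbitrary placeholder, not an actual Azure OpenAI price; only the 50% discount comes from the announcement.

```python
def input_token_cost(total_tokens: int, cached_tokens: int,
                     price_per_token: float) -> float:
    """Estimate input cost when cached tokens are billed at a 50% discount,
    as described for the Standard offering. price_per_token is a placeholder."""
    uncached = total_tokens - cached_tokens
    return uncached * price_per_token + cached_tokens * price_per_token * 0.5

# A request carrying a 4,000-token shared prefix plus 500 new tokens:
cost_without_cache = input_token_cost(4500, 0, 1e-6)     # no prefix reuse
cost_with_cache = input_token_cost(4500, 4000, 1e-6)     # prefix served from cache
```

For a prompt dominated by a stable shared prefix, the cached case costs a little over half as much, which is why keeping the static context (system prompt, code file, conversation history) at the front of the prompt matters.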
Microsoft is also lowering the minimum deployment size for GPT-4o models on the Provisioned Global offering to 15 Provisioned Throughput Units (PTUs), with further increments of 5 PTUs, and reducing the price of Provisioned Global Hourly by 50% to broaden access to Azure OpenAI Service.
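The new sizing rule is simple enough to sanity-check in code: a valid GPT-4o Provisioned Global deployment is at least 15 PTUs and grows in steps of 5. A small helper, purely illustrative:

```python
def is_valid_gpt4o_global_ptus(ptus: int) -> bool:
    """Check a requested PTU count against the new Provisioned Global
    minimums for GPT-4o models: at least 15 PTUs, in increments of 5."""
    return ptus >= 15 and ptus % 5 == 0

def next_valid_ptus(ptus: int) -> int:
    """Round a requested count up to the nearest valid deployment size."""
    if ptus <= 15:
        return 15
    return ptus if ptus % 5 == 0 else ptus + (5 - ptus % 5)
```

So a team that previously had to commit to a larger minimum can now start at 15 PTUs and grow in 5-PTU steps as traffic warrants.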
Microsoft is also launching a 99% latency service level agreement (SLA) for token generation. This latency SLA ensures that tokens are generated faster and more consistently, especially at high volumes.
New models and customization
Azure AI continues to expand its model catalog. This month brings several new additions, including models from Mistral and Cohere as well as models for the healthcare industry, along with the general availability of fine-tuning for the Phi-3.5 family.
- Healthcare industry models: These include advanced multimodal medical imaging models such as MedImageInsight for image analysis, MedImageParse for image segmentation across imaging modalities, and CXRReportGen for generating detailed structured reports. Developed in collaboration with Microsoft Research and industry partners, these models are designed to be fine-tuned and adapted by healthcare organizations to meet specific needs, reducing the data and compute typically required to build such models from scratch.
- Ministral 3B from Mistral AI: A significant advance in the sub-10B category, Ministral 3B focuses on knowledge, commonsense reasoning, function-calling, and efficiency. With support for up to 128k context length, these models are built for a wide range of applications, from orchestrating agentic workflows to building specialized task workers. Paired with larger language models such as Mistral Large, Ministral 3B serves as an efficient intermediary for function-calling in multi-step agentic workflows.
- Cohere Embed 3: Embed 3, Cohere's industry-leading multimodal AI search model, is now available in the Azure AI Model Catalog. Its ability to generate embeddings from both text and images unlocks significant value for enterprises by letting them search and analyze large volumes of data in any format. This update positions Embed 3 as a highly capable multimodal embedding model, transforming how companies search through complex assets such as reports, product catalogs, and design files.
- General availability of fine-tuning for the Phi-3.5 family, including Phi-3.5-mini and Phi-3.5-MoE: Phi family models can be customized to improve base model performance across a variety of scenarios, from learning a new skill or task to improving the consistency and quality of responses. Thanks to their small compute footprint and compatibility with cloud and edge, Phi-3.5 models offer a cost-effective and sustainable alternative to models of the same size or the next size up, and they are already being adopted for edge reasoning and offline scenarios. Developers can now fine-tune Phi-3.5-mini and Phi-3.5-MoE using serverless endpoints through the model-as-a-platform offering.
AI app development
Azure AI is built as an open, modular platform so developers can move quickly from idea to code to cloud. Developers can now explore and access Azure AI models directly through the GitHub Marketplace via the Azure AI model inference API. They can try different models and compare model performance in the playground for free (usage limits apply), and when ready to customize and deploy, they can easily sign in with an Azure account to scale from free token usage to paid endpoints with enterprise-level security and monitoring, without changing anything else in their code.
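The key point above is that moving from the free GitHub Models tier to a paid Azure endpoint is a configuration change, not a code change. A minimal sketch of that switch is below; the environment variable names and the Azure endpoint value are illustrative assumptions, not documented values.

```python
import os

def inference_endpoint() -> tuple[str, str]:
    """Pick the endpoint and credential for model inference calls.
    AZURE_AI_ENDPOINT / AZURE_AI_KEY / GITHUB_TOKEN are hypothetical
    variable names chosen for this sketch. The request-sending code that
    consumes this (endpoint, key) pair stays the same either way."""
    azure_endpoint = os.environ.get("AZURE_AI_ENDPOINT")
    if azure_endpoint:
        # Paid Azure deployment with enterprise security and monitoring.
        return azure_endpoint, os.environ["AZURE_AI_KEY"]
    # Free GitHub Models tier (usage limits apply); placeholder endpoint.
    return "https://models.inference.example.com", os.environ["GITHUB_TOKEN"]
```

Because only the endpoint and credential differ, the same application code can be developed for free and then pointed at a production Azure deployment when it is time to scale.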
To accelerate AI app development, Microsoft has introduced AI App Templates, available to developers directly in Visual Studio, VS Code, and GitHub Codespaces. The templates support a variety of models, frameworks, languages, and solutions from providers including Arize, LangChain, LlamaIndex, and Pinecone. Developers can deploy full applications or start with components, provisioning resources across Azure and partner services.
With these updates, developers can start immediately in their preferred environment, choose the deployment option that best fits their needs, and scale AI solutions with confidence.
New tools to build secure, enterprise-ready AI applications
At Microsoft, our focus is on helping customers use and build AI that is trustworthy: secure, safe, and private. Today, Microsoft is introducing two new capabilities to help you build and scale AI solutions with confidence.
The Azure AI model catalog offers more than 1,700 models for developers to explore, evaluate, customize, and deploy. While this breadth of choice fuels innovation and flexibility, it can pose real challenges for enterprises that need to ensure every deployed model aligns with their internal policies, security standards, and compliance requirements. Azure AI administrators can now use Azure policies to pre-approve select models for deployment from the Azure AI model catalog, simplifying model selection and governance.
This includes prebuilt policies for Models-as-a-Service (MaaS) and Models-as-a-Platform (MaaP) deployments, while a detailed guide makes it easy to create custom policies for Azure OpenAI Service and other AI services. Together, these policies provide complete coverage for creating an allowed-model list and enforcing it across Azure AI Studio and Azure Machine Learning.
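Conceptually, the allowed-model list works as a gate that every deployment request must pass. In production this is enforced by Azure Policy assignments, not application code; the sketch below only illustrates the idea, and the model names in the approved set are hypothetical examples.

```python
# Hypothetical approved-model list for illustration; real enforcement
# happens through Azure Policy assignments, not application code.
APPROVED_MODELS = {"gpt-4o", "gpt-4o-mini", "Phi-3.5-mini-instruct"}

def check_deployment(model_name: str) -> None:
    """Reject deployment requests for models not on the approved list."""
    if model_name not in APPROVED_MODELS:
        raise PermissionError(
            f"Model '{model_name}' is not pre-approved for deployment"
        )
```

An administrator maintains the list centrally; any attempt to deploy an unapproved model from the catalog is denied before resources are created.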
To build their models and applications, developers may need access to on-premises resources, or to resources that are not supported by private endpoints but still reside within their custom Azure virtual network (VNET). Application Gateway is a load balancer that makes routing decisions based on the URL of an HTTPS request, and it supports a private connection from the managed VNET to any such resource over the HTTP or HTTPS protocol.
Today, it is verified to support private connections to Snowflake Database, JFrog Artifactory, and private APIs. With Application Gateway in Azure Machine Learning and Azure AI Studio, now available in public preview, developers can access on-premises or custom-VNET resources for their training, fine-tuning, and inferencing scenarios without compromising their security posture.