To help organizations build highly tailored, domain-adapted AI systems, Microsoft Azure is introducing new fine-tuning models and techniques in Azure AI Foundry. These additions include Supervised Fine-Tuning (SFT) for GPT-4.1-nano and the Llama 4 Scout model (available now), as well as Reinforcement Fine-Tuning (RFT) with o4-mini (coming soon).
Reinforcement Fine-Tuning (RFT) with o4-mini
What it is: RFT is positioned as a significant advance over Azure AI Foundry’s existing model fine-tuning, adding a new degree of control for matching intricate business logic with model behavior. It works by applying reinforcement-learning principles during training through a feedback loop. Developers supply a task-specific grader, a function that evaluates and scores model outputs against specified criteria, and the model is trained to optimize against this reward signal, gradually learning to produce answers that align with the intended behavior. In contrast to typical supervised fine-tuning, which teaches a model to replicate sample outputs, RFT with o4-mini trains a model to work through problems.
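To make the grader concept concrete, below is a minimal sketch of what a task-specific grading function could look like, using a hypothetical tax-classification task. The field names, scoring rules, and data shapes are illustrative assumptions rather than a documented schema; a real grader would encode your own domain criteria and return a score between 0 and 1.

```python
def grade(sample: dict, item: dict) -> float:
    """Hypothetical grader for a tax-classification task.

    `sample` holds the model output under evaluation and `item` holds the
    reference fields from the prompt dataset; both shapes are assumed here
    purely for illustration.
    """
    answer = sample.get("output_text", "").strip().lower()
    expected = item.get("reference_category", "").strip().lower()
    citation = item.get("required_citation", "").strip().lower()

    score = 0.0
    if expected and expected in answer:
        score += 0.7  # names the correct category
    if citation and citation in answer:
        score += 0.2  # cites the required rule or clause
    if len(answer.split()) <= 120:
        score += 0.1  # stays concise

    return min(score, 1.0)  # graders return a reward between 0 and 1


# Score one candidate answer against one dataset item.
print(grade(
    {"output_text": "Category: R&D tax credit, per section 41."},
    {"reference_category": "R&D tax credit", "required_citation": "section 41"},
))
```

During training, the reward produced by a function like this is what the model learns to maximize, so the scoring rules should reflect exactly the behavior you want reinforced.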
Purpose: RFT aims to improve model decision-making in dynamic or high-stakes contexts by teaching models not just what to produce but why that output is desirable in a given domain, bringing them closer to optimal behavior for real-world applications.
The Model: This capability is coming soon to the o4-mini model, which is highlighted as the first compact reasoning model available for fine-tuning. It belongs to OpenAI’s latest generation of multitask-capable models and excels at structured reasoning and prompts that require a chain of thought.
Advantages: RFT with o4-mini is expected to open up new options for use cases that require contextual awareness, adaptive reasoning, and domain-specific logic, while preserving fast inference performance.
It gives developers a base that is both lightweight and powerful, allowing precise tuning for high-stakes, domain-specific reasoning tasks while maintaining the computational efficiency and speed needed for real-time applications. When presented with new prompts, RFT-tuned models may show improved error correction and greater data efficiency, requiring fewer examples to reach parity with supervised approaches.
When to Use RFT: RFT works best in situations where domain-specific behavior, flexibility, and iterative learning are crucial. Consider it if your scenario involves Domain-Specific Operational Standards, where internal procedures deviate from industry norms; Custom Rule Implementation, where decision logic is highly specific and cannot easily be captured through static prompts; or High Decision-Making Complexity, where results depend on navigating numerous subcases or dynamically weighing multiple inputs.
Examples: The legal software firm DraftWise used RFT to improve reasoning models for contract drafting and review, increasing search result quality by 30%. Contoso Wellness is a fictional example of using RFT to adapt a model to specific business rules for client engagement, such as identifying the best client interactions based on subtle patterns.
OpenAI cites several early adopters: Accordance AI (39% more accurate tax analysis), Ambience Healthcare (improved medical coding accuracy), Harvey (better citation extraction from legal documents), Runloop (generation of valid Stripe API snippets), Milo (higher output quality on complex calendar prompts), and SafetyKit (more accurate content moderation). ChipStack and Thomson Reuters are also cited as partners that demonstrated performance improvements.
How to Use RFT: The process involves designing a grading function (a Python function that assigns a score between 0 and 1), creating a high-quality prompt dataset, starting a training job via the API or dashboard, and then evaluating and iterating, as sketched below.
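Building on the hypothetical grader sketched earlier (assumed to be defined in the same module), the snippet below illustrates the dataset and iteration steps: write a small JSONL prompt dataset, then dry-run the grader against candidate outputs before launching the training job from the API or dashboard. The file layout and field names are assumptions for illustration.

```python
import json

# A prompt dataset, one JSON object per line. Each item carries the prompt
# plus the reference fields the grader reads (hypothetical schema).
items = [
    {
        "messages": [{"role": "user", "content": "Classify this expense for tax purposes: prototype lab materials."}],
        "reference_category": "R&D tax credit",
        "required_citation": "section 41",
    },
    {
        "messages": [{"role": "user", "content": "Classify this expense for tax purposes: client dinner."}],
        "reference_category": "meals and entertainment",
        "required_citation": "section 274",
    },
]

with open("rft_prompts.jsonl", "w") as f:
    for item in items:
        f.write(json.dumps(item) + "\n")

# Dry-run the grade() function sketched above on candidate outputs to
# sanity-check the reward signal before starting a training job.
candidates = [
    "Category: R&D tax credit, per section 41.",
    "Probably travel?",
]
for item, text in zip(items, candidates):
    print(round(grade({"output_text": text}, item), 2))
```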
Pricing and Availability: RFT with o4-mini is coming soon to Azure AI Foundry, with regional availability anticipated in East US 2 and Sweden Central. Verified organizations can already access RFT with o4-mini through the OpenAI API. Training is billed by active training time, at $100 per hour of core training time. Organizations that agree to share their datasets for research are eligible for a 50% discount on training costs.
Supervised Fine-Tuning (SFT) for GPT-4.1-nano
What it is: SFT brings this traditional fine-tuning method to the GPT-4.1-nano model. It lets you customize the model for your particular domain by teaching it company-specific language, procedures, and structured outputs. Developers can submit labeled datasets to train the nano model for specific use cases, as in the sketch below.
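As a rough sketch of what submitting a labeled dataset can look like, the snippet below writes chat-formatted training examples to a JSONL file and starts a supervised fine-tuning job through the OpenAI Python SDK. The example content is invented, and the exact model identifier is an assumption; in Azure AI Foundry the same flow is available through the portal or the Azure OpenAI endpoints.

```python
import json
from openai import OpenAI

# One labeled example per line: a prompt plus the ideal, company-specific answer.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are Contoso's support assistant. Follow Contoso tone and policy."},
            {"role": "user", "content": "How do I reset my badge PIN?"},
            {"role": "assistant", "content": "Open the Contoso Security Portal, choose 'Badge PIN', and follow the prompts. Resets take effect within 15 minutes."},
        ]
    },
]

with open("sft_train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

client = OpenAI()

# Upload the labeled dataset, then start a supervised fine-tuning job against it.
training_file = client.files.create(file=open("sft_train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    model="gpt-4.1-nano",  # assumed identifier; use the snapshot your account exposes
    training_file=training_file.id,
)
print(job.id, job.status)
```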
The Model: SFT is available for GPT-4.1-nano, which is characterized as a compact yet capable foundation model optimized for high-throughput, cost-sensitive workloads. It is the fastest and most economical model OpenAI has released to date, performs well on benchmarks, and offers a one-million-token context window.
Advantages: Fine-tuning GPT-4.1-nano delivers Precision at Scale (tailored responses while maintaining speed and efficiency), Enterprise-Grade Output (alignment with business processes and tone of voice), and a lightweight, deployable model (ideal for scenarios where latency and cost matter). Compared with larger models, it offers faster inference and lower compute costs, giving it exceptional speed and affordability.
Use Cases: It is ideal for high-volume workloads such as customer support automation (handling thousands of tickets per hour) and internal knowledge assistants (that adhere to business style and policy). It also enables custom classification, extraction, and conversational agents tailored to a particular domain, as in the example below.
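As a small illustration of the support-ticket case, the snippet below calls a fine-tuned nano deployment for classification through the Azure OpenAI client. The endpoint, deployment name, and label set are placeholders rather than values from the announcement.

```python
import os
from openai import AzureOpenAI

# Placeholder endpoint and deployment for a fine-tuned GPT-4.1-nano deployment.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

ticket = "My invoice from March shows the wrong billing address."
response = client.chat.completions.create(
    model="contoso-nano-ft",  # name of your fine-tuned deployment (placeholder)
    messages=[
        {"role": "system", "content": "Classify the ticket as one of: billing, access, outage, other."},
        {"role": "user", "content": ticket},
    ],
)
print(response.choices[0].message.content)  # e.g. "billing"
```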
Distillation: As a compact, fast, and highly capable model, GPT-4.1-nano is an excellent candidate for distillation. Training data for 4.1-nano can be generated by larger models, such as GPT-4.1 or o4, so that the smaller model approaches their intelligence at much lower cost.
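A minimal sketch of that distillation flow, assuming access to a larger teacher model through the OpenAI SDK: each prompt is answered by the teacher, and the prompt/answer pairs are written out as an SFT dataset for 4.1-nano. The teacher model name and prompts are placeholders.

```python
import json
from openai import OpenAI

client = OpenAI()

prompts = [
    "Summarize the key risks in this contract clause: ...",
    "Draft a polite follow-up email about a late invoice.",
]

# Use a larger teacher model to generate target outputs (placeholder model name).
with open("distill_train.jsonl", "w") as f:
    for prompt in prompts:
        teacher = client.chat.completions.create(
            model="gpt-4.1",
            messages=[{"role": "user", "content": prompt}],
        )
        example = {
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": teacher.choices[0].message.content},
            ]
        }
        f.write(json.dumps(example) + "\n")

# distill_train.jsonl can then serve as the labeled dataset for a GPT-4.1-nano
# supervised fine-tuning job, as in the earlier sketch.
```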
Accessibility: Supervised Fine-Tuning with 4.1-nano is now available in the North Central US and Sweden Central regions of Azure AI Foundry. SFT for GPT-4.1-nano is also available via the OpenAI API for all paid API tiers. Support for this fine-tuning method is coming soon through connections between GitHub and Microsoft’s Azure AI Foundry.
Llama 4 Scout model
The Model: Fine-tuning support is also announced for Meta’s Llama 4 Scout. It is characterized as a state-of-the-art model with 17 billion active parameters that offers an industry-leading context window of 10M tokens. Notably, it can be used for inference on a single H100 GPU. It is regarded as a best-in-class open-source model, more capable than earlier generations of Llama models.
Accessibility: Llama 4 fine-tuning is now offered through the Azure AI Foundry managed compute offering, enabling users to fine-tune and run inference on their own GPU capacity. It is available as a component in both the Azure AI Foundry model catalog and Azure Machine Learning. In contrast to the serverless experience, availability through these components grants access to more hyperparameters for further customization.
With an emphasis on efficiency, flexibility, and trust, these fine-tuning options in Azure AI Foundry seek to open up new possibilities for model customization.