OpenAI is launching more powerful models, additional customisation tools, and upgrades that give developers building with AI better performance, flexibility, and cost-efficiency. This includes:
- OpenAI o1 in the API, with support for Structured Outputs, developer messages, function calling, and vision capabilities.
- Realtime API updates, including a simple WebRTC integration, a 60% price cut for GPT-4o audio, and support for GPT-4o mini at one-tenth of previous audio rates.
- Preference Fine-Tuning, a new model customisation technique that makes it easier to tailor models to developer and user preferences.
- New Go and Java SDKs, available in beta.
OpenAI o1 in the API
OpenAI o1, the reasoning model that can handle complex multi-step tasks with high accuracy, is rolling out to developers in usage tier 5 of the API. o1 is the successor to OpenAI o1-preview, which developers have already used to build agentic applications that streamline customer service, optimise supply-chain decisions, and forecast complex financial trends.
o1 is production-ready, with key features that unlock real-world use cases, including:
- Function calling: Seamlessly connect o1 to external data and APIs.
- Structured Outputs: Generate responses that reliably adhere to your custom JSON Schema.
- Developer messages: Give the model instructions or context, such as defining tone, style, and other behavioural guidance.
- Vision capabilities: Reason over images to unlock many more applications where visual inputs matter, such as in manufacturing, research, or coding.
- Lower latency: o1 typically uses 60% fewer reasoning tokens than o1-preview for a given request.
- A new reasoning_effort API parameter, letting you control how long the model thinks before responding (see the request sketch after this list).
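To make this concrete, here is a minimal sketch of a Chat Completions request that combines a developer message with the reasoning_effort parameter. The model name and parameters come from this post; the surrounding script, including the prompt text, is illustrative and assumes OPENAI_API_KEY is set in the environment:

// Minimal sketch (run inside an ES module or async function).
const resp = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'o1-2024-12-17',
    reasoning_effort: 'medium', // 'low' | 'medium' | 'high'
    messages: [
      { role: 'developer', content: 'Answer tersely, in bullet points.' },
      { role: 'user', content: 'Outline the steps to reconcile two ledgers.' },
    ],
  }),
});
const data = await resp.json();
console.log(data.choices[0].message.content);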
The snapshot of o1 shipping in the API, o1-2024-12-17, is a newly post-trained version of the model released in ChatGPT two weeks ago. It preserves the frontier capabilities assessed in the OpenAI o1 System Card while improving areas of model behaviour based on feedback, and ChatGPT will be updated to this version shortly. The evaluations published below reflect the performance of this new snapshot, giving developers the most recent benchmarks for this version.
o1-2024-12-17 improves performance and cost-efficiency, setting new state-of-the-art results on several benchmarks.
Category | Eval | o1-2024-12-17 | o1-preview
---|---|---|---
General | GPQA diamond | 75.7 | 73.3
| MMLU (pass@1) | 91.8 | 90.8
Coding | SWE-bench Verified | 48.9 | 41.3
| LiveBench (Coding) | 76.6 | 52.3
Math | MATH (pass@1) | 96.4 | 85.5
| AIME 2024 (pass@1) | 79.2 | 42.0
| MGSM (pass@1) | 89.3 | 90.8
Vision | MMMU (pass@1) | 77.3 | —
| MathVista (pass@1) | 71.0 | —
Factuality | SimpleQA | 42.6 | 42.4
Agents | TAU-bench (retail) | 73.5 | —
| TAU-bench (airline) | 54.2 | —
Additionally, in OpenAI's function calling and Structured Outputs testing, o1-2024-12-17 performs noticeably better than gpt-4o.
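As a sketch of what function calling with o1 looks like in practice, the request below defines a single tool and lets the model decide whether to call it. Only the overall request shape follows the API; the get_delivery_date tool is invented for illustration:

// Sketch: function calling with o1. get_delivery_date is a hypothetical tool.
const resp = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'o1-2024-12-17',
    messages: [{ role: 'user', content: 'When will order 1337 arrive?' }],
    tools: [{
      type: 'function',
      function: {
        name: 'get_delivery_date',
        description: 'Look up the delivery date for an order',
        parameters: {
          type: 'object',
          properties: { order_id: { type: 'string' } },
          required: ['order_id'],
        },
      },
    }],
  }),
});
const data = await resp.json();
// If the model chose to call the tool, its arguments arrive as a JSON string.
const call = data.choices[0].message.tool_calls?.[0];
if (call) console.log(call.function.name, call.function.arguments);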
OpenAI is rolling out access gradually while scaling up rate limits and working to extend access to additional usage tiers. See the API documentation to get started.
Improvements to the Realtime API
The Realtime API lets developers build low-latency, natural conversational experiences. It is ideal for voice assistants, live translation tools, virtual tutors, interactive customer-service systems, and even your own virtual Santa. Today OpenAI is shipping updates that address some of developers' most frequent requests: a direct WebRTC integration, lower prices, and more control over responses.
WebRTC support
OpenAI is launching support for the Realtime API via WebRTC. WebRTC is an open standard that makes it easier to build and scale real-time audio products across platforms, whether for browser-based apps, mobile clients, Internet of Things devices, or direct server-to-server configurations.
The WebRTC integration is designed to enable smooth, responsive interactions in real-world conditions, even with fluctuating network quality. It handles audio encoding, streaming, noise suppression, and congestion control.
With just a handful of lines of JavaScript, you can now add real-time capabilities via WebRTC:
async function createRealtimeSession(localStream, remoteAudioEl, token) {
  const pc = new RTCPeerConnection();
  // Play the model's audio track as it arrives.
  pc.ontrack = e => remoteAudioEl.srcObject = e.streams[0];
  // Send the user's microphone audio.
  pc.addTrack(localStream.getTracks()[0]);
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  // Exchange SDP with the Realtime API to establish the connection.
  const headers = { Authorization: `Bearer ${token}`, 'Content-Type': 'application/sdp' };
  const opts = { method: 'POST', body: offer.sdp, headers };
  const resp = await fetch('https://api.openai.com/v1/realtime', opts);
  await pc.setRemoteDescription({ type: 'answer', sdp: await resp.text() });
  return pc;
}
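For instance, a browser client might wire the helper up as follows; the /session endpoint here is a hypothetical stand-in for your own server route that returns a short-lived token:

// Hypothetical usage: capture microphone audio and start a session.
const localStream = await navigator.mediaDevices.getUserMedia({ audio: true });
const remoteAudioEl = document.querySelector('audio');
const { token } = await (await fetch('/session')).json(); // your own token endpoint
const pc = await createRealtimeSession(localStream, remoteAudioEl, token);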
New GPT-4o and GPT-4o mini realtime snapshots at lower cost
As part of the Realtime API beta, OpenAI is releasing gpt-4o-realtime-preview-2024-12-17, which offers improved voice quality, more reliable input (particularly for dictated numbers), and reduced costs. Thanks to efficiency gains, the price of audio tokens is dropping by 60%, to $40/1M input tokens and $80/1M output tokens. The cost of cached audio input falls by 87.5%, to $2.50/1M input tokens.
OpenAI is also releasing GPT-4o mini in the Realtime API beta as gpt-4o-mini-realtime-preview-2024-12-17. GPT-4o mini is OpenAI's most cost-efficient small model, and it brings the same rich voice experiences to the Realtime API as GPT-4o. GPT-4o mini audio is priced at $10/1M input tokens and $20/1M output tokens. Text tokens cost $0.60/1M input tokens and $2.40/1M output tokens, and cached audio and text both cost $0.30/1M tokens.
These snapshots are available in the Realtime API and, as gpt-4o-audio-preview-2024-12-17 and gpt-4o-mini-audio-preview-2024-12-17, in the Chat Completions API.
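As a rough sketch, requesting spoken output from one of these snapshots through the Chat Completions API looks like this; the voice and format values are illustrative:

// Sketch: requesting audio output from gpt-4o-audio-preview via Chat Completions.
const resp = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-4o-audio-preview-2024-12-17',
    modalities: ['text', 'audio'],
    audio: { voice: 'alloy', format: 'wav' }, // illustrative voice/format choices
    messages: [{ role: 'user', content: 'Say hello in one short sentence.' }],
  }),
});
const data = await resp.json();
// The spoken reply arrives base64-encoded alongside a text transcript.
console.log(data.choices[0].message.audio?.transcript);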
More control over responses
To help developers deliver outstanding voice-driven experiences, OpenAI is adding the following capabilities to the Realtime API:
- Concurrent out-of-band responses, so background tasks such as content moderation or classification can run without interrupting the user's voice interaction (see the event sketch after this list).
- Custom input context, to specify which conversation items are used as model input. For example, run a moderation check on the user's most recent utterance, or reuse a previous response, without permanently altering the session state.
- Controlled response timing, to run server-side Voice Activity Detection (VAD) without automatically triggering a response. For example, you can collect required information such as account details, add it to the model's context, and then trigger a voice response manually, giving you more control over timing and accuracy.
- The maximum session duration has been extended from 15 to 30 minutes.
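As a sketch of the first two capabilities, the client event below asks for a concurrent, out-of-band response over the WebRTC data channel. The event shape is based on the Realtime API's response.create client event; the moderation instructions and metadata are illustrative:

// Sketch: request an out-of-band response on the Realtime data channel.
const dc = pc.createDataChannel('oai-events');
dc.onopen = () => {
  dc.send(JSON.stringify({
    type: 'response.create',
    response: {
      conversation: 'none',   // out-of-band: leave the session state untouched
      modalities: ['text'],
      instructions: 'Classify the last user turn as safe or unsafe.', // illustrative
      metadata: { topic: 'moderation' }, // tag the response so it can be matched later
    },
  }));
};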
Preference Fine-Tuning
Preference Fine-Tuning is now supported in the fine-tuning API, making it easy to tailor models to developer and user preferences. The technique uses Direct Preference Optimization (DPO) to compare pairs of model responses, teaching the model to distinguish preferred from non-preferred outputs. Because it learns from pairwise comparisons rather than fixed targets, Preference Fine-Tuning is especially effective for subjective tasks where tone, style, and creativity matter.
There are several important differences between Supervised Fine-Tuning and Preference Fine-Tuning, summarised below.
Aspect | Supervised Fine-Tuning (SFT) | Preference Fine-Tuning (PFT)
---|---|---
Goal | Replicate labelled outputs to encourage correct results. | Optimise behaviour by reinforcing preferred responses and reducing the likelihood of non-preferred ones.
Training data | Exact input-output pairs. | Pairs of preferred and non-preferred outputs, collected via human annotations, A/B testing, or AI feedback.
Ideal use cases | Tasks that demand strict accuracy and deterministic results. | Tasks where the "better" answer is subjective.
Example use cases | Solving math problems, generating code in a specialised format. | Creative writing, summarisation, or personalised recommendations.
OpenAI began testing Preference Fine-Tuning with trusted partners, and the results so far have been encouraging. For example, Rogo AI is building an AI assistant for financial analysts that breaks complex queries down into sub-queries. On Rogo-Golden, their expert-built benchmark, Supervised Fine-Tuning struggled with out-of-distribution query expansion, missing metrics like ARR for questions such as "how fast is company X growing", while Preference Fine-Tuning resolved these issues and lifted accuracy from 75% in the base model to over 80%.
Preference Fine-Tuning launches today for gpt-4o-2024-08-06 and will be available soon for gpt-4o-mini-2024-07-18, at the same per-trained-token price as Supervised Fine-Tuning. Support for OpenAI's newest models will follow early next year. See the fine-tuning guide in the API docs for more details.
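As a sketch, a Preference Fine-Tuning job is created through the existing fine-tuning endpoint by selecting the DPO method. The file ID below is a placeholder for an uploaded JSONL training file whose records pair a preferred_output with a non_preferred_output for each input:

// Sketch: creating a Preference Fine-Tuning (DPO) job.
const resp = await fetch('https://api.openai.com/v1/fine_tuning/jobs', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-4o-2024-08-06',
    training_file: 'file-abc123', // placeholder ID of the uploaded JSONL file
    method: { type: 'dpo', dpo: { hyperparameters: { beta: 'auto' } } },
  }),
});
console.log(await resp.json());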
Go and Java SDKs in beta
Lastly, alongside the existing official Python, Node.js, and .NET libraries, OpenAI is launching two new official SDKs for Go and Java in beta. OpenAI wants its APIs to be easy to use whichever programming language you choose.
Go is a statically typed language well suited to handling concurrency and building scalable backend and API systems. The OpenAI Go SDK makes it simple to call OpenAI models from Go code.
package main

import (
	"context"
	"fmt"

	"github.com/openai/openai-go"
)

func main() {
	// NewClient reads OPENAI_API_KEY from the environment by default.
	client := openai.NewClient()
	ctx := context.Background()
	prompt := "Write me a haiku about Golang."
	completion, err := client.Chat.Completions.New(
		ctx,
		openai.ChatCompletionNewParams{
			Messages: openai.F(
				[]openai.ChatCompletionMessageParamUnion{
					openai.UserMessage(prompt),
				},
			),
			Model: openai.F(openai.ChatModelGPT4o),
		},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(completion.Choices[0].Message.Content)
}
Java's type system and extensive ecosystem of open-source libraries have made it a mainstay of enterprise software development. The OpenAI Java SDK provides typed request and response objects, along with helpful utilities for managing API requests.