Grok-2 Beta Release
Grok-2 is our frontier language model with state-of-the-art reasoning capabilities. This release includes two members of the Grok family: Grok-2 and Grok-2 mini. Both models are now available to Grok users on the 𝕏 platform.
We are thrilled to present an early preview of Grok-2, a major step forward from our previous model, Grok-1.5, with frontier capabilities in chat, coding, and reasoning. We are also launching Grok-2 mini, a small but capable sibling of Grok-2. An early version of Grok-2 has been evaluated on the LMSYS leaderboard under the name “sus-column-r.” At the time of this blog post, it is outperforming both Claude 3.5 Sonnet and GPT-4-Turbo.
Grok-2 and Grok-2 mini are currently in beta on 𝕏, and we will also make both models available through our enterprise API later this month.
Grok-2 language model and chat capabilities
We introduced an early version of Grok-2 under the name “sus-column-r” into the LMSYS chatbot arena, a popular competitive benchmark for language models. On the LMSYS leaderboard, it outperforms both Claude and GPT-4 in overall Elo score.
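For readers unfamiliar with Elo scores, the sketch below shows the classic Elo update for a single head-to-head comparison. It is an illustration only: the K-factor of 32 and the 400-point scale are conventional defaults assumed here, the function names are hypothetical, and the leaderboard's own rating methodology may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if B won, and 0.5 for a tie.
    k is an assumed K-factor; larger values make ratings move faster.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's adjustment mirrors A's, so the total rating is conserved.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated models; A wins one comparison.
    print(update_elo(1000.0, 1000.0, 1.0))
```

A model's overall score rises when it wins comparisons it was not expected to win, which is why a higher Elo score is read as a stronger head-to-head record.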

Internally, we use a comparable process to evaluate our models. Our AI Tutors engage with our models across a variety of tasks that reflect real-world interactions with Grok. During each interaction, the AI Tutors are shown two responses generated by Grok and select the better one based on specific criteria outlined in our guidelines. We focused on evaluating the model’s ability to follow instructions and to provide accurate, factual information. Grok-2 has shown significant improvements in reasoning with retrieved content and in its tool use capabilities, such as correctly identifying missing information, reasoning through sequences of events, and discarding irrelevant posts.
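The pairwise comparison described above can be summarized as a per-model win rate. The following is a minimal sketch, assuming each judgment records the two variants that produced the responses and which one the AI Tutor preferred. The labels `candidate` and `baseline` and the function `win_rates` are hypothetical and are not taken from xAI's tooling.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# One judgment: (variant_a, variant_b, preferred_variant)
Judgment = Tuple[str, str, str]


def win_rates(judgments: Iterable[Judgment]) -> Dict[str, float]:
    """Fraction of comparisons each variant won, over those it took part in."""
    wins: Counter = Counter()
    games: Counter = Counter()
    for variant_a, variant_b, preferred in judgments:
        if preferred not in (variant_a, variant_b):
            raise ValueError(f"preferred variant {preferred!r} was not compared")
        games[variant_a] += 1
        games[variant_b] += 1
        wins[preferred] += 1
    return {variant: wins[variant] / games[variant] for variant in games}


if __name__ == "__main__":
    sample = [
        ("candidate", "baseline", "candidate"),
        ("candidate", "baseline", "candidate"),
        ("candidate", "baseline", "baseline"),
        ("baseline", "candidate", "candidate"),
    ]
    print(win_rates(sample))
```

In practice the display order of the two responses would typically be randomized so that position does not bias the choice; that step is omitted here for brevity.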
Benchmarks
We evaluated the Grok-2 models across a series of academic benchmarks covering reasoning, reading comprehension, math, science, and coding. Both Grok-2 and Grok-2 mini show significant improvements over our previous Grok-1.5 model. They achieve performance levels competitive with other frontier models in areas such as graduate-level science knowledge (GPQA), general knowledge (MMLU, MMLU-Pro), and math competition problems (MATH).
Grok-2 also performs exceptionally well in vision-based tasks, achieving cutting-edge results in document-based question answering (DocVQA) and visual math reasoning (MathVista).
Benchmark | Grok-1.5 | Grok-2 mini‡ | Grok-2‡ | GPT-4 Turbo* | Claude 3 Opus† | Gemini Pro 1.5 | Llama 3 405B | GPT-4o | Claude 3.5 Sonnet†
---|---|---|---|---|---|---|---|---|---
GPQA | 35.9% | 51.0% | 56.0% | 48.0% | 50.4% | 46.2% | 51.1% | 53.6% | 59.6%
MMLU | 81.3% | 86.2% | 87.5% | 86.5% | 85.7% | 85.9% | 88.6% | 88.7% | 88.3%
MMLU-Pro | 51.0% | 72.0% | 75.5% | 63.7% | 68.5% | 69.0% | 73.3% | 72.6% | 76.1%
MATH§ | 50.6% | 73.0% | 76.1% | 72.6% | 60.1% | 67.7% | 73.8% | 76.6% | 71.1%
HumanEval¶ | 74.1% | 85.7% | 88.4% | 87.1% | 84.9% | 71.9% | 89.0% | 90.2% | 92.0%
MMMU | 53.6% | 63.2% | 66.1% | 63.1% | 59.4% | 62.2% | 64.5% | 69.1% | 68.3%
MathVista | 52.8% | 68.1% | 69.0% | 58.1% | 50.5% | 63.9% | — | 63.8% | 67.7%
DocVQA | 85.6% | 93.2% | 93.6% | 87.2% | 89.3% | 93.1% | 92.2% | 92.8% |
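
The improvement over Grok-1.5 can be read directly off the table as a percentage-point difference per benchmark. The short sketch below does that arithmetic for the Grok-1.5 and Grok-2 columns. The scores are copied from the table; the names `SCORES` and `point_gains` are illustrative only.

```python
# (Grok-1.5, Grok-2) scores in percent, copied from the table above.
SCORES = {
    "GPQA": (35.9, 56.0),
    "MMLU": (81.3, 87.5),
    "MMLU-Pro": (51.0, 75.5),
    "MATH": (50.6, 76.1),
    "HumanEval": (74.1, 88.4),
    "MMMU": (53.6, 66.1),
    "MathVista": (52.8, 69.0),
    "DocVQA": (85.6, 93.6),
}


def point_gains(scores):
    """Percentage-point gain of the second score over the first, per benchmark."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


gains = point_gains(SCORES)

if __name__ == "__main__":
    # Print benchmarks from largest to smallest gain.
    for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<10} +{gain:.1f} points")
```

On these figures the largest gains are on MATH (+25.5 points) and MMLU-Pro (+24.5 points), and the smallest is on MMLU (+6.2 points).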
Discover Grok with real-time information on 𝕏
Over the past few months, we have been steadily improving Grok on the 𝕏 platform. Today, we’re launching the next evolution of the Grok experience, which includes a redesigned interface and new features.
𝕏 Premium and Premium+ users will have access to two new models: Grok-2 and Grok-2 mini. Grok-2 is our state-of-the-art AI assistant, with advanced capabilities in both text and vision understanding and real-time information integrated from the 𝕏 platform. It can be accessed through the Grok tab in the 𝕏 app. Grok-2 mini is our small but capable model that balances speed and answer quality.
Whether you’re looking for answers, collaborating on writing, or solving coding tasks, Grok-2 is more intuitive, steerable, and versatile than its predecessor. We are collaborating with Black Forest Labs to experiment with their FLUX.1 model and expand Grok’s capabilities on 𝕏. If you are a Premium or Premium+ subscriber, make sure you have the latest version of the 𝕏 app installed to beta test Grok-2.
Build with Grok using the Enterprise API
Later this month, we will also make Grok-2 and Grok-2 mini available to developers through our new enterprise API platform. The upcoming API is built on a new custom tech stack that enables multi-region inference deployments for low-latency access worldwide. We offer enhanced security features such as mandatory multi-factor authentication (e.g., using a Yubikey, Apple TouchID, or TOTP), along with rich traffic statistics and advanced billing analytics (including detailed data exports). We also provide a management API that lets you integrate team, user, and billing management into your existing internal tools and services. To be notified when the API launches later this month, sign up for our newsletter.
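TOTP, one of the multi-factor options mentioned above, is a public standard (RFC 6238, built on HOTP from RFC 4226). The sketch below shows how a time-based code is derived from a shared secret, using only the Python standard library. It illustrates the standard itself, not how the enterprise API platform implements or verifies it, and the 30-second step and 6-digit length are the common defaults assumed here.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the time-step counter."""
    return hotp(secret, unix_time // step, digits)


if __name__ == "__main__":
    # Shared secret from the RFC test vectors; real secrets must be random.
    demo_secret = b"12345678901234567890"
    print(totp(demo_secret, 59))
```

Because both sides derive the code from the current time and a shared secret, a server typically accepts codes from adjacent time steps as well to tolerate clock drift.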
What is next?
Grok-2 and Grok-2 mini are rolling out on 𝕏. Grok powers a range of AI-driven features, such as enhanced search, deeper insights into 𝕏 posts, and improved reply functions. A preview of multimodal understanding, a core part of the Grok experience on 𝕏 and in the API, will be released soon.
Since the announcement of Grok-1 in November 2023, xAI has moved at a remarkable pace, driven by a small team with the highest talent density. With the introduction of Grok-2, we are now at the forefront of AI development. With our new compute cluster, we are focused on advancing core reasoning capabilities. More news will follow in the coming months. We are looking for people to join our small, highly focused team, which is dedicated to building the most impactful innovations for the future of humanity.