Today Anthropic is announcing an upgraded Claude 3.5 Sonnet and a new model, Claude 3.5 Haiku. The upgraded Claude 3.5 Sonnet improves on its predecessor across the board, with particularly significant gains in coding, an area where it already led the field.
Anthropic is also introducing a groundbreaking new capability in public beta: computer use. Available today on the API, it lets developers direct Claude to use computers the way people do: by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. At this stage the capability is still experimental, and at times cumbersome and error-prone. Anthropic is releasing computer use early to gather developer feedback and expects the capability to improve rapidly over time.
Companies including Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company have already begun exploring these possibilities, carrying out tasks that require dozens, and sometimes even hundreds, of steps. For example, Replit is using Claude 3.5 Sonnet’s computer use and UI navigation capabilities to develop a key feature that evaluates apps as they are being built for its Replit Agent product.
The upgraded Claude 3.5 Sonnet is now available to all users. Starting today, developers can build with the computer use beta on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. The new Claude 3.5 Haiku will be released later this month.
Claude 3.5 Sonnet: Industry-leading software engineering skills
The upgraded Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks, with particularly strong gains in agentic coding and tool use tasks. On coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%, scoring higher than all publicly available models, including reasoning models like OpenAI o1-preview and specialized systems designed for agentic coding. It also improves performance on TAU-bench, an agentic tool use task, from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the more challenging airline domain. The new Claude 3.5 Sonnet offers these improvements at the same price and speed as its predecessor.
Early customer feedback suggests the upgraded Claude 3.5 Sonnet represents a significant leap for AI-powered coding. GitLab, which tested the model for DevSecOps tasks, found that it delivered stronger reasoning (up to 10% across use cases) with no added latency, making it well suited to multi-step software development processes. Cognition uses the new Claude 3.5 Sonnet for autonomous AI evaluations and saw substantial improvements in coding, planning, and problem-solving compared to the previous version. The Browser Company, which used the model to automate web-based workflows, noted that Claude 3.5 Sonnet outperformed every model it had tested before.
As part of Anthropic’s continued effort to partner with external experts, the US AI Safety Institute (US AISI) and the UK Safety Institute (UK AISI) conducted joint pre-deployment testing of the new Claude 3.5 Sonnet model.
Anthropic also evaluated the upgraded Claude 3.5 Sonnet for catastrophic risks and found that the ASL-2 Standard, as outlined in its Responsible Scaling Policy, remains appropriate for this model.
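To put the benchmark figures above side by side, the short sketch below computes the percentage-point gain and the relative gain for each pair of scores. The scores are copied from the paragraph; the function name and the rounding to one decimal place are choices made here for illustration only.

```python
# Benchmark scores quoted in the text: (original, upgraded), in percent.
SCORES = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), rounded to 1 decimal."""
    points = after - before
    relative = 100.0 * points / before
    return round(points, 1), round(relative, 1)


if __name__ == "__main__":
    for name, (before, after) in SCORES.items():
        points, relative = gains(before, after)
        print(f"{name}: +{points} points ({relative}% relative)")
```

The distinction matters when comparing domains: the airline gain is smaller than the SWE-bench gain in absolute points but still large relative to its lower starting score.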
Claude 3.5 Haiku: Cutting edge combined with speed and affordability
Claude 3.5 Haiku is the next generation of Anthropic’s fastest model. At the same cost and similar speed as Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses even Claude 3 Opus, the largest model in the previous generation, on many intelligence benchmarks. Claude 3.5 Haiku is particularly strong on coding tasks. For example, it scores 40.6% on SWE-bench Verified, outperforming many agents that use publicly available state-of-the-art models, including the original Claude 3.5 Sonnet and GPT-4o.
With low latency, improved instruction following, and more accurate tool use, Claude 3.5 Haiku is well suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from large volumes of data such as purchase histories, pricing, or inventory records.
Use cases
With its speed, improved instruction following, and more accurate tool use, Claude 3.5 Haiku fits the following common use cases:
Code completions
Claude 3.5 Haiku provides fast, accurate code completions and suggestions that speed up development workflows. It is a strong fit for software teams looking to increase productivity and streamline their coding process.
Interactive chatbots
With improved conversational abilities and fast response times, Claude 3.5 Haiku excels at powering responsive chatbots that can handle high volumes of user interactions. It is especially useful for customer service, e-commerce, and educational platforms that need engagement at scale.
Data extraction and labeling
Claude 3.5 Haiku processes and categorizes information efficiently, which makes it useful for fast data extraction and automated labeling tasks. This is particularly helpful for organizations in finance, healthcare, and research that work with large volumes of unstructured data.
Real-time content moderation
With improved reasoning and content understanding, Claude 3.5 Haiku can provide reliable, real-time content moderation. This makes it valuable for social media platforms, online communities, and media organizations that need to keep content safe and appropriate.
Pricing and availability
Claude 3.5 Haiku will be made available later this month through Anthropic’s first-party API, Amazon Bedrock, and Google Cloud’s Vertex AI, initially as a text-only model with image input to follow.
Pricing for Claude 3.5 Haiku starts at $0.25 per million input tokens and $1.25 per million output tokens, with up to 90% cost savings with prompt caching and 50% cost savings with the Message Batches API.
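As a rough illustration of how the listed prices and discounts translate into a bill, here is a minimal cost estimator. The per-token prices and discount percentages come from the paragraph above; everything else is an assumption made for this sketch: the function and parameter names are hypothetical, the batch discount is assumed to apply to both input and output tokens, the caching discount is applied at its best-case 90% to the cached share of input tokens only, and the sketch deliberately does not model combining both discounts.

```python
# Prices and discounts as quoted in the text (USD per million tokens).
INPUT_PRICE_PER_MTOK = 0.25
OUTPUT_PRICE_PER_MTOK = 1.25
BATCH_DISCOUNT = 0.50      # Message Batches API
MAX_CACHE_DISCOUNT = 0.90  # "up to" figure, treated here as the best case


def estimate_cost(input_tokens: int, output_tokens: int,
                  batch: bool = False,
                  cached_input_fraction: float = 0.0) -> float:
    """Estimate cost in USD under the assumptions stated in the lead-in."""
    if not 0.0 <= cached_input_fraction <= 1.0:
        raise ValueError("cached_input_fraction must be between 0 and 1")
    if batch and cached_input_fraction > 0.0:
        # Assumption: how the two discounts interact is not modeled here.
        raise ValueError("this sketch does not combine batch and caching discounts")

    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK

    if batch:
        return (input_cost + output_cost) * (1.0 - BATCH_DISCOUNT)

    # Best case: the cached share of input tokens is billed at 10% of list price.
    input_cost *= 1.0 - cached_input_fraction * MAX_CACHE_DISCOUNT
    return input_cost + output_cost


if __name__ == "__main__":
    base = estimate_cost(10_000_000, 2_000_000)
    batched = estimate_cost(10_000_000, 2_000_000, batch=True)
    cached = estimate_cost(10_000_000, 2_000_000, cached_input_fraction=0.8)
    print(f"list price: ${base:.2f}, batch: ${batched:.2f}, 80% cached input: ${cached:.2f}")
```

Actual billing may differ, for example if cache writes are priced separately, so treat the result as an upper-level estimate and check the current pricing page before budgeting.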
Teaching Claude to use computers responsibly
With computer use, Anthropic is trying something fundamentally new. Rather than building specialized tools to help Claude complete individual tasks, it is teaching Claude general computer skills, which allow it to use a wide range of standard tools and software programs designed for people. Developers can use this emerging capability to automate repetitive processes, build and test software, and carry out open-ended tasks like research.
To make these general skills possible, Anthropic has built an API that lets Claude perceive and interact with computer interfaces. Developers can integrate this API to enable Claude to translate instructions (like “use data from my computer and online to fill out this form”) into computer commands (like “check a spreadsheet,” “move the cursor to open a web browser,” “navigate to the relevant web pages,” “fill out a form with the data from those pages,” and so on).
On OSWorld, which evaluates AI models’ ability to use computers, Claude 3.5 Sonnet scored 14.9% in the screenshot-only category, notably better than the next-best AI system’s score of 7.8%. When given more steps to complete the task, Claude scored 22.0%.
Anthropic expects this capability to improve rapidly in the coming months, but Claude’s current ability to use computers is imperfect. Some actions that people perform effortlessly, such as scrolling, dragging, and zooming, currently present challenges for Claude, so Anthropic encourages developers to begin exploration with low-risk tasks. Because computer use may provide a new vector for more familiar threats such as spam, misinformation, or fraud, Anthropic is taking a proactive approach to promoting its safe deployment. It has developed new classifiers that can identify when computer use is taking place and whether harm is occurring. Anthropic’s post on developing computer use describes the research process behind this new skill and discusses further safety measures.
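The translation from an instruction into computer commands can be pictured as a sequence of small actions that an application executes on the model’s behalf. The sketch below is a toy illustration of that idea only: `FakeDesktop`, `execute`, `run_plan`, and the action names are all hypothetical and are not the schema of the computer use API, and the plan is hand-written rather than produced by a model. In a real integration the application would send each result (for example, a screenshot) back to the model, which would then request the next action.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Stand-in for a real screen; it records actions instead of performing them."""
    cursor: tuple = (0, 0)
    typed: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        self.log.append("screenshot")
        return f"<screenshot cursor={self.cursor}>"

    def move_cursor(self, x: int, y: int) -> str:
        self.cursor = (x, y)
        self.log.append(f"move_cursor({x}, {y})")
        return "ok"

    def click(self) -> str:
        self.log.append(f"click at {self.cursor}")
        return "ok"

    def type_text(self, text: str) -> str:
        self.typed.append(text)
        self.log.append(f"type_text({text!r})")
        return "ok"


def execute(desktop: FakeDesktop, action: dict) -> str:
    """Dispatch one action dict (hypothetical format) to the desktop."""
    handlers = {
        "screenshot": lambda a: desktop.screenshot(),
        "move_cursor": lambda a: desktop.move_cursor(a["x"], a["y"]),
        "click": lambda a: desktop.click(),
        "type_text": lambda a: desktop.type_text(a["text"]),
    }
    kind = action.get("action")
    if kind not in handlers:
        raise ValueError(f"unsupported action: {kind!r}")
    return handlers[kind](action)


def run_plan(desktop: FakeDesktop, plan: list) -> list:
    """Execute actions in order and collect the result of each one."""
    return [execute(desktop, step) for step in plan]


# Hand-written plan standing in for what a model might request for
# an instruction such as "fill out this form".
EXAMPLE_PLAN = [
    {"action": "screenshot"},
    {"action": "move_cursor", "x": 200, "y": 340},
    {"action": "click"},
    {"action": "type_text", "text": "Jane Doe"},
]

if __name__ == "__main__":
    desktop = FakeDesktop()
    run_plan(desktop, EXAMPLE_PLAN)
    print("\n".join(desktop.log))
```

Rejecting unknown actions with an error, instead of ignoring them, is a deliberate choice in this sketch: when the capability is still error-prone, it is safer for the application to fail loudly than to guess.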
Considering the future
This technology is still in its early stages, and learning from its initial deployments will give us a clearer understanding of both the potential and the implications of increasingly capable AI systems.
The upgraded Claude 3.5 Sonnet (available now), computer use (public beta), and Claude 3.5 Haiku (coming soon) from Anthropic are all being offered through Amazon Bedrock.
The updated Claude 3.5 Sonnet costs the same as the original and is currently available in the US West (Oregon) AWS Region on Amazon Bedrock.
In addition to the upgraded model’s increased intelligence, developers can now integrate computer use (available in public beta) into their applications to automate complex desktop workflows, improve software testing processes, and build more sophisticated AI-powered applications.
Claude 3.5 Haiku will be released in the coming weeks, initially as a text-only model with image input to follow.