Strong, scalable infrastructure is essential for enterprise-grade AI workloads as generative AI transforms industries. neoAI, a Japanese AI firm in the Intel Liftoff for Startups program, recently assessed Intel Gaudi 2 AI accelerators on the Intel Tiber AI Cloud. The Proof of Concept evaluated Gaudi 2's performance on neoAI Chat, a Retrieval-Augmented Generation (RAG) enabled LLM platform that serves large companies such as Kyushu Electric Power and Japan Post Bank.
neoAI: LLM AI Chatbot Solution
neoAI provides commercial enterprises with generative AI applications, such as its SaaS platform, neoAI Chat, which lets businesses connect their data with several LLMs and build AI agents without writing code.
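A RAG pipeline of the kind such a platform provides can be sketched as follows. This is a toy illustration with a hypothetical keyword retriever and no actual model call; production systems like neoAI Chat typically use vector embeddings and a real LLM behind the prompt:

```python
def retrieve(query, documents, k=2):
    """Toy keyword retriever: rank documents by word overlap with the query.
    (A stand-in for the embedding-based retrieval real RAG systems use.)"""
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=score, reverse=True)[:k]

def build_prompt(query, documents):
    """Assemble the augmented prompt: retrieved context plus the question.
    An LLM call would receive this prompt; it is omitted here."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical internal documents standing in for company data:
docs = [
    "Our refund policy allows returns within 30 days.",
    "Office hours are 9 to 5 on weekdays.",
    "Returns require the original receipt.",
]
print(build_prompt("What is the refund policy for returns?", docs))
```

Grounding answers in retrieved internal documents is what lets the chatbot reflect company data without retraining the underlying model.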
PoC Objectives on Intel Tiber AI Cloud
neoAI had three main objectives:
- Concurrency: Examine Gaudi 2's capacity to handle multiple concurrent inference requests, compared with NVIDIA L40S and H100 GPUs.
- Inference Speed: Compare the rate at which tokens are generated.
- Developer Experience: Assess how easily AI workloads can be deployed on Gaudi 2.
Key Results
Concurrency Performance
The main goal of concurrency testing was to find the elbow point, where LLM throughput stops scaling and latency begins to rise. The chart below shows the elbow points for the different accelerators:

The graph shows that:
- L40S (x2) reaches its elbow at 16 concurrent requests.
- H100 (x1) reaches its elbow at 32 concurrent requests.
- Intel Gaudi 2 (x2) and H100 (x2) both reach the elbow at 64 concurrent requests.
Gaudi 2 handled parallelism on par with two H100s, demonstrating its scalability for demanding enterprise AI workloads.
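The elbow-point methodology can be sketched with a minimal, self-contained simulation. The queueing model, function names, and capacity values here are illustrative assumptions, not neoAI's actual benchmark harness:

```python
def modeled_latency(concurrency, capacity, base_latency=1.0):
    """Hypothetical queueing model: latency stays flat until the
    accelerator saturates at `capacity` in-flight requests,
    then grows linearly as requests queue up."""
    return base_latency * max(1.0, concurrency / capacity)

def modeled_throughput(concurrency, capacity):
    """Little's law: throughput = in-flight requests / latency."""
    return concurrency / modeled_latency(concurrency, capacity)

def find_elbow(levels, capacity):
    """Return the last concurrency level at which raising concurrency
    still raises throughput; beyond it, only latency grows."""
    for lo, hi in zip(levels, levels[1:]):
        if modeled_throughput(hi, capacity) <= modeled_throughput(lo, capacity):
            return lo
    return levels[-1]

levels = [1, 2, 4, 8, 16, 32, 64, 128]
print(find_elbow(levels, capacity=64))  # → 64 with this model
```

A real test replaces the model with timed requests against the serving endpoint, sweeping concurrency levels and recording measured throughput and latency at each step.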
Inference Speed
The inference speed comparison at 1 concurrent request is summarized below:
| Accelerator Configuration | Tokens/sec |
|---------------------------|------------|
| L40S (x2)                 | 23.6       |
| H100 (x2)                 | 65.7       |
| Intel Gaudi 2 (x2)        | 26.9       |
As the table shows, Gaudi 2 trails two H100s in tokens/sec but slightly outpaces the L40S. Nonetheless, its concurrency advantage makes it a serious candidate for workloads that value parallel throughput over single-stream speed.
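The tokens/sec metric itself is straightforward to measure. Below is a minimal sketch; `generate` and `dummy_generate` are hypothetical stand-ins for a real single-stream LLM call, not part of any vendor API:

```python
import time

def tokens_per_second(generate, prompt):
    """Time one single-stream generation and report tokens/sec.
    `generate` stands in for any LLM call that returns the
    generated tokens (a hypothetical interface)."""
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Dummy generator: "produces" 100 tokens in at least 0.1 s,
# so the measured rate tops out near 1000 tokens/sec.
def dummy_generate(prompt):
    time.sleep(0.1)
    return ["tok"] * 100

print(f"{tokens_per_second(dummy_generate, 'Hello'):.0f} tokens/sec")
```

In practice, benchmarks average this over many prompts and count tokens with the model's own tokenizer, since token boundaries differ between models.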
Developer Experience
The neoAI team reported a seamless deployment experience, aided by ready-to-use Docker images for Gaudi 2. The setup procedure was straightforward, enabling an efficient PoC.
“With Intel Tiber AI Cloud, we were able to test the Intel Gaudi 2 AI Accelerator's capacity to handle multiple concurrent requests. Gaudi 2 performed well with its 96 GB of memory, and the pre-configured Docker images streamlined the process, producing positive results and performance.” Masaki Otsuki, Head of R&D, neoAI.
Ready to take off
This Proof of Concept demonstrates the robust concurrent performance of the Intel Gaudi 2, making it an affordable, scalable alternative to conventional GPUs such as the H100 for enterprise AI workloads. The partnership under the Intel Liftoff for Startups program allowed neoAI to explore new hardware options, an example of how Intel enables companies to scale their AI innovations with cutting-edge infrastructure and tailored support.
Intel Liftoff is free, virtual, and open to early-stage AI startups worldwide. No cohorts. No equity. No restrictions. Apply now!
neoAI Chat
The definitive guide to using ChatGPT, from inquiry efficiency to AI strategies
Quickly build chatbots that read internal company data
neoAI Chat may be the solution if any of the following sound familiar:
- We want to use ChatGPT but don't know how.
- It's difficult to ask questions after business hours.
- Conventional FAQs feel stilted and only support simple question-and-answer exchanges.
- It takes too long to search through all of the internal documentation.
- Our existing chatbots are hard to maintain and operate.
- Adoption of our ChatGPT tool isn't growing.
Strengths of neoAI Chat: AI built from internal data, no coding required
Unrestricted access to internal records and knowledge
- The information in your internal documents can be reflected in ChatGPT’s answers.
- Additionally, you can register internal terminology and adjust the AI for each industry.
- neoAI Chat combines this power with low operation and maintenance costs.
From internal Q&A to general ChatGPT use, neoAI Chat handles it all
The AI is tuned for each use case to reach a practical level of accuracy, and the documents it references can be reviewed alongside each answer to catch errors.
Simple permission management, even for large enterprises
- Access permissions can be set per department and per user.
- Project documents and other highly confidential data can be used securely.
Strengths of neoAI Chat: Highly accurate chatbots made easy
Natural and accurate dialogue AI
We achieve the accuracy that businesses demand by combining world-leading technology, such as GPT-4, with algorithms developed by members of the University of Tokyo's Matsuo Laboratory.
Simple and easy to understand UI
Referenced documents and user accounts are easy to update, and this ease of use drives high adoption rates.
Optimize your business with neoAI’s technology and know-how
From organizing and searching project documents to drafting anticipated investor-relations questions and answers, neoAI Chat can quickly address a variety of use cases by leveraging the neoLLM module.
- neoAI members versed in state-of-the-art technology will support you throughout implementation.
- Don't hesitate to contact us about the implementation schedule and approach.
Strengths of neoAI Chat: ChatGPT can be used in a secure environment
Your input data won't be used for any other purpose.
Solves the information-security concerns of general-purpose ChatGPT
Risks with general-purpose ChatGPT:
- The AI may learn from and expose sensitive data.
- Chat contents could be disclosed to third parties.
- No control over improper access.
How neoAI Chat addresses them:
- A learning policy that keeps confidential information out of training.
- Azure-based safeguards and monitoring against data leaks.
- Access can be restricted by account or IP address.