Benefits of Watsonx Orders
You’re on your way to get a cheeseburger and fries at your preferred drive-thru. There isn’t much of a line when you pull in, and it’s a straightforward order. What might go wrong, if anything? Lots.
The restaurant is close to a busy freeway with loud traffic, and airplanes fly low overhead on their approach to the airport. It is windy. The customer in the next lane is trying to place an order at the same time as you, and the stereo in the car behind you is blaring. The clamor would test even the most seasoned human order taker.
IBM developed IBM Watsonx Orders, an AI-powered voice agent, to take drive-thru orders without the need for human intervention. The product uses cutting-edge technology to isolate and understand human speech in noisy environments while carrying on a natural, conversational exchange with the customer placing the order.
Watsonx Orders is able to comprehend speech and execute commands
The process starts when Watsonx Orders detects a car pulling up to the speaker post. It greets the customer and asks what they would like to order. It then processes the incoming audio as it listens and isolates the human speech. From that speech it identifies the order and its items, and it shows the customer what it heard on the menu board. If the customer confirms that everything looks right, Watsonx Orders sends the order to the point of sale and the kitchen. Finally, the kitchen prepares the food. The following figure illustrates the entire ordering process:
Understanding a customer order involves three parts. The first is isolating the human speech and ignoring distracting sounds from the surroundings. The second is understanding that speech, including the complexities of accents, colloquialisms, emotions, and misstatements. The third and final part is converting the speech data into an action that represents the customer’s intent.
Isolating the human voice
When you call your bank or utility company, a voice agent chatbot most likely answers first and asks why you’re calling. That chatbot expects audio from a phone in a reasonably quiet setting, with little to no background noise.
There will always be background noise in the drive-thru. Loud noises, like a passing train horn, can drown out human voices no matter how good the audio hardware is.
Watsonx Orders uses machine learning techniques to perform digital noise and echo cancellation as it captures audio in real time. It filters out noise from wind, rain, highway traffic, and airports. Two further noise problems are unexpected background sounds and cross-talk, in which people talk in the background while a customer is ordering. Watsonx Orders uses advanced techniques to minimize these disruptions.
Recognizing speech
Text chatbots were the precursor to most voice chatbots. Conventional voice agents translate spoken words into written text first, then read the written sentence to determine the speaker’s intentions.
This approach is computationally wasteful and slow. Rather than first trying to transcribe sounds into words and sentences, Watsonx Orders breaks speech down into phonemes, the smallest units of sound that distinguish one word from another. For instance, it breaks the word “shake” into “sh,” “ay,” and the hard “k.” Converting speech to phonemes rather than full English text also improves accuracy across different accents and lowers intra-dialog latency, which supports a real-time conversation flow.
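To make the phoneme idea concrete, here is a minimal Python sketch. It is not IBM's model: the tiny lexicon, the phoneme spellings, and the helper names `edit_distance` and `closest_menu_word` are all assumptions made for illustration. It shows how a heard phoneme sequence could be matched to the nearest menu word, so that a slightly different vowel from an accent still resolves to the intended item.

```python
# Illustrative only: a toy phoneme lexicon, not IBM's actual model or data.
MENU_PHONEMES = {
    "shake": ["sh", "ay", "k"],
    "steak": ["s", "t", "ay", "k"],
    "cake": ["k", "ay", "k"],
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # drop a phoneme from a
                            curr[j - 1] + 1,      # insert a phoneme from b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def closest_menu_word(heard):
    """Return the menu word whose phonemes are closest to what was heard."""
    return min(MENU_PHONEMES,
               key=lambda word: edit_distance(heard, MENU_PHONEMES[word]))


# An accented vowel ("eh" instead of "ay") still lands on "shake".
print(closest_menu_word(["sh", "eh", "k"]))
```

A production system would work on acoustic features and a learned model rather than a hand-written table, but the matching step above captures why comparing sounds directly can tolerate pronunciation differences.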
Putting knowledge into practice
Watsonx Orders then identifies intent from phrases like “I want” and “cancel that.” After that, it identifies the items those commands refer to, such as “cheeseburger” and “apple pie.”
Many machine learning techniques are available for intent recognition. The newest approach uses foundation models and large language models, which can in principle handle any query and produce a suitable response. For hardware-constrained use cases, this is too slow and computationally expensive. A drive-thru voice agent that could answer questions like “Why is the sky blue?” might be impressive, but it would slow down the line, annoy customers, and reduce sales.
Watsonx Orders uses a highly specialized model to understand the hundreds of millions of ways a customer might order a cheeseburger, such as “No onions, light on the special sauce, or extra tomatoes.” The model also lets customers change their order midway: “Actually, no tomatoes on that burger.”
In production, Watsonx Orders can complete more than 90% of orders without assistance from a human. Note that other vendors in this market count an interaction as “automated” even when the AI agent gets stuck and hands off to a contact center staffed by humans. For IBM Watsonx Orders, “automated” means processing an order from beginning to end without human intervention.
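The intent, item, and mid-order change steps can be sketched with a toy rule-based parser. This is a deliberately simple stand-in, not the specialized model IBM describes: the phrase tables and the functions `parse_utterance` and `apply_utterance` are hypothetical, and a real system would learn these mappings from data rather than use substring matching.

```python
# Hypothetical vocabulary for illustration; not taken from Watsonx Orders.
INTENTS = {
    "i want": "add",
    "can i get": "add",
    "cancel that": "cancel",
    "actually": "modify",
}
ITEMS = ["cheeseburger", "apple pie", "fries"]
MODIFIERS = ["no onions", "no tomatoes", "extra tomatoes"]


def parse_utterance(text):
    """Extract an intent label, menu items, and modifiers from one utterance."""
    lowered = text.lower()
    intent = next((label for phrase, label in INTENTS.items()
                   if phrase in lowered), "unknown")
    items = [item for item in ITEMS if item in lowered]
    modifiers = [mod for mod in MODIFIERS if mod in lowered]
    return {"intent": intent, "items": items, "modifiers": modifiers}


def apply_utterance(order, text):
    """Update the running order (a list of line items) from one utterance."""
    parsed = parse_utterance(text)
    if parsed["intent"] == "add":
        for item in parsed["items"]:
            order.append({"item": item, "modifiers": list(parsed["modifiers"])})
    elif parsed["intent"] == "cancel" and order:
        order.pop()  # drop the most recent line item
    elif parsed["intent"] == "modify" and order:
        # "that burger" is assumed to mean the most recent line item
        order[-1]["modifiers"].extend(parsed["modifiers"])
    return order


order = []
apply_utterance(order, "I want a cheeseburger, no onions")
apply_utterance(order, "Actually, no tomatoes on that burger")
print(order)
```

Keeping the vocabulary limited to the menu is what makes a narrow model cheap to run: it never has to handle questions outside the ordering task.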
Profits are driven by real-world implementation
During peak hours, Watsonx Orders can handle more than 150 cars per hour in a dual-lane restaurant, which is more than most human order takers can manage. IBM continuously optimizes its modeling and engineering techniques for cars per hour because that metric translates into higher revenue and profit.
Watsonx Orders has processed 60 million real orders in dozens of restaurants, even under challenging noise levels, cross-talk, and order complexity. IBM designed the platform to adapt easily to new menus, restaurant technology stacks, and centralized menu management systems, with the aim of being compatible with every quick-serve restaurant chain worldwide.
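As a rough back-of-the-envelope check on what 150 cars per hour implies, the short Python sketch below converts throughput into an average time budget per car. The function name `seconds_per_car` is made up for this example, and the calculation assumes both lanes carry equal traffic, which the article does not state.

```python
def seconds_per_car(cars_per_hour, lanes):
    """Average time budget per car in each lane, assuming evenly loaded lanes."""
    return 3600 * lanes / cars_per_hour


# 150 cars per hour across two lanes is 75 cars per lane per hour.
print(seconds_per_car(150, 2))  # 48.0
```

Under that assumption, each lane has about 48 seconds per car, so every second of added latency in the conversation cuts directly into throughput.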