Structured Object Language Model
Introducing Structured Object Language Model (SoLM), Amazon's lightweight NLP model for generating structured objects that conform to specific schemas. Learn how its self-supervised denoising training and confidence-aware substructure beam search (CABS) decoding minimize hallucinations.
One of the most significant capabilities of today's generative models is their ability to transform unstructured, partially structured, or poorly structured inputs into structured objects that adhere to particular schemas, such as the fixed schemas of relational databases, the flexible schemas of document stores, function signatures, and API specifications.
Large language models (LLMs) can perform this task if they are given all the schema requirements and processing instructions, and most of today's LLMs offer a dedicated JSON mode or structured-outputs mode that shields users from some of this prompt engineering.
This strategy, however, has several drawbacks: the cost of deploying LLMs at the scale of databases with millions or billions of entries or requests; the potential complexity of the required prompt engineering; and the limited complexity of the schemas that the built-in JSON and structured-outputs modes can support.
In a paper presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP) and posted on arXiv, Amazon introduced the lightweight Structured Object Language Model (SoLM) as a targeted solution to this problem. Unlike general-purpose LLMs, SoLM is trained to generate objects only within a particular schema. Among SoLM's innovations are its training technique, self-supervised denoising, and its inference-time decoding technique, confidence-aware substructure beam search (CABS), which reduces the risk of hallucination.
In tests, SoLM's cost efficiency was an order of magnitude higher than that of state-of-the-art LLMs, while its output accuracy was on par or better. The researchers also found that, at a fixed accuracy of 90%, CABS decoding improved recall on the product-attribute-generation task by 16.7% over conventional beam search decoding.
Applications
In the study, the structured-output framing unifies several seemingly unrelated AI/ML problems. For example, the structured object may contain multiple facets: redundant, interdependent pieces of information. One facet of the object might be a long, natural-language description, while others might be short, type-constrained structured data.
Listing scenarios (products, homes, jobs, etc.) frequently involve such multifaceted objects, which include both a descriptive portion and a listing of key attributes. SoLM makes it possible to generate an object with these different kinds of facets while ensuring both absolute consistency with world knowledge and relative consistency within the object.
Typically, a structured-output model is fed a blurb of unstructured data and generates the corresponding structured object. In the paper, Amazon also proposes using SoLM as a self-regenerating machine: the model is simply given an object that is already structured according to the schema and asked to regenerate it from scratch.
Here, the task is not to structure the input but to clean, normalize, correct, and/or complete it while making it self-consistent. The input can, of course, combine an already structured record with any amount of additional unstructured content, or it can be a record structured according to a different schema. Whatever the input, SoLM will consistently produce a clean record that conforms to the target schema.
Beyond filling in missing facts, the self-regenerating machine can correct faulty facts, normalize unnormalized facts, complete missing descriptions, and fix inaccuracies in descriptions. Treating each of these tasks separately creates dependency cycles, since they are interdependent (for example, should descriptions be generated from facts, or facts extracted from descriptions?). Self-regeneration resolves these dependencies in the most natural way.
Innovations
Amazon's approach to training SoLM is self-supervised denoising: take any sample of objects from an existing database, add artificial noise to them, and train the model to recover their original forms. The model thus learns to improve the quality of whatever object it is fed. Making the noise more aggressive, for example by removing the object's structure entirely or shuffling its tokens at random, also teaches the model to handle completely unstructured input, in addition to improving the quality of existing objects.
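The denoising setup can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the `corrupt` function, the noise rates, and the example fields are all invented for the sketch.

```python
import json
import random

def corrupt(obj, rng, aggressive=False):
    """Turn a clean structured object into a noisy training input (sketch).

    Mild noise deletes or corrupts individual facts; aggressive noise strips
    the structure entirely and shuffles the tokens, which teaches the model
    to handle fully unstructured input as well.
    """
    if aggressive:
        # Flatten the object to text and shuffle the tokens at random.
        tokens = json.dumps(obj).replace("{", " ").replace("}", " ").split()
        rng.shuffle(tokens)
        return " ".join(tokens)
    noisy = dict(obj)
    for key in list(noisy):
        r = rng.random()
        if r < 0.2:          # delete a fact
            del noisy[key]
        elif r < 0.4:        # corrupt a fact
            noisy[key] = "???"
    return noisy

# Each training pair is (noisy input, original object as the denoising target).
clean = {"brand": "Acme", "color": "red", "material": "steel"}
pair = (corrupt(clean, random.Random(0)), clean)
```

Because the target is simply the original object, no human labeling is needed: any existing database provides training data for free.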
Although LLMs are trained simply to predict the most likely next token in a sequence, at inference time they typically employ various decoding strategies to select outputs. One of the most common is beam search decoding, in which the model considers several candidate sequences in parallel and selects the one with the highest cumulative probability. (Greedy decoding, by contrast, simply chooses the highest-probability token at each step, which does not guarantee the highest-probability sequence of tokens over a given number of steps.) The number of sequences the model considers at once is known as the width of the beam.
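The difference can be shown with a toy two-step example; the probability tables below are invented for illustration, standing in for a real model's next-token distributions:

```python
# Invented distributions: P(first token) and P(second token | first token).
STEP1 = {"a": 0.6, "b": 0.4}
STEP2 = {"a": {"x": 0.5, "y": 0.5},
         "b": {"x": 0.95, "y": 0.05}}

def greedy():
    # Pick the single most likely token at each step.
    t1 = max(STEP1, key=STEP1.get)
    t2 = max(STEP2[t1], key=STEP2[t1].get)
    return (t1, t2), STEP1[t1] * STEP2[t1][t2]

def beam_search(width=2):
    # Keep the `width` most likely prefixes, then rank complete
    # sequences by cumulative probability.
    prefixes = sorted(STEP1.items(), key=lambda kv: -kv[1])[:width]
    candidates = [((t1, t2), p1 * p2)
                  for t1, p1 in prefixes
                  for t2, p2 in STEP2[t1].items()]
    return max(candidates, key=lambda c: c[1])

seq_g, p_g = greedy()       # greedy commits to "a" early (probability 0.6 * 0.5)
seq_b, p_b = beam_search()  # the beam keeps "b" alive and finds "b","x" (0.4 * 0.95)
```

Greedy decoding locks in the token "a" and can never reach the globally best sequence ("b", "x"); a beam of width 2 recovers it.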
SoLM's output is a sequence of key-value pairs, where the key is a data type from the schema, such as "brand" in a product-listing schema, and the value is that data type's value for a particular object, such as the brand of a specific product. Amazon employs special tokens to mark the boundaries between keys and values.
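A record can be flattened into such a sequence roughly as follows. The paper's actual delimiter tokens were lost from the source text, so the `<k>`/`<v>` markers below are placeholders, not SoLM's real special tokens:

```python
def serialize(obj):
    # Flatten a structured object into a delimited key-value sequence.
    # "<k>" and "<v>" are hypothetical stand-ins for the special tokens.
    return "".join(f"<k>{key}<v>{value}" for key, value in obj.items())

def parse(text):
    # Recover the structured object from the serialized sequence.
    obj = {}
    for chunk in text.split("<k>")[1:]:
        key, _, value = chunk.partition("<v>")
        obj[key] = value
    return obj

record = {"brand": "Acme", "color": "red"}
flat = serialize(record)  # "<k>brand<v>Acme<k>color<v>red"
assert parse(flat) == record
```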
In confidence-aware substructure beam search, the atomic unit of the beam search is the key-value pair, not the token. The probability of a key-value pair can be derived from the LLM's confidence in its output. Amazon also experimented, however, with a separately trained confidence-score model, which takes as input the intermediate representation produced by one of the LLM's inner layers. In practice, this approach outperformed relying on the model's own confidence scores alone.
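The idea can be sketched as follows. This is a minimal reconstruction from the description above, not the paper's algorithm: the confidence values are invented, and `score` stands in for either the LLM's own confidence or the separately trained confidence model.

```python
import math

def cabs(candidates, score, width=2, threshold=0.5):
    """Beam search whose atomic unit is a key-value pair, not a token (sketch).

    For each schema key, every candidate value is scored by a confidence
    model; values below `threshold` are pruned rather than emitted, so
    low-confidence (likely hallucinated) facts are dropped entirely.
    """
    beams = [({}, 0.0)]  # (partial object, cumulative log-confidence)
    for key, values in candidates.items():
        expanded = []
        for obj, logp in beams:
            kept = [(v, score(key, v)) for v in values]
            kept = [(v, c) for v, c in kept if c >= threshold]
            if not kept:
                expanded.append((obj, logp))  # omit the attribute entirely
                continue
            for v, c in kept:
                expanded.append(({**obj, key: v}, logp + math.log(c)))
        beams = sorted(expanded, key=lambda b: -b[1])[:width]
    return beams[0][0]

# Invented confidence scores for two attributes of a hypothetical product.
CONF = {("brand", "Acme"): 0.97, ("brand", "Acme Corp"): 0.60,
        ("voltage", "120V"): 0.90, ("voltage", "12V"): 0.30}

result = cabs({"brand": ["Acme", "Acme Corp"], "voltage": ["120V", "12V"]},
              lambda k, v: CONF[(k, v)])
print(result)  # {'brand': 'Acme', 'voltage': '120V'} -- "12V" was pruned
```

Because whole pairs are scored rather than individual tokens, a low-confidence fact is dropped as a unit instead of being partially emitted, which is what drives the recall gain at fixed accuracy reported above.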
Amazon demonstrates that a seven-billion-parameter SoLM model matches or surpasses various prompt-engineering methods applied to much larger foundation models on metrics such as fact completeness, fact accuracy, and the quality and factuality of descriptive content. CABS decoding significantly improves fact accuracy by removing facts hallucinated during decoding.