Friday, March 28, 2025

Post-Training and Test-Time Scaling: Scaling Laws in AI

How Scaling Laws Promote More Intelligent and Potent AI

Scaling laws describe how AI systems improve as training data, model parameters, or compute resources grow.

The science of artificial intelligence was long defined by a single idea: more computation, more training data, and more parameters make a stronger AI model. It functioned much like the widely accepted empirical laws of nature, such as "what goes up must come down" or "every action has an equal and opposite reaction".

But as AI has advanced, that single idea has given way to three distinct laws that describe how different ways of applying compute resources affect model performance. Together, these three AI scaling laws (pretraining scaling, post-training scaling, and test-time scaling, sometimes called long thinking) reflect how the field has evolved techniques for using additional compute across a wide range of increasingly complex AI use cases.

AI reasoning models are a new class of large language models (LLMs) that perform multiple inference passes to work through complex problems while describing the steps required to solve a task. This is made possible by the recent rise of test time scaling, which applies more compute at inference time to improve accuracy. The need for accelerated computing will increase as test time scaling necessitates significant computational resources to support AI reasoning.

Pretraining scaling: What is it?

Pretraining scaling is the original law of AI development. It demonstrated that developers could expect predictable gains in model intelligence and accuracy by increasing the size of the training dataset, the number of model parameters, and the compute resources applied.

These three elements (data, model size, and compute) are interconnected. According to the pretraining scaling law established in early research on the topic, larger models fed more data deliver better overall performance. To make this possible, developers must scale their compute, which calls for powerful accelerated computing resources to handle those larger training workloads.
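One widely used formalisation of this relationship, taken from the Chinchilla scaling study by Hoffmann et al. and offered here only as an illustrative sketch rather than the exact law referenced above, models pretraining loss L as a function of parameter count N and training tokens D:

L(N, D) = E + A / N^α + B / D^β

Here E is the irreducible loss and A, B, α, and β are empirically fitted constants: loss falls predictably as either the model or the dataset grows, provided compute scales with them.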

The pretraining scaling principle produced large models with groundbreaking capabilities. It also spurred major innovations in model architecture, including transformer models with billions and even trillions of parameters, mixture-of-experts models, and new distributed training techniques, all of which demand significant compute.

Furthermore, the pretraining scaling law remains relevant as people continue to generate ever-increasing amounts of multimodal data, which will be utilised to train potent AI models in the future. This data will include text, photos, audio, video, and sensor information.

Pretraining scaling is the foundational principle of AI development, linking the size of models, datasets and compute to AI gains. Mixture of experts, depicted in the accompanying image, is a popular model architecture for AI training.

Post-Training Scaling Laws

Pretraining a large foundation model takes significant investment, skilled experts, and datasets, so it is not an option for everyone. But once an organisation pretrains and releases a model, it lowers the barrier to AI adoption by letting others use the pretrained model as a foundation to adapt for their own applications.

This post-training process drives growing demand for accelerated computing among enterprises and the broader developer community. Popular open-source models can spawn hundreds or thousands of derivative models, trained across multiple domains.

Developing this ecosystem of derivative models for various use cases can require around 30 times more compute than pretraining the original foundation model.

Post-training techniques can further refine a model to improve its specificity and relevance for an organisation's intended use case. While pretraining is like sending an AI model to school to learn foundational skills, post-training adds the skills relevant to the model's intended job. For example, an LLM might be post-trained to handle a task like sentiment analysis or translation, or to understand the terminology of a specific field such as law or healthcare.

According to the post-training scaling law, techniques including fine-tuning, pruning, quantisation, distillation, reinforcement learning, and synthetic data augmentation can further improve a pretrained model's accuracy, compute efficiency, or domain specialisation.

Fine-tuning uses additional training data to tailor an AI model for specific domains and applications. It can draw on an organisation's internal datasets or on pairs of sample model inputs and outputs.
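As a concrete illustration, a minimal supervised fine-tuning loop might look like the sketch below. The base model, hyperparameters, and toy sentiment pairs are placeholders chosen for brevity, not details from the article.

```python
# Minimal supervised fine-tuning sketch: adapt a small pretrained causal LM
# to a toy sentiment task using a handful of input/output pairs.
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = AdamW(model.parameters(), lr=5e-5)

# Toy domain-specific input/output pairs (e.g. sentiment analysis)
pairs = [
    ("Review: Great battery life. Sentiment:", " positive"),
    ("Review: Screen cracked within a week. Sentiment:", " negative"),
]

model.train()
for prompt, target in pairs:
    batch = tokenizer(prompt + target, return_tensors="pt")
    # For causal LMs, the labels are the input ids; the model shifts them internally
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```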

Distillation requires two AI models: a large, complex teacher model and a lightweight student model. In the most common technique, called offline distillation, the student model learns to mimic the outputs of a pretrained teacher model.
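A minimal sketch of the core distillation objective is shown below; it assumes standard PyTorch and is illustrative only, with the teacher and student modules left as placeholders.

```python
# Offline-distillation sketch: the student is trained to match the teacher's
# softened output distribution via KL divergence at a temperature T.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Inside a training step (teacher frozen, student updated):
#   with torch.no_grad():
#       teacher_logits = teacher(batch).logits
#   loss = distillation_loss(student(batch).logits, teacher_logits)
```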

Reinforcement learning (RL) is a machine learning technique that uses a reward model to train an agent to make decisions aligned with a specific use case. As it interacts with an environment, the agent aims to make decisions that maximise cumulative rewards over time; one example is a chatbot LLM that is positively reinforced by users' "thumbs up" reactions. This technique is known as reinforcement learning from human feedback (RLHF). A newer technique, reinforcement learning from AI feedback (RLAIF), streamlines post-training by using feedback from AI models to guide the learning process.
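As a rough illustration of the reward-modelling step behind RLHF, the sketch below trains a reward model on pairwise human preferences (a "chosen" versus a "rejected" response). Here, `reward_model` is a hypothetical module that returns one scalar score per input, not an API from the article.

```python
# Pairwise preference loss used to train a reward model for RLHF:
# the response humans preferred should receive a higher reward score.
import torch.nn.functional as F

def preference_loss(reward_model, chosen_ids, rejected_ids):
    """Encourage reward(chosen) > reward(rejected) via a Bradley-Terry loss."""
    r_chosen = reward_model(chosen_ids)      # shape: (batch,)
    r_rejected = reward_model(rejected_ids)  # shape: (batch,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# The trained reward model then scores the LLM's answers (the "thumbs up"
# signal above), and a policy-optimisation step such as PPO updates the LLM
# to maximise that reward.
```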

With best-of-n sampling, a language model generates several outputs and the one with the highest reward score, as judged by a reward model, is chosen as the final answer. It is commonly used to improve an AI's outputs without modifying model parameters, offering an alternative to fine-tuning with reinforcement learning.
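A minimal best-of-n sketch is below; `llm.generate_response` and `reward_model.score` are hypothetical helpers standing in for whatever generation and reward-scoring interfaces a real system exposes.

```python
# Best-of-n sampling sketch: draw n candidate answers, score each with a
# reward model, and return the highest-scoring one. The model's weights are
# never updated, which is what distinguishes this from fine-tuning.
def best_of_n(prompt, llm, reward_model, n=8):
    candidates = [llm.generate_response(prompt) for _ in range(n)]
    scores = [reward_model.score(prompt, c) for c in candidates]
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]
```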

Search methods explore a range of possible decision paths before settling on a final answer. This post-training technique can iteratively improve the model's responses.

To support post-training, developers can supplement or augment their fine-tuning datasets with synthetic data. Adding AI-generated data to real-world datasets helps models better handle edge cases that are absent or under-represented in the original training data.
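A simple way to picture this is the sketch below, where a generator LLM is prompted to produce extra examples of an under-represented edge case; `generator.generate` is a hypothetical helper, not a specific library call.

```python
# Synthetic data augmentation sketch: ask a generator LLM for additional
# examples of an edge case, then append them to the real fine-tuning set.
def augment_with_synthetic(real_examples, generator, edge_case, n=100):
    prompt = (f"Write one realistic training example of: {edge_case}. "
              "Return only the example text.")
    synthetic = [generator.generate(prompt) for _ in range(n)]
    # Synthetic examples supplement, rather than replace, real-world data
    return real_examples + synthetic
```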

Post-training scaling
Image credit to NVIDIA

Test Time Scaling Law

LLMs generate quick responses to input prompts. While this approach works well for simple questions, it may fall short when a user poses more complex queries. Answering complex questions, an essential capability for agentic AI workloads, requires the LLM to reason through the question before producing an answer.

This is comparable to how most people think: asked to add two plus two, they answer instantly, without walking through the fundamentals of addition or integers. But asked on the spot to devise a business plan that could raise a company's profits by 10%, a person would likely think through several options and give a multi-step answer.

Test-time scaling, also known as long thinking, takes place during inference. In contrast to traditional AI models that rapidly produce a one-shot answer to a user query, models applying this technique allocate extra computational effort during inference, allowing them to reason through several potential responses before arriving at the best one.

This AI reasoning process can take several minutes or even hours on tasks such as generating complex, customised code for developers, and can require more than 100x the compute of a single inference pass on a traditional LLM, which would be very unlikely to answer a complex problem correctly on the first try.

Test-time compute lets AI models explore multiple approaches to a problem and break complex requests down into several parts; in many cases, they can even show their work to the user as they reason. Research has shown that test-time scaling yields higher-quality answers when AI models are given open-ended prompts that require multiple steps of planning and reasoning.

There are several approaches to test-time compute, including:

  • Chain-of-thought prompting: breaking down complex problems into a series of simpler steps.
  • Sampling with majority vote: generating multiple answers to the same prompt and selecting the most frequently occurring response as the final output (see the sketch after this list).
  • Search: exploring and evaluating multiple paths in a tree-like structure of responses.
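For instance, sampling with majority vote (often called self-consistency) can be sketched as follows; `llm.sample` and `extract_final_answer` are hypothetical helpers for drawing a chain-of-thought sample and pulling out its final answer.

```python
# Majority-vote (self-consistency) sketch: sample several chain-of-thought
# answers at inference time and return the most common final answer.
from collections import Counter

def majority_vote_answer(question, llm, n=16):
    prompt = f"{question}\nLet's think step by step."
    answers = [extract_final_answer(llm.sample(prompt)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```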

Post-training techniques such as best-of-n sampling can also be used for long-thinking inference to optimise responses in line with human preferences or other objectives.

Test-time scaling
Image credit to NVIDIA

How Test-Time Scaling Enables AI Reasoning

The rise of test-time compute enables AI to deliver well-reasoned, useful, and more accurate responses to complex, open-ended user queries. These capabilities will be essential for the intricate, multi-step reasoning tasks expected of autonomous agentic AI and physical AI applications. Across industries, they could boost productivity and efficiency by giving users highly capable assistants that speed up their work.

In healthcare, models using test-time scaling could analyse vast amounts of data, extrapolate how a disease will progress, and predict potential outcomes of new treatments based on the chemical structure of a drug molecule. They could also comb a database of clinical trials to suggest treatments matching an individual's medical profile, while explaining their reasoning about the benefits and drawbacks of different studies.

In retail and supply chain logistics, long thinking can support the complex decision-making needed to address both short-term operational challenges and long-term strategic goals. By predicting and evaluating multiple scenarios simultaneously, reasoning techniques can help businesses reduce risk and tackle scalability challenges. This could enable more accurate demand forecasting, more efficient supply chain routes, and sourcing decisions that align with an organization's sustainability initiatives.

Global enterprises could also use this technique to draft detailed business plans, generate complex code for software debugging, or optimise travel routes for delivery trucks, warehouse robots, and robotaxis.

AI reasoning models are evolving quickly. In recent weeks, Google DeepMind's Gemini 2.0 Flash Thinking, OpenAI's o1-mini and o3-mini, and DeepSeek R1 have all been unveiled, with more new models expected to follow shortly.

In order to provide the next generation of AI reasoning tools that can support complex problem-solving, coding, and multistep planning, businesses must scale their accelerated computing resources. Models such as these require significantly more computation to reason during inference and generate accurate answers to complex questions.
