Saturday, July 6, 2024

AI’s impact on PC memory and storage

Practical Benefits of AI on the PC

AI is everywhere. Hardly a day goes by without hearing or seeing something about it. AI is transforming how we interact with technology, from smart devices to self-driving cars. What about the PC? Can AI make the PC quicker, smarter, and more personalized? This blog discusses AI's impact on PC memory and storage. At CES 2024, AI dominated more than 50% of the coverage.

AI relies on large language models (LLMs) trained on massive amounts of unlabeled human-written text. These artificial neural networks, with billions of parameters and often several networks working together, generate content in response to natural-language queries that mimics human replies. ChatGPT and DALL-E, which generate realistic and imaginative text and images from user prompts, are popular examples. Although impressive, these LLMs demand a great deal of computing power and data. Most run in the cloud to leverage its huge compute and network capacity.

AI does not have to stay in the cloud, though. Moving some AI processing to consumer devices can be helpful for several reasons: edge AI can improve latency, privacy, network costs, and offline functionality. Imagine using your PC to create high-quality content, edit images and videos, transcribe speech, filter noise, recognize faces, and more without touching the cloud. How cool would that be?

Why the PC?

Edge AI benefits more than just PCs. AI can improve the functionality and performance of smartphones, tablets, smartwatches, and other devices. But the PC has unique capabilities that make it a good edge AI platform. First, PCs have large screens that can show more information and improve the user experience. Second, PCs' larger batteries can sustain longer, more demanding AI workloads. Third, PCs can handle more complex AI models thanks to their superior processing power.

Chipmakers and software developers have taken note of these benefits. Intel, AMD, Qualcomm, MediaTek, and Nvidia are incorporating powerful compute engines and/or integrated graphics into PC CPUs and chipsets to deliver tens of TOPS of AI performance. Microsoft also plans to optimize the Windows 11 OS for CPUs with integrated AI engines this year.

That is hardly surprising given Microsoft's push on Copilot, an AI-powered service that helps users write code, troubleshoot issues, and get recommended improvements. Some of these companies are partnering with ISVs to enable AI-optimized apps including video conferencing, picture editing, speech-to-text conversion, background-noise reduction, and face recognition. Whether these apps under construction will impress, or whether the killer app has yet to arrive, remains unknown. However, major questions remain. How can we run AI models on the PC effectively? And …

How does it affect PC hardware?

Model size is a major issue when running AI on the PC. AI models, particularly LLMs, need a lot of storage and memory to hold and load billions or trillions of parameters. Internal investigations reveal that a state-of-the-art LLM for natural language generation, a 70-billion-parameter Llama 2 model at 4-bit precision, requires 42GB of RAM for loading and inference and outputs 1.4 tokens/second. An average PC lacks this much memory. This outlines the issue and the path forward: function-specific models will reduce size without compromising accuracy.
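As a rough illustration, the memory footprint can be estimated from the parameter count and the quantization precision. The sketch below is a back-of-the-envelope calculation, not vendor data, and the 1.2× headroom for activations and runtime overhead is an assumption:

```python
def model_memory_gb(params_billion: float, bits_per_param: int,
                    overhead_factor: float = 1.2) -> float:
    """Rough estimate of RAM needed to load a model for inference.

    params_billion   -- parameter count in billions (e.g. 70 for Llama 2 70B)
    bits_per_param   -- quantization precision (e.g. 4 for 4-bit weights)
    overhead_factor  -- assumed headroom for activations, KV cache, runtime
    """
    weight_bytes = params_billion * 1e9 * bits_per_param / 8
    return weight_bytes * overhead_factor / 1e9

# 70B parameters at 4-bit precision: ~35GB of weights, ~42GB with overhead,
# which lines up with the figure cited above.
print(f"{model_memory_gb(70, 4):.0f} GB")   # -> 42 GB
# A ~3B-parameter model at 4-bit precision fits in roughly 2GB.
print(f"{model_memory_gb(3, 4):.1f} GB")    # -> 1.8 GB
```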

There will likely be a bifurcation: huge 70-billion-parameter models can run chat completions and optimized conversational use cases on premium systems with plenty of memory and storage. A local on-device personal assistant may also require a large-parameter model. Models with fewer than 10B parameters may be used on mainstream devices, require less memory (~2GB), and handle language tasks such as text completion, list completion, and categorization.

Model size affects PC RAM at a minimum. Equally crucial are bandwidth and energy efficiency. Both dimensions benefit from the PC (especially mobile PCs) switching from DDR to LPDDR. Compared to DDR5, LPDDR5X uses 44-54% less power during active use and 86% less power during self-refresh, with a bandwidth of 6.4Gb/s versus 4.8Gb/s.
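Bandwidth matters because, for a memory-bound LLM, generating each token requires streaming roughly the entire set of weights from RAM. The sketch below is a simplified upper-bound estimate; the ~100GB/s effective system bandwidth is a hypothetical figure, not a measured one:

```python
def max_tokens_per_second(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed for a memory-bound LLM.

    Each generated token requires reading (roughly) the full set of weights,
    so throughput is capped at bandwidth / model size.
    """
    return bandwidth_gb_s / model_size_gb

# Hypothetical numbers: a 42GB model on a system with ~100GB/s of effective
# memory bandwidth tops out at a couple of tokens per second, in the same
# ballpark as the 1.4 tokens/s figure quoted earlier.
print(f"{max_tokens_per_second(42, 100):.1f} tokens/s")   # -> 2.4 tokens/s
```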

If AI on the PC takes off quickly, the LPDDR5 transition will be faster. Research is also underway to increase energy efficiency by moving certain computation into memory. That will take a while, if it happens at all. The software stack can only be established once the industry converges on a common set of primitives to offload to memory, and a single set of primitives may not suit all applications. Thus, processing in PC memory currently raises more questions than answers.

Where will AI models find their sweet spot? If model sizes remain large, can memory be reduced and only a portion of the model kept resident? If so, rotating model pieces in and out will need more storage bandwidth. This may hasten the adoption of Gen5 or Gen6 PCIe storage in mainstream PCs. A recent Apple paper, "LLM in a flash: Efficient Large Language Model Inference with Limited Memory" by Alizadeh et al., suggests a way to run LLMs on devices with limited DRAM. The authors recommend storing model parameters in flash memory and calling them into DRAM as needed.

They also suggest optimizing data transfer volume and read throughput to boost inference speed. The paper's main metric for assessing flash-loading strategies is latency, which accounts for the I/O cost of loading from flash, the overhead of managing memory with freshly loaded data, and the compute cost of inference. The research addresses the problem of running LLMs that exceed DRAM capacity by storing model parameters in flash memory and bringing them into DRAM on demand.
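A minimal sketch of that idea follows. It is a heavy simplification for illustration, not the paper's actual implementation: all weights live in "flash" (here, files on disk), only the layers currently needed sit in a bounded DRAM cache, and the flash I/O shows up in the measured per-pass latency.

```python
import os
import time
import numpy as np

class FlashBackedWeights:
    """Toy illustration of on-demand parameter loading: weights live in
    'flash' (files on disk) and are pulled into a bounded DRAM cache only
    when a layer is needed."""

    def __init__(self, layer_files, dram_budget_layers=2):
        self.layer_files = layer_files          # per-layer weight files on disk
        self.dram_budget = dram_budget_layers   # how many layers fit in DRAM
        self.cache = {}                         # layer index -> ndarray

    def get_layer(self, idx):
        if idx not in self.cache:
            if len(self.cache) >= self.dram_budget:
                # Evict the oldest cached layer to stay within the DRAM budget.
                self.cache.pop(next(iter(self.cache)))
            # Flash I/O: loading from disk is the cost the paper's latency
            # metric is concerned with.
            self.cache[idx] = np.load(self.layer_files[idx])
        return self.cache[idx]

def timed_forward(weights, x, n_layers):
    """Dummy forward pass that reports combined load + compute time."""
    start = time.perf_counter()
    for i in range(n_layers):
        w = weights.get_layer(i)     # may hit flash, may hit the DRAM cache
        x = np.tanh(x @ w)           # stand-in for real transformer compute
    return x, time.perf_counter() - start

if __name__ == "__main__":
    # Hypothetical setup: a few small layers on disk stand in for flash.
    files = []
    for i in range(8):
        path = f"layer_{i}.npy"
        np.save(path, np.random.randn(64, 64).astype(np.float32))
        files.append(path)
    out, secs = timed_forward(FlashBackedWeights(files), np.random.randn(1, 64), 8)
    print(f"forward pass with on-demand loading took {secs * 1000:.1f} ms")
    for path in files:
        os.remove(path)
```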

AI on the PC will keep advancing. We are at the start of NPU integration into CPUs and discrete GPUs. AI accelerator cards from Kinara, Memryx, and Hailo can offload AI processing on the PC. Function-specific, smaller, optimized models may also emerge. These models must be cycled from storage to memory on demand, so the storage implications are comparable to running a big model.

Benefits of discrete NPUs include:

  • They use less power and generate less heat than a CPU or GPU when running complex AI models.
  • They speed up and improve image recognition, generative AI, chatbots, and other AI applications.
  • They may boost overall CPU and GPU performance and improve the user's AI experience.

Lenovo says its ThinkCentre Neo Ultra desktop, launching in June 2024, uses such cards to deliver more power-efficient and capable AI processing than CPU and GPU alternatives.

TOPS alone can be a misleading figure of merit. The number of inferences per unit time, accuracy, and energy efficiency matter most. For generative AI this might be measured in tokens per second, or in seconds per Stable Diffusion image. Benchmarking these will be necessary for industry acceptance. At CES, for instance, every CPU vendor exhibit and discrete NPU demonstration claimed its implementation was the best.
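One way to make such comparisons concrete is to report throughput, energy efficiency, and accuracy together instead of raw TOPS. The sketch below uses purely hypothetical numbers and metric names, not any vendor's results or an established benchmark:

```python
from dataclasses import dataclass

@dataclass
class InferenceRun:
    """One benchmark run on a given device (hypothetical numbers)."""
    name: str
    tokens_generated: int
    seconds: float
    joules: float        # energy drawn during the run
    accuracy: float      # task accuracy on a fixed evaluation set

    @property
    def tokens_per_second(self) -> float:
        return self.tokens_generated / self.seconds

    @property
    def tokens_per_joule(self) -> float:
        return self.tokens_generated / self.joules

# Two made-up devices: the faster one is not necessarily the more efficient one.
runs = [
    InferenceRun("discrete NPU card", 512, 20.0, 180.0, 0.87),
    InferenceRun("discrete GPU",      512, 14.0, 420.0, 0.87),
]
for r in runs:
    print(f"{r.name}: {r.tokens_per_second:.1f} tok/s, "
          f"{r.tokens_per_joule:.2f} tok/J, accuracy {r.accuracy:.2f}")
```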

AI on the PC is eagerly anticipated. OEMs see it as an opportunity to refresh PCs and add higher-value content. Intel aims to enable 100M AI PCs by 2025, or about 30% of the PC TAM. Regardless of the adoption rate, consumers have something to look forward to in 2024.
