Friday, December 6, 2024

Hyperscale NAS Drives GPU Computing For HPC And AI

High-performance computing (HPC) and artificial intelligence (AI) have transformed data processing and analysis for organisations in recent years, enabling them to tackle difficult problems more quickly and precisely. But outdated enterprise storage infrastructure can impede data flow and prevent data from reaching GPU-rich compute resources. As a result, enterprises are rushing to update their data architectures while making the most of their existing infrastructure investments.

Developing, training, and iterating AI models requires access to vast volumes of data, delivered at exceptionally high speed to the gigantic GPU clusters that process it. The needs of enterprise AI, machine learning, and deep learning programs, along with the growing popularity of GPU computing on-premises and in the cloud, are difficult for legacy NAS infrastructures to meet.

Bringing Order to the Chaos of Data

The methods for continuing to derive value from structured data are widely recognised, and the use of data analytics, BI software, and data warehouses for structured data is a well-established industry. Recent developments in generative artificial intelligence (GenAI) and related deep learning (DL) technologies, on the other hand, offer the potential to uncover value hidden in unstructured data.

These AI workloads will not only help data owners determine what they have, what they should preserve, and what they can delete; AI/DL use cases may also enable businesses to find previously undiscovered value in massive amounts of unstructured file data. Organisations can finally extract commercial value and improve operational efficiency by leveraging both their structured and unstructured digital assets.

The issue is that the walls dividing silos of unstructured data severely constrain organisations’ ability to adopt AI initiatives quickly without incurring unmanageable costs and complexity. They require the flexibility to feed GPU-powered compute clusters for AI/DL workflows with any or all of the data from several incompatible storage silos. Historically, achieving this has required consolidating data into brand-new, high-performance storage repositories, which significantly increases a project’s financial burden.

These high capital and operating costs slow the implementation of AI initiatives and cast doubt on their projected return on investment. Organisations simply cannot afford to replace their current infrastructure and move their unstructured data to a new platform in order to execute an AI strategy.

Cracking the AI Mysteries by Solving the Silo Problem

Artificial intelligence workflows have several stages, and AI and deep learning use cases vary substantially by sector and intended result. With unstructured data, sentiment analysis of text or activity recognition in video for the targeted placement of advertisements is a very different problem from medical image analysis for disease identification. Prediction models that rely on video and other sensor data to improve autonomous vehicle behaviour or streamline manufacturing automation will differ from inferencing workloads that analyse satellite imagery to estimate agricultural yields or inform decisions about irrigation and water management.

Despite the wide range of AI use cases, one thing they all have in common is the requirement to gather data from numerous, varied sources, frequently in separate locations. The core issue is that data access is always routed through a file system at some point, for both human users and AI/DL applications. The file system’s metadata, which serves as a bridge between raw data and the file structure that users and applications see, is what makes this possible.

The problem is that, with the advent of network-attached storage (NAS), the file system is embedded in each vendor’s storage infrastructure. That metadata lives in incompatible, vendor-specific variants of the underlying file systems, even though different vendors expose the file/folder structure through industry-standard NFS or SMB file access protocols.

This issue is especially serious for AI/DL workloads, because combining data from various sources into a single, unified view is a crucial first step. To determine which files should be pipelined into the process, AI workloads need to be able to categorise and/or label the entire dataset. When access to consolidated datasets is required, this increases the cost and complexity of AI projects.
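As a rough illustration of that cataloguing step, the Python sketch below walks several hypothetical silo mount points and builds one unified index from their file metadata. The mount paths and the labelling rule are placeholders, not part of any specific product.

```python
import time
from pathlib import Path

# Hypothetical mount points for three incompatible storage silos,
# each exposed to this host over NFS or SMB.
SILOS = ["/mnt/nas-vendor-a", "/mnt/nas-vendor-b", "/mnt/object-gateway"]

def build_catalog(silos):
    """Walk each silo and collect a single, unified view of its files."""
    catalog = []
    for root in silos:
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            info = path.stat()
            catalog.append({
                "path": str(path),
                "silo": root,
                "size_bytes": info.st_size,
                "modified": time.ctime(info.st_mtime),
                # Toy labelling rule for illustration: classify by suffix.
                "label": "image" if path.suffix in {".jpg", ".png", ".dcm"} else "other",
            })
    return catalog

if __name__ == "__main__":
    catalog = build_catalog(SILOS)
    # Select only the files the AI pipeline should ingest.
    training_set = [entry for entry in catalog if entry["label"] == "image"]
    print(f"{len(training_set)} of {len(catalog)} files selected for the pipeline")
```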

Feeding the GPU Beast: Maximising GPU Computing for AI Pipelines

However, breaking down silos to gain global, uniform access to an organisation’s data is not, by itself, enough to address the AI issue. The main issue is how to supply high-performance GPU clusters, which may be located remotely, in the cloud, or on-premises, without making the data-copy and silo problem worse. In other words, how can enterprises feed GPU-based computing without transferring all of their data into a costly, brand-new, purpose-built high-performance data silo?

A new class of NAS architecture known as hyperscale NAS is built on open standards that are included in all common Linux distributions now in use in the enterprise. This means businesses can accelerate their existing scale-out NAS infrastructure, using data already in place to feed GPU clusters for AI workloads located on-site or in the cloud.

“Hyperscale NAS is not a product; it’s a new NAS architecture model based on open standards available in all standard Linux distributions used in the industry today,” says Marc Staimer of theCUBE Research. The essential components enabling standards-based, HPC-level parallel file system performance on commodity hardware and legacy storage, from any data centre or cloud provider, are already in the Linux kernel that runs on servers and hypervisors in almost every data centre: specifically, the pNFS v4.2 client with Flex Files layouts, used in tandem with NFSv3.
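As a hedged illustration of how little client-side machinery this requires (the server address, export path, and mount point below are placeholders, and this is not vendor documentation), a standard Linux NFS v4.2 mount is all that is needed; pNFS and the Flex Files layout are negotiated automatically when the server supports them.

```python
import subprocess

# Placeholder values: substitute a real metadata server export and mount point.
SERVER_EXPORT = "nas-metadata.example.com:/share"
MOUNT_POINT = "/mnt/hyperscale"

# A standard Linux NFS v4.2 mount; pNFS and the Flex Files layout are
# negotiated automatically when the server advertises support for them.
subprocess.run(
    ["mount", "-t", "nfs", "-o", "vers=4.2", SERVER_EXPORT, MOUNT_POINT],
    check=True,
)

# Confirm the negotiated options (including vers=4.2) from /proc/mounts.
with open("/proc/mounts") as mounts:
    for line in mounts:
        if MOUNT_POINT in line:
            print(line.strip())
```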

Hyperscale NAS is the first design to combine standards-based connectivity, enterprise NAS capabilities, and the high performance and linear scalability of HPC parallel file systems. It suits any use case that needs parallel processing and high-throughput, low-latency data access, and it is especially well-suited to powering GPU computing at scale for use cases such as generative AI training.

Hyperscale NAS performance scales linearly to thousands of storage nodes, but standard scale-out NAS architectures begin to plateau as data volumes increase.

Even with live data, the data movements in AI workflows, such as moving data to centres of excellence for cleansing, to a remote data centre for training, or to another location or cloud-based high-performance computing resource for inferencing, can all be automated in the background without interfering with user or application access.

This capability is especially crucial because AI workloads usually require several passes across multiple datasets. Inferencing workloads frequently need HPC-class infrastructure with GPU clusters, which could be on-site or a temporary cloud-based cluster assembled for the task. Additional custom metadata tags that identify the algorithm employed, or other variables required to track the results or recreate the process for later iterations, can be applied automatically at each stage of the workflow.
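As one possible sketch of such tagging (assuming a Linux file system mounted with user extended-attribute support; the attribute names, values, and file path are purely illustrative), stage-specific tags could be attached directly to result files:

```python
import os

# Illustrative tags recording which algorithm and run produced a result file.
TAGS = {
    "user.ai.algorithm": b"resnet50",
    "user.ai.run_id": b"2024-12-06-rev3",
    "user.ai.stage": b"inference",
}

def tag_file(path, tags):
    """Attach custom metadata tags to a file as extended attributes."""
    for name, value in tags.items():
        os.setxattr(path, name, value)

def read_tags(path):
    """Return all 'user.ai.*' tags currently set on a file."""
    return {
        name: os.getxattr(path, name)
        for name in os.listxattr(path)
        if name.startswith("user.ai.")
    }

if __name__ == "__main__":
    result_file = "results/predictions.parquet"   # hypothetical pipeline output
    tag_file(result_file, TAGS)
    print(read_tags(result_file))
```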

Additionally, the ability to automatically organise and safeguard training data on low-cost resources is crucial, since many industries, including pharmaceuticals, financial services, and biotechnology, need training data and the resulting models to be archived for compliance and legal reasons. Retrieving previous model data from an archive is then a straightforward process that can be automated in the background, thanks to custom metadata tags that track data provenance, iteration information, and other workflow details.
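Continuing the same illustrative scheme, retrieving an archived run then reduces to scanning the archive tier for files whose provenance tag matches; the archive path and tag name below are again hypothetical.

```python
import os
from pathlib import Path

ARCHIVE_ROOT = "/mnt/archive/training-runs"   # hypothetical low-cost archive tier
WANTED_RUN = b"2024-12-06-rev3"

def find_run_files(root, run_id):
    """Return archived files whose provenance tag matches the given run."""
    matches = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            if os.getxattr(path, "user.ai.run_id") == run_id:
                matches.append(path)
        except OSError:
            continue  # file carries no such tag
    return matches

print(find_run_files(ARCHIVE_ROOT, WANTED_RUN))
```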

In this way, data scientists can be given direct, self-service control over every phase of the AI pipeline across multiple sites, storage silos, and cloud environments, without having to ask IT administrators for help obtaining data or becoming involved in IT infrastructure management themselves. Because the data remains easily accessible from existing storage resources, these workflows can make use of it without replacing outdated storage systems with new infrastructure.

According to Steve McDowell, Chief Analyst & CEO of NAND Research, Hammerspace’s Hyperscale NAS, for instance, “addresses the complex demands of modern high-performance computing, particularly in AI, machine learning, and GPU-intensive tasks.” Hyperscale NAS can support training AI models across thousands of storage nodes and GPUs, positioning it as a promising approach for the industry. It is unique not only in its performance but also in its architecture, which can boost the performance of existing NAS systems without requiring changes to them.

To sum up

The rapid shift to support AI/DL workloads has created difficulties that exacerbate the silo issues IT organisations have long faced. With its fundamentally different NAS architecture, hyperscale NAS enables enterprises to leverage cutting-edge HPC technologies without sacrificing enterprise standards.
