Saturday, July 6, 2024

OCP Hardware Standards Promote AI Development

Promoting standardization to advance AI and datacenter infrastructure through OCP

Microsoft has been an active participant in the Open Compute Project (OCP) community, bringing the benefits of open-source collaboration to hardware and helping drive hyperscale innovation across the industry. At last year's OCP Global Summit, we debuted Project Caliptra together with industry heavyweights such as Google, AMD, and NVIDIA.

We also unveiled Mount Shasta, a modular chassis design that integrates form factor, power, and the management interface into a single design. Both contributions drew on years of prior innovation in systems-level design, security, and rack-level architecture.

The emergence of generative AI has presented a substantial challenge for the computing industry, forcing a rethink of fundamental building blocks to keep pace with rising infrastructure demands. At this year's OCP Global Summit, Microsoft will present its latest innovations in supercomputing architecture and hardware, with the goal of advancing this new era through standardization and open innovation.

Standardizing GPUs and accelerators for rapid adoption in hyperscaler fleets

The rise of generative AI applications has accelerated the adoption of graphics processing units (GPUs) and accelerators in datacenters. Given the variety of new products and their accompanying integration requirements, hyperscalers must now invest in custom processes and tooling to onboard each vendor's AI hardware into their fleets.

To address this challenge, we are pleased to be working with AMD, Google, Meta, and NVIDIA to develop OCP standard requirements for GPU management.

Standardization lets suppliers interoperate smoothly with hyperscalers, and lets hyperscalers quickly host hardware from a variety of suppliers in their datacenters. This new OCP effort focuses on the Universal Base Board (UBB) and Discrete variants of accelerator and GPU cards.

Several OCP workgroups have produced initial standards covering management interfaces, hardware Reliability, Availability, and Serviceability (RAS) requirements, and GPU firmware update requirements.

A novel aspect of this effort is treating compliance as a first-class driver of innovation, using a popular OCP tool that performs acceptance testing for accelerator management in cloud datacenters.
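To make the value of a common management interface concrete, here is a minimal sketch of how an operator might enumerate accelerator firmware versions over a Redfish-style endpoint, the kind of vendor-neutral interface this workstream builds on. The BMC address, credentials, and the assumption that the accelerator's firmware appears under FirmwareInventory are illustrative; this is not taken from the OCP specification itself.

```python
"""Minimal sketch: list firmware versions from a BMC that exposes a
DMTF Redfish-style interface. The host, credentials, and the idea that a
given accelerator shows up under FirmwareInventory are illustrative
assumptions, not text from the OCP GPU management specification."""
import requests

BMC = "https://bmc.example.internal"   # hypothetical BMC address
AUTH = ("operator", "password")        # placeholder credentials

def list_firmware(session: requests.Session) -> None:
    # FirmwareInventory is a standard Redfish collection under UpdateService.
    inv = session.get(f"{BMC}/redfish/v1/UpdateService/FirmwareInventory",
                      verify=False).json()
    for member in inv.get("Members", []):
        item = session.get(f"{BMC}{member['@odata.id']}", verify=False).json()
        print(f"{item.get('Name', '?'):40s} {item.get('Version', '?')}")

if __name__ == "__main__":
    with requests.Session() as s:
        s.auth = AUTH
        list_firmware(s)
```

When every vendor exposes the same schema, the same acceptance tests and fleet tooling can run unchanged across suppliers, which is what the compliance-driven approach above relies on.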

AI performance and efficiency optimization with MX data formats

As AI is integrated into every area of our lives, the need for more efficient, scalable, and affordable AI systems is clear. Meeting the ever-increasing complexity and demands of today's AI models requires optimization across the AI stack, including advances in narrow-precision AI data formats.

Narrow-precision formats and the algorithms optimized for them are advances in AI hardware technology that offer unprecedented opportunities to address the fundamental challenges of scaling AI systems sustainably.

The Microscaling Formats (MX) Alliance was established earlier this year by Microsoft in collaboration with AMD, Arm, Intel, Meta, NVIDIA, Qualcomm, and others with the aim of developing and standardizing next-generation 6- and 4-bit data types for AI training and inference.

Microscaling technology, which builds on years of design-space exploration and development at Microsoft, enables sub-8-bit formats while also improving the robustness and ease of use of existing 8-bit formats such as FP8 and INT8. By improving the energy efficiency of AI in datacenters and on AI endpoints, these advances also support broader sustainability goals, such as reducing the environmental footprint of AI technology as demand grows.

The Microscaling Formats (MX) Specification v1.0, released through OCP, introduces four standard data formats (MXFP8, MXFP6, MXFP4, and MXINT8) that are compatible with today's AI stacks, allow implementation flexibility across hardware and software, and enable fine-grained microscaling at the hardware level.
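As a rough illustration of what microscaling means in practice, the sketch below emulates MXINT8-style quantization in NumPy: each block of 32 values shares a single power-of-two scale, and the elements themselves are stored as 8-bit integers. The rounding rules, the E8M0 scale encoding, and the special-value handling defined in the MX v1.0 specification are simplified away, so treat this as a conceptual sketch rather than a reference implementation.

```python
"""Simplified sketch of block-wise microscaling quantization in the spirit
of MXINT8: each block of 32 values shares one power-of-two scale and the
elements are stored as 8-bit integers. Spec-level details (E8M0 scale
encoding, rounding modes, special values) are intentionally omitted."""
import numpy as np

BLOCK = 32  # block size used by the MX v1.0 formats

def mxint8_quantize(x: np.ndarray):
    # Assumes len(x) is a multiple of the block size.
    x = x.reshape(-1, BLOCK)
    max_abs = np.max(np.abs(x), axis=1, keepdims=True)
    # Shared per-block scale, restricted to a power of two.
    shared_exp = np.floor(np.log2(np.where(max_abs == 0, 1.0, max_abs)))
    scale = 2.0 ** (shared_exp - 6)  # leave headroom for the int8 range
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def mxint8_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    v = np.random.randn(4 * BLOCK).astype(np.float32)
    q, s = mxint8_quantize(v)
    err = np.abs(mxint8_dequantize(q, s).ravel() - v).max()
    print(f"max abs round-trip error: {err:.5f}")
```

Restricting the shared scale to a power of two is part of what keeps the hardware cost low: applying or removing the scale is an exponent adjustment or shift rather than a full multiply.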

Extensive research by the Microsoft AI team shows that MX formats can be readily applied to a wide range of practical workloads, including language models, computer vision, and recommender systems. MX technology also enables LLM pre-training at 6- and 4-bit precisions without changes to conventional training recipes. In addition to the core specification, a whitepaper and emulation libraries have been released with further details.

OCP-SAFE: Improving datacenter security and transparency

Modern datacenters are built from a wide range of processing units and peripherals that run firmware. Ensuring the security of this firmware is critical, and it requires thorough verification of both code quality and supply-chain provenance.

Many datacenter operators commission internal or external security audits of device firmware to meet the specific requirements of Cloud Service Providers (CSPs) and other market segments. However, this approach often means the resulting security assurances benefit only a single provider.

To solve this problem, Microsoft and Google worked with OCP to develop the OCP Security Appraisal Framework Enablement (OCP-SAFE). By standardizing security requirements and integrating Security Review Providers (SRPs) to deliver independent assurance, the framework empowers hardware makers to meet security expectations across markets while improving product quality.

OCP-SAFE lowers the barriers to obtaining hardware security assurance for end users by publishing concise assessment results. Datacenter operators and customers alike can use these assessments to make informed deployment decisions about the security of components. Several companies, including AMD and SK hynix, have already adopted OCP-SAFE and provide short-form security assessments.
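As a conceptual illustration of how an operator might consume an OCP-SAFE-style short-form review, the sketch below checks that a firmware image's digest matches the digest recorded in a published assessment before allowing deployment. The JSON field names and report layout here are hypothetical, not the actual OCP-SAFE schema; the point is that an independent, machine-readable assessment lets operators gate deployments on reviewed firmware.

```python
"""Hedged sketch: gate deployment of a firmware image on a published,
OCP-SAFE-style short-form security review. The report fields
("firmware_sha384", etc.) are a hypothetical layout for illustration."""
import hashlib
import json
from pathlib import Path

def firmware_is_reviewed(image_path: str, report_path: str) -> bool:
    digest = hashlib.sha384(Path(image_path).read_bytes()).hexdigest()
    report = json.loads(Path(report_path).read_text())
    # Deploy only if the image digest matches the digest the reviewer assessed.
    return digest == report.get("firmware_sha384")

if __name__ == "__main__":
    ok = firmware_is_reviewed("device_fw_v2.bin", "srp_report.json")
    print("reviewed firmware" if ok else "no matching review, hold deployment")
```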

Visitors to this year’s OCP Global Summit are invited to stop by Microsoft’s booth #B7 to check out some of our most recent cloud hardware demos that include work from partners in the OCP community, such as:

  • Virtual Client for Azure, Microsoft’s open-source, standardized library of industry benchmarks and cloud customer workloads.
  • Caliptra 1.0, the latest version of the open-source, reusable silicon IP block providing a Root of Trust for Measurement (RTM); a conceptual sketch of the measurement pattern follows this list.
  • Mount Shasta, the latest open-source modular chassis design for Open Rack V3.
  • QSFP-DD 1.6T, a new backwards-compatible form factor standard that delivers 224 Gbps per lane of mated performance using PAM4 signaling, for an aggregate bandwidth of 1.6 Tbps.
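For context on the Caliptra demo above, a Root of Trust for Measurement works by hashing each firmware stage into a running measurement register, so the final value attests to everything that was loaded. The sketch below shows that generic extend pattern; it is a conceptual illustration of the idea, not Caliptra's actual firmware or register layout.

```python
"""Minimal sketch of the measurement-extend pattern behind a Root of Trust
for Measurement: each boot stage is hashed into a running register, so the
final value reflects the entire load sequence. Generic illustration only;
not Caliptra's implementation. SHA-384 is assumed as the hash."""
import hashlib

def extend(register: bytes, stage_image: bytes) -> bytes:
    # New measurement = H(old_register || H(stage_image)).
    return hashlib.sha384(register + hashlib.sha384(stage_image).digest()).digest()

if __name__ == "__main__":
    reg = bytes(48)  # measurement register starts at zero
    for stage in (b"bootloader image", b"firmware image", b"os loader image"):
        reg = extend(reg, stage)
    print("final measurement:", reg.hex())
```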
