Thursday, December 19, 2024

G593-SD1 & ZD1: High-Capacity, Liquid-Cooled GPU Servers

Customized Cooling for the G593 Series

The GPU-centric G593 series is built around an 8-GPU baseboard and supports both air and liquid cooling. Its compact 5U chassis is billed as the most readily scalable in the industry, fitting up to 64 GPUs in a single rack and supporting 100kW of IT hardware. Consolidating hardware this way reduces the data center footprint. The direct liquid cooling (DLC) versions of the G593 series were developed in response to growing customer demand for better energy efficiency. Because liquids have a higher thermal conductivity than air, they remove heat from hot components quickly and efficiently and keep operating temperatures lower. In addition, a data center that relies on water and heat exchangers uses less energy overall.

“With the NVIDIA HGX H200 GPU, we provide an excellent GIGABYTE solution for scaling AI,” stated Vincent Wang, vice president of sales at Giga Computing. “It is necessary to make sure the infrastructure can handle the computational demand and complexity of AI/ML and data science models due to the complexity of business data centers. Increasing optimization is required due to the growing complexity. We are able to create and fund scalable AI infrastructure. Additionally, by working with the NVIDIA AI Enterprise (NVAIE) platform, we can handle every facet of AI data center infrastructure services, from software stack deployment to overall coverage.”

GIGABYTE has now launched air-cooled and DLC-compatible variants of its G593 series for the NVIDIA HGX H200 and NVIDIA HGX H100 platforms. Future GIGABYTE servers with the NVIDIA HGX B200A architecture will also be available with liquid or air cooling. To meet the need for a full supercluster of 256 NVIDIA H100 GPUs, GIGABYTE has already launched GIGAPOD for rack-scale deployment of all these NVIDIA HGX systems. The DLC configuration consists of five racks, four of which are each filled with eight G593 servers. For air cooling, a nine-rack configuration accommodates the same thirty-two G593-SD1 servers.
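The GIGAPOD numbers above can be cross-checked with a little arithmetic. The sketch below uses only figures stated in the article (8 GPUs per server, eight servers per rack, four server racks in the DLC layout); the function and constant names are made up for illustration.

```python
# Figures taken from the article text; names are illustrative only.
GPUS_PER_SERVER = 8            # one HGX 8-GPU baseboard per G593 server
SERVERS_PER_COMPUTE_RACK = 8   # "eight G593 servers" per filled rack
COMPUTE_RACKS_DLC = 4          # four of the five DLC racks hold servers


def gigapod_totals(racks, servers_per_rack, gpus_per_server):
    """Return (total_servers, total_gpus) for a rack-scale deployment."""
    servers = racks * servers_per_rack
    return servers, servers * gpus_per_server


servers, gpus = gigapod_totals(
    COMPUTE_RACKS_DLC, SERVERS_PER_COMPUTE_RACK, GPUS_PER_SERVER
)
print(f"{servers} servers, {gpus} GPUs")  # 32 servers, 256 GPUs
print(f"{SERVERS_PER_COMPUTE_RACK * GPUS_PER_SERVER} GPUs per rack")  # 64 GPUs per rack
```

The result matches the thirty-two servers and 256 GPUs quoted for the supercluster, as well as the 64-GPU-per-rack figure given for the 5U chassis.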

NVIDIA NVLink and NVIDIA NVSwitch provide excellent GPU-to-GPU interconnectivity, and InfiniBand connects the systems across cluster nodes. Taken together, a full cluster can readily handle scientific simulations, large-scale model training, and more.

G593-ZD1-LAX3

  • CPU + GPU direct liquid cooling solution
  • GPU: liquid-cooled NVIDIA HGX™ H200 8-GPU
  • 900GB/s GPU-to-GPU bandwidth with NVIDIA NVLink and NVSwitch
  • Dual AMD EPYC 9004 Series processors
  • 12-channel DDR5 RDIMM, 24 x DIMMs
  • Dual ROM architecture
  • 2 x 10Gb/s LAN ports via Intel X710-AT2
  • 2 x M.2 slots with PCIe Gen3 x4 and x1 interfaces
  • 8 x 2.5″ Gen5 hot-swappable bays for NVMe, SATA, and SAS-4
  • 4 x FHHL PCIe Gen5 x16 slots
  • 8 x LP PCIe Gen5 x16 slots
  • 4+2 3000W 80 PLUS Titanium redundant power supplies

G593-SD1-LAX3

  • CPU + GPU direct liquid cooling solution
  • GPU: liquid-cooled NVIDIA HGX™ H200 8-GPU
  • 900GB/s GPU-to-GPU bandwidth with NVIDIA NVLink and NVSwitch
  • Dual 5th/4th Gen Intel Xeon Scalable processors
  • Dual Intel Xeon CPU Max Series
  • 8-channel DDR5 RDIMM, 32 x DIMMs
  • Dual ROM architecture
  • Compatible with NVIDIA BlueField-3 DPUs and SuperNICs
  • 2 x 10Gb/s LAN ports via Intel X710-AT2
  • 8 x 2.5″ Gen5 hot-swappable bays for NVMe, SATA, and SAS-4
  • 4 x FHHL PCIe Gen5 x16 slots
  • 8 x LP PCIe Gen5 x16 slots
  • 4+2 3000W 80 PLUS Titanium redundant power supplies
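The "4+2 3000W" power supply line in the specifications can be read as a capacity budget. The sketch below assumes the usual N+M reading, in which four supplies carry the load and two are spares; that reading, and the names used, are assumptions rather than something the article spells out. It also reuses the eight-servers-per-rack figure from the GIGAPOD description.

```python
PSU_WATTS = 3000       # per-supply rating from the spec list
ACTIVE_PSUS = 4        # assumption: the "4" in 4+2 carries the load
REDUNDANT_PSUS = 2     # assumption: the "+2" are spares
SERVERS_PER_RACK = 8   # from the GIGAPOD description


def usable_power_watts(psu_watts, active_psus):
    """Power available without drawing on the redundant supplies."""
    return psu_watts * active_psus


per_server = usable_power_watts(PSU_WATTS, ACTIVE_PSUS)
per_rack = per_server * SERVERS_PER_RACK
print(per_server)  # 12000
print(per_rack)    # 96000
```

Under these assumptions the PSU capacity of a full rack comes to 96kW, which is in line with the roughly 100kW per rack quoted for the chassis. This is an upper bound on supply capacity, not a measured power draw.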

Fueling the Next Wave of Energy Efficiency and Server Architecture

G593-ZD1

AMD EPYC 9004 Series processors continue the EPYC innovations and chiplet designs that led to AMD’s 5nm ‘Zen 4’ architecture. The new EPYC processor family adds several capabilities aimed at a wide range of workloads, improving both CPU performance and performance per watt on a platform with double the throughput of PCIe 4.0 lanes and support for 50% more memory channels. GIGABYTE is ready for this platform with products designed to get the most out of EPYC-based systems that support fast PCIe Gen5 accelerators, Gen5 NVMe SSDs, and high-performance DDR5 memory.

AMD EPYC 4th Generation Processors for SP5 Socket

5 nm architecture

More transistors crammed into a smaller space led to an improvement in compute density.

Up to 128 CPU cores

Zen 4 and Zen 4c cores provide dedicated cores for targeted workloads.

Large L3 cache

Select CPUs for technical computing have three times or more L3 cache.

Compatibility with SP5

There is a single platform that supports all 9004 series processors.

Twelve channels

A single socket supports up to 6TB of memory.

DDR5 RAM

Increased DDR5 capacity per DIMM and increased memory throughput

PCIe 5.0 lanes

Higher I/O throughput, reaching 128GB/s of bandwidth on a PCIe x16 link

Support for CXL 1.1+

Compute Express Link makes disaggregated compute architecture viable.
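The 128GB/s figure quoted for PCIe 5.0 x16 can be reproduced from the link parameters. The sketch below assumes the standard PCIe 5.0 signaling rate of 32 GT/s per lane with 128b/130b encoding, and the PCIe 4.0 rate of 16 GT/s; these values come from the PCIe specifications, not from the article, and the function name is illustrative.

```python
def pcie_bandwidth_gb_s(gt_per_s, lanes, encoding=(128, 130)):
    """Return (raw, effective) one-direction bandwidth in GB/s.

    Each transfer carries one bit per lane; the encoding pair gives
    payload bits per total bits on the wire.
    """
    raw = gt_per_s * lanes / 8
    payload_bits, total_bits = encoding
    return raw, raw * payload_bits / total_bits


raw5, eff5 = pcie_bandwidth_gb_s(32, 16)   # PCIe 5.0 x16
raw4, _ = pcie_bandwidth_gb_s(16, 16)      # PCIe 4.0 x16

print(f"raw: {raw5:.0f} GB/s per direction, {2 * raw5:.0f} GB/s both directions")
print(f"after 128b/130b encoding: {eff5:.1f} GB/s per direction")
print(f"PCIe 5.0 vs 4.0: {raw5 / raw4:.0f}x")
```

Under these assumptions, 128GB/s corresponds to the raw rate summed over both directions of an x16 link, and the 2x ratio agrees with the "double the throughput of PCIe 4.0" claim.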

G593-SD1

Accelerating AI and Leading Efficiency

Focusing on business transformation, Intel has increased CPU performance by engineering richer features on a new platform. Built-in AI acceleration engines in the 4th and 5th Gen Intel Xeon Scalable processors boost AI and deep learning performance, while other accelerators serve networking, storage, and analytics. With a host of new features aimed at a wide range of workloads, the new Intel Xeon processor families deliver even better CPU performance and performance per watt, on a PCIe 5.0 platform with twice the throughput of the previous generation to speed up data transfer between GPUs and storage. Intel also introduced the Intel Xeon CPU Max Series with HBM to boost memory-bound HPC and AI applications. GIGABYTE has solutions ready for Intel Xeon CPU-based systems with fast PCIe Gen5 accelerators, Gen5 NVMe SSDs, and high-performance DDR5 memory.

Why Opt for GIGABYTE Servers for Liquid Cooling?

Amazing Performance

Because liquid-cooled components run well below CPU TDP, servers operate with exceptional stability.

Energy Conservation

A liquid-cooled server can outperform an air-cooled one while using less electricity, fewer fans, and lower fan speeds.

Reduced Noise

Air-cooled servers need numerous loud, high-speed fans. By using fewer fans together with liquid cooling, GIGABYTE cuts down on noise.

A Track Record of Success

GIGABYTE’s direct liquid cooling system supplier has served desktop PCs and data centers for 20 years, and GIGABYTE itself has more than 20 years of experience.

Dependability

Liquid cooling solutions need little maintenance, and their condition is easy to inspect. Components are covered by warranties from GIGABYTE and the liquid cooling supplier.

Usability

GIGABYTE liquid-cooled servers can be rack-mounted or connected to a building’s water supply, and they provide disconnects that are dry, simple, and fast.

Elevated Efficiency

Compatible with NVIDIA HGX H200 8-GPU

The NVIDIA HGX H200 combines H200 Tensor Core GPUs with high-speed interconnects to give every data center exceptional performance, scalability, and security. Configurations of up to eight GPUs form what is billed as the world’s most powerful accelerated scale-up server platform for AI and HPC: an eight-way HGX H200 provides over 32 petaFLOPS of FP8 deep learning compute and 1.1TB of aggregate high-bandwidth memory. To support cloud networking, composable storage, zero-trust security, and GPU compute elasticity in hyperscale AI clouds, NVIDIA HGX H200 also includes NVIDIA BlueField-3 data processing units (DPUs).
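The aggregate figures for the eight-way HGX H200 can be broken down per GPU. The sketch below takes the 32 petaFLOPS and eight-GPU numbers from the text; the 141GB of HBM per H200 is an assumption drawn from NVIDIA's published H200 specifications, not from this article.

```python
GPUS = 8                    # eight-way HGX H200, from the article
AGGREGATE_FP8_PFLOPS = 32   # from the article ("over 32 petaFLOPS")
HBM_PER_GPU_GB = 141        # assumption: NVIDIA's published H200 memory size

total_hbm_gb = GPUS * HBM_PER_GPU_GB
total_hbm_tb = total_hbm_gb / 1000          # decimal terabytes
fp8_per_gpu = AGGREGATE_FP8_PFLOPS / GPUS

print(total_hbm_gb)               # 1128
print(f"{total_hbm_tb:.1f} TB")   # 1.1 TB
print(fp8_per_gpu)                # 4.0
```

With that per-GPU memory size, eight GPUs give 1,128GB, which rounds to the 1.1TB quoted, and the FP8 figure works out to roughly 4 petaFLOPS per GPU.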

Energy Efficiency

Automatic Fan Speed Control

GIGABYTE servers have Automatic Fan Speed Control enabled to balance cooling and power efficiency. Fan speeds are adjusted automatically according to temperature sensors placed strategically throughout the server.
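Sensor-driven fan control of this kind is often implemented as a fan curve. The sketch below is a generic illustration, not GIGABYTE's firmware logic: the function name, the temperature thresholds, and the duty-cycle limits are all hypothetical values chosen for the example.

```python
def fan_duty_percent(sensor_temps_c, low_c=40.0, high_c=80.0,
                     min_duty=30.0, max_duty=100.0):
    """Map the hottest sensor reading to a fan duty cycle.

    Below low_c the fans idle at min_duty, above high_c they run at
    max_duty, and in between the duty is interpolated linearly.
    All thresholds here are hypothetical.
    """
    hottest = max(sensor_temps_c)
    if hottest <= low_c:
        return min_duty
    if hottest >= high_c:
        return max_duty
    fraction = (hottest - low_c) / (high_c - low_c)
    return min_duty + fraction * (max_duty - min_duty)


print(fan_duty_percent([35.0, 38.5]))        # 30.0
print(fan_duty_percent([45.0, 60.0, 52.0]))  # 65.0
print(fan_duty_percent([85.0, 70.0]))        # 100.0
```

Driving the fans from the hottest sensor is a conservative choice: one hot component is enough to raise airflow, while a cool system stays at the quiet, low-power minimum.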

Elevated Availability

Smart Ride Through (SmaRT)

To guard against data loss and server downtime caused by AC power outages, GIGABYTE has implemented SmaRT on all of its server platforms. When such an event occurs, the system throttles, lowering power consumption while maintaining availability. Capacitors in the power supply can provide power for 10–20 ms, enough time to switch to a backup power source and continue running.

Smart Crisis Management and Protection (SCMP)

SCMP is patented by GIGABYTE and used in servers with non-redundant PSU designs. In the event of a malfunctioning PSU or an overheated system, SCMP puts the CPU into an ultra-low power mode to prevent an unintended shutdown, component damage, and data loss.

Architecture with Dual ROM

If the primary ROM cannot boot, the backup BMC and/or BIOS replaces the primary upon system reset. Once the primary BMC is updated, the backup BMC’s ROM is updated automatically through synchronization. The BIOS can be updated based on the firmware version.

Hardware Safety

TPM 2.0 Module Option

For hardware-based authentication, passwords, encryption keys, and digital certificates are stored in a TPM module to keep unauthorized users from accessing your data. GIGABYTE TPM modules come in two types: Serial Peripheral Interface (SPI) and Low Pin Count (LPC).

Easy to Use

Tool-free Drive Bay Design

A clip secures the drive in place, so installing or swapping a drive takes only seconds.

Management with Added Value

GIGABYTE provides free management applications that run on a dedicated small processor built into the server.

GIGABYTE Management Console

Every server comes with the GIGABYTE Management Console, which can manage a single server or a small cluster. Once the servers are running, IT staff can monitor and manage each server’s health in real time through the browser-based graphical user interface. The GIGABYTE Management Console also offers:

  • Support for the industry-standard IPMI specification, which allows services to be integrated into a single platform through an open interface.
  • Automatic event recording, which captures system behavior up to 30 seconds before an event occurs and makes it easier to decide what to do next.
  • Integration of SAS/SATA/NVMe devices and RAID controller firmware into the GIGABYTE Management Console to monitor and manage Broadcom MegaRAID adapters.

GIGABYTE Server Management (GSM)

GSM is a software suite that can manage multiple server clusters over the internet. It runs on any GIGABYTE server and supports Windows and Linux. GSM is available from GIGABYTE and complies with the Redfish and IPMI standards. It includes a full set of system administration tools:

  • GSM Server: Software that runs on an administrator’s PC or on a server in the cluster to enable real-time, remote control through a graphical user interface. It simplifies maintenance of large server clusters.
  • GSM CLI: A command-line interface for remote management and monitoring.
  • GSM Agent: An application installed on every GIGABYTE server node that interfaces with GSM Server or GSM CLI to retrieve data from all systems and devices through the operating system.
  • GSM Mobile: A mobile application for iOS and Android that gives administrators access to real-time system data.
  • GSM Plugin: An application program interface that lets users manage and monitor server clusters in real time from VMware vCenter.
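Because GSM is described as complying with the Redfish standard, health data should in principle be readable as Redfish JSON. The article does not document GSM's own API, so the sketch below only shows how a payload shaped like the DMTF Redfish Thermal resource could be checked. The sample data and sensor names are hypothetical, and whether a given GIGABYTE BMC exposes exactly this resource is an assumption.

```python
import json

# Hypothetical sample shaped like a Redfish "Thermal" resource. On a real
# system it would typically be fetched over HTTPS from the BMC.
SAMPLE_THERMAL = json.loads("""
{
  "Temperatures": [
    {"Name": "CPU0 Temp", "ReadingCelsius": 54, "UpperThresholdCritical": 95},
    {"Name": "GPU3 Temp", "ReadingCelsius": 91, "UpperThresholdCritical": 90},
    {"Name": "Inlet Temp", "ReadingCelsius": 27, "UpperThresholdCritical": null}
  ]
}
""")


def sensors_over_critical(thermal):
    """Return names of sensors at or above their critical threshold."""
    alerts = []
    for sensor in thermal.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        limit = sensor.get("UpperThresholdCritical")
        # Skip sensors that report no reading or define no threshold.
        if reading is not None and limit is not None and reading >= limit:
            alerts.append(sensor["Name"])
    return alerts


print(sensors_over_critical(SAMPLE_THERMAL))  # ['GPU3 Temp']
```

The property names `Temperatures`, `ReadingCelsius`, and `UpperThresholdCritical` follow the Redfish Thermal schema. Keeping the check as a pure function over parsed JSON means the same code works whether the data comes from a live BMC or from a saved snapshot.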
agarapuramesh (https://govindhtech.com)
Agarapu Ramesh founded Govindhtech and is a computer hardware enthusiast interested in writing tech news articles. He has worked as an editor at Govindhtech for one year and previously worked as a computer assembling technician at G Traders in India from 2018. He holds an MSc.