Monday, May 27, 2024

Why AMD EPYC CPUs Are Perfect for Your Data Center


This blog contains results from testing 3rd and 4th Gen AMD EPYC CPUs on various EDA workloads. Links to published performance briefs provide more details about the tested AMD EPYC CPUs' performance findings and comparisons.

What Is Electronic Design Automation?

EDA integrates hardware, software, and services to define, plan, design, develop, test, and manufacture semiconductors. Some early chip designs were hand-drawn. The fast increase in transistor density from a few thousand to tens of billions required technologies to assist and automate this process.

Most modern semiconductor designs take months and millions of dollars. Physical samples are then constructed for testing, followed by months and millions of dollars of intensive testing to find and fix faults. The repeating cycle requires significant effort and money.

Despite these obstacles, design teams are under pressure to create useful, high-performing solutions quickly. Due to cost and time-to-market pressures, efficient design, precise simulation, and rigorous validation are crucial.

EDA focuses on performance and efficiency throughout the workflow. It lets designers design and prototype in a simulated environment, which is faster and cheaper than making actual prototypes. Growing design complexity and AI-enhanced designs highlight the need for significant computational resources to drive future semiconductor design advances.

4th Gen AMD EPYC processors are ideal for Electronic Design Automation (EDA) workloads, and AMD's own strategy relies on this claim: AMD uses existing AMD EPYC CPUs to design future ones, so optimizing EDA performance is crucial to the company's success.

First, let's preview the performance results covered in this blog. Each stage of the EDA workflow has different compute requirements. Some tasks perform best with AMD EPYC CPUs with AMD 3D V-Cache technology (9004X processors), while others perform best with high-frequency CPUs.

This blog also highlights each generation's top-of-stack processors, which matter most when high-core-count CPUs are needed. In such cases, increasing core count can raise throughput by up to ~60% per server. This may help you boost performance or minimize data center footprint. Because 4th Gen AMD EPYC CPUs offer a higher maximum core count than 3rd Gen CPUs, a deployment may be able to maintain job throughput while shrinking its data center footprint.
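As a rough illustration of the footprint argument, the sketch below estimates how many servers would be needed to hold aggregate throughput constant if each new server delivers a given uplift over an old one. The fleet size of 100 servers and the function name are hypothetical, and the ~1.60x figure is the "up to" value quoted above, so real consolidation ratios may be lower.

```python
import math


def servers_needed(current_servers: int, uplift_per_server: float) -> int:
    """Servers required to match the current fleet's aggregate throughput.

    uplift_per_server is the per-server throughput ratio (new / old),
    e.g. 1.60 for an uplift of ~60%.
    """
    if uplift_per_server <= 0:
        raise ValueError("uplift_per_server must be positive")
    # Round up: a fractional server still has to be deployed as a whole one.
    return math.ceil(current_servers / uplift_per_server)


# Hypothetical fleet of 100 older servers, best-case ~1.60x uplift.
print(servers_needed(100, 1.60))
```

With these assumed inputs, roughly 63 newer servers would match the throughput of 100 older ones.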

Generational AMD 3D V-Cache uplifts on selected EDA workloads

| Workload | 16 cores | 32 cores | 9684X (96c) vs. 7773X (64c) |
| --- | --- | --- | --- |
| Synopsys VCS | ~1.19x | ~1.28x | ~1.55x |
| Synopsys PrimeSim SPICE | ~1.27x | ~1.43x | ~1.67x |
| Siemens Tessent | ~1.25x | ~1.29x | ~1.60x |
| Synopsys Formality Equivalence | ~1.02x | ~1.13x | ~1.37x |
| Cadence Spectre X | ~1.13x | ~1.26x | ~1.55x |

Generational high frequency & general-purpose uplifts on selected EDA workloads

| Workload | 16 cores | 32 cores | 9654 (96c) vs. 7763 (64c) |
| --- | --- | --- | --- |
| Synopsys Fusion Compiler (Synthesis) | ~1.19x | ~1.23x | ~1.51x |
| Synopsys Fusion Compiler (Placement) | ~1.19x | ~1.23x | ~1.49x |
| Synopsys Fusion Compiler (Routing) | ~1.22x | ~1.25x | ~1.50x |
| Siemens Calibre nmDRC | ~1.17x | ~1.27x | ~1.60x |
| Ansys RedHawk-SC | ~1.20x | ~1.28x | Not tested |
| Synopsys PrimeTime Suite | ~1.25x | ~1.32x | Not tested |

Testing and Results

AMD tested single-socket servers with selected 16-core, 32-core, and top-of-stack 64- and 96-core 3rd and 4th Gen AMD EPYC CPUs. With the 3rd Gen test results normalized to 1.00x, AMD compared the performance of systems powered by the corresponding 4th Gen AMD EPYC processors.

Cache-bound applications benefit from 4th Gen AMD EPYC processors with AMD 3D V-Cache technology, which triples the shared L3 cache from 32 MB to 96 MB per CCD compared with standard EPYC 9004 processors, with 8 to 12 CCDs per socket. This means 4th Gen AMD EPYC processors with AMD 3D V-Cache technology can have a total L3 cache capacity of up to 1,152 MB, compared to up to 384 MB for general-purpose and high-frequency processors. High-frequency 4th Gen AMD EPYC CPUs are suited to compute-intensive workloads.
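The cache totals quoted here follow directly from the per-CCD figures. A minimal sketch of the arithmetic, assuming the maximum of 12 CCDs per socket (the function and constant names are illustrative only):

```python
# Per-CCD shared L3 sizes quoted in the text, in MB.
L3_PER_CCD_MB = {"standard": 32, "3d_v_cache": 96}
MAX_CCDS_PER_SOCKET = 12  # upper end of the 8-12 range quoted above


def total_l3_mb(l3_per_ccd_mb: int, ccds: int) -> int:
    """Total L3 per socket = L3 per CCD x number of CCDs."""
    return l3_per_ccd_mb * ccds


for name, per_ccd in L3_PER_CCD_MB.items():
    print(name, total_l3_mb(per_ccd, MAX_CCDS_PER_SOCKET), "MB")
```

A part with fewer CCDs has proportionally less total L3; for example, 8 CCDs with 3D V-Cache would give 768 MB.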

Per-core performance matters for EDA tools. Because of frequency limits and contention for shared resources, CPUs trade off core count against per-core performance, and customers want a balance between compute cost and workload productivity. AMD therefore tested AMD EPYC CPUs with different core counts to characterize this tradeoff. These studies demonstrate the compelling value of 4th Gen AMD EPYC CPUs for customer-specific EDA applications.

AMD calculated the following metrics from test results:

Runtime: This metric measures each instance's runtime in seconds. To calculate a benchmark run's average runtime, the per-instance runtimes were summed and divided by the number of concurrent workload instances (e.g., 2 on a 16-core system). Each server ran three benchmark iterations, and the mean runtimes from those iterations were then averaged to evaluate application performance on a fully loaded system.

Throughput: The number of concurrent jobs is divided by the average runtime to compute jobs completed per hour. For example, a 32-core machine running 4 concurrent jobs with an average runtime of 2.5 hours has a throughput of 1.6 jobs per hour.

Performance-per-Watt: Throughput divided by average socket power in watts, with power sampled by Turbostat v21.05.04 (PkgWatt metric) at 5-second intervals throughout each test.
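The three metrics can be sketched in a few lines. This is one plausible reading of the description, not AMD's actual tooling: throughput is taken as concurrent jobs divided by average runtime, which is what the worked example (4 jobs, 2.5 hours, 1.6 jobs per hour) implies. All runtimes and power samples below are made-up values, and the function names are illustrative.

```python
from statistics import mean


def average_runtime_seconds(iterations: list[list[float]]) -> float:
    """Average per-instance runtimes within each iteration, then across iterations."""
    return mean(mean(instance_runtimes) for instance_runtimes in iterations)


def throughput_jobs_per_hour(concurrent_jobs: int, avg_runtime_s: float) -> float:
    """Jobs completed per hour on a fully loaded system."""
    return concurrent_jobs / (avg_runtime_s / 3600.0)


def performance_per_watt(jobs_per_hour: float, pkg_watt_samples: list[float]) -> float:
    """Throughput divided by the average of the sampled socket power readings."""
    return jobs_per_hour / mean(pkg_watt_samples)


# Hypothetical data: 3 iterations x 4 concurrent instances, runtimes in seconds.
iterations = [
    [8900.0, 9000.0, 9000.0, 9100.0],
    [9000.0, 9000.0, 9000.0, 9000.0],
    [8800.0, 9000.0, 9000.0, 9200.0],
]
# Hypothetical PkgWatt readings, one per 5-second interval.
pkg_watt_samples = [310.0, 320.0, 330.0]

avg_runtime = average_runtime_seconds(iterations)
throughput = throughput_jobs_per_hour(4, avg_runtime)
ppw = performance_per_watt(throughput, pkg_watt_samples)
print(f"{avg_runtime:.0f} s, {throughput:.2f} jobs/h, {ppw:.4f} jobs/h per W")
```

The sample data is chosen so that the average runtime is 9,000 seconds (2.5 hours), matching the 32-core throughput example in the text.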

This blog shows the composite average uplifts for processors tested across Runtime, Throughput, and Performance-per-Watt. If a processor has average uplifts of ~1.28x for Runtime, ~1.33x for Throughput, and ~1.26x for Performance-per-Watt, its reported uplift will be ~1.29x, representing the composite average of all three metrics.
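The composite figure in this example is a plain arithmetic mean of the three uplifts. A minimal sketch using the numbers quoted above (the function name is illustrative):

```python
from statistics import mean


def composite_uplift(runtime: float, throughput: float, perf_per_watt: float) -> float:
    """Arithmetic mean of the three per-metric uplifts."""
    return mean([runtime, throughput, perf_per_watt])


value = composite_uplift(1.28, 1.33, 1.26)
print(f"~{value:.2f}x")  # (1.28 + 1.33 + 1.26) / 3 = 1.29
```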

With their emphasis on efficiency and performance, the 4th generation AMD EPYC processors are already a perfect match for Electronic Design Automation (EDA) workloads, and things only get better from here:

Deeper integration with EDA tools

As long as AMD and EDA software vendors such as Synopsys continue to collaborate, EDA tools can be expected to become even better optimized for EPYC processors. For chip designers, this would mean quicker simulation times, more economical memory use, and better performance all around.

AMD 3D V-Cache technology

4th generation EPYC processors benefit from AMD's 3D V-Cache technology, which stacks extra cache on top of the processing cores. Larger cache sizes and even lower memory latency are anticipated in future iterations of this technology, which would greatly enhance performance for EDA tasks that rely heavily on data access.

CXL adoption in EDA workflows

Compute Express Link (CXL) is a newer interconnect standard that enables processors to directly access memory pools attached to other devices. For EDA workflows with huge datasets, this might be a game-changer. By attaching additional memory directly to the EPYC processor, CXL may greatly enhance the performance of EDA tools by speeding up data transfer.

Power efficiency for large-scale EDA simulations

Large-scale EDA simulations can consume a lot of electricity. Forthcoming EPYC processors are expected to deliver even higher performance per watt, enabling EDA engineers to run more intricate simulations without excessive energy costs.

We also provide detailed information about Synopsys VCS and Cadence Spectre X in another article.

Agarapu Ramesh is the founder of Govindhtech and a computer hardware enthusiast with an interest in writing tech news articles. He has worked as an editor at Govindhtech for one year and previously worked as a computer assembling technician at G Traders in India from 2018. He holds an MSc.

