Tuesday, July 9, 2024

EPYC Rome Chips from AMD fail after 1,044 days of operation

AMD’s EPYC 7002 ‘Rome’ server chips have a bug that can cause a core to hang after approximately 1,044 days of continuous operation.

Reddit user acid_migrain calculates that the core hang occurs at around 1,042 days and 12 hours of uptime (about 2.85 years), based on the number of TSC ticks and the reference clock frequency.
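As a quick check of that arithmetic, the sketch below converts between TSC ticks, days, and years. The 2.8 GHz reference clock is an assumption chosen for illustration; the article does not state the frequency used in the Reddit calculation, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25

# Assumption for illustration only: the article does not give the
# reference clock frequency used in the Reddit calculation.
ASSUMED_REF_CLOCK_HZ = 2.8e9


def days_to_ticks(days, ref_clock_hz):
    """Number of TSC ticks that elapse in `days` at a constant tick rate."""
    return days * SECONDS_PER_DAY * ref_clock_hz


def ticks_to_days(ticks, ref_clock_hz):
    """Inverse of days_to_ticks: uptime in days for a given tick count."""
    return ticks / ref_clock_hz / SECONDS_PER_DAY


def days_to_years(days):
    """Convert days to years using an average year of 365.25 days."""
    return days / DAYS_PER_YEAR


uptime_days = 1042.5  # 1,042 days and 12 hours
ticks = days_to_ticks(uptime_days, ASSUMED_REF_CLOCK_HZ)

print(f"{uptime_days} days = {days_to_years(uptime_days):.2f} years")  # 2.85 years
print(f"TSC ticks at the assumed clock: {ticks:.3e}")  # 2.522e+17
```

With a different reference clock the tick count changes proportionally, but the days-to-years conversion does not: 1,042.5 days is roughly 2.85 years either way.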

For comparison, Intel’s 8th-gen chips, released in 2017, still have over 150 listed errata. It is not known how many errata AMD’s Rome chips have had in total, but 39 remain, which is relatively low compared to Intel’s chips.

The problem affects AMD’s second-generation EPYC processors (the newest AMD CPUs are the fourth-generation Genoa ones). AMD describes it succinctly, but there is a lot to unravel.

To resolve the issue, there are two options:

  1. Reboot the server before it reaches 1,044 days of uptime, which resets the “timer” for the bug.
  2. Disable the CC6 sleep state altogether.

The impact of this bug may vary depending on the user’s usage scenario. Chip errata are common in the industry, and some non-critical ones may remain unresolved.

In short, either rebooting before the 1,044-day mark or disabling the CC6 sleep state avoids the hang.
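For the reboot option, a monitoring script can flag servers that are approaching the limit. The following is a minimal sketch, assuming a Linux host where uptime can be read from `/proc/uptime`. The function names, the 1,042-day limit (the lower of the two figures quoted above), and the 30-day safety margin are illustrative choices, not values recommended by AMD.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86_400


def read_uptime_seconds(path="/proc/uptime"):
    """Return system uptime in seconds (Linux: first field of /proc/uptime)."""
    with open(path) as f:
        return float(f.read().split()[0])


def reboot_deadline(uptime_seconds, now, limit_days=1042.0, margin_days=30.0):
    """Latest time to reboot: boot time plus the limit, minus a safety margin.

    limit_days and margin_days are illustrative defaults, not vendor guidance.
    """
    boot_time = now - timedelta(seconds=uptime_seconds)
    return boot_time + timedelta(days=limit_days - margin_days)


def needs_reboot(uptime_seconds, limit_days=1042.0, margin_days=30.0):
    """True once uptime has reached the limit minus the safety margin."""
    return uptime_seconds >= (limit_days - margin_days) * SECONDS_PER_DAY
```

On a Linux server, calling `needs_reboot(read_uptime_seconds())` from a scheduled job would report whether the machine has entered the safety window, and `reboot_deadline(read_uptime_seconds(), datetime.now(timezone.utc))` would give the date to schedule maintenance by.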
