Intel to Launch Next-Gen Sapphire Rapids Xeon with High Bandwidth Memory

As part of today’s International Supercomputing 2021 (ISC) announcements, Intel is showcasing that it will be launching a version of its upcoming Sapphire Rapids (SPR) Xeon Scalable processor with high-bandwidth memory (HBM). This version of SPR-HBM will come later in 2022, after the main launch of Sapphire Rapids, and Intel has stated that it will be part of its general availability offering to all, rather than a vendor-specific implementation.

Hitting a Memory Bandwidth Limit

As core counts have increased in the server processor space, the designers of these processors have to ensure that there is enough data for the cores to enable peak performance. This means building large fast caches per core so enough data is close by at high speed, high-bandwidth interconnects inside the processor to shuttle data around, and enough main memory bandwidth from data stores located off the processor.

Our Ice Lake Xeon Review system with 32 DDR4-3200 Slots

Here at AnandTech, we have been asking processor vendors about this last point, about main memory, for a while. There is only so much bandwidth that can be achieved by continually adding DDR4 (and soon to be DDR5) memory channels. Current eight-channel DDR4-3200 memory designs, for example, have a theoretical maximum of 204.8 gigabytes per second, which pales in comparison to GPUs, which quote 1000 gigabytes per second or more. GPUs are able to achieve higher bandwidths because they use GDDR, soldered onto the board, which allows for tighter tolerances at the expense of a modular design. Very few main processors for servers have ever had main memory integrated at such a level.
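That 204.8 GB/s figure falls out of simple arithmetic: each DDR4/DDR5 channel has a 64-bit (8-byte) data bus and transfers at its rated megatransfers per second. A quick sanity check in Python:

```python
def peak_ddr_bandwidth_gbs(channels: int, mts: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DDR bandwidth in (decimal) GB/s.

    channels  -- number of memory channels
    mts       -- transfer rate in megatransfers per second (e.g. 3200)
    bus_bytes -- data-bus width per channel; 8 bytes = 64 bits for DDR4/DDR5
    """
    return channels * mts * bus_bytes / 1000

print(peak_ddr_bandwidth_gbs(8, 3200))  # 8-channel DDR4-3200 -> 204.8 GB/s
print(peak_ddr_bandwidth_gbs(1, 3200))  # a single channel -> 25.6 GB/s
```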

Intel Xeon Phi ‘KNL’ with 8 MCDRAM Pads in 2015

One of the processors that used to be built with integrated memory was Intel’s Xeon Phi, a product discontinued a couple of years ago. The premise of the Xeon Phi design was lots of vector compute, managed by up to 72 basic cores, paired with 8-16 GB of on-board ‘MCDRAM’, connected via 4-8 on-board chiplets in the package. This allowed for 400 gigabytes per second of cache or addressable memory, paired with 384 GB of main memory at 102 gigabytes per second. However, since Xeon Phi was discontinued, no main server processor (at least for x86) announced to the public has had this sort of configuration.

New Sapphire Rapids with High-Bandwidth Memory

Until next year, that is. Intel’s new Sapphire Rapids Xeon Scalable with High-Bandwidth Memory (SPR-HBM) will be coming to market. Rather than hiding it away for use with one particular hyperscaler, Intel has stated to AnandTech that it is committed to making HBM-enabled Sapphire Rapids available to all enterprise customers and server vendors as well. These versions will come out after the main Sapphire Rapids launch, and entertain some interesting configurations. We understand that this means SPR-HBM will be available in a socketed configuration.

Intel states that SPR-HBM can be used with standard DDR5, offering an additional tier in memory caching. The HBM can be addressed directly or left as an automatic cache, we understand, which would be similar to how Intel’s Xeon Phi processors could access their high-bandwidth memory.
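Intel has not detailed how these modes will be exposed to software, but on Xeon Phi the directly addressed (“flat”) mode surfaced the MCDRAM as its own NUMA node, which applications could target with standard Linux tools. A hypothetical sketch of what that could look like on SPR-HBM, where the HBM node number is an assumption:

```shell
# List NUMA nodes; in a flat/direct-addressing mode the HBM would
# typically appear as its own node alongside the DDR nodes.
numactl --hardware

# Bind all of an application's allocations to the (assumed) HBM node 1.
numactl --membind=1 ./my_app

# Prefer HBM, falling back to DDR5 once the HBM capacity is exhausted.
numactl --preferred=1 ./my_app
```

In the automatic-cache mode, by contrast, no software changes would be needed at all; the HBM would simply sit transparently in front of DDR5.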

Alternatively, SPR-HBM can work without any DDR5 at all. This reduces the physical footprint of the processor, allowing for a denser design in compute-dense servers that don’t rely much on memory capacity (these customers were already asking for quad-channel design optimizations anyway).

The amount of memory was not disclosed, nor the bandwidth or the technology. At the very least, we expect the equivalent of up to 8-Hi stacks of HBM2e, up to 16 GB each, with 1-4 stacks onboard leading to 64 GB of HBM. At a theoretical top speed of 460 GB/s per stack, this could mean 1840 GB/s of bandwidth, although we can imagine something more akin to 1 TB/s for yield and power, which would still give a sizeable uplift. Depending on demand, Intel may fill out different versions of the memory into different processor options.
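Those speculative per-stack figures (our estimates, not Intel-confirmed numbers) multiply out as follows:

```python
# Speculated HBM2e configuration -- not confirmed by Intel:
STACK_CAPACITY_GB = 16     # up to 8-Hi HBM2e, 16 GB per stack
STACK_BANDWIDTH_GBS = 460  # theoretical top speed per stack

for stacks in (1, 2, 4):
    capacity = stacks * STACK_CAPACITY_GB
    bandwidth = stacks * STACK_BANDWIDTH_GBS
    print(f"{stacks} stack(s): {capacity} GB at {bandwidth} GB/s")
```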

One of the key elements to consider here is that on-package memory will have an associated power cost within the package. So for every watt that the HBM requires inside the package, that is one less watt for computational performance on the CPU cores. That being said, server processors often do not push the boundaries on peak frequencies, instead opting for a more efficient power/frequency point and scaling the cores. However, HBM in this regard is a tradeoff: if HBM were to take 10-20 W per stack, four stacks would easily eat into the power budget for the processor (and that power budget has to be managed with additional controllers and power delivery, adding complexity and cost).
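To put that tradeoff in rough numbers (the 10-20 W per stack is the estimate above, and the 350 W package budget is purely our assumption for illustration):

```python
PACKAGE_BUDGET_W = 350  # assumed total package power budget, for illustration
STACKS = 4

for per_stack_w in (10, 20):
    hbm_w = STACKS * per_stack_w
    cpu_w = PACKAGE_BUDGET_W - hbm_w
    print(f"{per_stack_w} W/stack -> {hbm_w} W for HBM, {cpu_w} W left for cores")
```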

One thing that was confusing about Intel’s presentation, and I asked about this but my question was ignored during the virtual briefing, is that Intel keeps putting out different package images of Sapphire Rapids. In the briefing deck for this announcement, there were already two variants: the one above (which actually looks like an elongated Xe-HP package that someone put a logo on) and this one (which is more square and has different notches):

There have been some unconfirmed leaks online showcasing SPR in a third different package, making it all rather confusing.


Sapphire Rapids: What We Know

Intel has been teasing Sapphire Rapids for almost two years as the successor to its Ice Lake Xeon Scalable family of processors. Built on 10nm Enhanced SuperFin, SPR will be Intel’s first processors to use DDR5 memory, have PCIe 5.0 connectivity, and support CXL 1.1 for next-generation connections. Also on memory, Intel has stated that Sapphire Rapids will support Crow Pass, the next generation of Intel Optane memory.

For core technology, Intel (re)confirmed that Sapphire Rapids will be using Golden Cove cores as part of its design. Golden Cove will be central to Intel’s Alder Lake consumer processor later this year, however Intel was quick to point out that Sapphire Rapids will offer a ‘server-optimized’ configuration of the core. Intel has done this in the past with both its Skylake Xeon and Ice Lake Xeon processors, whereby the server variant often has a different L2/L3 cache structure than the consumer processors, as well as a different interconnect (ring vs mesh, mesh on servers).

Sapphire Rapids will be the core processor at the heart of the Aurora supercomputer at Argonne National Labs, where two SPR processors will be paired with six Intel Ponte Vecchio accelerators, which will also be new to the market. As part of this announcement today, Intel also stated that Ponte Vecchio will be widely available, in OAM and 4x dense form factors:

Sapphire Rapids will also be the first Intel processors to support Advanced Matrix Extensions (AMX), which we understand will help accelerate matrix-heavy workflows such as machine learning, alongside also having BFloat16 support. This will be paired with updates to Intel’s DL Boost software and OneAPI support. As Intel processors are still very popular for machine learning, especially training, Intel wants to capitalize on any future growth in this market with Sapphire Rapids. SPR will also be updated with Intel’s latest hardware-based security.

It is highly anticipated that Sapphire Rapids will also be Intel’s first multi-compute-die Xeon where the silicon is designed to be integrated (we’re not counting Cascade Lake-AP hybrids), and there are unconfirmed leaks to suggest this is the case, however nothing that Intel has yet verified.

The Aurora supercomputer is expected to be delivered by the end of 2021, and is expected to be the first official deployment of Sapphire Rapids. We expect a full launch of the platform sometime in the first half of 2022, with general availability soon after. The exact launch of SPR-HBM is unknown, however given these time frames, Q4 2022 seems fairly reasonable depending on how aggressively Intel wants to attack the launch in light of any competition from other x86 vendors or Arm vendors.
