Pushing PCIe to 300 Watts

As part of today's burst of ISC 2021 trade show announcements, NVIDIA this morning is announcing that it is bringing the 80GB version of its A100 accelerator to the PCIe form factor. First introduced in NVIDIA's custom SXM form factor last fall, the 80GB version of the A100 was launched not only to expand the total memory capacity of an A100 accelerator – doubling it from 40GB to 80GB – but it also offered a rare mid-generation spec bump, cranking up the memory clockspeed by a further 33%. Now, after a bit over 6 months, NVIDIA is releasing a PCIe version of the accelerator for customers who need discrete add-in cards.

The new 80GB version of the PCIe A100 joins the existing 40GB version, and NVIDIA will continue selling both versions of the card. On the whole, this is a fairly straightforward port of the 80GB SXM A100 over to PCIe, with NVIDIA dialing down the TDP of the card and the number of exposed NVLinks to match the capabilities of the form factor. The release of the 80GB PCIe card is designed to give NVIDIA's traditional PCIe form factor customers a second, higher-performing accelerator option, particularly for those users who need more than 40GB of GPU memory.

NVIDIA Accelerator Specification Comparison

| | 80GB A100 (PCIe) | 80GB A100 (SXM4) | 40GB A100 (PCIe) | 40GB A100 (SXM4) |
|---|---|---|---|---|
| FP32 CUDA Cores | 6912 | 6912 | 6912 | 6912 |
| Boost Clock | 1.41GHz | 1.41GHz | 1.41GHz | 1.41GHz |
| Memory Clock | 3.0Gbps HBM2E | 3.2Gbps HBM2E | 2.43Gbps HBM2 | 2.43Gbps HBM2 |
| Memory Bus Width | 5120-bit | 5120-bit | 5120-bit | 5120-bit |
| Memory Bandwidth | 1.9TB/sec (1935GB/sec) | 2.0TB/sec (2039GB/sec) | 1.6TB/sec (1555GB/sec) | 1.6TB/sec (1555GB/sec) |
| VRAM | 80GB | 80GB | 40GB | 40GB |
| Single Precision | 19.5 TFLOPs | 19.5 TFLOPs | 19.5 TFLOPs | 19.5 TFLOPs |
| Double Precision | 9.7 TFLOPs (1/2 FP32 rate) | 9.7 TFLOPs (1/2 FP32 rate) | 9.7 TFLOPs (1/2 FP32 rate) | 9.7 TFLOPs (1/2 FP32 rate) |
| INT8 Tensor | 624 TOPs | 624 TOPs | 624 TOPs | 624 TOPs |
| FP16 Tensor | 312 TFLOPs | 312 TFLOPs | 312 TFLOPs | 312 TFLOPs |
| TF32 Tensor | 156 TFLOPs | 156 TFLOPs | 156 TFLOPs | 156 TFLOPs |
| Relative Performance (vs. SXM Version) | 90%? | 100% | 90% | 100% |
| Interconnect | NVLink 3, 12 Links (600GB/sec) | NVLink 3, 12 Links (600GB/sec) | NVLink 3, 12 Links (600GB/sec) | NVLink 3, 12 Links (600GB/sec) |
| GPU | GA100 (826mm2) | GA100 (826mm2) | GA100 (826mm2) | GA100 (826mm2) |
| Transistor Count | 54.2B | 54.2B | 54.2B | 54.2B |
| TDP | 300W | 400W | 250W | 400W |
| Manufacturing Process | TSMC 7N | TSMC 7N | TSMC 7N | TSMC 7N |
| Interface | PCIe 4.0 | SXM4 | PCIe 4.0 | SXM4 |
| Architecture | Ampere | Ampere | Ampere | Ampere |

At a high level, the 80GB upgrade to the PCIe A100 is virtually identical to what NVIDIA did for the SXM version. The 80GB card's GPU is clocked identically to the 40GB card's, and the resulting performance throughput claims are unchanged.

Instead, this release is all about the on-board memory, with NVIDIA equipping the card with newer HBM2E memory. HBM2E is the informal name given to the most recent update to the HBM2 memory standard, which back in February of this year defined a new maximum memory speed of 3.2Gbps/pin. Coupled with that frequency improvement, manufacturing improvements have also allowed memory manufacturers to double the capacity of the memory, going from 1GB/die to 2GB/die. The net result is that HBM2E offers both greater capacities and greater bandwidths, two things which NVIDIA is taking advantage of here.

With five active stacks of 16GB, 8-Hi memory, the updated PCIe A100 gets a total of 80GB of memory. Which, running at 3.0Gbps/pin, works out to just under 1.9TB/sec of memory bandwidth for the accelerator, a 25% increase over the 40GB version. This means that not only does the 80GB accelerator offer more local storage, but, unusually for a larger-capacity model, it also offers some extra memory bandwidth to go with it. So in memory bandwidth-bound workloads the 80GB version should be faster than the 40GB version even without using its extra memory capacity.
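Those bandwidth figures fall straight out of the bus width and per-pin data rate. A quick back-of-the-envelope check, using the round nominal pin speeds (NVIDIA's exact datasheet figures of 1935GB/sec and 2039GB/sec land slightly off these, since the true effective clocks aren't perfectly round numbers):

```python
def peak_bandwidth_gbs(bus_width_bits: int, pin_speed_gbps: float) -> float:
    """Peak HBM2 bandwidth in GB/sec: bus width times per-pin rate, over 8 bits/byte."""
    return bus_width_bits * pin_speed_gbps / 8

# All A100 variants use five active 1024-bit HBM2(E) stacks, i.e. a 5120-bit bus.
BUS_WIDTH = 5120

for name, speed in [("40GB cards", 2.43), ("80GB PCIe", 3.0), ("80GB SXM4", 3.2)]:
    print(f"{name}: {peak_bandwidth_gbs(BUS_WIDTH, speed):.1f} GB/sec")
# 40GB cards: 1555.2 GB/sec (~1.6 TB/sec)
# 80GB PCIe: 1920.0 GB/sec (~1.9 TB/sec)
# 80GB SXM4: 2048.0 GB/sec (~2.0 TB/sec)
```

The 1920 vs. 1555 GB/sec result also confirms the roughly 25% bandwidth uplift over the 40GB card cited above.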

This additional memory does come at a cost, however: power consumption. For the 80GB A100, NVIDIA has needed to dial things up to 300W to accommodate the higher power consumption of the denser, higher-frequency HBM2E stacks. This is a very notable (if not outright surprising) change in TDPs, as NVIDIA has long held the line for its PCIe compute accelerators at 250W, which is widely considered the limit for PCIe cooling. So a 300W card not only deviates from NVIDIA's past cards, but it means that system integrators will need to find a way to provide another 50W of cooling per card. This isn't something I expect to be a hurdle for too many designs, but I definitely won't be surprised if some integrators continue to offer only 40GB cards as a result.

And even then, the 80GB PCIe A100 would seem to be held back a bit by its form factor. Its 3.0Gbps memory clock is just over 6% lower than the 80GB SXM A100's 3.2Gbps memory clock. So NVIDIA is apparently leaving some memory bandwidth on the table just to get the card to fit within the expanded 300W profile.

On that note, it doesn't appear that NVIDIA has changed the form factor of the PCIe A100 itself. The card is fully passively cooled, designed to be used in servers with (even more) powerful chassis fans, and fed by dual 8-pin PCIe power connectors.

With regard to overall performance expectations, the new 80GB PCIe card should trail the SXM card in a similar fashion to the 40GB models. Unfortunately, NVIDIA's updated A100 datasheet doesn't include a relative performance metric this time around, so we don't have any official figures for how the PCIe card will compare to the SXM card. But, given the continued TDP differences (300W vs 400W+), I'd expect the real-world performance of the 80GB PCIe card to land near the same 90% mark as the 40GB PCIe card. Which serves to reiterate that GPU clockspeeds aren't everything, especially in this age of TDP-constrained hardware.

In any case, the 80GB PCIe A100 is designed to appeal to the same broad use cases as the SXM version of the card, which roughly boils down to larger AI dataset sizes and enabling bigger Multi-Instance GPU (MIG) instances. In the case of AI, there are numerous workloads that can benefit in training time or accuracy by using a larger dataset, and overall GPU memory capacity has repeatedly been a bottleneck in this field, as there's always someone who could use more memory. Meanwhile NVIDIA's MIG technology, which was introduced on the A100, benefits from the memory upgrade by allowing each instance to be allocated more memory; running at a full 7 instances, each can now have up to 10GB of dedicated memory.
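To put those MIG numbers in context: per NVIDIA's MIG documentation, the A100 exposes seven compute slices but divides its memory into eight slices, which is why a fully partitioned card yields seven instances at one eighth of total memory each. A quick sketch of that arithmetic:

```python
# MIG on the A100 exposes 7 compute slices but splits memory into 8 slices,
# so at maximum partitioning each instance gets 1/8th of total memory.
MEMORY_SLICES = 8
COMPUTE_SLICES = 7

for card_gb in (40, 80):
    per_instance = card_gb // MEMORY_SLICES
    print(f"A100 {card_gb}GB -> {COMPUTE_SLICES} x 1g.{per_instance}gb instances")
# A100 40GB -> 7 x 1g.5gb instances
# A100 80GB -> 7 x 1g.10gb instances
```

The resulting `1g.5gb` and `1g.10gb` labels match the smallest MIG profile names NVIDIA uses for the two capacities.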

Wrapping things up, NVIDIA isn't announcing specific pricing or availability information today. But customers should expect to see the 80GB PCIe A100 cards soon.
