
What Makes the NVIDIA L40S Special? First Impressions of the L40S

September 7, 2023 • 4 min read


DGX and HGX Are Costly and Hard to Come By. Is There an Alternative?

NVIDIA is the hardware of choice for anything AI-related. Its compute leadership with the NVIDIA A100 and NVIDIA H100 drives high demand for NVIDIA's high-performance GPUs to develop the next wave of AI models. However, the A100 and H100 carry a very high startup cost for smaller-scale operations, especially in their DGX and HGX variants.

Training and inference for complex AI models like text-to-image and LLM generative AI are highly compute-intensive. The NVIDIA L40S GPU was announced and released to fill that gap, combining powerful AI compute with best-in-class graphics and media acceleration, built to power the next generation of data center workloads. The NVIDIA L40S can handle everything from generative AI and large language model (LLM) inference and training to 3D graphics, rendering, and video.

But how does NVIDIA make a GPU that can tackle all these workloads? What makes the NVIDIA L40S special?

NVIDIA L40S Advantages

The naming convention suggests the L40S is just an upgraded L40, designed for data center graphics and large-scale NVIDIA Omniverse simulation workloads. But it is more than that. NVIDIA makes it clear that this GPU is its most universal high-performance accelerator for any workload you throw at it, supporting complex AI training and inference at a high level, and compares it directly against NVIDIA's flagships: the A100 and H100 SXM.

| Specification | A100 80GB SXM | NVIDIA L40S | H100 80GB SXM |
|---|---|---|---|
| GPU Architecture | Ampere | Ada Lovelace | Hopper |
| FP64 | 9.7 TFLOPS | N/A | 33.5 TFLOPS |
| FP32 | 19.5 TFLOPS | 91.6 TFLOPS | 66.9 TFLOPS |
| RT Cores | N/A | 212 TFLOPS | N/A |
| TF32 Tensor Core | 312 TFLOPS | 366 TFLOPS | 989 TFLOPS |
| FP16/BF16 Tensor Core | 624 TFLOPS | 733 TFLOPS | 1,979 TFLOPS |
| FP8 Tensor Core | N/A | 1,466 TFLOPS | 3,958 TFLOPS |
| INT8 Tensor Core | 1,248 TOPS | 1,466 TOPS | 3,958 TOPS |
| GPU Memory | 80GB HBM2e | 48GB GDDR6 | 80GB HBM3 |
| GPU Memory Bandwidth | 2,039 GB/s | 864 GB/s | 3,352 GB/s |
| L2 Cache | 40MB | 96MB | 50MB |
| Media Engines | 0 NVENC, 5 NVDEC, 5 NVJPEG | 3 NVENC (+AV1), 3 NVDEC, 4 NVJPEG | 0 NVENC, 7 NVDEC, 7 NVJPEG |
| Power | Up to 400W | Up to 350W | Up to 700W |
| Form Factor | SXM4 (8-GPU HGX) | Dual-slot PCIe | SXM5 (8-GPU HGX) |
| Interconnect | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 5.0 x16 |

Better General Purpose Computing: Comparing the L40S specifications with the NVIDIA A100 SXM, there is a substantial gap in FP32 performance, the standard metric for general compute: 91.6 TFLOPS versus 19.5 TFLOPS, roughly 4.7x, and the L40S even outperforms the NVIDIA H100 SXM's 66.9 TFLOPS. The L40S delivers exceptional performance in HPC workloads such as simulation, rendering, and graphics.

Better AI Performance: While general-purpose compute isn't the A100's strong suit, the L40S also outperforms it in its specialty. Tensor Core throughput at TF32 precision is noticeably higher (366 versus 312 TFLOPS), and with the Transformer Engine and the ability to compute in FP8 and mixed floating-point precision, the L40S is ahead of the A100 for both AI training and inference, as the short sketch below illustrates.
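
To make those precision modes concrete, here is a minimal PyTorch sketch, not taken from NVIDIA's materials, that enables TF32 Tensor Core math for FP32 matmuls and runs a forward pass under BF16 autocast; the model and tensor sizes are hypothetical placeholders. FP8 on Ada-class GPUs is typically reached through NVIDIA's Transformer Engine library rather than plain PyTorch.

```python
# Minimal sketch: TF32 + BF16 mixed precision in PyTorch.
# The model and tensor sizes below are illustrative placeholders.
import torch
import torch.nn as nn

# Let FP32 matmuls/convolutions use TF32 Tensor Cores (Ampere and newer).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).to(device)
x = torch.randn(32, 4096, device=device)

# Autocast runs most ops in BF16 and keeps numerically sensitive ops in FP32.
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16
```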

Better Accessibility: The NVIDIA L40S is a mainstream accelerator that slots into servers over PCIe 4.0. Straightforward installation, a low barrier to entry, and impressive performance make it a standout upgrade choice versus other AI accelerators. NVIDIA's long track record of GPU market leadership in productivity workloads further enhances the appeal of the L40S.
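
Once a card is installed, a few lines of PyTorch can confirm it is visible to your software stack. This is a quick sanity-check sketch, assuming the CUDA driver and PyTorch are already set up, not an official procedure.

```python
# Sketch: verify the installed GPU(s) are visible to CUDA.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        # On an L40S this should report roughly 48 GB and compute capability 8.9.
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB, "
              f"compute capability {props.major}.{props.minor}")
else:
    print("No CUDA device detected - check the driver and PCIe seating.")
```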

Better General Use: NVIDIA is pushing this GPU as an alternative to the NVIDIA A100, but it is more than that, capable of executing nearly any HPC workload. This GPU is highly versatile for users whose workloads span complex simulation to dense AI training, or sometimes both!

Final Thoughts

Built on the NVIDIA Ada Lovelace architecture, the L40S delivers groundbreaking multi-workload acceleration for large language model (LLM) inference and training, generative AI, and graphics and video applications. Its versatility, performance, and availability make the NVIDIA L40S an attractive GPU for accelerating the most demanding workloads. Talk to our team at SabrePC to configure your next deep learning and AI server with the NVIDIA L40S!


Tags: nvidia, gpu, data center, server


