This blog covers the NVIDIA L40S: an introduction to the GPU, its features, its performance, and its price.
NVIDIA L40S Introduction
The Most Potent All-purpose GPU
Experience the NVIDIA L40S GPU’s revolutionary multi-workload performance. The next generation of data centre workloads, ranging from generative AI and large language model (LLM) inference and training to 3D graphics, rendering, and video, will be powered by the L40S GPU, which combines powerful AI computation with best-in-class graphics and media acceleration.
Global data centre system manufacturers are working with NVIDIA to accelerate industrial digitalization and generative AI.
Future NVIDIA OVX Servers from Dell, Hewlett Packard Enterprise, Lenovo, Supermicro, and others will have new NVIDIA GPUs to speed up training and inference as well as graphics-intensive tasks.
All-around Performance
- 1,466 TFLOPS for Tensor Performance
- 212 TFLOPS for RT Core Performance
- 91.6 TFLOPS for Single-Precision Performance
Features
Fourth-Generation Tensor Cores
Driven by the Fourth-Generation Tensor Cores of the NVIDIA Ada Lovelace Architecture
Hardware support for structural sparsity and an optimized TF32 format provides out-of-the-box performance gains for faster AI and data science model training. DLSS accelerates AI-enhanced graphics, upscaling resolution and improving performance in select applications.
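To make structural sparsity concrete, here is a minimal Python sketch. It assumes the 2:4 pattern (at most two non-zero values in every group of four weights) that NVIDIA describes for its sparse Tensor Cores. The helper name `prune_2_of_4` is hypothetical, and keeping the two largest magnitudes is just one simple pruning rule, not NVIDIA's own tooling.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four.

    Illustrative only: `prune_2_of_4` is a hypothetical helper, not an NVIDIA API.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes in this group of four.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse_row = prune_2_of_4(row)
print(sparse_row)
```

After pruning, exactly half of the values in every group of four are zero. That regular pattern is what the hardware can exploit; a real workflow would normally fine-tune the model after pruning to recover accuracy.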
Third-Generation RT Cores
Ray-tracing performance is improved by increased throughput and simultaneous ray-tracing and shading capabilities, which speeds up renders for workflows in architecture, engineering, and construction as well as product design. Stunning real-time animations and hardware-accelerated motion blur allow you to see realistic designs in action.
CUDA Cores
Accelerated single-precision floating point (FP32) throughput and improved power efficiency greatly enhance performance for workflows such as computer-aided engineering (CAE) simulation and 3D model development. Mixed-precision workloads can also use the GPU's BF16 16-bit math capabilities.
Transformer Engine
Transformer Engine significantly speeds up AI performance and improves memory utilization for both training and inference. Using the Ada Lovelace fourth-generation Tensor Cores, it scans the layers of transformer-architecture neural networks and automatically recasts between FP8 and FP16 precision to deliver faster training and inference.
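One way to picture the precision recasting is as choosing a per-tensor scale so that values fit inside the narrow FP8 range before they are converted. The sketch below is a simplified illustration under stated assumptions: it assumes the FP8 E4M3 format, whose largest finite value is 448, and the helper names `fp8_scale` and `fits_fp8` are hypothetical. It is not the Transformer Engine API, which applies its own scaling recipes on the GPU.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def fp8_scale(values):
    """Return a scale factor that maps the largest magnitude onto the FP8 maximum.

    Hypothetical helper for illustration; not the Transformer Engine API.
    """
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 1.0  # nothing to scale in an all-zero tensor
    return FP8_E4M3_MAX / amax


def fits_fp8(values, scale):
    """Check that every scaled value stays within the FP8 E4M3 range."""
    return all(abs(v * scale) <= FP8_E4M3_MAX for v in values)


activations = [0.02, -1.5, 3.5, 0.75]
scale = fp8_scale(activations)
print(scale, fits_fp8(activations, scale))
```

A layer whose values cannot be scaled into this range without losing too much precision is the kind of case where falling back to FP16 makes sense.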
Efficiency and Security
The L40S GPU was created, constructed, tested, and supported by NVIDIA to guarantee optimal performance, robustness, and uptime for enterprise data centre operations that run around the clock. The L40S GPU adds an extra degree of security to data centres by meeting the most recent data centre standards, being Network Equipment-Building System (NEBS) Level 3 ready, and having secure boot with root of trust technology.
DLSS 3
With NVIDIA DLSS 3, the L40S GPU delivers smoother frame rates and very fast rendering. This frame-generation technology uses deep learning together with the latest hardware in the Ada Lovelace architecture and the L40S GPU, including fourth-generation Tensor Cores and an Optical Flow Accelerator, to improve rendering performance, deliver higher frames per second (FPS), and significantly reduce latency.
Multi-Workload Acceleration
Generative AI
Create fresh material, insights, and new services
The L40S GPU offers up to 5X better inference performance than the previous-generation NVIDIA A40, thanks to its next-generation AI, graphics, and media acceleration capabilities. Its performance and 48 gigabytes (GB) of memory capacity make the L40S a strong platform for accelerating multimodal generative AI workloads.
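As a rough way to reason about the 48 GB figure, the sketch below estimates how much memory a model's weights need at different precisions. The model sizes are hypothetical examples, the helper names are made up for illustration, and the estimate covers weights only; activations, KV cache, and framework overhead would add to it.

```python
GPU_MEMORY_GB = 48  # L40S memory capacity quoted in the text

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_footprint_gb(num_params, precision):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_on_one_gpu(num_params, precision, memory_gb=GPU_MEMORY_GB):
    """True if the weights alone fit; ignores activations and KV cache."""
    return weight_footprint_gb(num_params, precision) <= memory_gb


for params in (7e9, 13e9, 70e9):  # hypothetical model sizes
    for precision in ("fp16", "fp8"):
        size = weight_footprint_gb(params, precision)
        fits = fits_on_one_gpu(params, precision)
        print(f"{params / 1e9:.0f}B {precision}: {size:.0f} GB, fits={fits}")
```

Under these assumptions, a 13-billion-parameter model in FP16 needs about 26 GB for its weights and fits on a single card, while a 70-billion-parameter model does not fit even in FP8.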
LLM Training and Inference
Accelerate AI training and inference workloads
Modern LLMs and generative AI models can be trained and served more quickly thanks to the AI compute capabilities of fourth-generation Tensor Cores with FP8 support.
Graphics in 3D and Rendering
NVIDIA RTX graphics enable high-fidelity creative operations
Third-generation RT Cores provide up to two times the real-time ray-tracing performance of the previous generation. They power the creation of striking visual content and high-fidelity creative workflows, from interactive rendering to real-time virtual production.
NVIDIA Omniverse
Develop and run programs for the metaverse
NVIDIA Omniverse makes it possible to connect, build, and run the next generation of industrial digitalization applications. Thanks to its strong RTX graphics and AI capabilities, the L40S GPU provides outstanding performance for 3D and simulation workflows based on Omniverse and Universal Scene Description (OpenUSD).
NVIDIA OVX L40S
Scalable Data Centre Architecture for Superior AI and Graphics Performance
NVIDIA OVX L40S offers industry-leading performance to drive enterprise transformation with generative AI when paired with NVIDIA Spectrum-X Ethernet technology and NVIDIA AI Enterprise software.
L40S Price
- United Kingdom: The GPU costs £6,883.90 (VAT excluded) from Converge UK.
- European Union: €27,565.00 is the price listed by SHI International.
- United States: prices range from $7,329.99 to $23,583.00.
  - Bitworks lists the GPU at $7,329.99.
  - SHI International lists it at $23,583.00.
Retailer pricing schemes, import levies, and regional taxes are some of the causes of these discrepancies. It’s best to check with authorized local shops in your area or official NVIDIA channels for the most up-to-date and accurate pricing.
Specifications
NVIDIA L40S GPU
GPU Architecture | NVIDIA Ada Lovelace Architecture |
GPU Memory Bandwidth | 864GB/s |
GPU Memory | 48 GB GDDR6 with ECC |
Display Connectors | 4 x DP 1.4a |
Form Factor | 4.4″ (H) x 10.5″ (L) Dual Slot |
Thermal | Passive |
vGPU Software Support | NVIDIA vPC/vApps, NVIDIA RTX Virtual Workstation (vWS) |
vGPU Profiles Supported | See the Virtual GPU Licensing Guide |
NVENC / NVDEC | 3x / 3x (includes AV1 encode and decode) |
Secure Boot with Root of Trust | Yes |
NEBS Ready | Yes / Level 3 |
Power Connector | 1x PCIe CEM5 16-pin |
FP32 | 91.6 teraFLOPS |
TF32 Tensor Core | 366 teraFLOPS (with sparsity) |
FP16 Tensor Core | 733 teraFLOPS (with sparsity) |
FP8 Tensor Core | 1,466 teraFLOPS (with sparsity) |
RT Core Performance | 212 teraFLOPS |
Max Power Consumption | 350W |
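The figures in the table can be combined into a few derived metrics that are handy when comparing GPUs. The sketch below is a back-of-the-envelope calculation using the peak values listed above. The helper names are hypothetical, and real workloads rarely reach peak throughput, so treat the results as upper bounds.

```python
# Peak figures copied from the specification table above.
SPECS = {
    "fp32_tflops": 91.6,
    "fp8_tflops": 1466.0,
    "memory_bandwidth_gb_s": 864.0,
    "max_power_w": 350.0,
}


def tflops_per_watt(tflops, watts):
    """Peak throughput per watt of board power."""
    return tflops / watts


def flops_per_byte(tflops, bandwidth_gb_s):
    """Peak FLOPs available per byte read from GPU memory.

    Kernels that perform fewer operations per byte than this ratio are
    limited by memory bandwidth rather than by compute.
    """
    return (tflops * 1e12) / (bandwidth_gb_s * 1e9)


fp32_efficiency = tflops_per_watt(SPECS["fp32_tflops"], SPECS["max_power_w"])
fp32_intensity = flops_per_byte(SPECS["fp32_tflops"], SPECS["memory_bandwidth_gb_s"])
print(f"FP32: {fp32_efficiency:.3f} TFLOPS/W, {fp32_intensity:.1f} FLOPs per byte")
```

With these inputs, FP32 works out to roughly 0.26 TFLOPS per watt and about 106 FLOPs per byte of memory traffic.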