LLM Inference – Professional GPU performance

Posted on August 22, 2024 by Jon Allman

Table of Contents

  • Introduction
  • Test Setup
  • GPU Performance
  • Final Thoughts

Introduction

As part of our goal to evaluate benchmarks for AI and machine learning tasks in general and LLMs in particular, today we’re sharing results from llama.cpp’s built-in benchmark tool across a number of GPUs in NVIDIA’s professional lineup. Because we were able to fold the llama.cpp Windows CUDA binaries into a benchmark series we were already running for other purposes, this round of testing only includes NVIDIA GPUs, but we do intend to include AMD cards in future benchmarks.

If you’re interested in how NVIDIA’s consumer GPUs performed using this benchmark and system configuration, see our companion article, LLM Inference – Consumer GPU performance. However, it’s worth mentioning that maximizing performance or price-to-performance is not typically the main reason someone would choose a professional GPU over a consumer-oriented model. The primary value propositions of both NVIDIA’s and AMD’s pro-series cards are improved reliability (in terms of both hardware and drivers), higher VRAM capacity, and designs better suited to multi-GPU configurations. If raw performance is your main deciding factor, then outside of multi-GPU configurations, a top-end consumer GPU is almost always going to be the better option.


Test Setup

Test Platform

CPU: AMD Ryzen Threadripper PRO 7985WX 64-Core
CPU Cooler: Asetek 836S-M1A 360mm Threadripper CPU Cooler
Motherboard: ASUS Pro WS WRX90E-SAGE SE
BIOS Version: 0404
RAM: 8x Kingston DDR5-5600 ECC Reg. 1R 16GB (128GB total)
GPUs: NVIDIA RTX 6000 Ada 48GB
      NVIDIA RTX 5000 Ada 32GB
      NVIDIA RTX 4500 Ada 24GB
      NVIDIA RTX 4000 Ada 20GB
      NVIDIA RTX A6000 48GB
Driver Version: 552.74
PSU: Super Flower LEADEX Platinum 1600W
Storage: Samsung 980 Pro 2TB
OS: Windows 11 Pro 23H2 Build 22631.3880

Llama.cpp build 3140 was utilized for these tests, using CUDA version 12.2.0, and Microsoft’s Phi-3-mini-4k-instruct model in 4-bit GGUF. Both the prompt processing and token generation tests were performed using the default values of 512 tokens and 128 tokens respectively with 25 repetitions apiece, and the results averaged.
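
For readers who want to run a comparable test, the settings described above can be sketched as a command line. This is a minimal illustration, not the exact invocation used for this article: the binary name `llama-bench`, the model filename, and the helper `build_bench_command` are assumptions, and the binary name in particular may differ between llama.cpp builds. The `-p`, `-n`, and `-r` options set the prompt length, generation length, and number of repetitions.

```python
import shlex


def build_bench_command(model_path, binary="llama-bench",
                        n_prompt=512, n_gen=128, repetitions=25):
    """Return the argument list for a llama.cpp benchmark run.

    Defaults mirror the settings stated in the article: 512 prompt
    tokens, 128 generated tokens, 25 repetitions. The binary name is
    an assumption and may vary by llama.cpp build.
    """
    return [
        binary,
        "-m", str(model_path),   # path to the GGUF model file
        "-p", str(n_prompt),     # prompt processing test length
        "-n", str(n_gen),        # token generation test length
        "-r", str(repetitions),  # repetitions per test
    ]


if __name__ == "__main__":
    # Hypothetical model filename, used for illustration only.
    cmd = build_bench_command("Phi-3-mini-4k-instruct-q4.gguf")
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before passing it to something like `subprocess.run`.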

GPU Performance

Prompt processing chart for Pro GPUs

Starting with the prompt processing portion of the benchmark, the Ada GPU results are not particularly surprising, with the RTX 6000 Ada achieving the top result and the RTX 4000 Ada posting the lowest score. It’s interesting to see that the older RTX A6000 is essentially dead even with the RTX 4500 Ada, despite nominally being a much higher-end model.

Once we dig into the cards’ specifications (table below), the picture becomes clearer. The prompt processing results track closely with the cards’ FP16 performance, which is determined almost entirely by the number of tensor cores and the tensor core generation of each GPU. So ultimately, prompt processing appears to be constrained by the compute performance of the GPU rather than by other factors such as memory bandwidth.

GPU            FP16 (TFLOPS)   Tensor Core Count   Tensor Core Generation
RTX 6000 Ada   91.06           568                 4th
RTX 5000 Ada   65.28           400                 4th
RTX 4500 Ada   39.63           240                 4th
RTX A6000      38.71           336                 3rd
RTX 4000 Ada   26.73           192                 4th
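
To make the comparison concrete, the FP16 figures from the specification table above can be normalized against the fastest card. The short sketch below uses only the table's values; the function names are illustrative. It shows the RTX A6000 landing within about 2.5% of the RTX 4500 Ada on paper, in line with the near-identical prompt processing results.

```python
# FP16 throughput (TFLOPS) as listed in the specification table.
FP16_TFLOPS = {
    "RTX 6000 Ada": 91.06,
    "RTX 5000 Ada": 65.28,
    "RTX 4500 Ada": 39.63,
    "RTX A6000": 38.71,
    "RTX 4000 Ada": 26.73,
}


def relative_to_best(specs):
    """Express each card's value as a fraction of the highest value."""
    best = max(specs.values())
    return {gpu: value / best for gpu, value in specs.items()}


def rank(specs):
    """Return GPU names ordered from highest to lowest value."""
    return sorted(specs, key=specs.get, reverse=True)


if __name__ == "__main__":
    shares = relative_to_best(FP16_TFLOPS)
    for gpu in rank(FP16_TFLOPS):
        print(f"{gpu:<14} {shares[gpu]:6.1%}")
```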
Token generation chart for Pro GPUs

In contrast to the prompt processing results, we find that token generation scales more closely with the GPUs’ memory bandwidth (table below) than with tensor core count. Although the RTX 6000 Ada is still the clear winner, the older RTX A6000 moves up into second place, ahead of the Ada models that outperformed it during the prompt processing phase of the benchmark. However, comparing the RTX A6000 and the RTX 5000 Ada shows that memory bandwidth is not the only factor determining token generation performance. Although the RTX 5000 Ada has only 75% of the memory bandwidth of the RTX A6000, it still achieves 90% of the older card’s performance. This indicates that compute performance still plays a role during token generation, just not to the same degree as during prompt processing.

GPU            Memory Bandwidth (GB/s)
RTX 6000 Ada   960
RTX A6000      768
RTX 5000 Ada   576
RTX 4500 Ada   432
RTX 4000 Ada   360
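
The 75% figure quoted above follows directly from the bandwidth table, and the same numbers give a rough feel for why bandwidth matters during generation. The sketch below is a back-of-the-envelope illustration, not a measurement: it assumes that generating each token requires reading all model weights from VRAM once, and the model size of 2.4 GB is a hypothetical value chosen for illustration, not a figure from this article. The resulting ceilings are theoretical upper bounds and say nothing about the measured results.

```python
# Memory bandwidth (GB/s) as listed in the table.
MEMORY_BANDWIDTH_GBS = {
    "RTX 6000 Ada": 960,
    "RTX A6000": 768,
    "RTX 5000 Ada": 576,
    "RTX 4500 Ada": 432,
    "RTX 4000 Ada": 360,
}

# Hypothetical size of a 4-bit quantized model file, for illustration only.
ASSUMED_MODEL_SIZE_GB = 2.4


def bandwidth_ratio(gpu, reference, specs=MEMORY_BANDWIDTH_GBS):
    """Return gpu's memory bandwidth as a fraction of reference's."""
    return specs[gpu] / specs[reference]


def bandwidth_bound_tokens_per_s(bandwidth_gbs, model_size_gb):
    """Theoretical ceiling on tokens/s if each token reads all weights once."""
    return bandwidth_gbs / model_size_gb


if __name__ == "__main__":
    ratio = bandwidth_ratio("RTX 5000 Ada", "RTX A6000")
    print(f"RTX 5000 Ada vs RTX A6000 bandwidth: {ratio:.0%}")
    for gpu, bw in MEMORY_BANDWIDTH_GBS.items():
        ceiling = bandwidth_bound_tokens_per_s(bw, ASSUMED_MODEL_SIZE_GB)
        print(f"{gpu:<14} ceiling ~{ceiling:.0f} tokens/s (assumed model size)")
```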

Final Thoughts

This benchmark highlights an important point: there are several GPU specifications to consider when deciding which GPU or GPUs are most appropriate for use with LLMs, and VRAM capacity should not be the only one. A lot of emphasis is placed on maximizing VRAM, which is certainly an important variable, but it’s also important to consider the performance characteristics of that VRAM, notably the memory bandwidth. Furthermore, beyond the specifications of the VRAM, it’s still important to consider the raw compute performance of the GPUs in order to get a more holistic view of how the cards stack up against each other.

This is only the beginning of our LLM testing, and we plan to do much more in the future. Larger models, multi-GPU configurations, AMD and Intel GPUs, and model training are all on the horizon. If there is anything else you would like us to report on, please let us know in the comments!

Tags: GPU, LLM, NVIDIA, RTX 4000 Ada, RTX 4500 Ada, RTX 5000 Ada, RTX 6000, RTX 6000 Ada
© Copyright 2024 - Puget Systems, All Rights Reserved.