Introduction
A few weeks ago, we published an article looking at the performance of various consumer GPUs with the updated version of our PugetBench for DaVinci Resolve benchmark, which features a completely overhauled test suite. Not only does this new benchmark version greatly expand the number of video codecs we test (RED, BRAW, ARRIRAW, Cinema RAW Light ST, and Sony X-OCN), but we also added more GPU-based OpenFX tests. In addition, based on the work Blackmagic Design has been doing to integrate AI into DaVinci Resolve, we added a whole suite of tests looking at the various AI features in Resolve, including Super Scale, Face Refinement, Magic Mask, Video Stabilization, Smart Reframe, and more!
Today, we want to expand on that previous article and examine performance with various Professional GPUs from both NVIDIA and AMD, as well as take the opportunity to see how the new features in DaVinci Resolve respond in up to a 4x multi-GPU configuration.
This testing uses the exact same application, benchmark, and motherboard driver/BIOS versions as our consumer GPU article, so if you wish to compare the performance between consumer and professional GPUs, you can directly compare the results in these two articles. However, we do want to point out that for professional GPUs, price-to-performance (and even pure top-end performance) is often not the primary driving factor for why you would use one of these cards over something like an NVIDIA GeForce or AMD Radeon card. For both NVIDIA and AMD, the main value adds for their professional GPUs are increased reliability (both in terms of hardware and drivers), higher VRAM capacity, and significantly better designs for multi-GPU configurations. Performance is, of course, a consideration, but outside of multi-GPU configurations, if pure performance is your primary concern, a consumer-grade GPU is almost always going to be a better fit.
We will note that Blackmagic has a public beta of their upcoming DaVinci Resolve v19 available, but we are going to stick with the full release 18.6.6 version. The beta has a number of performance improvements, but we have seen fairly large performance shifts across different beta builds in the past and prefer to wait until a new version has been fully released before using it in our testing.
Test Setup
Test Platform
CPU: AMD Ryzen Threadripper PRO 7975WX
CPU Cooler: Asetek 836S-M1A 360mm
Motherboard: ASUS Pro WS WRX90E-SAGE SE (BIOS version 0404)
RAM: 8x 16 GB DDR5-5600 (128 GB total), running at 5200 MT/s
PSU: Super Flower LEADEX Platinum 1600W
Storage: Samsung 980 Pro 2TB
OS: Windows 11 Pro 64-bit (22631)
Benchmark Software
DaVinci Resolve Studio 18.6.6.7
PugetBench for DaVinci Resolve 1.0.0
To test how Resolve performs with different GPUs, we built up a standard testbed intended to minimize the potential for bottlenecks in the system outside of the GPU. We are using the 32-core AMD Ryzen Threadripper PRO 7975WX and 128 GB of DDR5 memory, which should give us the best overall performance in our DaVinci Resolve benchmark suite.
For each GPU, we are using the latest available driver version as of August 1, 2024. We ran the full “Extended” benchmark preset, and will largely focus on the Overall, RAW, GPU Effects, and AI scores, as those are the ones most affected by the GPU.
We tested most cards in a single GPU configuration but included up to four NVIDIA RTX 6000 Ada cards to see how the various aspects of Resolve scale with more GPUs. We do want to point out that using four of these GPUs is pushing things quite a bit from a power and cooling standpoint. In most of our workstations, we limit anything above an RTX 4500 Ada to triple GPU configurations unless we are utilizing dual PSUs and very specific cooling setups.
For each GPU, we went through the full “neural engine optimization” process, which we have previously tested and found to provide a massive performance boost for the AI tests.
Raw Benchmark Results
While we will discuss our testing in detail in the next sections, we often provide the raw results for those who want to dig into the details. If you tend to work with a specific codec or perform a specific task in your workflow, examining the raw results will be much more applicable than our more general analysis.
Overall GPU Performance
Since we are not focusing on any GPU in particular, we colored the top GPU models from NVIDIA (including the dual, triple, and quad configurations) and AMD in green to help make it obvious how each brand stands in terms of maximum performance. While we did not test multi-GPU configurations from AMD in this article, we expect that they would see similar scaling to what we measured with NVIDIA.
Starting off, we are going to take a look at the Overall Score from the Extended benchmark preset. This score factors in the results for all the tests, including both those that are GPU- and CPU-focused. Because of that, there isn’t as large of a performance difference between each GPU model as what we will see in some of the following sections, but it is an accurate representation of what kind of overall performance increase you can expect from different GPUs.
The first thing we want to discuss is AMD vs. NVIDIA GPU performance. Overall, NVIDIA delivers the highest performance, with the RTX 6000 Ada beating the Radeon PRO W7900 by about 21%. The 6000 Ada is a much more expensive GPU, however, with an MSRP of $6,800 compared to the $4,000 MSRP of the W7900. In terms of price-to-performance, AMD and NVIDIA are effectively tied at this level, with the W7900 and RTX 5000 Ada scoring almost exactly the same at the same $4,000 MSRP.
Going down the stack a bit, the AMD Radeon PRO W7800 is slightly more expensive than the NVIDIA RTX 4500 Ada ($2,500 vs $2,250), but NVIDIA comes out on top with about 8% higher performance. Below that, the AMD and NVIDIA GPUs we tested don’t have an equivalent model, as the RTX 4000 Ada is $1,250, while the Radeon PRO W7600 and W7500 are $600 and $430, respectively.
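The price-to-performance comparison above can be illustrated with a quick calculation. In this sketch, the MSRPs come from the article, but the Overall scores are hypothetical placeholders chosen only to match the percentage gaps reported above (the actual scores are in the raw results):

```python
# MSRPs are from the article; "score" values are HYPOTHETICAL placeholders
# scaled to match the reported gaps (~21% for 6000 Ada over W7900, a tie
# between the 5000 Ada and W7900, ~8% for 4500 Ada over W7800).
gpus = {
    "RTX 6000 Ada":     {"msrp": 6800, "score": 12100},
    "RTX 5000 Ada":     {"msrp": 4000, "score": 10000},
    "Radeon PRO W7900": {"msrp": 4000, "score": 10000},
    "RTX 4500 Ada":     {"msrp": 2250, "score": 9180},
    "Radeon PRO W7800": {"msrp": 2500, "score": 8500},
}

for name, g in gpus.items():
    # Score points per dollar of MSRP
    print(f"{name:18s} ${g['msrp']:>5} {g['score'] / g['msrp']:.2f} points/$")
```

Even with placeholder scores, the pattern matches the article's point: the flagship RTX 6000 Ada delivers the lowest points-per-dollar, while the mid-range cards offer noticeably better value.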
The last thing we want to call out specifically is how well multi-GPU scaling works. Going from one RTX 6000 Ada to two increased performance by 11%, but there is little additional gain beyond that. That doesn't sound terribly impressive, but keep in mind that this score includes tasks that don't use the GPU at all, and there are plenty of GPU-based tasks that don't scale with multiple cards. In fact, some tests are significantly worse with multiple GPUs, which pulls down the Overall score quite a bit. We will dive more into this in the following sections.
Fusion GPU Performance
Fusion is not typically something we look at when examining GPU performance, but since we are doing an in-depth analysis, we wanted to go ahead and include it this time around. In general, Fusion doesn't scale that well with more powerful GPUs, since most of it is actually CPU-limited. The big surprise here is that AMD takes a pretty solid lead with the Radeon PRO W7800 and W7900. This is much higher performance from AMD than what we saw in the consumer GPU article, so there is something about AMD's Pro cards (either at the hardware or driver level) that works very well with Fusion.
What we really want to point out here is how poorly Fusion performs with multiple GPUs. As you can see in the primary chart above, a single RTX 6000 Ada does fairly well, but performance actually gets lower as you add more cards, to the point that a $27,200 (for the GPUs alone) quad RTX 6000 Ada configuration performs about the same as a $430 Radeon PRO W7500.
What is interesting is that not everything in Fusion has this issue with multiple GPUs. If you look at chart #2, we split out the results for each test individually and charted them according to how the performance compared to a single GPU configuration. In four of our six Fusion tests, there was a sizable performance loss with multiple cards, to the point that in one case, four cards delivered only 6%(!) of the performance of a single GPU. On the other hand, one of the tests (3D Lower Thirds, which tests a number of the built-in templates) shows no difference with multiple GPUs, while the “Phone Composite UHD” test (which does planar tracking and keying) shows very respectable performance gains with two GPUs, but no higher performance with three or four cards.
We recommend keeping this in mind if you are considering using multiple GPUs for the other aspects of Resolve that scale well with more cards. Some users heavily utilize Fusion, so they will want to stick to a single GPU configuration. Others don’t use it much and may be just fine with this behavior if they can improve performance for some of the tasks we will be looking at below.
RAW Codec Processing GPU Performance
Next, we have RAW codec processing. This measures how well the system works with RAW codecs like RED, BRAW, ARRIRAW, X-OCN, and Cinema RAW Light. In general, these codecs perform better with a more powerful GPU, although they can start to become CPU-limited at times.
Between NVIDIA and AMD, NVIDIA holds the top spot with the RTX 6000 Ada. However, just like with the Overall score, AMD and NVIDIA are right on par in terms of price and performance with the W7900 and RTX 5000 Ada. That gives AMD a slight lead since the W7900 has 48GB of VRAM compared to the 32GB on the 5000 Ada, but unless you are working with 12K or higher timelines, that probably isn’t a significant factor.
As far as GPU scaling goes, having two cards does help in general, but having more GPUs than that is of limited use. What is interesting is that this modest improvement is partly due to how Resolve behaves with different types of RAW codecs. If you switch to chart #2, we pulled out the RAW Processing results and charted them according to the performance with a single RTX 6000 Ada to give you an idea of how GPU scaling behaves with the different types of codecs.
Specifically, RED and Cinema RAW Light don’t see any benefit from multiple GPUs. BRAW and X-OCN show a very decent ~40-60% performance improvement with two GPUs, but no further benefit (or even a slight performance degradation) with a third card. ARRIRAW, on the other hand, shows a modest 10% improvement with two GPUs, which increased to about 40% faster with a third card, but there is again no benefit to having a fourth GPU.
In other words, while multi-GPU configurations can help when working with RAW footage, the exact benefit varies greatly depending on the RAW codec you are using, and generally is only beneficial with two (or for ARRIRAW, three) GPUs.
GPU Effects (OpenFX) GPU Performance
Next, we have performance when processing GPU Effects such as Noise Reduction, Optical Flow (Enhanced), Lens Blur, and general Color Correction. This type of workload in Resolve is often used as the poster child for how much a more powerful GPU can impact performance for video editing workflows in general, as well as to examine real-world multi-GPU scaling.
Between AMD and NVIDIA, it is similar to what we saw in the last section. AMD does very well with the Radeon PRO W7900, slightly beating the RTX 5000 Ada by a few percent, although the difference is small enough that you likely wouldn’t notice it in the real world. The extra VRAM on the W7900 (48GB vs 32GB), however, may push some users to go with AMD at this price point. NVIDIA still holds the top spot with the RTX 6000 Ada, in this case out-performing the RTX 5000 Ada and Radeon PRO W7900 by about 50%.
In addition to the decent scaling between individual GPU models, these effects also scale very nicely with multiple GPUs. Across all the GPU Effects tests, we saw 1.5x performance going from one GPU to two, 2.2x going from one to three, and 2.6x going from one to four. That is pretty good on its own, but if you look at chart #2, you can see that some effects have near-perfect scaling. For example, Sharpen, Lens Blur, and spatial noise reduction all scale terrifically, making multiple GPUs well worth the investment if you use them regularly. Others, like general color nodes and temporal noise reduction, still see a benefit, but to a lesser degree.
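One way to put those multipliers in perspective is scaling efficiency: the measured speedup divided by the GPU count (so perfect linear scaling is 100%). Using the average GPU Effects speedups reported above:

```python
# Average GPU Effects speedups, relative to a single RTX 6000 Ada,
# taken from the numbers reported in this section.
speedup = {1: 1.0, 2: 1.5, 3: 2.2, 4: 2.6}

for n, s in speedup.items():
    efficiency = s / n  # fraction of ideal (linear) scaling
    print(f"{n} GPU(s): {s:.1f}x speedup, {efficiency:.0%} of ideal scaling")
```

By that measure, GPU Effects retain 75% efficiency with two cards and still 65% with four, which is unusually good multi-GPU behavior for a real-world application.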
AI Features GPU Performance
New in our updated PugetBench for DaVinci Resolve 1.0 benchmark, we have added tests looking at a wide range of AI features in DaVinci Resolve. These include features that are processed during playback and export, like Super Scale, Person Mask, Depth Map, Relight, and Optical Flow (Speed Warp), as well as several that we call “Runtime” features, which are processed a single time. These include Audio Transcription, Video Stabilization, Smart Reframe, Magic Mask Tracking, and Scene Cut Detection.
For everything else so far, our recent benchmark update has expanded and improved the tests, but we already had a general idea of how different GPUs perform and how well they scale with multiple cards. For these AI tests, however, we have done in-depth testing in the earlier consumer GPU article, but now we can see how the Pro GPUs hold up and if there is any benefit to a quad GPU setup.
First, note that NVIDIA has a pretty large advantage – especially with the Ada generation GPUs. Even the RTX 4000 Ada performs well above the Radeon PRO W7900, and the performance from NVIDIA only goes up from there. The only real surprise from NVIDIA is how exceptional the modern Ada cards are for these AI features, with even the RTX 4000 Ada beating the top-end GPU from the previous generation – the NVIDIA RTX A6000.
Multi-GPU scaling is also respectable, although, like many of the previous sections, it varies based on the individual test. Overall, going from one GPU to two gives a 17% performance boost, while having three GPUs is 25% faster than a single card, and four cards is 28% faster. There are even more nuances here than in the previous sections, however, as not many of the AI features can take advantage of multiple cards.
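Those overall AI gains happen to fit Amdahl's law quite well. As a rough sketch (treating the AI workload as having a single fixed serial fraction, which is a simplification), we can estimate the parallelizable fraction from the 2-GPU result and check the prediction against the 3- and 4-GPU results:

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n),
# where p is the parallelizable fraction of the workload.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

s2 = 1.17  # reported 2-GPU speedup for the AI tests overall
# Solving speedup(2) = s2 for p gives: p = (1/s2 - 1) / (1/2 - 1)
p = (1.0 / s2 - 1.0) / (1.0 / 2 - 1.0)
print(f"estimated parallel fraction: {p:.0%}")
for n in (3, 4):
    print(f"predicted {n}-GPU speedup: {amdahl_speedup(p, n):.2f}x")
```

The fitted parallel fraction of roughly 29% predicts about 1.24x and 1.28x for three and four GPUs, closely matching the measured 25% and 28% gains, which suggests the AI test suite as a whole behaves as if only about a third of its work can be spread across cards.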
If you look at the individual AI render tests (chart #2 above), you will notice that only about half of those tests scale well with multiple GPUs. Super Scale, Face Refinement, and Person Mask (Better) benefit from up to four GPUs, but Person Mask (Faster), Depth Map (Faster), and Relight only benefit from a second GPU. Depth Map (Better) doesn’t seem to scale much at all, and Optical Flow (50% Speed Warp) oddly sees a benefit to having three GPUs, while dual GPUs are no faster than a single card. That last one is especially odd, but lines up exactly with what we saw in our earlier 1-3x GPU testing with NVIDIA GeForce RTX 4090 cards.
We also have individual results for the AI runtime tests (chart #3). These are much easier to discuss, as there is effectively no benefit to having more than a single GPU, suggesting that none of them currently have multi-GPU support.
There are obviously still some optimizations to be done with these AI features, but having multiple GPUs can be beneficial in some cases. Super Scale and Face Refinement are prime examples, with terrific scaling with dual, triple, and even quad GPU configurations. We have seen some interesting improvements already being made in the public beta for DaVinci Resolve Studio 19, and are looking forward to that version getting released to see how it will affect performance with these GPUs!
What is the Best Professional GPU for DaVinci Resolve Studio 18.6?
We covered quite a bit in this article, and if you are interested in any specific aspect of DaVinci Resolve (GPU Effects, AI, RAW Codec processing, etc.), we highly recommend reading that specific section. However, there are some generalizations we can make about what the best professional GPU for DaVinci Resolve Studio 18.6 is.
First, if you are picking between NVIDIA and AMD, they are actually very close at equivalent price points. The NVIDIA RTX 5000 Ada 32GB and AMD Radeon PRO W7900 48GB are the best examples, as they have exactly the same $4,000 MSRP. For that price, they have almost exactly the same overall score, although AMD and NVIDIA both have areas where they are stronger. For AMD, this comes primarily from Fusion, where the W7900 was ~15% faster than the 5000 Ada. AMD is also slightly faster for LongGOP and IntraFrame processing, although we didn’t look at those too closely in this article as those are primarily CPU-based tests that apparently see a bit of an advantage due to Resolve running in OpenCL mode. On the other hand, the NVIDIA RTX 5000 Ada was a large 1.7x the performance of the W7900 for AI features.
If you are looking for the absolute best performance, nothing on the market today can beat the NVIDIA RTX 6000 Ada. Overall, the 6000 Ada is about 20% faster than the RTX 5000 Ada or AMD’s highest-end Radeon PRO W7900, and in GPU-heavy workloads like GPU-based effects, it can be 50-60% faster. Since NVIDIA currently enjoys a significant performance lead for AI-based tasks in Resolve, the 6000 Ada is also a massive 2.25x the performance of the AMD Radeon PRO W7900 for those tasks.
Some of the most interesting results in our testing came from the multi-GPU scaling tests with one, two, three, and four RTX 6000 Ada GPUs. This is something we have looked at recently in our consumer GPU article, but this gives us another data point, plus scaling data for quad GPU configurations. In general, the OpenFX (GPU Effects) in Resolve tend to scale nicely with multiple GPUs, often benefiting from even quad GPU configurations. Processing of RAW codecs and the AI features, on the other hand, varies depending on the specific codec or feature. Some scale well, others not at all. Fusion is the other wrench that gets thrown in, as the majority of the tests in Fusion actually see a (sometimes massive) loss of performance with multiple cards.
Whether using multiple GPUs is a good idea or not is going to come down to exactly what it is you do. If you work with ARRIRAW and tend to use a lot of OpenFX or want to use Super Scale, then multiple cards will give you a terrific boost to performance. Doing basic color grading and editing on RED footage, or even worse, lots of work in the Fusion tab, on the other hand… not so much.
If you are looking for a single answer for what professional GPU is best for DaVinci Resolve Studio 18.6, the NVIDIA RTX 6000 Ada is easily the top pick. It is very pricey, however, and at more reasonable budget levels, the latest AMD Radeon PRO W7000-series and NVIDIA RTX Ada series are very similar. We would give NVIDIA a slight edge due to its much higher performance in the AI tasks that seem to be a big focus for Blackmagic right now, but AMD does benefit from having more VRAM per dollar, and it does take the performance lead in some specific workloads.
If you need a powerful workstation for content creation, the Puget Systems workstations on our solutions page are tailored to excel in various software packages. If you prefer to take a more hands-on approach, our custom configuration page helps you to configure a workstation that matches your exact needs. Otherwise, if you would like more guidance in configuring a workstation that aligns with your unique workflow, our knowledgeable technology consultants are here to lend their expertise.