
Why is My New GPU Running Slower? Troubleshooting NVIDIA RTX 6000 Ada

March 5, 2025 • 5 min read


Introduction

When upgrading to the latest NVIDIA GPUs, you expect better performance. But what if your new system doesn't seem to measure up, and your brand-new GPUs appear to be underperforming your older, dustier card?

Performance discrepancies can make it look like a hardware problem when the real culprit lies elsewhere. In this post, we walk through a real-world troubleshooting case in which a customer experienced slower-than-expected performance on their new workstation, and how we helped resolve the issue.

The Issue: New GPUs, Slower Performance?

A customer recently upgraded to a workstation equipped with four NVIDIA RTX 6000 Ada Generation GPUs, expecting a significant performance boost over their older system running NVIDIA RTX 3090 GPUs. However, they reported that the new system seemed slower and that GPU temperatures were higher, reaching 87°C, compared to their previous setup, which never exceeded 78°C. Concerned about possible overheating or hardware limitations, they reached out for support.

Initial Troubleshooting: Thermal Throttling or Something Else?

Our first thought was thermal throttling, since higher temperatures are a natural concern. However, workstation-class RTX Ada GPUs are designed to manage their thermal limits carefully to maintain a longer life span, and they can operate at higher thermal limits than the customer's previous Ampere-based RTX 3090s; 87°C is within the expected range.

The higher temperatures can also be attributed to the four-GPU RTX 6000 Ada configuration versus the previous RTX 3090 setup. More GPUs stacked close together can absolutely contribute to an uptick in operating temperature.

To dig deeper, we ran the nvidia-smi -a command in the operating system's shell (this works the same way in a Linux shell, Windows PowerShell, or the Windows Command Prompt). Surprisingly, the output revealed that the GPUs were power-constrained:

GPU 00000000:01:00.0
    SW Power Cap : Active
    SM           : 1515 MHz
    Memory       : 9500 MHz
GPU 00000000:2E:00.0
    SW Power Cap : Active
    SM           : 690 MHz
    Memory       : 9500 MHz
GPU 00000000:41:00.0
    SW Power Cap : Active
    SM           : 1215 MHz
    Memory       : 9500 MHz
GPU 00000000:61:00.0
    SW Power Cap : Active
    SM           : 1245 MHz
    Memory       : 9500 MHz
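
For scripted monitoring, the same information can be read from Python through NVML. Below is a minimal sketch, assuming the nvidia-ml-py (pynvml) package is installed; it prints each GPU's SM clock, power draw versus enforced limit, and whether the SW Power Cap throttle reason is active:

import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    if isinstance(name, bytes):   # older pynvml releases return bytes
        name = name.decode()
    sm_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000        # NVML reports milliwatts
    limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000
    reasons = pynvml.nvmlDeviceGetCurrentClocksThrottleReasons(handle)
    capped = bool(reasons & pynvml.nvmlClocksThrottleReasonSwPowerCap)
    print(f"GPU {i} ({name}): SM {sm_clock} MHz, "
          f"{power_w:.0f}/{limit_w:.0f} W, SW Power Cap active: {capped}")
pynvml.nvmlShutdown()

If SW Power Cap shows as active, the clocks are being held below their maximum, which matches what we saw in the nvidia-smi output above.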

Root Cause: Software Mismatch

We can see that the “SW Power Cap” field is “Active,” which suggests that something in software is limiting the GPUs’ power draw and, as a result, their performance. Upon further investigation, we discovered two key issues:

  1. Software Optimization – The customer was running the same code they had previously used on their RTX 3090 GPUs without adjusting for the architectural differences of the RTX 6000 Ada.
  2. Outdated Software Stack – Their PyTorch installation was built against the older CUDA 11.7, which predates official support for the Ada Lovelace architecture; our customer needed at least CUDA 11.8 (a quick way to verify this is shown below).
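
As a quick sanity check, it is worth confirming which CUDA version a PyTorch build was compiled against and whether it recognizes the new cards. The sketch below is a minimal example; the exact version strings will depend on the installed wheel:

import torch

print(torch.__version__)                    # installed PyTorch version
print(torch.version.cuda)                   # CUDA version the wheel was built against
print(torch.cuda.is_available())            # should be True
print(torch.cuda.get_device_name(0))        # name of the first GPU
print(torch.cuda.get_device_capability(0))  # Ada Lovelace GPUs report (8, 9)

If torch.version.cuda reports 11.7 or older, reinstalling a PyTorch build targeting CUDA 11.8 or newer is the first step.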

While the software was still functional, these mismatches significantly impacted performance, preventing the system from achieving its full potential.

The Solution: Optimizing for New Generations of GPUs

We recommended the following steps to resolve the issue:

  • Updating PyTorch and CUDA: Upgrading to a newer version of PyTorch built against a newer CUDA release allowed for better GPU utilization. If our customer had used a system cloning tool like Acronis, they likely carried over older driver versions as well. A new GPU installation should always be followed by fresh drivers and updated software.
  • Optimizing Code for Newer Generations: Adjusting workload parameters and fine-tuning settings to align with the new hardware architecture. The NVIDIA RTX 3090 is built on the Ampere architecture, whereas the RTX 6000 Ada uses the newer Ada Lovelace generation, which benefits from different code optimizations.
  • Optimizing Code for Multi-GPUs: Going from a single GPU to multiple GPUs requires changes to the code to utilize all GPUs effectively; how the workload is distributed across them matters greatly (see the sketch after this list).
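
To illustrate the last two points, here is a minimal multi-GPU training sketch using PyTorch's DistributedDataParallel, with TF32 enabled and bf16 autocast in the training step. The model, batch size, and learning rate are placeholders, and the script assumes it is launched with torchrun so that one process drives each GPU:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK for each spawned process (one per GPU)
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Allow TF32 matmuls, an easy win on Ampere and Ada tensor cores
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

    # Placeholder model; DDP keeps gradients synchronized across GPUs
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for step in range(100):
        x = torch.randn(64, 1024, device=local_rank)
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            loss = model(x).square().mean()   # stand-in for a real loss
        optimizer.zero_grad()
        loss.backward()                       # DDP all-reduces gradients here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launching it with torchrun --nproc_per_node=4 train_ddp.py starts one worker per RTX 6000 Ada; the script name here is arbitrary.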

Once these optimizations were implemented, the customer reported that not only did performance match their previous system, but the new workstation significantly outperformed their older RTX 3090 setup.


Key Takeaways

Performance troubleshooting can be complex, especially when upgrading to a new architecture. While it's easy to assume the hardware is at fault, proper software tuning and compatibility checks are just as critical to getting the most out of your system. Here are some key takeaways:

  • Higher GPU temperatures aren't always a sign of a problem – Although the temperatures rose, workstation-class GPUs are designed to operate at higher thermal thresholds. Temperature is an important factor to watch, but it isn't always the root cause.
  • Check Software Optimization – Running outdated or unoptimized code can result in severe performance bottlenecks. At SabrePC, we see this all the time when benchmarking new GPU architectures on not-yet-updated software.
  • Update Versions with New Hardware – Ensure you're using the latest drivers, CUDA versions, and machine learning frameworks to fully utilize new hardware.

If you're experiencing similar GPU performance issues, ensure your software stack is up-to-date and optimized for your hardware. Need expert guidance? SabrePC is here to help.


Tags

gpu

troubleshoot

performance

computer hardware


