If you’ve been online in the past few months, chances are you’ve seen social media posts, news articles, and videos about AI models and the content they can generate. OpenAI’s brainchild, ChatGPT, is by far the most popular. Its chat-style interface lets users ask questions, instruct the model to perform tasks, write code, and even produce anything from a single sentence to an entire article!
Apart from text-based models like ChatGPT, there’s also been a surge in more visual AI models that can generate images from a prompt (like Midjourney, Stable Diffusion, DALL-E, etc.). You can ask these models to create digital art based on a prompt like this: “a beautiful girl on the streets of Paris, playing guitar”.
With one of these models, here’s the result we got with that prompt: (generated image)
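As a rough illustration of how such a prompt reaches a locally hosted model, here’s a minimal sketch using Hugging Face’s diffusers library. This is an assumption about one possible setup, not how the image above was necessarily produced; it requires `pip install diffusers torch` and a CUDA GPU, and the model ID is one publicly available checkpoint.

```python
# Sketch: text-to-image with a locally hosted diffusion model via the
# Hugging Face `diffusers` library (setup details are assumptions).
def generate_image(prompt: str, out_path: str = "out.png") -> None:
    # Imports live inside the function so this file loads even when the
    # heavy dependencies aren't installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,  # half precision roughly halves VRAM use
    )
    pipe.to("cuda")
    image = pipe(prompt).images[0]  # the pipeline returns PIL images
    image.save(out_path)

# Usage:
# generate_image("a beautiful girl on the streets of Paris, playing guitar")
```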
Understandably, gaining access to such powerful tools has turned entire industries on their head. While no AI model is anywhere close to truly replicating a human when it comes to art, graphics, writing, and creativity in general, many professions are starting to see an impact.
However, AI-generated content is still mostly used as a crutch for producing generic content (text, code snippets, images, or graphics), as these models can be very confidently incorrect about a variety of subjects. In fact, Google’s Bard AI chatbot made a factual error in its very first demo!
That said, these tools are only going to become more useful as time passes. We can already see newer AI models arriving that improve on previous versions.
VRAM and AI Models: How Much Do You Need?
Some AI tools available today stand out thanks to their open-source nature – granting users, businesses, and organizations the ability to host these models locally on their own hardware, without the privacy and security concerns of sending data to a cloud service.
While large businesses rarely lack the infrastructure or funding to self-host these models, individuals and professionals who want to leverage AI face more of a challenge. One of the main reasons an AI model can run painfully slowly, or not at all, on otherwise capable hardware is its VRAM requirement.
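To build some intuition for why VRAM is the bottleneck, a back-of-the-envelope estimate helps: just holding a model’s weights in memory takes roughly parameter count × bytes per parameter. The sketch below applies that rule of thumb; the SDXL parameter figure is approximate, and real usage is higher once activations, attention buffers, and the other model components are counted.

```python
def min_weight_vram_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough lower bound on VRAM (GB) just to hold model weights.

    bytes_per_param: 2 for fp16/bf16, 4 for fp32.
    Actual usage is higher: activations and intermediate buffers
    add on top of this floor.
    """
    return num_params * bytes_per_param / 1024**3

# SDXL's base UNet is roughly 2.6 billion parameters (approximate figure):
print(f"{min_weight_vram_gb(2.6e9):.1f} GB")  # prints 4.8 GB
```

Even before generating a single pixel, the fp16 weights alone eat a sizable chunk of an 8 GB card, which is why capacity matters as much as raw speed.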
Stable Diffusion XL Minimum Requirements
One such example is the newly released Stable Diffusion XL (SDXL) model from Stability AI. The company describes it as “the most advanced” release to date.
It can now generate improved faces, legible text, and more aesthetically pleasing art from shorter prompts. However, these enhanced capabilities come at a hardware cost: specifically, VRAM capacity and GPU performance.
So, what exactly do you need to enjoy SDXL’s improvements on your machine at home?
Stability AI recommends an Nvidia graphics card for this task, so we’ll use current- and previous-gen GeForce products to better understand what performance you can expect from them.
Although Stability AI asks for a minimum of 8 GB of VRAM in its press release, we wanted to scope out the effects of higher VRAM capacities.
Does simply toeing the minimum (or very slightly exceeding it) drastically impact performance, or can a more powerful GPU offset the lack of VRAM?
To answer that, we let our lab take a go at it to see how it runs on current-gen and previous-gen hardware. The data should allow you to make a more informed buying decision when shopping for your next graphics card.
SDXL GPU Benchmarks for GeForce Graphics Cards
For our tests, we’ll use three graphics cards: an RTX 4060 Ti 16 GB, an RTX 3080 10 GB, and an RTX 3060 12 GB.
First, let’s start with a simple art composition using default parameters to give our GPUs a good workout.
1024 x 1024

| Graphics Card    | VRAM Usage (GB) | Speed (sec.) |
| ---------------- | --------------- | ------------ |
| RTX 4060 Ti 16G  | 11.4            | 16.0         |
| RTX 3080 10G     | 9.7             | 65.1         |
| RTX 3060 12G     | 11.7            | 27.2         |
The results are probably surprising to those who usually only focus on gaming benchmarks.
The RTX 4060 Ti 16GB, with its 16GB VRAM buffer, easily outpaces the pack, completing the task in a quick 16 seconds. Following up in second place, thanks to its 12GB of VRAM, is the RTX 3060 12GB at 27.2 seconds – not chart-topping, but still respectable.
Unfortunately, the lack of VRAM on the RTX 3080 means its raw horsepower is rendered useless with a very slow time of 65.1 seconds! So, a modern RTX 4060 Ti 16GB obliterates a high-end previous-gen RTX 3080 with a ~4x faster image generation time.
Let’s up the ante a bit, shall we? For the next test, we’ll be trying LoRA.
LoRA or Low-Rank Adaptation techniques allow you to fine-tune Stable Diffusion models on specific art styles or characters. However, this becomes more taxing on your VRAM, so let’s see how our contenders perform here.
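In practice, applying a LoRA to a local SDXL setup can look something like the sketch below, using the diffusers library’s `load_lora_weights` method. The checkpoint ID and the LoRA file name are placeholders/assumptions, not the specific files used in our tests.

```python
# Sketch: applying a LoRA on top of an SDXL pipeline with `diffusers`.
# Model ID and LoRA path are illustrative assumptions.
def load_pipeline_with_lora(lora_path: str):
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    )
    # A LoRA adds small low-rank weight deltas to the base model's layers,
    # which is why it costs extra VRAM on top of the base model itself.
    pipe.load_lora_weights(lora_path)
    return pipe.to("cuda")

# Usage:
# pipe = load_pipeline_with_lora("my_style_lora.safetensors")
# image = pipe("a cybergirl portrait").images[0]
```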
Let’s use LoRA to generate a ‘Cybergirl’ piece of art to find out how much disparity there can be between a graphics card with just enough VRAM and one with plenty to spare.
1024 x 1024 + LoRA

| Graphics Card    | VRAM Usage (GB) | Speed (sec.) |
| ---------------- | --------------- | ------------ |
| RTX 4060 Ti 16G  | 15.5            | 17.0         |
| RTX 3080 10G     | 9.6             | 98.8         |
| RTX 3060 12G     | 11.5            | 26.8         |
Here, the RTX 3080 is easily outclassed by the 60-class cards featuring more VRAM. The RTX 4060 Ti 16GB again tops the pack, taking just 17 seconds to generate the image, while the RTX 3080 lags behind at a snail-like 98.8 seconds.
Let’s make it a bit tougher for the 60-class cards now with some additional conditions using ControlNet.
First, what exactly is ControlNet? Simply put, it’s a neural network model that you can use to further control and fine-tune Stable Diffusion compositions (outputs). It lets you tell Stable Diffusion that you’re providing a clear reference to the design you want by adding more conditions to the outputs, further refining the result to more closely match what you need.
Head to the ControlNet GitHub page for more details and documentation on how to set it up!
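For a concrete idea of what adding ControlNet to an SDXL workflow involves, here’s a hedged sketch using diffusers. The model IDs are examples of publicly available checkpoints (assumptions on our part), and a real setup also needs a preprocessor, such as edge detection, to turn a reference photo into the control image.

```python
# Sketch: conditioning SDXL on a reference image with ControlNet via
# `diffusers`. Checkpoint IDs are illustrative assumptions.
def make_controlnet_pipeline():
    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

    # The ControlNet is a separate network loaded alongside the base model,
    # which is part of why this workload demands even more VRAM.
    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    return pipe.to("cuda")

# Usage (control_image would be e.g. a Canny edge map of your reference):
# pipe = make_controlnet_pipeline()
# image = pipe("a cybergirl portrait", image=control_image).images[0]
```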
1024 x 1024 + LoRA + ControlNet

| Graphics Card    | VRAM Usage (GB) | Speed (sec.) |
| ---------------- | --------------- | ------------ |
| RTX 4060 Ti 16G  | 15.2            | 48.7         |
| RTX 3060 12G     | 11.5            | 89.2         |
The competition is now much closer, with the RTX 3080 nearly closing the gap with the RTX 4060 Ti 16 GB and finally beating the RTX 3060 12 GB outright. However, even in this compute-heavy scenario, the RTX 4060 Ti 16GB ends up on top by a slim margin.
Now, let’s try throwing in some upscaling. Can our 60-class contenders still keep up with the RTX 3080’s considerable upscaling horsepower? For these tests, we’ll use the Real Enhanced Super-Resolution Generative Adversarial Network, also known by its somewhat more digestible acronym: R-ESRGAN 4x+.
1024 x 1024 upscale x2

| Graphics Card    | VRAM Usage (GB) | Speed (sec.) |
| ---------------- | --------------- | ------------ |
| RTX 4060 Ti 16G  | 10.8            | 5.5          |
| RTX 3060 12G     | 10.4            | 7.8          |
For a 1024x1024 image upscaled to 2x, the RTX 4060 Ti 16 GB outpaces both the RTX 3080 and the RTX 3060 12 GB, taking just 5.5 seconds to finish.
By comparison, the RTX 3080 took 56% more time, while the RTX 3060 12GB took just over 41% more time than the RTX 4060 Ti 16 GB to complete the upscaling task.
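The “% more time” figures in these comparisons are simply relative slowdowns computed from the raw timings; a quick sanity check in Python, using the RTX 3060 12GB’s 7.8 s against the RTX 4060 Ti 16GB’s 5.5 s:

```python
def pct_more_time(slower_s: float, faster_s: float) -> float:
    """How much longer (in percent) the slower run took vs. the faster one."""
    return (slower_s / faster_s - 1) * 100

# RTX 3060 12GB vs RTX 4060 Ti 16GB on the 2x upscale:
print(f"{pct_more_time(7.8, 5.5):.1f}")  # prints 41.8
```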
1024 x 1024 upscale x4

| Graphics Card    | VRAM Usage (GB) | Speed (sec.) |
| ---------------- | --------------- | ------------ |
| RTX 4060 Ti 16G  | 10.5            | 10.0         |
| RTX 3060 12G     | 10.4            | 12.38        |
The gulf between the RTX 4060 Ti 16 GB and the competition narrows further with a 4x upscaling task using the R-ESRGAN 4x+ upscaler. Now, the RTX 3080 takes only 30% more time than the RTX 4060 Ti 16 GB, while the RTX 3060 12GB is within spitting distance of the RTX 3080. Nonetheless, the RTX 4060 Ti 16GB still retains the lead.
As you can see from these last couple of results, the more intensive the upscaling workload, the closer the RTX 3080 inches to the 60-class competition.
The Best Value Graphics Card for Stable Diffusion XL
When it comes to AI models like Stable Diffusion XL, having more than enough VRAM is important. From the testing above, it’s easy to see how the RTX 4060 Ti 16GB is the best-value graphics card for AI image generation you can buy right now.
You can head to Stability AI’s GitHub page for more information about SDXL and its other diffusion models.