Nvidia GeForce RTX 4000 GPU Series Detailed

Table Of Contents

8K Finally
More Transistors
More SKUs Incoming
Other Announcements
Price
Lineup
Performance
DLSS 3.0
Some Thoughts on Ada Architecture
Scalping & Price Increases

Nvidia introduced the new RTX 4000 series of GPUs with the RTX 4090. In addition, the company revealed a new RTX platform and tools, as well as a remastered version of Portal for the RTX platform. The new GPU design is built on the 4N manufacturing process, resulting in chips with up to 76 billion transistors and 90 teraflops of computing capability. The ray-triangle intersection throughput of the RT cores has also been doubled. Meanwhile, the new tensor cores support FP8 and power DLSS 3.0, which can now use a new AI technique to generate whole frames rather than just upscaling pixels. According to Nvidia, DLSS 3.0 can increase gaming performance by up to a factor of four compared to traditional rendering, and the company demonstrated the technique multiplying the frame rate of Microsoft Flight Simulator.

The RTX 4090 will be available on October 12 for $1,599. This is $100 higher than the RTX 3090, its predecessor.

When Nvidia launched its Ada Lovelace family of graphics processing units earlier this week, it mostly highlighted the AD102 GPU and the GeForce RTX 4090 graphics card. It did not provide a great deal of information on its AD103 and AD104 graphics chips. Today, Nvidia posted its Ada Lovelace whitepaper, which offers an abundance of information about the upcoming GPUs and fills in several gaps. We’ve updated our RTX 40-series coverage with the latest information, but here’s a summary of the most noteworthy new details.

8K Finally

We already know that AD102 is a 608 mm² GPU with 76.3 billion transistors, 18,432 CUDA cores, and 96MB of L2 cache. AD103 is a 378.6 mm² graphics processor with 45.9 billion transistors, 10,240 CUDA cores, and 64MB of L2 cache. AD104 has a die size of 294.5 mm², 35.8 billion transistors, 7,680 CUDA cores, and 48MB of L2 cache.

GPU/Graphics Card	Full AD102	RTX 4090	RTX 4080 16GB	RTX 4080 12GB	RTX 3090 Ti
Architecture	AD102	AD102	AD103	AD104	GA102
Process Technology	TSMC 4N	TSMC 4N	TSMC 4N	TSMC 4N	Samsung 8LPP
Transistors (Billion)	76.3	76.3	45.9	35.8	28.3
Die size (mm^2)	608	608	378.6	294.5	628.4
Streaming Multiprocessors	144	128	76	60	84
GPU Cores (Shaders)	18432	16384	9728	7680	10752
Tensor Cores	576	512	304	240	336
Ray Tracing Cores	144	128	76	60	84
TMUs	576	512	304	240	336
ROPs	192	176	112	80	112
L2 Cache (MB)	96	72	64	48	6
Boost Clock (MHz)	?	2520	2505	2610	1860
TFLOPS FP32 (Boost)	?	82.6	48.7	40.1	40.0
TFLOPS FP16 (FP8)	?	661 (1321)	390 (780)	319 (639)	320 (N/A)
TFLOPS Ray Tracing	?	191	113	82	78.1
Memory Interface (bit)	384	384	256	192	384
Memory Speed (GT/s)	?	21	22.4	21	21
Bandwidth (GB/s)	?	1008	717	504	1008
TDP (watts)	?	450	320	285	450
Launch Date	?	Oct 12, 2022	Nov 2022?	Nov 2022?	Mar 2022
Launch Price	?	$1,599	$1,199	$899	$1,999
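
Most of the unit counts in the table follow from the number of streaming multiprocessors. As a rough sketch, assuming the per-SM resources Nvidia describes for these GPUs (128 CUDA cores, 4 tensor cores, 1 RT core, and 4 texture units per SM), the counts can be derived as follows. The function name and dictionary layout are purely illustrative.

```python
# Per-SM resources for Ada (and GA102-class Ampere) GPUs.
# Assumption: values taken from Nvidia's public SM descriptions.
PER_SM = {"cuda_cores": 128, "tensor_cores": 4, "rt_cores": 1, "tmus": 4}


def units_from_sms(sm_count: int) -> dict[str, int]:
    """Scale the per-SM resources by the number of enabled SMs."""
    return {name: per_sm * sm_count for name, per_sm in PER_SM.items()}


# SM counts as listed in the table above.
sm_counts = {
    "Full AD102": 144,
    "RTX 4090": 128,
    "RTX 4080 16GB": 76,
    "RTX 3090 Ti": 84,
}

for card, sms in sm_counts.items():
    print(card, units_from_sms(sms))
```

ROPs and L2 cache are not tied to the SM count in the same way, so they are left out of this sketch.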

In its whitepaper, Nvidia notes that Ada Lovelace GPUs use high-speed transistors in critical paths to raise maximum clock rates. As a result, the fully enabled AD102 GPU with 18,432 CUDA cores is, according to the whitepaper, capable of running at clocks above 2.5 GHz while keeping the same 450W TGP. In light of this, it is no surprise that the company has reached 3.0 GHz on the GeForce RTX 4090 (16,384 CUDA cores) in its labs. At 3.0 GHz, the GeForce RTX 4090 would almost certainly top our list of the best graphics cards currently available.

In addition to high clock speeds, Nvidia’s Ada Lovelace GPUs have huge L2 caches that boost performance in compute-intensive applications (e.g., ray tracing, path tracing, simulations) and reduce memory bandwidth requirements. Essentially, Nvidia’s Ada GPUs take a leaf out of AMD’s RDNA 2 Infinity Cache playbook here, but we think that the new architecture’s main goals were set long before the 2020 launch of AMD’s Radeon RX 6000-series devices.

In the realm of supercomputers, simulations are run using double-precision floating-point (FP64) numbers to increase the accuracy of the results. FP64 is more expensive than FP32 in terms of both performance and hardware complexity. This is why FP32 formats are used for computer graphics and why many less critical simulations are performed at FP32 precision. Accordingly, the AD102 GPU has just 288 FP64 cores (two per streaming multiprocessor), enough to ensure that programs containing FP64 code, including FP64 Tensor Core code, run correctly.

As a result, the FP64 rate of AD102 is 1/64th the TFLOP rate of FP32 operations (which is in line with the Ampere architecture). Nvidia does not show its FP64 cores in diagrams of its streaming multiprocessor (SM) modules, nor does it reveal the number of such cores in its AD103 and AD104 GPUs. The low FP64 rate of Ada graphics processors highlights the fact that these parts are mainly intended for gaming.
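
The 1/64 ratio falls straight out of the core counts. A minimal check, assuming each FP64 and FP32 core completes the same number of operations per clock, so that the clock speed cancels out of the ratio:

```python
from fractions import Fraction

fp32_cores = 18432  # CUDA cores in a full AD102
fp64_cores = 288    # two FP64 cores per SM x 144 SMs

ratio = Fraction(fp64_cores, fp32_cores)
print(ratio)  # 1/64
```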

More Transistors

The differences in complexity and die size between Nvidia’s Ada Lovelace and Ampere GPUs are not unexpected. The new Ada GPUs are manufactured on TSMC’s 4N (5nm-class) process, while Ampere was made on Samsung Foundry’s 8LPP (a 10nm-class node with a 10% optical shrink) process. This extra complexity (transistor count) is what enables the performance gains in areas such as ray tracing and DLSS 3.0.

GPU/Graphics Card	AD102	RTX 4090	RTX 4080 16GB	RTX 4080 12GB	RTX 3090 Ti
GPU	AD102	AD102	AD103	AD104	GA102
TFLOPS FP32 (Boost)	?	82.6	48.7	40.1	40.0
TFLOPS FP16 (FP8)	?	661 (1321)	390 (780)	319 (639)	320 (N/A)
TFLOPS Ray Tracing	?	191	113	82	78.1

In addition, the AD102 GPU has a higher transistor density than its smaller siblings. On the one hand, this roughly 3.6% increase in transistor density helps AD102 fit many more execution units than its smaller counterparts. On the other hand, the lower transistor density of AD103 and AD104 offers improved yields (provided the node’s defect density is not generally excessive) and, in many cases, higher clocks.
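
The density gap can be estimated from the transistor counts and die sizes quoted earlier. This is only a back-of-the-envelope sketch using the rounded figures from this article, so the result lands at roughly 3 to 4 percent depending on which die-size rounding is used. The helper name is illustrative.

```python
# (transistors in billions, die area in mm^2), as quoted in this article
chips = {
    "AD102": (76.3, 608.0),
    "AD103": (45.9, 378.6),
    "AD104": (35.8, 294.5),
}


def density_mtr_per_mm2(transistors_billion: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000 / area_mm2


densities = {name: density_mtr_per_mm2(*spec) for name, spec in chips.items()}
for name, density in densities.items():
    print(f"{name}: {density:.1f} MTr/mm^2")

# Relative density advantage of AD102 over AD103
advantage = densities["AD102"] / densities["AD103"] - 1
print(f"AD102 vs AD103: {advantage:.1%}")
```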

It is difficult to predict the frequency potential of AD103 and AD104 without real hardware or information about their actual yields. However, since AD102 can operate at 2.50 GHz to 3.0 GHz, it is logical to assume that AD103 and AD104 are capable of even higher speeds. We also know that the RTX 4080 12GB uses a fully enabled AD104 chip running at 2610 MHz, while the RTX 4080 16GB uses 95% of an AD103 chip (76 of 80 SMs) running at 2505 MHz and the RTX 4090 uses just 89% of AD102 (128 of 144 SMs) running at 2520 MHz, with 25% of the L2 cache disabled.
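
The enablement percentages above are simple ratios of active to total SMs. A small sketch using the counts quoted in this article (the function name is illustrative):

```python
def enabled_fraction(enabled: int, total: int) -> float:
    """Share of a chip's SMs that are active on a given card."""
    return enabled / total


rtx_4080_16gb = enabled_fraction(76, 80)   # AD103
rtx_4090 = enabled_fraction(128, 144)      # AD102
l2_enabled_mb = 96 * (1 - 0.25)            # full AD102 L2 with 25% disabled

print(f"RTX 4080 16GB: {rtx_4080_16gb:.0%} of AD103's SMs")
print(f"RTX 4090: {rtx_4090:.0%} of AD102's SMs, {l2_enabled_mb:.0f} MB of L2")
```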

The combination of a large number of execution units, enabled by a high transistor count, and a fast clock rate should result in considerable performance gains. At 82.6 TFLOPS, the GeForce RTX 4090 has a peak theoretical FP32 compute rate more than twice that of the GeForce RTX 3090 Ti (40 TFLOPS).
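
Peak FP32 throughput is normally computed as two floating-point operations (one fused multiply-add) per CUDA core per clock. A short sketch under that assumption, using the core counts and boost clocks from the table earlier in this article:

```python
def fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Peak FP32 rate, assuming one FMA (2 FLOPs) per core per clock."""
    return 2 * cuda_cores * boost_mhz * 1e6 / 1e12


rtx_4090 = fp32_tflops(16384, 2520)
rtx_3090_ti = fp32_tflops(10752, 1860)

print(f"RTX 4090:    {rtx_4090:.1f} TFLOPS")
print(f"RTX 3090 Ti: {rtx_3090_ti:.1f} TFLOPS")
print(f"Ratio:       {rtx_4090 / rtx_3090_ti:.2f}x")
```

These are theoretical peaks at the rated boost clock, so real-world gains depend on the workload.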

The current portfolio of Nvidia’s Ada GPUs for demanding gamers shows that the company’s three-chip strategy for the high-end gaming market is back on track. Typically, Nvidia releases its flagship gaming GPU, a chip with around 66% to 75% of the flagship’s resources (e.g., CUDA cores), and a graphics processor with approximately 50% of the flagship’s units. With the Ampere family, this plan was modified because Nvidia’s GA103 chip was primarily aimed at laptops and barely made it to desktops (it was also late to the party), but with the Ada generation, Nvidia is back to its three-chip approach.

More SKUs Incoming

The gap between the maximum configuration of the AD102 GPU and that of the GeForce RTX 4090 graphics card is telling. AD102 has 18,432 CUDA cores, while the GeForce RTX 4090 enables 16,384 of them. Such a strategy gives Nvidia extra flexibility in terms of future yields and the release of new graphics cards, leaving plenty of room for an RTX 4090 Ti, RTX 4080 Ti, and RTX 5500/5000 Ada Generation for ProViz markets, among others.

The GeForce RTX 4080 16GB and RTX 4080 12GB, meanwhile, use an almost complete AD103 and a fully enabled AD104 GPU, respectively. We cannot predict the future, but we think that AD103 and AD104 GPUs will eventually be cut down for lower-tier products. We can only speculate about the specs of a GeForce RTX 4070 Ti and/or RTX 4070 based on cut-down bins of the AD104 chip, as well as the possibilities for ultra-high-end laptop graphics solutions powered by the AD103 graphics processor.

Other Announcements

At this event, Nvidia updated RTX, AI, and Omniverse and announced new chips to power them. The keynote began with a demonstration of RacerX, a fully simulated demo with nothing pre-baked, running on a single GPU.

The demo showed several RC car races, with numerous Lego figures knocked over along the way. The ray tracing and simulation were presumably running on one of the new cards, given the mention of a single GPU. It was a lengthy demonstration, however, and it struggled to hold attention.

Nvidia then unveiled the Ada Lovelace architecture. Built on TSMC’s 4N process and paired with Micron GDDR6X memory, the flagship chip contains 76 billion transistors.

Additionally, a new streaming multiprocessor with shader execution reordering has been introduced to speed up RT computation, with performance gains of 2 to 3 times on the horizon. The 4090 aims to be twice as fast as the 3090 Ti, while the 4080 aims to be twice as fast as the 3080 Ti.

Nvidia is also bringing DLSS 3 to the latest GPUs. Instead of only upscaling pixels, it generates entirely new frames from successive rendered frames. Because the generated frames do not pass through the CPU, this also helps sidestep CPU bottlenecks. In the showcase, Cyberpunk’s frame rate rose from around 30 to 90.

Nvidia also showed how heavily Microsoft Flight Simulator is limited by the CPU when all features are enabled. In addition, RTX and DLSS 3 are coming to the renowned game Portal. Owners of the game will get the update as a free mod in November.

Price

We believed that the decline in cryptocurrency mining interest and the easing of the chip shortage would result in lower prices for future graphics cards.

Sadly, that does not seem to be the case. Nvidia’s GPUs are priced well above those of its competitors.

Following are the price tags:

RTX 4060 – $330 (unconfirmed)
RTX 4070 – $500 (unconfirmed)
RTX 4080 12GB – $899
RTX 4080 16GB – $1,199
RTX 4090 – $1,599

Moving to TSMC’s 4N process is unquestionably costly, but does it warrant a price increase of nearly 30% (from $700 to $899)? The 16GB version of the RTX 4080, in turn, costs about 33% more than the 12GB version.
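
For reference, the percentage increases can be worked out from the prices quoted in this article. A minimal sketch (the helper name is illustrative, and the $1,499 figure for the RTX 3090 is inferred from the $100 difference mentioned earlier):

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase from old price to new price."""
    return (new - old) / old * 100


print(f"$700 -> $899:     {pct_increase(700, 899):.0f}%")    # 28%
print(f"$899 -> $1,199:   {pct_increase(899, 1199):.0f}%")   # 33%
print(f"$1,499 -> $1,599: {pct_increase(1499, 1599):.0f}%")  # 7%
```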

In addition, the RTX 4090 costs $100 more than the RTX 3090. A Gigabyte GeForce RTX 4090 Gaming OC was spotted on Newegg for $1,699. By comparison, the Gaming X Trio version is priced at MSRP.

However, the price news for Europe is significantly worse. The RTX 4090 ranges in price from around 1999 EUR to 2569 EUR, depending on the vendor.

So, why are these costs so high?

Either the performance gain from RTX 3000 to RTX 4000 is so significant that it warrants a 50% price rise, or Nvidia priced these new GPUs higher to help clear its remaining RTX 3000 inventory.

It may sound like a conspiracy theory, but multiple sources indicate that Nvidia is making every effort to sell its RTX 3000 stock. The company reportedly even attempted to scale back its orders with TSMC in order to limit RTX 4000 supply.

Lineup

RTX 4000 Series specification table:

Specifications	RTX 3090	RTX 4090 Ti	RTX 4090	RTX 4080 16GB	RTX 4080 12GB	RTX 4070	RTX 4060
GPU Die	GA102	AD102	AD102	AD103	AD104	AD104	AD104
Process	Samsung 8nm	TSMC 5nm	TSMC 5nm	TSMC 5nm	TSMC 5nm	TSMC 5nm	TSMC 5nm
Base Clock Speed	1395 MHz	?	2230 MHz	2210 MHz	2310 MHz	2310 MHz	?
Boost Clock Speed	1695 MHz	?	2520 MHz	2505 MHz	2610 MHz	2610 – 2800 MHz	?
CUDA Cores	10496	18176	16384	9728	7680	7680	4608
Bus Width	384-bit	384-bit	384-bit	256-bit	192-bit	192-bit	192-bit
Memory	24GB GDDR6X	48GB GDDR6X	24GB GDDR6X	16GB GDDR6X	12GB GDDR6X	12GB GDDR6X	8GB GDDR6/X
Memory Speed	19.5Gbps	24Gbps	21Gbps	22.4Gbps	21Gbps	21Gbps	?
Bandwidth	936.2 GB/s	?	1008 GB/s	716.8 GB/s	504 GB/s	504 GB/s	?
L2 Cache	6MB	96MB	72MB	64MB	48MB	48MB	48MB
Total Board Power	350W	800W?	450W	320W	285W	285W	200W
Release Date	September 24, 2020	TBA 2023	October 12, 2022	November 2022	November 2022	TBA/Q4 2022	TBA/2022/23

RTX 4090 Ti

The details about Nvidia’s “real” flagship are still quite hazy. This AD102-based card, which has been referred to as the “Titan,” might be named the 4090 Ti or get an entirely different moniker.

This flagship SKU will reportedly have 18,176 CUDA cores and 48GB of GDDR6X memory operating at 24 Gbps. Its bandwidth would be well above 1 TB/s, given that the standard 4090 already offers about 1 TB/s.
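
Peak memory bandwidth follows from the per-pin data rate and the bus width. A quick sketch, assuming the rumored card keeps the 384-bit bus listed in the table above (the function name is illustrative):

```python
def bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin data rate x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


print(bandwidth_gb_s(21, 384))  # RTX 4090 at 21 Gbps: 1008.0
print(bandwidth_gb_s(24, 384))  # rumored 24 Gbps on a 384-bit bus: 1152.0
```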

RTX 4090

The GeForce RTX 4090 has 16,384 CUDA cores, compared to 10,496 on the GeForce RTX 3090. That is roughly a 56% increase in CUDA cores, and the card also boosts up to 2520 MHz, which is far higher than the 3090’s 1695 MHz.

That’s not the only thing that NVIDIA has increased, though. It is rated at 450W TDP, 100W more than its predecessor.

RTX 4080

Compared to the previous generation, the 4080 has roughly one thousand more CUDA cores than the 3080. Although some of its specifications are lower than its predecessor’s in certain respects, the much higher clock speed should give the upcoming 4080 a significant advantage.

This card will be formidable. The generational jump from the 20-series to the 30-series was primarily about making RTX work well and making 4K gaming practical.

According to sources, the Nvidia RTX 4080 will be cut down in some capacity, leaving the “full” processor for the anticipated Ti iteration.

Flight Simulator does not see a significant speed improvement on the RTX 4080 16GB, although this may be down to the CPU used. Meanwhile, the Cyberpunk 2077 result exceeds a threefold gain, ahead of the Darktide result.

The card is meant to be a premium piece of hardware that performs as advertised. It is unlikely to leave your PC for a very long time, since Nvidia’s DLSS and AMD’s FSR upscaling should extend its lifespan well beyond what was initially expected.

With DLSS 3, we expect these cards to last at least five or six years before needing to be replaced.

RTX 4070

The Nvidia RTX 4070 will not be the most powerful graphics card in terms of specifications, but if you’re searching for a mix between price and performance, it might be a good choice.

Similar to the GeForce RTX 3070, the green team’s next-gen GPU should match well with mid-range gaming PCs, and it may even convert your setup into a ray tracing powerhouse at a lower price than the RTX 4080.

An Nvidia GeForce RTX 4070 Ti may also exist, and the “full fat” AD104 processor is said to be able to compete with the RTX 3090 Ti. The model is also reportedly equipped with 12GB of GDDR6X memory, but may need up to 400W of power.

RTX 4060

The Nvidia RTX 4060 may be bound for the lightweight GPU ring, but it will almost certainly be one of the best graphics cards for budget gaming PCs. Although the card will ultimately replace the current-generation GeForce RTX 3060, it may outperform older high-end cards despite not being at the top of the RTX 4000 food chain.

The RTX 4060 will likely be less desirable than the RTX 4090 and RTX 4080, but supply should be plentiful. Nvidia has not yet announced official pricing for the GeForce RTX 4060; if the RTX 3060’s MSRP is any indication, it would cost around $329 USD. However, given that the RTX 4080 and RTX 4090 are priced above their previous-generation equivalents, it is logical to assume that the RTX 4060 will see a similar increase.

Gaming laptops with mobile variants of the GPU will obviously cost more, but current prices suggest that you may be able to get one for roughly $1,300 USD.

If the RTX 4060 uses an AD104 GPU, a fully enabled chip would offer 7,680 CUDA cores, a 192-bit bus, and 48MB of L2 cache, although a card at this tier would more likely use a cut-down configuration.

Performance

During the launch of the RTX 4090 and the two versions of the RTX 4080, Nvidia also released performance data.

In Cyberpunk 2077, the RTX 4090 is said to be at least 2x faster than the RTX 3090 Ti, and maybe more than 4x quicker.

However, note the footnote: these results use DLSS Frame Generation (DLSS 3.0), which purportedly doubles FPS. More on DLSS 3.0 below.

In an Overwatch test, for instance, the RTX 4090 is twice as fast as the RTX 3080, which is not exactly what Nvidia promised.

The new flagship GPU also seems to perform well in productivity activities, halving the time required to export 3D renderings and edit videos.

The RTX 4080 16GB is twice as fast as the RTX 3080 Ti and up to three times as fast in Cyberpunk 2077 with the new Ray Tracing Overdrive.

Again, DLSS Frame Generation is enabled. Cyberpunk 2077 will soon have a new “Overdrive Mode” for ray tracing, which seems to be the source of the enormous speed boost.

In certain “Next Generation” games, RTX 4000 series GPUs may provide up to four times the performance of GPUs from the previous generation. What does this entail for “normal” games without DLSS or ray tracing?

The RTX 4080 12GB struggles to match the FPS of the RTX 3090 Ti, while the 16GB model is only marginally quicker. The 12GB RTX 4080 is only around 10% quicker than its predecessor.

To understand the source of the claimed performance increases, we must discuss Nvidia’s upcoming DLSS 3.0 release.

DLSS 3.0

DLSS 3.0 is a significant improvement over DLSS 2.0, increasing FPS by up to fourfold. Unfortunately, only RTX 4000 (or newer) GPUs will support DLSS 3.0.

With an upgraded “Optical Flow Accelerator” in the new RTX GPUs and the addition of Optical Multi Frame Generation, Nvidia delivers significant FPS increases.

However, DLSS 3 will not be available on RTX 3000 and 2000 cards because, according to Nvidia, their Optical Flow Accelerators are not fast enough.
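
To see why frame generation matters most in CPU-limited games, consider a deliberately simplified toy model. This is not Nvidia’s algorithm: it assumes the rendered frame rate is capped by the slower of the CPU and GPU, that one generated frame is inserted per rendered frame, and that generation itself costs nothing. The function name and all numbers are hypothetical.

```python
def displayed_fps(cpu_fps: float, gpu_fps: float, frame_generation: bool) -> float:
    """Toy model: rendered rate is limited by the slower of CPU and GPU.

    With frame generation on, one extra frame is shown per rendered frame,
    and the generated frame is assumed to need no CPU work.
    """
    rendered = min(cpu_fps, gpu_fps)
    return rendered * 2 if frame_generation else rendered


# Hypothetical CPU-bound game: the CPU can prepare 60 frames per second,
# while the GPU could render 140.
print(displayed_fps(60, 140, frame_generation=False))  # 60
print(displayed_fps(60, 140, frame_generation=True))   # 120
```

In this model a faster GPU alone would not raise the frame rate past 60, whereas generated frames do, which is the effect Nvidia highlights in CPU-limited titles.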

There are continuing discussions as to whether these statements are legitimate or just a marketing ploy by Nvidia to entice people to purchase their newest GPUs.

Regardless, it is an attractive piece of technology; let’s see what it can achieve!

The recent ray tracing upgrade for Portal demonstrates a similar speed boost.

Some Thoughts on Ada Architecture

The design is named after the mathematician Ada Lovelace, who worked with Charles Babbage on his proposed mechanical general-purpose computer, the Analytical Engine.

The Ada Lovelace architecture is qualitatively and quantitatively superior to the Ampere design, according to Nvidia. Nvidia not only significantly improved the architectural performance of its ray tracing cores, tensor cores, and other units, but also increased their number and raised their clock speeds. The vastly expanded L2 caches of Ada GPUs are another significant improvement over Ampere.

TSMC’s 4N process technology, optimized for Nvidia GPUs, was largely responsible for these advances. In addition, the company used high-speed transistors to raise the frequency of its new graphics processors, resulting in significant performance improvements.

GeForce RTX 4080 and 4090 graphics cards are priced well above their immediate predecessors because of the cutting-edge manufacturing node and enormous die sizes.

So far, Nvidia has announced the GeForce RTX 4080 12GB, RTX 4080 16GB, and RTX 4090 graphics cards for desktops, together with the RTX 6000 Ada Generation for workstations and datacenters and L40 (Lovelace 40) boards for high-end workstations and virtualized workstation environments.

Given that the company can offer full-fat AD102 GPUs as well as cut-down versions of AD102, AD103, and AD104 GPUs, we can expect a large number of new GeForce RTX 40-series cards for client PCs and Ada RTX-series products for datacenters. Meanwhile, Nvidia is likely preparing smaller GPUs (AD106, AD107), so it seems that the Ada Lovelace product family will be at least as extensive as the Ampere series.

Scalping & Price Increases

Since the advent of the RTX 3000 series a few years ago, scalping has been a major phenomenon.

This trend drove up the price of new GPUs while drastically reducing their availability. The shortage persists in 2022, as it did in 2021 and 2020.

Even the top semiconductor producers in the world, TSMC and Samsung, struggle to meet demand.

You may now be wondering whether this kind of scalping will affect the RTX 4000 series. Unfortunately, we cannot be certain, but we do know that both TSMC and Samsung are investing to increase their production capacity.

But with the current economy and mining crash, I think there won’t be any problems getting GPUs.