In this corner, wearing the red shorts: Fury X
We’ve known about Fiji for months: AMD’s latest and greatest GPU, with HBM (High Bandwidth Memory) capable of an impressive 512GB/s of memory bandwidth. Couple that with 4096 shader units, up from 2816 in Hawaii’s 290X/390X, and on paper we can expect some impressive results. With GPU clock speeds being similar, in theory the Fury X ought to be somewhere around 35-45 percent faster than the newly released R9 390X. When we reviewed the 390X, we found that the 980 Ti typically outperformed that GPU by 25 percent, which meant there was a real chance AMD would emerge victorious and the Fury X would reign as the highest-performance desktop GPU. AMD even released some preliminary performance results showing its new GPU besting the 980 Ti and Titan X in a variety of games. AMD looked ready to cash in on enthusiasts waiting for the Next Big Thing™.
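That 35-45 percent expectation is simple arithmetic. Here’s a rough sketch of the math, assuming performance scales with shader count times clock speed, which it never quite does in practice:

```python
# Back-of-the-envelope scaling, assuming performance tracks shaders x clock
# (an idealization; real games rarely scale this linearly).
fury_x_shaders, fury_x_clock = 4096, 1050    # MHz
r9_390x_shaders, r9_390x_clock = 2816, 1050  # MHz

ideal_gain = (fury_x_shaders * fury_x_clock) / (r9_390x_shaders * r9_390x_clock) - 1
print(f"Ideal shader throughput gain over 390X: {ideal_gain:.1%}")  # ~45.5%
```

Knock that ideal figure down for real-world inefficiencies and you land in the 35-45 percent window.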
But a funny thing happened on the way to the bank. We received a card for benchmarking…sort of. The whole of Future US, which includes Maximum PC, PC Gamer, and TechRadar among others, received one Fury X for testing. We asked for a second, since our GPU testing is done at a different location, but to no avail. And we’re not alone—eTeknix for example reported last week that AMD was planning to sample only ten Fury X cards to the whole of Europe, which is awfully strange for a top-tier product. We have no idea how many Fury X GPUs actually went out to press, but unfortunately Maximum PC had to share with PC Gamer. (No worries—PC Gamer had to share 980 Ti launch hardware with us, so turnabout is fair play, right?)
Anyway, we had a small problem, since our GPU test bed is in a different state. We could have tried to overnight the card and then spend a frantic eight hours testing Fury X. In retrospect, this probably would have been best, but instead we did our best to put together a similar test bed at Future US HQ, eliciting benchmarking help from PC Gamer. The idea was to mirror our GPU test system, with parts as close to identical as possible. That didn’t quite work out; here are the two test systems.
Maximum PC Fury X and GPU Test Beds

Component | Fury X Test Bed | All Other GPUs
CPU | Intel Core i7-5960X (4.2GHz overclock) | Intel Core i7-5930K (4.2GHz overclock)
Mobo | ASRock Fatal1ty X99X | Gigabyte GA-X99-UD4
GPUs | AMD Radeon Fury X | Nvidia GeForce GTX Titan X; EVGA GeForce GTX 980 Ti ACX2.0; EVGA GeForce GTX 980 SC; Zotac GeForce GTX 970; EVGA GeForce GTX 960 ACX2.0; AMD Radeon R9 290X
SSD | Samsung 850 Pro 1TB | 2x Samsung 850 Evo 250GB
HDD | Seagate Barracuda 3TB 7200RPM | Seagate Barracuda 3TB 7200RPM
PSU | Corsair RM750 | EVGA SuperNOVA 1300 G2
Memory | Corsair Vengeance 32GB DDR4-2666 | G.Skill Ripjaws 16GB DDR4-2666
Cooler | Cooler Master Nepton 280L | Cooler Master Nepton 280L
Case | Cooler Master CM Storm Trooper | Cooler Master CM Storm Trooper
We ended up with an i7-5960X instead of i7-5930K, 4x8GB DDR4-2666 instead of 4x4GB RAM, and an ASRock motherboard instead of Gigabyte. But the CPU was still overclocked to 4.2GHz, so results should be pretty close to comparable. We also checked performance with Titan X in both systems, and other than some minor discrepancies at 1080p (where things like motherboard BIOS/firmware optimizations and CPU bottlenecks may be more apparent), the scores were within a couple of percent.
[Ed—The Fury X should be on its way to our GPU test labs by the time you read this; we’ll update performance as soon as we’re able. We don’t expect any significant changes, but should that occur we will make note of it as needed.]
Based on specifications and initial reports, we had high hopes for the Fury X topping the charts, though there were always a few concerns. The biggest is the memory configuration. HBM uses a silicon interposer, essentially a simple microchip in its own right, to route all the traffic from the HBM modules to the GPU. The interposer is required because each HBM module has a 1024-bit interface, which would be very difficult to route using traditional methods. The catch is that the silicon interposer has to be quite large: large enough in area for the GPU core along with the HBM modules. And Fiji is a big chip to begin with, meaning the interposer is effectively about as large as it’s possible to manufacture. The net result is that while in theory it should be possible to use two to eight HBM modules, space constraints limited AMD to four. And since each module for HBM 1.0 is 1GB, that means 4GB of total graphics memory—less than the new R9 390/390X as well as the GTX 980 Ti. If you happen to run games at settings and resolutions that exceed 4GB of VRAM use, performance could suffer.
AMD Fury X/390X and Nvidia GTX 980 Ti/980 Specs

Card | Fury X | GTX 980 Ti | R9 390X | GTX 980
GPU | Fiji | GM200 | Hawaii (Grenada) | GM204
GCN / DX Version | GCN 1.2 | DX12.1 | GCN 1.1 | DX12.1
Lithography | 28nm | 28nm | 28nm | 28nm
Transistor Count (Billions) | 8.9 | 8 | 6.2 | 5.2
Compute Units (SM) | 64 | 22 | 44 | 16
Shaders | 4096 | 2816 | 2816 | 2048
Texture Units | 256 | 176 | 176 | 128
ROPs | 64 | 96 | 64 | 64
Core Clock (MHz) | 1050 | 1000 | 1050 | 1216
Memory Capacity | 4GB | 6GB | 8GB | 4GB
Memory Clock (MHz) | 1000 | 1750 | 1500 | 1750
Bus Width (bits) | 4096 | 384 | 512 | 256
Memory Bandwidth (GB/s) | 512 | 336 | 384 | 224
TDP (Watts) | 275 | 250 | 275 | 165
Price | $649 | $649 | $429 | $499
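The bandwidth row falls straight out of bus width and effective transfer rate. A quick sanity check of the table’s numbers (the effective-rate multipliers are our assumption based on each memory type: HBM is double-pumped, GDDR5 is effectively quad-pumped):

```python
# Bandwidth (GB/s) = bus width (bits) x effective rate (MT/s) / 8 bits-per-byte / 1000
def bandwidth_gbps(bus_width_bits, effective_mts):
    return bus_width_bits * effective_mts / 8 / 1000

print(bandwidth_gbps(4096, 1000))  # Fury X HBM (500MHz x2):    512.0
print(bandwidth_gbps(384, 7000))   # 980 Ti GDDR5 (1750MHz x4): 336.0
print(bandwidth_gbps(512, 6000))   # 390X GDDR5 (1500MHz x4):   384.0
print(bandwidth_gbps(256, 7000))   # GTX 980 GDDR5 (1750MHz x4): 224.0
```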
So there’s the competitive landscape at the top of the pricing stack—we’ve left off Titan X, though it’s mostly the same as 980 Ti only with twice the GDDR5 and 3072 CUDA cores. AMD’s R9 390X takes on the GTX 980, and performance is generally competitive even if power requirements are not. Fury X meanwhile is going up against the best of the best, supposedly with memory bandwidth and compute performance to spare. But we can’t just trust theoretical performance and specifications; drivers and other elements can come into play. This is why we play the games, fight the fights, and run the benchmarks.
We’re running the same collection of tests as in our 980 Ti review, but we’re making a couple of additions. First, Grand Theft Auto V is an absolute beast to run at maximum settings. Last time we decided it would be interesting to max everything out, but it can really push GPUs with less than 6GB VRAM. We still have those numbers, and we’ll show them below, but for our averages we’ve turned off the Advanced Graphics settings—they’re nice, and perhaps future GPUs will enable us to run with those settings maxed out at 60 fps, but right now even the mighty Titan X struggles with such settings. Second, The Witcher 3 has been accused of having poorly optimized HairWorks code that “punishes” AMD GPUs; we’ll see if that accusation holds any merit in a moment, as we’ve also run the game at Ultra settings but without HairWorks. It potentially helps level the playing field, and it also improves frame rates across all hardware. Otherwise, the remaining games are run as before: Batman: Arkham Origins is maxed out with 4xMSAA but no PhysX, Hitman: Absolution runs at Ultra with 4xMSAA, Metro: Last Light maxes out all settings but leaves off SSAA and Advanced PhysX, Middle-Earth: Shadow of Mordor uses the Ultra preset, Tomb Raider runs the Ultimate preset, and Unigine Heaven 4.0 runs at Ultra quality with Extreme tessellation.
Before we get to the benchmarks, we also need to make note of a couple of final items. First, our reference GTX 980 Ti is back at HQ as well (for a photography session, if you must know), which meant we had to test with EVGA’s GTX 980 Ti SC cards. These are clocked at 1100MHz core, compared to 1000MHz stock, and GPU Boost can go higher still. The net result is that the EVGA 980 Ti ends up outperforming the Titan X in all of our benchmarks—the higher clock speeds are more important than 12GB VRAM, which should come as no surprise. There are plenty of factory overclocked 980 Ti cards, so it’s not entirely unreasonable, but the Fury X at least is running stock and should have some juice left for overclocking enthusiasts. (We’ll report on that in a separate article, once we’ve had more time to test.) Second, we didn’t have time to rerun benchmarks on all of our Nvidia GPUs, but we did retest the 980 Ti with the latest 353.30 drivers, and all of our less demanding GTAV and The Witcher 3 results use the new drivers. The reason we mention this is that, in addition to being Game Ready for Batman: Arkham Knight, we noticed a measurable increase in performance in several other titles as well, particularly Metro: Last Light.
Rounds one through ten: Fight!
When we first heard of the Fiji chip, we were excited and hoping to see some serious competition. AMD sort of delivers, but despite internal benchmarks showing the Fury X leading across a collection of 12 games, we were unable to corroborate those results. In fact, out of our eight games (plus 3DMark Fire Strike), AMD only wins the matchup in Hitman: Absolution, and then only at 1440p and 4K. But no one should be plunking down $650 to game at 1080p, or at least that’s our view of things, so we’ll count it as a win. One out of eight isn’t bad, right? It’s also, quite clearly, not what we were hoping to see. Even if we subtract 10 percent from the EVGA 980 Ti results (which is the absolute maximum performance delta we would see from the overclock), at best we’re looking at overall parity, but only at 4K and only with settings that won’t use more than 4GB VRAM.
Performance parity wouldn’t be a bad thing, as competition usually benefits consumers. There are still other factors to consider, however. Notice, for example, that AMD’s performance on newer titles—GTAV and The Witcher 3—is generally farther off the pace of Nvidia’s GPUs than on older titles. We ran one other game, which we’re reporting separately from the above averages because the launch has been, at best, rocky. That game is Batman: Arkham Knight, and at least with the current drivers AMD’s Fury X struggles yet again. Another factor that we still can’t shake is the 4GB VRAM limitation. When the R9 290/290X launched as the top AMD GPUs eighteen months ago, they included 4GB VRAM and that was considered a good choice for a top-tier GPU. Now we’re already starting to see games utilize more than 4GB of VRAM, particularly at higher quality settings. GTAV is the poster child right now for this issue: cranking up the advanced quality settings means it needs just under 6GB VRAM at 4K, but more importantly it needs slightly more than 4GB VRAM even at 1080p.
Finally, for better or worse Nvidia is doing a very good job of proselytizing its GameWorks libraries, some of which include Nvidia-specific features that won’t work without an Nvidia GPU. Looking at the past nine months or so of major releases, it seems Nvidia has had more “wins” than AMD. AMD had Battlefield: Hardline, Civilization: Beyond Earth, and Dragon Age: Inquisition; Nvidia had Assassin’s Creed: Unity, Dying Light, Far Cry 4, Project CARS, The Witcher 3, and most recently Batman: Arkham Knight. There are other titles we’ve missed, but without reading too much into things, Nvidia does appear to be doing a better job of late at getting developers to use GameWorks libraries. If one of your “must have” games works better on AMD hardware, that could easily sway your buying decision, and vice versa for Nvidia. And having more games using Nvidia resources will in general mean more gamers will want Nvidia hardware, right?
Rounds 11-13: TKO?
These three games are perhaps a best case scenario for Nvidia and a worst case scenario for AMD. In short, they show what happens to Fury X when the wheels come off. GTAV uses too much memory at higher quality settings, Batman: Arkham Knight is apparently in need of some patching as well as additional drivers help, and The Witcher 3 with HairWorks taxes the tessellation hardware and can experience a big hit to performance on many GPUs. All of these games would benefit from updated drivers as well, no doubt. And objectively, having tested many games with GPUs from both companies over the past couple of years, Nvidia is winning the drivers wars of late.
By our count, Nvidia had ten WHQL driver releases in 2014 and has already added eight more in 2015; AMD by contrast had four WHQL drivers in 2014 and hasn’t had a new WHQL driver since Omega last December, though that will hopefully change soon now that the 300 series and Fury X have launched. There was a time when AMD tried to release a new WHQL Catalyst driver every month, but tying the driver cycle to each month probably wasn’t the best approach. Nvidia for most of the past year has been aggressively working to get Game Ready drivers out for all major game launches; it doesn’t always guarantee a perfect Day 0 experience [Ed—Looking at you, Arkham Knight!], but it’s better than the alternative.
Preparing for the Inevitable Rematch
There were times leading up to this heavyweight championship bout that we really thought Fury X would pull off the win. And to their credit, niggles with drivers and hardware sampling notwithstanding, Fury X gives a decent showing. But those hoping to see the “underdog” AMD pull off an upset and surpass the Titan X, never mind the identically priced GTX 980 Ti, are going to have to wait and see how things develop going forward.
We can’t help but feel that there’s plenty of room left to improve Fury X performance with driver updates. It has 33 percent more memory bandwidth than the already well-fed 390X, and shader computational performance should be up to 45 percent faster than the 390X. We’re also not looking at situations where we’re CPU limited or VRAM capacity limited, so why then does the Fury X only average 18.5 percent faster than 390X across all of our tests? Like we said: drivers.
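Put differently, here’s roughly how much of its theoretical uplift over the 390X the Fury X actually delivers in our testing (a sketch using the figures above):

```python
# Fraction of the ideal shader-throughput uplift that shows up in games.
theoretical_gain = 4096 / 2816 - 1  # ~45.5% more shaders at the same clock
measured_gain = 0.185               # 18.5% average lead in our benchmarks
print(f"Realized scaling: {measured_gain / theoretical_gain:.0%} of ideal")  # ~41%
```

Roughly 40 percent of the on-paper uplift is being left on the table, which is why we keep pointing at drivers.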
The Fiji architecture is the first new high-performance architecture for AMD since fall of 2013. (We don't count Tonga, as it was effectively a lateral move from Tahiti.) AMD has had plenty of time to improve its drivers on the Hawaii architecture, but Fiji changes the playing field. Not only does it sport 45 percent more shaders, but it also has a different memory subsystem to contend with. 512GB/s of bandwidth is all well and good, but if latencies and other elements have changed compared to GDDR5—and they almost certainly have—previous “best practice” driver code from AMD may no longer be properly tuned.
It wouldn’t be the first time something like this has happened to AMD, either; AMD discovered more than a year after the launch of Tahiti that they had missed out on a lot of potential performance. Their “frame pacing” driver optimizations helped to improve both the smoothness of the gaming experience as well as performance in general, but even after determining there was work to be done it was nearly a year before the second “frame pacing driver” was released to the public. Hopefully AMD can improve Fiji performance much more quickly, and if so they may actually come out on top—provided you don’t need more than 4GB VRAM.
There is good news to be had as well. Besides providing competitive if not chart topping performance, Fury X also uses a similar amount of power to 980 Ti in our testing. We ran tests recently to look at power requirements, and where 390X and GTX 980 offer pretty similar performance, the 390X ended up using 125W more power during gaming sessions than the 980—that’s almost enough to power a second GTX 980! The Fury X on the other hand used 60W less power under load than the 390X, with power use falling between that of the Titan X and EVGA 980 Ti—and the overclock on the EVGA card caused it to use 20W more than the Titan X. Considering the 275W TDP, the similar real-world power use is good news.
There are other elements with the Fury X that can be good or bad, for example the built-in CLC (Closed Loop Cooler). It’s pretty awesome to see a GPU this fast packed into a 7.5-inch card, but the CLC definitely increases the overall space requirements. It’s also a potential issue for anyone that might want to put together a 2-way or 3-way CrossFireX setup. Finding a spot for one radiator isn’t too difficult, but two or more becomes a lot more cumbersome and necessitates a larger case. This is definitely a niche market, as $650 is more than most are willing to spend on a graphics card and $1300 is more than many entire gaming PCs. Still, swapping between the Fury X and other GPUs definitely wasn’t a highlight of the review process. AMD does have an air cooled Fury card scheduled to launch on July 14, with a price $100 south of the Fury X, and it might be a better fit for smaller cases. We also need to check out overclocking, as the CLC should keep the GPU cool and allow users to push the Fury X far beyond the factory clocks. Stay tuned for that.
It takes grit to enter the ring against the reigning heavyweight champion, and Fury X managed to land a few solid punches in the early going. As the match progressed, however, 980 Ti proved to have more stamina and legs. This one didn’t come down to a split decision, and there was little in the way of referee controversy; Fury X just wasn’t quite ready for the belt. It’s a product with plenty of guts, but it also has some bad habits picked up in the amateur ranks. With some proper training in the form of drivers, Fury X could come back as a force to be reckoned with. The question is whether that will be in a few weeks, in a few months, or so long from now that the 980 Ti and Titan X are replaced by even more formidable hardware. We’ll be in line for tickets as soon as a rematch is announced, though we still have reservations about Fury’s 4GB glass chin.
[Ed—Final score pending further discussion. It is NOT a zero.]
From Maximum PC (http://bit.ly/1HdVs1k)