UPDATE Dec 21 2013
I take back everything negative I said about this card. Catalyst 13.12 and od6config resolve all of the issues I identified below. The performance increase over the HD 7970 ranges from 23% to 60%, with 64-bit hashes seeing the least benefit and DES-based hashes seeing the biggest gains. Overall, the 290X is 36.7% faster than the HD 7970 on average, beating the projected 35% performance increase.
With every generation, AMD finds some new and exciting way to piss us off. They've really outdone themselves this time with the release of the R9 290X. I was so excited to try this new card out. It looked so promising. With a 35% increase in stream processors over the HD 7970 and 5.9 TFLOPS of raw compute power, on paper it looked like a single GPU card that was faster than the dual-GPU HD 6990. So you can imagine how stoked I was when I finally got my paws on eight of these shiny new bastards.
The excitement increased as I pulled one of the boxes out, and was greeted by this badass alien robot demon cyborg guy:
I really dig the new shroud design as well. Not at all professional, but it appeals to my inner child:
I was too excited to build a new system just for these cards. I wanted to test them RIGHT NOW. So I cannibalized a system that I had already built out with four HD 7990s, and stuffed it full of 290Xs.
Damn, these cards look fantastic stacked on top of each other!
All right, let's power up the system and check them out.
Code:
opencl@sagitta:~$ lspci | grep VGA
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
85:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
86:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
89:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
8a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
Hah, awesome. The device identifies itself as an HD 8970. That will likely cause some confusion with the other Tahiti-based HD 8970 that is out there. The driver gets it right though.
Code:
opencl@sagitta:~$ amdconfig --lsa
* 0. 04:00.0 AMD Radeon R9 290 Series
1. 05:00.0 AMD Radeon R9 290 Series
2. 08:00.0 AMD Radeon R9 290 Series
3. 09:00.0 AMD Radeon R9 290 Series
4. 85:00.0 AMD Radeon R9 290 Series
5. 86:00.0 AMD Radeon R9 290 Series
6. 89:00.0 AMD Radeon R9 290 Series
7. 8a:00.0 AMD Radeon R9 290 Series
And here's the truncated clinfo output as well while we're at it:
Code:
opencl@sagitta:~$ clinfo
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Board name: AMD Radeon R9 290 Series
Device Topology: PCI[ B#-118, D#0, F#0 ]
Max compute units: 44
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1000Mhz
Address bits: 32
Max memory allocation: 1073741824
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 3221225472
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x00007f64ee7b04c0
Name: Hawaii
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 1348.4 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1348.4)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer
X failed to start, of course, because this system has Catalyst 13.9 on it, and 13.9 doesn't support the R9 line. Not a problem; a quick upgrade to 13.11 beta6 and a reboot.
Great, X came up this time. Now we're ready to launch Hashcat and start having fun! Erm, wait... Something is missing... I didn't hear the fans spin up. We set the fan speed in Xsetup so that the fans spin up when X starts. But there is no fan noise at all, and these new GPUs are supposed to be insanely loud... let's investigate.
Code:
opencl@sagitta:~$ DISPLAY=:0.0 amdconfig --pplib-cmd "get fanspeed 0"
PPLIB command execution has failed!
ati_pplib_cmd: execute "get" failed!
opencl@sagitta:~$ DISPLAY=:0.1 amdconfig --pplib-cmd "get fanspeed 0"
PPLIB command execution has failed!
ati_pplib_cmd: execute "get" failed!
opencl@sagitta:~$ DISPLAY=:0.2 amdconfig --pplib-cmd "get fanspeed 0"
PPLIB command execution has failed!
ati_pplib_cmd: execute "get" failed!
That's awesome. Apparently we can't get/set the fan speed on this new card. Oh right, because this GPU uses the latest version of PowerTune which manages the fan speeds for you in two different modes: "Quiet," and "Uber." In "Quiet" mode, the maximum fan speed is 40%. In "Uber" mode, the maximum fan speed is 55%. No ability to manually set the fans to 100%. That couldn't possibly be problematic.
Let's see how the temps are doing at idle:
Code:
opencl@sagitta:~$ amdconfig --adapter=all --odgt
Default Adapter - AMD Radeon R9 290 Series
Sensor: Temperature - 31.00 C
Default Adapter - AMD Radeon R9 290 Series
Sensor: Temperature - 31.00 C
Default Adapter - AMD Radeon R9 290 Series
Sensor: Temperature - 31.00 C
Default Adapter - AMD Radeon R9 290 Series
Sensor: Temperature - 31.00 C
Default Adapter - AMD Radeon R9 290 Series
Sensor: Temperature - 31.00 C
Default Adapter - AMD Radeon R9 290 Series
Sensor: Temperature - 31.00 C
Default Adapter - AMD Radeon R9 290 Series
Sensor: Temperature - 31.00 C
Default Adapter - AMD Radeon R9 290 Series
Sensor: Temperature - 31.00 C
Hm, they're idling kind of high. Ambient temp in the office is about 21C, and usually we see GPUs in here idling under 25C. I wonder what the clocks are set at?
Code:
opencl@sagitta:~$ amdconfig --adapter=all --odgc
Adapter 0 - AMD Radeon R9 290 Series
Core (MHz) Memory (MHz)
Current Clocks : 300 150
Performance Level : 0
Current Bus Speed : 8000
Current Bus Lane : 16
GPU load : 0%
Adapter 1 - AMD Radeon R9 290 Series
Core (MHz) Memory (MHz)
Current Clocks : 300 150
Performance Level : 0
Current Bus Speed : 8000
Current Bus Lane : 16
GPU load : 0%
Adapter 2 - AMD Radeon R9 290 Series
Core (MHz) Memory (MHz)
Current Clocks : 300 150
Performance Level : 0
Current Bus Speed : 8000
Current Bus Lane : 16
GPU load : 0%
Adapter 3 - AMD Radeon R9 290 Series
Core (MHz) Memory (MHz)
Current Clocks : 300 150
Performance Level : 0
Current Bus Speed : 8000
Current Bus Lane : 16
GPU load : 0%
Adapter 4 - AMD Radeon R9 290 Series
Core (MHz) Memory (MHz)
Current Clocks : 300 150
Performance Level : 0
Current Bus Speed : 8000
Current Bus Lane : 16
GPU load : 0%
Adapter 5 - AMD Radeon R9 290 Series
Core (MHz) Memory (MHz)
Current Clocks : 300 150
Performance Level : 0
Current Bus Speed : 8000
Current Bus Lane : 16
GPU load : 0%
Adapter 6 - AMD Radeon R9 290 Series
Core (MHz) Memory (MHz)
Current Clocks : 300 150
Performance Level : 0
Current Bus Speed : 8000
Current Bus Lane : 16
GPU load : 0%
Adapter 7 - AMD Radeon R9 290 Series
Core (MHz) Memory (MHz)
Current Clocks : 300 150
Performance Level : 0
Current Bus Speed : 8000
Current Bus Lane : 16
GPU load : 0%
Well that's different... Seems the output has changed a tad. Doesn't show the peak clocks or configurable range for some reason. But it now displays the bus speed and bus lane information. Great to know that all eight are running at 16x, I guess? Not sure why I'd give two shits about that. And seriously, why isn't it showing me the peak clocks and the configurable range?
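For what it does still report, that wall of output can at least be condensed to one line per adapter. A quick awk sketch, assuming the field layout shown above:

```shell
# condense `amdconfig --adapter=all --odgc` into one line per adapter
# (field positions assumed from the output above)
amdconfig --adapter=all --odgc | awk '
  /^Adapter/       { adapter = $2 }
  /Current Clocks/ { printf "adapter %s: core %s MHz, mem %s MHz\n", adapter, $4, $5 }'
```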
I'm assuming that, like the HD 7990, these cards run at the boost clock of 1000 MHz out of the box, instead of the rather odd base clock of 727 MHz. Let's try manually setting the clock to some value in the middle, like 850 MHz, and see how it does there:
Code:
opencl@sagitta:~/oclHashcat-1.00$ amdconfig --adapter=all --od-enable --odsc=850,1275
AMD Overdrive(TM) enabled
invalid input. Please use the following format
"--od-setclocks=<NewCoreClock>,<NewMemoryClock>,<PowerState>,<Performance Level>"
WTF? Oh, right, the R9 line uses Overdrive6 instead of Overdrive5. Goes hand-in-hand with the latest PowerTune. So, we have to set different clocks for different power states and performance levels? I'm not really sure what the valid values are here for these parameters. Can't seem to find anything in the help, or on Google. Fuck it, let's just start messing with it and see what happens:
Code:
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,0,0
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,0,1
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,0,2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,0,3
ERROR - invalid performance level! Level should be less than 2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,1,0
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,1,1
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,1,2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,2,0
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,2,1
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,2,2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,3,2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,4,2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,5,2
So, nothing happens. No confirmation that the clocks were set, as we normally would get. No errors either, except for the one we get if we set the performance level greater than 2 (which makes sense). And since amdconfig no longer shows the peak clock, I have no idea what the clocks are actually set to...
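With no confirmation either way, the only option is to read the clocks back after each attempt. A scripted version of the sweep above (same flag values as before; whether any of them actually stick is anyone's guess):

```shell
# sweep power states and performance levels, reading clocks back after each set
for ps in 0 1 2; do
  for pl in 0 1 2; do
    amdconfig --adapter=0 --odsc=1000,1250,$ps,$pl
    echo "state $ps, level $pl:"
    amdconfig --adapter=0 --odgc | grep 'Current Clocks'
  done
done
```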
Ok, let's shelve that for now and just check out oclHashcat already, so that we can start having fun again. atom was kind enough to push b49 out this morning, which has R9 support. This card can still redeem itself if it performs well.
What the actual fuck. It's taking FOREVER to enqueue the kernels. Like 30-mississippi seconds for each kernel. Four minutes to initialize Hashcat with eight GPUs. I'm not talking about the kernel compile time. This happens even with cached kernels. This is the time it takes to actually enqueue a kernel on each device for execution. WTF. This makes benchmark mode absolutely unusable, because it takes 5 minutes to benchmark each algorithm, and since it suppresses the 'loading kernel' messages, you're left wondering if Hashcat has hung.
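For anyone who wants to put a number on the start-up cost, timing a single-algorithm benchmark isolates it reasonably well, since the run itself is short relative to four minutes of enqueueing (binary name from oclHashcat; -m 0 is MD5, -b is benchmark mode):

```shell
# rough measure of start-up overhead: with cached kernels, a single-algorithm
# benchmark is dominated by kernel-enqueue time on these cards
time ./oclHashcat64.bin -b -m 0
```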
Once you do get oclHashcat running, performance is lackluster at best. At "stock clocks," it's 4.8% slower than an HD 7970 at 925 MHz. And I put "stock clocks" in quotes because, thanks to PowerTune, the clocks are all over the place. Even though I (attempted to) set the maximum clock to 1000 MHz, the clocks aren't going over 855 MHz.
One nice thing about this new version of PowerTune, however, is that it automatically downclocks the memory for ALU-bound workloads. This is something we used to do manually on the HD 5000 series, then lost the ability to do on the HD 6000 and HD 7000 series. I really like this feature. It's the only positive change I've been able to identify so far, and I pray it sticks around.
I still don't understand why this isn't going above 855 MHz under 100% load on an ALU-bound algorithm. The stock boost clock is 1000 MHz, and we manually set the clock to 1000 MHz, yet it won't go anywhere near that. And it's not a heat issue, because the temps look really good right now: we're 30 minutes in, and the temps are all below 65C.
Ok, something weird just happened. The clocks all spiked up to 940 MHz, then immediately dropped down to 525 MHz. They stayed at 525 MHz for about 10 seconds, long enough for the temps to drop to around 54C, and then went back up to 848 MHz. How strange is that??
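To catch these swings as they happen, a one-second polling loop over the two commands we've been using works well enough (adapter 0 only; Ctrl-C to stop):

```shell
# poll core clock and temperature for adapter 0 once a second
while sleep 1; do
  amdconfig --odgc | awk '/Current Clocks/ { printf "core %s MHz  ", $4 }'
  amdconfig --odgt | awk '/Temperature/   { printf "temp %s C\n",  $4 }'
done
```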
I can't seem to disable this PowerTune crap either. On the HD 7950, it wasn't too bad, because we could use ADL to set PowerTune to "+20%", whatever that means, and it would behave properly. Using the same method on this card yields fuckall. Same thing with using ADL to manually set the power state or performance level. This card just ignores all requests to let the user manage it.
I'm really hoping some of these things are driver issues, even though I'm using the latest beta driver, but I have a feeling these issues are here to stay. Not being able to manually manage the device is an absolute deal-breaker: we can't get or set the fan speed, disable PowerTune, or set the clocks. And seriously, what the actual fuck is up with it taking 30 full fucking seconds to enqueue a kernel!?
I'll keep playing with this, and I'll keep an eye out to see if others discover any fixes/workarounds for these issues. But until we get this shit sorted out, I'd highly recommend you ~~steer clear of this new R9 line. Stick to the HD 7970 for now~~ rush out and buy one immediately!
UPDATE Nov 17 2013
I've confirmed that the management issues are due to this being an Overdrive6 card, as opposed to Overdrive5. The tools we currently use to manage our GPUs still rely on Overdrive5. Using ADL SDK 6.0, I wrote a small program called od6config to manage the GPUs using Overdrive6. It works as expected, so the fan speed and clock rate issues are resolved.
The only outstanding issues are the PowerTune crap and the kernel load time. Using Overdrive6, I can set the power control threshold to +50 (the new PowerTune accepts values from -50 to +50, vs. the old version's -20 to +20), but Hashcat performance is still erratic. And it still takes 30 seconds to enqueue a kernel.
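To put the new range in perspective: assuming a roughly 250 W reference board power for the 290X (my ballpark figure, not AMD's spec sheet), the threshold maps to a power ceiling like so:

```shell
# what the PowerTune threshold means in watts,
# assuming ~250 W reference board power (an assumption, not a spec)
tdp=250
for pct in -50 -20 0 20 50; do
  echo "powertune ${pct}% -> $(( tdp + tdp * pct / 100 )) W ceiling"
done
```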
UPDATE Dec 21 2013
Catalyst 13.12 resolves all remaining issues!!