06-29-2016, 05:40 PM
hashcat v3.00 release notes
There are a multitude of changes with this release; I'll try to describe them all at the level of detail they deserve.
You can download here: https://hashcat.net/hashcat/
Fusion of hashcat and oclHashcat (CPU + GPU)
This was a big milestone which I've planned for a very long time. If you know me, or if you're idling on our IRC channel (#hashcat on Freenode), you probably knew that this was on my agenda for many years. To those of you who don't know what I'm talking about:
There are two different versions of hashcat.
- One that was utilizing your CPU (hashcat)
- One that was utilizing your GPU (oclHashcat)
This fusion became possible because of the following preparations:
- Going Open Source, which enabled the use of the JIT compiler.
- Providing the OpenCL kernels as sources instead of binaries for every hardware and algorithm combination.
- Full OpenCL device type integration (I'll explain later in detail).
- A complete rewrite of the SIMD handling from scratch.
Here are a few of the advantages of having just one fusioned tool:
- Supported hashes are now in sync. For example, oclHashcat had support for cracking TrueCrypt containers while hashcat did not.
- Supported options are now in sync. For example, hashcat had support for --stdout while oclHashcat did not.
- It's no longer required to know all of the specific limits both programs have. For example, the maximum supported password- and salt-length.
- Tutorials and Videos you find in the wild will be less confusing. Some explained hashcat while others explained oclHashcat. This was often very frustrating for new users who may have been following along with a tutorial for the wrong application.
- Developers no longer need to back-port one hash-mode from hashcat to oclHashcat or vice versa. This means no more waiting for algorithms to appear in one version or another, you will be able to immediately use the algorithms on both CPU and/or GPU.
- Package maintainers can also integrate hashcat into a distribution package much more easily.
- A single tool means fewer dependencies. This could mean that you will see more distribution-specific packages in the near future.
- Last but not least, it's simply easier and more compact to say, and everyone knows what you're talking about when you say "hashcat".
Newly added hash-modes
- ArubaOS
- Android FDE (Samsung DEK)
- RAR5
- Kerberos 5 TGS-REP etype 23
- AxCrypt
- AxCrypt in memory SHA1
- Keepass 1 (AES/Twofish) and Keepass 2 (AES)
- PeopleSoft PS_TOKEN
- WinZip
- VeraCrypt
- Windows 8+ phone PIN/Password
Support to utilize multiple different OpenCL platforms in parallel
Here's a list of OpenCL runtimes that are supported and have been tested by either myself, or some of the hashcat beta testers:
- AMD OpenCL runtime
- Apple OpenCL runtime
- NVidia OpenCL runtime (replaces CUDA)
- Mesa (Gallium) OpenCL runtime
- Pocl OpenCL runtime
- Intel (CPU, GPU and Accelerator) OpenCL runtime
Another addition to the support of mixed OpenCL platforms is the ability to run them in parallel and within the same hashcat session. Yes, that actually means you can put both an AMD and an NVidia GPU into your system and make use of both. There still may be some work needed to properly utilize multiple sets of drivers. More information may be provided on the wiki later.
In case you do not want a specific OpenCL runtime to be used, you can select the platforms to be used with the new --opencl-platforms command line option.
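To give you an idea of what these "platforms" are on the API level, here's a minimal host-side C example; this is not hashcat code, just the standard OpenCL calls that expose the platforms you can filter with --opencl-platforms, and it assumes an installed OpenCL ICD loader:
Code:
// Standalone sketch: enumerate all installed OpenCL platforms.
// Build (assumption: OpenCL headers and ICD loader are installed):
//   gcc list_platforms.c -lOpenCL -o list_platforms

#include <stdio.h>
#include <CL/cl.h>

int main (void)
{
  cl_platform_id platforms[16];
  cl_uint num_platforms = 0;

  if (clGetPlatformIDs (16, platforms, &num_platforms) != CL_SUCCESS || num_platforms == 0)
  {
    fprintf (stderr, "No OpenCL platforms found\n");
    return 1;
  }

  for (cl_uint i = 0; i < num_platforms; i++)
  {
    char name[256];
    char vendor[256];

    clGetPlatformInfo (platforms[i], CL_PLATFORM_NAME,   sizeof (name),   name,   NULL);
    clGetPlatformInfo (platforms[i], CL_PLATFORM_VENDOR, sizeof (vendor), vendor, NULL);

    // platform numbering on the hashcat command line starts at 1
    printf ("Platform #%u: %s (%s)\n", i + 1, name, vendor);
  }

  return 0;
}
Each entry printed here corresponds to one runtime from the list above; for example the AMD, NVidia and Intel runtimes can all show up at the same time.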
Support to utilize OpenCL device types other than GPU
When it comes to compatibility, oclHashcat was limited to just two vendors: AMD and NVidia. They provide the fastest GPUs by far, and it was therefore important to support them, but there are many other vendors out there, some of which don't even build GPUs.
As a result, hashcat will support the following device types (a short API illustration follows the list):
- GPU
- CPU
- APU
- DSP
- FPGA
- Coprocessor
- Anything else which comes with an OpenCL runtime
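For illustration only, here's how those device types show up through the plain OpenCL API; hashcat's real device selection is more involved than this sketch:
Code:
// Standalone sketch: list every device of the first platform and print
// its OpenCL device type (GPU, CPU, accelerator, ...).

#include <stdio.h>
#include <CL/cl.h>

static const char *device_type_str (cl_device_type t)
{
  if (t & CL_DEVICE_TYPE_GPU)         return "GPU";
  if (t & CL_DEVICE_TYPE_CPU)         return "CPU";
  if (t & CL_DEVICE_TYPE_ACCELERATOR) return "Accelerator (DSP/FPGA/Coprocessor)";

  return "Other";
}

int main (void)
{
  cl_platform_id platform;

  if (clGetPlatformIDs (1, &platform, NULL) != CL_SUCCESS) return 1;

  cl_device_id devices[16];
  cl_uint num_devices = 0;

  if (clGetDeviceIDs (platform, CL_DEVICE_TYPE_ALL, 16, devices, &num_devices) != CL_SUCCESS) return 1;

  for (cl_uint i = 0; i < num_devices; i++)
  {
    char name[256];
    cl_device_type type;

    clGetDeviceInfo (devices[i], CL_DEVICE_NAME, sizeof (name),  name, NULL);
    clGetDeviceInfo (devices[i], CL_DEVICE_TYPE, sizeof (type), &type, NULL);

    printf ("Device #%u: %s [%s]\n", i + 1, name, device_type_str (type));
  }

  return 0;
}
The 1 and 2 you pass to --opencl-device-types select the CPU and GPU device types shown here.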
Support to utilize multiple different OpenCL device types in parallel
When I redesigned the core that handles workload distribution across multiple different GPUs in the same system (something oclHashcat v2.01 already supported), I thought it would be nice to support not just GPUs of different kinds and speeds, but also different device types. What I'm talking about is running a GPU and a CPU (and even an FPGA) all in parallel and within the same hashcat session.
Beware! This is not always a clever thing to do. For example, the NVidia OpenCL runtime still has a known five-year-old bug that creates 100% CPU load on a single core per NVidia GPU (NVidia's OpenCL busy-wait). If you've been using oclHashcat for a while, you may remember that the same bug affected AMD years ago.
Basically, what NVidia is missing here is that they use "spinning" instead of "yielding". Their goal was to increase performance, but in our case there's actually no gain from having a CPU-burning loop. The hashcat kernels run for ~100ms, and that's quite a long time for an OpenCL kernel. At such a scale, "spinning" creates only disadvantages, and there's no way to turn it off (only CUDA supports that).
But why is this a problem? If the OpenCL runtime spins on a core to find out whether a GPU kernel has finished, it creates 100% CPU load. Now imagine you have another OpenCL device, e.g. your CPU, also creating 100% CPU load: this causes problems even though it's legitimate to do so here. The GPU's CPU-burning thread will slow down by 50%, and you end up with a slower GPU rate just by additionally enabling your CPU as an OpenCL device (device type 1). For AMD GPUs that's not the case (they fixed that bug years ago).
To help mitigate this issue, I've implemented the following behavior:
- Hashcat will try to work around the problem by sleeping for some precalculated time after the kernel has been queued and flushed. This decreases the CPU load to less than 10% with almost no impact on cracking performance (a rough sketch of this idea follows below).
- By default, if hashcat detects both CPU and GPU OpenCL devices in your system, the CPU will be disabled. If you really want to run them both in parallel, you can still set the option --opencl-device-types to 1,2 to utilize both device types, CPU and GPU.
For reference, the related reports on this NVidia busy-wait behavior:
- Execute kernels without 100% CPU busy-wait
- Increased CPU usage with last drivers starting from 270.xx
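To make the sleep workaround mentioned above a bit more concrete, here is a rough conceptual sketch; the function name, the timing heuristic and `expected_runtime_ms` are made up for illustration and do not reflect hashcat's actual implementation:
Code:
// Conceptual sketch: queue the kernel, flush it to the device, then sleep
// for most of the expected kernel runtime before waiting on the result.
// Calling clFinish() immediately would let NVidia's runtime busy-wait on a
// CPU core for the whole kernel runtime; sleeping first keeps that spin
// phase very short.

#include <time.h>
#include <CL/cl.h>

static void run_kernel_with_sleep (cl_command_queue queue, cl_kernel kernel,
                                   size_t global_size, size_t local_size,
                                   unsigned int expected_runtime_ms)
{
  clEnqueueNDRangeKernel (queue, kernel, 1, NULL, &global_size, &local_size, 0, NULL, NULL);

  clFlush (queue); // hand the work to the device, but do not wait yet

  // sleep for ~90% of the expected runtime so the final wait stays short
  const unsigned int sleep_ms = expected_runtime_ms * 9 / 10;

  struct timespec ts;

  ts.tv_sec  = sleep_ms / 1000;
  ts.tv_nsec = (sleep_ms % 1000) * 1000000L;

  nanosleep (&ts, NULL);

  clFinish (queue); // the remaining busy-wait is now only a few milliseconds
}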
Added makefile native compilation targets; Adds GPU support for OSX and *BSD
To make it even easier for everyone to compile hashcat from sources (which hopefully also increases the number of commits from the community), I've decided to add a target for a native build. That should help to compile hashcat on Linux, OSX, *BSD and some other exotic operating systems.
But it turned out that I could not simply add a native compilation target to the Makefile without doing some preparations.
- For example, on Linux the first step was to achieve Linux FHS compatibility.
- Another preparation was having a hashcat binary (without a .bin extension) located somewhere like `/usr/local/bin`.
- Ideally, the Makefile should provide `PREFIX` and `DESTDIR` variables to modify that, and the files that need to be accessible by all users should end up somewhere like `/usr/share/hashcat`.
In summary, the following changes were mandatory:
- Added a native Makefile target
- Added an install and uninstall Makefile target
- Added true Linux FHS compatibility
- Added separate Install-, Profile- and Session-folder
Here's the full discussion:
Fewer Dependencies
Here's another piece of great news: There are no longer dependencies on AMD-APP-SDK, AMD-ADL, NV-CUDA-SDK, NV-Drivers, NV-NVML or NV-NVAPI.
Our first OSS version of oclHashcat simply had too many dependencies, and they were all required to compile oclHashcat. We tried to provide a script to handle them for you (deps.sh), but you still had to download the archives yourself. That wasn't very convenient and surely held people back from compiling oclHashcat, leaving them to use the binary version instead.
Having dependencies in general is not always bad, but it creates some overhead for both developers and package maintainers; regular users usually do not notice it. Having no dependencies usually results in fewer features, so how did we manage to get rid of the dependencies while keeping the features they provided?
The answer is simple: for both Linux and Windows we use the operating system's dynamic library loading capability instead of linking the libraries at compile time. So don't get me wrong here, we still use those libraries, we just load them at runtime.
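On Linux this boils down to dlopen()/dlsym(); the following is a simplified, self-contained sketch of the idea, not hashcat's actual loader code:
Code:
// Simplified runtime-loading sketch (Linux). Instead of linking against
// -lOpenCL at compile time, open the library at runtime and resolve the
// symbols we need. This also lets us try alternative file names such as
// libOpenCL.so.1.
// Build: gcc load_opencl.c -ldl -o load_opencl (no -lOpenCL required)

#include <stdio.h>
#include <dlfcn.h>
#include <CL/cl.h>

typedef cl_int (*clGetPlatformIDs_t) (cl_uint, cl_platform_id *, cl_uint *);

int main (void)
{
  // try the plain name first, then the versioned name shipped by most drivers
  void *ocl = dlopen ("libOpenCL.so", RTLD_NOW);

  if (ocl == NULL) ocl = dlopen ("libOpenCL.so.1", RTLD_NOW);

  if (ocl == NULL)
  {
    fprintf (stderr, "No OpenCL library found: %s\n", dlerror ());
    return 1;
  }

  clGetPlatformIDs_t pclGetPlatformIDs = (clGetPlatformIDs_t) dlsym (ocl, "clGetPlatformIDs");

  if (pclGetPlatformIDs == NULL) return 1;

  cl_uint num_platforms = 0;

  pclGetPlatformIDs (0, NULL, &num_platforms);

  printf ("OpenCL platforms found: %u\n", num_platforms);

  dlclose (ocl);

  return 0;
}
On Windows the same pattern works with LoadLibrary() and GetProcAddress().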
This provides a lot of advantages for both users and developers, such as:
- On Linux, the library `libOpenCL.so` was loaded as-is. This was a problem when a user had a broken OpenCL installation that only provided `libOpenCL.so.1`. Unless the user fixed the filename or created a symlink, the binary was unable to locate the library.
- The Windows binary becomes smaller since it does not need to ship the library code; it reuses the code from your installed library.
- For developers, there is no longer a need to have a 32 bit and a 64 bit library object. That was always a problem with NVML provided by the Nvidia drivers; we had to manually symlink them to get them working.
- The installed library does not need to be of the same version as the one used by the person who compiled the hashcat binary. For example, if you remember this error you know what I'm talking about:
Quote:./oclHashcat64.bin: /usr/lib/x86_64-linux-gnu/libOpenCL.so.1: version 'OPENCL_2.0' not found (required by ./oclHashcat64.bin)
- Package maintainers should now have a really easy job. No more (compile-time) dependencies means way less work.
Added auto-tuning engine and user-configurable tuning database
The auto-tuning engine is exactly what it sounds like: it automatically tunes the -n and -u parameters (aka the workload) to values that give you the best performance while targeting a specific kernel runtime.
To understand what that means, you need to know that the kernel runtime influences the desktop response time. If you don't care about desktop lag (because you have a dedicated cracking machine), you simply set -w 3 and everything is fine. In that case, hashcat will optimize the kernel runtime towards a very efficient one, efficient in terms of power consumption versus performance. There's indeed a way for us to control how much power your GPU consumes while cracking. It's like a car: if you drive it at 220 km/h it consumes twice as much gas as if you drive it at 200 km/h. Well, not exactly, but you get the idea.
Having said that, controlling your workload is now all about -w. The -n and -u options still exist, but they are mostly for development and debugging use. There are four different workload settings in total; here's a snippet of --help:
Code:
| # | Performance | Runtime | Power Consumption | Desktop Impact |
|---|-------------|---------|-------------------|----------------|
| 1 | Low | 2 ms | Low | Minimal |
| 2 | Default | 12 ms | Economic | Noticeable |
| 3 | High | 96 ms | High | Unresponsive |
| 4 | Nightmare | 480 ms | Insane | Headless |
The -w setting defaults to "2". But "1" could also be interesting, in case you're watching an HD video or playing a game.
OK, so there's an auto-tuning engine that controls -n and -u; what is the tuning database used for, then? If, for whatever reason, you do not like the settings the auto-tuning engine has calculated for you, you can force a specific -n and -u combination to be used. This also decreases the startup time a bit, because hashcat does not need to test runtimes with different -n and -u settings.
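For the curious, the core idea of such an autotuner can be reduced to a few lines. This is a deliberately naive, simulated sketch of the concept, not hashcat's actual tuning algorithm; the 0.75 ms cost and the helper function are made up:
Code:
// Naive autotune sketch: grow the accel value (-n) until one (simulated)
// kernel invocation takes roughly as long as the target runtime that the
// selected workload profile (-w) asks for.

#include <stdio.h>

// Hypothetical stand-in: in reality this would launch the OpenCL kernel
// with the given accel value and measure its runtime. Here we simply
// pretend that one unit of accel costs 0.75 ms.
static double run_kernel_once (unsigned int kernel_accel)
{
  return kernel_accel * 0.75;
}

static unsigned int autotune_accel (double target_ms, unsigned int accel_max)
{
  unsigned int kernel_accel = 1;

  while (kernel_accel * 2 <= accel_max)
  {
    // stop as soon as we are close enough to the target runtime
    if (run_kernel_once (kernel_accel) >= target_ms * 0.9) break;

    kernel_accel *= 2; // otherwise double the workload and measure again
  }

  return kernel_accel;
}

int main (void)
{
  // 96 ms is the target runtime of workload profile -w 3 (see the table above)
  printf ("tuned -n: %u\n", autotune_accel (96.0, 1024));

  return 0;
}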
But there's another setting that can be controlled from within the database: the vector width used within the OpenCL kernel. Note, however, that not all kernels support a vector width greater than 1. The vector width can also be controlled with the new command line parameter --opencl-vector-width.
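If you're curious what vector width your device reports natively, you can query it with plain OpenCL; whether hashcat picks exactly these values is decided by the auto-tuner and the tuning database, so take this only as an illustration:
Code:
// Standalone sketch: query the native and preferred 32-bit integer vector
// width of the first OpenCL device of the first platform.

#include <stdio.h>
#include <CL/cl.h>

int main (void)
{
  cl_platform_id platform;
  cl_device_id   device;

  if (clGetPlatformIDs (1, &platform, NULL) != CL_SUCCESS) return 1;
  if (clGetDeviceIDs (platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL) != CL_SUCCESS) return 1;

  cl_uint native    = 0;
  cl_uint preferred = 0;

  // a width of 1 means the device gains little from vectorized data types;
  // CPU runtimes typically report 4 or 8 here
  clGetDeviceInfo (device, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT,    sizeof (native),    &native,    NULL);
  clGetDeviceInfo (device, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, sizeof (preferred), &preferred, NULL);

  printf ("native vector width: %u, preferred vector width: %u\n", native, preferred);

  return 0;
}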
At this point I don't want to go too deep into the details of the new auto-tuning engine, especially the database (hashcat.hctune). There's a lot more information you need in order to build your own database.
Therefore, please read this dedicated thread: The Autotune Engine
Extended Hardware-Management support
With the increased interest in power consumption per GPU, vendors started to add complicated clock speed changes inside the driver and the GPU BIOS. The problem is that some of these changes are tied to the workload, some to power consumption, and some to temperature. This increases the complexity of troubleshooting hashcat issues (for example, when trying to determine why cracking performance suddenly and dramatically dropped). To prevent users from sending in invalid "bug" reports related to performance, I decided to add the current clock and memory rate of the GPU to the status display. The user will notice the clocks jumping around as the speeds jump around and hopefully realize that something is wrong with their setup.
Most of the time it's a cooling issue. oclHashcat already showed the temperature in the status display, but the problem is that current drivers may try to hold a target temperature by either increasing the fan speed or by decreasing the clock rate. The latter case leads the user to the false assumption that their setup is well cooled: the speed dropped over time, but since the temperature was not going up, they did not make the connection that the clocks had been decreased.
Switching from NVAPI to NVML is an important change for setups using NVidia GPUs on Windows. NVidia is actually distributing a 64 bit .dll for NVML with their latest driver version, and hashcat will find the .dll by checking the Windows registry. If it does not find it, you can also simply copy nvml.dll into the hashcat installation folder (though that should not be necessary). There's another reason why we've switched to NVML: AMD users already had a workaround to stop the GPU BIOS from trying to optimize power consumption. They simply switched on the flag --powertune-enable, which sets the maximum power the GPU can consume to 120%, the same way you can do it using e.g. MSI Afterburner. Because we're now using NVML, this option is also available to NVidia users in hashcat.
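For reference, reading the clocks through NVML looks roughly like this; a stripped-down sketch, not hashcat's actual hardware-management code, with error handling mostly omitted:
Code:
// NVML sketch: read the current core clock, memory clock and temperature
// of the first NVidia GPU. Link against the NVML library (nvml.dll on
// Windows, libnvidia-ml on Linux) or load it at runtime as described above.

#include <stdio.h>
#include <nvml.h>

int main (void)
{
  if (nvmlInit () != NVML_SUCCESS) return 1;

  nvmlDevice_t device;

  if (nvmlDeviceGetHandleByIndex (0, &device) == NVML_SUCCESS)
  {
    unsigned int clock_core = 0;
    unsigned int clock_mem  = 0;
    unsigned int temp       = 0;

    nvmlDeviceGetClockInfo   (device, NVML_CLOCK_SM,  &clock_core);
    nvmlDeviceGetClockInfo   (device, NVML_CLOCK_MEM, &clock_mem);
    nvmlDeviceGetTemperature (device, NVML_TEMPERATURE_GPU, &temp);

    printf ("Core: %u MHz, Mem: %u MHz, Temp: %uC\n", clock_core, clock_mem, temp);
  }

  nvmlShutdown ();

  return 0;
}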
There is still one exception where NVAPI is used, namely the NVAPI calls in `ext_nvapi.c`: hashcat needs this dependency to recognize core clock throttling when temperatures exceed the threshold and become too hot. This threshold is a configurable setting on Windows (it may be modified with Afterburner, for example).
Added the option to quit at next restore checkpoint
One important user interface change that you might immediately recognize is the new checkpoint-stop feature. This new feature is visible at the status prompt, which now has a sixth option labeled
Quote:[c]heckpoint
in addition to the previous:
Quote:[s]tatus, [p]ause, [r]esume, [b]ypass and [q]uit
The goal of this new feature is to tell hashcat to delay stopping until it reaches the next restore point. Hitting the "q" key on your keyboard and quitting is not always the best choice; doing so forces hashcat to stop immediately, wherever the workload happens to be. Since the restore option --restore works on batched keyspace segments, this could lead to re-calculating work you have already done, or even to missing candidates altogether when trying to restore your session.
Stopping at checkpoints makes sure a particular workload segment is completed and a checkpoint is reached before terminating. This means no duplicate work and no lost candidates when restoring sessions. You could say this new feature is an intelligent way of quitting hashcat.
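Conceptually, the difference between "q" and "c" can be pictured like this; a purely illustrative toy program, not hashcat's restore code, and the segment/restore functions are just stand-ins:
Code:
// Illustrative sketch only: the keyspace is processed in segments, and the
// restore point is written at segment boundaries. An immediate quit ('q')
// can abort mid-segment, so a later --restore repeats work from the last
// checkpoint; a checkpoint stop ('c') exits only after the restore point
// was updated.

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

static volatile bool checkpoint_stop_requested = false; // set by the 'c' prompt key

static void crack_segment (uint64_t segment)
{
  // stand-in for the real workload of one restore-point-sized segment
  printf ("cracking segment %llu\n", (unsigned long long) segment);
}

static void write_restore_point (uint64_t segment)
{
  // stand-in for updating the .restore file
  printf ("restore point now at segment %llu\n", (unsigned long long) segment);
}

int main (void)
{
  checkpoint_stop_requested = true; // pretend the user pressed 'c' right away

  for (uint64_t segment = 0; segment < 100; segment++)
  {
    crack_segment (segment);        // an immediate quit here would lose progress

    write_restore_point (segment);  // checkpoint reached: progress is safe

    if (checkpoint_stop_requested) break; // 'c' only takes effect here
  }

  return 0;
}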
You will notice that the "Status" line in the status display will change to Running (stop at checkpoint) whenever you enable this new feature.
However, if you have hit stop by mistake, or first decided to stop at the next checkpoint but then changed your mind, you can cancel the checkpoint stop just by hitting the `c` key on your keyboard again. This will change from Running (stop at checkpoint) back to Running to let you know the checkpoint stop has been aborted.
Please note that quitting hashcat with the checkpoint-stop prompt option might take a little bit longer compared to stopping it with the "q" key. The total time depends on many factors, including the selected workload profile -w, the type of hashes you run -m, the total number of salts, etc.
Performance
In addition to all the improvements and newly added features, I'm always keen to optimize performance. I spent a lot of time increasing performance in hashcat v3.00; some algorithms got faster by over 100%.
The changes in performance from oclHashcat v2.01 to hashcat v3.00 largely depend on the combination of hash-mode and GPU. Here's a spreadsheet that shows the changes in an easier-to-read format, separated by hash-mode and GPU:
Note that with older NVidia GPUs, and by old I mean pre-Maxwell chipsets, there is a drop in performance. That is simply because NVidia's runtime isn't/wasn't optimized for OpenCL. Those cards were made at a time when NVidia focused purely on CUDA, and it seems they are not putting any effort into updates for older cards. The next time you buy an NVidia GPU, just make sure it's shader model 5.0 or higher.
Also note that the benchmarks for hashcat v3.00 were created using the option --machine-readable, which can now be used in combination with --benchmark. This makes comparing performance to older versions much easier. The time it takes to complete a full benchmark was also reduced significantly: while it was around 45 minutes with oclHashcat v2.01, it's now just 15 minutes with hashcat v3.00, and that includes the new hash-modes which were not available in v2.01.
I did not compare the CPU performance of hashcat v2.01 to hashcat v3.00, but you can be sure it is either faster or at least on par. Just one example: NTLM performance on my i7-6700 CPU increased from 95.64 MH/s to 1046.1 MH/s, which is, by the way, a new world record for cracking NTLM on a CPU.
... and there's still more, ... really!
If you want to know about all the changes, please take a look at the redesigned `docs/changes.txt` file. It includes all the fixed bugs and other changes, mostly of interest to developers, package maintainers and hashcat professionals.
Here's a small preview:
- Added support for --gpu-temp-retain for NVidia GPU, both Linux and Windows
- Added option --stdout to print candidates instead of trying to crack a hash
- Added human-readable error message for the OpenCL error codes
- Redesigned `docs/changes.txt` layout
- Redesigned --help menu layout
- Added -cl-std=CL1.1 to all kernel build options
- ...
- atom