In the last weeks I spend a lot time adding TrueCrypt to oclHashcat-plus since there have been lots of requests for it and I can see how forensic world will benefit from it.
This week I finally finished the first milestone, the hashing part of TrueCrypt. That is PBKDF2-HMAC-Whirlpool, -RipeMD160 and -SHA512. Unfortunately there is another milestone. To finally crack TrueCrypt, you have to decrypt a 448 byte block of data using AES, Serpent or Twofish or some obscure combinations of them. The important word is AES here. To finish support for efficiently cracking TrueCrypt there is no way around adding AES to GPU. Problem is, I never used AES before and it was totally a new world for me.
How to start digging into AES? I thought the best way is to combine the learning phase with something that uses AES and so add some nice new algorithm. At this time, some guy posted a request on hashcat forum asking for support to crack 1Password Agilebits keychain. Guess what, it uses AES
I did not pay much attention to Agilebits 1Password before. I was happy with keepass and never thought about changing. I quickly downloaded the tool and was impressed how easy it worked. It installed without problems, has nice icons, seemed to be very intuitive and it there is also a browser integration so I thought I should finally move from keepass to 1Password.
However, thats the story so far why I started to add 1Password to oclHashcat-plus. So I started to dig into AES and experimented a bit with it and the 1Password keychain. I finally got it working and then ported it to GPU. So what you see here is worlds first 100% GPU implementation of 1Password keychain.
There are other solutions out but they are using CPU to do the AES part.
As you can see, oclHashcat-plus is running with nearly 3Mhash/s using my two hd6990's which is ~ the speed of two hd7970 (a bit faster).
The reason for the high speed is what I think this might be a design flaw. Here is why:
1Password uses PBKDF2-HMAC-SHA1 to derive a 256 bit key. Actually we are generating a 320 bit key using PBKDF2-HMAC-SHA1 this way but its then truncated to 256 bit. No problem so far, many algorithms like WPA are doing the same thing.
The PBKDF2-HMAC-SHA1 part is what makes the entire calculation slow. For each iteration of PBKDF2-HMAC-SHA1 you call 4 times the SHA1 transform. But this is only to produce a 160 bit key. To produce the required 320 bit key, you call it 8 times. So if you have 1000 iterations, you call it 8000 times. Due to some simple optimizations you can do with HMAC you can precompute ipad and opad, so you end up in 2 + (2 * iterations) = 2002 for 160 bit or 4004 calls to SHA1 transform for 320 bit.
1Password then uses AES in CBC mode to decrypt 1040 byte of data. To be exact, it takes the first 128 bit of the derived key to setup the AES key and takes another 128 bit as an IV for the CBC.
The goal is match the final padding block decrypted 1040 byte of data. If you find the last four 32-bit integers at 0x10101010 the padding is correct and you know your key was correct.
In CBC mode, you take the IV only for the first decryption. You then replace it with the ciphertext of current block (which is then used for the next block).
But if you take a close look now you see these both mechanisms do not match in combination. To find out if the masterkey is correct, all we need is to match the padding, so all we need to satisfy the CBC is the previous 16 byte of data of the 1040 byte block. This 16 byte data is provided in the keychain! In other words, there is no need to calculate the IV at all. Instead of calculating a 256 bit key in the PBKDF2, we just need to calculate 128 bit. Since SHA1 gives us 160 bit, we can save exactly twice the number of calls to sha1 transform. This way I was able the reduce the calls to SHA1 transform from 8000 to 2002
Stay tuned for oclHashcat-plus v0.15!
This week I finally finished the first milestone, the hashing part of TrueCrypt. That is PBKDF2-HMAC-Whirlpool, -RipeMD160 and -SHA512. Unfortunately there is another milestone. To finally crack TrueCrypt, you have to decrypt a 448 byte block of data using AES, Serpent or Twofish or some obscure combinations of them. The important word is AES here. To finish support for efficiently cracking TrueCrypt there is no way around adding AES to GPU. Problem is, I never used AES before and it was totally a new world for me.
How to start digging into AES? I thought the best way is to combine the learning phase with something that uses AES and so add some nice new algorithm. At this time, some guy posted a request on hashcat forum asking for support to crack 1Password Agilebits keychain. Guess what, it uses AES
I did not pay much attention to Agilebits 1Password before. I was happy with keepass and never thought about changing. I quickly downloaded the tool and was impressed how easy it worked. It installed without problems, has nice icons, seemed to be very intuitive and it there is also a browser integration so I thought I should finally move from keepass to 1Password.
However, thats the story so far why I started to add 1Password to oclHashcat-plus. So I started to dig into AES and experimented a bit with it and the 1Password keychain. I finally got it working and then ported it to GPU. So what you see here is worlds first 100% GPU implementation of 1Password keychain.
There are other solutions out but they are using CPU to do the AES part.
Quote:
root@sf:~/oclHashcat# ./oclHashcat-plus64.bin -m 6600 testhash2 -a 3 ?l?l?l?l?l?l?l! -u 1000 -n 128
oclHashcat-plus v0.15 by atom starting...
Hashes: 1 total, 1 unique salts, 1 unique digests
Bitmaps: 8 bits, 256 entries, 0x000000ff mask, 1024 bytes
Workload: 1000 loops, 128 accel
Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger set to 80c
Device #1: Cayman, 1024MB, 830Mhz, 24MCU
Device #2: Cayman, 1024MB, 830Mhz, 24MCU
Device #3: Cayman, 1024MB, 830Mhz, 24MCU
Device #4: Cayman, 1024MB, 830Mhz, 24MCU
Device #1: Kernel ./kernels/4098/m6600.Cayman_1084.4_1084.4.kernel (974136 bytes)
Device #2: Kernel ./kernels/4098/m6600.Cayman_1084.4_1084.4.kernel (974136 bytes)
Device #3: Kernel ./kernels/4098/m6600.Cayman_1084.4_1084.4.kernel (974136 bytes)
Device #4: Kernel ./kernels/4098/m6600.Cayman_1084.4_1084.4.kernel (974136 bytes)
testhash2:hashcat!
Session.Name...: oclHashcat-plus
Status.........: Cracked
Input.Mode.....: Mask (?l?l?l?l?l?l?l!)
Hash.Target....: File (testhash2)
Hash.Type......: 1Password Agile Keychain
Time.Started...: Tue Apr 16 15:30:55 2013 (7 mins, 3 secs)
Speed.GPU.#1...: 744.0k/s
Speed.GPU.#2...: 744.0k/s
Speed.GPU.#3...: 743.9k/s
Speed.GPU.#4...: 743.5k/s
Speed.GPU.#*...: 2975.5k/s
Recovered......: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.......: 1257111552/8031810176 (15.65%)
Rejected.......: 0/1257111552 (0.00%)
HWMon.GPU.#1...: 99% Util, 59c Temp, 29% Fan
HWMon.GPU.#2...: 99% Util, 64c Temp, N/A Fan
HWMon.GPU.#3...: 99% Util, 58c Temp, 29% Fan
HWMon.GPU.#4...: 99% Util, 55c Temp, N/A Fan
Started: Tue Apr 16 15:30:55 2013
Stopped: Tue Apr 16 15:38:00 2013
As you can see, oclHashcat-plus is running with nearly 3Mhash/s using my two hd6990's which is ~ the speed of two hd7970 (a bit faster).
The reason for the high speed is what I think this might be a design flaw. Here is why:
1Password uses PBKDF2-HMAC-SHA1 to derive a 256 bit key. Actually we are generating a 320 bit key using PBKDF2-HMAC-SHA1 this way but its then truncated to 256 bit. No problem so far, many algorithms like WPA are doing the same thing.
The PBKDF2-HMAC-SHA1 part is what makes the entire calculation slow. For each iteration of PBKDF2-HMAC-SHA1 you call 4 times the SHA1 transform. But this is only to produce a 160 bit key. To produce the required 320 bit key, you call it 8 times. So if you have 1000 iterations, you call it 8000 times. Due to some simple optimizations you can do with HMAC you can precompute ipad and opad, so you end up in 2 + (2 * iterations) = 2002 for 160 bit or 4004 calls to SHA1 transform for 320 bit.
1Password then uses AES in CBC mode to decrypt 1040 byte of data. To be exact, it takes the first 128 bit of the derived key to setup the AES key and takes another 128 bit as an IV for the CBC.
The goal is match the final padding block decrypted 1040 byte of data. If you find the last four 32-bit integers at 0x10101010 the padding is correct and you know your key was correct.
In CBC mode, you take the IV only for the first decryption. You then replace it with the ciphertext of current block (which is then used for the next block).
But if you take a close look now you see these both mechanisms do not match in combination. To find out if the masterkey is correct, all we need is to match the padding, so all we need to satisfy the CBC is the previous 16 byte of data of the 1040 byte block. This 16 byte data is provided in the keychain! In other words, there is no need to calculate the IV at all. Instead of calculating a 256 bit key in the PBKDF2, we just need to calculate 128 bit. Since SHA1 gives us 160 bit, we can save exactly twice the number of calls to sha1 transform. This way I was able the reduce the calls to SHA1 transform from 8000 to 2002
Stay tuned for oclHashcat-plus v0.15!