thanks undeath! especially for pointing me towards the "Merkle–Damgård construction", now it's much clearer why SHA-2 (and older hashes) aren't built for GPU/parallel execution.
I was hoping it would be a matter of splitting up a file, calculating the checksums of the parts, and combining them afterwards, while staying deterministic. The splitting would be the part that scales: the bigger the file, the bigger the win.
Also, my plan of just using a black-box solution without having to understand ANY of the internals has failed. Oh well.
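For anyone who still wants the DIY version of that split-and-combine idea: a minimal sketch with GNU coreutils. The chunk size (64M) and the final "sha256 of the sorted chunk hashes" construction are arbitrary choices of mine; the result is deterministic for a fixed chunk size, but of course not comparable to a plain sha256sum of the whole file:

split -b 64M -d -a 4 bigfile part_                              # fixed chunk size keeps the result deterministic
ls part_* | xargs -P "$(nproc)" -n 1 sha256sum > part.hashes    # hash the chunks in parallel
sort -k 2 part.hashes | awk '{print $1}' | sha256sum            # restore chunk order, then hash the hashes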
b2sum (BLAKE2) seems to be one of the few checksum tools that support multithreading.
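If your b2sum is the reference one from the BLAKE2 project (the coreutils b2sum is a different, single-threaded tool), the parallel variants should be selectable via -a; that flag is an assumption on my side, check b2sum --help on your build:

time b2sum bigfile                # default blake2b, single-threaded
time b2sum -a blake2bp bigfile    # blake2bp, the 4-way parallel variant (reference b2sum only)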
MD6 with its Merkle tree structure seems well suited for parallel computation. Someone published work on running it on a GPU, maybe I can get my hands on it:
https://www.researchgate.net/publication...MD6_on_GPU
I am aware this is off-topic with regard to hashcat, but if anyone stumbles upon this thread looking for something similar, they might not mind finding the additional information.
e:
undid my edit, something seems off
e3:
redoing my first edit. The test file was on a slow hard disk, but the results on a PCIe SSD were identical. Turns out the file is sparse: only ~3 MB worth of data in a 15 GB file, so practically no I/O overhead at all.
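If anyone wants to check for the same trap: comparing the apparent size against the actually allocated blocks exposes a sparse file (plain coreutils; testfile stands in for whatever you hash):

du -h --apparent-size testfile            # the 15G the hash tools "see"
du -h testfile                            # the ~3M that is actually allocated
stat -c '%s bytes, %b blocks' testfile    # same info via stat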
Comparing sha1sum with sha256sum and sha512sum: if I/O were the bottleneck, I'd expect all of them to take roughly the same time. They don't.
same file in all tests, file size 15 GB:
time md5sum
real 0m25.519s
user 0m21.643s
sys 0m3.353s
time sha1sum
real 0m18.815s
user 0m15.497s
sys 0m3.300s
time sha256sum
real 0m37.194s
user 0m33.767s
sys 0m3.420s
time sha512sum
real 0m26.384s
user 0m23.020s
sys 0m3.180s
Results remain in the same ballpark across multiple test runs. sha512sum outperforming sha256sum is a 64-bit thing: SHA-512 operates on 64-bit words, so a 64-bit CPU processes more data per round.
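A quick way to check that 64-bit claim without any file I/O is OpenSSL's built-in benchmark, which hashes in-memory buffers; on a 64-bit CPU sha512 should come out ahead of sha256 on the larger block sizes:

openssl speed sha256 sha512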
A test on a "real" (non-sparse) 7 GB file on a 7200 rpm SATA drive resulted in all of them taking 53+ seconds, no matter which tool was used. Here I/O definitely was the bottleneck.
On a PCIe SSD the same file took between 10 and 120 seconds, depending on which tool was used (indicating I/O was not the bottleneck there).
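One caveat on all of these timings: after the first run the file sits in the page cache, so repeated runs can end up measuring the hash instead of the drive. On Linux you can drop the cache between runs (needs root) to keep the comparison honest:

sync                                          # flush dirty pages first
echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop page cache, dentries and inodes
time sha256sum bigfile                        # a genuinely cold-cache run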