hashcat Forum - Markov Chains and hashcat.hcstat File

Markov chains appear to be a very interesting approach to dramatically pruning the key-space. I have been experimenting and have several questions. (If these have been answered previously, I apologize in advance.)

- What was the dictionary used to train the default hashcat.hcstat file? Unless I have specific knowledge of the password domain, how "good" is the default hcstat file?

- I understand that the markov-threshold parameter is used to control the size of the key-space, and that increasing the parameter increases the key-space and vice versa. Can anyone describe this relationship more precisely? How can I better understand how tuning the threshold parameter will affect the size of the key-space?

- What is the default parameter setting? What is its range?

- Not sure if this would be feasible (especially without sacrificing performance), but it might be interesting to have the Markov chains/hcstat file evolve dynamically, learning from successfully cracked passwords. This would be somewhat akin to a fingerprint attack, or perhaps more like a Bayesian spam filter.

- Finally, any other references on using markov chains with hashcat would be appreciated.

Thanks in advance.

i believe rockyou was used for the default hcstat file.

the threshold parameter determines how "deep" the algorithm will go into each markov table; e.g. a threshold of 30 means only the 30 most likely chars at each position are used.

the default parameter is 0, which means no keyspace limiting is performed and you end up with an ordered brute force.
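To make the threshold idea concrete, here is a minimal Python sketch. It assumes you already have, for each position, the charset sorted from most to least likely; real hashcat also conditions on the previous character, which this sketch leaves out. A threshold t keeps only the first t entries per position, and t = 0 keeps everything (an ordered brute force). The helper names `apply_threshold` and `ordered_candidates` are hypothetical, and the enumeration order is illustrative, not necessarily hashcat's exact order.

```python
from itertools import product

def apply_threshold(ordered_tables, threshold):
    """Truncate each position's ordered charset to `threshold` chars.

    A threshold of 0 means no truncation (plain ordered brute force).
    """
    if threshold == 0:
        return [list(table) for table in ordered_tables]
    return [list(table[:threshold]) for table in ordered_tables]

def ordered_candidates(ordered_tables, threshold=0):
    """Yield candidates, walking each position's table best-first."""
    tables = apply_threshold(ordered_tables, threshold)
    # product() varies the last position fastest; every position still
    # walks its own table from most to least frequent.
    for combo in product(*tables):
        yield "".join(combo)

if __name__ == "__main__":
    # Hypothetical per-position orderings for a 3-char mask.
    tables = ["aeso", "nrta", "esad"]
    full = list(ordered_candidates(tables, threshold=0))
    pruned = list(ordered_candidates(tables, threshold=2))
    print(len(full), len(pruned))  # 64 8
    print(pruned[:3])              # ['ane', 'ans', 'are']
```

With 4 chars per position the full keyspace is 4^3 = 64; a threshold of 2 cuts it to 2^3 = 8 while still trying the most likely combinations first.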

there's nothing preventing you from manually generating a new hcstat file from your pot file between passes.
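A minimal sketch of the "regenerate from your pot file" step, under stated assumptions: each pot line is taken to be `hash[:salt]:plain` with the plaintext after the last colon (this breaks if a plaintext itself contains ':'), and some plains may be stored as `$HEX[...]`, as newer versions do. It only extracts and deduplicates the recovered plains into a wordlist; building the new .hcstat from that wordlist with hcstatgen (hashcat-utils) is the following step. The function names are hypothetical, and the third sample hash is made up.

```python
import binascii

def plain_from_pot_line(line):
    """Return the plaintext from one pot line, or None if malformed.

    Assumes `hash[:salt]:plain` with the plain after the LAST colon.
    """
    line = line.rstrip("\r\n")
    if ":" not in line:
        return None
    plain = line.rsplit(":", 1)[1]
    # Plains containing special bytes may be stored as $HEX[...].
    if plain.startswith("$HEX[") and plain.endswith("]"):
        try:
            return binascii.unhexlify(plain[5:-1]).decode("latin-1")
        except (binascii.Error, ValueError):
            return None
    return plain

def pot_to_wordlist(pot_lines):
    """Deduplicate recovered plains, keeping first-seen order."""
    seen = set()
    words = []
    for line in pot_lines:
        plain = plain_from_pot_line(line)
        # Skip malformed lines and empty plains.
        if plain and plain not in seen:
            seen.add(plain)
            words.append(plain)
    return words

if __name__ == "__main__":
    sample = [
        "5f4dcc3b5aa765d61d8327deb882cf99:password\n",
        "e10adc3949ba59abbe56e057f20f883e:123456\n",
        "5f4dcc3b5aa765d61d8327deb882cf99:password\n",
        "0123456789abcdef0123456789abcdef:$HEX[7061737323]\n",
    ]
    print(pot_to_wordlist(sample))  # ['password', '123456', 'pass#']
```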

you can read more in these threads:

https://hashcat.net/forum/thread-1265.html
https://hashcat.net/forum/thread-1285.html
https://hashcat.net/forum/thread-1291.html

Thanks for your response. I am beginning to better understand how markov-chain (Brute-Force++) works. Can I explain how I think it works to see if I have it right:

1) hcstat file is created with statsprocessor by using a corpus of words.

2) For up to 15 positions, statsprocessor goes through the corpus and determines the frequency of each character for each position. [And does it also consider the character in the prior position?]

3) The result is a rank-ordering of character frequency by position.

4) The default hcstat file is hashcat.hcstat, which was created from the rockyou password file.

5) When hashcat uses any mask-type attack, it goes through the key-space in the order determined by the hcstat file, i.e., for each position, it starts with the most frequent character.

6) If a threshold parameter is given, then the chain is truncated for each position to that length, i.e., for -t 10, only the 10 most frequent characters by position are considered.

7) Unless a threshold parameter is specified (or it is -t 0), the maximum size of the key-space is the (mask-length) ^ (#-of-characters used by statsprocessor). For example, if -1 ?l?u was used by statsprocessor and the mask-length used by hashcat was 11, then max-key-space = 11^26.

8) If a threshold parameter is specified, then the max-key-space is (mask-length) ^ (t). For the example above, if a -t 8 is specified, then the max-key-space = 11^8.

By no means am I sure that all of this is correct, but it is my working model, and if I am wrong, I would appreciate the community's feedback.

Thanks.

close,

(11-03-2012, 08:30 PM) chicago-guy wrote: "1) hcstat file is created with statsprocessor by using a corpus of words."

hcstat file is created with hcstatgen from hashcat-utils by using a corpus of words.

(11-03-2012, 08:30 PM) chicago-guy wrote: "2) For up to 15 positions, statsprocessor goes through the corpus and determines the frequency of each character for each position. [And does it also consider the character in the prior position?]"

hcstatgen generates full per-position statistics for the corpus. i'm not sure if hcstatgen has a 15-char limitation.

(11-03-2012, 08:30 PM) chicago-guy wrote: "7) Unless a threshold parameter is specified (or it is -t 0), the maximum size of the key-space is the (mask-length) ^ (#-of-characters used by statsprocessor). For example, if -1 ?l?u was used by statsprocessor and the mask-length used by hashcat was 11, then max-key-space = 11^26."

no, it's (# of chars)^(length), so 26^11, not 11^26 (strictly, ?l?u is 52 chars, which makes it 52^11). but yes, it will try the entire keyspace as an ordered brute force with no truncation.

(11-03-2012, 08:30 PM) chicago-guy wrote: "8) If a threshold parameter is specified, then the max-key-space is (mask-length) ^ (t). For the example above, if a -t 8 is specified, then the max-key-space = 11^8."

no, it's exponential (an antilog): $k = antilog(base $t)($l), where $k = keyspace, $t = threshold, and $l = length.

a threshold of 3 and a length of 6 is antilog(base 3)(6) = 729.
a threshold of 10 and length of 8 is antilog(base 10)(8) = 100000000.

hope that helps.
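The arithmetic in this exchange fits in a few lines. A sketch, assuming every position shares the same charset of size c and that a threshold larger than the charset simply keeps the whole charset; `markov_keyspace` is a hypothetical helper name, and 95 is the size of the ?a charset.

```python
def markov_keyspace(charset_size, length, threshold=0):
    """Keyspace of a fixed-length mask under a markov threshold.

    threshold == 0: ordered brute force, c ** l candidates.
    threshold > 0: at most t chars per position, so min(t, c) ** l.
    """
    if charset_size <= 0 or length < 0 or threshold < 0:
        raise ValueError("charset_size must be positive; length and threshold non-negative")
    per_position = charset_size if threshold == 0 else min(threshold, charset_size)
    return per_position ** length

if __name__ == "__main__":
    print(markov_keyspace(26, 11))               # 3670344486987776
    print(markov_keyspace(95, 6, threshold=3))   # 729
    print(markov_keyspace(95, 8, threshold=10))  # 100000000
```

The two threshold cases reproduce the 3^6 and 10^8 figures above; note that the charset size drops out entirely once the threshold is smaller than it.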

epixoip,

Thanks very much. I appreciate you taking the time to respond.

A couple of follow-ups:

2) When hcstatgen creates the statistics for a corpus, is it a 0-gram or 1-gram analysis, i.e., does it just rank-order characters by position irrespective of what character came before, or does it consider both the position as well as the character that came before it?

7) My mistake, you are right. I meant: for $t = 0, $k = $c ^ $l,
where $k = keyspace, $c = # of chars, $t = threshold, and $l = length.

8) But, for $t > 0, isn't it simpler to consider $k = $t ^ $l?
For example:
A threshold of 3 and a length of 6 is $k = 3^6 = 729
A threshold of 10 and a length of 8 is $k = 10^8 = 100000000.

Thanks.

(11-04-2012, 08:48 AM) chicago-guy wrote: "8) But, for $t > 0, isn't it simpler to consider $k = $t ^ $l?"

well sure, i suppose, if you wanted to be all simple about it ;)

(11-04-2012, 08:48 AM) chicago-guy wrote: "2) When hcstatgen creates the statistics for a corpus, is it a 0-gram or 1-gram analysis, i.e., does it just rank-order characters by position irrespective of what character came before, or does it consider both the position as well as the character that came before it?"

it considers both the position and the character that came before it.
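To illustrate what "position plus previous character" means, here is a sketch that counts, for each position, how often each character follows each previous character, then sorts every row by frequency. It is only an illustration of the idea, not hcstatgen's code or the .hcstat file layout: the empty-string sentinel for position 0 and the 15-char cap (taken from the question above) are assumptions, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

START = ""  # hypothetical "no previous character" marker for position 0

def build_first_order_stats(words, max_len=15):
    """Count char frequencies keyed by (position, previous char).

    max_len is an assumed cap, not a confirmed hcstatgen limit.
    """
    counts = defaultdict(Counter)
    for word in words:
        prev = START
        for pos, ch in enumerate(word[:max_len]):
            counts[(pos, prev)][ch] += 1
            prev = ch
    return counts

def ordered_table(counts, pos, prev):
    """Characters most likely to follow `prev` at `pos`, best first."""
    # most_common() sorts by count; ties keep first-seen order.
    return [ch for ch, _ in counts[(pos, prev)].most_common()]

if __name__ == "__main__":
    corpus = ["password", "pass123", "princess", "monkey", "master"]
    stats = build_first_order_stats(corpus)
    print(ordered_table(stats, 0, START))  # ['p', 'm']
    print(ordered_table(stats, 1, "p"))    # ['a', 'r']
    print(ordered_table(stats, 1, "m"))    # ['o', 'a']
```

Because each row depends on the previous character, the best second character after 'p' ('a') can differ from the best one after 'm' ('o'), which a purely per-position ranking cannot capture.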