09-28-2017, 11:04 AM
Would anyone know of a good source for hex formatted, non-English UTF-8 dictionaries for use with the --hex-wordlist option that hashcat provides?
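In the meantime, any existing UTF-8 dictionary can be converted into the format --hex-wordlist expects (one hex-encoded candidate per line). A minimal Python sketch, assuming nothing beyond that documented behavior:

import binascii
import sys

# Read a wordlist as raw bytes and emit one hex-encoded candidate per
# line, which is the format hashcat's --hex-wordlist option expects.
with open(sys.argv[1], "rb") as f:
    for line in f:
        word = line.rstrip(b"\r\n")  # strip the newline, keep every other byte
        if word:
            print(binascii.hexlify(word).decode("ascii"))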
Short of that, has anyone come up with rules for substituting single-byte characters with the two-byte UTF-8 sequences that some foreign languages use? German would be one example.
These can be built from existing dictionaries that use single-byte characters: take an existing single-byte foreign-language dictionary, convert it to hex, replace the applicable hex codes with their UTF-8 equivalents, and you end up with a more effective foreign-language (non-English) dictionary.
For example, the German letter Ü is C3 9C in UTF-8 hex.
So a script that searches for hex codes 75 and 55 (u and U respectively) would replace 75 with C3BC and 55 with C39C (a rough script sketch follows the example below).
In other words, the single-byte hex for the letter "u" is converted to its corresponding two-byte UTF-8 hex representation:
u = 75
U = 55
becomes
ü = C3BC
Ü = C39C
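Here is a minimal Python sketch of that idea. Note that it naively replaces every u/U in a word (a real version would probably generate each substitution variant separately, since not every u should become an ü); the substitution table is just the two letters from the example above:

import binascii
import sys

# Map single-byte ASCII letters to their two-byte UTF-8 equivalents:
# u (0x75) -> ü (0xC3 0xBC), U (0x55) -> Ü (0xC3 0x9C)
SUBS = [(b"u", b"\xc3\xbc"), (b"U", b"\xc3\x9c")]

with open(sys.argv[1], "rb") as f:
    for line in f:
        word = line.rstrip(b"\r\n")
        for old, new in SUBS:
            word = word.replace(old, new)
        # Emit in the hex format that --hex-wordlist expects
        print(binascii.hexlify(word).decode("ascii"))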
Also, this is straying off the topic of this post, but something similar could be built for substituting ASCII characters with their single-byte Latin-1 equivalents. This would be for non-English dictionaries that spell non-English words using only ASCII characters.
There are a lot of non-English dictionaries publicly available, but so many of them use only ASCII characters for their words. I see this the most with Spanish dictionaries. A sketch of the Latin-1 variant follows below.
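Same shape as the UTF-8 script, but emitting single-byte Latin-1 codes instead. The two mappings shown (n -> ñ, u -> ü) are just my picks for Spanish; extend the table as needed:

import binascii
import sys

# Replace ASCII letters with their single-byte Latin-1 accented equivalents.
SUBS = [(b"n", b"\xf1"),  # n -> ñ (Latin-1 0xF1)
        (b"u", b"\xfc")]  # u -> ü (Latin-1 0xFC)

with open(sys.argv[1], "rb") as f:
    for line in f:
        word = line.rstrip(b"\r\n")
        for old, new in SUBS:
            word = word.replace(old, new)
        # Latin-1 output is not valid UTF-8, so feed it via --hex-wordlist
        print(binascii.hexlify(word).decode("ascii"))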
Footnote on this character, Ü:
https://en.wikipedia.org/wiki/%C3%9C
This character is actually common to several non-English languages, not just German. Stolen from the Wikipedia page: "Hungarian, Turkish, Uyghur Latin, Estonian, Azeri, Turkmen, Crimean Tatar, Kazakh Latin and Tatar Latin alphabets"