This seems to be good stuff. I will try it for sure.
Very interesting indeed. I only use UTF-8 multi-language wordlists, but a UTF-8 brute-force mode would be quite a nice feature, I guess, though it would double the keyspace. Imagine training your Markov mode on multiple languages.
Has anyone done a mask on Chinese before? There seem to be so many character sets in Unicode that I don't know if it's worth brute-forcing. There are 20,976 basic Chinese characters in the range U+4E00 through U+9FEF, not including the other extensions in Unicode.
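For anyone who wants to sanity-check those numbers, here is a rough Python sketch. It only counts code points in the quoted range and shows how the keyspace grows with candidate length. The function names are made up for illustration, and it assumes every code point in the range is a valid candidate, which is a simplification.

```python
# Rough sketch: size of the basic CJK range quoted above and the resulting
# brute-force keyspace. Names here are illustrative, not from any tool.

CJK_START = 0x4E00  # first code point in the quoted range
CJK_END = 0x9FEF    # last code point in the quoted range (inclusive)


def cjk_count(start=CJK_START, end=CJK_END):
    """Number of code points in the inclusive range [start, end]."""
    return end - start + 1


def keyspace(length, charset_size=None):
    """Candidates for a fixed-length brute force over the charset."""
    if charset_size is None:
        charset_size = cjk_count()
    return charset_size ** length


def utf8_bytes(code_point):
    """UTF-8 encoding of one code point, as a byte-oriented mask would see it."""
    return chr(code_point).encode("utf-8")


if __name__ == "__main__":
    print("characters in range:", cjk_count())
    for n in range(1, 5):
        print(f"length {n}: {keyspace(n):,} candidates")
    # Each character in this range takes three bytes in UTF-8.
    print(utf8_bytes(CJK_START).hex(), utf8_bytes(CJK_END).hex())
```

Note that every character in this range encodes to three bytes in UTF-8, so if the tool builds candidates byte by byte, a "one character" mask is really three byte positions.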
If there is a good wordlist for Simplified Chinese, do let me know; this might be a better approach.
The Chinese character sets are pretty complicated. Also, stringing random Chinese characters together does not give you many random "positives" like it would with A-Z in English. And if your algorithm is slow, brute-forcing this set is going to take forever.
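To put "forever" in perspective, here is a back-of-the-envelope Python sketch comparing A-Z with the 20,976-character basic range. The hash rate is a made-up placeholder for a slow algorithm, not a benchmark of any real hardware, and the function name is just for illustration.

```python
# Back-of-the-envelope estimate of time to exhaust a brute-force keyspace.
# RATE is a hypothetical figure for a slow hash, not a measured benchmark.

SECONDS_PER_DAY = 86_400
RATE = 10_000  # assumed candidates per second


def exhaust_seconds(charset_size, length, hashes_per_second):
    """Seconds needed to try every candidate of the given fixed length."""
    return charset_size ** length / hashes_per_second


if __name__ == "__main__":
    for name, size in (("A-Z", 26), ("basic CJK", 20_976)):
        for length in (2, 3, 4):
            days = exhaust_seconds(size, length, RATE) / SECONDS_PER_DAY
            print(f"{name:10s} length {length}: {days:.4g} days")
```

Swap in your own measured rate for RATE; the point is only that the gap between the two character sets grows by a factor of roughly 800 with every extra position.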
My recommendation is a good wordlist with rules.
I'll let others comment on here too...