Here is my Python script implementation of what you described.
./charanal.py wordlist > outfile
You will get a lot of output: the character frequency across all positions combined, plus the frequency for each individual position in a word.
If you make any changes, please share them.
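For anyone curious what that counting looks like, here is a minimal sketch of per-position character frequency analysis. It illustrates the approach only and is not the actual charanal.py source (which isn't reproduced in this post); the file handling and output format are my assumptions.

#!/usr/bin/env python3
# Sketch of per-position character frequency counting (illustration only,
# not the original charanal.py). Usage: ./charfreq_sketch.py wordlist
import sys
from collections import Counter

overall = Counter()        # character frequency across all positions
by_position = {}           # position index -> Counter of characters

with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
    for line in f:
        word = line.rstrip("\r\n")
        for pos, ch in enumerate(word):
            overall[ch] += 1
            by_position.setdefault(pos, Counter())[ch] += 1

print("All positions:")
for ch, n in overall.most_common():
    print(f"  {ch!r}: {n}")
for pos in sorted(by_position):
    print(f"Position {pos}:")
    for ch, n in by_position[pos].most_common():
        print(f"  {ch!r}: {n}")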
(08-21-2014, 09:37 PM)Kgx Pnqvhm Wrote: Meanwhile, have you seen the series "Quantifying and Ranking Wordlist Effectiveness Part 1: Methodology" at
https://benjaminellett.com/quantifying-an...thodology/
Hey, this is Ben here. Thanks for the plug to my site; I'm glad other people are finding it useful. Regarding the thread topic, for wordlist analysis I used dictclean to remove any invalid UTF-8 characters, and then Pipal to do the actual analysis, which covers lengths, base words, and hashcat mask generation.
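For readers who don't want to grab dictclean just to see what that cleaning step does, here is a rough Python stand-in for the invalid-UTF-8 filtering (a sketch of the idea, not dictclean itself, which does more than this):

#!/usr/bin/env python3
# Stand-in sketch of the invalid-UTF-8 filtering step (not dictclean itself).
# Usage: ./utf8_filter_sketch.py wordlist > cleaned
import sys

with open(sys.argv[1], "rb") as infile:
    for raw in infile:
        try:
            raw.decode("utf-8")           # keep only lines that decode cleanly
            sys.stdout.buffer.write(raw)
        except UnicodeDecodeError:
            pass                          # drop lines with invalid byte sequences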
I haven't tried out PACK, but I unfortunately found that Passpal would choke on the bigger files (it seemed to struggle with ~400 MB wordlists), whereas Pipal handled them fine.
I've recently posted all of the Pipal analyses I generated, along with graphs showing password length distribution, in part 3 here.
To thebluetoob (Ben) - thanks for your work.
I have now added password length distribution data to my script.
It was tested with a 3.8 GB wordlist, which it processed given enough time.
./charanal.py wordlist > outfile
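For anyone adding the same feature to their own script, a length count can stay memory-flat on multi-GB files by reading line by line. This is a minimal sketch of that idea, not the code actually added to charanal.py:

#!/usr/bin/env python3
# Sketch of a streaming password length distribution count.
# Reading line by line keeps memory use constant even on multi-GB wordlists.
import sys
from collections import Counter

lengths = Counter()
with open(sys.argv[1], "rb") as f:        # bytes mode avoids decode errors on messy lists
    for line in f:
        lengths[len(line.rstrip(b"\r\n"))] += 1

for length in sorted(lengths):
    print(f"{length:3d}: {lengths[length]}")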