06-14-2012, 05:45 PM
Hello everybody. Let's say you "hypothetically" encounter a wordlist that is not only spread across many files, but ("hypothetically") also contains a lot of other information about users (let's say it's a "hypothetical" leak). As a good guy, you don't need all that info; actually, you don't want it. It would have been great if the list had already been cleaned up and the passwords extracted.
I decided to learn some grep & sed, so this seemed like a great way to get started with those tools. Here's how you could extract all the passwords and clean up the file.
First, extract all the files into one directory. You will have to identify something unique about the lines containing the passwords you want to extract. Let's say each record looks something like:
Code:
username=blabla
pass=extract_me
[email protected]
comment=dont share lists containing user emails!
and this goes on forever. Here we'll use grep to extract every line containing "pass=" from every file in the folder "extracted_directory". Note that you need to be one level 'up' from that directory (in its parent directory) in bash (or your terminal).
Code:
grep -rhi 'pass=' extracted_directory/ > wordlist_merged.txt
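To see what the three flags do, here is a minimal sketch on made-up data. The temporary directory and the files a.txt and sub/b.txt are assumptions for illustration only, not part of any real list. -r recurses into subdirectories, -h leaves the file names out of the output, and -i ignores case, so "PASS=" is matched too.

```shell
# Hypothetical sample data, invented for this demo
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/extracted_directory/sub"
printf 'username=blabla\npass=extract_me\ncomment=hello\n' > "$demo_dir/extracted_directory/a.txt"
printf 'username=foo\nPASS=Another1\n' > "$demo_dir/extracted_directory/sub/b.txt"

# Run grep from one level above extracted_directory, as in the post
matches=$(cd "$demo_dir" && grep -rhi 'pass=' extracted_directory/)

# Only the two password lines are printed, with no file names in front
printf '%s\n' "$matches"
```

Keep in mind that this matches "pass=" anywhere in a line, so a line such as "compass=north" would be picked up as well.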
Isn't grep awesome? Plus you are being legit and not looking at personal stuff. Now we have to remove "pass=" from the beginning of every line in that file. We can use sed:
Code:
sed 's/^[[:space:]]*[Pp][Aa][Ss][Ss]=//' wordlist_merged.txt > wordlist_cleaned.txt
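A small sketch of why it matters whether the pattern is tied to the start of the line. The sample line is made up: a password that happens to contain "pass=" itself. An unanchored `s/pass=//g` eats that part of the password too, while a pattern starting with `^` only removes the prefix.

```shell
# Hypothetical line whose password itself contains "pass="
line='pass=compass=north'

# Unanchored with the g flag: removes every "pass=" in the line
unanchored=$(printf '%s\n' "$line" | sed 's/pass=//g')

# Anchored with ^: removes only the "pass=" at the start of the line
anchored=$(printf '%s\n' "$line" | sed 's/^pass=//')

printf 'unanchored: %s\nanchored:   %s\n' "$unanchored" "$anchored"
# unanchored: comnorth
# anchored:   compass=north
```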
Let's remove leading and trailing whitespace:
Code:
sed 's/^[[:space:]]*//;s/[[:space:]]*$//' wordlist_cleaned.txt > wordlist_whatever.txt
Now you can remove duplicates and sort the file, starting with the most used password:
Code:
sort wordlist_whatever.txt | uniq -c | sort -nr > wordlist_sorted.txt
This results in a list with a count in front of every password (how many times it occurred). We want to remove those counts using sed:
Code:
sed 's/^ *[0-9]* //' wordlist_sorted.txt > wordlist_FINAL.txt
And there you go! A nicely sorted and cleaned list.
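If you don't care about the intermediate files, the steps can probably be chained into a single pipeline. This is only a sketch on made-up sample data (the temporary directory, the files a.txt and b.txt and the passwords in them are all invented here), and it uses a slightly stricter sed pattern than the step-by-step version: the prefix is removed only at the start of the line, in any letter case.

```shell
# Hypothetical sample data, invented for this demo
work=$(mktemp -d)
mkdir -p "$work/extracted_directory"
printf '%s\n' 'username=a' 'pass=hunter2' '  pass=letmein  ' > "$work/extracted_directory/a.txt"
printf '%s\n' 'PASS=hunter2' 'comment=x' 'pass=123 456' 'pass=hunter2' > "$work/extracted_directory/b.txt"

# extract -> strip prefix and surrounding whitespace -> count -> sort by count -> drop counts
grep -rhi 'pass=' "$work/extracted_directory/" \
  | sed 's/^[[:space:]]*[Pp][Aa][Ss][Ss]=//; s/^[[:space:]]*//; s/[[:space:]]*$//' \
  | sort | uniq -c | sort -nr \
  | sed 's/^ *[0-9]* //' > "$work/wordlist_FINAL.txt"

# The most used password (hunter2, seen three times) ends up on the first line
cat "$work/wordlist_FINAL.txt"
```

The last sed only strips the first number on each line, so a password that itself contains digits and spaces, like "123 456", survives intact.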
Alert: Always back up your lists before doing any of this!
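One possible way to do that, sketched on a throwaway file (the temporary directory, the file contents and the ".bak" naming scheme are assumptions made for this example): keep a timestamped copy and check that it is identical before touching the original.

```shell
# Hypothetical file standing in for your real list
work=$(mktemp -d)
printf 'pass=extract_me\n' > "$work/wordlist_merged.txt"

# Timestamped copy; -p preserves modification time and permissions
backup="$work/wordlist_merged.txt.$(date +%Y%m%d-%H%M%S).bak"
cp -p "$work/wordlist_merged.txt" "$backup"

# cmp -s is silent and exits 0 only if both files are byte-identical
cmp -s "$work/wordlist_merged.txt" "$backup" && echo "backup verified: $backup"
```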
Have a great day.