09-02-2013, 06:46 PM
Hello Everyone,
I have been slogging through the southwest MD5 hash set that was released a few years ago. It's really my first try at this, and for those of you who have never seen the southwest hash set, it's over 110 million hashes. Suffice it to say that in a file that size you stumble across some odd things. Email addresses may not be the oddest thing, but some of the domains are, and those have got to generate hashes that you are likely never to figure out otherwise.
The bash script itself is probably nothing special, but it will save someone the time of writing it themselves.
All you need is an input file named domainlist-master.input (you can modify the script if you want to change the file names or add/remove some of the rules being applied). The script generates 4 files, 2 rule files and 2 dictionaries:
master-lemail.dict - lower case email dictionary
master-uemail.dict - upper case email dictionary
master-lemail.rule - lower case email rule
master-uemail.rule - upper case email rule
CAVEAT EMPTOR - I cobble together code all the time; it doesn't always end up pretty, nor do I guarantee it will work for you.
The bash script is normally run on Ubuntu and Scientific Linux.
Code:
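#!/bin/bash
# Reads one domain per line from domainlist-master.input and builds hashcat
# rules and dictionary entries for the @, # and ~ separators.
# Output is appended (tee -a), so remove the old files before re-running
# if you don't want duplicate lines.
inputfile="domainlist-master.input"
# log/dictionary for things in lower case
lrulefile="master-lemail.rule"
ldictfile="master-lemail.dict"
# log/dictionary for things in upper case
urulefile="master-uemail.rule"
udictfile="master-uemail.dict"

# Dictionary entries are plain text: separator + domain (e.g. @gmail.com),
# lower case into the lower-case dict and upper case into the upper-case dict.
function printdict () {
  local ldomain udomain pr
  ldomain=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z')
  udomain=$(printf '%s\n' "$1" | tr 'a-z' 'A-Z')
  for pr in "@" "#" "~"
  do
    printf '%s\n' "${pr}${ldomain}" | tee -a "$ldictfile"
    printf '%s\n' "${pr}${udomain}" | tee -a "$udictfile"
  done
}

# Rule lines are a base function followed by one $X (append X) per character:
# :  pass through   d  duplicate   l  lower   u  upper   c  capitalize   f  reflect
function printrule () {
  local mystring myustring pr x
  mystring=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z')
  myustring=$(printf '%s\n' "$1" | tr 'a-z' 'A-Z')
  for pr in "@" "#" "~"
  do
    for x in ":" "d" "l" "u" "c" "f"
    do
      printf '%s\n' "${x}\$${pr}${mystring}" | tee -a "$lrulefile"
      printf '%s\n' "${x}\$${pr}${myustring}" | tee -a "$urulefile"
    done
  done
}

while IFS= read -r foo || [[ -n "$foo" ]]
do
  foo="${foo%$'\r'}"             # tolerate DOS line endings
  [[ -z "$foo" ]] && continue    # skip blank lines
  # turn gmail.com into $g$m$a$i$l$.$c$o$m
  mystring=""
  for (( i=0; i<${#foo}; i++ )); do
    mystring="${mystring}\$${foo:$i:1}"
  done
  printdict "$foo"
  printrule "$mystring"
done < "$inputfile"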
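Each line the script writes to the .rule files is a hashcat rule: one base function (: d l u c f) followed by a $X "append character X" for every character of the separator and domain. To make that concrete, here is a rough sketch of a hypothetical apply_rule helper (my own illustration, not part of hashcat or the script) that applies only those functions to a single word, so you can see which candidate a given rule line produces. It assumes bash 4 or later for the ${word,,} / ${word^^} case conversions.

```shell
#!/bin/bash
# Hypothetical helper, not part of hashcat: applies a rule line built only
# from the functions the generator emits (: d l u c f and $X) to one word.
apply_rule() {
  local word="$1" rule="$2" i=0 j ch reversed
  while (( i < ${#rule} )); do
    ch="${rule:i:1}"
    case "$ch" in
      ':') ;;                                   # pass the word through unchanged
      d) word="${word}${word}" ;;               # duplicate the word
      l) word="${word,,}" ;;                    # lower case everything
      u) word="${word^^}" ;;                    # upper case everything
      c) word="${word,,}"; word="${word^}" ;;   # capitalize first, lower the rest
      f) reversed=""                            # reflect: append a reversed copy
         for (( j=${#word}-1; j>=0; j-- )); do reversed="${reversed}${word:j:1}"; done
         word="${word}${reversed}" ;;
      '$') i=$(( i + 1 ))                       # $X: append the next character
           word="${word}${rule:i:1}" ;;
      *) echo "unsupported rule function: $ch" >&2; return 1 ;;
    esac
    i=$(( i + 1 ))
  done
  printf '%s\n' "$word"
}
```

For example, apply_rule jdoe ':$@$r$r$.$c$o$m' prints jdoe@rr.com. In practice you let hashcat do this for you against a username list, along the lines of hashcat -m 0 -a 0 -r master-lemail.rule hashes.txt usernames.txt (hashes.txt and usernames.txt are placeholder names).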
A brief overview of what I have observed: the majority of what you find will be username@domain.com, followed by username#domain.com, then username~domain.com, and so on. If you follow standard analysis, you will find that lower case gets a higher percentage of hits than upper case.
The real secret sauce in this will be your input file. Some of what's in mine came from Hashit/T0XIC/Blandy UK, but by far most of my work was on .edu and rr.com. I also spent some time scraping leaks on pastebin for various domains, and scraping pastebin-type sites has probably produced more of the unusual entries that turned up as hits in the southwest MD5 set.
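If you want to check the separator and case breakdown on your own cracks, a simple approach is to classify each cracked plaintext by its separator and the case of the domain after it, then count. This is only a sketch under a few assumptions: plaintexts arrive one per line on stdin (for a hashcat potfile of MD5 hashes, cut -d: -f2- strips the hash first), the separator is the first of @, #, ~ found in that order, the domain is whatever follows its last occurrence, and bash 4 or later is available. classify_hit and tally_hits are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helper: prints "<separator> <lower|upper|mixed>" for one
# cracked plaintext, or "other" if it contains none of @, # or ~.
classify_hit() {
  local line="$1" sep="" s dom
  for s in "@" "#" "~"; do
    if [[ "$line" == *"$s"* ]]; then sep="$s"; break; fi
  done
  if [[ -z "$sep" ]]; then echo "other"; return; fi
  dom="${line##*"$sep"}"                 # text after the last separator
  if [[ "$dom" == "${dom,,}" ]]; then echo "$sep lower"
  elif [[ "$dom" == "${dom^^}" ]]; then echo "$sep upper"
  else echo "$sep mixed"; fi
}

# Counts the classifications for all plaintexts read from stdin.
tally_hits() {
  local line
  while IFS= read -r line || [[ -n "$line" ]]; do
    classify_hit "$line"
  done | sort | uniq -c
}
```

Something like cut -d: -f2- hashcat.potfile | tally_hits gives one count per separator/case pair, which makes the lower case versus upper case comparison easy to see.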
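Since the input file is where the value is, here is one possible way to turn a scraped dump into entries for it: pull out anything that looks like an email address, keep the domain part, lower-case it and de-duplicate. This is a minimal sketch, assuming GNU grep; the regex is a loose approximation rather than a full address parser, and extract_domains and paste_dump.txt are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helper: reads arbitrary text on stdin and prints one unique,
# lower-cased domain per line, the format domainlist-master.input uses.
extract_domains() {
  grep -oE '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' \
    | cut -d@ -f2 \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u
}
```

A typical use would be extract_domains < paste_dump.txt >> domainlist-master.input followed by sort -u -o domainlist-master.input domainlist-master.input to drop domains you already had.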
Based on the response I will see about posting either the rule files themselves or the input file or even both.
Snapshot of the compressed files, since I'm not sure what the size limit for attachments will end up being:
-rw-r--r--. 1 root root 24K Sep 2 12:34 domainlist-master.7z
-rw-r--r--. 1 root root 308K Sep 2 12:35 emailrules-dictionary.7z
Peace