Now that the hashcat tools can work on file passwords, as users requested, it's worth asking whether anyone has studied the differences between login passwords and file passwords.
For example, someone other than the user (a site or system administrator) sets the rules for what counts as a valid login password, while the user can choose whatever document or archive password they like.
So login passwords will follow patterns shaped by constraints that don't apply to file passwords, which means the two are likely to have different characteristics.
Has this been studied or discussed anywhere?
And would it have any effect on which types of attacks to use?
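To make the login-vs-file distinction concrete, here's a rough Python sketch that splits a candidate list by whether each entry would survive a typical composition policy. The policy itself (8+ characters with at least one uppercase letter, one lowercase letter and one digit) is only an assumption for illustration, and `meets_policy` / `split_by_policy` are hypothetical helpers; real sites differ. If the theory holds, cracked login passwords should land mostly in the first bucket, while cracked file passwords wouldn't have to.

```python
import re

# Hypothetical composition policy, standing in for the kind of rules a site
# or admin imposes on login passwords. Real policies vary.
def meets_policy(pw: str, min_len: int = 8) -> bool:
    """Return True if pw is at least min_len chars and has upper, lower and a digit."""
    return (
        len(pw) >= min_len
        and re.search(r"[A-Z]", pw) is not None
        and re.search(r"[a-z]", pw) is not None
        and re.search(r"[0-9]", pw) is not None
    )

def split_by_policy(candidates):
    """Split a wordlist into policy-compliant and free-form candidates."""
    compliant, free_form = [], []
    for pw in candidates:
        (compliant if meets_policy(pw) else free_form).append(pw)
    return compliant, free_form

if __name__ == "__main__":
    words = ["sunshine", "Sunshine2019", "bluehouse", "Password1", "mydog rex"]
    ok, other = split_by_policy(words)
    print("fits policy:", ok)
    print("free-form:", other)
```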
I just started a similar thread, but in the context of WPA. I believe there is some crossover, because with WPA the only constraint is a minimum length of 8 characters; there are no rules requiring mixed case, numbers or symbols. I have very limited samples of both WPA and file passwords to base my theory on, so I could be way off base, but I feel like without a forced policy there is a better chance people use nothing but base words, usually two or three words or a short phrase made of smaller words.
The flip-side argument is that the more people are trained to make the passwords for their web services and such more complex, the more that habit spills over. But using wordlists generated from leaked website dumps, I don't get very good results against WPA compared to scratch-built lists made up of base words. My results aren't great either way, but if people were reusing similar passwords for WPA (and likewise for files), you would expect to see more crossover success. My belief is that since WPA keys and password-protected files are often meant to be shared, people don't reuse their own passwords there, and they tend to pick something easy to repeat out loud, like a phrase or a couple of words, perhaps with some numbers appended, more often than a mixed-case, number and symbol combo.
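If WPA and shared-file passphrases really are mostly two or three base words with maybe some digits tacked on, a candidate list for testing that theory could be built along these lines. This is a minimal sketch under that assumption: `phrase_candidates` is a hypothetical helper, the base words and suffixes are placeholders, and the only filter is WPA's 8 to 63 character passphrase length. Note that the output grows as N² or N³ in the size of the base list, so for real wordlists hashcat's own combinator attack (`-a 1`) or rules would be the practical way to do the joining.

```python
from itertools import product

# WPA-PSK passphrases must be 8 to 63 characters long; nothing else is enforced.
WPA_MIN, WPA_MAX = 8, 63

def phrase_candidates(base_words, max_words=3, suffixes=("",)):
    """Yield concatenations of 2..max_words base words, each with every suffix,
    keeping only candidates whose length is valid for WPA (no duplicates)."""
    seen = set()
    for n in range(2, max_words + 1):
        for combo in product(base_words, repeat=n):
            stem = "".join(combo)
            for suffix in suffixes:
                cand = stem + suffix
                if WPA_MIN <= len(cand) <= WPA_MAX and cand not in seen:
                    seen.add(cand)
                    yield cand

if __name__ == "__main__":
    words = ["blue", "house", "dog"]
    for c in phrase_candidates(words, max_words=2, suffixes=("", "1", "123")):
        print(c)
```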
Again, my samples are too small to tell whether these hunches hold water, so it would be nice to hear from those who have been researching this longer and have more data pointing to particular conclusions, or who can share lists or rules we could test against our own hashes to see if they show similar results.
For files specifically, I can think of a few strategies I would try first, depending especially on the content and/or origin of the file. I would start with the wiki wordlist, and then perhaps scrape LinkedIn or some kind of company-name list.
I would test base words and phrases (with spaces removed), both bare and with common stuff appended, like "123" or dates (YY, YYYY, or perhaps MMYY or MMYYYY, or even MMDDYY or MMDDYYYY).
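That checklist translates fairly directly into a generator. A minimal Python sketch follows, assuming "MMYY/YY" means MMYY or MMYYYY and "MMDDYY/YY" means MMDDYY or MMDDYYYY; the function names, year range and phrases are hypothetical placeholders. Inside hashcat itself, rules or a hybrid attack (`-a 6`, wordlist plus mask) would usually be the faster way to append digits, but a script like this makes it easy to see exactly what's being tried.

```python
from datetime import date, timedelta

def date_suffixes(start_year, end_year):
    """Collect YY, YYYY, MMYY, MMYYYY, MMDDYY and MMDDYYYY strings for every
    calendar day from start_year through end_year (only real dates)."""
    out = set()
    day = date(start_year, 1, 1)
    last = date(end_year, 12, 31)
    while day <= last:
        yy, yyyy = f"{day.year % 100:02d}", f"{day.year:04d}"
        mm, dd = f"{day.month:02d}", f"{day.day:02d}"
        out.update({yy, yyyy, mm + yy, mm + yyyy, mm + dd + yy, mm + dd + yyyy})
        day += timedelta(days=1)
    return out

def file_password_candidates(phrases, start_year, end_year):
    """Yield each phrase with spaces removed: bare, with "123", and with every
    date suffix. The set collapses suffixes that coincide (e.g. 012019)."""
    suffixes = {"", "123"} | date_suffixes(start_year, end_year)
    for phrase in phrases:
        base = phrase.replace(" ", "")
        for suffix in sorted(suffixes):
            yield base + suffix

if __name__ == "__main__":
    cands = list(file_password_candidates(["my dog rex", "acme corp"], 2018, 2019))
    print(len(cands), cands[:5])
```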
It's one of those things where you need enough password-protected files in a hashcat-friendly format (high hashes per second) to get a sample big enough to dial in rules. That's the problem I'm having researching WPA tendencies: I don't have enough hashes to test, and even almost 300k/s, which is great for WPA on a single rig, is nothing like MD5, which runs orders of magnitude faster because every WPA guess has to go through PBKDF2-HMAC-SHA1 with 4,096 iterations.
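For a sense of why sample size is the bottleneck, the arithmetic is just candidates divided by rate. The sketch below assumes a hypothetical 10-million-word list run with 1,000 rules at the ~300k/s WPA rate mentioned above; `hours_to_exhaust` is a hypothetical helper, and the list and rule counts are made up for illustration. Because WPA keys are salted with the network's ESSID, that time is paid again for every distinct network in the sample.

```python
def hours_to_exhaust(base_words: int, rules: int, rate_per_sec: float) -> float:
    """Hours needed to try every word x rule candidate once at a steady rate."""
    candidates = base_words * rules
    return candidates / rate_per_sec / 3600

if __name__ == "__main__":
    # ~300k/s WPA rate from the post; list and rule counts are hypothetical.
    print(f"{hours_to_exhaust(10_000_000, 1_000, 300_000):.2f} h per network")  # prints: 9.26 h per network
```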