
hashcat Forum

Full Version: Sorting utf-8 wordlists
Hi!

On my Ubuntu VPS, the locale is set to en_US.utf8, but when I run the sort command on a UTF-8 wordlist in a custom language, all special characters like č get converted to c. It looks like a collation issue. What settings do I have to apply for this to work? Do I have to install a different locale and switch to it? That would be really bad. I tried to find a solution on Google, but without success.

Thanks!
What does the sort command you run look like?
(06-12-2012, 01:16 AM) undeath wrote: What does the sort command you run look like?

It is the standard unix sort.

I run it like this:

cat wordlist.txt | sort -u > sorted.txt
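One thing that may be worth trying here, as a sketch rather than a confirmed fix for this particular system: sort compares lines using the collation rules of the active locale, and -u merges lines that compare as equal. Setting LC_ALL=C for the sort call forces plain byte-wise comparison, so only byte-identical lines are merged. The temporary directory and the sample words below are made up for illustration.

```shell
# Hypothetical sample data; these words are not from the original wordlist.
tmpdir=$(mktemp -d)
printf '%s\n' 'čaj' 'caj' 'öl' 'čaj' 'zebra' > "$tmpdir/wordlist.txt"

# Show the locale variables that sort would normally pick up.
echo "LANG=${LANG:-} LC_ALL=${LC_ALL:-}"

# LC_ALL=C applies to this one command only: lines are compared byte by byte,
# so 'čaj' and 'caj' stay distinct and only the exact duplicate is dropped.
LC_ALL=C sort -u "$tmpdir/wordlist.txt" > "$tmpdir/sorted.txt"

cat "$tmpdir/sorted.txt"
```

The resulting order is byte order, not dictionary order for the language, which usually does not matter for a wordlist. The system locale itself stays untouched, since the variable is set only for that single command.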
cannot confirm.

Code:
[ undeath@p4home: /tmp ] % ~> cat test
öasdf
đhg4sb5t56
čwegver
Àsdrvgßsd
đhg4sb5t56
è weü46zgbe4z
[ undeath@p4home: /tmp ] % ~> sort -u test
Àsdrvgßsd
čwegver
đhg4sb5t56
è weü46zgbe4z
öasdf
[ undeath@p4home: /tmp ] % ~> echo $LANG,$LC_ALL
de_DE.UTF-8,de_DE.UTF-8
Strange, I guess it's all about locale... I will post again if I encounter such problems.
did you find a solution to this?

can you extract 10 example lines from your wordlist (which contain accents, umlauts, and other utf-8 unicode characters), run the commands as undeath has done and post the output here?

then, we can test the same on our *nix systems :)
please do not revive dead threads.
Just wanted to know the solution and have some discussion around it.

Point noted, thank you.