Tuesday, February 27, 2018

Website temporarily down

www.yosefkerzner.com is temporarily offline because it was used in a DNS tunneling test. The test was successful, and I will eventually post about it.

As such, I can't yet share a script I wrote today for removing duplicates from hashcat wordlists. The two usual approaches both fall short:

1. awk '!seen[$0]++' filename is nice and all, but it uses all available RAM, slows down considerably once a file exceeds roughly 600 MB, and basically never finishes past that point.

2. sort -u -o output input uses about 60% of available RAM and all CPUs, and takes perhaps three times as long as option 1. (Both invocations are written out below.)
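For reference, here's how each one is invoked; the filenames are placeholders. The awk idiom works because seen[$0]++ evaluates to the line's previous occurrence count, which is 0 (false) the first time a line appears, so negating it prints each line exactly once, in original order; the price is that the seen[] array keeps every unique line in memory.

    # Option 1: order-preserving dedupe; the seen[] array holds every unique line in RAM
    awk '!seen[$0]++' wordlist.txt > deduped.txt

    # Option 2: sort-based dedupe; -u drops duplicate lines, -o names the output file
    sort -u -o deduped.txt wordlist.txt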

I wrote a script that splits the input file into 160-megabyte pieces, which seems like a good size for best speed and limits RAM usage to under 3 GB per piece. The option 1 command is then run on each piece, and the pieces are concatenated back into a single large file. A sketch of the approach follows.
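Since the script itself is unreachable for now, here is a minimal sketch of the approach, assuming bash and GNU split; the script name, chunk prefix, and output naming are my own placeholders, not the real script's. One caveat: deduping each piece independently only removes duplicates within a piece, so duplicates that span two pieces survive the final concatenation (less of an issue for wordlists that are already mostly deduplicated, like the ones benchmarked below).

    #!/bin/bash
    # Sketch of the chunked dedupe described above; names are placeholders.
    # Usage: ./dedupe.sh wordlist.txt
    set -e

    in="$1"
    tmp="$(mktemp -d)"

    # Split into ~160 MB pieces; -C splits on line boundaries, so no word
    # is ever cut in half at a chunk edge.
    split -C 160M "$in" "$tmp/chunk_"

    # Run the option 1 command on each piece; RAM usage is bounded by the
    # unique lines of one 160 MB piece at a time.
    for f in "$tmp"/chunk_*; do
        awk '!seen[$0]++' "$f" > "$f.dedup"
    done

    # Concatenate the deduped pieces back into a single large file.
    cat "$tmp"/chunk_*.dedup > "${in%.txt}_dedup.txt"

    rm -r "$tmp"

The -C (line-bytes) flag is the load-bearing detail here: plain -b splitting would happily cut a word in half at a chunk boundary.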

First benchmarks (Intel Core i5-2310):

On a 3.5 GB wordlist (which was already deduplicated, mind you), the script took 7.15 minutes to complete.

On an 8.3 GB wordlist (mostly deduplicated), the script took 19.5 minutes to complete.

Deduping rockyou.txt takes 18 seconds.

Update: the site's back up; here's the script.
