Suggestion: Idea to improve scanning time when finding exact duplicates

English support for the software AllDup
JR147
Posts: 14
Joined: 24 May 2019, 06:58

Post by JR147 »

I ran AllDup to look for exact duplicates in a source folder containing around 70,000 files, most of them small (5-10 KB on average). A "File content" search using the MD5 method completed in around 40 minutes. As a test I then ran a "File size" search, which completed very quickly (3 seconds). When I combined the two methods into one search, however, it still took around 40 minutes, so I'm guessing that MD5 checksums are still being created for all of the files in this case.

My idea is this: a File content search is much slower than a File size search, so when both "File size" and "File content" are selected as search methods, AllDup could create checksums only for the files whose sizes match. If two files differ in size we already know they aren't exact duplicates, so comparing their content can be skipped. In other words, this would allow a fast fail: check the sizes first, and proceed with the File content search only among the files that share a size. For source folders that don't contain many duplicates, I think this could significantly reduce the scanning time of the File content portion.
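
To make the idea concrete, here is a rough Python sketch of the size-first approach. It is only an illustration under my own assumptions and not AllDup's actual code: the function name find_exact_duplicates is made up, and each file is read into memory in one go, which is only reasonable for small files like the ones described above.

```python
import hashlib
import os
from collections import defaultdict


def find_exact_duplicates(root):
    """Return groups of paths under root whose size and MD5 digest both match.

    Hypothetical helper for illustration; not how AllDup is implemented.
    """
    # Pass 1: group files by size. This only needs file metadata, no reads.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with at least one other file.
    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size, so it cannot be an exact duplicate
        by_digest = defaultdict(list)
        for path in paths:
            # Whole-file read: fine for small files, chunk it for large ones.
            with open(path, "rb") as fh:
                digest = hashlib.md5(fh.read()).hexdigest()
            by_digest[digest].append(path)
        duplicates.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return duplicates
```

Calling find_exact_duplicates on a folder returns a list of groups, each group being a sorted list of paths with identical size and MD5 digest. Files with a unique size are never opened at all.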

Note: I'm not sure whether this improvement has already been made in newer AllDup versions. I'm on AllDup version 4.4.44, running Windows 10 64-bit. (I had previously upgraded to AllDup v4.4.47, but hard links were not being created in that version, so I downgraded back to version 4.4.44.)
Administrator
Site Admin
Posts: 4046
Joined: 04 Oct 2004, 18:38
Location: Thailand

Re: Suggestion: Idea to improve scanning time when finding exact duplicates

Post by Administrator »

AllDup only compares files of the same size when using the search criterion "File content".

Checksums are only created when comparing files with the same size.
For example, only 2 checksums will be created if AllDup searches 70k files and finds only 2 files with the same size.
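
The arithmetic behind that example can be checked with a few lines of Python. This is only a sketch: the function name files_needing_checksum and the file sizes are invented for illustration, and it says nothing about how AllDup is implemented internally.

```python
from collections import Counter


def files_needing_checksum(sizes):
    """Count files whose size is shared with at least one other file."""
    counts = Counter(sizes)
    return sum(n for n in counts.values() if n > 1)


# Made-up data: 69,998 files with unique sizes plus 2 files of equal size.
sizes = list(range(1, 69_999)) + [5_000_000, 5_000_000]
print(len(sizes))                     # 70000
print(files_needing_checksum(sizes))  # 2
```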

Please read the notes about creating hard links in the manual.
There has been a major change to how hard links are created since v4.4.46.

btw, the latest version of AllDup is 4.4.52.