Slow with Compare Between Different Folders

English support for the software AllDup
therube
Posts: 322
Joined: 07 Nov 2012, 00:28

Slow with Compare Between Different Folders

Post by therube »


AllDup 4.5.26
if you include a $recycle.bin in your search
- regardless of Folder Exclusion
T:\$RECYCLE.BIN\S-1-5-21-3350321682-930001023-2409569994-1000

search is VERY slow
- regardless of Search Method
Ignore the above ^.

Ah, recycle.bin is not the issue.
So let's start again...


Search is VERY slow:

If:
- Compare only files between different source folders
And:
- Search method is Size (or perhaps Content, not sure)
And:
- directory includes a large number of files (perhaps a large number of same sized files)


Other Comparison Methods are quick.
Other Search Methods are quick (again, unsure of Content).
Options & Filters are not an issue.

Code:

	01/08/2023 02:10:50 PM - Search method: File size
	01/08/2023 02:10:50 PM - 1.Source folder: T:\$RECYCLE.BIN (46,441 files)
	01/08/2023 02:10:50 PM - 2.Source folder: T:\MUSIC.XXX2   (26 files)
	01/08/2023 02:10:50 PM - 3.Source folder: W:\T\X2         (53 files)
	01/08/2023 02:10:50 PM - Option: Compare only files between different source folders
	01/08/2023 02:10:50 PM - Determine file count of all source folders...
	01/08/2023 02:10:50 PM - File count: 46,467
	01/08/2023 02:10:50 PM - Scan: T:\$RECYCLE.BIN
(the two file counts differ because they are determined by different methods, but they're close)


The issue here (aside from it just being slow) is that I've already separated the size dups [& size dups that were
in different Original Locations, at that] into /music.xxx2/ & /x2/ (as I'm almost certain that they are in fact Content dups too).
I don't need the hashes of everything in recycle.bin, only of those files that are dup'd in
other directories. Hashing just those would be MUCH quicker than hashing all 46K files in recycle.bin (which is pointless).
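
The behaviour asked for above can be sketched roughly as follows. This is a minimal Python illustration, not AllDup's actual code: the function names and the (folder_id, path, size) tuple layout are made up for the example. The idea is to group by size first, keep only sizes that occur in more than one source folder, and hash only those files.

```python
import hashlib
from collections import defaultdict


def cross_folder_candidates(files):
    """files: list of (folder_id, path, size) tuples (hypothetical layout).

    Returns only the entries whose size occurs in at least two
    different source folders; everything else can never be a
    cross-folder duplicate and does not need to be hashed.
    """
    folders_by_size = defaultdict(set)
    for folder_id, _path, size in files:
        folders_by_size[size].add(folder_id)
    return [f for f in files if len(folders_by_size[f[2]]) > 1]


def sha256_of(path, chunk_size=1 << 20):
    """Hash one file in chunks; meant to be called for candidates only."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

With this kind of pre-filter, files whose size exists in only one source folder (such as 544-byte files that live only in recycle.bin) drop out before any hashing happens.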
therube
Posts: 322
Joined: 07 Nov 2012, 00:28

Re: Slow with Compare Between Different Folders

Post by therube »

File Size
does not allow a file size range, i.e., a Filter (to include/exclude by size)

Name
does not allow... OK, you can probably do it with a File Filter. In the case of recycle.bin,
the $i*.* files are going to be 544 bytes, & all of those can be excluded (as they're not particularly
meaningful...)

Ah! That greatly helps (cutting 23,191 files out of the equation).
Note the (number of) 'file comparison performed' (compared to Compare only between different source folders).
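
For illustration, the effect of the $i*.* exclude filter can be mimicked in Python with the standard fnmatch module. This is only a sketch: the file names are hypothetical, and case-insensitive matching is an assumption about how the filter behaves, not something confirmed for AllDup.

```python
from fnmatch import fnmatchcase


def is_excluded(filename, patterns=("$i*.*",)):
    """True if filename matches any exclude pattern (case-insensitive)."""
    name = filename.lower()
    return any(fnmatchcase(name, p.lower()) for p in patterns)


# Hypothetical recycle.bin contents: $I... entries hold the metadata,
# $R... entries hold the actual deleted file.
names = ["$IABC123.mp3", "$RABC123.mp3", "song.mp3"]
kept = [n for n in names if not is_excluded(n)]
print(kept)  # ['$RABC123.mp3', 'song.mp3']
```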


so...

with that, what then happens... when i go back to my original 3 directories & Compare only between them...

fast.
exactly what i want.


BUT,
that leaves the question as to why the same settings, files, directories were so slow
before adding the File Exclude Filter?

recycle.bin contains (theoretically) a 544-byte file for every "file" in recycle.bin
(source directory "attribute" if you will, aka, "Original location")
& as there were no 544-byte files in any directory other than recycle.bin itself,
why, with 'Compare only files between different source folders' set, were all these additional
'file comparison performed' & taking FAR LONGER in doing so?
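
To put a number on the question above: a small Python sketch, assuming (purely as a guess, this is not known about AllDup) that same-sized files get compared pairwise. The function name is made up for the example. It counts all pairs in a same-size group and how many of those pairs actually span two different source folders.

```python
def pair_counts(files_per_folder):
    """files_per_folder: count of same-sized files in each source folder.

    Returns (all_pairs, cross_folder_pairs).
    """
    total = sum(files_per_folder)
    all_pairs = total * (total - 1) // 2
    within_folder = sum(n * (n - 1) // 2 for n in files_per_folder)
    return all_pairs, all_pairs - within_folder


# 23,191 files of 544 bytes, all in recycle.bin, none in the other two folders
print(pair_counts([23191, 0, 0]))  # (268899645, 0)
```

Under that assumption there would be roughly 269 million possible pairs among the 544-byte files, and none of them crosses a folder boundary.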

(my other large 100K file directory also contains a huge percentage of same sized files
& in this 100K there would be a huge percentage of Content dups also. in my 3 directory
scenario, while a huge number of files, only a relatively tiny number of Content dups)

(in any case, shouldn't have to go through all of that to get, quick, meaningful results ;-))

(I forgot the screenshots. I'll see if I can't get them here tomorrow or so.)
therube
Posts: 322
Joined: 07 Nov 2012, 00:28

Re: Slow with Compare Between Different Folders

Post by therube »

Note the 'file comparison performed'.
And in the first shot, that count was still ongoing (I cancelled it), seemingly before anything else were to happen.

Second shot, I added an Exclude filter of $i*.* (which for $recycle.bin, excludes those [irrelevant] 544-byte files).
While that is a lot fewer files, there are still a lot of files, overall.
And in any case, 544 bytes would not match anything outside of $recycle.bin itself, & as the compare setting was (originally) 'Compare only files between different source folders', that in & of itself should have excluded those 544-byte files, I would think.
AllDup Extremely Slow when recyle.bin is included.png
AllDup Extremely FASTER with Exclude Filter.png
therube
Posts: 322
Joined: 07 Nov 2012, 00:28

Re: Slow with Compare Between Different Folders

Post by therube »

because the search through large lists makes everything slower with every new item added...
viewtopic.php?p=9985#p9985

Might something similar be going on here?
I would think 46K to be a large list (but then I'd think cutting that in half, to ~23K, would still be "large").
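
If something like a plain list search is involved (an assumption taken from the linked post, not a confirmed detail of AllDup), the cost grows much faster than the list itself, which would explain why halving the list helps so much. A small Python sketch, with a made-up function name, that counts the comparisons:

```python
def count_linear_search_steps(items):
    """Add each item to a plain list after searching the list for it;
    return how many element comparisons that took in total."""
    seen = []
    steps = 0
    for item in items:
        for existing in seen:
            steps += 1
            if existing == item:
                break
        else:
            seen.append(item)
    return steps


steps_1k = count_linear_search_steps(range(1000))
steps_2k = count_linear_search_steps(range(2000))
print(steps_1k, steps_2k, steps_2k / steps_1k)
```

With all-unique items, doubling the list roughly quadruples the work, so going from ~46K to ~23K entries would cut it to about a quarter rather than a half. A hash-based lookup (a set or dict) would avoid that growth.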