Different Source-Folder Option is Inefficient

English support for the software AllDup
Synetech
Posts: 48
Joined: 19 Jan 2011, 09:32

Post by Synetech »

Hi,

The option “Detect only duplicates from different source folders” can be useful, but it is implemented inefficiently. Currently, it waits until all files have been scanned as normal and then filters the results, which means that a lot of unnecessary comparisons are done. Instead, the option should be used to reduce the number of comparisons in the first place.

For example, say that all the files below are identical:

  C:\
    A\
      1.txt
      2.txt
    B\
      a.txt
      b.txt
Currently, AllDup checks:

  1.txt - 2.txt
  1.txt - a.txt
  1.txt - b.txt
  2.txt - a.txt
  2.txt - b.txt
  a.txt - b.txt
Then it removes 1.txt - 2.txt and a.txt - b.txt from the results (which causes the program to look like it is frozen for a while if there are a lot of results). It should not check those in the first place; it should only check:

  1.txt - a.txt
  1.txt - b.txt
  2.txt - a.txt
  2.txt - b.txt
That’s one-third fewer comparisons (4 instead of 6), and the results are displayed faster too. :D
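
As a rough illustration of the idea (a Python sketch, not AllDup's actual code), the files can be grouped by source folder up front so that same-source pairs are never generated. The `sources` mapping and the `cross_source_pairs` function are made-up names, and it assumes `C:\A` and `C:\B` are the two source folders from the example:

```python
from itertools import combinations, product

# Hypothetical model of the example: source folder -> files found in it.
sources = {
    "C:\\A": ["1.txt", "2.txt"],
    "C:\\B": ["a.txt", "b.txt"],
}

def cross_source_pairs(sources):
    """Return only the pairs whose two files come from different source folders."""
    pairs = []
    # Take every pair of *distinct* source folders, then pair up their files.
    for (_, files_a), (_, files_b) in combinations(sources.items(), 2):
        pairs.extend(product(files_a, files_b))
    return pairs

total_files = sum(len(files) for files in sources.values())
all_pair_count = total_files * (total_files - 1) // 2  # every file against every file

print(all_pair_count)                    # 6
print(len(cross_source_pairs(sources)))  # 4
for left, right in cross_source_pairs(sources):
    print(f"{left} - {right}")
```

With the four files above, this lists exactly the four cross-folder pairs, and the two same-folder pairs never reach the comparison step.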



Files from the same folder are being compared even though the separate sources option is selected:
Image

Re: Different Source-Folder Option is Inefficient

Post by Synetech »

For example, during the week when I use a laptop at work, I save files I download to a flash-drive. On the weekends, I plug the flash-drive into my desktop computer and run a scan to check for anything that I already have. I use the Different Source-Folders option so that AllDup will only check the new files against existing files, but it still checks every file against every file and only filters the results afterwards. :(

AllDup is intelligent enough not to compare files and folders that are on the filter lists, which helps to reduce the number of items scanned, so hopefully it can be made smart enough to avoid comparing items from the same source when that check-box is used.
therube
Posts: 322
Joined: 07 Nov 2012, 00:28

Re: Different Source-Folder Option is Inefficient

Post by therube »


Re: Different Source-Folder Option is Inefficient

Post by Synetech »

Yes, it is. Thanks for the notice.



Unfortunately, he says that the algorithm can’t (currently) filter the list without increasing the scan time. That seems questionable, because it only requires a few simple string comparisons to check the source of the two files being compared, which is much less work than comparing their contents.
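
To show how cheap that check could be, here is a minimal Python sketch. It is only a guess at one possible approach, not how AllDup works internally; `source_of` and `should_compare` are hypothetical helpers, and it assumes the source folders do not nest inside each other:

```python
from pathlib import PureWindowsPath

def source_of(path, sources):
    """Return the source folder that contains `path`, or None if there is none."""
    p = PureWindowsPath(path)
    for src in sources:
        s = PureWindowsPath(src)
        # PureWindowsPath compares case-insensitively, like Windows paths do.
        if p == s or s in p.parents:
            return src
    return None

def should_compare(path_a, path_b, sources):
    """True only when the two files belong to different source folders."""
    return source_of(path_a, sources) != source_of(path_b, sources)

sources = ["C:\\A", "C:\\B"]
print(should_compare("C:\\A\\1.txt", "C:\\B\\a.txt", sources))  # True
print(should_compare("C:\\A\\1.txt", "C:\\A\\2.txt", sources))  # False
```

If the source were looked up once per file while scanning and stored as an index, the check before each content comparison would be a single integer comparison.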

He’s got it on his todo list, so hopefully it will eventually work as expected. I’d be happy to brainstorm on a way to get it to work as efficiently as possible.