Comparisons seem to be duplicated

English support for the software AllDup
Synetech
Posts: 48
Joined: 19 Jan 2011, 09:32

Post by Synetech »

I’m watching the file comparisons during a scan, and it looks like files are being compared twice.

For example, with two files, a.txt and b.txt, AllDup appears to compare (a.txt, b.txt) and then (b.txt, a.txt), even though the two comparisons are equivalent. That doubles the number of comparisons.

It’s not so bad with a few hundred files, but when there are thousands, the number of comparisons grows quadratically. For example, right now, I need to compare 150,000 files that are all exactly 8,192 bytes. If every file is compared with every other file twice, that means more than 22 BILLION comparisons (150,000 × 149,999). If each pair of files is compared only once, it takes “only” about 11 billion comparisons.
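
As a quick sanity check on those figures, here is a minimal Python sketch. The function names are my own for illustration and have nothing to do with how AllDup counts internally.

```python
def ordered_pairs(n: int) -> int:
    """Comparisons if every file is checked against every other file in both orders."""
    return n * (n - 1)


def unordered_pairs(n: int) -> int:
    """Comparisons if each pair of files is checked only once."""
    return n * (n - 1) // 2


n = 150_000  # number of same-sized files in my scan
print(f"both orders: {ordered_pairs(n):,}")    # both orders: 22,499,850,000
print(f"once each:   {unordered_pairs(n):,}")  # once each:   11,249,925,000
```

Either way the count grows with the square of the number of files; comparing each pair once only halves it.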

I recorded a clip of AllDup. The Compare File field should not be repeating the same files over and over again.

Code:

Comparisons are commutative, so for:

C:\Dupes\
  a.txt
  b.txt
  c.txt
  d.txt
  e.txt


Compare:

  a b c d e
a - + + + +
b - - + + +
c - - - + +
d - - - - +
e - - - - - (e.txt is not checked at all; it has already been checked against all other files)

5 files ≠ 25 (n*n) comparisons
5 files = 10 comparisons: 1 + 2 + ... + (n-1) = (n*(n-1))/2
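
The triangle above can be sketched in Python with itertools.combinations, which yields each unordered pair exactly once. This is only an illustration of the idea using the five example file names; I am not claiming it is how AllDup walks its file list.

```python
from itertools import combinations

# The five example files from C:\Dupes\
files = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]

# Each unordered pair appears once: (a, b) is produced, (b, a) is not.
pairs = list(combinations(files, 2))

for left, right in pairs:
    print(left, right)

print(len(pairs))  # 10, i.e. (5 * 4) / 2
```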

Post by Synetech »

Never mind. It looks like AllDup is already doing it correctly, just the other way around:

Code:

  a b c d e f
a
b +
c + +
d + + +
e + + + +
f + + + + +
This means each successive file takes longer to process than the one before it, so the scan feels like it is slowing down as it progresses. On the other hand, it reduces the number of files touched if a scan is cancelled (e.g., e.txt is not seen until d.txt has been scanned).
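
To illustrate why the per-file work grows in this order, here is a rough Python sketch in which each file is compared only against the files listed before it. The helper compare_against_earlier is hypothetical and based purely on what the Compare File field seemed to show, not on AllDup's actual code.

```python
files = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt", "f.txt"]


def compare_against_earlier(names):
    """Yield (current, earlier) pairs: each file versus every file before it."""
    for i, current in enumerate(names):
        for earlier in names[:i]:
            yield current, earlier


# Count how many comparisons each file triggers when its turn comes.
work_per_file = {name: 0 for name in files}
for current, _earlier in compare_against_earlier(files):
    work_per_file[current] += 1

print(work_per_file)
# {'a.txt': 0, 'b.txt': 1, 'c.txt': 2, 'd.txt': 3, 'e.txt': 4, 'f.txt': 5}
print(sum(work_per_file.values()))  # 15, i.e. (6 * 5) / 2
```

The total is the same as in the upper-triangle order; only the distribution changes, with the cheap rows first and the expensive rows last.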