[Invalid] Image Dup Not Found, sometimes

Posted: 03 Jan 2024, 02:46
by therube
Image Dup Not Found, sometimes

It seems that sometimes not all the duplicates of a file are "found".
(I'm starting from saved Search Results, but I don't see why that should have any bearing.)
I've not been successful in trying to duplicate said failure by copying said files elsewhere & performing the same scan on them.

It makes sense that if the files were not actually identical (hash identical), a "similar" hash could miss, or misidentify, something, but where the files are bit identical, I would think they should not be missed, even if you're using a "similar" Image hash.

(I might have run into this before, not sure, but if I did, it didn't make me say, "oh, wait, why did it not also find...", at the time. Now that I have seen it, & it did register, I'm watching to see if I can figure out why. And to top it off, the scenario described below is not a given: in the same search, 3 other sets of files, 2 identical & 1 a "[similar] dup", did find all the expected files.)

3 files,
2 of which are IDENTICAL (hash)
dates differ for all three (& size, for the 3rd file)
(files were in 3 separate directories, same drive)

all 3 are Image "dups", aHash 93%

obviously the 2 that are IDENTICAL (hash)
should be found by the Image Mode 93%
- BUT, only 1 of those 2 was returned

the 3rd is a "dup" by virtue of the 93%
(& was found)

AllDup returned results of 1 of the IDENTICAL & the 93% (search)
[actually shows as 99% "identical"],
but not the 2nd of the 2 IDENTICAL (hash)

Tried to duplicate the issue, copying said files into another
directory... but with that, AllDup found all 3 (as expected) to
be dups (as they are, 2 IDENTICAL & 1 via the 93%)


(this was from a large search, 40K files, 18K groups, 2.5 GB)
(attempting to duplicate, was with only a handful of files)

& again

3 files, 2 same name, otherwise diff size & date
- results:
1 of the same named files & the "other" file
but NOT the 2nd of the same named files

& all /are/ "dups" (per the 93%)
(& again, copy the files into a different, MUCH SMALLER fileset, & all are found, as expected)

11/26/2023 12:18:37 PM - AllDup 4.5.54 PE
11/26/2023 12:18:37 PM - Search method: Find similar pictures
11/26/2023 12:18:37 PM - Comparison method: aHash
11/26/2023 12:18:37 PM - Image Formats: BMP, GIF, JPEG, JPG, PNG
11/26/2023 12:18:37 PM - Match: 93%
11/26/2023 12:18:37 PM - Picture area: entire picture
11/26/2023 12:18:37 PM - Comparison size: 16x16 Pixel
11/26/2023 12:18:37 PM - Checksum Strength: 256 Bit
11/26/2023 12:18:37 PM - Option: Use database
11/26/2023 12:18:37 PM - 1.Source folder: T:\C-DOC
11/26/2023 12:18:37 PM - 2.Source folder: T:\PICS
11/26/2023 12:18:37 PM - Option: Compare files from all source folders
11/26/2023 12:18:37 PM - Determine file count of all source folders...
11/26/2023 12:19:00 PM - File count: 290,757
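
For reference, the settings in the log above (aHash, 16x16 pixels, 256 bit, 93% match) can be sketched roughly as follows. This is a minimal illustration of a generic average hash, not AllDup's actual code: the function names, the synthetic test grid, and the "percentage of matching bits" definition of similarity are all assumptions, and the step that downscales a real image to 16x16 grayscale is left out. The point it shows is that two bit-identical files always produce the same hash, so they score 100% at any threshold.

```python
def average_hash(pixels):
    """Return aHash bits for a grayscale grid (rows of 0-255 ints).

    Each bit is 1 if the pixel is brighter than the grid's mean, else 0.
    A 16x16 grid yields 256 bits.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def similarity(bits_a, bits_b):
    """Percentage of bit positions that match (assumed definition of 'Match')."""
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same length")
    same = sum(a == b for a, b in zip(bits_a, bits_b))
    return 100.0 * same / len(bits_a)


def make_grid(seed, size=16):
    """Hypothetical stand-in for an image already downscaled to size x size."""
    return [[(seed + 7 * x + 13 * y) % 256 for x in range(size)]
            for y in range(size)]


original = make_grid(3)
identical_copy = [row[:] for row in original]  # models a bit-identical file

hash_a = average_hash(original)
hash_b = average_hash(identical_copy)

print(len(hash_a))
print(similarity(hash_a, hash_b))
```

Under this definition, a 93% threshold on 256 bits tolerates at most 17 differing bits, and identical inputs differ in none, so a miss on a bit-identical file cannot come from the hash comparison itself.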

& yet

here it did find all dups...


2, same name, but different case
all different subdirectories (though same /tree/)

& yet

here too, all are found...


2, same name, but extension CASE is different

Re: Image Dup Not Found, sometimes

Posted: 03 Jan 2024, 14:29
by Administrator
In which folders were the 3 files, of which only 2 were found as duplicates?

Re: Image Dup Not Found, sometimes

Posted: 03 Jan 2024, 22:34
by therube
I'll have to see if I still have that information.

All 3 were in different directories, all on the same drive.
I don't think there was a common directory heritage (at least at the lower tree level).

Re: [Invalid] Image Dup Not Found, sometimes

Posted: 04 Jan 2024, 22:47
by therube
LOL, call me stupid, stupid!!!

everything i said is correct
EXCEPT for the little old fact that i did NOT search the particular directory (tree),
T:/YYY/... - my search was within T:\PICS

while said files & directories do exist, i did not include the T:\YYY\ directory
tree in my search

heh, now that might be a good reason why the files, even though they were in fact
dups, did not turn up in the AllDup results!
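
A quick sanity check along these lines would have caught it sooner: test whether each expected file actually sits under one of the source folders from the scan log. A minimal sketch, assuming the two source folders shown in the log; the helper name `in_search_scope` and the `T:\YYY\some-subfolder\...` path are hypothetical placeholders, not the real location of the missed file.

```python
from pathlib import PureWindowsPath


def in_search_scope(file_path, source_folders):
    """Return True if file_path lies inside any of the source folders."""
    path = PureWindowsPath(file_path)
    for folder in source_folders:
        root = PureWindowsPath(folder)
        # A file is in scope if the folder is one of its ancestors.
        if root == path or root in path.parents:
            return True
    return False


# Source folders as listed in the scan log.
sources = [r"T:\C-DOC", r"T:\PICS"]

expected_files = [
    r"T:\PICS\pic\east\kki-slow36.jpg",
    r"T:\PICS\pic\080100\aed9cf07.jpg",
    r"T:\YYY\some-subfolder\aed9cf07.jpg",  # hypothetical path, for illustration
]

for name in expected_files:
    status = "scanned" if in_search_scope(name, sources) else "NOT in any source folder"
    print(f"{name}: {status}")
```

Any file reported as not in a source folder can never show up in the results, no matter how identical it is.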

sorry for the noise.

(well, at least another mystery solved.
next on the agenda... will have to come, later ;-))

the 2 files found as dups were:

	T:\PICS\pic\east\kki-slow36.jpg     (99%)
	T:\PICS\pic\080100\aed9cf07.jpg    (100%)

there was a 3rd file, also named aed9cf07.jpg, hash identical to above aed9cf07.jpg
that was not found as a dup:


AH!, *NO* files were even "seen" ? from that directory, /080102/
- why ?
AH!, the DIRECTORY, /+ok.YYY-xp1800-ruben2002/, is not even seen
- why ?
AH!, the DIRECTORY (string), /PICS/+, is not even seen
- why ?
AH!, the DIRECTORY, /YYY-Y/, is not even seen
- why ?
AH!, the DIRECTORY (string), Y/, is not even seen
- why ?
AH!, the DIRECTORY, /YYY/ is not even seen
- why ?

is an