[Invalid] Image Dup Not Found, sometimes
Posted: 03 Jan 2024, 02:46
It seems that sometimes not all the duplicates of a file are "found".
(I'm starting from a saved Search Result, but I don't see why that should have any bearing.)
I've not been successful in trying to duplicate said failure by copying said files elsewhere & performing the same scan on them.
It would make sense, if the files were not actually identical (hash identical), that a "similar" hash could miss or misidentify something. But where the files are bit-identical, I would think they should not be missed, even when using a "similar" image hash.
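To illustrate why I'd expect that, here is a minimal sketch of an average hash (aHash). It is not AllDup's code: the `ahash` and `similarity` functions, the synthetic 16x16 thumbnail, and the "fraction of matching bits" scoring are all my assumptions. The point is only that identical pixel data must give an identical hash, and so a 100% match.

```python
def ahash(pixels):
    """Average hash: one bit per pixel, set when the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def similarity(h1, h2):
    """Fraction of matching bits (assumed scoring, not AllDup's documented formula)."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)


# Hypothetical 16x16 grayscale thumbnail (values 0-255), standing in for a decoded image.
thumb = [[(x * 16 + y * 3) % 256 for x in range(16)] for y in range(16)]
copy_of_thumb = [row[:] for row in thumb]  # same pixel data, as from a bit-identical file

h1 = ahash(thumb)
h2 = ahash(copy_of_thumb)
print(len(h1), similarity(h1, h2))
```

Since the hash depends only on the decoded pixels, two bit-identical files cannot score differently against each other, whatever the threshold is.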
(I might have run into this before, I'm not sure, but if I did, it didn't make me say, "oh, wait, why did it not also find..." at the time. Now that I have seen it & it did register, I'm watching to see if I can figure out why. And to top it off, the scenario described below does not always occur: in the same search, 3 other sets of files, each with 2 identical & 1 a "[similar] dup", did return all the expected files.)
3 files,
2 of which are IDENTICAL (hash)
dates differ for all three (& size, for the 3rd file)
(files were in 3 separate directories, same drive)
all 3 are Image "dups", aHash 93%
obviously the 2 that are IDENTICAL (hash)
should be found by the Image Mode 93%
- BUT only 1 of those 2 was returned
the 3rd is a "dup" by virtue of the 93%
(& was found)
AllDup returned results of 1 of the IDENTICAL & the 93% (search)
[actually shows as 99% "identical"],
but not the 2nd of the 2 IDENTICAL (hash)
Tried to duplicate the issue by copying said files into another
directory... but with that, AllDup found all 3 to be dups, as
expected (as they are: 2 IDENTICAL & 1 via the 93%)
?
(this was from a large search, 40K files, 18K groups, 2.5 GB)
(attempting to duplicate, was with only a handful of files)
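For anyone wanting to confirm the "IDENTICAL (hash)" part independently of AllDup, a small sketch along these lines should do. The function names are made up for illustration, and the file paths are whatever you pass on the command line; it only groups files whose SHA-256 digests match.

```python
import hashlib
import sys
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_identical(paths):
    """Group paths by content digest; keep only groups with 2 or more files."""
    groups = defaultdict(list)
    for p in paths:
        groups[sha256_of(p)].append(Path(p))
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    # Usage: python check_identical.py file1 file2 file3 ...
    for group in group_identical(sys.argv[1:]):
        print(" == ".join(str(p) for p in group))
```

File dates and names play no part in this check, only the file contents.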
& again
3 files, 2 same name, otherwise diff size & date
- results:
1 of the same named files & the "other" file
but NOT the 2nd of the same named files
& all /are/ "dups" (per the 93%)
(& again, copy the files into a different, MUCH SMALLER fileset, & all are found, as expected)
11/26/2023 12:18:37 PM - AllDup 4.5.54 PE
11/26/2023 12:18:37 PM - Search method: Find similar pictures
11/26/2023 12:18:37 PM - Comparison method: aHash
11/26/2023 12:18:37 PM - Image Formats: BMP, GIF, JPEG, JPG, PNG
11/26/2023 12:18:37 PM - Match: 93%
11/26/2023 12:18:37 PM - Picture area: entire picture
11/26/2023 12:18:37 PM - Comparison size: 16x16 Pixel
11/26/2023 12:18:37 PM - Checksum Strength: 256 Bit
11/26/2023 12:18:37 PM - Option: Use database
11/26/2023 12:18:37 PM - 1.Source folder: T:\C-DOC
11/26/2023 12:18:37 PM - 2.Source folder: T:\PICS
11/26/2023 12:18:37 PM - Option: Compare files from all source folders
11/26/2023 12:18:37 PM - Determine file count of all source folders...
11/26/2023 12:19:00 PM - File count: 290,757
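As a rough reading of those settings: a 16x16 comparison size gives 16 * 16 = 256 bits, which lines up with the 256 Bit checksum strength. If the Match percentage is the share of matching bits and the threshold is inclusive (both are my assumptions, I don't know how AllDup actually scores it), the number of bits allowed to differ can be worked out like this; `max_differing_bits` is just an illustrative helper.

```python
def max_differing_bits(hash_bits, match_percent):
    """Largest number of differing bits that still meets the match threshold.

    Assumes similarity = matching bits / total bits and an inclusive threshold.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return hash_bits * (100 - match_percent) // 100


print(max_differing_bits(16 * 16, 93))  # prints 17
```

Under those assumptions two hashes may differ in up to 17 of 256 bits and still count as a 93% match, while bit-identical files differ in 0 bits.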
& yet
here it did find all dups...
T:\PICS\pic\070101\rlie06.jpg
T:\PICS\pic\120101\Russia\Ngy_11.jpg
T:\PICS\pic\110100\ngy_11.jpg
2, same name, but different case
all different subdirectories (though same /tree/)
& yet
here too, all are found...
T:\PICS\pic\nothere3.jpg
T:\PICS\TRI\0MYDOC\MYDOC2\di\nothere3.JPG
T:\PICS\TRI\0MYDOC\MYDOC2\OLDDOC\ttt5.jpg
2, same name, but extension CASE is different