Hi,
I've been sorting photos from old drives and computers for a few years now, and I've come up with a system that seems to work for me. I keep a 'master sorted photos' folder, and I back it up often. The problem, if you can call it that, is that between all the photos and films taken over the years, there is currently almost 2TB of data and almost 300,000 files to keep track of.
I am using Windows. I have a batch file that creates a text file containing the full path and filename of each file plus a hash of that file (SHA-1, I think). I have another batch file that then goes file by file through a new 'to sort' folder and looks for identical files, regardless of whether the filenames match. That usually weeds out a lot. If I delete lesser versions of photos, I first hash them with the same method and add those hashes to the text file. So once I've seen a file and made a decision about it, its hash tells my batch file either that I can ignore it (I've seen it before), that I already have it, or that I know nothing about it yet.
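For anyone who wants to do the same outside of batch files, here is a minimal Python sketch of the hash-index idea described above. It assumes SHA-1 and a hypothetical index format of one "hash<TAB>path" line per file; the real batch output format isn't shown here, so the parser would need adjusting to match it. Only file content decides whether a file is "seen"; filenames are ignored.

```python
import hashlib
import os

def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in 1 MB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def load_known_hashes(index_path):
    """Read the (hypothetical) index file: one 'hash<TAB>path' entry per line."""
    known = set()
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            for line in f:
                digest, _, _ = line.rstrip("\n").partition("\t")
                if digest:
                    known.add(digest)
    return known

def find_unseen(to_sort_dir, known):
    """Yield files under to_sort_dir whose content hash is not in the index."""
    for root, _dirs, files in os.walk(to_sort_dir):
        for name in files:
            path = os.path.join(root, name)
            if sha1_of_file(path) not in known:
                yield path

def record(index_path, path):
    """Append a file's hash to the index, e.g. just before deleting a lesser copy."""
    with open(index_path, "a", encoding="utf-8") as f:
        f.write(f"{sha1_of_file(path)}\t{path}\n")
```

Because deleted files stay in the index, a later copy of a rejected photo is skipped automatically, which is the behaviour described above.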
With what's left over, I try to rename the files to their date/time from the EXIF data, using namexif or exiftool. Then a batch file sorts the results into year/month folders, and from there I can take a closer look. Usually I can just move them manually into their proper 'master' folder, or I discover from the new filenames that they are duplicates that are somehow different enough that their hashes don't match.
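The rename-then-sort step can be sketched as a single function. This assumes the EXIF DateTimeOriginal value has already been read (EXIF stores it as "YYYY:MM:DD HH:MM:SS"); the YYYYMMDD_HHMMSS naming and the YYYY/MM folder layout are only a hypothetical convention, not necessarily what namexif or exiftool produce. A name collision is reported, since two files with the same timestamp are exactly the "duplicates by filename" mentioned above.

```python
import os
from datetime import datetime

def plan_move(exif_datetime, original_name, master_root, used_names):
    """Turn an EXIF DateTimeOriginal value ('YYYY:MM:DD HH:MM:SS') into a
    target path master_root/YYYY/MM/YYYYMMDD_HHMMSS.ext.

    used_names is a set of filenames already assigned. If the timestamp name
    is taken, a _1, _2, ... suffix is added and collided is returned as True.
    """
    shot = datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")
    ext = os.path.splitext(original_name)[1].lower()
    base = shot.strftime("%Y%m%d_%H%M%S")
    candidate = base + ext
    n = 1
    while candidate in used_names:
        candidate = f"{base}_{n}{ext}"
        n += 1
    used_names.add(candidate)
    folder = os.path.join(master_root, shot.strftime("%Y"), shot.strftime("%m"))
    return os.path.join(folder, candidate), n > 1
```

A collided result is a good candidate for a side-by-side look rather than an automatic move.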
The other thing happening now is that the number of sources keeps growing. Our family gets pictures of ourselves sent to us, videos from other parents at the dance performance, etc. When these are sent via an app, they lose all EXIF data; if you are lucky, the filename created on the device contains some sort of date/time reference, so at least you know when you received it. Also, I'm an Android user, but my daughter is iPhone all the way. I still do not understand the file system on that thing, or on her MacBook, or her tablet. But she creates stuff we want to save and back up all the time.
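For files that arrive without EXIF data, a fallback is to pull the date out of the filename. The sketch below assumes names that embed an 8-digit YYYYMMDD date, optionally followed by a 6-digit HHMMSS time, as in names like "IMG-20190512-WA0003.jpg" or "IMG_20190512_140307.jpg". These patterns are examples, not a complete list of what apps produce, and the date found may be the receive date rather than the capture date.

```python
import re
from datetime import datetime

# Assumed pattern: YYYYMMDD (starting 19 or 20), optionally followed by an
# optional '_' or '-' separator and HHMMSS. Not preceded by another digit.
DATE_IN_NAME = re.compile(r"(?<!\d)((?:19|20)\d{6})(?:[_-]?(\d{6}))?")

def date_from_filename(name):
    """Return a datetime parsed from the filename, or None if none is found."""
    for m in DATE_IN_NAME.finditer(name):
        date_part, time_part = m.group(1), m.group(2)
        try:
            day = datetime.strptime(date_part, "%Y%m%d")
        except ValueError:
            continue  # eight digits that are not a real calendar date
        if time_part:
            try:
                return datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
            except ValueError:
                pass  # time digits invalid: fall back to the date alone
        return day
    return None
```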
I was really hoping the 'similar picture search' would help here. I had been using a very, very old, no-longer-supported app for this, which sometimes even worked semi-automatically, sorting out similar photos based on specific properties (file size, last modified date, pixel dimensions, etc.). But it doesn't work reliably (I believe it was written in the Windows 2000/XP days), and I always double-check its sorting afterwards anyway. It isn't specific enough.
What this app did have, which I cannot figure out in AllDup, is that once all 300,000 files have been analysed, you can save that data for re-use! The same goes for the second folder. The analysis literally takes 24-36 hours, so that is a real lifesaver if you want to shut down for the night and continue sorting tomorrow. You can also change the match percentage, say from 100% to 98%, and re-compare instantly. I can't figure out how to do that in AllDup either.
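The "save the analysis and reuse it" idea can be sketched generically: store each file's computed hash together with its size and modification time, and on the next run only recompute files that are new or changed. This is not how AllDup stores its data; the JSON cache file and the compute callback (standing in for whatever similarity hash is used) are assumptions for illustration.

```python
import json
import os

def load_cache(cache_file):
    """Return the saved {path: {size, mtime, hash}} dict, or {} on first run."""
    if not os.path.exists(cache_file):
        return {}
    with open(cache_file, encoding="utf-8") as f:
        return json.load(f)

def hashes_with_cache(paths, cache_file, compute):
    """Return {path: hash}, calling compute(path) only for files that are new
    or whose size or modification time changed since the cache was saved."""
    cache = load_cache(cache_file)
    result = {}
    for path in paths:
        st = os.stat(path)
        entry = cache.get(path)
        if entry and entry["size"] == st.st_size and entry["mtime"] == st.st_mtime:
            result[path] = entry["hash"]  # reuse the earlier analysis
        else:
            h = compute(path)  # the slow part, e.g. a perceptual hash
            cache[path] = {"size": st.st_size, "mtime": st.st_mtime, "hash": h}
            result[path] = h
    # Write to a temporary file first, then swap it in, so an interrupted
    # run does not leave a half-written cache behind.
    tmp = cache_file + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    os.replace(tmp, cache_file)
    return result
```

Entries for deleted files stay in the cache in this sketch; pruning them would be a simple extra pass.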
My first search found only 20 matches between the master folder and an old iPhone import of about 3,000 images, at 100%. So if there were rotations, crops, filters, stickers added, etc., I would lower the percentage and look again. I cannot find a way to do that in AllDup without re-analysing the whole folder, which means checking for a 95% match costs another 24 hours of processing.
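Why a saved analysis makes a new percentage cheap: once every image has a 64-bit perceptual hash, the match percentage is just a threshold on the number of differing bits, so re-comparing at 95% needs no new image analysis. The percent formula below (share of matching bits) is an assumption for illustration; AllDup may define its percentage differently. The pairwise loop is brute force, which is fine for a sketch but slow for 300,000 x 3,000 comparisons.

```python
def similarity(h1, h2, bits=64):
    """Percentage of bits that agree between two perceptual hashes."""
    return 100.0 * (bits - bin(h1 ^ h2).count("1")) / bits

def matches(master, incoming, min_percent):
    """Return (incoming_path, master_path, percent) for every pair at or
    above min_percent. Both arguments map path -> already computed hash."""
    found = []
    for ip, ih in incoming.items():
        for mp, mh in master.items():
            pct = similarity(ih, mh)
            if pct >= min_percent:
                found.append((ip, mp, pct))
    return found
```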
So is it possible to save the analysis data? Even if it can't be updated, I could sort through a lot before having to re-create it, and the re-creation could be scheduled.
If not, but re-comparing with a different percentage is possible, could someone please explain in detail how to do that?
I have not tried AllDup's video comparison yet, but if that is possible, I'd appreciate some hints on that too. (For instance, being able to delete a WhatsApp-compressed version of a video I have the original of, maybe even if I cut off the beginning and end of the video before sending it.)
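One general way to spot a trimmed, re-compressed copy of a video is to hash frames sampled at a fixed rate (say one per second) from both files, then slide the shorter sequence along the longer one and keep the best-matching offset. This is a swapped-in technique, not AllDup's video comparison; frame extraction and hashing are assumed to happen elsewhere, and the offset is only as precise as the sampling rate.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

def best_alignment(original, clip, bits=64):
    """Slide the clip's per-frame hashes along the original's and return
    (offset_in_samples, average_similarity_percent) of the best fit, or None
    if the clip is empty or longer than the original. Assumes both sequences
    were sampled at the same rate."""
    if not clip or len(clip) > len(original):
        return None
    best = None
    for offset in range(len(original) - len(clip) + 1):
        dist = sum(hamming(original[offset + i], h) for i, h in enumerate(clip))
        pct = 100.0 * (1 - dist / (bits * len(clip)))
        if best is None or pct > best[1]:
            best = (offset, pct)
    return best
```

A high best percentage at some offset suggests the clip is a cut-down copy of the original starting at that point.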
All the online help I can find so far is about smaller groups of files, which doesn't really help me much. I'm trying the pHash method, which is slower but more detailed; I'd rather do it once and get it right, and the description of that method seems quite thorough.
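For context on what a pHash-style method computes, here is a minimal sketch of a common DCT-based variant: take a 32x32 grayscale image, compute its 2D DCT, keep the 8x8 lowest-frequency coefficients, and set one bit per coefficient depending on whether it is above their median. Loading and resizing the image to 32x32 grayscale is assumed to be done beforehand, and this is not necessarily the exact algorithm AllDup uses. Two such hashes are compared by counting differing bits.

```python
import math
from statistics import median

N = 32  # input: a 32x32 grayscale image, already resized (assumption)
K = 8   # keep the 8x8 lowest-frequency DCT coefficients -> 64-bit hash

# Unnormalised DCT-II basis: COS[k][n] = cos(pi * (2n + 1) * k / (2N))
COS = [[math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
       for k in range(K)]

def phash(pixels):
    """pixels: N rows of N grey values. Returns a 64-bit integer hash."""
    # 1D DCT along each row, keeping only the first K horizontal frequencies
    rows = [[sum(COS[k][n] * row[n] for n in range(N)) for k in range(K)]
            for row in pixels]
    # 1D DCT down each column of that result, first K vertical frequencies
    low = [[sum(COS[u][y] * rows[y][v] for y in range(N)) for v in range(K)]
           for u in range(K)]
    coeffs = [c for r in low for c in r]
    med = median(coeffs)
    h = 0
    for c in coeffs:
        h = (h << 1) | (1 if c > med else 0)
    return h
```

Because each bit only records "above or below the median", uniform changes such as scaling the brightness leave the hash unchanged, while the low-frequency structure of the picture decides it.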
Thanks in advance!
Roy Flint
Similar Picture Search, Can you save the database somehow?
- Site Admin
Re: Similar Picture Search, Can you save the database somehow?
With the search method 'Similar pictures', you can use AllDup's database feature to store the calculated checksums in a database and reuse them in another search. See the description in the manual.