Hello all, I've been using AllDup to sort out a number of duplicated folders across drives and I'm very pleased with it. However, I have some questions:
1. Sometimes in the results I notice seemingly identical files put in different groups. For example, yesterday I had four apparently identical images which I'd expect to be in a single group of four, yet they were listed in two groups of two. Why might this be?
2. I have noticed while looking through these various drives that images that have been copied across drives are sometimes shown by Windows as having changed size slightly - they might be a few tens of kilobytes different on a file of, say, 5 megabytes. There seems no obvious reason for this. Could this be connected with (1) above (although file sizes are reported the same in AllDup)?
3. When comparing files byte-by-byte, can one be sure that the comparison is exact; in other words, if AllDup says it's the same, is it definitely 100% the same? I ask this as I notice Joerg Rosenthal says of his Anti-Twin that byte-by-byte comparison might identify image files that are in fact slightly different, and offers pixel by pixel comparison to counter this. I don't know if this is because of an inherent issue or whether it's just something to do with his program.
...and I've forgotten the fourth question!
Thanks
Some questions on duplicates
-
- Site Admin
- Posts: 4049
- Joined: 04 Oct 2004, 18:38
- Location: Thailand
- Contact:
Re: Some questions on duplicates
1:
How did you search for duplicate files?
By content, byte by byte?
2:
Sorry, I don't know.
3:
A byte-by-byte comparison runs from the first byte to the last byte of each file.
If AllDup reports files as identical, they are definitely 100% the same.
4:
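To illustrate point 3, here is a minimal Python sketch of what a byte-by-byte comparison means in general. It is not AllDup's actual code; the function name `files_identical` and the chunk size are assumptions made for the example.

```python
import os

def files_identical(path_a, path_b, chunk_size=1 << 16):
    """Return True only if both files contain exactly the same bytes."""
    # Files of different sizes can never be byte-identical, so check that first.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first differing chunk ends the comparison
            if not chunk_a:
                return True   # both files ended without any difference
```

A single differing byte anywhere in the file, including inside embedded metadata, makes the function return False.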
Re: Some questions on duplicates
Thanks, and thanks for the very prompt reply.
1. I'm afraid I can't remember! It is likely to have been byte-by-byte but I can't be certain. I noticed it at the time but it did not occur to me to ask about it until later. If I see the same phenomenon again I will post full details.
2. An example is an image file, shown in AllDup results as two groups each of two files. The images look identical, the only differences being file size (different by 3 KB) and creation date.
3. Thanks. Always good to be certain.
4. Still haven't remembered!
5. I should also have said that I assume if one uses 'byte-by-byte' in conjunction with other characteristics, such as file name or extension, the file comparison is still exact and the program uses the other characteristic for an initial sort?
-
- Site Admin
- Posts: 4049
- Joined: 04 Oct 2004, 18:38
- Location: Thailand
- Contact:
Re: Some questions on duplicates
2. If the files have different file sizes and they are in the same group, I'm sure you did not search for duplicates by content.
5. If you select more than one search criterion, all of them must match for files to be treated as duplicates.
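The "all criteria must match" rule in point 5 can be pictured as building one combined key per file. The sketch below is an illustration only, not AllDup's implementation: `group_duplicates` and its parameters are hypothetical, and a SHA-256 hash stands in for the byte-by-byte content check.

```python
import hashlib
import os
from collections import defaultdict

def group_duplicates(paths, use_name=True, use_content=True):
    """Group files whose selected criteria ALL match."""
    groups = defaultdict(list)
    for path in paths:
        key = []
        if use_name:
            key.append(os.path.basename(path))
        if use_content:
            # Assumption: a content hash stands in for a byte-by-byte compare.
            with open(path, "rb") as f:
                key.append(hashlib.sha256(f.read()).hexdigest())
        groups[tuple(key)].append(path)
    # Only keys shared by at least two files form a duplicate group.
    return [group for group in groups.values() if len(group) > 1]
```

With both criteria enabled, two files with the same content but different names do not end up in the same group, because only part of their key matches.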
Re: Some questions on duplicates
On point 1, I've come across an example of this. I have 5 .wmv files with the same title. Looking at the file properties, each one has the same file size, same file size on disk, same detailed properties (length, frame size, frame rate, author etc. etc.) They appear the same when played. The only apparent differences are creation and modified dates, and the media created date in the detailed file properties (some have a date set, some don't).
It's very unlikely that these are different files, and very likely - given that they've come from various backups - that they're different copies of the same file. Yet a byte-by-byte comparison has sorted them into two different groups, one of three files and one of two, so it appears the program is seeing some difference between them which is not obvious. The difference isn't the creation and modified dates, as they're different inside the two groups. The only difference across the groups appears to be whether a media creation date is set in detail properties for the file or not.
Any thoughts?
-
- Site Admin
- Posts: 4049
- Joined: 04 Oct 2004, 18:38
- Location: Thailand
- Contact:
Re: Some questions on duplicates
It is the media creation date, which is stored inside the file.
You get the same with Office files.
Office changes a time stamp inside the file if you just open and close it.
So they all look different in a byte-by-byte comparison.
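If you want to see where two apparently identical files actually differ, a rough Python sketch like the following lists the first few differing byte positions. The name `differing_offsets` is made up for this example, and it reads both files fully into memory, so it will be slow on large videos.

```python
def differing_offsets(path_a, path_b, limit=20):
    """Return up to `limit` byte offsets at which the two files differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        data_a = fa.read()
        data_b = fb.read()
    offsets = []
    # Only the overlapping part is compared; a size difference is a
    # separate signal that the files are not identical.
    for i in range(min(len(data_a), len(data_b))):
        if data_a[i] != data_b[i]:
            offsets.append(i)
            if len(offsets) >= limit:
                break
    return offsets
```

A handful of differing offsets clustered together is consistent with an embedded field such as a date stamp being the only difference, though the offsets alone do not prove which field it is.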
Re: Some questions on duplicates
Oh, OK. I'd naively assumed it wouldn't make a difference, same as date created & modified don't. Reassuring that AllDup picks up on (what to me seem) such small differences, though.
Found another set just now - 10 .MOV files, split by AllDup into two groups of 5, but this time the file sizes don't tally - AllDup shows the first batch as 144,921.14 KB each, the second group as 144,920.93 KB each; the file properties show the first group as 148,399,247 bytes, the second group as 148,399,036 bytes. Again the only visible difference is the media created date in the detailed properties, so I assume the same issue again despite the slight file size discrepancies?
Thanks very much for your prompt answers on these questions, it's much appreciated - I hope you don't mind, but I like to try to learn / understand what's going on.
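As a quick check of the figures above, the KB values are consistent with the byte counts divided by 1024. This sketch assumes the second byte count is 148,399,036, which is the figure that matches 144,920.93 KB; the helper name `bytes_to_kb` is just for illustration.

```python
def bytes_to_kb(size_in_bytes):
    """Convert bytes to KB (1 KB = 1024 bytes), rounded to two decimals."""
    return round(size_in_bytes / 1024, 2)

first_group = 148_399_247
second_group = 148_399_036  # assumed value, see the note above

print(bytes_to_kb(first_group))    # 144921.14
print(bytes_to_kb(second_group))   # 144920.93
print(first_group - second_group)  # 211
```

So the two groups differ by 211 bytes, which is enough on its own for a byte-by-byte comparison to keep them apart.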
-
- Site Admin
- Posts: 4049
- Joined: 04 Oct 2004, 18:38
- Location: Thailand
- Contact:
Re: Some questions on duplicates
These Windows file system properties are stored outside the file content:
- File Attributes
- CreationTime
- LastAccessTime
- LastWriteTime
- File Size
- File Name
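To show that such properties live outside the content, here is a small Python sketch that changes a file's timestamps and name and leaves the bytes untouched. The helper names `content_hash` and `change_metadata` and the timestamp value are assumptions for the example.

```python
import hashlib
import os

def content_hash(path):
    """Hash only the bytes stored inside the file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def change_metadata(path, new_name, timestamp=1_000_000_000):
    """Change access/modification time and file name, not the content."""
    os.utime(path, (timestamp, timestamp))
    new_path = os.path.join(os.path.dirname(path), new_name)
    os.rename(path, new_path)
    return new_path
```

Calling `content_hash` before and after `change_metadata` gives the same value, which is why copies with different dates or names can still be reported as duplicates by content.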
Re: Some questions on duplicates
Oh right, OK. Thanks.