File and Folder Filters Sometimes Do Not Work

English support for the software AllDup
Synetech
Posts: 48
Joined: 19 Jan 2011, 09:32

File and Folder Filters Sometimes Do Not Work

Post by Synetech »

There are some files and folders that I know are duplicates and want to exclude from the search. I have already added them to the filter(s), but they keep showing up in the search results anyway, and I have to manually remove them every time.

I think I figured out why.

Looking at the search results, it looks like a file or folder path that contains a special character is not correctly excluded when you add it to the filter. I’m not certain which characters are problematic, but the list looks like it includes at least [, ], {, }, #.

I suspect that when AllDup compares the path of a file to be checked against the list of file/folder paths to be excluded, it uses a regex (especially since the filter list mentions being able to use wildcards), but it does not first escape the paths. Therefore, when it compares c:\[foobar], it is seeing a regex string which would match c:\f, c:\o, c:\b, c:\a, and c:\r instead of the actual path that contains brackets.

Fortunately, this bug should be pretty easy to fix: just make sure to escape the paths first (e.g., c:\[foobar] ⇨ c:\\\[foobar\]).
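
To illustrate the escaping idea, here is a minimal sketch using Python's re module. It is only an analogy: the thread does not show AllDup's matching code, and the variable names are made up for the example. Note that in Python's regex syntax the "\[" in the unescaped path is itself read as an escaped bracket, so the unescaped pattern fails to match the real path for a slightly different reason than a bare character class would.

```python
import re

path = r"c:\[foobar]"

# Used as a pattern without escaping, the path does not match itself:
# "\[" is read as a literal "[", so the pattern expects "c:[foobar]".
unescaped_hit = re.fullmatch(path, path)

# Without the backslash, "[foobar]" is a character class that matches
# a single character out of f, o, b, a, r.
class_hit = re.fullmatch("[foobar]", "f")

# Escaping the path first makes every character literal.
escaped_hit = re.fullmatch(re.escape(path), path)

print(unescaped_hit is None)    # True
print(class_hit is not None)    # True
print(escaped_hit is not None)  # True
```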
Last edited by Synetech on 11 Aug 2013, 20:19, edited 3 times in total.
therube
Posts: 322
Joined: 07 Nov 2012, 00:28

Special Characters In Exclude List

Post by therube »

(Could you add a 'Subject:' to your post.)
Synetech
Posts: 48
Joined: 19 Jan 2011, 09:32

Post by Synetech »

It had one when I first made the post, but it seems to have been stripped when I tried to use the Preview function (which didn’t work). I tried adding one again (you can see the post is edited), but it won’t take it. An admin has to do it because this version of phpBB seems to have a bug (it shouldn’t have taken the post without a title in the first place). I think the problems have something to do with the security code.

Never mind, I got it to take the title. Maybe it doesn’t like quotes. :?
Administrator
Site Admin
Posts: 4047
Joined: 04 Oct 2004, 18:38
Location: Thailand

Re: File and Folder Filters Sometimes Do Not Work

Post by Administrator »

Sorry, I can't reproduce this problem.

My settings:

Source folder: D:\AllDup\TEST2\
Folder Filter: "D:\AllDup\TEST2\[foobar]" or "[foobar]"

AllDup doesn't search inside the subfolder "[foobar]" if the folder filter is activated.
Synetech
Posts: 48
Joined: 19 Jan 2011, 09:32

Post by Synetech »

:? Then maybe it’s something else, or something more complex like a combination of factors.

I’ll try to do some more experiments to figure it out.
Administrator
Site Admin
Posts: 4047
Joined: 04 Oct 2004, 18:38
Location: Thailand

Post by Administrator »

Synetech wrote:It had one when I first made the post, but it seems to have been stripped when I tried to use the Preview function.....
I can't reproduce the problem with the title:

I entered a title and used the preview function.
Title OK.
I added quotes to the title in the preview and sent the message.
Title OK.

Can you help me to reproduce this problem?
Synetech
Posts: 48
Joined: 19 Jan 2011, 09:32

Re: File and Folder Filters Sometimes Do Not Work

Post by Synetech »

I did some tests and got a couple of important results:
  • If the file or folder contains a “#” in the name, then adding it to the filter will not work. The function that processes the filters has a problem with the “#” character.
  • If you use a wildcard (e.g., “*”) in a filter, it can crash AllDup if there are certain special (non-alphanumeric) characters around it.
You can reproduce it as follows:
  • Create a folder (e.g., C:\AllDupTest)
  • Create a couple of sub-directories (e.g., C:\AllDupTest\B, C:\AllDupTest\C)
  • Copy a file to both of the sub-directories
  • Run AllDup on the parent (C:\AllDupTest)
  • When it shows the results, add one folder (e.g., C) to the folder filter
  • Run the scan again; it should find no duplicates. Good
  • Rename the folder to have “#” in it (e.g., C#)
  • Run the scan again; it should find the duplicates again. Good
  • Add the renamed folder (C#) to the folder filter
  • Run the scan again; it should not find the duplicates but it does. Bad
The same thing happens with the file filter: add a file filter like foobar#.txt and it will not work.
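
For anyone who wants to set up the test folders from the steps above quickly, here is a small sketch in Python. The function name, the file name, and the use of a temporary directory instead of C:\AllDupTest are all assumptions made for the example; running the scan and editing the folder filter still has to be done by hand in AllDup.

```python
import tempfile
from pathlib import Path

def build_test_tree(root, folder_names=("B", "C#")):
    """Create the sub-folders under root and put an identical file in each."""
    copies = []
    for name in folder_names:
        sub = root / name
        sub.mkdir(parents=True, exist_ok=True)
        copy = sub / "duplicate.txt"  # hypothetical file name
        copy.write_bytes(b"same content in every copy\n")
        copies.append(copy)
    return copies

# A temporary folder stands in for C:\AllDupTest.
test_root = Path(tempfile.mkdtemp(prefix="AllDupTest_"))
for created in build_test_tree(test_root):
    print(created)
```

Point AllDup at the printed parent folder, then add "C#" to the folder filter and scan again to check whether the duplicates still show up.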

Also, you can make AllDup crash. To make sure that only “#” was a problem, I renamed C to “C`~!@$%^&()_+}{`-=][';.,”. I tried putting a “*” in the filter to test if wildcards work, but AllDup crashed whenever I tried to run it until I removed or disabled that filter.
Administrator
Site Admin
Posts: 4047
Joined: 04 Oct 2004, 18:38
Location: Thailand

Re: File and Folder Filters Sometimes Do Not Work

Post by Administrator »

Activate the filter option "Ignore wildcards in filter text (treat like normal text)" if you use the wildcard characters (*, # and ?) in the filter text. See the help file.

Renaming the subfolder "C" to "C`~!@$%^&()_+}{`-=][';.," and changing the folder filter from "C" to "C*" or "C#*" doesn't crash AllDup here, and it doesn't matter whether the filter option "Ignore wildcards in filter text (treat like normal text)" is activated or not: no crash.
Synetech
Posts: 48
Joined: 19 Jan 2011, 09:32

Re: File and Folder Filters Sometimes Do Not Work

Post by Synetech »

Administrator wrote: Activate the filter option "Ignore wildcards in filter text (treat like normal text)" if you use the wildcard characters (*, # and ?) in the filter text. See the help file.
Ah, I see; AllDup treats “#” (and “[” and “]”) as wildcards. I did not realize that; I thought it only used the standard DOS/Windows wildcards “?” and “*”. I just checked the manual and I see the explanation. Okay, that resolves the problem with some files and folders slipping through the filter.

The problem now is this: using “*” and “?” for wildcards is okay because they are invalid in filenames, so they won’t/shouldn’t exist. However, “#”, “[”, and “]” are valid filename characters, so they can easily exist in file and folder names, for example C:\Data\Programming\C#\Assemblies (I wasn’t using “#” as a wildcard, at least not on purpose). Using those characters as wildcards means that the filter won’t work for anything that happens to have them (which explains why I have to manually remove all of the false positives every time). You can disable wildcards, but then you don’t get that functionality at all, so you would have to choose between no wildcards at all or renaming everything (which may not even be possible).

It would be better to use illegal characters for the wildcards; that way you can use wildcards and still have files/folders with those (valid) characters in their names. Since “/”, “\”, and “:” come up naturally in paths (and so would appear in the folder-filter list), the best solution would probably be to use “<” and “>” instead of “[” and “]” for the character lists/ranges, and “|” instead of “#”; those characters should never appear in a valid fully qualified file or folder path. That should be easy to change in the program and also easy for users to change in the filter lists.
Administrator
Site Admin
Posts: 4047
Joined: 04 Oct 2004, 18:38
Location: Thailand

Re: File and Folder Filters Sometimes Do Not Work

Post by Administrator »

From the AllDup manual "Wildcards for Filters":

"The following characters have to be inserted in square brackets in order to be used for a compare operation: left square bracket ([), question mark (?), pound sign (#) and asterisk (*)...."

PATH: C:\Data\Programming\C#\Assemblies
FILTER: C:\Data\Programming\C[#]\Assemblies
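
To show why the bracketed form matches while the plain path does not, here is a rough Python sketch of this kind of wildcard matching. It assumes semantics like Visual Basic's Like operator ("#" matches one digit, "?" one character, "*" any run of characters, "[...]" one character from a list); the thread does not say exactly what AllDup does with "#", and the helper names like_to_regex and like_match are invented for the example. Whether AllDup compares the whole path or only part of it is also an assumption here.

```python
import re

def like_to_regex(pattern):
    """Translate a Like-style wildcard pattern into a regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        elif ch == "#":
            out.append("[0-9]")  # assumed: "#" stands for a single digit
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                out.append(re.escape(ch))  # unterminated list: literal "["
            else:
                body = pattern[i + 1:end]
                negate = body.startswith("!")
                if negate:
                    body = body[1:]
                if body:
                    chars = "".join(c if c == "-" else re.escape(c) for c in body)
                    out.append("[" + ("^" if negate else "") + chars + "]")
                # an empty list "[]" matches nothing-length text, so emit nothing
                i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)

def like_match(text, pattern):
    """Return True if the whole text matches the wildcard pattern."""
    return re.fullmatch(like_to_regex(pattern), text, re.DOTALL) is not None

path = r"C:\Data\Programming\C#\Assemblies"
print(like_match(path, path))                                    # False
print(like_match(path, r"C:\Data\Programming\C[#]\Assemblies"))  # True
print(like_match(r"C:\Data\Programming\C4\Assemblies", path))    # True
```

Under these assumed rules, the unescaped filter would match a folder named "C4" but not the real "C#" folder, which is consistent with the behaviour reported earlier in the thread.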
Synetech
Posts: 48
Joined: 19 Jan 2011, 09:32

Re: File and Folder Filters Sometimes Do Not Work

Post by Synetech »

Administrator wrote:PATH: C:\Data\Programming\C#\Assemblies
FILTER: C:\Data\Programming\C[#]\Assemblies
I’ll try that the next time I have my main drive available. Thanks.

Of course since the Add Path to File/Folder Filter commands don’t do it for you automatically, that work-around would have to be done manually (not using valid filename characters as special characters would be easier and cleaner and would avoid having to escape them at all).
Synetech
Posts: 48
Joined: 19 Jan 2011, 09:32

Re: File and Folder Filters Sometimes Do Not Work

Post by Synetech »

Okay, I edited the config file to replace all “#” with “[#]”. It didn’t work. I realized that I have to escape the “[” and “]” in my folder names as well which was a huge pain because they are part of the escape sequence itself, so I had to find a character that was not used and use that as a temporary replacement, then change one, then change the temporary. :| It still didn’t work. I realized that I had escaped the escaped “#” (they were now “[[]#[]]”), so I had to fix those.
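
The double-escaping trap described above can be reproduced in a few lines. This is just a Python sketch with made-up function names, not anything from AllDup: escaping “#” first and the brackets afterwards re-escapes the brackets that the first step added, while replacing all three characters in a single pass avoids both that and the temporary placeholder character.

```python
import re

def escape_in_two_steps(path):
    """Escape "#" first, then the brackets (this re-escapes the new brackets)."""
    step_one = path.replace("#", "[#]")
    return re.sub(r"[\[\]]", lambda m: "[" + m.group(0) + "]", step_one)

def escape_in_one_pass(path):
    """Escape "#", "[" and "]" together, so each original character is wrapped once."""
    return re.sub(r"[#\[\]]", lambda m: "[" + m.group(0) + "]", path)

print(escape_in_two_steps("C#"))    # C[[]#[]]
print(escape_in_one_pass("C#"))     # C[#]
print(escape_in_one_pass("[SSI]"))  # [[]SSI[]]
```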

Unfortunately, those three characters were escaped in the folder paths as well which made a complete mess of things. I had to replace them all back and escape them only in the two filter lists.

I also tried using the Add File/Folder to Filter dialog, but had to make sure to carefully check for and escape any “#”, “[”, and “]”—in Notepad. :?

So paths must be escaped in the filter lists but not in the folder list, and you must manually escape them. If you change the wildcards to illegal characters, then you won’t have this sort of inconsistency or extra work. :wink:

At least it finally works now and I don’t have to manually remove several hundred false positives every time I run a scan. Also, thankfully AllDup now filters them out before running the scan rather than just removing them from the results, which means the scan takes less time. :)
Administrator
Site Admin
Posts: 4047
Joined: 04 Oct 2004, 18:38
Location: Thailand

Re: File and Folder Filters Sometimes Do Not Work

Post by Administrator »

Synetech wrote:If you change the wildcards to illegal characters, then you won’t have this sort of inconsistency or extra work.
Sorry, I can't change the wildcards. They are a fixed part of regex (http://en.wikipedia.org/wiki/Regular_expression)
Synetech
Posts: 48
Joined: 19 Jan 2011, 09:32

Re: File and Folder Filters Sometimes Do Not Work

Post by Synetech »

Ah okay, you’re using a regex library to handle it; that explains the brackets (though “#” isn’t normally part of regex, and other regex characters like “{” / “}” seem to work just fine).

Well then, I suppose one option could be to automatically escape and unescape them so that the user doesn’t have to remember to do it by hand (it can be surprisingly difficult to make sure you escaped it correctly, e.g., C:\Users\Foobar\Desktop\#Dump\#Games\{RPG}\[SSI] ⇨ C:\Users\Foobar\Desktop\[#]Dump[#]\[#]Games\{RPG}\[[]SSI[]]).
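
As a rough idea of what automatic escaping and unescaping could look like, here is a Python sketch. The function names and the sample path are hypothetical, and it only handles the three characters discussed in this thread (“#”, “[”, “]”); it is not how AllDup does it.

```python
import re

def escape_wildcards(path):
    """Wrap each "#", "[" and "]" in square brackets so it is taken literally."""
    return re.sub(r"[#\[\]]", lambda m: "[" + m.group(0) + "]", path)

def unescape_wildcards(filter_text):
    """Undo escape_wildcards: turn "[#]", "[[]" and "[]]" back into one character."""
    return re.sub(r"\[([#\[\]])\]", r"\1", filter_text)

original = r"D:\Music\[Live] #1"  # hypothetical folder path
escaped = escape_wildcards(original)
print(escaped)
print(unescape_wildcards(escaped) == original)  # True
```

Because the escape step never leaves a bare “[” behind, the unescape step can read the filter text from left to right and always finds the three-character groups at the right places, so the round trip gives back the original path.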
Administrator
Site Admin
Posts: 4047
Joined: 04 Oct 2004, 18:38
Location: Thailand

Re: File and Folder Filters Sometimes Do Not Work

Post by Administrator »

I have added a context menu to the filter lists for files and folders:

-Escape wildcards
-Unescape wildcards

Do you want to test the new functions?