advanced word matching

English support for the software AllDup
jdupes
Posts: 2
Joined: 17 Feb 2022, 14:11

advanced word matching

Post by jdupes »

I had been searching for a program of this quality for a while, and I'm happy to have found it! I have a more advanced case I want to ask about.
In WordMatch, I can specify text patterns to ignore, but how can I specify entire words to ignore?

For example, if 0, 1, or 2 appear as separate words, I want to ignore them, but not when they appear in the middle of other words: A1B2 shouldn't become "AB".
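
To illustrate the behaviour I mean, here is a rough Python sketch. It is not anything AllDup does today; the function name `strip_ignored_words` is made up for this example, and it assumes that a "word" is a run of letters, digits, or underscores.

```python
import re

def strip_ignored_words(name, ignored):
    """Remove ignored tokens only where they stand alone as whole words."""
    if not ignored:
        return name
    alternatives = "|".join(re.escape(word) for word in ignored)
    # The lookarounds require that no word character (letter, digit,
    # underscore) touches the match, so the "1" inside "A1B2" is left alone.
    pattern = r"(?<!\w)(?:" + alternatives + r")(?!\w)"
    stripped = re.sub(pattern, " ", name)
    # Collapse the whitespace left behind by the removed words.
    return " ".join(stripped.split())
```

With the ignore list `["0", "1", "2"]`, the name `A1B2 0 report 1 final 2` would be reduced to `A1B2 report final`. Note that under this assumption an underscore counts as part of a word, so the `1` in `file_1` would not be removed.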

It would be nice to have a column that displays the matched words in the results window.

Also, is there a way to compare only the first N words, instead of the first N characters?
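
As a rough Python sketch of what "first N words" could mean (again only an illustration, with a made-up function name, and assuming words are runs of letters and digits separated by anything else):

```python
import re

def first_n_words(name, n):
    """Return the first n words of a name, joined by single spaces."""
    # [^\W_]+ matches runs of letters and digits; underscores, dashes,
    # dots and spaces all act as separators.
    words = re.findall(r"[^\W_]+", name)
    return " ".join(words[:n])
```

Two names would then be compared by the value this returns, so `Holiday_Photos-2021 final` and `Holiday Photos (edited)` would both compare as `Holiday Photos` for N = 2.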

BTW, there is a small typo in the results window, Columns menu - gridlines. The last option is a duplicate of the one above it, instead of saying "Both horizontal and vertical lines".
JoBeCo
Posts: 14
Joined: 05 Feb 2022, 20:58

Re: advanced word matching

Post by JoBeCo »

jdupes wrote: 17 Feb 2022, 14:18 For example, if 0,1,2 appear as words I want to ignore them, but not when they appear in the middle of other words like A1B2 shouldn't become "AB".
I would say that this is exactly how these fields are meant to work: they delete special characters or (with the second input) text fragments.

Without overhauling this dialog and the code behind it, there will be no solution for your task.

I can also think of circumstances where I would want to widen the set of names allowed to form a group, with or without further inspection of the content.
Some improvements that come to mind:
  • allow patterns/wildcards for the wordlist
    as seen in another thread, one might want to specify (*) for deletion, i.e. any text in brackets ...
    this still would help you with the wordmatch
  • extending the patterns to specify word boundaries
  • make use of regular expressions
  • specify (several) replacements for the ignored parts
    with this, every filename would be transformed by all deletions and/or replacements, and the result would form the new group name
    so you could group A1B and A2B but omit AB
  • with regular expressions as replacements, one could specify a pattern to keep
    • only the first part up to a dash '-'
    • the first n words of the name
    • the part in curly braces
      ...
The variants are listed here in order of increasing complexity, and each one offers more possibilities than the one before.
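
To make the replacement idea a bit more concrete, here is a minimal Python sketch. It only illustrates the concept and says nothing about how AllDup works internally; the rule list, the `#` placeholder, and the function names are assumptions of mine.

```python
import re
from collections import defaultdict

# Hypothetical rule list: (regular expression, replacement), applied in order.
RULES = [
    (r"\([^)]*\)", ""),  # delete any text in round brackets
    (r"\d", "#"),        # replace every digit with a placeholder
]

def group_key(name, rules=RULES):
    """Apply all deletions/replacements to a name to form its group name."""
    for pattern, replacement in rules:
        name = re.sub(pattern, replacement, name)
    # Normalise the whitespace left over after deletions.
    return " ".join(name.split())

def group_names(names, rules=RULES):
    """Collect names that end up with the same transformed group name."""
    groups = defaultdict(list)
    for name in names:
        groups[group_key(name, rules)].append(name)
    return dict(groups)
```

With these rules, `A1B (copy)` and `A2B` both become `A#B` and land in one group, while `AB` stays `AB` and is kept apart.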

How would this input be implemented? Maybe with a special input field, maybe a multiline one ...
I could also think of a field where you could enter a set of sed commands. Depending on the libraries available to implement this, there could also be awk or Perl code, or a call to an external tool ...

Cheers
JoBeCo
Administrator
Site Admin
Posts: 4046
Joined: 04 Oct 2004, 18:38
Location: Thailand

Re: advanced word matching

Post by Administrator »

jdupes wrote: In WordMatch, I can specify text patterns to ignore, but how can I specify entire words to ignore?
That's not possible. I will note this on the ToDo list.
jdupes wrote: Also is there ability to compare only the first N words, instead of N characters?
No, that's not possible either. I will note this on the ToDo list.
jdupes wrote: BTW, there is a small typo in the results window, Columns menu - gridlines. The last option is a duplicate of the one above it, instead of saying "Both horizontal and vertical lines".
This will be fixed with the next update.