Fabrice Harari International WinDev Consultant


WinDev Regular expressions: From A to Z

This is a formatted version of several posts previously published on my blog (blog.fabriceharari.com) during my exploration phase. I hope that this will help developers either to discover regular expressions more easily than I did or, if you already know them, to learn their WinDev implementation more completely.


Would you believe it? During all these years I never had to use regular expressions for anything other than retrieving files (i.e. using * and ?)...

Suddenly, I have to incorporate content coming from web pages into a database, and WinDev's "MatchRegularExpression" function seems to be very handy for that... Except that the corresponding help page is... Let's say... Lighter than I would have liked... As usual, the WinDev help file presupposes that you already know a lot about the subject in general (i.e. outside of WinDev)... If that's not the case, refer to a book on the thing, whatever the thing is... My purpose here is therefore to give you that basic background and to gather the information I found in different places, in different help pages, and even just by trying things.

So first, what are regular expressions? Basically, it's a technique that allows you to easily match the content of a string to a predefined format (like the input/display formats used in WinDev fields). The difference is that you can manage much more with a regular expression than a 9999.99 format.

The second obvious question is: where should or could I use regular expressions in WinDev? Two answers:
- MatchRegularExpression is a native WinDev function that lets you verify whether a string matches a format, and also easily extract the parts of the string that match this or that
- Field formats: you can use the ..Inputmask property to change a field's input/display formatting. And at this level, you can use either the normal WinDev formats or a regular expression.

For example, you can 'easily' create a mask that will allow you to enter between 1 and 4 uppercase letters, then 1 number, then 1 number or the letter X, then 4 letters... The same expression will allow you to verify that a pasted or imported string has the correct format for a field.
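
To make that mask concrete, here is a minimal sketch. It uses Python's `re` module as a stand-in for WinDev's engine, which is an assumption: only the common subset of the syntax is used, and WinDev may behave differently in the details. The text does not say whether the last four letters are uppercase; the sketch assumes they are. The name `is_valid` is hypothetical.

```python
import re

# 1 to 4 uppercase letters, 1 digit, 1 digit or the letter X, then 4 letters.
# Assumption: the final four letters are uppercase; the text does not say.
PATTERN = re.compile(r"[A-Z]{1,4}[0-9][0-9X][A-Z]{4}")

def is_valid(value: str) -> bool:
    """Return True when the whole string matches the format."""
    return PATTERN.fullmatch(value) is not None

print(is_valid("AB1XWXYZ"))     # True
print(is_valid("ABCDE12WXYZ"))  # False: five leading letters
```

The whole string is checked (`fullmatch`), which is how a format check on a field value would normally be expected to work.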

Here's an extract of WinDev help page about regular expression syntax that I translated for you:


A B C - / _        Letters and symbols to verify
[A-Z] or [0-9]     Interval of letters, numbers or symbols to verify
*                  0 or several symbols to verify
+                  1 or several symbols to verify

Now if you read a little further down the page, you will find out that when you want to extract parts of the string into several variables, you also have:

( )                Limit of one part of the format you want to extract
{ }                Number of authorized repetitions of the preceding expression

And that's the first main error in the help page: these two elements are available whether you extract parts of the string or not (i.e. if you check a string with MatchRegularExpression or in a field input format...)

So let's see now what I tried and found to be working, whether it was written in the help file or not. I'm also going to list things that are not working or that you should be careful about:

- MatchRegularExpression will generate an exception for some expression strings! For example: ([A-z0-9]+)(
If you are using this function on an expression built by program or entered by the user (i.e. not hardcoded and tested by you), remember to use it inside a When Exception clause.
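
The same defensive idea can be sketched in Python, used here only as an illustration (it is not WinDev code, and `safe_match` is a hypothetical helper name). Python's `re` module also rejects the expression above because of its unclosed parenthesis, and the sketch turns that error into a plain "no match":

```python
import re

def safe_match(expression: str, value: str) -> bool:
    """Match value against expression; a malformed expression counts as 'no match'."""
    try:
        return re.fullmatch(expression, value) is not None
    except re.error:
        # The expression itself is invalid (for example an unclosed parenthesis).
        return False

print(safe_match("([A-z0-9]+)(", "abc"))  # False, and no exception escapes
print(safe_match("([A-z0-9]+)", "abc"))   # True
```

Whether a bad expression should count as "no match" or be reported to the user is a design choice; for expressions typed by the user, reporting it is probably friendlier.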

- A regular expression is built by describing group of characters after group of characters:
  - Each group description is inside [ and ]: for example [A-Z] means anything between A and Z in the ASCII order.
  - Each group description is followed by the number of characters in the group:
    - [A-Z] means exactly ONE character between A and Z
    - [A-Z]{3} means exactly THREE characters between A and Z
    - [A-Z]* means any number of characters between A and Z (including none)
    - [A-Z]+ means at least ONE character between A and Z
    - [A-Z]{3,5} means THREE to FIVE characters between A and Z
  - You have to place the ( and ) symbols around the part of an expression you want to extract in a variable in the case of MatchRegularExpression with extraction.
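
The quantifiers and the extraction parentheses described above can be tried out with a short sketch. It uses Python's `re` module as a stand-in (an assumption; it is not WinDev's engine), and Python writes the "three to five" range as {3,5}. The sample string "INV2006" is made up for the illustration.

```python
import re

# Quantifiers from the list above, each checked against a whole string.
assert re.fullmatch(r"[A-Z]", "Q")
assert re.fullmatch(r"[A-Z]{3}", "ABC")
assert re.fullmatch(r"[A-Z]*", "")          # zero characters is allowed
assert not re.fullmatch(r"[A-Z]+", "")      # at least one is required
assert re.fullmatch(r"[A-Z]{3,5}", "ABCD")  # three to five

# Parentheses mark the parts to extract.
match = re.fullmatch(r"([A-Z]+)([0-9]+)", "INV2006")
letters, digits = match.groups()
print(letters, digits)  # INV 2006
```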

- The expressions are case sensitive: [A-Z], [a-z], and [A-z] are not the same at all
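
A quick way to see why the mixed-case interval is different: in ASCII order, a few punctuation characters sit between 'Z' and 'a', so an interval running from A to z includes them. The sketch below uses Python's `re` module purely as an illustration (an assumption, not WinDev's engine); `in_class` is a hypothetical helper.

```python
import re

# Characters that sit between 'Z' (90) and 'a' (97) in ASCII.
between = "".join(chr(code) for code in range(ord("Z") + 1, ord("a")))
print(between)  # [\]^_`

def in_class(char_class: str, char: str) -> bool:
    """True when the single character belongs to the bracketed class."""
    return re.fullmatch(char_class, char) is not None

print(in_class("[A-Z]", "q"))     # False
print(in_class("[a-z]", "q"))     # True
print(in_class("[A-z]", "_"))     # True: '_' falls inside the A..z range
print(in_class("[A-Za-z]", "_"))  # False
```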

- Clearly the expressions are ASCII dependent, and while the first 128 characters of the ASCII table are standard, the other 128 are language dependent... Which means that depending on the languages you are managing, your regular expressions can be different. A good idea is to store them in your program as international strings. A good ASCII table is clearly necessary when working in that domain, so here's one online!

- You can have several groups of characters inside one block: [A-Z] or [0-9] is valid, but so is, for example, [A-Z0-9a-z], which means anything between A and Z, anything between 0 and 9 and anything between a and z

- You can add characters that are not in an interval. For example, the following syntax means anything between A and Z and also spaces and exclamation points: [A-Z !]

- You can also include special characters in the usual WinDev Syntax: "[A-Z"+CR+"]" is valid and means that your string can contain any character between A and Z or a carriage return.
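
Several intervals, isolated characters and a special character can all live in the same block. Here is a small sketch, again using Python's `re` module as a stand-in (an assumption). The variable `CR` below is a plain Python string chosen to play the role of WinDev's CR constant, not the constant itself.

```python
import re

CR = "\r"  # assumption: stands in for WinDev's CR constant

# Several ranges, plus a literal space, '!' and a carriage return, in one class.
pattern = re.compile("[A-Z0-9a-z !" + CR + "]+")

print(pattern.fullmatch("Hello World 42!") is not None)       # True
print(pattern.fullmatch("line1" + CR + "line2") is not None)  # True
print(pattern.fullmatch("tab\there") is not None)             # False: TAB is not in the class
```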

- Contrary to what the help for ..Inputmask says, the same regular expression syntax is valid in all 3 cases: MatchRegularExpression with or without extraction, and ..Inputmask. If you use things that are not relevant (for example ( and ) in an input mask), they are simply ignored. And by the way: [A-Za-z]{0,1}[0-9]{0,1} and [A-Za-z][0-9] are NOT identical: the first one means 0 or 1 letter, uppercase or lowercase, and 0 or 1 number... The second means exactly 1 letter and 1 number (leaving either of them out is forbidden). (new--) Thanks to Eric L., who answered me on the PCSoft WinDev forum, there's more to say about that point. You CAN technically use [A-Za-z][0-9] in an input mask, but you won't be able to type a value into the field, as it requires one letter and one number at ALL TIMES... You will only be able to paste a string into the field, and only if it's valid. Therefore there is no way (it seems) to use a regular expression as an input mask for 1 letter AND 1 number. With [A-Za-z][0-9], you cannot type the letter alone (before the number), and with [A-Za-z]{0,1}[0-9]{0,1}, it becomes possible NOT to enter one of them. To make this point easier to test, I improved my testing utility by adding a field that uses ..inputmask with the current expression. (--new)

Thanks for your remark... I had not looked at things from that angle... And you are right!

So it implies the following: there is no exact equivalent of [A-Za-z][0-9] for an input mask... Indeed, if you use [A-Za-z]{0,1}[0-9]{0,1} for the mask, it is possible to enter just 'X' or just '4', but nothing in this mask FORCES you to enter one of each... Whereas [A-Za-z][0-9] specifies one letter AND one digit.

The moral of the story is that the input mask has to be treated as a different case, taking each step of the typing into account in order to get a valid mask...

To make testing easier, I added an edit field using ..masquesaisie (the French name of ..InputMask) to my test utility... I am going to publish it right away and update my page according to your remarks

---> eric l.
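
The input mask problem can be modelled with a small sketch. It assumes that a masked field re-checks the whole content after every keystroke, which is my reading of the behaviour described above and not something documented. It uses Python's `re` module as a stand-in for WinDev's engine, and `typable` is a hypothetical helper name.

```python
import re

def typable(expression: str, target: str) -> bool:
    """True when every prefix of target, typed one key at a time, matches."""
    return all(
        re.fullmatch(expression, target[:length]) is not None
        for length in range(1, len(target) + 1)
    )

strict = r"[A-Za-z][0-9]"
loose = r"[A-Za-z]{0,1}[0-9]{0,1}"

print(typable(strict, "X4"))                 # False: "X" alone is rejected
print(typable(loose, "X4"))                  # True
print(re.fullmatch(loose, "X") is not None)  # True: the digit is not enforced
```

Under that assumption, the strict expression blocks typing at the first key, while the loose one lets you type but no longer guarantees one letter AND one number, so a final check of the complete value is still needed.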

- The syntax to use for ..Inputmask is MyField..inputmask="regexp:"+ RegularExpression

After all that, I'm still not a regular expression guru, and frankly, I don't want to become one :-)

So I extended my test window to allow the following things:
- Test in real time if a string matches an expression
- Test the expression in the ..inputmask context (new)
- Extract the different parts of the string according to the expression
- See the ASCII content of the string (when dealing with non displayable chars or extended ASCII, it lets you find which character to add to your expression)
- Visualize TABs and CRs (the most frequent non displayable characters)
- Display an ASCII table (via the web)
- Generate the WinDev string corresponding to the expression (quotes, Tabs, CRs managed)

You can download it by clicking on one of those links:
- EXE only (160 Kb, if you already have the WinDev 10 framework) (new)
- EXE + Framework (about 3 Mb) (new)
In both cases, there is no installer... Just unzip the file where you want and click on the exe. The macro code is allowed, so you can add your own code and even send it to me if you think it can help somebody else.

 






 

 



Last modified Wednesday, April 12, 2006 10:13 AM central time