VBA Regular expressions with Word

Another post on the MSDN VBA forum (it's been a slow week) asked, "Is it possible to select all upper case words in a Word document?" I haven't done much programming with Word's API, so I thought I'd give it a shot, as it gave me an excuse to brush up on some regular expressions.

Surprisingly, Word's search feature doesn't allow regular expression patterns (it has its own wildcard syntax instead), which is a bit rubbish, but that doesn't stop them being used from VBA. So the first thing to do is add a reference to the VBScript regular expression library (VB Editor->Tools->References->Microsoft VBScript Regular Expressions 5.5).
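If adding the reference isn't an option, the same library can usually be reached through late binding. This is only a minimal sketch: the sub name and the sample text are made up for illustration, and it assumes the VBScript regular expression library is installed and registered on the machine.

```vba
Sub LateBoundRegExpDemo()
  ' Late binding: the object is created by ProgID at run time,
  ' so no entry under Tools->References is required.
  Dim regEx As Object
  Set regEx = CreateObject("VBScript.RegExp")
  regEx.Pattern = "[A-Z]+"          ' One or more upper case characters.
  regEx.IgnoreCase = False          ' Match case-sensitively.
  ' Test returns True when the pattern is found anywhere in the text.
  Debug.Print regEx.Test("abc DEF") ' Sample text, made up for this sketch.
End Sub
```

The trade-off is that late-bound objects get no IntelliSense or compile-time checking, which is why the rest of this post sticks with the reference.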

The text to search for in the document is every word made up entirely of upper case characters. These words can be matched with the pattern "\b[A-Z]+\b", which can be broken down into the following...

\b = word boundary. This lets you perform whole word searches, as \b indicates the start or end of a word. For example, "\bor\b" will match the text "or" only; it won't match "for" or "order". "\bor" will match "order" and "or" but not "for".

[A-Z] = matches a single upper case character, i.e. one character from A to Z

+ = matches one or more of the preceding element, e.g. [A-Z]+ matches one or more upper case characters

\b[A-Z]+\b = matches a whole word made up of one or more upper case characters and nothing else
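A quick way to check the pattern before pointing it at a document is to run it against a short string and print what comes back. The sketch below assumes the reference to the regular expression library has been added; the sub name and the sample sentence are made up for illustration.

```vba
Sub TestUpperCasePattern()
  Dim regEx As New RegExp
  Dim m As Object
  regEx.Pattern = "\b[A-Z]+\b"   ' Whole words made up of A-Z only.
  regEx.IgnoreCase = False       ' Match case-sensitively.
  regEx.Global = True            ' Return every match, not just the first.
  ' Sample sentence, made up for this sketch.
  For Each m In regEx.Execute("The NASA and ESA teams met in Paris")
    ' FirstIndex is the zero-based position of the match in the string.
    Debug.Print m.FirstIndex, m.Value
  Next m
End Sub
```

Run from the VB Editor, this should list only NASA and ESA in the Immediate window. "The" and "Paris" are skipped because the lower case letters after the capital mean there is no word boundary straight after the upper case run.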

That's the pattern broken down, so here is the code to parse the document.

Sub BoldUpperCaseWords()
  Dim regEx As RegExp
  Dim Match As Match
  Dim Matches As MatchCollection
  Set regEx = New RegExp               ' Create a regular expression.
  regEx.Pattern = "\b[A-Z]+\b"         ' Set pattern: whole upper case words.
  regEx.IgnoreCase = False             ' Match case-sensitively.
  regEx.Global = True                  ' Find every match, not just the first.
  ' ThisDocument is the document that contains this macro.
  Set Matches = regEx.Execute(ThisDocument.Range.Text)    ' Execute search.
  For Each Match In Matches            ' Iterate Matches collection.
    ' Selects a range from the index of the first character to that index plus the length of the word.
    ThisDocument.Range(Match.FirstIndex, _
                       Match.FirstIndex + Match.Length).Bold = True
  Next Match
End Sub

The code above executes the pattern against the document's text and bolds each word that matches. A couple of points to highlight: the ThisDocument.Range.Text property returns all the text in the document (that's handy); the IgnoreCase property of the RegExp must be set to False, as the case of the text is exactly what this code is looking for; and the Global property is set to True so that the pattern finds every match in the search text rather than stopping at the first one.
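To see what those two properties actually change, here is a small sketch that counts matches under different settings. It assumes the reference to the regular expression library has been added; the sub name and the sample text are made up for illustration.

```vba
Sub CompareRegExpSettings()
  Dim regEx As New RegExp
  Const SAMPLE As String = "NASA and ESA"   ' Made-up sample text.
  regEx.Pattern = "\b[A-Z]+\b"

  ' Global = False: Execute stops after the first match.
  regEx.IgnoreCase = False
  regEx.Global = False
  Debug.Print regEx.Execute(SAMPLE).Count   ' 1 (NASA only)

  ' Global = True: every upper case word is returned.
  regEx.Global = True
  Debug.Print regEx.Execute(SAMPLE).Count   ' 2 (NASA and ESA)

  ' IgnoreCase = True: [A-Z] also matches lower case letters,
  ' so "and" is picked up as well.
  regEx.IgnoreCase = True
  Debug.Print regEx.Execute(SAMPLE).Count   ' 3 (NASA, and, ESA)
End Sub
```

With IgnoreCase left at True, the macro would end up bolding every word in the document, which is why it has to be False here.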

Published Saturday, September 09, 2006 11:41 AM by dsmyth