Browser Migration Using Regex. Yes, You Read That Correctly (Part I)

Regular expressions are a lifesaver, especially when dealing with complex pattern recognition.

Once you become proficient at crafting regex patterns (or proficient enough to google the right patterns), you can slowly become creative with it.

It happens naturally, as you start looking for patterns outside of regex – areas where you may apply regex to either achieve an end result, or to enhance an existing process.

We will look at one such example, which I guarantee will blow your mind.

UiPath Has Its Own Migration Tool, So Why This?

I started penning this long before UiPath decided to crush my brilliant idea by releasing the Migration Tool.

Instead of writing, proofreading and publishing it on time, I got lazy and now I’m paying the price for it.

Yes, I am in agony as I continue to write this, but it is, nonetheless, an interesting idea that will show you just how powerful regex can be when you study it and put it to use.

How Our UiPath Automations Are Packaged

Once you boot up UiPath Studio and create a new project, a main.xaml file is generated along with a handful of other project files.

You will find it under “Projects”

For the purpose of this demonstration, I have created a simple workflow which launches Edge, navigates to google.com, searches for “UiPath” and then heads back to google.com.

We will migrate this workflow from Microsoft Edge to Google Chrome using regex (indirectly, of course).

To do so, we will use UiPath on UiPath.

Since you have UiPath installed, if you try to open the main.xaml file, it will boot up UiPath Studio, which is NOT what we want.

We wish to peer inside, and study its internal structure.

For that, we will open it in a plain text editor such as Notepad++.

Not as aesthetically appealing, but we can get somewhere with this.

While this is hard to navigate, we can zero in on the items we wish to change using Ctrl+F.

We know for a fact that each selector contains app='msedge.exe'.

We could simply replace all instances of app='msedge.exe' using:

System.Text.RegularExpressions.Regex.Replace(XAML_Content,"(msedge)+","chrome")

But the expression isn’t ideal.

Sure, it gets the job done, but since it matches msedge anywhere in the file and not just inside selectors, there is a chance that it could end up replacing elements it shouldn’t, or fail to replace elements it ought to.

Just to make sure we are targeting the right elements, we will head off topic and explore anchors.
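To make that risk concrete, here is a small sketch in Python, whose re.sub plays the role of .NET’s Regex.Replace here. The XAML fragment is hypothetical and simplified, not copied from a real project: besides the selector, it holds a DisplayName that merely mentions msedge, and the blanket replacement rewrites that too.

```python
import re

# Hypothetical, simplified XAML fragment (not taken from a real UiPath project).
# Line 1 holds a selector; line 2 is a display name that merely mentions msedge.
xaml_content = (
    "Selector=\"&lt;html app='msedge.exe' title='Google' /&gt;\"\n"
    "DisplayName=\"Open msedge and search\""
)

# Blanket replacement, mirroring Regex.Replace(XAML_Content, "(msedge)+", "chrome")
naive = re.sub(r"(msedge)+", "chrome", xaml_content)
print(naive)
# The selector is migrated, but the display name is silently changed as well.
```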

Except, They Aren’t Called Anchors

They’re called LookArounds.

LookArounds come in four types (positive and negative LookBehind, positive and negative LookAhead), of which we will only be using the two positive ones.
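For reference, here is a minimal Python sketch of all four types, tested against a single selector-style string. Python’s re uses the same (?<=…), (?<!…), (?=…) and (?!…) syntax as .NET for these simple fixed-width patterns; the test string is made up for illustration.

```python
import re

text = "app='msedge.exe'"

patterns = {
    # "msedge" must be preceded by app='
    "positive LookBehind": r"(?<=app=')msedge",
    # "msedge" must NOT be preceded by app='
    "negative LookBehind": r"(?<!app=')msedge",
    # "msedge" must be followed by .exe
    "positive LookAhead": r"msedge(?=\.exe)",
    # "msedge" must NOT be followed by .exe
    "negative LookAhead": r"msedge(?!\.exe)",
}

for name, pattern in patterns.items():
    found = re.search(pattern, text) is not None
    print(f"{name:<20} {pattern:<20} -> {found}")
# The two positive assertions succeed; the two negative ones fail on this text.
```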

The first is LookBehind, which, as the name suggests, checks for a pattern right behind (before) the text we wish to match.

The second is LookAhead, which, as the name suggests, checks for a pattern right ahead of (after) the text we wish to match.

In short, they are zero-width assertions: they check for a pattern immediately preceding or following the text we are interested in, without making that pattern part of the match.
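“Zero-width” is easier to see than to read about. In this Python sketch (standing in for .NET’s Regex), a LookBehind on its own matches an empty string at a single position, and the label it checks never becomes part of the result.

```python
import re

line = "Name: TheCodingTheory"

# A LookBehind alone consumes nothing: it only marks the position after "Name: "
empty = re.search(r"(?<=Name:\s)", line)
print(empty.start(), empty.end(), repr(empty.group()))  # 6 6 ''

# Without a LookBehind, the label is consumed and ends up in the match
consumed = re.search(r"Name:\s\S+", line).group()
# With a LookBehind, the label is checked but left out of the match
asserted = re.search(r"(?<=Name:\s)\S+", line).group()
print(consumed)  # Name: TheCodingTheory
print(asserted)  # TheCodingTheory
```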

I bet that last sentence made little to no sense, so before we dive into this problem, let’s play around with some test data:

Name: TheCodingTheory
Location: Internet
Address: Pale Blue Dot

Say I wanted to extract the Name from the block of text provided above.

Using \w+ or [A-Z]+ won’t cut it: \w+ matches every word, labels included, while [A-Z]+ only picks up runs of capital letters.

To get the value we want, we will describe the text that precedes it with a LookBehind, and follow it with a wildcard that matches the value itself.
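Here is what those two naive patterns actually return on the test data, sketched with Python’s re.findall (standing in for .NET’s Regex.Matches).

```python
import re

data = "Name: TheCodingTheory\nLocation: Internet\nAddress: Pale Blue Dot"

# \w+ matches every run of word characters, labels included
words = re.findall(r"\w+", data)
print(words)
# ['Name', 'TheCodingTheory', 'Location', 'Internet', 'Address', 'Pale', 'Blue', 'Dot']

# [A-Z]+ is case-sensitive, so it only matches runs of capital letters
capitals = re.findall(r"[A-Z]+", data)
print(capitals)
# ['N', 'T', 'C', 'T', 'L', 'I', 'A', 'P', 'B', 'D']
```

Neither result tells us which word belongs to Name; that information lives in the text around the value, which is exactly what a LookBehind describes.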

//Pay attention to how the pattern is built up below:
--------------------------------------------------
(?<=) - LookBehind format
(?<={Pattern goes here})
(?<=Name:\s) - Pattern to look behind for
(?<=Name:\s).* - Matched pattern
--------------------------------------------------
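Applied to the three-line test data, the final pattern returns just the name. This is a Python sketch (re.search standing in for .NET’s Regex.Match); the Address line uses the same idea with a different label.

```python
import re

data = "Name: TheCodingTheory\nLocation: Internet\nAddress: Pale Blue Dot"

# (?<=Name:\s) checks for "Name: " without consuming it;
# .* then takes the rest of that line, since the period stops at the newline.
name = re.search(r"(?<=Name:\s).*", data).group()
print(name)  # TheCodingTheory

# Same structure, different label
address = re.search(r"(?<=Address:\s).*", data).group()
print(address)  # Pale Blue Dot
```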

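The LookBehind asserts that the text immediately before the match is Name: followed by a whitespace character. Because it is zero-width, that label is checked but not included in the result; only what .* matches is returned.

This might seem counterintuitive at first: the regex engine scans forward through the text, yet the LookBehind checks what sits before the position where the match begins.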

With enough practice, it will become second nature.

Let’s try a few more exercises.

Name: TheCodingTheory Location: Internet Address: Pale Blue Dot

I have put the text on a single line. Let’s check whether the regex we used earlier still detects the pattern we are interested in.

Hmm, that doesn’t seem right: the match no longer stops at the name, and instead runs all the way to the end of the line.

(?<=Name:\s).*

The issue lies with the .* at the end. The period matches any character except a newline (some regex engines exclude a few other line terminators as well, but let’s keep it short and simple here).

The asterisk (*) matches the previous item zero or more times, and it is greedy: it takes as many characters as it can.

Since the period also matches spaces, .* happily consumes the rest of the line, Location and Address included.
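A quick Python reproduction of the problem (re.search standing in for .NET’s Regex.Match) shows the greedy wildcard swallowing the other fields.

```python
import re

single_line = "Name: TheCodingTheory Location: Internet Address: Pale Blue Dot"

# The period matches spaces too, and * is greedy, so .* runs to the end of the line
greedy = re.search(r"(?<=Name:\s).*", single_line).group()
print(greedy)  # TheCodingTheory Location: Internet Address: Pale Blue Dot
```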

To avoid that, we will use a character class that does not match whitespace.

(?<=Name:\s)[\S]+

\s matches any whitespace character (a space, tab, newline and so on), while \S matches any character that is not whitespace.
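With \S in place of the period, the match stops at the first space after the name. A Python sketch (standing in for .NET’s Regex) follows; note that the square brackets in [\S]+ are optional, and \S+ behaves the same.

```python
import re

single_line = "Name: TheCodingTheory Location: Internet Address: Pale Blue Dot"

# \S+ stops at the first whitespace character after the name
name = re.search(r"(?<=Name:\s)[\S]+", single_line).group()
print(name)  # TheCodingTheory

# The same trick works for any other single-word field
location = re.search(r"(?<=Location:\s)\S+", single_line).group()
print(location)  # Internet
```

This only works for single-word values, which is why the Address exercise below needs something more.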

I won’t be able to cover each and every aspect of regex in this post, so we will dive into one last example.

Try This Along With Me!

Match the Address using regex and only continue reading once you have.

.

.

.

.

.

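(?<=Address:\s).*?(?=\s\w+:|$)

We have employed a LookAhead as well to zero in on the value we want to extract. The lazy .*? stops at the first position where the LookAhead succeeds: right before a space followed by the next label (\s\w+:), or at the end of the line ($) when the address is the last field, as it is in our test data.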
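To check that a lazy Address pattern holds up regardless of field order, here is a quick Python sketch (Python’s re standing in for .NET’s Regex; both treat these constructs the same way). It runs against the single-line test data and against a hypothetical reordered line in which Address comes first.

```python
import re

# Lazy .*? stops at the first spot where the LookAhead succeeds:
# a space followed by the next "Label:", or the end of the line ($).
ADDRESS = r"(?<=Address:\s).*?(?=\s\w+:|$)"

single_line = "Name: TheCodingTheory Location: Internet Address: Pale Blue Dot"
reordered = "Address: Pale Blue Dot Name: TheCodingTheory Location: Internet"  # hypothetical

print(re.search(ADDRESS, single_line).group())  # Pale Blue Dot
print(re.search(ADDRESS, reordered).group())    # Pale Blue Dot

# Without the |$ alternative, nothing matches when Address is the last field
print(re.search(r"(?<=Address:\s).*?(?=\s\w+:)", single_line))  # None
```

The lazy quantifier matters when Address is not last: a greedy .* would run past the next label and stop only before the final one.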

Yes, the question marks followed by other special characters aren’t making you feel any special, and they aren’t making any god damn sense either, but don’t worry.

With practice comes familiarity.

With familiarity comes expertise.

And with expertise comes a special skill called “show off”. Use it wisely, though.

Simply put, the LookBehind pattern goes behind (to the left of) the item we want, and the LookAhead pattern goes ahead of (to the right of) it.

This is the counterintuitive bit that threw me off when I first started learning it.
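Bringing this back to the migration problem, here is one possible targeted replacement, sketched in Python with re.sub standing in for Regex.Replace. It is not necessarily the approach the next part will take, and the XAML fragment is a simplified, hypothetical example: the LookBehind (?<=app=') sits to the left of msedge and the LookAhead (?=\.exe') to the right, so only the browser name inside the selector changes.

```python
import re

# Hypothetical, simplified XAML fragment (not taken from a real UiPath project)
xaml_content = (
    "Selector=\"&lt;html app='msedge.exe' title='Google' /&gt;\"\n"
    "DisplayName=\"Open msedge and search\""
)

# LookBehind on the left, LookAhead on the right:
# only an "msedge" sitting between app=' and .exe' is replaced.
targeted = re.sub(r"(?<=app=')msedge(?=\.exe')", "chrome", xaml_content)
print(targeted)
# The selector now reads app='chrome.exe'; the DisplayName is left untouched.
```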

To be honest, it still throws me off.

And That Is It For Today

You must be super disappointed that I decided to taper things off here.

We are approaching the 1,000-word mark, which is not ideal.

Few readers pay attention past 1,000 words, so I will continue this next week…

However, I will give you a hint.

TELL ME

When we automate web browsers, we often deal with Attach Browser, Click, Set Text, Type Into, Element Exists, just to name a few.

Each of these activities has its own tag within the XAML, and it’s up to you to figure out what those tags are.

Once you discover them, the rest becomes a cakewalk, because all you have to do is nest them inside LookArounds like so:

(?<=(Click-Tag|SetText-Tag|…)).*?(?=[You better not look away from me!])

This is going to get pretty complex as well, so tune in next week.
