AA: Extracting Tables That Aren’t Tables (Part-II)

In our last post, we were able to successfully extract data from each column.

Today we will outline the logic for looping through each line item in the web table, extracting its contents, and storing it in proper format.

Simply put, we will extract the table as it is, even though the HTML structure isn’t table-ish.

This is quite similar to the data scraping exercise we performed earlier, which you can find here.

It Will Keep Looping While

Thought that title was incomplete, didn’t you?

Well, it isn’t.

The loop will keep iterating as long as its condition is satisfied, but for that to happen, our Loop Action has to be set to While.

But what will our condition be?

What indicator is reliable enough to use here?

Is there even an indicator we can rely on?

Yes, There Is!

We crafted four XPaths for each column in our previous post.

This will also become our Loop:While: Object Condition, and I want you to guess why that is before I tell you…

.

.

.

.

.

It’s because our item has to be present for the extraction to take place.

This works like a charm since our Loop:While re-checks the condition before every iteration, so the loop stops as soon as there is no item left to extract.

Given below is a screenshot of the condition I have applied:

With that, we have established the condition over which our loop will iterate.
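
If you think better in code than in bot Actions, here is a rough Python sketch of the same idea. It is not Automation Anywhere syntax: the fake page, the XPath pattern, and the function names are all made up for illustration. The point is only that the loop keeps going while an object exists at the current position.

```python
# Hypothetical stand-in for the web page: maps an XPath string to the text found there.
FAKE_PAGE = {
    "(//div[@class='item'])[1]/span[1]": "Widget",
    "(//div[@class='item'])[2]/span[1]": "Gadget",
    "(//div[@class='item'])[3]/span[1]": "Gizmo",
}


def object_exists(page, xpath):
    """Plays the role of the 'object exists' condition (illustrative only)."""
    return xpath in page


def count_line_items(page):
    """Loop while an item is present at the current position, then report how many were seen."""
    counter = 1
    # The condition is evaluated before every pass, so a missing item ends the loop.
    while object_exists(page, f"(//div[@class='item'])[{counter}]/span[1]"):
        counter += 1
    return counter - 1


print(count_line_items(FAKE_PAGE))
```

An empty page never satisfies the condition, so the loop body does not run even once.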

Now for the Extraction

There are a few prerequisites.

First, declare a Counter.

This counter is crucial, as it will enable us to add rows to our DataTable and perform a one-time check which we will cover in a moment.

After that, create a DataTable and provide the number of columns, along with their names.

You don’t have the option of adding columns during runtime, which is why we have to set this up before running the process.

Also, we have to declare a record variable and add indexes.

Each index will contain the column name in the exact same order as we had specified in the DataTable.

If Name and Cost are two such columns in our DataTable, with the Name coming before the Cost, then the record variable must consist of two indexes, with the first one being declared as Name, followed by Cost.

If you mess up the order, then our bot will greet you with a “wrong schema supplied to bot” error.
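
To see why the order matters, here is a small Python sketch of that rule. The check_schema function and its error message are assumptions of mine, not how the bot implements it. All it shows is that the record keys must match the column names in the same order.

```python
# Column names in the order they were declared on the table (taken from the example above).
COLUMNS = ["Name", "Cost"]


def check_schema(columns, record):
    """Raise if the record's keys do not match the columns, order included."""
    record_keys = list(record)  # dicts keep insertion order in Python 3.7+
    if record_keys != list(columns):
        raise ValueError(f"wrong schema: expected {list(columns)}, got {record_keys}")
    return True


good_record = {"Name": "", "Cost": ""}
bad_record = {"Cost": "", "Name": ""}  # same keys, wrong order

print(check_schema(COLUMNS, good_record))
try:
    check_schema(COLUMNS, bad_record)
except ValueError as error:
    print(error)
```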

The Mysterious One-Time Check

This check is performed in the first Action nested inside our Loop Action.

What is this check, you ask?

We are checking whether the Counter value is greater than one.

This might seem nonsensical at first, but bear with me as we head back to the DataTable we just initialized.

When you create a DataTable, it will contain at least one row. You can’t reduce the row count to zero, which is why we have added this check.

Also, the DataTable won’t automatically append itself with rows.

Robots aren’t smart enough to do that, which is why we have to build that logic ourselves. Nor do they know how many columns our DataTable contains, which is why we had to initialize a record variable specifying the indexes as well.

All of this may seem tedious at first, but trust me, it gets better with practice.
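
Here is the one-time check as a minimal Python sketch. It assumes a plain list of lists as a stand-in for the DataTable, and new_table and ensure_row are names I invented. The table starts with one empty row, so a row is appended only when the counter is greater than one.

```python
def new_table(n_columns):
    """Mimic the behaviour described above: a fresh table already holds one empty row."""
    return [[""] * n_columns]


def ensure_row(table, counter):
    """Append a blank row only when the counter has moved past the first iteration."""
    if counter > 1:
        table.append([""] * len(table[0]))
    return len(table)


table = new_table(2)
for counter in (1, 2, 3):
    # Row count after each iteration: no append on the first pass, one append on each later pass.
    print(counter, ensure_row(table, counter))
```

Without the check, the first iteration would add a second row and leave a blank one behind.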

What Goes On Inside the Loop

During the first iteration, our Counter is initialized to one, and it won’t satisfy the If Condition, hence a record won’t be appended to the DataTable.

The Recorder: Capture Action does its job, and pulls data from the webpage and…then what?

It gets stored into its appropriate index.

Remember, we didn’t have to append a record during the first iteration since a row was already present, so we need not worry about that bit. However, we do have to ensure we are on the right column as well as the right row index.

Now, to populate our DataTable, drag in a DataTable: Set value of a single cell Action.

DataTable rows are indexed from 0, so to fill values into the first row, we can either initialize another variable to zero and keep incrementing it during each iteration, or we could go big brain time and reuse the counter.

Sure, the counter is initialized to one, but what is preventing us from subtracting one from it?

It’s never too late to invoke big brain time.
$nCounter.Number:Decrement$ also works

This way, we only have to keep track of one variable, which reduces the cognitive load by…a minute fraction.
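
Here is the counter-minus-one trick as a rough Python sketch. The captured values are invented, and a list of lists stands in for the DataTable, so treat it as an illustration rather than what the bot runs. The counter starts at one and the row index is always the counter minus one.

```python
COLUMNS = ["Name", "Cost"]
# Hypothetical values the capture step might return, one tuple per line item.
CAPTURED = [("Widget", "4.00"), ("Gadget", "7.50")]


def fill_table(captured, columns):
    """Write each captured line item into the row at index counter - 1."""
    table = [[""] * len(columns)]      # the table starts with one empty row
    counter = 1
    while counter <= len(captured):    # stand-in for the object-exists condition
        if counter > 1:                # the one-time check: row 0 already exists
            table.append([""] * len(columns))
        row_index = counter - 1        # reuse the counter instead of a second variable
        for col_index, value in enumerate(captured[row_index]):
            table[row_index][col_index] = value  # set the value of a single cell
        counter += 1
    return table


for row in fill_table(CAPTURED, COLUMNS):
    print(row)
```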

Computers may be dumb, but their capacity for performance is exponentially superior to the temporary boost our big brain moment offers us.

It’s only a matter of time before they take over all our jobs and start replicating themselves.

But that isn’t happening anytime soon, so let’s continue with the topic we drifted away from.

Actually We Are Done

Yeah, that is pretty much it.

If you’ve followed along attentively, then this is the result you will arrive at.

This is what success(-fully extracted data) looks like.

Once you nail the concepts, the rest is a cakewalk.

But make sure you nail those concepts first, before you decide to walk all over those delicious cakes.
