AI in Test Automation: What It Means for You

Artificial Intelligence in Test Automation is a trending topic in the UI test automation community. But what does it mean for testing? What are its goals and outcomes?

In this blog, Richard Clark, CTO of Provar, shares his insights. He discusses what Artificial Intelligence in Test Automation really means. He also examines how tools are using this technology (or not). Finally, he explains how Provar is incorporating these principles.

Why Is Artificial Intelligence in Test Automation Popular?

In the UI test automation community, there has been a strong push for self-healing and AI-based testing. This trend has accelerated over the last 18 months.

There may be various reasons that tools have taken this path, including:

Their default locators are not reliable. Test cases break when the UI changes. These changes can come from developers, admins, or vendor application releases and patches.
The locators bind to element IDs that are auto-generated by the application being tested. These IDs vary across different environments. For example, Salesforce.com object IDs and custom field IDs differ. Even within the same environment, changes can occur. For instance, Salesforce Visualforce j_id values and Aura IDs can change, as can IDs across Lightning page refreshes.
Their locators rely on multiple methods to find elements. This slows down test execution. The process moves through a hierarchy of locators to identify a screen element.
Their locators don’t understand different page layouts, user experience themes, field-level security, picklist values, or record variants.
Investors are more interested in artificial intelligence in test automation.
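
To make the auto-generated ID problem concrete, here is a minimal sketch of a check that flags locators bound to IDs that look machine-generated. The two patterns are illustrative assumptions about what Visualforce-style and Aura-style IDs tend to look like, not an official or complete list, and the function names are hypothetical.

```python
import re

# Assumed, illustrative patterns for IDs generated at render time.
# They are heuristics for this sketch, not an official list.
AUTO_ID_PATTERNS = [
    re.compile(r"(^|:)j_id\d+"),      # Visualforce-style segments such as "j_id0:j_id5"
    re.compile(r"^\d+:\d+(;\w+)?$"),  # Aura-style numeric IDs such as "123:0;a"
]


def looks_auto_generated(element_id: str) -> bool:
    """Return True if the ID matches any of the assumed patterns."""
    return any(pattern.search(element_id) for pattern in AUTO_ID_PATTERNS)


def risky_locators(locators: dict[str, str]) -> list[str]:
    """Return the names of locators bound to IDs that are likely to change."""
    return sorted(
        name for name, element_id in locators.items()
        if looks_auto_generated(element_id)
    )
```

Running `risky_locators` over a test suite's ID-based locators gives a rough list of the ones most likely to break between environments or page refreshes.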

So What Do These Other Test Tools Do, and What Specific “AI” Do They Have?

The tools generally apply one or more of the following:

A record and playback function captures clicks and data entry from a human user. This data can then be reused for multiple test scenarios. The initial recording is often brittle and creates a false impression that test cases are being written more quickly.
Create several locators for the screen element, plus an algorithm or score for what constitutes a match. This means that when one locator breaks, your test doesn’t necessarily fail. The issue with this approach is that the set of locators becomes less robust over time, as each successive change weakens it. You won’t notice the problem until the score drops below the algorithm’s threshold. In most cases, when it does break, you have to fix every single locator one by one.
As mentioned above, some tools have added ‘smarts’ to update locators. With one click, all locators can be changed. The system will prioritize the new preferred default locator.
Cache the web pages. Identify any changes or deltas. Find the references to those changes. Then, apply the rules to relocate the field.
Use machine learning to improve the locator scoring algorithm, trained on data from all of the vendor’s clients. Yes, your test tool vendor may be collecting data about your application and test cases. The result is a single algorithm applied regardless of the application or user experience being tested: the lowest common denominator.
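
The multi-locator scoring approach described above can be sketched as follows. This is a simplified illustration, not any vendor’s actual algorithm; the `Locator` class, the integer weights, and the 0.5 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Locator:
    name: str
    weight: int    # hypothetical importance of this locator
    matches: bool  # whether it still finds the element on the current page


def match_score(locators: list[Locator]) -> float:
    """Weighted fraction of locators that still match (0.0 to 1.0)."""
    total = sum(loc.weight for loc in locators)
    if total == 0:
        return 0.0
    return sum(loc.weight for loc in locators if loc.matches) / total


def element_found(locators: list[Locator], threshold: float = 0.5) -> bool:
    """The element counts as found while the score stays at or above the threshold."""
    return match_score(locators) >= threshold


def broken(locators: list[Locator]) -> list[str]:
    """Names of the locators that no longer match and need repair."""
    return [loc.name for loc in locators if not loc.matches]
```

With weights of 4, 3, 2, and 1 for ID, label, CSS, and XPath locators, losing the ID locator leaves a score of 0.6, so the test still passes and nobody notices. Losing the label locator as well drops the score to 0.3, the test fails, and there are now two locators to repair instead of one.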

Image credit: CognitiveScale.com

What all of the above have in common is that, yes, some of them are self-healing, but there is no AI component to this. Where there is machine learning, it’s not specific to the application you wish to test.

These are just programmatic rules, not genuine self-learning features that adjust their own algorithms or proactively optimize test cases.

To use a famous AI example: if you teach an image classification system what an apple is, can it recognize that a picture of an orange is 100% also a fruit, rather than “probably not an apple”?

A further issue: what if the rules engine or probability algorithm results in tests passing when they should be failing? What if I do want my test to fail if the label changes or the wrong CSS class has been used?
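
As a small illustration of that risk, the sketch below compares a tolerant, similarity-based label check with a strict one. It uses Python’s standard `difflib.SequenceMatcher`; the 0.8 threshold and the function names are assumptions for the example, not taken from any particular tool.

```python
from difflib import SequenceMatcher


def fuzzy_label_check(expected: str, actual: str, threshold: float = 0.8) -> bool:
    """Tolerant check: passes when the two labels are similar enough."""
    return SequenceMatcher(None, expected, actual).ratio() >= threshold


def strict_label_check(expected: str, actual: str) -> bool:
    """Strict check: passes only when the label is exactly as expected."""
    return expected == actual


# A typo slips into the UI label.
expected_label = "Account Name"
actual_label = "Account Nane"

fuzzy_passes = fuzzy_label_check(expected_label, actual_label)
strict_passes = strict_label_check(expected_label, actual_label)
```

Here the tolerant check passes while the strict check fails. If the point of the test is to catch exactly this kind of label regression, the tolerant check has hidden a real defect.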

So, What is Artificial Intelligence (AI)?

We need to be very careful when using the term AI. Some companies seem to apply it rather liberally, possibly to dumb down their features for their audience or, more cynically, to improve their company valuation or share price.

Let’s take Google’s definition first:

“noun:

The theory and development of computer systems able to perform tasks usually requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”

Hmm. On that basis, every UI test automation tool already has AI, since it performs visual perception that would otherwise require a human! Equally, that makes Google Translate an AI, too. I’m skeptical. Let’s try another well-used definition:

To paraphrase Wikipedia:
“…Computer science defines AI research as the study of “intelligent agents”: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. [1] In more detail, Kaplan and Haenlein define AI as “a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation.”…”

OK, so this is more like it. When I studied AI at university in the 1980s, it was mainly about natural language processing and understanding syntax and semantics.

Understanding the rules of English will help a computer interpret a well-written text but leave it floundering on a typical Facebook comment or Tweet.

If a test automation tool claims to have AI, then it would have to be a tool that learns automatically, works out the rules for itself rather than being taught them, and continually improves by rewriting its rules and classifications.

Image credit: Capability Cafe, “Impact of Artificial Intelligence/Machine Learning on Workforce Capability.”

Salesforce Einstein’s (and Google’s) image recognition does this based on thousands of examples. The more examples it receives, the better it recognizes the specific objects that a human has already categorized. This allows it to provide a percentage prediction for its recognition when it encounters a new image.

The human brain does the same thing with five possible results: It Is, It Might Be, I Don’t Know, I Don’t Think It Is, and It Isn’t. We don’t always get those right, of course, especially when it comes to UFOs, and likewise, AI image and speech recognition don’t get it right every time either.
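
To show how a percentage prediction relates to those five results, here is a minimal sketch that maps a classifier’s confidence onto them. The cut-off values are arbitrary assumptions chosen for illustration; a real system would tune them.

```python
def verdict(confidence: float) -> str:
    """Map a confidence between 0.0 and 1.0 to one of five verdicts.

    The thresholds are illustrative assumptions, not values from any product.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= 0.9:
        return "It Is"
    if confidence >= 0.6:
        return "It Might Be"
    if confidence > 0.4:
        return "I Don't Know"
    if confidence > 0.1:
        return "I Don't Think It Is"
    return "It Isn't"
```

A test tool that only ever reports pass or fail has to collapse these five verdicts into two, which is where a probability-based locator can quietly turn “It Might Be” into a pass.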

Amazon Echo is pretty good, but it is frustrating when you have to keep changing what you say to get it to work. To carry on that theme, when I bought my first Echo, I could not get it to play “Songs by Sade,” but I could get it to play “Smooth Operator by Sade.” Now I can do both. It’s either learned or been patched.

Where does all this sit on Provar’s roadmap?

Provar uses metadata to create robust test cases in the first place so that many of the strategies used by other tools are not required. It also means that tests are black and white: there is no ‘probability’ of having the right locator. It is the right locator.

For most of our UI locators, we make any required changes when Salesforce releases its updates three times a year so that you don’t have to. Even better, we do this within the pre-release window whenever possible so that you have the maximum time to run your regression tests during the Sandbox Preview.

If you’ve used one of our recommended locators based on Salesforce Metadata, your tests keep running and are robust. We also make it easy to create test data on the fly, so you don’t rely on pre-existing values in an org. You don’t need a self-healing tool if your tests are robust already.

Sometimes, it’s impossible to use a metadata locator, for example, if your developers have used a different UI framework, custom HTML, or for testing non-Salesforce CRM web pages.

So, for those edge cases, we’ve already been working on enhancing our existing XPath, Label, CSS, Block, and Table locators for generic pages.

This work is designed to improve our recommendation of robust locators and to let Provar learn about your test strategy, so that we can default our recommended locators based on your previous selections.

We even remember the strategies you teach Provar so you can re-apply them across your applications. Unlike other tools, we let you keep these strategies separate, as the strategy for testing Salesforce isn’t necessarily the best for SAP, Workday, or Dynamics 365.

We have been exploring collaborations with other Salesforce vendor tools. This will help you better understand test coverage for your specific Salesforce implementation and customizations. We can also generate test steps automatically. While generating steps is relatively easy for us, generating sensible ones automatically is challenging. Watch this space: some of these features are at least six months away.

Finally, we’re keeping a close eye on concepts such as Mutation Testing, which could lead to a paradigm shift in how tests are authored and how they can evolve automatically to provide a level of exploratory testing.
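
For readers new to the idea, mutation testing makes a small deliberate change (a “mutant”) to the code under test and checks whether the test suite notices. The sketch below is a toy illustration of that principle with a hypothetical discount rule; it is not how Provar or any specific tool implements it.

```python
import operator


def make_discount_check(compare):
    """Build the function under test; `compare` is the operator a mutant replaces."""
    def qualifies_for_discount(total):
        return compare(total, 100)
    return qualifies_for_discount


def weak_suite(fn) -> bool:
    """Passes if values well above and well below the limit behave correctly."""
    return fn(150) is True and fn(50) is False


def strong_suite(fn) -> bool:
    """Adds the boundary case that the weak suite misses."""
    return weak_suite(fn) and fn(100) is True


def kills(suite, mutant_fn) -> bool:
    """A suite kills a mutant when the suite fails on the mutated code."""
    return not suite(mutant_fn)


original = make_discount_check(operator.ge)  # total >= 100
mutant = make_discount_check(operator.gt)    # mutated to total > 100
```

Both suites pass on the original code. The weak suite also passes on the mutant, so the mutant survives and exposes a gap in the tests, while the strong suite fails on the mutant and kills it. Surviving mutants point to where new test cases are needed, which is the sense in which tests could evolve automatically.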

It may seem like science fiction, but so were self-driving cars and having a conversation with a computer just ten years ago.

We don’t consider this to be Artificial Intelligence. We call it common sense to enhance our users’ experience, but if you’re comparing different tools, then OK, call it AI if you insist.

So Are There Any Real AI Test Tools out there?

Some frameworks are beginning to show genuine self-learning and predictive testing elements. These frameworks can generate tests for you or identify which changes may be breaking rather than cosmetic.

I use the term frameworks deliberately. If you have coded test automation, you may be happy to maintain that code base. There is some impressive technology available that includes a level of machine learning.

Image credit: ca.com, “Artificial intelligence and continuous testing: It’s the next big thing (really).”

Among no-code solutions like Provar, I have not seen anything beyond either a rules engine that requires you to ‘program’ it or a generic locator algorithm.

Indeed, I’ve seen no other tools that can handle the changeover between user experiences, such as from Salesforce Classic to Lightning, with test cases that run seamlessly on both UIs and keep running across multiple releases without changes to your test case.

At Provar, we have always made it our mission to help Salesforce customers create and maintain robust test cases for their UI and API tests. We accomplish this in the following ways:

We leverage the Salesforce metadata to identify screen elements and field values automatically.
We automatically recognize the page you want to test and build all the required navigation steps.
We leverage the Salesforce API to interact directly with the underlying data model.
We suggest alternate locators and allow you to override our recommendations.
We build reusable test cases across Salesforce UIs such as Classic, Lightning, Mobile, and Communities.
We continuously improve to keep in sync with the latest Salesforce, Selenium WebDriver, and browser vendor releases.
We provide in-built Provar APIs and a custom API to extend capabilities beyond Salesforce to perform meaningful end-to-end test cases.

What tools like Provar have delivered, and continue to deliver, is Artificial Intelligence in Test Automation in the form of robust test scripts integrated into an enterprise’s Continuous Integration and Continuous Delivery processes. It’s natural, has rapid ROI, and it’s relatively easy to achieve. It’s intelligent, but there’s nothing artificial about it!