AI and the Enterprise: Transforming Legacy Automation
Want to push your existing automation initiatives further without doing a full “rip and replace” of systems? Watch this on-demand version of our webinar on AI and the Enterprise: Transforming Legacy Automation.
Learn how an AI-led approach can transform legacy automation solutions and see how the Hyperscience Platform can help you go beyond rules-based limitations and achieve better results—for you and your customers.
Andrew Dunkin: Great to have you all here today. I’m just looking at the attendee list and this is by far the largest attendance that we’ve had for quite a while. So it’s obviously a good topic, and that is transforming legacy automation with Hyperscience. During the session today, we hope to help you better understand the limitations of traditional automation solutions, as well as offer insight about how intelligent automation could add significant value to your existing business processes. My name’s Andrew Dunkin. I’m the channel manager for APAC at Hyperscience. I’ve been with the company now for about two and a half years. I’ve been in the IT industry now for over 30 years. Today we’re joined by Theo Popescu. Theo is the senior manager for Solutions Engineering at Hyperscience. He’s got over 20 years of experience in all types of automation technologies.
Andrew Dunkin: We intend to keep this session as informative and to the point as possible. We’re gonna spend the first few minutes just exploring the limitations of legacy automation tools before moving on to what we believe is a more intelligent approach to automation. Theo will then walk us through a short demo of Intelligent Automation in Action. We’ll then spend the last 15 minutes answering questions live.
Andrew Dunkin: We can all agree that change has been pretty constant, particularly in the world of IT and business. There’s been a number of really disruptive technologies over the last few years that have changed the way we go about business and our personal lives. One of the latest disruptive technologies changing our lives is artificial intelligence. It’s in its infancy, but it’s changing the way that we interact with our surroundings, enriching our processes, improving time to value, and reducing some of the menial tasks, which allows the redeployment of people into more meaningful roles.
Andrew Dunkin: Today we’re gonna talk about how Hyperscience utilizes artificial intelligence and also machine learning to provide human-centric intelligent automation that is unlocking data potential and increasing business efficiencies and the time to value. Historically, RPA has been the enterprise automation tool of choice for automating rule-based tasks quickly. However, as digital transformation initiatives have matured, businesses are shifting their focus from automating singular simple tasks to the automation of entire complex end-to-end processes. Only 40% of purchased RPA licenses have actually ever been used. That means that 60% of these licenses are sitting on the shelf underutilized, not delivering a return on investment at all. Only 60% of automation initiatives have ever met the client expectations. If digital transformation is akin to running a marathon, then RPA and Legacy automation tools can only take you for the first few miles. While they work for simple well-defined tasks, they sometimes lack the intelligence and flexibility required to automate increasingly complex processes. Therefore, you don’t get to the finish line.
Andrew Dunkin: Legacy automation struggles with complex processes. There’s a lot of moving parts, and sometimes the slightest change can cause a bot to break. There’s a reliance on templates; new templates need to be created for a lot of the different document variants. Sometimes legacy automation technologies require additional technologies to translate unstructured information into structured data. Also, poor scalability: when you’ve got reliance on predefined rules and templates, as processes become more complex, it becomes harder to manage. Lastly, the failure to recognize and learn from mistakes. Bots will simply stop, or even worse, carry on unaware of the mistake that they’ve made, potentially passing erroneous data to downstream processes that could have catastrophic effects.
Andrew Dunkin: There’s a macro issue that’s affecting companies deploying RPA centric automation and other legacy automation tools: the proliferation of unstructured data. Unstructured data accounts for as much as 80% of an enterprise’s total data. This includes information acquired from emails, text heavy documents, and PDFs, along with rich media formats like images, video, voice and audio. Legacy automation relies on rigid business rules and a template-based approach that can’t always handle the unpredictability and high variability of most unstructured formats. What this means for companies deploying RPA centric automation is that 80% of their data remains trapped in documents and other content types that their automation solutions simply can’t reach.
Andrew Dunkin: What does this mean for your business? Firstly, increased costs due to the need for extra manual labor. Inefficient processes: the manual work required to process unstructured data can slow down the operation and reduce the overall agility of an organization. Restricted insights: if they can’t access this information, organizations may miss out on insights, which limits their ability to innovate and adapt. And lastly, limited scalability. RPA’s inability to automate processes involving unstructured data can have a significant impact on an organization’s ability to operate efficiently.
Andrew Dunkin: Intelligent automation represents a shift where machines imitate human actions and possess cognitive abilities, typically through the use of machine learning. Intelligent automation solutions can comprehend structured and unstructured data, as well as learn from ingested data, making it possible for businesses to automate processes more fully. Intelligent document processing or IDP is a powerful example of intelligent automation in action, helping organizations increase efficiency and accuracy when dealing with huge volumes of data.
Andrew Dunkin: Here’s where to use intelligent automation versus legacy automation: when you wanna classify diverse document types, extract data from complex documents (handwritten, printed, low resolution, distorted), involve human supervision only when required, extract insights such as intent, sentiment, and keywords, and route complex, accurate data to your preferred platform for faster processing. The Hyperscience platform has an intuitive user interface that allows users to easily set up and manage the processing of all document types. Hyperscience accurately classifies all document types, making managing high volumes of document variations easy and fast. We also have a proprietary machine learning technology that extracts data from complex documents with up to 99.5% accuracy. Hyperscience enables users to drive business outcomes, routing extracted data through bespoke workflows to apply things like validation, enrichment of data, case collation, keyword searches, and more advanced natural language processing. Lastly, Hyperscience builds a human in the loop into every stage of the process.
Andrew Dunkin: What does this mean for your business? Increased accuracy and efficiency, handling increasingly complex processes with greater accuracy and reliability. Improved customer experience: provide a more personalized and efficient service and make more informed decisions with more accurate consistent data. Better employee engagement: reduces the need for repetitive tasks, allowing employees to spend more time on meaningful work. And lastly, more end-to-end automation. I’m gonna pass over now to Theo for the demonstration.
Theo Popescu: Thanks Andrew. When we’re looking at legacy automation, there’s the inherent thing that it has its place, especially when talking about OCR. With Hyperscience, we’re looking at transforming all documents. Bring any document you have and let’s see how we can process that. It’s not just about looking at one document or one field, or even one character. You’re looking at the whole submission and you wanna look at that in context. That’s what Hyperscience brings to the table here. We can actually dial in that target of 99% accuracy. We call it human centered automation because a human will have to touch something at some point in time; it’s just how easy do you make it for that human to get involved.
Theo Popescu: Traditional legacy automation is almost “you get what you get.” What you have on day one is what you have on day 101. That’s not the case with Hyperscience. Hyperscience is AI machine learned and will get better and better over time. So there’s that continuous process improvement on any documents. We have a SaaS version as you see here. The ability to run on-prem is also quite highly sought after, and we’re able to run completely on-prem with no internet access as well. It’s all in the one place. There is no additional software that needs to get installed like other software for verification or additional user licensing.
Theo Popescu: We can handle any document, whether it be structured. So if you had a form and you upload a form into Hyperscience, you know the fields and they’re fixed fields. Within 20 to 30 minutes, just by uploading that blank document, that actually allows the machine to classify the document automatically without anyone specifying what that document looks like or where to point to. We’re able to achieve easily an 80% straight through processing on all the fields hitting our 99% accuracy target. Semi-structured is where we know the fields that we want, but we could have many different types of documents like invoices, payslips, bank statements. Once we show the machine what invoices look like, it knows how to classify them, it knows the fields that it needs from them. We don’t have to map out every single layout that you see. Just by processing them through, you’ll get a level of automation and then you’ll only improve on that automation as you go along.
Theo Popescu: Lastly to this, we can do totally unstructured documents. We can do sentiment analysis, look at emails and identify is this a complaint? Look at a document and pick out keywords out of a full page transcription. We have the ability to do named entity recognition. So if you throw a contract at the machine, it will pull out every single instance of a name, address, and company name immediately with a pre-trained model that we have. We say bring all your documents because we should be able to work with everything and anything that you have in a complete submission.
Theo Popescu: I’ll upload a sample file so we can see how the machine operates in real life. I’m uploading this manually, but we support many different inputs: folder sweep, email ingestion, RPA, API, message queue. I’ve got the option of choosing from many different flows. In this instance, I’m putting it through a standard document processing flow: input, classify, identify, transcribe, and then output. First thing you’ll notice, it’s upside down, so we can see how auto rotation works within the system. Straight away, you’ll notice that we’re working with real world handwriting. We’re not looking at printed letters in boxes. It’s whatever a human might be doing filling in a form. Things like outside the boxes, skewing of a really skewed document such as this.
Theo Popescu: Above the machine learning aspect, we have what we call human intent recognition. So if somebody crosses this out like that, what’s the intent behind that? Other systems will try and recognize something. Hyperscience should hopefully understand that that’s not even to be recognized. We’ll see how it handles different date formats and blank pages. When we speak about poor quality and handwriting, if a human can read it, then the machine should be able to read it as well. We’ll see how that works with photos of documents, maybe with half the page ripped off and things across it.
Theo Popescu: We’ll jump into the submission. First and foremost, we have classification. The documents have been classified correctly and the blank page has been left out immediately. What we’re seeing here is that two of these documents have one or more fields that the machine wasn’t confident to hit our accuracy target. It’s gonna ask a human in the loop to work with. Two of these documents are saying, “Hey, I don’t need anyone. I’m confident.” These, including the last document, are a hundred percent done by the machine.
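The routing Theo describes, where confident fields are completed by the machine and the rest go to a human in the loop, can be sketched roughly as follows. This is a minimal illustration, not the Hyperscience implementation: the `Field` class, the `route` function, the sample values, and the idea of treating the 99% accuracy target as a plain confidence threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical threshold: a simplification of the "accuracy target" in the demo,
# not a documented Hyperscience setting.
CONFIDENCE_THRESHOLD = 0.99


@dataclass
class Field:
    document: str
    name: str
    value: str
    confidence: float  # model confidence in the transcription, 0.0 to 1.0


def route(fields, threshold=CONFIDENCE_THRESHOLD):
    """Split fields into machine-accepted ones and ones queued for a human."""
    automated, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            automated.append(field)
        else:
            needs_review.append(field)
    return automated, needs_review


# Made-up sample data: four documents, two of which contain a low-confidence field.
fields = [
    Field("doc1", "name", "Thomas Edison", 0.72),
    Field("doc1", "state", "IL", 0.995),
    Field("doc2", "date", "2023-05-01", 0.999),
    Field("doc3", "phone", "5551234", 0.91),
    Field("doc4", "signature_present", "true", 0.998),
]

automated, needs_review = route(fields)
docs_needing_human = sorted({f.document for f in needs_review})
print(docs_needing_human)  # ['doc1', 'doc3']
```

Only the two low-confidence fields reach a person; every other field, and every document without such a field, passes straight through.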
Theo Popescu: If I click on performing tasks, this will take me into what the human in the loop activity looks like. First thing you notice, it’s zoomed in so that I can see what field it’s referring to. Hyperscience works on field level accuracy. We’re not talking about character level accuracy. Either you get the whole field right or you don’t get it right. It’s made to be all keyboard driven. So I can just type in what I see and hit enter. That was the only field on that whole document. And you can see how already this has been de-skewed. Only one field that it wasn’t confident to hit our accuracy target. It moves on to the next field on maybe the same document, maybe a different document. To the data keyer, it makes no difference. They just get presented with a field to type what they see or hit escape if it’s illegible. We had two fields on those four documents that needed a human.
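To make the difference between field level and character level accuracy concrete, here is a small sketch that scores the same transcriptions both ways. The function names and sample strings are invented for illustration, and the position-by-position character comparison is a deliberate simplification.

```python
def character_accuracy(pairs):
    """Fraction of character positions that match, compared position by position."""
    matched = 0
    total = 0
    for predicted, truth in pairs:
        total += max(len(predicted), len(truth))
        matched += sum(p == t for p, t in zip(predicted, truth))
    return matched / total


def field_accuracy(pairs):
    """Fraction of fields that are entirely correct: all or nothing per field."""
    return sum(predicted == truth for predicted, truth in pairs) / len(pairs)


# (machine output, ground truth): made-up examples
pairs = [
    ("Thomas Edison", "Thomas Edison"),
    ("1l1 Main St", "111 Main St"),  # one wrong character
    ("Illinois", "Illinois"),
    ("60601", "60607"),              # one wrong digit
]

print(round(character_accuracy(pairs), 3))  # 0.946
print(field_accuracy(pairs))                # 0.5
```

Two single-character slips barely dent the character score, yet half the fields are wrong, which is why a field level target is the stricter measure.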
Theo Popescu: From a reporting standpoint, we not only have the ability to see what automation rate we had at our 99% accuracy target per document, but it wraps that up into the total submission automation. You can report on weekly, monthly, yearly. You really get that granular view on what you are achieving in actuality through the system. We’ll dive into one of the documents here. First one, we can see Thomas Edison. I’ve transcribed it. For audit purposes, it says that it’s transcribed by me. But everything else on this page has been machined. If you were trying to do OCRing on this, best effort would be problematic. Only with machine learning and AI can you look at it because the machine looks at it as a human does. It has a confidence on what the word looks like. We can see how well it’s doing there with handwriting check boxes, mark true or false. Again, we’re looking at the whole address. It’s confident that these are numbers at the front. These are not ones. It’s getting that absolutely correct.
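The per-document and per-submission automation rates mentioned here amount to simple arithmetic. A minimal sketch, with made-up field counts and hypothetical names (`automation_rate`, `documents`), might look like this:

```python
def automation_rate(machine_fields, total_fields):
    """Share of fields completed by the machine without human input."""
    return machine_fields / total_fields if total_fields else 0.0


# (document, fields completed by machine, total fields): made-up counts
documents = [
    ("doc1", 19, 20),
    ("doc2", 12, 12),
    ("doc3", 9, 10),
    ("doc4", 8, 8),
]

per_document = {name: automation_rate(m, t) for name, m, t in documents}

# Roll the documents up into one figure for the whole submission.
submission_rate = automation_rate(
    sum(m for _, m, _ in documents),
    sum(t for _, _, t in documents),
)

print(per_document)
print(submission_rate)  # 0.96
```

The submission figure is computed from the pooled field counts rather than by averaging the per-document rates, so documents with more fields carry more weight.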
Theo Popescu: The machine will actually trace down any characters that move outside the box. Hyperscience has a thing called dropout. It’s as if you hold it to the light. The visual page classifier looks at it in the light and it can actually take away and destruct the original PDF, removing all the fixed text that sits behind and only leaving the new information that’s there. We can see stamps blurred. But because we’ve got this dropout, we’re just left with the numbers. And because we look in and around it, it will find the number. Not only does it find it, but we can do things like normalization, removing the dashes in this instance.
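The normalization step mentioned at the end, removing the dashes from a recognized number, is easy to picture in code. This is only a sketch of the idea; `normalize_number` is a hypothetical helper, not a Hyperscience function.

```python
import re


def normalize_number(raw: str) -> str:
    """Strip dashes and whitespace from a transcribed number, keeping the rest."""
    return re.sub(r"[\s\-]", "", raw)


print(normalize_number("123-45-6789"))  # 123456789
```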
Theo Popescu: Signatures can be marked true or false. We’re not doing any signature matching, but this image can be used as a snippet included in a custom supervision task where a human can visually inspect those signatures. Looking at the document that was de-skewed using our visual page classifier, immediately we can see that the crossed out information there isn’t even recognized. Part of this human intent recognition is what Hyperscience has at its core. It essentially says, I know what to recognize and I know what not to recognize. Other systems may try and recognize this to some degree just because there’s some information there.
Theo Popescu: Looking at our really poor quality document, this is the one that really brings home how well we do. Apart from just recognizing the handwriting here and getting that whole field, forgetting about character by character, look at the ‘Y’ dropping down below: it would normally be read as 4 6 1 here. But it understands that this belongs to the ‘Y’ above and doesn’t even recognize that as a ‘Y’ in that space. One of the best examples is something like 1 1 1 i l l, they’re all the same strokes. How do you identify what that actually is? But because we’re using the address data type, it understands these are numbers, these are more than likely words, and it’s confident enough to give us Illinois, not Illinois, that’s a T there, very accurately.
Theo Popescu: The last one there, which was our Australian taxation form, you’ll see that there’s four pages of information prior to reaching the actual one page of information that the customer will fill in. It’ll recognize page by page. It doesn’t matter whether they scan all the pages or one page, it will classify each page and marry it up correctly against that page. One good example that I like to show, I’ve actually mapped two different data types. One’s character by character. In this instance it’s C T O. But when I look at the email address data type I’ve applied, it actually says I’m more confident that this is actually the word crocodile and not character by character. So we can see how there’s an improvement on applying the correct data type.
Theo Popescu: We spoke about our flows. That’s standard IDP flow: inputs, machine classification, identification, transcription, hitting our targeted accuracy of 99%. What Hyperscience has the ability of doing is take that concept of flows and now build out a tailored flow. So if we are looking at an insurance claim, we can classify the documents we want to have and the documents we don’t know about. We can run that through a full page transcription and run things like sentiment analysis. In any case, at this point, we now have 99% accurate data to do something with. And this is where the real power comes into play over and above any type of legacy automation. We can now take that for instance, policy holder, do a database lookup, validate that policy number. Is it a customer? Do they exist? Yes, they do. Great. Let’s pull down the name and address and let’s validate that against the application that they’ve put through.
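The lookup-and-validate step in that tailored flow can be sketched as below. Everything here is an assumption made for illustration: the in-memory `POLICIES` dictionary stands in for a real database, and the field names and matching rule (case and whitespace insensitive) are invented, not taken from the platform.

```python
# Hypothetical in-memory stand-in for a policy database.
POLICIES = {
    "POL-1001": {"name": "Jane Citizen", "address": "1 Example St, Sydney NSW 2000"},
}


def _canon(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def validate_claim(extracted: dict, policies: dict = POLICIES) -> dict:
    """Look up the policy number, then compare name and address with the record."""
    record = policies.get(extracted["policy_number"])
    if record is None:
        return {"policy_found": False, "mismatches": []}
    mismatches = [
        key for key in ("name", "address")
        if _canon(extracted[key]) != _canon(record[key])
    ]
    return {"policy_found": True, "mismatches": mismatches}


claim = {
    "policy_number": "POL-1001",
    "name": "jane  citizen",
    "address": "1 Example St, Sydney NSW 2000",
}
print(validate_claim(claim))  # {'policy_found': True, 'mismatches': []}
```

A claim with an unknown policy number or a mismatched field could then be routed to a human rather than passed downstream.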
Theo Popescu: We can look at totally unstructured. We can do something like redaction. I’ve got the Australian tax file number written here. This is one number different, it’s not a valid tax file number. And then another valid tax file number here. We’re able to do something like redaction so we can redact any field. In this instance we’re redacting only the valid tax file numbers that have been recognized on this page.
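As a rough sketch of redacting only valid tax file numbers, the snippet below finds nine-digit candidates in text, checks each one, and masks those that pass. The webinar does not say how the platform validates a number; the weighted modulus-11 check used here is the commonly published check-digit rule for nine-digit TFNs and is an assumption, as are the function names and the sample text.

```python
import re

# Commonly published weights for the nine-digit TFN check (assumed, see above).
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)

# Nine digits, optionally grouped in threes with spaces or dashes.
TFN_PATTERN = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b")


def is_valid_tfn(candidate: str) -> bool:
    """True if the weighted digit sum of a nine-digit number is divisible by 11."""
    digits = re.sub(r"\D", "", candidate)
    if len(digits) != 9:
        return False
    return sum(int(d) * w for d, w in zip(digits, TFN_WEIGHTS)) % 11 == 0


def redact_valid_tfns(text: str) -> str:
    """Mask candidates that pass the check; leave invalid ones untouched."""
    def mask(match: re.Match) -> str:
        return "[REDACTED]" if is_valid_tfn(match.group()) else match.group()
    return TFN_PATTERN.sub(mask, text)


text = "TFN 123 456 782, mistyped as 123 456 783."
print(redact_valid_tfns(text))  # TFN [REDACTED], mistyped as 123 456 783.
```

The second number differs by one digit, fails the check, and is left visible, mirroring the behaviour shown in the demo.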
Andrew Dunkin: We’ll now go over to the live Q&A. The first question that’s being asked here is: do we need to train for every document type that we want to automate?
Theo Popescu: No. And that’s the beautiful part of Hyperscience: bring any document. With our semi-structured, once we show it what payslips look like, it will automatically then know what payslips look like. So it’ll have a confidence score on whether or not it is a payslip, and it’ll have a confidence score on how to pull out the fields required from that. And then as we process those through, it’ll only get better over time. So no, you don’t need to templatize every single document.
Andrew Dunkin: The next one here is: I’m currently using Kofax. Isn’t this the same? If not, what are the major differences?
Theo Popescu: I think you’re looking at legacy Kofax, it’s really built around that OCR play. You can already very quickly see it’s not just about the transcription. We fill in all the gaps where legacy OCR takes place and can extract better than anyone around that poor quality handwriting as well as normal documents, but also then fill in that workflow area to be able to do totally unstructured text classification and unstructured text extraction. Those code blocks that we saw there, you’re able to build that as custom code blocks as well.
Andrew Dunkin: This one’s a bit of a “how long’s a piece of string,” I think, but how long does an implementation take?
Theo Popescu: It is “how long’s a piece of string,” but effectively when we go out, the longest part is typically on the client side, all the diligence around security because we’re embedding in their environment. More often than not, it’s really around getting pushed through to that integration. The actual creation of layouts and developing the models is very quick, provided all the data’s there. You can do that in a matter of days or weeks, but we typically allow a couple of months there to go through that end to end.
Andrew Dunkin: What does ChatGPT mean to Hyperscience? Is there any relation or dependencies?
Theo Popescu: I think that’s very topical at the moment. ChatGPT does a great job, but how accurate is it? We’ve seen actually in recent times that it can lie, it doesn’t always give you the correct answer. Hyperscience is SLA based. You can actually set that 99% accuracy SLA, target it, and know that you’re getting the accuracy and automation behind that. We are investigating even using things like GPT-4 and how to better that path. But the real structure is this overarching view of what Hyperscience has.
Andrew Dunkin: Is there a limitation such as file size, text character, and language in the transcription process?
Theo Popescu: Not really. In the latest version, when we’re looking at totally unstructured, we can go into quite large file sizes. There obviously is a limit on memory and other things. But certainly throwing in large files with hundreds of pages is totally fine. Languages, we work in many different languages, including Arabic and Korean. We have CJK on printed text as well, along with Spanish, Italian, and all the Latin languages. I think there’s no limitation on text character either.
Andrew Dunkin: How would we start looking at Hyperscience and validating that it can do what it says it can?
Theo Popescu: Part of our process is to go through and do a technical deep dive on what the use case is, what the business does, what are you trying to integrate with. The next part is let’s run what we call a technical validation event. Let’s run through a proof of concept on that and make sure that we hit those requirements. It’s very easy. Reaching out to a contact within Hyperscience is a very easy conversation to have and start to go down the path of a deep dive.
Andrew Dunkin: If there’s no further questions, we might just call the webinar. Just keep an eye on your inbox over the next 24 hours as we’ll send out a recording of the session. Thank you for your time. Take care.