Orchestrating Document Automation with Unprecedented Understanding, Speed, and Modularity
AI is transforming how businesses operate—but when it comes to document processing, not all AI is created equal. Large language models (LLMs) and Agentic AI offer massive potential, but many organizations are struggling to harness these tools in a way that truly delivers results.
See how Hyperscience is redefining Intelligent Document Processing (IDP) with a modular, composable approach that drives smarter automation, faster processing, and greater accuracy.
You’ll get a firsthand look at the latest capabilities of the Hyperscience Hypercell, including:
- Understanding: VLMs, full-page transcription, and GenAI post-processing with minimal training
- Speed: Improved accuracy, streamlined error handling, and scalable task management
- Modularity: Flexible deployment and customizable workflows
Xabi Ormazabal: Hello everyone. Thank you so much for joining us for our webinar today. We’ll be talking about these key themes around understanding, speed, and modularity as a key part of our Hyperscience story. My name is Xabi Ormazabal. I lead product marketing here at Hyperscience, and I’m joined by my esteemed colleagues, Priya Chakravarthi, Kaloyan Kapralov, and Rich Mautino.
Xabi Ormazabal: Today’s agenda: we’ll be covering an introduction to Hyperscience for those of you who may already be familiar with us and those of you who are new to Hyperscience. Then we’ll do a little bit of a fireside chat format where I talk through these themes of understanding, speed, and modularity with Priya and Kaloyan. Then we’ll have a live demo from Rich, and we’ll have a little bit of time at the end for some Q&A.
Xabi Ormazabal: Hyperscience was founded in 2014. We have about 10 years of proud machine learning innovation under our belt. We really excel in highly regulated environments. Think about government, financial services, insurance, health and life sciences, and transportation and logistics companies. We focus on the orchestration pipeline for document processing, and that’s the value we bring to these organizations. Another key part of what we do is the diversity of infrastructure that we service. One of our recent innovations is our FedRAMP High authorization for our US federal government customers, alongside a lot of other capabilities for on-prem, private cloud, SaaS, et cetera.
Xabi Ormazabal: We know that large language models and GenAI are really dominating the conversation around AI right now. But we also know that many organizations today are not fully ready and able to apply these technologies. While over half claim that they have a lot of confidence in their AI strategies, when it gets into the nuts and bolts around data quality, accessibility, and other elements of AI readiness, companies actually tend to struggle getting to that step. What we also know is that a lot of these monolithic AI and large language models can often fall short on accuracy. While we would love to just throw our documents at an LLM and get an output, we often find that accuracy suffers, and that causes a lot of excessive downstream complexity and extra work. Smaller, built-for-purpose trained models can reduce your inference costs and also improve your response latency. As our CTO, Brian Weiss, likes to say, “why would you take a helicopter to cross the road?” Often it’s really about picking the right tool for the job.
Xabi Ormazabal: In our approach to AI, we have two main thoughts. One is that we have agentic experiences in the context of intelligent document processing, where our system can autonomously make API calls, break down tasks into smaller subtasks, execute them, and report back. But more importantly, it can be goal-oriented. You can give it goals in terms of accuracy, automation, cost, and compliance, and it’s able to resolve them and produce the outcomes that your organization requires. The other piece to this is generating what we call RAG-ready data sets from your document workflows. That’s a capability we’ll touch on a little bit with our full page transcription, which is really powering this. We also call this Hypercell for GenAI.
Xabi Ormazabal: When we talk about Hyperscience and our Hypercell platform, it has a number of different elements, and we really talk about how it’s modular and composable. At the heart of what Hypercell is all about is our core models. These are pre-trained models for pre-processing, de-skewing, extracting, and also post-processing documents in order to power your workflow automations and your orchestration workflows. Then we have a group of additional capabilities around this. Our human in the loop, or HITL, capabilities allow you to fine-tune models, do quality assurance, and perform other tasks. Our blocks and flows architecture allows you to design those document workflows in a very modular way and invoke the different elements of the platform that you require. This is all bound and supported by reporting and admin to take the platform even further and to really support the robust outcomes that your business requires.
Xabi Ormazabal: Building upon this, when we talk about the Hyperscience Hypercell, we’re talking ultimately about the Kubernetes of IDP. For the non-technical audience, this really emphasizes the modular and composable nature of our architecture. It is an orchestration layer for AI-driven document processing: a flexible, intelligent automation platform that can scale very effectively. We have those modular components, blocks and flows, that can build this intelligent automation fabric. It is a fully agentic approach that can drive outcomes to the parameters that are important to you. This results in a lot of key benefits for our customers around flexibility, efficiency, precision, and other key outcomes. We’ll talk about three pillars: understanding, speed, and modularity. Under understanding, we’ll talk about how to leverage more powerful models with less risk, and how to empower knowledge worker post-processing with GenAI through an exciting new feature called Document Chat. We’ll also talk about how you can optimize cost and reduce complexity with a number of different elements in our orchestration, as well as manage variability in document processes. In terms of speed, we’ll talk about applying accuracy where you need it at the field level, and also how we manage error handling and the setup of operational tasks. Finally, modularity is key around the ways we can allow you to deploy more flexibly with different infrastructure options. So with that, we’re gonna kick into the fireside chat. Priya, tell us a little bit about yourself and your role at Hyperscience.
Priya Chakravarthi: I’ve been at Hyperscience for about three and a half years, and I’m the director of product for applications with IDP being our primary application. In this role, I’m also responsible for some 30 plus ML models that underpin the application, their lifecycle and all of that fun stuff.
Xabi Ormazabal: I know that in the understanding pillar, the vision language models are a huge part. Can you tell us a little bit about what ORCA is and what you are most excited about with this new capability?
Priya Chakravarthi: ORCA essentially stands for Optical Reasoning and Cognition Agent. It’s Hyperscience’s proprietary, no-shot VLM (Vision Language Model) that works for documents and requires no training. ORCA is Hyperscience’s response to customers stating that they lack the documents to train models, and it solves for the cold start problem. ORCA works well on semi-structured and visually complex documents, including those that contain images and text. You don’t have to, but if you choose to, you can fine-tune ORCA with as few as 15 to 40 documents. ORCA is the only VLM that includes built-in human in the loop QA, accuracy reporting, and all of those known and loved guardrails that are available to our other models, all integrated into our business user interface. So you don’t have to stitch it together yourself. ORCA, being a VLM, does need a GPU to run inferences.
Xabi Ormazabal: I know we have a pretty unique use case here. Can you explain what we’re looking at?
Priya Chakravarthi: I chose this example because it’s a common document type that we see in applications like claims processing. However, customers rarely have enough death certificates to train models that account for their variability. Look at this image on the screen. The label “cause of death” is vertical, whereas the causes are scrawled across the document in different sections. We ran this document through ORCA and found that it can correlate the vertical text with the different sections without any problems. It can also collate all the causes of death without needing any custom code development.
Xabi Ormazabal: I know that besides these advancements that we’ve had in our vision language models, we also have our standard model. Before we get into that, I’d like to introduce Kaloyan Kapralov. Kaloyan, tell us a little bit about yourself and your role at Hyperscience.
Kaloyan Kapralov: I’m Kaloyan Kapralov. I have been with Hyperscience for a little over three and a half years now. I’m a senior product manager in the platform pillar where I’m focusing on SaaS, all other deployment options besides SaaS, and orchestration.
Xabi Ormazabal: When we were talking in the understanding space about the VLM capabilities, tell me about what’s happening with OCR as well.
Kaloyan Kapralov: OICR rolls off the tongue almost like OCR, but it is different. It is our optical intelligent character recognition model. It’s a proprietary ML-based OCR technology that we use for entity extraction, but it not only does entity extraction, it also supports a pixel-level grab of unstructured segments from a page, and we call this full page transcription, or FPT for short. Now you can use FPT with quality assurance and built-in accuracy reporting. Our design partner for this feature is the Veterans Affairs Administration, who are using full page transcription to send text from their veterans’ claims downstream for analytics purposes at huge volumes. However, the biggest use case for this in the wild is that it allows GenAI models to be seeded with the unstructured data of your enterprise at the accuracy that you expect. Imagine extracting data from documents and storing it in a vector database to seed your LLM. This is what this is about.
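To make the “RAG-ready data” idea concrete, here is a minimal, hypothetical sketch of what seeding a retrieval index with FPT output could look like downstream of the platform. The page payloads, the bag-of-words stand-in for an embedding model, and the in-memory index are illustrative assumptions, not the Hyperscience output schema or API:

```python
# Hypothetical sketch: index full-page transcription (FPT) output for RAG.
# The page payloads and the embedding function are illustrative stand-ins,
# not Hyperscience's actual output schema or API.
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    page: int
    text: str

def embed(text: str) -> Counter:
    # Stand-in embedding: bag-of-words term counts. A real pipeline would
    # call an embedding model and store the vectors in a vector database.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Pretend these paragraphs came out of FPT with QA already applied.
index = [
    Chunk("claim-001", 1, "Veteran reports knee injury sustained in 2019."),
    Chunk("claim-001", 2, "Physician notes limited range of motion in left knee."),
]

def retrieve(question: str, k: int = 1) -> list[Chunk]:
    q = embed(question)
    return sorted(index, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]

for hit in retrieve("What injury is documented?"):
    print(f"{hit.doc_id} p.{hit.page}: {hit.text}")
```

In a production pipeline the stand-in embedding would be replaced by a real embedding model and the list by a vector database, but the shape of the data, page-level text with document and page identifiers, is what FPT provides.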
Xabi Ormazabal: Kaloyan, tell us a little bit about what we’re looking at here in this example with FPT QA.
Kaloyan Kapralov: This is exactly full page transcription QA in action. Usually, when using entity extraction with OCR, the keyer has to type in the text during the QA process. However, typing whole paragraphs is slow and cumbersome. So instead we’re introducing a “thumbs up, thumbs down”, or rather “correct, incorrect”, paradigm that the keyer can select using keystrokes, which makes keying much faster, so they can get through a lot more text in the same timeframe. We’re also introducing a new FPT QA report in the reporting section that shows you exactly the accuracy of the extracted data. So you can now be sure about the data that you’re using to seed your LLM. You get good data in, you’re able to get good data out.
Xabi Ormazabal: Besides the really easy thumbs up, thumbs down paradigm with the FPT QA, there’s more happening with our human in the loop capabilities. Is that right?
Kaloyan Kapralov: Yes. We are extending our human in the loop capabilities. Human in the loop is really a catchall phrase; the platform has several human in the loop interfaces for different personas. On the left of the slide here, you’re seeing the keyer human in the loop, also called supervision. That interface is used for supervising the model output and typing in correct data to guide the model when the model isn’t sure. In the center we have something that process owners can use called custom supervision, which is really powerful for making decisions. In the right-hand side panel, users can make different decisions, and we can expose information that has been gathered with ML models or with business rules to inform those decisions. This brings us to the human in the loop functionality that we’ve recently launched, which is for knowledge workers or subject matter experts, who can now interact with a workflow and use a custom human in the loop interface that integrates a large language model to chat with their document.
Xabi Ormazabal: This kind of leads us to our Document Chat. Can you tell me a little bit more about this?
Kaloyan Kapralov: This is a feature we are calling Document Chat, in which knowledge workers can ask validating questions in plain text to the LLM and get a response back along with citations of where the answers actually came from. It’s very business user friendly, and it’s fully integrated within the supervision experience, so it’s not a standalone chatbot. It allows for searching across all of the documents that are part of the submission. For instance, if you have a patient file with multiple hospital visits, you can ask for a summary of the number of hospital visits. Because it is powered by a large language model or a vision language model, this functionality allows for post-extraction solutions like summarization, sentiment analysis, and generative search. It is very powerful in cases where there is a need for some interpretation in order to make a decision on a particular document, because ultimately it allows customers to consolidate their downstream tools and do more of their decision making within Hyperscience.
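As an illustration of the citation pattern described here, this is a hypothetical sketch of how an answer can be grounded in page-level citations. The page texts and the call_llm stub are stand-ins for illustration; Document Chat itself runs inside the Hyperscience supervision interface:

```python
# Hypothetical sketch of citation-grounded document chat. The page data and
# the call_llm stub are illustrative; a real deployment would call an LLM/VLM
# provider and render the citations in the supervision UI.
pages = {
    3: "Patient admitted 2024-01-12 for observation after a fall.",
    7: "Follow-up hospital visit on 2024-02-03; discharged same day.",
}

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM/VLM call (e.g. via a provider SDK).
    return "Two hospital visits are documented. [p.3][p.7]"

def ask(question: str) -> tuple[str, list[int]]:
    # Tag each page of context so the model can cite where its answer came from.
    context = "\n".join(f"[p.{n}] {text}" for n, text in pages.items())
    answer = call_llm(
        f"Answer using only the context below and cite pages.\n{context}\n\nQ: {question}"
    )
    cited = [n for n in pages if f"[p.{n}]" in answer]
    return answer, cited

answer, citations = ask("How many hospital visits does the file show?")
print(answer, "| cited pages:", citations)
```

Tagging each page of context and asking the model to cite those tags is what lets a reviewer jump straight to the source of an answer rather than trusting it blindly.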
Xabi Ormazabal: Priya, there are also a few more capabilities that we’re gonna talk about within the understanding pillar. So ORCA and OCR are great ML models we have in Hyperscience. How do these two models and others come together?
Priya Chakravarthi: Hyperscience provides a composable model strategy, or architecture, that supports both proprietary models like ORCA and third-party LLMs like Claude, ChatGPT, Gemini, name your model. Within the product itself, our platform is open in that it supports multimodal orchestration no matter what the model is, providing custom supervision capabilities for third-party models and accuracy harnessing, fine-tuning, QA, and reporting for our proprietary models.
Xabi Ormazabal: It’s really exciting to see this flexibility and modularity leveraging LLM and proprietary models, but we also have new capabilities for enhancing extraction and classification as well. Is that right?
Priya Chakravarthi: That’s right. We continue to support more and more complex use cases that solve for the long tail of extraction and classification problems with our proprietary deep learning models. For instance, we recently launched auto splitting. Auto splitting solves for the fact that when customers scan a box of documents into Hyperscience from their back offices, they will often scan in back-to-back bank statements or identity documents in the same file. Auto splitting essentially allows our models to know when a document begins and when it ends without the customers having to do this manually. Similarly, another feature we launched was an update to our drift management capabilities, which allows for the identification and creation of variations on the fly. Customers now don’t need to have all the variations of a form before they begin their automation on Hyperscience.
Xabi Ormazabal: Now we’re gonna pivot into the second pillar, which is around speed. And I know we’re gonna talk a little about field level accuracy targets. Priya, can you tell me a little bit about this capability and how it provides faster deployment and document processing?
Priya Chakravarthi: The point here is that all fields are not created equal. You as a business owner know which fields need error-proof extraction and which fields are not that important. For instance, a form that arrives with a blank SSN may be disruptive downstream, whereas an email address may not be that important. Bumping up the accuracy target for SSN means that over time more and more of these fields will show up for supervision, raising their accuracy. Now we have field level accuracy targets across the spectrum of documents that we support, even for semi-structured and unstructured documents.
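To illustrate the routing behavior this implies, here is a minimal, hypothetical sketch: a field whose extraction confidence falls below the cutoff implied by its accuracy target is sent to supervision instead of being auto-accepted. The field names, targets, and the target-to-cutoff mapping are made-up assumptions, not Hyperscience’s actual calibration:

```python
# Hypothetical sketch of per-field accuracy targets. Fields with a higher
# target get a stricter confidence cutoff, so more low-confidence extractions
# are routed to human supervision. The numbers are illustrative only.
ACCURACY_TARGETS = {"ssn": 0.999, "email": 0.90}

def threshold(target: float) -> float:
    # Stand-in mapping from accuracy target to confidence cutoff; a real
    # system calibrates this against measured model accuracy.
    return target

def route(field: str, value: str, confidence: float) -> str:
    cutoff = threshold(ACCURACY_TARGETS.get(field, 0.95))
    return "auto-accept" if confidence >= cutoff else "send to supervision"

print(route("ssn", "123-45-6789", confidence=0.97))      # send to supervision
print(route("email", "a@example.com", confidence=0.97))  # auto-accept
```

The effect is what Priya describes: a stricter target on SSN pulls more SSN fields into human review, while less critical fields keep flowing straight through.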
Xabi Ormazabal: There are some new updates on automatic error handling in other areas. Can you walk us through that?
Priya Chakravarthi: This just goes to show that we’re not above sweating the small stuff, especially if it’ll give our customers operational efficiency that directly translates to speed. Previously, customers had to manually strip out signatures and images that come attached to emails before processing them on Hyperscience. Now they can do this automatically by using a setting on Hyperscience. We also allow for a default way to send normalization errors and missing and blank fields to supervision. Operational efficiency and avoiding downstream errors are the themes that we are focusing on that contribute to the concept of speed.
Xabi Ormazabal: So far we’ve talked about understanding and speed, and now we’re on to our last but not least pillar, modularity. Kaloyan, would you like to tell us a little bit about what’s exciting in this new space?
Kaloyan Kapralov: Our modular infrastructure gives you more choice for where your Hypercell is deployed. On this slide, you’re seeing a reference architecture that’s built around the Google stack, but you could also imagine it for the other hyperscalers. Starting from the bottom, you have the infrastructure layer. We have Google-based SaaS, of course you can always deploy in your private cloud, and later this year we have Google Distributed Cloud Connected (GDCC), which is Google’s on-premise offering where a physical appliance sits at the customer premises.
Kaloyan Kapralov: Next up you have integrations. Because we know that data lives across different environments, our platform integrates directly with the most commonly used cloud storage services. We have Google Cloud Storage, but we also have an Azure Blob Storage listener coming out imminently. And we already have an Amazon S3 connector that’s functional, which means that you can seamlessly connect to where your documents already reside. We streamline ingestion, minimize data movement, simplify the setup of Hyperscience, and ultimately shorten the time to value. The dedicated native blocks make it easy to connect Hyperscience to the services of your hyperscaler of choice. So whether you want to use Gemini or ChatGPT, you can do that easily.
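For a sense of what such a storage listener does, here is a simplified, hypothetical sketch that polls an Amazon S3 prefix for newly arrived documents and hands them off for ingestion. The bucket, prefix, and ingest function are illustrative assumptions, not the Hyperscience connector API:

```python
# Hypothetical sketch of a cloud-storage "listener": poll an S3 prefix for
# newly arrived documents and hand them to an ingestion step. Bucket, prefix,
# and ingest() are illustrative stand-ins, not the Hyperscience connector.
import time
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are configured
BUCKET, PREFIX = "incoming-documents", "mortgage/"

def ingest(key: str) -> None:
    print(f"submitting {key} for processing")  # stand-in for real ingestion

def poll_forever(interval_s: int = 30) -> None:
    seen: set[str] = set()
    while True:
        resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
        for obj in resp.get("Contents", []):
            if obj["Key"] not in seen:
                seen.add(obj["Key"])
                ingest(obj["Key"])
        time.sleep(interval_s)
```

The native connectors handle this inside the platform, so documents are picked up from where they already live without a custom script like this.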
Kaloyan Kapralov: Finally, on the top, you see the marketplace layer, with which we make it easier for you to get started with Hyperscience through the cloud providers’ marketplaces. Whether you’re using AWS or GCP, the Hypercell can be purchased directly from your cloud provider of choice, which allows you to take advantage of pre-committed cloud spend: a faster, easier way to purchase without additional procurement friction.
Xabi Ormazabal: I definitely see the future is multi-cloud and we’re definitely getting a lot of these requests from our customers that are helping us continually evolve on the modularity and infrastructure perspective. Now we’re gonna pivot to our demo. Before I do that, Rich, would you like to introduce yourself?
Rich Mautino: My name’s Rich. I lead the sales engineering team for North America. I’m based in Austin, Texas. What I’m about to show you reflects several years’ worth of enhancements, and a lot of those have come from customer input. My team’s goal is to listen to our customers, understand the problems they’re trying to solve, and deliver real-world impact. We’re gonna show auto splitting capabilities, where something big can come in all lumped together and we identify and split it out automatically. Field level accuracy targets: there may be specific fields within a document that are more important than others, so you can set those targets accordingly. The chat-with-my-document capability is great, but what’s an answer without knowing where that answer came from? So I’ll show you the citations. Full page transcription QA is how we ensure accuracy is retained at scale, and then we’ll also be showing VLM capabilities as well.
Rich Mautino: The scenario here is going to be a mortgage application. I’m not going to go through the whole journey, but I’m gonna take stops along the way that show these capabilities in action. We’ve got a hypothetical applicant who works for Samsung and just recently transferred from an overseas office. The last time they went through this process, it was a nightmare. There were missing documents, things that were lumped together got flagged as missing, and none of it was automatic. This time, however, the applicant is using a bank that’s using Hyperscience, and the capabilities in this demo will show you how high levels of accuracy and automation weave together to ultimately reduce the risk to all parties and create a better experience.
Rich Mautino: We’re gonna start with a normal email here. Some people send in an application all organized, with everything labeled nice and neat. Other times people scan it all at once and send it in. The applicant’s got the application itself, and the quality’s not so great. Some of these pages are out of order, and some of them are actually flipped upside down. We’ve got multiples of certain documents, whether it’s a pay stub or a bank statement. Historically there hasn’t been an easy way of knowing where to split things out automatically. What I’ll show you with auto splitting is how we handle this: in this case you can see there’s a bank statement that is one page, and then there’s another bank statement here that’s multiple pages, and the system knows that this is in fact multiple bank statements and not all one document. We’ve also got our W2s here, and the whole packet comes in along with a very lengthy home inspection report.
Rich Mautino: What happens when this applicant sends it in is that Hyperscience receives it as a submission. You can see here how we automatically ingest it and split it all out: we’ve got a 2023 W2, a 2024 W2, the application itself, two bank statements, one of which is two pages and the other one page, and the 33-page home inspection. The auto splitting, right out of the gate, has fixed a problem that a lot of customers tell us they spend quite a bit of time on. A ton of work goes into simply identifying what things are and splitting them out accordingly.
Rich Mautino: Getting into the field level accuracy targets. That application’s been received, and as we know, there’s a lot of information on this application. All of it’s important, but some of it’s more important than others. So the ability to come in and set field level accuracy targets is crucial. In this case, if you’re a bank, you may say the ending balance is really, really important. If we get this wrong, it messes everything up, because if the ending balance for the bank account isn’t enough to cover the down payment, we come to a screeching halt. So I can take the bank statement and say, I wanna add this particular field, only the ending balance, and I want to set a 99.9% accuracy target. That way, when the submission comes through, if there is an issue around it, we can be brought straight to that balance. If there’s any doubt or uncertainty around it, we have the ability to go in and correct it so that the mistake doesn’t happen downstream.
Rich Mautino: Moving into the custom supervision experience, what’s new here is not only the chat-with-my-documents capability, but also the citations. Rather than having to do a Ctrl+F all the way through these documents, let’s say we wanted to know whether there’s any rotting present in this house, and there are 33 pages here. Historically, an analyst has to go and do a Ctrl+F on a bunch of different keywords. In this case, you can see I asked Hyperscience, “Hey, is there any rotting in the house?” And I got a good answer, but furthermore it shows me where it found its answers, so that I don’t have to worry about hallucinations or bad data entering downstream. It found four instances of wood rot, and you can see I’ve got the ability to show where those citations are, and it brings me straight there.
Rich Mautino: Take the capability of looking for an answer to “are there any plumbing issues?” Normally we would have to go through the table of contents, say, okay, we’ve got plumbing here, it’s on page 15, and scroll through. In this case, I can actually get very granular and accurate results; it brings me straight to the answer and shows me where it finds issues, including the ability to show it with photos. Very powerful capability.
Rich Mautino: How do we know that accuracy is maintained over a long period of time? With full page transcription, we now have that QA task available here. After the submission’s been fully processed, I can go through and make sure that what we’re ending up with downstream is correct. It provides me with guidance here, and it allows me to very quickly go through and validate that things are correct. It also makes sure that things like those images aren’t being mistaken for something that they’re not, providing better QA for subsequent submissions.
Rich Mautino: Moving into my favorite part, which is the VLM. In that home inspection report there were some images. There were also instances where handwriting is present in the margin. Being able to take another pass and run this through a VLM is incredibly powerful, because you can find things that you wouldn’t be able to get to without taking this approach. You can see here on the right I’ve asked it, “are there any handwritten notes?” And in this case it says yes, there’s a handwritten note that says there’s a small crack, but it’s actually very important due to the location. This is something that normally would be missed and just not present. Maybe the inspector did the initial report, then consulted with an expert, later found this answer, and wrote it in. We can also ask it to take a look at some of these photos. Maybe the inspector included a photo of something that they thought was self-explanatory but didn’t actually write about in the report. Asking, “Hey, is there any rust or corrosion here?” it not only finds the rust or corrosion, but also pulls in the context of where it is. If you’re a bank and you maybe otherwise would’ve approved this, but now you see there’s a big problem with the foundation, you need to do a supplemental review.
Rich Mautino: Moving into another use case, a completely different scenario now would be assessing purely a photo with no text whatsoever. Let’s say there was an accident at a ski resort and someone got injured while they were skiing, and there’s a photo from one of the cameras that’s part of the report, but there’s no actual context. The insurance company in this scenario would be able to just take this photo and use a VLM to determine key facts and findings without needing to have a human come through and do this. Is the subject using skis or a snowboard? Alright, they’re on skis. What are the conditions? Is it a storm, is it clear sky? Is it nighttime? And are there any trees in the vicinity of the ski trail? You see, this is just a photo, no text written in the report, and we’re able to get very accurate, highly automated findings from this.
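As a rough illustration of this kind of image-only questioning, here is a hypothetical sketch that sends a photo and a list of questions to a vision-capable model through an OpenAI-compatible endpoint. The model name, image URL, and questions are assumptions for illustration; in Hyperscience this would run through the platform’s VLM and LLM blocks rather than a standalone script:

```python
# Hypothetical sketch of image-only VLM analysis via an OpenAI-compatible
# vision endpoint. Model, image URL, and questions are illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
IMAGE_URL = "https://example.com/ski-incident.jpg"  # hypothetical camera still

questions = [
    "Is the subject using skis or a snowboard?",
    "What are the weather and light conditions?",
    "Are there any trees in the vicinity of the ski trail?",
]

for q in questions:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": q},
                {"type": "image_url", "image_url": {"url": IMAGE_URL}},
            ],
        }],
    )
    print(q, "->", resp.choices[0].message.content)
```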
Rich Mautino: Another use case would be a mailroom sorting scenario. Rather than having humans come in and sort all of this, we can simply upload a letter and ask it, “Hey, do we have any indication on the letter of what’s attached?” In this case, okay, it’s a statement. Who’s the sender? It’s American Express. And who’s the recipient? This is just one letter. Imagine doing this at scale. You can see very quickly that the “so what” factor is the ability to solve this routine, monotonous problem accurately and with a high degree of automation.
Xabi Ormazabal: That’s great, Rich. Thanks so much for bringing this to life. Lots of great questions came flowing in. One question that came up a couple of times, Priya, was: we mentioned two of our models, ORCA and OCR. Can you explain when you might use one versus the other and for which types of documents or use cases?
Priya Chakravarthi: You would essentially use ORCA for use cases where you have semi-structured or unstructured data, possibly with multimodal content, because ORCA is multimodal with images and text; where you have variability of formats; and, most importantly, where procuring samples is difficult for you. Our homegrown models like field ID, table ID, et cetera are great when you have documents and the ability to train and optimize these models from scratch. Often you do this because you have really high accuracy requirements, and because they’re CPU-based, they’re also very cost effective. All of these homegrown ML models can also process really high volumes.
Xabi Ormazabal: Another interesting question around our models, for ORCA specifically: how does ORCA handle multiple languages, for example Arabic, and is it possible to take Arabic as an input and output a Hebrew, English, or other language translation?
Priya Chakravarthi: ORCA does work on multiple languages. It has multilingual support. Yes, it works with Arabic as well; Arabic is one of the languages supported. That said, you may wanna speak with your representative to make sure that we’ve tested all of the languages that you wanna work with. As for taking one language as input and outputting another language, given how flexible ORCA is, I think this is something that we would need to test.
Xabi Ormazabal: One question for you, Kaloyan. I assume these new features are SaaS only. Can you explain if they’re SaaS or if they’re available in other deployment modalities?
Kaloyan Kapralov: This is the beauty of Hyperscience: the deployment method doesn’t require you to make sacrifices in terms of functionality. Now, if we’re talking about scaling infrastructure up and down, obviously this is possible in the elastic public or private cloud or on SaaS. But in terms of the functionality that we have presented, this is all available on every deployment option.
Xabi Ormazabal: Which new capabilities are available now versus on the roadmap for the future out of everything we’ve seen today?
Kaloyan Kapralov: Everything is available now, but Google Distributed Cloud Connected (GDCC) is coming later this year. And we have Azure Blob Storage and Google Cloud Storage listeners coming to SaaS first as part of our SaaS-only release, and then to on-premise in the fall of this year.
Xabi Ormazabal: Rich, two questions for you. The first one, based off of your demo: how do you manage data privacy concerns with respect to LLMs learning from our documents and utilizing them for other customers?
Rich Mautino: We do not share or divulge your data elsewhere. You can bring your own LLMs, so that really is fully customizable. We have government entities that use LLMs that are their own, so they control them entirely. In short, we don’t share your data if you don’t want us to. We’ve got a lot of customers that are very sensitive in nature, so privacy is very important to us, and we are FedRAMP High certified.
Xabi Ormazabal: And then another follow-up for you: for the home inspection report in the demo, is the tool also analyzing images to report on the rotting or rust?
Rich Mautino: Yes, with the VLM, you’ve got the ability to actually look at the photos and see what’s going on instead of just the text.
Xabi Ormazabal: There are a couple of questions that came up, Priya, on what is perhaps not the flashiest feature but is the workhorse among them, which is auto splitting. What if the auto splitter rules are not correct? Can the user change the results?
Priya Chakravarthi: Yeah, you could review the output of the results in manual classification and if you wanna split it a different way, you could manually do that.
Xabi Ormazabal: Is auto splitting a retrainable capability? Can we customize based on our document types?
Priya Chakravarthi: No, auto splitting is not retrainable. It is something that runs after classification and you set the rules in regular expressions as part of auto splitting, so it isn’t a trainable model.
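To make that rule-based mechanism concrete, here is a minimal, hypothetical sketch of regex-driven splitting applied to a classified page stream. The boundary rule and the page strings are made-up for illustration, not actual Hyperscience configuration:

```python
# Hypothetical sketch of rule-based auto splitting: after classification,
# a regex marks pages that start a new document, and the page stream is
# split at those boundaries. The rule below is an illustrative example.
import re

NEW_DOC_RULE = re.compile(r"page\s+1\s+of\s+\d+", re.IGNORECASE)

def auto_split(pages: list[str]) -> list[list[str]]:
    documents: list[list[str]] = []
    for page in pages:
        if NEW_DOC_RULE.search(page) or not documents:
            documents.append([])          # boundary: start a new document
        documents[-1].append(page)
    return documents

scanned = [
    "Bank Statement  Page 1 of 2 ...",
    "Bank Statement  Page 2 of 2 ...",
    "Bank Statement  Page 1 of 1 ...",
]
print([len(doc) for doc in auto_split(scanned)])  # [2, 1]
```

In practice the rules are authored in the product against the output of classification, but the principle is the same: a pattern marks where one document ends and the next begins.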
Xabi Ormazabal: One more question on the topic of field level accuracy targets. When using field level accuracy targets under the speed pillar, how do you improve model performance if the accuracy level isn’t met for a specific field?
Priya Chakravarthi: If you’re talking about the field locator model, the process of optimizing is similar to how we optimize accuracy when we do this at the global level and not at the field level. You would look at the output of the model and if it’s incorrect, look at the annotations and the training data provided to the model. If there are any discrepancies in the annotation, you’d fix the annotation, retrain the model and redeploy the model. We provide simple business user tools to perform all of these activities through our model lifecycle management interfaces.
Xabi Ormazabal: This is an interesting one on ORCA and vision language models in general. You mentioned that there’s no training required. However, that model also offers the human in the loop. When would you use human in the loop?
Priya Chakravarthi: Human in the loop is generally invoked by the model: it raises its hand when it is not confident about the accuracy of the output. It’s an option that you select. You can also use ORCA without any training, as a black box. You would use human in the loop when accuracy is important to you and you’re willing to trade a little bit of automation for accuracy, so that your downstream systems get accurate data.
Xabi Ormazabal: That’s all the time we have for questions. If you’re ready to take the next step in your automation journey and want to see our recent innovations from Hyperscience, a really nice summary of all these key features is behind the QR code on the left. And if you wanna talk to one of our experts, like Rich and his team, to dive deeper into your automation strategy or follow up on any of those tough questions, book a time using the QR code on the right. Thank you so much for your time.