Build vs. Buy, or Better Together?
How to assess the true cost of designing and deploying an AI-first IDP strategy
The market is currently flooded with confusion. Hyperscalers are offering ‘easy button’ LLM APIs at seemingly unbeatable prices—often as low as $0.10 per page. But for enterprise leaders, the question remains: Are you buying a solution, or just a box of parts?
Watch Xabi Ormazabal, Chip VonBurg, and Tarakshaya Bhatia as they deconstruct the definitive Build vs. Buy Whitepaper. We move beyond the ‘price per page’ fallacy and look at the hard data. Using detailed internal calculations, technical architecture insights, customer experiences, and side-by-side architectural comparisons, we expose the hidden ‘Dark Green’ costs of the DIY approach—from infrastructure engineering to the maintenance of complex extraction pipelines.
Watch this session to learn how we:
- Cut through the FUD: Understand why the market is confusing ‘Generative AI’ with ‘Enterprise Automation’
- Compare Apples to Oranges: See a line-item breakdown of why a $0.10 API scales to a $2.2M TCO when you add the necessary labor and infrastructure
- Reveal the ‘Dark Green’ Costs: A transparent look at the hard costs of implementation—hiring engineers, managing model drift, and building UI—that hyperscalers leave off the invoice
- Follow the 272% ROI Path: Learn how to achieve a payback period of under six months by leveraging an agentic platform approach
Don’t let hidden costs sink your AI strategy. Learn how to calculate the real price of performance and build the best strategy for your organization.
Xabi Ormazabal: Hello everyone. Thanks so much for joining us. We’re here for our Build versus Buy or Better Together webinar. I lead product marketing here at Hyperscience, and I’m joined by Chip VonBurg, our field CTO, who’s working with a lot of customers and prospects across all areas of our business, and Tarakshaya Bhatia, our Manager of Solutions Architecture, who is focusing on public sector as well as commercial accounts.
Xabi Ormazabal: Today we’ll talk about the current AI landscape and the opportunity for intelligent document processing. We’ll lay the groundwork with the key takeaways from the total cost of ownership study that we’ve done, where we look at the different deployment options you have, the trade-offs, and what those can imply. Then we’ll dig a little deeper into some of the key assumptions and into what we call dark green costs and light green benefits. We don’t want to just talk about the build versus buy decision; the reality is that many organizations today have hybrid environments where they’re already doing some things with hyperscalers. We partner very closely with hyperscalers, and we actually have a “better together” solution. Finally, we’ll close with next steps.
Xabi Ormazabal: In terms of the market context right now for AI solutions and intelligent document processing, LLMs and Gen AI have sucked all the air out of the room. LLMs have lowered the barrier to entry in terms of trialing new capabilities with AI: getting lots of responses to different queries, creating lots of different solutions to things we’re working on, writing code, and a lot of other things that an LLM can provide today. That’s also expanded the expectations of what AI can provide. We’re up against a constantly evolving threshold of what’s possible in the new AI world.
Xabi Ormazabal: Market growth in IDP specifically is really creating a lot of urgency for organizations to buy this technology. Gartner estimates that this year the intelligent document processing market will reach $2.09 billion in revenue. If you look back over the last five years, that’s a compound annual growth rate of about 13% since 2021. It shows interest and investment from many different types of organizations as the category continues to evolve. Originally, many legacy IDP solutions were OCR-based, on-premises, or built on other older technology. Now, with cloud and the adoption of software as a service, IDP is really pushing toward those cloud and integrated SaaS experiences as well. That’s driving a lot of the decisions around how people think about deploying the solution.
Xabi Ormazabal: While Gen AI and large language models are providing a lot of opportunity, there are some higher costs cropping up as people start to leverage these large-scale models such as ChatGPT, Claude, or Gemini: token consumption, being able to predict or forecast usage, et cetera. There are also costs around governance, security, and composability that are specific to intelligent document processing, where you have to stitch together a lot of different elements alongside these Gen AI services to produce a robust, enterprise-grade solution. The surging needs from customers and business owners in these specific IDP use cases are actually demanding a lot of enterprise alignment. It’s a brave new world in terms of determining how to make it all work together to leverage these new capabilities.
Xabi Ormazabal: Intelligent document processing is a multi-step journey. This left to right progression can reflect the industry as a whole, but it can also be the experience within different organizations or even different departments within organizations. Many solutions started out with a baseline of everything needs to be straight-through processing. We want minimal human involvement and field-level accuracy is the top metric that’s most important. What organizations often found is there would still be a lower accuracy level, and it would require manual rework or business process outsourcing. It was a little bit of a false promise.
Xabi Ormazabal: As you move a little bit over to the right, things like classification, identification, and transcription became more important, going to a more detailed understanding of accuracy. It’s not just right or wrong, but there’s a lot of shades of gray when you’re processing many different types of documents and fields. Then getting into things like document intelligence, we talk about a concept here: DVS, data validation score. Being able to pull fields off of a document and validate that they were extracted correctly and that they make sense amongst themselves. For example, a pay stub where you’re trying to pull gross pay and net pay, looking at a year to date and looking at a current pay period. Those kinds of calculations become more and more important. Then also things like change. How do you manage model drift? How do you do training data management? Ultimately, continuous improvement. We know that at the high scale volumes that many organizations operate at, often using human in the loop in an intelligent way will get you to that highest level of accuracy. Every point of accuracy can actually be hundreds of thousands or millions of dollars for some of these large scale organizations.
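The pay-stub example above can be sketched as a handful of cross-field consistency checks. This is a minimal illustration of the kind of logic a data validation score might encode; the field names, checks, and scoring are hypothetical, not Hyperscience's actual DVS implementation.

```python
# Illustrative cross-field validation for an extracted pay stub.
# Field names and checks are hypothetical, not the actual DVS logic.

def pay_stub_validation_score(fields: dict) -> float:
    """Return the fraction of cross-field consistency checks that pass."""
    checks = [
        # Net pay can never exceed gross pay.
        fields["net_pay"] <= fields["gross_pay"],
        # Year-to-date totals must be at least the current period's amounts.
        fields["ytd_gross"] >= fields["gross_pay"],
        fields["ytd_net"] >= fields["net_pay"],
        # All monetary fields should be non-negative.
        all(v >= 0 for v in fields.values()),
    ]
    return sum(checks) / len(checks)

# Example: a stub where YTD net was mis-extracted lower than current net pay,
# so one of the four checks fails.
score = pay_stub_validation_score({
    "gross_pay": 5000.0, "net_pay": 3800.0,
    "ytd_gross": 30000.0, "ytd_net": 2500.0,
})  # → 0.75
```

A low score like this flags the document for human review rather than letting a plausible-looking but inconsistent extraction flow straight through.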
Xabi Ormazabal: When we’re talking about build versus buy, here’s a chart from Gartner in one of their impact briefs around selecting the right implementation approach. You contrast in the middle an end-to-end integrated platform approach, which is what we’re talking about with Hyperscience, versus a composable cloud service-based approach where you build the services yourself from a hyperscaler, you maintain them. It lays out many of the trade-offs you have to consider: the way your workflows are constructed, the models that you use, whether they’re proprietary, whether they’re generalists, whether they can be tuned and refined or not, how does the exception handling work? How does the data flow work, the deployment, the governance and security, the generative AI capabilities, the pricing, et cetera. These are big decisions and there’s a lot of things involved in understanding better.
Xabi Ormazabal: What approach are you using today for your intelligent document processing solution? Tarakshaya, what is your experience? What have you seen in the market in terms of how frequently you come across AWS or Google Cloud or Azure?
Tarakshaya Bhatia: I think it’s ubiquitous. There are very few customers I’ve worked with who don’t work with at least one of the hyperscaler partners. It’s pervasive throughout the enterprise today. And a big part of what we want to talk about today is realizing that this story is ultimately better together. It’s not one or the other. The build versus buy decision is definitely real, but you also have to think about the other side of it as well.
Xabi Ormazabal: In terms of the early responses we’re seeing, it looks pretty down the middle: 50% with AWS and 50% with Microsoft Azure among those who are using a hyperscaler. I will say it’s interesting that we didn’t see any Google Cloud, because I know I’m running into quite a few customers that are using Google as their hyperscaler. So that is an interesting mix.
Xabi Ormazabal: Let’s talk about the return on investment and the total cost of ownership analysis that we built in this build versus buy white paper. We analyzed existing Hyperscience deployments. We looked at the average deployment time and the costs and complexities for organizations processing at least 1 million pages per year. We also did a detailed analysis of the reference architectures from the different hyperscalers: AWS, Google Cloud Platform, and Azure. We looked at their public API costs, and with our solutions architecture team we estimated how many API calls you would have to make for about a million pages a year. We created a “house blend” across those three hyperscalers: a standard costing based on their reference architectures, giving a blended average cost for a medium-complexity use case. We estimated these costs over a five-year timeline, from initial setup and deployment through go-live and maintaining that solution for five years.
Xabi Ormazabal: We’re comparing a solution built on AWS or Azure or Google Cloud versus Hypercell, which is our platform for implementing intelligent document processing. We looked at the technical labor that you might need to stand up a hyperscaler solution: infrastructure engineers, machine learning engineers, security engineers, and project managers. You’ll see the initial setup could take anywhere from four to six months up to a year, depending on how you’re implementing it. Then you have the ongoing technical labor simply to maintain it. The third column is the technical infrastructure: the actual API service calls for those different elements, the infrastructure monitoring that you need to set up, and the way you need to define your deployment pipelines.
Xabi Ormazabal: Contrast that with working with Hyperscience and using an IDP platform out of the box. You don’t need to staff up with initial technical labor just to get started. We have our expert data services, which are about creating the specialized models to process different types of documents. You can leverage our vision language model to identify, classify, and extract from documents without even having to define specific models per document type. Expert flow services are about defining the workflows; we have technical staff that can really help define those for you. During the life cycle of the project, we have ongoing maintenance with regular project health checks. And then of course we do have recurring platform costs by page volume. This is where the big cost difference shows up: using Hyperscience is about $1.6 million cheaper in net present value over that five-year life cycle at 1 million-plus pages per year.
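The five-year NPV comparison described above boils down to discounting two cost streams and taking the difference. Here is a minimal sketch of that calculation; every cash flow and the discount rate are illustrative placeholders, not the actual figures from the white paper.

```python
# Sketch of a five-year NPV cost comparison between a DIY hyperscaler build
# and a packaged platform. All cash flows and the discount rate below are
# hypothetical placeholders, not the white paper's actual numbers.

def npv(cash_flows, rate):
    """Discount a list of yearly cash flows (year 0 first) to present value."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

discount_rate = 0.08

# Year-by-year costs: heavy up-front engineering for DIY, flatter fees for buy.
diy_costs = [900_000, 450_000, 450_000, 450_000, 450_000]   # build + maintain
buy_costs = [250_000, 200_000, 200_000, 200_000, 200_000]   # services + platform

savings = npv(diy_costs, discount_rate) - npv(buy_costs, discount_rate)
```

The point of the structure, rather than the placeholder numbers, is that the DIY option front-loads labor costs that discounting cannot hide, while the platform option spreads flatter fees across the life cycle.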
Chip VonBurg: I think that’s a good breakdown. This is the reality of what it actually takes. Anybody that’s been part of a large project that is a business critical project understands that it does take time, effort, and individuals to make that project a success. And this is a good snapshot of what that is. I will plug the white paper one more time and just say that there’s much more detail in the white paper that gets into what this looks like, but this is a good takeaway of that high level view.
Xabi Ormazabal: How is your current project implemented and maintained? Is it in-house? Is it working with a systems integrator, or a mix of in-house and SIs? Tarakshaya, you’re really close to the deployment side of things. What’s been your experience in terms of these different modalities?
Tarakshaya Bhatia: I think it’s about 50/50 overall. In-house means both from a technical standpoint as well as from a business standpoint. When these two things are working closely in concert together, that is where you see the most success in terms of some of the largest IDP programs that I have seen.
Xabi Ormazabal: Interestingly, it’s actually a fairly small percentage doing just in-house and just a small percentage of just working with a systems integrator, about 14% for each of those options. Where it seems like most people are headed actually is a mix of in-house expertise and systems integrator. About 71% have that mix.
Xabi Ormazabal: Let’s talk a little bit about the light green benefits. These are the somewhat harder-to-quantify elements that we know come with an effective project delivering at the accuracy and performance level you require. Some of these include cost savings due to reduced model training with ORCA, our vision language model. Chip, do you want to explain what ORCA is and what zero-shot training means?
Chip VonBurg: ORCA is our VLM, our vision language model. It’s basically like taking a language model and putting a set of eyeballs on the front of it. It’s a largely pre-trained model, a large language model under the hood, and that gives you a variety of different functionality, zero-shot being one of them. The big idea with zero-shot is to be able to take a document that the system has never seen before, query the model with “what is this document about?” or “what is the value of field X, Y, or Z?”, and see those values come back right away. One of the beauties of Hyperscience is that we have a variety of different models. We have those specialized deterministic models, we also have the ORCA models, and they have different pros and cons associated with them. A big beauty of the platform in general is the ability to mix and match those models based on your needs.
Xabi Ormazabal: While the cost savings around zero-shot training are important, another element is reducing the cost of manual processing with human-in-the-loop and with accuracy harnessing. We know that model accuracy improves through iteration and fine-tuning: as you take some of these models you build, Hyperscience can evolve them very quickly for the different documents you’re working with. We’ve seen higher levels of accuracy with our blended approach of specialized models plus the VLM versus what LLMs and hyperscalers often offer. Tarakshaya, what have you seen in your experience around those accuracy levels?
Tarakshaya Bhatia: There’s a very big difference between having models in specific places doing the work, having the machine perform the work, and then also being able to measure the work. When we think about what we’re doing in IDP, we often think about an input and an output: we’re just getting information and data off a page. But in the majority of situations we need to know, is that data accurate? This is where Hyperscience and the Hypercell excel, because the data accuracy element is built into the platform.
Xabi Ormazabal: The last element I’ll talk about in terms of light green benefits is least-cost routing. Our CTO, Brian Weiss, likes to say, “why use a helicopter to cross the street?” Don’t use the most sophisticated, highest-cost, largest LLM for a simple task that you can do with a specialized model. Our platform is intelligent and allows organizations to route documents to the right model for the job, incurring the least cost while getting the highest level of performance possible. All this together comes out to a net present value of about $263K in our analysis.
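The "helicopter to cross the street" idea can be sketched as a simple routing rule: send each document to the cheapest model capable of handling it. The model names, per-page costs, and complexity scale below are hypothetical illustrations, not the platform's actual routing logic.

```python
# Illustrative least-cost routing: pick the cheapest model whose capability
# covers the document. Names, costs, and the complexity scale are hypothetical.

MODELS = [
    # (name, cost per page in dollars, max complexity it handles well)
    ("specialized_extractor", 0.01, 1),   # deterministic model, known layouts
    ("vision_language_model", 0.05, 2),   # VLM for semi-structured documents
    ("frontier_llm",          0.25, 3),   # large general model, last resort
]

def route(doc_complexity: int) -> str:
    """Return the cheapest model capable of the given document complexity."""
    # Walk models from cheapest to most expensive; stop at the first fit.
    for name, _cost, max_complexity in sorted(MODELS, key=lambda m: m[1]):
        if doc_complexity <= max_complexity:
            return name
    raise ValueError("no model can handle this document")

route(1)  # → "specialized_extractor" (simple form, cheapest model wins)
route(3)  # → "frontier_llm" (complex document justifies the helicopter)
```

At volume, the savings come from the fact that most pages in a typical workload are simple enough for the cheapest tier, so the expensive model is only paid for when it is actually needed.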
Tarakshaya Bhatia: There’s a lot of discussion right now about what happens when I take some of these frontier models (Bedrock, Claude, Google Gemini) and apply them to my IDP use case. A major government agency actually opted to do this. They took a hyperscaler-plus-Gen-AI approach to a 300,000-document eligibility backlog. What they found is that this hyperscaler-plus-generative solution was costing them $20K a month in tokens, and they were only getting 50% accuracy out the door. Because you’re only 50% accurate, you have to put contractors on overtime to rework that backlog. They came to us, and we were able to work with the systems integrator to deploy the Hypercell in 60 days, automating from intake all the way through extraction with high confidence: 99.4% accuracy and 98.9% automation, real numbers from production. That mailroom backlog was eliminated: 300,000 pages through the system in a little less than a week. We are now going along the build versus buy journey toward that million-page mark and counting, continuing to build and expand use cases.
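The anecdote above is worth a quick back-of-the-envelope check: the visible token bill is only part of the cost when half the output needs rework. The backlog size, token spend, and accuracy come from the story; the per-document contractor rework cost is a hypothetical placeholder.

```python
# Back-of-the-envelope arithmetic on the "cheap tokens, low accuracy" trap.
# backlog_docs, token spend, and accuracy come from the anecdote above;
# the per-document rework cost is an assumed placeholder.

backlog_docs = 300_000
token_cost_per_month = 20_000   # the visible line item on the invoice
accuracy = 0.50                 # half the documents come out wrong
rework_cost_per_doc = 2.00      # assumed contractor cost to fix one document

docs_needing_rework = int(backlog_docs * (1 - accuracy))   # → 150,000
rework_cost = docs_needing_rework * rework_cost_per_doc    # → 300,000.0
```

Even with a modest assumed rework rate, the hidden labor cost can rival or exceed many months of the token bill that looked cheap in isolation.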
Tarakshaya Bhatia: That was one side of the hyperscaler story. There’s also a recognition that many major enterprises have cloud spend they have to work through as part of their enterprise contracts. The story is truly better together. This is a reference architecture: from purchase all the way down into the infrastructure layer, we seamlessly integrate with the existing suite of services from these hyperscalers. From ingestion through process analytics and fine-tuning, you can incorporate the Hypercell blocks-and-flows architecture seamlessly with the other services you’re already spending on and working with. This becomes a very clean and easy integration point for you to be able to say, “I’ve got to have that spend with Google or AWS, so let me use my infrastructure and cloud spend to host Hyperscience.”
Tarakshaya Bhatia: As we’re thinking about the hidden costs, there’s a seemingly simple and viable solution: I’ll go build this and stitch the services together. We talked a little bit about how frontier models, LLMs, and VLMs lower the barrier to entry significantly for experimenting with IDP as a use case. But I’d invite you to also consider the hidden challenges related to that, and why you buy software at the enterprise level: ultimately, to reduce risk in your business. Hyperscience tremendously excels in regulated industries where security is paramount. The Hypercell is containerized; things don’t have to leave the boundary layer if you don’t want them to. We excel particularly in government and the public sector, in insurance, and in financial services, because these are heavily regulated industries where security comes first.
Tarakshaya Bhatia: The easy way of thinking about it is, “I can give my document to Gemini and I’ve got a viable solution.” But when you go to actually do that at scale, you’re going to drop a bunch of documents in and then you’re going to get a bill, and the bill is going to be eye-watering. The ability to have observability and tunability around your IDP use case is tremendously important, as are the ongoing maintenance of such a system and the functionality it’s missing. It’s really difficult to reason about maintaining systems like that. The Hypercell as a platform is extremely tunable: you’re able to add and stack functionality as you go, you’re able to stack models as you go, and it’s all containerized and ready for enterprise deployment.
Tarakshaya Bhatia: As you’re evaluating these technology decisions, ask: what is the business goal, and which business units does the work impact? It’s not all about documents and document types. What is the business process that IDP is a part of and affecting? Then we get into the use case. What documents do you need to process first? Are they structured, semi-structured, or unstructured? What type of content are you dealing with on a regular basis? And then there’s the people aspect: who am I impacting, who are my business users, who are the end clients of the data? When we think about the process engineering and the design decisions that go into this IDP work, we find that the Hypercell is capable of fulfilling more requirements out of the box, and that reduces risk for those making these buying decisions.
Xabi Ormazabal: A customer asks: “Based on a study on 1 million pages a year, what is the minimum pages per year that Hyperscience can service?”
Tarakshaya Bhatia: Sometimes we have smaller use cases that are tremendously high value to the organization. It’s not about the volume of pages, but about the value to the business. Being able to assign a value to that and build a business case around it means your build versus buy story is going to look a little different: maybe the ROI number is slightly different, or the time to achieve that ROI is different. I have seen that some of these low-volume use cases are extremely high complexity but also extremely high value. By fulfilling those, you simply have an ROI story, or a build versus buy decision, that looks a little different. Maybe there are a couple fewer zeros on the page volume and a couple more zeros in the ROI story.
Xabi Ormazabal: One of our respondents, in addition to accuracy and automation, called out adoption as a measure of success. Tarakshaya, is there anything you want to talk about in terms of what you’ve seen with our deployments in terms of measuring adoption as a success metric?
Tarakshaya Bhatia: I think of adoption in a few ways. The first is from a technology and service standpoint. Did we actually adopt the new IDP technology? Are we using it as a service? In other words, consumption: are we getting documents through the system, are we getting accurate data out the door, and is that system and process being used?

The second aspect is user adoption. We typically think about this in the IDP world as data-keyer resources, whether at the BPO level or in-house. But I almost look at that as a secondary function. As you explore a use case end to end, the work of checking documents, checking for accurate data, or still having some sort of validation step often occurs outside the boundary of what traditional IDP is considered to be. There’s a human review step of validation and a human review step of enrichment in the process flow that often goes unaccounted for.

I saw a success criterion in the chat about average handling time. I think that’s a great metric. I think about the portion of the work done outside of Hyperscience or outside of IDP technologies that I would love to bring into the fold of the overall orchestration. When we go back to that Hypercell reference architecture, we’re able to bring those things inside the Hypercell, and then you start to measure average handling time there. That creates an even better business story: “look, we timed this out; we cut this from five days to five hours.” Those are real numbers I have seen with some of these high-volume document use cases.
Xabi Ormazabal: Thank you for spending this time with us. We have the actual build versus buy white paper: the full 27-page study that goes through all the logic and calculations behind this comparison of Hypercell versus the different hyperscalers. We also have a “Talk to an Expert” option. We want to talk to you, hear about your value case, and help you understand which of these elements dial up or dial down in your situation, so you can calculate the real numbers and make your business case for an investment in IDP. And if you haven’t read the Gartner Magic Quadrant for intelligent document processing, it’s a great read; it covers many different solutions, and we’re featured as a Leader in that particular report. Thank you again for your time and for your great comments and interactivity.