Total Economic Impact
Cost Savings And Business Benefits Enabled By SageMaker HyperPod
A FORRESTER TOTAL ECONOMIC IMPACT STUDY COMMISSIONED BY Amazon, December 2025
Amazon SageMaker HyperPod supports large-scale artificial intelligence (AI) model development with resilient, cost-effective infrastructure, as well as access to the latest hardware. With Amazon SageMaker HyperPod, customers save time across the model-training lifecycle from pre-training to training, fine-tuning, and inference. Faster AI compute infrastructure accelerates time to market for new products, driving incremental revenue.
Amazon SageMaker HyperPod offers purpose-built infrastructure to support large-scale AI model training.
Amazon commissioned Forrester Consulting to conduct a Total Economic Impact™ (TEI) study and examine the potential return on investment (ROI) enterprises may realize by deploying SageMaker HyperPod.1 The purpose of this study is to provide readers with a framework to evaluate the potential financial impact of SageMaker HyperPod on their organizations.
To better understand the benefits, costs, and risks associated with this investment, Forrester interviewed four decision-makers with experience using SageMaker HyperPod. For the purposes of this study, Forrester aggregated the experiences of the interviewees and combined the results into a single composite organization, which is a rapidly growing global organization that develops AI models as a key driver of its business strategy.
Interviewees said that prior to using SageMaker HyperPod, their organizations struggled to train large AI models quickly and cost-effectively. Their prior AI model-training infrastructure was expensive. Each model-training setup could take several months, and AI model-training runs were disrupted by node failures. The AI model development teams had to spend a significant amount of time debugging issues and replacing failed nodes. These issues increased the time required to train AI models and created a potential opportunity cost by delaying new product launches.
After investing in SageMaker HyperPod, the interviewees’ organizations had a cost-effective infrastructure for AI model training. AI training runs ran smoothly with minimal disruption. Development teams quickly set up the infrastructure for new models. They spent less time replacing nodes that failed during training runs because Amazon SageMaker HyperPod automated that work. With faster model training, new AI models and products reached the market sooner, accelerating revenue growth.
Quantified benefits. Three-year, risk-adjusted present value (PV) quantified benefits for the composite organization include:
Technical team time savings of 88% for AI model-training infrastructure setup. The composite organization’s infrastructure team sets up the infrastructure for new model training more easily and quickly with Amazon SageMaker HyperPod. Amazon SageMaker HyperPod integrations with Slurm and Amazon EKS help orchestrate cluster creation. The time to set up infrastructure and software to train a new AI model dropped from eight weeks to seven days. This benefit is worth $1.0 million PV to the composite organization over three years.
Technical team time savings of 98% for AI model training. When a node fails during a training run, Amazon SageMaker HyperPod identifies the underlying issue, replaces the node, and restarts the training job automatically. Before Amazon SageMaker HyperPod, manually debugging and replacing failed nodes took the composite organization’s model development team 24 hours per failure, with training paused until repairs were completed. After implementing Amazon SageMaker HyperPod, recovery averages just 30 minutes. This benefit is worth $433,000 PV to the composite organization over three years.
Optimized AI model-training infrastructure cost savings of 50%. The composite organization halves the cost of AI model training with Amazon SageMaker HyperPod. The underlying Amazon compute infrastructure is competitively priced compared to alternatives. This benefit is worth $18.9 million PV to the composite organization over three years.
Faster AI model training with improved infrastructure utilization for a 42% improvement in infrastructure availability. Amazon SageMaker HyperPod offers advanced resiliency so that even when a node fails, it can be isolated, allowing the model-training run to continue. In addition, SageMaker HyperPod offers features such as task governance and observability to maximize compute resource utilization. The composite organization can complete more model training with the same infrastructure since downtime and delays are minimized. This benefit is worth $4.2 million PV to the composite organization over three years.
Faster time to market, generating $9.3 million in incremental profit over three years. The composite organization uses Amazon SageMaker HyperPod primarily for model development, and a number of those models are put into production, generating incremental profit worth $7.6 million PV to the composite organization over three years (the $9.3 million figure is the undiscounted three-year total).
Unquantified benefits. Benefits that provide value for the composite organization but are not quantified for this study include:
AWS partnership and support. AWS’s partnership and timely customer support help the composite organization navigate any issues.
Access to the latest hardware. The composite organization benefits from access to the latest and fastest GPUs with Amazon SageMaker HyperPod, including AWS Trainium chips, a family of AI chips purpose-built by AWS for AI training and inference.
Security. Amazon SageMaker HyperPod provides robust enterprise-grade security to protect the composite organization’s AI workloads and data. Amazon SageMaker HyperPod leverages AWS Identity and Access Management (IAM) for authentication and authorization, allowing the composite organization to define permissions to control who can access HyperPod resources.
Costs. Three-year, risk-adjusted PV costs for the composite organization include:
Amazon SageMaker HyperPod cost of $11.2 million PV over three years. The composite organization pays Amazon an annual fee for SageMaker HyperPod and for data storage.
Installation and maintenance cost of $381,000 PV over three years. The composite organization transitions to Amazon SageMaker HyperPod very quickly. The composite organization runs a pilot for three weeks and then spends one week transitioning to Amazon SageMaker HyperPod. One-half (50%) of a full-time equivalent (FTE) technical team member’s time is devoted to maintaining the platform.
The financial analysis based on the interviews found that the composite organization experiences benefits of $32.2 million over three years versus costs of $11.6 million, resulting in a net present value (NPV) of $20.6 million and an ROI of 178%.
Key statistics: return on investment (ROI), benefits PV, net present value (NPV), and payback.
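The headline figures follow from the present-value totals in the benefit and cost summary tables. A minimal sketch, assuming the usual definitions (NPV is PV of benefits minus PV of costs, and ROI is NPV divided by PV of costs); the variable names are illustrative rather than Forrester's:

```python
# Present-value totals taken from the benefit and cost summary tables.
benefits_pv = 32_176_804  # total benefits (risk-adjusted), present value
costs_pv = 11_570_624     # total costs (risk-adjusted), present value

# Assumed definitions: NPV = benefits - costs; ROI = NPV / costs.
npv = benefits_pv - costs_pv
roi = npv / costs_pv

print(f"NPV: ${npv:,}")
print(f"ROI: {roi:.0%}")
```

Under these definitions the result rounds to the $20.6 million NPV and 178% ROI quoted in the summary.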
| Role | Industry | Region | Revenue (USD) |
|---|---|---|---|
| Technical staff | Biotech | Europe | $2 million |
| Research engineer | Collaboration platform | North America | $50 million |
| CTO | Healthcare | Global | Not applicable |
| Chief scientist | Media | North America | $8 million |
The interviewees used a variety of AI model-training approaches before adopting Amazon SageMaker HyperPod, including running virtual machines (VMs) on bare metal or using clusters on alternative cloud infrastructure. One of the interviewees’ organizations was a startup and selected Amazon SageMaker HyperPod as its initial platform after exploring alternatives and running proof-of-concept (POC) tests with several vendors.
Interviewees noted how their organizations struggled with common challenges, including:
Challenges executing large AI model-training runs. The interviewees regularly trained large, multi-node AI models. Their prior infrastructure solutions did not support efficient model training since training would often be disrupted if a cluster node failed. Node replacement often required manual effort, and it was time-consuming and inefficient to debug and resolve issues.
Time-intensive AI model setup. It could take several months to set up clusters on the prior infrastructure. This delayed model training and resulted in delays in getting new products to market.
Expensive infrastructure. Interviewees shared that their prior AI model-training infrastructure was costly and did not scale.
The interviewees searched for a solution that could:
Support efficient, fault-tolerant training of large AI models.
Offer access to the latest hardware.
Ensure enterprise-grade security.
Provide excellent customer support.
Based on the interviews, Forrester constructed a TEI framework, a composite company, and an ROI analysis that illustrates the areas financially affected. The composite organization is representative of the interviewees’ organizations, and it is used to present the aggregate financial analysis in the next section. The composite organization has the following characteristics:
Description of composite. The composite organization is a rapidly growing global organization. AI model development is a key component of its business strategy. Before deploying Amazon SageMaker HyperPod, the composite organization managed its own AI model-training infrastructure and manually managed on-premises bare-metal GPUs and cloud-hosted VM instances.
Deployment characteristics. Twenty model developers use Amazon SageMaker HyperPod to create AI models at the composite organization. Amazon SageMaker HyperPod is used primarily for AI model training but can also be used for AI model inference. The model developers perform 50 AI model-training runs annually. On average, each training run takes one week. The model developers draw on 100TB of data to train the AI models. Data is typically stored in lower-cost cold storage (such as Amazon Glacier) and then moved into high-performance storage (such as Amazon FSx for Lustre) when it is required for AI model training. On average, each AI model has 20 nodes. Nvidia H100/H200 Tensor Core GPUs provide compute power for AI model training.
Key assumptions: 20 model developers, 50 AI model-training runs annually, and 20 nodes per AI model.
| Ref. | Benefit | Year 1 | Year 2 | Year 3 | Total | Present Value |
|---|---|---|---|---|---|---|
| Atr | Technical team time savings on AI model-training infrastructure setup | $347,776 | $417,331 | $500,797 | $1,265,904 | $1,037,318 |
| Btr | Technical team time savings on AI model training | $145,236 | $174,283 | $209,140 | $528,659 | $433,198 |
| Ctr | Optimized AI model-training infrastructure cost savings | $6,307,200 | $7,568,640 | $9,145,440 | $23,021,280 | $18,859,997 |
| Dtr | Faster AI model training with improved infrastructure utilization | $1,418,069 | $1,701,683 | $2,056,200 | $5,175,952 | $4,240,356 |
| Etr | Faster time to market | $2,550,000 | $3,060,000 | $3,672,000 | $9,282,000 | $7,605,935 |
| | Total benefits (risk-adjusted) | $10,768,281 | $12,921,937 | $15,583,577 | $39,273,795 | $32,176,804 |
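The Present Value column can be approximated by discounting each year's risk-adjusted benefit at 10%, the rate the study cites in its results paragraphs. A minimal sketch, assuming end-of-year cash flows (an assumption, though it matches the published figures to within a dollar of rounding); `present_value` is an illustrative helper:

```python
DISCOUNT_RATE = 0.10  # discount rate stated in the study

def present_value(cash_flows, rate=DISCOUNT_RATE):
    """Discount Year 1..N cash flows, assuming each arrives at year end."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))

# Risk-adjusted yearly benefits and published PVs from the benefits table.
benefits = {
    "Atr": ([347_776, 417_331, 500_797], 1_037_318),
    "Btr": ([145_236, 174_283, 209_140], 433_198),
    "Ctr": ([6_307_200, 7_568_640, 9_145_440], 18_859_997),
    "Dtr": ([1_418_069, 1_701_683, 2_056_200], 4_240_356),
    "Etr": ([2_550_000, 3_060_000, 3_672_000], 7_605_935),
}

for ref, (flows, published_pv) in benefits.items():
    pv = present_value(flows)
    print(f"{ref}: computed ${pv:,.0f}, published ${published_pv:,}")
```

Small differences arise because the yearly values in the table are already rounded to whole dollars.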
Evidence and data. The interviewees found it easy to get started with Amazon SageMaker HyperPod, and it was faster to set up the infrastructure for AI model training compared to prior solutions. The technical teams spent less time on model setup and could devote that time to higher-value tasks.
Amazon SageMaker HyperPod offers Slurm and Amazon Elastic Kubernetes Service (EKS) integrations to orchestrate cluster creation.
Slurm support in Amazon SageMaker HyperPod helps users provision resilient clusters for running workloads to develop AI models. The technical staff member at a biotech organization explained: “Amazon SageMaker HyperPod is out of the box with a Slurm cluster. If we just had bare-metal nodes, it would have taken us a lot longer to go and build up the infrastructure.” He estimated that it took several months of work to set up a Slurm cluster before Amazon SageMaker HyperPod.
He added: “Setting up a Slurm cluster can be a bit painful, but HyperPod has a lot of tooling and automated scripts that help us set up Slurm in a way that just works for us. It has saved us a lot of time. When it comes to setting up the cluster, we’ve completely relied on the automated script.” Without Amazon SageMaker HyperPod, his organization would have had to hire a full-time cluster admin to manage the process.
Modeling and assumptions. Based on the interviews, Forrester assumes the following about the composite organization:
The composite organization conducts 50 AI model-training runs per year.
Twenty percent of these model-training runs require unique setup.
Before Amazon SageMaker HyperPod, it took eight weeks to set up the infrastructure for an AI model-training run.
Time savings with Amazon SageMaker HyperPod is 88%.
The fully burdened hourly rate for a technical team member is $130 per hour.
Risks. The expected financial impact is subject to risks and variation based on several factors:
The percentage of model-training runs that require setup.
The fully burdened salary for model developers.
Results. To account for these risks, Forrester adjusted this benefit downward by 5%, yielding a three-year, risk-adjusted total PV (discounted at 10%) of $1.0 million.
Technical team time savings for AI model setup
| Ref. | Metric | Source | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|---|
| A1 | AI model-training runs annually | Composite | 50 | 60 | 72 |
| A2 | Percentage of AI model-training runs that require unique setup | Composite | 20% | 20% | 20% |
| A3 | Time to set up an AI model before Amazon SageMaker HyperPod (weeks) | Interviews | 8 | 8 | 8 |
| A4 | Technical team time savings with Amazon SageMaker HyperPod | Interviews | 88% | 88% | 88% |
| A5 | Fully burdened hourly rate for a technical team staff member | Composite | $130 | $130 | $130 |
| At | Technical team time savings on AI model-training infrastructure setup | A1*A2*A3*40*A4*A5 | $366,080 | $439,296 | $527,155 |
| | Risk adjustment | ↓5% | | | |
| Atr | Technical team time savings on AI model-training infrastructure setup (risk-adjusted) | | $347,776 | $417,331 | $500,797 |
| | Three-year total: $1,265,904 | Three-year present value: $1,037,318 | | | |
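The formula in row At can be checked directly. A minimal sketch, assuming the constant 40 in the formula is working hours per week (the table does not label it); the function and variable names are illustrative:

```python
HOURS_PER_WEEK = 40      # assumed meaning of the "40" in the At formula
RISK_ADJUSTMENT = 0.05   # benefit adjusted downward by 5%

UNIQUE_SETUP_SHARE = 0.20  # A2
SETUP_WEEKS_BEFORE = 8     # A3
TIME_SAVINGS = 0.88        # A4
HOURLY_RATE = 130          # A5, USD

def setup_savings(runs_per_year):
    """At = A1 * A2 * A3 * 40 * A4 * A5."""
    hours_saved = (runs_per_year * UNIQUE_SETUP_SHARE * SETUP_WEEKS_BEFORE
                   * HOURS_PER_WEEK * TIME_SAVINGS)
    return hours_saved * HOURLY_RATE

def risk_adjusted(benefit):
    """Atr: the benefit after the downward risk adjustment."""
    return benefit * (1 - RISK_ADJUSTMENT)

for year, runs in enumerate([50, 60, 72], start=1):  # A1
    at = setup_savings(runs)
    print(f"Year {year}: At=${at:,.0f}  Atr=${risk_adjusted(at):,.0f}")
```

In Year 1 this corresponds to 10 unique setups, each saving 88% of 320 hours.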
Evidence and data. The interviewees found it easier and faster to debug issues and replace instances with Amazon SageMaker HyperPod. This meant that their technical teams could use the time they saved on higher-value tasks.
The interviewees shared that before Amazon SageMaker HyperPod, when a node failed, it would take a long time to manually replace the node and identify what caused the problem. With Amazon SageMaker HyperPod, failed nodes were replaced almost instantaneously. The interviewees estimated that before Amazon SageMaker HyperPod, debugging and replacing nodes could take several days or up to two weeks. Now with Amazon SageMaker HyperPod, it takes between 30 minutes and 3 hours.
The chief scientist at a media organization explained: “Sometimes there are issues with the hardware where we need to replace the instance. That doesn’t happen constantly but pretty regularly, and Amazon gives us a reasonable interface to handle the replacements. This is where SageMaker HyperPod is interesting.” He estimated that before Amazon SageMaker HyperPod, it could take a week or two to replace an instance; with Amazon SageMaker HyperPod, it is a few hours, or at most, one day. Prior to implementing Amazon SageMaker HyperPod, they would have had to hire an additional site reliability engineer (SRE) to provide support.
The technical staff member at a biotech organization shared, “The spare nodes were very helpful for us to swap in and swap out.” He added: “To replace a node, it’s a matter of running a single command line. So once we know that we would like to replace a node, fixing it takes a few seconds to issue the command, but then HyperPod will take care of it. In total, to replace a node for us takes half an hour, so that’s great.”
The research engineer at a collaboration platform noted: “Having provisioned clusters with HyperPod really helps enable us to debug quicker. Before if we need to debug a log stream from a serverless architecture training job, it would take a day or two to just sift through the logs.”
Modeling and assumptions. Based on the interviews, Forrester assumes the following about the composite organization:
The composite organization conducts 50 AI model-training runs per year.
The organization experiences one disruption per week.
Prior to Amazon SageMaker HyperPod, debugging and replacing failed instances required 24 hours of technical team time.
The reduction in time spent debugging and replacing failed instances with Amazon SageMaker HyperPod is 98%.
The fully burdened hourly rate for a model developer is $130.
Risks. The expected financial impact is subject to risks and variation based on several factors:
The number of AI model-training disruptions.
The skill of the technical team in resolving disruptions.
Technical team salary.
Results. To account for these risks, Forrester adjusted this benefit downward by 5%, yielding a three-year, risk-adjusted total PV (discounted at 10%) of $433,000.
Technical team time savings on model training
| Ref. | Metric | Source | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|---|
| B1 | AI model-training runs annually | Composite | 50 | 60 | 72 |
| B2 | Disruptions per model per week before Amazon SageMaker HyperPod | Interviews | 1 | 1 | 1 |
| B3 | Time debugging and/or replacing nodes before Amazon SageMaker HyperPod (hours per model) | Interviews | 24 | 24 | 24 |
| B4 | Technical team time savings with Amazon SageMaker HyperPod | Interviews | 98% | 98% | 98% |
| B5 | Fully burdened hourly rate for a technical team staff member | Composite | $130 | $130 | $130 |
| Bt | Technical team time savings on AI model training | B1*B2*B3*B4*B5 | $152,880 | $183,456 | $220,147 |
| | Risk adjustment | ↓5% | | | |
| Btr | Technical team time savings on AI model training (risk-adjusted) | | $145,236 | $174,283 | $209,140 |
| | Three-year total: $528,659 | Three-year present value: $433,198 | | | |
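Row Bt can be reproduced the same way. A minimal sketch, assuming each run lasts one week (as the deployment description states), so that B1*B2 is the number of disruptions per year; the names are illustrative:

```python
DISRUPTIONS_PER_RUN = 1      # B2: one disruption per model per week
HOURS_PER_DISRUPTION = 24    # B3: debugging/replacement time before HyperPod
TIME_SAVINGS = 0.98          # B4
HOURLY_RATE = 130            # B5, USD
RISK_ADJUSTMENT = 0.05       # benefit adjusted downward by 5%

def training_savings(runs_per_year):
    """Bt = B1 * B2 * B3 * B4 * B5."""
    return (runs_per_year * DISRUPTIONS_PER_RUN * HOURS_PER_DISRUPTION
            * TIME_SAVINGS * HOURLY_RATE)

# Time still spent per disruption after a 98% reduction, in minutes.
residual_minutes = HOURS_PER_DISRUPTION * (1 - TIME_SAVINGS) * 60

for year, runs in enumerate([50, 60, 72], start=1):  # B1
    bt = training_savings(runs)
    btr = bt * (1 - RISK_ADJUSTMENT)
    print(f"Year {year}: Bt=${bt:,.0f}  Btr=${btr:,.0f}")
print(f"Residual time per disruption: {residual_minutes:.1f} minutes")
```

The residual of roughly 29 minutes per disruption is consistent with the 30-minute recovery time the interviewees reported.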
Evidence and data. The interviewees found that the overall cost of AI model training was significantly lower with Amazon SageMaker HyperPod than with their prior solutions, even though the prior states varied by company. Interviewees estimated that they experienced infrastructure cost savings ranging from 20% to 60% with Amazon SageMaker HyperPod.
The interviewees used a variety of AI model-training approaches before adopting Amazon SageMaker HyperPod, including running VMs on bare metal or using clusters on alternative cloud infrastructure. One of the interviewees’ organizations was a startup and selected Amazon SageMaker HyperPod as its initial platform after exploring alternatives and running POC tests with several vendors.
The technical staff member at a biotech organization shared: “Model training is for sure cheaper now [with Amazon SageMaker HyperPod]. Before, we would run on-demand instances in various cloud providers. Those are extremely expensive, so even though we would only spin up the nodes when we needed them, they would be really expensive. Now we have a set of dedicated nodes with AWS, and the pricing is good, so we have significant savings versus what we would have paid before.”
He added, “There’s no point in training models outside of HyperPod because it’s going to be more expensive.”
The research engineer at a collaboration platform said that the cost savings with Amazon SageMaker HyperPod were “substantial.” His company used clusters on an alternative cloud platform before moving to Amazon SageMaker HyperPod.
Modeling and assumptions. Based on the interviews, Forrester assumes the following about the composite organization:
The prior AI model-training infrastructure cost was $5 per GPU hour. Before deploying Amazon SageMaker HyperPod, the composite organization managed its own AI model-training infrastructure and manually managed on-premises bare-metal GPUs and cloud-hosted VM instances.
AI model infrastructure costs were reduced by 50% with Amazon SageMaker HyperPod.
The infrastructure (GPUs) required for AI model training increases by 20% each year as the composite organization grows and expands AI model training.
Risks. The expected financial impact is subject to risks and variation based on several factors:
Infrastructure pricing.
Compute requirements and selections.
Results. To account for these risks, Forrester adjusted this benefit downward by 10%, yielding a three-year, risk-adjusted total PV (discounted at 10%) of $18.9 million.
Infrastructure cost savings
| Ref. | Metric | Source | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|---|
| C1 | Prior model-training infrastructure cost (dollars per GPU hour) | Interviews | $5 | $5 | $5 |
| C2 | Percent cost savings with Amazon SageMaker HyperPod | Interviews | 50% | 50% | 50% |
| C3 | GPUs | Composite | 160 | 192 | 232 |
| Ct | Optimized AI model-training infrastructure cost savings | C1*C3*24*365 | $7,008,000 | $8,409,600 | $10,161,600 |
| | Risk adjustment | ↓10% | | | |
| Ctr | Optimized AI model-training infrastructure cost savings (risk-adjusted) | | $6,307,200 | $7,568,640 | $9,145,440 |
| | Three-year total: $23,021,280 | Three-year present value: $18,859,997 | | | |
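Row Ct can be reproduced from the formula as printed. Note that the printed formula does not use C2: the yearly figures equal the full prior infrastructure cost at $5 per GPU hour, while the 50% reduction corresponds to the $2.50 per GPU hour that appears on the cost side of the analysis. A minimal sketch under that reading, which is an interpretation of the tables rather than a statement from the study:

```python
HOURS_PER_YEAR = 24 * 365
RISK_ADJUSTMENT = 0.10          # benefit adjusted downward by 10%

PRIOR_COST_PER_GPU_HOUR = 5     # C1, USD
GPUS_BY_YEAR = [160, 192, 232]  # C3

def prior_infrastructure_cost(gpu_count):
    """Ct as printed: C1 * C3 * 24 * 365 (C2 is not part of the formula)."""
    return PRIOR_COST_PER_GPU_HOUR * gpu_count * HOURS_PER_YEAR

for year, gpu_count in enumerate(GPUS_BY_YEAR, start=1):
    ct = prior_infrastructure_cost(gpu_count)
    ctr = ct * (1 - RISK_ADJUSTMENT)
    print(f"Year {year}: Ct=${ct:,.0f}  Ctr=${ctr:,.0f}")
```

Read this way, the net saving is the difference between this benefit and the HyperPod GPU cost reported in the cost analysis.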
Evidence and data. Amazon SageMaker HyperPod offered resiliency so that even if a node failed, it could be isolated, allowing the AI model-training run to continue. This allowed the interviewees’ organizations to optimize their AI model infrastructure. They could complete more model training with the same infrastructure since downtime and delays were minimized.
The technical staff member at a biotech organization shared, “[With Amazon SageMaker HyperPod], our training runs are fine — they’re set and forget most of the time.”
The research engineer at a collaboration platform noted: “[With Amazon SageMaker HyperPod], the clusters work seamlessly. It saves everybody’s time.”
This resiliency allowed Amazon SageMaker HyperPod to improve mean time between failures (MTBF). The chief scientist at a media organization shared, “Amazon SageMaker HyperPod improved MTBF by about 10%.”
Modeling and assumptions. Based on the interviews, Forrester assumes the following about the composite organization:
AI model-training runs were stalled or disrupted 72 hours each week before using Amazon SageMaker HyperPod.
AI model-training run disruptions are reduced to 30 minutes per week with Amazon SageMaker HyperPod.
The decrease in disruptions allows the composite organization to optimize its infrastructure. For the same infrastructure cost, the composite organization can spend more time training models with less time wasted.
Risks. The expected financial impact is subject to risks and variation based on several factors:
Model-training hours lost to disruptions before Amazon SageMaker HyperPod.
Disruptions with Amazon SageMaker HyperPod.
Cost of model-training infrastructure.
Results. To account for these risks, Forrester adjusted this benefit downward by 5%, yielding a three-year, risk-adjusted total PV (discounted at 10%) of $4.2 million.
Improvement in infrastructure availability
| Ref. | Metric | Source | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|---|
| D1 | Model-training time per week (hours) | 7*24 | 168 | 168 | 168 |
| D2 | Model-training time lost to disruptions per week before Amazon SageMaker HyperPod | Interviews | 72 | 72 | 72 |
| D3 | Percentage of hours lost before Amazon SageMaker HyperPod | D2/D1 | 42.9% | 42.9% | 42.9% |
| D4 | Model time lost to bad nodes per week with Amazon SageMaker HyperPod (hours) | Interviews | 0.5 | 0.5 | 0.5 |
| D5 | Hours lost now as percentage | D4/D1 | 0.3% | 0.3% | 0.3% |
| D6 | Reduction in infrastructure cost with Amazon SageMaker HyperPod | D3-D5 | 42.6% | 42.6% | 42.6% |
| Dt | Faster AI model training with improved infrastructure utilization | D6*F3 | $1,492,704 | $1,791,245 | $2,164,421 |
| | Risk adjustment | ↓5% | | | |
| Dtr | Faster AI model training with improved infrastructure utilization (risk-adjusted) | | $1,418,069 | $1,701,683 | $2,056,200 |
| | Three-year total: $5,175,952 | Three-year present value: $4,240,356 | | | |
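The chain from D1 to Dt can be traced step by step. A minimal sketch, assuming the table carries percentages rounded to one decimal place between steps (this rounding is an inference; it is what reproduces the published dollar figures); F3 values come from the HyperPod cost table:

```python
HOURS_PER_WEEK = 7 * 24       # D1
HOURS_LOST_BEFORE = 72        # D2
HOURS_LOST_AFTER = 0.5        # D4
RISK_ADJUSTMENT = 0.05        # benefit adjusted downward by 5%
HYPERPOD_GPU_COST = [3_504_000, 4_204_800, 5_080_800]  # F3, Years 1-3

# Percentages rounded to one decimal place, as they appear in the table.
pct_lost_before = round(100 * HOURS_LOST_BEFORE / HOURS_PER_WEEK, 1)  # D3
pct_lost_after = round(100 * HOURS_LOST_AFTER / HOURS_PER_WEEK, 1)    # D5
pct_recovered = round(pct_lost_before - pct_lost_after, 1)            # D6

def utilization_benefit(gpu_cost):
    """Dt = D6 * F3."""
    return pct_recovered / 100 * gpu_cost

for year, cost in enumerate(HYPERPOD_GPU_COST, start=1):
    dt = utilization_benefit(cost)
    dtr = dt * (1 - RISK_ADJUSTMENT)
    print(f"Year {year}: Dt=${dt:,.0f}  Dtr=${dtr:,.0f}")
```

Using the unrounded ratio (71.5/168) instead would give slightly lower dollar values than the table shows.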
Evidence and data. The interviewees used Amazon SageMaker HyperPod for AI model development, and a number of those models evolved into production models, generating incremental revenue for the interviewees’ organizations.
By enabling faster AI model training, Amazon SageMaker HyperPod helped the interviewees’ organizations get new products to market faster, accelerating incremental revenue. Interviewees estimated that Amazon SageMaker HyperPod accelerated AI model training and new product development by several months. With Amazon SageMaker HyperPod, the interviewees could focus on growing their business rather than building up infrastructure.
The research engineer at a collaboration platform noted: “Amazon SageMaker HyperPod has helped our business grow because we are training more and better models every day. That translates into us being more efficient and pushing the best performance to our customers. HyperPod has been really helpful in providing reliable hardware.”
The CTO at a healthcare organization explained: “Without Amazon SageMaker HyperPod, it would have taken us longer to build up our infrastructure. It would have been a big distraction and a big opportunity cost. We didn’t want to focus on building complex infrastructure; we want to get ahead with our business.”
Modeling and assumptions. Based on the interviews, Forrester assumes the following about the composite organization:
Incremental revenue is $25 million in Year 1 and grows 20% annually to $36 million by Year 3.
The operating margin is 12% to reflect the costs associated with the incremental revenue.
Risks. The expected financial impact is subject to risks and variation based on several factors:
A company’s size, revenue, and growth.
A company’s operating margin.
A company’s ability to successfully launch new products.
Results. To account for these risks, Forrester adjusted this benefit downward by 15%, yielding a three-year, risk-adjusted total PV (discounted at 10%) of $7.6 million.
Incremental profit over three years
| Ref. | Metric | Source | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|---|
| E1 | Incremental revenue from faster time to market | Composite | $25,000,000 | $30,000,000 | $36,000,000 |
| E2 | Operating margin | NYU Stern School of Business | 12% | 12% | 12% |
| Et | Faster time to market | E1*E2 | $3,000,000 | $3,600,000 | $4,320,000 |
| | Risk adjustment | ↓15% | | | |
| Etr | Faster time to market (risk-adjusted) | | $2,550,000 | $3,060,000 | $3,672,000 |
| | Three-year total: $9,282,000 | Three-year present value: $7,605,935 | | | |
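The incremental profit rows follow from the stated growth rate and margin. A minimal sketch, assuming revenue compounds at 20% from the Year 1 base (which reproduces row E1); the names are illustrative:

```python
YEAR_1_REVENUE = 25_000_000  # E1, Year 1
ANNUAL_GROWTH = 0.20         # stated revenue growth
OPERATING_MARGIN = 0.12      # E2
RISK_ADJUSTMENT = 0.15       # benefit adjusted downward by 15%

def incremental_revenue(year):
    """E1 for a given year (1-based), compounding from the Year 1 base."""
    return YEAR_1_REVENUE * (1 + ANNUAL_GROWTH) ** (year - 1)

def incremental_profit(year):
    """Et = E1 * E2."""
    return incremental_revenue(year) * OPERATING_MARGIN

for year in (1, 2, 3):
    et = incremental_profit(year)
    etr = et * (1 - RISK_ADJUSTMENT)
    print(f"Year {year}: Et=${et:,.0f}  Etr=${etr:,.0f}")
```

Because only the 12% margin is counted, the benefit is incremental profit rather than incremental revenue.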
Interviewees mentioned the following additional benefits that their organizations experienced but were not able to quantify:
AWS partnership and support. The interviewees called out and praised AWS’s partnership and support. The research engineer at a collaboration platform shared: “AWS has been great. Whenever we run into issues, they come back with good customer support.”
Access to the latest hardware. The interviewees valued access to the latest and fastest GPUs with Amazon SageMaker HyperPod, including AWS Trainium chips, a family of AI chips purpose-built by AWS for AI training and inference. The research engineer at a collaboration platform noted, “We need the latest hardware, and Amazon has always been up to date.” The CTO for a healthcare organization added: “Having the latest GPUs has been very important. The fact that we were able to upgrade to the H200 has been just really phenomenal.”
Security. Confidence in the security of Amazon’s platform was a key reason the interviewees chose Amazon SageMaker HyperPod. Amazon SageMaker HyperPod leverages IAM for authentication and authorization. Organizations can define permissions to control who can access HyperPod resources. The CTO at a healthcare organization shared, “Security on Amazon SageMaker HyperPod is great, and it’s one of the reasons we wanted to be on a tier one cloud provider.” He added, “With HyperPod, everything is integrated with IAM to manage all the user identities.”
The value of flexibility is unique to each customer. There are multiple scenarios in which a customer might implement SageMaker HyperPod and later realize additional uses and business opportunities, including:
Ability to quickly leverage new open-source AI models. New open-source AI models are released constantly. With Amazon SageMaker HyperPod, the testing infrastructure is already in place, so model developers can easily set up model-testing runs to explore ways to leverage the new models. The CTO at a healthcare organization explained: “A new model comes out, and it completely changes the whole game. We need training capacity, so when new models come out, we can immediately train on them and figure out what this means for our business.”
Scalability. The interviewees found that Amazon SageMaker HyperPod provided the scalability to meet business requirements. The research engineer at a collaboration platform noted: “We can seamlessly scale up and bring it down because no model builder needs to do a pretraining from scratch all the time. That is something we really like about HyperPod.”
Flexibility would also be quantified when evaluated as part of a specific project (described in more detail in Total Economic Impact Approach).
| Ref. | Cost | Initial | Year 1 | Year 2 | Year 3 | Total | Present Value |
|---|---|---|---|---|---|---|---|
| Ftr | Amazon SageMaker HyperPod cost | $0 | $3,742,200 | $4,490,640 | $5,425,560 | $13,658,400 | $11,189,576 |
| Gtr | Installation and maintenance cost | $63,788 | $127,575 | $127,575 | $127,575 | $446,513 | $381,048 |
| | Total costs (risk-adjusted) | $63,788 | $3,869,775 | $4,618,215 | $5,553,135 | $14,104,913 | $11,570,624 |
Evidence and data. The interviewees’ organizations paid Amazon a fee for SageMaker HyperPod and for data storage.
The interviewees’ organizations typically used Nvidia H100 and H200 Tensor Core GPUs for AI model training, but in some cases also used Nvidia A100 Tensor Core GPUs.
The interviewees’ organizations used anywhere from 10TB to several PB of data to train their AI models. They typically stored data in lower-cost cold storage (such as Amazon Glacier) and then moved the data into high-performance storage (such as Amazon FSx for Lustre) when the data was required for AI model training.
The interviewees’ organizations typically entered into a three-year contract with Amazon for SageMaker HyperPod. Pricing may vary for Amazon SageMaker HyperPod. Contact Amazon for additional details.
Modeling and assumptions. Based on the interviews, Forrester assumes the following about the composite organization:
The cost to use Amazon SageMaker HyperPod increases over time as the composite organization grows its AI model training and requires more infrastructure capacity. The cost is based on usage of Nvidia H100 and H200 Tensor Core GPUs for AI model training.
Storage cost is based on 100TB of data in Year 1 to train the AI models. Storage cost increases over time as additional data is stored.
Risks. The expected financial impact is subject to risks and variation based on several factors:
The type and number of GPUs used for AI model training.
The volume of data stored for AI model training and the type of storage.
Results. To account for these risks, Forrester adjusted this cost upward by 5%, yielding a three-year, risk-adjusted total PV (discounted at 10%) of $11.2 million.
| Ref. | Metric | Source | Initial | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|---|---|
| F1 | GPUs | Composite | | 160 | 192 | 232 |
| F2 | Cost per GPU per hour | Interviews | | $2.50 | $2.50 | $2.50 |
| F3 | Amazon SageMaker HyperPod cost | F1*F2*24*365 | | $3,504,000 | $4,204,800 | $5,080,800 |
| F4 | Data storage (TB) | Composite | | 100 | 120 | 144 |
| F5 | Storage cost per TB per month | Interviews | | $50 | $50 | $50 |
| F6 | Storage cost | F4*F5*12 | | $60,000 | $72,000 | $86,400 |
| Ft | Amazon SageMaker HyperPod cost | F3+F6 | $0 | $3,564,000 | $4,276,800 | $5,167,200 |
| | Risk adjustment | ↑5% | | | | |
| Ftr | Amazon SageMaker HyperPod cost (risk-adjusted) | | $0 | $3,742,200 | $4,490,640 | $5,425,560 |
| | Three-year total: $13,658,400 | | | Three-year present value: $11,189,576 | | |
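The arithmetic behind rows F1 through Ftr can be checked with a short script. The following is a minimal sketch, not Forrester's model: the variable names and the `present_value` helper are illustrative, and it assumes each year's cost is discounted at the end of that year at the study's 10% rate.

```python
# Inputs copied from rows F1, F2, F4, and F5 of the table (Years 1-3).
gpus = [160, 192, 232]
gpu_hour_rate = 2.50            # dollars per GPU per hour
storage_tb = [100, 120, 144]
storage_rate_tb_month = 50      # dollars per TB per month
risk_adjustment = 0.05          # costs adjusted upward by 5%
discount_rate = 0.10

HOURS_PER_YEAR = 24 * 365


def present_value(yearly_flows, rate):
    """Discount end-of-year cash flows for Years 1..n (illustrative helper)."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(yearly_flows, start=1))


compute_cost = [g * gpu_hour_rate * HOURS_PER_YEAR for g in gpus]        # F3
storage_cost = [tb * storage_rate_tb_month * 12 for tb in storage_tb]    # F6
total_cost = [c + s for c, s in zip(compute_cost, storage_cost)]         # Ft
adjusted_cost = [t * (1 + risk_adjustment) for t in total_cost]          # Ftr

three_year_total = sum(adjusted_cost)
three_year_pv = present_value(adjusted_cost, discount_rate)

print(f"Three-year total: ${three_year_total:,.0f}")
print(f"Three-year present value: ${three_year_pv:,.0f}")
```

Readers can substitute their own GPU counts, hourly rates, and storage volumes to estimate a comparable figure for their organization.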
Evidence and data. The interviewees’ organizations were able to transition to Amazon SageMaker HyperPod very quickly.
Most of the interviewees’ organizations transitioned to Amazon SageMaker HyperPod in about a week. One organization ran a two- to three-week pilot and then spent a week on setup once it decided to move forward with Amazon SageMaker HyperPod.
It was easy for the model developers to begin using Amazon SageMaker HyperPod. The chief scientist at a media organization explained, “The learning curve is not very high.”
Modeling and assumptions. Based on the interviews, Forrester assumes the following about the composite organization:
Four technical team members pilot Amazon SageMaker HyperPod for three weeks.
One-half of an FTE’s time is dedicated to maintaining Amazon SageMaker HyperPod.
Technical team salary including benefits is $243,000 annually.
Risks. The expected financial impact is subject to risks and variation based on several factors:
The complexity of the required AI model-training infrastructure.
Technical team salaries.
Results. To account for these risks, Forrester adjusted this cost upward by 5%, yielding a three-year, risk-adjusted total PV (discounted at 10%) of $381,000.
Transition to Amazon SageMaker HyperPod
| Ref. | Metric | Source | Initial | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|---|---|
| G1 | Installation and setup (FTE) | Interviews | 0.25 | | | |
| G2 | Ongoing maintenance (FTE) | Interviews | | 0.5 | 0.5 | 0.5 |
| G3 | Fully burdened annual salary for FTE maintaining Amazon SageMaker HyperPod | Interviews | $243,000 | $243,000 | $243,000 | $243,000 |
| Gt | Installation and maintenance cost | (G1+G2)*G3 | $60,750 | $121,500 | $121,500 | $121,500 |
| | Risk adjustment | ↑5% | | | | |
| Gtr | Installation and maintenance cost (risk-adjusted) | | $63,788 | $127,575 | $127,575 | $127,575 |
| | Three-year total: $446,513 | | | Three-year present value: $381,048 | | |
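The transition cost follows the same pattern, with the setup effort falling in the initial period and maintenance in each later year. The sketch below is illustrative only: the variable names are hypothetical, and it assumes the initial cost is undiscounted while Years 1 to 3 are discounted at 10%.

```python
# Inputs copied from rows G1, G2, and G3 of the table.
setup_fte = 0.25            # G1, initial period only
maintenance_fte = 0.5       # G2, Years 1-3
annual_salary = 243_000     # G3, fully burdened
risk_adjustment = 0.05      # costs adjusted upward by 5%
discount_rate = 0.10

# Gtr: risk-adjusted cost for Initial, Year 1, Year 2, Year 3.
initial_cost = setup_fte * annual_salary * (1 + risk_adjustment)
yearly_cost = maintenance_fte * annual_salary * (1 + risk_adjustment)
cash_flows = [initial_cost, yearly_cost, yearly_cost, yearly_cost]

# Index 0 is "time 0", so it is divided by (1 + rate) ** 0 = 1.
transition_total = sum(cash_flows)
transition_pv = sum(cf / (1 + discount_rate) ** t
                    for t, cf in enumerate(cash_flows))

print(f"Three-year total: ${transition_total:,.2f}")
print(f"Three-year present value: ${transition_pv:,.2f}")
```

The table reports these figures rounded to whole dollars, so the printed values can differ from it by less than a dollar.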
| | Initial | Year 1 | Year 2 | Year 3 | Total | Present Value |
|---|---|---|---|---|---|---|
| Total costs | ($63,788) | ($3,869,775) | ($4,618,215) | ($5,553,135) | ($14,104,913) | ($11,570,624) |
| Total benefits | $0 | $10,768,281 | $12,921,937 | $15,583,577 | $39,273,795 | $32,176,804 |
| Net benefits | ($63,788) | $6,898,506 | $8,303,722 | $10,030,442 | $25,168,882 | $20,606,180 |
| ROI | | | | | | 178% |
| Payback | | | | | | <6 months |
The financial results calculated in the Benefits and Costs sections can be used to determine the ROI, NPV, and payback period for the composite organization’s investment. Forrester assumes a yearly discount rate of 10% for this analysis.
These risk-adjusted ROI, NPV, and payback period values are determined by applying risk-adjustment factors to the unadjusted results in each Benefit and Cost section.
The initial investment column contains costs incurred at “time 0” or at the beginning of Year 1 that are not discounted. All other cash flows are discounted using the discount rate at the end of the year. PVs are calculated for each total cost and benefit estimate. NPV calculations in the summary tables are the sum of the initial investment and the discounted cash flows in each year. Sums and present value calculations of the Total Benefits, Total Costs, and Cash Flow tables may not exactly add up, as some rounding may occur.
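The discounting convention described above can be applied to the risk-adjusted totals in the cash flow table to recover the NPV and ROI. This is a minimal sketch with hypothetical names, not Forrester's model. The payback estimate additionally assumes that Year 1 net benefits accrue evenly by month, which the study does not state. Because the inputs are already rounded to whole dollars, results can differ from the table by a dollar or two.

```python
# Risk-adjusted cash flows from the summary table: Initial, Year 1, Year 2, Year 3.
costs = [63_788, 3_869_775, 4_618_215, 5_553_135]
benefits = [0, 10_768_281, 12_921_937, 15_583_577]
discount_rate = 0.10


def present_value(flows, rate):
    """Initial (index 0) is undiscounted; Year t is divided by (1 + rate) ** t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))


pv_costs = present_value(costs, discount_rate)
pv_benefits = present_value(benefits, discount_rate)

npv = pv_benefits - pv_costs    # net benefits in present-value terms
roi = npv / pv_costs            # net benefits divided by costs

# Assumption (not from the study): Year 1 net benefits accrue evenly by month.
monthly_net_year1 = (benefits[1] - costs[1]) / 12
payback_months = costs[0] / monthly_net_year1

print(f"PV of costs:    ${pv_costs:,.0f}")
print(f"PV of benefits: ${pv_benefits:,.0f}")
print(f"NPV:            ${npv:,.0f}")
print(f"ROI:            {roi:.0%}")
print(f"Payback:        {payback_months:.2f} months")
```

Under the even-accrual assumption, the payback point falls well inside the first six months, which is consistent with the reported payback of under six months.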
From the information provided in the interviews, Forrester constructed a Total Economic Impact™ framework for those organizations considering an investment in SageMaker HyperPod.
The objective of the framework is to identify the cost, benefit, flexibility, and risk factors that affect the investment decision. Forrester took a multistep approach to evaluate the impact that Amazon SageMaker HyperPod can have on an organization.
Interviewed Amazon stakeholders and Forrester analysts to gather data relative to SageMaker HyperPod.
Interviewed four decision-makers at organizations using SageMaker HyperPod to obtain data about costs, benefits, and risks.
Designed a composite organization based on characteristics of the interviewees’ organizations.
Constructed a financial model representative of the interviews using the TEI methodology and risk-adjusted the financial model based on issues and concerns of the interviewees.
Employed four fundamental elements of TEI in modeling the investment impact: benefits, costs, flexibility, and risks. Given the increasing sophistication of ROI analyses related to IT investments, Forrester’s TEI methodology provides a complete picture of the total economic impact of purchase decisions. Please see Appendix A for additional information on the TEI methodology.
Benefits represent the value the solution delivers to the business. The TEI methodology places equal weight on the measure of benefits and costs, allowing for a full examination of the solution’s effect on the entire organization.
Costs comprise all expenses necessary to deliver the proposed value, or benefits, of the solution. The methodology captures implementation and ongoing costs associated with the solution.
Flexibility represents the strategic value that can be obtained for some future additional investment building on top of the initial investment already made. The ability to capture that benefit has a PV that can be estimated.
Risks measure the uncertainty of benefit and cost estimates given: 1) the likelihood that estimates will meet original projections and 2) the likelihood that estimates will be tracked over time. TEI risk factors are based on “triangular distribution.”
Present value (PV): The present or current value of (discounted) cost and benefit estimates given at the cost of capital (the discount rate). The PV of costs and benefits feeds into the total NPV of cash flows.
Net present value (NPV): The present or current value of (discounted) future net cash flows given the cost of capital (the discount rate). A positive project NPV normally indicates that the investment should be made unless other projects have higher NPVs.
Return on investment (ROI): A project’s expected return in percentage terms. ROI is calculated by dividing net benefits (benefits less costs) by costs.
Discount rate: The weighted average cost of capital used in cash flow analysis to take into account the time value of money. Organizations typically use discount rates between 8% and 16%.
Payback period: The breakeven point for an investment. This is the point in time at which net benefits (benefits minus costs) equal initial investment or cost.
1 Total Economic Impact is a methodology developed by Forrester Research that enhances a company’s technology decision-making processes and assists solution providers in communicating their value proposition to clients. The TEI methodology helps companies demonstrate, justify, and realize the tangible value of business and technology initiatives to both senior management and other key stakeholders.
Readers should be aware of the following:
This study is commissioned by Amazon and delivered by Forrester Consulting. It is not meant to be used as a competitive analysis.
Forrester makes no assumptions as to the potential ROI that other organizations will receive. Forrester strongly advises that readers use their own estimates within the framework provided in the study to determine the appropriateness of an investment in SageMaker HyperPod. For any interactive functionality, the intent is for the questions to solicit inputs specific to a prospect's business. Forrester believes that this analysis is representative of what companies may achieve with SageMaker HyperPod based on the inputs provided and any assumptions made. Forrester does not endorse Amazon or its offerings. Although great care has been taken to ensure the accuracy and completeness of this model, Amazon and Forrester Research are unable to accept any legal responsibility for any actions taken on the basis of the information contained herein. The interactive tool is provided ‘AS IS,’ and Forrester and Amazon make no warranties of any kind.
Amazon reviewed and provided feedback to Forrester, but Forrester maintains editorial control over the study and its findings and does not accept changes to the study that contradict Forrester’s findings or obscure the meaning of the study.
Amazon provided the customer names for the interviews but did not participate in the interviews.
December 2025