Amazon EC2 Trn2 instances and UltraServers

Most powerful EC2 compute for generative AI training and inference

Why Amazon EC2 Trn2 instances and UltraServers?

Amazon EC2 Trn2 instances, powered by 16 AWS Trainium2 chips, are purpose-built for generative AI and are the most powerful EC2 instances for training and deploying models with hundreds of billions to more than a trillion parameters. Trn2 instances offer 30-40% better price performance than the current generation of GPU-based EC2 P5e and P5en instances. With Trn2 instances, you get state-of-the-art training and inference performance at lower cost, so you can reduce training times, iterate faster, and deliver real-time, AI-powered experiences. You can use Trn2 instances to train and deploy models, including large language models (LLMs), multimodal models, and diffusion transformers, to build next-generation generative AI applications.

To lower training times and deliver breakthrough response times (per-token latency) for the most demanding, state-of-the-art models, you might need more compute and memory than a single instance can deliver. Trn2 UltraServers use NeuronLink, our proprietary chip-to-chip interconnect, to connect 64 Trainium2 chips across four Trn2 instances, quadrupling the compute, memory, and networking bandwidth available in a single node and offering breakthrough performance on AWS for deep learning and generative AI workloads. For inference, UltraServers help deliver industry-leading response times to create the best real-time experiences. For training, UltraServers boost model training speed and efficiency with faster collective communication for model parallelism compared to standalone instances.

You can easily get started on Trn2 instances and Trn2 UltraServers with native support for popular machine learning (ML) frameworks such as PyTorch and JAX.
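
For example, a single training step on Trainium follows the standard PyTorch/XLA pattern. The sketch below is a minimal illustration, assuming a Trn2 instance with the torch-neuronx package installed; the model, shapes, and hyperparameters are placeholders, not a definitive recipe.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # installed with torch-neuronx on Trn2

# The XLA device maps onto the NeuronCores of the Trainium2 chips.
device = xm.xla_device()

# Toy model and data; a real workload would bring its own model and loader.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 8)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(32, 1024, device=device)
labels = torch.randint(0, 8, (32,), device=device)

optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()
xm.optimizer_step(optimizer)  # steps the optimizer; syncs gradients in distributed runs
print(loss.item())            # materializing the loss triggers XLA graph execution
```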

Benefits

Trn2 instances are the most powerful EC2 instances and help you reduce your training times and deliver real-time inference experiences to your end users. Trn2 instances feature 16 Trainium2 chips interconnected with NeuronLink, our proprietary chip-to-chip interconnect, to deliver up to 20.8 petaflops of FP8 compute. Trn2 instances have a total of 1.5 TB of HBM3 memory with 46 terabytes per second (TBps) of memory bandwidth and 3.2 terabits per second (Tbps) of Elastic Fabric Adapter (EFAv3) networking. Trn2 UltraServers (available in preview) have 64 Trainium2 chips connected with NeuronLink and deliver up to 83.2 petaflops of FP8 compute, 6 TB of total high bandwidth memory with 185 TBps of total memory bandwidth, and 12.8 Tbps of EFAv3 networking.

To enable efficient distributed training, Trn2 instances deliver 3.2 Tbps and Trn2 UltraServers deliver 12.8 Tbps of EFAv3 networking. EFA is built on the AWS Nitro System, which means all communication through EFA is encrypted in transit without incurring any performance penalty. EFA also uses a sophisticated traffic-routing and congestion-control protocol that allows it to reliably scale to hundreds of thousands of Trainium2 chips. Trn2 instances and UltraServers are deployed in EC2 UltraClusters to enable scale-out distributed training across tens of thousands of Trainium chips on a single petabit-scale, nonblocking network.

Trn2 instances offer 30-40% better price performance than the current generation of GPU-based EC2 P5e and P5en instances.

Trn2 instances are 3x more energy efficient than Trn1 instances. These instances and their underlying chips use advanced silicon processes as well as hardware and software optimizations to deliver high energy efficiency when running generative AI workloads at scale.

The AWS Neuron SDK helps you extract full performance from Trn2 instances and UltraServers, allowing you to focus on building and deploying models and accelerating your time to market. Neuron integrates natively with JAX, PyTorch, and essential libraries like Hugging Face, PyTorch Lightning, and NeMo. Neuron includes out-of-the-box optimizations for distributed training and inference with the open-source PyTorch libraries NxD Training and NxD Inference, while providing deep insights for profiling and debugging. Neuron also supports OpenXLA, including StableHLO and GSPMD, enabling PyTorch/XLA and JAX developers to use Neuron's compiler optimizations for Inferentia and Trainium. Neuron enables you to use Trn2 instances with services such as Amazon SageMaker, Amazon EKS, Amazon ECS, AWS ParallelCluster, and AWS Batch, as well as third-party services like Ray (Anyscale), Domino Data Lab, and Datadog.
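
To make that workflow concrete, the sketch below compiles a PyTorch model ahead of time with torch_neuronx.trace from the Neuron SDK. The toy model and shapes are placeholders, and exact arguments may vary across Neuron releases, so treat this as an illustration rather than a definitive implementation.

```python
import torch
import torch.nn as nn
import torch_neuronx  # PyTorch integration from the AWS Neuron SDK

# Any eval-mode PyTorch model; this toy MLP stands in for a real network.
model = nn.Sequential(nn.Linear(128, 256), nn.GELU(), nn.Linear(256, 10)).eval()
example = torch.rand(1, 128)

# Ahead-of-time compile the model for NeuronCores via the Neuron compiler.
neuron_model = torch_neuronx.trace(model, example)

# The result is a TorchScript module: save it, reload it, and run as usual.
torch.jit.save(neuron_model, "model_neuron.pt")
restored = torch.jit.load("model_neuron.pt")
print(restored(example).shape)
```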

Features

Trn2 instances feature 16 Trainium2 chips interconnected with NeuronLink to deliver up to 20.8 petaflops of FP8 compute. Trn2 UltraServers extend NeuronLink connectivity to 64 Trainium2 chips across four Trn2 instances to deliver up to 83.2 petaflops of FP8 compute.

Trn2 instances deliver 1.5 TB of accelerator memory with 46 TBps of total memory bandwidth. Trn2 UltraServers offer 6 TB of shared accelerator memory with 185 TBps of total memory bandwidth to accommodate ultra-large foundation models.

To support scale-out distributed training of ultra-large foundation models, Trn2 instances deliver 3.2 Tbps and Trn2 UltraServers deliver 12.8 Tbps of EFAv3 networking bandwidth. When combined with EC2 UltraClusters, EFAv3 delivers lower network latency compared to EFAv2. Each Trn2 instance supports up to 8 TB and each Trn2 UltraServer supports up to 32 TB of local NVMe storage for faster access to large datasets.

Trn2 instances and UltraServers support the FP32, TF32, BF16, FP16, and configurable FP8 (cFP8) data types. They also support cutting-edge AI optimizations, including 4x sparsity (16:4), stochastic rounding, and dedicated collective engines. The Neuron Kernel Interface (NKI) enables direct access to the instruction set architecture (ISA) through a Python-based environment with a Triton-like interface, allowing you to innovate with new model architectures and highly optimized compute kernels that outperform existing techniques.
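
To give a feel for NKI, here is a minimal element-wise kernel modeled on the published getting-started pattern. Module paths and the jit decorator differ across Neuron SDK releases, so treat this as a hedged sketch rather than a definitive implementation.

```python
from neuronxcc import nki          # NKI entry point (path may vary by release)
import neuronxcc.nki.language as nl

@nki.jit
def tensor_add_kernel(a_input, b_input):
    # Allocate the output tensor in shared HBM (device memory).
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)
    # Load input tiles from HBM into on-chip memory; real kernels tile
    # larger tensors in a loop because a tile spans at most 128 partitions.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    # The addition runs on the on-chip compute engines; store writes back to HBM.
    nl.store(c_output, value=a_tile + b_tile)
    return c_output
```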

Neuron supports over 100,000 models on the Hugging Face model hub for training and deployment on Trn2, including popular model architectures such as Llama and Stable Diffusion. Neuron integrates natively with JAX, PyTorch, and essential tools, frameworks, and libraries such as NeMo, Hugging Face, PyTorch Lightning, Ray, Domino Data Lab, and Datadog. It optimizes models out of the box for distributed training and inference, while providing deep insights for profiling and debugging. Neuron also integrates with services such as Amazon SageMaker, Amazon EKS, Amazon ECS, AWS ParallelCluster, and AWS Batch.
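
As one illustration of the Hugging Face path, the Optimum Neuron library can compile and run hub checkpoints on Trainium. In the sketch below, the model ID and compilation arguments are illustrative assumptions, not a tested configuration.

```python
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.1-8B"  # illustrative; any supported causal LM

# export=True compiles the checkpoint for NeuronCores on first load;
# batch size and sequence length are fixed at compile time.
model = NeuronModelForCausalLM.from_pretrained(
    MODEL_ID, export=True, batch_size=1, sequence_length=2048
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

inputs = tokenizer("Trainium2 is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```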

Customer and partner testimonials

Here are some examples of how customers and partners plan to achieve their business goals with Amazon EC2 Trn2 instances.

  • Anthropic

    At Anthropic, millions of people rely on Claude daily for their work. We're announcing two major advances with AWS: First, a new "latency-optimized mode" for Claude 3.5 Haiku which runs 60% faster on Trainium2 via Amazon Bedrock. And second, Project Rainier—a new cluster with hundreds of thousands of Trainium2 chips delivering hundreds of exaflops, which is over five times the size of our previous cluster. Project Rainier will help power both our research and our next generation of scaling. For our customers, this means more intelligence, lower prices, and faster speeds. We're not just building faster AI, we're building trustworthy AI that scales.

    Tom Brown, Chief Compute Officer at Anthropic
  • Databricks

Databricks’ Mosaic AI enables organizations to build and deploy quality Agent Systems. It is built natively on top of the data lakehouse, enabling customers to easily and securely customize their models with enterprise data and deliver more accurate and domain-specific outputs. Thanks to Trainium's high performance and cost-effectiveness, customers can scale model training on Mosaic AI at a low cost. Trainium2’s availability will be a major benefit to Databricks and its customers as demand for Mosaic AI continues to scale across all customer segments and around the world. Databricks, one of the largest data and AI companies in the world, plans to use Trn2 to deliver better results and lower TCO by up to 30% for its customers.

    Naveen Rao, VP of Generative AI at Databricks
  • poolside

At poolside, we are set to build a world where AI will drive the majority of economically valuable work and scientific progress. We believe that software development will be the first major capability in neural networks that reaches human-level intelligence because it's the domain where we can combine Search and Learning approaches the best. To enable that, we're building foundation models, an API, and an Assistant to bring the power of generative AI to your developers' hands (or keyboard). A major key to enabling this technology is the infrastructure we use to build and run our products. With AWS Trainium2, our customers will be able to scale their usage of poolside at a price-performance ratio unlike other AI accelerators. In addition, we plan to train future models with Trainium2 UltraServers, with expected savings of 40% compared to EC2 P5 instances.

    Eiso Kant, CTO & Co-founder, poolside
  • Itaú Unibanco

Itaú Unibanco's purpose is to improve people's relationship with money, creating positive impact on their lives while expanding their opportunities for transformation. At Itaú Unibanco, we believe that each customer is unique, and we focus on meeting their needs through intuitive digital journeys that leverage the power of AI to constantly adapt to their consumer habits.

We have tested AWS Trainium and Inferentia across various tasks, ranging from standard inference to fine-tuned applications. The performance of these AI chips has enabled us to achieve significant milestones in our research and development. For both batch and online inference tasks, we have seen a 7x improvement in throughput compared to GPUs. This enhanced performance is driving the expansion of more use cases across the organization. The latest generation of Trainium2 chips unlocks groundbreaking features for GenAI and opens the door for innovation at Itaú.

    Vitor Azeka, Head of Data Science at Itaú Unibanco
  • NinjaTech AI

Ninja is an All-In-One AI Agent for Unlimited Productivity: one simple subscription, unlimited access to the world’s best AI models along with top AI skills such as writing, coding, brainstorming, image generation, and online research. Ninja is an agentic platform and offers “SuperAgent,” which uses a mixture-of-agents approach with world-class accuracy comparable to (and in some categories beating) frontier foundation models. Ninja’s agentic technology demands the highest-performance accelerators to deliver the unique real-time experiences our customers expect.

We are extremely excited about the launch of AWS Trn2 because we believe it’ll offer the best cost-per-token performance and the fastest speeds currently possible for our core model, Ninja LLM, which is based on Llama 3.1 405B. It’s amazing to see Trn2’s low latency coupled with competitive pricing and on-demand availability; we couldn’t be more excited about Trn2’s arrival!

    Babak Pahlavan, Founder & CEO, NinjaTech AI
  • Ricoh

    The RICOH machine learning team develops workplace solutions and digital transformation services designed to manage and optimize the flow of information across our enterprise solutions.

The migration to Trn1 instances was easy and straightforward. We were able to pretrain our 13B-parameter LLM in just 8 days, utilizing a cluster of 4,096 Trainium chips! After the success we saw with our smaller model, we fine-tuned a new, larger LLM based on Llama-3-Swallow-70B, and by leveraging Trainium we were able to reduce our training costs by 50% and improve energy efficiency by 25% compared to using the latest GPU machines in AWS. We are excited to leverage the latest generation of AWS AI chips, Trainium2, to continue providing our customers with the best performance at the lowest cost.

    Yoshiaki Umetsu, Director, Digital Technology Development Center, Ricoh
  • Arcee AI

Arcee AI offers an enterprise-grade generative AI platform, Arcee Orchestra, which is powered by our industry-leading small language models (SLMs). Arcee Orchestra makes it easy for customers to build agentic AI workflows that automatically route tasks to specialized SLMs to deliver detailed, trustworthy responses, without any data leaving their VPC. Using AWS Trainium and Inferentia instances enables us to provide customers with unmatched cost-performance. For example, when using Inferentia2-based instances, our SuperNova-Lite 8-billion-parameter model is 32% more cost-efficient for inference workloads compared to the next best GPU-based instance, without sacrificing performance. We are excited to leverage the latest generation of AWS AI chips, Trainium2, to continue to provide our customers with the best performance at the lowest cost.

    Julien Simon, Chief Evangelist, Arcee AI
  • PyTorch

What I liked most about the AWS Neuron NxD Inference library is how seamlessly it integrates with PyTorch models. NxD's approach is straightforward and user-friendly. Our team was able to onboard Hugging Face PyTorch models with minimal code changes in a short time frame. Enabling advanced features like continuous batching and speculative decoding was straightforward. This ease of use enhances developer productivity, allowing teams to focus more on innovation and less on integration challenges.

Hamid Shojanazeri, PyTorch Partner Engineering Lead, Meta
  • Refact.ai

    Refact.ai offers comprehensive AI tools such as code auto-completion powered by Retrieval-Augmented Generation (RAG), providing more accurate suggestions, and a context-aware chat using both proprietary and open-source models.

Customers have seen up to 20% higher performance and 1.5x more tokens per dollar with EC2 Inf2 instances compared to EC2 G5 instances. Refact.ai’s fine-tuning capabilities further enhance our customers’ ability to understand and adapt to their organizations’ unique codebase and environment. We are also excited to offer the capabilities of Trainium2, which will bring even faster, more efficient processing to our workflows. This advanced technology will enable our customers to accelerate their software development process by boosting developer productivity while maintaining strict security standards for their codebase.

Oleg Klimov, CEO & Founder, Refact.ai
  • Karakuri Inc.

KARAKURI builds AI tools to improve the efficiency of web-based customer support and simplify customer experiences. These tools include AI chatbots equipped with generative AI functions, an FAQ centralization tool, and an email response tool, all of which improve the efficiency and quality of customer support. Utilizing AWS Trainium, we succeeded in training KARAKURI LM 8x7B Chat v0.1. For startups like ourselves, we need to optimize the time to build and the cost required to train LLMs. With the support of AWS Trainium and the AWS team, we were able to develop a practical-level LLM in a short period of time. Also, by adopting AWS Inferentia, we were able to build a fast and cost-effective inference service. We're energized about Trainium2 because it will revolutionize our training process, reducing our training time by 2x and driving efficiency to new heights!

    Tomofumi Nakayama, Co-Founder, Karakuri Inc.
  • ELYZA

ELYZA is a GenAI company developing large language models (LLMs), supporting the use of generative AI in companies, and providing AI SaaS. Amazon’s Inferentia2 accelerators enabled us to achieve high throughput and low latency while significantly reducing costs, which was crucial for building our LLM demo service. By combining this infrastructure with the speculative decoding technique, we successfully doubled our original inference speed. Trainium2's impressive increase in inference capabilities compared to Inferentia2 shows immense promise, and we're thrilled to see how it will drive transformative results in our work.

    Kota Kakiuchi, CTO, ELYZA
  • Stockmark Inc.

With the mission of “reinventing the mechanism of value creation and advancing humanity,” Stockmark helps many companies create and build innovative businesses by providing cutting-edge natural language processing technology. Stockmark’s new data analysis and gathering service, Anews, and SAT, a data structuring service that dramatically improves generative AI use by organizing all forms of information stored in an organization, required us to rethink how we built and deployed models to support these products. With 256 Trainium accelerators, we developed and released stockmark-13b, a large language model with 13 billion parameters, pre-trained from scratch on a Japanese corpus dataset of 220B tokens. Trn1 instances helped us reduce our training costs by 20%. Leveraging Trainium, we successfully developed an LLM that can answer business-critical questions for professionals with unprecedented accuracy and speed. This achievement is particularly noteworthy given the widespread challenge companies face in securing adequate computational resources for model development. With the impressive speed and cost reduction of Trn1 instances, we are excited to see the additional benefits that Trainium2 will bring to our workflows and customers.

    Kosuke Arima, CTO and Co-founder, Stockmark Inc.
  • Brave

Brave is an independent browser and search engine dedicated to prioritizing user privacy and security. With over 70 million users, we deliver industry-leading protections that make the Web safer and more user-friendly. Unlike other platforms that have shifted away from user-centric approaches, Brave remains committed to putting privacy, security, and convenience first. Key features include blocking harmful scripts and trackers, AI-assisted page summaries powered by LLMs, built-in VPN services, and more. We continually strive to enhance the speed and cost-efficiency of our search services and AI models. To support this, we’re excited to leverage the latest capabilities of AWS AI chips, including Trainium2, to improve user experience as we scale to handle billions of search queries monthly.

Subu Sathyanarayana, VP of Engineering, Brave Software
  • Anyscale

Anyscale is the company behind Ray, an AI compute engine that fuels ML and generative AI initiatives for enterprises. With Anyscale's unified AI platform driven by RayTurbo, customers see up to 4.5x faster data processing, 10x lower-cost batch inference with LLMs, 5x faster scaling, 12x faster iteration, and cost savings of 50% for online model inference by optimizing utilization of resources.

At Anyscale, we’re committed to empowering enterprises with the best tools to scale AI workloads efficiently and cost-effectively. With native support for AWS Trainium and Inferentia chips, powered by our RayTurbo runtime, our customers have access to high-performing, cost-effective options for model training and serving. We are now excited to join forces with AWS on Trainium2, unlocking new opportunities for our customers to innovate rapidly and deliver high-performing, transformative AI experiences at scale.

    Robert Nishihara, Cofounder, Anyscale
  • Datadog

    Datadog, the observability and security platform for cloud applications, provides AWS Trainium and Inferentia Monitoring for customers to optimize model performance, improve efficiency, and reduce costs. Datadog’s integration provides full visibility into ML operations and underlying chip performance, enabling proactive issue resolution and seamless infrastructure scaling. We're excited to extend our partnership with AWS for the AWS Trainium2 launch, which helps users cut AI infrastructure costs by up to 50% and boost model training and deployment performance.

Yrieix Garnier, VP of Product, Datadog
  • Hugging Face

    Hugging Face is the leading open platform for AI builders, with over 2 million models, datasets and AI applications shared by a community of more than 5 million researchers, data scientists, machine learning engineers and software developers. We have been collaborating with AWS over the last couple of years, making it easier for developers to experience the performance and cost benefits of AWS Inferentia and Trainium through the Optimum Neuron open source library, integrated in Hugging Face Inference Endpoints, and now optimized within our new HUGS self-deployment service, available on the AWS Marketplace. With the launch of Trainium2, our users will access even higher performance to develop and deploy models faster.

    Jeff Boudier, Head of Product, Hugging Face
  • Lightning AI

Lightning AI, the creator of PyTorch Lightning and Lightning Studios, offers the most intuitive, all-in-one AI development platform for enterprise-grade AI. Lightning provides full-code, low-code, and no-code tools to build agents, AI applications, and generative AI solutions, Lightning fast. Designed for flexibility, it runs seamlessly on your cloud or ours, leveraging the expertise and support of a 3M+ strong developer community.

Lightning now natively supports AWS AI chips, Trainium and Inferentia, which are integrated across Lightning Studios and our open-source tools like PyTorch Lightning, Fabric, and LitServe. This gives users the seamless ability to pretrain, fine-tune, and deploy at scale, optimizing cost, availability, and performance with zero switching overhead, along with the performance and cost benefits of AWS AI chips, including the latest generation of Trainium2 chips, which deliver higher performance at lower cost.

    Luca Antiga, CTO, Lightning AI
  • Domino Data Lab

Domino’s unified AI platform gives enterprise data science teams the ability to build and operate AI at scale. Leading enterprises balance technical complexity, costs, and governance, mastering expansive AI options to innovate. With AWS Trainium and Inferentia, we empower our customers to gain high performance and efficiency without compromise. And with the launch of AWS Trainium2, our customers are able to train and deploy models with higher performance and at lower cost. Domino’s support for the AWS Trainium2 launch gives our customers additional options to train and deploy models cost- and energy-efficiently.

    Nick Elprin, CEO and Co-founder, Domino Data Lab

Getting started

SageMaker support for Trn2 instances is coming soon. You will be able to easily train models on Trn2 instances by using Amazon SageMaker HyperPod, which provides a resilient compute cluster, optimized training performance, and efficient utilization of underlying compute, networking, and memory resources. You can also scale your model deployment on Trn2 instances using SageMaker to manage models more efficiently in production and reduce operational burden.

The AWS Deep Learning AMIs (DLAMI) provide deep learning (DL) practitioners and researchers with the infrastructure and tools to accelerate DL on AWS at any scale. AWS Neuron drivers come preconfigured in the DLAMI to optimally train your DL models on Trn2 instances.
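
As a quick sanity check on a Neuron DLAMI, you can enumerate the XLA devices that the preinstalled torch-neuronx stack exposes. This is a minimal sketch, assuming the DLAMI's PyTorch environment is active; the helper used here is part of torch_xla and may be renamed in newer releases.

```python
import torch_xla.core.xla_model as xm  # bundled with torch-neuronx on the DLAMI

# Each XLA device listed here corresponds to a NeuronCore on the instance.
devices = xm.get_xla_supported_devices()
print(f"Found {len(devices)} Neuron-backed XLA devices: {devices}")
```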

Deep Learning Containers support for Trn2 instances is coming soon. Using these containers, you will be able to deploy Trn2 instances in Amazon Elastic Kubernetes Service (Amazon EKS), a fully managed Kubernetes service, and in Amazon Elastic Container Service (Amazon ECS), a fully managed container orchestration service. Neuron also comes preinstalled in AWS Deep Learning Containers. To learn more about running containers on Trn2 instances, see the Neuron Containers tutorials.

Product details

Instance size  | Available in EC2 UltraServers | Trainium2 chips | Accelerator memory | vCPUs | Memory | Instance storage     | Network bandwidth (Tbps) | EBS bandwidth (Gbps)
trn2.48xlarge  | No                            | 16              | 1.5 TB             | 192   | 2 TB   | 4 x 1.92 TB NVMe SSD | 3.2                      | 80
trn2u.48xlarge | Yes (preview)                 | 16              | 1.5 TB             | 192   | 2 TB   | 4 x 1.92 TB NVMe SSD | 3.2                      | 80