
Building AI-Ready Infrastructure: How to create a balanced approach for success
Artificial intelligence (AI) has established itself as a cornerstone of today's business landscape, driving efficiency, cost savings, and new revenue for organisations around the world. Yet the successful deployment and operation of AI systems hinge on the underlying infrastructure, which is often the least understood but most crucial component of the AI stack.
According to Vanguard, infrastructure is one of the toughest challenges for IT oversight and is the most commonly identified factor contributing to AI project abandonment among organisations. There is no one-size-fits-all approach. The pressure on IT managers today is immense as they navigate conflicting considerations from various stakeholders. At the same time, the demand for more enterprise applications continues unabated: everything from traditional online transaction processing systems to highly interactive cloud-native applications is processing more data and demanding more CPU compute power.
Against this backdrop, the criticality of AI adoption across an organisation demands a collaborative and holistic approach to planning – specifically, the focus should be on whether the existing infrastructure is fit for purpose and whether upgrading provides a meaningful return on investment. In doing so, organisations can avoid overextending their limited resources and, instead, channel their efforts in a strategic manner that realises the full potential of AI.
Why AI needs to be on the table
AI does not comprise a single workload or use case; it encompasses a range of tasks, from routine inferencing to complex, data-intensive model training. It has become a vital tool for many organisations across industries, driving innovation, efficiency, and competitive advantage.
Ways in which AI can redefine operations include:
- Enriching decision-making with advanced analytics and insights and augmenting human capabilities, allowing employees to focus on higher-value tasks.
- Improving customer experience, responsiveness, and accuracy with interpretive AI systems, chatbots, virtual assistants, and personalised recommendations.
- Enabling the development of new products and services by leveraging data insights and advanced algorithms, as well as accelerating research and time-to-insights using generative AI systems.
- Bolstering risk management and mitigation by analysing patterns and anomalies in data alongside improving detection of fraud and cyberthreats using machine learning (ML) systems in use cases such as personalisation and pricing optimisation engines.
This wide range of AI applications calls for varying infrastructure setups, making it essential for enterprise architecture teams to adopt a balanced approach that is customised for a specific purpose.
Furthermore, AI infrastructure requirements are starting to exceed organisations' ability to service the wide array of AI projects and capabilities being deployed to production environments. Targeted use cases for AI are diverse, and many organisations are already making use of hundreds of models. Vanguard surveyed organisations with AI in production and found a median of 125 models in use and more than a petabyte of data required to train those models in aggregate – and most expect workload requirements to increase. In this environment of AI workload expansion, infrastructure is emerging as a critical bottleneck.
Infrastructure is crucial for successful AI implementation
Essential ingredients for supporting AI include high-powered computing, efficient data handling, and reliable networking. But not every AI workload demands the same level of resources. Oftentimes, general-purpose processors (CPUs) can manage smaller AI workloads, while more specialised applications – like large-scale training models – require advanced accelerators (e.g., GPUs).
As a first step, IT leaders should consider the following points before initiating planning and building AI-ready infrastructure:
- Assess Specific AI Requirements: Enterprise architecture teams should evaluate the specific AI use cases their business requires.
- Balance CPUs and GPUs: Create a balanced ecosystem of CPUs and GPUs designed to match the correct infrastructure with the workload.
- Prioritise Data Security and Privacy: Infrastructure and Operations (I&O) teams should consider the implementation of "private AI," running AI workloads on premises to help safeguard sensitive data.
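The fit-for-purpose logic behind the first two points can be sketched in code. The following is a minimal, hypothetical illustration – the workload fields and the thresholds for routing a job to a CPU or GPU pool are invented for clarity, not a production sizing method:

```python
# Hypothetical sketch: routing AI workloads to a CPU or GPU pool based on
# simple workload characteristics. Thresholds and fields are illustrative
# assumptions, not real sizing guidance.
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    kind: str              # "inference" or "training"
    model_params_b: float  # model size in billions of parameters
    latency_sensitive: bool


def choose_pool(w: Workload) -> str:
    """Return 'cpu' or 'gpu' for a workload using rough rules of thumb."""
    # Large-scale model training almost always benefits from accelerators.
    if w.kind == "training" and w.model_params_b >= 1.0:
        return "gpu"
    # Small, routine inference often fits on general-purpose CPUs.
    if w.kind == "inference" and w.model_params_b < 10.0:
        return "cpu"
    # Everything else defaults to the accelerator pool.
    return "gpu"


jobs = [
    Workload("fraud-scoring", "inference", 0.3, True),
    Workload("llm-finetune", "training", 70.0, False),
]
for j in jobs:
    print(f"{j.name}: {choose_pool(j)} pool")
```

In practice the decision would weigh many more factors (batch size, latency targets, memory footprint), but the structure – classify the workload, then match it to the cheapest resource that meets its needs – is the same.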
As AI workloads continue to proliferate, businesses need to emphasise the need for cost-effective infrastructure strategies. Data centres operating AI workloads consume a substantial amount of energy. Enterprise architecture teams should select energy-efficient processors, invest in cooling solutions, and implement sustainable practices to help manage operational costs.
A robust AI infrastructure needs visibility into compute, storage, and networking resources. I&O teams should equip data centres with observability tools to help the business understand usage patterns and help ensure the infrastructure can scale as AI demands grow.
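As a rough illustration of what such observability tooling does with the telemetry it collects, the sketch below summarises utilisation samples and flags resources under capacity pressure. The sample data and the 80% threshold are assumptions chosen for the example:

```python
# Hypothetical sketch: summarising utilisation samples from compute,
# storage, and network telemetry to spot capacity pressure. The sample
# values and the 80% average threshold are illustrative assumptions.
from statistics import mean

samples = {  # % utilisation samples per resource, e.g. from a metrics agent
    "gpu_compute": [72, 91, 88, 95, 84],
    "storage_iops": [40, 45, 38, 50, 47],
    "network": [60, 64, 58, 70, 66],
}


def capacity_report(samples, threshold=80.0):
    """Compute average/peak utilisation and a simple scale-out flag."""
    report = {}
    for resource, values in samples.items():
        avg, peak = mean(values), max(values)
        report[resource] = {
            "avg": avg,
            "peak": peak,
            # Flag sustained high average load or near-saturation peaks.
            "scale_soon": avg > threshold or peak > 95.0,
        }
    return report


for resource, stats in capacity_report(samples).items():
    flag = "SCALE" if stats["scale_soon"] else "ok"
    print(f"{resource}: avg={stats['avg']:.0f}% peak={stats['peak']:.0f}% [{flag}]")
```

Real observability stacks add time-series retention, alerting, and correlation across layers, but the underlying question is the one posed here: where is utilisation trending towards the point where the infrastructure must scale?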
The cornerstones of an AI-ready infrastructure
Enterprises should take a pragmatic approach to creating an infrastructure environment that fits the evolving needs of their AI workloads by considering the following three-pillar framework designed to enhance data centre efficiency and performance without the need for extensive new infrastructure:
- Modernise: Replace outdated servers with newer, more efficient systems to maximise space and energy savings. For instance, the new "Zen 5" core architecture provides up to 17% better instructions per clock (IPC) for enterprise and cloud workloads and up to 37% higher IPC in AI and high-performance computing (HPC) compared to "Zen 4."
- Utilise a Hybrid Cloud Strategy: For workloads that vary in intensity and scale, virtualised and containerised environments provide a flexible solution. By leveraging both private and hybrid cloud strategies, enterprises can scale AI applications while avoiding unnecessary resource allocation.
- Invest in Balanced Accelerator Resources: Organisations should right-size their investments in coprocessors (GPUs) to match specific workload needs. Pairing accelerators with capable CPUs helps ensure maximum performance without breaking the bank.
This further underscores the importance of selecting the appropriate CPUs and GPUs to service AI workloads. To put this into perspective using SPECrate®2017_int_base general-purpose computing performance as a benchmark, a company that selects AMD's latest 5th Gen EPYC processors and Instinct accelerators to modernise its data centre would gain the ability to use an estimated 71% less power and ~87% fewer servers compared to continuing with older processors from competitors. This gives CIOs the flexibility to either benefit from the space and power savings or add performance for day-to-day IT tasks while delivering impressive AI performance.
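The arithmetic behind consolidation claims of this kind is straightforward to reproduce in outline. The sketch below shows the general method – size old and new server counts against a target aggregate benchmark score, then compare power draw. The per-server scores and wattages are invented for illustration and are not AMD's or any vendor's published figures:

```python
# Hypothetical sketch of the consolidation arithmetic behind claims like
# "X% fewer servers, Y% less power": given a target aggregate benchmark
# score, compare server counts and power for a legacy and a modern
# platform. All per-server numbers below are invented for illustration.
import math


def consolidation(target_score, old_score, old_watts, new_score, new_watts):
    """Estimate server-count and power reductions for a given throughput target."""
    old_n = math.ceil(target_score / old_score)  # legacy servers needed
    new_n = math.ceil(target_score / new_score)  # modern servers needed
    return {
        "old_servers": old_n,
        "new_servers": new_n,
        "server_reduction_pct": 100 * (1 - new_n / old_n),
        "power_reduction_pct": 100 * (1 - (new_n * new_watts) / (old_n * old_watts)),
    }


# Illustrative inputs: 1,000 units of aggregate benchmark throughput needed.
result = consolidation(target_score=1000,
                       old_score=10, old_watts=500,   # assumed legacy server
                       new_score=80, new_watts=900)   # assumed modern server
print(result)
```

With these made-up inputs, 100 legacy servers collapse to 13 modern ones. The exact percentages in any vendor claim depend on which generations, SKUs, and benchmark scores are compared, which is why the fine print matters.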
Ultimately, CIOs must ask their teams to take a pragmatic approach. It requires acknowledging that AI is not a singular entity. AI workloads and use cases are as diverse as they get: a combination of standalone workloads (both large and small), use cases, and functions within other workloads.
The best way to effectively manage the AI workload spread is to take a fit-for-purpose approach that relies on processors and accelerators, with the choice depending on the specific requirements of the tasks. The path to AI-readiness requires thoughtful planning and strategic investment. By educating themselves and their stakeholders, IT organisations can make informed decisions about AI infrastructure, enabling them to choose the right mix of technologies to meet the specific needs of the AI workload.