Reprogramming Discovery: How AMD GPUs Are Powering the Next Wave of AI-Driven Biology
A contribution from IPA Therapeutics
Part 2: Improved Biological Analysis using Protein Embeddings
This is the second installment of a benchmarking series that compares the AMD Instinct MI300X and NVIDIA H100 GPUs across important AI tasks in drug development.
Part 1 looked at NLP-based knowledge extraction for biomedical research. This installment turns to the protein layer and assesses how well both GPUs handle large-scale protein language models (pLLMs), which are used to understand structure, function, and mutational effects.
ImmunoPrecise Antibodies (IPA) and its AI subsidiary BioStrand, creators of LENSai, an AI-native platform that combines sequencing, structure, and functional reasoning to propel biologics discovery, carried out these benchmarks. Vultr’s high-performance cloud infrastructure was used for all tests, allowing for repeatable, side-by-side comparisons in a setting fit for production.
LENSai’s HYFT technology, a biological fingerprinting method, encodes conserved sequence, structure, and function into a single index. HYFTs were designed to address a fundamental problem: AI does not natively understand biological systems. By building biological reasoning into the computational fabric, HYFTs let AI models reason through biology rather than merely compute over it.
This installment concentrates on protein embeddings, which serve as the basis for understanding binding interactions, mutation effects, and molecular function. These embeddings power everything from structural modelling to therapeutic target prioritisation. By comparing several versions of the ESM-2 model and examining the use of “anchored embeddings” in LENSai with HYFT technology, it shows how AMD GPUs perform in a field where memory capacity and biological accuracy are crucial.
Protein language models (pLLMs) encode amino acid sequences as compact vectors that capture functional and evolutionary information, which greatly improves biological data processing.
They let researchers ask questions such as: How closely does this sequence resemble known druggable targets? Which structure is this uncharacterised protein most likely to adopt? What effect might a mutation have on binding or function? By compressing this information into a form that machine learning models can use, embeddings enable advances in immunogenicity screening, antibody identification, and multi-omics interpretation.
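The first question above, how closely a sequence resembles known targets, is commonly answered by comparing embedding vectors. The following is a minimal sketch of that idea using cosine similarity. The four-dimensional vectors and the names `target_A`, `target_B`, and `target_C` are hypothetical stand-ins; real ESM-2 embeddings have hundreds or thousands of dimensions, and the benchmarked pipeline may compare them differently.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real protein embeddings (hypothetical values).
library = {
    "target_A": [0.9, 0.1, 0.0, 0.2],
    "target_B": [0.1, 0.8, 0.3, 0.0],
    "target_C": [0.85, 0.15, 0.05, 0.25],
}
query = [1.0, 0.1, 0.0, 0.2]

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Cosine similarity ignores vector length and compares direction only, which is one common choice for embedding search; other distance measures would slot into `rank_by_similarity` the same way.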
ESM-2 Protein Language Model Benchmarks
Throughput and scaling on both GPUs were measured with ESM-2 benchmarks at three model sizes:
Model Size | NVIDIA H100 (seq/sec) | AMD Instinct MI300X (seq/sec) | Cost Reduction |
---|---|---|---|
Small (35M) | 2482.41 | 3413.15 | ~44% |
Medium (650M) | 368.04 | 637.94 | ~63% |
Large (3B) | 111.19 | 178.76 | ~55% |
The AMD GPUs handled larger batches effectively, which resulted in smoother throughput and notable cost reductions.
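To make the throughput gap concrete, the short sketch below derives the MI300X-to-H100 speedup for each model size from the figures in the table. It covers throughput only. The Cost Reduction column also depends on pricing that is not broken out here, so these ratios are not expected to match that column.

```python
# Throughput in sequences per second, copied from the benchmark table.
throughput = {
    "Small (35M)": {"h100": 2482.41, "mi300x": 3413.15},
    "Medium (650M)": {"h100": 368.04, "mi300x": 637.94},
    "Large (3B)": {"h100": 111.19, "mi300x": 178.76},
}

def speedup(row):
    """Ratio of MI300X throughput to H100 throughput for one model size."""
    return row["mi300x"] / row["h100"]

speedups = {model: round(speedup(row), 2) for model, row in throughput.items()}

for model, ratio in speedups.items():
    print(f"{model}: {ratio:.2f}x")
# Small (35M): 1.37x
# Medium (650M): 1.73x
# Large (3B): 1.61x
```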
Combined Drug Discovery Using HYFT and LENSai
LENSai uses “HYFT anchored embeddings,” which reduce noise and sharpen biological signals by selecting the embeddings of residues that fall within conserved HYFT patterns.
HYFTs are biologically meaningful sub-sequence units that represent conserved motifs associated with structure or function. By anchoring embeddings to HYFTs, LENSai filters out superfluous noise and concentrates on the most functionally informative parts of the sequence.
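The idea of restricting an embedding to selected residues can be illustrated with a toy sketch. This is not the LENSai implementation, which is not described here. It assumes that per-residue embeddings are already available, that conserved spans are given as half-open `(start, end)` index pairs, and that the selected residues are combined by mean pooling. The function names and all values are hypothetical.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("cannot pool an empty selection")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def anchored_embedding(residue_embeddings, anchor_spans):
    """Pool only the residues inside the given half-open (start, end) spans."""
    selected = []
    for start, end in anchor_spans:
        selected.extend(residue_embeddings[start:end])
    return mean_pool(selected)

# Toy per-residue embeddings for a 6-residue sequence, 2 dimensions each.
residue_embeddings = [
    [0.0, 1.0], [2.0, 0.0], [4.0, 0.0],
    [9.0, 9.0], [9.0, 9.0], [6.0, 2.0],
]
# Hypothetical conserved spans: residues 1-2 and residue 5 (0-based, end exclusive).
anchor_spans = [(1, 3), (5, 6)]

anchored = anchored_embedding(residue_embeddings, anchor_spans)
full = mean_pool(residue_embeddings)

print("anchored:", anchored)
print("full sequence:", full)
```

In this toy case the two residues outside the spans dominate the full-sequence average, while the anchored vector reflects only the selected positions, which is the noise-reduction effect the text describes.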
HYFT-based embeddings support:
- Precise estimation of the effects of structural mutations.
- Identification of conserved motifs.
- Efficient semantic search across libraries of candidate therapeutics.
Technical Execution: Smooth Switch to AMD Graphics Processors
To keep disruption minimal, moving the protein embedding workload to AMD GPUs required only a single-line Dockerfile update:
FROM rocm/pytorch:rocm6.3.1_ubuntu22.04_py3.10_pytorch
No significant changes to the code are needed.
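One reason such a switch can leave application code untouched is that ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` interface used for NVIDIA devices. The sketch below shows a device check that behaves the same on either vendor. The helper name `describe_backend` is hypothetical, and the function falls back to CPU when PyTorch is not installed, so it can be run anywhere.

```python
import importlib.util

def describe_backend():
    """Return (device, backend) for the current environment."""
    if importlib.util.find_spec("torch") is None:
        return "cpu", "none"  # PyTorch is not installed here
    import torch
    if not torch.cuda.is_available():
        return "cpu", "none"  # no GPU visible to PyTorch
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    backend = "rocm" if torch.version.hip else "cuda"
    return "cuda", backend

device, backend = describe_backend()
print(f"device={device} backend={backend}")
```

Model code that moves tensors with `.to(device)` then needs no vendor-specific branch, since the device string is `"cuda"` in both cases.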
Protein embeddings are the link between raw sequences and functional interpretation. The testing shows that the AMD MI300X has the memory headroom and performance required for the most sophisticated protein models available today.
The final piece of this series pushes to the limits of AI-driven design itself by benchmarking RFdiffusion, a generative model that can design entirely new proteins from scratch.
AMD Instinct MI300X Price
Category | Details | Price |
---|---|---|
Retail Estimate | General estimated market price | $10,000 – $15,000 |
Bulk Buyer (Microsoft) | Large volume procurement | ~$10,000 per unit |
Vendor System Price | MI300X platform system | Starting at $25,999 |
Vendor System Price | 8-GPU MI300X full system | $230,343.50 |
Cloud / Rental (Hourly) | On-demand usage | $2.49/hour |
Cloud / Rental (Hourly) | On-demand | $3.00/hour |
Cloud / Rental (Hourly) | Month-to-month | $2.75/hour |
Cloud / Rental (Hourly) | 6-month commitment | $2.50/hour |
Cloud / Rental (Hourly) | 1-year commitment | $2.00/hour |
Cloud / Rental (Monthly) | Monthly rental (no contract) | $18,000/month |
Cloud / Rental (Monthly) | 6-month contract | $16,000/month |
Cloud / Rental (Monthly) | 12-month contract | $14,000/month |
Competitor (for context) | Market price range | $25,000 – $40,000+ |
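Hourly prices become easier to compare when converted into a cost per unit of work. The sketch below estimates the cost of embedding one million sequences on a single MI300X by combining the ESM-2 throughput figures with the $2.49/hour on-demand rate. It assumes that rate applies to one GPU and that throughput stays constant for the whole run, and neither assumption is confirmed by the listed prices.

```python
def cost_per_million(seq_per_sec, usd_per_hour):
    """Estimated USD cost to process one million sequences at a steady rate."""
    hours = 1_000_000 / seq_per_sec / 3600
    return hours * usd_per_hour

# MI300X throughput (sequences per second) from the ESM-2 benchmark table.
mi300x_throughput = {
    "Small (35M)": 3413.15,
    "Medium (650M)": 637.94,
    "Large (3B)": 178.76,
}
hourly_rate = 2.49  # on-demand rate from the price table, assumed per GPU

costs = {
    model: cost_per_million(rate, hourly_rate)
    for model, rate in mi300x_throughput.items()
}

for model, usd in costs.items():
    print(f"{model}: ${usd:.2f} per million sequences")
```

Swapping in a different hourly rate from the table, or another GPU's throughput, gives a like-for-like comparison under the same assumptions.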
Conclusion
AMD’s MI300X GPUs are a strong fit for AI-driven biological research. In the ESM-2 benchmarks above, the MI300X delivered higher throughput than the H100 at every model size tested. Its large memory capacity and high bandwidth let researchers process complex biological data more efficiently, which can accelerate discoveries in genomics and drug development and supports more advanced AI models and analytics in computational biology.