Boost Renewables: Using the Intel Gaudi AI accelerator to train weather models
Weather forecasting is a difficult task, and accuracy is essential for solar and wind energy. To deliver more precise and efficient dispatch of renewable power to the grid, Amplify Renewables, a participant in the Intel Liftoff for AI Startups Catalyst Track, has upgraded its energy forecasting capabilities.
So how did they improve their forecasts?
By working with the Intel Liftoff team and the Intel Tiber AI Cloud, Intel’s public cloud solution for AI startups and enterprises, they trained large-scale weather models on Intel Gaudi 2 HPUs, improving both the speed and the accuracy of their grid forecasting.
Training Big Models with Big Data
Machine learning is changing how we forecast the weather. The next step forward for Amplify Renewables was to train their own global weather model, which meant processing terabytes of weather data from both public and private sources.
Between December 2023 and January 2024, they collaborated with the Intel Liftoff team and the Intel Tiber AI Cloud to scale up their model training. The workload comprised distributed data-parallel training across eight Gaudi 2 cards on a bare-metal system, high-volume data processing, and the use of 90GB+ VRAM per Intel Gaudi 2 HPU. With 1TB of RAM and several fast NVMe SSDs behind a filesystem-based storage service, the system handled large datasets efficiently. That’s a lot of processing power!
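As a rough illustration of such a setup, the sketch below shows distributed data-parallel training on Gaudi using Habana’s PyTorch integration; `WeatherModel` and `train_dataset` are hypothetical stand-ins, while the `hpu` device, the `hccl` backend, and `htcore.mark_step()` come from the Habana bridge:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

import habana_frameworks.torch.core as htcore      # registers the "hpu" device
import habana_frameworks.torch.distributed.hccl    # registers the "hccl" backend

# One process per Gaudi card, launched e.g. with: torchrun --nproc_per_node=8 train.py
dist.init_process_group(backend="hccl")
device = torch.device("hpu")

model = DDP(WeatherModel().to(device))             # hypothetical model; gradients
                                                   # are all-reduced across the cards
sampler = DistributedSampler(train_dataset)        # hypothetical dataset
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for batch, target in loader:
    batch, target = batch.to(device), target.to(device)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(batch), target)
    loss.backward()
    optimizer.step()
    htcore.mark_step()  # in Lazy mode, flushes the accumulated graph to the device
```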
How They Managed to Make It Work
The team experimented with three execution modes: Lazy, Eager, and Eager with torch.compile. They built their models with the Habana PyTorch framework and Hugging Face’s ecosystem of libraries on the Intel Tiber AI Cloud. Each mode had distinct benefits:
- Lazy Mode: Accumulates operations into a graph and executes them in a deferred fashion, which lets the graph compiler optimise device execution through operator fusion, data layout management, and other improvements.
- Eager Mode: Executes operations immediately, one at a time, allowing simple debugging and quick model iteration.
- Eager Mode with torch.compile: Introduced in PyTorch 2.0, this mode combines the immediate execution of Eager Mode with the optimisation power of graph execution, allowing portions of the model to be wrapped into a graph for better performance.
Eager Mode extended with torch.compile is expected to replace Lazy Mode in future Intel Gaudi software releases, providing similar performance without requiring the graph to be rebuilt at each iteration. Through the Habana SynapseAI SDK, Intel Gaudi accelerators integrate easily with well-known AI frameworks such as PyTorch and TensorFlow. With few code modifications, startups using Hugging Face models can benefit from Optimum Habana, which optimises model performance on Gaudi hardware.
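To make the three modes concrete, here is a minimal sketch of how switching between them looks in practice; the PT_HPU_LAZY_MODE environment variable and the hpu_backend compile backend are Habana’s documented mechanisms, and the model here is a stand-in for any PyTorch module:

```python
import os
# Eager mode is selected by setting this before importing the Habana bridge:
# os.environ["PT_HPU_LAZY_MODE"] = "0"   # uncomment for Eager / Eager + torch.compile

import torch
import habana_frameworks.torch.core as htcore

device = torch.device("hpu")
model = torch.nn.Linear(64, 64).to(device)   # stand-in for a real model
x = torch.randn(8, 64, device=device)

# Lazy mode (the default): operations accumulate into a graph that only
# executes when mark_step() is called.
out = model(x)
htcore.mark_step()

# Eager mode with torch.compile: wrap the model into an optimised graph once,
# then call it like any eager module.
compiled = torch.compile(model, backend="hpu_backend")
out = compiled(x)
```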
What the Data Showed
The results spoke for themselves. After training several models, Amplify Renewables identified several significant benefits:
- Simple setup: Running PyTorch models on Habana HPUs required only minor adjustments (see the sketch after this list).
- Flexible execution: Lazy and Eager modes ran without a hitch during training, and all three modes performed well during inference.
- Scalability: Near-linear scalability for distributed training and linear performance for inference.
- High memory efficiency: The massive VRAM on Intel Gaudi 2 HPUs greatly improved their ability to handle complex computations.
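As a hedged example of how minor those adjustments typically are, porting a CUDA script often comes down to changing the device string and flushing the lazy graph:

```python
import torch
import habana_frameworks.torch.core as htcore  # Habana PyTorch bridge

# Before (GPU): device = torch.device("cuda")
# After (Gaudi):
device = torch.device("hpu")

model = torch.nn.Linear(16, 1).to(device)  # stand-in for a real model
x = torch.randn(4, 16, device=device)
y = model(x)
htcore.mark_step()  # needed in Lazy mode; harmless in Eager mode
```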
See It in Action
Developments in Weather Prediction Using AI
The following graph shows the performance of the most advanced AI weather prediction models in terms of Z300 RMSE skill relative to ECMWF HRES, a high-resolution weather forecast benchmark.
The orange line represents the ECMWF HRES baseline, a high-resolution numerical weather prediction model that is widely used for global forecasting. The Y-axis (Z300 RMSE skill vs. ECMWF HRES) compares the performance of AI-based models relative to ECMWF HRES; higher values indicate better forecasting accuracy.
This graph illustrates how AI models have advanced beyond conventional weather forecast methods by benchmarking against ECMWF HRES.
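The post does not spell out the skill formula, but a common convention (an assumption here) is skill = 1 - RMSE_model / RMSE_baseline, so a positive score means the model beats the baseline. A few lines of Python make this concrete:

```python
import numpy as np

def rmse(pred: np.ndarray, truth: np.ndarray) -> float:
    """Root-mean-square error over all grid points."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def skill_vs_baseline(model_pred, baseline_pred, truth) -> float:
    """Skill > 0: the model beats the baseline (here, ECMWF HRES)."""
    return 1.0 - rmse(model_pred, truth) / rmse(baseline_pred, truth)

# Hypothetical Z300 (300 hPa geopotential) fields for illustration:
truth = np.random.randn(32, 64)
model_pred = truth + 0.1 * np.random.randn(32, 64)     # smaller error
baseline_pred = truth + 0.3 * np.random.randn(32, 64)  # larger error
print(skill_vs_baseline(model_pred, baseline_pred, truth))  # positive skill
```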

Key Takeaways from the Graph
Development Over Time
- Since 2019, AI-based weather prediction models have advanced gradually, with a notable breakthrough occurring around 2022.
- While more recent models (FourCastNet, Keisler, and Pangu) show significant gains in forecast accuracy, older models (Dueben & Bauer, WeatherBench, and Weyn et al.) performed moderately.
New Developments in 2023–2024
- Pangu (2023) set a new standard for AI weather forecast accuracy by outperforming earlier models.
- Since then, GraphCast, FengWu, SFNO, and FuXi have further improved the accuracy of AI-based forecasting.
- In 2024, newer models such as NeuralGCM, HEAL-ViT, WindBorne, GenCast, Aurora, and ArchesWeather continue to improve prediction reliability and accuracy.
Understanding the Training and Inference Timing Graphs
The following graphs show the performance of training and inference on Intel Gaudi 2 HPUs in three different execution modes: lazy, eager, and eager with torch.compile. Both single-HPU and multi-HPU (8 HPU) systems were used in these tests.
Key Observations
Inference Performance
The first set of graphs compares inference times on 1 and 8 HPUs. Even after accounting for the initial variance caused by compilation overhead, Lazy mode consistently outperforms Eager mode, although inference on 1 HPU shows a noticeable compilation delay. While this extra compilation time may look like a drawback, the faster steady-state performance is usually worth the cost. When scaling to 8 HPUs, Lazy mode’s performance advantage over Eager mode becomes even more pronounced, and scaling is efficient and close to linear, with only slight sublinearity.
Training Performance
Training on one HPU exhibits large initial variance, probably because of graph compilation and initialisation costs, although there is little difference between the Lazy and Eager modes. Training on eight HPUs, on the other hand, scales approximately linearly, with Lazy mode continuing to perform best.
Why Does Training Go Faster in Lazy Mode?
Lazy execution reduces execution time by compiling operations into a computational graph and optimising it before the batch runs. Eager mode, by contrast, carries out each operation immediately, adding overhead to every step.
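A hedged way to observe this yourself is to time forward passes after a few warm-up iterations; `torch.hpu.synchronize()` is part of Habana’s bridge and makes the host wait for the device before reading the clock:

```python
import time
import torch
import habana_frameworks.torch.core as htcore

device = torch.device("hpu")
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU()).to(device)
x = torch.randn(256, 512, device=device)

# Warm-up: the first iterations include graph compilation, which is the
# "initial variance" visible in the timing graphs.
for _ in range(3):
    model(x)
    htcore.mark_step()
torch.hpu.synchronize()

start = time.perf_counter()
for _ in range(100):
    model(x)            # in Lazy mode these calls only build the graph...
    htcore.mark_step()  # ...which then runs as an already-compiled graph
torch.hpu.synchronize()
print(f"avg batch time: {(time.perf_counter() - start) / 100:.4f} s")
```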
Why Is Performance Improved by Scaling?
Using eight HPUs instead of one reduces per-batch execution time by parallelising work across devices. This effect is particularly apparent in inference workloads, where latency is crucial.
Why This Matters
Thanks to this development, Amplify Renewables can now validate their predictions against current public and private forecasts. By improving estimates of solar and wind power output, they are strengthening grid forecasts, which is crucial for renewable energy.
The Next Step?
Now that their training system is operational, Amplify Renewables wants to grow. The next steps are testing a wider variety of models and trying novel pre-training techniques. More models, better forecasts, and a more robust renewable energy infrastructure all point to a promising future!