Scaling Graph Neural Networks to 1000s of GPUs


Project Description

Graph Neural Networks (GNNs) are a special type of deep neural network that operates on graphs rather than on more traditional inputs such as images. GNNs are used in a variety of applications, from recommendation systems and social networks to computer security and biological networks. The common characteristic is that the graphs tend to be large and complex; therefore, both training and inference require significant processing power. The goal of this project is to scale GNN training to thousands of GPUs. We will target our new supercomputer, Shaheen III, which is projected to include 2800 Nvidia Hopper superchips that combine a CPU with an H100 GPU. We will use the latest frameworks, such as Microsoft DeepSpeed, and we will target very large graphs.
Program - Computer Science
Division - Computer, Electrical and Mathematical Sciences and Engineering
Center Affiliation - Extreme Computing Research Center
Field of Study - Machine Learning

About the

Panagiotis Kalnis

Desired Project Deliverables

- TensorFlow- or PyTorch-based implementation
- Project report