Gradient compression for distributed training of machine learning models

Project Description

Modern supervised machine learning models are trained on enormous amounts of data, which requires distributed computing systems. The training data is distributed across the memory of the nodes of the system, and in each step of the training process the updates computed by all nodes from their local data must be aggregated. This aggregation step requires communicating a large tensor, which is the bottleneck limiting the efficiency of the training method.

To mitigate this issue, various compression schemes (e.g., sparsification, quantization, dithering) have recently been proposed in the literature. However, many theoretical, system-level and practical questions remain open.

In this project the intern will aim to advance the state of the art in some aspect of this field. As this is a fast-moving field, details of the project will only be finalized together with the successful applicant.

Background reading based on research on this topic done in my group:
https://arxiv.org/abs/1905.11261
https://arxiv.org/abs/1905.10988
https://arxiv.org/abs/1903.06701
https://arxiv.org/abs/1901.09437
https://arxiv.org/abs/1901.09269
https://www.frontiersin.org/articles/10.3389/fams.2018.00062/abstract
https://arxiv.org/abs/1610.05492
https://arxiv.org/abs/1610.02527
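To make the idea of compressed aggregation concrete, here is a minimal sketch in plain Python of two common sparsification schemes and the averaging step. It is an illustration only, not the method of any of the papers listed in this description: the function names (`top_k_sparsify`, `rand_k_sparsify`, `aggregate`) and the toy gradients are assumptions made for this example. Each worker sends only `k` of the `d` coordinates of its update as (index, value) pairs instead of the full dense vector.

```python
import random


def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad (a biased compressor).

    Returns (indices, values), which is what a worker would transmit.
    """
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    idx.sort()
    return idx, [grad[i] for i in idx]


def rand_k_sparsify(grad, k, rng):
    """Keep k uniformly random entries, scaled by d/k.

    The scaling makes the compressed vector equal to grad in expectation.
    """
    d = len(grad)
    idx = sorted(rng.sample(range(d), k))
    return idx, [grad[i] * d / k for i in idx]


def aggregate(messages, d):
    """Average the sparse messages of all workers into one dense vector."""
    total = [0.0] * d
    for idx, vals in messages:
        for i, v in zip(idx, vals):
            total[i] += v
    return [t / len(messages) for t in total]


if __name__ == "__main__":
    # Toy local gradients of two hypothetical workers (d = 4).
    local_grads = [[0.1, -3.0, 0.5, 2.0], [1.0, 0.2, -0.3, 4.0]]
    d = len(local_grads[0])

    # Each worker transmits 2 of its 4 coordinates.
    messages = [top_k_sparsify(g, k=2) for g in local_grads]
    print("top-k aggregate:", aggregate(messages, d))

    rng = random.Random(0)
    messages = [rand_k_sparsify(g, k=2, rng=rng) for g in local_grads]
    print("rand-k aggregate:", aggregate(messages, d))
```

The two compressors illustrate one trade-off studied in this area: top-k keeps the most informative coordinates but is biased, while rand-k is unbiased at the cost of higher variance.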
Program - Computer Science
Division - Computer, Electrical and Mathematical Sciences and Engineering
Field of Study - computer science, mathematics, machine learning

About the Researcher

Peter Richtarik

Professor, Computer Science

Prof. Richtarik's research interests lie at the intersection of mathematics, computer science, machine learning, optimization, numerical linear algebra, high-performance computing and applied probability. He is interested in developing zeroth-, first-, and second-order algorithms for convex and nonconvex optimization problems described by big data, with a particular focus on randomized, parallel and distributed methods. He is a co-inventor of federated learning, a Google platform for machine learning on mobile devices that preserves the privacy of users' data.

Desired Project Deliverables

Ideally, author or coauthor a research paper and submit it to a premier conference in the field (e.g., ICML, AISTATS, NeurIPS, ICLR).