Making ML-based Networked Systems more Trustworthy

Internship Description

Machine learning (ML) solutions to challenging networked systems problems are a promising avenue, but their lack of interpretability and the uncertainty about their behavior undermine trust and hinder adoption. The goal of this work is to develop new solutions to facilitate the training of robust ML-based decision-making agents for systems. Thanks to recent advancements, today’s ML solutions provide remarkable results in many fields. ML methods have recently been applied to several networked systems problems, including routing, congestion control, resource allocation, flow scheduling, and video rate adaptation. In particular, Reinforcement Learning (RL) has been a primary approach due to its ability to learn directly from experience and simulation without relying on labeled datasets, and to optimize any desired metric.


Yet, there is a general fear that ML systems are black boxes: closed systems that receive an input, produce an output, and offer no clue why. This creates uncertainty about why these systems work, whether they will continue to work in conditions that differ from those seen during training, and whether they will fall off performance cliffs. Given their statistical nature, it is difficult to predict how ML-based agents will behave when faced with previously unseen inputs. Because of this uncertainty, it is not possible to let these agents control critical, large-scale systems without safety guarantees, despite the evidence that ML solutions can yield better efficiency and resource utilization than classical approaches. We seek to address this problem.


The goal of this internship is to verify and improve existing ML agents. The student will be expected to learn about existing solutions, as well as the challenges and requirements of applying ML techniques in their settings. With guidance from other team members, the student will then find new solutions for either verifying the safety of ML agents or improving their behavior in order to reduce risks related to the safety and stability of the agent’s decisions. Possible directions include (1) analyzing the agent’s behavior using symbolic or concolic execution, (2) exploring the evolution of the agent’s behavior during training, and (3) considering new ways of modifying a trained agent to meet stricter requirements.


Candidates should be motivated to work on research-oriented problems with a team and to develop new solutions in a budding field. They should have a strong background in computing and software engineering, in particular with regard to machine learning, networking, and distributed systems. Ideally, they should have experience in building machine learning solutions and working with related tools, as well as knowledge of Python.

Faculty Name

Marco Canini

Field of Study

Computer Science