Forensic analysis of the malicious bot scraper ecosystem


Project Description

Web scraping bots now use so-called RESidential IP Proxy (RESIP) services to defeat state-of-the-art commercial bot countermeasures. RESIP providers promise their customers access to tens of millions of residential IP addresses that belong to legitimate users. These services dramatically complicate the task of existing anti-bot solutions and give the upper hand to malicious actors. We have developed a new technique to detect traffic coming through such proxies and, in collaboration with industrial partners, have gathered a very large dataset of such connections, along with measurements of them. In this project, we want to analyse that dataset from various viewpoints and, in particular, to investigate whether a new multilateration algorithm that we have developed can be used to geolocate the malicious actors hidden behind the proxies. If successful, this would greatly benefit those trying to protect the scraped websites.

This work requires strong analytical skills, a rigorous mindset and creativity. The intern will have to extract intelligence from a large dataset. A desire to acquire hands-on experience with big data analytics (most likely SQL-based) as well as with visualization techniques is a must. Python programming will most likely be required.
Program - Computer Science
Division - Computer, Electrical and Mathematical Sciences and Engineering
Center Affiliation - Resilient Computing and Cybersecurity Center
Field of Study - web security

About the

Marc Dacier

Desired Project Deliverables

The intern must build a platform to systematically analyse the large amount of data provided. It will offer a visualisation of the intelligence that the intern extracts from the data. If successful, this could lead to a scientific paper for a conference on security visualisation techniques.