Designing a generic web forms crawler to enable legal compliance analysis of authentication sections
Open access
Author
Date
2022-01-17
Type
- Master Thesis
ETH Bibliography
yes
Abstract
Users deserve security and privacy when using web services, but these properties are at odds with the financial interests of website owners, both in the work required to keep websites secure and in the revenue generated by exploiting sensitive data, which violates users' privacy. Countries have therefore introduced regulations to redress this imbalance. Notably, the European Union's General Data Protection Regulation (GDPR) specifies that data may be collected and processed only with the informed and specific consent of the user, and this includes sharing the data with third parties. Automated, large-scale detection of violations and security flaws is difficult because website authentication mechanisms behave in non-standardized ways.
We developed a web crawler that detects and submits web forms, primarily registration forms. This crawler enables novel privacy and security research on a larger scale than was previously possible. The fully automated crawler can navigate a site to find the required form, fill it in, avoid bot detection mechanisms, submit the form, and validate that the submission succeeded. In 17 days, we crawled over 600,000 domains with the aim of creating new user accounts. The crawler detected a sign-up form on 22% of all reachable websites, with a 6.4% registration success rate. We also received at least one email from 2.3% of all crawled pages. These results significantly surpass the prior version of this project and the best widely used published tool.
Persistent link
https://doi.org/10.3929/ethz-b-000534764
Publication status
published
Publisher
ETH Zurich
Organisational unit
03634 - Basin, David / Basin, David