Abstract:
The rapid growth of the internet over the past thirty years has reduced the cost of accessing information for anyone with unrestricted internet access. During this period, internet users were also empowered to create new content that could instantly reach millions of people via social media platforms such as Facebook and Twitter. This transformation disrupted the traditional channels through which mass-consumed content was distributed and ultimately ushered in the era of citizen journalism and freeform content. The unrestricted ability to create and distribute information was widely considered a significant triumph for freedom and liberty. However, these new modes of information exchange also posed new challenges for modern societies, notably in the areas of trust, information integrity, and the spread of misinformation.
Before the emergence of the internet, newsrooms and their editorial procedures imposed minimum standards on published information; today, no such requirements apply to content posted on social media platforms. This change has led to the proliferation of information that attracts attention but lacks integrity and reliability. There are currently two broad approaches to addressing the problem of information integrity on the internet: first, reviving trusted and reliable sources of information; second, creating new mechanisms for improving the quality of the information published and spread on major social media platforms. Both approaches are still in their infancy, and each has its own strengths and weaknesses.
In this thesis, we explore the latter approach and develop modern machine learning methods that can help identify (un)reliable information and its sources, efficiently prioritize content requiring human fact-checking at scale, and ultimately minimize the harm of unreliable information to end users by improving the quality of the news feeds they access. This thesis leverages the collaborative dynamics of content creation on Wikipedia to extract a grounded measure of information and source reliability. We also develop a method for modifying the ranking algorithms widely used on social media platforms such as Facebook and Twitter so as to minimize the long-term harm caused by the spread of misinformation.