Abstract:
Online decision-making problems, in which an agent must select the optimal action from multiple alternatives at each step, are frequently encountered in various real-world scenarios. Some of the main applications of online decision making frameworks are in healthcare, finance, dynamic pricing, recommender systems, anomaly detection, and telecommunication. Multi-armed bandits (MAB) provides rich mathematical formulations for modelling online decision making problems. In MAB settings, the only feedback provided to the learning algorithm (agent) is a possibly noisy reward signal of the chosen decision. This property of MAB severely restricts the agent and slows down the learning process as the size of the action space becomes (exponentially) large. However, in reality, data is generally naturally structured (interconnected). Hence, it is critical to be able to learn such structures on the fly, and also to learn from the properties these structures create in the data with the goal to accelerate the learning and improve the performance of online decision making algorithms. This is the key idea that forms the foundation of the research in this thesis. In order to model structures and interrelations of the data, graphs have been used extensively within MAB problems. Consequently, researchers have managed to introduce effective frameworks for exploiting the structures to accelerate the learning of MAB agents and cope with the dimensions of MAB problems in big environments. However, first of all, there are still some real-world problems that require novel structured MAB frameworks to be solved. Second, state-of-the-art structured MAB frameworks mostly ignore the underlying structure in choosing their strategies in piecewise-stationary environments. Moreover, the current literature of structured MAB frameworks ignore the natural behavior of structured environments in spreading the negative effects of adversarial corruptions within social networks and consequently fail to perform in a robust manner. In this regard, with the ultimate goal of addressing these issues, we engage in the study of structured MAB settings. In the first project, we develop a novel combinatorial semi-bandit framework with causally related rewards, where we model the causal relations by a directed graph in a Structural Equation Model (SEM). We deploy our framework to demonstrate a novel application of the MAB framework: analyzing the spread of Covid-19 within a country and identifying the optimal regions for intervention to stop the epidemic. In the second project, we study a novel structured MAB in a piecewise-stationary environment such that the distribution of arms’ instantaneous rewards as well as the relationships between the arms’ rewards are subject to changes across time. Within the same project, we study the benefits of adapting the piecewise-stationary MAB strategies according to the underlying structure of the data. For the third project, along the direction of multi-task bandit settings where there is a graph structure linking the bandit tasks, we introduce a novel framework that is more data efficient, in some large-scale real-world scenarios, in comparison to the state-of-the-art. In the final project, we study the online influence maximization problem in social networks in the presence of some corrupted nodes whose damaging effects diffuse throughout the network structure and we introduce an algorithm that is robust against the diffusion of malicious effects of corruptions within the network. In this document, we substantiate the research with in-depth literature reviews and analyses, the development of various novel algorithms, rigorous theoretical justifications, and supporting experimental results.