Adaptive Online Decision Making Based on Interconnected Data

Nourani Koliji, Behzad

Files:	Behzad_Nourani_Koliji_Thesis.pdf 9.52 MB PDF Description: Main article

Citable link (URI):	http://hdl.handle.net/10900/178587 http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1785870 http://dx.doi.org/10.15496/publikation-119911
Document type:	Dissertation
Date of publication:	2026-04-24
Language:	English
Faculty:	7 Mathematisch-Naturwissenschaftliche Fakultät
Department:	Computer Science
Reviewer:	Hennig, Philipp (Prof. Dr.)
Date of oral examination:	2025-11-24
DDC classification:	004 - Computer science
Keywords:	Artificial Intelligence, Machine Learning, Operant Conditioning
Free keywords:	Artificial Intelligence, Machine Learning, Operant Conditioning, Reinforcement Learning
License:	http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

Online decision-making problems, in which an agent must select the optimal action from multiple alternatives at each step, arise frequently in real-world scenarios. The main applications of online decision-making frameworks include healthcare, finance, dynamic pricing, recommender systems, anomaly detection, and telecommunication. Multi-armed bandits (MAB) provide a rich mathematical formulation for modelling online decision-making problems. In MAB settings, the only feedback provided to the learning algorithm (agent) is a possibly noisy reward signal for the chosen decision. This property severely restricts the agent and slows down the learning process as the size of the action space becomes (exponentially) large. In reality, however, data is generally naturally structured (interconnected). Hence, it is critical to be able to learn such structures on the fly, and to exploit the properties these structures create in the data, with the goal of accelerating learning and improving the performance of online decision-making algorithms. This is the key idea that forms the foundation of the research in this thesis.

To model structures and interrelations in the data, graphs have been used extensively within MAB problems. Consequently, researchers have introduced effective frameworks that exploit these structures to accelerate the learning of MAB agents and to cope with the dimensions of MAB problems in large environments. However, three issues remain. First, there are still real-world problems that require novel structured MAB frameworks to be solved. Second, state-of-the-art structured MAB frameworks mostly ignore the underlying structure when choosing their strategies in piecewise-stationary environments. Third, the current literature on structured MAB frameworks ignores the natural tendency of structured environments to spread the negative effects of adversarial corruptions within social networks, and consequently fails to perform robustly.

With the ultimate goal of addressing these issues, we study structured MAB settings. In the first project, we develop a novel combinatorial semi-bandit framework with causally related rewards, where we model the causal relations by a directed graph in a Structural Equation Model (SEM). We deploy our framework to demonstrate a novel application of the MAB framework: analyzing the spread of Covid-19 within a country and identifying the optimal regions for intervention to stop the epidemic. In the second project, we study a novel structured MAB in a piecewise-stationary environment in which both the distribution of the arms' instantaneous rewards and the relationships between the arms' rewards are subject to change over time. Within the same project, we study the benefits of adapting piecewise-stationary MAB strategies to the underlying structure of the data. In the third project, in the direction of multi-task bandit settings where a graph structure links the bandit tasks, we introduce a novel framework that is more data-efficient than the state of the art in some large-scale real-world scenarios. In the final project, we study the online influence maximization problem in social networks in the presence of corrupted nodes whose damaging effects diffuse throughout the network structure, and we introduce an algorithm that is robust against the diffusion of the malicious effects of corruptions within the network. In this document, we substantiate the research with in-depth literature reviews and analyses, the development of various novel algorithms, rigorous theoretical justifications, and supporting experimental results.
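
As a minimal illustration of the bandit feedback model described in the abstract, the sketch below runs the standard UCB1 index policy on Bernoulli arms. It is not one of the algorithms developed in the thesis and uses no graph structure; the function name `ucb1`, the arm means, the horizon, and the seed are assumptions chosen purely for illustration. The point it shows is that at each step the agent observes only the noisy reward of the arm it pulled, and nothing about the other arms.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms and return the number of pulls per arm.

    arm_means: hypothetical success probabilities, unknown to the agent.
    horizon:   number of decision steps.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # how often each arm was pulled
    sums = [0.0] * n_arms   # total reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the highest upper confidence bound:
            # empirical mean plus an exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )

        # Bandit feedback: only the chosen arm's noisy reward is observed.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts


if __name__ == "__main__":
    print(ucb1([0.1, 0.9], horizon=2000))
```

Because each pull reveals information about a single arm only, the number of exploratory pulls grows with the number of arms. This is the limitation that the structured settings summarised in the abstract aim to mitigate by sharing information across related arms.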

This document appears in:

7 Mathematisch-Naturwissenschaftliche Fakultät [5274]
