Causal Discovery Beyond Conditional Independences

URI: http://hdl.handle.net/10900/74122
http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-741220
http://dx.doi.org/10.15496/publikation-15528
Document type: Dissertation
Date: 2017
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Informatik
Advisor: Schölkopf, Bernhard (Prof. Dr.)
Day of Oral Examination: 2015-10-29
DDC Classification: 004 - Data processing and computer science
Keywords: Machine learning
Other Keywords:
causality
causal inference
causal discovery
kernel methods
inverse regression
semi-supervised learning
License: Publishing license including print on demand

Abstract:

Knowledge about causal relationships is important because it enables the prediction of the effects of interventions that perturb the observed system. Specifically, predicting the results of interventions amounts to being able to answer questions like the following: if one or more variables are forced into a particular state, how will the probability distribution of the other variables be affected? Causal relationships can be identified through randomized experiments. However, such experiments are often unethical, too expensive, or even impossible to perform. The development of methods to infer causal relationships from observational rather than experimental data therefore constitutes a fundamental research topic.
In this thesis, we address the problem of causal discovery, that is, recovering the underlying causal structure based on the joint probability distribution of the observed random variables. The causal graph cannot be determined by the observed joint distribution alone; additional causal assumptions that link statistics to causality are necessary. Under the Markov condition and the faithfulness assumption, conditional-independence-based methods estimate a set of Markov equivalent graphs. However, these methods cannot distinguish between two graphs belonging to the same Markov equivalence class. Alternative methods investigate a different set of assumptions. The formal basis for these assumptions is provided by functional models, which model each variable as a function of its parents and some noise, with the noise variables assumed to be jointly independent. By restricting the function class, e.g., by assuming additive noise, Markov equivalent graphs can become distinguishable. Variants of all the aforementioned methods allow for the presence of confounders, which are unobserved common causes of two or more observed variables.
In this thesis, we present complementary causal discovery methods that employ different kinds of assumptions from the ones mentioned above. The first part of this work concerns causal discovery allowing for the presence of confounders. We first propose a method that detects the existence of, and identifies, a finite-range confounder of a set of observed dependent variables. It is based on a kernel method for identifying finite mixtures of nonparametric product distributions. Next, we introduce a property of a conditional distribution, called purity, which is used to exclude the presence of a low-range confounder of two observed variables that completely explains their dependence (we call a variable low-range if its range has “small” cardinality).
We further study the problem of causal discovery in the two-variable case, but now assuming no confounders. To this end, we exploit the principle of independence of causal mechanisms that has been proposed in the literature. For the case of two variables, it states that, if X → Y (X causes Y), then P(X) and P(Y|X) do not contain information about each other. In contrast, P(Y) and P(X|Y) may contain information about each other. Consequently, estimating P(Y|X) from P(X) should not be possible, while estimating P(X|Y) based on P(Y) may be possible. We employ this asymmetry to propose a causal discovery method that decides on the causal direction by comparing the accuracy of the estimates of P(Y|X) and P(X|Y).
Moreover, the principle of independence has implications for common machine learning tasks such as semi-supervised learning, which are also discussed in this work. Finally, the last part of this dissertation presents empirical results on the performance of estimation procedures for causal discovery using Additive Noise Models (ANMs) in the two-variable case.
Experiments on synthetic and real data show that the algorithms proposed in this thesis often outperform state-of-the-art algorithms.
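The additive-noise idea mentioned in the abstract can be illustrated with a small two-variable sketch. It is not the estimation procedure evaluated in the thesis. It assumes a cubic polynomial regression and a biased HSIC estimate with Gaussian kernels as the dependence measure, and all function names (polyfit, residuals, hsic, anm_direction) are hypothetical. The direction whose regression residuals look less dependent on the input is preferred.

```python
import math
import random


def standardize(v):
    """Rescale a sample to zero mean and unit variance."""
    mean = sum(v) / len(v)
    std = math.sqrt(sum((a - mean) ** 2 for a in v) / len(v))
    return [(a - mean) / std for a in v]


def polyfit(xs, ys, degree=3):
    """Least-squares polynomial coefficients (lowest order first)."""
    m = degree + 1
    # Augmented normal equations [A^T A | A^T y] for the monomial basis.
    rows = [[sum(x ** (i + j) for x in xs) for j in range(m)]
            + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(rows[r][c]))
        rows[c], rows[p] = rows[p], rows[c]
        for r in range(m):
            if r != c:
                f = rows[r][c] / rows[c][c]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[c])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def residuals(xs, ys, degree=3):
    """Residuals of ys after regressing on a polynomial in xs."""
    coef = polyfit(xs, ys, degree)
    return [y - sum(c * x ** k for k, c in enumerate(coef))
            for x, y in zip(xs, ys)]


def hsic(a, b):
    """Biased HSIC estimate, Gaussian kernels, median-heuristic bandwidth."""
    n = len(a)

    def centered_gram(v):
        d2 = [[(p - q) ** 2 for q in v] for p in v]
        flat = sorted(d for row in d2 for d in row if d > 0)
        bw = flat[len(flat) // 2]  # median squared pairwise distance
        k = [[math.exp(-d / bw) for d in row] for row in d2]
        row_mean = [sum(r) / n for r in k]
        total = sum(row_mean) / n
        # Double centering; the Gram matrix is symmetric.
        return [[k[i][j] - row_mean[i] - row_mean[j] + total
                 for j in range(n)] for i in range(n)]

    ka, kb = centered_gram(a), centered_gram(b)
    return sum(ka[i][j] * kb[i][j] for i in range(n) for j in range(n)) / n ** 2


def anm_direction(xs, ys):
    """Prefer the direction whose residuals depend less on the input."""
    xs, ys = standardize(xs), standardize(ys)
    forward = hsic(xs, residuals(xs, ys))   # model Y = f(X) + noise
    backward = hsic(ys, residuals(ys, xs))  # model X = g(Y) + noise
    return ("X->Y" if forward < backward else "Y->X"), forward, backward


if __name__ == "__main__":
    # Synthetic data (an assumption for this demo): Y = X^3 + uniform noise.
    random.seed(0)
    xs = [random.uniform(-2, 2) for _ in range(300)]
    ys = [x ** 3 + random.uniform(-1, 1) for x in xs]
    print(anm_direction(xs, ys))
```

Comparing two dependence scores directly is the simplest decision rule; a more careful version would run an independence test in each direction and abstain when both models are rejected or both are accepted.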