> [!Information]
> - **Authors/Presenters**: Clark Glymour, Kun Zhang, Peter Spirtes
> - **Journal/Seminar Name**: Frontiers in Genetics
> - **Date**: June 1, 2019
> - **Related Topics**: #FCI_Algorithm, #GES_Algorithm, #LiNGAM, #PC_Algorithm
> - **Link**: [Review of Causal Discovery Methods Based on Graphical Models](https://www.frontiersin.org/articles/10.3389/fgene.2019.00524/full)
### One-line Summary
This paper reviews two families of causal discovery methods: constraint-based methods (including score-based methods) and function-based methods.
### Introduction
“A traditional way to discover causal relations is to use intervention or randomized experiments. But it is too expensive or even impossible in many cases. Therefore, causal discovery, revealing causal information by analyzing purely observational data, has drawn much attention.”
This paper reviews two families of causal discovery methods: constraint-based methods (including score-based methods) and function-based methods. It also discusses practical issues that arise when applying causal discovery.
### Traditional Constraint-based (+ score-based) methods
Typical **constraint-based** (or **score-based**) **methods** include **PC**, **FCI**, and **GES**. All of them rely on the Causal Markov and Faithfulness assumptions. PC and GES additionally assume that there are no hidden confounders, while FCI does not. PC and FCI discover causal information from (conditional) independence relations between variables; GES instead uses a properly defined score function. All three methods return incomplete causal information, because they can only identify the graph up to its Markov equivalence class.
#### 1. The foundation of constraint-based methods
Let's consider the simple case of an acyclic model. Suppose that there are no interactive causes and no unobserved confounders. Then we can represent all causal relations in the acyclic model as an **N×N matrix G**, whose (i, j)-th entry indicates whether variable j is a parent of variable i. This representation remains meaningful even when some latent variables are present. The question is how to estimate this matrix G from data.
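As a minimal sketch (my own illustrative encoding, not code from the paper), here is this matrix representation in Python for a hypothetical chain X1 → X2 → X3:

```python
# Representing the chain X1 -> X2 -> X3 as an N x N matrix G, where
# G[i, j] = 1 means "variable j is a parent (direct cause) of variable i".
import numpy as np

N = 3
G = np.zeros((N, N), dtype=int)
G[1, 0] = 1  # X1 is a parent of X2
G[2, 1] = 1  # X2 is a parent of X3

def parents(G, i):
    """Return the indices of the direct causes of variable i."""
    return np.flatnonzero(G[i])

print(parents(G, 2))  # [1]: X2 is the only direct cause of X3
```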
#### 2. The PC algorithm
##### Assumptions
- No hidden confounder Assumption
- Causal Markov Assumption
- Faithfulness Assumption
##### Tools
- Statistical consistency under i.i.d. sampling
- Procedures for deciding (conditional) independence
- V-structures (colliders) play an important role in orienting edges.
##### Detail
It works like this:
![[pc 1.png]]
(Panel A is the original true causal graph.)
1. Form a complete undirected graph over all variables (B)
2. Eliminate edges between variables that are marginally independent (C)
3. Eliminate edges between variables that are conditionally independent (D)
4. For each unshielded triple (X, Y, Z), where X and Y are adjacent, Y and Z are adjacent, but X and Z are not, orient it as a v-structure X → Y ← Z if Y was not in the conditioning set that made X and Z independent
5. Propagate further orientations with additional rules (e.g., orient remaining edges so that no new v-structures and no cycles are created)

There are several other orientation propagation rules that are not illustrated here. A simplified code sketch of the skeleton phase (steps 1–3) follows.
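Below is that sketch for linear-Gaussian data, using a Fisher-z partial-correlation test. The function names and structure are my own illustration, not the paper's code, and the orientation steps are omitted.

```python
# Simplified PC skeleton phase: start complete, remove edges between
# (conditionally) independent variables at increasing conditioning-set sizes.
import numpy as np
from itertools import combinations
from scipy.stats import norm

def fisher_z_indep(data, i, j, cond, alpha=0.05):
    """Test X_i independent of X_j given X_cond via partial correlation."""
    sub = np.corrcoef(data[:, [i, j] + list(cond)], rowvar=False)
    prec = np.linalg.pinv(sub)  # precision matrix of the selected variables
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    r = np.clip(r, -0.999999, 0.999999)
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(data.shape[0] - len(cond) - 3)
    return 2 * (1 - norm.cdf(abs(z))) > alpha  # p-value > alpha => independent

def pc_skeleton(data, alpha=0.05):
    n_vars = data.shape[1]
    adj = {i: set(range(n_vars)) - {i} for i in range(n_vars)}  # complete graph
    depth = 0
    while any(len(adj[i]) - 1 >= depth for i in adj):
        for i in range(n_vars):
            for j in sorted(adj[i]):
                if j < i:
                    continue  # test each pair once per depth
                # condition on subsets (of size `depth`) of i's other neighbours
                for cond in combinations(adj[i] - {j}, depth):
                    if fisher_z_indep(data, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        depth += 1
    return adj

# Usage: recover the skeleton of the chain X0 -> X1 -> X2.
rng = np.random.default_rng(0)
x0 = rng.normal(size=5000)
x1 = x0 + rng.normal(size=5000)
x2 = x1 + rng.normal(size=5000)
print(pc_skeleton(np.column_stack([x0, x1, x2])))
# expected: {0: {1}, 1: {0, 2}, 2: {1}}
```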
##### Pros
- The PC algorithm can be applied to various types of random variables: continuous, discrete, etc.
- In the large-sample limit, PC is guaranteed to converge to the true Markov equivalence class.
##### Cons
- Conditional independence testing is difficult when the form of the dependence is unknown.
- The output is usually non-unique (an equivalence class) and leaves the causal direction between some pairs of variables undetermined.
- The PC algorithm cannot deal with unknown confounders.
#### 3. The FCI algorithm
##### Assumptions
- Causal Markov Assumption
- Faithfulness Assumption
**(Note that FCI algorithms tolerate unknown confounders.)**
##### Tools
- Statistical consistency under i.i.d. sampling
- Procedures for deciding (conditional) independence
##### Detail
![[fpc.png]]
FCI orients edges by a procedure similar to PC's, but without assuming that every edge endpoint can be fully determined. Each endpoint of an edge carries a mark:
- The “o” mark means the endpoint could be either an arrowhead or a tail. “X o→ Y” indicates that the algorithm cannot tell whether there is a directed edge from X to Y, a hidden confounder behind X and Y, or both.
- A bidirected edge “Y ↔ Z” indicates that there is at least one hidden confounder between Y and Z.
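As a small illustration (my own encoding, not the paper's), FCI's output can be represented as a graph whose edge endpoints carry one of three marks:

```python
# Edge endpoints in FCI's output graph: tail, arrowhead, or circle ("o").
TAIL, ARROW, CIRCLE = "-", ">", "o"

# edges[(A, B)] = (mark at A, mark at B)
edges = {
    ("X", "Y"): (CIRCLE, ARROW),  # X o-> Y: directed edge, confounder, or both
    ("Y", "Z"): (ARROW, ARROW),   # Y <-> Z: at least one hidden confounder
}
```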
##### Pros
- The FCI algorithm can be applied to various types of random variables: continuous, discrete, etc.
- FCI tolerates, and sometimes discovers, unknown confounders.
##### Cons
- Conditional independence testing is difficult when the form of the dependence is unknown.
- The output is usually non-unique and leaves the causal direction between some pairs of variables undetermined.
#### 4. The GES algorithm (Greedy Equivalence Search)
##### Assumptions
- No hidden confounder Assumption
- Causal Markov Assumption
- Faithfulness Assumption (GES's large-sample guarantee relies on both)
##### Tools
- Statistical consistency under i.i.d. sampling
- A properly defined score function
- The score depends on the relative strength of (conditional) associations of variables.
##### Detail
PC and FCI start with a complete undirected graph and eliminate edges step by step. GES instead starts with an empty graph and repeatedly adds the edge that most improves a defined score; once the score can no longer be improved, it switches to a backward phase and removes edges whose deletion improves the score.
In the large-sample limit, PC and GES converge to the same Markov equivalence class. On finite samples, however, the two algorithms may give different results.
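Below is a toy sketch of this two-phase idea with a BIC score for linear-Gaussian data. Real GES searches over equivalence classes with specialized insert/delete operators; this simplified version (my own illustration) searches over DAGs directly, but shows the same forward-then-backward structure.

```python
# Toy two-phase score-based search: greedy edge additions, then deletions.
import numpy as np
from itertools import product

def bic_node(data, i, parents):
    """BIC contribution of variable i regressed on its parent set."""
    n = data.shape[0]
    cols = [data[:, p] for p in sorted(parents)]
    X = np.column_stack(cols + [np.ones(n)])  # design matrix with intercept
    resid = data[:, i] - X @ np.linalg.lstsq(X, data[:, i], rcond=None)[0]
    return -n * np.log(resid @ resid / n) - X.shape[1] * np.log(n)

def creates_cycle(pa, i, j):
    """Would the edge j -> i create a cycle, i.e., is there a path i ~> j?"""
    stack, seen = [i], set()
    while stack:
        k = stack.pop()
        if k == j:
            return True
        if k not in seen:
            seen.add(k)
            stack.extend(c for c in pa if k in pa[c])  # children of k
    return False

def greedy_search(data):
    n_vars = data.shape[1]
    pa = {i: set() for i in range(n_vars)}  # pa[i] = parents of variable i
    # Forward phase: repeatedly add the edge with the best score improvement.
    while True:
        gain, i, j = max(
            ((bic_node(data, a, pa[a] | {b}) - bic_node(data, a, pa[a]), a, b)
             for a, b in product(range(n_vars), repeat=2)
             if a != b and b not in pa[a] and not creates_cycle(pa, a, b)),
            default=(0.0, None, None))
        if gain <= 0:
            break
        pa[i].add(j)
    # Backward phase: remove any edge whose deletion improves the score.
    improved = True
    while improved:
        improved = False
        for i in range(n_vars):
            for j in sorted(pa[i]):
                if bic_node(data, i, pa[i] - {j}) > bic_node(data, i, pa[i]):
                    pa[i].discard(j)
                    improved = True
    return pa
```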
##### Pros
- The GES algorithm can be applied to various types of random variables: continuous, discrete, etc.
- In the large-sample limit, GES is guaranteed to converge to the true Markov equivalence class.
##### Cons
- The GES algorithm cannot deal with unknown confounders.
### Typical Function-based methods
**Function-based methods** can distinguish between different DAGs within the same equivalence class, given appropriate assumptions on the functional form and the data distribution. A **functional causal model (FCM)** represents the effect Y as a function of the direct causes X and a noise term E, i.e., \( Y = f(X,\ E) \). The causal direction between X and Y is identifiable when the function class of f and the distributions of X and E are suitably constrained.
#### 1. The foundation of function-based methods
Constraint-based methods rely on conditional independence tests, but such tests are difficult when the form of the dependence is unknown. Furthermore, their output is usually non-unique and cannot determine the causal direction between two variables within a Markov equivalence class. To go further, we need additional assumptions about the causal model.
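A quick numeric illustration of this limitation (an assumed example, not from the paper): for a linear-Gaussian pair, the regression residual is uncorrelated with the regressor in both directions, and for jointly Gaussian variables uncorrelated means independent, so nothing distinguishes X → Y from Y → X:

```python
# Linear-Gaussian data is symmetric: both regression directions fit equally.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)  # true model: X -> Y

r_fwd = y - (np.cov(x, y)[0, 1] / np.var(x)) * x  # residual of Y given X
r_bwd = x - (np.cov(x, y)[0, 1] / np.var(y)) * y  # residual of X given Y

# Residuals are uncorrelated with the regressor in BOTH directions; for
# jointly Gaussian data, uncorrelated implies independent: total symmetry.
print(np.corrcoef(x, r_fwd)[0, 1], np.corrcoef(y, r_bwd)[0, 1])  # both ~ 0
```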
#### 2. LiNGAM (Non-Gaussian Model)
##### Assumptions
- No hidden confounder Assumption
- Independence between causes and noises
- Causal asymmetry between effect and causes
    - \( Y = f(X,\ \varepsilon;\ \theta_1) \), where X is independent of the error \( \varepsilon \), but Y is not
##### Tools
- The linear causal model
    - \( X = BX\ +\ E \), where the components of the noise E are mutually independent and B can be permuted to a strictly lower-triangular matrix
- Independent Component Analysis (ICA)
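As a minimal sketch (assumed example, not the paper's code), data can be generated from this linear model by solving \( X = (I - B)^{-1}E \):

```python
# Generating samples from X = BX + E with a strictly lower-triangular B
# and independent non-Gaussian (Laplace) noise terms.
import numpy as np

rng = np.random.default_rng(0)
n, N = 10_000, 3
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],    # x0 -> x1
              [0.0, 0.5, 0.0]])   # x1 -> x2
E = rng.laplace(size=(n, N))      # mutually independent non-Gaussian noise
X = E @ np.linalg.inv(np.eye(N) - B).T  # rows are samples of (x0, x1, x2)
```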
##### Detail
1. If all the noise variables are non-Gaussian, the causal direction is identifiable (see the sketch after this list).
2. In fact, identifiability still holds as long as at most one noise variable is Gaussian (the standard ICA condition).
3. Note 1: As the number of variables grows, estimating B (e.g., via ICA) becomes harder and may not work correctly.
4. Note 2: By Cramér's decomposition theorem, a sum of independent random variables can be Gaussian only if every summand is Gaussian, which is what makes non-Gaussianity a usable signal.
5. Note 3: As more independent variables are summed, the sum approaches a Gaussian distribution.
    1. This follows from the Central Limit Theorem (which has multiple versions, so the exact reference is unclear).
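The asymmetry can be seen in a pairwise sketch of the LiNGAM idea (a simplified residual-independence test of my own, not the full ICA-based algorithm): with non-Gaussian noise, only the regression in the true causal direction yields a residual independent of the regressor. Here a nonlinear correlation serves as a crude proxy for an independence test:

```python
# Pairwise causal direction via residual (in)dependence under non-Gaussianity.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100_000)             # non-Gaussian cause
y = 2.0 * x + rng.uniform(-1, 1, size=100_000)   # non-Gaussian noise

def dep(a, resid):
    """Crude nonlinear-correlation proxy for dependence (near 0 if independent)."""
    return abs(np.corrcoef(np.tanh(a), np.tanh(resid))[0, 1])

dep_fwd = dep(x, y - (np.cov(x, y)[0, 1] / np.var(x)) * x)  # residual of Y|X
dep_bwd = dep(y, x - (np.cov(x, y)[0, 1] / np.var(y)) * y)  # residual of X|Y

# The direction with the smaller (near-zero) dependence wins: expect X -> Y.
print("X->Y" if dep_fwd < dep_bwd else "Y->X", dep_fwd, dep_bwd)
```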
##### Pros
- If the assumptions are met, the causal direction can be uniquely determined.
- The distributional assumptions on the data are often plausible in practice.
##### Cons
- The method imposes strong constraints on the relationships between variables (linearity, independent non-Gaussian noise); when these are violated, it is ineffective.