Networked interactions, graphical models and econometrics perspectives in data analysis
Author(s)
Seby, Jean-Baptiste.
Other Contributors
Massachusetts Institute of Technology. Institute for Data, Systems, and Society.
Technology and Policy Program.
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Chintan Vaishnav and John Tsitsiklis.
Abstract
This thesis is composed of two independent parts. In Part I, we study higher-order interactions, i.e., interactions between more than two nodes, in both graphical models and networks. In the graphical model setting, we do not assume that interactions are known, and our goal is to recover the structure of the graph. Our main contribution is an algebraic criterion that enables us to determine whether a set of observed variables has a single cause or multiple causes. We also prove that this criterion holds in the presence of confounders, i.e., when the causes are hidden. In the network setting, we assume that the structure of the graph is known. Our objective is then to identify what kind of information about the data can be learned from the analysis of higher-order interactions. More precisely, using the generalization of the normalized Laplacian and random walks on graphs to simplicial complexes, we study a simplicial notion of PageRank centrality as defined in [Schaub et al., 2018]. Conducting numerical experiments on both synthetic and real data, we find evidence that the so-called edge PageRank is related to the concepts of local and global bridges in networks.
In Part II, we analyze the determinants of yield gaps in Semi-Arid Tropics (SAT) regions in India. Analyzing panel data on households within 30 villages over 6 years, we apply a fixed effects estimation method and a quantile regression with fixed effects to identify the most significant explanatory variables of yield gaps for 5 different crops. Using a correlated random effects estimator for unbalanced panel data, we can also estimate coefficients for time-invariant variables. We find that yield gap determinants are crop specific. In addition, soil characteristics show the most significant effects on output rate. When statistically significant, correlations with the type of soil are negative. This result might suggest that the choice of cropping pattern is not necessarily appropriate. Finally, results suggest that unobservable heterogeneity of households is critical in explaining farm productivity. Time-invariant variables hardly explain this heterogeneity, which calls for further research.
Description
Thesis: S.M. in Technology and Policy, Massachusetts Institute of Technology, School of Engineering, Institute for Data, Systems, and Society, Technology and Policy Program, September, 2020
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 231-243).
Date issued
2020
Department
Massachusetts Institute of Technology. Institute for Data, Systems, and Society; Technology and Policy Program; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Engineering Systems Division
Publisher
Massachusetts Institute of Technology
Keywords
Institute for Data, Systems, and Society; Technology and Policy Program; Electrical Engineering and Computer Science.