Double-Boosting GMM for High Dimensional IV Regression Models - Job Market Paper
Abstract: The standard remedies for endogeneity in a regression model are two-stage least squares (2SLS) and the generalized method of moments (GMM). However, both methods are inconsistent in a high dimensional IV regression model, especially when some of the instruments are irrelevant and/or invalid. For consistent estimation, it is critical to select valid and relevant instruments. In particular, we consider the case where the endogenous variables X are unknown nonlinear functions of observable instruments W, which can be approximated by sieve functions Z=h(W) such as polynomials of W. The sieve approximation helps capture the nonlinearity between X and W, but it also increases the dimension of the instrument set rapidly. In this paper, we introduce a new selection method, Double Boosting (DB), which consistently selects relevant and valid instruments simultaneously as the sample size n increases. We allow the instruments to be high dimensional, so selection consistency holds even when dim(Z)≫n. Furthermore, we show that DB does not select weakly relevant instruments (weak instruments) or weakly valid instruments (weakly exogenous instruments), where the degree of weakness is defined in the sense of local-to-zero asymptotics. Monte Carlo simulations compare Double Boosting GMM (DB-GMM) with other methods such as penalized GMM (PGMM, Cheng and Liao 2015) and standard Boosting GMM (BGMM, Ng and Bai 2008). In an application estimating a BLP-type (Berry, Levinsohn and Pakes 1995) automobile demand function, where price is endogenous and the instruments are high dimensional functions of product characteristics, we find that the price elasticity of demand estimated by DB-GMM is more elastic.
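To illustrate how quickly a polynomial sieve Z=h(W) can outgrow the sample size, here is a minimal Python sketch. It is not the paper's implementation: the function name `polynomial_sieve`, the simulated data, and the choices n = 50, dim(W) = 10 and degree 3 are all assumptions made purely for illustration.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_sieve(W, degree):
    """Build all monomials of the columns of W with total degree 1..degree.

    Hypothetical helper for illustration; the constant term is excluded.
    """
    n, p = W.shape
    columns = []
    for d in range(1, degree + 1):
        # Each index tuple (with repetition) picks the factors of one monomial.
        for idx in combinations_with_replacement(range(p), d):
            columns.append(np.prod(W[:, list(idx)], axis=1))
    return np.column_stack(columns)


# Assumed toy setup: n = 50 observations, 10 observable instruments W.
rng = np.random.default_rng(0)
W = rng.normal(size=(50, 10))

# Cubic polynomial sieve: 10 + 55 + 220 = 285 candidate instruments.
Z = polynomial_sieve(W, degree=3)
n_obs, dim_Z = Z.shape
print(n_obs, dim_Z)
```

With only ten raw instruments and a cubic sieve, dim(Z) = 285 already exceeds n = 50, which is the regime in which instrument selection becomes necessary.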