Learning Mixed Multinomial Logit Models
Author(s)
Hu, Yiqun
Advisor
Simchi-Levi, David
Abstract
The multinomial logit (MNL) model is widely used to predict the probabilities of different outcomes. However, the standard MNL model suffers from several issues, including population heterogeneity, the restrictive independence of irrelevant alternatives (IIA) assumption, and insufficient model capacity. To alleviate these issues, mixed multinomial logit (MMNL) models were introduced. MMNL models are highly flexible: McFadden and Train [2000] showed that, under appropriate assumptions, they can approximate any random-utility-based discrete choice model to an arbitrary degree of accuracy. In addition, MMNL removes other limitations of the standard MNL model: it lifts the IIA assumption, allows correlation in unobserved utility factors over time, and, most importantly, reduces the chance of model misspecification in real-world applications, where the data composition is often heterogeneous.
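To make the finite-mixture form concrete, the sketch below computes choice probabilities for a K-component MMNL in Python; the function and variable names (mmnl_choice_probs, betas, alphas) are illustrative and not taken from the thesis.

```python
import numpy as np

def softmax(u):
    """Numerically stable softmax over the alternatives' utilities."""
    u = u - np.max(u)
    e = np.exp(u)
    return e / e.sum()

def mmnl_choice_probs(X, betas, alphas):
    """Choice probabilities under a K-component mixed multinomial logit.

    X      : (n_alternatives, n_features) attribute matrix
    betas  : (K, n_features) logit coefficients, one row per component
    alphas : (K,) mixture weights, nonnegative and summing to 1

    Each component k is a standard MNL with utilities X @ betas[k]; the
    mixture averages the component choice probabilities with weights
    alphas, which is what breaks the IIA property of plain MNL.
    """
    component_probs = np.array([softmax(X @ b) for b in betas])  # (K, n_alternatives)
    return alphas @ component_probs  # (n_alternatives,)

# Toy example: 3 alternatives, 2 features, K = 2 components.
X = np.array([[1.0, 0.5], [0.2, 1.0], [0.8, 0.8]])
betas = np.array([[1.0, -0.5], [-0.3, 1.2]])
alphas = np.array([0.6, 0.4])
print(mmnl_choice_probs(X, betas, alphas))  # entries sum to 1
```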
Despite its importance and versatility, the learning theory of MMNL models is limited, and learning MMNL models remains an open research topic. In this thesis, we tackle this learning problem from two different perspectives. First, inspired by recent work on Gaussian mixture models (GMMs), we explore the polynomial learnability of MMNL models from a theoretical point of view. Next, we present an algorithm designed to be more practical, one that exploits the rich data available in the modern digital era while still yielding desirable statistical properties for the estimators.
Chapter 2 studies the polynomial learnability of MMNL models with a general number of mixtures K. This work extends existing results, which apply only to 2-MNL models. We analyze the existence of ε-close estimates using tools from abstract algebra and show that, under some reasonable assumptions, there exists an algorithm that learns a general K-MNL model, if identifiable, with probability at least 1 − δ, using a polynomial number of data samples and a polynomial number of operations (in 1/ε and 1/δ).
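Schematically, the guarantee has the following PAC-style form, where θ̂ is the algorithm's estimate, θ⋆ the true parameters, and d(·,·) a suitable parameter distance; the notation here is ours, not the thesis's.

```latex
\Pr\!\left[\, d(\hat{\theta}, \theta^{\star}) \le \epsilon \,\right] \;\ge\; 1 - \delta,
\qquad \text{using } \mathrm{poly}(1/\epsilon,\, 1/\delta) \text{ samples and operations.}
```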
In Chapter 3, motivated by the Frank-Wolfe (FW) algorithm, we propose a framework that learns both the mixture weights and the component-specific logit parameters, with provable convergence guarantees for an arbitrary number of mixtures. Our algorithm uses historical choice data to generate a set of candidate choice probability vectors, each ε-close to the ground truth with high probability. The convex hull of this set forms a shrunken feasible region whose properties are well suited to the linear subproblems in FW; this enables independent parameter estimation within each mixture, which in turn leads to convergence of the mixture weights. The framework also resolves the unboundedness of the estimated parameters present in the original FW approach. Complexity analysis shows that only a polynomial number of samples is required for each candidate in the target population.
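The following is a minimal sketch of the Frank-Wolfe mechanics over a finite candidate set, assuming a KL-divergence objective against empirical choice frequencies; the thesis's actual objective and candidate construction are more involved, and the names here (frank_wolfe_over_candidates, P, q) are hypothetical.

```python
import numpy as np

def frank_wolfe_over_candidates(P, q, n_iters=200):
    """Frank-Wolfe over the convex hull of candidate choice-probability vectors.

    P : (M, d) matrix whose rows are the M candidate choice-probability
        vectors (each row nonnegative and summing to 1)
    q : (d,) empirical choice frequencies observed in the data

    Minimizes KL(q || w @ P) over mixture weights w in the simplex.
    Each FW linear subproblem reduces to picking the single candidate
    (vertex of the hull) best aligned with the negative gradient, so
    the mixture weights are built up one component at a time.
    """
    M, d = P.shape
    w = np.full(M, 1.0 / M)          # start from the uniform mixture
    for t in range(n_iters):
        p = w @ P                    # current mixture choice probabilities
        grad = -(P @ (q / p))        # gradient of KL(q || w @ P) w.r.t. w
        s = np.argmin(grad)          # linear subproblem: best vertex
        gamma = 2.0 / (t + 2)        # classic FW step size
        e_s = np.zeros(M)
        e_s[s] = 1.0
        w = (1 - gamma) * w + gamma * e_s
    return w

# Toy usage: 4 candidates over 3 alternatives, target frequencies q.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5],
              [0.4, 0.4, 0.2]])
q = np.array([0.3, 0.4, 0.3])
print(frank_wolfe_over_candidates(P, q))
```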
Extensive numerical experiments are conducted in Chapter 4, including both simulations and case studies on the well-known Nielsen Consumer Panel Data, to demonstrate the framework's effectiveness at recovering the true model parameters and/or learning realistic component-level parameters, compared with the original FW framework.
Date issued
2022-09
Department
Massachusetts Institute of Technology. Department of Civil and Environmental Engineering
Publisher
Massachusetts Institute of Technology