Abstract:
Latent variable modeling is a major field in statistical inference, and accurate estimation of latent variables requires a principled statistical approach. In this thesis, we develop a class of latent variable models using derivative Gaussian processes. Our methods unify several extensions of Gaussian processes (GPs), namely latent variable GPs, multi-output GPs, and derivative GPs, under a single framework. We achieve this through a modified derivative covariance function that can handle multi-dimensional output data along with their derivatives to estimate latent variable inputs. Moreover, our models account for complexities in the underlying data, such as scale differences between outputs and their derivatives, varying information content across multiple outputs, and interactions between outputs. Using Bayesian inference, our models provide uncertainty estimates for each latent variable sample. Through diverse simulation scenarios, we demonstrate that latent variable estimation accuracy can be significantly increased by incorporating derivative information through our exact GPs. We also find that including derivatives without our proposed covariance structure yields misleading results, underscoring the importance of our methods. Exact GPs, however, have limited applicability to larger datasets due to their steep computational cost, which scales cubically with sample size. These scalability issues are further aggravated when all the aforementioned GP extensions are combined. To overcome this, we extend the recently developed Hilbert space approximations, first to multi-output and latent input settings. Under the Hilbert space Gaussian process (HSGP) framework, the covariance function is approximated with a reduced-rank representation through its spectral decomposition, computed from a finite set of basis functions. By exploiting the spectral representation of a stationary covariance function, HSGPs scale linearly with both sample size and the number of basis functions.
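The reduced-rank representation referenced above can be sketched as follows (a standard HSGP construction; the symbols $m$, $S_\theta$, $\phi_j$, and $\lambda_j$ are from that general construction and are not notation introduced by this thesis):
\begin{equation*}
k_\theta(x, x') \;\approx\; \sum_{j=1}^{m} S_\theta\!\left(\sqrt{\lambda_j}\right) \phi_j(x)\, \phi_j(x'),
\end{equation*}
where $\phi_j$ and $\lambda_j$ are the eigenfunctions and eigenvalues of the Laplacian on a bounded domain containing the inputs, $S_\theta$ is the spectral density of the stationary covariance function $k_\theta$, and $m$ is the number of basis functions controlling the rank of the approximation.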
Through various experiments, we show that HSGPs provide better posterior uncertainty calibration and estimation accuracy for latent variable samples. Compared to other GP approximations, our methods strike a favorable balance between trustworthy inference and speed in latent variable estimation. In the case of derivative GPs, the covariance structure of the outputs and their derivatives is jointly defined. As a general case, we extend the Hilbert space methods to \textit{composite GPs}, where we model a pair of data sources as different outputs and obtain a spectral approximation of the composite covariance functions. As a special case of composite GPs, we then develop scalable derivative GPs by modeling the outputs along with their derivatives. Specifically, we derive the spectral decomposition of our modified derivative covariance functions and study their properties theoretically. Through our extended Hilbert space approximations, the class of latent variable derivative GPs becomes widely applicable in large-sample scenarios. As a concrete application, we showcase our methods on the estimation of unobserved cellular ordering in the field of single-cell biology.