I am an assistant professor specializing in computational and spatial statistics in the Department of Mathematics at the University of Houston. I previously worked as a postdoc with Dr. Matthias Katzfuss at Texas A&M University. I obtained my Ph.D. in Statistics at King Abdullah University of Science and Technology (KAUST), advised by Dr. Marc Genton. Prior to that, I obtained a B.S. in Mathematics from the University of Science and Technology of China.
My research focuses on scalable Gaussian process (GP) regression, including truncated, latent, multivariate, and high-dimensional GPs. During my postdoc, I worked on scalable GP regression and variable selection, transport maps, and variational Bayes, most of which build on the Vecchia approximation of GPs. During my PhD, I studied scalable evaluation of multivariate normal probabilities, mainly exploiting low-rank matrix structures and efficient quasi-Monte Carlo sampling rules. Most of my research relates to applications in spatial statistics, climate science, and agricultural science.
Bachelor in Applied Mathematics, 2014
University of Science and Technology of China
Master in Finance, 2016
Shanghai Jiao Tong University
PhD in Statistics, 2020
King Abdullah University of Science and Technology
Cao, Jian and Katzfuss, Matthias (2025). Linear-Cost Vecchia Approximation of Multivariate Normal Probabilities. In revision for Journal of the American Statistical Association.
Cao, Jian and Katzfuss, Matthias (2025). Scalable Sampling of Truncated Multivariate Normals Using Sequential Nearest-Neighbor Approximation. In revision for Journal of Computational and Graphical Statistics.
Cao, Jian and Zhang, Jingjie and Sun, Zhuoer and Katzfuss, Matthias (2024). Locally anisotropic nonstationary covariance functions on the sphere. Journal of Agricultural, Biological and Environmental Statistics, 29(2), 212--231.
Abdulah, Sameh and Li, Yuxiao and Cao, Jian and Ltaief, Hatem and Keyes, David E and Genton, Marc G and Sun, Ying (2023). Large-scale environmental data science with ExaGeoStatR. Environmetrics, 34(1), e2770.
Cao, Jian and Kang, Myeongjong and Jimenez, Felix and Sang, Huiyan and Schaefer, Florian Tobias and Katzfuss, Matthias (2023). Variational sparse inverse Cholesky approximation for latent Gaussian processes via double Kullback-Leibler minimization. International Conference on Machine Learning.
Cao, Jian and Durante, Daniele and Genton, Marc G (2022). Scalable computation of predictive probabilities in probit models with Gaussian process priors. Journal of Computational and Graphical Statistics, 31(3), 709--720.
Cao, Jian and Genton, Marc G and Keyes, David E and Turkiyyah, George M (2022). tlrmvnmvt: Computing high-dimensional multivariate normal and Student-t probabilities with low-rank methods in R. Journal of Statistical Software, 101, 1--25.
Cao, Jian and Guinness, Joseph and Genton, Marc G and Katzfuss, Matthias (2022). Scalable Gaussian-process regression and variable selection using Vecchia approximations. Journal of Machine Learning Research, 23(348), 1--30.
Cao, Jian and Genton, Marc G and Keyes, David E and Turkiyyah, George M (2021). Exploiting low-rank covariance structures for computing high-dimensional normal and Student-t probabilities. Statistics and Computing, 31, 1--16.
Cao, Jian and Genton, Marc G and Keyes, David E and Turkiyyah, George M (2021). Sum of Kronecker products representation and its Cholesky factorization for spatial covariance matrices from large grids. Computational Statistics & Data Analysis, 157, 107165.
Huang, Jingfang and Cao, Jian and Fang, Fuhui and Genton, Marc G and Keyes, David E and Turkiyyah, George (2021). An O(N) algorithm for computing expectation of N-dimensional truncated multi-variate normal distribution I: fundamentals. Advances in Computational Mathematics, 47(5), 65.
Cao, Jian and Genton, Marc G and Keyes, David E and Turkiyyah, George M (2019). Hierarchical-block conditioning approximations for high-dimensional multivariate normal probabilities. Statistics and Computing, 29(3), 585--598.
tlrmvnmvt: Estimating multivariate normal (MVN) probabilities using tile-low-rank (TLR) matrix representation.
VeccTMVN: Estimating MVN probabilities and sampling from truncated MVN (TMVN) distributions using the Vecchia approximation and exponentially tilted importance sampling.
nntmvn: Using the sequential nearest neighbor (SNN) method to draw samples from truncated multivariate normal (TMVN) distributions.