Zehang Richard Li

Assistant Professor
Department of Statistics
University of California Santa Cruz

Email: lizehang (at) ucsc.edu
Google Scholar: Zehang Richard Li
Github: @richardli
Twitter: @z_richard_li

Bayesian Modeling of High-Dimensional Verbal Autopsy Data
I have been working on methods for the analysis of verbal autopsy (VA) data since 2012. VA is a widely adopted tool for estimating disease burden in the developing world by interviewing caregivers of the deceased. The key first step in assigning a cause of death using VA is to understand and characterize the distribution of high-dimensional symptoms and covariates given each cause of death. Our team has been developing Bayesian latent variable models to discover meaningful symptom-cause relationships from messy and limited data. Some recent papers in this area:
  • Dimension-grouped tensor decomposition, arXiv, 2025
  • Covariate-dependent factor model, AOAS, 2025
  • Bayesian factor model, AOAS, 2020
  • InSilicoVA, JASA, 2016
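To make the "symptoms given each cause" step concrete, here is a deliberately simplified sketch of how such conditional probabilities turn into a cause assignment via Bayes' rule. It assumes symptoms are conditionally independent given the cause (a naive Bayes model) and that the prior cause fractions and the matrix of P(symptom | cause) are already known. It is not any of the models listed above; the function name `assign_cause` and all numbers are hypothetical.

```python
import numpy as np

def assign_cause(symptoms, prior, cond_prob):
    """Posterior P(cause | symptoms) under a naive conditional-independence model.

    symptoms:  0/1 array of length S (1 = symptom reported present)
    prior:     array of length C, prior cause-of-death fractions
    cond_prob: C x S array, cond_prob[c, s] = P(symptom s present | cause c)
    (All inputs are hypothetical; real VA models estimate these quantities.)
    """
    symptoms = np.asarray(symptoms)
    # Log-likelihood of the observed symptom pattern under each cause,
    # assuming symptoms are independent given the cause.
    loglik = (symptoms * np.log(cond_prob)
              + (1 - symptoms) * np.log(1 - cond_prob)).sum(axis=1)
    logpost = np.log(prior) + loglik
    logpost -= logpost.max()  # subtract the max for numerical stability
    post = np.exp(logpost)
    return post / post.sum()

# Hypothetical example: two causes, two symptoms; symptom 1 present, symptom 2 absent.
post = assign_cause([1, 0], np.array([0.5, 0.5]),
                    np.array([[0.9, 0.1], [0.2, 0.7]]))
print(post.round(3))  # prints [0.931 0.069]
```

The conditional-independence assumption is what makes the likelihood a simple product; with high-dimensional, correlated symptoms that assumption is rarely realistic, which is part of why richer latent variable models are of interest.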

Learning Under Distribution Shift
Many important global and public health problems involve learning and prediction, and heterogeneity in data distributions needs to be taken into account when these predictions are used for policy making. Recently, our team has developed several algorithms for cause-of-death assignment using VA data from multiple domains (countries, subnational regions, time periods, demographic groups, etc.) to achieve more robust mortality estimation. We have also developed a federated learning framework to ensemble models trained separately on different domains, and we are extending these models to broader applications. Some recent papers in this area:
  • Bayesian federated learning, arXiv, 2025
  • Hierarchical subpopulation adaptation, arXiv, 2025
  • Tree-structured domain adaptation, Biostatistics, 2024
  • Multi-source domain adaptation, AOAS, 2024

Small Area Estimation in Space and Time

Small area estimation (SAE) refers to producing estimates of quantities of interest, such as disease prevalence, for specific geographic areas, even when data in those areas are sparse or unavailable. We have been developing SAE methods using survey data in a variety of settings, for binary, continuous, and composite indicators. Some recent papers in this area:

  • Review of SAE methods in the LMIC context, Statistical Science, 2025
  • Sparse random effect models, JRSS-A, 2025
  • Spatial SAE for child mortality, DHS report, 2021
  • Space-time SAE for child mortality, PLOS ONE, 2019; UNICEF press release, 2023
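Producing estimates where data are sparse rests on borrowing strength across areas. As a rough illustration only, the classical area-level Fay-Herriot model (a standard textbook SAE model, not necessarily the approach in the papers above) treats each direct survey estimate y_i as the true value plus sampling noise with known variance D_i, and the true values as varying around a common mean mu with between-area variance tau^2. The resulting estimate is the compromise B_i * mu + (1 - B_i) * y_i with B_i = D_i / (tau^2 + D_i), so noisier areas are pulled harder toward the mean. The sketch below assumes tau^2 is known and uses no covariates; the function name `fay_herriot_shrink` and the numbers are hypothetical.

```python
import numpy as np

def fay_herriot_shrink(direct, samp_var, tau2):
    """Shrink noisy direct area estimates toward a common mean (intercept-only Fay-Herriot).

    direct:   direct (e.g. survey-weighted) estimates, one per area
    samp_var: their sampling variances D_i, treated as known
    tau2:     between-area variance of the true values, treated as known (an assumption)
    """
    direct = np.asarray(direct, dtype=float)
    samp_var = np.asarray(samp_var, dtype=float)
    # Precision weights give the GLS estimate of the common mean, given tau2.
    w = 1.0 / (tau2 + samp_var)
    mu = np.sum(w * direct) / np.sum(w)
    # Shrinkage factor B_i: larger sampling variance means more weight on mu.
    b = samp_var / (tau2 + samp_var)
    return b * mu + (1.0 - b) * direct

# Hypothetical example: three areas; the third has a much noisier direct estimate.
est = fay_herriot_shrink([0.10, 0.30, 0.60], [0.01, 0.01, 0.09], tau2=0.01)
print(est.round(3))
```

In this toy example the third area's direct estimate of 0.60 is pulled down to roughly 0.27, while the two better-measured areas move only halfway toward the common mean. Practical SAE models add covariates, spatial and temporal random effects, and estimate tau^2 from the data.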

Full Data Lifecycle of Statistical Analysis

Statistical inference from the available data alone is often insufficient to produce relevant scientific insights. Much of my recent work can be characterized as designing principled workflows that span the entire data lifecycle rather than just the analysis phase. Topics I am interested in include (1) active and targeted design of experiments and data collection paradigms, (2) methods to elicit knowledge from human experts and integrate incomplete domain knowledge into analysis, (3) uncertainty propagation and quantification across the data pipeline for decision making, and (4) task-driven model comparison and selection. Some recent papers in this area:

  • Workflow of prevalence mapping in the LMIC context, arXiv, 2025
  • The SAE4Health Shiny App, arXiv, 2025
  • Active VA questionnaire design, CHIL, 2023

Monitoring and Understanding COVID-19 and Disease Outbreaks

I have worked on methods for monitoring and quantifying the prevalence and transmission of COVID-19, and for evaluating the impact of the pandemic. Some recent papers in this area:

  • Seroprevalence study in Ohio, AOE, 2022; PNAS, 2021
  • Modeling transmission in Connecticut, Scientific Reports, 2021

Open-Source Software for Global Health Practitioners

I am a strong believer in open science and open-source software. Our group has developed a collection of tools for analyzing verbal autopsy data and for small area estimation that are widely adopted by practitioners worldwide. More details can be found on the software page. We work closely with international organizations and have conducted training workshops in LMICs over the years. Some papers focusing on software:

  • SUMMER, R Journal, 2025
  • openVA, R Journal, 2023