Industry Experience

Regeneron Pharmaceuticals - PhD Research Intern

Duration: June 2024 - Aug 2024 | Location: Tarrytown, NY

Analyzed, modeled, and optimized Regeneron's drug development pipeline and resource requirements for strategic planning.

Key achievements:

  • Implemented multistate survival models in R to predict attrition rates and project durations in drug development
  • Built a Bayesian predictive model with adjustable weights on industry vs. internal data for pipeline optimization
  • Increased efficiency in predicting required resources for new pipeline products
  • Created and connected data sources for efficient processing of model inputs
  • Presented optimization strategies to VP-level finance and planning leadership to inform strategic planning of pipeline expansion

Methods: Multistate survival models, Bayesian methods, nonparametric causal estimation, large-scale data processing

Tags: R, Survival Analysis, Bayesian Methods, Pipeline Optimization, Pharmaceutical Industry

Research Projects

Nonparametric Assessment of Racial Disparities in Jury Selection

Duration: Aug 2023 - Dec 2024 | Status: Submission in progress

Advanced previous findings on racial disparities in prosecutorial peremptory strikes in Mississippi's Fifth District Court using flexible, nonparametric statistical methods.

Key contributions:

  • Applied doubly robust causal inference methods to analyze 2,300 jurors across 89 trials with 120 covariates
  • Identified a 37-percentage-point difference (odds ratio: 6.91) in strike rates between Black and white jurors
  • Implemented heterogeneity analysis using variance-based variable importance measures to identify factors associated with disparities
  • Conducted extensive sensitivity analysis for robustness to unmeasured confounders and partial missingness
  • Found that unmeasured confounding would have to affect 44% of the sample to invalidate the results

Methods: Doubly robust estimation, DR-learner, nonparametric statistics, heterogeneity analysis, sensitivity analysis, variable importance measures

Tags: R, Causal Inference, Doubly Robust Methods, Sensitivity Analysis, Criminal Justice

Gender Disparities in Social Media (Reddit Analysis)

Duration: Jan 2022 - May 2024 | Status: Paper in progress

Established a three-stage framework to assess potential gender disparities in online engagement on r/relationships, controlling for confounders such as writing style and topic distribution.

Research approach:

  • Analyzed 97,000 Reddit posts using structural topic modeling and sentiment analysis
  • Implemented propensity score matching and cardinality matching to ensure comparable treatment and control groups
  • Applied nonparametric causal estimation methods to quantify gender disparities in engagement outcomes
  • Controlled for writing style, readability, and topic distribution as potential confounders

Methods: Structural topic modeling, sentiment analysis, readability analysis, propensity score matching, cardinality matching, text preprocessing, nonparametric causal estimation

Poster presentation at American Causal Inference Conference (ACIC) 2023, Austin, TX

Tags: R, Text Analysis, NLP, Causal Inference, Matching Methods, Gender Disparities

Allegheny Housing Assessment: Effectiveness & Fairness Evaluation

Duration: Feb 2025 - Present | Status: Ongoing

Evaluating the Allegheny Housing Assessment (AHA) algorithm for both its effectiveness and its fairness implications across racial groups.

Analysis components:

  • Analyzed 25,000 records from 2018-2023 covering homeless individuals in Allegheny County
  • Evaluated three-LASSO prediction model for adverse outcomes including mental health crises, ER visits, and jail bookings
  • Assessed disparate impact across racial groups and AHA score categories
  • Compared the algorithmic tool (AHA) with the traditional VI-SPDAT survey in real-world implementation

Keywords: Causal inference, program evaluation, algorithmic fairness, racial disparities, heterogeneous treatment effects

Tags: R, Causal Inference, Algorithmic Fairness, Program Evaluation, LASSO

Teaching & Data Analysis Examples

Statistical Computing Materials

Collection of data analysis examples and practice problems developed for graduate-level statistics courses at CMU. These materials demonstrate practical applications of statistical methods to real-world datasets.

Featured analyses:

Boston Housing Data Analysis

Comprehensive exploratory data analysis and regression modeling examining factors affecting housing prices in Boston. Demonstrates data cleaning, visualization, model building, and interpretation.

Social Capital & Political Networks

Analysis of social capital data exploring connections between community networks and political outcomes. Includes network analysis techniques and spatial statistics.

Tags: R, Data Visualization, Regression Analysis, Statistical Inference

Publications & Presentations

Published Research

Visualizing Formative Feedback in Statistics Writing

Authors: Laudenbach, M., Brown, D. W., Guo, Z., Ishizaki, S., Reinhart, A., & Weinberg, G.

Published: Assessing Writing, Volume 60, April 2024

Interdisciplinary research examining how visualization tools can improve feedback and motivation in statistics education. Contributed statistical analysis and methodological expertise for the analysis of pre- and post-surveys.

Tags: Education Research, Statistical Analysis, Survey Methodology

Conference Presentations

Joint Statistical Meetings (JSM) 2025

Talk: "Nonparametric Assessment of Racial Disparities in Prosecutorial Peremptory Strikes"

Authors: Guo, Z., Kennedy, E. H., & Ben-Michael, E.

Location: Nashville, TN

American Causal Inference Conference (ACIC) 2023

Poster: "Assessing Gender Disparities in Textual Response on Reddit.com"

Authors: Guo, Z. & Branson, Z.

Location: Austin, TX

IEEE International Professional Communication Conference (ProComm) 2023

Panel: "Structuring Genre Performance for Future Data Scientists via an Interactionist Design Model"

Authors: Hutchison, A., Laudenbach, M., Xu, D., & Guo, Z.

Location: Ithaca, NY


Skills & Technologies

Programming Languages & Software: R, Python (Pandas, NumPy, scikit-learn, PyTorch), SQL, MATLAB, Git, Tableau, LaTeX

Statistical Methods: Causal inference, doubly robust methods, double machine learning, survival analysis, text analysis, Bayesian methods, survey methodology, regression analysis, hypothesis testing, machine learning

Specialized Techniques: Propensity score matching, cardinality matching, structural topic modeling, sentiment analysis, sensitivity analysis, variable importance measures, multistate models, LASSO, heterogeneous treatment effects

Tools & Platforms: R/RStudio, Jupyter Notebook, Git Bash, MySQL, large-scale data processing

Domain Expertise: Algorithmic fairness, criminal justice statistics, text analysis/NLP, pharmaceutical pipeline optimization, program evaluation


Interested in collaborating or learning more about my work? Get in touch!