Industry Experience
Regeneron Pharmaceuticals - PhD Research Intern
Duration: June 2024 - Aug 2024 | Location: Tarrytown, NY
Analyzed, modeled, and optimized Regeneron's drug development pipeline and resource requirements for strategic planning.
Key achievements:
- Implemented multistate survival models in R to predict attrition rates and project durations in drug development
- Built a Bayesian predictive model with adjustable industry vs. internal data weights for pipeline optimization
- Increased efficiency in predicting required resources for new pipeline products
- Created and connected data sources for efficient processing of model inputs
- Presented optimization strategies to VP-level finance and planning leadership to inform strategic planning of pipeline expansion
Methods: Multistate survival models, Bayesian methods, nonparametric causal estimation, large-scale data processing
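As a rough illustration of the adjustable-weight idea (not the model built at Regeneron), the sketch below uses a Beta-Binomial model in which industry counts are discounted by a weight between 0 and 1, in the style of a power prior. The function name, the parameters, and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) posterior for a phase-transition success probability."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def weighted_success_posterior(internal_success, internal_fail,
                               industry_success, industry_fail,
                               industry_weight,
                               prior_alpha=1.0, prior_beta=1.0):
    """Combine internal and industry counts, discounting the industry data.

    industry_weight = 0 ignores industry data entirely;
    industry_weight = 1 pools it with internal data at full strength.
    """
    if not 0.0 <= industry_weight <= 1.0:
        raise ValueError("industry_weight must lie in [0, 1]")
    alpha = prior_alpha + internal_success + industry_weight * industry_success
    beta = prior_beta + internal_fail + industry_weight * industry_fail
    return BetaPosterior(alpha, beta)


# Hypothetical counts: 6 of 10 internal programs advanced, 300 of 1000 industry-wide.
for w in (0.0, 0.05, 1.0):
    post = weighted_success_posterior(6, 4, 300, 700, industry_weight=w)
    print(f"industry_weight={w:.2f}  posterior mean={post.mean:.3f}")
```

Raising the weight pulls the posterior mean from the internal rate toward the industry rate, which is the lever a planner would adjust.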
Research Projects
Nonparametric Assessment of Racial Disparities in Jury Selection
Duration: Aug 2023 - Dec 2024 | Status: Submission in progress
Advanced previous findings on racial disparities in prosecutorial peremptory strikes in Mississippi's Fifth District Court using flexible, nonparametric statistical methods.
Key contributions:
- Applied doubly robust causal inference methods to analyze 2,300 jurors across 89 trials with 120 covariates
- Identified a 37-percentage-point difference (odds ratio: 6.91) in strike rates between Black and white jurors
- Implemented heterogeneity analysis using variance-based variable importance measures to identify factors associated with disparities
- Conducted extensive sensitivity analysis for robustness to unmeasured confounders and partial missingness
- Found that 44% of the sample would need to be subject to unmeasured confounding to invalidate the results
Methods: Doubly robust estimation, DR-learner, nonparametric statistics, heterogeneity analysis, sensitivity analysis, variable importance measures
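For readers unfamiliar with doubly robust estimation, here is a minimal sketch of the augmented inverse-probability-weighted (AIPW) estimator of an average effect. It is a simplified stand-in, not the analysis code for this project: nuisance functions are plain stratum means on a single discrete covariate rather than flexible learners with cross-fitting, and the toy data are made up.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def aipw_ate(y, a, x):
    """AIPW estimate of E[Y(1) - Y(0)] with one discrete covariate x.

    Assumes every stratum of x contains both treated (a=1) and
    untreated (a=0) units; otherwise the stratum means are undefined.
    """
    cells = defaultdict(list)    # (x, a) -> outcomes, for the outcome model
    strata = defaultdict(list)   # x -> treatment indicators, for the propensity
    for yi, ai, xi in zip(y, a, x):
        cells[(xi, ai)].append(yi)
        strata[xi].append(ai)

    phi = []  # estimated influence-function value for each unit
    for yi, ai, xi in zip(y, a, x):
        pi = mean(strata[xi])        # estimated P(A=1 | X=x)
        mu1 = mean(cells[(xi, 1)])   # estimated E[Y | A=1, X=x]
        mu0 = mean(cells[(xi, 0)])   # estimated E[Y | A=0, X=x]
        phi.append(mu1 - mu0
                   + ai * (yi - mu1) / pi
                   - (1 - ai) * (yi - mu0) / (1 - pi))

    estimate = mean(phi)
    std_error = stdev(phi) / sqrt(len(phi))
    return estimate, std_error


# Made-up toy data: binary covariate, binary treatment, binary outcome.
x = [0, 0, 0, 0, 1, 1, 1, 1]
a = [1, 1, 0, 0, 1, 0, 0, 0]
y = [1, 0, 0, 0, 1, 1, 0, 0]
estimate, std_error = aipw_ate(y, a, x)
print(f"AIPW estimate: {estimate:.3f} (SE {std_error:.3f})")
```

The estimator stays consistent if either the propensity model or the outcome model is correct, which is the sense in which it is doubly robust.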
Gender Disparities in Social Media (Reddit Analysis)
Duration: Jan 2022 - May 2024 | Status: Paper in progress
Established a comprehensive three-stage framework to assess potential gender disparities in online engagement on r/relationships, controlling for confounders like writing style and topical distributions.
Research approach:
- Analyzed 97,000 Reddit posts using structural topic modeling and sentiment analysis
- Implemented propensity score matching and cardinality matching to ensure comparable treatment and control groups
- Applied nonparametric causal estimation methods to quantify gender disparities in engagement outcomes
- Controlled for writing style, readability, and topic distribution as potential confounders
Methods: Structural topic modeling, sentiment analysis, readability analysis, propensity score matching, cardinality matching, text preprocessing, nonparametric causal estimation
Poster presentation at American Causal Inference Conference (ACIC) 2023, Austin, TX
Allegheny Housing Assessment: Effectiveness & Fairness Evaluation
Duration: Feb 2025 - Present | Status: Ongoing
Comprehensive evaluation of the Allegheny Housing Assessment (AHA) algorithm examining both effectiveness and fairness implications across racial groups.
Analysis components:
- Analyzed 25,000 records from 2018-2023 covering homeless individuals in Allegheny County
- Evaluated the three-LASSO prediction model for adverse outcomes, including mental health crises, ER visits, and jail bookings
- Assessed disparate impact across racial groups and AHA score categories
- Compared the algorithmic tool (AHA) with the traditional VI-SPDAT survey in real-world implementation
Keywords: Causal inference, program evaluation, algorithmic fairness, racial disparities, heterogeneous treatment effects
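One simple way to summarize disparate impact is to compare the rate at which each group is placed in a high-priority score category. The sketch below shows that calculation only; the group labels, the flag definition, and the records are hypothetical and are not drawn from the AHA data or the evaluation itself.

```python
from collections import Counter


def selection_rates(groups, flagged):
    """Share of each group that was flagged (e.g. placed in a high-score category)."""
    totals = Counter(groups)
    hits = Counter(g for g, f in zip(groups, flagged) if f)
    return {g: hits[g] / totals[g] for g in totals}


def disparate_impact_ratio(rates, reference):
    """Each group's selection rate relative to a reference group.

    Raises ZeroDivisionError if the reference group's rate is zero.
    """
    return {g: r / rates[reference] for g, r in rates.items()}


# Hypothetical records: group label and whether the record was flagged.
groups = ["A"] * 5 + ["B"] * 4
flagged = [1, 1, 1, 0, 0, 1, 0, 0, 0]

rates = selection_rates(groups, flagged)
ratios = disparate_impact_ratio(rates, reference="A")
for g in sorted(rates):
    print(f"group {g}: rate={rates[g]:.2f}, ratio vs A={ratios[g]:.2f}")
```

A ratio well below 1 for a group indicates it is flagged less often than the reference group; a raw ratio like this does not adjust for differences in underlying need.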
Teaching & Data Analysis Examples
Statistical Computing Materials
Collection of data analysis examples and practice problems developed for graduate-level statistics courses at CMU. These materials demonstrate practical applications of statistical methods to real-world datasets.
Featured analyses:
Boston Housing Data Analysis
Comprehensive exploratory data analysis and regression modeling examining factors affecting housing prices in Boston. Demonstrates data cleaning, visualization, model building, and interpretation.
Social Capital & Political Networks
Analysis of social capital data exploring connections between community networks and political outcomes. Includes network analysis techniques and spatial statistics.
Publications & Presentations
Published Research
Visualizing Formative Feedback in Statistics Writing
Authors: Laudenbach, M., Brown, D. W., Guo, Z., Ishizaki, S., Reinhart, A., & Weinberg, G.
Published: Assessing Writing, Volume 60, April 2024
Interdisciplinary research examining how visualization tools can improve feedback and motivation in statistics education. Contributed statistical analysis and methodology expertise for pre- and post-survey analysis.
Conference Presentations
Joint Statistical Meetings (JSM) 2025
Talk: "Nonparametric Assessment of Racial Disparities in Prosecutorial Peremptory Strikes"
Authors: Guo, Z., Kennedy, E. H., & Ben-Michael, E.
Location: Nashville, TN
American Causal Inference Conference (ACIC) 2023
Poster: "Assessing Gender Disparities in Textual Response on Reddit.com"
Authors: Guo, Z. & Branson, Z.
Location: Austin, TX
IEEE International Professional Communication Conference (ProComm) 2023
Panel: "Structuring Genre Performance for Future Data Scientists via an Interactionist Design Model"
Authors: Hutchison, A., Laudenbach, M., Xu, D., & Guo, Z.
Location: Ithaca, NY
Skills & Technologies
Programming Languages: R, Python (Pandas, NumPy, scikit-learn, PyTorch), SQL, MATLAB, LaTeX
Statistical Methods: Causal inference, doubly robust methods, double machine learning, survival analysis, text analysis, Bayesian methods, survey methodology, regression analysis, hypothesis testing, machine learning
Specialized Techniques: Propensity score matching, cardinality matching, structural topic modeling, sentiment analysis, sensitivity analysis, variable importance measures, multistate models, LASSO, heterogeneous treatment effects
Tools & Platforms: R/RStudio, Jupyter Notebook, Git (Git Bash), Tableau, MySQL, large-scale data processing
Domain Expertise: Algorithmic fairness, criminal justice statistics, text analysis/NLP, pharmaceutical pipeline optimization, program evaluation
Interested in collaborating or learning more about my work? Get in touch!