About:
As part of an academic project, I led the development of a Reddit engagement analysis pipeline to understand what drives user interaction. Using PRAW, the Python Reddit API Wrapper, I collected and cleaned a dataset of over 40,000 Reddit comments. I engineered natural language features using spaCy for linguistic parsing and VADER for sentiment scoring, along with profanity scoring. To analyze the relationship between these features and engagement metrics such as upvotes, I implemented and compared multiple statistical models, including log-linear regression, Gamma regression, and Generalized Additive Models (GAMs) in R. This project gave me hands-on experience in natural language processing, statistical modeling, and data-driven storytelling.