I enjoy teaching both full-length courses and brief workshops on data science (data wrangling, visualization, scraping, text analysis) and statistics using Python and R. My research has also prepared me to teach substantive courses on algorithms and society, K-12 education and inequality, higher education and inequality, and law and society.
Instructor of record
QSS20/PBPL 40.01 Modern Statistical Computing (Dartmouth College)
Course description: Social scientists are investigating questions that have led to two changes in their computing workflow.
One change is the use of new forms of data: text data to study how police officers use different language when interacting with Black drivers than with White drivers; spatial data to study the geographic clustering of autism diagnoses in more affluent communities; cellphone mobility data to (try to) estimate COVID-19 mobility patterns.
The second change is the use of new methods to discern patterns in data. Imagine a relatively simple dataset where each individual is described by a limited number of characteristics: for instance, a student and his or her demographic attributes and high school end-of-year grades. Now imagine augmenting that dataset with the forms of data described above: we know the student’s address and can thus merge in spatial data on neighborhood characteristics; we have qualitative notes from the teacher’s end-of-year reports and can investigate how those qualitative impressions correlate with grades. These augmentations require you as the researcher to quickly pick up new methods for finding patterns in large-scale data, even as the methods and tools develop at a rapid pace.
This course is meant to build upon your introductory programming course and to equip you with the computing literacy to conduct social science research in the age of “big data.” This has two core components. First is learning the background tools (e.g., Git/GitHub; LaTeX; working on the command line) to conduct transparent and reproducible research. Second is learning programming skills essential for social science in the big data era, with a focus on using Python for various applied tasks, R for tasks like data visualization, and SQL for tasks like working with the relational databases that form the backbone of many real-world government and commercial datasets.
Course GitHub repo: https://github.com/rebeccajohnson88/qss20_slides_activities
Select activities (Python notebooks):
- Probabilistic record linkage using public PPP loan data
- Regular expression pattern matching using school district names
- API using National Assessment of Educational Progress
- Twitter scraping using politicians’ profiles
- Data wrangling examining racial disparities in criminal justice sentencing
- Topic modeling using DOJ press releases
Princeton Sociology Summer Methods Camp
Course description: The Methods Camp is designed to give you training in both math and computing. In math, you will receive training in three main areas: calculus, probability, and matrix algebra. In computing, you will receive training in three main areas: data wrangling, iteration, and visualization.
At the end of the Methods Camp, students will be able to:
- Start the semester excited and ready to learn new methods
- Explain in words and pictures what a derivative is, what an integral is, and how derivatives are useful for optimization.
- Define probabilities in terms of sets, perform basic set operations, calculate conditional probabilities, and use Bayes' rule.
- Perform matrix addition, subtraction, multiplication, and inversion.
- Combine the 5 dplyr verbs, join data sets, and convert between long and wide formats
- Use loops, purrr, and functions to avoid repeating yourself
- Make simple graphs in ggplot2, write RMarkdown documents, and write basic equations in LaTeX
Example teaching materials:
Sociology 401/504 Advanced Data Analysis for the Social Sciences
Primary Instructor: Brandon Stewart
Course description: Sociology 504 is the second class in a two-semester statistics sequence for graduate students in Sociology. We also welcome undergraduates and graduate students from other departments. The course assumes material covered in Soc500 and the Princeton Sociology Summer Methods Camp. Soc504 covers maximum likelihood estimation, generalized linear models and assorted topics.
Role: led a two-hour hands-on programming/precepting section every other week (alternating with a co-TA); helped create and revise problem sets.
Example slides I created: