I enjoy teaching both full-length courses and brief workshops on data science (data wrangling, visualization, scraping, text analysis) and statistics using Python and R. My research has also prepared me to teach substantive courses on algorithms and society, K-12 education and inequality, higher education and inequality, and law and society.
Instructor of record
PPOL564/Data Science I: Data Science Foundations (Georgetown McCourt School of Public Policy)
Course description: This first course in the core data science sequence teaches Data Science for Public Policy (DSPP) students how to synthesize disparate, possibly unstructured data in order to draw meaningful insights. Topics covered include the fundamentals of object-oriented programming in Python; data wrangling and visualization; merging datasets using both defined join keys and probabilistic/“fuzzy” matching; data extraction via APIs; an introduction to SQL for manipulating data stored in databases; and what I call a “data science grab-bag”: fun topics in data science for working with or visualizing complex and unstructured data (text as data; spatial data; interactive visualizations).
In addition, students will be exposed to the command line and to Git and GitHub for version control and reproducible research. The objective of the course is to teach students how to incorporate data into their decision-making and analysis. No prior programming experience is assumed or required.
Goals: After completing this course, the students will be able to:
- Use Python to write user-defined functions
- Use Python to work with various data structures: lists, numpy arrays, Pandas dataframes, and so on
- Manipulate a variety of data in Python: flat-file data, spatial data, and text data
- Produce static and interactive visualizations in Python
- Write SQL queries to pull, aggregate, and summarize data stored in database tables
Course github repo with all materials: https://github.com/rebeccajohnson88/PPOL564_slides_activities
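To give a flavor of the "fuzzy" matching topic listed in the course description, here is a minimal sketch that uses only the Python standard library. It is not taken from the course materials: the function names (`normalize`, `similarity`, `fuzzy_join`), the 0.8 threshold, and the school names are all hypothetical, and a real record linkage workflow would typically compare several fields rather than a single name.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity score between 0 and 1 based on matching character blocks."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_join(left, right, threshold=0.8):
    """Map each name in `left` to its most similar name in `right`.

    Names whose best score falls below `threshold` (an assumed cutoff)
    are mapped to None instead of being force-matched.
    """
    matches = {}
    for left_name in left:
        best = max(right, key=lambda r: similarity(left_name, r), default=None)
        if best is not None and similarity(left_name, best) >= threshold:
            matches[left_name] = best
        else:
            matches[left_name] = None
    return matches


# Hypothetical names, as if two agencies recorded the same schools differently.
agency_a = ["Springfield Public Schools", "Shelbyville Unified", "Ogdenville Academy"]
agency_b = ["Springfield Public School", "Shelbyville Unified SD", "Capital City Prep"]

result = fuzzy_join(agency_a, agency_b)
for left_name, right_name in result.items():
    print(f"{left_name!r} -> {right_name!r}")
```

An exact join on the name column would link none of these records. The similarity threshold trades missed matches against false ones, so it is worth checking by hand on a sample of pairs.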
QSS20/PBPL 40.01 Modern Statistical Computing (Dartmouth College)
Course description: Social scientists are investigating questions that have led to two changes in their computing workflow.
One change is the use of new forms of data: text data to study how police officers use different language when interacting with Black drivers than with White drivers; spatial data to study the geographic clustering of autism diagnoses in more affluent communities; cellphone mobility data to (try to) estimate COVID-19 mobility patterns.
The second change is the use of new methods to discern patterns in data. Imagine a relatively simple dataset where each individual is described by a limited number of characteristics: for instance, a student and his or her demographic attributes and high school end-of-year grades. Now imagine augmenting that dataset with the forms of data described above: we know the student’s address and can thus merge in spatial data on neighborhood characteristics; we have qualitative notes from the teacher’s end-of-year reports and can investigate how those qualitative impressions correlate with grades. Analyses like these require you as the researcher to have the facility to quickly pick up new methods to find patterns in large-scale data, with the methods and tools developing at a rapid pace.
This course is meant to build upon your introductory programming course and to equip you with the computing literacy to conduct social science research in the age of “big data.” This has two core components. The first is learning the background tools (e.g., Git/GitHub, LaTeX, and working on the command line) needed to conduct transparent and reproducible research. The second is learning programming skills essential for social science in the big data era, with a focus on using Python for various applied tasks, R for tasks like data visualization, and SQL for tasks like working with the relational databases that form the backbone of many real-world government and commercial datasets.
Course github repo: https://github.com/rebeccajohnson88/qss20_slides_activities
Select activities (Python notebooks):
- Probabilistic record linkage using public PPP loan data
- Regular expression pattern matching using school district names
- API using National Assessment of Educational Progress
- Twitter scraping using politicians’ profiles
- Data wrangling examining racial disparities in criminal justice sentencing
- Topic modeling using DOJ press releases
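As an illustration of the kind of task the regular expression activity above involves, here is a small sketch that standardizes school district names before matching them. It is not the activity's actual code: the suffix list, the `SUFFIX` pattern, and the `clean_district` helper are hypothetical, and the example names are made up for the demonstration.

```python
import re

# Hypothetical list of district suffixes to strip. Longer phrases come first
# so that "independent school district" is removed as a whole rather than
# leaving "independent" behind.
SUFFIX = re.compile(
    r"\b(independent school district|unified school district|"
    r"school district|public schools|isd|usd|sd)\b\.?",
    re.IGNORECASE,
)


def clean_district(name: str) -> str:
    """Drop common district suffixes, collapse whitespace, and lowercase."""
    stripped = SUFFIX.sub("", name)
    return " ".join(stripped.split()).lower()


# Illustrative names: the first two spell the same district differently.
raw_names = [
    "Austin ISD",
    "AUSTIN Independent School District",
    "Los Angeles Unified School District",
]

for raw in raw_names:
    print(f"{raw!r} -> {clean_district(raw)!r}")
```

The cleaned strings can then serve as a join key, or as input to a fuzzy matcher when spelling still differs after cleaning.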
Princeton Sociology Summer Methods Camp
Course description: The Methods Camp is designed to give you training in both math and computing. In math, you will receive training in three main areas: calculus, probability, and matrix algebra. In computing, you will receive training in three main areas: data wrangling, iteration, and visualization.
At the end of the Methods Camp, students will be able to:
- Start the semester excited and ready to learn new methods
- Explain in words and pictures what a derivative is, what an integral is, and how derivatives are useful for optimization.
- Define probabilities in sets, perform basic set operations, calculate conditional probabilities, and use Bayes' rule.
- Perform matrix addition, subtraction, multiplication, and inversion.
- Combine the 5 dplyr verbs, join data sets, and convert between long and wide formats
- Use loops, purrr, and functions to avoid repeating yourself
- Make simple graphs in ggplot2, write RMarkdown documents, and write basic equations in LaTeX
Example teaching materials:
Sociology 401/504 Advanced Data Analysis for the Social Sciences
Primary Instructor: Brandon Stewart
Course description: Sociology 504 is the second class in a two-semester statistics sequence for graduate students in Sociology. We also welcome undergraduates and graduate students from other departments. The course assumes material covered in Soc500 and the Princeton Sociology Summer Methods Camp. Soc504 covers maximum likelihood estimation, generalized linear models and assorted topics.
Role: led a two-hour hands-on programming/precepting section every other week (alternated with co-TA); helped create/revise problem sets.
Example slides I created: