This is a learner-centered, project-based, introductory applied statistics course that exemplifies the principle of “data for good”: putting Data Science and Digital Humanities to work for social justice efforts. Students will choose a data set that is personally meaningful and work with others interested in the same issue to build evidence-based, statistically sound arguments, applying the methods learned throughout the course to their project in a guided, scaffolded way. We will primarily use RStudio but will also produce visualizations in Tableau. The final product may be a website, infographic, or data dashboard, accompanied by a white paper, that could serve as a pitch to generate interest in or support for the chosen issue. This course is designed for students from any background, major, or minor. Its flipped structure ensures that students do not struggle through exercises alone but instead work in cooperative groups, troubleshooting questions and challenges together and with an engaged instructor.
- Demonstrate how to use computational methods to conduct a statistical analysis of data, including how to acquire, clean and organize data, analyze data using computationally intensive statistical methods, and report findings.
- Use the programming language R to create new algorithms and functionality and to express statistical ideas and computations. Understand different data technologies and tools, when to use them, and their trade-offs.
- Use Tableau and R to display data graphically and interpret the visualizations.
- Understand and apply the basic principles of probability, including the laws for unions and intersections and Bayes’ theorem.
- Estimate population parameters from data sets and use sampling distributions to compute confidence intervals for these population parameters.
- Learn the basic components of hypothesis testing and perform hypothesis tests on population means, variances, and proportions.
- Develop a web, data dashboard, or infographic publication of findings with an accompanying white paper.
Specific Statistics Learning Outcomes
|recognize, describe and calculate measures of central tendency and variance||conduct and interpret chi-square tests of independence, goodness of fit, homogeneity, and single variance hypothesis tests|
|construct and interpret outcome and contingency tables||create and analyze scatter plots; create and interpret a line of best fit; calculate and interpret the correlation coefficient|
|calculate and interpret confidence intervals for estimating a population mean and proportion||conduct and interpret one-way ANOVA; conduct and interpret hypothesis tests of two variances|
|conduct and interpret hypothesis tests for a single population mean with known and unknown standard deviations||understand multiple regression modeling, analysis, estimation, and hypothesis testing|
- County of Los Angeles Open Data
- Historical Statistics of the United States (Make sure to check “Use for statistical analysis” so the dataset download formats correctly.)
- US Office of Justice Programs Open Data
- ICPSR Thematic Data Collections
- UCLA Library’s Guide to Data
- ProPublica Data Sets (Politics, crime, business, education, finance, and more)
- Data Sets for Digital Humanities
- New York Times
- Public Opinion Polls: ICPSR and other poll archives to which UCLA Library subscribes
- Scott Lynch, Using Statistics in Social Research (Springer, 2016). The ebook is freely accessible through UCLA’s library.
- Jeffrey M. Stanton, Reasoning with Data: An Introduction to Traditional and Bayesian Statistics Using R (Annotated and Illustrated Edition) (The Guilford Press, 2017)
An unconventional history of data visualization and data science
Review data types/levels of measurement
Rectangular and non-rectangular data structures
Reading: Cocco, Federica, and Alan Smith. “Race and America: Why Data Matters,” Financial Times, July 23, 2020.
Lecture on the above topics
Data visualization scavenger hunt
Time zone & topic interest survey
Download and install Tableau
Communicate with your team to select a data set
|Week 2||Exploratory Data Visualization|
Read: Sanders, Visualizing History’s Fragments, chapter 3.
Read: Lynch, Using Statistics in Social Research, chapter 4. Complete exercises 1-10.
Discuss Readings and homework exercises
Workshop: Using Tableau for exploratory data visualization
Application: Work in groups to draft research questions and generate exploratory data visualizations with chosen data sets
Homework Due Week 3: Individual Assignment. Share one of the exploratory data visualizations you’ve created. Then write a one-paragraph presentation of the research question you sought to answer and its significance, followed by a one-paragraph analysis of what your data visualization means. [See this week’s reading for examples.] Tutorial: How to export a visualization from Tableau Desktop
|Week 3||Descriptive Stats|
Read: Lynch, Using Statistics in Social Research, chapter 4. Complete exercises 1-10.
Read: Stanton, Reasoning with Data, chapter 1. Complete exercises 1-4.
Install R: Stanton, Appendix A describes how to install R.
Practice reading files into R: Stanton, Appendix B
Answer questions about R installation and reading data into R
Mini lecture on descriptive statistics
Workshop: Descriptive stats with R
Application: Work in groups to generate a five-number summary and measures of variance for your chosen data set.
Whole class discussion: What did you discover about your data set?
Homework Due Week 4: Team Assignment. In your project teams, polish your analysis of what your five-number summaries and measures of variance mean in the context of your data set. Your analysis should be two paragraphs long, with at least five meaningful sentences in each.
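The five-number summary and measures of variance from the Week 3 workshop can be sketched in base R as follows. This is a minimal illustration, not the course's workshop code: it assumes the built-in mtcars data set as a stand-in for a team's own data, and the variable names are hypothetical.

```r
# Stand-in data: the built-in mtcars data set (not a course data set).
mpg <- mtcars$mpg

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum.
five_num <- fivenum(mpg)
names(five_num) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five_num)

# Measures of spread: sample variance, standard deviation, interquartile range.
spread <- c(variance = var(mpg), sd = sd(mpg), iqr = IQR(mpg))
print(spread)
```

Replacing `mtcars$mpg` with a numeric column from your own data frame should give the same kind of output for your project.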
|Week 4||Probability Theory|
Read: Lynch, Using Statistics in Social Research, chapter 5. Complete exercises 1, 2, 6-8, 10, 13, 16-21.
Read: Stanton, Reasoning with Data, chapters 2-3. Complete chapter 2 exercises 1-9 and chapter 3 exercises 1-5.
Workshop: Setting up your team’s notebook to share R code
Mini Lecture: Probability Theory & Contingency Tables
Review of R code from Stanton & discuss homework exercises
Examine the scholarship example
Team Meetings: Discuss applications of probability theory to your project
Scholarship Example: Morantz & Zschoche, “Professionalism, Feminism, and Gender Roles: A Comparative Study of Nineteenth-Century Medical Therapeutics,” The Journal of American History, 67, No. 3 (December 1980), 568-588.
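The probability laws covered this week (unions, intersections, and Bayes’ theorem) can be checked numerically in R. The sketch below is only an illustration: it assumes the built-in mtcars data set as a stand-in, and the two events are arbitrary choices made for the example.

```r
# Stand-in data: built-in mtcars, not a course data set.
# Event A: the car has a manual transmission (am == 1).
# Event B: the car has four cylinders (cyl == 4).
is_manual <- mtcars$am == 1
is_four_cyl <- mtcars$cyl == 4

p_a <- mean(is_manual)                      # P(A)
p_b <- mean(is_four_cyl)                    # P(B)
p_a_and_b <- mean(is_manual & is_four_cyl)  # P(A and B), the intersection

# Addition law for unions: P(A or B) = P(A) + P(B) - P(A and B).
p_a_or_b <- p_a + p_b - p_a_and_b

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
p_b_given_a <- p_a_and_b / p_a
p_a_given_b <- p_b_given_a * p_a / p_b

print(c(union = p_a_or_b, bayes = p_a_given_b))
```

Both results can be compared against direct counts, for example `mean(is_manual[is_four_cyl])` for P(A | B).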
|Week 5||Statistical Inference|
Read: Lynch, Using Statistics in Social Research, chapter 6. Complete exercises 1, 2, 4, 8-10, 15, 18.
Read: Stanton, Reasoning with Data, chapter 4. Complete exercises 1-10.
Mini Lecture & R Workshop: Inference & Confidence Intervals
Examine application of statistical inference in humanistic research from Martha Olney, “When Your Word is Not Enough: Race, Collateral, and Household Credit,” The Journal of Economic History 58, no. 2 (1998), 408-431.
Application: In project teams, create outcome and contingency tables and analyze what each one means. Discuss the meaning of confidence intervals and inference and how these concepts apply to your own research project.
Homework Due Week 6: Team Assignment. Create and analyze outcome and contingency tables based on your data set. Be sure to describe why you chose to summarize the variables you did and what the tables tell you about the data. Individual Assignment: Descriptive Statistics Checkpoint
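The tables and confidence intervals from this week might be produced in base R roughly as follows. This is a sketch under stated assumptions: mtcars is a stand-in for a team's data, and a one-variable frequency table is used as one reading of "outcome table".

```r
# Stand-in data: built-in mtcars, not a course data set.
cylinders <- factor(mtcars$cyl)
transmission <- factor(mtcars$am, labels = c("automatic", "manual"))

# One-variable frequency table (one reading of "outcome table").
print(table(cylinders))

# Contingency table of counts, with row and column totals added.
counts <- table(cylinders, transmission)
print(addmargins(counts))

# Row proportions: within each cylinder group, the share of each transmission.
print(round(prop.table(counts, margin = 1), 2))

# 95% confidence interval for the population mean of mpg, based on the
# t distribution (population standard deviation unknown).
ci <- t.test(mtcars$mpg, conf.level = 0.95)$conf.int
print(ci)
```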
|Week 6||Statistical Approaches to Nominal Data: Chi-square tests|
Read: Lynch, Using Statistics in Social Research, chapter 7. Complete exercises 1-5.
Read: Stanton, Reasoning with Data, chapter 7, pp. 138-155. Complete exercises 7 & 8.
Mini lecture: Application of statistical inference in humanistic research
Scholarship Example: Mariot & Zalc, “Reconstructing Trajectories of Persecution: Reflections on a Prosopography of Holocaust Victims,” in Microhistories of the Holocaust, edited by Claire Zalc and Tal Bruttmann (New York: Berghahn Books, 2016), 85-112.
R Workshop: Chi-square tests
Application: In your project teams, generate contingency tables and run chi-square tests with your own data. List the null and alternative hypotheses and describe what your results mean.
Homework Due Week 7: Team Assignment. Choose two nominal variables from your data set and explain why you chose them. Paste the R code for your chi-square tests along with your results, and write at least one paragraph about what they mean.
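A chi-square test of independence on two nominal variables can be sketched in base R as below. This is an illustration only, assuming the built-in mtcars data set as a stand-in; the variables were picked for the example and are not part of the assignment.

```r
# Stand-in data: two nominal variables from the built-in mtcars data set.
engine <- factor(mtcars$vs, labels = c("v_shaped", "straight"))
transmission <- factor(mtcars$am, labels = c("automatic", "manual"))

# Contingency table of observed counts.
observed <- table(engine, transmission)
print(observed)

# H0: engine shape and transmission type are independent.
# H1: the two variables are associated.
# For a 2x2 table, chisq.test() applies Yates' continuity correction by default.
test <- chisq.test(observed)
print(test)

# Expected counts under H0; very small expected counts would suggest the
# chi-square approximation is unreliable.
print(round(test$expected, 1))
```

A p-value below the chosen significance level (commonly 0.05) would lead you to reject the null hypothesis of independence.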
|Week 7||Comparing Means Across Multiple Groups|
Read: Lynch, Using Statistics in Social Research, chapter 8. Complete exercises 1, 2, and 3.
Read: Stanton, Reasoning with Data, chapter 6. Complete exercises: 1, 2, 4, 5, and 7.
Mini Lecture: ANOVA Tests
R Workshop: ANOVA test
Application in project teams.
Homework Due Week 8: Team Assignment. Explain why you chose the variables you’re analyzing, paste the R code for your ANOVA tests along with your results, and write at least one paragraph about what they mean. In a note below your analysis.
Additional Resource: The Big Book of Data Dashboards (ebook)
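For the Week 7 ANOVA work, a one-way ANOVA comparing means across groups might look like the sketch below. It assumes the built-in iris data set as a stand-in for a team's data; the follow-up Tukey comparison is an optional extra, not something the assignment requires.

```r
# Stand-in data: built-in iris, not a course data set.
# H0: mean sepal length is the same for all three species.
fit <- aov(Sepal.Length ~ Species, data = iris)
anova_table <- summary(fit)
print(anova_table)

# Group means help interpret a significant F test.
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
print(group_means)

# Pairwise comparisons with Tukey's honest significant differences.
print(TukeyHSD(fit))
```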
|Week 8||Correlation & Simple Regression|
Read: Lynch, Using Statistics in Social Research, chapter 9. Complete exercises 1-4.
Read: Stanton, Reasoning with Data, chapter 7, pp. 119-138. Complete exercises 1-4.
Mini Lecture: Correlation and simple regression
Discussion of final projects
Application to projects
Homework Due Week 9: Team Assignment. Explain why you chose the variables you’re analyzing, paste the R code for your simple regression and correlations along with your results, and write at least one paragraph about what they mean.
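Correlation and simple regression can be sketched in base R as follows. This is a minimal example that assumes the built-in mtcars data set as a stand-in, with car weight predicting fuel economy purely for illustration.

```r
# Stand-in data: built-in mtcars, not a course data set.
# Pearson correlation coefficient between weight and miles per gallon.
r <- cor(mtcars$wt, mtcars$mpg)
print(r)

# Test of H0: the population correlation is zero.
print(cor.test(mtcars$wt, mtcars$mpg))

# Line of best fit: mpg as a linear function of weight.
fit <- lm(mpg ~ wt, data = mtcars)
print(summary(fit))

# Intercept and slope of the fitted line.
print(coef(fit))
```

In simple regression with one predictor, the squared correlation coefficient equals the model's R-squared, which gives a quick consistency check.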
|Week 9||Introduction to Multiple Regression|
Read: Lynch, Using Statistics in Social Research, chapter 10. Complete exercises 1-3.
Read: Stanton, Reasoning with Data, chapter 8. Complete exercises 1-9.
Mini lecture: multiple regression
Project team check-in
Homework Due Week 10: Team Assignment. Explain why you chose the variables you’re analyzing, paste the R code for your multiple regression along with your results, and write at least one paragraph about what they mean. Alternatively, explain why multiple regression is not appropriate for your data or questions.
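A multiple regression with two predictors might be fit in base R as in the sketch below. It assumes the built-in mtcars data set as a stand-in, and the choice of predictors is arbitrary.

```r
# Stand-in data: built-in mtcars, not a course data set.
# Model mpg as a linear function of weight and horsepower.
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)
model_summary <- summary(fit_multi)
print(model_summary)

# 95% confidence intervals for the intercept and each slope.
print(confint(fit_multi, level = 0.95))

# Adjusted R-squared penalizes the model for each added predictor.
print(model_summary$adj.r.squared)
```

Each slope is read as the expected change in the outcome for a one-unit change in that predictor, holding the other predictor constant.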
|Week 10||Presenting Results of Statistical Analysis|
Read: Lynch, Using Statistics in Social Research, chapter 11
Read: D’Ignazio & Klein, Data Feminism, chapter 7.
Mini Lecture: Presenting statistical results
Discuss Data Feminism, chapter 7.
Workshop: Introduction to R Markdown and graphics for communication with ggplot2.
Project team check-ins
Homework Due Finals Week:
Team Assignment. Select the three analyses you find most meaningful for your data set and topic and prepare a polished visual presentation of your findings in one of the following formats:
3. Data Dashboard with R or Tableau
Record a 5-minute video presentation of your findings as a team (perhaps via Zoom) or splice together individual recordings from each team member. This recording should represent your best work this quarter and something you could be proud to share with a potential employer or graduate school selection committee.
|Final Exam Week||Due:|
5-minute video presentation of findings
File upload or link to visual presentation of findings