UVA Public Interest Data Lab

Project site for UVA's Spring 2018 Public Interest Data Lab

View the Project on GitHub

Resources

We’ll be using Garrett Grolemund’s and Hadley Wickham’s freely available R for Data Science as an R resource and A research synthesis on child welfare disproportionality and disparities produced by the Center for the Study of Social Policy and others as a subject resource. Additional materials are either freely available and linked here or provided on our collab page. Most of the assigned reading will occur in the first few weeks.

The UVA Library and other research support partners across grounds offer free workshops and training opportunities that might be of interest to some, especially those new to tools like R. You can find a list of the Libray offerings here, and a full list of workshops by partner organizations here (the workshops in the health sciences library and school of medicine are open to all, but tend to fill up quickly).

Schedule and Reading

January 19, 2018

Since we only have 10 weeks together, we’ll be diving in on day one with discussions about working collaboriatively and for a client, ethical considerations and data security, and some background on our research question.

January 26, 2018

Our clients will be joining us this day as we delve into the prior research around racial disproportionality and disparity in child welfare systems. We’ll start by collectively presenting the key points from the research synthesis. Everyone is responsible for the first section of the research synthesis, and I’ll begin by choosing individuals to summarize definitions and frameworks to ensure we’re starting with the same understanding. To practice presenting and communicating research, teams of two (or three, if needed) lab members will have seven minutes to brief the class and clients on a section of the report, followed by three minutes for follow up questions. Our clients will give us some additional local context and we’ll have a chance to start asking them some questions.

A research synthesis on child welfare disproportionality and disparities. Center for the Study of Social Policy, December 2011. pp. 7-70

For next week, review the data dictionary for the data provided by DSS. Each lab member should email me by 10am Friday (2/2) one to three research questions she or he believes can fruitfully be answered by the DSS data, or this data supplemented by other additional data (e.g., census-based data); in the same email, include additional questions you have about the data (no need to include questions I’ve already raised on the document). This will replace the weekly update scheduled to begin February 2.

February 2, 2018

Lab members will have access to the data after our 1/26 meeting and can start exploring the data right away. In the first part of class, we’ll go over the submitted questions, consider possibilities for supplementing the DSS data, and pin down a series of research questions we will pursue. We’ll parcel out additional tasks to individual or teams of lab members as needed – e.g., ensuring the data lends itself to a proposed question, verifying the availability of supplemental data, creating a schema for capturing and presenting the work. We’ll spend the second half of class going over some concepts of working in R and working with (potentially different aspects of) the data in groups.

Each group is cleaning, exploring one of data sets for next week, and purusing one supplemental task: looking into useful census data and acquiring, looking into the Cville Open Data Portal and acquiring data if useful, seeking to clarify remaining questions about the data with DSS.

February 9, 2018

After reviewing last week’s work, we’ll draft a formal problem statement based on last week’s discussion and initial investigations into the data and availability of supplementary data. We’ll devise a more formal research plan, splitting up into teams to tackle different pieces of the research plan. We’ll spend some time going over some data wrangling concepts.

Also, there’s a workshop on using Census data Wednesday, February 7 from 2-3 in the Brown Science & Engineering Library!

Today’s Agenda

Time (ish) Task/Goal
12:00 Code review, each group will review another group’s initial code and provide suggestions for improvement
12:20 Report progress on supplemental tasks (no more than 10 minutes each) - exploring/recommending census data, exploring/recommending cville open data portal data, and clarifying questions about the data.
12:45 Problem statement – in groups, draft initial problem statement following the DataKind template from week 1, slides 16-29
1:00 Reporting/refining a common problem statement
1:15 Wrangling questions in R – what do we need to know? E.g., aggregating, merging/joining, graphing, dates/times?
1:40 Groups review initial review of code and implement improvements.

February 16, 2018

We’ll definitely spend time time at the beginning of class sharing progress and providing feedback, and demonstrating key data wrangling concepts. And we’ll finalize the problem statement and draft some analysis plans.

Also, there’s a potentially relevant research talk today for those interested: Beyond Reading, Writing and Arithmetic: The Role of Teachers and Schools in Reporting Child Maltreatment, Maria Fitzpatrick, Associate Professor, Cornell University, 11-12:30, Holloway Hall (Rm 116), Bavaro Hall

Today’s Agenda

Time (ish) Task/Goal
12:00 Collective code review, each group will have 10 minutes to walk us through their code to read in, examine, format, and clean up a data set. We’ll ask questions and make suggestions, and use examples from each other’s code to demonstrate more general functions and processes. We’ll also finalize some choices on how we’ll incorporate race and ethnicity consistently.
12:40 Problem Statement, we’ll refine a merged problem statement based on those we began last week, and use this to start outlining a sequence of analysis for each question.
1:20 Start analyzing, we’ll break up into teams to tackle different pieces/questions and each group will outline a research and analysis strategy for their part of the analysis.

February 23, 2018 and beyond

We’ll spend the first half updating one another on our progress, and talking about modeling choices and strategies. And we’ll talk about collaborative writing in ShareLaTeX!.

Today’s Agenda

Time (ish) Task/Goal
12:00 Sharing of and feedback on group progress reports (one for each enumerated portion of the problem statement), including implications for how “race” is coded in analysis.
1:00 Review of regression models and the ShareLaTeX! document.
1:30 Analysis work. Each group will ultimately generate (1) an R script that reads in, formats and wrangles, and cleans the data, starting with the code of our previous groups, but adapting as needed, and saved as prep_yourchoice.R and (2) an R sript that implements the analysis used in the report and saved as analysis_yourchoice.R. You may also want a script for exploration of the data that won’t necessarily become part of the analysis, e.g., exploratory_yourchoice.R.

March 2, 2018

Updates on progress, sharing the article review load, and working in groups. I want to be sure to reserve at least 20 minutes to talk with each group in more detail while the other group digs in on analysis, articles, adding to the document, etc.

Today’s Agenda

Time (ish) Task/Goal
12:00 Sharing of and feedback on group progress (one for each enumerated portion of the problem statement).
1:00 Divvy up the articles
1:10 Confab with group 1 while group 2 works
1:30 Confab with group 2 while group 1 works
1:50 More on annotation of article (it’s unlikely we’ll have any time remaining, but hope springs eternal!)

March 16, 2018

Updates on progress and working in groups, including time to talk with each group separately.

March 23, 2018