Agenda

–Principles of Data Visualization

–Occupational Employment Data

–Mapping #1: Build a Map in Google Fusion Tables

–Tableau and calculations. Reference Line in Tableau

–Audio recording and editing skills.

 

Principles of Data Visualization

How to Represent Your Data

 

Cleveland McGill Scale

 Dataviz Catalog

FT Visual Vocabulary

 

Build a Map in Google Fusion Tables

Download FBI Data and Map Crime Rates

Map Crime Rates Feb 15 2018-12mwgtl

Fusion Table

https://www.google.com/fusiontables/DataSource?docid=1xaT29A1j-Mr3u9X70X9CB6mrN_2WjzAxRFiiSmTO

 

Single Mom 2 2-15-18-20rg8wh

 

https://www.google.com/fusiontables/DataSource?docid=1IHXqRKZiXStuRpelBDpPGbYw7sVTG8oXwMnHGsCN

<iframe width="500" height="300" scrolling="no" frameborder="no" src="https://fusiontables.google.com/embedviz?q=select+col0+from+1IHXqRKZiXStuRpelBDpPGbYw7sVTG8oXwMnHGsCN&viz=MAP&h=false&lat=35.80452979030749&lng=-92.91575650624998&t=1&z=8&l=col0&y=2&tmplt=2&hml=GEOCODABLE"></iframe>

Occupational Employment Data

Arkansas
https://www.bls.gov/oes/current/oes_ar.htm

NW Arkansas
https://www.bls.gov/oes/current/oes_22220.htm

  1. Identify jobs paying $9.50 an hour or less.

Gender and racial diversity in newsrooms. Note the animation.

https://googletrends.github.io/asne/?view=0

Class Occupational Employment Data

Arkansas
https://www.bls.gov/oes/current/oes_ar.htm

NW Arkansas
https://www.bls.gov/oes/current/oes_22220.htm

  1. Identify jobs paying $9.50 an hour or less.
  2. Sort by the largest number employed in this category.
  3. Compare to the state.
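If you would rather script this exercise than sort the table by hand, the steps above can be sketched in Python. This is only a rough sketch under stated assumptions: the column names (`occupation`, `area`, `employment`, `hourly_median`) are hypothetical, the choice of median rather than mean hourly wage is an assumption, and the sample rows are made-up placeholders, not real BLS figures. Swap in the data you download from the BLS pages linked above.

```python
import csv
import io

# Hypothetical sample rows: the occupations and numbers below are made up
# for illustration and are NOT real BLS figures.
SAMPLE_CSV = """occupation,area,employment,hourly_median
Occupation A,NW Arkansas,4000,9.10
Occupation B,NW Arkansas,6500,9.45
Occupation C,NW Arkansas,1200,15.20
Occupation A,Arkansas,21000,9.05
Occupation B,Arkansas,30000,9.60
Occupation C,Arkansas,8000,14.80
"""

WAGE_CUTOFF = 9.50  # dollars per hour, taken from the exercise


def low_wage_jobs(rows, area, cutoff=WAGE_CUTOFF):
    """Return one area's rows paying `cutoff` or less, largest employment first."""
    matches = [
        r for r in rows
        if r["area"] == area and float(r["hourly_median"]) <= cutoff
    ]
    return sorted(matches, key=lambda r: int(r["employment"]), reverse=True)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Steps 1 and 2: filter by wage, then sort by number employed.
metro = low_wage_jobs(rows, "NW Arkansas")
state = low_wage_jobs(rows, "Arkansas")

# Step 3: compare the metro list to the state list.
state_occupations = {r["occupation"] for r in state}
for r in metro:
    note = ("also low-wage statewide"
            if r["occupation"] in state_occupations
            else "not low-wage statewide")
    print(f'{r["occupation"]}: {r["employment"]} employed, '
          f'${r["hourly_median"]}/hr, {note}')
```

With the placeholder rows, Occupation B is listed first because it has the larger employment number, and only Occupation A falls under the cutoff in both the metro area and the state.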

Create a Reference Line for Poverty

Step 1 – Build the View

  1. Drag % Female Households – Children 5 Years to the Rows shelf.
  2. Drag Geography to the Columns shelf.

Step 2 – Create Parameters

  1. Right-click in the Data pane and then select Create Parameter.
  2. Name the parameter “Arkansas Average”.
  3. Under Data Type, select Float (an Integer parameter cannot hold 55.8).
  4. Under Current Value, enter 55.8.
  5. Under Allowable values, select All.
  6. Click OK.

Step 3 – Create the Calculated Field

  1. Select Analysis > Create Calculated Field.
  2. Name the calculated field “Reference Line”.
  3. In the formula field, enter the following formula:
     IF [% Female Households – Children 5 years younger] = [Arkansas Average] THEN [Arkansas Average] END
  4. Click OK.

Step 4 – Use the calculated field as a Parameter Control

  1. Drag the “Reference Line” calculated field to Detail. This is the box below Color on the Marks card.
  2. Click the arrow to change the measure from SUM to Minimum.
  3. In the view, right-click on the Y axis and select Add Reference Line.
  4. In the Value drop down menu, select Minimum(Reference Line).
  5. In the Label drop-down menu, select Value.
  6. Click OK.
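To see what the calculated field and the Minimum aggregation are doing, here is a small Python sketch of the same logic. It is an illustration, not Tableau code: the geography names and most percentages are hypothetical placeholders, and only 55.8 comes from the parameter in Step 2. It assumes that at least one row in your data (for example, the statewide row) has a value exactly equal to the parameter. If no row matches, the calculated field is empty everywhere and there is nothing for the reference line to show.

```python
# Hypothetical rows: geography names and percentages are placeholders,
# except 55.8, which is the parameter value from Step 2.
ARKANSAS_AVERAGE = 55.8

rows = [
    {"geography": "Arkansas", "pct": 55.8},
    {"geography": "County A", "pct": 61.2},
    {"geography": "County B", "pct": 48.9},
]


def reference_line(pct, parameter=ARKANSAS_AVERAGE):
    """Mimic IF [field] = [parameter] THEN [parameter] END.

    None stands in for Tableau's NULL when the condition is false.
    """
    return parameter if pct == parameter else None


def minimum_ignoring_nulls(values):
    """Mimic the Minimum aggregation, which skips NULL values."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


calculated = [reference_line(r["pct"]) for r in rows]
line_value = minimum_ignoring_nulls(calculated)
print(line_value)  # 55.8

# With no matching row, nothing is left to draw.
no_match = minimum_ignoring_nulls([reference_line(61.2), reference_line(48.9)])
print(no_match)  # None
```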

Homework

Reading Below on Data Diaries

Alberto Cairo, The Truthful Art, Ch. 3 Chapter3TheTruthfulArt-2nw36qt

Cohen, “Numbers in the Newsroom,” pp. 50-58: Averaging Averages, Weighted Averages, Standard Deviation; ch. 3, Working With Graphics.

Readings on Interviewing – Jacqui Banaszynski on interviewing:
Blog Post: Due 11:59 pm Tuesday Feb 20
1) Based on the Cairo and Cohen readings, what are one or two things you want to try with your data visualizations going forward? What did you learn about infographics?
2) Based on the interviewing readings, what new techniques did you learn to help with these interviews of low-income workers?
3) Fix issues with your Assignment #1 based on my comments. You will revise that Assignment #1 post.

Data Diary Examples

The following material was posted on NICAR-L, a listserv for data journalists. There are some great examples of how the pros use data diaries / data dictionaries in their workflow.
1) Geoff
This is a great question, and I’m finding as I think through my response that it’s helpful to remind myself of good practices.
I use Jupyter notebooks for when I’m doing analysis or exploration in Python or SQL and R Markdown for when I’m doing it in R. However, I would stress that any data diary you keep and keep in a detailed way that is useful to you and others later, regardless of format, is better than the one you don’t.
https://github.com/newsapps/public-notebooks/blob/master/Shooting%20victims%20by%20block.ipynb is an example of a representative but not great notebook for a small data task.
A few things that I try (but don’t always succeed) to do:
– Link to the source data, summary reports and codebooks near the top of my notebook. This is both a convenience to me, because I refer to these often, and especially to others who may not have seen those things before.
– Put a high level summary of why I’m interested in the data and what I’m trying to find at the top of the notebook. This keeps me focused as I’m doing my exploration and also is helpful for others who might be skimming.
– Keep a parking lot of questions (or potential concerns about validity or cleanliness of data) near the top of the notebook. That way I can quickly capture things I think about as I’m exploring or analyzing the data, while still staying focused.
– Near the end of my day (or the first thing the next morning), do a quick pass over a notebook I worked on during the day. Do my notes still make sense? Are they as clear as they could be? If not, try to clean them up.  If I don’t have time at the moment, I at least leave a “TODO” note to flag the section as needing some love.
– Share the notebook with someone else as early as possible, even if you’re still in-progress. This is the most helpful way to know if I’m capturing your process with enough granularity. Or maybe I’m getting too granular. If so, is there a way to summarize  process and findings at the top of a section?
– If using code, don’t give a play-by-play of the code in text. Instead, describe what I’m trying to find out, why it’s important and why I’m taking a particular approach. Also note any assumptions my code is making.
Hopefully this is helpful.
Best,
Geoff

2) Christian McDonald

Oh, do I have feelings about this one…

I keep a data diary for myself that has everything from notes about public information requests, notes about where I got data, descriptions of what I did, sql queries and all kinds of things. I sometimes also make a data report that is really RESULTS of what I learned, as opposed to how I got there in the data diary. The data report is more for other reporters, editors and maybe sources, but the diary is for me, so less formal.

These days I’m trying to script more of my work using Jupyter Notebooks, which then tends to be a mix of the two. It has info about where the data came from and the code that made the result. Sometimes it is written for future me, sometimes for the public. Generally, I’ll still keep a personal data diary just for my future self, ‘cause I can’t remember what I did yesterday much less last week.

Data diaries I tend to write in markdown files on my machine so code doesn’t get wigged with curly-quote translations. Data reports are typically Google Docs or Jupyter Notebooks on Github.
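If you want to try the plain-text markdown approach described above for your own data diary, here is a minimal Python sketch. The function name `add_diary_entry`, the entry layout and the file name `data_diary.md` are all hypothetical choices for illustration, not something the journalists quoted here prescribe. The example writes to a temporary folder so it leaves nothing behind; point it at a real file when you use it.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def add_diary_entry(diary_path, source, what_i_did, questions=()):
    """Append one timestamped entry to a plain-text markdown data diary."""
    lines = [
        f"## {datetime.now():%Y-%m-%d %H:%M}",
        f"- Source: {source}",
        f"- What I did: {what_i_did}",
    ]
    # Optional "parking lot" of open questions about the data.
    lines += [f"- Question: {q}" for q in questions]
    entry = "\n".join(lines) + "\n\n"
    with Path(diary_path).open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry


# Example usage with a throwaway file (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    diary = Path(tmp) / "data_diary.md"
    add_diary_entry(
        diary,
        source="https://www.bls.gov/oes/current/oes_ar.htm",
        what_i_did="Filtered to jobs paying $9.50 an hour or less.",
        questions=["Should the cutoff use the mean or the median wage?"],
    )
    add_diary_entry(
        diary,
        source="https://www.bls.gov/oes/current/oes_22220.htm",
        what_i_did="Sorted the low-wage jobs by number employed.",
    )
    text = diary.read_text(encoding="utf-8")

print(text)
```

Because each entry is appended rather than overwritten, the file builds into a running log of where the data came from and what you did with it.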