Module #8: Data Cleaning

Census Data and Data Cleaning



Filtering
Reading Data Dictionaries


Data Cleaning in Excel:
=Trim, Paste Special, Values, Transpose, Find and Replace

Basic Population – Race Census Data Download Instructions

Census: data.census.gov


https://data.census.gov

Data Cleaning Exercises
Pivot Tables, Data Cleaning: =Trim, Paste Special, Values, Transpose, Find and Replace



1) Advanced Search
–Topics | Geography | Years | Surveys | Codes

2) Topics | Race and Ethnicity | White
–Note that the “White” filter displays below

3) Geography | County | Arkansas | All Counties in Arkansas
–Note that the “All counties in Arkansas” filter displays

4) Search!

5) Select Table Named RACE
American Community Survey
Total Population
TableID: B02001

6) Switch to 2016: ACS 5-Year Estimates Detailed Tables

7) Customize Table. Download. Make Sure to Download 2016: ACS 5-Year Estimates Detailed Tables


Clean Census Data
1) Create Data Dictionary
2) Duplicate Sheet
3) Four corners select and copy
4) New Sheet. Paste Special | Transpose
–the races are now the rows
–Filter by Estimate: Contains Estimate, Delete
5) Edit Headers: White, Black, Hispanic
6) Check totals – do they add up?
7) Two races including Some other race. Two races excluding Some other race, and three or more races (delete)
8) Save and Load to Tableau
9) Build a Arkansas Population Map by Race

Build Arkansas Population Map By Race
--Clean the County Field ", Arkansas"
--Create Calculations for Percentage Population by Race: Calculated Fields

Income by Race



https://data.census.gov

Advanced Search
Filters | Geography
Counties | Arkansas | All counties
Filters | Topics | Income and Poverty
Filters | Topics | Race and Ethnicity
Filters | Years | 2016

Filters | Text Search in Find a Filter: “Income” | Select “Income (Households, Families, Individuals)

Search


Download White Only, Black Only, Hispanic or Latino Householder

Your tables will say this:
HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE HOUSEHOLDER)
Survey/Program: American Community Survey
Product:
2016: ACS 1-Year Estimates Detailed Tables


Tables: B19001A, B19001B, B19001I 

Download – select .csv


Tableau

Clean Data as described in previous lesson
Combine the three tables in Tableau linking to the income as a common field.
Create a chart

PAST TUTORIALS IN AMERICAN FACT FINDER.

NEED TO REVISE

https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml

Household income data for counties and state and national. Gender and demographics of low-wage workers
American FactFinder
https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml
Advanced Search | Show Me All
Topics | People | Poverty
Geographies | County | Arkansas | All Counties Within Arkansas
Select Table S1701, Poverty Status in the Past 12 Months
Modify Table
—Select top Filter
—Total and Percent Below Poverty Level
—Select second Filter
—Keep Estimate, do not check margin of Error
Download

30:00 shows how to use the fact finder
https://www.census.gov/data/training-workshops/recorded-webinars/measuring-america.html

Selected Economic Characteristics DP03 2012-2016 American Community Survey 5-Year Estimates

Standard Data Cleaning

Data is: ACS_16_5YR_DP03 DP03 SELECTED ECONOMIC CHARACTERISTICS   2012-2016 American Community Survey 5-Year Estimates
Copy main data sheet and call copy wages below $25k
Delete all data fields except headers and these columns
HC01_VC74 Estimate; INCOME AND BENEFITS (IN 2016 INFLATION-ADJUSTED DOLLARS) – Total households
HC01_VC75 Estimate; INCOME AND BENEFITS (IN 2016 INFLATION-ADJUSTED DOLLARS) – Total households – Less than $10,000 HC03_VC75 Percent; INCOME AND BENEFITS (IN 2016 INFLATION-ADJUSTED DOLLARS) – Total households – Less than $10,000 HC01_VC76 Estimate; INCOME AND BENEFITS (IN 2016 INFLATION-ADJUSTED DOLLARS) – Total households – $10,000 to $14,999 HC03_VC76 Percent; INCOME AND BENEFITS (IN 2016 INFLATION-ADJUSTED DOLLARS) – Total households – $10,000 to $14,999 HC01_VC77 Estimate; INCOME AND BENEFITS (IN 2016 INFLATION-ADJUSTED DOLLARS) – Total households – $15,000 to $24,999 HC03_VC77 Percent; INCOME AND BENEFITS (IN 2016 INFLATION-ADJUSTED DOLLARS) – Total households – $15,000 to $24,999 HC01_VC85 Estimate; INCOME AND BENEFITS (IN 2016 INFLATION-ADJUSTED DOLLARS) – Total households – Median household income (dollars)

–Rotate header rows, wrap text.
Shrink verbiage from Estimate; INCOME AND BENEFITS (IN 2016 INFLATION-ADJUSTED DOLLARS) – Total households to “Total households”
Total households %Total households Total households – >$10k %Total households – >$10k Total households – $10kto $14,999 %Total households – $10kto $14,999 Total households – $15,000 to $24,999 %Total households – $15,000 to $24,999 Median household income$
–Specify Arkansas-state
Then find/replace to eliminate “County, Arkansas” from geography labels.
Create Total Under $25 column.
Add Total households – >$10k + Total households – $10k to $14,999 + Total households – $15,000 to $24,999
Create % Under $25k column (total Under $25k / total households)
Copy formulas down
Check math
When satisfied, copy and paste values

More on Data Cleaning Census spreadsheets
 
–Download the view and the data versions of large spreadsheets. One to guide you. the other to do the work.
–Merge / unmerge cells
–Find-Replace
— =CONCATENATE(B3, B4).

Census Demographic data


Household income data for counties and state and national. Gender and demographics of low-wage workers
American FactFinder
https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml
Advanced Search | Show Me All
Topics | People | Poverty | Poverty (added to Your Selections)
Geographies | County | Arkansas | All Counties Within Arkansas
Geographies | United States
Geographies | Arkansas
Select Table S1701, Poverty Status in the Past 12 Months
Modify Table
—Select top Filter
—Total and Percent Below Poverty Level
—Select second Filter
—Keep Estimate, do not check margin of Error
Download
—Use the Data
Download Again
—View the Data
—Excel spreadsheet
Questions about categories and definitions:
See “Table Notes” to far right on factfinder website after you’ve generated a table.
https://www2.census.gov/programs-surveys/acs/tech_docs/subject_definitions/2016_ACSSubjectDefinitions.pdf
Read: “Poverty Status in the Past 12 Months”
“Poverty Status of Households”
Definitions. working Poor
–Poverty thresholds:
The actual poverty thresholds vary with the makeup of the family. In 2015, the weighted average poverty threshold for a family of four was $24,257; for a family of nine or more people, the threshold was $49,177; and for one person (see Unrelated individuals), it was $12,082. Poverty thresholds are updated each year to reflect changes in the Consumer Price Index for All Urban Consumers (CPI-U). Thresholds do not vary geographically. (For more information, see “Income and poverty in the United States: 2015.”)
https://www.bls.gov/opub/reports/working-poor/2015/home.htm#unrelatedindividual
Weighted Average Poverty
Thresholds in 2015 by Size of
Family
(Dollars)
One person 12,082
Two people 15,391
Three people 18,871
Four people 24,257
Five people 28,741
Six people 32,542
Seven people 36,998
Eight people 41,029
Nine people or more 49,177
Source: U.S. Census Bureau.
https://www.census.gov/content/dam/Census/library/publications/2016/demo/p60-256.pdf
https://www.census.gov/data/tables/time-series/demo/income-poverty/historical-poverty-thresholds.html

–Download the view and the data versions of large spreadsheets. One to guide you. the other to do the work.

–Merge / unmerge cells

–Find-Replace

— =CONCATENATE(B3, B4).

Cleaned and download 2011-2015 estimates with detailed poverty metrics
Ark Counties full income search 5-10-17 ACS_15_5YR_DP03

Students assigned geographical location for Census data.

Questions:

–Number and Percentage of Minimum Wage Households?

–Compare to National, State Averages

–Produce basic Tableau chart

Data Cleaning and Joining Exercise


--You will join the CollegeScorecard data with 2017 Census Data with the average household income
The goal is to compare the average student debt in a college to the income in the surrounding town.
This in-class task will stretch over two class sessions, unless you are a DATA KING OR QUEEN
This exercise builds on the data analysis, cleaning and visualization skills you learned this semester.


Task 1: Retrieve the Census Data
Use Excel
--Examine the data dictionary.
--Examine the data, the range of incomes and number of cities, towns and places
--Create a copy of the Census sheet for the data cleaning
Task 2: Data Cleaning
You will need to match the town in the Census to the city in College Scorecard.
Look at the "city" column in the College Scorecard data: ARDebt17_10_23.csv
--Tip: Data cleaning tools in Excel: Text to columns and find and replace
Task 3: Joining
--Join the Census data to the ARDebt17_10_23.csv in Tableau
--Chart the Income and the Grad debt by the 10 largest public schools
Task 4: Analysis
--Construct a Ratio of Grad Debt to Per Capita Income. Map it