Coding Time

 

 

 

 

 

 

 

 

 

 

 

 

 

.

Data Journalism Workshop

Build a Map on Deadline

Tuesday, 4/11, 11a – 12:15p

Denise Malan, Training Director

Investigative Reporters and Editors Inc.

Denise Malan is a training director for Investigative Reporters & Editors. A graduate of the University of Arkansas and former Arkansas Traveler editor, she was a newspaper journalist for more than a decade, covering government, education, politics, the environment and more. Before joining IRE she was data/investigative editor at the Corpus Christi Caller-Times in Texas.

This is an Open Workshop. All Are Welcome.

Bring Your Laptop.

To Hold a Seat RSVP: rswells@uark.edu

SQL Is Fun

Really.

  1. What does SQL stand for?
  2. What is SQL and Why Do I Need It? Relational database. “SQL is a standard language for storing, manipulating and retrieving data in databases.”
  3. How Does It Work?
  4. When Can We Get Started? I Can’t Wait! https://www.w3schools.com/sql/default.asp

 

This week:

–Orientation with the queries and logic of basic SQL commands

–Import data and create a database

Part 1:Basic Commands.

We will be using this sample database called customers for these drills. Take a look at it.

This data is in our SQL tutorial. Here’s how we extract and filter the records to get what we want.

Major Commands:

SELECT

FROM

WHERE

GROUP BY

ORDER BY

SELECT: You tell the database what columns you want.

If you want everything, SELECT *

Example: SELECT * FROM Customers;

–Retrieves EVERYTHING (*) from the database called Customers

SELECT CustomerName FROM Customers;

–Retrieves CustmerName from the database called Customers

FROM: The basic command that just identifies which table you’re pulling data from.

Example: SELECT * FROM Customers;

–Retrieves EVERYTHING (*) from the database called Customers

WHERE: A filter. Allows you to grab only certain records do numerical and text string, or a horizontal slice of your data; you set criteria that has to be true in order for rows to be returned

SELECT * FROM Customers WHERE ContactName “Maria Anders”;

–retrieves all records with Maria Anders as contact name

SELECT * FROM Customers WHERE ContactName LIKE “Maria%”;

–retrieves all records with Maria in ContactName. Note the addition of LIke and the wildcard %

 

Work on the tutorial:

TIP: Print our the table names and the first 2-3 rows of data as a reference

 

 

GROUP BY – this groups rows together by a certain field or fields so that you can do math on those groups, such as counting the number of records in each group or summing up the numbers in a field.

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;

–Here we are counting the number of countries in the database

Remember this command is paired with putting an aggregate function in your SELECT statement, such as COUNT(*) or SUM(noncitizen)

ORDER BY – this sorts your results by a field or fields, ascending (default) or descending (DESC)

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;

–Produces a list ordered by country, alphabetized

List of countries in descending order:

SELECT * FROM Customers

ORDER BY Country DESC;

Exercises:
  1. Find the total number of cities in the database, sort by descending order
  2. List all of the business in the United States, sorting alphabetically by city
  3. Produce a list of all that counts all of the customers by countries in the database, sorted by descending order
    Switzerland has how many customers?
    United States?
    Argentina?

Quick Reference Sheet for these commands:

https://www.w3schools.com/sql/sql_quickref.asp

 

SQL – Importing Data, Creating a Database

For this exercise, we will be using an open source database manager called SQLite that runs as an extension to Firefox. It basically turns your browser into a database program. Firefox Add-On: https://addons.mozilla.org/en-us/firefox/addon/sqlite-manager/

This will suffice for our purposes this semester, but if you are going to deal with Big Data, you will need Big Boy and Big Girl data software. A discussion for another day.
Firefox: Install add-on, relaunch browser. Find Add-on in Menu Bar: Tools

Create a database

I wanted to do this test drill to show you the process of creating a database. At the end of the class, you can download the databases I created and work on the queries.
1) Create a database. Call it anything, HMDA1 or Subprime or Donkey … Put it in the same folder as your data:
2) Import Data HMDA.csv into SQL.
(SQLite only imports .csv data)
This can be tedious, and here’s why:
We have to tell SQL about each column, whether it is text, numbers or some variation in between.
There are multiple data types:
For this, we are just going to define the columns as either “text” or “numeric” — more on the other data types later. So you need to separately define 15 different columns as text or numeric. (When you buy Big Boy or Big Girl software, the import sequence is less tedious)
As you import HMDA.csv do yourself a favor and open it separately in a spreadsheet so you can see the various columns and what they contain. Check the number of columns in the spreadsheet versus the number of imported columns to ensure they match.
Headings
IMPORTANT NOTE: Eliminate Drudgery of renaming file names – CHECK the box import top row with file names
Here’s an example of what your finished import sequence should look like:
This is what you did in SQL language
CREATE TABLE “HMDA2015” (“rate_spread” NUMERIC, “respondent_id” NUMERIC, “county_name” TEXT, “tract_to_msamd_income” NUMERIC, “population” NUMERIC, “minority_population” NUMERIC, “number_of_owner_occupied_units” NUMERIC, “number_of_1_to_4_family_units” NUMERIC, “loan_amount_000s” NUMERIC, “hud_median_family_income” NUMERIC, “applicant_income_000s” NUMERIC, “state_name” TEXT, “state_abbr” TEXT, “sequence_number” NUMERIC, “purchaser_type_name” TEXT, “property_type_name” TEXT, “preapproval_name” TEXT, “owner_occupancy_name” TEXT, “msamd_name” TEXT, “loan_type_name” TEXT, “loan_purpose_name” TEXT, “lien_status_name” TEXT, “hoepa_status_name” TEXT, “edit_status_name” TEXT, “denial_reason_name_3” TEXT, “denial_reason_name_2” TEXT, “denial_reason_name_1” TEXT, “co_applicant_sex_name” TEXT, “co_applicant_race_name_3” TEXT, “co_applicant_race_name_2” TEXT, “co_applicant_race_name_1” TEXT, “co_applicant_ethnicity_name” TEXT, “census_tract_number” NUMERIC, “as_of_year” TEXT, “application_date_indicator” NUMERIC, “applicant_sex_name” TEXT, “applicant_race_name_5” TEXT, “applicant_race_name_4” TEXT, “applicant_race_name_3” TEXT, “applicant_race_name_2” TEXT, “applicant_ethnicity_name” TEXT, “agency_name” TEXT, “agency_abbr” TEXT, “action_taken_name” TEXT, “applicant_race_name_1” TEXT, “Am_Indian” NUMERIC, “Asian” NUMERIC, “Black” NUMERIC, “Nat_Hawiian” NUMERIC, “White” NUMERIC, “Hispanic” NUMERIC, “Not_Provided” NUMERIC, “Not_Applicable” NUMERIC)
If you messed up, here is how you edit the headings
1) Structure
2) See columns, click on row and rename

Get Stuff From The Database

We return to our SQL commands to see what is in the database.  Remember to have a list of your column headings handy so you spell the queries correctly.
SELECT * FROM HMDA2015
–Brings up all columns from HMDA2015
Follow this cheat sheet for additional exercises:
HAVE A MAP OF ALL OF THE TABLE NAMES HANDY FOR QUERIES
DOWN MENU ARROW FOR List of Fields.
Show data with a query, separated by commas
SELECT RATE_SPREAD, COUNTY_NAME, Black, White, Am_Indian, Asian FROM HMDA2015
Massively overcounts the data. But don’t worry. The queries are refined by filters called WHERE and GROUPBY
Filter by where the ratespread > 1
SELECT RATE_SPREAD, COUNTY_NAME, Black, White, Am_Indian, Asian FROM HMDA2015 WHERE Rate_Spread > 1
List of SQL commands
WHERE CustomerName LIKE ‘a%’ Finds any values that starts with “a”
Wildcard query
SELECT * FROM HMDA2015 WHERE county_name LIKE “Ben%”
Wildcard needs filter WHERE + THE COLUMN YOU ARE SEARCHING + LIKE AND %
SELECT RATE_SPREAD, COUNTY_NAME, Black, White, Am_Indian, Asian
FROM HMDA2015
WHERE Rate_Spread > 1 and County_Name LIKE ‘Washington%’
NOTES
Add a column
Q: Let’s Add A Column Just For Black Subprime
ALTER TABLE HMDA2015
ADD COLUMN subprime_black;
Fill it with stuff
UPDATE HMDA2015
???SET immigrant = (naturalized + BornAbroad)

Homework

You don’t need to turn in anything for Thursday. But I want you to work on the following exercises so we can advance to the next phase of SQL:

https://www.w3schools.com/sql/sql_update.asp

https://www.w3schools.com/sql/sql_delete.asp

https://www.w3schools.com/sql/sql_min_max.asp

https://www.w3schools.com/sql/sql_count_avg_sum.asp

Install Firefox and the SQL Lite Manager on your laptop. https://addons.mozilla.org/en-US/firefox/addon/sqlite-manager/

Read the first section on importing data in this tipsheet: MySQL_Tipsheet-rsw using this data FL_nativity

Try to create a new database (see instructions on this page) and perform some basic queries.