IT Teaching Resources

SQL Workshop

Connecting datasets and sharing with research partners for greater insights

Technology Tool

Presenter: Wilson Wang
Moderator: Mae Bethel, Academic Technology Specialist

SQL is a powerful tool for organizing, storing, and analyzing large sets of data. It allows researchers to extract insights and draw conclusions about educational trends and patterns, such as student performance, and communicate their research findings through reports or visualizations.

Recording of the session:

Central Questions: 

  • How does SQL help me analyze data? 
  • How can I share reports of the results of my data analysis?
  • How do I connect data between multiple datasets?

Key quotes: 

SQL stands for a structured query language. Imagine you have an Excel spreadsheet; SQL is just a supercharged version of that. (1:35)

The easiest way to join tables is numeric, for example using a student ID or zip code, because it’s a lot easier to have a one to one correlation. The fuzzy joins use strings, such as first name, and it’s not a one to one comparison. (9:31)

DISTINCT does not have a logic to it. It just collapses the data set and blindly removes duplicate rows.(18:07)

Since your DataFrames are tabular data, you can use the SQL library in RStudio to query over your data. And doing that with R might be easier than setting up a SQL database. (35:12)

Take-aways: 

Main Commands

  • Use SELECT column1, column2 FROM table to select data. Remember to select only the columns you need. SELECT * FROM table selects the entire table.
  • Use WHERE, LIKE, and BETWEEN to conditionally filter data based on comparing numbers or words. % is a wildcard.
  • Use ORDER BY, LIMIT, and OFFSET to sort the results by a column or group of columns. You can also limit the number of rows in your results.
  • Use COUNT, GROUP BY, and AVG to do basic statistical analysis on your dataset. Some platforms allow more advanced analysis such as STDDEV.
  • Use JOINs to connect your datasets. LEFT, RIGHT, and OUTER joins will behave differently depending how you want to handle multiple results and null values.

Ways to Export SQL

  • R package sqldf allows you to run SQL commands on data within DataFrames
  • You can usually export SQL databases to CSV or Excel. You can also use a simple SQL platform such as SQLite3 which saves the databases as flat files, or use cloud SQL platforms such as BigQuery.
  • You can share queries via plain text to collaborators, or you can save query results within a SQL database using a VIEW.

Further Learning

Education Datasets