Step three: SQL

Cleaning in Microsoft SQL Server Management Studio

Insights below

Back to the analysis

SQL

Moving onto SQL. Microsoft SQL Server Management Studio to be precise. Most companies use SQL to essentially to as it says – query their servers. What does that mean? Well, it’s almost like asking a librarian to retrieve a few books for you.

SQL offers us the chance to read in our data from csv, clean & merge everything into neat tables and finally investigate what our data holds (we’ll do this in our EDA).

You will see throughout the code: comments and logical names. This is considered best practice to ensure readability.

full script

Importing the data

We first import our data using the SQL Server’s import & export tool. We can read this into the Server by selecting our input source as Excel and browsing to the xlsx file we want to upload.

We then select our input source as ‘SQL Server Native Client’ and also selecting the correct Server name and Database to read the data into.

All that is left is to follow the rest of the wizard and repeat for each of our files. Once we have these files in, they are each represented as a table within SQL Server. It would be useful to make sure they have an appropriate name as we will be using these names throughout our SQL Script. This will act to make our code more legible; a very important factor when trying to review our code or even more so when someone else is trying to review our code. We can achieve this renaming using the GUI.

/* AGE DATA */

/* DWELLINGS DATA */

/* RECYCLING DATA */

/* JOBS DATA */

/* PAY DATA */ --part one

/* PAY DATA */ --part two

/* WELL BEING DATA */

/* HOUSING DATA */ --part one

/* HOUSING DATA */ --part two

/* HOUSING DATA */ --part three

Combine Clean Tables

Clean Combined Tables