Cleaning in Microsoft SQL Server Management Studio
Moving onto SQL. Microsoft SQL Server Management Studio to be precise. Most companies use SQL to essentially to as it says – query their servers. What does that mean? Well, it’s almost like asking a librarian to retrieve a few books for you.
SQL offers us the chance to read in our data from csv, clean & merge everything into neat tables and finally investigate what our data holds (we’ll do this in our EDA).
You will see throughout the code: comments and logical names. This is considered best practice to ensure readability.
We first import our data using the SQL Server’s import & export tool. We can read this into the Server by selecting our input source as Excel and browsing to the xlsx file we want to upload.
We then select our input source as ‘SQL Server Native Client’ and also selecting the correct Server name and Database to read the data into.
All that is left is to follow the rest of the wizard and repeat for each of our files. Once we have these files in, they are each represented as a table within SQL Server. It would be useful to make sure they have an appropriate name as we will be using these names throughout our SQL Script. This will act to make our code more legible; a very important factor when trying to review our code or even more so when someone else is trying to review our code. We can achieve this renaming using the GUI.