Step six: SEABORN

A brief EDA using Seaborn

Seaborn

Seaborn is a data visualization library built on top of matplotlib and closely integrated with pandas data structures in Python. This will allow us to conduct a brief EDA very easily.

EDA

Our data looks almost normally distributed, with a few houses dotted towards the upper end of the price range.

We can use a heatmap combined with a correlation function on our dataset to represent the relationships between our data columns.
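As a minimal sketch of that idea, we can pass the output of pandas' `DataFrame.corr()` straight into `sns.heatmap`. The values and column names here are toy placeholders chosen for illustration, not the real data.

```python
import pandas as pd
import seaborn as sns

# Toy values for illustration only; the column names are hypothetical.
df = pd.DataFrame({
    "average_price": [200, 220, 240, 260, 280],
    "female_hourly_pay": [10.0, 11.0, 12.0, 13.0, 14.0],
    "male_hourly_pay": [12.0, 12.5, 14.0, 14.2, 16.0],
    "houses_sold": [100, 90, 95, 80, 85],
})

# Pairwise Pearson correlation between every pair of numeric columns.
corr = df.corr()

# annot=True writes each coefficient into its cell; fixing vmin/vmax
# keeps the colour scale symmetric around zero.
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
```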

From this analysis we can see that, as expected, the average price of houses correlates most strongly with hourly pay. In fact, female hourly pay has the highest correlation with average house price.
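To read a ranking like this off without squinting at colours, one option is to sort a single column of the correlation matrix. This is a sketch on toy values with hypothetical column names; the toy numbers are constructed so the technique is easy to follow and say nothing about the real dataset.

```python
import pandas as pd

# Toy values for illustration only; the column names are hypothetical.
df = pd.DataFrame({
    "average_price": [200, 220, 240, 260, 280],
    "female_hourly_pay": [10.0, 11.0, 12.0, 13.0, 14.0],
    "male_hourly_pay": [12.0, 12.5, 14.0, 14.2, 16.0],
    "houses_sold": [100, 90, 95, 80, 85],
})

corr = df.corr()

# Take the correlations against price, drop the trivial self-correlation,
# and sort so the strongest positive relationship comes first.
ranked = corr["average_price"].drop("average_price").sort_values(ascending=False)
print(ranked)
```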

We can demonstrate this relationship using lmplot, which creates a regression plot. As we’d expect, house prices rise as wages increase.
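A regression plot of this kind might look like the following sketch. The data is generated synthetically with a built-in upward trend, and the column names are assumptions made for the example.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with an upward trend plus noise; column names are hypothetical.
rng = np.random.default_rng(0)
pay = rng.uniform(10, 25, size=100)
df = pd.DataFrame({
    "female_hourly_pay": pay,
    "average_price": 20_000 * pay + rng.normal(0, 30_000, size=100),
})

# lmplot draws a scatter plot and fits a linear regression line
# with a shaded confidence band around it.
g = sns.lmplot(data=df, x="female_hourly_pay", y="average_price")
```

`lmplot` returns a `FacetGrid`, so the same call can be split into panels by passing `col=` or `hue=` if we later want to compare groups.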

Having grown up in Kingston Upon Thames, I find this data particularly interesting. Let’s examine the gender demographic of our dataset. We can see that there are roughly similar numbers of males and females each year. The number of females is slightly greater, which is likely due to the higher life expectancy of females.

We can represent this more visually on a jointplot.

With a bit of feature engineering, we can do further trend analysis on our data. Let’s split our data into month and year columns.
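One way to do this split, assuming the dataset has a date column (called `date` here, which is a guess), is with the pandas `.dt` accessor:

```python
import pandas as pd

# Toy rows for illustration; "date" and "average_price" are hypothetical column names.
df = pd.DataFrame({
    "date": ["2018-06-01", "2018-12-01", "2019-06-01", "2019-12-01"],
    "average_price": [300_000, 310_000, 305_000, 320_000],
})

# Parse the strings into datetimes so the .dt accessor becomes available.
df["date"] = pd.to_datetime(df["date"])

# Derive separate year and month columns from the parsed date.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
```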

Now that we have the months separated out, we can see that house prices tend to peak at the end of the year, as expected given that prices increase over time. Interestingly, however, house prices also tend to drop in June each year, followed by a rapid increase over the latter half of the year.
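A monthly comparison like this can be sketched by averaging price per calendar month and plotting the result. The toy series below simply rises by a fixed amount each month, so it reproduces the end-of-year peak but not the June dip; the column names are assumptions.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Two years of steadily rising toy prices; column names are hypothetical.
dates = pd.date_range("2018-01-01", periods=24, freq="MS")
df = pd.DataFrame({
    "date": dates,
    "average_price": 300_000 + 1_000 * np.arange(24),
})
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

# Mean price for each calendar month across all years.
monthly_mean = df.groupby("month")["average_price"].mean()

# barplot aggregates rows sharing the same month to their mean by default.
ax = sns.barplot(data=df, x="month", y="average_price")
```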

We can pivot our data so that we have months against years. This is similar to a pivot table in Excel.
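A minimal sketch of that reshaping with `DataFrame.pivot_table`, using toy prices and hypothetical column names:

```python
import numpy as np
import pandas as pd

# Two years of toy monthly prices; column names are hypothetical.
dates = pd.date_range("2018-01-01", periods=24, freq="MS")
df = pd.DataFrame({
    "year": dates.year,
    "month": dates.month,
    "average_price": 300_000 + 1_000 * np.arange(24),
})

# Rows become months, columns become years, and each cell holds the mean price.
pivot = df.pivot_table(index="month", columns="year", values="average_price", aggfunc="mean")
print(pivot.head())
```

Using `aggfunc="mean"` means that several rows for the same month and year, for example one per borough, collapse into a single average per cell.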

We can use our pivoted data in a heatmap to show the trend in values. As we’d expect, the average prices increase over time.
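For reference, a month-by-year heatmap could be produced along these lines. The prices are toy values that rise steadily and the column names are assumptions, so the colours only illustrate the shape of the output.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Two years of steadily rising toy prices; column names are hypothetical.
dates = pd.date_range("2018-01-01", periods=24, freq="MS")
df = pd.DataFrame({
    "year": dates.year,
    "month": dates.month,
    "average_price": 300_000 + 1_000 * np.arange(24),
})
pivot = df.pivot_table(index="month", columns="year", values="average_price", aggfunc="mean")

# Each cell is coloured by its mean price, so an upward trend shows up
# as a gradient from the top-left to the bottom-right of the grid.
ax = sns.heatmap(pivot, cmap="viridis")
ax.set_title("Average price by month and year")
```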