Visualizations with Tableau

Recently, I experimented with creating visualizations with Tableau. You can click on the image above to see the dashboard. Later I will stream my aquaponics data into Tableau from Azure SQL.

The data I am using is from Open.Georgia.gov (ga.gov). The data does not have coordinates, so I created a script to add the coordinates in. There is a small issue with the data, which can be easily fixed by telling Tableau that text is wrapped by single quotation marks, and you need to uncheck ‘Field names are in first row.’ You just have to rename the columns later. (Or you could just delete the first row from the data.)

Tableau is less “busy” than Power BI. Power BI has many options, which can be disorienting. This data is pretty large, around 500MB, and Tableau handles the data really well.

Tableau uses a very simple language in its calculated field – it feels like a combination of Python and SQL. On the other hand, Power BI uses DAX and M – OMG!!!! I am so glad both Tableau and Power BI support Python.

In pandas, if we wish to find all rows such that TITLE contains “PROFESSOR,” we would use

professor_df = salary[salary.TITLE.str.contains("PROFESSOR")]

The language in Tableau is very similar. Below, I created a field for profession filter:

IF CONTAINS([TITLE],[PROFESSIONS])
THEN [SALARY]
ELSE NULL
END

Here, in Tableau we are using parameters for professions and aggregations. Finally, we create an aggregation field using CASE:

CASE [AGG]
WHEN "avg" THEN AVG([TITLE FILTER])
WHEN "median" THEN MEDIAN([TITLE FILTER])
WHEN "max" THEN MAX([TITLE FILTER])
WHEN "min" THEN MIN([TITLE FILTER])
END

Overall, I prefer Tableau for creating sophisticated visualizations; however, ETL is just far superior with Pandas and PySpark.

The data that we used does not have coordinates, so I used geopy, which is open-source. The code is below. First, I extracted universities and colleges from the data, and then I created a new DataFrame from it. Finally, I used geopy to obtain latitude and longitude for each school.

import pandas as pd 
from geopy.geocoders import Nominatim
import numpy as np
#This data was obtained from the following website: https://open.ga.gov/
salary = pd.read_csv('salary.txt',sep=',',quotechar="'",names=['NAME','TITLE','SALARY','TRAVEL','ORGANIZATION','FISCAL YEAR'])
org_df = pd.DataFrame(salary['ORGANIZATION'].unique(),columns=['ORGANIZATION'])
schools = org_df[org_df['ORGANIZATION'].str.contains('UNIVERSITY|COLLEGE|SCHOOL')].reset_index(drop=True)
schools.to_csv('schools.csv',index=False)
geolocator = Nominatim(user_agent="usg")
def find_coords(row):
location = geolocator.geocode(row['ORGANIZATION'])
if location:
return pd.Series({'latitude':location.latitude,'longitude':location.longitude})
else:
return pd.Series({'latitude':np.nan,'longitude':np.nan})
schools[['latitude','longitude']] = schools.apply(find_coords,axis=1)
schools.to_csv('schools_with_coords.csv',index=False)

Tags:

Leave a comment