Corgi working on a Data Science project. 2022. Roman x Stable Diffusion
In this session we will try out some of the techniques we have already learned on real-world data from AirBnb, and we will also experiment with Kaggle.
In this notebook we will be using data from AirBnb for some basic EDA and geoplotting
In this notebook we will learn how to work with data from Kaggle and practice some more simple data visualization.
This is how you can preprocess the GeoCoordinates from the JSON file:
# Load pandas
import pandas as pd
# Read the file from remote
data = pd.read_json('https://admin.opendata.dk/dataset/44ecd686-5cb5-40f2-8e3f-b5e3607a55ef/resource/eeabb0f8-1b19-4c80-b059-5ba5c4c872d2/download/guidedenmarkaalborgenjson.json')
# The GeoCoordinates are hiding in the Address column
data['Address'][0]['GeoCoordinate']
# You can use a list comprehension to pull out the GeoCoordinates (including empty values) - try it out
# This lets you filter out missing data without fancy workarounds
[x['GeoCoordinate'] for x in data['Address']]
# Make a new column based on that to be used for filtering out missing data
data['GeoCoordinate'] = [x['GeoCoordinate'] for x in data['Address']]
# Drop rows where there is no GeoCoordinate
# (.copy() avoids a possible SettingWithCopyWarning when adding columns below)
data = data.dropna(subset=['GeoCoordinate']).copy()
# Pull out the values
data['latitude'] = [x['Latitude'] for x in data['GeoCoordinate']]
data['longitude'] = [x['Longitude'] for x in data['GeoCoordinate']]
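If you want to try the steps above without downloading anything, here is a minimal sketch on a tiny hand-made table. The records, the `Name` and `City` fields, and the coordinate values are made up for illustration. The sketch also assumes that a missing coordinate shows up as `None` inside the `Address` dictionary, which may not match every row of the real file.

```python
import pandas as pd

# Hypothetical records that mimic the nested structure of the Address column.
# All names and values here are invented for illustration.
records = [
    {"Name": "Place A", "Address": {"City": "Aalborg", "GeoCoordinate": {"Latitude": 57.05, "Longitude": 9.92}}},
    {"Name": "Place B", "Address": {"City": "Aalborg", "GeoCoordinate": None}},
    {"Name": "Place C", "Address": {"City": "Aalborg", "GeoCoordinate": {"Latitude": 57.04, "Longitude": 9.93}}},
]
sample = pd.DataFrame(records)

# Pull the nested GeoCoordinate dictionaries (or None) into their own column
sample["GeoCoordinate"] = [x["GeoCoordinate"] for x in sample["Address"]]

# Drop rows without a GeoCoordinate; .copy() gives us an independent DataFrame
sample = sample.dropna(subset=["GeoCoordinate"]).copy()

# Pull out the numeric values
sample["latitude"] = [x["Latitude"] for x in sample["GeoCoordinate"]]
sample["longitude"] = [x["Longitude"] for x in sample["GeoCoordinate"]]

print(sample[["Name", "latitude", "longitude"]])
```

Only the rows with a coordinate survive the `dropna` step, so the list comprehensions that follow never see a `None`.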
Using GeoPandas to analyze geospatial data will be our focus in this notebook.
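As a rough sketch of where latitude and longitude columns fit in, the snippet below turns a small table into a GeoDataFrame using `geopandas.points_from_xy`. The table contents are made up, and the coordinate reference system `EPSG:4326` (WGS 84) is an assumption based on the values looking like plain latitude/longitude degrees.

```python
import pandas as pd
import geopandas as gpd

# Made-up coordinates standing in for latitude/longitude columns
points = pd.DataFrame({
    "name": ["Place A", "Place C"],
    "latitude": [57.05, 57.04],
    "longitude": [9.92, 9.93],
})

# points_from_xy expects x (longitude) first, then y (latitude)
gdf = gpd.GeoDataFrame(
    points,
    geometry=gpd.points_from_xy(points["longitude"], points["latitude"]),
    crs="EPSG:4326",  # assumed CRS: WGS 84 latitude/longitude
)

print(gdf.head())
```

Note the argument order: a common mistake is to pass latitude first, which places the points in the wrong location.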