Exercise 4#

Due date: Please complete this exercise by the end of day on Wednesday 28.2.

Exercise 4 - Start your assignment

You can start working on your personal copy of the exercise by accepting the GitHub Classroom assignment.

Note that if you are using GitHub Classroom for the first time, it might ask for permission to verify your GitHub identity. In that case, choose “Authorize GitHub Classroom”.

You can also take a look at the open course copy of Exercise 4 in the course GitHub repository (does not require logging in). Note that you should not try to make changes to this copy of the exercise, but rather only to the copy available via GitHub Classroom.

Cloud computing environment#

Once your personal copy of the exercise is on GitHub, start programming in CSC Notebooks.

Using Git#

Note

We will use git and GitHub when working with the exercises. You can find instructions for using git and the Jupyter Lab git plugin here.

Hints#

How to easily create an interactive visualization from pandas or geopandas?#

You can create an interactive visualization from any data stored in a pandas DataFrame or a geopandas GeoDataFrame, using either the built-in geopandas method .explore() or the hvplot library. This hint shows both ways. To use the .hvplot() functionality, we need to import the pandas extension (hvplot.pandas), which adds the plotting interface to our DataFrames and GeoDataFrames. You can then, for example, do the following:

import hvplot.pandas
import geopandas as gpd

# Fetch sample data (gpd.datasets was removed in geopandas 1.0, so this call needs an older release)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.head(2)

	pop_est	continent	name	iso_a3	gdp_md_est	geometry
0	889953.0	Oceania	Fiji	FJI	5496	MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1	58005463.0	Africa	Tanzania	TZA	63177	POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...

# Plot all countries as an interactive map using .explore()
world.explore()


# Plot African countries on top of the CartoDB dark ("CartoDark") baselayer with hvplot
africa = world.loc[world["continent"]=="Africa"].copy()

# Plot using hvplot
africa.hvplot(geo=True, width=500, height=500, tiles="CartoDark", color=None, line_color="green", alpha=0.4)
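The selection step above combines a boolean mask with .loc and then calls .copy(). The following is a minimal sketch of that pattern in plain pandas, so it runs without geopandas or hvplot. The table `toy_world` is a hypothetical stand-in built from the two rows shown in the sample output, and the `selected` column is an assumption added only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the "world" table (values from the sample output above)
toy_world = pd.DataFrame(
    {
        "continent": ["Oceania", "Africa"],
        "name": ["Fiji", "Tanzania"],
        "pop_est": [889953.0, 58005463.0],
    }
)

# Boolean mask: True for rows whose continent is "Africa"
is_africa = toy_world["continent"] == "Africa"

# Select the matching rows; .copy() gives an independent table,
# so adding columns later does not touch (or warn about) the original
toy_africa = toy_world.loc[is_africa].copy()
toy_africa["selected"] = True

print(toy_africa)
```

Because of the .copy(), `toy_world` keeps its original three columns while `toy_africa` gains the extra one.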

How to count values in pandas and visualize them interactively?#

import pandas as pd
import seaborn as sns

# Load some sample data
data = sns.load_dataset('flights')

data.head(3)

	year	month	passengers
0	1949	Jan	112
1	1949	Feb	118
2	1949	Mar	132

# Let's take a random sample from the data for demonstration purposes
data = data.sample(n=70)
data.shape

(70, 3)

# Add day info as the first day of the month
data["day"] = 1

# Convert month names to integers
data["month"] = pd.to_datetime(data["month"], format="%b").dt.month

# Generate datetime index from year, month and day
data["time"] = pd.to_datetime(data[["year", "month", "day"]])

# Convert the time to timestamp string with specific format (Year-Month-Day Hour:Minute:Second)
data["timestamp"] = data["time"].dt.strftime("%Y-%m-%d %H:%M:%S")

# Set the time as index
data = data.set_index("time")
data.head(2)

	year	month	passengers	day	timestamp
time
1951-03-01	1951	3	178	1	1951-03-01 00:00:00
1958-01-01	1958	1	340	1	1958-01-01 00:00:00

# Count how many values the "month" column has per year (resampling keyword: "A")
# Note: pandas 2.2 and later deprecate "A" in favor of "YE"
# For other common resampling frequencies (e.g. minutely), see: https://stackoverflow.com/a/19821311
data["month"].resample("A").count()

time
1949-12-31    4
1950-12-31    9
1951-12-31    7
1952-12-31    6
1953-12-31    8
1954-12-31    6
1955-12-31    5
1956-12-31    6
1957-12-31    5
1958-12-31    7
1959-12-31    4
1960-12-31    3
Freq: A-DEC, Name: month, dtype: int64

As we can see, each year now has a different number of months, because we randomly picked 70 months from the data.
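As a cross-check on the yearly counts, the same kind of count can be sketched by grouping on the year of the datetime index, which does not depend on a resampling alias. The table `toy` below is hypothetical (five hand-picked months), not the sampled flights data.

```python
import pandas as pd

# Hypothetical toy data: five monthly observations spread over three years
toy = pd.DataFrame(
    {"passengers": [112, 118, 132, 340, 178]},
    index=pd.to_datetime(
        ["1949-01-01", "1949-02-01", "1950-03-01", "1951-01-01", "1951-03-01"]
    ),
)

# Group the rows by the calendar year of the index and count rows per year
counts_per_year = toy["passengers"].groupby(toy.index.year).count()

print(counts_per_year)
```

One difference worth keeping in mind: grouping by year only lists years that have at least one row, whereas resampling also returns years without rows, with a count of 0.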

We can plot the counts as an interactive graph (hvplot draws a line chart by default; use .hvplot.bar() for a bar graph) by:

data["timestamp"].resample("A").count().hvplot()

How to plot an interactive histogram?#

Plotting an interactive histogram can be done in a similar manner as we did above with the monthly counts.

# Load some sample data
data = sns.load_dataset('penguins')
data.head()

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	Male
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	Female
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	Female
3	Adelie	Torgersen	NaN	NaN	NaN	NaN	NaN
4	Adelie	Torgersen	36.7	19.3	193.0	3450.0	Female

# Plot a histogram showing the counts of "flipper_length_mm" attribute
data.hvplot.hist("flipper_length_mm")
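To see what a histogram counts, the binning can be sketched directly with NumPy. This is an illustration of the idea, not a description of how hvplot computes its bins; the series `flipper` holds only the five values from the sample output above, and the choice of three bins is an assumption made for the example.

```python
import numpy as np
import pandas as pd

# The five "flipper_length_mm" values from the sample output (one is missing)
flipper = pd.Series(
    [181.0, 186.0, 195.0, np.nan, 193.0], name="flipper_length_mm"
)

# Drop the missing value, then count observations in three equal-width bins
counts, edges = np.histogram(flipper.dropna(), bins=3)

print(counts)  # number of values in each bin
print(edges)   # bin boundaries, from the minimum to the maximum value
```

The missing value has to be dropped first, since it does not belong to any bin; the four remaining values are then split across the three bins.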
