Tutorial 4: Trajectory data mining in Python#


Finnish university students are encouraged to use the CSC Notebooks platform.
CSC badge

Others can follow the lesson interactively using Binder. Check the rocket icon on the top of this page.

In this tutorial, we will learn how to conduct exploratory space-time data analysis (ESTDA) based on movement data. When analyzing mobility data, using various visualization approaches (visual analytics) is important to be able to understand the data and extract insights from it. Here, we will learn a few tricks how we can manipulate movement data and visualize it using static and interactive visualizations in Python.


This tutorial is partly based on excellent resources provided by Anita Graser and the documentation of movingpandas library.

import pandas as pd
from keplergl import KeplerGl
import geopandas as gpd
import movingpandas as mpd
from datetime import datetime, timedelta
import plotly.express as px
import warnings

Input data#

In this tutorial, we will use AIS data that describes vessel movements (obtained from movingpandas). The AIS data has been published by the Danish Maritime Authority. The AIS record sample extracted for this tutorial covers vessel traffic on the 5th July 2017 near Gothenburg. Let’s start by reading it and checking how the data looks like:

# Filepath
fp = "data/ais.gpkg"
data = gpd.read_file(fp);
data.plot(figsize=(12,12), markersize=0.7, alpha=0.7)
<Axes: >
(84702, 8)
# Check first rows
Timestamp MMSI NavStatus SOG COG Name ShipType geometry
0 05/07/2017 00:00:03 219632000 Under way using engine 0.0 270.4 None Undefined POINT (11.85958 57.68817)
1 05/07/2017 00:00:05 265650970 Under way using engine 0.0 0.5 None Undefined POINT (11.84175 57.66150)
2 05/07/2017 00:00:06 265503900 Under way using engine 0.0 0.0 None Undefined POINT (11.90650 57.69077)
3 05/07/2017 00:00:14 219632000 Under way using engine 0.0 188.4 None Undefined POINT (11.85958 57.68817)
4 05/07/2017 00:00:19 265519650 Under way using engine 0.0 357.2 None Undefined POINT (11.87192 57.68233)

Okay as we can see, we have around 80,000 recorded observations and information about the timestamp, geometry, ship type etc. SOG column contains information about speed over ground. Let’s take a look at the data distribution:

data['SOG'].hist(bins=100, figsize=(15,3))
<Axes: >

As we can see, most of the observations are actually such where the vessels have not been moving, i.e. they are staying at the harbor. It is not useful to keep such records in our data, so let’s get rid off those:

data = data[data.SOG>0].copy()
(33593, 8)

Okay, now we have much less observations. Let’s at this point calculate x and y coordinates based on our Point geometries, so that we can easily plot our data in 3D:

# Calculate x and y coordinates
data["x"] = data["geometry"].x
data["y"] = data["geometry"].y
Timestamp MMSI NavStatus SOG COG Name ShipType geometry x y
6 05/07/2017 00:00:28 265647200 Moored 0.1 298.0 None Undefined POINT (11.89120 57.68569) 11.891205 57.685688
7 05/07/2017 00:00:31 265663280 Under way using engine 0.1 131.6 None Undefined POINT (11.87037 57.68194) 11.870370 57.681937
22 05/07/2017 00:01:03 219632000 Under way using engine 0.1 233.9 None Undefined POINT (11.85958 57.68817) 11.859583 57.688167
32 05/07/2017 00:01:30 265663280 Under way using engine 0.1 348.6 None Undefined POINT (11.87037 57.68194) 11.870368 57.681942
85 05/07/2017 00:03:35 376474000 Under way using engine 8.8 229.2 LARGONA Cargo POINT (11.91198 57.69591) 11.911977 57.695910

Constructing trajectories#

Next, we want to convert our individual point observations into trajectories. This can be done easily using the TrajectoryCollection functionality / class provided by movingpandas (see docs). To be able to use this functionality, we first need to parse a DateTime index based on the Timestamp column, and that the MMSI number (Maritime Mobile Service Idenity) is used as the unique id for a ship. We also specify that the minimum length for a trajectory needs to be at least 100 meters:

# Create trajectories
data['time'] = pd.to_datetime(data['Timestamp'], format='%d/%m/%Y %H:%M:%S')
data = data.set_index('time')

# Specify minimum length for a trajectory (in meters)
minimum_length = 100 
collection = mpd.TrajectoryCollection(data, 'MMSI', min_length=minimum_length)

# What did we get?
TrajectoryCollection with 77 trajectories

As a result, we get an object that represents the collection of trajectories identified by movingpandas. We can plot all the trajectories easily by:

# Plot all trajectories
<Axes: >

We can access each of these trajectories individually by looping over our collection. We can also easily calculate the speed and add information about the direction of movement to our data by:

for trajectory in collection.trajectories:
    # Calculate speed
    # Determine direction of movement
    # Let's first check only the first trajectory (i.e. stop iteration after first run)

Let’s see what we got. We can access the DataFrame of the given trajectory by:

Timestamp MMSI NavStatus SOG COG Name ShipType geometry x y speed direction
2017-07-05 17:32:18 05/07/2017 17:32:18 210035000 Under way using engine 9.8 52.8 NORDIC HAMBURG Cargo POINT (11.80462 57.67612) 11.804620 57.676125 5.037170 51.697066
2017-07-05 17:32:28 05/07/2017 17:32:28 210035000 Under way using engine 9.8 51.0 NORDIC HAMBURG Cargo POINT (11.80528 57.67641) 11.805283 57.676405 5.037170 51.697066
2017-07-05 17:32:36 05/07/2017 17:32:36 210035000 Under way using engine 9.8 51.0 NORDIC HAMBURG Cargo POINT (11.80528 57.67641) 11.805283 57.676405 0.000000 0.000000
2017-07-05 17:32:38 05/07/2017 17:32:38 210035000 Under way using engine 9.8 51.0 NORDIC HAMBURG Cargo POINT (11.80528 57.67641) 11.805283 57.676405 0.000000 0.000000
2017-07-05 17:32:48 05/07/2017 17:32:48 210035000 Under way using engine 9.7 52.0 NORDIC HAMBURG Cargo POINT (11.80668 57.67699) 11.806675 57.676992 10.569682 51.737848

As we can see, now we have new columns speed and direction added to the data which contain information about the speed and direction. We can plot the single trajectory using geopandas:

<Axes: >

This kind of individual trajectory is not very informative as such. We can add more context to our visualization by adding a background map and making it interactive. Movingpandas provides an easy-to-use api to plot the trajectory on top of OSM basemap with Holoviews/Bokeh:

trajectory.hvplot(frame_height=600, frame_width=800, line_width=3, line_color="red")

This map provides much more contextual information and makes it possible to understand that the trajectory is a ship which is approaching or leaving the harbor of Gothenburg (tricky to say at this point).

Exploratory analysis with Space Time Cube (STC)#

It is also fairly easy to visualize our trajectory in 3D using space time cube, in which the 3rd dimension is based on time. There are different libraries that can be used for plotting data in 3D, but we will use here plotly library which provides a relatively easy and straightforward API to generate 3D visualizations. For plotting data in 3D Let’s first modify our datetime information a bit, and only keep the time because all of our values are from the same date (5th of July, 2017):

# Extract the time component
trajectory.df["time"] = pd.to_datetime(trajectory.df["Timestamp"]).dt.strftime("%H:%M")

# Plot the trajectory in space-time cube
fig = px.line_3d(trajectory.df, x="x", y="y", z="time", color='MMSI')

# Want to get background map as well? It is possible:
# https://chart-studio.plotly.com/~empet/14397/plotly-plot-of-a-map-from-data-available/#/