Basic Data Exploration

The total distance traveled by all ships:

SELECT SUM( length( Trip ) ) FROM Ships;
--500433519.121321
				

This query uses the length function to compute per trip the sailing distance in meters. We then aggregate over all trips and calculate the sum. Let's have a more detailed look, and generate a histogram of trip lengths:

WITH buckets (bucketNo, RangeKM) AS (
  SELECT 1, floatrange '[0, 0]' UNION
  SELECT 2, floatrange '(0, 50)' UNION
  SELECT 3, floatrange '[50, 100)' UNION
  SELECT 4, floatrange '[100, 200)' UNION
  SELECT 5, floatrange '[200, 500)' UNION
  SELECT 6, floatrange '[500, 1500)' UNION
  SELECT 7, floatrange '[1500, 10000)' ),
histogram AS (
  SELECT bucketNo, RangeKM, count(MMSI) as freq
  FROM buckets left outer join Ships on (length(Trip)/1000) <@ RangeKM
  GROUP BY bucketNo, RangeKM
  ORDER BY bucketNo, RangeKM
)
SELECT bucketNo, RangeKM, freq,
  repeat('▪', ( freq::float / max(freq) OVER () * 30 )::int ) AS bar
FROM histogram;
--Total query runtime: 5.6 secs

bucketNo,   bucketRange,        freq	   bar
1;          "[0,0]";            303;       ▪▪▪▪▪
2;          "(0,50)";           1693;      ▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪
3;          "[50,100)";         267;       ▪▪▪▪▪
4;          "[100,200)";        276;       ▪▪▪▪▪
5;          "[200,500)";        361;       ▪▪▪▪▪▪
6;          "[500,1500)";       86;        ▪▪
7;          "[1500,10000)";     6;
				

Surprisingly there are trips with zero length. These are clearly noise that can be deleted. Also there are very many short trips, that are less than 50 km long. On the other hand, there are few long trips that are more than 1,500 km long. Let's visualize these last two cases in Figure 1.3, “Visualizing trips with abnormal lengths”. They look like noise. Normally one should validate more, but to simplify this module, we consider them as noise, and delete them.

DELETE FROM Ships
WHERE length(Trip) = 0 OR length(Trip) >= 1500000;
-- Query returned successfully in 7 secs 304 msec.
				

Now the Ships table looks like Figure 1.4, “Ship trajectories after filtering”.

Figure 1.3. Visualizing trips with abnormal lengths

Visualizing trips with abnormal lengths

Figure 1.4. Ship trajectories after filtering

Ship trajectories after filtering

Let's have a look at the speed of the ships. There are two speed values in the data; the speed calculated from the spatiotemporal trajectory speed(Trip), and the SOG attribute. Optimally, the two will be the same. A small variance would still be OK, because of sensor errors. Note that both are temporal floats. In the next query, we compare the averages of the two speed values for every ship:

SELECT ABS(twavg(SOG) * 1.852 - twavg(speed(Trip))* 3.6 ) SpeedDifference
FROM Ships
ORDER BY SpeedDifference DESC;
--Total query runtime: 8.2 secs
--990 rows retrieved.

SpeedDifference
NULL
NULL
NULL
NULL
NULL
107.861100067879
57.1590253627668
42.4207839833568
39.5819188407125
33.6182789410313
30.9078594633161
26.514042447366
22.1312646226031
20.5389022294181
19.8500569368283
19.4134688682774
18.180139457754
17.4859077178001
17.3155991287105
17.1739822139821
12.9571603234404
12.6195380496344
12.2714437568609
10.9619033557275
10.4164745930929
10.3306155308426
9.46457823214455
...
				

The twavg computes a time-weighted average of a temporal float. It basically computes the area under the curve, then divides it by the time duration of the temporal float. By doing so, the speed values that remain for longer durations affect the average more than those that remain for shorter durations. Note that SOG is in knot, and Speed(Trip) is in m/s. The query converts both to km/h.

The query shows that 26 out of the 990 ship trajectories in the table have a difference of more than 10 km/h or NULL. These trajectories are shown in Figure 1.5, “Ship trajectories with big difference between speed(Trip) and SOG. Again they look like noise, so we remove them.

Figure 1.5. Ship trajectories with big difference between speed(Trip) and SOG

Ship trajectories with big difference between speed(Trip) and SOG

Now we do a similar comparison between the calculated azimuth from the spatiotemporal trajectory, and the attribute COG:

SELECT ABS(twavg(COG) - twavg(azimuth(Trip)) * 180.0/pi() ) AzimuthDifference
FROM Ships
ORDER BY AzimuthDifference DESC;
--Total query runtime: 4.0 secs
--964 rows retrieved.

264.838740787458
220.958372832234
180.867071483688
178.774337481463
154.239639388087
139.633953692907
137.347542674865
128.239459879571
121.107566199195
119.843262642657
116.685117326047
116.010477588934
109.830338231363
106.94301191915
106.890186229337
106.55297972109
103.20192549283
102.585009756697
...
				

Here we see that the COG is not as accurate as the SOG attribute. More than 100 trajectories have an azimuth difference bigger than 45 degrees. Figure 1.6, “Ship trajectories with big difference between azimuth(Trip) and COG visualizes them. Some of them look like noise, but some look fine. For simplicity, we keep them all.

Figure 1.6. Ship trajectories with big difference between azimuth(Trip) and COG

Ship trajectories with big difference between azimuth(Trip) and COG