Changing the Simulation Scenario

Changing the Simulation Scenario
Prev	Chapter 2. Generating Realistic Trajectory Datasets	Next

In this workshop, we have used until now the BerlinMOD scenario, which models the trajectories of persons going from home to work in the morning and returning back from work to home in the evening during the week days, with one possible leisure trip during the weekday nights and two possible leisure trips in the morning and in the afternoon of the weekend days. In this section, we devise another scenario for the data generator. This scenario corresponds to a home appliance shop that has several warehouses located in various places of the city. From each warehouse, the deliveries of appliances to customers are done by vehicles belonging to the warehouse. Although this scenario is different than BerlinMOD, many things can be reused and adapted. For example, home nodes can be replaced by warehouse locations, leisure destinations can be replaced by customer locations, and in this way many functions of the BerlinMOD SQL code will work directly. This is a direct benefit of having the simulation code written in SQL, so it will be easy to adapt to other scenarios. We describe next the needed changes.

Each day of the week excepted Sundays, deliveries of appliances from the warehouses to the customers are organized as follows. Each warehouse has several vehicles that make the deliveries. To each vehicle is assigned a list of customers that must be delivered during a day. A trip for a vehicle starts and ends at the warehouse and make the deliveries to the customers in the order of the list. Notice that in a real-world situation, the scheduling of the deliveries requires to take into account customers' availability in a time slot of a day and the time needed to make the delivery of the previous customers in the list. We do not take into account these aspects in this simple simulation scenario.

To be able to run the delivery generator you need to execute the first two steps specified in the section called “Quick Start” to load the street network and prepare the base data for simulation, if not done already. The delivery generator can then be run as follows.

psql -h localhost -p 5432 -U dbowner -d brussels -f deliveries_datagenerator.sql
# adds the pgplsql functions of the simulation to the database

psql -h localhost -p 5432 -U dbowner -d brussels \
-c 'select deliveries_generate(scaleFactor := 0.005)'
# calls the main pgplsql function to start the simulation

If everything is correct, you should see an output like that starts with this:

INFO:  -----------------------------------------------------------------------
INFO:  Starting deliveries generation with scale factor 0.005
INFO:  -----------------------------------------------------------------------
INFO:  Parameters:
INFO:  ------------
INFO:  No. of warehouses = 7, No. of vehicles = 141, No. of days = 4
INFO:  Start day = 2020-06-01, Path mode = Fastest Path, Disturb data = f
...

The generator will take about one minute. It will generate deliveries, according to the default parameters, for 141 cars over 2 days starting from Monday, June 1^st 2020. It is possible to generate more or less data by respectively passing a bigger or a smaller scale factor value. Please refer to the the section called “Customizing the Generator to Your City” to see all the parameters that can be used to customize the simulation, with the exception of the P_NEIGHBOURHOOD_RADIUS parameter, which is not used in this scenario.

We describe next the main steps in the generation of the deliveries scenario.

We start by generating the Warehouses table. Each warehouse is located at a random node of the network.

DROP TABLE IF EXISTS Warehouses;
CREATE TABLE Warehouses(warehouseId int, nodeId bigint, geom geometry(Point));
FOR i IN 1..noWarehouses LOOP
  INSERT INTO Warehouses(warehouseId, nodeId, geom)
  SELECT i, id, geom
  FROM Nodes N
  ORDER BY id LIMIT 1 OFFSET random_int(1, noNodes);
END LOOP;

We create a table Vehicles with all vehicles and the associated warehouse. Warehouses are associated to vehicles in a round-robin way.

DROP TABLE IF EXISTS Vehicles;
CREATE TABLE Vehicles(vehicleId int PRIMARY KEY, licence text, type text, brand text,
  warehouse int);
FOR i IN 1..noVehicles LOOP
  licence = berlinmod_createLicence(i);
  type = VEHICLETYPES[random_int(1, NOVEHICLETYPES)];
  brand = NOVEHICLEBRANDS[random_int(1, NOVEHICLEBRANDS)];
  warehouse = 1 + ((i - 1) % noWarehouses);
  INSERT INTO Vehicles VALUES (i, licence, type, brand, warehouse);
END LOOP;

We create next the Trips and Destinations tables that contain, respectively, the list of source and destination nodes composing the delivery trip of a vehicle for a day, and the list of source and destination nodes for all vehicles.

DROP TABLE IF EXISTS Trips;
CREATE TABLE Trips(vehicle int, day date, seq int, source bigint, target bigint,
  PRIMARY KEY (vehicle, day, seq));
DROP TABLE IF EXISTS Destinations;
CREATE TABLE Destinations(id serial PRIMARY KEY, source bigint, target bigint);
-- Loop for every vehicle
FOR i IN 1..noVehicles LOOP
  -- Get the warehouse node 
  SELECT W.node INTO warehouseNode
  FROM Vehicles V, Warehouses W
  WHERE V.vehicleId = i AND V.warehouse = W.warehouseId;
  day = startDay;
  -- Loop for every generation day
  FOR j IN 1..noDays LOOP
    -- Generate delivery trips excepted on Sunday
    IF date_part('dow', day) <> 0 THEN
      -- Select a number of destinations between 3 and 7
      SELECT random_int(3, 7) INTO noDest;
      sourceNode = warehouseNode;
      prevNodes = '{}';
      FOR k IN 1..noDest + 1 LOOP
        IF k <= noDest THEN
          targetNode = deliveries_selectDestNode(i, noNodes, prevNodes);
          prevNodes = prevNodes || targetNode;
        ELSE
          targetNode = warehouseNode;
        END IF;
        IF targetNode IS NULL THEN
          RAISE EXCEPTION '    Destination node cannot be NULL';
        END IF;
        IF sourceNode = targetNode THEN
          RAISE EXCEPTION '    Source and destination nodes must be different, node: %', sourceNode;
        END IF;
        -- Keep the start and end nodes of each subtrip
        INSERT INTO Segments VALUES (i, day, k, sourceNode, targetNode);
        INSERT INTO Destinations(source, target) VALUES (sourceNode, targetNode);
        sourceNode = targetNode;
      END LOOP;
    END IF;
    day = day + interval '1 day';
  END LOOP;
END LOOP;

For every vehicle and every day which is not Sunday we proceed as follows. We randomly chose a number between 3 and 7 destinations and call the function deliveries_selectDestNode for determining these destinations. This function choses a destination node which is different from the previous nodes of the delivery, which are kept in the variable prevNodes. Then, the sequence of source and destination couples starting in the warehouse, visiting sequentially the clients to deliver and returning to the warehouse are added to the tables Segments and Destinations.

Next, we compute the paths between all source and target nodes that are in the Destinations table. Such paths are generated by pgRouting and stored in the Paths table.

DROP TABLE IF EXISTS Paths;
CREATE TABLE Paths(seq int, path_seq int, start_vid bigint, end_vid bigint,
  node bigint, edge bigint, cost float, agg_cost float,
  -- These attributes are filled in the subsequent update
  geom geometry, speed float, category int);
-- Select query sent to pgRouting
IF pathMode = 'Fastest Path' THEN
  query1_pgr = 'SELECT id, source, target, cost_s AS cost, '
    'reverse_cost_s as reverse_cost FROM edges';
ELSE
  query1_pgr = 'SELECT id, source, target, length_m AS cost, '
    'length_m * sign(reverse_cost_s) as reverse_cost FROM edges';
END IF;
-- Get the total number of paths and number of calls to pgRouting
SELECT COUNT(*) INTO noPaths FROM (SELECT DISTINCT source, target FROM Destinations) AS T;
noCalls = ceiling(noPaths / P_PGROUTING_BATCH_SIZE::float);
FOR i IN 1..noCalls LOOP
  query2_pgr = format('SELECT DISTINCT source, target FROM Destinations '
    'ORDER BY source, target LIMIT %s OFFSET %s',
    P_PGROUTING_BATCH_SIZE, (i - 1) * P_PGROUTING_BATCH_SIZE);
  INSERT INTO Paths(seq, path_seq, start_vid, end_vid, node, edge, cost, agg_cost)
  SELECT * FROM pgr_dijkstra(query1_pgr, query2_pgr, true);
END LOOP;
UPDATE Paths SET
  -- adjusting directionality
  geom = CASE WHEN node = E.source THEN E.geom ELSE ST_Reverse(E.geom) END,
  speed = maxspeed_forward,
  category = berlinmod_roadCategory(tag_id)
FROM Edges E WHERE E.id = edge;

After creating the Paths table, we set the query to be sent to pgRouting depending on whether we have want to compute the fastest or the shortest paths between two nodes. The generator uses the parameter P_PGROUTING_BATCH_SIZE to determine the maximum number of paths we compute in a single call to pgRouting. This parameter is set to 10,000 by default. Indeed, there is limit in the number of paths that pgRouting can compute in a single call and this depends in the available memory of the computer. Therefore, we need to determine the number of calls to pgRouting and compute the paths by calling the function pgr_dijkstra. Finally, we need to adjust the directionality of the geometry of the edges depending on which direction a trip traverses the edges, and set the speed and the category of the edges.

We explain how to generate the trips for a number of vehicles and a number of days starting at a given day.

DROP TABLE IF EXISTS Deliveries;
CREATE TABLE Deliveries(deliveryId int PRIMARY KEY, Vehicle int, Day date, noCustomers int,
  Trip tgeompoint, Trajectory geometry);
DROP TABLE IF EXISTS Segments;
CREATE TABLE Segments(deliveryId int, seq int, source bigint, target bigint, 
  trip tgeompoint,
  -- These columns are used for visualization purposes
  trajectory geometry, sourceGeom geometry, PRIMARY KEY (deliveryId, seq));
delivId = 1;
aDay = startDay;
FOR i IN 1..noDays LOOP
  SELECT date_part('dow', aDay) into weekday;
  -- 6: saturday, 0: sunday
  IF weekday <> 0 THEN
    FOR j IN 1..noVehicles LOOP
      -- Start delivery
      t = aDay + time '07:00:00' + createPauseN(120);
      -- Get the number of segments (number of destinations + 1)
      SELECT count(*) INTO noSegments
      FROM Trips
      WHERE vehicle = j AND day = aDay;
      FOR k IN 1..noSegments LOOP
        -- Get the source and destination nodes of the segment
        SELECT source, target INTO sourceNode, targetNode
        FROM Trips
        WHERE vehicle = j AND day = aDay AND seq = k;
        -- Get the path
        SELECT array_agg((geom, speed, category) ORDER BY path_seq) INTO path
        FROM Paths P
        WHERE start_vid = sourceNode AND end_vid = targetNode AND edge > 0;
        IF path IS NULL THEN
          RAISE EXCEPTION 'The path of a trip cannot be NULL. '
            'Source node: %, target node: %, k: %, noSegments: %', sourceNode, 
              targetNode, k, noSegments;
        END IF;
        startTime = t;
        trip = create_trip(path, t, disturbData, messages);
        IF trip IS NULL THEN
          RAISE EXCEPTION 'A trip cannot be NULL';
        END IF;
        t = endTimestamp(trip);
        tripTime = t - startTime;
        IF k < noSegments THEN
          -- Add a delivery time in [10, 60] min using a bounded Gaussian distribution
          deliveryTime = random_boundedgauss(10, 60) * interval '1 min';
          t = t + deliveryTime;
          trip = appendInstant(trip, tgeompoint_inst(endValue(trip), t));
        END IF;
        alltrips = alltrips || trip;
        SELECT geom INTO sourceGeom FROM Nodes WHERE id = sourceNode;
        INSERT INTO Segments(deliveryId, seq, source, target, trip, trajectory, sourceGeom)
          VALUES (delivId, k, sourceNode, targetNode, trip, trajectory(trip), sourceGeom);
      END LOOP;
      trip = merge(alltrips);
      INSERT INTO Deliveries(deliveryId, vehicle, day, noCustomers, trip, trajectory)
        VALUES (delivId, j, aDay, noSegments - 1, trip, trajectory(trip));
      delivId = delivId + 1;
      alltrips = '{}';
    END LOOP;
  END IF;
  aDay = aDay + interval '1 day';
END LOOP;

We start by creating the tables Deliveries and Segments. Then, the procedure simply loops for each day (excepted Sundays) and for each vehicle and generates the deliveries. For this, we first set the start time of a delivery trip by adding to 7 am a random non-zero duration of 120 minutes using a uniform distribution. Then, for every couple of source and destination nodes in a segment, we call the function create_trip that we have seen previously to generate the trip. We add a delivery time between 10 and 60 minutes using a bounded Gaussian distribution before starting the trip to the next customer or the return trip to the warehouse and then insert the trip into the Segments table.

Figure 2.6, “Visualization of the data generated for the deliveries scenario. The road network is shown with blue lines, the warehouses are shown with a red star, the routes taken by the deliveries are shown with black lines, and the location of the customers with black points.” and Figure 2.7, “Visualization of the deliveries of one vehicle during one day. A delivery trip starts and ends at a warehouse and make the deliveries to several customers, four in this case.” show visualizations of the data generated for the deliveries scenario.

Figure 2.6. Visualization of the data generated for the deliveries scenario. The road network is shown with blue lines, the warehouses are shown with a red star, the routes taken by the deliveries are shown with black lines, and the location of the customers with black points.

Visualization of the data generated for the deliveries scenario. The road network is shown with blue lines, the warehouses are shown with a red star, the routes taken by the deliveries are shown with black lines, and the location of the customers with black points.

Figure 2.7. Visualization of the deliveries of one vehicle during one day. A delivery trip starts and ends at a warehouse and make the deliveries to several customers, four in this case.

Visualization of the deliveries of one vehicle during one day. A delivery trip starts and ends at a warehouse and make the deliveries to several customers, four in this case.

Prev	Up	Next
Tuning the Generator Parameters	Home	Creating a Graph from Input Data