Multidimensional Tiling

Multidimensional tiling is the mechanism used to partition the domain of temporal values in buckets or tiles of varying number of dimensions. In the case of a single dimension, the domain can be partitioned by value or by time using buckets of the same size or duration, respectively. For temporal numbers, the domain can be partitioned in two-dimensional tiles of the same size for the value dimension and the same duration for the time dimension. For temporal points, the domain can be partitioned in space in two- or three-dimensional tiles, depending on the number of dimensions of the spatial coordinates. Finally, for temporal points, the domain can be partitioned in space and time using three- or four-dimensional tiles. Furthermore, the temporal values can also be fragmented according to a multidimensional grid defined over the underlying domain.

Multidimensional tiling can be used for various purposes. For example, it can be used for computing multidimensional histograms, where the temporal values are aggregated according to the underlying partition of the domain. On the other hand, multidimensional tiling can be used for distributing a dataset across a cluster of servers, where each server contains a partition of the dataset. The advantage of this partition mechanism is that it preserves proximitity in space and time, unlike the traditional hash-based partition mechanisms used in big data environments.

Figure 5.2. Multidimensional tiling for temporal floats.


Figure 5.2, “Multidimensional tiling for temporal floats.” illustrates multidimensional tiling for temporal floats. The two-dimensional domain is split into tiles having the same size for the value dimension and the same duration for the time dimension. Suppose that this tiling scheme is used for distribute a dataset across a cluster of six servers, as suggested by the gray pattern in the figure. In this case, the values are fragmented so each server will receive the data of contiguous tiles. This implies in particular that four nodes will receive one fragment of the temporal float depicted in the figure. One advantage of this distribution of data based on multidimensional tiling is that it reduces the data that needs to be exchanged between nodes when processing queries, a process typically referred to as reshuffling.

Many of the functions in this section are set-returning functions (also known as a table functions) since they typically return more than one value. In this case, the functions are marked with the symbol.

Bucket Functions

  • Returns a set of couples (index, bucket) that cover the range or period with buckets of the same size or duration aligned with the origin.

    If the origin is not specified, it is set by default to 0 for ranges and Monday, January 3, 2000 for periods. The indices start by 1.

    bucketList(bounds range,size number,origin number=0): setof index_range

    bucketList(bounds period,duration interval,origin timestamptz='2000-01-03'):

    setof index_period

    SELECT (bl).index, (bl).range
    FROM (SELECT bucketList(tint '[15@2000-01-01, 25@2000-01-10]'::intrange, 2) AS bl) t;
    -- 1 | [14,16)
       2 | [16,18)
       3 | [18,20)
       ...
    SELECT bucketList(tfloat '[-1@2000-01-01, -10@2000-01-10]'::floatrange, 2, -7);
    -- (1,"[-11,-9)")
       (2,"[-9,-7)")
       (3,"[-7,-5)")
       ...
    SELECT (bl).index, (bl).period
    FROM (SELECT bucketList(tfloat '[1@2000-01-15, 10@2000-01-25]'::period,'2 days') AS bl) t;
    -- 1 | [2000-01-15, 2000-01-17)
       2 | [2000-01-17, 2000-01-19)
       3 | [2000-01-19, 2000-01-21)
       ...
    SELECT bucketList(tfloat '[1@2000-01-15, 10@2000-01-25]'::period, '2 days', '2000-01-02');
    -- (1,"[2000-01-14, 2000-01-16)")
       (2,"[2000-01-16, 2000-01-18)")
       (3,"[2000-01-18, 2000-01-20)")
       ...
    
  • Returns the start value of the bucket that contains the input number.

    If the origin is not specified, it is set by default to 0.

    valueBucket(value number,size number,origin number=0): number

    SELECT valueBucket(3, 2);
    -- 2
    SELECT valueBucket(3.5, 2.5, 1.5);
    -- 1.5
    
  • Returns the range in the bucket space that contains the input number.

    If the origin is not specified, it is set by default to 0.

    rangeBucket(value number,size number,origin number=0): range

    SELECT rangeBucket(2, 2);
    -- [2,4)
    SELECT rangeBucket(2, 2, 1);
    -- [1,3)
    SELECT rangeBucket(2, 2.5);
    -- [0,2.5)
    SELECT rangeBucket(2, 2.5, 1.5);
    -- [1.5,4)
    
  • Returns the start timestamp of the bucket that contains the input timestamp.

    If the origin is not specified, it is set by default to Monday, January 3, 2000.

    timeBucket(time timestamptz,duration interval,origin timestamptz='2000-01-03'):

    timestamptz

    SELECT timeBucket(timestamptz '2020-05-01', interval '2 days');
    -- 2020-04-29 01:00:00+02
    SELECT timeBucket(timestamptz '2020-05-01', interval '2 days', timestamptz '2020-01-01');
    -- 2020-04-30 01:00:00+02
    
  • Returns the period in the bucket space that contains the input timestamp.

    If the origin is not specified, it is set by default to Monday, January 3, 2000.

    periodBucket(time timestamptz,duration interval,origin timestamptz='2000-01-03'):

    period

    SELECT periodBucket('2000-01-04', interval '1 week');
    -- [2000-01-03, 2000-01-10)
    SELECT periodBucket('2000-01-04', interval '1 week', '2000-01-07');
    -- [1999-12-31, 2000-01-07)
    

Grid Functions

  • Returns a set of couples (index, tile) that covers the box with multidimensional tiles of the same size and duration.

    If the origin of the value and/or time dimensions are not specified, they are set by default to 0 or 'Point(0 0 0)' for the value dimension (depending on the box type) and to Monday, January 3, 2000 for the time dimension.

    multidimGrid(bounds tbox,size float,duration interval,vorigin float=0,

    torigin timestamptz='2000-01-03'): setof index_box

    multidimGrid(bounds stbox,size float,sorigin geometry='Point(0 0 0)'):

    setof index_box

    multidimGrid(bounds stbox,size float,duration interval,sorigin geometry='Point(0 0 0)',

    torigin timestamptz='2000-01-03'): setof index_box

    In the case of a spatiotemporal grid, the SRID of the tile coordinates is determined by the input box and the size is given in the units of the SRID. If the origin for the spatial coordinates is given, which must be a point, its dimensionality and SRID should be equal to the one of box, otherwise an error is raised.

    SELECT (gr).index, (gr).box
    FROM (SELECT multidimGrid(tfloat '[15@2000-01-15, 25@2000-01-25]'::tbox, 2.0, '2 days')
      AS gr) t;
    -- 1 | TBOX((14,2000-01-15),(16,2000-01-17))
       2 | TBOX((16,2000-01-15),(18,2000-01-17))
       3 | TBOX((18,2000-01-15),(20,2000-01-17))
       ...
    SELECT multidimGrid(tfloat '[15@2000-01-15, 25@2000-01-25]'::tbox, 2.0, '2 days', 11.5);
    -- (1,"TBOX((13.5,2000-01-15),(15.5,2000-01-17))")
       (2,"TBOX((15.5,2000-01-15),(17.5,2000-01-17))")
       (3,"TBOX((17.5,2000-01-15),(19.5,2000-01-17))")
       ...
    SELECT multidimGrid(tgeompoint '[Point(3 3)@2000-01-15,
      Point(15 15)@2000-01-25]'::stbox, 2.0);
    -- (1,"STBOX((2,2),(4,4))")
       (2,"STBOX((4,2),(6,4))")
       (3,"STBOX((6,2),(8,4))")
       ...
    SELECT multidimGrid(tgeompoint 'SRID=3812;[Point(3 3)@2000-01-15,
      Point(15 15)@2000-01-25]'::stbox, 2.0, geometry 'Point(3 3)');
    -- (1,"SRID=3812;STBOX((3,3),(5,5))")
       (2,"SRID=3812;STBOX((5,3),(7,5))")
       (3,"SRID=3812;STBOX((7,3),(9,5))")
       ...
    SELECT multidimGrid(tgeompoint '[Point(3 3 3)@2000-01-15,
      Point(15 15 15)@2000-01-25]'::stbox, 2.0, geometry 'Point(3 3 3)');
    -- (1,"STBOX Z((3,3,3),(5,5,5))")
       (2,"STBOX Z((5,3,3),(7,5,5))")
       (3,"STBOX Z((7,3,3),(9,5,5))")
       ...
    SELECT multidimGrid(tgeompoint '[Point(3 3)@2000-01-15,
      Point(15 15)@2000-01-25]'::stbox, 2.0, interval '2 days');
    -- (1,"STBOX T((2,2,2000-01-15),(4,4,2000-01-17))")
       (2,"STBOX T((4,2,2000-01-15),(6,4,2000-01-17))")
       (3,"STBOX T((6,2,2000-01-15),(8,4,2000-01-17))")
       ...
    SELECT multidimGrid(tgeompoint '[Point(3 3 3)@2000-01-15,
      Point(15 15 15)@2000-01-25]'::stbox, 2.0, '2 days', 'Point(3 3 3)', '2000-01-15');
    -- (1,"STBOX ZT((3,3,3,2000-01-15),(5,5,5,2000-01-17))")
       (2,"STBOX ZT((5,3,3,2000-01-15),(7,5,5,2000-01-17))")
       (3,"STBOX ZT((7,3,3,2000-01-15),(9,5,5,2000-01-17))")
       ...
    
  • Returns the tile of the multidimensional grid that contains the value and the timestamp.

    If the origin of the value and/or time dimensions are not specified, they are set by default to 0 or Point(0 0 0) for the value dimension and Monday, January 3, 2000 for the time dimension, respectively.

    multidimTile(value float,time timestamptz,size float,duration interval,

    vorigin float=0.0,torigin timestamptz='2000-01-03'): tbox

    multidimTile(point geometry,size float,sorigin geometry='Point(0 0 0)'): stbox

    multidimTile(point geometry,time timestamptz,size float,duration interval,sorigin

    geometry='Point(0 0 0)',torigin timestamptz='2000-01-03'): stbox

    In the case of a spatiotemporal grid, the SRID of the tile coordinates is determined by the input point and the size is given in the units of the SRID. If the origin for the spatial coordinates is given, which must be a point, its dimensionality and SRID should be equal to the one of box, otherwise an error is raised.

    SELECT multidimTile(15, '2000-01-15', 2, interval '2 days');
    -- TBOX((14,2000-01-15),(16,2000-01-17))
    SELECT multidimTile(15, '2000-01-15', 2, interval '2 days', 1, '2000-01-02');
    -- TBOX((15,2000-01-14),(17,2000-01-16))
    SELECT multidimTile(geometry 'Point(1 1 1)', 2.0);
    -- STBOX Z((0,0,0),(2,2,2))
    SELECT multidimTile(geometry 'Point(1 1)', '2000-01-01', 2.0, interval '2 days');
    -- STBOX T((0,0,2000-01-01),(2,2,2000-01-03))
    SELECT multidimTile(geometry 'Point(1 1)', '2000-01-01', 2.0, '2 days', 'Point(1 1)',
      '2000-01-02');
    -- STBOX T((1,1,1999-12-31),(3,3,2000-01-02))
    

Split Functions

These functions fragment a temporal value with respect to a sequence of buckets (see the section called “Bucket Functions”) or a multidimensional grid (see the section called “Grid Functions”).

  • Fragment the temporal number with respect to range buckets.

    valueSplit(value tnumber,size number,origin number=0): setof number_tnumber

    If the origin of values is not specified, it is set by default to 0.

    SELECT (sp).number, (sp).tnumber
    FROM (SELECT valueSplit(tint '[1@2012-01-01, 2@2012-01-02, 5@2012-01-05, 10@2012-01-10]',
      2) AS sp) t;
    --  0 | {[1@2012-01-01 00:00:00+01, 1@2012-01-02 00:00:00+01)}
        2 | {[2@2012-01-02 00:00:00+01, 2@2012-01-05 00:00:00+01)}
        4 | {[5@2012-01-05 00:00:00+01, 5@2012-01-10 00:00:00+01)}
       10 | {[10@2012-01-10 00:00:00+01]}
    SELECT valueSplit(tfloat '[1@2012-01-01, 10@2012-01-10)', 2.0, 1.0);
    -- (1,"{[1@2012-01-01 00:00:00+01, 3@2012-01-03 00:00:00+01)}")
       (3,"{[3@2012-01-03 00:00:00+01, 5@2012-01-05 00:00:00+01)}")
       (5,"{[5@2012-01-05 00:00:00+01, 7@2012-01-07 00:00:00+01)}")
       (7,"{[7@2012-01-07 00:00:00+01, 9@2012-01-09 00:00:00+01)}")
       (9,"{[9@2012-01-09 00:00:00+01, 10@2012-01-10 00:00:00+01)}")
    
  • Fragment the temporal value with respect to time buckets.

    timeSplit(value ttype,duration interval,origin timestamptz='2000-01-03'):

    setof time_temp

    If the origin of time is not specified, it is set by default to Monday, January 3, 2000.

    SELECT (ts).time, (ts).temp
    FROM (SELECT timeSplit(tfloat '[1@2012-01-01, 10@2012-01-10)', '2 days') AS ts) t;
    -- 2011-12-31 | [1@2012-01-01, 2@2012-01-02)
       2012-01-02 | [2@2012-01-02, 4@2012-01-04)
       2012-01-04 | [4@2012-01-04, 6@2012-01-06)
       ...
    SELECT (ts).time, astext((ts).temp) AS temp
    FROM (SELECT timeSplit(tgeompoint '[Point(1 1)@2012-01-01, Point(10 10)@2012-01-10]',
      '2 days', '2012-01-01') AS ts) AS t;
    -- 2012-01-01 | [POINT Z (1 1 1)@2012-01-01, POINT Z (3 3 3)@2012-01-03)
       2012-01-03 | [POINT Z (3 3 3)@2012-01-03, POINT Z (5 5 5)@2012-01-05)
       2012-01-05 | [POINT Z (5 5 5)@2012-01-05, POINT Z (7 7 7)@2012-01-07)
       ...
    

    Notice that we can fragment a temporal value in cyclic (instead of linear) time buckets. The following two examples show how to fragment a temporal value by hour and by day of the week.

    SELECT (ts).time::time as hour, merge((ts).temp) as temp
    FROM (SELECT timeSplit(tfloat '[1@2012-01-01, 10@2012-01-03]', '1 hour') AS ts) t
    GROUP BY hour ORDER BY hour;
    
    -- 00:00:00 | {[1@2012-01-01 00:00:00+01, 1.1875@2012-01-01 01:00:00+01),
                   [5.5@2012-01-02 00:00:00+01, 5.6875@2012-01-02 01:00:00+01)}
       01:00:00 | {[1.1875@2012-01-01 01:00:00+01, 1.375@2012-01-01 02:00:00+01),
                   [5.6875@2012-01-02 01:00:00+01, 5.875@2012-01-02 02:00:00+01)}
       02:00:00 | {[1.375@2012-01-01 02:00:00+01, 1.5625@2012-01-01 03:00:00+01),
                   [5.875@2012-01-02 02:00:00+01, 6.0625@2012-01-02 03:00:00+01)}
       03:00:00 | {[1.5625@2012-01-01 03:00:00+01, 1.75@2012-01-01 04:00:00+01),
                   [6.0625@2012-01-02 03:00:00+01, 6.25@2012-01-02 04:00:00+01)}
       ...
    SELECT EXTRACT(DOW FROM (ts).time) as dow_no, TO_CHAR((ts).time, 'Dy') as dow,
      asText(setPrecision(merge((ts).temp), 2)) as temp
    FROM (SELECT timeSplit(tgeompoint '[Point(1 1)@2012-01-01, Point(10 10)@2012-01-14)',
      '1 hour') AS ts) t
    GROUP BY dow, dow_no ORDER BY dow_no;
    -- 0 | Sun | {[POINT(1 1)@2012-01-01, POINT(1.69 1.69)@2012-01-02),
                  [POINT(5.85 5.85)@2012-01-08, POINT(6.54 6.54)@2012-01-09)}
       1 | Mon | {[POINT(1.69 1.69)@2012-01-02, POINT(2.38 2.38)@2012-01-03),
                  [POINT(6.54 6.54)@2012-01-09, POINT(7.23 7.23)@2012-01-10)}
       2 | Tue | {[POINT(2.38 2.38)@2012-01-03, POINT(3.08 3.08)@2012-01-04),
                  [POINT(7.23 7.23)@2012-01-10, POINT(7.92 7.92)@2012-01-11)}
       ...
    
  • Fragment the temporal number with respect to the tiles in a value-time grid.

    valueTimeSplit(value tumber,size number,duration interval,vorigin number=0,

    torigin timestamptz='2000-01-03'): setof number_time_tnumber

    If the origin of values and/or time are not specified, they are set by default to 0 and to Monday, January 3, 2000, respectively.

    SELECT (sp).number, (sp).time, (sp).tnumber
    FROM (SELECT valueTimeSplit(tint '[1@2012-01-01, 2@2012-01-02, 5@2012-01-05,
      10@2012-01-10]', 5, '5 days') AS sp) t;
    -- 0 | 2011-12-31 | [1@2012-01-01, 2@2012-01-02, 2@2012-01-05)
       5 | 2012-01-05 | [5@2012-01-05, 5@2012-01-10)
      10 | 2012-01-10 | [10@2012-01-10]
    SELECT (sp).number, (sp).time, (sp).tnumber
    FROM (SELECT valueTimeSplit(tfloat '[1@2012-01-01, 10@2012-01-10)', 5.0, '5 days', 1.0,
      '2012-01-01') AS sp) t;
    -- 1 | 2012-01-01 | [1@2012-01-01, 6@2012-01-06)
       6 | 2012-01-06 | [6@2012-01-06, 10@2012-01-10)
    
  • Fragment the temporal point with respect to the tiles in a spatial grid.

    spaceSplit(value tgeompoint,size float,origin geometry='Point(0 0 0)',

    bitmatrix=TRUE): setof point_tpoint

    If the origin of the space dimension is not specified, it is set by default to 'Point(0 0 0)'. If the argument bitmatrix is not specified, then the computation will use a bit matrix to speed up the process.

    SELECT ST_AsText((sp).point) AS point, astext((sp).tpoint) AS tpoint
    FROM (SELECT spaceSplit(tgeompoint '[Point(1 1)@2020-03-01, Point(10 10)@2020-03-10]',
      2.0) AS sp) t;
    -- POINT(0 0) | {[POINT(1 1)@2020-03-01, POINT(2 2)@2020-03-02)}
       POINT(2 2) | {[POINT(2 2)@2020-03-02, POINT(4 4)@2020-03-04)}
       POINT(4 4) | {[POINT(4 4)@2020-03-04, POINT(6 6)@2020-03-06)}
       ...
    SELECT ST_AsText((sp).point) AS point, astext((sp).tpoint) AS tpoint
    FROM (SELECT spaceSplit(tgeompoint '[Point(1 1 1)@2020-03-01,
      Point(10 10 10)@2020-03-10]', 2.0, 'Point(1 1 1)') AS sp) t;
    -- POINT Z (1 1 1) | {[POINT Z (1 1 1)@2020-03-01, POINT Z (3 3 3)@2020-03-03)}
       POINT Z (3 3 3) | {[POINT Z (3 3 3)@2020-03-03, POINT Z (5 5 5)@2020-03-05)}
       POINT Z (5 5 5) | {[POINT Z (5 5 5)@2020-03-05, POINT Z (7 7 7)@2020-03-07)}
       ...
    
  • Fragment the temporal point with respect to the tiles in a spatiotemporal grid.

    spaceTimeSplit(value tgeompoint,size float,duration interval,sorigin

    geometry='Point(0 0 0)',torigin timestamptz='2000-01-03', bitmatrix=TRUE):

    setof point_time_tpoint

    If the origin of the space and time dimensions are not specified, they are set by default to 'Point(0 0 0)' and Monday, January 3, 2000, respectively. If the argument bitmatrix is not specified, then the computation will use a bit matrix to speed up the process.

    SELECT ST_AsText((sp).point) AS point, (sp).time, astext((sp).tpoint) AS tpoint
    FROM (SELECT spaceTimeSplit(tgeompoint '[Point(1 1)@2020-03-01, Point(10 10)@2020-03-10]',
      2.0, '2 days') AS sp) t;
    -- POINT(0 0) | 2020-03-01 | {[POINT(1 1)@2020-03-01, POINT(2 2)@2020-03-02)}
       POINT(2 2) | 2020-03-01 | {[POINT(2 2)@2020-03-02, POINT(3 3)@2020-03-03]}
       POINT(2 2) | 2020-03-03 | {[POINT(3 3)@2020-03-03, POINT(4 4)@2020-03-04)}
       ...
    SELECT ST_AsText((sp).point) AS point, (sp).time, astext((sp).tpoint) AS tpoint
    FROM (SELECT spaceTimeSplit(tgeompoint '[Point(1 1 1)@2020-03-01,
      Point(10 10 10)@2020-03-10]', 2.0, '2 days', 'Point(1 1 1)', '2020-03-01') AS sp) t;
    -- POINT Z(1 1 1) | 2020-03-01 | {[POINT Z(1 1 1)@2020-03-01, POINT Z(3 3 3)@2020-03-03)}
       POINT Z(3 3 3) | 2020-03-03 | {[POINT Z(3 3 3)@2020-03-03, POINT Z(5 5 5)@2020-03-05)}
       POINT Z(5 5 5) | 2020-03-05 | {[POINT Z(5 5 5)@2020-03-05, POINT Z(7 7 7)@2020-03-07)}
       ...