Multidimensional tiling is the mechanism used to partition the domain of temporal values in buckets or tiles of varying number of dimensions. In the case of a single dimension, the domain can be partitioned by value or by time using buckets of the same size or duration, respectively. For temporal numbers, the domain can be partitioned in two-dimensional tiles of the same size for the value dimension and the same duration for the time dimension. For temporal points, the domain can be partitioned in space in two- or three-dimensional tiles, depending on the number of dimensions of the spatial coordinates. Finally, for temporal points, the domain can be partitioned in space and time using three- or four-dimensional tiles. Furthermore, the temporal values can also be fragmented according to a multidimensional grid defined over the underlying domain.
Multidimensional tiling can be used for various purposes. For example, it can be used for computing multidimensional histograms, where the temporal values are aggregated according to the underlying partition of the domain. On the other hand, multidimensional tiling can be used for distributing a dataset across a cluster of servers, where each server contains a partition of the dataset. The advantage of this partition mechanism is that it preserves proximitity in space and time, unlike the traditional hash-based partition mechanisms used in big data environments.
Figure 5.2 illustrates multidimensional tiling for temporal floats. The two-dimensional domain is split into tiles having the same size for the value dimension and the same duration for the time dimension. Suppose that this tiling scheme is used for distribute a dataset across a cluster of six servers, as suggested by the gray pattern in the figure. In this case, the values are fragmented so each server will receive the data of contiguous tiles. This implies in particular that four nodes will receive one fragment of the temporal float depicted in the figure. One advantage of this distribution of data based on multidimensional tiling is that it reduces the data that needs to be exchanged between nodes when processing queries, a process typically referred to as reshuffling.
Many of the functions in this section are set-returning functions (also known as a table functions) since they typically return more than one value. In this case, the functions are marked with the symbol.
Returns a set of couples (index, bucket) that cover the range or period with buckets of the same size or duration aligned with the origin.
If the origin is not specified, it is set by default to 0 for ranges and Monday, January 3, 2000 for periods. The indices start by 1.
bucketList(bounds range,size number,origin number=0): setof index_range
bucketList(bounds period,duration interval,origin timestamptz='2000-01-03'):
setof index_period
SELECT (bl).index, (bl).range FROM (SELECT bucketList(tint '[15@2000-01-01, 25@2000-01-10]'::intrange, 2) AS bl) t; -- 1 | [14,16) 2 | [16,18) 3 | [18,20) ... SELECT bucketList(tfloat '[-1@2000-01-01, -10@2000-01-10]'::floatrange, 2, -7); -- (1,"[-11,-9)") (2,"[-9,-7)") (3,"[-7,-5)") ... SELECT (bl).index, (bl).period FROM (SELECT bucketList(tfloat '[1@2000-01-15, 10@2000-01-25]'::period,'2 days') AS bl) t; -- 1 | [2000-01-15, 2000-01-17) 2 | [2000-01-17, 2000-01-19) 3 | [2000-01-19, 2000-01-21) ... SELECT bucketList(tfloat '[1@2000-01-15, 10@2000-01-25]'::period, '2 days', '2000-01-02'); -- (1,"[2000-01-14, 2000-01-16)") (2,"[2000-01-16, 2000-01-18)") (3,"[2000-01-18, 2000-01-20)") ...
Returns the start value of the bucket that contains the input number.
If the origin is not specified, it is set by default to 0.
valueBucket(value number,size number,origin number=0): number
SELECT valueBucket(3, 2); -- 2 SELECT valueBucket(3.5, 2.5, 1.5); -- 1.5
Returns the range in the bucket space that contains the input number.
If the origin is not specified, it is set by default to 0.
rangeBucket(value number,size number,origin number=0): range
SELECT rangeBucket(2, 2); -- [2,4) SELECT rangeBucket(2, 2, 1); -- [1,3) SELECT rangeBucket(2, 2.5); -- [0,2.5) SELECT rangeBucket(2, 2.5, 1.5); -- [1.5,4)
Returns the start timestamp of the bucket that contains the input timestamp.
If the origin is not specified, it is set by default to Monday, January 3, 2000.
timeBucket(time timestamptz,duration interval,origin timestamptz='2000-01-03'):
timestamptz
SELECT timeBucket(timestamptz '2020-05-01', interval '2 days'); -- 2020-04-29 01:00:00+02 SELECT timeBucket(timestamptz '2020-05-01', interval '2 days', timestamptz '2020-01-01'); -- 2020-04-30 01:00:00+02
Returns the period in the bucket space that contains the input timestamp.
If the origin is not specified, it is set by default to Monday, January 3, 2000.
periodBucket(time timestamptz,duration interval,origin timestamptz='2000-01-03'):
period
SELECT periodBucket('2000-01-04', interval '1 week'); -- [2000-01-03, 2000-01-10) SELECT periodBucket('2000-01-04', interval '1 week', '2000-01-07'); -- [1999-12-31, 2000-01-07)
Returns a set of couples (index, tile) that covers the box with multidimensional tiles of the same size and duration.
If the origin of the value and/or time dimensions are not specified, they are set by default to 0 or 'Point(0 0 0)'
for the value dimension (depending on the box type) and to Monday, January 3, 2000 for the time dimension.
multidimGrid(bounds tbox,size float,duration interval,vorigin float=0,
torigin timestamptz='2000-01-03'): setof index_box
multidimGrid(bounds stbox,size float,sorigin geometry='Point(0 0 0)'):
setof index_box
multidimGrid(bounds stbox,size float,duration interval,sorigin geometry='Point(0 0 0)',
torigin timestamptz='2000-01-03'): setof index_box
In the case of a spatiotemporal grid, the SRID of the tile coordinates is determined by the input box and the size is given in the units of the SRID. If the origin for the spatial coordinates is given, which must be a point, its dimensionality and SRID should be equal to the one of box, otherwise an error is raised.
SELECT (gr).index, (gr).box FROM (SELECT multidimGrid(tfloat '[15@2000-01-15, 25@2000-01-25]'::tbox, 2.0, '2 days') AS gr) t; -- 1 | TBOX((14,2000-01-15),(16,2000-01-17)) 2 | TBOX((16,2000-01-15),(18,2000-01-17)) 3 | TBOX((18,2000-01-15),(20,2000-01-17)) ... SELECT multidimGrid(tfloat '[15@2000-01-15, 25@2000-01-25]'::tbox, 2.0, '2 days', 11.5); -- (1,"TBOX((13.5,2000-01-15),(15.5,2000-01-17))") (2,"TBOX((15.5,2000-01-15),(17.5,2000-01-17))") (3,"TBOX((17.5,2000-01-15),(19.5,2000-01-17))") ... SELECT multidimGrid(tgeompoint '[Point(3 3)@2000-01-15, Point(15 15)@2000-01-25]'::stbox, 2.0); -- (1,"STBOX((2,2),(4,4))") (2,"STBOX((4,2),(6,4))") (3,"STBOX((6,2),(8,4))") ... SELECT multidimGrid(tgeompoint 'SRID=3812;[Point(3 3)@2000-01-15, Point(15 15)@2000-01-25]'::stbox, 2.0, geometry 'Point(3 3)'); -- (1,"SRID=3812;STBOX((3,3),(5,5))") (2,"SRID=3812;STBOX((5,3),(7,5))") (3,"SRID=3812;STBOX((7,3),(9,5))") ... SELECT multidimGrid(tgeompoint '[Point(3 3 3)@2000-01-15, Point(15 15 15)@2000-01-25]'::stbox, 2.0, geometry 'Point(3 3 3)'); -- (1,"STBOX Z((3,3,3),(5,5,5))") (2,"STBOX Z((5,3,3),(7,5,5))") (3,"STBOX Z((7,3,3),(9,5,5))") ... SELECT multidimGrid(tgeompoint '[Point(3 3)@2000-01-15, Point(15 15)@2000-01-25]'::stbox, 2.0, interval '2 days'); -- (1,"STBOX T((2,2,2000-01-15),(4,4,2000-01-17))") (2,"STBOX T((4,2,2000-01-15),(6,4,2000-01-17))") (3,"STBOX T((6,2,2000-01-15),(8,4,2000-01-17))") ... SELECT multidimGrid(tgeompoint '[Point(3 3 3)@2000-01-15, Point(15 15 15)@2000-01-25]'::stbox, 2.0, '2 days', 'Point(3 3 3)', '2000-01-15'); -- (1,"STBOX ZT((3,3,3,2000-01-15),(5,5,5,2000-01-17))") (2,"STBOX ZT((5,3,3,2000-01-15),(7,5,5,2000-01-17))") (3,"STBOX ZT((7,3,3,2000-01-15),(9,5,5,2000-01-17))") ...
Returns the tile of the multidimensional grid that contains the value and the timestamp.
If the origin of the value and/or time dimensions are not specified, they are set by default to 0 or Point(0 0 0)
for the value dimension and Monday, January 3, 2000 for the time dimension, respectively.
multidimTile(value float,time timestamptz,size float,duration interval,
vorigin float=0.0,torigin timestamptz='2000-01-03'): tbox
multidimTile(point geometry,size float,sorigin geometry='Point(0 0 0)'): stbox
multidimTile(point geometry,time timestamptz,size float,duration interval,sorigin
geometry='Point(0 0 0)',torigin timestamptz='2000-01-03'): stbox
In the case of a spatiotemporal grid, the SRID of the tile coordinates is determined by the input point and the size is given in the units of the SRID. If the origin for the spatial coordinates is given, which must be a point, its dimensionality and SRID should be equal to the one of box, otherwise an error is raised.
SELECT multidimTile(15, '2000-01-15', 2, interval '2 days'); -- TBOX((14,2000-01-15),(16,2000-01-17)) SELECT multidimTile(15, '2000-01-15', 2, interval '2 days', 1, '2000-01-02'); -- TBOX((15,2000-01-14),(17,2000-01-16)) SELECT multidimTile(geometry 'Point(1 1 1)', 2.0); -- STBOX Z((0,0,0),(2,2,2)) SELECT multidimTile(geometry 'Point(1 1)', '2000-01-01', 2.0, interval '2 days'); -- STBOX T((0,0,2000-01-01),(2,2,2000-01-03)) SELECT multidimTile(geometry 'Point(1 1)', '2000-01-01', 2.0, '2 days', 'Point(1 1)', '2000-01-02'); -- STBOX T((1,1,1999-12-31),(3,3,2000-01-02))
These functions fragment a temporal value with respect to a sequence of buckets (see Section 5.14.1) or a multidimensional grid (see Section 5.14.2).
Fragment the temporal number with respect to range buckets.
valueSplit(value tnumber,size number,origin number=0): setof number_tnumber
If the origin of values is not specified, it is set by default to 0.
SELECT (sp).number, (sp).tnumber FROM (SELECT valueSplit(tint '[1@2012-01-01, 2@2012-01-02, 5@2012-01-05, 10@2012-01-10]', 2) AS sp) t; -- 0 | {[1@2012-01-01 00:00:00+01, 1@2012-01-02 00:00:00+01)} 2 | {[2@2012-01-02 00:00:00+01, 2@2012-01-05 00:00:00+01)} 4 | {[5@2012-01-05 00:00:00+01, 5@2012-01-10 00:00:00+01)} 10 | {[10@2012-01-10 00:00:00+01]} SELECT valueSplit(tfloat '[1@2012-01-01, 10@2012-01-10)', 2.0, 1.0); -- (1,"{[1@2012-01-01 00:00:00+01, 3@2012-01-03 00:00:00+01)}") (3,"{[3@2012-01-03 00:00:00+01, 5@2012-01-05 00:00:00+01)}") (5,"{[5@2012-01-05 00:00:00+01, 7@2012-01-07 00:00:00+01)}") (7,"{[7@2012-01-07 00:00:00+01, 9@2012-01-09 00:00:00+01)}") (9,"{[9@2012-01-09 00:00:00+01, 10@2012-01-10 00:00:00+01)}")
Fragment the temporal value with respect to time buckets.
timeSplit(value ttype,duration interval,origin timestamptz='2000-01-03'):
setof time_temp
If the origin of time is not specified, it is set by default to Monday, January 3, 2000.
SELECT (ts).time, (ts).temp FROM (SELECT timeSplit(tfloat '[1@2012-01-01, 10@2012-01-10)', '2 days') AS ts) t; -- 2011-12-31 | [1@2012-01-01, 2@2012-01-02) 2012-01-02 | [2@2012-01-02, 4@2012-01-04) 2012-01-04 | [4@2012-01-04, 6@2012-01-06) ... SELECT (ts).time, astext((ts).temp) AS temp FROM (SELECT timeSplit(tgeompoint '[Point(1 1)@2012-01-01, Point(10 10)@2012-01-10]', '2 days', '2012-01-01') AS ts) AS t; -- 2012-01-01 | [POINT Z (1 1 1)@2012-01-01, POINT Z (3 3 3)@2012-01-03) 2012-01-03 | [POINT Z (3 3 3)@2012-01-03, POINT Z (5 5 5)@2012-01-05) 2012-01-05 | [POINT Z (5 5 5)@2012-01-05, POINT Z (7 7 7)@2012-01-07) ...
Notice that we can fragment a temporal value in cyclic (instead of linear) time buckets. The following two examples show how to fragment a temporal value by hour and by day of the week.
SELECT (ts).time::time as hour, merge((ts).temp) as temp FROM (SELECT timeSplit(tfloat '[1@2012-01-01, 10@2012-01-03]', '1 hour') AS ts) t GROUP BY hour ORDER BY hour; -- 00:00:00 | {[1@2012-01-01 00:00:00+01, 1.1875@2012-01-01 01:00:00+01), [5.5@2012-01-02 00:00:00+01, 5.6875@2012-01-02 01:00:00+01)} 01:00:00 | {[1.1875@2012-01-01 01:00:00+01, 1.375@2012-01-01 02:00:00+01), [5.6875@2012-01-02 01:00:00+01, 5.875@2012-01-02 02:00:00+01)} 02:00:00 | {[1.375@2012-01-01 02:00:00+01, 1.5625@2012-01-01 03:00:00+01), [5.875@2012-01-02 02:00:00+01, 6.0625@2012-01-02 03:00:00+01)} 03:00:00 | {[1.5625@2012-01-01 03:00:00+01, 1.75@2012-01-01 04:00:00+01), [6.0625@2012-01-02 03:00:00+01, 6.25@2012-01-02 04:00:00+01)} ... SELECT EXTRACT(DOW FROM (ts).time) as dow_no, TO_CHAR((ts).time, 'Dy') as dow, asText(setPrecision(merge((ts).temp), 2)) as temp FROM (SELECT timeSplit(tgeompoint '[Point(1 1)@2012-01-01, Point(10 10)@2012-01-14)', '1 hour') AS ts) t GROUP BY dow, dow_no ORDER BY dow_no; -- 0 | Sun | {[POINT(1 1)@2012-01-01, POINT(1.69 1.69)@2012-01-02), [POINT(5.85 5.85)@2012-01-08, POINT(6.54 6.54)@2012-01-09)} 1 | Mon | {[POINT(1.69 1.69)@2012-01-02, POINT(2.38 2.38)@2012-01-03), [POINT(6.54 6.54)@2012-01-09, POINT(7.23 7.23)@2012-01-10)} 2 | Tue | {[POINT(2.38 2.38)@2012-01-03, POINT(3.08 3.08)@2012-01-04), [POINT(7.23 7.23)@2012-01-10, POINT(7.92 7.92)@2012-01-11)} ...
Fragment the temporal number with respect to the tiles in a value-time grid.
valueTimeSplit(value tumber,size number,duration interval,vorigin number=0,
torigin timestamptz='2000-01-03'): setof number_time_tnumber
If the origin of values and/or time are not specified, they are set by default to 0 and to Monday, January 3, 2000, respectively.
SELECT (sp).number, (sp).time, (sp).tnumber FROM (SELECT valueTimeSplit(tint '[1@2012-01-01, 2@2012-01-02, 5@2012-01-05, 10@2012-01-10]', 5, '5 days') AS sp) t; -- 0 | 2011-12-31 | [1@2012-01-01, 2@2012-01-02, 2@2012-01-05) 5 | 2012-01-05 | [5@2012-01-05, 5@2012-01-10) 10 | 2012-01-10 | [10@2012-01-10] SELECT (sp).number, (sp).time, (sp).tnumber FROM (SELECT valueTimeSplit(tfloat '[1@2012-01-01, 10@2012-01-10)', 5.0, '5 days', 1.0, '2012-01-01') AS sp) t; -- 1 | 2012-01-01 | [1@2012-01-01, 6@2012-01-06) 6 | 2012-01-06 | [6@2012-01-06, 10@2012-01-10)
Fragment the temporal point with respect to the tiles in a spatial grid.
spaceSplit(value tgeompoint,size float,origin geometry='Point(0 0 0)',
bitmatrix=TRUE): setof point_tpoint
If the origin of the space dimension is not specified, it is set by default to 'Point(0 0 0)'
. If the argument bitmatrix
is not specified, then the computation will use a bit matrix to speed up the process.
SELECT ST_AsText((sp).point) AS point, astext((sp).tpoint) AS tpoint FROM (SELECT spaceSplit(tgeompoint '[Point(1 1)@2020-03-01, Point(10 10)@2020-03-10]', 2.0) AS sp) t; -- POINT(0 0) | {[POINT(1 1)@2020-03-01, POINT(2 2)@2020-03-02)} POINT(2 2) | {[POINT(2 2)@2020-03-02, POINT(4 4)@2020-03-04)} POINT(4 4) | {[POINT(4 4)@2020-03-04, POINT(6 6)@2020-03-06)} ... SELECT ST_AsText((sp).point) AS point, astext((sp).tpoint) AS tpoint FROM (SELECT spaceSplit(tgeompoint '[Point(1 1 1)@2020-03-01, Point(10 10 10)@2020-03-10]', 2.0, 'Point(1 1 1)') AS sp) t; -- POINT Z (1 1 1) | {[POINT Z (1 1 1)@2020-03-01, POINT Z (3 3 3)@2020-03-03)} POINT Z (3 3 3) | {[POINT Z (3 3 3)@2020-03-03, POINT Z (5 5 5)@2020-03-05)} POINT Z (5 5 5) | {[POINT Z (5 5 5)@2020-03-05, POINT Z (7 7 7)@2020-03-07)} ...
Fragment the temporal point with respect to the tiles in a spatiotemporal grid.
spaceTimeSplit(value tgeompoint,size float,duration interval,sorigin
geometry='Point(0 0 0)',torigin timestamptz='2000-01-03', bitmatrix=TRUE):
setof point_time_tpoint
If the origin of the space and time dimensions are not specified, they are set by default to 'Point(0 0 0)'
and Monday, January 3, 2000, respectively. If the argument bitmatrix
is not specified, then the computation will use a bit matrix to speed up the process.
SELECT ST_AsText((sp).point) AS point, (sp).time, astext((sp).tpoint) AS tpoint FROM (SELECT spaceTimeSplit(tgeompoint '[Point(1 1)@2020-03-01, Point(10 10)@2020-03-10]', 2.0, '2 days') AS sp) t; -- POINT(0 0) | 2020-03-01 | {[POINT(1 1)@2020-03-01, POINT(2 2)@2020-03-02)} POINT(2 2) | 2020-03-01 | {[POINT(2 2)@2020-03-02, POINT(3 3)@2020-03-03]} POINT(2 2) | 2020-03-03 | {[POINT(3 3)@2020-03-03, POINT(4 4)@2020-03-04)} ... SELECT ST_AsText((sp).point) AS point, (sp).time, astext((sp).tpoint) AS tpoint FROM (SELECT spaceTimeSplit(tgeompoint '[Point(1 1 1)@2020-03-01, Point(10 10 10)@2020-03-10]', 2.0, '2 days', 'Point(1 1 1)', '2020-03-01') AS sp) t; -- POINT Z(1 1 1) | 2020-03-01 | {[POINT Z(1 1 1)@2020-03-01, POINT Z(3 3 3)@2020-03-03)} POINT Z(3 3 3) | 2020-03-03 | {[POINT Z(3 3 3)@2020-03-03, POINT Z(5 5 5)@2020-03-05)} POINT Z(5 5 5) | 2020-03-05 | {[POINT Z(5 5 5)@2020-03-05, POINT Z(7 7 7)@2020-03-07)} ...