Window Transform

The window transform performs calculations over sorted groups of data objects. These calculations include ranking, lead/lag analysis, and aggregates such as cumulative sums and averages. Calculated values are written back to the input data stream, where they can be referenced by encodings.

For example, consider the following cumulative frequency distribution:

import altair as alt
from vega_datasets import data

alt.Chart(data.movies.url).transform_window(
    sort=[{'field': 'IMDB_Rating'}],
    frame=[None, 0],
    cumulative_count='count(*)',
).mark_area().encode(
    x='IMDB_Rating:Q',
    y='cumulative_count:Q',
)

First, we pass a sort field definition, which indicates how data objects should be sorted within the window. Here, movies should be sorted by their IMDB rating. Next, we pass the frame, which indicates how many data objects before and after the current data object should be included within the window. Here, all movies up to and including the current movie should be included. Finally, we pass a window field definition, which indicates how data objects should be aggregated within the window. Here, the number of movies should be counted.
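
The three steps above can be mimicked in plain Python, without Altair, to make the computed values concrete. This is only an illustrative sketch: the helper name cumulative_count and the sample ratings are hypothetical, and it assumes the default behaviour in which tied ratings are treated as peers and enter the frame together.

```python
from bisect import bisect_right


def cumulative_count(ratings):
    """Illustrative stand-in for sort + frame=[None, 0] + count(*)."""
    # Step 1 (sort): order the data objects by the sort field.
    ordered = sorted(ratings)
    # Steps 2 and 3 (frame and count): for each object, count every object
    # up to and including it. bisect_right also counts tied values, which
    # mirrors the assumed default of including peers in the frame.
    return [(rating, bisect_right(ordered, rating)) for rating in ordered]


# Hypothetical sample ratings, not taken from the movies dataset.
result = cumulative_count([7.0, 6.1, 8.2, 7.0])
```

Here result is [(6.1, 1), (7.0, 3), (7.0, 3), (8.2, 4)]: both movies rated 7.0 receive the same cumulative count because they are peers.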

There are many aggregation functions built into Altair. As well as those given in Binning and Aggregation, we can use the following within window field definitions:

Aggregate Parameter Description
row_number None Assigns each data object a consecutive row number, starting from 1.
rank None Assigns a rank order value to each data object in a window, starting from 1. Peer values are assigned the same rank. Subsequent rank scores incorporate the number of prior values. For example, if the first two values tie for rank 1, the third value is assigned rank 3.
dense_rank None Assigns dense rank order values to each data object in a window, starting from 1. Peer values are assigned the same rank. Subsequent rank scores do not incorporate the number of prior values. For example, if the first two values tie for rank 1, the third value is assigned rank 2.
percent_rank None Assigns a percentage rank order value to each data object in a window. The percent is calculated as (rank - 1) / (group_size - 1).
cume_dist None Assigns a cumulative distribution value between 0 and 1 to each data object in a window.
ntile Number Assigns a quantile (e.g., percentile) value to each data object in a window. Accepts an integer parameter indicating the number of buckets to use (e.g., 100 for percentiles, 5 for quintiles).
lag Number Assigns a value from the data object that precedes the current object by a specified number of positions. If no such object exists, assigns null. Accepts an offset parameter (default 1) that indicates the number of positions. This operation must have a corresponding entry in the fields parameter array.
lead Number Assigns a value from the data object that follows the current object by a specified number of positions. If no such object exists, assigns null. Accepts an offset parameter (default 1) that indicates the number of positions. This operation must have a corresponding entry in the fields parameter array.
first_value None Assigns a value from the first data object in the current sliding window frame. This operation must have a corresponding entry in the fields parameter array.
last_value None Assigns a value from the last data object in the current sliding window frame. This operation must have a corresponding entry in the fields parameter array.
nth_value Number Assigns a value from the nth data object in the current sliding window frame. If no such object exists, assigns null. Requires a non-negative integer parameter that indicates the offset from the start of the window frame. This operation must have a corresponding entry in the fields parameter array.
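
To make the difference between the ranking operations concrete, the following plain-Python sketch reproduces row_number, rank, dense_rank, percent_rank, lag, and lead as described in the table. It is not how Altair or Vega-Lite implement them; the function names and sample values are hypothetical, the input is assumed to be already sorted, and returning 0.0 for percent_rank in a single-object window is an assumption made only to avoid dividing by zero.

```python
def window_ranks(sorted_values):
    """Compute ranking values for an already sorted list of values."""
    n = len(sorted_values)
    rows = []
    rank = dense_rank = 0
    previous = object()  # sentinel that compares unequal to any real value
    for row_number, value in enumerate(sorted_values, start=1):
        if value != previous:
            rank = row_number  # rank skips ahead by the number of prior values
            dense_rank += 1    # dense_rank advances by one per distinct value
            previous = value
        rows.append({
            "value": value,
            "row_number": row_number,
            "rank": rank,
            "dense_rank": dense_rank,
            # (rank - 1) / (group_size - 1); 0.0 for n == 1 is an assumption.
            "percent_rank": (rank - 1) / (n - 1) if n > 1 else 0.0,
        })
    return rows


def lag(values, offset=1):
    """Value from the object `offset` positions earlier, or None."""
    return [values[i - offset] if i - offset >= 0 else None
            for i in range(len(values))]


def lead(values, offset=1):
    """Value from the object `offset` positions later, or None."""
    return [values[i + offset] if i + offset < len(values) else None
            for i in range(len(values))]


values = [10, 10, 20, 30]  # hypothetical, already sorted
ranks = window_ranks(values)
```

For these values the two tied objects share rank 1, the third object gets rank 3 but dense_rank 2, and lag(values) starts with None because the first object has no predecessor.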

While an aggregate transform computes a single value that summarises all data objects, a window transform adds a new property to each data object. This new property is computed from the neighbouring data objects: that is, from the data objects that fall within the window frame. For example, consider the following time series of stock prices:

import altair as alt
from vega_datasets import data

alt.Chart(data.stocks.url).mark_line().encode(
    x='date:T',
    y='price:Q',
    color='symbol:N',
)

It’s hard to see the overall pattern in the above example, because Google’s stock price is much higher than the other stock prices. If we plot the z-scores of the stock prices, rather than the stock prices themselves, then the overall pattern becomes clearer:

import altair as alt
from vega_datasets import data

alt.Chart(data.stocks.url).transform_window(
    mean_price='mean(price)',
    stdev_price='stdev(price)',
    frame=[None, None],
    groupby=['symbol'],
).transform_calculate(
    z_score=(alt.datum.price - alt.datum.mean_price) / alt.datum.stdev_price,
).mark_line().encode(
    x='date:T',
    y='z_score:Q',
    color='symbol:N',
)

By using two aggregation functions (mean and stdev) within the window transform, we are able to compute the z-scores within the calculate transform.
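
The same calculation can be written in plain Python to show what each step contributes. This is a sketch under stated assumptions rather than Altair's implementation: the function name add_z_scores and the sample rows are hypothetical, and stdev is assumed to mean the sample standard deviation, which is what Python's statistics.stdev computes.

```python
from collections import defaultdict
from statistics import mean, stdev


def add_z_scores(rows):
    """Attach per-symbol mean, stdev, and z-score to every row."""
    # groupby=['symbol']: partition prices into one window per symbol.
    prices = defaultdict(list)
    for row in rows:
        prices[row["symbol"]].append(row["price"])

    # frame=[None, None]: each window covers the whole partition, so one
    # mean and one stdev per symbol is enough (stdev needs >= 2 prices).
    stats = {symbol: (mean(p), stdev(p)) for symbol, p in prices.items()}

    # Window step writes mean_price and stdev_price onto each row;
    # calculate step derives z_score from them.
    scored = []
    for row in rows:
        mean_price, stdev_price = stats[row["symbol"]]
        scored.append({
            **row,
            "mean_price": mean_price,
            "stdev_price": stdev_price,
            "z_score": (row["price"] - mean_price) / stdev_price,
        })
    return scored


# Hypothetical rows: two symbols on very different price scales.
rows = [
    {"symbol": "A", "price": 1.0},
    {"symbol": "A", "price": 2.0},
    {"symbol": "A", "price": 3.0},
    {"symbol": "B", "price": 100.0},
    {"symbol": "B", "price": 200.0},
    {"symbol": "B", "price": 300.0},
]
scored = add_z_scores(rows)
```

Although the two hypothetical symbols differ in price by a factor of 100, their z-scores are identical (-1, 0, 1), which is why the standardised series can share one axis.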

For more information about the arguments to the window transform, see WindowTransform and the Vega-Lite documentation.

Transform Options

The transform_window() method is built on the WindowTransform class, which has the following options:

Property Type Description
frame array([null, number]) A frame specification as a two-element array indicating how the sliding window should proceed. Each array entry should be either a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. The value [-5, 5] indicates that the window should include five objects preceding and five objects following the current object. The value [null, null] indicates that the window frame should always include all data objects; if you use this frame and want to assign the same value to all objects, you can use the simpler join aggregate transform. The frame affects only the aggregation operations and the first_value, last_value, and nth_value window operations; the other window operations are not affected by it. Default value: [null, 0] (includes the current object and all preceding objects).

groupby array(FieldName) The data fields for partitioning the data objects into separate windows. If unspecified, all data points will be in a single window.
ignorePeers boolean Indicates if the sliding window frame should ignore peer values (data that are considered identical by the sort criteria). The default is false, causing the window frame to expand to include all peer values. If set to true, the window frame will be defined by offset values only. This setting only affects those operations that depend on the window frame, namely aggregation operations and the first_value, last_value, and nth_value window operations. Default value: false

sort array(SortField) A sort field definition for sorting data objects within a window. If two data objects are considered equal by the comparator, they are considered “peer” values of equal rank. If sort is not specified, the order is undefined: data objects are processed in the order they are observed and none are considered peers (the ignorePeers parameter is ignored and treated as if set to true).
window array(WindowFieldDef) The definition of the fields in the window, and what calculations to use.
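
The interaction between frame and ignorePeers can be illustrated with a small plain-Python sketch. The function window_frame and the sample keys are hypothetical, and the peer handling is an assumption: the sketch expands the frame until it contains every object that ties with the first or last object already inside it, which is one reading of "expand to include all peer values". Offsets are assumed to be written as in the text above (non-positive for preceding objects, non-negative for following objects).

```python
def window_frame(keys, index, frame=(None, 0), ignore_peers=False):
    """Return (start, stop) slice bounds of the frame around keys[index].

    `keys` holds the sort-field values and must already be sorted.
    """
    before, after = frame
    # None means unbounded; otherwise offset from the current object,
    # clipped to the ends of the window.
    start = 0 if before is None else max(0, index + before)
    stop = len(keys) if after is None else min(len(keys), index + after + 1)

    if not ignore_peers:
        # Assumed peer rule: pull in objects that tie with the frame edges.
        while start > 0 and keys[start - 1] == keys[start]:
            start -= 1
        while stop < len(keys) and keys[stop] == keys[stop - 1]:
            stop += 1
    return start, stop


keys = [1, 2, 2, 3, 4]  # hypothetical sorted sort-field values

cumulative = window_frame(keys, 1, frame=(None, 0))
cumulative_offsets_only = window_frame(keys, 1, frame=(None, 0), ignore_peers=True)
centered = window_frame(keys, 3, frame=(-1, 1))
everything = window_frame(keys, 0, frame=(None, None))
```

With these keys, cumulative is (0, 3) because the second 2 is a peer of the current object, while cumulative_offsets_only is (0, 2). An aggregate over the frame is then simply a reduction over keys[start:stop], for example sum(keys[0:3]).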