Density

The density transform performs one-dimensional kernel density estimation over the input data and generates two new columns: the sampled values and the corresponding density estimates.

Here is a simple example, showing the distribution of IMDB ratings from the movies dataset:

import altair as alt
from altair.datasets import data

alt.Chart(data.movies.url).transform_density(
    'IMDB Rating',
    as_=['IMDB Rating', 'density'],
).mark_area().encode(
    x="IMDB Rating:Q",
    y='density:Q',
)

The density can also be computed on a per-group basis, by specifying the groupby argument. Here we split the above density computation across movie genres:

import altair as alt
from altair.datasets import data

alt.Chart(
    data.movies.url,
    width=120,
    height=80
).transform_filter(
    'isValid(datum.Major_Genre)'
).transform_density(
    'IMDB Rating',
    groupby=['Major_Genre'],
    as_=['IMDB Rating', 'density'],
    extent=[1, 10],
).mark_area().encode(
    x="IMDB Rating:Q",
    y='density:Q',
).facet(
    'Major_Genre:N',
    columns=4
)

Transform Options

The transform_density() method is built on the DensityTransform class, which has the following options:

Property	Type	Description
as	array(`FieldName`)	The output fields for the sample value and corresponding density estimate. Default value: `["value", "density"]`
bandwidth	`number`	The bandwidth (standard deviation) of the Gaussian kernel. If unspecified or set to zero, the bandwidth value is automatically estimated from the input data using Scott’s rule.
counts	`boolean`	A boolean flag indicating if the output values should be probability estimates (false) or smoothed counts (true). Default value: `false`
cumulative	`boolean`	A boolean flag indicating whether to produce density estimates (false) or cumulative density estimates (true). Default value: `false`
density	`FieldName`	The data field for which to perform density estimation.
extent	array(`number`)	A [min, max] domain from which to sample the distribution. If unspecified, the extent will be determined by the observed minimum and maximum values of the density value field.
groupby	array(`FieldName`)	The data fields to group by. If not specified, a single group containing all data objects will be used.
maxsteps	`number`	The maximum number of samples to take along the extent domain for plotting the density. Default value: `200`
minsteps	`number`	The minimum number of samples to take along the extent domain for plotting the density. Default value: `25`
resolve	`"independent"` | `"shared"`	Indicates how parameters for multiple densities should be resolved. If `"independent"`, each density may have its own domain extent and dynamic number of curve sample steps. If `"shared"`, the KDE transform will ensure that all densities are defined over a shared domain and curve steps, enabling stacking. Default value: `"shared"`
steps	`number`	The exact number of samples to take along the extent domain for plotting the density. If specified, overrides both minsteps and maxsteps to set an exact number of uniform samples. Potentially useful in conjunction with a fixed extent to ensure consistent sample points for stacked densities.
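
To make the bandwidth, counts, cumulative, extent, and steps options more concrete, here is a rough pure-Python sketch of a Gaussian kernel density estimate. It is not the code the transform runs: the function names scott_bandwidth and gaussian_kde are hypothetical, the bandwidth formula is one common normal-reference form assumed for illustration, and sampling is simplified to a fixed number of evenly spaced points.

```python
import math
import statistics


def scott_bandwidth(values):
    """Assumed normal-reference rule: 1.06 * sd * n^(-1/5)."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-0.2)


def gaussian_kde(values, extent=None, steps=25, bandwidth=0,
                 counts=False, cumulative=False):
    """Return (value, estimate) pairs at `steps` evenly spaced points (steps >= 2)."""
    # A missing or zero bandwidth falls back to the automatic estimate.
    h = bandwidth or scott_bandwidth(values)
    # Without an explicit extent, sample between the observed min and max.
    lo, hi = extent if extent is not None else (min(values), max(values))
    n = len(values)
    # counts=True rescales probability estimates to smoothed counts.
    scale = n if counts else 1.0
    samples = []
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        if cumulative:
            # Sum of Gaussian CDFs centred on each observation.
            total = sum(0.5 * (1 + math.erf((x - v) / (h * math.sqrt(2))))
                        for v in values)
        else:
            # Sum of Gaussian PDFs with standard deviation h.
            total = sum(math.exp(-0.5 * ((x - v) / h) ** 2)
                        for v in values) / (h * math.sqrt(2 * math.pi))
        samples.append((x, scale * total / n))
    return samples


ratings = [5.0, 6.0, 6.5, 7.0, 8.0]  # made-up sample data
density = gaussian_kde(ratings, extent=[1, 10], steps=50)
smoothed_counts = gaussian_kde(ratings, extent=[1, 10], steps=50, counts=True)
cdf = gaussian_kde(ratings, extent=[1, 10], steps=50, cumulative=True)
```

In this sketch the density curve integrates to roughly 1 over the extent, the counts variant is the same curve multiplied by the number of observations, and the cumulative variant rises towards 1 at the upper end of the extent.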