Impute Transform
The impute transform allows you to fill in missing entries in a dataset. As an example, consider the following data, which includes missing values that we filter out of the long-form representation (see Long-form vs. Wide-form Data for more on this):
import numpy as np
import pandas as pd
data = pd.DataFrame({
    't': range(5),
    'x': [2, np.nan, 3, 1, 3],
    'y': [5, 7, 5, np.nan, 4]
}).melt('t').dropna()
data
t variable value
0 0 x 2.0
2 2 x 3.0
3 3 x 1.0
4 4 x 3.0
5 0 y 5.0
6 1 y 7.0
7 2 y 5.0
9 4 y 4.0
Notice the result: the x series has no entry at t=1, and the y series has no entry at t=3. If we use Altair to visualize this data directly, the line skips the missing entries:
import altair as alt
raw = alt.Chart(data).mark_line(point=True).encode(
    x=alt.X('t:Q'),
    y='value:Q',
    color='variable:N'
)
raw
This is not always desirable, because the line drawn across each gap (particularly in a line plot without point markers) can imply the existence of data that is not there.
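To see exactly which gaps the chart is skipping, you can list the key values for each series that appear somewhere in the data but not in that series. The following is a plain-Python sketch, not something Altair runs. `rows` and `missing_keys` are illustrative names, and the sketch assumes the candidate keys are simply the union of `t` values across all series:

```python
# Long-form rows (t, variable, value), matching the DataFrame above after dropna().
rows = [
    (0, "x", 2.0), (2, "x", 3.0), (3, "x", 1.0), (4, "x", 3.0),
    (0, "y", 5.0), (1, "y", 7.0), (2, "y", 5.0), (4, "y", 4.0),
]

def missing_keys(rows):
    """Return {series: sorted key values seen in the data but absent from that series}."""
    all_keys = {t for t, _, _ in rows}
    present = {}
    for t, series, _ in rows:
        present.setdefault(series, set()).add(t)
    return {s: sorted(all_keys - keys) for s, keys in present.items()}

print(missing_keys(rows))  # {'x': [1], 'y': [3]}
```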
Impute via Encodings
To address this, you can pass an impute argument to the encoding channel. For example, we can impute using a constant value (we’ll show the raw chart lightly in the background for reference):
background = raw.encode(opacity=alt.value(0.2))
chart = alt.Chart(data).mark_line(point=True).encode(
    x='t:Q',
    y=alt.Y('value:Q', impute=alt.ImputeParams(value=0)),
    color='variable:N'
)
background + chart
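Conceptually, a constant impute adds one new row for each missing (series, t) pair, and that row carries the fixed value. The sketch below models this in plain Python. It assumes the long-form rows are held as (t, variable, value) tuples, and `impute_constant` is a hypothetical helper, not an Altair function. Altair itself only emits the specification; the actual imputation is carried out by the Vega-Lite/Vega runtime when the chart renders.

```python
rows = [
    (0, "x", 2.0), (2, "x", 3.0), (3, "x", 1.0), (4, "x", 3.0),
    (0, "y", 5.0), (1, "y", 7.0), (2, "y", 5.0), (4, "y", 4.0),
]

def impute_constant(rows, fill):
    """Append a (t, series, fill) row for every key value a series is missing."""
    all_keys = sorted({t for t, _, _ in rows})
    series_values = {}
    for t, series, value in rows:
        series_values.setdefault(series, {})[t] = value
    out = list(rows)
    for series, values in series_values.items():
        for t in all_keys:
            if t not in values:
                out.append((t, series, fill))
    return out

filled = impute_constant(rows, 0)
print(filled[len(rows):])  # [(1, 'x', 0), (3, 'y', 0)]
```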
Or we can impute using any supported aggregate:
chart = alt.Chart(data).mark_line(point=True).encode(
    x='t:Q',
    y=alt.Y('value:Q', impute=alt.ImputeParams(method='mean')),
    color='variable:N'
)
background + chart
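With an aggregate method, the fill value is computed from the observed values of the same series. The sketch below models this for the aggregate methods `mean`, `median`, `min` and `max` using Python's `statistics` module. The helper and the `METHODS` mapping are illustrative assumptions, and Vega's results may differ from this sketch in edge cases:

```python
import statistics

rows = [
    (0, "x", 2.0), (2, "x", 3.0), (3, "x", 1.0), (4, "x", 3.0),
    (0, "y", 5.0), (1, "y", 7.0), (2, "y", 5.0), (4, "y", 4.0),
]

# Hypothetical mapping from impute method names to Python aggregates.
METHODS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
}

def impute_aggregate(rows, method):
    """Fill each missing (t, series) with an aggregate of that series' observed values."""
    all_keys = sorted({t for t, _, _ in rows})
    series_values = {}
    for t, series, value in rows:
        series_values.setdefault(series, {})[t] = value
    agg = METHODS[method]
    new_rows = []
    for series, values in series_values.items():
        fill = agg(list(values.values()))  # computed from observed values only
        new_rows += [(t, series, fill) for t in all_keys if t not in values]
    return list(rows) + new_rows

print(impute_aggregate(rows, "mean")[len(rows):])  # [(1, 'x', 2.25), (3, 'y', 5.25)]
```

For the x series the mean is (2 + 3 + 1 + 3) / 4 = 2.25, so the imputed point at t=1 sits at 2.25.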
Impute via Transform
Similar to the Bin transforms and Aggregate Transforms, the impute can also be specified outside the encoding, as a transform. For example, here are the equivalents of the two charts above:
chart = alt.Chart(data).transform_impute(
    impute='value',
    key='t',
    value=0,
    groupby=['variable']
).mark_line(point=True).encode(
    x='t:Q',
    y='value:Q',
    color='variable:N'
)
background + chart
chart = alt.Chart(data).transform_impute(
    impute='value',
    key='t',
    method='mean',
    groupby=['variable']
).mark_line(point=True).encode(
    x='t:Q',
    y='value:Q',
    color='variable:N'
)
background + chart
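The transform form makes explicit what the encoding form leaves implicit. `impute` names the field to fill, `key` names the field whose values should be complete within each group, and `groupby` names the fields that define the groups. In these charts, the key matches the `x` field and the grouping matches the `color` field. The following dict-based sketch uses the same parameter names. `impute_records` and `AGGREGATES` are hypothetical stand-ins for illustration, not Altair's or Vega's implementation:

```python
import statistics

records = [
    {"t": t, "variable": v, "value": val}
    for t, v, val in [
        (0, "x", 2.0), (2, "x", 3.0), (3, "x", 1.0), (4, "x", 3.0),
        (0, "y", 5.0), (1, "y", 7.0), (2, "y", 5.0), (4, "y", 4.0),
    ]
]

AGGREGATES = {"mean": statistics.mean, "median": statistics.median, "min": min, "max": max}

def impute_records(records, impute, key, groupby=(), method="value", value=None):
    """Add a record for every key value missing from a group, filling the `impute` field."""
    all_keys = sorted({r[key] for r in records})
    groups = {}
    for r in records:
        group_id = tuple(r[f] for f in groupby)
        groups.setdefault(group_id, []).append(r)
    out = list(records)
    for group_id, members in groups.items():
        seen = {r[key] for r in members}
        if method == "value":
            fill = value
        else:
            fill = AGGREGATES[method]([r[impute] for r in members])
        for k in all_keys:
            if k not in seen:
                new = dict(zip(groupby, group_id))  # copy the group's field values
                new[key] = k
                new[impute] = fill
                out.append(new)
    return out

constant = impute_records(records, impute="value", key="t", groupby=["variable"], value=0)
print(constant[len(records):])
# [{'variable': 'x', 't': 1, 'value': 0}, {'variable': 'y', 't': 3, 'value': 0}]
```

Calling it with `method="mean"` instead of `value=0` fills the same two records with 2.25 and 5.25.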
If you would like to use more localized imputed values, you can specify a frame parameter, similar to the one in the Window Transform, that controls which values are used for the imputation. For example, here we impute missing values using the mean of the neighboring points on either side:
chart = alt.Chart(data).transform_impute(
    impute='value',
    key='t',
    method='mean',
    frame=[-1, 1],
    groupby=['variable']
).mark_line(point=True).encode(
    x='t:Q',
    y='value:Q',
    color='variable:N'
)
background + chart
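One way to read `frame=[-1, 1]` is to line up each series' slots in key order and then average only the observed values from one slot before to one slot after the missing one. The sketch below assumes exactly that ordering and uses `None` in `frame` for an unbounded side. It illustrates the window idea; the exact windowing that Vega-Lite compiles to may differ, and `impute_window_mean` is a hypothetical helper:

```python
import statistics

rows = [
    (0, "x", 2.0), (2, "x", 3.0), (3, "x", 1.0), (4, "x", 3.0),
    (0, "y", 5.0), (1, "y", 7.0), (2, "y", 5.0), (4, "y", 4.0),
]

def impute_window_mean(rows, frame=(-1, 1)):
    """Fill each missing (t, series) with the mean of observed values in a key-ordered window."""
    lo, hi = frame  # offsets relative to the missing slot; None means unbounded
    all_keys = sorted({t for t, _, _ in rows})
    series_values = {}
    for t, series, value in rows:
        series_values.setdefault(series, {})[t] = value
    new_rows = []
    for series, values in series_values.items():
        # One slot per key value, in key order; None marks a key this series lacks.
        column = [values.get(t) for t in all_keys]
        for i, t in enumerate(all_keys):
            if column[i] is not None:
                continue
            start = 0 if lo is None else max(0, i + lo)
            stop = len(column) if hi is None else min(len(column), i + hi + 1)
            # Only originally observed values count; imputed values are not reused.
            observed = [v for v in column[start:stop] if v is not None]
            fill = statistics.mean(observed) if observed else None
            new_rows.append((t, series, fill))
    return list(rows) + new_rows

print(impute_window_mean(rows)[len(rows):])  # [(1, 'x', 2.5), (3, 'y', 4.5)]
```

With `frame=(None, None)` the window spans the whole series, so the sketch reduces to the plain per-series mean (2.25 for x, 5.25 for y).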
Transform Options
The transform_impute() method is built on the ImputeTransform class, which has the following options:
Property | Type | Description |
---|---|---|
frame | array(any) | A frame specification as a two-element array used to control the window over which the specified method is applied. The array entries should either be a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. For example, the value [-5, 5] indicates that the window should include five objects preceding and five objects following the current object. Default value: [null, null], indicating that the window includes all objects. |
groupby | array(FieldName) | An optional array of fields by which to group the values. Imputation will then be performed on a per-group basis. |
impute | FieldName | The data field for which the missing values should be imputed. |
key | FieldName | A key field that uniquely identifies data objects within a group. Missing key values (those occurring in the data but not in the current group) will be imputed. |
keyvals | anyOf(array(any), ImputeSequence) | Defines the key values that should be considered for imputation. An array of key values or an object defining a number sequence. If provided, this will be used in addition to the key values observed within the input data. If not provided, the values will be derived from all unique values of the key field. If there is no impute grouping, this property must be specified. |
method | ImputeMethod | The imputation method to use for the field value of imputed data objects. One of "value", "mean", "median", "max" or "min". Default value: "value" |
value | any | The field value to use when the imputation method is "value". |
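To illustrate how `keyvals` extends the set of keys, here is a sketch that combines the observed keys with either an explicit list or a number sequence. It assumes the sequence object uses `start`, `stop` and `step` fields with an exclusive `stop`, a default `start` of 0 and a default `step` of 1 (or -1 when counting down). That is our reading of `alt.ImputeSequence`, not something this page states. `candidate_keys` is a hypothetical helper:

```python
def candidate_keys(observed, keyvals=None):
    """Union of observed key values with extra keyvals (a list or a start/stop/step dict)."""
    keys = set(observed)
    if isinstance(keyvals, dict):
        # Assumed sequence semantics: stop is exclusive; start defaults to 0.
        start = keyvals.get("start", 0)
        stop = keyvals["stop"]
        step = keyvals.get("step", 1 if stop >= start else -1)
        if step == 0:
            raise ValueError("step must be non-zero")
        k = start
        while (k < stop) if step > 0 else (k > stop):
            keys.add(k)
            k += step
    elif keyvals is not None:
        keys.update(keyvals)  # an explicit array of extra key values
    return sorted(keys)

observed_t = [0, 1, 2, 3, 4]
print(candidate_keys(observed_t, [5, 6]))                               # [0, 1, 2, 3, 4, 5, 6]
print(candidate_keys(observed_t, {"start": 0, "stop": 8, "step": 2}))  # [0, 1, 2, 3, 4, 6]
```

Under these assumptions, passing keyvals [5, 6] in the earlier examples would give every series imputed rows at t=5 and t=6, in addition to filling the original gaps.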