Data Transformations
It is often necessary to transform or filter data in the process of visualizing it. In Altair you can do this in one of two ways:
1. Before the chart definition, using standard pandas data transformations.
2. Within the chart definition, using Vega-Lite’s data transformation tools.
In most cases, we suggest that you use the first approach, because it is more straightforward for those who are familiar with data manipulation in Python, and because the pandas package offers far more flexibility than Vega-Lite in the data manipulations it makes available.
The second approach becomes useful when the data source is not a dataframe, but, for example, a URL pointer to a JSON or CSV file. It can also be useful in a compound chart where different views of the dataset require different transformations.
This second approach (specifying data transformations within the chart specification itself) can be accomplished using the transform_* methods of top-level objects:
Transform | Method | Description |
---|---|---|
Aggregate | transform_aggregate() | Create a new data column by aggregating an existing column. |
Bin | transform_bin() | Create a new data column by binning an existing column. |
Calculate | transform_calculate() | Create a new data column using an arithmetic calculation on an existing column. |
Density | transform_density() | Create a new data column with the kernel density estimate of the input. |
Extent | transform_extent() | Find the extent of a field and store the result in a parameter. |
Filter | transform_filter() | Select a subset of data based on a condition. |
Flatten | transform_flatten() | Flatten array data into columns. |
Fold | transform_fold() | Convert wide-form data into long-form data (opposite of pivot). |
Impute | transform_impute() | Impute missing data. |
Join Aggregate | transform_joinaggregate() | Aggregate transform joined to original data. |
LOESS | transform_loess() | Create a new column with LOESS smoothing of data. |
Lookup | transform_lookup() | One-sided join of two datasets based on a lookup key. |
Pivot | transform_pivot() | Convert long-form data into wide-form data (opposite of fold). |
Quantile | transform_quantile() | Compute empirical quantiles of a dataset. |
Regression | transform_regression() | Fit a regression model to a dataset. |
Sample | transform_sample() | Random sub-sample of the rows in the dataset. |
Stack | transform_stack() | Compute stacked version of values. |
Time Unit | transform_timeunit() | Discretize/group a date by a time unit (day, month, year, etc.). |
Window | transform_window() | Compute a windowed aggregation. |
Accessing Transformed Data
When charts are displayed, data transformations are performed in the browser by the Vega JavaScript library. It’s often helpful to inspect transformed data results in the process of building a chart. One approach is to display the transformed data results in a table composed of Text marks as in the Brushing Scatter Plot to Show Data on a Table gallery example.
While this approach works, it is somewhat cumbersome, and it still does not make the transformed data accessible from Python. To make transformed data results available in Python, Altair provides the transformed_data() Chart method, which integrates with VegaFusion to evaluate data transformations in the Python kernel.
First, install VegaFusion with the embed extras enabled.
```bash
pip install "vegafusion[embed]"
```
Then create an Altair chart and call the transformed_data() method to extract a pandas DataFrame containing the transformed data.
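```python
import altair as alt
from vega_datasets import data

# A URL pointer to the cars dataset (a string, not a DataFrame).
cars = data.cars.url

chart = alt.Chart(cars).mark_bar().encode(
    y='Cylinders:O',
    x='mean_acc:Q'
).transform_aggregate(
    mean_acc='mean(Acceleration)',
    groupby=["Cylinders"]
)

# Evaluate the chart's transforms in Python and return a pandas DataFrame.
chart.transformed_data()
```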
```
   Cylinders   mean_acc  mean_acc_start  mean_acc_end
0          8  12.837037             0.0     12.837037
1          4  16.616425             0.0     16.616425
2          6  16.263095             0.0     16.263095
3          3  13.250000             0.0     13.250000
4          5  18.633333             0.0     18.633333
```
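The mean_acc_start and mean_acc_end columns are not part of the aggregate itself; they come from Vega-Lite's default stacking of bar marks, which records where each bar begins and ends. As a rough cross-check, the pandas sketch below reproduces the same three columns for a small made-up dataset (the real cars values differ); with one bar per cylinder count, each bar starts at zero and ends at its mean.

```python
import pandas as pd

# Hypothetical stand-in for the cars data.
cars_df = pd.DataFrame({
    "Cylinders": [8, 8, 4, 4, 6],
    "Acceleration": [12.0, 13.0, 16.0, 17.0, 16.5],
})

# pandas equivalent of
# transform_aggregate(mean_acc="mean(Acceleration)", groupby=["Cylinders"]).
result = (
    cars_df.groupby("Cylinders", as_index=False, sort=False)["Acceleration"]
    .mean()
    .rename(columns={"Acceleration": "mean_acc"})
)

# Approximation of the implicit stack: there is a single bar per
# category, so each bar spans from 0 to its own mean.
result["mean_acc_start"] = 0.0
result["mean_acc_end"] = result["mean_acc"]
print(result)
```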
The transformed_data() method currently supports most, but not all, of Altair’s transforms. See the table below.
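Transform | Supported |
---|---|
Aggregate | ✔ |
Bin | ✔ |
Calculate | ✔ |
Density | |
Extent | ✔ |
Filter | ✔ |
Flatten | |
Fold | ✔ |
Impute | ✔ |
Join Aggregate | ✔ |
LOESS | |
Lookup | |
Pivot | ✔ |
Quantile | |
Regression | |
Sample | |
Stack | ✔ |
Time Unit | ✔ |
Window | ✔ |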