Frequently Asked Questions
Does Altair work with IPython Terminal/PyCharm/Spyder/<my favorite IDE>?
Altair can be used to create chart specifications with any frontend that executes Python code, but rendering those charts requires connecting Altair to an environment capable of executing the JavaScript code that turns the JSON specification into a visual chart.
There are extensions included in JupyterLab, Jupyter Notebook, Colab, Kaggle kernels, VSCode, Hydrogen, and nteract that know how to automatically perform this rendering (see Installation for details).
For other frontends that don’t have Vega-Lite rendering built in, it is possible to work with Altair charts using the built-in Chart.show() or Chart.save() methods. For more information on these, see Working in non-Notebook Environments.
I tried to make a plot but it doesn’t show up
There are two basic reasons that a chart might not show up:
- You have not installed and/or enabled the appropriate renderer for your frontend, which means charts cannot be displayed.
- You have inadvertently created an invalid chart, and there is a JavaScript error preventing it from being displayed.
For details on how to troubleshoot these kinds of display issues on various frontends, see Display Troubleshooting.
Why does Altair lead to such extremely large notebooks?
By design, Altair does not produce plots consisting of pixels, but plots consisting of data plus a visualization specification. As discussed in Specifying Data in Altair, this data can be specified in one of several ways, either via a pandas DataFrame, a file or URL, or a JSON data object. When you specify the data as a pandas DataFrame, this data is converted to JSON and included in its entirety in the plot spec.
For example, here is a simple chart made from a dataframe with three rows of data:
import altair as alt
import pandas as pd
data = pd.DataFrame({'x': [1, 2, 3], 'y': [2, 1, 2]})
chart = alt.Chart(data).mark_line().encode(
x='x',
y='y'
)
from pprint import pprint
pprint(chart.to_dict())
{'$schema': 'https://vega.github.io/schema/vega-lite/v2.4.1.json',
'config': {'view': {'height': 300, 'width': 400}},
'data': {'values': [{'x': 1, 'y': 2}, {'x': 2, 'y': 1}, {'x': 3, 'y': 2}]},
'encoding': {'x': {'field': 'x', 'type': 'quantitative'},
'y': {'field': 'y', 'type': 'quantitative'}},
'mark': 'line'}
The resulting specification includes a representation of the data converted to JSON format, and this specification is embedded in the notebook or web page where it can be used by Vega-Lite to render the plot. As the size of the data grows, this explicit data storage can lead to some very large specifications, and by extension, some very large notebooks or web pages.
The best way around this is to store the data in an external file, and pass it to the chart by URL. You can do this manually if you wish:
url = 'data.json'
data.to_json(url, orient='records')
chart = alt.Chart(url).mark_line().encode(
x='x:Q',
y='y:Q'
)
pprint(chart.to_dict())
{'$schema': 'https://vega.github.io/schema/vega-lite/v2.4.1.json',
'config': {'view': {'height': 300, 'width': 400}},
'data': {'url': 'data.json'},
'encoding': {'x': {'field': 'x', 'type': 'quantitative'},
'y': {'field': 'y', 'type': 'quantitative'}},
'mark': 'line'}
With this type of approach, the data is now stored as an external file rather than being embedded in the notebook, leading to much more compact plot specifications. The disadvantage, of course, is a loss of portability: if the notebook is ever moved, the data file must accompany it or the plot may not display.
For other strategies for effectively working with large datasets in Altair, see MaxRowsError: How can I plot Large Datasets?
MaxRowsError: How can I plot Large Datasets?
If you try to create a plot that will directly embed a dataset with more than 5000 rows, you will see a MaxRowsError:
data = pd.DataFrame({'x': range(10000)})
alt.Chart(data).mark_line()
MaxRowsError: The number of rows in your dataset is greater than the maximum
allowed (5000). For information on how to plot larger datasets
in Altair, see the documentation.
This is not because Altair cannot handle larger datasets, but because it is important for the user to think carefully about how large datasets are handled. As noted above in Why does Altair lead to such extremely large notebooks?, it is quite easy to end up with very large notebooks if you make many visualizations of a large dataset, and this error is a way of preventing that.
You can get around it in a few ways:
Disabling MaxRowsError
If you are certain you would like to embed your dataset within the visualization specification, you can disable the MaxRows check with the following:
alt.data_transformers.disable_max_rows()
If you choose this route, please be careful: if you are making multiple plots with the dataset in a particular notebook, the notebook will grow very large and performance may suffer.
Passing Data by URL
A better solution when working with large datasets is to not embed the data in the notebook, but rather store it separately and pass it to the chart by URL. This not only addresses the issue of large notebooks, but also leads to better interactivity performance with large datasets.
Vega Datasets
If you are working with one of the vega datasets, you can pass the data by URL using the url attribute:
from vega_datasets import data
source = data.cars.url
alt.Chart(source).mark_point() # etc.
Local Filesystem
You may also save data to a local filesystem and reference the data by file path. Altair has a JSON data transformer that will do this transparently when enabled:
alt.data_transformers.enable('json')
With this data transformer enabled, each time you make a plot the data will be serialized to disk and referenced by URL, rather than being embedded in the notebook output. You may also manually save the data to file and reference it that way (see Why does Altair lead to such extremely large notebooks?).
There is a similar CSV data transformer, but it must be used more carefully because CSV does not preserve data types as JSON does.
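The difference is easy to see with a column of ZIP-code-like strings. In this illustrative sketch, pandas' read_csv stands in for any CSV reader that infers column types from text; the column name and values are hypothetical.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({'zip_code': ['02134', '10001']})

# Round-trip through CSV: the reader has to guess each column's type.
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# Round-trip through JSON: strings stay quoted, so their type survives.
json_text = df.to_json(orient='records')
from_json = json.loads(json_text)

print(from_csv['zip_code'].tolist())            # [2134, 10001]: leading zero lost
print([row['zip_code'] for row in from_json])   # ['02134', '10001']
```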
Note that the filesystem approach may not work on some cloud-based Jupyter notebook services.
Local Data Server
It is also possible to serve your data from a local threaded server to avoid writing datasets to disk. The altair_data_server package makes this easy. First install the package:
pip install altair_data_server
And then enable the data transformer:
import altair as alt
alt.data_transformers.enable('data_server')
Note that this approach may not work on some cloud-based Jupyter notebook services.