Frequently Asked Questions

Does Altair work with PyCharm/Spyder/<my favorite IDE>?

Altair can be used to create chart specifications with any frontend, but rendering those charts requires executing the JavaScript code that turns the JSON specification into a visual chart.

There are extensions included in JupyterLab, Jupyter Notebook, Colab, Hydrogen, and nteract that know how to automatically perform this rendering (see Installation for details). If you are using another frontend, you will need to find the most convenient way to run that JavaScript rendering yourself.

The easiest approach is to use the Chart.serve() method, which will convert the chart to HTML, start a webserver, and open your default browser to view the chart. See Working in non-Notebook Environments for more information.

If you would like to manually save the chart to an HTML file in order to view it with a browser, you can use the Chart.save() method; for example, chart.save('filename.html').
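To make concrete what such rendering involves, here is a minimal sketch that writes a standalone HTML page by hand using only the Python standard library. It is not Altair's own template: the function name write_chart_html, the output filename, the hand-written specification, and the Vega, Vega-Lite, and vega-embed major versions loaded from the jsDelivr CDN are all assumptions made for illustration. In practice Chart.save() does this work for you.

```python
import json
import tempfile
from pathlib import Path

# Hand-written Vega-Lite specification (an assumption for this sketch);
# Altair would normally generate this dictionary for you.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"x": 1, "y": 2}, {"x": 2, "y": 1}, {"x": 3, "y": 2}]},
    "mark": "line",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}

# Illustrative page template: load the rendering libraries, then hand the
# JSON specification to vegaEmbed, which compiles it and draws the chart.
HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>
    vegaEmbed("#vis", SPEC_JSON);
  </script>
</body>
</html>
"""


def write_chart_html(spec, path):
    """Write a self-contained HTML page that renders `spec` in a browser."""
    path = Path(path)
    html = HTML_TEMPLATE.replace("SPEC_JSON", json.dumps(spec))
    path.write_text(html, encoding="utf-8")
    return path


out = write_chart_html(spec, Path(tempfile.gettempdir()) / "chart_by_hand.html")
print(f"Wrote {out}")
# To view it: import webbrowser; webbrowser.open(out.resolve().as_uri())
```

Opening the resulting file in a browser with internet access runs the JavaScript and displays the chart, which is why this route works regardless of the IDE.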

I tried to make a plot but it doesn’t show up

There are two basic reasons that a chart might not show up:

  1. You have not installed and/or enabled the appropriate renderer for your frontend, which means charts cannot be displayed.
  2. You have inadvertently created an invalid chart, and there is a JavaScript error preventing it from being displayed.

For details on how to troubleshoot these kinds of display issues on various frontends, see Display Troubleshooting.

Why does Altair lead to such extremely large notebooks?

By design, Altair does not produce plots consisting of pixels, but plots consisting of data plus a visualization specification. As discussed in Specifying Data in Altair, this data can be specified in one of several ways, either via a pandas DataFrame, a file or URL, or a JSON data object. When you specify the data as a pandas DataFrame, this data is converted to JSON and included in its entirety in the plot spec.

For example, here is a simple chart made from a dataframe with three rows of data:

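from pprint import pprint

import altair as alt
import pandas as pd

# A small DataFrame with three rows of data.
data = pd.DataFrame({'x': [1, 2, 3], 'y': [2, 1, 2]})

chart = alt.Chart(data).mark_line().encode(
    x='x',
    y='y'
)

# Print the full specification, including the embedded data.
pprint(chart.to_dict())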
{'$schema': 'https://vega.github.io/schema/vega-lite/v2.4.1.json',
 'config': {'view': {'height': 300, 'width': 400}},
 'data': {'values': [{'x': 1, 'y': 2}, {'x': 2, 'y': 1}, {'x': 3, 'y': 2}]},
 'encoding': {'x': {'field': 'x', 'type': 'quantitative'},
              'y': {'field': 'y', 'type': 'quantitative'}},
 'mark': 'line'}

The resulting specification includes a representation of the data converted to JSON format, and this specification is embedded in the notebook or web page where it can be used by Vega-Lite to render the plot. As the size of the data grows, this explicit data storage can lead to some very large specifications, and by extension, some very large notebooks or web pages.

The best way around this is to store the data in an external file, and pass it to the chart by URL. You can do this manually if you wish:

url = 'data.json'
data.to_json(url, orient='records')

chart = alt.Chart(url).mark_line().encode(
    x='x:Q',
    y='y:Q'
)
pprint(chart.to_dict())
{'$schema': 'https://vega.github.io/schema/vega-lite/v2.4.1.json',
 'config': {'view': {'height': 300, 'width': 400}},
 'data': {'url': 'data.json'},
 'encoding': {'x': {'field': 'x', 'type': 'quantitative'},
              'y': {'field': 'y', 'type': 'quantitative'}},
 'mark': 'line'}

With this type of approach, the data is now stored as an external file rather than being embedded in the notebook, leading to much more compact plot specifications. The disadvantage, of course, is a loss of portability: if the notebook is ever moved, the data file must accompany it or the plot may not display.

For other strategies for effectively working with large datasets in Altair, see MaxRowsError: How can I plot Large Datasets?
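As a rough illustration of the size difference between embedding data and referencing it by URL, the sketch below builds Vega-Lite-style dictionaries by hand and measures their serialized length. It uses only the standard library rather than Altair itself; the helper names make_spec and spec_size, the synthetic rows, and the row counts are assumptions chosen for the example.

```python
import json


def make_spec(data):
    """Build a minimal Vega-Lite-style spec dict around a `data` entry."""
    return {
        "data": data,
        "mark": "line",
        "encoding": {
            "x": {"field": "x", "type": "quantitative"},
            "y": {"field": "y", "type": "quantitative"},
        },
    }


def spec_size(spec):
    """Number of characters in the JSON-serialized spec."""
    return len(json.dumps(spec))


sizes = {}
for n_rows in (3, 1_000, 100_000):
    # Synthetic rows standing in for a DataFrame converted to records.
    rows = [{"x": i, "y": i % 7} for i in range(n_rows)]
    inline = spec_size(make_spec({"values": rows}))
    by_url = spec_size(make_spec({"url": "data.json"}))
    sizes[n_rows] = (inline, by_url)
    print(f"{n_rows:>7} rows: inline={inline:>9} chars, by URL={by_url} chars")
```

The inline size grows with the number of rows, while the by-URL size stays fixed because the spec only carries the file name.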

MaxRowsError: How can I plot Large Datasets?

If you try to create a plot that will directly embed a dataset with more than 5000 rows, you will see a MaxRowsError:

data = pd.DataFrame({'x': range(10000)})
alt.Chart(data).mark_line()
MaxRowsError: The number of rows in your dataset is greater than the maximum
              allowed (5000). For information on how to plot larger datasets
              in Altair, see the documentation.

This is not because Altair cannot handle larger datasets, but because it is important for the user to think carefully about how large datasets are handled. As noted above in Why does Altair lead to such extremely large notebooks?, it is quite easy to end up with very large notebooks if you make many visualizations of a large dataset, and this error is a way of preventing that.

You can get around it in a few ways:

Passing Data by URL

The preferred solution to working with large datasets is to not embed the data in the notebook, but rather pass it to the chart by URL. This not only addresses the issue of large notebooks, but also leads to better interactivity performance with large datasets.

Vega Datasets

If you are working with one of the vega datasets, you can pass the data by URL using the url attribute:

from vega_datasets import data
source = data.cars.url

alt.Chart(source).mark_point() # etc.

Local Filesystem

You may also save data to a local filesystem and reference the data by file path. Altair has JSON and CSV data transformers that will do this transparently when enabled:

alt.data_transformers.enable('json')
# or
alt.data_transformers.enable('csv')

With this data transformer enabled, each time you make a plot the data will be serialized to disk and referenced by URL, rather than being embedded in the notebook output. You may also manually save the data to file and reference it that way (see Why does Altair lead to such extremely large notebooks?).
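Conceptually, a file-based data transformer replaces the inline records with a reference to a file on disk. The following standard-library sketch shows the idea only; it is not Altair's implementation, and the function name records_to_json_file, the file prefix, and the hash-based naming scheme are assumptions for illustration. Altair's own transformers may name and format their files differently.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def records_to_json_file(records, directory=".", prefix="chart-data"):
    """Write `records` to a JSON file and return a Vega-Lite data reference."""
    payload = json.dumps(records, sort_keys=True)
    # Name the file after a hash of its contents, so plotting the same data
    # twice reuses one file instead of writing a new copy each time.
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    path = Path(directory) / f"{prefix}-{digest}.json"
    path.write_text(payload, encoding="utf-8")
    # The URL is relative to the directory the notebook or page lives in.
    return {"url": path.name}


records = [{"x": 1, "y": 2}, {"x": 2, "y": 1}, {"x": 3, "y": 2}]
out_dir = tempfile.mkdtemp()

first = records_to_json_file(records, directory=out_dir)
second = records_to_json_file(records, directory=out_dir)
print(first)
```

Because the returned dictionary holds only a file name, the chart specification stays small no matter how many records are written.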

Note that the filesystem approach may not work on some cloud-based Jupyter notebook services.

Local Data Server

It is also possible to serve your data from a local threaded server to avoid writing datasets to disk. The altair_data_server package makes this easy. First install the package:

pip install altair_data_server

And then enable the data transformer:

import altair as alt
alt.data_transformers.enable('data_server')

Note that this approach may not work on some cloud-based Jupyter notebook services.
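The idea of serving data from a local threaded server, instead of writing it to disk, can be sketched with the standard library alone. This is not how altair_data_server is implemented; the function name serve_records, the /data.json path, and the permissive CORS header are assumptions made for the example.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def serve_records(records):
    """Serve `records` as JSON from a background thread; return (server, url)."""
    payload = json.dumps(records).encode("utf-8")

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            # Let a chart rendered from another origin fetch the data.
            self.send_header("Access-Control-Allow-Origin", "*")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # keep the example quiet

    # Port 0 asks the operating system for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/data.json"
    return server, url


server, url = serve_records([{"x": 1, "y": 2}, {"x": 2, "y": 1}])
try:
    # Fetch the data back the way a browser-side renderer would.
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    conn.request("GET", "/data.json")
    fetched = json.loads(conn.getresponse().read())
    conn.close()
finally:
    server.shutdown()
    server.server_close()

print(url, fetched)
```

A chart specification would then reference the data as {'url': url}. Since the browser must be able to reach that address, this also suggests why a local server may not work when the notebook runs on a remote machine.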

Disabling MaxRows

If you are certain you would like to embed your dataset within the visualization specification, you can disable the MaxRows check by modifying the arguments to the default data transformer:

alt.data_transformers.enable('default', max_rows=None)

If you choose this route, please be careful: if you are making multiple plots with the dataset in a particular notebook, the notebook will grow very large and performance may suffer.
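The effect of the max_rows argument can be pictured as a simple guard that runs before the data is embedded. The sketch below is a hypothetical stand-in written with the standard library, not Altair's code: the names RowLimitExceeded and check_row_limit are invented for illustration, and only the default of 5000 and the meaning of max_rows=None are taken from the text above.

```python
class RowLimitExceeded(Exception):
    """Hypothetical stand-in for Altair's MaxRowsError."""


def check_row_limit(rows, max_rows=5000):
    """Return `rows` unchanged, or raise if there are more than `max_rows`.

    Passing max_rows=None skips the check entirely.
    """
    if max_rows is not None and len(rows) > max_rows:
        raise RowLimitExceeded(
            f"{len(rows)} rows is greater than the maximum allowed ({max_rows})"
        )
    return rows


rows = [{"x": i} for i in range(10000)]

# With the default limit, a 10000-row dataset is rejected.
try:
    check_row_limit(rows)
    blocked = False
except RowLimitExceeded as err:
    blocked = True
    print(f"Blocked: {err}")

# With the limit disabled, the same dataset passes through untouched.
allowed = check_row_limit(rows, max_rows=None)
print(f"Allowed {len(allowed)} rows with max_rows=None")
```

Disabling the guard removes the only safeguard against silently embedding a large dataset in every chart, which is why the caution above applies.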