Encodings

The key to creating meaningful visualizations is to map properties of the data to visual properties in order to effectively communicate information. Altair abstracts this mapping through the idea of channel encodings. For example, here we will plot the cars dataset using four of the available channel encodings: x (the x-axis value), y (the y-axis value), color (the color of the marker), and shape (the shape of the point marker):

from altair import Chart, load_dataset

cars = load_dataset('cars')

Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    shape='Origin'
)

Altair automatically determines the correct datatype from the Data Frame columns, and creates appropriate scales and legends to represent the data.

Channels

Altair provides a number of encoding channels that can be useful in different circumstances; the following table summarizes them:

TODO: link to examples of each

Position Channels:

Channel Altair Class Description Example
column Column The column of a faceted plot Trellis Stacked Bar Chart
row Row The row of a faceted plot Becker’s Barley Trellis Plot
x X The x-axis value Scatterplot
y Y The y-axis value Scatterplot with Filled Circles

Channels with Legend:

Channel Altair Class Description Example
color Color The color of the mark Stacked Area Chart
opacity Opacity The opacity of the mark  
shape Shape The shape of the mark Colored Scatterplot
size Size The size of the mark Binned Scatterplot

Order Channels:

Channel Altair Class Description Example
order Order  
path Path  

Field Channels:

Channel Altair Class Description Example
text Text The text to display at each mark Colored Text Scatter Plot
detail Detail Additional level of detail for a grouping, without mapping to any particular channel  
label Label  

Data Types

The details of any mapping depend on the type of the data. Altair recognizes four main data types:

Data Type Shorthand Code Description
quantitative Q a continuous real-valued quantity
ordinal O a discrete ordered quantity
nominal N a discrete unordered category
temporal T a time or date value

These types can either be expressed in a long-form using the channel encoding classes such as X and Y, or in short-form using the Shorthand Syntax discussed below. For example, the following two means of specifying the type result in identical plot specifications:

>>> from altair import Chart, X
>>> chart = Chart().encode(
...             x=X('name', type='quantitative')
...         )
>>> print(chart.to_json())
{"encoding": {"x": {"field": "name", "type": "quantitative"}}}
>>> chart = Chart().encode(
...             x='name:Q'
...         )
>>> print(chart.to_json())
{"encoding": {"x": {"field": "name", "type": "quantitative"}}}

The shorthand form, "name:Q", is useful for its lack of boilerplate when doing quick data explorations. The long-form, X('name', type='quantitative'), is useful when adjusting binning, axis properties, and other details of the mapping.

Specifying the correct type for your data is important, as it affects the way Altair represents your encoding in the resulting plot. As an example of this, here we will represent the same data three different ways, with the color encoded as a quantitative, ordinal, and nominal type:

TODO: use subplots here

Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Cylinders:Q'           # Encode as quantitative (Q)
)
Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Cylinders:O'           # Encode as ordinal (O)
)
Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Cylinders:N'           # Encode as nominal (N)
)

The type specification influences the way Altair, via Vega-Lite, chooses a color scale to represent the value, and influences whether a discrete or continuous legend is used.

Binning and Aggregation

Beyond simple channel encodings, Altair’s visualizations are built on the concept of the database-style grouping and aggregation; that is, the split-apply-combine abstraction that underpins many data analyses.

For example, building a histogram from a one-dimensional dataset involves splitting data based on the bin it falls in, aggregating the results within each bin using a count of the data, and then combining the results into a final figure.

In altair, such an operation looks like this:

from altair import load_dataset, Chart, X

cars = load_dataset('cars')

Chart(cars).mark_bar().encode(
    X('Horsepower', bin=True),
    y='count(*):Q'
    # could also use Y('*', aggregate='count', type='quantitative')
)

Notice here we use the shorthand version of expressing an encoding channel (see Encoding Shorthands) with the count aggregation, the special * wild-card identifier often used with counts, and Q for quantitative type.

Similarly, we can create a two-dimensional histogram using, for example, the size of points to indicate counts within the grid (sometimes called a “Bubble Plot”):

from altair import load_dataset, Chart, X, Y

cars = load_dataset('cars')

Chart(cars).mark_point().encode(
    X('Horsepower', bin=True),
    Y('Miles_per_Gallon', bin=True),
    size='count(*):Q',
)

There is no need, however, to limit aggregations to counts alone. For example, we could similarly create a plot where the color of each point represents the mean of a third quantity, such as acceleration:

from altair import load_dataset, Chart, X, Y

cars = load_dataset('cars')

Chart(cars).mark_circle().encode(
    X('Horsepower', bin=True),
    Y('Miles_per_Gallon', bin=True),
    size='count(*):Q',
    color='average(Acceleration):Q'
)

In addition to count and average, there are a large number of available aggregation functions built into Altair; they are listed in the following table:

TODO: fill-in examples

Aggregate Description Example
sum Sum of values Aggregate Bar Chart
mean Arithmetic mean of values Text Table Heatmap
average Arithmetic mean of values  
count Total number of values Binned Scatterplot
distinct Number of distinct values  
variance Variance of values  
variancep ??  
stdev Standard Deviation of values  
stdevp ??  
median Median of values  
q1 First quartile of values  
q3 Third quartile of values  
modeskew ??  
min Minimum value  
max Maximum value  
argmin Index of minimum value  
argmax Index of maximum value  
values ??  
valid ??  
missing ??  

Encoding Shorthands

For convenience, Altair allows the specification of the variable name along with the aggregate and type within a simple shorthand string syntax. This makes use of the type shorthand codes listed in Data Types as well as the aggregate names listed in Binning and Aggregation. The following table shows examples of the shorthand specification alongside the long-form equivalent:

Shorthand Equivalent long-form
x='name' X('name')
x='name:Q' X('name', type='quantitative')
x='sum(name)' X('name', aggregate='sum')
x='sum(name):Q' X('name', aggregate='sum', type='quantitative')