Encodings
The key to creating meaningful visualizations is to map properties of the data
to visual properties in order to effectively communicate information.
In Altair, this mapping of visual properties to data columns is referred to
as an encoding, and is most often expressed through the Chart.encode()
method.
For example, here we will visualize the cars dataset using four of the available encodings: x (the x-axis value), y (the y-axis value), color (the color of the marker), and shape (the shape of the point marker):
import altair as alt
from vega_datasets import data
cars = data.cars()
alt.Chart(cars).mark_point().encode(
x='Horsepower',
y='Miles_per_Gallon',
color='Origin',
shape='Origin'
)
For data specified as a DataFrame, Altair can automatically determine the correct data type for each encoding, and creates appropriate scales and legends to represent the data.
Encoding Channels
Altair provides a number of encoding channels that can be useful in different circumstances; the following tables summarize them:
Position Channels:
Channel | Altair Class | Description | Example |
---|---|---|---|
x | X | The x-axis value | |
y | Y | The y-axis value | |
x2 | X2 | Second x value for ranges | |
y2 | Y2 | Second y value for ranges | |
longitude | Longitude | Longitude for geo charts | |
latitude | Latitude | Latitude for geo charts | |
longitude2 | Longitude2 | Second longitude value for ranges | |
latitude2 | Latitude2 | Second latitude value for ranges | |
xError | XError | The x-axis error value | N/A |
yError | YError | The y-axis error value | N/A |
xError2 | XError2 | The second x-axis error value | N/A |
yError2 | YError2 | The second y-axis error value | N/A |
theta | Theta | The start arc angle | |
theta2 | Theta2 | The end arc angle (radian) | |
Mark Property Channels:
Channel | Altair Class | Description | Example |
---|---|---|---|
angle | Angle | The angle of the mark | |
color | Color | The color of the mark | |
fill | Fill | The fill for the mark | |
fillOpacity | FillOpacity | The opacity of the mark’s fill | N/A |
opacity | Opacity | The opacity of the mark | |
radius | Radius | The radius of the mark | |
shape | Shape | The shape of the mark | |
size | Size | The size of the mark | |
stroke | Stroke | The stroke of the mark | N/A |
strokeDash | StrokeDash | The stroke dash style | |
strokeOpacity | StrokeOpacity | The opacity of the line | N/A |
strokeWidth | StrokeWidth | The width of the line | N/A |
Text and Tooltip Channels:
Channel | Altair Class | Description | Example |
---|---|---|---|
text | Text | Text to use for the mark | |
key | Key | – | N/A |
tooltip | Tooltip | The tooltip value | |
Hyperlink Channel:
Channel | Altair Class | Description | Example |
---|---|---|---|
href | Href | Hyperlink for points | |
Level of Detail Channel:
Channel | Altair Class | Description | Example |
---|---|---|---|
detail | Detail | Additional property to group by | |
Order Channel:
Channel | Altair Class | Description | Example |
---|---|---|---|
order | Order | Sets the order of the marks | |
Facet Channels:
Channel | Altair Class | Description | Example |
---|---|---|---|
column | Column | The column of a faceted plot | |
row | Row | The row of a faceted plot | |
facet | Facet | The row and/or column of a general faceted plot | |
Encoding Data Types
The details of any mapping depend on the type of the data. Altair recognizes five main data types:
Data Type | Shorthand Code | Description |
---|---|---|
quantitative | Q | a continuous real-valued quantity |
ordinal | O | a discrete ordered quantity |
nominal | N | a discrete unordered category |
temporal | T | a time or date value |
geojson | G | a geographic shape |
If types are not specified for data input as a DataFrame, Altair defaults to quantitative for any numeric data, temporal for date/time data, and nominal for string data, but be aware that these defaults are by no means always the correct choice!
The types can either be expressed in a long-form using the channel encoding classes such as X and Y, or in short-form using the Shorthand Syntax discussed below.
For example, the following two methods of specifying the type will lead to
identical plots:
alt.Chart(cars).mark_point().encode(
x='Acceleration:Q',
y='Miles_per_Gallon:Q',
color='Origin:N'
)
alt.Chart(cars).mark_point().encode(
alt.X('Acceleration', type='quantitative'),
alt.Y('Miles_per_Gallon', type='quantitative'),
alt.Color('Origin', type='nominal')
)
The shorthand form, x="name:Q", is useful for its lack of boilerplate when doing quick data explorations. The long-form, alt.X('name', type='quantitative'), is useful when doing more fine-tuned adjustments to the encoding, such as binning, axis and scale properties, or more.
Specifying the correct type for your data is important, as it affects the way Altair represents your encoding in the resulting plot.
Effect of Data Type on Color Scales
As an example of this, here we will represent the same data three different ways, with the color encoded as a quantitative, ordinal, and nominal type, using three vertically-concatenated charts (see Vertical Concatenation):
base = alt.Chart(cars).mark_point().encode(
x='Horsepower:Q',
y='Miles_per_Gallon:Q',
).properties(
width=150,
height=150
)
alt.vconcat(
base.encode(color='Cylinders:Q').properties(title='quantitative'),
base.encode(color='Cylinders:O').properties(title='ordinal'),
base.encode(color='Cylinders:N').properties(title='nominal'),
)
The type specification influences the way Altair, via Vega-Lite, decides on the color scale to represent the value, and influences whether a discrete or continuous legend is used.
Effect of Data Type on Axis Scales
Similarly, for x and y axis encodings, the type used for the data will affect the scales used and the characteristics of the mark. For example, here is the difference between a quantitative and ordinal scale for a column that contains integers specifying a year:
pop = data.population.url
base = alt.Chart(pop).mark_bar().encode(
alt.Y('mean(people):Q', title='mean population')
).properties(
width=200,
height=200
)
alt.hconcat(
base.encode(x='year:Q').properties(title='year=quantitative'),
base.encode(x='year:O').properties(title='year=ordinal')
)
Because quantitative values do not have an inherent width, the bars do not fill the entire space between the values. This view also makes clear the missing year of data that was not immediately apparent when we treated the years as categories.
This kind of behavior is sometimes surprising to new users, but it emphasizes the importance of thinking carefully about your data types when visualizing data: a visual encoding that is suitable for categorical data may not be suitable for quantitative data, and vice versa.
Encoding Channel Options
Each encoding channel allows for a number of additional options to be expressed; these can control things like axis properties, scale properties, headers and titles, binning parameters, aggregation, sorting, and many more.
The particular options that are available vary by encoding type; the various options are listed below.
The X and Y encodings accept the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
axis | An object defining properties of the axis’s gridlines, ticks and labels. Default value: if undefined, default axis properties are applied. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
impute | An object defining the properties of the Impute Operation to be applied. |
scale | An object defining properties of the channel’s scale, which is the function that transforms values in the data domain (numbers, dates, strings, etc.) to visual values (pixels, colors, sizes) of the encoding channels. Default value: if undefined, default scale properties are applied. |
sort | Sort order for the encoded field. The available options differ between continuous fields (quantitative or temporal) and discrete fields. |
stack | Type of stacking offset if the field should be stacked. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The Color, Fill, and Stroke encodings accept the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
condition | One or more value definition(s) with a selection or a test predicate. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
legend | An object defining properties of the legend. Default value: if undefined, default legend properties are applied. |
scale | An object defining properties of the channel’s scale, which is the function that transforms values in the data domain (numbers, dates, strings, etc.) to visual values (pixels, colors, sizes) of the encoding channels. Default value: if undefined, default scale properties are applied. |
sort | Sort order for the encoded field. The available options differ between continuous fields (quantitative or temporal) and discrete fields. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The Shape encoding accepts the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
condition | One or more value definition(s) with a selection or a test predicate. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
legend | An object defining properties of the legend. Default value: if undefined, default legend properties are applied. |
scale | An object defining properties of the channel’s scale, which is the function that transforms values in the data domain (numbers, dates, strings, etc.) to visual values (pixels, colors, sizes) of the encoding channels. Default value: if undefined, default scale properties are applied. |
sort | Sort order for the encoded field. The available options differ between continuous fields (quantitative or temporal) and discrete fields. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The Angle, FillOpacity, Opacity, Size, StrokeOpacity, and StrokeWidth encodings accept the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
condition | One or more value definition(s) with a selection or a test predicate. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
legend | An object defining properties of the legend. Default value: if undefined, default legend properties are applied. |
scale | An object defining properties of the channel’s scale, which is the function that transforms values in the data domain (numbers, dates, strings, etc.) to visual values (pixels, colors, sizes) of the encoding channels. Default value: if undefined, default scale properties are applied. |
sort | Sort order for the encoded field. The available options differ between continuous fields (quantitative or temporal) and discrete fields. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The Row and Column encodings accept the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
align | The alignment to apply to the row/column facet’s subplots. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
center | Boolean flag indicating if the facet’s subviews should be centered relative to their respective rows or columns. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
header | An object defining properties of a facet’s header. |
sort | Sort order for the encoded field. The available options differ between continuous fields (quantitative or temporal) and discrete fields. |
spacing | The spacing in pixels between the facet’s sub-views. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The Facet encoding accepts the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
align | The alignment to apply to grid rows and columns, given either as a string value or as an object value. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
bounds | The bounds calculation method to use for determining the extent of a sub-plot. One of ‘full’ or ‘flush’. |
center | Boolean flag indicating if subviews should be centered relative to their respective rows or columns. An object value can be given instead. |
columns | The number of columns to include in the view composition layout. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
header | An object defining properties of a facet’s header. |
sort | Sort order for the encoded field. The available options differ between continuous fields (quantitative or temporal) and discrete fields. |
spacing | The spacing in pixels between sub-views of the composition operator. An object value can be given instead. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The Text encoding accepts the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
condition | One or more value definition(s) with a selection or a test predicate. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
format | Formatting pattern for the field’s values; its interpretation depends on whether the default or a custom format type is used. See the format documentation for more examples. Default value: derived from the numberFormat config for number format and from the timeFormat config for time format. |
formatType | The format type for labels. |
labelExpr | Vega expression for customizing label text. The label text and value can be accessed from within the expression. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The Description, Href, Tooltip, and Url encodings accept the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
condition | One or more value definition(s) with a selection or a test predicate. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
format | Formatting pattern for the field’s values; its interpretation depends on whether the default or a custom format type is used. See the format documentation for more examples. Default value: derived from the numberFormat config for number format and from the timeFormat config for time format. |
formatType | The format type for labels. |
labelExpr | Vega expression for customizing label text. The label text and value can be accessed from within the expression. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The Detail and Key encodings accept the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The Latitude and Longitude encodings accept the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The Latitude2, Longitude2, Radius2, Theta2, X2, Y2, XError, YError, XError2, and YError2 encodings accept the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
The Order encoding accepts the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
sort | The sort order. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The Radius and Theta encodings accept the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
scale | An object defining properties of the channel’s scale, which is the function that transforms values in the data domain (numbers, dates, strings, etc.) to visual values (pixels, colors, sizes) of the encoding channels. Default value: if undefined, default scale properties are applied. |
sort | Sort order for the encoded field. The available options differ between continuous fields (quantitative or temporal) and discrete fields. |
stack | Type of stacking offset if the field should be stacked. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
The StrokeDash encoding accepts the following options:
Property | Description |
---|---|
aggregate | Aggregation function for the field. |
band | Applies differently to rect-based marks and to other marks; for other marks, it is the relative position on a band of a stacked, binned, time unit or band scale. |
bin | A flag for binning the field. |
condition | One or more value definition(s) with a selection or a test predicate. |
field | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values. |
legend | An object defining properties of the legend. Default value: if undefined, default legend properties are applied. |
scale | An object defining properties of the channel’s scale, which is the function that transforms values in the data domain (numbers, dates, strings, etc.) to visual values (pixels, colors, sizes) of the encoding channels. Default value: if undefined, default scale properties are applied. |
sort | Sort order for the encoded field. The available options differ between continuous fields (quantitative or temporal) and discrete fields. |
timeUnit | Time unit for the field. |
title | A title for the field. Default value: derived from the field’s name and transformation function. |
type | The type of measurement. Vega-Lite automatically infers data types in many cases, but an explicit type is still required in some cases, for example for certain fields that are not nominal. |
Binning and Aggregation
Beyond simple channel encodings, Altair’s visualizations are built on the concept of database-style grouping and aggregation; that is, the split-apply-combine abstraction that underpins many data analysis approaches.
For example, building a histogram from a one-dimensional dataset involves splitting data based on the bin it falls in, aggregating the results within each bin using a count of the data, and then combining the results into a final figure.
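To see the three steps in isolation, here is a minimal sketch in plain pandas, outside of Altair. The values and bin edges are made up for illustration: pd.cut does the splitting, and a grouped count does the apply and combine steps.

```python
import pandas as pd

# Made-up one-dimensional dataset.
values = pd.Series([3, 7, 12, 18, 22, 27, 31])

# Split: assign each value to a half-open bin [0, 10), [10, 20), ...
bins = pd.cut(values, bins=[0, 10, 20, 30, 40], right=False)

# Apply + combine: count the values in each bin and collect the results.
counts = values.groupby(bins, observed=False).count()
print(counts)
```

The resulting per-bin counts are exactly the bar heights a histogram of these values would show.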
In Altair, such an operation looks like this:
alt.Chart(cars).mark_bar().encode(
alt.X('Horsepower', bin=True),
y='count()'
# could also use alt.Y(aggregate='count', type='quantitative')
)