Encodings

The key to creating meaningful visualizations is to map properties of the data to visual properties in order to communicate information effectively. In Altair, this mapping of visual properties to data columns is referred to as an encoding, and is most often expressed through the Chart.encode() method.

For example, here we will visualize the cars dataset using four of the available encodings: x (the x-axis value), y (the y-axis value), color (the color of the marker), and shape (the shape of the point marker):

import altair as alt
from vega_datasets import data
cars = data.cars()

alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    shape='Origin'
)

For data specified as a DataFrame, Altair can automatically determine the correct data type for each encoding, and creates appropriate scales and legends to represent the data.

Encoding Channels

Altair provides a number of encoding channels that can be useful in different circumstances; the following tables summarize them:

Position Channels:

| Channel | Altair Class | Description | Example |
|---|---|---|---|
| x | X | The x-axis value | Simple Scatter Plot |
| y | Y | The y-axis value | Simple Scatter Plot |
| x2 | X2 | Second x value for ranges | Error Bars showing Confidence Interval |
| y2 | Y2 | Second y value for ranges | Line chart with Confidence Interval Band |
| longitude | Longitude | Longitude for geo charts | Locations of US Airports |
| latitude | Latitude | Latitude for geo charts | Locations of US Airports |
| longitude2 | Longitude2 | Second longitude value for ranges | N/A |
| latitude2 | Latitude2 | Second latitude value for ranges | N/A |

Mark Property Channels:

| Channel | Altair Class | Description | Example |
|---|---|---|---|
| color | Color | The color of the mark | Simple Heatmap |
| fill | Fill | The fill for the mark | N/A |
| opacity | Opacity | The opacity of the mark | Horizon Graph |
| shape | Shape | The shape of the mark | N/A |
| size | Size | The size of the mark | Table Bubble Plot (GitHub Punch Card) |
| stroke | Stroke | The stroke of the mark | N/A |

Text and Tooltip Channels:

| Channel | Altair Class | Description | Example |
|---|---|---|---|
| text | Text | Text to use for the mark | Simple Scatter Plot with Labels |
| key | Key | | N/A |
| tooltip | Tooltip | The tooltip value | Scatter Plot with Tooltips |

Hyperlink Channel:

| Channel | Altair Class | Description | Example |
|---|---|---|---|
| href | Href | Hyperlink for points | N/A |

Level of Detail Channel:

| Channel | Altair Class | Description | Example |
|---|---|---|---|
| detail | Detail | Additional property to group by | Selection Detail Example |

Order Channel:

| Channel | Altair Class | Description | Example |
|---|---|---|---|
| order | Order | Sets the order of the marks | Connected Scatterplot (Lines with Custom Paths) |

Facet Channels:

| Channel | Altair Class | Description | Example |
|---|---|---|---|
| column | Column | The column of a faceted plot | Trellis Scatter Plot |
| row | Row | The row of a faceted plot | Becker’s Barley Trellis Plot |

Data Types

The details of any mapping depend on the type of the data. Altair recognizes four main data types:

| Data Type | Shorthand Code | Description |
|---|---|---|
| quantitative | Q | a continuous real-valued quantity |
| ordinal | O | a discrete ordered quantity |
| nominal | N | a discrete unordered category |
| temporal | T | a time or date value |

If types are not specified for data input as a DataFrame, Altair defaults to quantitative for any numeric data, temporal for date/time data, and nominal for string data, but be aware that these defaults are by no means always the correct choice!

The types can be expressed either in long form, using the channel encoding classes such as X and Y, or in short form, using the Shorthand Syntax discussed below. For example, the following two ways of specifying the type lead to identical plots:

alt.Chart(cars).mark_point().encode(
    x='Acceleration:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
)

alt.Chart(cars).mark_point().encode(
    alt.X('Acceleration', type='quantitative'),
    alt.Y('Miles_per_Gallon', type='quantitative'),
    alt.Color('Origin', type='nominal')
)

The shorthand form, x="name:Q", is useful for its lack of boilerplate during quick data exploration. The long form, alt.X('name', type='quantitative'), is useful when making finer adjustments to the encoding, such as binning or setting axis and scale properties.

Specifying the correct type for your data is important, as it affects the way Altair represents your encoding in the resulting plot.

Effect of Data Type on Color Scales

As an example of this, here we will represent the same data three different ways, with the color encoded as a quantitative, ordinal, and nominal type, using three vertically-concatenated charts (see Vertical Concatenation):

base = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
).properties(
    width=150,
    height=150
)

alt.vconcat(
    base.encode(color='Cylinders:Q').properties(title='quantitative'),
    base.encode(color='Cylinders:O').properties(title='ordinal'),
    base.encode(color='Cylinders:N').properties(title='nominal'),
)

The type specification influences the way Altair, via Vega-Lite, decides on the color scale to represent the value, and influences whether a discrete or continuous legend is used.

Effect of Data Type on Axis Scales

Similarly, for x and y axis encodings, the type used for the data will affect the scales used and the characteristics of the mark. For example, here is the difference between a quantitative and an ordinal scale for a column that contains integers specifying a year:

pop = data.population.url

base = alt.Chart(pop).mark_bar().encode(
    alt.Y('mean(people):Q', axis=alt.Axis(title='mean population'))
).properties(
    width=200,
    height=200
)

alt.hconcat(
    base.encode(x='year:Q').properties(title='year=quantitative'),
    base.encode(x='year:O').properties(title='year=ordinal')
)

In Altair, quantitative axis scales start at zero by default, while ordinal scales are limited to the values within the data.

If we override the default of including zero in the axis, we see that even then the precise appearance of the marks representing the data is affected by the data type:

base.encode(
    alt.X('year:Q',
        scale=alt.Scale(zero=False)
    )
)

Because quantitative values do not have an inherent width, the bars do not fill the entire space between the values. This view also makes clear the missing year of data that was not immediately apparent when we treated the years as categories.

This kind of behavior is sometimes surprising to new users, but it emphasizes the importance of thinking carefully about your data types when visualizing data: a visual encoding that is suitable for categorical data may not be suitable for quantitative data, and vice versa.

Encoding Channel Options

Each encoding channel allows for a number of additional options to be expressed; these can control things like axis properties, scale properties, headers and titles, binning parameters, aggregation, sorting, and many more.

The particular options that are available vary by encoding type; the various options are listed below.

The X and Y encodings accept the following options:

| Property | Type | Description |
|---|---|---|
| aggregate | Aggregate | Aggregation function for the field (e.g., mean, sum, median, min, max, count). Default value: undefined (None) |
| axis | anyOf(Axis, null) | An object defining properties of the axis’s gridlines, ticks and labels. If null, the axis for the encoding channel will be removed. Default value: If undefined, default axis properties are applied. |
| bin | anyOf(boolean, BinParams) | A flag for binning a quantitative field, or an object defining binning parameters. If true, default binning parameters will be applied. Default value: false |
| field | anyOf(string, RepeatRef) | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values from the repeat operator (https://vega.github.io/vega-lite/docs/repeat.html). Note: Dots (.) and brackets ([ and ]) can be used to access nested objects (e.g., "field": "foo.bar" and "field": "foo['bar']"). If field names contain dots or brackets but are not nested, you can use \\ to escape dots and brackets (e.g., "a\\.b" and "a\\[0\\]"). See more details about escaping in the field documentation. Note: field is not required if aggregate is count. |
| scale | anyOf(Scale, null) | An object defining properties of the channel’s scale, which is the function that transforms values in the data domain (numbers, dates, strings, etc.) to visual values (pixels, colors, sizes) of the encoding channels. If null, the scale will be disabled and the data value will be directly encoded. Default value: If undefined, default scale properties are applied. |
| sort | anyOf(array(string), SortOrder, SortField, null) | Sort order for the encoded field. Supported sort values include "ascending", "descending", null (no sorting), or an array specifying the preferred order of values. For fields with discrete domains, sort can also be a sort field definition object. For sort as an array specifying the preferred order of values, the sort order will obey the values in the array, followed by any unspecified values in their original order. Default value: "ascending" |
| stack | anyOf(StackOffset, null) | |
| timeUnit | TimeUnit | Time unit (e.g., year, yearmonth, month, hours) for a temporal field, or for a temporal field that is cast as ordinal. Default value: undefined (None) |
| title | anyOf(string, null) | A title for the field. If null, the title will be removed. Default value: derived from the field’s name and transformation function (aggregate, bin and timeUnit). If the field has an aggregate function, the function is displayed as part of the title (e.g., "Sum of Profit"). If the field is binned or has a time unit applied, the applied function is shown in parentheses (e.g., "Profit (binned)", "Transaction Date (year-month)"). Otherwise, the title is simply the field name. Notes: 1) You can customize the default field title format by providing the fieldTitle property in the config, or the fieldTitle function via the compile function’s options (compile.html#field-title). 2) If both the field definition’s title and an axis, header, or legend title are defined, the axis/header/legend title will be used. |
| type | Type | The encoded field’s type of measurement ("quantitative", "temporal", "ordinal", or "nominal"). It can also be a "geojson" type for encoding ‘geoshape’. |

The Color, Fill, Opacity, Shape, Size, and Stroke encodings accept the following options:

| Property | Type | Description |
|---|---|---|
| aggregate | Aggregate | Aggregation function for the field (e.g., mean, sum, median, min, max, count). Default value: undefined (None) |
| bin | anyOf(boolean, BinParams) | A flag for binning a quantitative field, or an object defining binning parameters. If true, default binning parameters will be applied. Default value: false |
| condition | anyOf(ConditionalValueDef, array(ConditionalValueDef)) | One or more value definition(s) with a selection predicate. Note: A field definition’s condition property can only contain value definitions, since Vega-Lite allows at most one encoded field per encoding channel. |
| field | anyOf(string, RepeatRef) | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values from the repeat operator (https://vega.github.io/vega-lite/docs/repeat.html). Note: Dots (.) and brackets ([ and ]) can be used to access nested objects (e.g., "field": "foo.bar" and "field": "foo['bar']"). If field names contain dots or brackets but are not nested, you can use \\ to escape dots and brackets (e.g., "a\\.b" and "a\\[0\\]"). See more details about escaping in the field documentation. Note: field is not required if aggregate is count. |
| legend | anyOf(Legend, null) | An object defining properties of the legend. If null, the legend for the encoding channel will be removed. Default value: If undefined, default legend properties are applied. |
| scale | anyOf(Scale, null) | An object defining properties of the channel’s scale, which is the function that transforms values in the data domain (numbers, dates, strings, etc.) to visual values (pixels, colors, sizes) of the encoding channels. If null, the scale will be disabled and the data value will be directly encoded. Default value: If undefined, default scale properties are applied. |
| sort | anyOf(array(string), SortOrder, SortField, null) | Sort order for the encoded field. Supported sort values include "ascending", "descending", null (no sorting), or an array specifying the preferred order of values. For fields with discrete domains, sort can also be a sort field definition object. For sort as an array specifying the preferred order of values, the sort order will obey the values in the array, followed by any unspecified values in their original order. Default value: "ascending" |
| timeUnit | TimeUnit | Time unit (e.g., year, yearmonth, month, hours) for a temporal field, or for a temporal field that is cast as ordinal. Default value: undefined (None) |
| title | anyOf(string, null) | A title for the field. If null, the title will be removed. Default value: derived from the field’s name and transformation function (aggregate, bin and timeUnit). If the field has an aggregate function, the function is displayed as part of the title (e.g., "Sum of Profit"). If the field is binned or has a time unit applied, the applied function is shown in parentheses (e.g., "Profit (binned)", "Transaction Date (year-month)"). Otherwise, the title is simply the field name. Notes: 1) You can customize the default field title format by providing the fieldTitle property in the config, or the fieldTitle function via the compile function’s options (compile.html#field-title). 2) If both the field definition’s title and an axis, header, or legend title are defined, the axis/header/legend title will be used. |
| type | Type | The encoded field’s type of measurement ("quantitative", "temporal", "ordinal", or "nominal"). It can also be a "geojson" type for encoding ‘geoshape’. |

The Row and Column encodings accept the following options:

| Property | Type | Description |
|---|---|---|
| aggregate | Aggregate | Aggregation function for the field (e.g., mean, sum, median, min, max, count). Default value: undefined (None) |
| bin | anyOf(boolean, BinParams) | A flag for binning a quantitative field, or an object defining binning parameters. If true, default binning parameters will be applied. Default value: false |
| field | anyOf(string, RepeatRef) | Required. A string defining the name of the field from which to pull a data value, or an object defining iterated values from the repeat operator (https://vega.github.io/vega-lite/docs/repeat.html). Note: Dots (.) and brackets ([ and ]) can be used to access nested objects (e.g., "field": "foo.bar" and "field": "foo['bar']"). If field names contain dots or brackets but are not nested, you can use \\ to escape dots and brackets (e.g., "a\\.b" and "a\\[0\\]"). See more details about escaping in the field documentation. Note: field is not required if aggregate is count. |
| header | Header | An object defining properties of a facet’s header. |
| sort | SortOrder | Sort order for a facet field. This can be "