{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Transformations\n", "\n", "One important piece of the visualization pipeline is **data transformation**.\n", "\n", "With Altair, you have two possible routes for data transformation; namely:\n", "\n", "1. pre-transformation in Python\n", "2. transformation in Altair/Vega-Lite" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import altair as alt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate Transform\n", "\n", "As an example, let's take a look at transforming some input data that is not encoded in the most intuitive manner.\n", "The ``population`` dataset lists aggregated US census data by year, sex, and age, but lists the sex as \"1\" and \"2\", which makes charts labels not particularly intuitive:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from vega_datasets import data\n", "population = data.population()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yearagesexpeople
01850011483789
11850021450376
21850511411067
31850521359668
418501011260099
\n", "
" ], "text/plain": [ " year age sex people\n", "0 1850 0 1 1483789\n", "1 1850 0 2 1450376\n", "2 1850 5 1 1411067\n", "3 1850 5 2 1359668\n", "4 1850 10 1 1260099" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "population.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(population).mark_bar().encode(\n", " x='year:O',\n", " y='sum(people):Q',\n", " color='sex:N'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One way we could address this from Python is to use tools in Pandas to re-map these column names; for example:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "population['men_women'] = population['sex'].map({1: 'Men', 2: 'Women'})\n", "\n", "alt.Chart(population).mark_bar().encode(\n", " x='year:O',\n", " y='sum(people):Q',\n", " color='men_women:N'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But Altair is designed to be used with URL-based data as well, in which such pre-processing is not available.\n", "In these situations, it is better to make the transformation *part of the plot specification*.\n", "Here this can be done via the ``transform_calculate`` method, which accepts a [Vega Expression](https://vega.github.io/vega/docs/expressions/), which is essentially a string that can contain a small subset of javascript operations:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# undo our addition of a column above...\n", "population = population.drop('men_women', axis=1)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(population).mark_bar().encode(\n", " x='year:O',\n", " y='sum(people):Q',\n", " color='men_women:N'\n", ").transform_calculate(\n", " men_women='datum.sex == 1 ? \"Men\" : \"Women\"'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The one potentially confusing piece is the presence of the word \"datum\": this is simply the convention by which Vega expressions refer to a row of the data.\n", "\n", "If you would prefer to build these expressions in Python, Altair provides a lightweight API to do so:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from altair.expr import datum, if_\n", "\n", "alt.Chart(population).mark_bar().encode(\n", " x='year:O',\n", " y='sum(people):Q',\n", " color='men_women:N'\n", ").transform_calculate(\n", " men_women=if_(datum.sex == 1, \"Men\", \"Women\")\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filter Transform\n", "\n", "The filter transform is similar. For example, suppose you would like to create a chart consisting only of the male population from these census records.\n", "As above, this could be done from Pandas, but it is useful to have this operation available within the chart specification as well.\n", "It can be done via the ``transform_filter()`` method:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(population).mark_bar().encode(\n", " x='year:O',\n", " y='sum(people):Q',\n", ").transform_filter(\n", " datum.sex == 1\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have seen this ``transform_filter`` method before, when we filtered based on the result of a selection." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other Transforms\n", "\n", "Other transform methods are available, and though we won't demonstrate them here, there are examples available in the Altair [Transform documentation](https://altair-viz.github.io/user_guide/transform/index.html).\n", "\n", "Altair provides a number of useful transforms. Some will be quite familiar:\n", "\n", "- ``transform_aggregate()``\n", "- ``transform_bin()``\n", "- ``transform_timeunit()``\n", "\n", "These three transforms accomplish exactly the types of operations we discussed in [03-Binning-and-aggregation](03-Binning-and-aggregation.ipynb), with the distinction that they result in the creation of a new named value that can be referenced in multiple places throughout the chart.\n", "\n", "Many other transforms exist as well, such as:\n", "\n", "- ``transform_lookup()``: this lets you perform one-sided joins of multiple datasets. It is used often, for example, in geographic visualizations where you join data (such as unemployment within states) to data about geographic regions used to represent that data\n", "- ``transform_window()``: this lets you perform aggregates across sliding windows, for example computing local means of data. It was recently added to Vega-Lite, and so the Altair API for this transform is not yet very convenient.\n", "\n", "Again visit the [Transform documentation](https://altair-viz.github.io/user_guide/transform/index.html) for a more complete list." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise\n", "\n", "Take the following data:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "x = pd.DataFrame({'x': np.linspace(-5, 5)})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Create a chart based on this data, and plot sine and cosine curves using Altair's ``transform_calculate`` API. \n", "2. Use ``transform_filter`` on this chart, and remove the regions of the plot where the value of the cosine curve is less than the value of the sine curve." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }