
PdVega: Interactive Vega-Lite Plots for Pandas

pdvega is a library that allows you to quickly create interactive Vega-Lite plots from Pandas dataframes. Its API is nearly identical to Pandas’ built-in plotting API and is designed for easy use within the Jupyter notebook.

import pandas as pd
import numpy as np
data = pd.DataFrame({'x': np.random.randn(200),
                     'y': np.random.randn(200)})

import pdvega  # adds vgplot attribute to pandas
data.vgplot.scatter('x', 'y')

The result is an interactive plot rendered using Vega-Lite, a visualization specification that lets users declaratively describe, via a well-defined JSON schema, which data features should map to which visual features. This yields attractive, dynamic data visualizations with a minimum of boilerplate.
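To make the idea of a declarative specification concrete, here is a hand-written sketch of a Vega-Lite scatter-plot specification, expressed as a plain Python dict and built from the same kind of DataFrame. The `scatter_spec` helper is hypothetical and not part of pdvega, and the Vega-Lite v2 schema URL is an assumption based on the vega3 Jupyter extension; the spec pdvega actually generates may add further settings (interactivity, sizing) and differ in detail.

```python
import json

import numpy as np
import pandas as pd


def scatter_spec(df, x, y):
    """Hypothetical helper: a minimal Vega-Lite (v2) scatter-plot spec for two columns."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
        # Inline the data as a list of row records: [{"x": ..., "y": ...}, ...]
        "data": {"values": df[[x, y]].to_dict(orient="records")},
        "mark": "point",
        # The declarative part: which data field drives which visual channel
        "encoding": {
            "x": {"field": x, "type": "quantitative"},
            "y": {"field": y, "type": "quantitative"},
        },
    }


data = pd.DataFrame({'x': np.random.randn(200),
                     'y': np.random.randn(200)})
spec = scatter_spec(data, 'x', 'y')

# The spec is plain JSON, so any Vega-Lite renderer can display it
print(json.dumps(spec["encoding"], indent=2))
```

Nothing in the spec says how to draw axes or points; the renderer derives those from the field-to-channel mapping, which is what keeps the boilerplate small.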

pdvega aims to make the construction of these specifications more accessible to Python users, via a familiar plotting API.

Quick Start

pdvega is designed to be used primarily with the Jupyter notebook. To get started, first install pdvega with the following commands:

$ pip install pdvega
$ jupyter nbextension install --sys-prefix --py vega3

(For details on installation and dependencies, see Installing and Using pdvega.)

With the package installed and imported, you can use the vgplot attribute of Pandas Series and DataFrame objects to quickly create a Vega-Lite plot. For convenience, we will load example datasets using the vega_datasets package:

# load a dataframe containing stock price time-series
from vega_datasets import data
stocks = data.stocks(pivoted=True)

# importing pdvega adds the `vgplot` attribute to pandas objects
import pdvega

stocks.vgplot.line()

Notice that plots created with pdvega are interactive by default: you can use your mouse or trackpad to pan and zoom the plot.
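In Vega-Lite, this kind of interactivity is typically declared in the spec itself: an interval selection bound to the chart's scales turns mouse drags into panning and wheel scrolling into zooming. The sketch below assumes Vega-Lite v2 syntax and shows a spec that behaves roughly like `stocks.vgplot.line()`. It reshapes a small, made-up wide-form frame (one column per ticker, like `data.stocks(pivoted=True)`) into the long form Vega-Lite works with, then adds the scale-bound selection. The toy values and the selection name `grid` are illustrative; pdvega's generated spec may differ.

```python
import pandas as pd

# Made-up wide-form frame: one row per date, one column per ticker (toy values)
wide = pd.DataFrame(
    {"AAPL": [1.0, 2.0, 3.0], "MSFT": [2.0, 1.5, 2.5]},
    index=pd.Index(["2000-01-01", "2000-02-01", "2000-03-01"], name="date"),
)

# Vega-Lite prefers long ("tidy") data: one row per (date, symbol, price)
long = wide.reset_index().melt(id_vars="date", var_name="symbol", value_name="price")

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
    "data": {"values": long.to_dict(orient="records")},
    "mark": "line",
    "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"field": "price", "type": "quantitative"},
        # One colored line per former column
        "color": {"field": "symbol", "type": "nominal"},
    },
    # Interval selection bound to the scales: drag to pan, scroll to zoom
    "selection": {"grid": {"type": "interval", "bind": "scales"}},
}
```

Binding the selection to the scales, rather than drawing a brush rectangle, is what makes the selection act as a pan/zoom control.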

By design, pdvega has a plotting API that is nearly identical to Pandas’ existing matplotlib API; just replace data.plot with data.vgplot, where data refers to any Pandas Series or DataFrame object:

# create a matplotlib line plot
stocks.plot.line(y='AAPL', alpha=0.5)
# create a vega line plot
stocks.vgplot.line(y='AAPL', alpha=0.5)

pdvega does not (yet?) support every argument accepted by the DataFrame.plot methods, but it covers the most commonly used ones.
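One way to picture this partial support is as a translation table from matplotlib-style keyword arguments to Vega-Lite mark properties, for example `alpha` to `opacity`. The helper below is purely hypothetical: its name, its table, and its choice to raise `NotImplementedError` for unknown arguments are assumptions made for illustration, not pdvega's actual mapping or error behavior.

```python
def mark_from_kwargs(mark_type, **kwargs):
    """Hypothetical: translate a few matplotlib-style kwargs into a Vega-Lite mark definition."""
    # Assumed correspondence between plotting keywords and Vega-Lite mark properties;
    # this is NOT pdvega's real table
    kwarg_to_mark_property = {"alpha": "opacity", "color": "color"}
    mark = {"type": mark_type}
    unsupported = []
    for name, value in kwargs.items():
        if name in kwarg_to_mark_property:
            mark[kwarg_to_mark_property[name]] = value
        else:
            unsupported.append(name)
    # Fail loudly rather than silently ignoring arguments the sketch cannot map
    if unsupported:
        raise NotImplementedError(f"unsupported arguments: {sorted(unsupported)}")
    return mark


print(mark_from_kwargs("line", alpha=0.5))  # → {'type': 'line', 'opacity': 0.5}
```

Raising on unknown arguments is one design choice; a real library might instead warn and drop them.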

To see more examples of visualizations created using the vgplot attribute of pandas Series and DataFrame objects, see Simple Visualizations with data.vgplot.

More Complex Plots

The pdvega package additionally supports many of the more sophisticated plotting routines available in the pandas.plotting submodule; for example, here is a multi-panel scatter-plot matrix of Fisher’s Iris dataset:

iris = data.iris()
pdvega.scatter_matrix(iris, 'species', figsize=(7, 7))

In this plot, you can click and drag for linked panning and zooming, or you can click and drag while holding the SHIFT key to do linked brushing of the points.
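pandas.plotting.scatter_matrix draws one panel for every pair of numeric columns. In Vega-Lite, such a grid can be expressed with the `repeat` operator, which stamps out one view per (row field, column field) pair. The sketch below is an assumption about how a comparable spec could look in Vega-Lite v2 syntax, loosely modeled on Vega-Lite's interactive scatter-plot-matrix example. It includes only the linked pan/zoom selection and omits the SHIFT-key brushing. `scatter_matrix_spec` is a hypothetical helper, not pdvega's implementation, and the two-row frame is toy data using vega_datasets-style iris column names.

```python
import pandas as pd


def scatter_matrix_spec(df, color_col):
    """Hypothetical: a Vega-Lite (v2) scatter-plot matrix built with `repeat`."""
    # Every numeric column (other than the color column) gets one row and one column
    fields = [c for c in df.columns
              if c != color_col and pd.api.types.is_numeric_dtype(df[c])]
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
        "data": {"values": df.to_dict(orient="records")},
        "repeat": {"row": fields, "column": fields},
        "spec": {
            "mark": "point",
            # Scale-bound selection; "global" keeps one shared selection across panels
            "selection": {"grid": {"type": "interval", "bind": "scales",
                                   "resolve": "global"}},
            "encoding": {
                "x": {"field": {"repeat": "column"}, "type": "quantitative"},
                "y": {"field": {"repeat": "row"}, "type": "quantitative"},
                "color": {"field": color_col, "type": "nominal"},
            },
        },
    }


# Toy frame with iris-style column names
iris_toy = pd.DataFrame({"petalLength": [1.4, 4.7],
                         "sepalLength": [5.1, 7.0],
                         "species": ["setosa", "versicolor"]})
spec = scatter_matrix_spec(iris_toy, "species")
```

Because the selection lives inside the repeated spec with global resolution, a pan or zoom in one panel updates the shared selection that the other panels are bound to.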

For more examples of statistical visualizations available in pdvega.plotting, see Statistical Visualization with pdvega.plotting.
