PdVega: Interactive Vega-Lite Plots for Pandas
pdvega is a library that allows you to quickly create interactive Vega-Lite plots from Pandas dataframes, using an API that is nearly identical to Pandas’ built-in plotting API and designed for easy use within the Jupyter notebook.
import pandas as pd
import numpy as np
data = pd.DataFrame({'x': np.random.randn(200),
                     'y': np.random.randn(200)})
import pdvega # adds vgplot attribute to pandas
data.vgplot.scatter('x', 'y')
The result is an interactive plot rendered using Vega-Lite, a visualization specification that lets users declaratively describe which data features map to which visual features using a well-defined JSON schema. This yields beautiful, dynamic data visualizations with a minimum of boilerplate.
pdvega aims to make the construction of these specifications more accessible to Python users via a familiar plotting API.
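To make the idea of a declarative specification concrete, here is a minimal hand-written sketch of a Vega-Lite scatter-plot spec built as a Python dictionary. It is an illustration only: the three data records are made up, and the spec that pdvega actually generates for `data.vgplot.scatter('x', 'y')` is not shown in this document and may differ.

```python
import json

# Hypothetical inline data; in practice the records would come from a DataFrame.
records = [
    {"x": 0.1, "y": 1.2},
    {"x": -0.4, "y": 0.3},
    {"x": 1.5, "y": -0.7},
]

# A minimal Vega-Lite spec: which mark to draw, and which data field
# maps to which visual channel. This is a sketch, not pdvega's output.
spec = {
    "data": {"values": records},
    "mark": "point",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}

# The spec is plain JSON, so it can be serialized and handed to a renderer.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Nothing in the spec says how to draw the points; it only states the mapping from fields to channels, which is the boilerplate pdvega is meant to write for you.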
Quick Start
pdvega is designed to be used primarily with the Jupyter notebook. To get started, first install pdvega with the following commands:
$ pip install pdvega
$ jupyter nbextension install --sys-prefix --py vega3
(For details on installation and dependencies, see Installing and Using pdvega.)
With the package installed and imported, you can use the vgplot attribute of Pandas Series and DataFrame objects to quickly create a Vega-Lite plot. For convenience here, we will load example datasets using the vega_datasets package:
# load a dataframe containing stock price time-series
from vega_datasets import data
stocks = data.stocks(pivoted=True)
# importing pdvega adds the `vgplot` attribute to pandas objects
import pdvega
stocks.vgplot.line()
Notice that by default plots created with pdvega are interactive: you can use your mouse or track pad to pan and zoom the plot.
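For readers curious how pan and zoom can be expressed declaratively, the sketch below adds an interval selection bound to the scales, which is how Vega-Lite 2 style specs describe this behavior. Whether pdvega emits exactly this structure is an assumption; the helper name `with_pan_zoom`, the selection name `grid`, and the field names `date` and `price` are all hypothetical.

```python
import copy


def with_pan_zoom(spec, name="grid"):
    """Return a copy of a Vega-Lite 2 style spec with pan/zoom enabled.

    Hypothetical helper for illustration; not part of pdvega's API.
    """
    out = copy.deepcopy(spec)  # leave the caller's spec untouched
    # An interval selection bound to the scales lets dragging pan the view
    # and scrolling zoom it.
    out.setdefault("selection", {})[name] = {"type": "interval", "bind": "scales"}
    return out


# A static line-chart spec with assumed field names.
static_spec = {
    "mark": "line",
    "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"field": "price", "type": "quantitative"},
    },
}

interactive_spec = with_pan_zoom(static_spec)
```

The interaction is part of the specification itself rather than Python callback code, so the plot stays interactive wherever the spec is rendered.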
By design, pdvega has a plotting API that is nearly identical to Pandas’ existing matplotlib API; just replace data.plot with data.vgplot, where data refers to any Pandas Series or DataFrame object:
# create a matplotlib line plot
stocks.plot.line(y='AAPL', alpha=0.5)
# create a vega line plot
stocks.vgplot.line(y='AAPL', alpha=0.5)
pdvega does not (yet?) support every argument accepted by the DataFrame.plot methods, but it covers the most commonly used ones.
To see more examples of visualizations created using the vgplot attribute of Pandas Series and DataFrame objects, see Simple Visualizations with data.vgplot.
More Complex Plots
The pdvega package additionally supports many of the more sophisticated plotting routines available in the pandas.plotting submodule; for example, here is a multi-panel scatter-plot matrix of Fisher’s Iris dataset:
iris = data.iris()
pdvega.scatter_matrix(iris, 'species', figsize=(7, 7))
In this plot, you can click and drag for linked panning and zooming, or you can click and drag while holding the SHIFT key to do linked brushing of the points.
For more examples of statistical visualizations available in pdvega.plotting, see Statistical Visualization with pdvega.plotting.
Documentation
pdvega is MIT-licensed and the source is available on GitHub. If any questions or issues come up as you use it, please get in touch via GitHub Issues.