
PdVega: Interactive Vega-Lite Plots for Pandas
pdvega is a library that allows you to quickly create interactive
Vega-Lite plots from Pandas dataframes, using an API that is nearly
identical to Pandas’ built-in plotting API,
and designed for easy use within the Jupyter notebook.
import pandas as pd
import numpy as np
data = pd.DataFrame({'x': np.random.randn(200),
                     'y': np.random.randn(200)})
import pdvega  # adds vgplot attribute to pandas
data.vgplot.scatter('x', 'y')
The result is an interactive plot rendered using Vega-Lite, a visualization specification that allows users to declaratively describe which data features should map to which visualization features using a well-defined JSON schema. This approach yields beautiful, dynamic data visualizations with a minimum of boilerplate.
pdvega aims to make the construction of these specifications
more accessible to Python users, via a familiar plotting API.
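To make the idea of a declarative specification concrete, here is a minimal sketch of the kind of Vega-Lite JSON that a scatter plot corresponds to. It uses only the standard library. The helper name `scatter_spec` is hypothetical, and the output is illustrative rather than the exact specification that pdvega generates.

```python
import json


def scatter_spec(records, x, y):
    """Build a minimal Vega-Lite scatter-plot spec as a plain dict.

    `scatter_spec` is a hypothetical helper for illustration only;
    it is not part of pdvega.
    """
    return {
        "data": {"values": records},  # inline data, one dict per row
        "mark": "point",              # draw each row as a point
        "encoding": {
            # map the named data fields to the x and y channels
            "x": {"field": x, "type": "quantitative"},
            "y": {"field": y, "type": "quantitative"},
        },
    }


rows = [{"x": 0.1, "y": 1.2}, {"x": -0.4, "y": 0.3}]
spec = scatter_spec(rows, "x", "y")
print(json.dumps(spec, indent=2))
```

The point of the sketch is that the plot is described entirely by data: which field maps to which channel, and which mark to draw. A call such as data.vgplot.scatter('x', 'y') spares you from writing this structure by hand.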
Quick Start
pdvega is designed to be used primarily with the Jupyter notebook.
To get started, first install pdvega with the following commands:
$ pip install pdvega
$ jupyter nbextension install --sys-prefix --py vega3
For details on installation and dependencies, see Installing and Using pdvega.
With the package installed and imported, you can use the vgplot attribute
of Pandas Series and DataFrame objects to quickly create a Vega-Lite
plot. For convenience here, we will load example datasets using the
vega_datasets package:
# load a dataframe containing stock price time-series
from vega_datasets import data
stocks = data.stocks(pivoted=True)
# importing pdvega adds the `vgplot` attribute to pandas objects
import pdvega
stocks.vgplot.line()
Notice that by default plots created with pdvega are interactive: you can
use your mouse or track pad to pan and zoom the plot.
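In Vega-Lite, this kind of pan-and-zoom behavior is itself declared in the specification. The sketch below assumes the Vega-Lite 2 `selection` syntax, in which an interval selection bound to the scales makes a chart pannable and zoomable. The helper `make_interactive` is hypothetical and is not pdvega's API; it only illustrates what "interactive by default" can mean at the specification level.

```python
import copy


def make_interactive(spec, name="grid"):
    """Return a copy of a Vega-Lite spec with pan/zoom enabled.

    Hypothetical helper, assuming the Vega-Lite 2 `selection` syntax:
    an interval selection bound to the chart's scales.
    """
    new_spec = copy.deepcopy(spec)  # leave the caller's spec untouched
    selections = new_spec.setdefault("selection", {})
    selections[name] = {"type": "interval", "bind": "scales"}
    return new_spec


# A static line-chart spec with illustrative field names.
static = {
    "mark": "line",
    "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"field": "AAPL", "type": "quantitative"},
    },
}
interactive = make_interactive(static)
```

Because the interaction is part of the JSON, no Python callback is needed once the plot is rendered; the front end handles the mouse events.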
By design, pdvega has a plotting API that is nearly identical to Pandas’
existing matplotlib API;
just replace data.plot with data.vgplot, where
data refers to any Pandas Series or DataFrame object:
# create a matplotlib line plot
stocks.plot.line(y='AAPL', alpha=0.5)
# create a vega line plot
stocks.vgplot.line(y='AAPL', alpha=0.5)
pdvega does not (yet?) support every argument accepted by the
DataFrame.plot methods, but it covers the most commonly used ones.
To see more examples of visualizations created using the vgplot attribute
of pandas Series and DataFrame objects, see Simple Visualizations with data.vgplot.
More Complex Plots
The pdvega package additionally supports many of the more sophisticated
plotting routines available in the
pandas.plotting
submodule; for example, here is a multi-panel scatter-plot matrix of Fisher’s
Iris dataset:
iris = data.iris()
pdvega.scatter_matrix(iris, 'species', figsize=(7, 7))
In this plot, you can click and drag for linked panning and zooming, or you can click and drag while holding the SHIFT key to do linked brushing of the points.
For more examples of statistical visualizations available in
pdvega.plotting, see Statistical Visualization with pdvega.plotting.
Documentation
pdvega is MIT-licensed and the source is available on GitHub. If any questions or issues come up as you use it, please get in touch via GitHub Issues.