Visualizing Data

Data visualization can help you explore, understand, and gain insight from your data. Visualization can complement other methods of data analysis by taking advantage of the human ability to recognize patterns in visual information.

GraphLab Canvas provides an interactive browser-based visualization platform to explore your data. The data structures that support visualization in Canvas include SFrame, SArray, and SGraph. Each of these data structures can be shown in Canvas by calling the show method on an instance of one of those types. Canvas supports two render targets: 'browser' (the interactive browser-based experience) and 'ipynb' (visualizations are embedded in an IPython Notebook output cell).
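As a minimal sketch of switching between the two render targets, the snippet below wraps the selection in a small helper. The helper name `show_in` and the sample data are hypothetical. The call `gl.canvas.set_target` is assumed to be the way the target is selected in your installed version, so check it against the API reference before relying on it.

```python
try:
    import graphlab as gl
except ImportError:
    gl = None  # GraphLab Create is not installed; Canvas calls are skipped

# The two render targets described above.
RENDER_TARGETS = ('browser', 'ipynb')


def show_in(target, data):
    """Hypothetical helper: pick a Canvas render target, then show `data`.

    `data` is expected to be an SFrame, SArray, or SGraph.
    Returns True if Canvas was invoked, False if GraphLab Create is missing.
    """
    if target not in RENDER_TARGETS:
        raise ValueError('unknown Canvas target: %r' % (target,))
    if gl is None:
        return False
    gl.canvas.set_target(target)  # assumed to apply to later show() calls
    data.show()
    return True


if gl is not None:
    # Illustrative data only.
    sf = gl.SFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})
    show_in('browser', sf)
```

Inside an IPython Notebook, passing 'ipynb' instead of 'browser' would embed the output in the cell rather than opening a browser tab.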

SFrame Visualization

Data in an SFrame can be visualized with SFrame.show(). Table and Summary are the two types of visualizations (see note 1) for SFrame, each represented by a tab in the Canvas user interface. The Table view provides a scrollable, interactive tabular view of the data inside the SFrame. Like the SFrame itself, the Table view can scale to as much data as will fit on disk -- only the rows being viewed are loaded. The paging control on the left side of the view allows you to move quickly through the SFrame or jump to a particular row.

Example of Canvas SFrame Table View

The Summary view shows which columns are in the SFrame, with a summary of the data inside each column. Numeric column types (int and float) show a box plot, str columns show a table of the most frequently occurring items in the column, and recursive column types (dict, list, and array) show some combination of those plots depending on the type of the underlying data.

Example of Canvas SFrame Summary View

The Summary view supports drill-down visualization: any column within it can be visualized by clicking on its title. For example, clicking on "categories" brings up an SArray view. The SArray Visualization section below describes how to visualize an SArray programmatically.

Example of Canvas Drill-down View

An SFrame can also be visualized with integrated bivariate plot types. Currently supported types include Scatter plot, Heatmap, Bar chart, and Box plot. A plot type can either be specified through the API or explored under the "Plots" tab.
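The sketch below shows one way a bivariate plot might be requested through the API. It assumes that SFrame.show accepts `view`, `x`, and `y` keyword arguments and that 'Scatter Plot' is a valid view name. Both are assumptions to verify against the SFrame.show reference for your version. The helper `show_bivariate` and the column names are hypothetical.

```python
try:
    import graphlab as gl
except ImportError:
    gl = None  # GraphLab Create is not installed; the usage below is skipped


def show_bivariate(sf, view, x, y):
    """Hypothetical helper: check the column names, then request a plot.

    `sf` is any object with column_names() and show(), such as an SFrame.
    """
    columns = sf.column_names()
    missing = [name for name in (x, y) if name not in columns]
    if missing:
        raise KeyError('columns not in SFrame: %s' % ', '.join(missing))
    # Assumed signature: show(view=..., x=..., y=...)
    return sf.show(view=view, x=x, y=y)


if gl is not None:
    # Illustrative data only.
    sf = gl.SFrame({'height': [150.0, 162.5, 171.0, 180.2],
                    'weight': [52.0, 61.3, 70.8, 82.1]})
    show_bivariate(sf, 'Scatter Plot', 'height', 'weight')
```

Checking the column names first gives a clear error in Python before Canvas is asked to render anything.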

Example of Canvas Scatter Plot

Example of Canvas Bar Chart

Example of Canvas Heatmap View

Example of Canvas Box Plot

SArray Visualization

Data in an SArray can be visualized with SArray.show(). Canvas has a specialized visualization for each supported dtype in SArray. The visualization types currently supported (based on column dtype) are:

dtype   Visualization
-----   -------------
float   Histogram of quantiles (approximated histogram based on sketch_summary quantiles)
int     Histogram of quantiles and (if there are any items with >0.01% occurrence) table of most frequent items
str     Table of most frequent items
array   Histogram of quantiles (aggregate or per-subcolumn)
list    Table of most frequent items (aggregate)
dict    Filterable table of most frequent keys, with aggregate or filtered values visualized according to value dtype
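To make the two building blocks in the table concrete, here is a plain-Python illustration of a histogram derived from quantiles and of a most-frequent-items table with an occurrence threshold. It is not Canvas's implementation. Canvas works from the approximate sketch_summary quantiles, while this sketch computes exact quantiles from a sorted list. The function names and the bin count are assumptions chosen for illustration.

```python
from collections import Counter


def quantile_bins(values, num_bins=4):
    """Return (left, right, density) tuples for equal-mass bins.

    Bin edges are quantiles of the data, so every bin holds the same
    share of the values and narrow bins indicate dense regions.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Pick edges at evenly spaced ranks of the sorted data.
    edges = [ordered[(i * (n - 1)) // num_bins] for i in range(num_bins + 1)]
    mass = 1.0 / num_bins
    bins = []
    for left, right in zip(edges[:-1], edges[1:]):
        width = right - left
        density = mass / width if width > 0 else float('inf')
        bins.append((left, right, density))
    return bins


def frequent_items(values, min_share=0.0001):
    """Return (item, count) pairs whose share exceeds min_share.

    The default of 0.0001 corresponds to the 0.01% occurrence
    threshold mentioned for int columns. Most frequent items come first.
    """
    total = float(len(values))
    counts = Counter(values)
    return [(item, count) for item, count in counts.most_common()
            if count / total > min_share]
```

For example, `quantile_bins(range(101))` yields four bins of width 25, and `frequent_items(['a', 'a', 'b'], min_share=0.5)` keeps only 'a'.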

Example of Canvas SArray Numeric View

Example of Canvas SArray Categorical View

Example of Canvas SArray Dictionary View

SGraph Visualization

Graph structure can be visualized with SGraph.show(). The vertices and edges are laid out in a two-dimensional plot, with vertices drawn as circles and edges as lines between them. The default layout algorithm is force-directed, but it is possible to apply a custom layout algorithm with the vertex_positions parameter. In general, SGraph.show offers parameterization of the size, position, and color of vertices and edges, which can be used to great effect on graphs representing different types of data.
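As a sketch of a custom layout, the snippet below places vertices evenly on a circle and hands the coordinates to SGraph.show. It assumes that vertex_positions takes a two-element list naming the vertex columns that hold the X and Y coordinates, and that vlabel='id' labels vertices by ID. Verify both against the SGraph.show reference. The helper `circular_layout`, the column names 'x' and 'y', and the sample graph are hypothetical.

```python
import math

try:
    import graphlab as gl
except ImportError:
    gl = None  # GraphLab Create is not installed; the usage below is skipped


def circular_layout(vertex_ids, radius=1.0):
    """Place vertices evenly on a circle; returns {vertex_id: (x, y)}."""
    n = len(vertex_ids)
    positions = {}
    for i, vid in enumerate(vertex_ids):
        angle = 2.0 * math.pi * i / n
        positions[vid] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


if gl is not None:
    ids = ['a', 'b', 'c', 'd']
    pos = circular_layout(ids)
    # Store the coordinates as vertex columns so show() can read them.
    vertices = gl.SFrame({'__id': ids,
                          'x': [pos[v][0] for v in ids],
                          'y': [pos[v][1] for v in ids]})
    edges = gl.SFrame({'__src': ['a', 'b', 'c', 'd'],
                       '__dst': ['b', 'c', 'd', 'a']})
    g = gl.SGraph(vertices=vertices, edges=edges)
    g.show(vlabel='id', vertex_positions=['x', 'y'])
```

Any layout that produces one (x, y) pair per vertex could be substituted for the circle, for instance coordinates that come from the data itself.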

Example of Canvas SGraph View

Model Visualization

Models such as Recommender and Classifier can be visually inspected by calling model.show(). The model visualization provides a model summary view, a model evaluation view, and a model comparison view. More details about how to use the model comparison visualization can be found in the documentation for gl.compare() and gl.show_comparison().

The summary view provides basic statistics about the training data as well as the model training time.

Example of Canvas Model Summary View

The model evaluation view provides model-specific evaluation metrics such as precision-recall. The hover tooltip shows more details about the model performance at a specific cutoff value. The model comparison view shows multiple models in the same view space and offers interactive highlighting to support focused analysis.
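To illustrate what a precision-recall reading at one cutoff means, here is a plain-Python calculation for a binary classifier. It assumes the cutoff is a score threshold, so an item is predicted positive when its score is at or above the cutoff. For a recommender the cutoff may instead be the number of recommended items, which this sketch does not cover. The function name and sample data are hypothetical, and this is not how Canvas computes its plots.

```python
def precision_recall_at(cutoff, scores, labels):
    """Return (precision, recall) for predictions with score >= cutoff.

    `labels` holds the true classes as 1 (positive) or 0 (negative).
    A value is None when it is undefined (no predicted or no actual positives).
    """
    predicted = [score >= cutoff for score in scores]
    true_pos = sum(1 for p, y in zip(predicted, labels) if p and y)
    false_pos = sum(1 for p, y in zip(predicted, labels) if p and not y)
    false_neg = sum(1 for p, y in zip(predicted, labels) if not p and y)

    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / float(predicted_pos) if predicted_pos else None
    recall = true_pos / float(actual_pos) if actual_pos else None
    return precision, recall


# Illustrative scores and true labels only.
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1, 0, 1, 0]
curve = [(c, precision_recall_at(c, scores, labels)) for c in (0.3, 0.5)]
```

Lowering the cutoff from 0.5 to 0.3 in this sample raises recall because the second positive item is now predicted positive.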

Example of Canvas Model Evaluation View

Example of Canvas Model Comparison View

Integration with other visualization tools

There are many other software tools that can help with different types of data visualization. Integrating with GraphLab Create is easy, since we provide methods on SFrame, SArray, and SGraph to transform and retrieve the underlying data as native Python types. One commonly used Python module for visualization is matplotlib.

Here is an example of a histogram displayed with matplotlib. For this particular use, it might be easier to simply call show on a numeric SArray, which will also give a histogram -- this example is intended to demonstrate interoperability with other Python visualization packages.

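import graphlab as gl
import numpy as np
from matplotlib import pyplot as plt

# Draw 100,000 samples from a gamma distribution (shape 1) into an SArray.
sa = gl.SArray([np.random.gamma(1) for _ in range(100000)])
# Convert the SArray to a native Python list so matplotlib can consume it.
n, bins, patches = plt.hist(list(sa))
# Overlay the bin edges returned by plt.hist as a line.
plt.plot(bins)
plt.show()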

Example of SArray in matplotlib

1. Note: in the 'ipynb' target, only the Summary visualization of SFrame is supported.