The Same Scatter Plot with 5 Different Python Libraries

Yogesh Dhande
Mar 31, 2024

Data visualization is an essential skill in data analysis and data science. Graphical representations make patterns in complex datasets easier to see.

Python offers a plethora of libraries tailored for data visualization, each with unique features and capabilities. In this article, we’ll explore how to create the same scatter plot using five different Python data visualization libraries: Matplotlib, Seaborn, Plotly, Altair, and Bokeh, using the well-known Iris dataset.

The Iris Dataset

The Iris dataset is a classic in the data science community. It contains 150 samples of iris flowers from three species, each described by four features: sepal length, sepal width, petal length, and petal width. We’ll use this dataset to create a scatter plot of sepal length against petal length, with the points colored by species.

Loading the Dataset

from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris(as_frame=True)
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
iris_df.head()
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm) species
0                5.1               3.5                1.4               0.2  setosa
1                4.9               3.0                1.4               0.2  setosa
2                4.7               3.2                1.3               0.2  setosa
3                4.6               3.1                1.5               0.2  setosa
4                5.0               3.6                1.4               0.2  setosa
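As a quick sanity check on the numbers quoted above (150 samples, three species), the sketch below rebuilds the same DataFrame and counts the rows per species. The variable names `n_rows`, `n_cols`, and `species_counts` are illustrative and not part of the original article.

```python
from sklearn.datasets import load_iris
import pandas as pd

# Rebuild the DataFrame the same way as in the loading step
iris = load_iris(as_frame=True)
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Four feature columns plus the species column
n_rows, n_cols = iris_df.shape
print(n_rows, n_cols)

# Number of samples per species
species_counts = iris_df['species'].value_counts().to_dict()
print(species_counts)
```

The dataset is balanced, with 50 samples for each species, so no species dominates the scatter plot.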

Visualization Libraries

1. Matplotlib

Matplotlib is a foundational library in Python’s data visualization ecosystem, offering extensive control over plots.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Draw one scatter series per species so each gets its own color and legend entry
for species, group in iris_df.groupby('species', observed=True):
    ax.scatter(group['sepal length (cm)'], group['petal length (cm)'], label=species)
ax.legend()
ax.set_xlabel('Sepal Length (cm)')
ax.set_ylabel('Petal Length (cm)')
plt.show()
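To illustrate the "extensive control" Matplotlib offers, here is a minimal sketch that customizes markers, transparency, grid, and legend, then writes the figure to a PNG file. The marker mapping, the `Agg` backend (chosen so the script runs without a display), and the file name `iris_matplotlib.png` are assumptions made for this example, not part of the original article.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is opened
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
iris_df = iris.data.copy()
iris_df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Hypothetical styling choice: a different marker shape for each species
markers = {"setosa": "o", "versicolor": "s", "virginica": "^"}

fig, ax = plt.subplots(figsize=(6, 4))
for species, group in iris_df.groupby("species", observed=True):
    ax.scatter(
        group["sepal length (cm)"],
        group["petal length (cm)"],
        marker=markers[species],
        alpha=0.7,
        edgecolors="black",
        label=species,
    )
ax.set_xlabel("Sepal length (cm)")
ax.set_ylabel("Petal length (cm)")
ax.set_title("Iris: sepal vs. petal length")
ax.legend(title="Species")
ax.grid(True, linestyle=":", linewidth=0.5)

# Save a static image instead of showing an interactive window
fig.savefig("iris_matplotlib.png", dpi=150, bbox_inches="tight")
```

Every element here (markers, grid style, legend title, output resolution) is set explicitly, which is the trade-off Matplotlib makes: more lines of code in exchange for fine-grained control.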


2. Seaborn

Seaborn is built on Matplotlib and provides a high-level interface for drawing attractive statistical graphics.

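import matplotlib.pyplot as plt
import seaborn as sns

# hue='species' colors the points by species and adds a legend automatically
sns.scatterplot(data=iris_df, x='sepal length (cm)', y='petal length (cm)', hue='species')
plt.show()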

3. Plotly

Plotly is known for its interactive plots, allowing users to zoom, pan, and hover over the data points for more information.

import plotly.express as px

# color='species' creates one trace per species, each with its own legend entry
fig = px.scatter(iris_df, x='sepal length (cm)', y='petal length (cm)', color='species')
fig.show()

4. Altair

Altair offers a simple and intuitive syntax for declarative statistical visualization, focusing on building a visual grammar.

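import altair as alt

# ':N' marks species as a nominal (categorical) field
chart = alt.Chart(iris_df).mark_point().encode(
    x='sepal length (cm)',
    y='petal length (cm)',
    color='species:N'
)
# In a Jupyter notebook, ending the cell with `chart` also renders it
chart.show()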

5. Bokeh

Bokeh excels in creating interactive and web-ready plots perfect for dashboards and data applications.

from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.transform import factor_cmap

source = ColumnDataSource(iris_df)
species_list = iris_df['species'].unique().tolist()

p = figure()
# factor_cmap assigns one palette color to each species name
p.scatter(
    'sepal length (cm)', 'petal length (cm)',
    source=source,
    legend_field='species',
    color=factor_cmap('species', 'Category10_3', species_list),
)
# Legend customization
p.legend.title = 'Species'
p.legend.location = 'top_left'
show(p)
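To make the "web-ready" point concrete, here is a minimal sketch that renders the same plot to a standalone HTML document with `file_html` instead of opening it with `show`. It assumes the species column is stored as plain strings (a simplification made for this example), and the file name `iris_bokeh.html` and page title are placeholders.

```python
from bokeh.embed import file_html
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.resources import CDN
from bokeh.transform import factor_cmap
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
iris_df = iris.data.copy()
# Assumption: store species as plain Python strings rather than a categorical
species_list = [str(name) for name in iris.target_names]
iris_df["species"] = [species_list[i] for i in iris.target]

source = ColumnDataSource(iris_df)

p = figure(
    title="Iris: sepal vs. petal length",
    x_axis_label="Sepal length (cm)",
    y_axis_label="Petal length (cm)",
)
p.scatter(
    "sepal length (cm)", "petal length (cm)",
    source=source,
    legend_field="species",
    color=factor_cmap("species", "Category10_3", species_list),
    size=8,
)
p.legend.title = "Species"
p.legend.location = "top_left"

# Render the plot as a complete HTML page that loads BokehJS from a CDN
html = file_html(p, CDN, "Iris scatter plot")
with open("iris_bokeh.html", "w", encoding="utf-8") as f:
    f.write(html)
```

The resulting file can be opened directly in a browser or served as a static page, and the pan and zoom tools keep working without a Python process.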

Each of these five libraries offers something unique. Matplotlib and Seaborn are excellent for static and straightforward plots, while Plotly and Bokeh shine with their interactive capabilities. Altair simplifies the process of creating complex visualizations with a concise syntax.

By exploring these libraries, you can select the most appropriate tool based on your project requirements, whether for exploratory data analysis, interactive web applications, or scientific publications.
