How to generate scatter plots with matplotlib.pyplot.scatter in Python

Before you dive into plotting scatter plots with matplotlib, your data needs to be in a shape that matplotlib can easily digest. At its core, matplotlib expects sequences of x and y values – typically lists, arrays, or pandas Series. If your data is coming from a CSV, a database, or some other source, the first step is cleaning and organizing it into these sequences.

One common pitfall is mixing data types. If your x-values are timestamps and y-values are floats, make sure the timestamps are parsed into a type matplotlib can place on an axis, such as Python datetime objects or pandas datetime64 values. Timestamps left as strings are treated as categorical labels rather than points in time.

Here’s a quick example of prepping data using pandas:

import pandas as pd
data = pd.read_csv('data.csv')

# Suppose 'time' column is in string format, convert it to datetime
data['time'] = pd.to_datetime(data['time'])

# Extract x and y for plotting
x = data['time']
y = data['value']

Now, if you’re working with large datasets, it’s wise to pre-aggregate or filter the data before plotting. Plotting millions of points will slow down your program and clutter your graph. Use pandas groupby or filtering to slim down the dataset:

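# Aggregate by day (use .dt.floor with an hourly frequency for hourly bins).
# numeric_only=True skips text columns such as 'category', which cannot be averaged.
daily_data = data.groupby(data['time'].dt.date).mean(numeric_only=True)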

x = daily_data.index
y = daily_data['value']
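
Aggregation is only one way to slim the data down. The sketch below shows the filtering route mentioned above, plus random sampling as a further option. The synthetic DataFrame, the date window, and the 1,000-row cap are all assumptions made for illustration, so substitute your own data and limits.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV data: one reading per minute for a week.
rng = np.random.default_rng(0)
n_rows = 7 * 24 * 60
data = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=n_rows, freq="min"),
    "value": rng.normal(size=n_rows),
})

# Option 1: keep only the rows inside a window of interest (two full days here).
in_window = (data["time"] >= "2024-01-02") & (data["time"] < "2024-01-04")
window = data[in_window]

# Option 2: draw a reproducible random sample of at most 1,000 rows.
sample = data.sample(n=min(1000, len(data)), random_state=0)

print(len(data), len(window), len(sample))
```

Filtering keeps every point in the range you care about, while sampling preserves the overall shape of the full dataset at a fraction of the drawing cost.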

Another thing to keep in mind is missing or NaN values. scatter silently skips any point whose x or y is NaN, so missing data can disappear from the plot without warning. Handle it explicitly beforehand so you know what was dropped:

# Drop rows with missing values
data = data.dropna(subset=['time', 'value'])
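
Before dropping rows, it can help to see how much data is actually missing. The following is a minimal sketch on a made-up four-row table (the values are invented for illustration); it coerces unparseable entries to NaN/NaT first so that they are counted and dropped along with the genuinely missing ones.

```python
import pandas as pd

# Tiny invented dataset with a missing timestamp, a missing value, and a bad value.
data = pd.DataFrame({
    "time": ["2024-01-01", "2024-01-02", None, "2024-01-04"],
    "value": ["1.5", "oops", "3.0", None],
})

# Coerce unparseable entries to NaT / NaN instead of raising an error.
data["time"] = pd.to_datetime(data["time"], errors="coerce")
data["value"] = pd.to_numeric(data["value"], errors="coerce")

# Count missing entries per column before deciding what to drop.
missing = data[["time", "value"]].isna().sum()
print(missing)

# Keep only rows where both plotting columns are present.
clean = data.dropna(subset=["time", "value"])
print(len(clean), "of", len(data), "rows kept")
```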

Finally, if you want to differentiate points by categories, prepare a separate color or marker array. If you have a ‘category’ column with labels, you can map each label to an integer code that a colormap will later turn into a color:

# Assign each distinct label an integer code, in order of first appearance
categories = data['category'].unique()
color_map = {cat: i for i, cat in enumerate(categories)}
colors = data['category'].map(color_map)
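
To show where those integer codes end up, here is a small sketch that passes them to scatter through the c parameter and builds a legend from the result. The six-row DataFrame, the 'x' and 'y' column names, and the choice of the tab10 colormap are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented example data with a label column.
data = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 1, 5, 3, 6],
    "category": ["a", "b", "a", "c", "b", "c"],
})

# Same mapping as above: label -> integer code.
categories = data["category"].unique()
color_map = {cat: i for i, cat in enumerate(categories)}
colors = data["category"].map(color_map)

fig, ax = plt.subplots()
# vmin/vmax pin the codes to the ten discrete colors of tab10.
points = ax.scatter(data["x"], data["y"], c=colors, cmap="tab10", vmin=0, vmax=9)

# legend_elements() returns one handle per distinct code, in ascending order,
# which matches the order of `categories` because codes were assigned by enumerate.
handles, _ = points.legend_elements()
ax.legend(handles, categories, title="category")
```

Call plt.show() afterwards to display the figure.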

With your data prepped like this, you’re in a solid place to start plotting. It’s not glamorous, but trust me, getting this right saves hours of debugging later.

Customizing scatter plots for maximum clarity and style

Customizing scatter plots is where matplotlib really shines. The default plot is functional but bland, and a few tweaks can transform your chart from “meh” to “wow.”

Start with the marker parameter to change the shape of each point. Common markers include circles ('o'), squares ('s'), triangles ('^'), and more. This is especially important when you want to represent different groups distinctly without relying solely on color.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 7, 4, 6, 8]

plt.scatter(x, y, marker='^')  # triangle markers
plt.show()

Color is another powerful tool. Use the c parameter to specify colors for each point. This can be a single color string or an array of values mapped to a colormap. When dealing with continuous data, colormaps like viridis or plasma are great choices.

import numpy as np

x = np.random.rand(50)
y = np.random.rand(50)
values = np.random.rand(50)  # continuous variable to map color

plt.scatter(x, y, c=values, cmap='viridis')
plt.colorbar()  # adds a legend for the colors
plt.show()

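Adjusting the size of points with the s parameter can add clarity, especially when encoding a third dimension of data. The value of s is the marker area in points squared, not its width, so doubling s doubles the area while the width grows only by a factor of about 1.4. To double a marker's width, quadruple s.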

sizes = np.random.rand(50) * 100  # scale sizes between 0 and 100

plt.scatter(x, y, s=sizes, alpha=0.6)
plt.show()
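
Because s is an area, one way to get markers of a chosen width is to square that width before passing it in. The sketch below assumes three arbitrary target widths in points; the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# Desired marker widths in points; each is twice the previous one.
widths = np.array([5, 10, 20])

# s is measured in points squared, so square the width to get the area.
marker_areas = widths ** 2
print(marker_areas)

fig, ax = plt.subplots()
ax.scatter([1, 2, 3], [1, 1, 1], s=marker_areas)
ax.set_xlim(0, 4)
```

Each marker here is twice as wide as the one before it, which takes four times the s value. Call plt.show() to display the result.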

Transparency, controlled by the alpha parameter, helps when points overlap heavily. Setting alpha between 0.3 and 0.7 usually strikes a balance between visibility and clutter.

If you want to combine multiple aesthetics—color, size, and shape—here’s a quick example that uses all three:

categories = np.random.choice(['A', 'B', 'C'], size=50)
color_map = {'A': 'red', 'B': 'green', 'C': 'blue'}
marker_map = {'A': 'o', 'B': 's', 'C': '^'}

for cat in np.unique(categories):
    idx = categories == cat
    plt.scatter(
        x[idx], y[idx],
        c=color_map[cat],
        s=sizes[idx],
        marker=marker_map[cat],
        alpha=0.7,
        label=f'Category {cat}'
    )

plt.legend()
plt.show()

Don’t neglect axis labels, titles, and gridlines. They’re not just decoration. Use plt.xlabel(), plt.ylabel(), and plt.title() to add context, and turn on gridlines with plt.grid(True) to improve readability.

For example:

plt.scatter(x, y, c=values, cmap='plasma', s=50, alpha=0.8)
plt.xlabel('X Axis Label')
plt.ylabel('Y Axis Label')
plt.title('Customized Scatter Plot')
plt.grid(True)
plt.colorbar()
plt.show()

Finally, if you’re combining scatter plots with other plot types or subplots, use the object-oriented interface (fig, ax = plt.subplots()) for more control:

fig, ax = plt.subplots()
scatter = ax.scatter(x, y, c=values, cmap='coolwarm', s=60, alpha=0.6)
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_title('Scatter with Object-Oriented API')
fig.colorbar(scatter, ax=ax)
plt.show()
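
As a rough illustration of combining a scatter plot with other plot types, the sketch below places a scatter with a fitted line next to a histogram. The random data, the linear fit via np.polyfit, and the two-panel layout are assumptions chosen for the example, not a prescribed recipe.

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented data: y depends roughly linearly on x, plus noise.
rng = np.random.default_rng(42)
x = rng.random(50)
y = 2 * x + rng.normal(scale=0.2, size=50)

# Least-squares straight line through the points.
slope, intercept = np.polyfit(x, y, 1)

# One row, two columns: each Axes object is styled independently.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: scatter with the fitted line drawn on the same axes.
ax_left.scatter(x, y, alpha=0.6)
xs = np.linspace(0, 1, 100)
ax_left.plot(xs, slope * xs + intercept, color="black", label="linear fit")
ax_left.set_title("Scatter with fit")
ax_left.legend()

# Right panel: distribution of the y values.
ax_right.hist(y, bins=15)
ax_right.set_title("Distribution of y")

fig.tight_layout()
```

Holding a separate Axes object for each panel is what makes this manageable, since every call states explicitly which panel it modifies. Call plt.show() to display the figure.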