scatterplot

Codes: sns.scatterplot()
Notes: Scatter plot to show relationships between two numerical variables.
Status: Done

The sns.scatterplot() function in Seaborn is a fundamental tool for visualizing relationships between two numerical variables. It creates a scatter plot where each point represents an observation.

sns.scatterplot(
    data=None,
    *,
    x=None,
    y=None,
    hue=None,
    size=None,
    style=None,
    palette=None,
    hue_order=None,
    hue_norm=None,
    sizes=None,
    size_order=None,
    size_norm=None,
    markers=True,
    style_order=None,
    legend="auto",
    ax=None,
    **kwargs,
)

Key Parameters

Primary Parameters

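  • data: The dataset to visualize, typically a pandas DataFrame (or another array-like/mapping).
  • x: Variable plotted on the x-axis (a column name in data, or a vector of values).
  • y: Variable plotted on the y-axis (a column name in data, or a vector of values).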

Aesthetic Mappings

  • hue: Differentiates points by color based on a categorical or numerical variable.
  • style: Differentiates points by marker style based on a categorical variable.
  • size: Differentiates points by size based on a numerical or categorical variable.
  • palette: Specifies the color palette for the hue variable.

Order and Sizes

  • hue_order: Specifies the order of categories for the hue variable.
  • size_order: Specifies the order of categories for the size variable.
  • sizes: Marker sizes (areas) for the size variable, e.g., a (min_size, max_size) tuple, or a list/dict of explicit sizes.

Legend

  • legend: Controls the display of the legend ("auto", "brief", "full", or False).
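A minimal sketch comparing two legend settings side by side. The DataFrame and its columns (x, y, group) are hypothetical toy data, not part of the Titanic example:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: two numeric columns and one categorical column
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 3, 5, 7, 6],
    "group": ["a", "b", "a", "b", "a", "b"],
})

fig, (ax_full, ax_none) = plt.subplots(1, 2, figsize=(10, 4))

# legend="full": one legend entry per hue level
sns.scatterplot(data=df, x="x", y="y", hue="group", legend="full", ax=ax_full)

# legend=False: points are still colored by group, but no legend is drawn
sns.scatterplot(data=df, x="x", y="y", hue="group", legend=False, ax=ax_none)

plt.show()
```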

Axes

  • ax: Matplotlib Axes object to draw the plot on.

Additional Styling (**kwargs)

  • Any other keyword arguments are passed through to Matplotlib's Axes.scatter, e.g., alpha, linewidth, edgecolor, marker.
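A short sketch of keyword arguments being forwarded to Matplotlib. The data is a hypothetical toy DataFrame; the specific values (marker "D", s=80, and so on) are arbitrary choices for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 1, 3, 2]})

fig, ax = plt.subplots()

# marker, s, alpha, edgecolor and linewidth are not seaborn parameters;
# they are forwarded to matplotlib's Axes.scatter
sns.scatterplot(
    data=df, x="x", y="y",
    marker="D", s=80, alpha=0.5, edgecolor="black", linewidth=1.5,
    ax=ax,
)
plt.show()
```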

Dataset Preparation

First, let's load and inspect the Titanic dataset.

import seaborn as sns
import matplotlib.pyplot as plt

# Load Titanic dataset
titanic = sns.load_dataset('titanic')

# Display first few rows of the dataset
titanic.head()
   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult_male  deck  embark_town  alive  alone
0         0       3    male  22.0      1      0   7.2500         S  Third    man        True   NaN  Southampton     no  False
1         1       1  female  38.0      1      0  71.2833         C  First  woman       False     C    Cherbourg    yes  False
2         1       3  female  26.0      0      0   7.9250         S  Third  woman       False   NaN  Southampton    yes   True
3         1       1  female  35.0      1      0  53.1000         S  First  woman       False     C  Southampton    yes  False
4         0       3    male  35.0      0      0   8.0500         S  Third    man        True   NaN  Southampton     no   True
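The deck column already shows NaN in the first rows, and age also has missing values. A minimal sketch, on a small hypothetical stand-in for titanic[['age', 'fare']], of counting missing values and checking that rows missing x or y are not drawn:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for titanic[['age', 'fare']] with two missing ages
df = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "fare": [7.25, 71.28, 7.93, 53.10, 8.05],
})

# Count missing values per column before plotting
print(df.isna().sum())

fig, ax = plt.subplots()
sns.scatterplot(data=df, x="age", y="fare", ax=ax)

# Rows with a missing x or y value are dropped when the points are drawn
n_plotted = len(ax.collections[0].get_offsets())
print(n_plotted)
plt.show()
```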

1. Basic Scatter Plot

# Template: df and the column names are placeholders for your own data
sns.scatterplot(data=df, x='Variable1', y='Variable2')
plt.title("Basic Scatter Plot")
plt.show()

Use Case: Visualizing the relationship between two continuous variables.

sns.scatterplot(data=titanic, x='age', y='fare')
plt.title("Basic Scatter Plot")
plt.show()
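sns.scatterplot returns the Matplotlib Axes it drew on, so axis labels and the title can be set on that object instead of through plt. A minimal sketch using hypothetical toy values in place of the Titanic columns:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy values standing in for titanic's age and fare columns
df = pd.DataFrame({"age": [22, 38, 26, 35, 54], "fare": [7.3, 71.3, 7.9, 53.1, 51.9]})

fig, ax = plt.subplots()

# The returned object is the Axes passed in (or the current Axes if ax is omitted)
ax = sns.scatterplot(data=df, x="age", y="fare", ax=ax)
ax.set(xlabel="Age (years)", ylabel="Fare", title="Basic Scatter Plot")
plt.show()
```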

2. Scatter Plot with hue

# Template: df and the column names are placeholders for your own data
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category')
plt.title("Scatter Plot with Hue")
plt.show()

Key Parameter: hue differentiates points by color.

Use the hue parameter to color points based on survival status (survived).

sns.scatterplot(data=titanic, x='age', y='fare', hue='survived')
plt.title("Scatter Plot with Hue")
plt.legend(title="Survived")
plt.show()
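plt.legend() builds a new legend from the labeled artists on the current Axes. An alternative that keeps the legend seaborn created and only changes its title is to retitle it through the Axes. A sketch on hypothetical toy data with a 0/1 survival column:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with a 0/1 outcome column like `survived`
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 35, 54],
    "fare": [7.3, 71.3, 7.9, 53.1, 8.1, 51.9],
    "survived": [0, 1, 1, 1, 0, 0],
})

fig, ax = plt.subplots()
sns.scatterplot(data=df, x="age", y="fare", hue="survived", ax=ax)

# Retitle the existing legend; its handles and labels are left untouched
ax.get_legend().set_title("Survived")
plt.show()
```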

3. Scatter Plot with style


# Template: df and the column names are placeholders for your own data
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category', style='SubCategory')
plt.title("Scatter Plot with Hue and Style")
plt.show()

Key Parameter: style adds marker shapes based on a categorical variable.

Differentiate points by the class of the ticket (class).

sns.scatterplot(data=titanic, x='age', y='fare', hue='survived', style='class')
plt.title("Scatter Plot with Hue and Style")
plt.legend(title="Survived/Class")
plt.show()
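The markers and style_order parameters from the signature control which shape each style level gets and the order of the levels in the legend. A sketch on hypothetical toy data; the marker choices ("o" and "X", both filled markers) are arbitrary:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with a categorical column used for marker style
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 35, 54],
    "fare": [7.3, 71.3, 7.9, 53.1, 8.1, 51.9],
    "sex": ["male", "female", "female", "female", "male", "male"],
})

fig, ax = plt.subplots()

# markers maps each style level to a matplotlib marker;
# style_order fixes the order in which the levels are listed
sns.scatterplot(
    data=df, x="age", y="fare", style="sex",
    markers={"male": "o", "female": "X"},
    style_order=["female", "male"],
    ax=ax,
)
plt.show()
```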

4. Scatter Plot with size

# Template: df and the column names are placeholders for your own data
sns.scatterplot(data=df, x='Variable1', y='Variable2', size='NumericCategory', sizes=(20, 200))
plt.title("Scatter Plot with Variable Sizes")
plt.show()

Key Parameter: size maps a variable to the size of points.

Map the size of points to the number of siblings/spouses aboard (sibsp).

sns.scatterplot(data=titanic, x='age', y='fare', size='sibsp', sizes=(20, 200))
plt.title("Scatter Plot with Variable Sizes")
plt.legend(title="Number of Siblings/Spouses")
plt.show()
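For a numeric size variable, a (min, max) tuple is mapped linearly by default: the smallest value gets the minimum marker area and the largest gets the maximum (areas are in points squared, as in Matplotlib's scatter). A sketch that reads the resulting areas back, using a hypothetical small count column:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data; `sibsp` here is a made-up small count column
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 35, 54],
    "fare": [7.3, 71.3, 7.9, 53.1, 8.1, 51.9],
    "sibsp": [1, 1, 0, 1, 0, 3],
})

fig, ax = plt.subplots()
sns.scatterplot(data=df, x="age", y="fare", size="sibsp", sizes=(20, 200), ax=ax)

# Per-point marker areas assigned by the size mapping
areas = ax.collections[0].get_sizes()
print(areas.min(), areas.max())
plt.show()
```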

5. Using palette for Custom Colors

# Template: df and the column names are placeholders for your own data
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category', palette='coolwarm')
plt.title("Scatter Plot with Custom Palette")
plt.show()
  • Key Parameter: palette customizes the colors for the hue variable.
  • palette has no effect unless hue is also set.
  • Apply a custom palette to the survival status (hue).
sns.scatterplot(data=titanic, x='age', y='fare', hue='survived', palette='dark')
plt.title("Scatter Plot with Custom Palette")
plt.legend(title="Survived")
plt.show()
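Besides a named palette, palette also accepts a dict that assigns an explicit color to each hue level (every level present in the column needs an entry). A sketch with hypothetical string survival labels; "tab:red" and "tab:blue" are standard Matplotlib color names chosen for illustration:

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with string-valued survival labels
df = pd.DataFrame({
    "age": [22, 38, 26, 35],
    "fare": [7.3, 71.3, 7.9, 53.1],
    "survived": ["no", "yes", "yes", "no"],
})

# One explicit color per hue level
palette = {"no": "tab:red", "yes": "tab:blue"}

fig, ax = plt.subplots()
sns.scatterplot(data=df, x="age", y="fare", hue="survived", palette=palette, ax=ax)

# Read back each point's face color as a hex string
point_colors = [mcolors.to_hex(c) for c in ax.collections[0].get_facecolors()]
print(point_colors)
plt.show()
```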

6. Ordering Categories with hue_order

# Template: df and the column names are placeholders for your own data
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category', hue_order=['C', 'B', 'A'])
plt.title("Scatter Plot with Ordered Hue")
plt.show()

Key Parameter: hue_order sets the order of categories for color mapping.

Specify the order of the survival categories.

sns.scatterplot(data=titanic, x='age', y='fare', hue='survived', hue_order=[1, 0], palette='dark')
plt.title("Scatter Plot with Ordered Hue")
plt.legend(title="Survived")
plt.show()
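With a categorical hue and a named palette, hue_order decides both the legend order and which level receives which palette color: the first level in hue_order takes the first color. A sketch that checks this on hypothetical string labels:

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with string-valued survival labels
df = pd.DataFrame({
    "age": [22, 38, 26, 35],
    "fare": [7.3, 71.3, 7.9, 53.1],
    "survived": ["yes", "no", "no", "yes"],
})

fig, ax = plt.subplots()

# "yes" is listed first, so it gets the first color of the "dark" palette
sns.scatterplot(data=df, x="age", y="fare", hue="survived",
                hue_order=["yes", "no"], palette="dark", ax=ax)

dark = sns.color_palette("dark", 2)
point_colors = [mcolors.to_hex(c) for c in ax.collections[0].get_facecolors()]
print(point_colors[0] == mcolors.to_hex(dark[0]))  # first row is "yes"
plt.show()
```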

7. Combining hue, style, and size

# Template: df and the column names are placeholders for your own data
sns.scatterplot(
    data=df,
    x='Variable1',
    y='Variable2',
    hue='Category',
    style='SubCategory',
    size='NumericCategory',
    sizes=(50, 300),
    palette='viridis'
)
plt.title("Comprehensive Scatter Plot")
plt.show()

Use Case: Highly customized visualization combining multiple aesthetics.

Use multiple parameters to encode data:

  • hue: Survival status.
  • style: Ticket class.
  • size: Number of siblings/spouses aboard.
sns.scatterplot(
    data=titanic, 
    x='age', 
    y='fare', 
    hue='survived', 
    style='class', 
    size='sibsp', 
    sizes=(30, 300), 
    palette='dark'
)
plt.title("Comprehensive Scatter Plot")
plt.legend(title="Survived/Class/SibSp")
plt.show()
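A legend with three semantic sections can cover part of the data. One option is sns.move_legend, which recreates the legend at a new location; the sketch below anchors it just outside the Axes' upper-right corner. The data is a hypothetical toy DataFrame mirroring the Titanic columns:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data mirroring the Titanic columns used above
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 35, 54],
    "fare": [7.3, 71.3, 7.9, 53.1, 8.1, 51.9],
    "survived": [0, 1, 1, 1, 0, 0],
    "class": ["Third", "First", "Third", "First", "Third", "First"],
    "sibsp": [1, 1, 0, 1, 0, 0],
})

fig, ax = plt.subplots()
sns.scatterplot(
    data=df, x="age", y="fare",
    hue="survived", style="class", size="sibsp", sizes=(30, 300),
    ax=ax,
)

# Place the legend's upper-left corner at the Axes' upper-right corner
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
plt.show()
```

When saving such a figure, fig.savefig(..., bbox_inches="tight") keeps the outside legend from being clipped.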

8. Scatter Plot with Transparency

# Template: df and the column names are placeholders for your own data
sns.scatterplot(data=df, x='Variable1', y='Variable2', alpha=0.6)
plt.title("Scatter Plot with Transparency")
plt.show()

Key Styling: alpha adjusts point transparency for overlapping points.

Add transparency to reduce overlap when points cluster.

sns.scatterplot(data=titanic, x='age', y='fare', alpha=0.3, palette='dark', hue='sex', size='parch', sizes=(20, 200))
plt.title("Scatter Plot with Transparency")
plt.show()

9. Scatter Plot with Custom Marker Sizes (sizes)

Control marker size range manually for better readability.

sns.scatterplot(data=titanic, x='age', y='fare', size='sibsp', sizes=(50, 800))
plt.title("Scatter Plot with Custom Marker Sizes")
plt.legend(title="SibSp")
plt.show()
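Instead of a (min, max) range, sizes can be a dict that assigns an exact marker area to each level; it must cover every value present in the column. A sketch with a hypothetical small count column and arbitrary areas:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with a small integer column
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 35, 54],
    "fare": [7.3, 71.3, 7.9, 53.1, 8.1, 51.9],
    "sibsp": [0, 1, 1, 0, 2, 2],
})

# Explicit marker area (points squared) for each level of sibsp
size_map = {0: 40, 1: 160, 2: 400}

fig, ax = plt.subplots()
sns.scatterplot(data=df, x="age", y="fare", size="sibsp", sizes=size_map, ax=ax)
plt.show()
```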

10. Scatter Plot with a Specific Axis

Use ax to draw the scatter plot on a predefined Matplotlib axes object.

fig, ax = plt.subplots(figsize=(20, 6))

sns.scatterplot(
    data=titanic,
    x='age',
    y='fare',
    hue='survived',
    style='class',
    size='sibsp',
    sizes=(30, 300),
    palette='dark',
    ax=ax,  # draw on the Axes created above
)
ax.set_title("Comprehensive Scatter Plot")
ax.legend(title="Survived/Class/SibSp")
plt.show()
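Passing ax is what makes multi-panel layouts possible: each call draws into its own subplot. A sketch with two side-by-side panels on hypothetical toy data:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 35, 54],
    "fare": [7.3, 71.3, 7.9, 53.1, 8.1, 51.9],
    "survived": [0, 1, 1, 1, 0, 0],
})

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(12, 5), sharey=True)

# Left panel: plain scatter
sns.scatterplot(data=df, x="age", y="fare", ax=ax_left)
ax_left.set_title("All passengers")

# Right panel: same axes variables, colored by survival
sns.scatterplot(data=df, x="age", y="fare", hue="survived", ax=ax_right)
ax_right.set_title("Colored by survival")

plt.show()
```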

Use Cases for Data Scientists

  1. Exploratory Data Analysis (EDA):
    • Understand relationships between variables.
    • Spot trends, clusters, or outliers.
  2. Multivariate Analysis:
    • Combine hue, size, and style to analyze more dimensions in the data.
  3. Feature Engineering:
    • Identify variable relationships for feature interactions.
  4. Model Validation:
    • Plot actual vs. predicted values for regression models.
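For the model-validation use case, a common pattern is actual vs. predicted values with a y = x reference line: points on the line are perfect predictions. A sketch in which both the "actual" values and the "predictions" are synthetic (generated with NumPy, not from a real model):

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic targets and hypothetical noisy predictions
rng = np.random.default_rng(0)
y_true = rng.normal(50, 10, size=100)
y_pred = y_true + rng.normal(0, 5, size=100)

fig, ax = plt.subplots()
sns.scatterplot(x=y_true, y=y_pred, alpha=0.6, ax=ax)

# Dashed y = x line spanning the range of both arrays
lo = min(y_true.min(), y_pred.min())
hi = max(y_true.max(), y_pred.max())
ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")

ax.set(xlabel="Actual", ylabel="Predicted", title="Actual vs. Predicted")
plt.show()
```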

Practical Notes

  1. Customizing Legends:
    • Use legend="brief" for simplified legends or legend=False to remove them.
  2. Dealing with Overlapping Points:
    • Adjust alpha for transparency.
    • Use style to add marker variety.
  3. Scaling Sizes:
    • Use the sizes parameter to control the range of marker sizes for better readability.
  4. Avoid Clutter:
    • For large datasets, plot a random subset of rows, or use sns.relplot(kind="scatter") to split the data into faceted subplots.
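A sketch of both clutter-reduction ideas on a synthetic 20,000-row DataFrame (the columns x, y, group and the subset size of 2,000 are arbitrary assumptions): sample rows with DataFrame.sample, then facet by group with the figure-level sns.relplot:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic "large" dataset
rng = np.random.default_rng(0)
n = 20_000
big = pd.DataFrame({
    "x": rng.normal(size=n),
    "y": rng.normal(size=n),
    "group": rng.choice(["a", "b"], size=n),
})

# Plot a reproducible random subset to reduce overplotting
subset = big.sample(n=2_000, random_state=0)

# relplot(kind="scatter") draws one scatterplot per level of `col`
g = sns.relplot(data=subset, x="x", y="y", col="group", kind="scatter", alpha=0.4)
plt.show()
```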

References in Your Machine Learning Guide

Use this function during:

  • EDA: To explore relationships and correlations between features.
  • Visualization: To present findings with clear, concise scatter plots.
  • Model Validation: To analyze model predictions versus actual outcomes.