sns.scatterplot()
Scatter plot to show relationships between two numerical variables.
The sns.scatterplot() function in Seaborn is a fundamental tool for visualizing relationships between two numerical variables. It creates a scatter plot where each point represents an observation.
sns.scatterplot(
data=None,
*,
x=None,
y=None,
hue=None,
style=None,
size=None,
palette=None,
hue_order=None,
size_order=None,
sizes=None,
legend="auto",
ax=None,
**kwargs,
)
Key Parameters
Primary Parameters
- data: The dataset (pandas DataFrame or array-like) to visualize.
- x: Variable plotted on the x-axis.
- y: Variable plotted on the y-axis.
Aesthetic Mappings
- hue: Differentiates points by color based on a categorical or numerical variable.
- style: Differentiates points by marker style based on a categorical variable.
- size: Differentiates points by size based on a numerical or categorical variable.
- palette: Specifies the color palette for the hue variable.
Order and Sizes
- hue_order: Specifies the order of categories for the hue variable.
- size_order: Specifies the order of categories for the size variable.
- sizes: Specifies a range for marker sizes (e.g., (min_size, max_size)).
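Besides a (min_size, max_size) tuple, the seaborn documentation also lists a list or a dict as accepted values for sizes. The following is a minimal sketch of the dict form, assuming a small made-up DataFrame; the column names x, y, and group and their values are hypothetical and only serve the illustration.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data; "group" acts as a categorical size variable
toy = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [4, 3, 2, 1],
    "group": ["small", "large", "small", "large"],
})

# Map each level of the size variable to an explicit marker size
size_map = {"small": 30, "large": 200}
ax = sns.scatterplot(data=toy, x="x", y="y", size="group", sizes=size_map)

# Inspect the marker sizes that were actually applied to the points
marker_sizes = sorted(set(ax.collections[0].get_sizes()))
print(marker_sizes)
```

A dict is convenient when specific categories should stand out; the tuple form is usually enough for numeric size variables. Call plt.show() afterwards when running this as a script.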
Legend
- legend: Controls the display of the legend ("auto", "brief", "full", or False).
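To make the legend options concrete, here is a small sketch that draws the same data three times. It assumes a made-up DataFrame with a numeric hue variable named level (a hypothetical name) that has 20 distinct values, so that the difference between "full" and "brief" becomes visible.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical toy data with a numeric hue variable (20 distinct values)
toy = pd.DataFrame({
    "x": np.arange(20),
    "y": np.arange(20) ** 2,
    "level": np.arange(20),
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# "full": one legend entry per distinct value of the hue variable
sns.scatterplot(data=toy, x="x", y="y", hue="level", legend="full", ax=axes[0])
# "brief": a numeric variable is summarized by a few evenly spaced values
sns.scatterplot(data=toy, x="x", y="y", hue="level", legend="brief", ax=axes[1])
# False: no legend is drawn
sns.scatterplot(data=toy, x="x", y="y", hue="level", legend=False, ax=axes[2])

# Count the legend entries on each Axes
n_full = len(axes[0].get_legend().get_texts())
n_brief = len(axes[1].get_legend().get_texts())
has_legend = axes[2].get_legend() is not None
print(n_full, n_brief, has_legend)
```

With the default "auto", seaborn chooses between the brief and full form depending on the number of levels.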
Axes
- ax: Matplotlib Axes object to draw the plot on.
Additional Styling (**kwargs)
- Extra keyword arguments are passed through to Matplotlib's Axes.scatter(), for example alpha and linewidth.
Dataset Preparation
First, let's load and inspect the Titanic dataset.
import seaborn as sns
import matplotlib.pyplot as plt
# Load Titanic dataset
titanic = sns.load_dataset('titanic')
# Display first few rows of the dataset
titanic.head()
  | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
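The preview shows NaN entries (in deck, for example), so it is worth checking the columns you plan to plot for missing values. A point can only be drawn when both coordinates are present, and scatterplot() leaves out incomplete rows. The sketch below illustrates this on a small hypothetical stand-in for the age and fare columns rather than on the real dataset; the values are made up.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the age and fare columns, with two missing ages
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "fare": [7.25, 71.28, 7.93, 53.10, 8.05],
})

# Count missing values per column before plotting
missing_per_column = toy[["age", "fare"]].isna().sum()
print(missing_per_column)

# Rows that have both coordinates available
n_complete = len(toy.dropna(subset=["age", "fare"]))

# Draw the plot and count how many points ended up on the Axes
ax = sns.scatterplot(data=toy, x="age", y="fare")
n_drawn = len(ax.collections[0].get_offsets())
print(n_complete, n_drawn)
```

The same check on the real data would be titanic[['age', 'fare']].isna().sum().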
1. Basic Scatter Plot
sns.scatterplot(data=df, x='Variable1', y='Variable2')
plt.title("Basic Scatter Plot")
plt.show()
Use Case: Visualizing the relationship between two continuous variables.
sns.scatterplot(data=titanic, x='age', y='fare')
plt.title("Basic Scatter Plot")
plt.show()

2. Scatter Plot with hue
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category')
plt.title("Scatter Plot with Hue")
plt.show()
Key Parameter: hue differentiates points by color.
Use the hue parameter to color points based on survival status (survived).
sns.scatterplot(data=titanic, x='age', y='fare', hue='survived')
plt.title("Scatter Plot with Hue")
plt.legend(title="Survived")
plt.show()

3. Scatter Plot with style
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category', style='SubCategory')
plt.title("Scatter Plot with Hue and Style")
plt.show()
Key Parameter: style adds marker shapes based on a categorical variable.
Differentiate points by the class of the ticket (class).
sns.scatterplot(data=titanic, x='age', y='fare', hue='survived', style='class')
plt.title("Scatter Plot with Hue and Style")
plt.legend(title="Survived/Class")
plt.show()

4. Scatter Plot with size
sns.scatterplot(data=df, x='Variable1', y='Variable2', size='NumericCategory', sizes=(20, 200))
plt.title("Scatter Plot with Variable Sizes")
plt.show()
Key Parameter: size maps a variable to the size of points.
Map the size of points to the number of siblings/spouses aboard (sibsp).
sns.scatterplot(data=titanic, x='age', y='fare', size='sibsp', sizes=(20, 200))
plt.title("Scatter Plot with Variable Sizes")
plt.legend(title="Number of Siblings/Spouses")
plt.show()

5. Using palette for Custom Colors
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category', palette='coolwarm')
plt.title("Scatter Plot with Custom Palette")
plt.show()
- Key Parameter: palette customizes the colors for the hue variable.
- palette has an effect only when hue is also set.
- Apply a custom palette to the survival status (hue).
sns.scatterplot(data=titanic, x='age', y='fare', hue='survived', palette='dark')
plt.title("Scatter Plot with Custom Palette")
plt.legend(title="Survived")
plt.show()
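A palette does not have to be a named palette. As a hedged sketch, a dict that maps each hue level to a color also works. The DataFrame below is a hypothetical miniature with Titanic-like column names, and the two colors are arbitrary choices.

```python
import pandas as pd
import seaborn as sns

# Hypothetical miniature dataset with Titanic-like columns
toy = pd.DataFrame({
    "age": [22, 38, 26, 35, 54, 2],
    "fare": [7.3, 71.3, 7.9, 53.1, 51.9, 21.1],
    "survived": [0, 1, 1, 1, 0, 0],
})

# Map each hue level to a named Matplotlib color
colors = {0: "gray", 1: "tab:blue"}
ax = sns.scatterplot(data=toy, x="age", y="fare", hue="survived", palette=colors)

# Collect the distinct face colors actually used for the points
used_colors = {tuple(c) for c in ax.collections[0].get_facecolors()}
print(len(used_colors))
```

An explicit mapping keeps colors stable across several plots, even when a category is absent from one of them.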

6. Ordering Categories with hue_order
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category', hue_order=['C', 'B', 'A'])
plt.title("Scatter Plot with Ordered Hue")
plt.show()
Key Parameter: hue_order sets the order of categories for color mapping.
Specify the order of the survival categories.
sns.scatterplot(data=titanic, x='age', y='fare', hue='survived', hue_order=[1, 0], palette='dark')
plt.title("Scatter Plot with Ordered Hue")
plt.legend(title="Survived")
plt.show()

7. Combining hue, style, and size
sns.scatterplot(
data=df,
x='Variable1',
y='Variable2',
hue='Category',
style='SubCategory',
size='NumericCategory',
sizes=(50, 300),
palette='viridis'
)
plt.title("Comprehensive Scatter Plot")
plt.show()
Use Case: Highly customized visualization combining multiple aesthetics.
Use multiple parameters to encode data:
- hue: Survival status.
- style: Ticket class.
- size: Number of siblings/spouses aboard.
sns.scatterplot(
data=titanic,
x='age',
y='fare',
hue='survived',
style='class',
size='sibsp',
sizes=(30, 300),
palette='dark'
)
plt.title("Comprehensive Scatter Plot")
plt.legend(title="Survived/Class/SibSp")
plt.show()

8. Scatter Plot with Transparency
sns.scatterplot(data=df, x='Variable1', y='Variable2', alpha=0.6)
plt.title("Scatter Plot with Transparency")
plt.show()
Key Styling: alpha adjusts point transparency for overlapping points.
Add transparency so that dense clusters of overlapping points remain readable.
sns.scatterplot(data=titanic, x='age', y='fare', alpha=0.3, palette='dark', hue='sex', size='parch', sizes=(20, 200))
plt.title("Scatter Plot with Transparency")
plt.show()

9. Scatter Plot with Custom Marker Sizes (sizes)
Control marker size range manually for better readability.
sns.scatterplot(data=titanic, x='age', y='fare', size='sibsp', sizes=(50, 800))
plt.title("Scatter Plot with Custom Marker Sizes")
plt.legend(title="SibSp")
plt.show()

10. Scatter Plot with a Specific Axis
Use ax to draw the scatter plot on a predefined Matplotlib Axes object.
fig, ax = plt.subplots(figsize=(20, 6))
sns.scatterplot(
    data=titanic,
    x='age',
    y='fare',
    hue='survived',
    style='class',
    size='sibsp',
    sizes=(30, 300),
    palette='dark',
    ax=ax,  # draw on the Axes created above
)
ax.set_title("Comprehensive Scatter Plot")
ax.legend(title="Survived/Class/SibSp")
plt.show()
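The ax parameter becomes most useful when a figure holds several Axes. The following sketch places two scatter plots side by side. It uses a hypothetical miniature DataFrame with Titanic-like column names so that it runs without downloading the dataset.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical miniature dataset with Titanic-like columns
toy = pd.DataFrame({
    "age": [22, 38, 26, 35, 54, 2],
    "fare": [7.3, 71.3, 7.9, 53.1, 51.9, 21.1],
    "sibsp": [1, 1, 0, 1, 0, 3],
})

# One figure with two Axes next to each other
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(12, 4))

# Each call targets its own Axes through the ax parameter
sns.scatterplot(data=toy, x="age", y="fare", ax=ax_left)
ax_left.set_title("Age vs. Fare")

sns.scatterplot(data=toy, x="age", y="sibsp", ax=ax_right)
ax_right.set_title("Age vs. SibSp")

fig.tight_layout()
```

Setting titles and legends through the Axes methods (ax.set_title(), ax.legend()) rather than plt.title() avoids accidentally modifying whichever Axes happens to be current.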

Use Cases for Data Scientists
- Exploratory Data Analysis (EDA):
  - Understand relationships between variables.
  - Spot trends, clusters, or outliers.
- Multivariate Analysis:
  - Combine hue, size, and style to analyze more dimensions in the data.
- Feature Engineering:
  - Identify variable relationships for feature interactions.
- Model Validation:
  - Plot actual vs. predicted values for regression models.
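The model validation use case can be sketched as follows. The data is synthetic, and a straight-line fit with np.polyfit stands in for whatever regression model you actually use; the names results, actual, and predicted are hypothetical.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic regression data (hypothetical): y depends linearly on x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=1.5, size=100)

# Fit a straight line with NumPy as a stand-in for any regression model
slope, intercept = np.polyfit(x, y, deg=1)
results = pd.DataFrame({"actual": y, "predicted": slope * x + intercept})

# Scatter the predictions against the true values
ax = sns.scatterplot(data=results, x="actual", y="predicted", alpha=0.6)

# Reference diagonal: points on this line are predicted perfectly
low, high = results["actual"].min(), results["actual"].max()
ax.plot([low, high], [low, high], color="black", linestyle="--")
ax.set_title("Actual vs. Predicted")
```

Points far from the dashed diagonal are the observations the model predicts poorly, which makes systematic over- or under-prediction easy to spot.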
Practical Notes
- Customizing Legends:
  - Use legend="brief" for simplified legends or legend=False to remove them.
- Dealing with Overlapping Points:
  - Adjust alpha for transparency.
  - Use style to add marker variety.
- Scaling Sizes:
  - Use the sizes parameter to control the range of marker sizes for better readability.
- Avoid Clutter:
  - For large datasets, subset the data or switch to sns.relplot(), which wraps scatterplot() and adds faceting.
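The faceting suggestion can be sketched with sns.relplot(), which draws one scatter panel per level of a column. The DataFrame below is a hypothetical miniature with Titanic-like column names; on the real data you would pass titanic instead.

```python
import pandas as pd
import seaborn as sns

# Hypothetical miniature dataset with Titanic-like columns
toy = pd.DataFrame({
    "age": [22, 38, 26, 35, 54, 2],
    "fare": [7.3, 71.3, 7.9, 53.1, 51.9, 21.1],
    "sex": ["male", "female", "female", "female", "male", "male"],
})

# One panel per level of "sex"; kind="scatter" is the default for relplot
g = sns.relplot(data=toy, x="age", y="fare", col="sex", kind="scatter", alpha=0.6)
g.set_titles("{col_name}")

# relplot returns a FacetGrid whose axes array holds one Axes per panel
n_panels = g.axes.size
print(n_panels)
```

Because relplot() is a figure-level function, it creates its own figure and does not accept an ax argument.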
Where This Fits in a Machine Learning Workflow
Use this function during:
- EDA: To explore relationships and correlations between features.
- Visualization: To present findings with clear, concise scatter plots.
- Model Validation: To analyze model predictions versus actual outcomes.