sns.scatterplot()
Scatter plot to show relationships between two numerical variables.
The sns.scatterplot() function in Seaborn is a fundamental tool for visualizing relationships between two numerical variables. It creates a scatter plot where each point represents an observation.
sns.scatterplot(
    data=None,
    *,
    x=None,
    y=None,
    hue=None,
    style=None,
    size=None,
    palette=None,
    hue_order=None,
    size_order=None,
    sizes=None,
    legend="auto",
    ax=None,
    **kwargs,
)
Key Parameters
Primary Parameters
data: The dataset (Pandas DataFrame or array-like) to visualize.
x: Variable plotted on the x-axis.
y: Variable plotted on the y-axis.
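The two ways of supplying data can be sketched as follows. This is a minimal illustration with made-up values; the column names height and weight are placeholders and are not part of the Titanic dataset used later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up measurements, for illustration only
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 58, 69, 80, 91],
})

fig, (ax_frame, ax_arrays) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: pass a DataFrame and refer to its columns by name
sns.scatterplot(data=df, x="height", y="weight", ax=ax_frame)

# Option 2: omit data and pass the vectors directly
sns.scatterplot(
    x=df["height"].to_numpy(),
    y=df["weight"].to_numpy(),
    ax=ax_arrays,
)
```

With a DataFrame, the column names become the axis labels. Plain arrays carry no names, so labels usually have to be set by hand.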
Aesthetic Mappings
hue: Differentiates points by color based on a categorical or numerical variable.
style: Differentiates points by marker style based on a categorical variable.
size: Differentiates points by size based on a numerical or categorical variable.
palette: Specifies the color palette for the hue variable.
Order and Sizes
hue_order: Specifies the order of categories for the hue variable.
size_order: Specifies the order of categories for the size variable.
sizes: Specifies a range for marker sizes (e.g., (min_size, max_size)).
Legend
legend: Controls the display of the legend ("auto", "brief", "full", or False).
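The legend modes are easiest to compare side by side. The sketch below assumes a small made-up DataFrame with a numeric column named score (a placeholder) mapped to hue, since "brief" and "full" differ mainly for numeric variables: "brief" shows a sample of evenly spaced values, while "full" gives every value its own entry.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: 20 points, each with a distinct numeric score
df = pd.DataFrame({
    "x": range(20),
    "y": [v % 7 for v in range(20)],
    "score": range(20),
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Draw the same plot three times, changing only the legend mode
for ax, mode in zip(axes, ["brief", "full", False]):
    sns.scatterplot(data=df, x="x", y="y", hue="score", legend=mode, ax=ax)
    ax.set_title(f"legend={mode!r}")

fig.tight_layout()
```

The default, "auto", chooses between the brief and full forms based on the number of levels.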
Axes
ax: Matplotlib Axes object to draw the plot on.
Additional Styling (**kwargs)
- Any other keyword arguments, such as alpha or linewidth, are passed through to Matplotlib's scatter function.
Dataset Preparation
First, let's load and inspect the Titanic dataset.
import seaborn as sns
import matplotlib.pyplot as plt
# Load Titanic dataset
titanic = sns.load_dataset('titanic')
# Display first few rows of the dataset
titanic.head()
|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
1. Basic Scatter Plot
sns.scatterplot(data=df, x='Variable1', y='Variable2')
plt.title("Basic Scatter Plot")
plt.show()
Use Case: Visualizing the relationship between two continuous variables.
sns.scatterplot(data=titanic, x='age', y='fare')
plt.title("Basic Scatter Plot")
plt.show()
2. Scatter Plot with hue
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category')
plt.title("Scatter Plot with Hue")
plt.show()
Key Parameter: hue differentiates points by color.
Use the hue parameter to color points based on survival status (survived).
sns.scatterplot(data=titanic, x='age', y='fare', hue='survived')
plt.title("Scatter Plot with Hue")
plt.legend(title="Survived")
plt.show()
3. Scatter Plot with style
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category', style='SubCategory')
plt.title("Scatter Plot with Hue and Style")
plt.show()
Key Parameter: style adds marker shapes based on a categorical variable.
Differentiate points by ticket class (class) with marker style, while hue still encodes survival.
sns.scatterplot(data=titanic, x='age', y='fare', hue='survived', style='class')
plt.title("Scatter Plot with Hue and Style")
plt.legend(title="Survived/Class")
plt.show()
4. Scatter Plot with size
sns.scatterplot(data=df, x='Variable1', y='Variable2', size='NumericCategory', sizes=(20, 200))
plt.title("Scatter Plot with Variable Sizes")
plt.show()
Key Parameter: size maps a variable to the size of points.
Map the size of points to the number of siblings/spouses aboard (sibsp).
sns.scatterplot(data=titanic, x='age', y='fare', size='sibsp', sizes=(20, 200))
plt.title("Scatter Plot with Variable Sizes")
plt.legend(title="Number of Siblings/Spouses")
plt.show()
5. Using palette for Custom Colors
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category', palette='coolwarm')
plt.title("Scatter Plot with Custom Palette")
plt.show()
- Key Parameter: palette customizes the colors for the hue variable.
- palette only has an effect when hue is also set.
- Apply a custom palette to the survival status (hue).
sns.scatterplot(data=titanic, x='age', y='fare', hue='survived', palette='dark')
plt.title("Scatter Plot with Custom Palette")
plt.legend(title="Survived")
plt.show()
6. Ordering Categories with hue_order
sns.scatterplot(data=df, x='Variable1', y='Variable2', hue='Category', hue_order=['C', 'B', 'A'])
plt.title("Scatter Plot with Ordered Hue")
plt.show()
Key Parameter: hue_order sets the order of categories for color mapping.
Specify the order of the survival categories.
sns.scatterplot(data=titanic, x='age', y='fare', hue='survived', hue_order=[1, 0], palette='dark')
plt.title("Scatter Plot with Ordered Hue")
plt.legend(title="Survived")
plt.show()
7. Combining hue, style, and size
sns.scatterplot(
    data=df,
    x='Variable1',
    y='Variable2',
    hue='Category',
    style='SubCategory',
    size='NumericCategory',
    sizes=(50, 300),
    palette='viridis'
)
plt.title("Comprehensive Scatter Plot")
plt.show()
Use Case: Highly customized visualization combining multiple aesthetics.
Use multiple parameters to encode data:
hue: Survival status.
style: Ticket class.
size: Number of siblings/spouses aboard.
sns.scatterplot(
    data=titanic,
    x='age',
    y='fare',
    hue='survived',
    style='class',
    size='sibsp',
    sizes=(30, 300),
    palette='dark'
)
plt.title("Comprehensive Scatter Plot")
plt.legend(title="Survived/Class/SibSp")
plt.show()
8. Scatter Plot with Transparency
sns.scatterplot(data=df, x='Variable1', y='Variable2', alpha=0.6)
plt.title("Scatter Plot with Transparency")
plt.show()
Key Styling: alpha adjusts point transparency for overlapping points.
Add transparency to reduce overlap when points cluster.
sns.scatterplot(data=titanic, x='age', y='fare', alpha=0.3, palette='dark', hue='sex', size='parch', sizes=(20, 200))
plt.title("Scatter Plot with Transparency")
plt.show()
9. Scatter Plot with Custom Marker Sizes (sizes)
Control marker size range manually for better readability.
sns.scatterplot(data=titanic, x='age', y='fare', size='sibsp', sizes=(50, 800))
plt.title("Scatter Plot with Custom Marker Sizes")
plt.legend(title="SibSp")
plt.show()
10. Scatter Plot with a Specific Axis
Use ax to draw the scatter plot on a predefined Matplotlib axes object.
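A common reason to pass ax is to place several scatter plots in one figure. The following is a minimal sketch using a small made-up DataFrame; the column names a, b, and target are placeholders chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data, for illustration only
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [6, 5, 4, 3, 2, 1],
    "target": [2, 3, 5, 4, 7, 8],
})

# One figure with two axes that share the y-axis
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Each call draws only on the axes it is given
returned = sns.scatterplot(data=df, x="a", y="target", ax=ax_left)
sns.scatterplot(data=df, x="b", y="target", ax=ax_right)

ax_left.set_title("target vs. a")
ax_right.set_title("target vs. b")
fig.tight_layout()
```

scatterplot returns the Axes it drew on, so the return value can be used for further Matplotlib customization.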
fig, ax = plt.subplots(figsize=(20, 6))
sns.scatterplot(
    data=titanic,
    x='age',
    y='fare',
    hue='survived',
    style='class',
    size='sibsp',
    sizes=(30, 300),
    palette='dark',
    ax=ax  # draw on the axes created above
)
ax.set_title("Comprehensive Scatter Plot")
ax.legend(title="Survived/Class/SibSp")
plt.show()
Use Cases for Data Scientists
- Exploratory Data Analysis (EDA):
  - Understand relationships between variables.
  - Spot trends, clusters, or outliers.
- Multivariate Analysis:
  - Combine hue, size, and style to analyze more dimensions in the data.
- Feature Engineering:
  - Identify variable relationships for feature interactions.
- Model Validation:
  - Plot actual vs. predicted values for regression models.
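The model-validation use case can be sketched as below. No real model is fitted here: the predicted values are simulated by adding random noise to the actual values, and the names actual and predicted are placeholders. The dashed diagonal marks where a perfect prediction would fall.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Simulated regression results (assumption: noise stands in for model error)
rng = np.random.default_rng(0)
actual = rng.uniform(0, 100, size=50)
predicted = actual + rng.normal(0, 5, size=50)
results = pd.DataFrame({"actual": actual, "predicted": predicted})

fig, ax = plt.subplots(figsize=(5, 5))
sns.scatterplot(data=results, x="actual", y="predicted", alpha=0.7, ax=ax)

# Reference line y = x: points on it are predicted exactly
lims = [results.to_numpy().min(), results.to_numpy().max()]
ax.plot(lims, lims, color="gray", linestyle="--", label="perfect prediction")
ax.legend()
ax.set_title("Actual vs. Predicted")
```

Points far from the diagonal are the observations the model predicts worst, which makes systematic over- or under-prediction easy to spot.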
Practical Notes
- Customizing Legends:
  - Use legend="brief" for simplified legends or legend=False to remove them.
- Dealing with Overlapping Points:
  - Adjust alpha for transparency.
  - Use style to add marker variety.
- Scaling Sizes:
  - Use the sizes parameter to control the range of marker sizes for better readability.
- Avoid Clutter:
  - For large datasets, plot a subset of the data, or use sns.relplot(), the figure-level counterpart of scatterplot, to split the plot into facets.
References in Your Machine Learning Guide
Use this function during:
- EDA: To explore relationships and correlations between features.
- Visualization: To present findings with clear, concise scatter plots.
- Model Validation: To analyze model predictions versus actual outcomes.