sns.boxplot()
Box plot to display data distribution and outliers across categories.
Seaborn sns.boxplot() – Box Plot for Data Distribution & Outliers
sns.boxplot() is used to visualize the distribution of numerical data and detect outliers across different categories. It shows key statistical summaries like medians, quartiles, and extreme values.
1. General Syntax
python
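# Defaults shown follow seaborn 0.12; newer releases changed some of them
# (for example dodge="auto" and fliersize=None).
sns.boxplot(
    data=None,
    x=None,
    y=None,
    hue=None,
    order=None,
    hue_order=None,
    orient=None,
    width=0.8,
    dodge=True,
    fliersize=5,
    linewidth=None,
    whis=1.5,
    # The arguments below are not seaborn parameters of their own; they are
    # forwarded through **kwargs to matplotlib's Axes.boxplot.
    notch=False,
    showcaps=True,
    showbox=True,
    showfliers=True,
    showmeans=False,
    meanline=False,
    meanprops=None,
    medianprops=None,
    whiskerprops=None,
    capprops=None,
    boxprops=None,
)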
2. Understanding Box Plot Components
- Median (Q2, 50th percentile) – The middle value of the dataset.
- Interquartile Range (IQR) – The range between the 25th percentile (Q1) and 75th percentile (Q3).
- Whiskers – Extend from Q1 and Q3 to the most extreme data points within 1.5 × IQR of the box by default.
- Outliers – Points beyond the whiskers.
- Notch – Represents a confidence interval around the median.
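The components above can be computed by hand for one group of values. The following is a minimal sketch using only the standard library; the helper name box_stats and the sample numbers are made up for illustration, and it assumes the usual linear-interpolation quartiles (which, as far as I know, is what NumPy and matplotlib use by default).

```python
import statistics


def box_stats(values, whis=1.5):
    """Return the summary numbers a box plot draws for one group (illustrative helper)."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme data points inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up values with one extreme point
stats = box_stats(sample)
print(stats)
```

Here the value 30 lies above Q3 + 1.5 × IQR, so it is reported as an outlier and the upper whisker stops at 9.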
3. Dataset Setup
We will use the diamonds dataset from Seaborn.
python
import seaborn as sns
import matplotlib.pyplot as plt
# Load dataset
data = sns.load_dataset("diamonds")
data.head()
Dataset Columns
- cut: Quality of the cut (Fair, Good, Very Good, Premium, Ideal).
- color: Diamond color.
- clarity: Clarity levels (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF).
- carat: Carat weight of the diamond.
- price: Price in US dollars.
carat | cut | color | clarity | depth | table | price |
0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 |
0.21 | Premium | E | SI1 | 59.8 | 61 | 326 |
0.23 | Good | E | VS1 | 56.9 | 65 | 327 |
0.29 | Premium | I | VS2 | 62.4 | 58 | 334 |
0.31 | Good | J | SI2 | 63.3 | 58 | 335 |
4. Basic Box Plot
Plot price distribution by diamond cut
python
sns.boxplot(data=data, x="cut", y="price")
plt.title("Box Plot of Price by Cut")
plt.show()
Explanation
- The box spans Q1 to Q3, with a line inside it at the median (Q2).
- The whiskers extend to the lowest and highest values that are not classed as outliers.
- Outliers are shown as individual points beyond the whiskers.
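To cross-check what the boxes show, the same quartiles can be read off a grouped summary table. This is a small sketch with pandas on made-up prices (not the diamonds data); the column labels "25%", "50%", and "75%" are the ones describe() produces.

```python
import pandas as pd

# Made-up prices standing in for the diamonds data.
df = pd.DataFrame({
    "cut": ["Fair"] * 5 + ["Ideal"] * 5,
    "price": [300, 320, 340, 360, 380, 400, 420, 440, 460, 480],
})

# One row per cut: count, mean, std, min, quartiles, max.
summary = df.groupby("cut")["price"].describe()
print(summary[["min", "25%", "50%", "75%", "max"]])
```

Running the same call on the real dataset, with data in place of df, gives the numbers behind each box in the plot.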
5. Customizing Whisker Length (whis)
Extend whiskers to 2 × IQR beyond the quartiles
python
sns.boxplot(data=data, x="cut", y="price", whis=2)
plt.title("Box Plot with Adjusted Whiskers (whis=2)")
plt.show()
Explanation
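- The default whis=1.5 ends each whisker at the most extreme data point within 1.5 × IQR of the box.
- Increasing it to whis=2 widens that limit to 2 × IQR, so more data falls within the whiskers and fewer points are drawn as outliers.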
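The effect of whis can be checked numerically by counting how many points fall outside the fences. The sketch below uses only the standard library; count_outliers is a hypothetical helper and the prices are synthetic, right-skewed values rather than the diamonds data.

```python
import random
import statistics


def count_outliers(values, whis):
    """Count points beyond Q1 - whis*IQR or Q3 + whis*IQR (illustrative helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return sum(1 for v in values if v < low or v > high)


random.seed(0)
# Synthetic right-skewed "prices"; the distribution parameters are arbitrary.
prices = [random.lognormvariate(8, 0.9) for _ in range(1000)]

for whis in (1.5, 2.0, 3.0):
    print(f"whis={whis}: {count_outliers(prices, whis)} outliers")
```

The count can only stay the same or drop as whis grows. If the goal is whiskers that span a fixed share of the data, whis also accepts a pair of percentiles, for example whis=(2.5, 97.5) for the central 95%.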
6. Hiding Outliers (fliersize=0)
python
sns.boxplot(data=data, x="cut", y="price", fliersize=0)
plt.title("Box Plot Without Outliers")
plt.show()
Explanation
fliersize=0 shrinks the outlier markers to size zero, which hides them for a cleaner plot.
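An alternative worth knowing is showfliers=False, a matplotlib option that seaborn forwards. The sketch below compares the two side by side on made-up prices; it assumes that zero-size markers still count toward the axis limits, which is the behaviour I would expect from matplotlib but may vary by version.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up prices with one extreme value per cut (not the diamonds data).
df = pd.DataFrame({
    "cut": ["Fair"] * 6 + ["Ideal"] * 6,
    "price": [300, 320, 340, 360, 380, 5000, 400, 420, 440, 460, 480, 9000],
})

fig, (ax_hidden, ax_omitted) = plt.subplots(1, 2, figsize=(10, 4))

# fliersize=0: outlier markers still exist, just at zero size.
sns.boxplot(data=df, x="cut", y="price", fliersize=0, ax=ax_hidden)
ax_hidden.set_title("fliersize=0")

# showfliers=False: the outlier markers are not drawn at all.
sns.boxplot(data=df, x="cut", y="price", showfliers=False, ax=ax_omitted)
ax_omitted.set_title("showfliers=False")

plt.show()
```

In the left panel the y-axis should still stretch to cover the hidden points, while the right panel scales to the whiskers, which usually makes the boxes easier to read.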
7. Grouping by Hue (hue)
Compare price distribution for different diamond colors
python
sns.boxplot(data=data, x="cut", y="price", hue="color")
plt.title("Box Plot of Price by Cut and Color")
plt.show()
Explanation
- hue splits each cut into side-by-side boxes, one per diamond color grade, and draws each grade in its own plot color.
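When boxes are grouped by hue, the order and hue_order parameters from the signature control how the groups are arranged. This is a small sketch on made-up rows (two cuts, two color grades); the values are invented purely so the example runs without downloading the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows standing in for the diamonds data.
rows = [
    {"cut": cut, "color": color, "price": base + step * i}
    for cut, base in [("Good", 400), ("Ideal", 600)]
    for color, step in [("J", 10), ("D", 25)]
    for i in range(8)
]
df = pd.DataFrame(rows)

ax = sns.boxplot(
    data=df,
    x="cut",
    y="price",
    hue="color",
    order=["Good", "Ideal"],   # left-to-right order of the x categories
    hue_order=["D", "J"],      # order of boxes within each cut and in the legend
)
ax.set_title("Box Plot with Explicit Category Order")
plt.show()
```

Fixing the order explicitly keeps several plots comparable even when a subset of the data is missing a category.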
8. Using Notches for Confidence Intervals (notch=True)
python
sns.boxplot(data=data, x="cut", y="price", notch=True)
plt.title("Box Plot with Notches")
plt.show()
Explanation
- Notches represent a confidence interval around the median.
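For intuition about how wide a notch is, here is a sketch of the approximation commonly used for notched box plots: median ± 1.57 × IQR / √n. I am assuming this matches what matplotlib computes by default when no bootstrap is requested; notch_interval is a hypothetical helper and the sample values are made up.

```python
import math
import statistics


def notch_interval(values):
    """Approximate notch bounds: median ± 1.57 * IQR / sqrt(n) (illustrative helper)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


sample = list(range(1, 17))  # 16 made-up values: 1, 2, ..., 16
low, high = notch_interval(sample)
print(f"notch from {low:.3f} to {high:.3f}")
```

Because the half-width shrinks with √n, large groups get narrow notches. A common reading is that two boxes whose notches do not overlap likely have different medians.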
9. Controlling Box Width (width)
python
sns.boxplot(data=data, x="cut", y="price", width=0.5)
plt.title("Narrow Box Plot")
plt.show()
Explanation
width adjusts the box width (default is 0.8).
10. Styling Box Plot Elements
Customizing median, box, and whiskers
python
sns.boxplot(
    data=data,
    x="cut",
    y="price",
    medianprops={"color": "red", "linewidth": 2},
    boxprops={"facecolor": "lightblue", "edgecolor": "black"},
    whiskerprops={"linewidth": 2, "linestyle": "--"},
    capprops={"color": "blue", "linewidth": 2},
)
plt.title("Styled Box Plot")
plt.show()
Explanation
- medianprops: Changes median line color and thickness.
- boxprops: Colors the box.
- whiskerprops: Customizes whiskers.
- capprops: Changes cap lines.
11. Multiple Box Plots Using hue
Comparing price by cut and clarity
python
sns.boxplot(data=data, x="cut", y="price", hue="clarity")
plt.title("Box Plot of Price by Cut and Clarity")
plt.show()
Explanation
- Groups clarity within each cut category for deeper insights.
12. Hiding the Box (showbox=False)
python
sns.boxplot(data=data, x="cut", y="price", showbox=False)
plt.title("Box Plot Without Boxes")
plt.show()
Explanation
showbox=False hides the whole box (fill and border) while keeping the median line, whiskers, and caps.
13. Adjusting Figure Size
python
plt.figure(figsize=(10,6))
sns.boxplot(data=data, x="cut", y="price")
plt.title("Larger Box Plot")
plt.show()
Explanation
- plt.figure(figsize=(10, 6)) creates a 10 × 6 inch figure before plotting, which gives crowded plots more room and improves readability.
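The same sizing can be done in matplotlib's object-oriented style by creating the figure and axes first and passing the axes to sns.boxplot through its ax parameter. This is a sketch on made-up prices, not the diamonds data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up prices standing in for the diamonds data.
df = pd.DataFrame({
    "cut": ["Fair"] * 5 + ["Ideal"] * 5,
    "price": [300, 320, 340, 360, 380, 400, 420, 440, 460, 480],
})

# Create the figure and axes explicitly, then tell seaborn where to draw.
fig, ax = plt.subplots(figsize=(10, 6))
returned = sns.boxplot(data=df, x="cut", y="price", ax=ax)
ax.set_title("Larger Box Plot")
plt.show()
```

Holding on to fig and ax makes later tweaks (titles, limits, saving with fig.savefig) explicit, which helps once a script draws more than one plot.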
14. Rotating X-axis Labels
python
sns.boxplot(data=data, x="cut", y="price")
plt.xticks(rotation=45)
plt.title("Box Plot with Rotated Labels")
plt.show()
Explanation
plt.xticks(rotation=45) rotates the x-axis tick labels by 45 degrees, which prevents label overlap.
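Rotated labels are centered under their ticks by default, which can make long labels look offset. One possible refinement, sketched here on made-up data, is to set the rotation and horizontal alignment on the label objects directly with plt.setp.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up prices standing in for the diamonds data.
df = pd.DataFrame({
    "cut": ["Fair", "Good", "Very Good", "Premium", "Ideal"] * 4,
    "price": [300 + 15 * i for i in range(20)],
})

ax = sns.boxplot(data=df, x="cut", y="price")
# Rotate the labels and anchor their right edge at the tick.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
ax.set_title("Box Plot with Rotated, Right-Aligned Labels")
plt.tight_layout()  # leave room so the rotated labels are not clipped
plt.show()
```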
15. Final Example: Fully Customized Box Plot
python
plt.figure(figsize=(12, 6))
sns.boxplot(
    data=data,
    x="cut",
    y="price",
    hue="color",
    notch=True,
    width=0.7,
    medianprops={"color": "black", "linewidth": 2},
    # No fixed facecolor here: it can override the per-hue box colors.
    boxprops={"edgecolor": "black"},
    whiskerprops={"linewidth": 2, "linestyle": "--"},
    capprops={"color": "red", "linewidth": 2},
    fliersize=4,
)
plt.xticks(rotation=45)
plt.title("Customized Box Plot of Diamond Price by Cut and Color")
plt.show()
Features in This Plot
✔ Hue by color
✔ Notched box plot
✔ Custom box, whiskers, and median line styling
✔ Adjusted box width (0.7) and rotated x-axis labels
✔ Larger figure size for better visibility
Conclusion
✅ sns.boxplot() is one of the best tools to visualize data distribution, identify outliers, and compare multiple categories.
✅ It provides statistical insights like medians, quartiles, and spread.
✅ It’s highly customizable, allowing adjustments to whiskers, notches, outliers, colors, and box width.
Mastering box plots will enhance your ability to analyze and interpret numerical data efficiently! 🚀