boxplot

Codes

sns.boxplot()

Notes

Box plot to display data distribution and outliers across categories.

Status
Done

Seaborn sns.boxplot() – Box Plot for Data Distribution & Outliers

sns.boxplot() is used to visualize the distribution of numerical data and detect outliers across different categories. It shows key statistical summaries like medians, quartiles, and extreme values.

1. General Syntax

python
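# Defaults shown here can differ between seaborn versions (e.g. dodge, fliersize).
sns.boxplot(
    data=None,
    x=None,
    y=None,
    hue=None,
    order=None,
    hue_order=None,
    orient=None,
    width=0.8,
    dodge=True,
    fliersize=5,
    linewidth=None,
    whis=1.5,
    # The remaining arguments are not seaborn parameters of their own;
    # they are forwarded as keyword arguments to matplotlib's box plot.
    notch=False,
    showcaps=True,
    showbox=True,
    showfliers=True,
    showmeans=False,
    meanline=False,
    meanprops=None,
    medianprops=None,
    whiskerprops=None,
    capprops=None,
    boxprops=None
)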

2. Understanding Box Plot Components

  • Median (Q2, 50th percentile) – The middle value of the dataset.
  • Interquartile Range (IQR) – The range between the 25th percentile (Q1) and the 75th percentile (Q3); the box spans this range.
  • Whiskers – By default, extend from the box to the most extreme data points within 1.5 × IQR of Q1 and Q3.
  • Outliers – Points beyond the whiskers.
  • Notch – Represents a confidence interval around the median.
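
The components listed above can be worked out by hand. The following is a minimal sketch using NumPy on a made-up sample (the values array is purely illustrative and is not taken from the diamonds data). It assumes NumPy's default linear-interpolation percentiles.

```python
import numpy as np

# Made-up sample with one obviously extreme value (illustrative only).
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences: the furthest a whisker may reach with the default whis=1.5.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences.
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything outside the fences is drawn as an individual outlier point.
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Note that the whiskers end at actual data points, not at the fences themselves, which is why a whisker can be shorter than 1.5 × IQR.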

3. Dataset Setup

We will use the diamonds dataset from Seaborn.

python
import seaborn as sns
import matplotlib.pyplot as plt

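# Load dataset (downloaded on first use, then cached locally)
data = sns.load_dataset("diamonds")
print(data.head())  # print() so the preview also shows when run as a script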

Dataset Columns

  • cut: Quality of the cut (Fair, Good, Very Good, Premium, Ideal).
  • color: Diamond color.
  • clarity: Clarity levels (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF).
  • carat: Carat weight of the diamond.
  • price: Price in US dollars.

First five rows (output of data.head(), columns carat through price):

carat  cut      color  clarity  depth  table  price
0.23   Ideal    E      SI2      61.5   55     326
0.21   Premium  E      SI1      59.8   61     326
0.23   Good     E      VS1      56.9   65     327
0.29   Premium  I      VS2      62.4   58     334
0.31   Good     J      SI2      63.3   58     335

4. Basic Box Plot

Plot price distribution by diamond cut

python
sns.boxplot(data=data, x="cut", y="price")
plt.title("Box Plot of Price by Cut")
plt.show()

Explanation

  • The box spans Q1 to Q3, with a line inside marking the median (Q2).
  • The whiskers extend to the lowest and highest values that lie within 1.5 × IQR of the box, so they exclude outliers.
  • Outliers are shown as individual points beyond the whiskers.
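
To read exact numbers off a box plot, the same quartiles can be computed per category with pandas. This is a minimal sketch on synthetic data (the df frame and its values are made up as a stand-in for the diamonds dataset, so it runs without a download).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the diamonds data (values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cut": rng.choice(["Fair", "Good", "Ideal"], size=300),
    "price": rng.lognormal(mean=8, sigma=0.5, size=300).round(2),
})

# One row per category: the three values each box is drawn from.
summary = df.groupby("cut")["price"].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ["Q1", "median", "Q3"]
summary["IQR"] = summary["Q3"] - summary["Q1"]

print(summary)
```

With the real dataset, replacing df with data should give the table that corresponds to the boxes in the plot.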

5. Customizing Whisker Length (whis)

Extend whiskers to 2 × IQR beyond the quartiles

python
sns.boxplot(data=data, x="cut", y="price", whis=2)
plt.title("Box Plot with Adjusted Whiskers (whis=2)")
plt.show()

Explanation

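  • whis sets the maximum whisker length as a multiple of the IQR. With the default whis=1.5, points more than 1.5 × IQR from the box are drawn as outliers.
  • Raising it to whis=2 lengthens the whiskers, so fewer points are flagged as outliers. It does not guarantee any fixed share of the data, such as 95%.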
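
To see how whis changes what counts as an outlier, the sketch below uses matplotlib's cbook.boxplot_stats helper on a synthetic right-skewed sample (made-up values standing in for prices). It also assumes that a (low, high) pair passed as whis is interpreted as percentiles, which is how recent matplotlib and seaborn releases document it; that form is the one that actually targets a fixed share of the data.

```python
import numpy as np
from matplotlib import cbook

# Synthetic right-skewed sample standing in for prices (made-up values).
rng = np.random.default_rng(42)
sample = rng.lognormal(mean=8, sigma=0.6, size=1000)


def outlier_share(whis):
    """Fraction of points that fall outside the whiskers for a given whis."""
    stats = cbook.boxplot_stats(sample, whis=whis)[0]
    return len(stats["fliers"]) / len(sample)


share_default = outlier_share(1.5)       # whiskers limited to 1.5 x IQR
share_wider = outlier_share(2)           # whiskers limited to 2 x IQR
share_pct = outlier_share((2.5, 97.5))   # whiskers at fixed percentiles

print(share_default, share_wider, share_pct)
```

Under the same assumption, sns.boxplot(data=data, x="cut", y="price", whis=(2.5, 97.5)) would place the whiskers so that roughly 95% of each group lies between them.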

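6. Hiding Outliers (fliersize=0)

python
sns.boxplot(data=data, x="cut", y="price", fliersize=0)
plt.title("Box Plot with Outlier Markers Hidden")
plt.show()

Explanation

  • fliersize=0 shrinks the outlier markers to size zero, so they are not visible. The underlying data is unchanged, and the axis still spans the hidden points.
  • Passing showfliers=False instead skips drawing the outliers altogether.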
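
A side-by-side comparison makes the difference between zero-size markers and not drawing outliers at all easier to see. This is a sketch on synthetic data with two planted outliers (the df frame and its values are made up); showfliers is the matplotlib keyword already listed in the general syntax.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with two planted outliers (made-up values).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": ["A"] * 50 + ["B"] * 50,
    "value": np.concatenate(
        [rng.normal(10, 1, 49), [40], rng.normal(12, 1, 49), [45]]
    ),
})

fig, (ax_zero, ax_off) = plt.subplots(1, 2, figsize=(10, 4))

# Markers shrunk to size 0: the points still exist and still stretch the y-axis.
sns.boxplot(data=df, x="group", y="value", fliersize=0, ax=ax_zero)
ax_zero.set_title("fliersize=0")

# Outliers not drawn at all: the y-axis only has to fit the whiskers.
sns.boxplot(data=df, x="group", y="value", showfliers=False, ax=ax_off)
ax_off.set_title("showfliers=False")

print(ax_zero.get_ylim(), ax_off.get_ylim())
```

Either way the outliers remain in the data; only their display changes. Remove the matplotlib.use("Agg") line and add plt.show() to view the figure interactively.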

7. Grouping by Hue (hue)

Compare price distribution for different diamond colors

python
sns.boxplot(data=data, x="cut", y="price", hue="color")
plt.title("Box Plot of Price by Cut and Color")
plt.show()

Explanation

  • hue="color" splits each cut into side-by-side boxes, one per diamond color, and adds a legend that maps box colors to the hue levels.

8. Using Notches for Confidence Intervals (notch=True)

python
sns.boxplot(data=data, x="cut", y="price", notch=True)
plt.title("Box Plot with Notches")
plt.show()

Explanation

  • Notches mark an approximate confidence interval around the median. If the notches of two boxes do not overlap, their medians likely differ.
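
The notch interval can be reproduced numerically. The sketch below assumes matplotlib's default (non-bootstrap) approximation, median ± 1.57 × IQR / √n, and cross-checks it against the cilo and cihi values returned by matplotlib's cbook.boxplot_stats helper. The sample is synthetic.

```python
import numpy as np
from matplotlib import cbook

# Made-up sample (illustrative only).
rng = np.random.default_rng(7)
sample = rng.normal(loc=100, scale=15, size=200)

q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1

# Approximate interval used for notches when no bootstrap is requested.
half_width = 1.57 * iqr / np.sqrt(len(sample))
notch_low, notch_high = median - half_width, median + half_width

# Cross-check against matplotlib's own summary statistics.
stats = cbook.boxplot_stats(sample)[0]
print(notch_low, stats["cilo"])
print(notch_high, stats["cihi"])
```

Because the half-width shrinks with √n, large groups such as those in the diamonds data tend to produce very narrow notches.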

9. Controlling Box Width (width)

python
sns.boxplot(data=data, x="cut", y="price", width=0.5)
plt.title("Narrow Box Plot")
plt.show()

Explanation

  • width sets how wide each box is drawn within its category slot (default is 0.8); smaller values leave more space between boxes.

10. Styling Box Plot Elements

Customizing median, box, and whiskers

python
sns.boxplot(
    data=data,
    x="cut",
    y="price",
    medianprops={"color": "red", "linewidth": 2},
    boxprops={"facecolor": "lightblue", "edgecolor": "black"},
    whiskerprops={"linewidth": 2, "linestyle": "--"},
    capprops={"color": "blue", "linewidth": 2}
)
plt.title("Styled Box Plot")
plt.show()

Explanation

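  • medianprops: Sets the color and thickness of the median line.
  • boxprops: Sets the box fill (facecolor) and outline (edgecolor).
  • whiskerprops: Sets the whisker line width and style (dashed here).
  • capprops: Sets the color and width of the caps at the whisker ends.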
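
The general syntax also lists showmeans and meanprops, which follow the same pattern as the other *props arguments. Below is a sketch on synthetic skewed data (the df frame is made up) that adds a mean marker next to the median line; the marker settings are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic right-skewed data (made-up values).
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 100),
    "value": rng.lognormal(mean=2, sigma=0.5, size=300),
})

fig, (ax_plain, ax_means) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

sns.boxplot(data=df, x="group", y="value", ax=ax_plain)
ax_plain.set_title("Median only")

# showmeans adds one marker per box; meanprops styles that marker.
sns.boxplot(
    data=df, x="group", y="value", ax=ax_means,
    showmeans=True,
    meanprops={"marker": "o", "markerfacecolor": "white",
               "markeredgecolor": "black", "markersize": 6},
)
ax_means.set_title("Median line plus mean marker")
```

On skewed data such as prices, the mean marker typically sits above the median line, which gives a quick visual cue about skewness.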

11. Multiple Box Plots Using hue

Comparing price by cut and clarity

python
sns.boxplot(data=data, x="cut", y="price", hue="clarity")
plt.title("Box Plot of Price by Cut and Clarity")
plt.show()

Explanation

  • Draws one box per clarity level within each cut. With 8 clarity levels and 5 cuts this gives up to 40 boxes, so a wider figure helps readability.

12. Hiding the Box (showbox=False)

python
sns.boxplot(data=data, x="cut", y="price", showbox=False)
plt.title("Box Plot Without Boxes")
plt.show()

Explanation

  • showbox=False hides the entire box (fill and border), leaving the median line, whiskers, caps, and outliers.

13. Adjusting Figure Size

python
plt.figure(figsize=(10, 6))
sns.boxplot(data=data, x="cut", y="price")
plt.title("Larger Box Plot")
plt.show()

Explanation

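  • plt.figure(figsize=(10, 6)) creates a 10 × 6 inch figure. Call it before sns.boxplot() so the plot is drawn into that figure; the extra room helps when there are many categories.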
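
An alternative to plt.figure() is to create the figure and axes explicitly and hand the axes to seaborn through its ax argument. This is a sketch on a small made-up frame (df is illustrative only); it keeps a handle on both objects, which is convenient when several plots share a script.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Small made-up dataset (illustrative only).
rng = np.random.default_rng(5)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C", "D"], 50),
    "value": rng.normal(loc=50, scale=10, size=200),
})

# Create the figure and axes first, then draw into them.
fig, ax = plt.subplots(figsize=(10, 6))
drawn_ax = sns.boxplot(data=df, x="group", y="value", ax=ax)
ax.set_title("Larger Box Plot")

print(fig.get_size_inches())
```

sns.boxplot() returns the axes it drew on, so drawn_ax and ax refer to the same object here.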

14. Rotating X-axis Labels

python
sns.boxplot(data=data, x="cut", y="price")
plt.xticks(rotation=45)
plt.title("Box Plot with Rotated Labels")
plt.show()

Explanation

  • plt.xticks(rotation=45) rotates the x-axis tick labels by 45° so that long category names do not overlap.
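
Rotated labels are centered under their ticks by default, which can make it hard to tell which box a label belongs to. One option, sketched below on made-up data (the df frame is illustrative; the cut names are borrowed from the dataset description), is to right-align the labels with plt.setp so that each label ends at its tick.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up prices for the five cut categories (illustrative only).
cuts = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
rng = np.random.default_rng(11)
df = pd.DataFrame({
    "cut": np.repeat(cuts, 40),
    "price": rng.lognormal(mean=8, sigma=0.5, size=200),
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(data=df, x="cut", y="price", ax=ax)

# Rotate the labels and anchor their right edge at the tick.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
fig.tight_layout()  # make room so the rotated labels are not cut off
```

The same two lines should work with the real diamonds plot in place of the synthetic frame.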

15. Final Example: Fully Customized Box Plot

python
plt.figure(figsize=(12, 6))
sns.boxplot(
    data=data,
    x="cut",
    y="price",
    hue="color",
    notch=True,
    width=0.7,
    medianprops={"color": "black", "linewidth": 2},
    boxprops={"edgecolor": "black"},  # facecolor left unset so the hue colors still fill the boxes
    whiskerprops={"linewidth": 2, "linestyle": "--"},
    capprops={"color": "red", "linewidth": 2},
    fliersize=4
)
plt.xticks(rotation=45)
plt.title("Customized Box Plot of Diamond Price by Cut and Color")
plt.show()

Features in This Plot

✔ Hue by color

✔ Notched box plot

✔ Custom box, whiskers, and median line styling

✔ Slightly narrower boxes (width=0.7) and rotated x-axis labels

✔ Larger figure size for better visibility

Conclusion

✅ sns.boxplot() gives a compact view of a numerical distribution, highlights outliers, and makes comparisons across categories straightforward.

✅ It provides statistical insights like medians, quartiles, and spread.

✅ It is highly customizable: whisker length, notches, outlier display, box width, and the styling of each element can all be adjusted.

Mastering box plots will enhance your ability to analyze and interpret numerical data efficiently! 🚀