🔍 Study Note: sklearn.tree
The sklearn.tree module provides decision tree based models for classification and regression, along with utilities for exporting and plotting fitted trees. See the Decision Trees section of the scikit-learn user guide (scikit-learn.org) for further details.

The sklearn.tree module in Scikit-learn is primarily used for decision tree algorithms and associated utilities. Below is a detailed overview of the available classes, functions, and attributes under sklearn.tree.
The module contains two major model families, each with a classifier and a regressor variant:
| Category | Class | Description |
| --- | --- | --- |
| 1. Decision Tree Models | DecisionTreeClassifier | A decision tree classifier. |
| 1. Decision Tree Models | DecisionTreeRegressor | A decision tree regressor. |
| 2. Extra Tree Models | ExtraTreeClassifier | An extremely randomized tree classifier. |
| 2. Extra Tree Models | ExtraTreeRegressor | An extremely randomized tree regressor. |
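As a quick orientation, here is a minimal sketch (using iris and diabetes purely as stand-in datasets) showing that all four classes share the same fit/predict interface; the Extra Tree variants differ in that they draw split thresholds at random:

from sklearn.datasets import load_iris, load_diabetes
from sklearn.tree import (DecisionTreeClassifier, DecisionTreeRegressor,
                          ExtraTreeClassifier, ExtraTreeRegressor)

X_clf, y_clf = load_iris(return_X_y=True)
X_reg, y_reg = load_diabetes(return_X_y=True)

# Classification: standard vs. extremely randomized single tree
dt_clf = DecisionTreeClassifier(random_state=0).fit(X_clf, y_clf)
et_clf = ExtraTreeClassifier(random_state=0).fit(X_clf, y_clf)

# Regression: same interface, continuous targets
dt_reg = DecisionTreeRegressor(random_state=0).fit(X_reg, y_reg)
et_reg = ExtraTreeRegressor(random_state=0).fit(X_reg, y_reg)

print(dt_clf.predict(X_clf[:2]), et_clf.predict(X_clf[:2]))
print(dt_reg.predict(X_reg[:2]), et_reg.predict(X_reg[:2]))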
The full API reference for sklearn.tree is organized into the following sections:
- Classes
  - 1. Decision Tree Models
  - 2. Extra Tree Models
- Functions: export_graphviz, export_text, plot_tree (see the export_text sketch below)
- Attributes
- Modules for Integration
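export_graphviz and plot_tree are covered in detail in the sections that follow; as a quick illustration of the remaining exporting helper, here is a minimal sketch using export_text (iris is only a stand-in dataset):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Plain-text rendering of the learned splits, one line per node
print(export_text(clf, feature_names=list(iris.feature_names)))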
Quick Summary
Explanation of Key Parameters
export_graphviz parameters:
- clf: The trained decision tree model.
- out_file=None: Returns the DOT data as a string instead of saving it to a file.
- feature_names: Names of the features used in training the model.
- class_names: Names of the classes for classification.
- filled=True: Adds colors to the nodes for better readability.
- rounded=True: Makes the edges and corners of the nodes rounded.
- special_characters=True: Supports special characters (e.g., in feature names).

Graphviz object:
- graphviz.Source: Converts DOT data into a graph object that can be displayed or saved.
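A minimal sketch of these parameters in use, with iris as a stand-in dataset (the output filename "iris_tree" is just an example):

import graphviz
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

dot_data = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True,
    rounded=True,
    special_characters=True,
)

graph = graphviz.Source(dot_data)        # wrap the DOT string in a renderable object
graph.render("iris_tree", format="png")  # writes iris_tree.png (requires Graphviz installed)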
Visualization Output
- Node colors: Nodes are color-coded to represent purity (e.g., Gini or entropy values); see the sketch after this list for where these values live on the fitted model.
- Feature names: Each split is annotated with the corresponding feature.
- Class names: Leaf nodes show the predicted class.
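The values behind these annotations come from the fitted estimator's tree_ attribute; a small sketch, with iris used purely for illustration:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

t = clf.tree_
print(t.impurity[:5])   # per-node impurity (Gini/entropy) -- what the node colors encode
print(t.feature[:5])    # feature index used at each internal split (-2 marks a leaf)
print(np.argmax(t.value, axis=-1).ravel()[:5])  # majority class per node -> leaf class labels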
 
Notes
- Graphviz installation: You must have Graphviz installed on your system for graphviz.Source to work. Install it via your package manager:

# On Ubuntu/Debian:
sudo apt-get install graphviz
# On macOS:
brew install graphviz

- Python library installation: Install the Python wrapper for Graphviz:

pip install graphviz

- Alternative: Use tree.plot_tree() for a Matplotlib-based visualization (no Graphviz required):

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()
Worked Example: End-to-End Decision Tree Training and Visualization

# Step 1: Import Necessary Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn import tree
import graphviz
# Step 2: Load and Prepare Dataset
# Assume `df` is your dataframe and 'status' is the target variable.
# Modify the dataset loading as per your specific project.
# Example: df = pd.read_csv('your_dataset.csv')
# Split the dataset into train and test datasets
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=11)
# Further split the train dataset into train and validation datasets
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=11)
# Reset indices
df_train = df_train.reset_index(drop=True)
df_val = df_val.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)
# Extract target variables
y_train = (df_train.status == 'default').astype('int').values
y_val = (df_val.status == 'default').astype('int').values
y_test = (df_test.status == 'default').astype('int').values
# Drop the target variable from features
df_train = df_train.drop(columns='status')
df_val = df_val.drop(columns='status')
df_test = df_test.drop(columns='status')
# Convert the datasets to dictionaries
dict_train = df_train.to_dict(orient='records')
dict_val = df_val.to_dict(orient='records')
# Step 3: Vectorize Features
dv = DictVectorizer(sparse=False)
# Fit on training data and transform both train and validation datasets
X_train = dv.fit_transform(dict_train)
X_val = dv.transform(dict_val)
# Extract feature names
feature_names = dv.get_feature_names_out()
# Step 4: Train the Decision Tree Model
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
# Step 5: Evaluate Model Performance
# Compute ROC AUC Score
y_val_pred = clf.predict_proba(X_val)[:, 1]  # Predict probabilities for positive class
roc_score = roc_auc_score(y_val, y_val_pred)
print(f"Validation ROC AUC Score: {roc_score:.2f}")
# Step 6: Export the Decision Tree for Visualization
dot_data = tree.export_graphviz(
    clf,
    out_file=None,                      # No file output, return as string
    feature_names=feature_names,       # Use DictVectorizer feature names
    class_names=["not default", "default"],  # Modify according to your project
    filled=True,                       # Color nodes
    rounded=True,                      # Rounded edges
    special_characters=True            # Allow special characters
)
# Visualize the Decision Tree (graphviz.Source renders inline in a Jupyter notebook)
graph = graphviz.Source(dot_data)
graph
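Optional follow-up, sketched under the assumption that the splits from Steps 2-3 are in place: the validation set (X_val, y_val) can be used to tune hyperparameters such as max_depth; the depth grid below is arbitrary.

# Tune max_depth using the validation split defined above
for depth in [2, 3, 4, 5, 6, None]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    print(f"max_depth={depth}: validation ROC AUC = {auc:.3f}")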