A Decision Tree is a supervised machine learning algorithm used for both classification and regression tasks. It models decisions based on a tree-like structure where each internal node represents a feature (attribute), each branch represents a decision rule, and each leaf node represents an outcome (class label or regression value).
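To make the structure concrete, a fitted tree behaves like a nest of if/else rules. The sketch below is a toy illustration with hand-picked thresholds (not learned from data), loosely modeled on the Iris flowers used later:

# Toy decision tree written as nested rules: each if-test is an internal
# node, each branch is a decision rule, each return is a leaf (class label).
# Thresholds are invented for illustration, not learned from data.
def classify_flower(petal_length, petal_width):
    if petal_length < 2.5:       # internal node: test on petal length
        return "setosa"          # leaf node: class label
    elif petal_width < 1.8:      # internal node: test on petal width
        return "versicolor"
    else:
        return "virginica"

print(classify_flower(1.4, 0.2))  # -> setosa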
The typical workflow is shown below with scikit-learn: load a dataset, split it into training and test sets, fit the tree, predict, and evaluate the accuracy.
# Import libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load dataset
dataset = load_iris()
X, y = dataset.data, dataset.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Create and train the model
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)
# Make predictions
y_predict = classifier.predict(X_test)
# Evaluate accuracy
print("Accuracy:", accuracy_score(y_test, y_predict))
# Visualize the tree
from sklearn import tree
import matplotlib.pyplot as plt
tree.plot_tree(classifier, feature_names=dataset.feature_names,
               class_names=dataset.target_names, filled=True)
plt.show()
Output:
Accuracy: 0.93
The exact value varies from run to run because train_test_split is not seeded.
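Besides the plot, the learned split rules can be printed as plain text with export_text, which scikit-learn provides alongside plot_tree; a minimal sketch:

# Text rendering of the fitted tree's split rules
from sklearn.tree import export_text
print(export_text(classifier, feature_names=list(dataset.feature_names)))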
# Scatter plot for actual vs predicted values
import matplotlib.pyplot as plt
plt.scatter(y_test, y_predict, alpha=0.7, color='blue')
plt.title('Actual vs Predicted Values')
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.grid(True)
plt.axis('equal')
plt.show()
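Because the iris targets are discrete class labels rather than continuous values, the scatter plot mostly shows overlapping points; a confusion matrix is often the more informative view. A minimal sketch, assuming scikit-learn >= 1.0 (where ConfusionMatrixDisplay.from_predictions is available):

# Confusion matrix: rows are actual classes, columns are predicted classes
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_predictions(y_test, y_predict,
                                        display_labels=dataset.target_names)
plt.show()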
| Advantages | Disadvantages |
|---|---|
| Interpretability: Easy to understand and visualize. | Overfitting: Can become complex and overfit if not pruned. |
| Non-linear Relationships: Models non-linear relationships effectively. | Instability: Small changes in the data can result in different splits. |
| Feature Importance: Provides insights into the importance of features. | Bias: May favor features with more levels, leading to suboptimal splits. |
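The overfitting and feature-importance rows above map directly onto the estimator's API. A hedged sketch; the pruning values below are illustrative, not tuned:

# Limit tree growth to curb overfitting (hyperparameter values are illustrative)
pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, ccp_alpha=0.01)
pruned.fit(X_train, y_train)
print("Pruned accuracy:", accuracy_score(y_test, pruned.predict(X_test)))

# Per-feature importance scores learned by the original tree
for name, score in zip(dataset.feature_names, classifier.feature_importances_):
    print(f"{name}: {score:.3f}")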