Use case: Regression tasks. Measures how far the predicted values ( \(\hat{y_i}\) ) are from the actual values ( \(y_i\) ) by averaging the square of the differences. The larger the difference, the higher the penalty, as squaring the difference emphasizes bigger errors.
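As a quick illustration (the values below are made up), MSE in NumPy:

```python
import numpy as np

# Hypothetical actual and predicted values
y = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.8, 5.4, 2.0, 8.0])

# MSE: average of the squared differences
mse = np.mean((y - y_hat) ** 2)
print(f"MSE: {mse}")
```

Note how the single error of 1.0 contributes far more to the average than the three smaller errors combined.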
Cross-Entropy (Classification)
\[H(p,q) = -\sum_{x}p(x)\log q(x)\]
Use case: Classification tasks. Compares the true probability distribution ( \(p(x)\) ) (often 0 or 1 for classification) with the predicted probability ( \(q(x)\) ). If the predicted probability is far from the true label, it gives a higher penalty. Logarithms are used to give more emphasis to confident but wrong predictions.
import numpy as np

# True labels (one-hot encoded) for 3 classes
p = np.array([
    [1, 0, 0],  # Class 1
    [0, 1, 0],  # Class 2
    [0, 0, 1],  # Class 3
    [1, 0, 0],  # Class 1
    [0, 1, 0],  # Class 2
])

# Predicted probabilities for each class (each row should sum to 1)
q = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
])

# Clip values to avoid log(0)
q = np.clip(q, 1e-12, 1 - 1e-12)

# Calculate cross-entropy for multiclass classification
cross_entropy = -np.sum(p * np.log(q))
Model Evaluation and Generalization
Train-Test Split
Split data into training and test sets
Train on training data
Evaluate on test data
Helps assess generalization
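The steps above can be sketched with scikit-learn's `train_test_split` (the data here is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 100 samples, 3 features (hypothetical values)
rng = np.random.default_rng(42)
X = rng.random((100, 3))
y = rng.integers(0, 2, size=100)

# 80/20 split; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```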
flowchart LR
A[("Dataset")]
B["Training Set<br>80%"]
C["Test Set<br>20%"]
D["Train the Model<br>Random Forest, XGBoost"]
E["Evaluate on Test Set<br>Accuracy, MSE, Precision"]
F["Assess Generalization<br>Performance on unseen data"]
A --> B
A --> C
B --> D
D --> E
C --> E
E --> F
classDef cool fill:#8EC5FC,stroke:#4A6FA5,stroke-width:2px,rx:10,ry:10;
classDef warm fill:#FBC2EB,stroke:#A66E98,stroke-width:2px,rx:10,ry:10;
classDef neutral fill:#E0EAFC,stroke:#8B9FBF,stroke-width:2px,rx:10,ry:10;
class A,D cool;
class B,C warm;
class E,F neutral;
Cross-Validation
K-fold cross-validation
Helps in robust model evaluation
Reduces overfitting
# Example code (not executed)
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=5)
print(f"Cross-validation scores: {scores}")
print(f"Mean score: {scores.mean():.2f}")
Bias-Variance Tradeoff
Bias: Error from overly simplistic models (underfitting)
Variance: Error from models too sensitive to training data (overfitting)
Total Error = BiasΒ² + Variance + Irreducible Error
Goal: Find the optimal balance between bias and variance for the lowest possible total error.
Mathematical Representation of Bias-Variance
For a given point \(x\), the expected prediction error is:

\[\mathbb{E}\big[(y - \hat{f}(x))^2\big] = [\text{Bias}(\hat{f}(x))]^2 + \text{Var}(\hat{f}(x)) + \text{Var}(\epsilon)\]

Where:

- \(\text{Var}(\hat{f}(x))\) is the variance
- \([\text{Bias}(\hat{f}(x))]^2\) is the squared bias
- \(\text{Var}(\epsilon)\) is the irreducible error
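The decomposition can be illustrated numerically. A minimal Monte Carlo sketch (the estimators and values are illustrative assumptions, not from the slides): estimating \(f(x)\) with a plain sample mean versus a shrunken mean trades variance for bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# True value of f(x) at one point; noise has variance 1 (irreducible error)
true_value, noise_std, n, trials = 2.0, 1.0, 10, 20000

# Two estimators of f(x): the sample mean (low bias, higher variance)
# and a shrunken mean (higher bias, lower variance)
preds_mean = np.empty(trials)
preds_shrunk = np.empty(trials)
for t in range(trials):
    sample = true_value + rng.normal(0.0, noise_std, n)
    preds_mean[t] = sample.mean()
    preds_shrunk[t] = 0.5 * sample.mean()  # shrink toward 0

for name, preds in [("sample mean", preds_mean), ("shrunken mean", preds_shrunk)]:
    bias2 = (preds.mean() - true_value) ** 2
    var = preds.var()
    print(f"{name}: bias^2 = {bias2:.3f}, variance = {var:.3f}")
```

The sample mean shows near-zero bias with variance about \(1/n = 0.1\); the shrunken mean has much lower variance but a large squared bias.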
Regularization: Preventing Overfitting
Overfitting: When a model is too complex and memorizes training data, leading to poor performance on new data.
Regularization: Technique to balance model complexity and performance, making the model generalize better.
Common Regularization Methods:
Lasso (L1): Sets some coefficients to zero, ignoring less important features.
Ridge (L2): Shrinks all coefficients, but keeps all features.
Elastic Net: Combines both Lasso and Ridge regularization.
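A minimal sketch of the three methods in scikit-learn (the data and penalty strengths are illustrative): on data where only the first two of five features matter, Lasso drives the uninformative coefficients to exactly zero, while Ridge merely shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

# Synthetic data: 5 features, only the first two are informative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print("Lasso:      ", lasso.coef_)  # uninformative coefficients driven to 0
print("Ridge:      ", ridge.coef_)  # all shrunk, but kept
print("Elastic Net:", enet.coef_)
```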
graph TD
A[Study Hours?] -->|Yes| B[Attendance?]
A -->|No| C[Fail]
B -->|High| D[Pass]
B -->|Low| E[Fail]
style A fill:#FFDDC1,stroke:#FFB3B3,stroke-width:2px;
style B fill:#FFD9E8,stroke:#FFB3B3,stroke-width:2px;
style C fill:#C7CEEA,stroke:#A3B3FF,stroke-width:2px;
style D fill:#C7F9CC,stroke:#B2F2BB,stroke-width:2px;
style E fill:#FFC9DE,stroke:#FFA1B5,stroke-width:2px;
Neural Networks
Multi-layer Perceptron (MLP)
Layers of interconnected neurons
Activation functions: ReLU, Sigmoid, Tanh
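The three activation functions can be written in a few lines of NumPy (ReLU and sigmoid defined by hand, tanh taken from NumPy):

```python
import numpy as np

# Common MLP activation functions
def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))      # [0. 0. 2.]
print(sigmoid(x))   # values in (0, 1), sigmoid(0) = 0.5
print(np.tanh(x))   # values in (-1, 1), tanh(0) = 0
```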
Recurrent Neural Networks (RNN)
Designed for sequential data
Long Short-Term Memory (LSTM) units
Transformers
Attention mechanisms for long-range dependencies
Used in NLP tasks (e.g., BERT, GPT), and increasingly in other domains such as vision and audio
Graph Neural Networks (GNN)
For graph-structured data
Applications in social networks, biology
Practical Considerations
Feature Engineering
Creating relevant features: Deriving new features to improve model performance.
Handling missing data: Filling or removing incomplete data to ensure a robust model.
Scaling and normalization: Ensuring features are on similar scales for algorithms that rely on distance (e.g., k-NN, SVM).
Example: One-Hot Encoding for Categorical Variables
Transforms categorical features into a numerical format that models can understand.
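A minimal sketch with pandas (the feature and values are hypothetical):

```python
import pandas as pd

# Hypothetical categorical feature
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encode: one binary column per category
encoded = pd.get_dummies(df, columns=["color"])
print(encoded)
```

Each row now has a 1 (True) in exactly one of the `color_*` columns, so distance-based and linear models can consume the feature directly.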