Nasim — Default Prediction


R · ISLR · Train/test split · Threshold trade-offs · 2025

Overview

Goal: predict whether a customer defaults (default = Yes) using balance, income, and student, and compare Logistic Regression against LDA.
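The setup can be sketched roughly as follows. This assumes the Default data set from the ISLR package, a 70/30 random split, and an arbitrary seed; the page states neither the split ratio nor the seed, so the numbers it produces will not necessarily match the reported results.

```r
library(ISLR)  # provides the Default data set
library(MASS)  # provides lda()

set.seed(1)  # assumed seed, not taken from the original analysis

# Assumed 70/30 train/test split
n <- nrow(Default)
train_idx <- sample(n, size = round(0.7 * n))
train <- Default[train_idx, ]
test  <- Default[-train_idx, ]

# Logistic regression: models P(default = "Yes")
logit_fit <- glm(default ~ balance + income + student,
                 data = train, family = binomial)

# Linear discriminant analysis with the same predictors
lda_fit <- lda(default ~ balance + income + student, data = train)

# Predicted probability of default on the test set
p_logit <- predict(logit_fit, newdata = test, type = "response")
p_lda   <- predict(lda_fit, newdata = test)$posterior[, "Yes"]
```

Both `p_logit` and `p_lda` are vectors of probabilities, so the same cutoff logic can be applied to either model.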

Key idea: default is rare (~3–4%), so the probability cutoff you choose matters. Lower cutoffs flag more accounts as "high risk" and catch more true defaulters, at the cost of more false alarms and lower overall accuracy.

Method

Reporting focuses on two rates: the predicted default rate (the share of test accounts flagged as likely defaulters) and the caught defaulter rate (the share of true defaulters that are flagged, i.e. sensitivity).
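A minimal sketch of how these rates could be computed from predicted probabilities. The function name `cutoff_metrics`, the toy vectors, and the choice of `>=` (rather than `>`) at the cutoff are assumptions for illustration.

```r
# Hypothetical helper: metrics for one probability cutoff
cutoff_metrics <- function(prob, truth, cutoff, positive = "Yes") {
  flagged <- prob >= cutoff      # accounts flagged as likely defaulters
  actual  <- truth == positive   # accounts that truly defaulted
  c(
    accuracy               = mean(flagged == actual),
    error                  = mean(flagged != actual),
    predicted_default_rate = mean(flagged),
    caught_defaulter_rate  = sum(flagged & actual) / sum(actual)
  )
}

# Toy data, not from the Default data set
prob  <- c(0.02, 0.07, 0.12, 0.60, 0.04, 0.30)
truth <- c("No", "No", "Yes", "Yes", "No", "No")

# One column of metrics per cutoff
metrics <- sapply(c(0.50, 0.10, 0.05),
                  function(k) cutoff_metrics(prob, truth, k))
print(round(metrics, 3))
```

On the toy data, lowering the cutoff from 0.50 to 0.10 raises the caught defaulter rate from 0.5 to 1 while the predicted default rate triples.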

Results

Baseline: predicting "No default" for everyone gives ~96.37% accuracy (test default rate ~3.63%).
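The baseline arithmetic, as a quick check using only the rates reported on this page (variable names are illustrative):

```r
test_default_rate <- 0.0363   # reported share of defaulters in the test set

# Predicting "No" for everyone is wrong exactly on the defaulters
baseline_accuracy <- 1 - test_default_rate

# Reported logistic regression accuracy at cutoff 0.50
logit_accuracy_050 <- 0.9720
gain_over_baseline <- logit_accuracy_050 - baseline_accuracy

print(round(100 * c(baseline = baseline_accuracy,
                    gain     = gain_over_baseline), 2))
```

The gain over the baseline is under one percentage point, which is why accuracy alone says little about how well defaulters are caught.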

Model     Cutoff  Test accuracy  Test error  Predicted default rate  Caught defaulter rate
Logistic  0.50    97.20%         2.80%       1.50%                   32.11%
Logistic  0.10    93.53%         6.47%       8.10%                   72.48%
Logistic  0.05    89.97%         10.03%      12.53%                  84.40%
LDA       0.50    97.10%         2.90%       1.13%                   25.69%
LDA       0.10    93.40%         6.60%       8.23%                   72.48%
LDA       0.05    88.43%         11.57%      14.07%                  84.40%
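One way to read the trade-off in the table is to ask what share of flagged accounts are true defaulters (precision). This is not reported above, but it can be approximated from the reported rates as caught defaulter rate × test default rate / predicted default rate. The sketch below does that; the results are approximate because the inputs are rounded percentages.

```r
test_default_rate <- 0.0363   # reported test default rate

# Rates copied from the results table
results <- data.frame(
  model  = rep(c("Logistic", "LDA"), each = 3),
  cutoff = rep(c(0.50, 0.10, 0.05), times = 2),
  predicted_default_rate = c(1.50, 8.10, 12.53, 1.13, 8.23, 14.07) / 100,
  caught_defaulter_rate  = c(32.11, 72.48, 84.40, 25.69, 72.48, 84.40) / 100
)

# Precision = flagged true defaulters / all flagged accounts
results$precision <- round(
  results$caught_defaulter_rate * test_default_rate /
    results$predicted_default_rate,
  3
)

print(results)
```

Under these assumptions, roughly three of four accounts flagged by logistic regression at cutoff 0.50 are true defaulters, against roughly one in four at cutoff 0.05.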

Takeaways

Plots

[Figure: histogram of logistic regression predicted probabilities on the test set. Vertical lines mark the cutoffs 0.50, 0.10, and 0.05.]
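A plot of this kind could be drawn with base R graphics along the following lines. The probabilities here are simulated placeholders with a similar right-skewed shape, not the model's actual predictions.

```r
set.seed(1)

# Placeholder probabilities, mostly near 0 (NOT the fitted model's output)
p_hat <- rbeta(3000, shape1 = 0.3, shape2 = 8)

cutoffs <- c(0.50, 0.10, 0.05)

h <- hist(p_hat, breaks = seq(0, 1, by = 0.02),
          main = "Predicted probability of default (simulated)",
          xlab = "Predicted probability")

# Dashed vertical lines at the three cutoffs
abline(v = cutoffs, lty = 2)
```

With real model output, `p_hat` would be replaced by the test-set predicted probabilities.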

[Figure: histogram of LDA predicted probabilities on the test set. Most predictions are near 0 because default is rare.]


Note: results depend on the random train/test split. A next step would be cross-validation or a validation set for selecting cutoffs.
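A rough sketch of what that next step might look like for logistic regression: 5-fold cross-validation that averages the error and the two reported rates at each cutoff. The number of folds, the seed, and all variable names are assumptions, and the same loop could be repeated with `MASS::lda`.

```r
library(ISLR)  # provides the Default data set

set.seed(1)                      # assumed seed
k <- 5                           # assumed number of folds
cutoffs <- c(0.50, 0.10, 0.05)

# Randomly assign each row to one of k folds
fold <- sample(rep(seq_len(k), length.out = nrow(Default)))

cv_one_fold <- function(i) {
  train <- Default[fold != i, ]
  held  <- Default[fold == i, ]
  fit <- glm(default ~ balance + income + student,
             data = train, family = binomial)
  p <- predict(fit, newdata = held, type = "response")
  actual <- held$default == "Yes"
  # One column of metrics per cutoff
  sapply(cutoffs, function(cut) {
    flagged <- p >= cut
    c(error                  = mean(flagged != actual),
      predicted_default_rate = mean(flagged),
      caught_defaulter_rate  = sum(flagged & actual) / sum(actual))
  })
}

# Average the per-fold metric matrices
fold_results <- lapply(seq_len(k), cv_one_fold)
cv_summary <- Reduce(`+`, fold_results) / k
colnames(cv_summary) <- paste0("cutoff_", cutoffs)

print(round(100 * cv_summary, 2))
```

Averaging over folds reduces the dependence on any single random split, so the comparison between cutoffs is less sensitive to which accounts happen to land in the test set.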
