Deep Learning And ML Systems
Created with Inkfluence AI
ML fundamentals, deep neural networks, recommendation systems, Spark, and REST/CICD
Table of Contents
- 1. Classification vs Regression: Decision Limits
- 2. Preprocessing Pipeline: Missing, Scaling, Encoding
- 3. Overfitting Control: Early Stopping & Regularization
- 4. Deep Neural Network Blueprint: Layer & Activation Design
- 5. Spark ALS Recommendation: Spark ML Pipeline
Preview: Classification vs Regression: Decision Limits
A short excerpt from “Classification vs Regression: Decision Limits”. The full book contains 5 chapters and 4,092 words.
Overview
When should a model output a label (class) versus a numeric value? This section uses the CRISP-Goal Fit Matrix to decide between classification and regression based on target semantics, error cost, and evaluation metrics, and it specifies practical loss/metric choices for MTech-level model engineering.
Quick Reference
CRISP-Goal Fit Matrix (classification vs regression)
- Choose Classification if your target is:
  - Discrete categories (e.g., churned / retained, plan_A / plan_B)
  - Ordinal classes with clear boundaries (map to ordered classes and use ordinal-aware losses if needed)
  - Uncertain outcomes where decision correctness matters (precision/recall)
- Choose Regression if your target is:
  - A continuous quantity (e.g., revenue, time-to-churn in days, probability-calibrated churn score)
  - A quantity where magnitude accuracy matters (MAE/MSE align with error cost)
- Common poor-fit indicators:
  - Using regression for discrete labels → unstable thresholds, misleading RMSE
  - Using classification for continuous targets → quantization error, capped resolution
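The matrix above can be approximated by a quick check on the target array. The sketch below is a minimal heuristic, not part of the book's method: the function name `suggest_task_type` and the `max_classes` cutoff are assumptions made for illustration, and target semantics should still override whatever it returns (for example, an integer count with few distinct values may still be a regression target).

```python
import numpy as np

def suggest_task_type(y, max_classes=20):
    """Hypothetical heuristic: suggest 'classification' or 'regression' from a target array."""
    y = np.asarray(y)
    # Non-numeric targets (strings, booleans, objects) are treated as categories.
    if y.dtype.kind in "OUSb":
        return "classification"
    n_unique = np.unique(y).size
    # Integer-valued targets with few distinct values look like class labels.
    is_integer_valued = bool(np.all(np.equal(np.mod(y, 1), 0)))
    if is_integer_valued and n_unique <= max_classes:
        return "classification"
    # Everything else is treated as a continuous quantity.
    return "regression"

print(suggest_task_type(["churned", "retained", "churned"]))
print(suggest_task_type([0, 1, 1, 0]))
print(suggest_task_type([12.5, 99.0, 3.75]))
```

The cutoff is deliberately crude: it only flags the obvious cases, and borderline targets need a decision based on error cost.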
Metric mapping
- Classification: Accuracy, F1, ROC-AUC, PR-AUC, Log loss
- Regression: MAE, RMSE, R², MAPE (avoid when target can be 0)
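As a rough illustration of the metric mapping, the sketch below computes a few of the listed metrics with scikit-learn on small hand-made arrays (the data is invented for the example). It also shows why MAPE is risky when a target can be 0: the metric divides by the absolute true value, so a single zero target dominates the result.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    log_loss,
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)

# Classification: metrics are computed from predicted probabilities.
y_true_cls = np.array([0, 0, 1, 1, 0, 1])
p_pred = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
pr_auc = average_precision_score(y_true_cls, p_pred)  # average precision, a common PR-AUC estimate
ll = log_loss(y_true_cls, p_pred)

# Regression: metrics are computed from predicted magnitudes.
y_true_reg = np.array([0.0, 10.0, 20.0, 30.0])  # note the zero target
y_pred_reg = np.array([1.0, 12.0, 18.0, 33.0])
mae = mean_absolute_error(y_true_reg, y_pred_reg)
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
r2 = r2_score(y_true_reg, y_pred_reg)
# MAPE divides by |y_true|; the zero target makes this value explode.
mape = mean_absolute_percentage_error(y_true_reg, y_pred_reg)

print("PR-AUC:", pr_auc, "log loss:", ll)
print("MAE:", mae, "RMSE:", rmse, "R2:", r2)
print("MAPE with a zero target:", mape)
```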
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `task_type` | `{"classification","regression"}` | Yes | Select target modeling type using CRISP-Goal Fit Matrix |
| `loss` | `str` | Yes | Classification: `binary_crossentropy`, `categorical_crossentropy`, `log_loss`; Regression: `mse`, `mae`, `huber_loss` |
| `metric` | `str` | Yes | Classification: `f1`, `roc_auc`, `pr_auc`; Regression: `mae`, `rmse`, `r2` |
| `threshold` | `float` | No | Decision boundary for classification (e.g., 0.5 default); tune on validation for desired recall/precision |
| `class_weight` | `dict[int,float]` | No | Balances imbalanced classes (e.g., churn vs non-churn) for classification loss weighting |
| `label_encoding` | `{"binary","one_hot","ordinal"}` | Yes (classification) | Defines label format expected by the loss |
| `output_activation` | `str` | Yes | Classification: `sigmoid` (binary) / `softmax` (multi-class); Regression: `linear` |
| `target_transform` | `{"none","log1p","standardize"}` | No | Regression-only transforms; use for heavy-tailed targets to stabilize gradients |
| `calibration` | `{"none","platt","isotonic"}` | No | Post-hoc probability calibration when thresholds must be reliable |
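One way to keep these parameters consistent is to validate them together before training. The sketch below is hypothetical: the `ModelConfig` class and the `ALLOWED` lookup are names introduced here for illustration, with the permitted values copied from the table above. It is one possible consistency check, not a prescribed API.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values are copied from the parameter table; the structure itself is an assumption.
ALLOWED = {
    "classification": {
        "loss": {"binary_crossentropy", "categorical_crossentropy", "log_loss"},
        "metric": {"f1", "roc_auc", "pr_auc"},
        "output_activation": {"sigmoid", "softmax"},
    },
    "regression": {
        "loss": {"mse", "mae", "huber_loss"},
        "metric": {"mae", "rmse", "r2"},
        "output_activation": {"linear"},
    },
}

@dataclass
class ModelConfig:
    task_type: str
    loss: str
    metric: str
    output_activation: str
    threshold: Optional[float] = None  # classification only
    target_transform: str = "none"     # regression only

    def validate(self) -> None:
        """Raise ValueError if the fields do not fit the chosen task_type."""
        if self.task_type not in ALLOWED:
            raise ValueError(f"unknown task_type: {self.task_type!r}")
        allowed = ALLOWED[self.task_type]
        for name in ("loss", "metric", "output_activation"):
            value = getattr(self, name)
            if value not in allowed[name]:
                raise ValueError(f"{name}={value!r} does not fit task_type={self.task_type!r}")
        if self.task_type == "regression" and self.threshold is not None:
            raise ValueError("threshold applies to classification only")
        if self.task_type == "classification" and self.target_transform != "none":
            raise ValueError("target_transform applies to regression only")
        if self.threshold is not None and not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must lie in [0, 1]")

ok = ModelConfig("classification", "binary_crossentropy", "pr_auc", "sigmoid", threshold=0.35)
ok.validate()  # a consistent config passes silently

try:
    # A regression loss paired with a classification task is rejected.
    ModelConfig("classification", "mse", "f1", "sigmoid").validate()
except ValueError as err:
    print(err)
```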
Code Example
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score, mean_absolute_error
from sklearn.linear_model import LogisticRegression, Ridge
# Assumption: churn labels are discrete {0,1}, so the target is a class, not a continuous quantity.
# Rule of thumb: discrete label -> classification; continuous label -> regression.
# ---------- Classification (binary) ----------
# Synthetic data: the features are random noise unrelated to y, so expect near-chance metrics.
rng = np.random.default_rng(42)
X = rng.standard_normal((5000, 20))
y = rng.binomial(1, 0.18, size=5000) # churned vs retained
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_train, y_train)
p_val = clf.predict_proba(X_val)[:, 1] # churn probability
threshold = 0.35 # tune on validation to meet recall/precision constraints
y_pred = (p_val >= threshold).astype(int)
print("ROC-AUC:", roc_auc_score(y_val, p_val))
print("F1 @ threshold:", f1_score(y_val, y_pred))
# ---------- Regression on discrete labels (poor-fit example) ----------
# If you mistakenly model discrete y with regression, you still need a threshold to decide churn.
# Ridge outputs unbounded numeric values, not probabilities, so that threshold is harder to interpret.
reg = Ridge(alpha=1.0)
reg.fit(X_train, y_train)
y_score = reg.predict(X_val)
print("MAE (regression loss on discrete labels):", mean_absolute_error(y_val, y_score))
Response Format
{
"task_decision": {
"task_type": "classification|regression",
"justification": {
"target_semantics": "discrete|continuous",
"decision_cost_alignment": "thresholded_decision|magnitude_error",
"poor_fit_indicator": "quantization_or_threshold_instability|misleading_magnitude_metrics"
}
},
"model_config": {
"loss": "binary_crossentropy|categorical_crossentropy|log_loss|mse|mae|huber_loss",
"metric": "f1|roc_auc|pr_auc|mae|rmse|r2",
"threshold": 0.5
},
"evaluation": {
"primary_metric": 0.0,
"secondary_metrics": {
"roc_auc": 0.0,
"pr_auc": 0.0,
"mae": 0.0
}
}
}
Notes & Best Practices
- Thresholding is not optional for classification: report the metric at the tuned `threshold` used for decisions (especially for churn).
- Avoid metric mismatch: do not use RMSE as the primary metric when your business objective is correct classification under class imbalance; use F1/PR-AUC and threshold selection....
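The threshold selection mentioned in these notes can be sketched with scikit-learn's `precision_recall_curve`. The helper `pick_threshold` and the `min_recall` constraint are assumptions introduced for illustration, and the validation arrays are invented; in practice `p_val` would come from a fitted classifier's `predict_proba` on held-out data.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def pick_threshold(y_true, p_pred, min_recall=0.8):
    """Return the threshold with the highest precision among those meeting min_recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, p_pred)
    # precision and recall have one more entry than thresholds; drop the final point.
    precision, recall = precision[:-1], recall[:-1]
    feasible = recall >= min_recall
    if not feasible.any():
        raise ValueError("no threshold reaches the requested recall")
    # Mask infeasible thresholds so argmax only considers those meeting the recall floor.
    best = np.argmax(np.where(feasible, precision, -1.0))
    return float(thresholds[best])

# Invented validation labels and predicted churn probabilities.
y_val = np.array([0, 0, 1, 0, 1, 1, 0, 1])
p_val = np.array([0.05, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.90])

threshold = pick_threshold(y_val, p_val, min_recall=0.75)
y_pred = (p_val >= threshold).astype(int)
print("chosen threshold:", threshold)
print("predicted churners:", int(y_pred.sum()))
```

Choosing the threshold on validation data and then reporting F1 or precision/recall at that same threshold keeps the reported metric aligned with the decision actually deployed.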
About this book
"Deep Learning And ML Systems" is a technical book by Anonymous with 5 chapters and approximately 4,092 words. ML fundamentals, deep neural networks, recommendation systems, Spark, and REST/CICD.