Deep Learning And ML Systems

by Anonymous · Published 2026-05-24

5 chapters · 4,092 words · ~16 min read · English

ML fundamentals, deep neural networks, recommendation systems, Spark, and REST/CI/CD

1. Classification vs Regression: Decision Limits
2. Preprocessing Pipeline: Missing, Scaling, Encoding
3. Overfitting Control: Early Stopping & Regularization
4. Deep Neural Network Blueprint: Layer & Activation Design
5. Spark ALS Recommendation: Spark ML Pipeline

Preview: Classification vs Regression: Decision Limits

A short excerpt from “Classification vs Regression: Decision Limits”. The full book contains 5 chapters and 4,092 words.

Overview

When should a model output a label (class) versus a numeric value? This section uses the CRISP-Goal Fit Matrix to decide between classification and regression based on target semantics, error cost, and evaluation metrics, and it specifies practical loss/metric choices for MTech-level model engineering.

Quick Reference

CRISP-Goal Fit Matrix (classification vs regression)

Choose Classification if your target is:
Discrete categories (e.g., churned / retained, plan_A / plan_B)
Ordinal classes with clear boundaries (map to ordered classes and use ordinal-aware losses if needed)
An uncertain outcome where decision correctness matters more than magnitude (evaluate with precision/recall)
Choose Regression if your target is:
A continuous quantity (e.g., revenue, time-to-churn in days, a calibrated churn probability score)
A quantity whose magnitude must be accurate (MAE/MSE align with error cost)
Common poor-fit indicators:
Using regression for discrete labels → unstable thresholds, misleading RMSE
Using classification for continuous targets → quantization error, capped resolution
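The matrix above can be approximated with a small heuristic. The sketch below is illustrative only: `suggest_task_type` and its `max_classes` cutoff are hypothetical names, not part of any library, and a dtype check is no substitute for reasoning about target semantics and error cost.

```python
import numpy as np

def suggest_task_type(y, max_classes=20):
    """Heuristic guess at the task type from target values (hypothetical helper)."""
    y = np.asarray(y)
    # Strings, booleans and generic objects are labels, never magnitudes.
    if y.dtype.kind in ("U", "S", "O", "b"):
        return "classification"
    n_unique = np.unique(y).size
    # Integer-coded targets with few distinct values are treated as classes.
    if y.dtype.kind in ("i", "u") and n_unique <= max_classes:
        return "classification"
    # Floats, or integers with many distinct values, are treated as quantities.
    return "regression"

print(suggest_task_type(["churned", "retained", "churned"]))
print(suggest_task_type([0, 1, 1, 0]))
print(suggest_task_type([12.5, 310.0, 47.25]))
```

The heuristic is deliberately conservative: labels stored as floats (for example `0.0` / `1.0`) fall through to regression, which is exactly the kind of case that needs a human decision.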

Metric mapping

Classification: Accuracy, F1, ROC-AUC, PR-AUC, Log loss
Regression: MAE, RMSE, R², MAPE (avoid when the target can be 0, because MAPE divides by the true value)
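As a rough illustration of the mapping, the following sketch computes each family of metrics with scikit-learn on tiny hand-made arrays (the numbers are made up for the example). It also shows why MAPE is risky: one zero in the target makes the hand-computed value infinite.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score, log_loss, mean_absolute_error,
    mean_squared_error, r2_score, roc_auc_score,
)

# Classification: AUC metrics and log loss score probabilities, not hard labels.
y_true_cls = np.array([0, 0, 1, 0, 1, 0, 0, 1])
p_hat = np.array([0.1, 0.3, 0.8, 0.2, 0.6, 0.4, 0.05, 0.35])
cls_metrics = {
    "roc_auc": roc_auc_score(y_true_cls, p_hat),
    # Average precision is a common estimate of the area under the PR curve.
    "pr_auc": average_precision_score(y_true_cls, p_hat),
    "log_loss": log_loss(y_true_cls, p_hat),
}

# Regression: note the zero in the target.
y_true_reg = np.array([0.0, 10.0, 20.0, 40.0])
y_hat = np.array([1.0, 12.0, 18.0, 35.0])
rmse = float(np.sqrt(mean_squared_error(y_true_reg, y_hat)))
reg_metrics = {
    "mae": mean_absolute_error(y_true_reg, y_hat),
    "rmse": rmse,
    "r2": r2_score(y_true_reg, y_hat),
}

# MAPE computed by hand: dividing by |y_true| turns a single zero target into inf.
with np.errstate(divide="ignore"):
    mape = float(np.mean(np.abs(y_true_reg - y_hat) / np.abs(y_true_reg)))

print(cls_metrics)
print(reg_metrics)
print("MAPE with a zero target:", mape)
```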

Parameters

Parameter	Type	Required	Description
`task_type`	`{"classification","regression"}`	Yes	Select target modeling type using CRISP-Goal Fit Matrix
`loss`	`str`	Yes	Classification: `binary_crossentropy`, `categorical_crossentropy`, `log_loss`; Regression: `mse`, `mae`, `huber_loss`
`metric`	`str`	Yes	Classification: `f1`, `roc_auc`, `pr_auc`; Regression: `mae`, `rmse`, `r2`
`threshold`	`float`	No	Decision boundary for classification (default 0.5); tune on validation data to meet the required recall/precision
`class_weight`	`dict[int,float]`	No	Reweights the classification loss for imbalanced classes (e.g., churn vs non-churn)
`label_encoding`	`{"binary","one_hot","ordinal"}`	Yes (classification)	Defines label format expected by the loss
`output_activation`	`str`	Yes	Classification: `sigmoid` (binary) / `softmax` (multi-class); Regression: `linear`
`target_transform`	`{"none","log1p","standardize"}`	No	Regression-only transforms; use for heavy-tailed targets to stabilize gradients
`calibration`	`{"none","platt","isotonic"}`	No	Post-hoc probability calibration when thresholds must be reliable
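One way to keep these parameters consistent is to validate them together. The sketch below is an assumption about how such a check could look: `TaskConfig` is a hypothetical class, and only the allowed values are taken from the table above.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values copied from the parameter table; the class itself is hypothetical.
LOSSES = {
    "classification": {"binary_crossentropy", "categorical_crossentropy", "log_loss"},
    "regression": {"mse", "mae", "huber_loss"},
}
METRICS = {
    "classification": {"f1", "roc_auc", "pr_auc"},
    "regression": {"mae", "rmse", "r2"},
}
ACTIVATIONS = {
    "classification": {"sigmoid", "softmax"},
    "regression": {"linear"},
}

@dataclass
class TaskConfig:
    task_type: str
    loss: str
    metric: str
    output_activation: str
    threshold: Optional[float] = None
    label_encoding: Optional[str] = None
    target_transform: str = "none"

    def __post_init__(self):
        if self.task_type not in LOSSES:
            raise ValueError(f"unknown task_type: {self.task_type!r}")
        # Loss, metric and output activation must all belong to the chosen task.
        for name, allowed in (("loss", LOSSES), ("metric", METRICS),
                              ("output_activation", ACTIVATIONS)):
            value = getattr(self, name)
            if value not in allowed[self.task_type]:
                raise ValueError(f"{name}={value!r} does not fit {self.task_type}")
        if self.task_type == "classification":
            if self.label_encoding is None:
                raise ValueError("label_encoding is required for classification")
            if self.target_transform != "none":
                raise ValueError("target_transform is regression-only")
            if self.threshold is None:
                self.threshold = 0.5  # default decision boundary from the table
        elif self.threshold is not None:
            raise ValueError("threshold only applies to classification")

cfg = TaskConfig("classification", "binary_crossentropy", "pr_auc",
                 "sigmoid", label_encoding="binary")
print(cfg)
```

Rejecting mismatched combinations at construction time (for example a cross-entropy loss on a regression task) surfaces the poor-fit cases before any training happens.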

Code Example

python

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score, mean_absolute_error
from sklearn.linear_model import LogisticRegression, Ridge

# Assumption: the churn label is discrete {0,1}, not a continuous quantity.
# Discrete label -> classification; continuous target -> regression.

# ---------- Classification (binary) ----------
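rng = np.random.default_rng(42)  # seeded so the example is reproducible
X = rng.standard_normal((5000, 20))
# Synthetic labels that depend on X (roughly 18% churn), so the metrics below carry signal
logits = X[:, 0] + 0.5 * X[:, 1] - 1.9
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))  # churned vs retained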

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_train, y_train)

p_val = clf.predict_proba(X_val)[:, 1]  # churn probability
threshold = 0.35  # tune on validation to meet recall/precision constraints
y_pred = (p_val >= threshold).astype(int)

print("ROC-AUC:", roc_auc_score(y_val, p_val))
print("F1 @ threshold:", f1_score(y_val, y_pred))

# ---------- Regression on discrete labels (poor-fit example) ----------
# Ridge treats the {0,1} labels as numbers: its outputs are unbounded scores,
# not probabilities, and a churn decision still requires a threshold.
reg = Ridge(alpha=1.0)
reg.fit(X_train, y_train)
y_score = reg.predict(X_val)
y_pred_reg = (y_score >= threshold).astype(int)  # thresholding is still needed

print("MAE (regression metric on discrete labels):", mean_absolute_error(y_val, y_score))
print("F1 from thresholded Ridge scores:", f1_score(y_val, y_pred_reg, zero_division=0))
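For contrast with the poor-fit case above, here is a minimal sketch of regression on a genuinely continuous, heavy-tailed target, using the `log1p` option of `target_transform`. The data are synthetic and purely illustrative, and wrapping `Ridge` in scikit-learn's `TransformedTargetRegressor` is one possible way to apply the transform, not the only one.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
# Synthetic positive, heavy-tailed target (think revenue); illustrative data only.
y = np.exp(1.0 + 0.8 * X[:, 0] + 0.3 * rng.standard_normal(2000))

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

# Baseline: fit directly on the raw target.
plain = Ridge(alpha=1.0).fit(X_tr, y_tr)

# Fit on log1p(y); predictions are mapped back with expm1, so errors stay in original units.
logged = TransformedTargetRegressor(
    regressor=Ridge(alpha=1.0), func=np.log1p, inverse_func=np.expm1
).fit(X_tr, y_tr)

mae_plain = mean_absolute_error(y_va, plain.predict(X_va))
mae_logged = mean_absolute_error(y_va, logged.predict(X_va))
print("MAE, raw target:  ", mae_plain)
print("MAE, log1p target:", mae_logged)
```

Both errors are reported in the target's original units, which keeps the comparison aligned with the magnitude-based error cost that motivates regression in the first place.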

Response Format

json

{
  "task_decision": {
    "task_type": "classification|regression",
    "justification": {
      "target_semantics": "discrete|continuous",
      "decision_cost_alignment": "thresholded_decision|magnitude_error",
      "poor_fit_indicator": "quantization_or_threshold_instability|misleading_magnitude_metrics"
    }
  },
  "model_config": {
    "loss": "binary_crossentropy|categorical_crossentropy|log_loss|mse|mae|huber_loss",
    "metric": "f1|roc_auc|pr_auc|mae|rmse|r2",
    "threshold": 0.5
  },
  "evaluation": {
    "primary_metric": 0.0,
    "secondary_metrics": {
      "roc_auc": 0.0,
      "pr_auc": 0.0,
      "mae": 0.0
    }
  }
}
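A report in this shape could be assembled along the following lines. `build_response` is a hypothetical helper, the metric values are placeholders rather than real results, and leaving out `threshold` for regression is an assumption of this sketch, not something the format above prescribes.

```python
import json

def build_response(task_type, loss, metric, primary_value, secondary,
                   threshold=None, poor_fit_indicator=None):
    """Assemble a report shaped like the response format (hypothetical helper)."""
    is_cls = task_type == "classification"
    model_config = {"loss": loss, "metric": metric}
    if is_cls:
        # Assumption: threshold is reported only for classification, defaulting to 0.5.
        model_config["threshold"] = 0.5 if threshold is None else float(threshold)
    justification = {
        "target_semantics": "discrete" if is_cls else "continuous",
        "decision_cost_alignment": "thresholded_decision" if is_cls else "magnitude_error",
    }
    if poor_fit_indicator is not None:
        justification["poor_fit_indicator"] = poor_fit_indicator
    return {
        "task_decision": {"task_type": task_type, "justification": justification},
        "model_config": model_config,
        "evaluation": {
            "primary_metric": float(primary_value),
            "secondary_metrics": {name: float(v) for name, v in secondary.items()},
        },
    }

# Placeholder numbers, not real results.
report = build_response("classification", "binary_crossentropy", "f1",
                        primary_value=0.0, secondary={"roc_auc": 0.0, "pr_auc": 0.0},
                        threshold=0.35)
print(json.dumps(report, indent=2))
```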

Notes & Best Practices

Thresholding is not optional for classification: report the metric at the tuned `threshold` used for decisions (especially for churn).
Avoid metric mismatch: do not use RMSE as the primary metric when your business objective is correct classification under class imbalance; use F1/PR-AUC and threshold selection....
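Threshold selection can be sketched with scikit-learn's `precision_recall_curve`. The helper `pick_threshold` and the recall floor are hypothetical, and the arrays stand in for validation labels and predicted churn probabilities; the rule shown (highest threshold that still meets a recall constraint) is one reasonable choice among several.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_curve

def pick_threshold(y_true, p_hat, min_recall=0.80):
    """Highest threshold whose recall still meets min_recall (hypothetical helper)."""
    precision, recall, thresholds = precision_recall_curve(y_true, p_hat)
    # precision/recall have one more entry than thresholds; drop the final point.
    ok = recall[:-1] >= min_recall
    if not ok.any():
        return float(thresholds[0])  # fall back to the most permissive threshold
    # Thresholds are increasing, so the last qualifying one is the highest.
    return float(thresholds[ok][-1])

# Stand-ins for validation labels and predicted probabilities.
y_val = np.array([0, 0, 1, 0, 1, 0, 0, 1, 1, 0])
p_val = np.array([0.1, 0.3, 0.8, 0.2, 0.6, 0.4, 0.05, 0.35, 0.9, 0.5])

threshold = pick_threshold(y_val, p_val, min_recall=0.75)
y_pred = (p_val >= threshold).astype(int)
print("chosen threshold:", threshold)
print("F1 @ threshold:", f1_score(y_val, y_pred))
```

Tune the threshold on validation data only, then report the final metric at that fixed threshold on held-out test data so the reported number is not optimistically biased.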
