Lesson 7.7: Practical Session - Building Your First Neural Network
This is the capstone lesson for Module 7. We will synthesize everything we've learned—from network architecture to optimizers and regularization—into a complete, end-to-end Python project. We will build and train a Multi-Layer Perceptron (MLP) for a binary classification task using the popular Keras/TensorFlow library.
Part 1: The Problem and The Goal
The Business Problem: We have a dataset of bank customers with features like `age`, `credit_score`, `balance`, and `tenure`. We want to predict if a customer is likely to "churn" (close their account) in the next year.
The Machine Learning Goal: We will build a **binary classification model**. The model will take the customer's features as input and output a probability of churn. This is the same problem we solved with Logistic Regression, but now we'll use a more powerful, non-linear tool.
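Concretely, whichever architecture we pick, the final layer will squash a raw score into a probability using the sigmoid function, the same link function logistic regression uses. Here is a tiny standalone sketch, purely to build intuition before we touch Keras:
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into a probability in (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(-2.0))  # ~0.12 -> unlikely to churn
print(sigmoid(0.0))   #  0.50 -> a coin flip
print(sigmoid(2.0))   # ~0.88 -> likely to churn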
Part 2: The Deep Learning Workflow
The Professional's Deep Learning Checklist
- Data Preparation: Load the data, separate features (X) and target (y), and perform a train-test split.
- Feature Scaling: Crucial for neural networks. Apply `StandardScaler` to all numerical features.
- Model Architecture: Define the layers of our MLP using the Keras `Sequential` API. We will include `Dense` layers (our fully-connected layers) and `Dropout` layers for regularization.
- Model Compilation: Specify the `optimizer` (e.g., 'adam'), the `loss` function (e.g., 'binary_crossentropy' for a yes/no problem), and any `metrics` to monitor (e.g., 'accuracy').
- Model Training: Use the `.fit()` method to train the model, providing both training data and a **validation set** to monitor for overfitting and use with Early Stopping.
- Evaluation: Use the `.evaluate()` method on the unseen test set to get the final, unbiased performance metrics.
Part 3: The Complete Python Implementation
Let's build this model step-by-step using TensorFlow and Keras.
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, roc_auc_score
# For reproducibility
tf.random.set_seed(42)
# --- 1. Data Preparation ---
# We'll simulate a dataset for this example.
np.random.seed(42)
n_samples = 10000
X = pd.DataFrame({
    'credit_score': np.random.randint(500, 850, n_samples),
    'age': np.random.randint(25, 65, n_samples),
    'tenure': np.random.randint(0, 10, n_samples),
    'balance': np.random.uniform(0, 200000, n_samples),
})
# Simulate a non-linear relationship for the churn probability.
# The intercept (1.5) is chosen so the simulated churn rate is a realistic
# ~20%, giving the classifier enough positive examples to learn from.
churn_prob = 1 / (1 + np.exp(-(
    1.5 + (X['age']/10) - (X['credit_score']/100) + np.sin(X['balance']/50000) - X['tenure']*0.2
)))
y = (np.random.rand(n_samples) < churn_prob).astype(int)
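# Sanity check: confirm the simulated classes are not degenerate.
print(f"Simulated churn rate: {y.mean():.1%}")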
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# --- 2. Feature Scaling ---
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
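# Note that the scaler is fit on the training split only and then applied,
# unchanged, to the test split. Fitting it on the full dataset would leak
# information about the test distribution into training.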
# --- 3. Model Architecture ---
model = Sequential([
    # Explicit input layer: one feature vector of length 4 per customer
    Input(shape=(X_train_scaled.shape[1],)),
    # First hidden layer
    Dense(units=32, activation='relu'),
    Dropout(0.3),  # Regularization
    # Second hidden layer
    Dense(units=16, activation='relu'),
    Dropout(0.3),
    # Output layer: a single neuron with sigmoid activation for binary
    # classification (outputs a probability)
    Dense(units=1, activation='sigmoid')
])
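# A note on Dropout: the random 30% zeroing happens only during training.
# Keras disables dropout automatically at inference time, so .predict() and
# .evaluate() always use the full network.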
# --- 4. Model Compilation ---
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
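# The string 'adam' uses the Keras defaults (learning rate 0.001). To tune the
# optimizer, pass an object instead, e.g.:
#   model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4), ...)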
model.summary()  # prints the layer table itself (it returns None, so no print() needed)
# --- 5. Model Training ---
# Define EarlyStopping callback to prevent overfitting
early_stopping = EarlyStopping(
    monitor='val_loss',         # watch the loss on the validation set
    patience=10,                # stop after 10 epochs with no improvement
    restore_best_weights=True   # roll back to the weights from the best epoch
)
history = model.fit(
    X_train_scaled, y_train,
    epochs=100,               # train for up to 100 epochs (early stopping may end sooner)
    batch_size=32,
    validation_split=0.2,     # hold out 20% of the training data for validation
    callbacks=[early_stopping],
    verbose=1
)
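# Optional: plot the learning curves recorded in `history` to see how the
# validation loss evolved and where EarlyStopping stepped in. This assumes
# matplotlib is installed; it is not needed for the rest of the script.
import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Binary cross-entropy')
plt.legend()
plt.show()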
# --- 6. Evaluation ---
print("\n--- Final Evaluation on Test Set ---")
loss, accuracy = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"Test Accuracy: {accuracy*100:.2f}%")
print(f"Test Loss: {loss:.4f}")
# Generate predictions
y_pred_proba = model.predict(X_test_scaled).flatten()
y_pred_class = (y_pred_proba > 0.5).astype(int)
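# The 0.5 cutoff is just the default decision threshold. In a real churn
# campaign you might lower it (say, to 0.3) to flag more at-risk customers,
# accepting extra false positives in exchange for fewer missed churners.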
print("\n--- Classification Report ---")
print(classification_report(y_test, y_pred_class, target_names=['No Churn', 'Churn']))
print(f"ROC AUC Score: {roc_auc_score(y_test, y_pred_proba):.4f}")
Congratulations! You Have Completed Module 7
You have successfully journeyed from the single artificial neuron to building, training, and regularizing a complete Multi-Layer Perceptron. You now understand the core mechanics of deep learning, from the chain rule at the heart of backpropagation to the modern optimizers that make training practical.
What's Next in Your Journey?
The MLP is a powerful universal function approximator, but it has one glaring weakness: it has no memory. It treats every single row of data as an independent event. This makes it unsuitable for data where the order is crucial, like text or financial time series.
In **Module 8: Deep Learning for Sequences**, we will solve this problem by introducing loops and memory into our networks. We will explore **Recurrent Neural Networks (RNNs)** and their powerful successors, **LSTMs**, the architectures that power language translation, speech recognition, and advanced time series forecasting.