# -*- coding: utf-8 -*-
"""
Created on Mon Sep  1 18:28:16 2025

@author: dev66

Dataset: MNIST with 60,000 training images and 10,000 test images, each 28x28 grayscale.
Preprocessing: Normalize pixel values to [0,1], flatten to 784 features, one-hot encode labels into 10 classes.
Network Type: Artificial Neural Network (ANN), feed-forward MLP.
Architecture: Input (784) -> Dense(256, ReLU) -> Dense(128, ReLU) -> Dense(10, Softmax).
Training: Optimizer = Adam, Loss = categorical cross-entropy, 100 epochs, batch size 128.
Performance: Achieved about 98% test accuracy.

How to detect overfitting:
- Compare training vs validation accuracy/loss. If training accuracy keeps going up
  but validation accuracy stops improving (or drops), that's overfitting.
  Same with loss: training loss keeps falling while validation loss rises.
- Plot curves (already in this code). Divergence between the train and val curves
  shows overfitting.
- Use fewer epochs. Train shorter and watch validation accuracy; if test accuracy
  goes down after too many epochs, you have overfit.

Ways to reduce overfitting if you find it:
- Dropout layers (e.g., layers.Dropout(0.5))
- L2 weight decay (kernel_regularizer=keras.regularizers.l2(0.001))
- Early stopping callback (stop training when val loss stops improving).
""" # file: show_mnist_images.py import numpy as np import matplotlib.pyplot as plt from tensorflow import keras from tensorflow.keras import layers # 1) Load MNIST dataset (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data() print("Training set:", x_train.shape, y_train.shape) print("Test set:", x_test.shape, y_test.shape) print("x train cell 3 ", x_train[3].shape) print("values", x_train[3]) # 2) Show first 9 original images (values 0–255) plt.figure(figsize=(6,6)) for i in range(9): plt.subplot(3,3,i+1) plt.imshow(x_train[i], cmap="gray") plt.title(f"Label: {y_train[i]}") plt.axis("off") plt.suptitle("Original MNIST images (0–255)", fontsize=14) plt.tight_layout() plt.show() print("----- Normalize images to [0,1] ------") # 3) Normalize images to [0,1] x_train = x_train.astype("float32") / 255.0 x_test = x_test.astype("float32") / 255.0 print("x train cell 3 ", x_train.shape) print("values", x_train[3]) # 4) Show first 9 normalized images plt.figure(figsize=(6,6)) for i in range(9): plt.subplot(3,3,i+1) plt.imshow(x_train[i], cmap="gray") plt.title(f"Label: {y_train[i]}") plt.axis("off") plt.suptitle("Normalized MNIST images (0–1)", fontsize=14) plt.tight_layout() plt.show() # 5) Flatten 28x28 -> 784 x_train = x_train.reshape((-1, 28*28)) # -1 dynamice settiign tonumber of cells x_test = x_test.reshape((-1, 28*28)) # -1 dynamice settiign tonumber of cells # 6) One-hot encode labels (0–9) print("------------------- One-hot encode labels (0–9) --------------------------") # Example for digit 3: # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] # Example for digit 7: # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0] y_train_cat = keras.utils.to_categorical(y_train, 10) y_test_cat = keras.utils.to_categorical(y_test, 10) print (y_train_cat ) # 7) Build model (simple MLP) model = keras.Sequential([ layers.Dense(256, activation="relu", input_shape=(784,)), layers.Dense(128, activation="relu"), layers.Dense(10, activation="softmax") # 10 classes ]) model.compile(optimizer="adam", 
loss="categorical_crossentropy", metrics=["accuracy"]) model.summary() # 8) Train history = model.fit(x_train, y_train_cat, validation_data=(x_test, y_test_cat), epochs=100, batch_size=128) # 9) Evaluate loss, acc = model.evaluate(x_test, y_test_cat) print(f"Test Accuracy: {acc:.4f}") # 10) Predict one example sample_idx = 0 sample = x_test[sample_idx].reshape(1, -1) pred = model.predict(sample) print("Predicted:", np.argmax(pred), "True:", y_test[sample_idx]) # 11) Plot loss and accuracy plt.figure(figsize=(12,4)) plt.subplot(1,2,1) plt.plot(history.history["loss"], label="train loss") plt.plot(history.history["val_loss"], label="val loss") plt.title("Loss") plt.legend() plt.subplot(1,2,2) plt.plot(history.history["accuracy"], label="train acc") plt.plot(history.history["val_accuracy"], label="val acc") plt.title("Accuracy") plt.legend() plt.show() |
Spatial data means data where the position of each value matters and has relationships with its neighbors.
Examples:
- Images → each pixel has meaning only in relation to nearby pixels (edges, corners, shapes).
- Maps / GIS data → a location's value (temperature, population, elevation) depends on surrounding areas.
- Medical scans (X-ray, MRI, CT) → pixel/voxel arrangement encodes anatomy.
Non-spatial data (opposite):
- Tabular data (Excel sheets: age, income, blood pressure). The order of columns or rows does not define relationships.
- Feature vectors (already extracted numbers like embeddings).
In short:
- Spatial data = has a geometry (2D, 3D, grid, sequence) where location matters.
- Non-spatial data = just independent features; order doesn't matter.
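The distinction can be shown with a small numpy sketch (the toy arrays here are hypothetical, not taken from the MNIST script):

```python
import numpy as np

rng = np.random.default_rng(0)

# Spatial data: a tiny "image" where position carries meaning.
# A vertical edge (left half dark, right half bright)...
image = np.zeros((4, 4))
image[:, 2:] = 1.0

# ...becomes a horizontal edge when transposed: identical values,
# different picture, because pixel positions changed.
assert not np.array_equal(image, image.T)

# Non-spatial (tabular) data: reordering feature columns consistently
# does not change what any row means, as long as the mapping is tracked.
table = rng.normal(size=(5, 3))   # 5 samples, 3 features
perm = rng.permutation(3)         # one fixed column reordering
shuffled = table[:, perm]

# Every row still contains exactly the same values, just relabelled.
assert np.array_equal(np.sort(table, axis=1), np.sort(shuffled, axis=1))
```

This is exactly why flattening MNIST to 784 features (as the script above does) throws away information a CNN would exploit: the MLP never sees which pixels were neighbors.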
A plain ANN (like an MLP) can sometimes be better than a CNN, but only in specific conditions:
- Non-spatial data → if your inputs are tabular data (numbers, features, categories) with no spatial or temporal structure, an ANN is usually better. Example: predicting house prices, credit scoring, sensor values.
- Very small datasets → CNNs need many samples to learn filters. With very little data, a small ANN may generalize better (or at least overfit less).
- Low-dimensional inputs → if your input has only a few features (e.g., 20–50 values), a CNN has no advantage; an ANN is simpler and faster.
- When spatial structure is irrelevant → if the order of pixels/features doesn't matter (e.g., shuffled or abstract features), a CNN loses its main advantage.
- As a classifier after feature extraction → sometimes features are already extracted (e.g., embeddings from another model). In that case, a simple ANN on top of those features is better than a CNN.
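The last case, an ANN as a classifier on top of precomputed features, could look like the sketch below. The 128-dim embeddings and random labels are hypothetical stand-ins for outputs of some other model (e.g., a frozen CNN encoder).

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Pretend these embeddings came from another model's encoder.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(1000, 128)).astype("float32")  # 1000 samples, 128-dim
labels = keras.utils.to_categorical(rng.integers(0, 10, size=1000), 10)

# A plain MLP suffices: there is no spatial structure left to exploit.
clf = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(128,)),
    layers.Dense(10, activation="softmax"),
])
clf.compile(optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"])
clf.fit(embeddings, labels, epochs=3, batch_size=32, verbose=0)
```

A convolution over an embedding vector would slide filters across dimensions that have no neighborhood relationship, so the CNN's inductive bias buys nothing here.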
Rule of thumb:
- Use a CNN when data has clear spatial/local structure (images, spectrograms).
- Use an ANN when data is flat/tabular or when relationships are global, not local.