Zero to Neuron Series 1: Your First Neural Network with TensorFlow 2.0 and Keras


Ever wonder how Netflix somehow knows what movie you're going to want to watch next, or how your phone camera can select your face out of a crowd? The wizardry behind much of today's technology is an idea known as a neural network.

It may sound like something out of science fiction, but the concepts themselves are quite intuitive. Today we're going to lift the veil. You're not only going to read about theory; you're going to create a functioning neural network from the ground up that is capable of reading handwritten digits.

Ready? Let's go!

So, What's a Neural Network Anyway?

At its core, a neural network is a powerful pattern-finding machine 🕵️. Instead of being programmed with a long list of if/else rules, it learns by looking at tons of examples. Our job is to create an environment in which its neurons can learn those patterns.

Think about it: you could try to describe a cat to a computer with rules like, "if it has pointy ears AND whiskers AND a feline shape, it's a cat." But that's brittle. A neural network takes a different approach. It looks at thousands of pictures of cats and learns the vibe of what a "cat" is. It finds the subtle, combined patterns on its own.

This makes them amazing for two main jobs:

  1. Classification: Is this email spam or not? Is this a picture of a cat or a dog?

  2. Regression: What will the price of this house be? How many sales will we make next month?

The Three Building Blocks 🧱

Before we jump into the code, let's quickly meet the three core components you'll be working with.

1. Tensors: The Data Containers 📦

The most fundamental piece in TensorFlow is the Tensor. Don't let the name intimidate you. A Tensor is just a fancy container for numbers. All of our data, whether it's an image, text, or stock prices, gets turned into these multi-dimensional arrays of numbers.

A single grayscale image of a handwritten number in our project is 28 pixels tall by 28 pixels wide. This fits perfectly into a 2D Tensor, or a matrix, with a "shape" of (28, 28).
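To make the idea of a "shape" concrete, here is a minimal sketch using NumPy arrays, which follow the same shape conventions as TensorFlow tensors. The zero-filled array and the batch size of 32 are placeholders chosen for illustration, not real MNIST data.

```python
import numpy as np

# A placeholder 28x28 grayscale image (all zeros, not real MNIST data).
image = np.zeros((28, 28), dtype=np.uint8)
print(image.ndim)   # → 2 (two axes: height and width)
print(image.shape)  # → (28, 28)

# Stacking several images adds a leading "batch" axis.
batch = np.stack([image] * 32)
print(batch.shape)  # → (32, 28, 28)
```

A whole dataset of images is just one more axis on the front of the same container.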

2. Layers: The Brains of the Operation 🧠

If Tensors are the data, Layers are the processors. The most common type is the Dense Layer. It's called "dense" because every neuron in it is connected to every single neuron from the layer before it.

I love the analogy of a layer as a "committee of experts" (the neurons). Each expert on the committee gets a report from every expert on the previous committee. They review all this information, decide how important each piece is (these are the "weights"), and pass their final decision on to the next committee.

The whole process of "training" is just the network figuring out the best "weights" for all these connections to get the right answer most of the time.

Image by iwab, licensed under CC BY-SA 4.0
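The committee analogy maps onto a few lines of arithmetic. The sketch below is a NumPy illustration of what one dense layer computes, not Keras's actual implementation. The sizes (4 inputs, 3 neurons), the random values, and the bias term are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical sizes for illustration: 4 inputs feeding a layer of 3 neurons.
inputs = rng.random(4)              # the "reports" from the previous layer
weights = rng.normal(size=(4, 3))   # one weight per (input, neuron) connection
bias = np.zeros(3)                  # one bias per neuron (assumed for this sketch)

# Each neuron takes a weighted sum of every input, then applies ReLU,
# which replaces negative values with zero.
weighted_sum = inputs @ weights + bias
outputs = np.maximum(weighted_sum, 0.0)

print(outputs.shape)  # → (3,)
```

Training adjusts the numbers inside `weights` and `bias`; the arithmetic itself stays the same.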

3. Models: The Blueprint 📋

Finally, we stack these layers together to form a Model. The easiest way to do this is with the Sequential model in Keras. It's exactly what it sounds like: a plain, sequential stack of layers that creates a pipeline for your data to flow through.
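The "pipeline" idea can be sketched in plain Python. The `TinySequential` class below is a hypothetical stand-in written for this example; it is not how Keras implements Sequential, but it shows the core behavior of passing data through each layer in order.

```python
class TinySequential:
    """Hypothetical stand-in for a sequential model: calls each layer in order."""

    def __init__(self, layers):
        self.layers = list(layers)

    def __call__(self, data):
        for layer in self.layers:
            data = layer(data)  # the output of one layer is the input to the next
        return data


# Two toy "layers" that are just functions on a list of numbers.
def double(values):
    return [2 * v for v in values]


def add_one(values):
    return [v + 1 for v in values]


pipeline = TinySequential([double, add_one])
result = pipeline([1, 2, 3])
print(result)  # → [3, 5, 7]
```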

We will use TensorFlow 2.0 and Keras.

Let's Code! Building a Digit Recognizer

Alright, enough theory. Time to get our hands dirty. We are going to build and train a model to recognize handwritten digits from the famous MNIST dataset.

Step 1: Prepare the Data

First, let's import our tools and load the MNIST dataset.

import tensorflow as tf
from tensorflow.keras.datasets import mnist
import numpy as np

# The data comes neatly split into training and testing sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

The pixel values range from 0 to 255. We'll normalize them to a range between 0 and 1. This simple step helps the network learn more efficiently.

# Normalize the data
X_train = X_train / 255.0
X_test = X_test / 255.0
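To see what the division does, here is a small NumPy sketch on three hand-picked pixel values rather than the real dataset. MNIST pixels are loaded as unsigned 8-bit integers, so dividing by 255.0 also converts them to floating point.

```python
import numpy as np

# Three sample pixel values: black, mid-gray, and white.
pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = pixels / 255.0

print(pixels.dtype, scaled.dtype)  # → uint8 float64
print(scaled.min(), scaled.max())  # → 0.0 1.0
```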

Step 2: Design the Model Architecture

Now we get to the fun part: designing our model's blueprint using the Sequential API.

Before we write the code, let's quickly meet the key players we're about to use: Sequential from tensorflow.keras.models, and the layers from tensorflow.keras.layers.

  • Sequential: This is our main container. Think of it as a pipeline or a stack of LEGO bricks: you simply add layers one by one, in the order you want the data to flow through them. It's the most straightforward way to build a model.

  • Flatten: Our image data starts as a 2D grid (28x28 pixels). This layer's only job is to unroll that grid into a single, long line of 784 pixels. It's the prep-chef of our model, getting the data ready for the main layers.

  • Dense: This is your classic, fully-connected neural network layer, the real workhorse. Every neuron in this layer receives input from all the neurons in the previous layer. This is where the model does its "thinking" and learns the complex patterns in the data. In the code below, 128 is the number of neurons in this layer.

  • Dropout: This is a clever technique to prevent our model from simply "memorizing" the training data (a problem called overfitting). During training, it randomly deactivates a fraction of neurons. This forces the network to learn more robust and general patterns, rather than relying on any single neuron to do the heavy lifting. The rate of 0.3 in the code below tells the layer to give roughly 30% of the neurons a rest on each training step.
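Flatten and Dropout are simple enough to imitate by hand. The following NumPy sketch illustrates the two ideas; it is not the Keras implementation. The placeholder image, the 1000 all-ones activations, and the random seed are assumptions for the example. Scaling the surviving values by 1 / (1 - rate) mirrors what the Keras Dropout layer is documented to do during training.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Flatten: unroll a 28x28 grid into one row of 784 values.
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
flat = image.reshape(-1)
print(flat.shape)  # → (784,)

# Dropout (training time only): zero out a random ~30% of the values
# and scale the survivors up so the expected total stays the same.
rate = 0.3
activations = np.ones(1000, dtype=np.float32)
keep_mask = rng.random(activations.shape) >= rate
dropped = np.where(keep_mask, activations / (1.0 - rate), 0.0)

dropped_fraction = 1.0 - keep_mask.mean()
print(round(float(dropped_fraction), 2))  # close to 0.3
```

At prediction time dropout is switched off, so every neuron contributes.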

Now, let's put them together in our code:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout

model = Sequential([
    # 1. The Flattener: This unrolls our 28x28 image grid into a single line of 784 pixels.
    Flatten(input_shape=(28, 28)),

    # 2. The Thinker: Our main "hidden" layer with 128 neurons.
    Dense(units=128, activation="relu"),
    Dropout(rate=0.3), # A technique to prevent our model from just "memorizing" the data

    # 3. The Decider: The final output layer. It has 10 neurons (for digits 0-9).
    # 'softmax' makes sure the outputs are probabilities that sum to 1.
    Dense(units=10, activation="softmax")
])

model.summary()
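model.summary() lists a parameter count for each layer. The arithmetic behind those numbers can be worked out by hand, assuming each Dense layer keeps its default bias term. Flatten and Dropout have no trainable parameters, so they contribute nothing.

```python
# Layer sizes taken from the model definition above.
flattened_inputs = 28 * 28
hidden_units = 128
output_units = 10

# A dense layer has one weight per connection plus one bias per neuron.
hidden_params = flattened_inputs * hidden_units + hidden_units
output_params = hidden_units * output_units + output_units
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # → 100480 1290 101770
```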

Step 3: Compile and Train! 🚀

Before training, we compile the model. This is where we give it its learning instructions: the optimizer, the loss function (how it measures error), and what metrics to track.

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy']
              )
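To get a feel for what this loss measures, here is a NumPy sketch of softmax followed by sparse categorical cross-entropy for a single example. The raw scores are made up, and the two helper functions are simplified illustrations: the Keras versions work on whole batches and include extra numerical safeguards.

```python
import numpy as np


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exps / exps.sum()


def sparse_categorical_crossentropy(probabilities, true_label):
    """Loss for one example: negative log of the probability given to the true class."""
    return -np.log(probabilities[true_label])


# Made-up raw scores for the 10 digit classes; class 7 scores highest.
scores = np.array([0.1, 0.2, 0.0, 0.3, 0.1, 0.0, 0.2, 4.0, 0.1, 0.3])
probabilities = softmax(scores)

loss_if_label_is_7 = sparse_categorical_crossentropy(probabilities, 7)
loss_if_label_is_2 = sparse_categorical_crossentropy(probabilities, 2)
print(loss_if_label_is_7 < loss_if_label_is_2)  # → True
```

"Sparse" here means the labels are plain integers such as 7, which is how y_train stores them, rather than one-hot vectors. The loss is small when the model puts high probability on the correct digit and large when it does not.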

Now for the magic moment. We .fit() the model to our training data and let it learn.

# Let the training begin!
model.fit(x=X_train, y=y_train, batch_size=32, epochs=5)
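The step counts that Keras shows in its progress bars follow directly from the dataset size and the batch size. A quick check, assuming the standard MNIST split of 60,000 training and 10,000 test images and the default evaluation batch size of 32:

```python
import math

train_images = 60_000  # size of the MNIST training split
test_images = 10_000   # size of the MNIST test split
batch_size = 32        # same value passed to model.fit above

# The last batch may be smaller than the rest, so round up.
steps_per_epoch = math.ceil(train_images / batch_size)
evaluation_steps = math.ceil(test_images / batch_size)
print(steps_per_epoch, evaluation_steps)  # → 1875 313
```

Each step is one weight update on 32 images, so a single epoch updates the weights 1875 times.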

Keras prints a progress log for each epoch as training runs:

Epoch 1/5 1875/1875 ━━━━━━━━━━━━━━━━━━━━ 7s 2ms/step - accuracy: 0.8459 - loss: 0.5215

Epoch 2/5 1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9501 - loss: 0.1671

Epoch 3/5 1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9638 - loss: 0.1203

Epoch 4/5 1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9683 - loss: 0.1027

Epoch 5/5 1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9716 - loss: 0.0902

Step 4: Evaluate our Creation

So, how did our model do? Let's evaluate it on the test data, the images it has never seen before. This is its final exam!

# Evaluate on the test dataset
test_loss, test_acc = model.evaluate(X_test,  y_test)
print(f"\nTest accuracy: {test_acc:.4f}")

313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9722 - loss: 0.0856

Test accuracy: 0.9768

Wow! Over 97% accuracy is fantastic for a model this simple.

Let's check its prediction for the very first image in the test set.

predictions = model.predict(X_test)
predicted_digit = np.argmax(predictions[0])
actual_digit = y_test[0]

print(f"Model's prediction: {predicted_digit}")
print(f"Actual label: {actual_digit}")
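Each row that model.predict returns holds 10 probabilities, one per digit, and np.argmax picks the index of the largest. The sketch below uses made-up probabilities and hypothetical labels for three images to show the same idea applied to a whole batch at once.

```python
import numpy as np

# Made-up softmax outputs for three images (each row sums to 1).
sample_predictions = np.array([
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.91, 0.01, 0.01],
    [0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.01, 0.50, 0.01, 0.01, 0.01, 0.01, 0.01, 0.42, 0.01, 0.01],
])
sample_labels = np.array([7, 2, 7])  # hypothetical true digits

# axis=1 takes the argmax across the 10 classes for every row.
predicted_digits = np.argmax(sample_predictions, axis=1)
accuracy = np.mean(predicted_digits == sample_labels)

print(predicted_digits)  # → [7 2 1]
```

In this toy batch the third image is a near miss: the model leans toward 1 while the label is 7, so two of the three predictions are correct.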

What's Next?

And that's it! You've built a working neural network. You started with raw data, met the fundamental building blocks that process it, and trained a model that makes good predictions.

Next time, we'll discover how we can modify this very same setup to work on a radically different problem: predicting a continuous value (such as a house price) rather than a category.

Link for Next Blog:- Link

Until then, happy coding!

Hey, don't forget to subscribe before leaving!
