# Artificial Neural Networks

In this Notebook we will take a look at the core of modern Machine Learning: **Artificial Neural Networks**. 
The area of Machine Learning that uses Artificial Neural Networks with complex architectures and a large number of hidden layers is often called Deep Learning.

**Deep learning:** *A subfield of machine learning that structures algorithms in layers to create an “artificial neural network” that can learn and make intelligent decisions on its own.*

Artificial Neural Networks (ANN) can be designed to deal with many different tasks. 
In this Notebook we will work with **classification**. 

**TASK:** Train a simple Neural Network to classify images of handwritten digits as the correct digit.  

As training data we will use a popular dataset, MNIST, which contains images of handwritten digits (0-9) with their correct labels.
[Read more about MNIST dataset](http://yann.lecun.com/exdb/mnist/).

**WHAT YOU WILL LEARN:**

- how to prepare data for training with PyTorch (DataLoader, batches)
- what the main components of a basic Artificial Neural Network are
- how to implement a basic Artificial Neural Network with PyTorch
- how to write the main training loop in PyTorch
- what the Hyperparameters of an ANN are and how to tune them


**TO DO:** Read and understand the following code. Run the cells, analyse the results and, if everything is clear, follow the instructions in the exercise parts. 


# Classification with Artificial Neural Network

## Data Preparation and Visualization

- download data
- transform Images to Tensors
- create DataLoader with defined batch size

The MNIST database of handwritten digits has a training set of 60,000 examples, and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image. 

The MNIST dataset is available in the `torchvision.datasets` package! 
You can download it with a single line of code. 
As you can see, there is no need to split the data into "train" and "test" manually - you just need to set the `train` parameter to True or False.


In [None]:
import torch
import torchvision
from torchvision import transforms, datasets
import matplotlib.pyplot as plt


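# download the MNIST training and test splits into the current directory ('')
train = datasets.MNIST('', train=True, download=True)

test = datasets.MNIST('', train=False, download=True)


print(type(train[0][0]))  # each sample is an (image, label) pair; without a transform the image is a PIL Image
plt.imshow(train[0][0], cmap='gray')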

An important parameter of `datasets.MNIST` is `transform`, which lets us apply changes to the data, like normalization, scaling etc. We need to convert the images into Tensors, which is the data type PyTorch works with. 

### **Exercise** 

Let's download the MNIST dataset once again. Uncomment the part concerning `transforms` and see how the data type changes.

In [None]:
train = datasets.MNIST('', train=True, download=True,
                       # transform=transforms.Compose([
                       #     transforms.ToTensor()
                       # ])
                       )

test = datasets.MNIST('', train=False, download=True,
                      # transform=transforms.Compose([
                      #     transforms.ToTensor()
                      # ])
                      )

print(type(train[0][0]))
print(train[0][0])
plt.imshow(train[0][0].view(28,28))  # .view() only works once the image has been converted to a Tensor


The next step is to make the data iterable using the [DataLoader](https://pytorch.org/docs/stable/data.html) module, so that we can grab batches of data (instead of the whole dataset) for training or exploration.

In [None]:
trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=False)
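
To see what a DataLoader yields without downloading MNIST, here is a small sketch that wraps 100 random "images" (a hypothetical stand-in shaped like the `ToTensor` output) in a `TensorDataset` and grabs one batch with `next(iter(...))`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-in for MNIST: 100 random 1x28x28 "images" with labels 0..9
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=10, shuffle=True)  # same settings as trainset above

X, y = next(iter(loader))  # grab a single batch
print(X.shape)             # batch of 10 images: [10, 1, 28, 28]
print(y.shape)             # 10 matching labels: [10]
print(len(loader))         # number of batches per epoch: 100 / 10 = 10
```

With `shuffle=True` the order of samples changes every time the loader is iterated, which is why it is used for training but not for testing.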


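**Batches - why do we need them?**
Instead of passing all 60,000 training images through the network at once, which would require a lot of memory for the activations and gradients, we process them in small batches. For example, with a batch size of 64, one epoch of training consists of 938 batches (60,000 / 64 = 937.5, rounded up, so the last batch holds only 32 images); with the `batch_size=10` used above, one epoch has 6,000 batches. Each batch also gives the optimizer one weight update, so the model is updated many times per epoch.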
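
The batch arithmetic can be checked directly: `len()` of a DataLoader returns the number of batches per epoch. This sketch uses a dummy dataset with one value per sample (an assumption to keep memory low) and also shows the `drop_last` option, which discards the incomplete final batch.

```python
import math
import torch
from torch.utils.data import TensorDataset, DataLoader

n_images = 60_000  # size of the MNIST training set

# Number of batches = ceil(number of images / batch size)
counts = {bs: math.ceil(n_images / bs) for bs in (10, 64)}
print(counts)

# len(DataLoader) gives the same count; one value per sample keeps the dummy data tiny
dummy = TensorDataset(torch.zeros(n_images, 1))
full = len(DataLoader(dummy, batch_size=64))                     # last batch has 60000 - 937*64 = 32 items
dropped = len(DataLoader(dummy, batch_size=64, drop_last=True))  # incomplete last batch is skipped
print(full, dropped)
```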

### **Exercise**

Since we have created DataLoaders we now can iterate over them.
In the next cell, there is a loop that iterates over `trainset` and inspects the first batch. Do the same with `testset`!

In [None]:
import matplotlib.pyplot as plt

for data in trainset:
    batch_of_images = data[0]
    print(batch_of_images.size())  # [batch_size, 1, 28, 28]
    batch_of_labels = data[1]
    print(batch_of_labels.size())  # [batch_size]

    fig = plt.figure()
    fig.set_size_inches(18.5, 10.5, forward=True)
    for i, (image, label) in enumerate(zip(batch_of_images,batch_of_labels)):
        plt.subplot(2,5,i+1)  # a 2x5 grid fits one batch of 10 images
        plt.imshow(image.view(28,28), cmap='gray', interpolation='none')
        plt.title(f"Ground Truth: {label.item()}")
    plt.show()

    break  # only inspect the first batch

In [None]:
### for data in testset:
    ### YOUR CODE HERE

## Build Neural Network Model

In [None]:
import torch
import torch.nn.functional as F
import torch.nn as nn


A typical Neural Network in PyTorch inherits from the `nn.Module` class and has the following structure:


In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()

net = Net()
print(net)

Time to put some layers inside the Network!


Pay attention to the parameters of the layers - they have to correspond to our input dimensions and expected output dimensions!
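
A short sketch of how the shapes have to line up, using a random batch shaped like one MNIST batch after `ToTensor`. It shows that `x.view(-1, 28*28)` and `nn.Flatten()` both produce the `(batch, 784)` input that `nn.Linear(28*28, 64)` expects, and that passing the unflattened batch fails.

```python
import torch
import torch.nn as nn

batch = torch.rand(10, 1, 28, 28)   # shape of one MNIST batch after ToTensor

flat = batch.view(-1, 28 * 28)      # (10, 784): one flattened vector per image
flat2 = nn.Flatten()(batch)         # nn.Flatten keeps dim 0 and flattens the rest
print(flat.shape, torch.equal(flat, flat2))

fc1 = nn.Linear(28 * 28, 64)        # in_features must equal the last input dimension
print(fc1(flat).shape)              # (10, 64)

shape_error_raised = False
try:
    fc1(batch)                      # last dimension is 28, not 784
except RuntimeError as err:
    shape_error_raised = True
    print("RuntimeError:", err)
```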


In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64) # input layer: a basic (fully connected) network expects each 28x28 image flattened into a vector of 28*28 = 784 values
        self.fc2 = nn.Linear(64, 64) # hidden layer - the next layer always accepts as many inputs as the previous layer outputs
        self.fc3 = nn.Linear(64, 64) # hidden layer
        # there may be more hidden layers - the more layers, the "deeper" our Network is
        self.fc4 = nn.Linear(64, 10) # output layer needs 10 neurons because we have 10 classes!

    # passing our data through the layers + activations
    def forward(self, x):
        x = F.relu(self.fc1(x)) # pass the data (x) through the layer, then the ReLU activation (adds non-linearity: negative values become 0, positive values pass through unchanged)
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return F.log_softmax(x, dim=1) # log_softmax returns log-probabilities for the 10 classes; exponentiated, they are probabilities that add up to 1

net = Net()
print(net)
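
To see how big this Network is, we can count its trainable parameters: each `nn.Linear(in, out)` has `in * out` weights plus `out` biases. The sketch below rebuilds the same four layers on their own (so it runs independently); with the `net` above, `sum(p.numel() for p in net.parameters())` should give the same total.

```python
import torch.nn as nn

# Same layer sizes as Net above: 784 -> 64 -> 64 -> 64 -> 10
layers = [nn.Linear(28 * 28, 64), nn.Linear(64, 64), nn.Linear(64, 64), nn.Linear(64, 10)]

for layer in layers:
    n_weights = layer.weight.numel()  # in_features * out_features
    n_biases = layer.bias.numel()     # out_features
    print(layer, n_weights, n_biases)

# 50240 + 4160 + 4160 + 650
total = sum(p.numel() for layer in layers for p in layer.parameters())
print("Total trainable parameters:", total)
```

Most of the parameters sit in the first layer, because every one of the 784 pixels is connected to each of the 64 neurons.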

Another way to construct the Network is to use `nn.Sequential`.
It is especially useful when the architecture is more complex.

This version also shows a different way of adding the ReLU activation: here it appears as `nn.ReLU()` layers instead of `F.relu` calls. 
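
A quick check that the two ways of applying ReLU compute the same thing: `F.relu` is a function called inside `forward()`, while `nn.ReLU()` is a module that can be placed inside `nn.Sequential`. The input values below are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([[-2.0, -0.5, 0.0, 0.5, 3.0]])

functional_out = F.relu(x)  # function call, as in the first Net
module_out = nn.ReLU()(x)   # layer object, as in the nn.Sequential version

print(functional_out)        # negative values become 0, positive values pass through (3.0 stays 3.0)
print(torch.equal(functional_out, module_out))
```

Note that ReLU does not squash values into a fixed range: positive inputs are passed through unchanged.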

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(28*28, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, 10) )



    def forward(self, x):
        x = self.layers(x)
        return F.log_softmax(x, dim=1) 

net = Net()
print(net)

Let's verify that our Network is correctly designed. We will create a random image with 28x28 dimensions and pass it to our Network:

In [None]:
X = torch.randn((28,28)) # create random 28x28 image
print('Input:')
plt.imshow(X.view(28,28))
plt.show()


X = X.view(-1,28*28) # neural network wants this to be flattened
output = net(X)
print(f'Output: {output}')

### **Exercise**

Now verify the Network with an image from our dataset. 
Hint: Take a batch from our `trainset` using `next(iter(trainset))`.

In [None]:
#### YOUR CODE HERE

## Training



Let's start the main part - training!

In order to perform the training, apart from the Model architecture itself we need 2 more elements:
- loss function - calculates "how far off" our classifications are from reality
- optimizer - adjusts the model's trainable parameters, such as the weights, so that over time the model fits our data
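
Our `Net` ends with `F.log_softmax`, so its outputs are already log-probabilities. `nn.CrossEntropyLoss` combines `log_softmax` and `nn.NLLLoss` and expects raw scores (logits), so `nn.NLLLoss` is the direct match for log-probability outputs. The sketch below, on random scores for a hypothetical batch of 4 images, shows that both give the same value, and that applying `nn.CrossEntropyLoss` to log-probabilities also gives the same result, because applying `log_softmax` a second time changes nothing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores for a batch of 4 images, 10 classes
targets = torch.tensor([3, 0, 7, 7])   # true digit labels

log_probs = F.log_softmax(logits, dim=1)
print(log_probs.exp().sum(dim=1))      # each row of probabilities sums to 1

ce = nn.CrossEntropyLoss()(logits, targets)                 # expects raw logits
nll = nn.NLLLoss()(log_probs, targets)                      # expects log-probabilities
ce_on_log_probs = nn.CrossEntropyLoss()(log_probs, targets) # log_softmax applied twice
print(ce.item(), nll.item(), ce_on_log_probs.item())        # three matching values
```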



In [None]:
import torch.optim as optim

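loss_function = nn.NLLLoss()  # Net already returns log_softmax outputs, so NLLLoss is the matching loss (nn.CrossEntropyLoss expects raw logits)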
optimizer = optim.Adam(net.parameters(), lr=0.001)

The main training loop in PyTorch consists of the following steps:
- loop over epochs (full passes over the training data)
- iterate through the DataLoader, batch by batch
- reset the gradients to 0, since PyTorch accumulates gradients from previous steps
- pass the batch of data through the Model to obtain the output
- compare the output with the true labels by calculating the loss
- backpropagate the loss (`loss.backward()`) to compute the gradients of the loss with respect to the parameters
- update the weights with the optimizer algorithm (`optimizer.step()`)
- optional: collect the loss and accuracy values at each step in order to plot them later
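
The steps above can be sketched on a single synthetic batch, without MNIST. The tiny model here is a hypothetical one that returns raw logits, so it is paired with `nn.CrossEntropyLoss`; the layer sizes and the 20 repeated steps on the same batch are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# Hypothetical tiny model and a fake batch of 32 flattened "images"
model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
X = torch.rand(32, 784)
y = torch.randint(0, 10, (32,))

loss_function = nn.CrossEntropyLoss()  # the model returns raw logits here
optimizer = optim.Adam(model.parameters(), lr=0.001)

losses = []
for step in range(20):               # repeatedly fit the same batch
    optimizer.zero_grad()            # reset gradients from the previous step
    output = model(X)                # forward pass
    loss = loss_function(output, y)  # compare with the true labels
    loss.backward()                  # backpropagate: compute the gradients
    optimizer.step()                 # update the weights using the gradients
    losses.append(loss.item())       # optional: record the loss

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Because the same batch is reused, the loss should drop quickly; on real data the loss per batch fluctuates more.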

In [None]:
epochs = 3
loss_list_epoch = []

for epoch in range(epochs): # 3 full passes over the data
    running_loss = 0.0
    for data in trainset:  # 'data' is a batch of data
        X, y = data  # X is the batch of features, y is the batch of targets.
        net.zero_grad()  # reset gradients before the backward pass; do this every step
        output = net(X.view(-1,784))  # pass in the flattened batch (recall the images are 28x28)
        #loss = F.nll_loss(output, y)  # alternative: the negative log likelihood loss as a function
        loss = loss_function(output, y)  # compare the predictions with the true labels
        loss.backward()  # backpropagate: compute gradients of the loss w.r.t. the network's parameters
        optimizer.step()  # update the weights using the gradients
        running_loss += loss.item()
    epoch_loss = running_loss / len(trainset)  # average loss over all batches of this epoch
    loss_list_epoch.append(epoch_loss)
    print(f'Epoch: {epoch}; Loss: {epoch_loss:.4f}')  # we hope the loss (a measure of wrongness) declines!

# Plot the loss value per epoch
plt.plot(range(epochs), loss_list_epoch)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.title("Loss by epoch")
plt.show()

## Accuracy value and evaluation on examples from the dataset

Let's calculate the final accuracy on `testset` and test our Model on an example image from `testset`!

In [None]:
correct = 0
total = 0

with torch.no_grad():  # no gradients are needed for evaluation
    for data in testset:
        X, y = data
        output = net(X.view(-1,784))
        predicted = torch.argmax(output, dim=1)  # most likely class for each image in the batch
        correct += (predicted == y).sum().item()
        total += y.size(0)

print("Accuracy: ", round(correct/total, 3))

In [None]:
import matplotlib.pyplot as plt

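print('Input:')
plt.imshow(X[0].view(28,28), cmap='gray')  # first image of the last batch from the evaluation loop
plt.show()

with torch.no_grad():
    output = torch.argmax(net(X[0].view(-1,784))[0])  # index of the highest score = predicted digit
print(f'Output: {output.item()}')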

### **Exercise**

Try checking more examples! Can you find one that is 'difficult' for our Model to classify?
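
One way to hunt for 'difficult' examples is to look at how confident the Model is: images whose highest class probability is low are good candidates. The sketch below uses random log-probabilities as a hypothetical stand-in for the output of `net(X.view(-1, 784))`, so it runs on its own.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Hypothetical model outputs (log-probabilities) for a batch of 10 images
log_probs = F.log_softmax(torch.randn(10, 10), dim=1)

probs = log_probs.exp()                   # back to probabilities
confidence, predicted = probs.max(dim=1)  # highest probability and its class, per image

# Indices of the 3 images the model is least sure about
hardest = torch.argsort(confidence)[:3]
for idx in hardest.tolist():
    print(f"image {idx}: predicted {predicted[idx].item()} with p={confidence[idx].item():.2f}")
```

With the real model, compute the outputs inside `torch.no_grad()` and compare `predicted` with the true labels `y` to see whether the least confident images are also the misclassified ones.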

In [None]:
### YOUR CODE

## Hyperparameters

Tuning a Neural Network is largely a matter of testing different values of its hyperparameters, such as:

- BATCH SIZE
- LEARNING RATE
- NUMBER OF HIDDEN LAYERS
- NUMBER OF EPOCHS


It is good practice to define all Hyperparameters as global variables at the beginning of the Notebook, as follows:


In [None]:
# Hyperparameters
BATCH_SIZE = 64
LEARNING_RATE = 0.001
OPTIMIZER = optim.Adam  # optimizer class; create it once the model exists: OPTIMIZER(net.parameters(), lr=LEARNING_RATE)
NUMBER_OF_HIDDEN_LAYERS = 5
EPOCHS = 6

### **Exercise**

Try to refactor the code for Network Model architecture assuming that `NUMBER_OF_HIDDEN_LAYERS` is a parameter. 

**Hint**: Use `nn.Sequential`.

Verify your Network with a 28x28 input image as we did previously.

In [None]:
NUMBER_OF_HIDDEN_LAYERS = 5

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64) 
        #### YOUR CODE HERE 
        self.fc_n = nn.Linear(64, 10) 

    def forward(self, x):
        x = F.relu(self.fc1(x)) 
        #### YOUR CODE HERE
        x = self.fc_n(x)
        return F.log_softmax(x, dim=1) 

net = Net()
print(net)

In [None]:
# Check your Model

### YOUR CODE HERE

### **Exercise** 

1. (Optional) Refactor the code in the Notebook to keep Hyperparameters as global variables.
2. Change the learning rate value to something much smaller (e.g. 0.00000001) and run the training again. What are the results? 
3. (Optional - if you have time) Leave this small value of learning rate but try to "fix" the results by changing other hyperparameters values.
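
To get a feeling for exercise 2 before running the full training, the sketch below compares a single Adam step with `lr=0.001` and with `lr=1e-8` on a hypothetical one-layer model and random data. Roughly speaking, Adam's first step moves each weight by at most about `lr`, so with `1e-8` the weights barely change.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
initial = nn.Linear(784, 10)             # hypothetical one-layer model
X = torch.rand(32, 784)                  # fake batch of flattened images
y = torch.randint(0, 10, (32,))

changes = {}
for lr in (1e-3, 1e-8):
    layer = nn.Linear(784, 10)
    layer.load_state_dict(initial.state_dict())  # same starting weights for both runs
    optimizer = optim.Adam(layer.parameters(), lr=lr)
    before = layer.weight.detach().clone()

    optimizer.zero_grad()
    nn.CrossEntropyLoss()(layer(X), y).backward()
    optimizer.step()                             # one update of the weights

    changes[lr] = (layer.weight.detach() - before).abs().max().item()
    print(f"lr={lr:g}: largest weight change after one step = {changes[lr]:.2e}")
```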