---
opengraph:
  title: DESY Notes
  image: https://s3.desy.de/public/DESY_logo_3C_web.png
  description: Note taking App provided by Helmholtz Cloud

---

# CERN School of Computing 2024 - Introduction to ML

These are the collaborative learner notes of the CSC'24 introduction to machine learning. The event can be reached here:
https://indico.cern.ch/event/1376644/

## Day 1

The introductory lecture was delivered by Judith Katzy: https://indico.cern.ch/event/1376644/contributions/5945620/


## Day 2

### Lecture 2, 8:45 am

https://indico.cern.ch/event/1376644/contributions/5945662/

Feel free to add notes, links to pages or articles below that relate to the lecture content.

- Judith is showing some plots from "Understanding Deep Learning" by Simon J.D. Prince
    - this book is available for free: https://udlbook.github.io/udlbook/

### Exercise 1, 9:45 am

As we start the exercises, please follow the instructions in the exercise repo:
https://github.com/psteinb/2024-intro2ml-cern-school-of-computing

### Feedback Day 2

Besides direct feedback, we invite all learners to share their anonymous feedback here! Just add your thoughts in the respective section below.

#### Share something you learned or that you liked about today

- really nice pen-and-paper exercises. I'm a beginner in ML (zero experience), so really useful. But I could not do them in 10 min each.
- Very nice to do by hand in order to figure out the inner workings. Would be nice to have the solution to part 1.4, for at least one of the groups. Thanks!
- I feel the theoretical lectures could have gone into the concepts in more detail rather than just gliding over them.
- Very nice notebooks!
- Super nice introduction to backpropagation and really helpful to go through the maths again - thank you!
- nice vibe :)
- very enjoyable :) good to go back to basics as someone who learnt a bit of ML before, but never really practiced much
- ~~ATTENTION THIS MEME WAS MADE BY DNN~~
- Was good to get back to by-hand calculations; we take so much for granted +1
-  (^._.^)
-         _nnnn_
-        dGGGGMMb
-       @p~qp~~qMb
-       M|@||@) M|
-       @,----.JM|
-      JS^\__/  qKL
-     dZP        qKRb
-    dZP          qKKb
-   fZP            SMMb
-   HZM            MMMM
-   FqM            MMMM
- __| ".        |\dS"qML
- |    `.       | `' \Zq
- _)      \.___.,|     .'
- \____   )MMMMMP|   .'
-     `-'       `--' hjm


#### Share something you didn't understand or want us to improve today

-  When backpropagating, how do you deal with the activation function when taking gradients?
    -  PS: Good question! This is usually handled by convention. For example, ReLU is not differentiable at zero, so mathematically speaking its gradient is not defined there. To circumvent this, the community agreed on a recipe. For our use case, pick a strategy and go for it.
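To make the convention concrete, here is a minimal sketch of one common recipe (plain Python, names are illustrative, not the course's code):

```python
def relu(x):
    # ReLU itself: max(0, x)
    return x if x > 0 else 0.0

def relu_grad(x):
    # convention: the derivative at exactly x == 0 is defined to be 0
    # (defining it as 1 there is an equally valid recipe)
    return 1.0 if x > 0 else 0.0

print(relu(-2.0), relu(3.0))                            # 0.0 3.0
print(relu_grad(-2.0), relu_grad(0.0), relu_grad(3.0))  # 0.0 0.0 1.0
```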
    
-  I would also like to implement a numerical calculation of the gradient, not just solve it analytically
    -  PS: the foundations of autodiff can be intricate; I like this page from a former collaborator: https://e-dorigatti.github.io/math/deep%20learning/2020/04/07/autodiff.html
- `your comment goes here`
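For the numerical-gradient wish above, a central finite difference is the usual starting point; a minimal sketch (plain Python, the toy loss is illustrative):

```python
def numerical_grad(f, x, eps=1e-6):
    # central finite difference approximation of df/dx
    return (f(x + eps) - f(x - eps)) / (2 * eps)

loss = lambda w: (2.0 * w - 1.0) ** 2   # toy loss; analytic gradient is 4*(2w - 1)
w = 1.0
print(numerical_grad(loss, w))          # close to the analytic value 4.0
```

The same check is handy for verifying a hand-derived backprop step against the numbers from the pen-and-paper exercises.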


## Day 3

### Lecture 3, 8:45 am

https://indico.cern.ch/event/1376644/contributions/5945674/

Feel free to add notes, links to pages or articles below that relate to the lecture content.


### Exercise 2+3, 4:30 pm

https://github.com/psteinb/2024-intro2ml-cern-school-of-computing

#### An MLP for your phone

Please share the number of parameters as well as the size in memory of your MLP below:



- `num params, nn GB`
- (3072,4080) - 1604354561 parameters - 12.834836488 GB
- pictures of 4032 * 3024 -> 1 560 707 585 parameters -> ~6.24 GB size
- My phone takes 3000x4000 pixel images cause it's a gigachad. So the total number of parameters is 1536033281, meaning ~6.1GB (ouch okay) whoopsie I didn't multiply by 3 :harold:
- 2055012737 (2.05e9) parameters, picture size = (3472 x 4624), 8.22 GB
- 302023169 parameters, 1.2 GB
- 1153024769 p, 

picture size: (3024, 4032)
n parameters: **1 572 900 225** (3024*4032*(128+1) + 2*128*129 + 1*(128+1))
memory size (assuming float32): **6.29 GB**

- Beluga picture (+1 🐋)
Picture size (3024x4032)
no. of parameters = 1.5605*10^9, memory: 6.24 GB
+1 (same resolution)
- picture size: (4096x3072) therefore 1,601,645,761 parameters, at float32 = ... I'm getting 48 GB, that ain't right lol

- Picture 4032 x 3024 (3 color channels): 4718634369 would take 17.58 GiB assuming float32 weights

- Evaluation:
  - x, y = 4624, 3468
  - hidLayer = 128
  - print(   hidLayer * (x* y +1) + 2*128*(128+1) + 1*(128+1)  )
  - Final: 2052645377
  - float in python is 64-bit -> 8 bytes. 
  - 2052645377*8 / 1024**3 -> 15 GB
- picture size: 4032*2268, 1170539009 parameters, 4.68 GB

- Picture size 4032 x 3024, just like everybody else...
Number of parameters: 1 560 707 584.
Size in memory, assuming 4-byte weights: 6.2 GB.
Ok I see clever people are taking RGB into account. Slap some alpha in there as well for an x4. Anyway.

- Picture size 1800x4000:
    - number of parameters: 921633281
    - size of the network around 3.6GB (4byte floats)
    - well, if we are talking RGB, we can start with triple the inputs.

Picture size = 4608*3456 = 15,925,248 pixels, number of parameters = 2,038,465,025
- Input to h1 -> 15.9M*128 weights + 128 biases
- h1 to h2 -> 128*128 weights + 128 biases
- h2 to h3 -> 128*128 weights + 128 biases
- h3 to out -> 128 weights + 1 bias



- Picture size: 4000 x 3000 pixels
    - Parameters: 1536033281
    - Size in memory (4-byte single precision): approx. 6.14 GB


- Picture size: 3468x4624
    - Parameters: 2052645377
    - Size in memory, assuming 4-byte floats: 2052645377 * 4 / 1024**3 ≈ 7.65 GB
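Most of the entries above follow the same counting recipe; a minimal sketch (plain Python, assuming the exercise architecture of a flattened image feeding three hidden layers of 128 units and one output, grayscale unless `channels` is set):

```python
def mlp_params(width, height, channels=1, hidden=128):
    # layer widths: flattened image -> 3 hidden layers -> single output
    sizes = [width * height * channels, hidden, hidden, hidden, 1]
    # a fully connected layer with i inputs and o outputs has i*o weights + o biases
    return sum(i * o + o for i, o in zip(sizes, sizes[1:]))

n = mlp_params(4608, 3456)              # e.g. the 4608x3456 picture above
print(n, f"{n * 4 / 1024**3:.2f} GiB")  # float32 -> 4 bytes per parameter
```

The input layer dominates completely: the hidden-to-hidden and output layers together contribute only ~33k of the ~2 billion parameters.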
### Feedback Day 3

Besides direct feedback, we invite all learners to share their anonymous feedback here! Just add your thoughts in the respective section below.

- Good amount of explanations, easy to follow and understand :)


#### Share something you learned or that you liked about today

- `your comment goes here`

#### Share something you didn't understand or want us to improve today

- Calling PyTorch a very low-level library is a bit silly
    - PS: granted, for someone doing C++, Rust, or similar programming all day, that might be so. But the statement was meant in comparison to libraries like `keras` and `sklearn`.


 - Is there some theorem that says that if dataset size and computing power go to infinity, then all reasonable models, regardless of size and architecture, converge to the same performance?

## Day 4


### Lecture 4, 8:45 am

https://indico.cern.ch/event/1376644/contributions/5945749/

Feel free to add notes, links to pages or articles below that relate to the lecture content.

- `notes go here`

### Exercise 4+5, 9:45 am and 11:30 am

https://github.com/psteinb/2024-intro2ml-cern-school-of-computing

### Feedback Day 4

- great micro lesson! +1
- this lecture was straight bussin, no cap, no cap, the content hit different 🚀 and ML is now my bro. U R the GOAT of ML lectures. Vibe checked and lit!
- The exercises are great at showing the full pipeline from data loading to training and results. It is a great reference for future ML projects.
- The exercises were really helpful and very, very well prepared. It was a great balance between concrete examples and abstract ideas. I hope you will be able to come to the future CSCs!
- The exercises were really good. It was helpful to see the process from beginning to end. As someone who never really worked with either PyTorch or scikit-learn, I felt that the exercises where we had to fill in stuff were not so helpful. I did not know what to do, so I just looked at the solutions. But at least I could learn from reading the solutions.
- Really good exercises, for me a great intro to ML. If I ever need it in the future, these will be my first point of reference
- There were nice examples to go through by yourself, with a lot of explanations in between. I missed pair/group work. It felt more like reading explanations than doing exercises. And the lecture could be a bit more connected with the exercises.

Besides direct feedback, we invite all learners to share their anonymous feedback here! Just add your thoughts in the respective section below.

#### Share something you learned or that you liked about today

- `your comment goes here`
- I liked that you went through the code and explained what the steps were, it was easier to follow the new concepts than if I read it myself (+1)

- I now finally understand how a CNN works, by having multiple channels with different kernels to extract different features of the input, as well as how the convolution works to reduce/increase the dimensionality. (+1)

- I really appreciate the work that went into designing the notebooks. They are self-contained and let me learn new concepts in my own pace. They are also a great reference to come back to.

- Kernel size = 3, stride = 1, padding = 1 keeps the output the same size as the input:
    - padded input (p = padding value): [p,0,1,2,3,4,5,6,7,8,p]
    - the kernel [a,b,c] slides from the first padded position to the last, producing one output per input element
- I didn't know autoencoders can be reasonable on simple tasks!
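The padding rule above can be checked with the standard output-size formula for convolutions; a minimal sketch (plain Python, illustrative names):

```python
def conv_out_len(n_in, kernel=3, stride=1, padding=1):
    # standard 1-D convolution output length:
    # floor((n_in + 2*padding - kernel) / stride) + 1
    return (n_in + 2 * padding - kernel) // stride + 1

print(conv_out_len(9))             # kernel 3, stride 1, padding 1 keeps the length: 9
print(conv_out_len(9, padding=0))  # without padding the output shrinks to 7
```

The same formula applies per spatial dimension for 2-D images, which is why "same" convolutions pick padding = (kernel - 1) / 2.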

#### Share something you didn't understand or want us to improve today

- `your comment goes here`

## Contact details

For those still curious about certain aspects of ML, you can reach me at `p.steinbach[at]hzdr.de` 
Have a safe trip home and thanks for sharing this positive feedback above!