### FWXMACHINA

Nanosecond machine learning with boosted decision trees for high energy physics





Tae Min Hong

- Paper [2104.03408]
- Info fwx.pitt.edu
- Code gitlab.com/PittHongGroup/fwX

Pheno 2021

May 24, 2021

https://indico.cern.ch/event/982783/sessions/396894/

# Thank you to my collaborators




PITT-PACC-2103-v2

# Nanosecond machine learning event classification with boosted decision trees in FPGA for high energy physics

T.M. Hong<sup>\*</sup>, B.T. Carlson, B.R. Eubanks, S.T. Racz<sup>†</sup>, S.T. Roche, J. Stelzer, and D.C. Stumpp<sup>‡</sup>

> Department of Physics and Astronomy, University of Pittsburgh
>
> May 17, 2021

Undergraduate researchers





- Intro
- Algorithm structure
- Firmware design
- Physics results (simulated)
- Backup slides

# Outline (2)



#### Intro

Machine learning at Level-1 trigger

### Algorithm structure

- Bit integer representation
- Tree flattening & merging

### Firmware design

- Bin Engines

### Physics results (simulated)

VBF Higgs vs. multijet

### Backup slides

- Comparison to hls4ml
- Test bench

# Machine learning at L1 trigger





# Machine learning at L1 trigger (2)







- Optimization
- Use bit integer precision



- Will discuss next:
  - Tree Flattener, Forest Merger



# Optimization

#### Use bit integer precision

E.g., `ap_int<8>` means the variable is represented by an 8-bit integer, i.e., one of $2^8 = 256$ values (0 to 255 when unsigned).

Advantages & subtleties

- **Pre-evaluate** $f$
- Bit integers represent a wide range without sacrificing float precision
- Firmware only adds

**Transformation** (includes a floor operation)

$$c_{\text{int}} = f(c_{\text{float}}) = \left\lfloor \frac{c_{\text{float}} - c_{\min}}{c_{\max} - c_{\min}} \cdot \left(2^N - 1\right) \right\rfloor$$

Equal up to one bit because of the floor:

$$f(x_1 + x_2) \approx f(x_1) + f(x_2)$$
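The transformation can be tried out in a few lines of Python. This is a minimal sketch, not code from the fwX repository: the function names `to_bit_integer` and `additivity_gap` are hypothetical, and the additivity check assumes $c_{\min} = 0$ so that the offset does not enter the sum.

```python
import math


def to_bit_integer(c_float, c_min, c_max, n_bits):
    """Map a float in [c_min, c_max] to an integer in [0, 2**n_bits - 1]."""
    return math.floor((c_float - c_min) / (c_max - c_min) * (2 ** n_bits - 1))


def additivity_gap(x1, x2, c_max, n_bits):
    """f(x1 + x2) - f(x1) - f(x2), assuming c_min = 0 (an assumption of this sketch)."""
    f = lambda x: to_bit_integer(x, 0.0, c_max, n_bits)
    return f(x1 + x2) - f(x1) - f(x2)


# With N = 8 the full float range maps onto 0..255.
LOW = to_bit_integer(0.0, 0.0, 1.0, 8)
HIGH = to_bit_integer(1.0, 0.0, 1.0, 8)
# The floor can cost at most one unit when two converted values are added.
GAP = additivity_gap(0.3, 0.45, 1.0, 8)
```

Because the conversion is evaluated offline, only the integer results would need to be stored, which is consistent with the statement that the firmware only adds.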

# **Decision tree, 2 var example**





- Advantages & subtleties
  - Cut thresholds & weights determined during training
  - Danger of "memorizing" boundaries (overtraining), so must consider a forest


# Decision tree, 2 var example (2)






- Advantages & subtleties
  - Deterministic, conventional style
  - Cuts on each axis are not independent of one another, so the evaluation is recursive




# Decision tree, 2 var example (3)






- Advantages & subtleties
  - Each axis is independent of the others  $\rightarrow$  bin search problem on a grid
  - Does not scale well for very deep trees (but do you really need it at L1?)
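Turning a recursive tree into a bin search on a grid can be sketched as follows. This is a minimal illustration, not the fwX Tree Flattener: the tuple-based tree encoding, the example thresholds and scores, and all function names are assumptions made for this sketch.

```python
from bisect import bisect_right
from itertools import product

# Hypothetical 2-variable tree. Internal node: (axis, threshold, below, above); leaf: score.
TREE = (0, 50, (1, 30, -1.0, 0.5), (1, 70, 0.2, 1.0))


def evaluate_recursive(node, x):
    """Conventional evaluation: follow the cuts from the root down to a leaf."""
    while isinstance(node, tuple):
        axis, threshold, below, above = node
        node = below if x[axis] < threshold else above
    return node


def collect_cuts(node, n_axes):
    """Gather every threshold used on each axis, sorted."""
    cuts = [set() for _ in range(n_axes)]
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, tuple):
            axis, threshold, below, above = current
            cuts[axis].add(threshold)
            stack.extend((below, above))
    return [sorted(c) for c in cuts]


def representative(cuts, index):
    """One point inside the grid bin `index` on every axis."""
    return [c[i - 1] if i > 0 else (c[0] - 1 if c else 0) for c, i in zip(cuts, index)]


def flatten(node, n_axes):
    """Pre-evaluate the tree once per grid bin and store the score in a table."""
    cuts = collect_cuts(node, n_axes)
    table = {}
    for index in product(*(range(len(c) + 1) for c in cuts)):
        table[index] = evaluate_recursive(node, representative(cuts, index))
    return cuts, table


def evaluate_flat(cuts, table, x):
    """Bin search on each axis independently, then a single table look-up."""
    return table[tuple(bisect_right(c, value) for c, value in zip(cuts, x))]


CUTS, TABLE = flatten(TREE, 2)
```

The table grows as the product of the number of bins per axis, which is one way to see why the approach does not scale well for very deep trees.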




# Forest of **boosted** decision trees





# Merging of the forest






#### Advantages & subtleties

- Merging is pre-processed before implementation in firmware
- This uses adaptive boosting. Gradient-boosted forests cannot be pre-merged, but we have approximations for that method to improve performance.
- Physics impact of flattening & merging
  - None, because the result encodes the entirety of the conventional approach
  - Firmware is a giant look-up table problem
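Pre-merging can be pictured as overlaying the grids of the individual trees and storing the weighted sum of their scores in each cell of the combined grid. The sketch below assumes an AdaBoost-style score (a weighted sum of yes/no leaves); the two example trees, their weights, and the function names are hypothetical and are not taken from the fwX code.

```python
from bisect import bisect_right
from itertools import product


def lookup(cuts, table, x):
    """Find the bin of x on each axis independently, then read the stored score."""
    return table[tuple(bisect_right(c, v) for c, v in zip(cuts, x))]


def representative(cuts, index):
    """One point inside the grid bin `index` on every axis."""
    return [c[i - 1] if i > 0 else (c[0] - 1 if c else 0) for c, i in zip(cuts, index)]


def merge(trees, weights):
    """Merge flattened trees: union of cuts per axis, weighted sum of scores per bin."""
    n_axes = len(trees[0][0])
    merged_cuts = [
        sorted({t for cuts, _ in trees for t in cuts[axis]}) for axis in range(n_axes)
    ]
    merged_table = {}
    for index in product(*(range(len(c) + 1) for c in merged_cuts)):
        point = representative(merged_cuts, index)
        merged_table[index] = sum(
            w * lookup(cuts, table, point) for (cuts, table), w in zip(trees, weights)
        )
    return merged_cuts, merged_table


# Two hypothetical flattened trees on 2 variables with yes/no (+1/-1) leaves.
TREE_A = ([[50], [30]], {(0, 0): -1, (0, 1): -1, (1, 0): -1, (1, 1): 1})
TREE_B = ([[20], []], {(0, 0): -1, (1, 0): 1})
WEIGHTS = [0.5, 0.25]

MERGED_CUTS, MERGED_TABLE = merge([TREE_A, TREE_B], WEIGHTS)
```

After merging, evaluating the forest costs one bin search per axis plus one look-up, regardless of how many trees were merged.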

# Physics: VBF Higgs vs. multijet



[Figure: unit-normalized distributions for the VBF Higgs and multijet samples. Panels: leading jet p<sub>T</sub>, sub-leading jet p<sub>T</sub>, di-jet invariant mass, di-jet p<sub>T</sub>, azimuthal angle gap, pseudorapidity gap.]

- VBF Higgs vs. multijet background
  - $\sigma_{\text{Higgs}}$  = 4 pb, two widely separated high-p<sub>T</sub> jets
  - $\sigma_{pp}$  = 80 mb, dominant process at the LHC
  - Distributions shown in the accompanying plots
- We consider two decays of the Higgs
  - $H \rightarrow \nu\bar{\nu}\nu\bar{\nu}$ , "invisible"
  - $H \rightarrow b\bar{b}b\bar{b}$ , through pseudoscalar decays
- Strategy
  - Train the BDT to identify the VBF jet pair, i.e., train on multijet vs. VBF  $H \rightarrow \nu\bar{\nu}\nu\bar{\nu}$
  - Apply that BDT to multijet vs. VBF  $H \rightarrow b\bar{b}b\bar{b}$
- Why
  - If it works for VBF  $H \rightarrow b\bar{b}b\bar{b}$ , then it can be a trigger for VBF independent of the Higgs decay
  - **Does it work?** Next slide

# Physics: VBF Higgs vs. multijet (2)





- Reminder: we did *not* train on VBF  $H \rightarrow b\bar{b}b\bar{b}$
- Subtlety regarding jet selection (see paper)
- Distributions shown in the accompanying plots

- Performance comparison
  - Try to mimic ATLAS HL-LHC cuts as best we can using MadGraph + Delphes
  - Two-fold signal efficiency improvement going from the ATLAS-inspired cuts to the fwX results



- We validated our setup to reproduce the signal efficiency in the ATLAS Run-2 paper
- Comparison using bit integers, not floats



# Firmware: VBF Higgs vs. multijet



- Ran two configurations
  - Optimized version
  - Non-optimized version (for comparison)
  - Both using 100 trees, max depth of 4
  - Results given in the table below
- Performance
  - 5 clock ticks ≈ 16 ns
  - Negligible resource usage
- Benchmark using e<sup>+</sup> vs. γ
  - In the paper, we also define <u>one set</u> of parameters to scale up <u>one param. at a time</u>
  - Uses 4 variables, 8 bits & same as above
  - 3 clock ticks ≈ 10 ns
  - Negligible resource usage

|                       | VBF H Optimized | VBF H Non-opt |
|-----------------------|-----------------|---------------|
| N<sub>var</sub>       | 5               | 7             |
| N<sub>bit-var</sub>   | 8               | 12            |
| N<sub>bit-score</sub> | 16              | 16            |
| N<sub>bin</sub>       | 40k             | 1M            |
| Latency               | 5 ticks         | 6 ticks       |
| LUT                   | 1%              | 1.5%          |
| Flip Flops            | ~0              | ~0            |
| BRAM                  | 2%              | 30%           |
| DSP                   | 0               | ~0            |
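The quoted nanosecond figures follow from the tick count and the clock period. The short sketch below assumes the 320 MHz clock listed in the backup benchmark table also applies here; the function name `latency_ns` is hypothetical.

```python
def latency_ns(clock_ticks, clock_mhz):
    """Latency in nanoseconds for a number of clock ticks at a clock frequency in MHz."""
    period_ns = 1000.0 / clock_mhz  # 320 MHz gives a 3.125 ns period
    return clock_ticks * period_ns


BENCHMARK_NS = latency_ns(3, 320)  # 9.375 ns, quoted as 10 ns
VBF_OPTIMIZED_NS = latency_ns(5, 320)  # 15.625 ns, quoted as 16 ns
```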

Backup

# **Comparison to hls4ml**



#### Details

- Ideally we would run hls4ml's example & compare, but we can't as-is because they run a 5-class jet identification (b, W, top, g, q)
- We ran hls4ml on the <u>same dataset</u> with the <u>same configuration</u> as in our paper

| Parameter                   | FWXMACHINA                                                               | hls4ml-Conifer            | Comments      |
|-----------------------------|--------------------------------------------------------------------------|---------------------------|---------------|
| **ML training setup**       |                                                                          |                           | Same setup    |
| Training software           | TMVA                                                                     | TMVA                      | same          |
| Physics problem             | electron vs. photon                                                      | electron vs. photon       | same          |
| Training samples            | from ref. [56]                                                           | from ref. [56]            | same          |
| No. of event classes        | 2                                                                        | 2                         | same          |
| No. of training trees       | 100                                                                      | 100                       | same          |
| Max. depth                  | 4                                                                        | 4                         | same          |
| No. of input variables      | 4                                                                        | 4                         | See figure 18 |
| Other TMVA parameters       | TMVA defaults                                                            | TMVA defaults             | same          |
| Nanosec. Optimization       | Flattened & merged to 10 final trees, without Tree Remover or Cut Eraser | N/A                       | Unique to FwX |
| **FPGA and firmware setup** |                                                                          |                           |               |
| Chip family                 | Xilinx Virtex Ultrascale+                                                | Xilinx Virtex Ultrascale+ | same          |
| Chip model                  | xcvu9p-flga2104-2L-e                                                     | xcvu9p-flga2104-2L-e      | same          |
| Vivado HLS version          | 2019.2                                                                   | 2019.2                    | same          |
| Clock speed, period         | 320 MHz, 3.125 ns                                                        | 320 MHz, 3.125 ns         | same          |
| Precision                   | `ap_int<8>`                                                              | `ap_ufixed<10,5>`         | See text      |
| Bin Engine                  | BSBE                                                                     | N/A                       | Unique to FwX |
| **FPGA cost**               |                                                                          |                           | Comparison    |
| Latency                     | 3 clock ticks, 9.375 ns                                                  | 15 clock ticks, 46.875 ns | -             |
| Interval                    | 1 clock tick, 3.125 ns                                                   | 1 clock tick, 3.125 ns    | same          |
| LUT                         | 1903, < 0.2% of total                                                    | 2.3 M, 192% of total      | See caption   |
| FF                          | 138, < 0.01% of total                                                    | 1.1 M, 44% of total       | -             |
| BRAM 18k                    | 8, < 0.2% of total                                                       | 0                         | -             |
| URAM                        | 0                                                                        | 0                         | same          |
| DSP                         | 0                                                                        | 0                         | same          |

# **Benchmark firmware performance**



| Parameter                                 | Value                       | Comments                     |
|-------------------------------------------|-----------------------------|------------------------------|
| **FPGA setup**                            |                             |                              |
| Chip family                               | Xilinx Virtex Ultrascale+   |                              |
| Chip model                                | xcvu9p-flga2104-2L-e        |                              |
| Vivado version                            | 2019.2.1                    |                              |
| Synthesis type                            | C-Synthesis                 |                              |
| HLS or RTL                                | HLS                         |                              |
| HLS interface pragma                      | None                        |                              |
| Clock speed                               | 320 MHz                     | Clock period is 3.125 ns     |
| **ML training configuration**             |                             |                              |
| ML training method                        | Boosted decision tree       | Binary classification        |
| Boost method                              | Adaptive                    | AdaBoost with yes/no leaf    |
| No. of event types to classify            | 2                           | Signal vs. background        |
| No. of input variables                    | 4                           |                              |
| No. of trees used for training            | 100                         |                              |
| Maximum tree depth                        | 4                           |                              |
| **Nanosecond Optimization configuration** |                             |                              |
| Bin Engine type                           | Bit Shift Bin Engine (BSBE) |                              |
| No. of bits for input variables           | 8 bits for each             |                              |
| No. of bits for cut thresholds            | 8 bits for each             |                              |
| No. of bits for BDT output score          | 8 bits                      |                              |
| No. of trees after merging                | 10                          | Tree Merger via ordered list |
| No. of final trees                        | 10, none removed            | Tree Remover by truncation   |
| No. of bins                               | 26132                       | Cut Eraser not used          |
| **FPGA cost**                             |                             |                              |
| Latency                                   | 3 clock ticks               | 9.375 ns                     |
| Interval                                  | 1 clock tick                | 3.125 ns                     |
| Look up tables                            | 1903 out of 1182240         | < 0.2% of available          |
| Flip flops                                | 138 out of 2364480          | < 0.01% of available         |
| Block RAM                                 | 8 out of 4320               | < 0.2% of available          |
| Ultra RAM                                 | 0 out of 960                | -                            |
| Digital signal processors                 | 0 out of 6840               | -                            |

[Figure: latency (in ns and in clock ticks) vs. clock speed (MHz), 100 to 600 MHz. Annotation: 10 ns is independent of clock from 100-320 MHz.]



Table 9: List of input variables for the classification of the VBF Higgs boson vs. multijet process. Also given are the ATLAS-inspired cut-based offline thresholds for Run 2 [64] and HL-LHC [65]. For Run 2, differences arise with respect to the document when the  $m_{jj}$  threshold is quoted as 1100 GeV for L1 MJJ-500-NFF; we use the > 99% offline efficiency point, which is achieved around  $m_{jj} > 1300$  GeV. For the others, the offline thresholds are used. For HL-LHC, the single-level scheme values are quoted. The performance of the cut-based approach using these values is compared to the BDT result in figure 16. The non-optimized (non-opt) configuration includes the five variables from the optimized configuration.

| Input variable            | Description                                            | ATLAS Run-2 offline cut [64], see caption | ATLAS HL-LHC offline cut [65], see caption | Used in BDT |
|---------------------------|--------------------------------------------------------|-------------------------------------------|--------------------------------------------|-------------|
| $p_{\mathrm{T1}}$         | Leading jet $p_{\mathrm{T}}$                           | > 90 GeV                                  | > 75 GeV                                   | -           |
| $p_{\mathrm{T2}}$         | Subleading jet $p_{\mathrm{T}}$                        | > 80 GeV                                  | > 75 GeV                                   | Optimized   |
| $p_{\mathrm{T12}}$        | Sum $p_{\mathrm{T1}} + p_{\mathrm{T2}}$                | -                                         | -                                          | Optimized   |
| $\lvert\eta_1\rvert$      | Leading jet $\eta$                                     | < 3.2                                     | -                                          | -           |
| $\lvert\eta_2\rvert$      | Subleading jet $\eta$                                  | < 4.9                                     | -                                          | -           |
| $\prod_{\eta}$            | Product $\eta_1 \cdot \eta_2$                          | -                                         | -                                          | Optimized   |
| $\lvert\Delta\eta\rvert$  | Separation in $\lvert\eta_2 - \eta_1\rvert$            | > 4.0                                     | > 2.5                                      | -           |
| $\lvert\Delta\phi\rvert$  | Separation in $\lvert\phi_2 - \phi_1\rvert$            | < 2.0                                     | < 2.5                                      | non-opt     |
| $\lvert\Delta R\rvert$    | $\sqrt{(\Delta \eta)^2 + (\Delta \phi)^2}$             | -                                         | -                                          | non-opt     |
| $m_{jj}$                  | Dijet invariant mass                                   | > 1300 GeV                                | -                                          | Optimized   |
| $p_{\mathrm{T}}^{jj}$     | Dijet $p_{\mathrm{T}}$                                 | -                                         | -                                          | Optimized   |
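For orientation, the HL-LHC column of Table 9 can be written as a simple cut-based selection. This is only a sketch of how such thresholds are applied, not the selection code used for the comparison: the event dictionary, its key names, and the function name are hypothetical, and the separations are assumed to be precomputed absolute values.

```python
# Thresholds copied from the HL-LHC column of Table 9 (pT in GeV).
HL_LHC_CUTS = {
    "pt1_min": 75.0,
    "pt2_min": 75.0,
    "delta_eta_min": 2.5,
    "delta_phi_max": 2.5,
}


def passes_hl_lhc_cuts(event, cuts=HL_LHC_CUTS):
    """Return True if the event satisfies every HL-LHC-inspired threshold."""
    return (
        event["pt1"] > cuts["pt1_min"]
        and event["pt2"] > cuts["pt2_min"]
        and event["delta_eta"] > cuts["delta_eta_min"]
        and event["delta_phi"] < cuts["delta_phi_max"]
    )


# Hypothetical events, for illustration only.
VBF_LIKE = {"pt1": 120.0, "pt2": 90.0, "delta_eta": 4.2, "delta_phi": 1.1}
CENTRAL_DIJET = {"pt1": 120.0, "pt2": 90.0, "delta_eta": 0.8, "delta_phi": 3.0}
```

A cut-based selection like this treats each variable separately, whereas the BDT can use correlations between the variables.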







 Each variable is processed independently of the others

# Firmware design: Bin Engines





 Look up thresholds in memory, compare



- Bit shift to localize data
  - This is fast
- Use combinational logic as much as possible, without multiplication. No explicit clocked operations.
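One generic way a bit shift can localize a bin search on 8-bit inputs is sketched below. This illustrates the general idea only and is not the Bit Shift Bin Engine from the fwX firmware: the shift width, the coarse table layout, the example cuts, and the function names are all assumptions of this sketch.

```python
from bisect import bisect_right

N_BITS = 8  # input precision, as in the 8-bit configurations on the slides
SHIFT = 4  # hypothetical choice: each coarse cell covers 2**4 = 16 input values


def build_coarse_table(cuts, shift=SHIFT, n_bits=N_BITS):
    """For each coarse cell (x >> shift), store the bin index at the cell's lowest value."""
    return [bisect_right(cuts, cell << shift) for cell in range(1 << (n_bits - shift))]


def find_bin(x, cuts, coarse, shift=SHIFT):
    """Jump to the right neighbourhood with a shift, then finish with a few comparisons."""
    index = coarse[x >> shift]
    while index < len(cuts) and cuts[index] <= x:
        index += 1
    return index


# Hypothetical sorted cut thresholds on one 8-bit variable.
CUTS = [10, 37, 40, 130, 200]
COARSE = build_coarse_table(CUTS)
```

The coarse table is computed offline, so the look-up itself needs only a shift and comparisons, with no multiplication.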



#### Setup to validate against software simulation



 No difference seen with respect to the software implementation

### More info





#### Screenshots of the code repository on git





Repository contents at https://gitlab.com/PittHongGroup/fwX (from the screenshot):

- `doc`
- `examples` (examples)
- `fwXmachina` (code)
- `images`
- `.gitignore`, `CHANGELOG`, `EULA.md`, `README.md`
- `fwX.py` (driver file)
- `setup.py`
- Doxygen is available at https://PittHongGroup.gitlab.io/fwXmachina/



README.md excerpt: Vivado HLS Download and Installation

1. Navigate to https://www.xilinx.com/support/download.html
2. Click the icon of the person in the top right and create an account
3. Navigate back to the URL above
4. Select the desired version on the left. Make sure to select a version that supports your FPGA part number (most versions support all devices)
5. Scroll down a little and click on the name of the installation method. For example, Windows users will click the *.exe one
6. Once that is downloaded, open up the install wizard and progress through the installation. Make sure to select "Vivado" and "Vivado Design Edition"
7. Once it is done installing, open Vivado HLS to verify it is working
|                                                                                                                                                       | Other   • ROOT compiled with Python 3, installation depends on method used below  • Other Python package dependencies automatically installed  Installation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|                                                                                                                                                       | Local Installation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|                                                                                                                                                       | Dependencies<br>CERN's ROOT framework compiled with Python 3. This page gives instructions on how to download that.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|                                                                                                                                                       | Steps                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |

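Before running the install steps, the prerequisites named in the README (Vivado HLS, and ROOT built with Python 3) can be sanity-checked from Python. The sketch below is not part of fwX. It assumes the Vivado HLS executable is called `vivado_hls` and is on the `PATH`, and that ROOT's Python bindings are importable as `ROOT`. The function name `check_prerequisites` is hypothetical.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(hls_executable="vivado_hls", root_module="ROOT"):
    """Report which prerequisites appear to be available.

    Both default names are assumptions: adjust them if your Vivado HLS
    executable or ROOT Python module is named differently.
    """
    return {
        # The README asks for ROOT compiled with Python 3.
        "python3": sys.version_info.major == 3,
        # find_spec locates the module without importing it.
        "root_bindings": importlib.util.find_spec(root_module) is not None,
        # shutil.which returns None when the executable is not on PATH.
        "vivado_hls_on_path": shutil.which(hls_executable) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A `MISSING` entry only means the tool was not found under the assumed name. Opening Vivado HLS by hand, as in step 7 of the README, remains the actual check.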
#### install

#### **README (continued)**





