Machine learning methods are becoming ubiquitous across particle physics. However, the exploration of such techniques in low-latency environments such as Level-1 (L1) trigger systems has only just begun. We present a new software tool, based on High-Level Synthesis (HLS), that generically ports several kinds of machine learning models (BDTs, DNNs, CNNs) into FPGA firmware. As a benchmark physics use case, we consider the task of tagging high-pT jets as H→bb̄ candidates using jet substructure. We map out FPGA resource usage and latency as a function of the type of machine learning algorithm and its hyperparameters. Finally, we present a set of general practices for efficiently designing low-latency machine learning algorithms on FPGAs.