23–27 Sept 2024
CERN
Europe/Zurich timezone

Optimizing Multi-Model Compression for Resource-Constrained Devices (Poster Upload)

23 Sept 2024, 17:13
1m
500/1-001 - Main Auditorium (CERN)

Speaker

João Luís Prado

Description

We address the challenge of compressing a sequence of models for deployment on computing- and memory-constrained devices. This task differs from single-model compression, as the decision to apply compression schemes either independently or jointly across all sub-networks introduces a new degree of freedom. We evaluate the performance of pruning and quantization techniques for model compression in the context of a prototypical multi-model system that performs image restoration followed by object detection. We propose an adaptation of Quantization-Aware Training (QAT) and pruning techniques in which the multi-model system is fine-tuned as a single unit with an adapted loss function, rather than applying these techniques to each model individually.
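As a rough illustration of the building blocks named in the abstract, the sketch below shows uniform symmetric fake quantization (the rounding step that QAT simulates during training), magnitude pruning, and a combined loss for fine-tuning both stages together. The abstract does not specify the adapted loss, so the weighted sum in `joint_loss` and the `restoration_weight` parameter are assumptions made for illustration only, as are all function names; this is not the authors' implementation.

```python
def fake_quantize(weights, num_bits=8):
    """Uniform symmetric fake quantization: snap each weight to an
    integer grid, then map it back to a float on that grid."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    return [round(w / scale) * scale for w in weights]


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    num_pruned = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_pruned])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def joint_loss(detection_loss, restoration_loss, restoration_weight=0.1):
    """Hypothetical joint objective (assumption): the downstream detection
    loss plus a weighted restoration term, so both models are tuned as one unit."""
    return detection_loss + restoration_weight * restoration_loss
```

In a joint setting, the weights of both sub-networks would pass through `fake_quantize` (or `magnitude_prune`) in the forward pass, and a single loss such as `joint_loss` would drive the updates of both; in the independent setting, each model would instead be compressed and fine-tuned against its own loss.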

Which of the following keywords best matches your abstract? Other

Author

João Luís Prado

Co-authors

Dr Laleh Makarem (Logitech)
Dr Mathieu Salzmann (EPFL)
