Speaker
João Luís Prado
Description
We address the challenge of compressing a sequence of models for deployment on compute- and memory-constrained devices. This task differs from single-model compression because deciding whether to apply compression schemes independently to each sub-network or jointly across all of them introduces a new degree of freedom. We evaluate pruning and quantization techniques on a prototypical multi-model system that combines image restoration and object detection. We propose an adaptation of Quantization-Aware Training (QAT) and pruning in which the multi-model system is fine-tuned as a single unit with an adapted loss function, rather than compressing each model individually.
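The extra degree of freedom between independent and joint compression can be illustrated with pruning alone. The following is a minimal sketch, not the authors' method: it assumes unstructured magnitude pruning and represents each sub-network as a flat list of weights. The function names `prune_independent` and `prune_joint` are hypothetical. The first applies the same sparsity budget inside each model; the second ranks all weights of the whole sequence together, so the resulting sparsity can differ between models. It does not cover the fine-tuning with an adapted loss.

```python
# Hypothetical sketch: independent vs joint (global) magnitude pruning of a
# sequence of models. Assumption: each sub-network is a flat list of weights,
# and pruning zeroes the smallest-magnitude weights (unstructured pruning).


def prune_independent(models, sparsity):
    """Zero the `sparsity` fraction of smallest-magnitude weights in each model separately."""
    pruned = []
    for weights in models:
        k = int(len(weights) * sparsity)  # number of weights to remove in this model
        order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
        drop = set(order[:k])
        pruned.append([0.0 if i in drop else w for i, w in enumerate(weights)])
    return pruned


def prune_joint(models, sparsity):
    """Rank the weights of all models together and zero the globally smallest fraction."""
    # Each entry is (magnitude, model index, weight index); ties break deterministically.
    flat = [(abs(w), m, i) for m, weights in enumerate(models) for i, w in enumerate(weights)]
    k = int(len(flat) * sparsity)
    drop = {(m, i) for _, m, i in sorted(flat)[:k]}
    return [
        [0.0 if (m, i) in drop else w for i, w in enumerate(weights)]
        for m, weights in enumerate(models)
    ]


if __name__ == "__main__":
    # Toy weights standing in for a restoration model followed by a detector.
    restoration = [0.9, -0.05, 0.4, 0.02]
    detector = [1.5, -1.2, 0.3, 0.8]
    print(prune_independent([restoration, detector], 0.5))
    # [[0.9, 0.0, 0.4, 0.0], [1.5, -1.2, 0.0, 0.0]]
    print(prune_joint([restoration, detector], 0.5))
    # [[0.9, 0.0, 0.0, 0.0], [1.5, -1.2, 0.0, 0.8]]
```

With the same overall 50% budget, joint pruning removes 75% of the toy restoration weights and only 25% of the detector weights. Independent pruning removes exactly half of each. Which allocation is preferable depends on how errors propagate through the sequence. That propagation is what fine-tuning the system as a single unit is meant to account for.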
Which of the following keywords best match your abstract?
Other
Author
João Luís Prado
Co-authors
Dr Laleh Makarem (Logitech)
Dr Mathieu Salzmann (EPFL)