Speaker
Dr. Song Han (MIT)
Description
This talk presents innovations in efficient multi-modal LLMs through algorithm and system co-design. I'll first present VILA, a visual language model deployable on the edge that is capable of visual in-context learning, multi-image reasoning, video captioning, and video QA. This is followed by SmoothQuant and AWQ, two LLM quantization techniques that make VILA deployable on edge devices and bring new capabilities to mobile vision applications. Second, I'll present StreamingLLM, a KV cache optimization technique for long conversations, and QUEST, which leverages sparsity for KV cache compression.
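As a rough illustration of the KV cache idea mentioned above (not the authors' code, and simplified for clarity), a StreamingLLM-style cache keeps a few initial "attention sink" tokens forever plus a rolling window of the most recent tokens, so memory stays bounded during arbitrarily long conversations. The class and parameter names below are hypothetical.

    # Minimal sketch: bounded KV cache with attention sinks + recent window.
    from collections import deque

    class SinkWindowKVCache:
        def __init__(self, num_sink_tokens=4, window_size=1024):
            self.num_sink_tokens = num_sink_tokens
            self.sinks = []                           # first few tokens, never evicted
            self.window = deque(maxlen=window_size)   # rolling window of recent tokens

        def append(self, kv_entry):
            """Store the key/value pair produced for a newly generated token."""
            if len(self.sinks) < self.num_sink_tokens:
                self.sinks.append(kv_entry)
            else:
                self.window.append(kv_entry)          # oldest non-sink entry drops out

        def entries(self):
            """KV entries visible to attention: sinks first, then recent tokens."""
            return self.sinks + list(self.window)

    # Usage: cache size stays bounded no matter how long the stream runs.
    cache = SinkWindowKVCache(num_sink_tokens=4, window_size=8)
    for t in range(100):
        cache.append((f"k{t}", f"v{t}"))
    print(len(cache.entries()))  # 12 = 4 sink tokens + 8-token window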
Author
Dr. Song Han (MIT)