One MegaKernel to rule Llama-1B
Stanford's Hazy Research group recently explored how to maximize the speed of open-source models on modern GPUs, particularly in the challenging scenario of generating a single sequence with Llama-3.2-1B.