One MegaKernel to rule Llama-1B
Stanford's Hazy Research group recently explored how to maximize the speed of open-source models on modern GPUs, particularly in the challenging scenario of generating a single sequence with Llama-3.2-1B.