New preprint: TriAttention achieves 2.5x throughput for long-context LLM reasoning via trigonometric KV cache compression, enabling deployment on a single consumer GPU.