Announcement 1

New preprint: TriAttention achieves a 2.5x throughput gain for long-context LLM reasoning via trigonometric KV cache compression, enabling deployment on a single consumer GPU.
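
The preprint's method isn't detailed here, so the following is an illustrative sketch only, not TriAttention's algorithm: one plausible reading of "trigonometric KV cache compression" is projecting the cached keys/values onto a truncated cosine (DCT) basis and retaining only the low-frequency coefficients. The helpers compress_kv and decompress_kv and the keep parameter below are hypothetical names introduced for this example.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II basis: rows are cosines of increasing frequency,
    # a classic trigonometric transform (orthonormal, so inverse = transpose).
    k = np.arange(n)[:, None]   # frequency index
    t = np.arange(n)[None, :]   # sequence-position index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0] /= np.sqrt(2.0)    # normalize the DC row
    return basis

def compress_kv(kv: np.ndarray, keep: int) -> np.ndarray:
    # kv: (seq_len, head_dim). Project the sequence axis onto the DCT basis
    # and keep only the `keep` lowest-frequency coefficients, shrinking
    # the cache along that axis by a factor of seq_len / keep.
    D = dct_matrix(kv.shape[0])
    return D[:keep] @ kv        # (keep, head_dim) coefficient matrix

def decompress_kv(coeffs: np.ndarray, seq_len: int) -> np.ndarray:
    # Approximate reconstruction via the transpose of the truncated basis.
    D = dct_matrix(seq_len)
    return D[:coeffs.shape[0]].T @ coeffs   # (seq_len, head_dim)

# Example: 4x compression of one head's cache along the sequence axis.
kv = np.random.randn(4096, 128).astype(np.float32)
coeffs = compress_kv(kv, keep=1024)
approx = decompress_kv(coeffs, seq_len=4096)
```

With an orthonormal basis, decompression is a plain transpose, so the only tunable is `keep`, which trades cache memory against reconstruction fidelity; whether such a truncation preserves attention quality at the claimed compression level is a question for the preprint itself.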