NVIDIA AI Introduces TiDAR: A Hybrid Diffusion Autoregressive Architecture For High Throughput LLM Inference
How far can we push large language model speed by reusing “free” GPU compute, without giving up autoregressive level output quality? NVIDIA researchers propose TiDAR, a sequence level hybrid language model that drafts tokens with diffusion and samples them autoregressively in a single forward pass. The main goal of this research is to reach autoregressive…
