QeRL: NVFP4-Quantized Reinforcement Learning (RL) Brings 32B LLM Training to a Single H100—While Improving Exploration

prakhar@affmantra.com · 8 months ago · 0 · 5 mins

What would you build if you could run Reinforcement Learning (RL) post-training on a 32B LLM in 4-bit NVFP4—on a single H100—with BF16-level accuracy and 1.2–1.5× step speedups? NVIDIA researchers (with collaborators from MIT, HKU, and Tsinghua) have open-sourced QeRL (Quantization-enhanced Reinforcement Learning), a training framework that pushes Reinforcement Learning (RL) post-training into 4-bit FP4…

Project Rough: DIY Corner Balance Take 2 – With Physics!

prakhar@affmantra.com · 8 months ago · 0 · 11 mins

DIY – It’s In The Game. Corner balancing a car has been one of those things that seemed off-limits to the average automotive DIY enthusiast. You can find a wide range of electronic scales from a speed shop like Summit Racing; however, a ‘decent’ set will set you back at least $1,000, and a higher-quality…

Ivy Framework-Agnostic Machine Learning: Build, Transpile, and Benchmark Across All Major Backends

prakhar@affmantra.com · 8 months ago · 0 · 10 mins

In this tutorial, we explore Ivy’s remarkable ability to unify machine learning development across frameworks. We begin by writing a fully framework-agnostic neural network that runs seamlessly on NumPy, PyTorch, TensorFlow, and JAX. We then dive into code transpilation, unified APIs, and advanced features like Ivy Containers and graph tracing, all designed to make deep…

10-Year Throwback: For The Love Of Rotary

prakhar@affmantra.com · 8 months ago · 0 · 10 mins

Speedhunters Throwback: This story was originally published in 2015. This FD3S Mazda RX-7 doesn’t need much of an introduction – it’s one of many cars over here in Japan that I’ve had on my ‘to shoot’ list for far too long. But finally, after the stars recently aligned, I managed to get the cool guys at Car…

Anthropic Launches Claude Haiku 4.5: Small AI Model that Delivers Sonnet-4-Level Coding Performance at One-Third the Cost and more than Twice the Speed

prakhar@affmantra.com · 8 months ago · 0 · 4 mins

Anthropic released Claude Haiku 4.5, a latency-optimized “small” model that delivers similar levels of coding performance to Claude Sonnet 4 while running more than twice as fast at one-third the cost. The model is immediately available via Anthropic’s API and in partner catalogs on Amazon Bedrock and Google Cloud Vertex AI. Pricing is $1/MTok input…

Kei & Mighty: Exploring Japan’s WAZUKA Microcar Museum

prakhar@affmantra.com · 8 months ago · 0 · 9 mins

Japan’s vast road network boasts 1.2 million kilometres of tarmac across its sprawling landscape. That might sound like a lot, but it manages some 82 million vehicles in some of the world’s most densely populated cities daily. As a country, it should be at a perpetual standstill. Yet, ever since the 1950s, the Japanese have…

ServiceNow AI Research Releases DRBench, a Realistic Enterprise Deep-Research Benchmark

prakhar@affmantra.com · 8 months ago · 0 · 5 mins

ServiceNow Research has released DRBench, a benchmark and runnable environment to evaluate “deep research” agents on open-ended enterprise tasks that require synthesizing facts from both public web and private organizational data into properly cited reports. Unlike web-only testbeds, DRBench stages heterogeneous, enterprise-style workflows—files, emails, chat logs, and cloud storage—so agents must retrieve, filter, and attribute…

8 Cars In 1 Day: A Day With Drivers Lounge

prakhar@affmantra.com · 8 months ago · 0 · 10 mins

In all my years in Japan I’ve never seen an influx of tourists like I have in the last two years since the country opened up post-pandemic. And rightly so, there’s so much to see, visit, do and of course eat, oh and if you happen to be a car guy or gal, then you’re…

7 LLM Generation Parameters—What They Do and How to Tune Them?

prakhar@affmantra.com · 8 months ago · 0 · 5 mins

Tuning LLM outputs is largely a decoding problem: you shape the model’s next-token distribution with a handful of sampling controls—max tokens (caps response length under the model’s context limit), temperature (logit scaling for more/less randomness), top-p/nucleus and top-k (truncate the candidate set by probability mass or rank), frequency and presence penalties (discourage repetition or encourage…

NVIDIA Researchers Propose Reinforcement Learning Pretraining (RLP): Reinforcement as a Pretraining Objective for Building Reasoning During Pretraining

prakhar@affmantra.com · 8 months ago · 0 · 5 mins

NVIDIA AI has introduced Reinforcement Learning Pretraining (RLP), a training objective that injects reinforcement learning into the pretraining stage rather than deferring it to post-training. The core idea is simple and testable: treat a short chain-of-thought (CoT) as an action sampled before next-token prediction and reward it by the information gain it provides on the…
