Building a Context-Folding LLM Agent for Long-Horizon Reasoning with Memory Compression and Tool Use

In this tutorial, we explore how to build a Context-Folding LLM Agent that efficiently solves long, complex tasks by intelligently managing limited context. We design the agent to break down a large task into smaller subtasks, perform reasoning or calculations when needed, and then fold each completed sub-trajectory into a concise summary. By doing this, we…
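Below is a minimal, self-contained sketch of the folding idea the tutorial describes: each finished subtask's full trace is compressed into a one-line summary, and only those summaries are carried forward in the working context. The llm() helper is a hypothetical stand-in for whatever completion call you use; the class and prompts are illustrative, not the tutorial's exact code.

```python
# Minimal sketch of context folding: each completed subtask's full trace is
# replaced in the working context by a short summary, so the context the
# model sees stays bounded. llm(prompt) is a stand-in for any completion call.

def llm(prompt: str) -> str:
    """Placeholder for a real LLM call (API or local model)."""
    return f"[model output for: {prompt[:40]}...]"

class ContextFoldingAgent:
    def __init__(self, max_context_chars: int = 4000):
        self.folded = []                 # concise summaries of finished subtasks
        self.max_context = max_context_chars

    def context(self) -> str:
        # Only folded summaries are carried forward, not full sub-trajectories.
        return "\n".join(self.folded)[-self.max_context:]

    def run_subtask(self, subtask: str) -> str:
        # Full sub-trajectory: reasoning (and, in the tutorial, tool calls) for this subtask.
        trace = llm(f"Context so far:\n{self.context()}\n\nSolve subtask: {subtask}")
        # Fold: compress the trajectory into one reusable sentence.
        summary = llm(f"Summarize in one sentence the result of:\n{trace}")
        self.folded.append(f"{subtask}: {summary}")
        return trace

    def solve(self, task: str) -> str:
        plan = llm(f"Break this task into short subtasks:\n{task}").splitlines()
        for step in (s for s in plan if s.strip()):
            self.run_subtask(step)
        return llm(f"Using these notes, answer the original task:\n{self.context()}\n\nTask: {task}")

agent = ContextFoldingAgent()
print(agent.solve("Estimate the total cost of a 3-city research trip."))
```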

Read More

QeRL: NVFP4-Quantized Reinforcement Learning (RL) Brings 32B LLM Training to a Single H100—While Improving Exploration

What would you build if you could run Reinforcement Learning (RL) post-training on a 32B LLM in 4-bit NVFP4—on a single H100—with BF16-level accuracy and 1.2–1.5× step speedups? NVIDIA researchers (with collaborators from MIT, HKU, and Tsinghua) have open-sourced QeRL (Quantization-enhanced Reinforcement Learning), a training framework that pushes RL post-training into 4-bit FP4…

Read More

Ivy Framework-Agnostic Machine Learning: Build, Transpile, and Benchmark Across All Major Backends

In this tutorial, we explore Ivy’s remarkable ability to unify machine learning development across frameworks. We begin by writing a fully framework-agnostic neural network that runs seamlessly on NumPy, PyTorch, TensorFlow, and JAX. We then dive into code transpilation, unified APIs, and advanced features like Ivy Containers and graph tracing, all designed to make deep…
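As a quick taste of that framework-agnostic style, here is a small sketch using Ivy's functional API (set_backend, random_uniform, matmul, relu); exact function names can vary between Ivy releases, so treat this as illustrative rather than the tutorial's code.

```python
# Sketch: one forward pass, four backends. Assumes Ivy's functional API
# (set_backend, random_uniform, zeros, matmul, relu); names may differ by version.
import ivy

def forward(x, w1, b1, w2, b2):
    h = ivy.relu(ivy.matmul(x, w1) + b1)   # hidden layer
    return ivy.matmul(h, w2) + b2          # output logits

for backend in ["numpy", "torch", "tensorflow", "jax"]:
    ivy.set_backend(backend)               # same code, different framework
    x  = ivy.random_uniform(shape=(4, 8))
    w1 = ivy.random_uniform(shape=(8, 16))
    b1 = ivy.zeros((16,))
    w2 = ivy.random_uniform(shape=(16, 2))
    b2 = ivy.zeros((2,))
    print(backend, forward(x, w1, b1, w2, b2).shape)
```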

Read More

Anthropic Launches Claude Haiku 4.5: Small AI Model that Delivers Sonnet-4-Level Coding Performance at One-Third the Cost and More Than Twice the Speed

Anthropic released Claude Haiku 4.5, a latency-optimized “small” model that delivers coding performance comparable to Claude Sonnet 4 while running more than twice as fast at one-third the cost. The model is immediately available via Anthropic’s API and in partner catalogs on Amazon Bedrock and Google Cloud Vertex AI. Pricing is $1/MTok input…
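For readers who want to try it, here is a minimal call through Anthropic's Python SDK; the model identifier string below is an assumption, so check Anthropic's current model list for the exact name.

```python
# Minimal sketch of calling the model via the Anthropic Messages API.
# NOTE: "claude-haiku-4-5" is an assumed model id; verify the exact identifier.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-haiku-4-5",   # assumed identifier
    max_tokens=512,
    messages=[{"role": "user", "content": "Refactor this recursive function to be iterative."}],
)
print(message.content[0].text)
```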

Read More

ServiceNow AI Research Releases DRBench, a Realistic Enterprise Deep-Research Benchmark

ServiceNow Research has released DRBench, a benchmark and runnable environment to evaluate “deep research” agents on open-ended enterprise tasks that require synthesizing facts from both public web and private organizational data into properly cited reports. Unlike web-only testbeds, DRBench stages heterogeneous, enterprise-style workflows—files, emails, chat logs, and cloud storage—so agents must retrieve, filter, and attribute…

Read More

7 LLM Generation Parameters—What They Do and How to Tune Them?

Tuning LLM outputs is largely a decoding problem: you shape the model’s next-token distribution with a handful of sampling controls—max tokens (caps response length under the model’s context limit), temperature (logit scaling for more/less randomness), top-p/nucleus and top-k (truncate the candidate set by probability mass or rank), frequency and presence penalties (discourage repetition or encourage…
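To make those controls concrete, here is a small, self-contained sketch (toy logits, NumPy only) showing how temperature, top-k, and top-p reshape a next-token distribution before sampling; frequency and presence penalties would be applied to the logits in the same spirit, based on which tokens already appear in the output.

```python
# Illustrative sketch of how temperature, top-k, and top-p reshape a
# next-token distribution before sampling (toy logits, not a real model).
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-6)  # temperature scaling
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_k > 0:                                    # keep only the k highest-probability tokens
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)

    if top_p < 1.0:                                  # nucleus: smallest set covering top_p mass
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[cumulative <= top_p]
        keep = order[: max(len(keep), 1)]            # always keep at least one token
        mask = np.zeros_like(probs)
        mask[keep] = 1.0
        probs *= mask

    probs /= probs.sum()                             # renormalize after truncation
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.5, -1.0, -3.0]                 # toy vocabulary of 5 tokens
print(sample_next_token(logits, temperature=0.7, top_k=3, top_p=0.9))
```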

Read More

NVIDIA Researchers Propose Reinforcement Learning Pretraining (RLP): Reinforcement as a Pretraining Objective for Building Reasoning During Pretraining

NVIDIA AI has introduced Reinforcement Learning Pretraining (RLP), a training objective that injects reinforcement learning into the pretraining stage rather than deferring it to post-training. The core idea is simple and testable: treat a short chain-of-thought (CoT) as an action sampled before next-token prediction and reward it by the information gain it provides on the…
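In other words, the reward for a sampled chain-of-thought is roughly the improvement in next-token log-likelihood it buys over predicting without it. The sketch below illustrates that information-gain reward with a hypothetical log_prob_next_token helper standing in for a model forward pass; the actual RLP baseline and estimator details are in the paper.

```python
# Sketch of the information-gain reward described above:
#   r(cot) = log p(x_t | context, cot) - log p(x_t | context)
# log_prob_next_token is a hypothetical stand-in for a model forward pass.
import math

def log_prob_next_token(context: str, next_token: str, cot: str | None = None) -> float:
    """Toy placeholder: pretend the CoT makes the next token more likely."""
    return math.log(0.30) if cot else math.log(0.10)

def info_gain_reward(context: str, next_token: str, cot: str) -> float:
    with_cot = log_prob_next_token(context, next_token, cot=cot)   # log p(x_t | ctx, CoT)
    without_cot = log_prob_next_token(context, next_token)         # log p(x_t | ctx)
    return with_cot - without_cot  # positive iff the CoT made the next token more predictable

print(info_gain_reward("2 + 2 =", "4", cot="Adding two and two gives four."))  # ~1.10
```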

Read More

A Coding Implementation of Advanced PyTest to Build Customized and Automated Testing with Plugins, Fixtures, and JSON Reporting

In this tutorial, we explore the advanced capabilities of PyTest, one of the most powerful testing frameworks in Python. We build a complete mini-project from scratch that demonstrates fixtures, markers, plugins, parameterization, and custom configuration. We focus on showing how PyTest can evolve from a simple test runner into a robust, extensible system for real-world…
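As a flavor of what that looks like, here is a compact, self-contained example of fixtures, markers, and parametrization; it is an illustrative sketch, not the tutorial's exact project.

```python
# test_inventory.py -- fixtures, markers, and parametrization in one place.
import pytest

@pytest.fixture
def inventory():
    # Fresh test data for every test that requests this fixture.
    return {"apples": 5, "bananas": 0}

@pytest.mark.parametrize("item,expected", [("apples", True), ("bananas", False)])
def test_in_stock(inventory, item, expected):
    assert (inventory[item] > 0) is expected

@pytest.mark.slow   # custom marker; register it in pytest.ini to avoid warnings
def test_bulk_restock(inventory):
    inventory["bananas"] += 100
    assert inventory["bananas"] == 100
```

A matching conftest.py can collect per-test outcomes through pytest's standard hooks and write them out at the end of the session; this is one simple way to get JSON reporting, not necessarily the tutorial's own plugin.

```python
# conftest.py -- emit a JSON summary after the run (illustrative approach).
import json

results = []

def pytest_runtest_logreport(report):
    if report.when == "call":
        results.append({"test": report.nodeid, "outcome": report.outcome,
                        "duration": report.duration})

def pytest_sessionfinish(session, exitstatus):
    with open("report.json", "w") as fh:
        json.dump({"exit_status": int(exitstatus), "results": results}, fh, indent=2)
```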

Read More

Andrej Karpathy Releases ‘nanochat’: A Minimal, End-to-End ChatGPT-Style Pipeline You Can Train in ~4 Hours for ~$100

Andrej Karpathy has open-sourced nanochat, a compact, dependency-light codebase that implements a full ChatGPT-style stack—from tokenizer training to web UI inference—aimed at reproducible, hackable LLM training on a single multi-GPU node. The repo provides a single-script “speedrun” that executes the full loop: tokenization, base pretraining, mid-training on chat/multiple-choice/tool-use data, Supervised Finetuning (SFT), optional RL on…

Read More

Alibaba’s Qwen AI Releases Compact Dense Qwen3-VL 4B/8B (Instruct & Thinking) With FP8 Checkpoints

Do you actually need a giant VLM when dense Qwen3-VL 4B/8B (Instruct/Thinking) with FP8 runs in low VRAM yet retains a 256K context window (expandable to 1M) and the full capability surface? Alibaba’s Qwen team has expanded its multimodal lineup with dense Qwen3-VL models at 4B and 8B scales, each shipping in two task profiles—Instruct and Thinking—plus FP8-quantized checkpoints for…

Read More