Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family

How can we get large model level multimodal reasoning for documents, charts and videos while running only a 3B class model in production? Baidu has added a new model to the ERNIE-4.5 open source family. ERNIE-4.5-VL-28B-A3B-Thinking is a vision language model that focuses on document, chart and video understanding with a small active parameter budget….

Read More

How to Build an End-to-End Interactive Analytics Dashboard Using PyGWalker Features for Insightful Data Exploration

def generate_advanced_dataset(): np.random.seed(42) start_date = datetime(2022, 1, 1) dates = [start_date + timedelta(days=x) for x in range(730)] categories = [‘Electronics’, ‘Clothing’, ‘Home & Garden’, ‘Sports’, ‘Books’] products = { ‘Electronics’: [‘Laptop’, ‘Smartphone’, ‘Headphones’, ‘Tablet’, ‘Smartwatch’], ‘Clothing’: [‘T-Shirt’, ‘Jeans’, ‘Dress’, ‘Jacket’, ‘Sneakers’], ‘Home & Garden’: [‘Furniture’, ‘Lamp’, ‘Rug’, ‘Plant’, ‘Cookware’], ‘Sports’: [‘Yoga Mat’, ‘Dumbbell’, ‘Running Shoes’,…

Read More

Meta AI Releases Omnilingual ASR: A Suite of Open-Source Multilingual Speech Recognition Models for 1600+ Languages

How do you build a single speech recognition system that can understand 1,000’s of languages including many that never had working ASR (automatic speech recognition) models before? Meta AI has released Omnilingual ASR, an open source speech recognition suite that scales to more than 1,600 languages and can be extended to unseen languages with only…

Read More

A Coding Implementation to Build and Train Advanced Architectures with Residual Connections, Self-Attention, and Adaptive Optimization Using JAX, Flax, and Optax

In this tutorial, we explore how to build and train an advanced neural network using JAX, Flax, and Optax in an efficient and modular way. We begin by designing a deep architecture that integrates residual connections and self-attention mechanisms for expressive feature learning. As we progress, we implement sophisticated optimization strategies with learning rate scheduling,…

Read More

Gelato-30B-A3B: A State-of-the-Art Grounding Model for GUI Computer-Use Tasks, Surpassing Computer Grounding Models like GTA1-32B 

How do we teach AI agents to reliably find and click the exact on screen element we mean when we give them a simple instruction? A team of researchers from ML Foundations has introduced Gelato-30B-A3B, a state of the art grounding model for graphical user interfaces that is designed to plug into computer use agents…

Read More

A Coding Implementation to Build Neural Memory Agents with Differentiable Memory, Meta-Learning, and Experience Replay for Continual Adaptation in Dynamic Environments

In this tutorial, we explore how neural memory agents can learn continuously without forgetting past experiences. We design a memory-augmented neural network that integrates a Differentiable Neural Computer (DNC) with experience replay and meta-learning to adapt quickly to new tasks while retaining prior knowledge. By implementing this approach in PyTorch, we demonstrate how content-based memory…

Read More