prakhar@affmantra.com

How to Reduce Cost and Latency of Your RAG Application Using Semantic LLM Caching

Semantic caching in LLM (Large Language Model) applications optimizes performance by storing and reusing responses based on semantic similarity rather than exact text matches. When a new query arrives, it’s converted into an embedding and compared with cached ones using similarity search. If a close match is found (above a similarity threshold), the cached response…
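The lookup flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's implementation: `toy_embed` is a hypothetical stand-in that uses bag-of-words counts instead of a real embedding model, the `SemanticCache` class, the `answer` helper and the 0.8 threshold are invented for the example, and it assumes that a hit returns the cached response while a miss calls the model and stores the result.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in embedding: bag-of-words counts, NOT a real semantic model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Stores (embedding, response) pairs and serves the closest match above a threshold."""

    def __init__(self, embed=toy_embed, threshold=0.8):
        self.embed = embed          # assumed: any callable mapping text to a vector
        self.threshold = threshold  # assumed value; tune per application
        self.entries = []

    def lookup(self, query):
        """Return the best cached response if its similarity clears the threshold, else None."""
        query_vec = self.embed(query)
        best_score, best_response = 0.0, None
        for cached_vec, response in self.entries:
            score = cosine_similarity(query_vec, cached_vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, query, response):
        """Cache a response under the embedding of the query that produced it."""
        self.entries.append((self.embed(query), response))


def answer(query, cache, call_llm):
    """Serve from the cache on a hit; otherwise call the model and cache its response."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached
    response = call_llm(query)
    cache.store(query, response)
    return response
```

A linear scan keeps the sketch short; a production cache would more likely use a real embedding model and a vector index for the similarity search.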


Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family

How can we get large-model-level multimodal reasoning for documents, charts and videos while running only a 3B-class model in production? Baidu has added a new model to the ERNIE-4.5 open-source family. ERNIE-4.5-VL-28B-A3B-Thinking is a vision-language model that focuses on document, chart and video understanding with a small active parameter budget…


How to Build an End-to-End Interactive Analytics Dashboard Using PyGWalker Features for Insightful Data Exploration

def generate_advanced_dataset():
    np.random.seed(42)
    start_date = datetime(2022, 1, 1)
    dates = [start_date + timedelta(days=x) for x in range(730)]
    categories = ['Electronics', 'Clothing', 'Home & Garden', 'Sports', 'Books']
    products = {
        'Electronics': ['Laptop', 'Smartphone', 'Headphones', 'Tablet', 'Smartwatch'],
        'Clothing': ['T-Shirt', 'Jeans', 'Dress', 'Jacket', 'Sneakers'],
        'Home & Garden': ['Furniture', 'Lamp', 'Rug', 'Plant', 'Cookware'],
        'Sports': ['Yoga Mat', 'Dumbbell', 'Running Shoes',…
