- (11 min read) TL;DR: You’re building a semantic caching system using Spring AI and Redis to improve LLM application performance. Unlike traditional caching that requires exact query matches, semantic caching understands the meaning behind queries and can return cached responses for semantically similar questions. It works by storing query-response pairs as vector embeddings in Redis, allowing your application to…
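The cache-hit logic this TL;DR describes can be sketched in plain Java. This is an illustration only, not the article's implementation: the embedding model and the Redis vector store are replaced by a hypothetical in-memory list and a cosine-similarity scan, and the similarity threshold is invented.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a semantic cache keyed by embedding similarity.
// In the article's setup, Spring AI produces the embeddings and Redis stores
// and searches them; here a plain list and cosine similarity stand in for both.
public class SemanticCache {
    record Entry(float[] embedding, String response) {}

    private final List<Entry> entries = new ArrayList<>();
    private final double threshold; // minimum similarity to count as a "hit"

    public SemanticCache(double threshold) { this.threshold = threshold; }

    public void put(float[] queryEmbedding, String response) {
        entries.add(new Entry(queryEmbedding, response));
    }

    // Returns the cached response of the most similar stored query,
    // or null when nothing is close enough (a cache miss).
    public String get(float[] queryEmbedding) {
        Entry best = null;
        double bestScore = -1;
        for (Entry e : entries) {
            double score = cosine(queryEmbedding, e.embedding());
            if (score > bestScore) { bestScore = score; best = e; }
        }
        return (best != null && bestScore >= threshold) ? best.response() : null;
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

The key difference from a traditional cache is visible in `get`: a lookup succeeds when a stored query is *close enough*, not only when it is identical.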
- (20 min read) TL;DR: You’re building an AI agent with memory using Spring AI and Redis. Unlike traditional chatbots that forget previous interactions, memory-enabled agents can recall past conversations and facts. It works by storing two types of memory in Redis: short-term (conversation history) and long-term (facts and experiences as vectors), allowing agents to provide personalized, context-aware responses. LLMs…
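The two memory types the TL;DR names can be sketched as plain data structures. This is a hypothetical illustration, not the article's design: Redis and the vector representation of long-term memory are omitted, and all names are invented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch of the two memory types the article stores in Redis:
// short-term memory (a bounded conversation history) and long-term memory
// (facts that persist across conversations). Redis and vectors are mocked out.
public class AgentMemory {
    private final Deque<String> shortTerm = new ArrayDeque<>();
    private final List<String> longTerm = new ArrayList<>();
    private final int maxTurns;

    public AgentMemory(int maxTurns) { this.maxTurns = maxTurns; }

    // Short-term: keep only the last maxTurns messages of the conversation.
    public void remember(String message) {
        shortTerm.addLast(message);
        if (shortTerm.size() > maxTurns) shortTerm.removeFirst();
    }

    // Long-term: facts survive no matter how long the conversation runs.
    public void learnFact(String fact) { longTerm.add(fact); }

    public List<String> recentHistory() { return List.copyOf(shortTerm); }
    public List<String> knownFacts() { return List.copyOf(longTerm); }
}
```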
- Did you know the Deep Java Library (DJL) powers Spring AI and Redis OM Spring? DJL helps you run machine learning models right inside your Java applications. Check them out:
  - Spring AI with DJL: https://docs.spring.io/spring-ai/reference/api/embeddings/onnx.html
  - Semantic Search with SpringBoot & Redis: https://medium.com/redis-with-raphael-de-lio/semantic-search-with-spring-boot-redis-ef376bbdb106

  TL;DR: Zero-shot classification is a machine learning technique that allows models to classify text…
- TL;DR: You’re building an AI-powered app that needs to send lots of prompts to OpenAI. Instead of sending them one by one, you want to do it in bulk — efficiently and safely. This is how you can use Spring AI with Java Virtual Threads to process hundreds of prompts in parallel. When calling LLM APIs like…
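The fan-out pattern the TL;DR describes can be sketched with Java 21 virtual threads. This is a sketch under assumptions, not the article's code: `callLlm` is a hypothetical stand-in for a Spring AI chat call, and real usage would add rate limiting and error handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

// Sketch of fanning out many blocking LLM calls on virtual threads (Java 21+).
// Each task gets its own cheap virtual thread, so hundreds of blocking
// network calls can run in parallel without exhausting platform threads.
public class BulkPrompts {
    static String callLlm(String prompt) {
        // Hypothetical stand-in: a real call would block on network I/O.
        return "response to: " + prompt;
    }

    public static List<String> processAll(List<String> prompts) throws Exception {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Callable<String>> tasks = prompts.stream()
                    .map(p -> (Callable<String>) () -> callLlm(p))
                    .toList();
            List<String> results = new ArrayList<>();
            for (Future<String> f : executor.invokeAll(tasks)) {
                results.add(f.get()); // preserves the order prompts were submitted in
            }
            return results;
        } // try-with-resources waits for all tasks before closing the executor
    }
}
```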
- (14 min read) TL;DR: You’re building a semantic search app using Spring Boot and Redis. Instead of matching exact words, semantic search finds meaning using Vector Similarity Search (VSS). It works by turning movie synopses into vectors with embedding models, storing them in Redis (as a vector database), and finding the closest matches to user queries. (Video: What is semantic search?) A traditional searching system works by matching the words a user types…
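The "closest matches" step can be sketched as a nearest-neighbor ranking in plain Java. This is an illustration only: in the article, embedding models produce the vectors and Redis performs the similarity search; both are mocked out here, and the toy vectors are invented.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of vector similarity search: rank documents by cosine
// similarity to the query vector and keep the top k. Redis does this at
// scale with dedicated index structures; this is a linear scan for clarity.
public class VectorSearch {
    public static List<String> topK(float[] query, Map<String, float[]> docs, int k) {
        return docs.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, float[]> e) -> -cosine(query, e.getValue())))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```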
- (16 min read) This content is also available on YouTube. Check it out! The Sliding Window Counter offers a more efficient way to handle rate limiting compared to the Sliding Window Log. While the Sliding Window Log keeps an exact log of timestamps for each request, allowing precise tracking over a rolling time period, this precision comes at the cost of higher…
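The trade-off this teaser describes can be sketched as follows: keep one counter per fixed window and weight the previous window's count by how much of it still overlaps the rolling period. A minimal sketch, with the clock passed in for deterministic behavior; the window size and limit are invented.

```java
// Sketch of the Sliding Window Counter: instead of logging every request,
// keep one counter per fixed window and estimate the rolling count as
// currentCount + previousCount * (overlap fraction of the previous window).
public class SlidingWindowCounter {
    private final long windowMillis;
    private final int limit;
    private long currentWindowStart = -1;
    private long currentCount = 0;
    private long previousCount = 0;

    public SlidingWindowCounter(long windowMillis, int limit) {
        this.windowMillis = windowMillis;
        this.limit = limit;
    }

    public synchronized boolean allow(long nowMillis) {
        long windowStart = (nowMillis / windowMillis) * windowMillis;
        if (windowStart != currentWindowStart) {
            // Roll over: the old count carries over only if the windows are adjacent.
            previousCount = (windowStart - currentWindowStart == windowMillis) ? currentCount : 0;
            currentWindowStart = windowStart;
            currentCount = 0;
        }
        // Fraction of the previous window still inside the rolling period.
        double overlap = 1.0 - (double) (nowMillis - windowStart) / windowMillis;
        double estimated = currentCount + previousCount * overlap;
        if (estimated < limit) {
            currentCount++;
            return true;
        }
        return false;
    }
}
```

Memory use is constant (two counters) regardless of traffic, which is the efficiency gain over storing a timestamp per request.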
- (14 min read) This article is also available on YouTube. Check it out! The Sliding Window Log is a more precise way to handle rate limiting. Instead of splitting time into fixed intervals like the Fixed Window Counter, it keeps a log of timestamps for each request. This allows it to track requests over a rolling time…
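The timestamp log the teaser describes can be sketched in a few lines. A minimal illustration with an injected clock; the window size and limit are invented, and a production version would store the log in Redis rather than in memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the Sliding Window Log: record a timestamp per request and count
// only the ones inside the rolling window. Precise, but memory grows with
// the request rate.
public class SlidingWindowLog {
    private final long windowMillis;
    private final int limit;
    private final Deque<Long> log = new ArrayDeque<>();

    public SlidingWindowLog(long windowMillis, int limit) {
        this.windowMillis = windowMillis;
        this.limit = limit;
    }

    public synchronized boolean allow(long nowMillis) {
        // Evict timestamps that have slid out of the rolling window.
        while (!log.isEmpty() && log.peekFirst() <= nowMillis - windowMillis) {
            log.removeFirst();
        }
        if (log.size() < limit) {
            log.addLast(nowMillis);
            return true;
        }
        return false;
    }
}
```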
- (14 min read) This article is also available on YouTube! The Token Bucket algorithm is a flexible and efficient rate-limiting mechanism. It works by filling a bucket with tokens at a fixed rate (e.g., one token per second). Each request consumes a token, and if no tokens are available, the request is rejected. The bucket has a maximum…
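The refill-and-consume mechanics the teaser describes can be sketched directly. A minimal illustration, with the clock injected so refills are deterministic; the capacity and refill rate are invented.

```java
// Sketch of the Token Bucket: tokens accumulate at a fixed rate up to a
// maximum capacity; each request consumes one token or is rejected.
public class TokenBucket {
    private final long capacity;
    private final double refillPerMillis;
    private double tokens;
    private long lastRefillMillis;

    public TokenBucket(long capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMillis = refillPerSecond / 1000.0;
        this.tokens = capacity;          // start with a full bucket
        this.lastRefillMillis = nowMillis;
    }

    public synchronized boolean tryConsume(long nowMillis) {
        // Add tokens earned since the last check, capped at capacity.
        tokens = Math.min(capacity, tokens + (nowMillis - lastRefillMillis) * refillPerMillis);
        lastRefillMillis = nowMillis;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```

The capacity is what makes the algorithm flexible: a full bucket absorbs short bursts, while the refill rate bounds the sustained throughput.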
- (14 min read) This article is also available on YouTube! The Fixed Window Counter is the simplest and most straightforward rate-limiting algorithm. It divides time into fixed intervals (e.g., seconds, minutes, or hours) and counts the number of requests within each interval. If the count exceeds a predefined threshold, the requests are rejected until the next interval begins. Looking for…
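The interval-and-counter scheme the teaser describes fits in a few lines. A minimal sketch with an injected clock; the interval length and threshold are invented.

```java
// Sketch of the Fixed Window Counter: divide time into fixed intervals and
// count requests per interval; reject once the threshold is reached, and
// reset the counter when the next interval begins.
public class FixedWindowCounter {
    private final long windowMillis;
    private final int limit;
    private long currentWindowStart = -1;
    private int count = 0;

    public FixedWindowCounter(long windowMillis, int limit) {
        this.windowMillis = windowMillis;
        this.limit = limit;
    }

    public synchronized boolean allow(long nowMillis) {
        long windowStart = (nowMillis / windowMillis) * windowMillis;
        if (windowStart != currentWindowStart) {
            currentWindowStart = windowStart; // a new interval begins
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }
}
```

The simplicity comes at a cost the series goes on to address: a burst straddling an interval boundary can briefly admit up to twice the limit.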
- (7 min read) This article is also available on YouTube! Rate limiting — it’s something you’ve likely encountered, even if you haven’t directly implemented one. For example, have you ever been greeted by a “429 Too Many Requests” error? That’s a rate limiter in action, protecting a resource from overload. Or maybe you’ve used a service with explicit request quotas…
