Build Long-Term and Short-Term Memory for Agents Using RedisVL

Pros and cons analysis based on real-world practice

Build Long-Term and Short-Term Memory for Agents Using RedisVL. Image by DALL-E-3

Introduction

For this weekend note, I want to share a few experiments I ran with RedisVL to add short-term and long-term memory to my agent system.

TL;DR: RedisVL works well for short-term memory and feels a bit simpler than the traditional Redis API. For long-term memory with semantic search, the experience is poor, and I do not recommend it.

Why RedisVL?

Big companies like to use mature infrastructure to build new features.

We know mem0 and Graphiti are good open-source options for long-term agent memory. But companies want to play it safe: building new infrastructure costs money, adds instability, and needs people who know how to run it.

So when Redis launched RedisVL with vector search, we naturally wanted to try it first. You can connect it to existing Redis clusters and start using it right away. That sounds nice, but is it really? We need to find out in practice.

Today I will cover how to use MessageHistory and SemanticMessageHistory from RedisVL to add short-term and long-term memory to agents built on the Microsoft Agent Framework.

You can find the source code at the end of this article.

Don’t forget to follow my blog to stay updated on my latest progress in AI application practices.


Preparation

Install Redis

If you want to try it locally, you can install a Redis instance with Docker.

docker run -d --name redis -p 6379:6379 -p 8001:8001 redis/redis-stack:latest

Cannot use Docker Desktop? See my other article.

A Quick Guide to Containerizing Agent Applications with Podman
Alternative solutions compatible with Docker SDK

The Redis instance will listen on ports 6379 (Redis) and 8001 (the RedisInsight console). Your RedisVL client should connect to redis://localhost:6379, and you can visit http://localhost:8001 in the browser to open the Redis console.
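If you want a quick sanity check that the instance is reachable, you can ping it with redis-py, which RedisVL installs as a dependency:

import redis

# Ping the local instance started above; prints True when reachable.
client = redis.Redis.from_url("redis://localhost:6379")
print(client.ping())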

Install RedisVL

Install RedisVL with pip.

pip install redisvl

After installation, you can use the RedisVL CLI to manage your indexes and keep your test environment tidy.

rvl index listall

Implement Short-Term Memory Using MessageHistory

There are lots of “How to” RedisVL articles online, so let’s start straight from Microsoft Agent Framework and see how to use MessageHistory for short-term memory.

As in the official tutorial, you should implement a RedisVLMessageStore based on ChatMessageStoreProtocol.

from uuid import uuid4

from agent_framework import ChatMessage, ChatMessageStoreProtocol


class RedisVLMessageStore(ChatMessageStoreProtocol):
    def __init__(
        self,
        thread_id: str = "common_thread",
        top_k: int = 6,
        session_tag: str | None = None,
        redis_url: str | None = "redis://localhost:6379",
    ):
        self._thread_id = thread_id
        self._top_k = top_k
        self._session_tag = session_tag or f"session_{uuid4()}"
        self._redis_url = redis_url
        # Creates the underlying RedisVL MessageHistory (sketched below).
        self._init_message_history()

In __init__, two parameters deserve attention.

  • thread_id is used for the name parameter when creating MessageHistory (see the sketch after this list). I like to bind it to the agent, so each agent gets a unique thread_id.
  • session_tag lets you set a tag for each user so different sessions do not mix.
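
For reference, here is a minimal sketch of what _init_message_history might look like. The import path below assumes a recent RedisVL release; check your installed version.

from redisvl.extensions.message_history import MessageHistory

class RedisVLMessageStore(ChatMessageStoreProtocol):
    ...

    def _init_message_history(self):
        # Hypothetical helper: one MessageHistory per thread_id, kept on
        # the instance for list_messages and add_messages below.
        self._message_history = MessageHistory(
            name=self._thread_id,
            redis_url=self._redis_url,
        )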

The protocol asks us to implement two methods: list_messages and add_messages.

  • list_messages runs before the agent calls the LLM and fetches all available chat messages from the message store. It takes no parameters, so it cannot support long-term memory retrieval. More on that later.
  • add_messages runs after the agent gets the LLM’s reply and stores the new messages into the message store.

Here is how the message store works.

The calling order of the message store in the agent. Image by Author

So in list_messages and add_messages we just use RedisVL’s MessageHistory to do the job.

list_messages below uses get_recent to fetch the top_k most recent messages and turns them into ChatMessage objects.

class RedisVLMessageStore(ChatMessageStoreProtocol):
    ...
    
    async def list_messages(self) -> list[ChatMessage]:
        messages: list[dict[str, str]] = self._message_history.get_recent(
            top_k=self._top_k,
            session_tag=self._session_tag,
        )
        return [self._back_to_chat_message(message)
                for message in messages]

Our add_messages converts each ChatMessage into a Redis message dict and calls MessageHistory’s add_messages to store them.

class RedisVLMessageStore(ChatMessageStoreProtocol):
    ...
    
    async def add_messages(self, messages: Sequence[ChatMessage]):
        messages = [self._to_redis_message(message)
                    for message in messages]
        self._message_history.add_messages(
            messages,
            session_tag=self._session_tag
        )
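
The two conversion helpers are not shown above. Here is a minimal sketch, assuming ChatMessage exposes role and text as in recent agent-framework releases; the exact field mapping is my assumption:

class RedisVLMessageStore(ChatMessageStoreProtocol):
    ...

    def _to_redis_message(self, message: ChatMessage) -> dict[str, str]:
        # MessageHistory works with plain dicts of "role" and "content".
        return {"role": message.role.value, "content": message.text or ""}

    def _back_to_chat_message(self, message: dict[str, str]) -> ChatMessage:
        return ChatMessage(role=message["role"], text=message["content"])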

That is short-term memory done with RedisVL. You can also implement serialize, deserialize, and update_from_state to save and load the memory state, but that is not important right now. See the full code at the end.

Test RedisVLMessageStore

Let’s build an agent and test the message store.

agent = OpenAILikeChatClient(
    model_id=Qwen3.NEXT
).create_agent(
    name="assistant",
    instructions="You're a little helper who answers my questions in one sentence.",
    chat_message_store_factory=lambda: RedisVLMessageStore(
        session_tag="user_abc"
    )
)

Now a console loop for multi-turn dialog. Remember, Microsoft Agent Framework does not support short-term memory unless you use an AgentThread and pass it to run.

import asyncio

async def main():
    thread = agent.get_new_thread()
    while True:
        user_input = input("User: ")
        if user_input.startswith("exit"):
            break
        response = await agent.run(user_input, thread=thread)
        print(f"\nAssistant: {response.text}")
    # Clear the stored history when the session ends.
    thread.message_store.clear()

asyncio.run(main())

When an AgentThread is created, it calls the factory method to build the RedisVLMessageStore.

To check if the store works, we can use mlflow.openai.autolog() to see if messages sent to the LLM contain historical messages.

import os

import mlflow

mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI"))
mlflow.set_experiment("Default")
mlflow.openai.autolog()
You can see that the conversation comes with a complete history of messages. Image by Author

See my other article on using MLflow to track LLM calls.

Monitoring Qwen 3 Agents with MLflow 3.x: End-to-End Tracing Tutorial
Enhance your multi-agent application’s observability, explainability, and traceability

Let’s open the Redis console to see the cache.

How the cache is stored in Redis. Image by Author

As you can see, after using MessageHistory as MAF's message store, we can implement multi-turn conversations with historical messages.

With the thread_id and session_tag parameters, we can also let users switch between multiple conversation sessions, like in popular LLM chat applications.
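
For example, switching the same user between two independent sessions is just a matter of handing the factory a different session_tag (the tags below are hypothetical):

# Hypothetical: one store per conversation session for the same user.
work_store = RedisVLMessageStore(thread_id="assistant", session_tag="user_abc_work")
travel_store = RedisVLMessageStore(thread_id="assistant", session_tag="user_abc_travel")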

Feels simpler than the official RedisMessageStore solution, right?


Implement Long-Term Memory Using SemanticMessageHistory

SemanticMessageHistory is a subclass of MessageHistory. It adds a get_relevant method for vector search.

Example:

prompt = "what have I learned about the size of England?"
semantic_history.set_distance_threshold(0.35)
context = semantic_history.get_relevant(prompt)
for message in context:
    print(message)
Batches: 100%|██████████| 1/1 [00:00<00:00, 56.30it/s]
{'role': 'user', 'content': 'what is the size of England compared to Portugal?'}

Compared to MessageHistory, the big difference is that we can retrieve the most relevant historical messages based on the user’s request.

You might think that if MessageStore short-term memory is nice, then SemanticMessageHistory with semantic search must be even better.

From my test results, that is not the case. Let’s build a long-term memory adapter for Microsoft Agent Framework using SemanticMessageHistory and see why.

Use SemanticMessageHistory in Microsoft Agent Framework

Earlier I noted that list_messages in ChatMessageStoreProtocol takes no parameters, so we cannot search history with it. Thus, we cannot use MessageStore for long-term memory.

Microsoft Agent Framework has a ContextProvider class. As the name suggests, it is designed for context engineering.

So we should build long-term memory on this class.

import os
from uuid import uuid4

from agent_framework import ContextProvider


class RedisVLSemanticMemory(ContextProvider):
    def __init__(
        self,
        thread_id: str | None = None,
        session_tag: str | None = None,
        distance_threshold: float = 0.3,
        redis_url: str = "redis://localhost:6379",
        embedding_model: str = "BAAI/bge-m3",
        embedding_api_key: str | None = None,
        embedding_endpoint: str | None = None,
    ):
        self._thread_id = thread_id or "semantic_thread"
        self._session_tag = session_tag or f"session_{uuid4()}"
        self._distance_threshold = distance_threshold
        self._redis_url = redis_url
        self._embedding_model = embedding_model
        self._embedding_api_key = embedding_api_key or os.getenv("EMBEDDING_API_KEY")
        self._embedding_endpoint = embedding_endpoint or os.getenv("EMBEDDING_ENDPOINT")
        # Builds the SemanticMessageHistory and its vectorizer
        # (see the TextVectorizer section below).
        self._init_semantic_store()

ContextProvider has two methods: invoked and invoking.

  • invoked runs after the LLM call. It stores the latest messages in RedisVL. It has both request_message and response_messages parameters but stores them separately.
  • invoking runs before the LLM call. It uses the user’s current input to search for relevant history in RedisVL and returns a Context object.

The Context object has three fields.

  • instructions, a string. The agent appends this to the system prompt.
  • messages, a list. History messages found in long-term memory go here.
  • tools, a list of functions. The agent adds these tools to its ChatOptions.
The purpose of the three types of messages retrieved. Image by Author

Since we want to use vector search to get relevant history, we put those messages in messages. The order between MessageStore messages and ContextProvider messages matters. Here is the order of their calls, followed by a sketch of both hooks.

The calling order of long-term and short-term memory in the agent. Image by Author
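
To make this concrete, here is a hedged sketch of how the two hooks could look on our class, reusing the conversion helpers from the short-term store. The signatures are simplified and the parameter names are assumptions on my part; check the framework’s ContextProvider docs for the exact shape. Context here is the agent_framework object described above.

class RedisVLSemanticMemory(ContextProvider):
    ...

    async def invoking(self, messages, **kwargs) -> Context:
        # Before the LLM call: search long-term memory with the user's
        # latest input and return the hits as context messages.
        query = messages[-1].text
        hits = self._semantic_history.get_relevant(
            query,
            session_tag=self._session_tag,
        )
        return Context(
            messages=[self._back_to_chat_message(hit) for hit in hits]
        )

    async def invoked(self, request_message, response_messages=None, **kwargs):
        # After the LLM call: persist the new turn into RedisVL.
        new_messages = [request_message, *(response_messages or [])]
        self._semantic_history.add_messages(
            [self._to_redis_message(m) for m in new_messages],
            session_tag=self._session_tag,
        )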

Setting up a TextVectorizer

Semantic vector search needs embeddings. We must set up a vectorizer.
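Since __init__ points at an OpenAI-compatible embedding endpoint, a plausible _init_semantic_store sketch follows. Treat the vectorizer parameters (especially api_config and base_url) as assumptions; vectorizer constructors vary across RedisVL versions.

from redisvl.extensions.message_history import SemanticMessageHistory
from redisvl.utils.vectorize import OpenAITextVectorizer

class RedisVLSemanticMemory(ContextProvider):
    ...

    def _init_semantic_store(self):
        # Hypothetical: embed with an OpenAI-compatible endpoint serving
        # the configured model (e.g. BAAI/bge-m3).
        vectorizer = OpenAITextVectorizer(
            model=self._embedding_model,
            api_config={
                "api_key": self._embedding_api_key,
                "base_url": self._embedding_endpoint,
            },
        )
        self._semantic_history = SemanticMessageHistory(
            name=self._thread_id,
            vectorizer=vectorizer,
            distance_threshold=self._distance_threshold,
            redis_url=self._redis_url,
        )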
