Memory
LangGraph distinguishes two memory tiers: short-term (thread-scoped) and long-term (cross-thread). Together they enable agents that learn, remember preferences, and maintain context across conversations.
Short-Term Memory
Thread-Scoped
Short-term memory is scoped to a single thread and managed automatically by the checkpointer. Each thread maintains its own isolated state.
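A minimal setup sketch (assuming a `builder` StateGraph is already defined elsewhere); compiling with a checkpointer is what gives each thread its own persisted state, which the invocations below then demonstrate:
from langgraph.checkpoint.memory import MemorySaver
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)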
config_1 = {"configurable": {"thread_id": "user-a"}}
config_2 = {"configurable": {"thread_id": "user-b"}}
graph.invoke({"msg": "hi"}, config_1) # user A: ["hi"]
graph.invoke({"msg": "hello"}, config_2) # user B: ["hello"]
graph.invoke({"msg": "how are you"}, config_1) # user A: ["hi", "how are you"]
MessagesState with add_messages Reducer
MessagesState uses the add_messages reducer, which merges updates by message ID: new messages are appended, and a message whose ID already exists replaces the earlier version instead of duplicating it:
from langgraph.graph import MessagesState
class State(MessagesState):
    preferences: dict
    session_count: int
# Equivalent manual definition without MessagesState:
from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
class State(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
The add_messages reducer:
- Appends new messages
- Replaces existing messages with matching IDs (for tool calls, edits)
- Deduplicates automatically
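A quick sketch of that merge behavior, calling the reducer directly (the message IDs are set explicitly here for illustration):
from langchain_core.messages import AIMessage, HumanMessage
from langgraph.graph.message import add_messages
existing = [HumanMessage(content="hi", id="1"), AIMessage(content="hello", id="2")]
updates = [AIMessage(content="hello there", id="2"), HumanMessage(content="thanks", id="3")]
merged = add_messages(existing, updates)
# merged: id "1" kept, id "2" replaced with "hello there", id "3" appended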
Managing Conversation History
Long-running conversations need history management to stay within context windows:
Summarization
from langchain_core.messages import RemoveMessage, SystemMessage
def summarize_conversation(state):
    if len(state["messages"]) > 20:
        old_messages = state["messages"][:-10]
        summary = llm.invoke(f"Summarize this conversation:\n{old_messages}")
        # add_messages applies RemoveMessage by ID, so the old messages are
        # dropped and the summary is kept alongside the 10 most recent messages.
        return {"messages": [RemoveMessage(id=m.id) for m in old_messages]
                + [SystemMessage(content=f"Prior summary: {summary.content}")]}
    return {}
Trimming
from langchain_core.messages import trim_messages
def trim_if_needed(state):
    # Keep only the most recent messages that fit within the token budget.
    trimmed = trim_messages(
        state["messages"],
        max_tokens=4000,
        strategy="last",
        token_counter=llm,
        include_system=True,
    )
    return {"messages": trimmed}
Long-Term Memory
Cross-Thread via Store
Long-term memory persists across threads using the Store. One user's memory can influence all their future workflows:
from langgraph.store.memory import InMemoryStore
store = InMemoryStore()
graph = builder.compile(checkpointer=checkpointer, store=store)
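A minimal sketch of the cross-thread behavior: store entries are keyed by namespace and key rather than by thread, so a value written in one workflow is readable from any other:
# Written during one thread's run...
store.put(("users", "user-a"), "prefs", {"theme": "dark"})
# ...and readable later from any thread handling the same user.
item = store.get(("users", "user-a"), "prefs")
item.value  # {"theme": "dark"}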
Profile vs Collection Approaches
| Approach | Description | Best For |
|---|---|---|
| Profile | Single aggregated user profile (preferences, facts) | Simple user model |
| Collection | Many discrete memory items searched at query time | Rich, queryable memory |
Profile:
store.put(("users", user_id, "profile"), "main", {"prefs": {...}, "facts": [...]})
Collection:
store.put(("users", user_id, "memories"), "mem-1", {"type": "preference", "content": "likes dark mode"})
store.put(("users", user_id, "memories"), "mem-2", {"type": "fact", "content": "lives in Paris"})
relevant = store.search(
    ("users", user_id, "memories"),
    filter={"type": "preference"},
    limit=5
)
Memory Types
| Type | Description | Example |
|---|---|---|
| Semantic | Facts and knowledge | "User lives in NYC", "Company policy: refunds within 30 days" |
| Episodic | Past interactions and experiences | "Last session we debugged the billing pipeline" |
| Procedural | How to do things, learned patterns | "For this user, always format dates as DD/MM/YYYY" |
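One possible (illustrative) way to keep these types separate is to fold the type into the store namespace; the namespaces and keys below are assumptions, not a required layout:
store.put(("users", user_id, "semantic"), "fact-1", {"content": "User lives in NYC"})
store.put(("users", user_id, "episodic"), "ep-1", {"content": "Last session we debugged the billing pipeline"})
store.put(("users", user_id, "procedural"), "style-1", {"content": "Format dates as DD/MM/YYYY"})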
Writing Memories
Hot Path (Inline)
Write memory during graph execution -- simple but adds latency:
from uuid import uuid4
def extract_memories(state, *, store, config):
    # Key long-term memories by user, not by thread, so they persist across threads.
    user_id = config["configurable"]["user_id"]
    memory = llm.invoke(f"What facts about the user are in: {state['messages']}")
    store.put(("users", user_id, "memories"), str(uuid4()), {"content": memory.content})
    return {}
Background (Deferred)
Enqueue memory extraction to run asynchronously, keeping the main flow fast:
def schedule_memory_extraction(state):
    return {"memory_queue": [{"conversation": state["messages"][-5:]}]}
A separate process or cron job reads the queue and writes to the store, avoiding hot-path latency.
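A sketch of what that worker might look like; `process_memory_queue`, its arguments, and the queue format are hypothetical, matching the `memory_queue` items produced above:
from uuid import uuid4
def process_memory_queue(queue_items, store, user_id):
    # Runs outside the request path (worker process, cron job, etc.).
    for item in queue_items:
        memory = llm.invoke(f"What facts about the user are in: {item['conversation']}")
        store.put(("users", user_id, "memories"), str(uuid4()), {"content": memory.content})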
Trustcall for Memory Management
Trustcall provides structured memory extraction with validation:
from typing import Literal
from pydantic import BaseModel
from trustcall import create_extractor
class Memory(BaseModel):
    type: Literal["fact", "preference", "episode"]
    content: str
    confidence: float
extractor = create_extractor(llm, tools=[Memory], tool_choice="Memory")
result = extractor.invoke({
    "messages": [
        ("system", "Extract user memories from this conversation"),
        *state["messages"],
    ]
})
memories = result["responses"]  # validated Memory instances
Trustcall ensures memories are well-formed, typed, and include confidence scores before storage.
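A short follow-up sketch (assuming the `memories` list and `user_id` from the examples above) showing how the validated items might be written to the store:
from uuid import uuid4
for mem in memories:
    store.put(("users", user_id, "memories"), str(uuid4()), mem.model_dump())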
Memory Retrieval Pattern
def inject_memory(state, *, store, config):
    user_id = config["configurable"]["user_id"]
    namespace = ("users", user_id, "memories")
    query = state["messages"][-1].content
    # Semantic search over this user's memories, using the latest message as the
    # query (assumes the store is configured with an embedding index).
    memories = store.search(namespace, query=query, limit=10)
    memory_context = "\n".join(m.value["content"] for m in memories)
    # The add_messages reducer adds this system message to the conversation.
    return {"messages": [SystemMessage(content=f"User memories:\n{memory_context}")]}
This pattern retrieves cross-thread memories at the start of each workflow, making the agent continuously context-aware.
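A wiring sketch under the usual setup (the "agent" node name is an assumption): running inject_memory first means every turn starts with the user's long-term context.
from langgraph.graph import START
builder.add_node("inject_memory", inject_memory)
builder.add_edge(START, "inject_memory")
builder.add_edge("inject_memory", "agent")  # "agent" is whatever node does the actual work
graph = builder.compile(checkpointer=checkpointer, store=store)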
Related: Persistence, Interrupts, Store API