Memory
LangGraph distinguishes two memory tiers: short-term (thread-scoped) and long-term (cross-thread). Together they enable agents that learn, remember preferences, and maintain context across conversations.
Short-Term Memory
Thread-Scoped
Short-term memory is scoped to a single thread and managed automatically by the checkpointer. Each thread maintains its own isolated state.
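A minimal setup sketch (assuming a `builder` StateGraph is already defined elsewhere); compiling with a checkpointer is what gives each thread its own persisted state, which the invocations below then demonstrate:
from langgraph.checkpoint.memory import MemorySaver
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)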
config_1 = {"configurable": {"thread_id": "user-a"}}
config_2 = {"configurable": {"thread_id": "user-b"}}
graph.invoke({"msg": "hi"}, config_1) # user A: ["hi"]
graph.invoke({"msg": "hello"}, config_2) # user B: ["hello"]
graph.invoke({"msg": "how are you"}, config_1) # user A: ["hi", "how are you"]
MessagesState with add_messages Reducer
MessagesState uses the add_messages reducer, which merges updates by message ID: new messages are appended, and a message whose ID already exists replaces the earlier version instead of duplicating it:
from langgraph.graph import MessagesState
class State(MessagesState):
    preferences: dict
    session_count: int
# Equivalent manual definition without MessagesState:
from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
class State(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
The add_messages reducer:
- Appends new messages
- Replaces existing messages with matching IDs (for tool calls, edits)
- Deduplicates automatically
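A quick sketch of that merge behavior, calling the reducer directly (the message IDs are set explicitly here for illustration):
from langchain_core.messages import AIMessage, HumanMessage
from langgraph.graph.message import add_messages
existing = [HumanMessage(content="hi", id="1"), AIMessage(content="hello", id="2")]
updates = [AIMessage(content="hello there", id="2"), HumanMessage(content="thanks", id="3")]
merged = add_messages(existing, updates)
# merged: id "1" kept, id "2" replaced with "hello there", id "3" appended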
Managing Conversation History
Long-running conversations need history management to stay within context windows:
Summarization
from langchain_core.messages import RemoveMessage, SystemMessage
def summarize_conversation(state):
    if len(state["messages"]) > 20:
        old_messages = state["messages"][:-10]
        summary = llm.invoke(f"Summarize this conversation:\n{old_messages}")
        # add_messages applies RemoveMessage by ID, so the old messages are
        # dropped and the summary is kept alongside the 10 most recent messages.
        return {"messages": [RemoveMessage(id=m.id) for m in old_messages]
                + [SystemMessage(content=f"Prior summary: {summary.content}")]}
    return {}
Trimming
from langchain_core.messages import trim_messages
def trim_if_needed(state):
    # Keep only the most recent messages that fit within the token budget.
    trimmed = trim_messages(
        state["messages"],
        max_tokens=4000,
        strategy="last",
        token_counter=llm,
        include_system=True,
    )
    return {"messages": trimmed}
Long-Term Memory
Cross-Thread via Store
Long-term memory persists across threads using the Store. One user's memory can influence all their future workflows:
from langgraph.store.memory import InMemoryStore
store = InMemoryStore()
graph = builder.compile(checkpointer=checkpointer, store=store)
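A minimal sketch of the cross-thread behavior: store entries are keyed by namespace and key rather than by thread, so a value written in one workflow is readable from any other:
# Written during one thread's run...
store.put(("users", "user-a"), "prefs", {"theme": "dark"})
# ...and readable later from any thread handling the same user.
item = store.get(("users", "user-a"), "prefs")
item.value  # {"theme": "dark"}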
Profile vs Collection Approaches
| Approach | Description | Best For |
|---|---|---|
| Profile | Single aggregated user profile (preferences, facts) | Simple user model |
| Collection | Many discrete memory items searched at query time | Rich, queryable memory |
Profile:
store.put(("users", user_id, "profile"), "main", {"prefs": {...}, "facts": [...]})
Collection:
store.put(("users", user_id, "memories"), "mem-1", {"type": "preference", "content": "likes dark mode"})
store.put(("users", user_id, "memories"), "mem-2", {"type": "fact", "content": "lives in Paris"})
relevant = store.search(
    ("users", user_id, "memories"),
    filter={"type": "preference"},
    limit=5
)
Memory Types
| Type | Description | Example |
|---|---|---|
| Semantic | Facts and knowledge | "User lives in NYC", "Company policy: refunds within 30 days" |
| Episodic | Past interactions and experiences | "Last session we debugged the billing pipeline" |
| Procedural | How to do things, learned patterns | "For this user, always format dates as DD/MM/YYYY" |
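One possible (illustrative) way to keep these types separate is to fold the type into the store namespace; the namespaces and keys below are assumptions, not a required layout:
store.put(("users", user_id, "semantic"), "fact-1", {"content": "User lives in NYC"})
store.put(("users", user_id, "episodic"), "ep-1", {"content": "Last session we debugged the billing pipeline"})
store.put(("users", user_id, "procedural"), "style-1", {"content": "Format dates as DD/MM/YYYY"})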
Writing Memories
Hot Path (Inline)
Write memory during graph execution -- simple but adds latency:
from uuid import uuid4
def extract_memories(state, *, store, config):
    # Key long-term memories by user, not by thread, so they persist across threads.
    user_id = config["configurable"]["user_id"]
    memory = llm.invoke(f"What facts about the user are in: {state['messages']}")
    store.put(("users", user_id, "memories"), str(uuid4()), {"content": memory.content})
    return {}
Background (Deferred)
Enqueue memory extraction to run asynchronously, keeping the main flow fast:
def schedule_memory_extraction(state):
    return {"memory_queue": [{"conversation": state["messages"][-5:]}]}
A separate process or cron job reads the queue and writes to the store, avoiding hot-path latency.
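A sketch of what that worker might look like; `process_memory_queue`, its arguments, and the queue format are hypothetical, matching the `memory_queue` items produced above:
from uuid import uuid4
def process_memory_queue(queue_items, store, user_id):
    # Runs outside the request path (worker process, cron job, etc.).
    for item in queue_items:
        memory = llm.invoke(f"What facts about the user are in: {item['conversation']}")
        store.put(("users", user_id, "memories"), str(uuid4()), {"content": memory.content})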
Trustcall for Memory Management
Trustcall provides structured memory extraction with validation:
from typing import Literal
from pydantic import BaseModel
from trustcall import create_extractor
class Memory(BaseModel):
    type: Literal["fact", "preference", "episode"]
    content: str
    confidence: float
extractor = create_extractor(llm, tools=[Memory], tool_choice="Memory")
result = extractor.invoke({
    "messages": [
        ("system", "Extract user memories from this conversation"),
        *state["messages"],
    ]
})
memories = result["responses"]  # validated Memory instances
Trustcall ensures memories are well-formed, typed, and include confidence scores before storage.
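A short follow-up sketch (assuming the `memories` list and `user_id` from the examples above) showing how the validated items might be written to the store:
from uuid import uuid4
for mem in memories:
    store.put(("users", user_id, "memories"), str(uuid4()), mem.model_dump())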
Memory Retrieval Pattern
def inject_memory(state, *, store, config):
    user_id = config["configurable"]["user_id"]
    namespace = ("users", user_id, "memories")
    query = state["messages"][-1].content
    # Semantic search over this user's memories, using the latest message as the
    # query (assumes the store is configured with an embedding index).
    memories = store.search(namespace, query=query, limit=10)
    memory_context = "\n".join(m.value["content"] for m in memories)
    # The add_messages reducer adds this system message to the conversation.
    return {"messages": [SystemMessage(content=f"User memories:\n{memory_context}")]}
This pattern retrieves cross-thread memories at the start of each workflow, making the agent continuously context-aware.
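A wiring sketch under the usual setup (the "agent" node name is an assumption): running inject_memory first means every turn starts with the user's long-term context.
from langgraph.graph import START
builder.add_node("inject_memory", inject_memory)
builder.add_edge(START, "inject_memory")
builder.add_edge("inject_memory", "agent")  # "agent" is whatever node does the actual work
graph = builder.compile(checkpointer=checkpointer, store=store)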
Related: Persistence, Interrupts, Store API