Memory Steps
For common fields, string substitutions, metadata filters, caching, and execution order, see the Agent Steps Overview.
Memory
Memory gives agents the ability to remember information across runs. Unlike retrieval (which searches external knowledge bases), memory is a system-managed memory store that agents write to and read from automatically. Each memory bank controls its own embedding, compaction, and retention settings — create and manage them from the Memory Banks page.
How It Works
Each memory bank has its own content source that stores embedded entries — chunked, vectorized, and ready for similarity search — just like any other knowledge base content. The source is provisioned automatically when the bank is created.
Each memory entry is partitioned by a key (e.g. {{metadata.user_id}} for per-user memory, or a custom key for per-project or per-session scoping). This lets a single memory store serve many independent groupings without interference.
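As an illustrative sketch (a toy in-memory model, not the real embedded store), key partitioning means that writes and reads under different keys never mix:

```python
# Illustrative sketch only: a toy in-memory store showing how a key
# partitions entries. The real memory bank chunks and embeds entries;
# this models only the scoping behavior described above.
from collections import defaultdict

store = defaultdict(list)  # key -> list of entries


def add_memory(key, content):
    store[key].append(content)


def load_memory(key):
    # Reads are scoped to the given key; other keys don't interfere.
    return list(store[key])


add_memory("user-42", "prefers metric units")
add_memory("user-99", "prefers imperial units")

assert load_memory("user-42") == ["prefers metric units"]
```

The same mechanics apply whether the key is a user ID, a project ID, or any custom string: each key gets its own independent slice of the bank.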
There are two memory bank types — but the same three unified step types work with both:
- Conversation banks — Designed for chat-style use cases. Entries are tagged with a speaker ("user" or "assistant"). The speaker field is required when writing to a conversation bank.
- General banks — Designed for facts, preferences, entity data, and standalone knowledge. No speaker concept — the speaker field is not used when writing to a general bank.
The bank type is determined automatically from the memory_bank_id — you don't need to choose different step types for different bank types.
The Three Memory Steps
| Step | Purpose | Best For |
|---|---|---|
| Add Memory | Write a new entry to a memory bank | Storing facts, preferences, decisions, summaries |
| Search Memory | Vector similarity search across entries | Recalling relevant past context before a prompt call |
| Load Memory | Load all entries chronologically | Summarization, timeline review, full-context prompting |
Use Search when you want the most semantically relevant entries for a query. Use Load when you need the complete history in order (e.g. to summarize or compact it).
Key Concepts
- Key — A string that partitions memory entries. Use {{metadata.user_id}} for per-user memory, project-{{metadata.project_id}} for per-project memory, or any custom key. All search and load operations are automatically scoped to the specified key.
- Speaker (conversation banks only) — Each entry records who said what ("user" or "assistant"). Required when writing to a conversation bank; omitted for general banks.
- Metadata — Optional key-value pairs attached to memory entries. Metadata values are stored alongside the entry and can be used for filtering in search steps.
- Memory banks — Create and manage memory banks from the Memory Banks page or via the Public API and MCP tools. Each bank controls its embedding model, compaction prompt, and retention settings.
Common Patterns
Remember and Recall — The most common pattern. Search memory at the start of a workflow to inject past context, generate a response, then extract and store new facts for the next run.
Step 1: Search Memory (key: "{{metadata.user_id}}", query: "{{input}}")
Step 2: Prompt Call — "Past context:\n{{input}}\n\nUser question: {{agent.input}}"
Step 3: Prompt Call — "Extract new facts from this exchange"
Step 4: Add Memory (key: "{{metadata.user_id}}", speaker: "assistant")
Step 5: Streaming Result
Progressive Summarization — Periodically load the full history, summarize it, and store the summary as a compacted entry. This keeps memory manageable as conversations grow.
Step 1: Load Memory (key: "{{metadata.user_id}}", order: "oldest_first")
Step 2: Prompt Call — "Summarize this conversation history into key points"
Step 3: Add Memory (key: "{{metadata.user_id}}", speaker: "assistant")
Preference Learning — Extract user preferences from conversations and persist them for future personalization.
Step 1: Insight — "Extract any user preferences from: {{input}}"
Step 2: Add Memory (key: "{{metadata.user_id}}")
Combined pattern (personalised chatbot — uses BOTH bank types):
The Add Memory step on a conversation bank already returns the conversation history as a messages array (see Conversation Bank Output Format), so there's no need for a Load Memory step after it — the history flows directly into the Prompt Call.
Step 1: Search Memory (general bank, key: "{{metadata.user_id}}") — recall preferences
Step 2: Add Memory (conversation bank, key: "{{metadata.user_id}}", speaker: "user") — save user msg → returns history
Step 3: Prompt Call — generate a personalised response using preferences + history
Step 4a: Streaming Result
Step 4b: Add Memory (conversation bank, key: "{{metadata.user_id}}", speaker: "assistant") — save reply
Step 4c: Add Memory (general bank, key: "{{metadata.user_id}}") — store new preferences
Memory vs. Retrieval
| | Memory | Retrieval |
|---|---|---|
| Data source | Agent-written entries (automatic) | External content from sources (RSS, websites, uploads) |
| Setup | None — content source provisioned at bank creation | Requires source + knowledge base configuration |
| Scoping | Partitioned by key | Filtered by knowledge base and metadata |
| Write method | Add Memory step | Source connections ingest content |
| Read method | Search Memory or Load Memory steps | Retrieval step |
| Best for | Per-user/session context, preferences, facts | Shared organizational knowledge, documentation |
Add Memory Step
Writes content to a memory bank. Each memory entry is embedded for vector search and partitioned by key, allowing agents to remember information across runs (e.g. user preferences, past decisions, accumulated facts). For conversation-type banks, the speaker field is also required.
Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| memory_bank_id | UUID | Yes | — | The memory bank to write to. Select one from the Memory Banks page. |
| key | string | Yes | — | A key that partitions memory entries (e.g. per user, per session). Supports substitutions — use {{metadata.user_id}} for per-user memory. |
| speaker | string | No | null | Identifies who is "speaking" — "user" or "assistant". Required for conversation banks, omitted for general banks. Supports substitutions. |
| content | string | No | null | The content to store in memory. If omitted, the step's input is used (equivalent to {{input}}). Supports substitutions. |
| entry_metadata | object | No | null | Optional metadata object to attach to the entry. Stored as key-value pairs and available for filtering. |
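A hedged sketch of what an Add Memory configuration could look like, expressed as a plain dictionary. The field names come from the table above; the "type" discriminator and the overall wrapper shape are assumptions for illustration only, not the platform's actual schema:

```python
# Hypothetical step configuration using the field names from the table
# above. The "type" key and the dict wrapper are assumptions for
# illustration; only the documented fields are taken from the docs.
add_memory_step = {
    "type": "add_memory",                  # assumed discriminator
    "memory_bank_id": "11111111-1111-1111-1111-111111111111",
    "key": "{{metadata.user_id}}",         # per-user partition
    "speaker": "user",                     # required for conversation banks
    "content": "{{agent.input}}",          # defaults to {{input}} if omitted
    "entry_metadata": {"channel": "web"},  # optional, filterable later
}

# Per the table, only memory_bank_id and key are always required.
required = {"memory_bank_id", "key"}
assert required <= add_memory_step.keys()
```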
Behavior
- The memory bank's content source is provisioned when the bank is created — no manual setup required.
- Memory entries are stored as content versions, chunked, and embedded for vector similarity search.
- The key is stored as metadata on each entry, allowing the search step to filter by key automatically.
- For conversation banks, the speaker field is stored as metadata, enabling you to distinguish who said what when loading or searching conversation history.
- The step's input is passed through as-is to child steps (the memory write is a side effect).
- Agent ID and agent run ID are automatically included in the entry's metadata for traceability.
Conversation Bank Output Format
When writing to a conversation bank, the Add Memory step returns the updated conversation history formatted as a JSON messages array — up to 100 entries in chronological order. The actual entries depend on the bank's compaction settings: compaction consolidates older entries into summaries, so the history you get back is the compacted version rather than raw individual messages:
[
{ "role": "user", "content": "What's the weather like?" },
{ "role": "assistant", "content": "It's sunny and 22°C today." },
{ "role": "user", "content": "Should I bring a jacket?" }
]
This format is directly compatible with a downstream Prompt Call that has use_step_input_as_messages: true (see Input as Messages). Each message is sent as a separate element in the LLM API call, which enables prompt caching on supported providers — the model caches the stable prefix of older messages and only processes the newest ones, reducing latency and cost on every turn.
Tip: For general banks, the step is a pass-through — the written content is returned as-is to child steps.
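To make the hand-off concrete, here is a minimal sketch of what a downstream prompt call does with this array when use_step_input_as_messages is enabled. The call_llm function is a hypothetical stand-in for the provider API:

```python
# Sketch of how the returned messages array maps onto a chat-style LLM
# call. call_llm is a hypothetical stand-in for whatever provider API
# the prompt call uses; only the messages shape comes from the docs.
import json

step_input = '''[
  {"role": "user", "content": "What's the weather like?"},
  {"role": "assistant", "content": "It's sunny and 22°C today."},
  {"role": "user", "content": "Should I bring a jacket?"}
]'''

messages = json.loads(step_input)


def call_llm(system, messages):
    # Each message is sent as a separate element, so the stable prefix
    # of older messages can be cached by supporting providers.
    return {"system": system, "messages": messages}


request = call_llm("You are a helpful assistant.", messages)
assert [m["role"] for m in request["messages"]] == ["user", "assistant", "user"]
```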
Use Case Examples
Save a conversation turn (conversation bank):
Step 1: Add Memory (key: "{{metadata.user_id}}", speaker: "user", content: "{{agent.input}}")
Chatbot pattern — add user message, respond, save assistant reply:
The most common conversational pattern. The Add Memory step saves the user's message and returns the full conversation history as a messages array. A Prompt Call with use_step_input_as_messages: true sends the history directly to the model (enabling prompt caching). The response is then delivered and saved in parallel.
Step 1: Add Memory (conversation bank, key: "{{metadata.user_id}}", speaker: "user", content: "{{agent.input}}")
Step 2: Prompt Call (use_step_input_as_messages: true, system_template: "You are a helpful assistant.")
Step 3a: Streaming Result
Step 3b: Add Memory (conversation bank, key: "{{metadata.user_id}}", speaker: "assistant")
The Add Memory step in step 1 returns the conversation history as:
[
{ "role": "user", "content": "previous message..." },
{ "role": "assistant", "content": "previous reply..." },
{ "role": "user", "content": "current message" }
]
This flows directly into the Prompt Call, which sends each message individually to the model. Older messages form a stable prefix that providers can cache — only the newest message needs full processing.
Store user preferences (general bank):
Step 1: Prompt Call — "Extract any user preferences from: {{input}}"
Step 2: Add Memory (key: "{{metadata.user_id}}", content: "{{input}}")
Accumulate facts across runs:
Step 1: Insight — Extract key facts from the input content
Step 2: Add Memory (key: "project-{{metadata.project_id}}", entry_metadata: {"type": "fact"})
Search Memory Step
Searches a memory bank using vector similarity search. Retrieves memory entries previously written by the Add Memory step, optionally scoped to a key. This allows agents to recall past context — such as user preferences, previous decisions, or accumulated knowledge — and use it in the current run.
Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| memory_bank_id | UUID | Yes | — | The memory bank to search. Select one from the Memory Banks page. |
| key | string | No | null | Optional key to scope the search (must match the key used when writing). For cross-key searches, omit this field. Supports substitutions. |
| query | string | No | null | The query used to search memory entries. If not specified, the output from the previous step is used. Supports substitutions. |
| top_n | integer | No | 10 | The number of top memory entries to retrieve (1–100). |
| added_after | string | No | null | Time filter for entries added after this time. Supports time substitutions ({{1 week ago}}, {{yesterday}}). |
| added_before | string | No | null | Time filter for entries added before this time. Supports time substitutions. |
| filter | object | No | null | Optional metadata filter to further narrow results (e.g. by tags). Uses the same MongoDB-style filter syntax as retrieval steps. |
| content_type | enum | No | application/json | Output format: application/json, text/html, text/plain, or application/xml (XML requests are coerced to JSON, matching retrieval). |
Behavior
- When a key is provided, the search is scoped by injecting {"key": {"$eq": "<key>"}} as a metadata filter. Any user-supplied filter is combined with this using $and.
- Output format matches the retrieval step: JSON returns {"matches": [...]} with content_title, content, content_url, content_publish_date, etc. For application/xml, the response body is still JSON (coerced to match retrieval behavior).
- If no memory entries exist yet (first run), the search returns an empty matches array.
- The memory bank's content source is provisioned when the bank is created.
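The filter-injection rule can be sketched as a small helper. This mirrors only the behavior stated above (key injected as an $eq filter, combined with any user-supplied filter via $and); the platform's actual implementation may differ:

```python
# Sketch of the stated filter-combination rule: the key becomes
# {"key": {"$eq": key}} and is $and-ed with any user-supplied filter.
def effective_filter(key=None, user_filter=None):
    key_filter = {"key": {"$eq": key}} if key is not None else None
    clauses = [f for f in (key_filter, user_filter) if f]
    if len(clauses) == 2:
        return {"$and": clauses}
    return clauses[0] if clauses else None


combined = effective_filter(
    key="project-123",
    user_filter={"category": {"$eq": "decision"}},
)
assert combined == {"$and": [
    {"key": {"$eq": "project-123"}},
    {"category": {"$eq": "decision"}},
]}
```

With no key (a cross-key search) the user filter passes through unchanged, and with no filter at all the search is unscoped.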
Use Case Examples
Recall user preferences before generating a response:
Step 1: Search Memory (key: "{{metadata.user_id}}", query: "user preferences")
Step 2: Prompt Call — "Using these remembered preferences: {{input}}\n\nAnswer: {{agent.input}}"
Search memories with metadata filtering:
Step 1: Search Memory (
key: "project-{{metadata.project_id}}",
query: "{{input}}",
filter: {"category": {"$eq": "decision"}},
top_n: 20
)
Time-bounded memory recall:
Step 1: Search Memory (
key: "{{metadata.user_id}}",
query: "{{input}}",
added_after: "{{1 week ago}}",
top_n: 5
)
Cross-key search (no key):
Step 1: Search Memory (query: "{{input}}", top_n: 10)
Step 2: Prompt Call — "Based on this knowledge: {{input}}\n\nAnswer: {{agent.input}}"
Building a Memory-Augmented Agent
Combine the write and search steps to create agents with persistent memory:
| Pattern | Description |
|---|---|
| Remember & Recall | Search memory at the start, generate a response, then write new facts back to memory |
| Preference Learning | Extract user preferences from conversations and store them for future personalization |
| Progressive Summarization | Periodically summarize accumulated memories and store the summary as a new higher-level memory entry |
| Context Enrichment | Search memory to add relevant past context before a prompt call or retrieval step |
Example: Full memory-augmented Q&A agent:
Step 1: Search Memory (key: "{{metadata.user_id}}", query: "{{input}}")
Step 2: Prompt Call — "Past context:\n{{input}}\n\nUser question: {{agent.input}}\n\nAnswer the question, referencing past context where relevant."
Step 3: Prompt Call — "Extract any new facts or preferences from this exchange:\nQ: {{agent.input}}\nA: {{input}}"
Step 4: Add Memory (key: "{{metadata.user_id}}", content: "{{input}}")
Step 5: Streaming Result
This agent searches memory for relevant past context, generates a response, extracts new facts from the exchange, and stores them for future runs.
Load Memory Step
Loads all memory entries for a key in chronological order. Unlike the Search Memory step (which performs vector similarity search), this step retrieves the full list of entries ordered by creation time. Use this when you need the complete entry set rather than the most semantically relevant matches.
Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| memory_bank_id | UUID | Yes | — | The memory bank to load from. Select one from the Memory Banks page. |
| key | string | Yes | — | The key used to scope the memory load (must match the key used when writing). Supports substitutions. |
| order | enum | No | newest_first | newest_first returns most recent entries first; oldest_first returns earliest entries first. |
| limit | integer | No | 100 | Maximum number of entries to load (1–1000). |
| content_type | enum | No | application/json | Output format: application/json, text/html, text/plain, or application/xml. |
Behavior
- No vector search is performed — entries are retrieved by creation date in the specified order.
- The key is matched exactly against entry metadata.
- JSON output returns {"entries": [...]} with each entry containing content and created_at.
- Text output returns entries separated by --- dividers.
- If no entries exist for the given key, an empty result is returned.
- This is a composite step — it supports child steps that receive the loaded entries as input.
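A minimal sketch of the two output shapes described above. The entry fields are limited to those named in the bullets (content, created_at), and the exact whitespace around the text divider is an assumption:

```python
# Sketch of the two load output shapes: JSON as {"entries": [...]} and
# plain text with "---" dividers. The exact divider whitespace is an
# assumption; the field names come from the behavior notes above.
entries = [
    {"content": "Kickoff notes", "created_at": "2024-01-02T09:00:00Z"},
    {"content": "Decision: ship weekly", "created_at": "2024-01-09T09:00:00Z"},
]

json_output = {"entries": entries}

text_output = "\n---\n".join(e["content"] for e in entries)

assert text_output == "Kickoff notes\n---\nDecision: ship weekly"
```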
Conversation Bank Output Format
When loading from a conversation bank, the Load Memory step auto-detects conversation entries and formats the output as a JSON messages array — the same format used by Add Memory:
[
{ "role": "user", "content": "Hello" },
{ "role": "assistant", "content": "Hi! How can I help?" },
{ "role": "user", "content": "Tell me about prompt caching" }
]
This is directly compatible with a downstream Prompt Call that has use_step_input_as_messages: true (see Input as Messages). Use order: "oldest_first" so the oldest messages form a stable prefix that providers can cache — using newest_first would change the prefix on every turn, defeating the cache.
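A small sketch of why ordering matters for the cache. Providers cache a prompt prefix only when it is identical across turns; with oldest_first a new turn only appends, while newest_first rewrites the front of the list:

```python
# Sketch of prefix stability under the two orderings. With
# oldest_first, a new turn only appends, so the previous history is an
# exact prefix of the new one (cache hit). With newest_first, the new
# turn lands at the front and invalidates the cached prefix.
history = ["msg1", "msg2", "msg3"]
new_turn = "msg4"

oldest_first_before = history
oldest_first_after = history + [new_turn]
# The old list is an exact prefix of the new one -> cacheable.
assert oldest_first_after[:len(oldest_first_before)] == oldest_first_before

newest_first_before = list(reversed(history))
newest_first_after = [new_turn] + newest_first_before
# The first element changed -> the cached prefix no longer matches.
assert newest_first_after[0] != newest_first_before[0]
```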
Tip: For general banks, the output is a JSON array of [{"content", "title"}] objects.
Use Case Examples
Load full history for summarisation:
Step 1: Load Memory (key: "{{metadata.user_id}}", order: "oldest_first")
Step 2: Prompt Call — "Summarise the following conversation history: {{input}}"
Step 3: Add Memory (key: "{{metadata.user_id}}", speaker: "assistant")
Load recent entries for context:
Step 1: Load Memory (key: "{{metadata.user_id}}", order: "newest_first", limit: 10)
Step 2: Prompt Call — "Based on recent context: {{input}}\n\nAnswer: {{agent.input}}"
Export entries as plain text:
Step 1: Load Memory (
key: "project-{{metadata.project_id}}",
order: "oldest_first",
limit: 1000,
content_type: "text/plain"
)
Load all preferences before generating a personalised response:
Step 1: Load Memory (key: "{{metadata.user_id}}", order: "newest_first")
Step 2: Prompt Call — "Given these user preferences: {{input}}\n\nAnswer: {{agent.input}}"
Load vs Search: When to Use Each
| Use Case | Step to Use | Why |
|---|---|---|
| Find relevant past context | Search Memory | Vector similarity finds the most relevant entries |
| Summarise all interactions | Load Memory | Chronological order gives the full timeline |
| Answer a specific question | Search Memory | Semantic search finds matching context |
| Pre-load preferences or persona | Load Memory | Complete group contents needed, not just top match |
| Export or review history | Load Memory | Ordered, complete list of all entries |
| Inject the last N entries | Load Memory | Newest-first with a limit gives recent entries |