One API Call for LLM Generation with Long-Term Memory

Response API combines LLM generation, memory storage, memory retrieval, and context assembly into one request.

With Response API, you can build agents that understand users over time, without wiring up separate memory calls or extra orchestration logic.

How Response API Works

You send a message with real conversational intent

  • You send one input that includes the user’s latest message and any habits, goals, or current issues you want the agent to track.
  • You don’t need separate memory calls, custom rules, or manual context fetches.
  • What you send is pure conversational intent.
  • The Response API handles everything else.
Example input:
{
  "user_id": "u123",
  "message": "I moved to Tokyo recently. I'm still struggling with the morning routine here. Can you help me design a schedule that fits the habits we've discussed before, like my tendency to stay up late and my goal to exercise more?"
}
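
A request like this can be sent from any HTTP client. The sketch below is a minimal illustration in Python; the endpoint URL, auth header, `model` field, and `reply` field are assumptions for the example, not the documented interface.

# Minimal sketch of one Response API call, assuming a hypothetical
# endpoint and field names (check the API reference for the real ones).
import requests

API_URL = "https://api.example.com/v1/responses"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

payload = {
    "user_id": "u123",
    "message": (
        "I moved to Tokyo recently. I'm still struggling with the morning "
        "routine here. Can you help me design a schedule that fits the "
        "habits we've discussed before, like my tendency to stay up late "
        "and my goal to exercise more?"
    ),
    "model": "gpt-4o-mini",  # assumed optional field; any supported model
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["reply"])  # assumed response field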

The API performs the entire memory pipeline internally

Memory Store
Stores long-term facts, personal preferences, goals, and habits as structured memory that can be retrieved in future conversations.
Memory Category
Organizes stored memory into clear categories — such as identity, preferences, goals, and ongoing issues — so the system can retrieve the right information with high accuracy.
Memory Retrieval
Finds all relevant past information using hybrid retrieval and recency logic.
Context Assembly
Combines retrieved memory, recent history, summaries, and state into a model-ready context bundle.
LLM Response Generation
Produces the final answer using the enriched context — all within the same call.
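
For intuition, here is a highly simplified sketch of what those five stages do on each turn. Every name and data shape in it is an illustrative assumption; the real pipeline, storage format, and hybrid retrieval scoring live inside the service.

# Conceptual sketch of the internal pipeline, for intuition only.
# All names and data shapes here are illustrative assumptions.
from datetime import datetime, timezone

MEMORY = []  # stands in for the persistent memory store

def store(user_id, category, fact):
    # Memory Store + Memory Category: persist a structured fact under
    # a category such as "identity", "preferences", "goals", "issues".
    MEMORY.append({
        "user_id": user_id,
        "category": category,
        "fact": fact,
        "ts": datetime.now(timezone.utc),
    })

def retrieve(user_id, message):
    # Memory Retrieval: a real system combines semantic search with
    # keyword matching and recency weighting ("hybrid retrieval").
    # Naive keyword overlap stands in for that here.
    words = set(message.lower().split())
    hits = [m for m in MEMORY
            if m["user_id"] == user_id
            and words & set(m["fact"].lower().split())]
    return sorted(hits, key=lambda m: m["ts"], reverse=True)

def assemble_context(message, memories):
    # Context Assembly: fold retrieved facts and the new message into
    # one model-ready prompt bundle.
    facts = "\n".join(f"- [{m['category']}] {m['fact']}" for m in memories)
    return f"Known about this user:\n{facts}\n\nUser says: {message}"

# Seed memory from earlier turns, then run one turn.
store("u123", "preferences", "tends to stay up late")
store("u123", "goals", "wants to exercise more")
msg = "Help me design a morning routine that fits my habit to stay up late."
context = assemble_context(msg, retrieve("u123", msg))
print(context)  # LLM Response Generation would consume this bundle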

You receive a memory-enhanced response

The Response API returns:

  • The final LLM reply
  • Structured long-term memory
  • Retrieved past facts
  • Updates made during this turn
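
An illustrative response shape, mirroring the input example above (field names are assumptions; consult the API reference for the actual schema):

Example response (illustrative):
{
  "reply": "Since you tend to stay up late, let's anchor your Tokyo mornings at 8:30 and move your workout to the evening...",
  "memory": {
    "identity": ["recently moved to Tokyo"],
    "preferences": ["tends to stay up late"],
    "goals": ["exercise more"]
  },
  "retrieved": ["tends to stay up late", "goal: exercise more"],
  "updates": ["added identity fact: recently moved to Tokyo"]
}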

One Response API request does everything a memory-aware agent needs.

Why Developers Choose Response API

Unified Request Model
Replace multi-step memory workflows with one consistent Response API call.
High-Accuracy Memory Retrieval
Response API gives your agent MemU’s high-accuracy memory retrieval, making it easy to build long-term agents that remember users reliably.
Context Optimized for LLMs
Your agent receives structured memory and distilled context that models can use effectively.
Works With Any Model
Compatible with OpenAI, Anthropic, DeepSeek, local models, or any custom inference pipeline; see the sketch after this list.
Focus on Agent Logic, Not Memory Engineering
You write the agent’s behavior. The API manages memory intelligence.
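
Because the memory pipeline sits in front of generation, switching providers is, in principle, a one-field change. A brief illustration, assuming the same hypothetical payload shape and `model` field as the earlier request sketch, with model names taken from the pricing table below:

# Assumption: the request payload accepts a `model` field, as in the
# earlier sketch. Only the model name changes between providers.
base = {"user_id": "u123", "message": "What's on my schedule this week?"}

for model in ["gpt-4o-mini", "claude-sonnet-4", "deepseek-v3.1", "grok-4"]:
    payload = {**base, "model": model}
    # POST `payload` exactly as in the request sketch above; the memory
    # pipeline is identical, only the generating model differs.
    print(payload["model"])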

Pricing

Model                     Input (per 1K tokens)   Output (per 1K tokens)

GPT Models
gpt-5-nano                $0.00005                $0.00040
gpt-5-mini                $0.00025                $0.00200
gpt-5                     $0.00125                $0.01000
gpt-4.1                   $0.00200                $0.00800
gpt-4.1-mini              $0.00040                $0.00160
gpt-4o                    $0.00250                $0.01000
gpt-4o-mini               $0.00015                $0.00060

Gemini Models
gemini-2.5-pro            $0.00125                $0.01000
gemini-2.5-flash          $0.00030                $0.00250

Claude Models
claude-opus-4.1           $0.01500                $0.07500
claude-sonnet-4           $0.00300                $0.01500

Grok Models
grok-4                    $0.00300                $0.01500

DeepSeek Models
deepseek-R1               $0.00055                $0.00219
deepseek-v3               $0.00027                $0.00110
deepseek-v3.1             $0.00055                $0.00165
deepseek-v3.1-thinking    $0.00055                $0.00165

MemU Models
The memory model is automatically invoked during conversations, but not on every interaction. The frequency of invocation is determined by factors such as context length and time intervals to optimize performance and cost-effectiveness.

memU-memory-model         $0.00060                $0.00200
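
As a rough worked example using the listed per-1K-token rates (actual billing may differ), one turn with 2,000 input and 500 output tokens on gpt-4o-mini, plus a memU-memory-model pass at 1,000 input and 200 output tokens, comes to about a sixth of a cent:

# Worked cost example using the per-1K-token rates from the table above.
def cost(input_tokens, output_tokens, in_rate, out_rate):
    # Rates are USD per 1K tokens.
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

generation = cost(2000, 500, 0.00015, 0.00060)    # gpt-4o-mini turn
memory = cost(1000, 200, 0.00060, 0.00200)        # memU-memory-model pass
print(f"generation: ${generation:.5f}")           # $0.00060
print(f"memory:     ${memory:.5f}")               # $0.00100
print(f"turn total: ${generation + memory:.5f}")  # $0.00160

Since the memory model is not invoked on every turn, its line item amortizes across the conversation.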

FAQ

What is agent memory?

Agent memory (also known as agentic memory) is an advanced AI memory system in which autonomous agents intelligently manage, organize, and evolve memory structures. It enables AI applications to autonomously store, retrieve, and manage information with higher accuracy and faster retrieval than traditional memory systems.

How does MemU improve AI memory performance?

MemU improves AI memory performance through three key capabilities: higher accuracy via intelligent memory organization, faster retrieval through optimized indexing and caching, and lower cost by reducing redundant storage and API calls.

How does agentic memory compare with traditional memory systems?

Agentic memory offers autonomous memory management, automatic organization and linking of related information, continuous evolution and optimization, contextual retrieval, and reduced human intervention compared with traditional static memory systems.

Is MemU open source?

Yes. MemU is an open-source agent memory framework. You can self-host it, contribute to the project, and integrate it into your LLM applications. We also offer a cloud version for easier deployment.

Where can agent memory be used?

Agent memory fits a wide range of LLM applications, including AI assistants, chatbots, conversational AI, AI companions, customer support bots, AI tutors, and any application that requires contextual memory and personalization.

How is agent memory different from a vector database?

While vector databases provide semantic search, agent memory goes further by autonomously managing the memory lifecycle, organizing information into interconnected knowledge graphs, and evolving memory structures over time based on usage patterns and relevance.

Does MemU integrate with LLM frameworks?

Yes. MemU integrates seamlessly with popular LLM frameworks including LangChain, LangGraph, CrewAI, OpenAI, Anthropic, and more. Our SDK provides simple APIs for memory operations across different platforms.

What features does MemU offer?

MemU offers autonomous memory organization, intelligent memory linking, continuous memory evolution, contextual retrieval, multi-modal memory support, real-time synchronization, and extensive integration options with LLM frameworks.

Build Agents That Remember

A single API call. A complete memory-aware agent loop.