One API Call for LLM Generation with Long-Term Memory

Response API combines LLM generation, memory storage, memory retrieval, and context assembly into one request.

With Response API, you can build agents that understand users over time, without wiring up separate memory calls or extra orchestration logic.

How Response API Works

You send a message with real conversational intent

  • You send one input that includes the user’s latest message and any habits, goals, or current issues you want the agent to track.
  • You don’t need separate memory calls, custom rules, or manual context fetches.
  • What you send is pure conversational intent.
  • The Response API handles everything else.
Example input:
{
  "user_id": "u123",
  "message": "I moved to Tokyo recently. I'm still struggling with the morning routine here. Can you help me design a schedule that fits the habits we've discussed before, like my tendency to stay up late and my goal to exercise more?"
}
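
A request like this can be sent from any HTTP client. The sketch below is a minimal illustration in Python; the endpoint URL, auth header, `model` field, and `reply` field are assumptions for the example, not the documented interface.

# Minimal sketch of one Response API call, assuming a hypothetical
# endpoint and field names (check the API reference for the real ones).
import requests

API_URL = "https://api.example.com/v1/responses"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

payload = {
    "user_id": "u123",
    "message": (
        "I moved to Tokyo recently. I'm still struggling with the morning "
        "routine here. Can you help me design a schedule that fits the "
        "habits we've discussed before, like my tendency to stay up late "
        "and my goal to exercise more?"
    ),
    "model": "gpt-4o-mini",  # assumed optional field; any supported model
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["reply"])  # assumed response field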

The API performs the entire memory pipeline internally

Memory Store
Stores long-term facts, personal preferences, goals, and habits as structured memory that can be retrieved in future conversations.
Memory Category
Organizes stored memory into clear categories — such as identity, preferences, goals, and ongoing issues — so the system can retrieve the right information with high accuracy.
Memory Retrieval
Finds all relevant past information using hybrid retrieval and recency logic.
Context Assembly
Combines retrieved memory, recent history, summaries, and state into a model-ready context bundle.
LLM Response Generation
Produces the final answer using the enriched context — all within the same call.
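
For intuition, here is a highly simplified sketch of what those five stages do on each turn. Every name and data shape in it is an illustrative assumption; the real pipeline, storage format, and hybrid retrieval scoring live inside the service.

# Conceptual sketch of the internal pipeline, for intuition only.
# All names and data shapes here are illustrative assumptions.
from datetime import datetime, timezone

MEMORY = []  # stands in for the persistent memory store

def store(user_id, category, fact):
    # Memory Store + Memory Category: persist a structured fact under
    # a category such as "identity", "preferences", "goals", "issues".
    MEMORY.append({
        "user_id": user_id,
        "category": category,
        "fact": fact,
        "ts": datetime.now(timezone.utc),
    })

def retrieve(user_id, message):
    # Memory Retrieval: a real system combines semantic search with
    # keyword matching and recency weighting ("hybrid retrieval").
    # Naive keyword overlap stands in for that here.
    words = set(message.lower().split())
    hits = [m for m in MEMORY
            if m["user_id"] == user_id
            and words & set(m["fact"].lower().split())]
    return sorted(hits, key=lambda m: m["ts"], reverse=True)

def assemble_context(message, memories):
    # Context Assembly: fold retrieved facts and the new message into
    # one model-ready prompt bundle.
    facts = "\n".join(f"- [{m['category']}] {m['fact']}" for m in memories)
    return f"Known about this user:\n{facts}\n\nUser says: {message}"

# Seed memory from earlier turns, then run one turn.
store("u123", "preferences", "tends to stay up late")
store("u123", "goals", "wants to exercise more")
msg = "Help me design a morning routine that fits my habit to stay up late."
context = assemble_context(msg, retrieve("u123", msg))
print(context)  # LLM Response Generation would consume this bundle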

You receive a memory-enhanced response

The Response API returns:

  • The final LLM reply
  • Structured long-term memory
  • Retrieved past facts
  • Updates made during this turn
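
An illustrative response shape, mirroring the input example above (field names are assumptions; consult the API reference for the actual schema):

Example response (illustrative):
{
  "reply": "Since you tend to stay up late, let's anchor your Tokyo mornings at 8:30 and move your workout to the evening...",
  "memory": {
    "identity": ["recently moved to Tokyo"],
    "preferences": ["tends to stay up late"],
    "goals": ["exercise more"]
  },
  "retrieved": ["tends to stay up late", "goal: exercise more"],
  "updates": ["added identity fact: recently moved to Tokyo"]
}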

One Response API request does everything a memory-aware agent needs.

Why Developers Choose Response API

Unified Request Model
Replace multi-step memory workflows with one consistent Response API call.
High-Accuracy Memory Retrieval
Response API gives your agent MemU’s high-accuracy memory retrieval, making it easy to build long-term agents that remember users reliably.
Context Optimized for LLMs
Your agent receives structured memory and distilled context that models can use effectively.
Works With Any Model
Compatible with OpenAI, Anthropic, DeepSeek, local models, or any custom inference pipeline; see the sketch after this list.
Focus on Agent Logic, Not Memory Engineering
You write the agent’s behavior. The API manages memory intelligence.
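
Because the memory pipeline sits in front of generation, switching providers is, in principle, a one-field change. A brief illustration, assuming the same hypothetical payload shape and `model` field as the earlier request sketch, with model names taken from the pricing table below:

# Assumption: the request payload accepts a `model` field, as in the
# earlier sketch. Only the model name changes between providers.
base = {"user_id": "u123", "message": "What's on my schedule this week?"}

for model in ["gpt-4o-mini", "claude-sonnet-4", "deepseek-v3.1", "grok-4"]:
    payload = {**base, "model": model}
    # POST `payload` exactly as in the request sketch above; the memory
    # pipeline is identical, only the generating model differs.
    print(payload["model"])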

Pricing

Model                     Input (per 1K tokens)   Output (per 1K tokens)

GPT Models
gpt-5-nano                $0.00005                $0.00040
gpt-5-mini                $0.00025                $0.00200
gpt-5                     $0.00125                $0.01000
gpt-4.1                   $0.00200                $0.00800
gpt-4.1-mini              $0.00040                $0.00160
gpt-4o                    $0.00250                $0.01000
gpt-4o-mini               $0.00015                $0.00060

Gemini Models
gemini-2.5-pro            $0.00125                $0.01000
gemini-2.5-flash          $0.00030                $0.00250

Claude Models
claude-opus-4.1           $0.01500                $0.07500
claude-sonnet-4           $0.00300                $0.01500

Grok Models
grok-4                    $0.00300                $0.01500

DeepSeek Models
deepseek-R1               $0.00055                $0.00219
deepseek-v3               $0.00027                $0.00110
deepseek-v3.1             $0.00055                $0.00165
deepseek-v3.1-thinking    $0.00055                $0.00165

MemU Models
The memory model is automatically invoked during conversations, but not on every interaction. The frequency of invocation is determined by factors such as context length and time intervals to optimize performance and cost-effectiveness.

memU-memory-model         $0.00060                $0.00200
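
As a rough worked example using the listed per-1K-token rates (actual billing may differ), one turn with 2,000 input and 500 output tokens on gpt-4o-mini, plus a memU-memory-model pass at 1,000 input and 200 output tokens, comes to about a sixth of a cent:

# Worked cost example using the per-1K-token rates from the table above.
def cost(input_tokens, output_tokens, in_rate, out_rate):
    # Rates are USD per 1K tokens.
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

generation = cost(2000, 500, 0.00015, 0.00060)    # gpt-4o-mini turn
memory = cost(1000, 200, 0.00060, 0.00200)        # memU-memory-model pass
print(f"generation: ${generation:.5f}")           # $0.00060
print(f"memory:     ${memory:.5f}")               # $0.00100
print(f"turn total: ${generation + memory:.5f}")  # $0.00160

Since the memory model is not invoked on every turn, its line item amortizes across the conversation.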

FAQ

What is agent memory?

Agent memory (also known as agentic memory) is an advanced AI memory system in which autonomous agents intelligently manage, organize, and evolve memory structures. It enables AI applications to autonomously store, retrieve, and manage information with higher accuracy and faster retrieval than traditional memory systems.

How does MemU improve AI memory performance?

MemU improves AI memory performance through three key capabilities: higher accuracy via intelligent memory organization, faster retrieval through optimized indexing and caching, and lower cost by reducing redundant storage and API calls.

How does agentic memory compare with traditional memory systems?

Agentic memory offers autonomous memory management, automatic organization and linking of related information, continuous evolution and optimization, contextual retrieval, and reduced human intervention compared with traditional static memory systems.

Is MemU open source?

Yes. MemU is an open-source agent memory framework. You can self-host it, contribute to the project, and integrate it into your LLM applications. We also offer a cloud version for easier deployment.

Where can agent memory be used?

Agent memory fits a wide range of LLM applications, including AI assistants, chatbots, conversational AI, AI companions, customer support bots, AI tutors, and any application that requires contextual memory and personalization.

How is agent memory different from a vector database?

While vector databases provide semantic search, agent memory goes further by autonomously managing the memory lifecycle, organizing information into interconnected knowledge graphs, and evolving memory structures over time based on usage patterns and relevance.

Does MemU integrate with LLM frameworks?

Yes. MemU integrates seamlessly with popular LLM frameworks including LangChain, LangGraph, CrewAI, OpenAI, Anthropic, and more. Our SDK provides simple APIs for memory operations across different platforms.

What features does MemU offer?

MemU offers autonomous memory organization, intelligent memory linking, continuous memory evolution, contextual retrieval, multi-modal memory support, real-time synchronization, and extensive integration options with LLM frameworks.

Build Agents That Remember

A single API call. A complete memory-aware agent loop.