Inkpilots News
Add Persistent Chat Memory in Next.js + OpenAI with Postgres (Next.js AI memory)

Learn how to add Next.js AI memory with persistent chat history using OpenAI and Postgres. Build a durable conversation store, load memory safely, and prevent context bloat with summaries.

Why persistent memory matters for chat apps

If you’re building a chat experience in Next.js, users quickly expect the assistant to “remember” context across page refreshes, new sessions, and even different devices. That expectation is hard to meet with in-memory state alone, because serverless functions restart, edge runtimes are stateless, and browser storage is limited and user-specific. The practical solution is persistent storage: save conversation messages (and optionally summaries) in a database and reload them when a user returns.

In this guide, you’ll implement Next.js AI memory using OpenAI for responses and Postgres for durable storage. The core idea is simple: every message is stored with a conversation ID, and each new response is generated using the most relevant prior messages (or a summary) pulled from Postgres.

What you’ll build

  • A Postgres schema for conversations and messages
  • A Next.js API route (or Route Handler) that writes user/assistant messages to Postgres
  • A memory-loading strategy that retrieves recent messages (and optionally a summary) to send to the model
  • A safe, scalable pattern that avoids unbounded context growth

Architecture overview: Next.js + OpenAI + Postgres

A typical persistent-memory flow looks like this:

  1. Client sends a user message along with a conversationId (or requests a new one).
  2. Server stores the user message in Postgres.
  3. Server loads memory for that conversation (e.g., last N messages, plus an optional summary).
  4. Server calls the OpenAI API with the memory + the new user message.
  5. Server stores the assistant reply in Postgres.
  6. Server returns the assistant reply to the client.

This approach keeps your app stateless at the runtime level while maintaining state in Postgres.

Prerequisites

  • Next.js (App Router recommended)
  • A Postgres database (local, managed, or hosted)
  • An OpenAI API key stored in environment variables
  • A Postgres client library (e.g., node-postgres) or an ORM (e.g., Prisma)

This article uses plain SQL and a minimal Postgres client to keep the concepts portable. If you prefer Prisma, you can translate the schema and queries directly.

Step 1: Design a Postgres schema for chat memory

You need two core entities: conversations and messages. Messages should be ordered, attributable (user vs assistant), and timestamped. You may also want a place to store a rolling summary to keep prompts small.

-- conversations table
CREATE TABLE IF NOT EXISTS conversations (
  id UUID PRIMARY KEY,
  user_id TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  summary TEXT
);

-- messages table
CREATE TABLE IF NOT EXISTS messages (
  id UUID PRIMARY KEY,
  conversation_id UUID NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
  role TEXT NOT NULL CHECK (role IN ('user','assistant','system')),
  content TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_messages_conversation_created
  ON messages (conversation_id, created_at);

CREATE INDEX IF NOT EXISTS idx_conversations_user
  ON conversations (user_id, updated_at);

Notes:

  • Use UUIDs so IDs are unique across devices and environments.
  • Index (conversation_id, created_at) for fast “load recent messages” queries.
  • Store summary on the conversation row to avoid scanning messages when building prompts. (Optional but useful.)

Step 2: Connect Next.js to Postgres safely

In Next.js, keep your database connection on the server only. Store credentials in environment variables and never expose them to the client.

// lib/db.ts
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});

export async function query<T = any>(text: string, params?: any[]): Promise<{ rows: T[] }> {
  // pg's pool.query resolves to a full QueryResult; callers here only need rows.
  const result = await pool.query(text, params);
  return { rows: result.rows as T[] };
}

If you deploy to a serverless environment, consider using a Postgres provider and connection strategy appropriate for serverless (for example, pooling via a provider-specific solution). The exact setup depends on where you host, so follow your provider’s guidance.
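One widely used pattern in environments where module code is evaluated repeatedly (hot reload in development, or warm serverless containers) is to cache the pool on globalThis so each process keeps a single pool. Here's a sketch of that pattern with the pool factory injected so the caching logic is visible without a real pg dependency; the `PoolLike` type and the `pgPool` property name are illustrative stand-ins:

```typescript
// Sketch: reuse one pool per process by caching it on globalThis.
// `PoolLike` stands in for pg's Pool; inject your real factory, e.g.
// () => new Pool({ connectionString: process.env.DATABASE_URL }).
type PoolLike = { connectionString?: string };

const globalForPg = globalThis as unknown as { pgPool?: PoolLike };

export function getPool(createPool: () => PoolLike): PoolLike {
  if (!globalForPg.pgPool) {
    globalForPg.pgPool = createPool(); // only runs once per process
  }
  return globalForPg.pgPool;
}
```

This keeps repeated imports (or repeated invocations in the same container) from opening new sets of connections.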

Step 3: Create a Route Handler for chat with memory

With the App Router, you can implement a POST endpoint at app/api/chat/route.ts. The endpoint will:

  • Validate input
  • Create or reuse a conversation
  • Insert the user message
  • Load memory (summary + recent messages)
  • Call OpenAI
  • Insert the assistant message
  • Return the assistant message (and conversationId)

// app/api/chat/route.ts
import { NextResponse } from 'next/server';
import { randomUUID } from 'crypto';
import { query } from '@/lib/db';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

type ChatRequest = {
  userId: string;
  conversationId?: string;
  message: string;
};

export async function POST(req: Request) {
  const body = (await req.json()) as ChatRequest;

  if (!body?.userId || !body?.message) {
    return NextResponse.json({ error: 'Missing userId or message' }, { status: 400 });
  }

  const conversationId = body.conversationId ?? randomUUID();

  // Ensure conversation exists (idempotent upsert pattern)
  await query(
    `INSERT INTO conversations (id, user_id)
     VALUES ($1, $2)
     ON CONFLICT (id) DO UPDATE SET updated_at = NOW()`,
    [conversationId, body.userId]
  );

  // Store user message
  await query(
    `INSERT INTO messages (id, conversation_id, role, content)
     VALUES ($1, $2, 'user', $3)`,
    [randomUUID(), conversationId, body.message]
  );

  // Load memory: summary + last 20 messages
  const convoRes = await query<{ summary: string | null }>(
    `SELECT summary FROM conversations WHERE id = $1`,
    [conversationId]
  );
  const summary = convoRes.rows[0]?.summary ?? null;

  const msgsRes = await query<{ role: string; content: string }>(
    `SELECT role, content
     FROM messages
     WHERE conversation_id = $1
     ORDER BY created_at DESC
     LIMIT 20`,
    [conversationId]
  );

  const recent = msgsRes.rows.reverse(); // chronological

  const systemPreamble = summary
    ? `You are a helpful assistant. Conversation summary: ${summary}`
    : 'You are a helpful assistant.';

  const inputMessages = [
    { role: 'system' as const, content: systemPreamble },
    ...recent.map((m) => ({ role: m.role as 'user' | 'assistant' | 'system', content: m.content })),
  ];

  let assistantText = '';
  try {
    const completion = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: inputMessages,
    });
    assistantText = completion.choices[0]?.message?.content ?? '';
  } catch {
    // The user message is already stored, so the conversation survives a
    // failed model call; surface a clear error to the client.
    return NextResponse.json(
      { conversationId, error: 'Model request failed' },
      { status: 502 }
    );
  }

  // Store assistant message
  await query(
    `INSERT INTO messages (id, conversation_id, role, content)
     VALUES ($1, $2, 'assistant', $3)`,
    [randomUUID(), conversationId, assistantText]
  );

  // Update conversation timestamp
  await query(`UPDATE conversations SET updated_at = NOW() WHERE id = $1`, [conversationId]);

  return NextResponse.json({ conversationId, reply: assistantText });
}

Model note: choose a model that fits your latency and cost targets. The example uses a commonly available chat model name; always confirm current model availability and naming in the official OpenAI docs for your account.

Step 4: Build the client-side chat call

On the client, keep track of conversationId so you can resume memory later. You can store it in the URL, localStorage, or your own user profile table. The simplest approach is to keep it in localStorage per browser and also associate it with userId on the server.

// example client helper
export async function sendChatMessage(params: {
  userId: string;
  conversationId?: string;
  message: string;
}) {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(params),
  });

  if (!res.ok) throw new Error('Chat request failed');
  return res.json() as Promise<{ conversationId: string; reply: string }>;
}
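To resume a conversation on a later visit, the client also needs to keep the returned conversationId somewhere. A small sketch with the storage interface injected, so the same code works against window.localStorage in the browser and a plain map in tests; the storage key name is an arbitrary choice:

```typescript
// Sketch: persist the conversationId per browser so memory can be resumed.
// KVStore matches the subset of the Web Storage API we need.
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CONVERSATION_KEY = "chat:conversationId"; // arbitrary key name

export function loadConversationId(store: KVStore): string | undefined {
  return store.getItem(CONVERSATION_KEY) ?? undefined;
}

export function saveConversationId(store: KVStore, id: string): void {
  store.setItem(CONVERSATION_KEY, id);
}
```

In the browser you'd pass window.localStorage, call loadConversationId before sending the first message, and saveConversationId with the conversationId from each response.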

Step 5: Prevent context bloat (the real challenge of Next.js AI memory)

Persisting every message is easy. The hard part is deciding what to send back to the model each turn. If you send the entire conversation forever, you’ll eventually hit context limits and pay for tokens you don’t need.

Common, reliable strategies:

  • Recent-window memory: send only the last N messages (e.g., 20–50) plus a system prompt.
  • Summary + recent window: keep a short summary in Postgres and prepend it to the prompt, then send only the last N messages.
  • Topic-based retrieval: store embeddings and retrieve only relevant prior messages. (More complex, but powerful.)
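The recent-window strategy is simple enough to sketch directly. This helper keeps any system messages plus the last N conversational turns; N = 20 here is an arbitrary starting point to tune against your model's context limit:

```typescript
// Sketch of the "recent-window" strategy: always keep system messages,
// then only the most recent `maxMessages` user/assistant turns.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

export function recentWindow(messages: ChatMessage[], maxMessages = 20): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-maxMessages)];
}
```

The "summary + recent window" variant is the same function, with the summary folded into the system message as shown in the route handler earlier.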

Optional: Add automatic summarization in Postgres

A practical middle ground is “summary + recent window.” Periodically update conversations.summary when message count grows beyond a threshold. You can do this in the same API route (every X messages) or via a background job.

Below is a simple pattern: when a conversation exceeds a certain message count, ask the model to summarize older messages and store the result. (Keep the summary short and factual.)

// Pseudocode snippet: summarize older messages when count is high
// 1) Count messages
// 2) If above threshold, fetch older messages (excluding the last N)
// 3) Ask the model for a concise summary
// 4) Save summary to conversations.summary
// 5) Optionally delete or archive older messages

Be careful with deletion: if you need auditability, prefer archiving over deleting. Also, ensure summaries don’t include sensitive data you shouldn’t store long-term.
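The decision logic in the pattern above (counting, and splitting older messages from the recent window) is pure and easy to test on its own; a sketch, with the threshold and window size as illustrative defaults rather than recommendations:

```typescript
// Sketch: once a conversation passes `threshold` messages, split off
// everything except the last `keepRecent` to be summarized. The model call
// that produces the summary happens elsewhere.
type StoredMessage = { role: string; content: string };

export function partitionForSummary(
  messages: StoredMessage[],
  threshold = 40,
  keepRecent = 20
): { toSummarize: StoredMessage[]; toKeep: StoredMessage[] } {
  if (messages.length <= threshold) {
    return { toSummarize: [], toKeep: messages };
  }
  return {
    toSummarize: messages.slice(0, messages.length - keepRecent),
    toKeep: messages.slice(-keepRecent),
  };
}
```

Feed `toSummarize` (plus any existing summary) to the model, store the result in conversations.summary, and keep sending only `toKeep` each turn.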

Security and privacy essentials

  • Authenticate users: ensure userId is derived from your auth system, not blindly accepted from the client.
  • Authorize conversation access: verify the conversation belongs to the current user before reading/writing.
  • Avoid logging sensitive prompts: don’t print full message content in server logs.
  • Data retention: decide how long to keep messages and summaries; implement deletion/export if required by your policies.
  • Prompt injection awareness: treat stored user content as untrusted; keep system instructions separate and explicit.
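The authorization check in particular is easy to centralize. Here's a sketch with the query function injected (for example, the query helper from lib/db.ts); table and column names match the schema above:

```typescript
// Sketch: verify a conversation belongs to the current user before any
// read or write. The query function is injected so this is trivially testable.
type QueryFn = (
  text: string,
  params: unknown[]
) => Promise<{ rows: Array<{ user_id: string }> }>;

export async function ownsConversation(
  queryFn: QueryFn,
  conversationId: string,
  userId: string
): Promise<boolean> {
  const res = await queryFn(
    `SELECT user_id FROM conversations WHERE id = $1`,
    [conversationId]
  );
  // A missing row (unknown conversation) also fails the check.
  return res.rows[0]?.user_id === userId;
}
```

Call this at the top of the route handler whenever the client supplies a conversationId, and return 403 or 404 on failure.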

Performance tips for production

  • Use indexes (shown above) to keep message retrieval fast.
  • Limit memory reads: fetch only what you need (summary + last N).
  • Stream responses if your UI needs real-time output; store the final assistant message once complete.
  • Consider background summarization to reduce latency on the main chat request.
  • If you add embeddings later, store them in Postgres using an appropriate extension supported by your environment, and retrieve only top-k relevant chunks.
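To make the embeddings idea concrete: retrieval is just ranking stored vectors by similarity to the query embedding. In production you would do this inside Postgres with a vector extension; the following is only a minimal in-process sketch of the same ranking:

```typescript
// Minimal top-k retrieval by cosine similarity, as an illustration of the
// idea; a Postgres vector extension would perform this ranking in SQL.
type Embedded = { content: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

export function topK(query: number[], items: Embedded[], k: number): Embedded[] {
  return [...items]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

Only the top-k results go into the prompt, which keeps token usage bounded even for very long conversations.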

Testing checklist

  • Start a new conversation and verify messages persist after refresh.
  • Resume an existing conversationId and verify the assistant uses prior context.
  • Confirm users cannot access other users’ conversations (authorization test).
  • Load test: verify your message query remains fast as tables grow.
  • Check failure handling: if OpenAI call fails, ensure the user message is still stored and the client gets a clear error.

Common pitfalls (and how to avoid them)

  • Unbounded prompts: always cap the number of messages you send to the model.
  • Client-trust issues: never accept arbitrary userId or conversation ownership from the client without verification.
  • Missing ordering: always order messages by created_at (or a monotonic sequence) and send them chronologically.
  • Storing too much sensitive data: decide what should be persisted; consider redaction or shorter retention.

Conclusion

Implementing Next.js AI memory is mainly about persistence and prompt discipline. Postgres gives you durable, queryable conversation history, and a simple “summary + recent messages” strategy keeps context useful without exploding token usage. Start with the basic schema and route handler, then iterate: add summarization, stronger authorization, and (if needed) retrieval with embeddings for long-running conversations.

Last Updated 1/14/2026
Next.js AI memory · persistent chat memory · Next.js OpenAI Postgres