Inkpilots News
Add OpenAI Embeddings Semantic Search to a Next.js App (RAG with pgvector)

Learn how to add OpenAI embeddings-based semantic search to a Next.js app using PostgreSQL + pgvector, then build a RAG endpoint that retrieves relevant context and generates grounded answers.

Why semantic search (and why pgvector)?

Keyword search is great when users know the exact words in your content. Semantic search helps when they don’t—by retrieving text that is conceptually similar to a query, even if the exact terms differ. In a Retrieval-Augmented Generation (RAG) setup, you use semantic search to fetch the most relevant context, then provide that context to an LLM to generate grounded answers.

PostgreSQL with the pgvector extension is a practical choice because it keeps embeddings and metadata in the same database you may already use, supports similarity search, and can be indexed for faster retrieval. Next.js is a natural fit for building the UI and API routes that power search and Q&A.

What you’ll build

  • A Next.js app with an API route that embeds documents and stores them in Postgres
  • A semantic search API route that returns the most similar chunks using pgvector
  • A RAG endpoint that sends retrieved context to an OpenAI chat model
  • A minimal UI to test semantic search and RAG answers

Architecture overview (high level)

  • Ingestion: split documents into chunks → create embeddings → store (chunk text + metadata + embedding vector) in Postgres
  • Retrieval: embed the user query → run a vector similarity query in Postgres → return top-k chunks
  • Generation (RAG): build a prompt with retrieved chunks as context → call an OpenAI chat model → return an answer + sources

Prerequisites

  • Node.js (current LTS recommended)
  • A Postgres database you can connect to
  • pgvector installed in your Postgres instance (extension name: vector)
  • An OpenAI API key
  • Basic familiarity with Next.js App Router and API routes

1) Create the Next.js project

Create a new Next.js app (or use an existing one) and add the dependencies for Postgres access and OpenAI.

# Create app
npx create-next-app@latest semantic-search-nextjs
cd semantic-search-nextjs

# Postgres client
npm i pg

# OpenAI official SDK
npm i openai

2) Set environment variables

Add these to your .env.local file. Keep secrets out of client-side code—only use them in server routes or server actions.

OPENAI_API_KEY=your_key_here
DATABASE_URL=postgres://USER:PASSWORD@HOST:PORT/DBNAME

3) Enable pgvector and create a table

Connect to your database and enable the pgvector extension. Then create a table to store document chunks and their embeddings.

-- Enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;

-- Store chunked text + embedding + metadata
CREATE TABLE IF NOT EXISTS documents (
  id BIGSERIAL PRIMARY KEY,
  source TEXT,
  chunk_index INTEGER,
  content TEXT NOT NULL,
  embedding vector(1536) NOT NULL
);

-- Optional: a vector index for faster search (choose an index type supported by your pgvector version)
-- Example (commonly used):
-- CREATE INDEX IF NOT EXISTS documents_embedding_idx ON documents USING ivfflat (embedding vector_cosine_ops);
-- Note: ivfflat requires ANALYZE and benefits from tuning lists; consult pgvector docs for your version.

Important: the embedding dimension must match the model you use. If you choose a different embedding model than the one shown later, update vector(1536) accordingly.
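
If you want to try the index now rather than later, a concrete sketch follows. This assumes an ivfflat index with the cosine operator class; `lists = 100` is an illustrative starting value, not a recommendation, and newer pgvector versions also offer an HNSW index type. Check the docs for your version.

```sql
-- Sketch: approximate-nearest-neighbor index for cosine distance.
-- ivfflat builds its cluster centers from existing rows, so populate
-- the table with data first, then create the index and ANALYZE.
CREATE INDEX IF NOT EXISTS documents_embedding_idx
  ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

ANALYZE documents;
```

A reasonable rule of thumb from the pgvector docs is to scale `lists` with row count; measure recall and latency on your own data before settling on a value.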

4) Add a small Postgres helper

Create a reusable database helper to run queries from your Next.js server code.

// lib/db.ts
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});

export async function query<T = any>(text: string, params?: any[]) {
  const res = await pool.query<T>(text, params);
  return res;
}
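
One practical wrinkle: in development, Next.js hot reloading can re-evaluate modules and create a new `Pool` each time, exhausting connections. A common workaround is to cache the pool on `globalThis`. The sketch below uses a generic helper (`getSingleton` is a name introduced here, not a pg or Next.js API):

```typescript
// Cache an expensive singleton on globalThis so dev-server hot reloads
// reuse the same instance instead of creating a new one per reload.
function getSingleton<T>(key: string, create: () => T): T {
  const g = globalThis as Record<string, unknown>;
  if (!g[key]) g[key] = create();
  return g[key] as T;
}

// Usage sketch with pg (assumes the Pool import from lib/db.ts):
// const pool = getSingleton("pgPool", () =>
//   new Pool({ connectionString: process.env.DATABASE_URL })
// );
```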

5) Create an embeddings helper (server-side)

Use the OpenAI SDK on the server to create embeddings for both documents and user queries. The code below uses the official OpenAI Node SDK.

// lib/openai.ts
import OpenAI from "openai";

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function embedText(input: string) {
  // Use an embeddings model available in your account.
  // If you change the model, ensure your pgvector column dimension matches.
  const resp = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input,
  });
  return resp.data[0].embedding;
}

6) Chunk your content for better retrieval

Semantic search works best when you embed smaller, coherent chunks rather than entire pages. A simple chunker can split by paragraphs and enforce a max character length. For production, you may want token-aware chunking, but a basic approach is enough to get started.

// lib/chunk.ts
export function chunkText(text: string, maxChars = 1200) {
  const paragraphs = text
    .split(/\n\s*\n/g)
    .map((p) => p.trim())
    .filter(Boolean);

  const chunks: string[] = [];
  let current = "";

  for (const p of paragraphs) {
    if ((current + "\n\n" + p).length > maxChars) {
      if (current) chunks.push(current);
      current = p;
    } else {
      current = current ? current + "\n\n" + p : p;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}

7) Build an ingestion API route (embed + store)

This route accepts raw text (and an optional source label), chunks it, embeds each chunk, and stores everything in Postgres. Protect this route (e.g., admin auth) before using it in a real app—embedding can be costly and you don’t want arbitrary public ingestion.

// app/api/ingest/route.ts
import { NextResponse } from "next/server";
import { chunkText } from "@/lib/chunk";
import { embedText } from "@/lib/openai";
import { query } from "@/lib/db";

export async function POST(req: Request) {
  const { text, source } = await req.json();

  if (!text || typeof text !== "string") {
    return NextResponse.json({ error: "Missing 'text'" }, { status: 400 });
  }

  const chunks = chunkText(text);

  for (let i = 0; i < chunks.length; i++) {
    const embedding = await embedText(chunks[i]);

    // pgvector accepts vector literals like: '[0.1, 0.2, ...]'
    const vectorLiteral = `[${embedding.join(",")}]`;

    await query(
      `INSERT INTO documents (source, chunk_index, content, embedding)
       VALUES ($1, $2, $3, $4::vector)`,
      [source ?? null, i, chunks[i], vectorLiteral]
    );
  }

  return NextResponse.json({ ok: true, chunks: chunks.length });
}
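
The loop above makes one embeddings request per chunk, which is slow for large documents. The OpenAI embeddings endpoint accepts an array of strings as `input`, so you can batch chunks into fewer requests. A sketch (`toBatches` is a helper introduced here; the commented usage assumes the `openai` client from lib/openai.ts):

```typescript
// Split items into batches of `size` so each embeddings call
// sends multiple inputs at once instead of one string per request.
function toBatches<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Usage sketch:
// for (const batch of toBatches(chunks, 100)) {
//   const resp = await openai.embeddings.create({
//     model: "text-embedding-3-small",
//     input: batch,
//   });
//   // resp.data comes back in input order: resp.data[i] pairs with batch[i]
// }
```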

8) Create a semantic search API route (pgvector similarity query)

To search, embed the user’s query and ask Postgres for the nearest neighbors. pgvector supports different distance operators; cosine distance is commonly used for embeddings. The exact operator and index configuration should match your pgvector setup.

// app/api/search/route.ts
import { NextResponse } from "next/server";
import { embedText } from "@/lib/openai";
import { query } from "@/lib/db";

export async function POST(req: Request) {
  const { q, k } = await req.json();

  if (!q || typeof q !== "string") {
    return NextResponse.json({ error: "Missing 'q'" }, { status: 400 });
  }

  const topK = typeof k === "number" ? Math.min(Math.max(k, 1), 20) : 5;
  const embedding = await embedText(q);
  const vectorLiteral = `[${embedding.join(",")}]`;

  // Cosine distance: smaller is more similar
  const res = await query(
    `SELECT id, source, chunk_index, content,
            (embedding <=> $1::vector) AS distance
     FROM documents
     ORDER BY embedding <=> $1::vector
     LIMIT $2`,
    [vectorLiteral, topK]
  );

  return NextResponse.json({
    query: q,
    results: res.rows,
  });
}
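
The route above uses cosine distance (`<=>`). For reference, pgvector's main distance operators and their matching index operator classes are summarized below; whichever operator you query with should match the opclass your index was built with. Verify against the pgvector docs for your version.

```sql
-- <->  L2 (Euclidean) distance      → index opclass vector_l2_ops
-- <=>  cosine distance              → index opclass vector_cosine_ops
-- <#>  negative inner product       → index opclass vector_ip_ops

-- Example: same top-k query, ranked by negative inner product instead
SELECT id, content, embedding <#> $1::vector AS score
FROM documents
ORDER BY embedding <#> $1::vector
LIMIT 5;
```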

9) Add a RAG API route (retrieve + generate)

RAG is simply: retrieve relevant chunks, then pass them to a chat model as context. Keep the prompt explicit: instruct the model to use only the provided context and to say when the answer isn’t in the context.

// app/api/ask/route.ts
import { NextResponse } from "next/server";
import { openai } from "@/lib/openai";

export async function POST(req: Request) {
  const { q } = await req.json();

  if (!q || typeof q !== "string") {
    return NextResponse.json({ error: "Missing 'q'" }, { status: 400 });
  }

  // 1) Retrieve
  const searchResp = await fetch(new URL("/api/search", req.url), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ q, k: 5 }),
  });

  if (!searchResp.ok) {
    return NextResponse.json({ error: "Search failed" }, { status: 500 });
  }

  const { results } = await searchResp.json();

  const context = results
    .map((r: any) => `Source: ${r.source ?? "unknown"} (#${r.chunk_index})\n${r.content}`)
    .join("\n\n---\n\n");

  // 2) Generate
  const completion = await openai.chat.completions.create({
    model: "gpt-4.1-mini",
    messages: [
      {
        role: "system",
        content:
          "You are a helpful assistant. Answer using only the provided context. If the context does not contain the answer, say you do not have enough information.",
      },
      {
        role: "user",
        content: `Question:\n${q}\n\nContext:\n${context}`,
      },
    ],
  });

  const answer = completion.choices[0]?.message?.content ?? "";

  return NextResponse.json({
    question: q,
    answer,
    sources: results.map((r: any) => ({
      id: r.id,
      source: r.source,
      chunk_index: r.chunk_index,
      distance: r.distance,
    })),
  });
}

Note: Model availability and naming can vary by account and over time. Use a chat model available to you, and keep your embeddings model consistent with your stored vector dimension.

10) Build a simple UI to test search + RAG

This minimal page lets you ingest sample text, run semantic search, and ask RAG questions. It uses fetch calls to your API routes.

// app/page.tsx
"use client";

import { useState } from "react";

export default function Home() {
  const [text, setText] = useState("");
  const [source, setSource] = useState("demo");
  const [q, setQ] = useState("");
  const [searchResults, setSearchResults] = useState<any[]>([]);
  const [answer, setAnswer] = useState<string>("");

  async function ingest() {
    const res = await fetch("/api/ingest", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text, source }),
    });
    if (!res.ok) throw new Error("Ingest failed");
    alert("Ingested!");
  }

  async function search() {
    const res = await fetch("/api/search", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ q, k: 5 }),
    });
    const data = await res.json();
    setSearchResults(data.results ?? []);
  }

  async function ask() {
    const res = await fetch("/api/ask", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ q }),
    });
    const data = await res.json();
    setAnswer(data.answer ?? "");
  }

  return (
    <main style={{ maxWidth: 900, margin: "40px auto", padding: 16 }}>
      <h1>Semantic Search + RAG (pgvector)</h1>

      <section style={{ marginTop: 24 }}>
        <h2>Ingest text</h2>
        <input
          value={source}
          onChange={(e) => setSource(e.target.value)}
          placeholder="source label"
          style={{ width: "100%", marginBottom: 8 }}
        />
        <textarea
          value={text}
          onChange={(e) => setText(e.target.value)}
          placeholder="Paste some documentation or notes here..."
          rows={8}
          style={{ width: "100%" }}
        />
        <button onClick={ingest} style={{ marginTop: 8 }}>Ingest</button>
      </section>

      <section style={{ marginTop: 24 }}>
        <h2>Search</h2>
        <input
          value={q}
          onChange={(e) => setQ(e.target.value)}
          placeholder="Ask a question or type a search query"
          style={{ width: "100%" }}
        />
        <div style={{ display: "flex", gap: 8, marginTop: 8 }}>
          <button onClick={search}>Semantic search</button>
          <button onClick={ask}>Ask (RAG)</button>
        </div>

        <h3 style={{ marginTop: 16 }}>Search results</h3>
        <ul>
          {searchResults.map((r) => (
            <li key={r.id} style={{ marginBottom: 12 }}>
              <div><strong>{r.source ?? "unknown"}</strong> (chunk {r.chunk_index})</div>
              <div style={{ fontFamily: "monospace" }}>distance: {r.distance}</div>
              <div>{r.content}</div>
            </li>
          ))}
        </ul>

        <h3 style={{ marginTop: 16 }}>RAG answer</h3>
        <p>{answer}</p>
      </section>
    </main>
  );
}

Operational tips (what matters in production)

  • Protect ingestion: require authentication/authorization, rate-limit, and validate input.
  • Store metadata: keep fields like URL, title, timestamps, and access control attributes so you can filter results per user/tenant.
  • Tune chunking: chunk size and overlap can significantly affect retrieval quality. Keep chunks coherent (headings + paragraphs) and avoid mixing unrelated topics.
  • Use indexes thoughtfully: pgvector indexing can improve speed at scale, but requires correct operator classes and tuning. Test with your dataset and consult pgvector docs for your version.
  • Add filters: combine vector similarity with SQL filters (e.g., WHERE source = ...) to narrow the search space.
  • Return sources: always return the retrieved chunks (or references) so users can verify the model’s answer.
  • Watch token limits: limit how many chunks you pass to the model and truncate long chunks if needed.
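
On chunk tuning specifically: adding overlap between adjacent chunks helps queries whose answer straddles a chunk boundary. A minimal sliding-window sketch (`chunkWithOverlap` is a name introduced here; the character-based approach mirrors the simple chunker from step 6):

```typescript
// Sliding-window chunking with overlap. `overlap` must be smaller than
// `maxChars`, or the window never advances. Both values are tunable;
// measure retrieval quality on your own content before committing.
function chunkWithOverlap(text: string, maxChars = 1200, overlap = 200): string[] {
  if (overlap >= maxChars) throw new Error("overlap must be < maxChars");
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + maxChars, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlap; // step back so adjacent chunks share context
  }
  return chunks;
}
```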

Common pitfalls to avoid

  • Mismatched vector dimensions: your pgvector column dimension must match the embedding model output dimension.
  • Embedding everything as one blob: large documents reduce retrieval precision; chunk instead.
  • Leaking secrets to the client: keep OPENAI_API_KEY and DATABASE_URL on the server only.
  • Assuming the model is always correct: RAG reduces hallucinations, but you still need source display and conservative prompting.
  • Skipping evaluation: test with real queries and track whether top-k retrieval actually contains the answer.
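
For the dimension-mismatch pitfall in particular, a cheap guard before each INSERT fails fast with a clear error instead of a cryptic Postgres one. A sketch (`assertDim` and `EXPECTED_DIM` are names introduced here; 1536 matches the `vector(1536)` column from step 3):

```typescript
// Fail fast if an embedding's length doesn't match the pgvector column.
const EXPECTED_DIM = 1536; // must match vector(1536) in the schema

function assertDim(embedding: number[]): number[] {
  if (embedding.length !== EXPECTED_DIM) {
    throw new Error(
      `Embedding has ${embedding.length} dims, expected ${EXPECTED_DIM}`
    );
  }
  return embedding;
}
```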

Next steps

Once the basics are working, consider adding document loaders (Markdown, HTML, PDFs), background ingestion jobs, per-tenant access control, and an evaluation harness to measure retrieval quality. With those pieces in place, you’ll have a robust semantic search and RAG foundation inside your Next.js app—powered by OpenAI embeddings and pgvector.

Last Updated 1/14/2026
Tags: Next.js semantic search · OpenAI embeddings Next.js · RAG with pgvector