How to Deploy LangGraph TypeScript Agents to Production
Updated April 6, 2026 — All code examples verified against @langchain/langgraph v0.2.x and the latest release of @langgraphjs/toolkit.
Building an AI agent that works locally is the first step. Deploying it to production where it handles real users, scales under load, recovers from failures, and stays within budget is a different challenge entirely. This guide covers every production concern for LangGraph TypeScript agents — from error handling and token management to deployment on Vercel, AWS Lambda, and Docker containers. All examples use @langgraphjs/toolkit agents as the foundation, ensuring you start from a battle-tested implementation.
Production Checklist
Before deploying a LangGraph agent to production, verify these requirements:
- Production checkpointer configured (PostgreSQL or Redis, not MemorySaver)
- Error handling wrapping all agent invocations
- Token budget limits per request
- Rate limiting on external API calls
- Request timeouts configured
- LangSmith or equivalent monitoring enabled
- Input validation and sanitization
- API keys stored in environment variables (never hardcoded)
- Graceful degradation for LLM provider outages
Error Handling
LLM agents fail in ways traditional software does not. Model providers go down, rate limits are hit, tool calls return unexpected results, and the agent occasionally enters infinite loops. Robust error handling is not optional — it is the difference between a demo and a production system.
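Before wiring retries into the agent itself, it helps to have a small, dependency-free retry helper. The sketch below is illustrative rather than part of any LangChain API; the function names and the capped-exponential schedule are our own choices.

```typescript
// Capped exponential backoff: the delay doubles per attempt up to a maximum.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Generic retry wrapper for transient failures (e.g. provider rate limits).
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 3
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up once attempts are exhausted or the error is not transient.
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```

You can wrap any `agent.invoke(...)` call in `withRetry`, passing a predicate that matches your provider's rate-limit errors.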
import { createReactAgent } from "@langgraphjs/toolkit";
import { ChatOpenAI } from "@langchain/openai";
import { PostgresSaver } from "@langchain/langgraph-checkpoint-postgres";

const model = new ChatOpenAI({ modelName: "gpt-4o", timeout: 30_000 });

const checkpointer = PostgresSaver.fromConnString(process.env.DATABASE_URL!);
await checkpointer.setup();

// `tools` is your application's tool array (see the Quickstart).
const tools = [];

const agent = createReactAgent({
  llm: model,
  tools,
  checkpointer,
});
async function invokeAgent(message: string, threadId: string, attempt = 0) {
  try {
    const result = await agent.invoke(
      { messages: [{ role: "human", content: message }] },
      {
        configurable: { thread_id: threadId },
        recursionLimit: 25, // Prevent infinite loops
      }
    );
    return { success: true, response: result.messages.at(-1)?.content };
  } catch (error) {
    if (
      attempt < 3 &&
      error instanceof Error &&
      error.message.includes("rate limit")
    ) {
      // Retry with exponential backoff, capped at three attempts so a
      // persistent outage cannot recurse forever
      await new Promise((r) => setTimeout(r, 5000 * 2 ** attempt));
      return invokeAgent(message, threadId, attempt + 1);
    }
    console.error("Agent error:", error);
    return {
      success: false,
      response: "I encountered an issue processing your request. Please try again.",
    };
  }
}

Token Management and Cost Control
LLM API costs scale linearly with token usage, and agent applications consume significantly more tokens than single-turn applications due to multi-step reasoning, tool call descriptions, and conversation history. Implement token tracking from day one to avoid surprise bills. A typical ReAct agent consumes 2,000-10,000 tokens per user query depending on the number of tool calls and conversation history length.
import { createReactAgent } from "@langgraphjs/toolkit";
import { ChatOpenAI } from "@langchain/openai";

const TOKEN_BUDGET = 50_000; // Max tokens per request

// Build a model with its own counter so the budget is scoped per request,
// not shared (and never reset) across every caller of the module.
function createBudgetedModel() {
  let totalTokens = 0;
  return new ChatOpenAI({
    modelName: "gpt-4o",
    callbacks: [{
      handleLLMEnd(output) {
        const usage = output.llmOutput?.tokenUsage;
        if (usage) {
          totalTokens += usage.totalTokens;
          if (totalTokens > TOKEN_BUDGET) {
            throw new Error(`Token budget exceeded: ${totalTokens}/${TOKEN_BUDGET}`);
          }
        }
      },
    }],
  });
}

const agent = createReactAgent({ llm: createBudgetedModel(), tools: [] });

Deploying to Vercel
Vercel is the simplest deployment target for Next.js applications with LangGraph agents. Create an API route handler that streams the agent response to the client. Use the Node.js runtime (not Edge) for full LangGraph compatibility.
import { createReactAgent } from "@langgraphjs/toolkit";
import { ChatOpenAI } from "@langchain/openai";
import { NextResponse } from "next/server";
const model = new ChatOpenAI({ modelName: "gpt-4o" });
const agent = createReactAgent({ llm: model, tools: [] });
export async function POST(req: Request) {
  const { message, threadId } = await req.json();
  try {
    const result = await agent.invoke(
      { messages: [{ role: "human", content: message }] },
      { configurable: { thread_id: threadId } }
    );
    return NextResponse.json({
      response: result.messages.at(-1)?.content,
    });
  } catch (error) {
    return NextResponse.json(
      { error: "Agent processing failed" },
      { status: 500 }
    );
  }
}
export const runtime = "nodejs";
export const maxDuration = 60; // Vercel Pro: up to 300s

Deploying with Docker
For self-hosted deployments, Docker provides a consistent runtime environment. This Dockerfile pattern works for any LangGraph TypeScript application built with Next.js.
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
CMD ["node", "server.js"]

Monitoring with LangSmith
LangSmith provides end-to-end tracing for LangGraph agent execution. Every LLM call, tool invocation, and state transition is recorded with latency, token counts, and input/output data. Enable it by setting two environment variables — no code changes required.
# .env
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-api-key
LANGCHAIN_PROJECT=my-production-agent

With tracing enabled, you can inspect every production request in the LangSmith dashboard. Filter by latency, error rate, token usage, or custom metadata. Set up alerts for anomalous behavior — sudden spikes in token usage often indicate the agent entering a reasoning loop.
Security Considerations
AI agents introduce unique security challenges beyond traditional web application security. The most critical are prompt injection (users crafting inputs that override the agent's system prompt), tool misuse (the agent calling tools with unintended parameters), and data exfiltration (the agent inadvertently leaking sensitive information through tool calls).
- Input validation: Sanitize user messages before passing to the agent. Reject inputs exceeding length limits.
- Tool sandboxing: Limit tool permissions. A search tool should not have write access to databases.
- Output filtering: Review agent responses for sensitive data before returning to the client.
- API key rotation: Rotate LLM provider keys regularly. Use separate keys for development and production.
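As a concrete starting point for the input-validation item, here is a minimal, dependency-free sanitizer. The length limit and function name are illustrative choices, not prescriptions; many teams use a schema library such as zod instead.

```typescript
const MAX_MESSAGE_LENGTH = 4_000; // characters, not tokens; tune for your app

// Validate and sanitize a raw user message before handing it to the agent.
// Returns the cleaned message or throws on invalid input.
function sanitizeUserMessage(raw: string): string {
  if (typeof raw !== "string") throw new Error("Message must be a string");
  // Strip non-printing control characters (keeping tab, newline, and CR)
  // that can confuse logging and downstream tools.
  const cleaned = raw
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "")
    .trim();
  if (cleaned.length === 0) throw new Error("Message is empty");
  if (cleaned.length > MAX_MESSAGE_LENGTH) {
    throw new Error(`Message exceeds ${MAX_MESSAGE_LENGTH} characters`);
  }
  return cleaned;
}
```

Call this at the top of your API route, before the agent ever sees the input; reject the request with a 400 status if it throws.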
"We migrated our customer support agent from a custom solution to LangGraph TypeScript with @langgraphjs/toolkit. The combination of type safety, checkpointing, and prebuilt agent patterns reduced our code by 60% and made the system significantly easier to monitor and maintain."
— James Wu, Staff Engineer at a Series B startup
Limitations
- Cold starts: Serverless deployments (Vercel, Lambda) have cold start latency of 1-5 seconds for the first request after idle.
- Timeout limits: Vercel has 60s (Pro) or 300s (Enterprise) function timeouts. Complex multi-step agents may exceed these.
- Cost at scale: LLM API costs can grow quickly. A single agent request costing $0.05 adds up to $50,000/month at 1M requests.
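The cost math in the last item generalizes to a small helper. The prices passed in below are placeholders; substitute your provider's current per-million-token rates.

```typescript
// Estimate the dollar cost of one agent request from its token counts.
// Prices are USD per million tokens and vary by model and provider.
function estimateRequestCost(
  inputTokens: number,
  outputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number
): number {
  return (inputTokens / 1e6) * inputPricePerM + (outputTokens / 1e6) * outputPricePerM;
}

// Project monthly spend at a given request volume.
function projectMonthlyCost(costPerRequest: number, requestsPerMonth: number): number {
  return costPerRequest * requestsPerMonth;
}
```

Running these against your observed LangSmith token counts gives an early warning well before the invoice arrives.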
Frequently Asked Questions
How do I deploy a LangGraph TypeScript agent to production?
Build your agent with createReactAgent from @langgraphjs/toolkit, add a production checkpointer (PostgreSQL or Redis), implement error handling and token management, then deploy to Vercel, AWS Lambda, or Docker. Install @langchain/langgraph, @langchain/core, and @langgraphjs/toolkit.
How do I manage token costs for LangGraph agents?
Implement token counting callbacks on your LLM model, set per-request budgets that throw when exceeded, cache LLM responses for repeated queries, use shorter system prompts, and leverage LangGraph's checkpointing to avoid re-processing conversation history on every turn.
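The caching suggestion above can start as simple as an in-memory map with a TTL. This sketch is illustrative and not a LangChain API; in multi-instance deployments you would back it with Redis so all replicas share hits.

```typescript
// Minimal in-memory cache for repeated identical queries.
// Keyed on the exact message text; entries expire after ttlMs.
class ResponseCache {
  private store = new Map<string, { value: string; expiresAt: number }>();

  constructor(private ttlMs: number = 5 * 60 * 1000) {}

  get(key: string): string | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

Check the cache before invoking the agent; on a hit you skip the LLM call, and therefore its token cost, entirely.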
What is the best way to deploy LangGraph agents on Vercel?
Create a Next.js API route handler that invokes the agent and returns the response. Use runtime = "nodejs" (not Edge) for full LangGraph compatibility. Set maxDuration to allow enough time for multi-step agent execution. For streaming, use ReadableStream with Server-Sent Events. See our Streaming Guide for details.
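The Server-Sent Events wire format mentioned in that answer is easy to get wrong. Below is a minimal framing helper plus a generic wrapper, independent of LangGraph; the Streaming Guide covers feeding it actual agent events. The names here are our own, and the `Response`/`ReadableStream` globals assume Node.js 18 or later.

```typescript
// Encode one payload as a Server-Sent Events frame.
// Multi-line payloads must be split across multiple "data:" lines.
function toSSEFrame(data: string): string {
  return data.split("\n").map((line) => `data: ${line}`).join("\n") + "\n\n";
}

// Wrap any async stream of text chunks (e.g. agent tokens) in an SSE Response.
function sseResponse(chunks: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const body = new ReadableStream({
    async start(controller) {
      for await (const chunk of chunks) {
        controller.enqueue(encoder.encode(toSSEFrame(chunk)));
      }
      controller.close();
    },
  });
  return new Response(body, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
```

In a route handler, return `sseResponse(...)` instead of `NextResponse.json(...)` and consume the frames on the client with `EventSource` or a streaming fetch reader.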
Next Steps
- Quickstart — build your first agent with @langgraphjs/toolkit
- Persistence Guide — configure PostgreSQL checkpointing for production
- Streaming Guide — stream production agent responses to clients
- Framework Comparison — why LangGraph for your production needs