
Rate Limiting: Stop Bots From Draining Your API Credits

Benji · 5 min read

The short version

Rate limiting restricts how many requests a client can make to your API within a given time period. Without it, anyone can call your endpoints as fast as their connection allows, which means bots scraping your data, attackers brute forcing passwords, or someone running up your API bill with a simple while(true) loop.

It's one of those things that never appears in the MVP but costs real money when it's missing.

The cost of not having it

Here's what happens to vibe coded apps without rate limiting:

A developer ships an AI wrapper app. It takes user input, sends it to the OpenAI API, and returns the response. Clean UI, works great. They share it on Twitter. Someone finds the endpoint, writes a script, and makes 50,000 requests overnight. The developer wakes up to a $2,400 OpenAI bill.
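The arithmetic behind that bill is mundane. At roughly $0.048 per request (an illustrative average; real gpt-4 cost depends on the model and token counts), the numbers work out:

```javascript
// Back-of-the-envelope for the overnight-abuse scenario above.
// $0.048/request is an assumed average, not a quoted OpenAI price.
const requests = 50_000;
const costPerRequest = 0.048;
const bill = requests * costPerRequest;
console.log(`Overnight bill: $${bill.toFixed(0)}`); // → Overnight bill: $2400
```

Nothing about the attack is sophisticated; the only variable the attacker controls is request count, and without a limiter that count is unbounded.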

This isn't hypothetical. It happens regularly. And it's not just AI credits. Without rate limiting, bots can scrape your entire dataset, attackers can brute force login endpoints, and anyone can burn through whatever metered resource your API touches: email sends, database reads, third-party API calls.

Why AI tools never add it

Ask an AI to build you an API route and you'll get something like this:

// AI-generated Next.js API route - no rate limiting
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function POST(request) {
  const { prompt } = await request.json();
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
  });
  return Response.json({ result: response.choices[0].message.content });
}

Functional. Zero protection. Anyone can call this endpoint unlimited times. The AI generated code for the feature, not for the real world.

How to implement it in Next.js

Option 1: In memory (development and single server)

For simple cases, a Map-based approach works:

// lib/rate-limit.js
const rateLimit = new Map();

export function checkRateLimit(ip, limit = 10, windowMs = 60000) {
  const now = Date.now();
  const windowStart = now - windowMs;

  if (!rateLimit.has(ip)) {
    rateLimit.set(ip, []);
  }

  const timestamps = rateLimit.get(ip).filter(t => t > windowStart);
  rateLimit.set(ip, timestamps);

  if (timestamps.length >= limit) {
    return { allowed: false, remaining: 0 };
  }

  timestamps.push(now);
  return { allowed: true, remaining: limit - timestamps.length };
}

// app/api/generate/route.js
import { checkRateLimit } from '@/lib/rate-limit';

export async function POST(request) {
  // x-forwarded-for may be a comma-separated chain; the first entry is the client
  const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim() || 'unknown';
  const { allowed, remaining } = checkRateLimit(ip, 10, 60000);

  if (!allowed) {
    return Response.json(
      { error: 'Too many requests. Try again in a minute.' },
      { status: 429 }
    );
  }

  // Your actual logic here
}

This works for a single server instance. It resets when the server restarts, and it doesn't work across multiple serverless function instances.
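A quick sanity check of the limiter above, run as a plain Node script with a made-up IP: the first ten calls inside the window pass, the eleventh is rejected.

```javascript
// Same Map-based limiter as above, exercised outside Next.js.
const rateLimit = new Map();

function checkRateLimit(ip, limit = 10, windowMs = 60000) {
  const now = Date.now();
  const windowStart = now - windowMs;

  if (!rateLimit.has(ip)) {
    rateLimit.set(ip, []);
  }

  // Drop timestamps that have aged out of the window, then count what's left.
  const timestamps = rateLimit.get(ip).filter(t => t > windowStart);
  rateLimit.set(ip, timestamps);

  if (timestamps.length >= limit) {
    return { allowed: false, remaining: 0 };
  }

  timestamps.push(now);
  return { allowed: true, remaining: limit - timestamps.length };
}

// 11 rapid calls from one IP: 10 allowed, then blocked.
const results = [];
for (let i = 0; i < 11; i++) {
  results.push(checkRateLimit('203.0.113.7', 10, 60000).allowed);
}
console.log(results.join(' ')); // ten "true", then one "false"
```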

Option 2: Upstash Redis (production, serverless)

For production apps on Vercel or any serverless platform, Upstash Redis is a popular choice: it's reachable over HTTP (no persistent connections, which serverless functions can't hold), and the @upstash/ratelimit package is built for exactly this:

import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '60 s'), // 10 requests per 60 seconds
});

export async function POST(request) {
  // x-forwarded-for may be a comma-separated chain; the first entry is the client
  const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim() || 'unknown';
  const { success, remaining } = await ratelimit.limit(ip);

  if (!success) {
    return Response.json(
      { error: 'Rate limit exceeded.' },
      { status: 429, headers: { 'X-RateLimit-Remaining': remaining.toString() } }
    );
  }

  // Your actual logic here
}

This works across all serverless instances because the state lives in Redis, not in memory.
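Both routes key on x-forwarded-for. Behind a proxy or CDN that header can be a comma-separated chain like client, proxy1, proxy2, and the left-most entry is the original client. A sketch of a small helper that handles both cases (the name clientIp is ours, not a library API):

```javascript
// Extract the original client IP from an x-forwarded-for chain.
// Falls back to 'unknown' when the header is absent.
// In a real app you'd export this from lib/ and import it in each route.
function clientIp(request) {
  const forwarded = request.headers.get('x-forwarded-for');
  return forwarded ? forwarded.split(',')[0].trim() : 'unknown';
}

// Works with anything exposing headers.get(), e.g. a Fetch API Request.
const fake = { headers: new Map([['x-forwarded-for', '203.0.113.7, 10.0.0.1']]) };
console.log(clientIp(fake)); // → 203.0.113.7
```

Keep in mind x-forwarded-for is client-controllable unless your proxy overwrites it, so it's a best-effort identifier, not an authentication mechanism.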

What to rate limit

Not every endpoint needs the same limits. Think about cost and risk:

Auth endpoints (login, signup, password reset): strict limits. These are the classic brute force targets.
AI, email, and anything metered: strict limits. Every request costs you real money.
Public read endpoints: looser limits that stop bulk scraping without bothering legitimate users.
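One way to make those tiers concrete is a per-route table your middleware consults (route names and numbers here are illustrative, not recommendations):

```javascript
// Illustrative per-route limits: strict where requests cost money or
// guard credentials, loose where they're cheap reads.
const ROUTE_LIMITS = {
  '/api/generate': { limit: 5, windowMs: 60_000 },       // calls a paid AI API
  '/api/login': { limit: 10, windowMs: 15 * 60_000 },    // brute force target
  '/api/posts': { limit: 100, windowMs: 60_000 },        // cheap public reads
};

function limitsFor(pathname) {
  // Default tier for anything not listed explicitly.
  return ROUTE_LIMITS[pathname] ?? { limit: 60, windowMs: 60_000 };
}

console.log(limitsFor('/api/generate')); // → { limit: 5, windowMs: 60000 }
```

Centralizing the table means tightening a limit is a one-line change instead of a hunt through every route file.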

Response headers

Good rate limiting tells the client what's happening. Include these headers:

X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 1711100060
Retry-After: 45

This lets legitimate clients back off gracefully instead of hammering your endpoint and getting blocked.
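A sketch of a 429 response carrying those headers, assuming your limiter hands you a limit, a remaining count, and a reset time in epoch seconds (the function name and parameters are ours; Response is global on Node 18+ and in Next.js route handlers):

```javascript
// Build a 429 carrying the standard rate limit headers from the values
// a limiter typically returns.
function tooManyRequests(limit, remaining, reset) {
  // Retry-After is relative seconds; X-RateLimit-Reset is an absolute timestamp.
  const retryAfter = Math.max(0, reset - Math.floor(Date.now() / 1000));
  return Response.json(
    { error: 'Rate limit exceeded.' },
    {
      status: 429,
      headers: {
        'X-RateLimit-Limit': String(limit),
        'X-RateLimit-Remaining': String(remaining),
        'X-RateLimit-Reset': String(reset),
        'Retry-After': String(retryAfter),
      },
    }
  );
}

const res = tooManyRequests(10, 0, Math.floor(Date.now() / 1000) + 45);
console.log(res.status, res.headers.get('Retry-After')); // 429 and ~45
```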

The bigger picture

Rate limiting is part of a broader pattern: never trust the client. Your API is public, whether you intended it to be or not. Anyone can open browser DevTools, find your endpoints, and script requests against them.

Rate limiting won't stop a determined attacker, but it makes casual abuse expensive and slow. Combined with authentication, input validation, and monitoring, it's a basic hygiene layer that every production app needs.

Check your site

Want to know if your API endpoints are exposed? Scan it now and find out in 60 seconds.

Frequently Asked Questions

What is rate limiting in simple terms?
Rate limiting means capping how many requests a user or IP address can make to your API within a time window. For example, 100 requests per minute. Once they hit the limit, further requests are rejected with a 429 status code until the window resets.
Why don't AI coding tools add rate limiting automatically?
AI tools generate code that works for the happy path. Rate limiting is a defensive concern that doesn't affect functionality in demos or development. The AI doesn't think about what happens when someone writes a script to call your endpoint 10,000 times.
Can I rate limit on Vercel or serverless platforms?
Yes. The usual approach is Upstash Redis with the @upstash/ratelimit package, which is designed for serverless: limit state lives in Redis reachable over HTTP, so it works across all function instances and regions. Vercel also offers platform-level rate limiting through its firewall rules.

Your AI writes the code. We find what it missed.

Paste your URL. Security audit in 60 seconds.

Scan my app