How to Auto-Categorize WordPress Posts Using LLM APIs (2026)

⚡ The Fast Version
If you want to automatically organize your WordPress blog posts using AI, stop asking for category “names” and start asking for category “IDs.” Instead of asking an AI to guess a category name like “Health,” give it your existing list of WordPress category IDs (like 1, 5, or 12) and have it pick the best number. Because your script can check that number against the list it sent, the AI can’t sneak in new categories or duplicates that mess up your site’s organization. Once the ID passes that check, the script sends it back through the WordPress REST API, and your posts get filed consistently without manual effort.
Most people treat AI like a creative writer, asking it to “come up with a good category” for a blog post. That’s fine if you’re playing around, but it’s a disaster for a professional website. Give an AI total freedom and it will eventually start improvising: it might label a post “SEO” one day and “Search Engine Optimization” the next. Now, instead of a clean website, you have a messy “category soup” that confuses your readers and can hurt your search rankings.
The “pro” way to handle this is to treat the AI like a librarian, not a poet. You give the librarian a specific list of shelves (your WordPress Category IDs) and tell them, “Pick the one number where this book belongs.” No guessing, no nicknames—just a solid, data-driven choice. This keeps your site architecture lean and your editorial process bulletproof.
What “Auto Categorizing” Actually Means for Beginners
Think of auto-categorizing WordPress posts as a digital sorting machine. It looks at your article’s title and text, compares it to the “folders” (categories) you already have in WordPress, and files it away. It does this using the WordPress REST API, a built-in, documented interface that lets other apps read and update your site’s content. Through this interface, a script can pull your content out, let the AI “read” it, and then go back in to check the right box for you. Reading categories is public, but updating a post requires logging in, usually with an Application Password.
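Here is a minimal sketch of the “let the AI read it” part: turning a post fetched from `GET /wp-json/wp/v2/posts/<id>` into plain text for the prompt. It assumes the standard REST response shape, where the title and body arrive as HTML under `title.rendered` and `content.rendered`. The helper name `post_to_text` and the 4,000-character cap are illustrative choices, not part of WordPress.

```python
import html
import re

def post_to_text(post, max_chars=4000):
    """Turn a REST API post object into compact plain text for an LLM prompt."""
    # WordPress returns rendered HTML for both the title and the body
    title = post["title"]["rendered"]
    body = post["content"]["rendered"]
    # Crude tag stripping: fine for classification, not for display
    text = re.sub(r"<[^>]+>", " ", f"{title}\n{body}")
    # Decode entities such as &amp; and &#8217; after the tags are gone
    text = html.unescape(text)
    # Collapse runs of whitespace so the prompt stays short
    text = re.sub(r"\s+", " ", text).strip()
    # Cap the length (an assumed limit) to keep token costs predictable
    return text[:max_chars]

sample = {
    "title": {"rendered": "How to Bake Sourdough"},
    "content": {"rendered": "<p>Feed your starter &amp; wait.</p>"},
}
print(post_to_text(sample))  # How to Bake Sourdough Feed your starter & wait.
```

The title and opening paragraphs usually carry enough signal for categorization, which is why truncating long posts is a reasonable trade-off.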
The 5-Step Sorting Guide
| Step | Action | What Happens |
|---|---|---|
| 1 | Fetch Categories | You show the AI exactly which “folders” exist. |
| 2 | Scan the Post | The AI reads your headline and story. |
| 3 | Pick the ID | The AI chooses a specific number, not a word. |
| 4 | Double Check | The script makes sure that number is actually valid. |
| 5 | Update Site | The post is officially filed in WordPress. |
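The five steps map onto one small function. This sketch assumes you supply three functions of your own (fetching categories, asking the LLM, and updating the post). Passing them in as arguments is an illustrative design choice that lets you test the logic with fake stand-ins before touching a live site; the name `categorize_post` is hypothetical.

```python
from typing import Callable, Optional

def categorize_post(
    post_id: int,
    post_text: str,
    fetch_categories: Callable[[], list],
    ask_llm: Callable[[str, list], str],
    update_post: Callable[[int, int], None],
) -> Optional[int]:
    """Run the five steps for one post; return the filed ID, or None for human review."""
    # Step 1: fetch the live categories, each a dict with "id" and "name"
    categories = fetch_categories()
    # Steps 2 and 3: the LLM reads the post and replies with (hopefully) an ID
    reply = ask_llm(post_text, categories).strip()
    # Step 4: double check that the reply is a number we actually offered
    valid_ids = {c["id"] for c in categories}
    if not reply.isdecimal() or int(reply) not in valid_ids:
        return None  # leave the post alone so a human can review it
    # Step 5: file the post in WordPress
    update_post(post_id, int(reply))
    return int(reply)

# Try it with fake stand-ins instead of live WordPress and LLM calls
cats = [{"id": 5, "name": "Baking"}, {"id": 12, "name": "SEO"}]
updates = []
result = categorize_post(
    123,
    "How to bake sourdough bread",
    fetch_categories=lambda: cats,
    ask_llm=lambda text, options: "5",
    update_post=lambda post_id, cat_id: updates.append((post_id, cat_id)),
)
print(result, updates)  # 5 [(123, 5)]
```

If the fake LLM returned “99” or “Baking” instead, the function would return `None` and never call `update_post`.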
Why Numbers (IDs) Beat Words (Names)
Words are messy. One person writes “E-commerce,” another writes “Ecommerce.” Computers, however, love numbers. In WordPress, every category has a unique ID number that never changes. Even if you rename “Marketing” to “Advertising,” the ID stays the same. By forcing your AI to return an ID, you remove the risk of “labels that don’t match.” This is a reliable way to stop taxonomy drift, that slow slide into a disorganized website where old posts get harder and harder to find.
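If your site already has some drift, a quick audit can surface it. This minimal sketch groups category names that collapse to the same lowercase letters and digits, which catches “E-commerce” vs “Ecommerce” but not “SEO” vs “Search Engine Optimization”; that kind of duplicate still needs a human eye. The helper name `find_near_duplicates` is hypothetical, and the normalization assumes ASCII names (accented letters are simply dropped).

```python
import re
from collections import defaultdict

def find_near_duplicates(names):
    """Group category names that differ only by case, spacing, or punctuation."""
    groups = defaultdict(list)
    for name in names:
        # "E-commerce", "Ecommerce" and "e commerce" all normalize to "ecommerce"
        key = re.sub(r"[^a-z0-9]", "", name.lower())
        groups[key].append(name)
    # Keep only the keys that more than one name collapsed into
    return [group for group in groups.values() if len(group) > 1]

print(find_near_duplicates(["E-commerce", "Ecommerce", "Marketing", "SEO"]))
# [['E-commerce', 'Ecommerce']]
```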
A Beginner-Friendly Python Solution
The following script is a simplified “bridge” between your site and the AI. It’s designed to be strict: it fetches your real categories from WordPress so the AI isn’t guessing based on old data, and it accepts the AI’s answer only if it is one of the category IDs it just sent. Anything else gets flagged for review instead of filed, keeping your site’s “filing cabinet” organized.

```python
import os
import re
import requests
from openai import OpenAI

# 1. Setup your connection details (environment variables keep secrets out of the script)
SITE_URL = "https://yourwebsite.com"
USER = os.environ.get("WP_USER", "admin")
PASSWORD = os.environ.get("WP_APP_PASSWORD", "your-app-password")
API_KEY = os.environ.get("OPENAI_API_KEY", "your-openai-key")
client = OpenAI(api_key=API_KEY)

def get_live_categories():
    # Ask WordPress for the current list of folders (categories).
    # WordPress returns only 10 per page by default, so ask for the maximum of 100.
    response = requests.get(
        f"{SITE_URL}/wp-json/wp/v2/categories",
        params={"per_page": 100},
        auth=(USER, PASSWORD),
        timeout=30,
    )
    response.raise_for_status()
    return [{"id": c["id"], "name": c["name"]} for c in response.json()]

def pick_best_category(post_text, cat_list):
    # Turn the list of categories into lines the AI can read
    options = "\n".join(f"ID {item['id']}: {item['name']}" for item in cat_list)
    # The "instruction manual" for the AI
    prompt = (
        f"Post:\n{post_text}\n\nCategories:\n{options}\n\n"
        "Pick the one best category. Return ONLY its ID number. If none fit, return 0."
    )
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    reply = result.choices[0].message.content.strip()
    # Double check: accept the answer only if it is an ID we actually offered
    match = re.fullmatch(r"(?:ID\s*)?(\d+)", reply)
    chosen = int(match.group(1)) if match else None
    return chosen if chosen in {item["id"] for item in cat_list} else None

def update_wordpress(post_id, chosen_id):
    # Go back to WordPress and check the box for the chosen category
    response = requests.post(
        f"{SITE_URL}/wp-json/wp/v2/posts/{post_id}",
        auth=(USER, PASSWORD),
        json={"categories": [chosen_id]},
        timeout=30,
    )
    response.raise_for_status()

# Run the process
all_cats = get_live_categories()
best_id = pick_best_category("How to bake a sourdough bread", all_cats)
if best_id is None:
    print("No valid category ID returned; flag post 123 for human review.")
else:
    update_wordpress(123, best_id)
    print(f"Success! Post 123 is now in category {best_id}.")
```
Common Pitfalls to Avoid
The “Improvisation” Trap: Never ask an AI to “create a category if one doesn’t fit.” This is how you end up with 500 categories for 500 posts. If none of your existing categories fit, the AI should flag the post for a human to review. Automation should follow your rules, not write new ones.
The “Stale Data” Problem: Some people hard-code their categories into their scripts. Don’t do this. If you delete a category in WordPress but forget to update your script, the AI will try to file posts into a folder that doesn’t exist. Always pull a “fresh” list from WordPress before running your AI sorting machine.
💡 Pro Tip
If you have a massive website with hundreds of categories, don’t show all of them to the AI at once. If you already know a post belongs under “Recipes,” show the AI only your food-related subcategories. Fewer choices usually make the AI’s pick more accurate, and the shorter prompt is faster and cheaper to process.
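WordPress stores the hierarchy in each category’s `parent` field (0 for top-level categories), so narrowing to one branch can be done on the list you already fetched, as long as you keep `parent` from the API response. This is a minimal sketch assuming you already know the branch’s root ID (for example, “Food”) from an earlier step; `descendants` is a hypothetical helper. The categories endpoint’s `?parent=` filter returns only direct children, which is why this walks the whole subtree locally.

```python
def descendants(categories, root_id):
    """Return every category nested anywhere under root_id (excluding root_id itself)."""
    # Index children by their parent ID
    children = {}
    for cat in categories:
        children.setdefault(cat["parent"], []).append(cat)
    # Depth-first walk from the root down through every level
    result, stack = [], [root_id]
    while stack:
        for child in children.get(stack.pop(), []):
            result.append(child)
            stack.append(child["id"])
    return result

cats = [
    {"id": 2, "name": "Food", "parent": 0},
    {"id": 7, "name": "Recipes", "parent": 2},
    {"id": 9, "name": "Bread", "parent": 7},
    {"id": 4, "name": "Tech", "parent": 0},
]
print([c["name"] for c in descendants(cats, 2)])  # ['Recipes', 'Bread']
```

Only the returned subset (here, “Recipes” and “Bread”) would then go into the prompt.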
Wrapping Up: System vs. Vibes
Successful WordPress auto-categorization isn’t about having the smartest AI; it’s about having the best system. If your “filing cabinet” (taxonomy) is messy, no amount of AI can fix it. But if you have clear categories and a script that enforces your rules using ID numbers, your website will stay organized with very little manual work: a human only steps in when a post doesn’t fit. Stop relying on “AI vibes” and start building a predictable, data-driven workflow that respects your site’s structure.
Auto-Categorizing WordPress Posts with LLM APIs: FAQ
Why should I pass WordPress category IDs to the LLM instead of category names?
Because names are ambiguous and models improvise. If you ask an LLM to return a category name, it may return “SEO” one time, “Search Engine Optimization” the next, and “SEO Tips” the time after, and none of those may match your actual taxonomy. IDs are stable and exact, and because your script knows every valid ID, an invented one is easy to catch and reject. Always fetch your live category list from /wp-json/wp/v2/categories before each run and pass the ID-to-name map directly in the prompt.
What happens if no category fits the post?
Flag it for human review — never let the LLM create a new category automatically. Allowing auto-creation is how sites end up with hundreds of one-post categories that fracture their taxonomy and confuse both users and search engines. The script should validate that the returned ID is in the list it sent. If it is not, set the post to draft and send an alert rather than filing it incorrectly.
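The validate-or-draft rule fits in a few lines. This sketch builds the JSON body you would send to `POST /wp-json/wp/v2/posts/<id>`; `build_update` is a hypothetical helper, and the alert itself (email, chat message, and so on) is left out because it depends on your setup.

```python
def build_update(reply, valid_ids):
    """Return (JSON body for the post update, needs_review flag) for an LLM reply."""
    reply = reply.strip()
    # isdecimal() guarantees int() will succeed on the string
    if reply.isdecimal() and int(reply) in valid_ids:
        # A known ID: file the post and leave its status alone
        return {"categories": [int(reply)]}, False
    # Anything else (a name, an invented ID, an empty reply): park it as a draft
    return {"status": "draft"}, True

valid = {1, 5, 12}
print(build_update("5", valid))       # ({'categories': [5]}, False)
print(build_update("42", valid))      # ({'status': 'draft'}, True)
print(build_update("Health", valid))  # ({'status': 'draft'}, True)
```

Note that setting `status` to `draft` unpublishes a post that is already live; for live posts you may prefer to leave the status alone and only send the alert.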
Should I hard-code my category list in the script or fetch it fresh each time?
Always fetch fresh. Hard-coded lists become stale the moment you add, rename, or delete a category in WordPress. A fresh API call is cheap (one request per 100 categories) and guarantees the LLM is working from your actual current taxonomy. Call /wp-json/wp/v2/categories?per_page=100 at the start of every automation run. Note that 100 is the maximum page size, so if you have more categories than that, follow the X-WP-TotalPages response header and request each page.
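Because `per_page` tops out at 100, a site with more categories needs to walk the pages. WordPress reports the page count in the `X-WP-TotalPages` response header, so a sketch can loop until it has fetched them all. The name `fetch_all_categories` is hypothetical, and the `get` parameter is an illustrative addition: it defaults to `requests.get` but lets you pass a stub for offline testing.

```python
def fetch_all_categories(site_url, auth=None, get=None):
    """Fetch every category by following WordPress's X-WP-TotalPages header."""
    if get is None:
        import requests  # imported lazily so a stub `get` works without requests installed
        get = requests.get
    categories, page, total_pages = [], 1, 1
    while page <= total_pages:
        response = get(
            f"{site_url}/wp-json/wp/v2/categories",
            params={"per_page": 100, "page": page},  # 100 is the largest page allowed
            auth=auth,
            timeout=30,
        )
        response.raise_for_status()
        # The total page count comes back as a response header
        total_pages = int(response.headers.get("X-WP-TotalPages", 1))
        categories.extend(
            {"id": c["id"], "name": c["name"], "parent": c["parent"]}
            for c in response.json()
        )
        page += 1
    return categories
```

In a live script you would call it as `fetch_all_categories("https://yourwebsite.com", auth=(USER, PASSWORD))`.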
How do I handle sites with hundreds of categories?
Narrow the choices before passing them to the LLM. Use a two-step approach: first, a lightweight classifier or keyword match narrows the full list to a relevant subset (e.g., 5 to 10 candidates). Then pass only that subset to the LLM for final selection. Sending 300 category names in a prompt wastes tokens, increases latency, and can reduce classification accuracy because the model has to sift through too much noise.
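Here is a minimal sketch of the keyword-match option for the first step: rank categories by how many words their names share with the post and keep the top `k`. It assumes categories are dicts with `id` and `name`, and `shortlist` is a hypothetical helper. Plain word overlap does no stemming (“recipe” will not match “recipes”), so treat an empty shortlist as a signal to fall back to the full list or to human review.

```python
import re

def shortlist(post_text, categories, k=10):
    """Keep the k categories whose names share the most words with the post."""
    post_words = set(re.findall(r"[a-z0-9]+", post_text.lower()))
    scored = []
    for cat in categories:
        cat_words = set(re.findall(r"[a-z0-9]+", cat["name"].lower()))
        overlap = len(post_words & cat_words)
        if overlap:  # skip categories with no words in common
            scored.append((overlap, cat))
    # Most overlap first; list.sort is stable, so ties keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cat for _, cat in scored[:k]]

cats = [
    {"id": 3, "name": "Bread Recipes"},
    {"id": 8, "name": "SEO"},
    {"id": 11, "name": "Dessert Recipes"},
]
print([c["id"] for c in shortlist("Easy sourdough bread recipes for beginners", cats, k=5)])
# [3, 11]
```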
Which LLM models work best for category classification?
GPT-4o-mini and Claude Haiku are good fits for production categorization workflows: both are fast, cheap, and generally accurate at structured classification tasks when given explicit instructions and a constrained output format (return only the ID number). Large reasoning models are usually unnecessary here; when the input and output are both well-constrained, their added latency and cost rarely buy a meaningful accuracy improvement over smaller models.


