When Does a Business Need Django Instead of WordPress?

Most Los Angeles businesses do not need Django. Most Los Angeles businesses need WordPress, possibly with a few custom plugins, and that is genuinely the end of the story. WordPress runs 43% of the web for a reason. It is a mature, well-supported content management system that works for brochure sites, blogs, small e-commerce, and most kinds of professional services marketing presence.

But there is a real category of business problem where WordPress stops being the right tool, and trying to force it leads to fragile, slow, hard-to-maintain systems that are more expensive than the alternative would have been. Django is one of those alternatives. This article is the test we use to tell which category a project belongs to — and why getting it right on the first call saves clients tens of thousands of dollars.

What WordPress is genuinely good at

Let us start with the honest part: WordPress is excellent at content publishing. Pages, posts, media library, a familiar admin interface, a large ecosystem of plugins, and a sea of developers who know it. If your business problem is “we need a website where non-technical staff can edit pages and publish blog posts,” the answer is almost always WordPress.

It is also surprisingly good at structured content. With custom post types and custom fields, you can model “events,” “case studies,” “team members,” or “products” cleanly. Most of the small business problems that look like custom applications are actually content modeling exercises that WordPress handles well.

Where WordPress becomes the wrong tool

Five signals consistently push a project out of WordPress territory and into Django (or another web application framework — Django is just the one we use most).

1. The center of gravity is a transactional database, not content. If the heart of your application is a set of tables with relationships, queries, and business rules — orders, appointments, cases, students, members, inventory — and the website is the front door to that data, you are no longer building a website. You are building a web application. WordPress can be bent into doing this, but you are paying for the bending in performance, maintainability, and developer time.

2. The user experience involves real workflow, not just reading content. “Log in, file a request, watch its status change as it moves through three reviewers, get notified when it is approved or declined, see your history” is a workflow. WordPress can do this with enough custom plugins, but those plugins become the most expensive and brittle part of your stack. A Django application that models the workflow directly is dramatically simpler.

3. You need role-based access control beyond editor/contributor/subscriber. WordPress has user roles, but they were designed for editorial workflows. If your business problem involves “users can see their own data but not other users’ data, admins can see all data within their department, super-admins can see everything across departments, auditors can see read-only access to a specific subset” — you are reinventing user permissions on top of WordPress, and it will fight you the whole way.

4. There is real domain logic that needs to be tested. Billing calculations. Eligibility rules. Tax computations. Scheduling logic with constraints. Things that have to be right, that have edge cases, that change over time as the business changes. In Django, these live in plain Python functions that you can write tests for. In WordPress, they tend to live in scattered shortcodes, theme functions, and plugin hooks, and they tend to silently break the next time a plugin updates.

5. You are going to integrate deeply with other systems. A few API calls to Mailchimp and Stripe? WordPress handles that fine with off-the-shelf plugins. A two-way integration with a court filing system, a hospital EMR, a payment processor with reconciliation, and an internal accounting system, all talking to each other through a central application? That is a Django (or equivalent) job. You can technically wire all that into WordPress, but the maintenance burden compounds quickly.

The real-world examples

Here is how this decision plays out for the kinds of LA businesses we work with.

A law firm building a new public marketing site. WordPress, every time. The site is content — practice areas, attorney bios, blog. Editors are non-technical. Plugin ecosystem solves 90% of needs out of the box. Even if you want AI features layered on, you can do that without leaving WordPress.

The same law firm building a client portal for case status, document sharing, and secure messaging. Django, every time. The center of gravity is sensitive data with audit logs, granular permissions, and workflow. WordPress would be a bad fit for the same reasons that made it great for the marketing site.

A nonprofit running an annual public event with a content-heavy promotional site. WordPress, with custom post types for sessions, speakers, sponsors. Possibly headless WordPress for performance if traffic is high.

The same nonprofit running a year-round member portal with directory, dues, voting, and committee management. Django. Member portals are workflow applications wearing a website costume.

A consulting business with a marketing site and a blog. WordPress. Add AI assistants and content automation on top — still WordPress.

The same consulting business building an internal tool to manage 200 active client engagements with phase tracking, time entries, and partner sign-offs. Django. Or any number of off-the-shelf project management tools, honestly, if one fits.

The hybrid that actually works

The most common real-world pattern for growing LA businesses is hybrid: keep your marketing site on WordPress, where editors are happy and content publishing is fast, and run a separate Django application for the workflow piece, on a subdomain. So the marketing site is at yourbusiness.com on WordPress, and the client portal is at portal.yourbusiness.com on Django. They share branding, share authentication if needed, but they are two different applications with two different jobs.

This is genuinely the right answer for most growing businesses. WordPress is best at the things WordPress is good at. Django is best at the things Django is good at. Trying to force one tool to do both jobs is the most expensive form of “saving money.”

What about Next.js and React?

Worth a quick note. Next.js and React are frontend technologies — they handle what the user sees. They are not alternatives to WordPress or Django; they sit in front of either one. A typical modern setup is “WordPress as the CMS, Next.js as the frontend” (headless WordPress) or “Django as the backend, Next.js as the frontend” (Django + decoupled UI). When we talk about WordPress vs Django, we mean the backend question — where your data lives and where your business logic runs. Next.js is a separate decision about how the user-facing layer is rendered.

The cost asymmetry

Here is the practical bottom line. A custom Django application typically costs three to five times more to build than a WordPress site for a comparable scope, because you are writing more code from scratch and there are fewer off-the-shelf parts.

But: a WordPress site forced to do a Django application’s job costs five to ten times what either would have cost done right. The wrong choice compounds — every plugin makes the next plugin more brittle, every workaround creates two more workarounds, and at some point the system reaches the “we need to rebuild it” point.

Picking the right tool for the right problem on the first call is the single most valuable thing a technical partner can do for your business. The honest answer most of the time is WordPress. When it is not, an honest partner will tell you so before you spend a year and forty thousand dollars discovering it.

How to Use AI for Customer Service Without Violating Privacy Expectations

Customer service is the most tempting place to apply AI and the most dangerous one to do it carelessly. The same conversation that delights a customer at 11pm on a Sunday can become a privacy incident if the system is built without thinking. This article is for business owners who want the speed and availability benefits of AI customer service — and want to avoid the obvious traps.

To be clear about what this is not: it is not legal advice. We are not going to claim “fully HIPAA compliant” or “guaranteed to meet CCPA” — those statements would be irresponsible to make in a marketing article. What this is, is the practical engineering perspective on building customer service AI that respects the privacy expectations your customers already have.

The expectation, plainly stated

When someone reaches out to your business, they expect a few quiet things. They expect their question to be read by your team, or at most the kind of automated system that books appointments at the dentist. They expect their data to stay with you. They expect that if they say something sensitive — a medical concern, a legal worry, a financial detail — it does not end up training someone else’s product. And they expect to be able to talk to a human if the situation calls for it.

Most AI customer service systems are built without sitting with those expectations first. The technology is impressive, the privacy posture is an afterthought, and the result is a system that technically works and quietly violates trust.

Where the real risk lives

There are three privacy pressure points in any AI customer service system. Knowing them lets you build defensibly.

The training set. Did the AI learn from real customer conversations? If yes, were they anonymized? Where are those records now? A surprising number of small-business chatbots are trained, fine-tuned, or in continuous learning loops on actual customer messages. That is fine if it is documented and consented to. It is a problem if it is not.

The runtime data flow. Where does the customer’s question go when they hit send? In a typical hosted chatbot product, the text travels from the customer’s browser to the chatbot vendor’s servers, then to OpenAI or Anthropic or Google for the actual model response, then back. Each of those hops is a place where a data-handling decision was made — sometimes by you, sometimes by the vendor’s defaults. Knowing what those defaults are is the entire game.

The records. Where do the conversation transcripts live after the chat ends? In your CRM? In the chatbot vendor’s dashboard, with full content searchable by their staff? On a server in a jurisdiction you would not have chosen? “I don’t know” is the most expensive answer to this question.

The safe-by-default pattern

The pattern we use for clinics, law firms, and other privacy-sensitive Los Angeles businesses has four parts. None of it is exotic — it is just being deliberate about each pressure point.

1. Train only on public information. The AI’s “knowledge” is your website content, your published FAQ, your service descriptions, and the documents your team explicitly hands over. It does not learn from live customer conversations. If a customer mentions a medication or a case number, that information stays in the conversation log — it does not flow back into the model’s training. This alone solves a large fraction of the privacy concern.

2. Use API providers, not learning loops. For the actual response generation, we use commercial AI APIs (OpenAI, Anthropic, Google) with the no-training option turned on. This is a real setting in their enterprise terms — your prompts and responses are not used to train future models. Configuring it correctly takes about ten minutes and removes a major category of risk.

3. Strip what does not need to be there. If a customer pastes their social security number into a chat (people do this), the system should redact it before storing the transcript. If a customer shares medical information on what they think is a general-information chatbot, the system should flag the conversation for human review and not surface it in unsecured logs. Building these redaction and flagging rules upfront is much easier than retrofitting them after an incident.

4. Always offer the human. Every AI customer service surface should make it trivially easy to ask for a person. A clear “talk to a human” button. A phone number. An email address. The most expensive privacy mistakes happen when the AI is the only door and someone gets desperate trying to communicate a sensitive situation.

What customers should be able to do

This is the operational checklist we walk clients through. If your AI customer service can do all of these, you are in a defensible position:

  • A customer can ask for a transcript of their conversation and get it.
  • A customer can ask for their data to be deleted, and someone can actually do it within a reasonable window.
  • A customer can speak to a human within one click and one business day.
  • A customer can find your privacy policy from inside the chat in one click.
  • Your staff can search through past conversations and respect access controls — not everyone on staff sees everything.

None of this is new. It is the same operational hygiene you would build into a paper-based customer service operation. The technology just makes the rules easier to break by accident.

The two architectures that work

There are two reliable architectures for privacy-aware AI customer service. The boring pattern, and the slightly less boring pattern.

The boring pattern: assistant on top of public content. The AI is scoped to answer questions from your public website, FAQ, and service descriptions. If the customer asks something outside that scope — for example, about a specific bill or a specific appointment — the assistant says so and routes to a human. Most businesses can ship this in three to four weeks and it solves 50 to 70% of inbound questions without ever touching private data.

The slightly less boring pattern: scoped customer assistant. The AI is logged into a specific customer’s account context — it can see that customer’s appointments, billing history, or case file (and only theirs) — and answer questions about them. This is more useful and more dangerous. It requires identity verification, careful scoping of what the AI can see, an audit log of every query, and a clear separation between the model context and the data store. We do build these. We are careful about which clients we build them for.

The pattern that does not work: the AI plugged directly into your full customer database with no scoping, no audit, no identity check, and a “we’ll figure out the privacy story later” plan. We have seen this proposed. We have not seen it survive contact with reality.

What this means for your business

For most Los Angeles small and mid-size businesses, the right starting point is the boring pattern — an assistant trained on your public information, with a clear hand-off to humans for anything sensitive. It is the fastest to deploy, the cheapest to maintain, and the easiest to defend if anyone ever asks how your AI works.

You can layer in more capability later, once you have evidence that customers actually want it and your operations can support it. Privacy-aware engineering is not about having less AI. It is about having the right AI, scoped to the right problem, with the right guardrails on day one.

Practical AI Automation for Small Businesses in Los Angeles

Most articles about AI for small businesses are written for a hypothetical company that does not exist — one with a clean dataset, a software team, and a budget to experiment. Real Los Angeles small businesses look different. A two-attorney firm in Westwood. A dental office in Pasadena. A nonprofit in Boyle Heights. A boutique consulting practice in Culver City. None of them have a machine learning team. All of them have repetitive work that someone is doing by hand right now.

This article is for those businesses. What is actually worth automating with AI in 2026, what is not, and where the real return on a few thousand dollars of investment shows up.

Start with what hurts, not what is trendy

The first instinct, when AI tools are everywhere in the press, is to ask “where can we use AI?” That is the wrong question. The right question is the one any small business owner already asks themselves quietly on Friday afternoon: “where am I losing the most time to repetitive work that should not need a human?”

If you make that list — really make it, with numbers, on paper — you will find that the same five categories show up across most small businesses:

  1. Answering the same customer questions over and over
  2. Manually triaging incoming leads and email
  3. Drafting routine content (newsletters, social posts, simple proposals)
  4. Copying data between systems that should talk to each other
  5. Producing reports nobody reads but everybody expects

Every one of those is solvable today with practical AI automation at the small-business price point. The trick is solving them in the right order.

Tier 1: The FAQ assistant

If your team spends more than an hour a day answering the same questions, this is the highest-return AI project you can do. Cost: $2,500 to $5,000 to set up, plus monthly support. Payback: usually within the first quarter for any business with regular inbound questions.

The shape is simple. You give the AI a curated set of your business’s documents — your website content, your FAQ page, your service descriptions, your hours and policies — and it answers questions from that material on your website and via email. When it does not know the answer, it does not guess. It says “let me have someone follow up” and hands off to a real person.

The business value is not just time saved. It is that the AI answers at 11pm on a Sunday when your competitors do not. Most consumer-facing small businesses find that 30 to 50% of their inbound questions arrive outside business hours.

Tier 2: Lead triage and routing

If you get more than ten inquiries a week and someone on your team manually decides who responds to which, this one is next. A lead triage system reads incoming emails or form submissions, categorizes them (new client vs existing, urgent vs routine, in-scope vs out-of-scope), pulls relevant context from your CRM, and either routes to the right person or drafts a first reply for review.

The key word is “for review.” For the first three to six months, you absolutely want a human approving each automated reply before it goes out. After that, you can let routine categories go automatic and keep edge cases under review.

Cost: $3,000 to $8,000 to build, depending on how many systems it touches. Most of that cost is the integration plumbing, not the AI itself.

Tier 3: Content drafting workflow

If your business publishes content regularly — blog posts, newsletters, case study write-ups, social posts — an AI drafting workflow is worth setting up. Not “ChatGPT writes our blog.” A real workflow: research → outline → draft → human edit → publish, with AI doing the first three steps and a human owning the last two.

This is where small businesses tend to misuse AI most. They generate the post in two minutes and publish it. The result reads like every other AI-generated post on the internet, ranks for nothing, and tells your audience that you do not take your own content seriously.

Done right, the workflow saves you 60 to 70% of the drafting time while keeping the content recognizably yours. It is the difference between an AI tool and an AI system.

Tier 4: Data plumbing between tools

Almost every small business has the same problem: information lives in three or four tools (calendar, CRM, email, billing) and someone is the human glue. AI is now genuinely useful for the small-scale data plumbing problem — read this email, decide what kind of update it is, find the matching record in the CRM, update it, send a notification.

This is not really “AI” in the dramatic sense. It is just that large language models are very good at the messy parts (parsing free-form text, classifying intent) that used to require a human. The boring middle of business operations is where the real return lives.

What is not worth doing yet

To be honest about it: most small businesses should skip these for now.

Voice cloning and AI phone agents. The technology works in narrow demos, but the production reliability is not there for a small business that cannot afford to lose customers when it misfires. Wait six to twelve months.

“AI strategy consulting” engagements. If someone wants to sell you a $20,000 “AI readiness assessment” before any code is written, you are paying for slides. Skip it. Pay an engineer to build one of the things in the list above and learn what AI is good and bad at from the actual work.

Replacing your customer service team. Even at the largest enterprises, the right model right now is AI handling the routine tier and people handling everything else. A small business with one or two customer-facing people gains nothing by trying to replace them — and risks a lot by trying.

The privacy question

For Los Angeles businesses in regulated or sensitive spaces — clinics, law firms, anyone touching financial information — the privacy question is real and worth getting right at the start. The good news: the safest version of AI automation is also the easiest to deploy. Train the AI only on your public information. Keep sensitive records out of any prompt that goes to a third-party model. Put a human in the loop for any output that touches a real client.

This is not just risk management. It is also better automation. AI systems that are scoped narrowly to public, low-risk information are easier to test, easier to monitor, and less likely to embarrass you.

What two weeks looks like

The shape of a real small-business AI project: a week to listen and scope, a week to build a private prototype, a week to test with synthetic and then live traffic, and then steady refinement once it is in front of real users. Most of our small business engagements fit in this rhythm. The technology is no longer the constraint. The constraint is finding the right problem in your own business — and not over-engineering the solution.

Headless WordPress vs Traditional WordPress: What LA Businesses Need to Know

If you own a business in Los Angeles and your website runs on WordPress, you have probably been told at some point that you should “go headless.” Maybe by a developer, maybe by a vendor, maybe by a Medium article that made the architecture sound like the difference between a Ferrari and a sedan.

The truth is more boring and more useful: headless WordPress is the right answer for some businesses and overkill for others. This article explains the actual difference, when each one wins, and what it costs you to change your mind later.

Traditional WordPress, in plain language

In a traditional WordPress site, the same software does everything. Your editors log into yoursite.com/wp-admin, write a page, hit publish. WordPress saves it to a MySQL database. When a visitor lands on the page, WordPress reads from the database, runs your theme’s PHP templates, mixes in any plugins, and produces an HTML page on the fly.

This is what 43% of the web runs on. It works. It is well understood. It has a massive ecosystem of themes, plugins, and developers who know it. For a brochure site, a small business presence, or a content-heavy blog, traditional WordPress is the right choice and trying to over-engineer it is a waste of your money.

Headless WordPress, in plain language

In a headless setup, WordPress only handles the back of the house. Editors still log into wp-admin and write pages the way they always have. But the public website — the part visitors actually see — is a separate application built in a modern frontend framework like Next.js or Astro. That frontend pulls content from WordPress through the REST API (or GraphQL) and renders it however it wants, on its own server.

You still get WordPress for editing. You no longer get WordPress for rendering. The “head” — the public-facing presentation layer — is somewhere else.

What you actually gain

Three things tend to matter to real businesses:

Speed. A well-built Next.js or Astro frontend serves static HTML for most page loads. No PHP, no database query, no plugin chain. The first paint happens in a few hundred milliseconds rather than a few seconds. For a business with paid traffic, this is not a vanity metric — it directly affects bounce rates and ad conversion costs.

Design freedom. WordPress themes are powerful but they push you toward a certain shape. A custom frontend lets your designers do whatever they want — animations, complex layouts, micro-interactions — without fighting the theme system. If your brand is part of your competitive edge, this is significant.

Security surface reduction. The wp-admin URL can be moved to a subdomain that is not in any search engine’s index and only your team’s IPs can reach. The public site does not run PHP, does not load plugins, and is not a target for the usual WordPress exploit traffic.

What you actually give up

This is the part most articles skip.

Preview becomes harder. In traditional WordPress, an editor hits “preview” and immediately sees the page exactly as it will publish. In a headless setup, preview requires extra plumbing — usually a draft API endpoint and a preview build of the frontend. It works, but it is not free.

Most page-builder plugins stop working. Elementor, Divi, Beaver Builder — all of them generate HTML and CSS at render time, on the WordPress side. If your frontend is not rendering WordPress’s HTML, those builders are not building anything anyone will see. You will be writing real components in code, or using a CMS-side block editor with a custom frontend renderer.

Hosting gets more complex. Two servers instead of one: the WordPress backend somewhere (often shared hosting or a small VPS) and the Node-based frontend somewhere else (a VPS, Vercel, Netlify). Two environments to monitor. Two SSL certs. Two deploy pipelines. For a small team this is a real ongoing tax.

Some plugins become irrelevant. SEO plugins like Yoast still help in wp-admin, but a lot of what they do — generating meta tags, sitemaps, schema — has to be reimplemented or replaced on the frontend. The plugin marketplace assumption breaks.

The honest decision tree

For a Los Angeles business deciding between the two, here is the test we use:

  • Less than 5,000 visitors a month, brochure site, small content needs: Traditional WordPress. Caching plugin, decent host, done. Going headless will cost you more than you save.
  • You spend money on paid acquisition and care about conversion rates: Headless is worth a serious look. The page-load improvement alone often pays for the rebuild within a year.
  • You have a complex content team using Yoast, ACF, and editorial workflows: Headless, but keep WordPress as the CMS. Your editors do not need to know anything changed. They keep working in wp-admin.
  • You have heavy custom functionality (members area, e-commerce, complex forms): Almost always traditional WordPress with WooCommerce or a learning management plugin, unless you are large enough to staff a real engineering team.
  • You are starting fresh and care about brand presentation: Headless from day one. Going from traditional to headless later is expensive; starting headless is not.

What about hybrid?

There is a third option people forget: keep your main site traditional, and put a separate headless application on a subdomain for specific high-traffic or high-design pages. The marketing site stays on WordPress. The product, the pricing page, or the lead-generation flow lives in Next.js. Editors do not have to learn anything new for the main site, and the conversion-critical pages get the speed boost.

This is what most pragmatic small businesses end up doing. It is also what we build most often.

What it costs to change your mind

The good news: WordPress is a stable target. Content created in WordPress today is portable across both architectures. If you start with traditional WordPress and decide a year from now that you want a headless frontend, you keep all your content and just add the frontend application. No migration. No data loss.

That asymmetry is why we usually tell smaller LA businesses to start traditional and add the headless layer when there is a specific reason — a slow conversion page, a paid traffic problem, a design ambition — that justifies the second piece of infrastructure.

The technology should follow the business problem, not the other way around.

Why Los Angeles Law Firms Are Replacing Web Forms with AI Intake Assistants

Walk into most law firm websites in Los Angeles and you find the same thing: a polished header, a few practice-area pages, and a contact form that has not changed in a decade. Submit it, and somewhere inside the firm a paralegal eventually copy-pastes your information into Clio or Practice Panther, types up a screening summary, and emails an attorney to ask if the matter is even worth a consult.

That whole loop is now being quietly replaced by AI intake assistants. Not chatbots that say “How can I help you today?” and then route you to a human. Real intake systems that ask the right qualifying questions, run a conflict check against the firm’s matter database, summarize the conversation, and put a structured matter brief in front of the attorney before they ever pick up the phone.

What an AI intake assistant actually does

The phrase “AI chatbot” covers a lot of ground, and most of it is not what a law firm needs. A useful intake assistant for a personal injury, immigration, family law, or business litigation firm typically does five things:

  1. Qualifies the matter. It asks the dynamic questions a senior intake specialist would ask — what happened, when, where, who is involved — and adjusts based on the answers. A motor vehicle accident gets different questions than a slip-and-fall.
  2. Captures the structured facts. Dates, names, jurisdictions, insurance carriers, damages — the things your case management system needs as actual fields, not buried in a paragraph.
  3. Runs a conflict check. Against your existing client and matter database, before any privileged information is exchanged.
  4. Routes by practice area and urgency. A statute-of-limitations issue gets flagged. A general inquiry waits for the next available associate.
  5. Hands off cleanly to a human. Either a calendared consult with the right attorney, or a polite “this is not something we handle, here is what to look for” decline with no time wasted on either side.

Why web forms lose

A static web form has one job — capture a contact — and one mode: ask everything at once. Every additional field reduces submission rates. Every required field tries to predict in advance what information matters for every matter type. So firms end up with either thin forms that generate junk leads, or long forms that scare off the qualified ones.

An AI intake conversation has the opposite shape. It asks a few questions, listens to the answer, and asks the next question based on what was said. It can be longer than a form because it does not feel like a form. And it surfaces issues — like a missed statute of limitations or a conflict — that a form would simply collect and pass along.

The objections that come up — and the honest answers

“Is this UPL? Are we letting AI give legal advice?” A well-built intake assistant explicitly does not give legal advice. It collects facts, qualifies the matter, and schedules. Its scripts are written by the firm and reviewed by an attorney. Every response either asks a question, summarizes what the user said, or hands off to a human. This is the same thing your intake specialist does — and intake specialists are not practicing law either.

“What about privilege?” The intake conversation happens before an attorney-client relationship is formed, so it is not privileged communication in the technical sense. That is true of your web form too. What matters is data handling: encrypted at rest, access controlled, retention policies clear, and the user told upfront that what they share is not yet privileged. Good intake systems disclose this in the first message.

“Will it hallucinate?” This is the right question to ask, and the answer is: not if it is built correctly. A practical intake assistant is not a freewheeling chat model. It is a constrained workflow with a small, controlled set of allowable responses, an explicit knowledge base of the firm’s services, and refusal patterns for anything outside scope. The model is one component, not the whole system.

“What about clients who hate talking to bots?” They can skip it. Every well-designed intake assistant includes a clear “talk to a human” option, ideally with an actual phone number. The intake assistant exists to make life easier, not to gatekeep.

What it looks like in practice for a Los Angeles firm

A mid-size family law firm in West LA replaces its contact form with an AI intake assistant. A potential client visits at 11pm on a Sunday with an urgent custody question. The assistant asks the qualifying questions — jurisdiction, current court orders, immediate safety concerns — captures the facts, flags the matter as urgent, and books a Monday morning intake call with the available attorney. The attorney walks into the office with a one-page brief on her desk, not a vague form to chase down.

That is the actual product. Not magic. Not replacing the lawyer. Just removing the slow, error-prone middle step.

Where to start

You do not need to rebuild your website to do this. The intake assistant lives on top of your existing site as a chat widget or a dedicated intake page. Most firms can have a private staging version running within three to four weeks: one week to scope the conversation flows, one to two to build, and a final week to test with synthetic matters before any real client sees it.

The piece that takes longest is not the technology. It is sitting down with your senior intake person — the one who has been screening matters for fifteen years — and writing down what they actually do. The AI just runs the playbook they have already perfected.

Build a Secure Webhook Gateway for WordPress with Django, HMAC Verification, and Idempotent Jobs

This guide shows how to deploy a webhook gateway in Django that verifies third‑party signatures (e.g., Stripe, GitHub, HubSpot), normalizes events, runs idempotent background jobs, and safely updates WordPress via authenticated REST. It’s designed for multi-tenant SaaS sites and automation-heavy WordPress installs.

Architecture
– Sources: Third-party services send webhooks to a single Django gateway.
– Ingress: Django REST endpoint with HMAC/signature verification, timestamp tolerance, payload caps, and replay protection.
– Queue: Valid events normalized into a canonical schema and enqueued (Celery + Redis or RQ).
– Workers: Idempotent tasks execute business logic and call WordPress via secure REST.
– WordPress: Minimal plugin or functions.php adds authenticated REST routes and caches state.
– Observability: Structured logs, metrics, traces, and a dead-letter queue (DLQ).

Why a Django Gateway in Front of WordPress
– Security: Keeps secrets and signature logic server-side; network allowlisting on a single IP.
– Reliability: Centralizes retries, idempotency, and DLQ handling.
– Performance: Offloads heavy work from WordPress; reduces request time and PHP memory pressure.
– Control: One normalization layer supports many services and tenants.

Django Models (minimal)
– WebhookEvent
– id (uuid), source (str), event_type (str), received_at (datetime)
– raw_body (bytes), headers (json), signature_valid (bool), idempotency_key (str)
– status (received|queued|processed|failed|discarded), attempts (int), error (text)
– JobExecution
– id (uuid), event (fk), task_name (str), status, attempts, started_at, finished_at, error

Ingress Endpoint (Django REST Framework)
– POST /api/webhooks/{source}/ingest
– Steps:
1) Enforce POST, JSON, Content-Length cap (e.g., 512 KB).
2) Read raw body exactly as sent.
3) Validate timestamp header (e.g., Stripe’s t=… within 5 min).
4) Verify HMAC or provider signature with per-tenant secret.
5) Compute idempotency_key from provider event id + source + tenant.
6) Persist WebhookEvent with signature_valid flag and dedupe check.
7) If valid + new, enqueue Celery task; else return 200 for duplicates, 400/401 on invalid.

Example verification (Stripe-style)
– Signature header: Stripe-Signature = t=…,v1=HMAC_SHA256(secret, “{t}.{raw_body}”)
– Reject if timestamp skew > 300s or if computed HMAC != v1.

Pseudo-code (concise)
– Verify
– raw = request.body
– sig_hdr = request.headers.get(“Stripe-Signature”)
– t, v1 = parse(sig_hdr)
– if abs(now – t) > 300: 401
– mac = hex(hmac_sha256(secret, f”{t}.{raw}”))
– if not hmac_compare(mac, v1): 401
– Idempotency
– key = f”{source}:{tenant}:{provider_event_id}”
– if exists_in_store(key): return 200 (duplicate)
– store key in Redis with TTL 24h
– Persist + enqueue
– save WebhookEvent(…)
– celery_delay(event_id)

Normalization
– Map provider payloads to a canonical object:
– actor { id, email }
– subject { id, type }
– action { type, reason }
– data { free-form JSON }
– occurred_at
– tenant_id
– Keep raw payload for auditing.

Celery Task (idempotent)
– Load event by id; exit if already processed.
– Begin idempotency:
– Use Redis SETNX lock “job:{event.id}”
– Execute business logic:
– Transform event → downstream actions.
– Write records to DB; use UPSERTs for repeat events.
– Call WordPress REST with retries.
– Mark processed; release lock.

WordPress Integration
– Auth options:
– Application Passwords (Basic over HTTPS) for server-to-server.
– JWT (recommended if you need scoped tokens and rotation).
– REST route examples:
– POST /wp-json/ai-guy/v1/user-sync
– POST /wp-json/ai-guy/v1/order-event
– Hardening:
– Require server IP allowlist or token signature.
– Validate JSON schema with WP_REST_Request.
– Enforce rate limits via transients or a small options cache.
– Return 202 for async processing; never block on heavy work in PHP.

Minimal WordPress route (pseudo)
– register_rest_route(‘ai-guy/v1’, ‘/user-sync’, [
– ‘methods’ => ‘POST’,
– ‘permission_callback’ => function($request) {
// Verify Authorization header or shared HMAC
return current_user_can(‘manage_options’) || verify_server_token($request);
},
– ‘callback’ => function($request) {
$data = $request->get_json_params();
// validate schema; sanitize
// wp_insert_user or wp_update_user
// set/update user meta
return new WP_REST_Response([‘ok’ => true], 200);
}
])

Django → WordPress Request
– POST with:
– Authorization: Bearer or Basic (App Password)
– X-Request-Id: {uuid}
– X-Signature: HMAC_SHA256(shared_secret, raw_body)
– Retries: exponential backoff (e.g., 1s, 3s, 9s, 27s, jitter), max 5 tries.
– Idempotency: include Idempotency-Key header and handle at WordPress by short-circuiting duplicates.

Security Controls
– Signature verification per provider; rotate secrets quarterly.
– Replay protection with timestamp and nonce storage (Redis SETEX).
– Payload limits and JSON schema validation.
– Network controls: only expose /ingest via a public path; restrict /admin with VPN.
– Secrets in environment variables; no secrets in code or DB exports.
– Audit fields (created_by, source_ip, user_agent).
– PII minimization and encryption at rest if needed.

Observability
– Structured JSON logs: event_id, tenant, source, outcome, latency_ms.
– Metrics:
– webhook.ingest.count, .fail.count
– job.duration.ms, job.retry.count
– wordpress.http.2xx/4xx/5xx
– Tracing: propagate traceparent to WordPress; annotate external calls.
– DLQ: failed events after N retries go to DLQ table + Slack/Email alert.

Performance & Scale
– Keep ingress fast: verify + enqueue only; never call WordPress inline.
– Use gunicorn with async workers or uvicorn + ASGI for bursty loads.
– Redis for locks, idempotency keys, and short-lived state.
– Batch downstream operations when possible (e.g., queue coalescing per user).
– Backpressure: pause workers on WP 5xx storms; circuit breaker per endpoint.

Local Development
– Use ngrok or Cloudflare Tunnel for provider callbacks.
– Seed tenant/provider secrets in .env (never commit).
– Replay fixtures from saved JSON payloads.
– Integration tests:
– Valid signature → 202; invalid → 401
– Duplicate event → 200 no-op
– Worker retry logic on transient WordPress 5xx
– Contract tests for WordPress routes with schema checks.

Example .env (redacted)
– PROVIDER_SECRETS_STRIPE=tenantA:sk_live_xxx,tenantB:sk_live_yyy
– WORDPRESS_BASE_URL=https://example.com
– WORDPRESS_TOKEN=…
– HMAC_SHARED_SECRET=…

Cutover Plan
– Deploy Django gateway behind HTTPS (HSTS).
– Create provider webhook endpoints per tenant: https://api.yourdomain.com/api/webhooks/stripe
– Validate signatures live; mirror events to a non-production DLQ initially.
– Gradually enable WordPress writes per event type; monitor metrics.
– Add dashboards and alerts; document runbooks.

Failure Handling
– Provider 429/5xx at ingress: accept and queue; never call back providers.
– WordPress 4xx: mark failed, do not retry unless fixable (e.g., 409).
– WordPress 5xx/timeouts: exponential retry with jitter up to max window.
– Poison messages: move to DLQ with root-cause tag.

Deliverables checklist
– Django endpoint with signature verification and timestamp tolerance
– Redis idempotency + nonce store
– Celery worker with idempotent jobs and circuit breaker
– WordPress REST routes with auth, schema validation, and idempotent handling
– Observability: logs, metrics, traces, DLQ
– Runbooks: secret rotation, replay, backfills

This pattern keeps webhook security, reliability, and performance centralized, while WordPress remains a clean, fast presentation and light business layer. It’s battle-tested for multi-tenant automations and scales with your traffic.

Shipping a Production Support Agent: Brain + Hands with Django, Redis, and WordPress

This post walks through a production-ready support agent with a Brain + Hands separation, wired into WordPress on the front, and Django on the back. The goal: predictable behavior, fast responses, measurable quality, and easy handoff to humans.

Use case
– Tier-1 support for order status, returns, product info, and FAQ
– Handoff to human when confidence is low or user requests it
– Works in a WordPress site widget, Slack, and email (shared backend)

Architecture (high level)
– Front-end: WordPress chat widget (vanilla JS) -> Django REST endpoint
– Brain: LLM for reasoning + routing (no direct data access)
– Hands: Tools in Django (Postgres + Redis) exposed via function-calling schemas
– Memory: Short-term thread memory (Redis), long-term knowledge (Postgres + pgvector)
– Orchestrator: Deterministic state machine (Django service + Celery tasks)
– RAG: Product/FAQ index with embeddings; constrained retrieval
– Observability: Request logs, traces, tool latency, outcomes, cost
– Deployment: Docker, Nginx, Gunicorn, Celery, Redis, Postgres

Brain + Hands separation
– Brain (LLM): Planning, deciding which tool to call, assembling final answer. No raw DB/API keys. Receives tool specs only.
– Hands (Tools): Deterministic, side-effect aware, with strict input/output schemas. Tools never “think”—they do.

Core tools (Hands)
– search_kb(query, top_k): RAG over Postgres+pgvector. Returns citations with IDs and source.
– get_order(email|order_id): Reads order status from internal service.
– create_ticket(email, subject, body, priority): Creates support case in helpdesk.
– handoff_human(reason, transcript_excerpt): Flags for live agent queue with context.

Tool contracts (JSON schema examples)
– search_kb input: { query: string, top_k: integer PLAN -> (TOOL_LOOP)* -> DRAFT -> GUARDRAIL -> RESPOND
– TOOL_LOOP limits to 3 tool calls per turn
– If Brain calls an unknown tool or wrong schema: correct and retry once, else fallback to handoff_human
– Timeouts: 3s per tool; overall SLA 6s; degrade mode returns partial + “We’re checking further via email” and opens ticket

Guardrails
– Content filter: block sensitive/abusive content; offer handoff
– PII sanitizer: mask tokens before vector search
– Citation checker: if answer references kb, verify at least one valid citation is present
– Safety fallback: neutral response + create_ticket when filter trips

RAG implementation
– Storage: Postgres with pgvector for embeddings
– Chunking: 512–800 tokens, overlap 80
– Metadata: doc_id, section, source, updated_at, allowed_channels
– Query: Hybrid BM25 + vector; re-rank top 8 to 3
– Response: Return only snippets + URLs; Brain composes final with citations “(See: Title)”

Error handling
– Tool failures: exponential backoff (200ms, 400ms); then circuit-break for 60s
– LLM failures: switch to fallback model on timeout; respond with concise generic + ticket
– Data drift: if RAG index empty or stale, disable search_kb and escalate

WordPress integration
– Front-end widget: Minimal JS injects a floating chat; posts to /api/agent/messages with thread_id and csrf token nonce
– Auth: Public sessions get rate-limited by IP + device fingerprint; logged-in users attach JWT from WordPress to Django via shared secret
– Webhooks: Ticket created -> WordPress admin notice and email; agent takeover -> support Slack channel

Django endpoints (concise)
– POST /api/agent/messages: { thread_id, user_msg }
– GET /api/agent/thread/{id}: returns last N messages + status
– POST /api/agent/feedback: thumbs_up/down, tags
– Admin: /admin/agent/tools, /admin/agent/kb, /admin/agent/metrics

Celery tasks
– run_brain_step(thread_id)
– execute_tool(call_id)
– rebuild_kb_index()
– nightly_eval() against golden test set

Model selection
– Primary: a function-calling LLM with low latency (e.g., GPT-4o-mini or Claude Sonnet-lite). Keep token limits reasonable.
– Fallback: cheaper model with same tool schema to maintain compatibility.
– Temperature: 0.2 for tool routing, 0.5 for final drafting.

Cost and latency targets
– P50: 1.4s response (no tools), 2.8s with RAG, 3.5s with order lookup
– P95: <5s
– Cost: X%

Deployment notes
– Docker services: web (Gunicorn), worker (Celery), scheduler (Celery Beat), redis, postgres, nginx
– Readiness probes: tool ping, RAG index freshness, model API status
– Secrets: mounted via Docker secrets; rotate quarterly
– Blue/green deploy: drain workers, warm RAG cache, switch traffic

Minimal data models
– threads(id, user_id, channel, status, created_at)
– messages(id, thread_id, role, content, tool_name?, tool_payload?, created_at)
– kb_docs(id, title, url, text, embedding, updated_at, allowed_channels)
– tickets(id, thread_id, external_id, status, priority, created_at)

Snippet: tool call flow (pseudo)
– User -> /messages
– Orchestrator builds context from Redis + last N messages
– Brain returns tool_call: search_kb
– Celery executes search_kb, stores items
– Brain drafts answer with citations
– Guardrail checks
– Respond; optionally create_ticket if unresolved

Rollout plan
– Phase 1: FAQ-only RAG; no order lookups; human-in-the-loop
– Phase 2: Enable get_order with safe whitelist; add evals
– Phase 3: Enable create_ticket + SLA timers
– Phase 4: Add Slack channel and email ingestion to same backend

What to avoid
– Letting the Brain call HTTP endpoints directly
– Unbounded memory growth in Redis
– RAG over unreviewed or user-generated content
– Returning tool stack traces to users

Repository checklist
– /orchestrator: state machine, guardrails
– /tools: deterministic functions, schemas, tests
– /brain: prompt templates, model client, retries
– /kb: loaders, chunker, embeddings, indexer
– /web: Django views, serializers, auth
– /ops: docker-compose, nginx, CI, eval harness, dashboards

This pattern gives you a predictable, support-ready agent that integrates cleanly with WordPress, scales under load, and stays auditable.

A production-ready pattern for AI in WordPress: async jobs, signed webhooks, and external workers

Why this pattern
– WordPress is great at routing and rendering, not long-running I/O.
– AI calls are slow, variable, and expensive; they need retries, quotas, and tracing.
– The solution: push jobs to an external worker and accept results via signed webhooks.

Architecture (high level)
– Client (WP admin or theme) submits an AI request to a WP REST route.
– WordPress writes a job row (pending), enqueues to an external queue (or HTTP to a worker gateway).
– Worker (Python/Node) pulls the job, calls the AI provider, then POSTs a signed webhook back to WordPress.
– WordPress verifies the signature, stores result, and invalidates relevant cache.
– Frontend polls or uses SSE/WS via a lightweight proxy for updates.

Database schema (custom table)
– wp_ai_jobs
– id (bigint PK)
– user_id (bigint)
– status (enum: pending, running, succeeded, failed)
– input_hash (char(64)) for idempotency
– request_json (longtext)
– result_json (longtext, nullable)
– error_text (text, nullable)
– created_at, updated_at (datetime)
– idempotency_key (varchar(64), unique)
– webhook_ts (datetime, nullable)

Create the table on plugin activation
– dbDelta with utf8mb4, proper indexes:
– INDEX status_created (status, created_at)
– UNIQUE idempotency_key (idempotency_key)
– INDEX input_hash (input_hash)

Plugin structure (minimal)
– ai-integration/
– ai-integration.php (bootstrap, routes, activation)
– includes/
– class-ai-controller.php (REST endpoints)
– class-ai-webhook.php (webhook verifier)
– class-ai-repo.php (DB access)
– class-ai-queue.php (enqueue out to worker)
– helpers.php (crypto, validation)
– Do not store secrets in options; put them in wp-config.php.

Secrets and config (wp-config.php)
– define(‘AI_WORKER_URL’, ‘https://worker.example.com/jobs’);
– define(‘AI_WEBHOOK_SECRET’, ‘base64-32-bytes’);
– define(‘AI_JWT_PRIVATE_KEY’, ‘—–BEGIN PRIVATE KEY—–…’);
– define(‘AI_QUEUE_TIMEOUT’, 2); // seconds for outbound enqueue

REST endpoint: create job (POST /wp-json/ai/v1/jobs)
– Validate capability (logged-in or signed public token).
– Build idempotency_key from client or hash(input_json + user_id + model).
– Insert row (pending).
– Enqueue to worker:
– POST to AI_WORKER_URL with signed JWT (kid, iat, exp, sub=user_id, jti=idempotency_key).
– Timeout <= 2s. If enqueue fails, leave job pending; a retry worker (Action Scheduler) can re-enqueue.
– Return { job_id, status: "pending" }.

Example: tiny enqueue
– Headers: Authorization: Bearer
– Body: { job_id, idempotency_key, request: {…}, callback_url: “https://site.com/wp-json/ai/v1/webhook” }

Webhook endpoint: receive result (POST /wp-json/ai/v1/webhook)
– Require HMAC-SHA256 signature header: X-AI-Signature: base64(hmac(secret, body))
– Require idempotency_key and job_id in body.
– Verify:
– Constant-time compare HMAC.
– Check timestamp drift <= 2 minutes (X-AI-Timestamp).
– Enforce replay guard: cache "webhook:{jti}" in Redis for 10m.
– Update row (status to succeeded/failed, set result_json or error_text, webhook_ts).
– Return 204.

Minimal verification (PHP)
– $sig = base64_decode($_SERVER['HTTP_X_AI_SIGNATURE'] ?? '');
– $calc = hash_hmac('sha256', $rawBody, AI_WEBHOOK_SECRET, true);
– hash_equals($sig, $calc) or wp_die('invalid sig', 403);

Frontend polling pattern
– Client gets job_id, then polls GET /wp-json/ai/v1/jobs/{id} every 1–2s (cap at 30s).
– Cache-control: private, max-age=0. Use ETag from updated_at to 304 unchanged.
– Optional: stream via SSE proxied through PHP only if your infra supports long-lived requests without PHP-FPM worker starvation.

Idempotency and dedupe
– On create:
– If idempotency_key exists, return existing job.
– Also check input_hash + user_id within time window to reduce duplicates from flaky clients.

Rate limiting
– Per-user sliding window: e.g., 60 jobs/10m.
– Use wp_cache (Redis/Memcached). Key: rl:{user}:{minute-epoch}. Increment and check.
– On limit exceed, 429 with Retry-After.

Background retries
– Action Scheduler job scans pending/running older than N minutes:
– Re-enqueue if no worker ack.
– Mark failed if exceeded retry budget; store error_text.

Security checklist
– Do not accept webhooks without HMAC and timestamp.
– JWT to worker uses short exp (<=60s). Sign with ES256 or RS256; rotate keys quarterly.
– Sanitize and escape all fields when rendering.
– Disable file edits in prod; restrict wp-admin to known IPs if possible.
– Log minimal PII; encrypt sensitive request_json fields at rest if needed (sodium_crypto_secretbox).

Performance considerations
– Never call AI providers inside a WP page render path.
– Outbound enqueue must be non-blocking (<2s). Use Requests::post with short timeouts and no redirects.
– Store only necessary parts of result_json; large blobs to object storage (S3) with signed URLs.
– Use indexes to keep dashboard queries fast; paginate admin list by created_at DESC.
– Cache job summaries with wp_cache_set on read path; invalidate on webhook.

Worker reference (Python, outline)
– Pull from queue, call provider with circuit breaker and retry/backoff (e.g., 100ms→2s jitter).
– On completion, POST result to callback_url with:
– Headers: X-AI-Signature, X-AI-Timestamp
– Body: { job_id, idempotency_key, status, result_json, usage: {tokens, ms} }
– Keep results small; upload big artifacts elsewhere first.

Minimal job table index DDL
– INDEX status_created (status, created_at)
– INDEX user_created (user_id, created_at)
– UNIQUE idempotency_key (idempotency_key)

Observability
– Add a request_id to all flows; return it to client.
– Store provider latency, tokens, and error codes in result_json. Useful for cost/perf dashboards.
– Emit Server-Timing headers on job reads: worker;dur=123,provider;dur=456.

Admin UI ideas
– List jobs with filters (status, user, model).
– Re-enqueue button (capability checked).
– Export CSV of usage by date/user.

Deployment checklist
– HTTPS everywhere; verify real client IP behind any CDN.
– Set AI_WEBHOOK_SECRET via environment, not version control.
– Protect webhook with allowlist of worker IPs if static.
– Enable object cache. Prefer Redis with persistence.
– Load test: 200 req/s create → ensure PHP-FPM pool and DB connections stay healthy.
– Back up the table and rotate old rows to cold storage monthly.

What to avoid
– Synchronous AI calls in templates.
– Storing provider keys in options.
– Webhooks without signature or timestamp.
– Unbounded job payload sizes.

This pattern scales from small sites to high-traffic publishers, keeps your PHP requests fast, and centralizes reliability and security where they belong: in the worker and webhook boundary.

Production RAG for WordPress: pgvector + FastAPI backend, secure webhook intake, and a shortcode chat UI

Overview
This tutorial wires WordPress to a production-grade RAG backend:
– Intake: WordPress Media upload triggers a signed webhook to the backend.
– Index: Backend fetches the file, chunks text, stores embeddings in Postgres/pgvector.
– Serve: FastAPI endpoint answers user questions via retrieval-augmented generation.
– Frontend: A WordPress shortcode renders a chat box that queries the backend.

We’ll keep the stack minimal and production-ready:
– WordPress (webhook + shortcode)
– Python FastAPI backend
– Postgres + pgvector
– OpenAI embeddings + model (swap as needed)
– Nginx or cloud proxy, HTTPS, and API key auth

Architecture
1) User uploads PDF/Doc to WordPress Media.
2) WordPress sends a webhook: {file_url, title, post_id, signature}.
3) Backend validates the HMAC, downloads file, extracts text, chunks, embeds, stores in pgvector with a collection/site scope.
4) Chat UI (shortcode) hits /rag/query with apiKey to return grounded answers.

Prereqs
– WordPress admin access
– Python 3.11+, FastAPI, uvicorn
– Postgres 14+ with pgvector
– OpenAI API key (or compatible embedding/LLM)
– A secret shared between WP and backend for webhook signing

Database setup (pgvector)
— Enable extension
CREATE EXTENSION IF NOT EXISTS vector;

— Documents table
CREATE TABLE IF NOT EXISTS documents (
id UUID PRIMARY KEY,
site_id TEXT NOT NULL,
doc_id TEXT NOT NULL, — WP attachment ID or slug
title TEXT,
source_url TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);

— Chunks table
CREATE TABLE IF NOT EXISTS doc_chunks (
id UUID PRIMARY KEY,
doc_id UUID REFERENCES documents(id) ON DELETE CASCADE,
idx INT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536), — match embedding size
token_count INT,
created_at TIMESTAMPTZ DEFAULT now()
);

— Index for ANN search
CREATE INDEX IF NOT EXISTS doc_chunks_embedding_ivfflat
ON doc_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

— Filter speedups
CREATE INDEX IF NOT EXISTS doc_chunks_doc_id_idx ON doc_chunks(doc_id);
CREATE INDEX IF NOT EXISTS documents_site_doc_idx ON documents(site_id, doc_id);

FastAPI backend (app/main.py)
– Provides /webhook/wp-media to index uploads.
– Provides /rag/query for Q&A.
– Uses HMAC-SHA256 signature (X-WP-Signature) header.

from fastapi import FastAPI, Header, HTTPException, Depends
from pydantic import BaseModel
import hmac, hashlib, os, uuid, httpx, io
import asyncpg
from typing import List, Optional
from datetime import datetime
from fastapi.middleware.cors import CORSMiddleware
from openai import AsyncOpenAI

OPENAI_API_KEY = os.getenv(“OPENAI_API_KEY”)
WEBHOOK_SECRET = os.getenv(“WEBHOOK_SECRET”) # shared with WP
DATABASE_URL = os.getenv(“DATABASE_URL”) # postgres://…
EMBED_MODEL = “text-embedding-3-small”
GEN_MODEL = “gpt-4o-mini”

app = FastAPI()
app.add_middleware(CORSMiddleware, allow_origins=[“https://your-site.com”], allow_methods=[“*”], allow_headers=[“*”])

client = AsyncOpenAI(api_key=OPENAI_API_KEY)

async def db():
if not hasattr(app.state, “pool”):
app.state.pool = await asyncpg.create_pool(DATABASE_URL, min_size=1, max_size=8)
return app.state.pool

def verify_signature(raw_body: bytes, signature: str):
mac = hmac.new(WEBHOOK_SECRET.encode(), raw_body, hashlib.sha256).hexdigest()
return hmac.compare_digest(mac, signature)

class WPWebhook(BaseModel):
site_id: str
file_url: str
title: Optional[str] = None
attachment_id: str

@app.post(“/webhook/wp-media”)
async def wp_media(webhook: WPWebhook, x_wp_signature: str = Header(None), raw_body: bytes = b””, pool=Depends(db)):
# Signature check (requires a middleware or route body retrieval)
if not x_wp_signature or not verify_signature(raw_body, x_wp_signature):
raise HTTPException(status_code=401, detail=”Invalid signature”)

# Download file
async with httpx.AsyncClient(timeout=60) as http:
r = await http.get(webhook.file_url)
r.raise_for_status()
content = r.content

# Extract text (PDF/doc). Minimal example uses pdfminer.six if PDF; else fallback.
text = await extract_text_auto(webhook.file_url, content)
chunks = simple_chunk(text, max_chars=1200, overlap=100)

# Insert document
doc_uuid = str(uuid.uuid4())
async with pool.acquire() as conn:
await conn.execute(
“INSERT INTO documents(id, site_id, doc_id, title, source_url) VALUES($1,$2,$3,$4,$5)”,
doc_uuid, webhook.site_id, webhook.attachment_id, webhook.title, webhook.file_url
)

# Embed and insert chunks
embeddings = await embed_texts([c[“content”] for c in chunks])
async with pool.acquire() as conn:
async with conn.transaction():
for i, (chunk, emb) in enumerate(zip(chunks, embeddings)):
await conn.execute(
“INSERT INTO doc_chunks(id, doc_id, idx, content, embedding, token_count) VALUES($1,$2,$3,$4,$5,$6)”,
str(uuid.uuid4()), doc_uuid, i, chunk[“content”], emb, chunk[“tokens”]
)
return {“status”:”ok”,”doc_id”:doc_uuid,”chunks”:len(chunks)}

async def extract_text_auto(url: str, content: bytes) -> str:
import mimetypes, tempfile, os
mt = mimetypes.guess_type(url)[0] or “”
if “pdf” in mt or url.lower().endswith(“.pdf”):
from pdfminer.high_level import extract_text
with tempfile.NamedTemporaryFile(delete=False, suffix=”.pdf”) as f:
f.write(content); f.flush()
out = extract_text(f.name)
os.unlink(f.name)
return out or “”
# Basic fallback
try:
return content.decode(“utf-8″, errors=”ignore”)
except:
return “”

def simple_chunk(text: str, max_chars=1200, overlap=100):
text = text.strip()
if not text:
return []
chunks = []
i = 0
while i < len(text):
end = min(i+max_chars, len(text))
chunks.append({"content": text[i:end], "tokens": int((end – i)/4)}) # rough est
i = end – overlap
if i < 0: i = 0
return chunks

async def embed_texts(texts: List[str]):
if not texts:
return []
resp = await client.embeddings.create(model=EMBED_MODEL, input=texts)
return [e.embedding for e in resp.data]

class QueryBody(BaseModel):
site_id: str
question: str
k: int = 5
api_key: Optional[str] = None # simple per-site key

def require_site_key(key: Optional[str], site_id: str):
expected = os.getenv(f"SITE_{site_id.upper()}_KEY")
if expected and key != expected:
raise HTTPException(status_code=401, detail="Invalid API key")

@app.post("/rag/query")
async def rag_query(q: QueryBody, pool=Depends(db)):
require_site_key(q.api_key, q.site_id)
# Embed question
qemb = (await embed_texts([q.question]))[0]
async with pool.acquire() as conn:
rows = await conn.fetch(
"""
SELECT c.content, 1 – (c.embedding $1::vector) AS score
FROM doc_chunks c
JOIN documents d ON d.id = c.doc_id
WHERE d.site_id = $2
ORDER BY c.embedding $1::vector
LIMIT $3
“””,
qemb, q.site_id, q.k
)
context = “nn”.join([r[“content”] for r in rows])

prompt = f”You are a helpful assistant. Use the context to answer.nnContext:n{context}nnQuestion: {q.question}nAnswer concisely with citations like [chunk #].”
messages = [{“role”:”user”,”content”:prompt}]
comp = await client.chat.completions.create(model=GEN_MODEL, messages=messages, temperature=0.2)
answer = comp.choices[0].message.content
return {“answer”: answer, “hits”: len(rows)}

Note: For raw_body signature verification, FastAPI needs request.state or a custom middleware to capture the raw bytes. In production, add a middleware to cache body for verification.

WordPress: webhook sender (plugin)
Create a small MU-plugin or standard plugin to post to the backend on upload.

post_type !== ‘attachment’) return;

$file_url = wp_get_attachment_url($post_ID);
$title = get_the_title($post_ID);
$site_id = get_bloginfo(‘url’); // or a fixed slug
$payload = array(
‘site_id’ => $site_id,
‘file_url’ => $file_url,
‘title’ => $title,
‘attachment_id’ => strval($post_ID),
);
$json = wp_json_encode($payload);
$secret = getenv(‘AI_WEBHOOK_SECRET’) ?: ‘change-me’;
$sig = hash_hmac(‘sha256’, $json, $secret);

$resp = wp_remote_post(‘https://api.your-backend.com/webhook/wp-media’, array(
‘headers’ => array(
‘Content-Type’ => ‘application/json’,
‘X-WP-Signature’ => $sig
),
‘body’ => $json,
‘timeout’ => 30
));
});

Shortcode chat UI
Adds [ai_chat] shortcode and a minimal UI that posts to /rag/query.

function ai_chat_shortcode($atts){
$a = shortcode_atts(array(
‘placeholder’ => ‘Ask about our docs…’,
‘site_id’ => get_bloginfo(‘url’),
), $atts);
ob_start(); ?>

<input id="ai-chat-q" type="text" placeholder="” style=”width:100%;padding:8px;” />

(function(){
const api = ‘https://api.your-backend.com/rag/query’;
const siteId = ”;
const key = ”;
const log = document.getElementById(‘ai-chat-log’);
const q = document.getElementById(‘ai-chat-q’);
document.getElementById(‘ai-chat-send’).addEventListener(‘click’, async function(){
const question = q.value.trim();
if(!question) return;
log.innerHTML += ‘

You: ‘ + question + ‘

‘;
q.value = ”;
try {
const r = await fetch(api, {
method: ‘POST’,
headers: {‘Content-Type’:’application/json’},
body: JSON.stringify({site_id: siteId, question, api_key: key})
});
const data = await r.json();
log.innerHTML += ‘

AI: ‘ + (data.answer || ‘No answer’) + ‘

‘;
} catch(e){
log.innerHTML += ‘

Error contacting AI backend.

‘;
}
});
})();

Writing via a small admin page, or define in wp-config.php and expose via get_option fallback.

Security and performance
– Transport: Enforce HTTPS end-to-end. Set CORS to your WP origin only.
– Auth: Use HMAC for webhooks and per-site API keys for /rag/query. Rotate keys regularly.
– Limits: Cap file size on WP, and validate mimetypes server-side. Queue large files.
– Costs: Use a small embedding model for indexing; cache embeddings by hash.
– Indexing: Run embedding in a background worker if uploads are frequent. Return 202 and poll status.
– Vector search: Tune ivfflat lists and analyze to your data size. Consider HNSW (pgvector 0.7+).
– Token control: Limit k and compress context (dedupe, summarization).
– Observability: Log latency, chunk counts, and hit scores. Add simple eval prompts for regression checks.
– Deployment:
– Postgres: managed instance with pgvector.
– Backend: Fly.io/Render/VM with health checks, 2+ replicas, stickyless.
– Secrets: Use platform secrets, not hard-coded keys.
– CDN: Serve static JS/CSS via WP enqueue, cache API via short TTL if answers are stable.

Local testing
– Create .env with OPENAI_API_KEY, WEBHOOK_SECRET, DATABASE_URL, SITE_{SITEID}_KEY.
– Run: uvicorn app.main:app –host 0.0.0.0 –port 8080 –proxy-headers
– Post a test webhook with curl and validate doc/chunk counts.
– Use the [ai_chat] shortcode on a test page.

What to adjust
– Swap extractors (unstructured, textract) for DOCX/HTML.
– Replace OpenAI with local or Azure endpoints by changing embed/generation calls.
– Add per-document metadata filters (post type, tags) in the query.

Inbox To Revenue: Deploying an AI Triage Router For Customer Ops (Gmail → Slack → Airtable)

Overview
Most SMBs lose money in the inbox: late replies, dropped leads, and manual copying into CRMs. This post shows how to deploy an AI triage router that classifies emails, extracts fields, assigns ownership, and generates first responses. Stack uses Gmail API, a lightweight Python service, an LLM, Slack for notifications, and Airtable as the system of record.

Target outcomes
– Classify inbound messages into 6-10 business-specific buckets
– Extract structured fields with >95% precision on core attributes
– Auto-acknowledge within 2 minutes, human follow-up within SLA
– Track cycle time and conversion in Airtable

Reference architecture
– Ingestion: Gmail API watch + Pub/Sub (or AWS SES/SNS) pushes new email IDs
– Processing: Python service (Cloud Run/Lambda) pulls raw MIME, normalizes text, strips signatures/footers
– Reasoning: LLM call (gpt-4.1-mini or Claude Haiku) with tool-free JSON output
– Persistence: Airtable (Tickets table), plus Redis queue for retries
– Notification: Slack webhook (team channel + assignee DM)
– Controls: Policy engine (PII redaction), rate limiting, eval harness
– Observability: BigQuery or Postgres for logs; Grafana/Looker dashboards

Airtable schema (minimal)
– Tickets: ticket_id, source, received_at, status, category, priority, customer_email, company, subject, summary, due_at, assignee, confidence, fields_json, reply_draft, url
– Categories: id, name, routing_rule, sla_minutes
– Agents/Assignees: id, name, slack_id, skill_tags, workload_score

LLM extraction targets
– category (enum): lead, support, billing, vendor, spam, career, legal, other
– intent: short verb phrase
– priority: low/normal/high (SLA map)
– entities: company, contact_name, email, phone, product, plan, order_id
– summary: 1-2 lines
– reply_draft: brief, factual, safe-to-send
– confidence: 0-1

Prompt shape (system)
– You are a router for customer operations. Output valid JSON only. Do not invent data. Leave null if unknown. Categories limited to: [list]. Keep reply_draft under 120 words, plain text, no promises we cannot keep.

Guardrails
– Temperature 0.2 for determinism
– Response format enforced with JSON schema validation
– If validation fails, fallback to simpler extraction prompt or rules

Routing rules (examples)
– lead → assignee with skill “sales” and workload_score < threshold; SLA 120 min
– billing → finance queue; SLA 240 min
– support with keywords (“down”, “outage”) → priority high; on-call Slack
– legal → do not auto-reply; escalate; redact attachments
– spam/marketing → closed; no Slack

Workflow
1) Watch: Gmail push notifies message_id
2) Normalize: Fetch MIME, remove tracking pixels, detect language
3) Safety: Strip PII from body preview; dedupe threads by Message-Id/In-Reply-To
4) LLM: Extract fields JSON, 2-shot examples per category
5) Persist: Upsert Ticket; compute due_at using SLA map; set status “new”
6) Notify: Post Slack summary with buttons (Claim, Reassign, Close, Send Draft)
7) Auto-acknowledge: If category in allowed list, send reply_draft to customer with footer “Human review in progress”
8) Measure: Log timings, confidence, corrections
9) Retrain: Periodic batch eval, update examples, adjust categories

Slack message format
– Title: [category][priority] subject
– Summary: 1 line + key entities
– Buttons: Claim (assign to self), Approve Draft (sends), Request Edit (opens modal), Reassign (picker)
– Thread: Bot posts Airtable link + due_at countdown

Failure modes and handling
– LLM timeout → retry with backoff; if still failing, default to rule-based category using keyword regex
– Low confidence (<0.6) → tag “needs_review”; do not auto-send; ping triage channel
– Large threads → summarize last human message only; include thread_size in log
– Attachments → virus scan; extract PDF text for entity match (order_id, invoice #)

Costs and performance
– Cost: ~ $0.002–$0.01 per email with small LLM; less if batching summaries
– Latency: Target <2s end-to-end; use streaming only for UI if needed
– Accuracy: Start with 6 categories; aim 95% precision on category, 98% on email detection, 85% on entities; iterate with error review
– Throughput: Cloud Run min-instances=0 for idle; scale to 100 rps bursts

Security and compliance
– Service account with restricted Gmail scopes
– Do not store raw bodies in logs; keep hashed identifiers
– PII redaction before Slack
– Secrets in GCP Secret Manager or AWS Secrets Manager
– Data retention policy in Airtable (archive after 180 days)

Evaluation loop (weekly)
– Sample 100 tickets; compare category, entities, SLA hit rate
– Track “first meaningful response” time and close rate per category
– Capture human edits to reply_draft for fine-tuning examples
– Adjust routing thresholds and on-call hours

ROI model (simple)
– Baseline: 400 inbound/month, 5 min manual triage each → 33 hours
– Post-automation: 30 sec review each → 3.3 hours
– Net saved: ~30 hours/month; at $45/hour → ~$1,350/month
– Plus conversion lift from same-day lead replies (track won vs. response time)

Implementation notes
– Use Gmail HistoryId to avoid double-processing
– Cache model responses for identical threads within 10 minutes (Redis)
– JSON schema example keys must be stable to preserve analytics
– Keep examples business-specific; swap in real subject lines, product names
– Add language detection; route non-English to bilingual assignees

Minimal endpoint contract (POST /triage)
– Input: message_id, thread_id
– Output: ticket_id, category, confidence, actions_taken [ack_sent, slack_posted]

Go-live checklist
– 2-week shadow mode (no auto-send), collect corrections
– Thresholds tuned; legal/billing excluded from auto-ack
– On-call rotation confirmed; Slack permissions tested
– Dashboards: SLA breach count, average first response, category distribution
– Runbook for outages and LLM provider failover

Extensions
– CRM sync (HubSpot/Close) on category=lead
– Voice/voicemail ingestion via transcription
– Calendar links in reply_draft for sales
– Priority boost for repeat customers (email/domain match)

Bottom line
Start narrow, measure aggressively, and keep humans-in-the-loop where it matters. This pattern reliably turns inbox chaos into a predictable, SLA-driven pipeline that pays for itself in the first month.