Building a Content Pipeline with n8n and a Local LLM

Building a Content Pipeline with n8n and a Local LLM

The default AI content pipeline assumption is that you need cloud API access to do anything useful. That assumption is wrong, and increasingly expensive to maintain as you scale content volume. A local LLM running on commodity hardware, combined with n8n for orchestration, handles the majority of structured content automation tasks at zero per-token cost.

This is a practical walkthrough of building that pipeline using Ollama, n8n for workflow automation, and Ghost CMS as the publishing layer.

The Stack

Ollama runs local LLMs via a simple REST API on localhost:11434. It manages model downloads, VRAM allocation, and request queuing. You interact with it via HTTP, the same way you would call OpenAI, but without the network latency or per-token billing. For content tasks, Mistral 7B, Llama 3.1 8B, or Qwen2.5 7B perform well for structured generation.

n8n is the orchestration layer. It handles the event triggers (new RSS item, scheduled scan, manual trigger), calls Ollama via HTTP Request node, processes the response, and routes the output to Ghost or a review queue. The built-in HTTP request node works directly with the Ollama API.

Ghost CMS is the publishing destination. Ghost's Admin API accepts posts in Lexical JSON format. The pipeline converts the LLM's text output to Lexical JSON before publishing.

The Workflow Architecture

The core workflow has five nodes. First, a Trigger node (schedule or webhook). Second, signal collection via HTTP Request node to content sources: RSS feeds, Hacker News API, arXiv API, Reddit JSON endpoint. Third, LLM generation via HTTP Request node to Ollama. Fourth, post-processing in a Code node to clean the generated text and convert to Ghost Lexical JSON format. Fifth, a publishing or review queue node.

The Prompt Structure

Local LLMs are more sensitive to prompt structure than frontier models. The prompt that works reliably for structured content generation starts with: You are a technical writer. Output ONLY the requested content. No preamble, no meta-commentary. Followed by the topic, target audience, word count, required sections, and format specification.

The key instructions are Output ONLY and no preamble. Local models have a strong prior toward explaining what they are about to do before doing it. Suppressing that behavior is the single most important prompt engineering step for structured content generation with local models.

Hardware Requirements

For a 7B parameter model at Q4 quantization: 6 to 8GB VRAM. An RTX 3060 with 12GB VRAM runs Llama 3.1 8B at 40 to 60 tokens per second, fast enough for automated content generation at meaningful volume.

For CPU inference without a GPU: Llama 3.1 8B at Q4 quantization runs at 8 to 15 tokens per second on a modern 8-core CPU. Usable for overnight batch processing or low-frequency automation.

Cost Comparison

Generating 1,000 articles at 800 words each costs approximately $0.96 with GPT-4o-mini. With a local Ollama setup amortized over one year of hardware cost, the per-article cost approaches zero. The crossover point where local inference is cheaper than API calls is approximately 200,000 tokens per month for mid-range GPU hardware.

The practical recommendation: use cloud APIs during development and low-volume testing. Switch to local inference once your pipeline is stable and volume justifies the hardware. The migration is straightforward as the Ollama API is a drop-in replacement for the OpenAI API format.

This publication is built on an AI-assisted content system that Crescevo deploys for B2B tech companies. If your team needs owned media that generates qualified pipeline — see the stack →
AI tools and capabilities change rapidly. Information may be outdated. Not a recommendation to deploy any AI system. Full disclaimer →