The n8n + PostgreSQL Integration Nobody Talks About

Written by elizabethsramek | Published 2026/04/02
Tech Story Tags: postgresql | automation | n8n | software-development | data | salesforce | workflow-automation | engineering

TL;DR: n8n-to-Postgres workflows can lead to problems in production. The root cause is simple: most people use INSERT when they should be using INSERT ... ON CONFLICT.

I've been building data sync workflows for B2B companies for the better part of three years. The stack I keep coming back to is n8n piping data into PostgreSQL. Not because it's trendy.

Because it works at 50 records and at 500,000 records, and because when it breaks at 3 AM, I can actually debug it.

But here's what bothers me: almost every tutorial I find online about n8n + Postgres covers the happy path. Connect the nodes, insert some rows, screenshot the green checkmarks, publish the blog post. Nobody talks about what happens the second time your workflow runs. Or the tenth time. Or what happens when Salesforce hands you 8,000 records but only 2,000 at a time and your workflow doesn't know how to ask for the rest.

This article covers the three problems that will bite you in production and how I solve each one.

Problem 1: The Duplicate Record Disaster

Every n8n-to-Postgres workflow works perfectly once. You query an API, you get records, you insert them into Postgres, you feel great about yourself.

Then the scheduled trigger fires twelve hours later. The workflow runs again. It fetches the same records — plus any new ones — and inserts all of them. Your accounts table now has every record twice. Your reporting dashboard shows double the pipeline. Someone notices three days later when the quarterly numbers don't make sense.

The root cause is simple: most people use INSERT when they should be using INSERT ... ON CONFLICT.

The Fix: PostgreSQL Upserts

PostgreSQL's upsert mechanism is atomic. One SQL statement that either inserts a new row or updates an existing one if a conflict occurs on a unique constraint.

INSERT INTO accounts (
    salesforce_id,
    name,
    industry,
    annual_revenue,
    last_modified_date
)
VALUES (
    '0013600001h8F3EAAU',
    'Acme Corporation',
    'Technology',
    15000000.00,
    '2026-02-20T14:32:00Z'
)
ON CONFLICT (salesforce_id) 
DO UPDATE SET
    name = EXCLUDED.name,
    industry = EXCLUDED.industry,
    annual_revenue = EXCLUDED.annual_revenue,
    last_modified_date = EXCLUDED.last_modified_date;

The EXCLUDED keyword refers to the values you tried to insert. If a row with that salesforce_id already exists, Postgres updates it instead of creating a duplicate.

The critical prerequisite: your table needs a UNIQUE constraint on the field you're matching against. Without it, Postgres throws a confusing error about "no unique constraint matching the ON CONFLICT specification."

CREATE TABLE accounts (
    id SERIAL PRIMARY KEY,
    salesforce_id VARCHAR(18) UNIQUE NOT NULL,
    name VARCHAR(255),
    industry VARCHAR(100),
    annual_revenue NUMERIC(15,2),
    last_modified_date TIMESTAMP,
    synced_at TIMESTAMP DEFAULT NOW()
);

Why Not Conflict on the Postgres id?

I see this mistake constantly. Your table has an auto-incrementing id column. Seems like the obvious conflict target, right?

Wrong. Postgres assigns that id. Salesforce doesn't know it exists. When you sync an Account for the first time, Postgres generates id = 1. When you sync the same Account twelve hours later, there's no id in the incoming data — Postgres assigns id = 4801. No conflict detected. Duplicate created.

The conflict target must be a value that remains stable across syncs and is guaranteed unique in the source system. For Salesforce, that's always the 18-character record ID. For HubSpot, it's the vid or objectId. For Stripe, it's the object ID prefixed with cus_, sub_, etc.

Whatever system you're syncing from, identify the immutable external identifier and use that as your conflict target. Everything else falls apart without this.

Problem 2: The Salesforce Pagination Wall

Salesforce's REST API returns a maximum of 2,000 records per query. Not configurable. If your SOQL query matches 8,000 Accounts, you get 2,000 records and a nextRecordsUrl pointing to the next batch.

The response looks like this:

{
  "totalSize": 8247,
  "done": false,
  "nextRecordsUrl": "/services/data/v60.0/query/01gRM000002VLfEYAW-2000",
  "records": [ ... ]
}

The done flag is false as long as more pages exist. Most n8n tutorials ignore this entirely. They work great on dev orgs with 200 test records. They silently drop 6,000 records in production and nobody notices until the data team asks why the numbers don't match.

The Fix: Manual Pagination Loop in n8n

n8n's built-in Salesforce node handles pagination automatically for standard "Get All" operations. But if you're running custom SOQL via the HTTP Request node — which you'll need for filtered queries, joins, or custom objects — you're building the pagination loop yourself.

The node sequence I use:

[Schedule Trigger] → [Set: Init Variables] → [HTTP Request: Salesforce] → [Code: Extract + Check] → [IF: More Pages?] → [Loop Back or Proceed] → [Postgres Upsert]

The Code node after the HTTP Request does the heavy lifting:

// Runs after the HTTP Request node: pull the records out of the
// Salesforce response and decide whether another page is needed.
const response = $input.first().json;
const records = response.records || [];
const done = response.done;

// nextRecordsUrl is a relative path; prepend your instance URL
const nextUrl = done
  ? null
  : `https://yourinstance.salesforce.com${response.nextRecordsUrl}`;

return [{
  json: {
    records: records,
    nextUrl: nextUrl,
    done: done,
    // page counter for logging; defaults to 0 when the input has no pageCount
    pageCount: (response.pageCount || 0) + 1
  }
}];

The IF node checks done === false. True branch loops back to the HTTP Request node, passing nextUrl forward. False branch proceeds to the Postgres upsert.

One warning: this pattern accumulates records in memory. Works fine up to around 50,000 records. Beyond that, you need to process and upsert each page as it arrives rather than accumulating everything first. I switch to a "process-per-page" pattern at that threshold — each page gets upserted immediately and the records are released from memory before the next page loads.
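Outside n8n's visual editor, the process-per-page pattern can be sketched as a plain loop. Here `fetchPage` and `upsertPage` are hypothetical stand-ins for the HTTP Request and Postgres nodes; the point is the ordering — each page is upserted before the next one is fetched, so only one page is ever held in memory:

```javascript
// Sketch of the process-per-page pattern. fetchPage(nextUrl) returns a
// Salesforce-shaped response ({ records, done, nextRecordsUrl }); both
// functions are hypothetical stand-ins for the workflow's nodes.
async function syncAllPages(fetchPage, upsertPage) {
  let nextUrl = null; // null means "start from the initial SOQL query"
  let pages = 0;
  let total = 0;

  do {
    const response = await fetchPage(nextUrl);
    await upsertPage(response.records); // upsert this page before fetching more
    total += response.records.length;
    pages += 1;
    nextUrl = response.done ? null : response.nextRecordsUrl;
  } while (nextUrl !== null);

  return { pages, total };
}
```

The accumulate-everything version differs only in that the upsert call moves outside the loop — which is exactly why it holds every record in memory at once.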

Problem 3: Schema Mapping That Actually Survives

Salesforce fields and Postgres columns rarely map cleanly. Salesforce sends AnnualRevenue as a float (or null). Postgres expects NUMERIC(15,2). Salesforce sends LastModifiedDate as an ISO string. Postgres wants a proper TIMESTAMP. Salesforce picklist values can contain characters that break your SQL if you're not parameterizing queries.

The two mistakes I see most often:

Mistake 1: Not handling nulls. Salesforce returns null for empty fields. If your Postgres column is NOT NULL and you try to insert a Salesforce record where AnnualRevenue is null, the entire batch fails. Either make the column nullable or provide a default value in your Code node.

Mistake 2: String concatenation instead of parameterized queries. Building SQL strings by concatenating field values is a recipe for broken syncs and SQL injection vulnerabilities. Always use parameterized queries. In n8n's Postgres node, use the "Query Parameters" field. In a Code node executing raw SQL through pg-promise or similar, use $1, $2 placeholders.
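As a minimal sketch of what that separation looks like with pg-style placeholders (the `toParams` helper is mine, not part of any library — it just maps a record from the schema-mapping step onto the positional parameters):

```javascript
// SQL text and values travel separately; the driver handles quoting,
// so a name like "O'Brien Consulting" can't break the statement.
// Column names match the accounts table shown earlier in the article.
const upsertSql = `
  INSERT INTO accounts (salesforce_id, name, industry, annual_revenue, last_modified_date)
  VALUES ($1, $2, $3, $4, $5)
  ON CONFLICT (salesforce_id)
  DO UPDATE SET
    name = EXCLUDED.name,
    industry = EXCLUDED.industry,
    annual_revenue = EXCLUDED.annual_revenue,
    last_modified_date = EXCLUDED.last_modified_date;
`;

// Hypothetical helper: turn a mapped record into the positional parameter array.
function toParams(rec) {
  return [rec.salesforce_id, rec.name, rec.industry, rec.annual_revenue, rec.last_modified];
}

// With pg-promise this would run as something like:
//   db.none(upsertSql, toParams(record));
```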

A Practical Schema Mapping Pattern

I add a Code node between the Salesforce HTTP Request and the Postgres upsert. This node transforms each record into a clean, typed object:

const records = $input.first().json.records;

return records.map(record => ({
  json: {
    salesforce_id: record.Id,
    name: (record.Name || '').substring(0, 255),
    industry: record.Industry || null,
    annual_revenue: record.AnnualRevenue != null
      ? parseFloat(record.AnnualRevenue)
      : null, // != null, so a legitimate 0 isn't coerced to null
    last_modified: record.LastModifiedDate || null
  }
}));

This handles null values, enforces string length limits, and converts types explicitly. It's boring code. It prevents hours of debugging.

The Pattern That Ties It All Together

Here's the complete node sequence for a production-grade Salesforce-to-Postgres sync in n8n:

[Schedule Trigger (every 6 hours)]
  → [Set: Initialize pagination variables]
  → [HTTP Request: Salesforce SOQL query]
  → [Code: Extract records, check pagination]
  → [IF: done === false?]
      ├─ TRUE → loop back to HTTP Request with nextUrl
      └─ FALSE → continue
  → [Code: Schema mapping & type conversion]
  → [SplitInBatches: 500 records per batch]
  → [Postgres: Execute upsert query with parameters]
  → [Code: Log sync stats (records processed, errors)]

A few things to note:

SplitInBatches is critical. Postgres can handle large inserts, but sending 8,000 individual INSERT statements is slow and can time out. Batching at 500 records keeps the transaction size manageable and gives you granular error reporting per batch.
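One way to collapse a batch into a single statement is a multi-row VALUES upsert. The helper below is a sketch of mine (not an n8n built-in), assuming the accounts schema from earlier; it returns the SQL text and the flat parameter array a pg-style driver expects:

```javascript
// Build one multi-row parameterized upsert for a whole batch, so 500
// records become a single statement instead of 500 round trips.
// Hypothetical helper; column names match the accounts table above.
function buildBatchUpsert(records) {
  const cols = ['salesforce_id', 'name', 'industry', 'annual_revenue', 'last_modified_date'];
  const values = [];
  const rows = records.map((rec, i) => {
    // Row i uses placeholders $  (i*5+1) .. (i*5+5)
    const placeholders = cols.map((_, j) => `$${i * cols.length + j + 1}`);
    values.push(rec.salesforce_id, rec.name, rec.industry, rec.annual_revenue, rec.last_modified);
    return `(${placeholders.join(', ')})`;
  });
  const sql = `
    INSERT INTO accounts (${cols.join(', ')})
    VALUES ${rows.join(', ')}
    ON CONFLICT (salesforce_id) DO UPDATE SET
      name = EXCLUDED.name,
      industry = EXCLUDED.industry,
      annual_revenue = EXCLUDED.annual_revenue,
      last_modified_date = EXCLUDED.last_modified_date;
  `;
  return { sql, values };
}
```

At 500 records and 5 columns that is 2,500 parameters per statement, comfortably under Postgres's limit on bind parameters.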

The 6-hour schedule is deliberate. Salesforce API limits depend on your edition, and every API call counts. A query that paginates across 4 pages consumes 4 API calls. Add the OAuth token refresh and you're at 5 per sync. Running every 6 hours means 20 API calls per day for this single workflow. That leaves plenty of headroom for other integrations.

Log everything. I add a final Code node that captures: how many records were fetched, how many were inserted vs. updated, how many errors occurred, and the total execution time. This goes into a sync_log table in Postgres. When something goes wrong (and it will), this table is the first place I look.
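A final Code node along these lines can assemble those stats before they're written to sync_log. This is a sketch under one assumption I'm making up for illustration: that each processed record has been tagged with an `action` of 'insert', 'update', or 'error' somewhere upstream — the field names are mine, not a standard schema:

```javascript
// Assemble sync stats for the sync_log table. The `action` tag on each
// result and the stat field names are assumptions for this sketch;
// match them to your own upstream nodes and sync_log columns.
function buildSyncStats(results, startedAt) {
  const countBy = (action) => results.filter((r) => r.action === action).length;
  return {
    records_fetched: results.length,
    records_inserted: countBy('insert'),
    records_updated: countBy('update'),
    error_count: countBy('error'),
    duration_ms: Date.now() - startedAt
  };
}
```

The resulting object maps one-to-one onto a parameterized INSERT into sync_log, using the same placeholder technique as the upserts above.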

Where This Gets More Interesting

This article covers the core pattern. But production environments introduce complications that go beyond what I can cover here: composite keys for junction objects, incremental sync using LastModifiedDate windows to avoid re-processing unchanged records, handling Salesforce's SOQL query length limits, and managing API rate limits when you're running multiple workflows against the same org.

I wrote a more detailed breakdown of the complete upsert blueprint for Salesforce-to-PostgreSQL via n8n that covers composite keys, junction objects, and the edge cases you'll hit at scale. If you're building this for production, that's worth reading before you ship.

Why This Stack Over Alternatives

People ask me why I don't just use Fivetran, Airbyte, or a managed ETL service. Fair question. For companies with a data team and a budget for another SaaS subscription, those tools are fine. They abstract away pagination, schema mapping, and upsert logic so you don't think about it.

But I work with B2B companies that have 3-10 person ops teams. They don't have a data engineer. They can't justify $500/month for a connector they'll use for one integration. They need something they can inspect, modify, and debug without calling vendor support.

n8n + Postgres gives them that. The workflow is visible. The SQL is readable. When it breaks, someone on the team can open the execution log, see exactly which node failed and why, and fix it without waiting for a support ticket response. That operational transparency is worth more than the convenience of managed tooling.

The pattern isn't sexy. It's a scheduled trigger, a pagination loop, a schema mapper, and an upsert query. But it runs, it doesn't create duplicates, and it handles 500,000 records without drama. That's all I ask of infrastructure.


Written by elizabethsramek | Elizabeth Sramek is a B2B strategy advisor and entrepreneur, building AI-first strategies that outperform, and outlast.
Published by HackerNoon on 2026/04/02