Key Takeaways
Building a payment processing system means building five separate engineering layers that work in sequence.
These include:
Most fintech teams we’ve worked with begin their first payment system by integrating Stripe (or Adyen, or Braintree), calling the charge API, storing the result, and showing the user a confirmation. This can work for a while, but a system built for the happy path and nothing else eventually fails in a handful of predictable ways.
Let’s look at how you can build each of the layers needed for a scalable, reliable payment processing system, in the order data moves through it.
At Trio, we provide companies with experienced fintech developers from regions like LATAM, who are familiar with building production payment processing systems and can be placed in as little as 3-5 days.
Every production payment processing system starts with a payment intent. This is a record created in the database before any PSP call gets made.
The payment intent is ultimately what separates a production system from a gateway wrapper. Skipping it opens the door to double charges, one of the most expensive failure modes we have encountered in payment engineering.
In a gateway wrapper, the flow is that the user clicks Pay, the application calls the PSP, the PSP responds, and the application stores the result.
The problem appears when the PSP call succeeds but the response gets lost. Usually, this happens because of a network drop, an application crash, or a basic timeout.
In any of these cases, the application doesn't know whether the payment occurred, so when the user retries, a second PSP call goes out and a duplicate charge occurs.
An intent-first system changes the flow to prevent this. The user clicks Pay, the application creates a payment intent record, the application calls the PSP with the intent's idempotency key, the PSP responds, and finally the application updates the intent record.
If the response gets lost and the user retries, the application retrieves the existing intent, calls the PSP with the same idempotency key, and the PSP returns the result of the first and only processing attempt.
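The intent-first flow can be sketched in a few lines. This is a minimal illustration, not a production implementation: `FakePSP`, `PaymentService`, the in-memory `intents` dictionary, and the `simulate_lost_response` flag are all hypothetical names introduced here, and the fake PSP simply assumes the provider returns the stored result when it sees a repeated idempotency key.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaymentIntent:
    idempotency_key: str
    amount_cents: int
    currency: str
    status: str = "CREATED"
    psp_reference: Optional[str] = None


class FakePSP:
    """Hypothetical stand-in for a PSP that honors idempotency keys."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}
        self.charges_processed = 0

    def charge(self, idempotency_key: str, amount_cents: int, currency: str) -> str:
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key not in self._results:
            self.charges_processed += 1
            self._results[idempotency_key] = f"psp_{uuid.uuid4().hex[:12]}"
        return self._results[idempotency_key]


class PaymentService:
    def __init__(self, psp: FakePSP) -> None:
        self.psp = psp
        self.intents: dict[str, PaymentIntent] = {}  # stands in for a database table

    def pay(self, idempotency_key: str, amount_cents: int, currency: str,
            simulate_lost_response: bool = False) -> PaymentIntent:
        # 1. Create the intent before any PSP call, or reuse the existing one.
        intent = self.intents.setdefault(
            idempotency_key, PaymentIntent(idempotency_key, amount_cents, currency)
        )
        if intent.status == "AUTHORIZED":
            return intent  # already resolved, no PSP call needed
        # 2. Call the PSP with the intent's idempotency key.
        reference = self.psp.charge(
            intent.idempotency_key, intent.amount_cents, intent.currency
        )
        if simulate_lost_response:
            # The charge happened at the PSP, but the application never saw it.
            raise TimeoutError("PSP response lost")
        # 3. Update the intent record with the outcome.
        intent.psp_reference = reference
        intent.status = "AUTHORIZED"
        return intent
```

Calling `pay` once with `simulate_lost_response=True` and then again with the same key leaves the fake PSP with a single processed charge, which is the behavior the intent record exists to guarantee.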
Production-grade idempotency needs three independent enforcement points to work reliably: the API gateway, the payment service, and the PSP.
Three-layer idempotency prevents most duplicate submissions, but one failure mode remains.
In that case, the PSP processes the charge, but the database write fails before it commits.
On retry, the intent doesn't exist in the database, so the system creates a new intent and submits a second PSP call with a different key, which ultimately leads to a second, duplicate charge.
The most common production solution that our engineers implement here is a write-ahead log or outbox record, which is created before the PSP call and deleted only after the database write succeeds.
On startup or after a crash, unresolved write-ahead records trigger a PSP status query ("what happened to this payment?") rather than a new charge attempt.
For this to work, the idempotency keys must be generated by the client, not the server, so that the same key travels through every retry.
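The write-ahead approach described above might look like the following sketch. Everything here is an assumption made for illustration: `StubPSP` and its `get_status` method stand in for a real provider's status API, plain dictionaries stand in for the durable log and the database, and `fail_db_write` simulates the crash between the PSP call and the commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WalRecord:
    idempotency_key: str
    amount_cents: int


class StubPSP:
    """Hypothetical PSP with a charge call and a status query."""

    def __init__(self) -> None:
        self.charges: dict[str, str] = {}

    def charge(self, key: str, amount_cents: int) -> str:
        return self.charges.setdefault(key, "AUTHORIZED")

    def get_status(self, key: str) -> Optional[str]:
        return self.charges.get(key)  # None means the PSP never saw this key


def charge_with_wal(wal: dict, db: dict, psp: StubPSP, key: str,
                    amount_cents: int, fail_db_write: bool = False) -> None:
    wal[key] = WalRecord(key, amount_cents)  # 1. durable record before the PSP call
    status = psp.charge(key, amount_cents)   # 2. PSP call with the client's key
    if fail_db_write:
        raise RuntimeError("database write failed")  # simulated crash
    db[key] = status                         # 3. commit the result
    del wal[key]                             # 4. clear the record only after the commit


def recover(wal: dict, db: dict, psp: StubPSP) -> None:
    """Resolve leftover records by asking the PSP, never by charging again."""
    for key in list(wal):
        status = psp.get_status(key)
        # NOT_SUBMITTED is a hypothetical marker: the PSP never saw the key,
        # so a later retry with the same key is safe.
        db[key] = status if status is not None else "NOT_SUBMITTED"
        del wal[key]
```

The important property is that recovery only ever reads from the PSP. A new charge attempt is left to the normal retry path, which reuses the original key.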
Every payment service provider carries a different API surface, response format, error taxonomy, and behavior under failure.
For a declined card, Stripe returns a 402 Payment Required with a structured error object. Adyen returns 200 OK with a resultCode field. Braintree uses its own object hierarchy.
If you build your payment processing logic directly against any one of these, provider-specific parsing contaminates the business logic, and adding a second PSP later means reworking your core processing logic.
The production solution we often end up implementing here is a PSP adapter layer that translates provider-specific responses into a canonical internal domain model.
Each PSP gets its own adapter responsible for constructing the provider-specific API request from the canonical payment intent, translating the provider response into a canonical result (status: AUTHORIZED / DECLINED_SOFT / DECLINED_HARD / ERROR, psp_reference, raw_response), and mapping provider error codes to canonical decline classifications.
The orchestration layer then interacts only with the canonical model. Adding a new PSP means writing a new adapter, without touching your core payment processing logic at all.
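A rough sketch of two adapters producing the same canonical result is shown below. The response shapes are simplified dictionaries rather than real SDK objects, the class names are hypothetical, and the decline codes listed as soft are illustrative examples only. A real mapping should be built from each provider's current documentation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    AUTHORIZED = "AUTHORIZED"
    DECLINED_SOFT = "DECLINED_SOFT"
    DECLINED_HARD = "DECLINED_HARD"
    ERROR = "ERROR"


@dataclass(frozen=True)
class CanonicalResult:
    status: Status
    psp_reference: Optional[str]
    raw_response: dict


class StripeStyleAdapter:
    """Assumes declines arrive as HTTP 402 with an `error` object."""

    SOFT_CODES = {"insufficient_funds", "issuer_not_available"}  # illustrative

    def parse(self, http_status: int, body: dict) -> CanonicalResult:
        if http_status == 200:
            return CanonicalResult(Status.AUTHORIZED, body.get("id"), body)
        if http_status == 402:
            error = body.get("error", {})
            # Unknown codes are treated as hard so they are never auto-retried.
            soft = error.get("decline_code") in self.SOFT_CODES
            status = Status.DECLINED_SOFT if soft else Status.DECLINED_HARD
            return CanonicalResult(status, error.get("charge"), body)
        return CanonicalResult(Status.ERROR, None, body)


class AdyenStyleAdapter:
    """Assumes declines arrive as HTTP 200 with a `resultCode` field."""

    SOFT_REASONS = {"Not enough balance", "Issuer Unavailable"}  # illustrative

    def parse(self, http_status: int, body: dict) -> CanonicalResult:
        reference = body.get("pspReference")
        if http_status != 200:
            return CanonicalResult(Status.ERROR, reference, body)
        result_code = body.get("resultCode")
        if result_code == "Authorised":
            return CanonicalResult(Status.AUTHORIZED, reference, body)
        if result_code == "Refused":
            soft = body.get("refusalReason") in self.SOFT_REASONS
            status = Status.DECLINED_SOFT if soft else Status.DECLINED_HARD
            return CanonicalResult(status, reference, body)
        return CanonicalResult(Status.ERROR, reference, body)
```

Treating unrecognized decline codes as hard is a deliberately conservative choice in this sketch. It trades a few missed retries for the guarantee that an unmapped code never triggers automatic re-attempts.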
The right retry strategy for a failed payment depends on why it failed.
Soft declines can generally be resolved. Examples include insufficient funds at this moment, an issuer requesting 3DS authentication, or temporary issuer unavailability.
Hard declines, however, won't resolve no matter when you retry. Common examples include a stolen card, an invalid card number, or a closed account.
Never retry hard declines automatically. Repeated attempts on a stolen card trigger fraud alerts and can get a merchant account flagged or even terminated.
The adapter layer maps each PSP's error codes to soft or hard declines, and that classification drives the retry logic in the orchestration layer.
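As a sketch of how the classification could drive retries, the function below decides what to do after a failed attempt. The name `next_action`, the attempt limit, and the exponential backoff schedule are assumptions chosen for illustration, not recommended values.

```python
from typing import Optional

# Canonical statuses that may be retried automatically. Hard declines are absent.
RETRYABLE = {"DECLINED_SOFT", "ERROR"}


def next_action(status: str, attempts_made: int, max_attempts: int = 3,
                base_delay_s: float = 60.0) -> tuple[str, Optional[float]]:
    """Return ("retry", delay_in_seconds) or ("stop", None).

    attempts_made counts attempts already performed and is expected to be >= 1.
    """
    if status not in RETRYABLE or attempts_made >= max_attempts:
        return ("stop", None)
    # Exponential backoff: base delay after the first attempt, doubling each time.
    delay = base_delay_s * 2 ** (attempts_made - 1)
    return ("retry", delay)
```

Every retry should reuse the original idempotency key, so a retry after an `ERROR` whose outcome is unknown cannot create a second charge.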
When a PSP starts returning elevated error rates, such as 5xx responses, timeouts, or connection failures, retrying with the same provider often makes things worse.
A circuit breaker monitors PSP health so that, after N consecutive failures within a window, you can route traffic to a secondary PSP for a cooldown period before re-testing the primary.
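A minimal circuit breaker along these lines is sketched below. It is simplified: it counts consecutive failures without a time window, and the class name, thresholds, and `choose_psp` helper are hypothetical. The injectable `clock` parameter exists only to make the behavior easy to test.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True
        # After the cooldown, let traffic through again to re-test the primary.
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self._consecutive_failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._consecutive_failures += 1
        if self._consecutive_failures >= self.failure_threshold:
            # Open the circuit, or restart the cooldown if a re-test failed.
            self._opened_at = self._clock()


def choose_psp(primary_breaker: CircuitBreaker) -> str:
    """Route to the primary PSP unless its breaker is open."""
    return "primary" if primary_breaker.allow_request() else "secondary"
```

A production version would usually limit how many probe requests pass through after the cooldown and track failures per time window, as described above.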
Multi-PSP routing also enables authorization rate optimization.
You can automatically route specific BIN ranges, currencies, or transaction types to the PSP that historically delivers the best authorization rate for those characteristics, giving each transaction the best chance of success.
Approval rates will likely rise by only a percentage point or two, but across hundreds of thousands of transactions that adds up to thousands of additional approved payments.
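One simple way to express this kind of routing is to track approval counts per segment and pick the best performer once enough data exists. The sketch below is an assumption-laden illustration: `AuthRateRouter`, the segment key of BIN prefix plus currency, and the `min_samples` cutoff are all hypothetical.

```python
from collections import defaultdict


class AuthRateRouter:
    def __init__(self, psps: list[str], default: str, min_samples: int = 100) -> None:
        self.psps = psps
        self.default = default
        self.min_samples = max(min_samples, 1)
        # (psp, bin_prefix, currency) -> [approved_count, total_count]
        self._stats = defaultdict(lambda: [0, 0])

    def record(self, psp: str, bin_prefix: str, currency: str, approved: bool) -> None:
        stats = self._stats[(psp, bin_prefix, currency)]
        stats[0] += int(approved)
        stats[1] += 1

    def choose(self, bin_prefix: str, currency: str) -> str:
        best_psp, best_rate = self.default, -1.0
        for psp in self.psps:
            approved, total = self._stats.get((psp, bin_prefix, currency), (0, 0))
            if total < self.min_samples:
                continue  # not enough history to trust this PSP's rate
            rate = approved / total
            if rate > best_rate:
                best_psp, best_rate = psp, rate
        return best_psp
```

Segments without enough history fall back to the default PSP, which keeps a handful of early results from steering traffic.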
A payment doesn't have just two states. Instead, it passes through a sequence of states, each representing a distinct financial and operational condition, with specific actions required and specific transitions permitted at each step.

Each transition carries an explicit and named event, so there are no implicit state changes.
Each state has defined permitted transitions (what states can follow, under what conditions), prohibited transitions (a SETTLED payment cannot transition directly to AUTHORIZED; a REFUNDED payment cannot be re-CAPTURED), and required side effects (a CAPTURED payment must trigger a ledger posting; a REFUND_INITIATED must reduce the available refund amount).
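A transition table keyed by state and event is one common way to encode these rules. The sketch below is illustrative: the event names, the `CREATED` and `DECLINED` states, and the side-effect labels are assumptions added here, and a real system would define transitions to match its own business rules.

```python
# (current_state, event) -> next_state. Anything not listed is prohibited.
TRANSITIONS = {
    ("CREATED", "authorization_requested"): "AUTHORIZING",
    ("AUTHORIZING", "authorization_succeeded"): "AUTHORIZED",
    ("AUTHORIZING", "authorization_declined"): "DECLINED",
    ("AUTHORIZED", "capture_requested"): "CAPTURING",
    ("CAPTURING", "capture_succeeded"): "CAPTURED",
    ("CAPTURED", "settlement_confirmed"): "SETTLED",
    ("CAPTURED", "refund_requested"): "REFUND_INITIATED",
    ("SETTLED", "refund_requested"): "REFUND_INITIATED",
    ("REFUND_INITIATED", "refund_succeeded"): "REFUNDED",
}

# Side effects that must run when a payment enters a state.
SIDE_EFFECTS = {
    "CAPTURED": ["post_ledger_entry"],
    "REFUND_INITIATED": ["reduce_refundable_amount"],
}


class InvalidTransition(Exception):
    """Raised when an event is not permitted in the current state."""


def apply_event(state: str, event: str) -> tuple[str, list[str]]:
    """Return the next state and the side effects it requires."""
    try:
        new_state = TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidTransition(
            f"{event!r} is not permitted in state {state}"
        ) from None
    return new_state, list(SIDE_EFFECTS.get(new_state, []))
```

Because prohibited transitions are simply absent from the table, a SETTLED payment receiving an authorization event fails loudly instead of silently changing state.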
One of the most common production failure modes that we see here happens when a payment reaches AUTHORIZING, the PSP call fires, the response gets lost, and the payment stays stuck in AUTHORIZING indefinitely.
Meanwhile, the PSP has processed the authorization and is waiting for capture. The payment is live at the PSP, but this is entirely invisible to the application.
To fix this, we create a background worker that periodically scans for payments stuck in transitional states beyond a timeout threshold.
For each stuck payment that the worker finds, it queries the PSP status API directly, and the PSP's answer drives the state transition in place of the original response that never arrived.
This pull-based reconciliation treats the PSP as the source of truth for the payment state. It produces eventual consistency even after application crashes, network failures, and lost responses, without requiring the user to re-attempt.
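One pass of such a worker could look like the sketch below. The `Payment` record, the five-minute timeout, and the `query_psp_status` callable are assumptions for illustration. The callable is expected to return a canonical state such as `"AUTHORIZED"`, or `None` when the PSP has no answer yet.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

TRANSITIONAL_STATES = {"AUTHORIZING", "CAPTURING"}


@dataclass
class Payment:
    id: str
    state: str
    updated_at: datetime


def find_stuck(payments: list[Payment], now: datetime,
               timeout: timedelta = timedelta(minutes=5)) -> list[Payment]:
    """Payments sitting in a transitional state for longer than the timeout."""
    return [
        p for p in payments
        if p.state in TRANSITIONAL_STATES and now - p.updated_at > timeout
    ]


def reconcile_stuck(payments: list[Payment],
                    query_psp_status: Callable[[str], Optional[str]],
                    now: datetime,
                    timeout: timedelta = timedelta(minutes=5)) -> list[str]:
    """Run one worker pass and return the ids of payments that were resolved."""
    resolved = []
    for payment in find_stuck(payments, now, timeout):
        psp_state = query_psp_status(payment.id)  # the PSP is the source of truth
        if psp_state is None or psp_state == payment.state:
            continue  # still unknown, try again on the next pass
        payment.state = psp_state
        payment.updated_at = now
        resolved.append(payment.id)
    return resolved
```

In a real system the state change would go through the same state machine as every other transition, so the required side effects still run.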
Every payment event that changes the financial state of the system requires a corresponding ledger entry. Authorization places a hold on the funds.
Capture reduces the hold and increases the ledger balance. Settlement moves funds from pending to settled. Refund and chargeback each generate reversal entries.
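To make the ledger side concrete, here is a small double-entry sketch. It is an illustration under assumptions: amounts are integer cents, the account names are invented, holds are left out, and an in-memory list stands in for a database table written inside a transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    account: str
    debit_cents: int = 0
    credit_cents: int = 0


class Ledger:
    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def post(self, entries: list[Entry]) -> None:
        """Record a set of entries only if total debits equal total credits."""
        debits = sum(e.debit_cents for e in entries)
        credits = sum(e.credit_cents for e in entries)
        if debits != credits or debits == 0:
            raise ValueError(f"unbalanced posting: debits={debits}, credits={credits}")
        self.entries.extend(entries)  # all entries are recorded together or not at all

    def balance(self, account: str) -> int:
        """Debit-positive balance of one account, in cents."""
        return sum(
            e.debit_cents - e.credit_cents for e in self.entries if e.account == account
        )


def post_capture(ledger: Ledger, amount_cents: int) -> None:
    ledger.post([
        Entry("psp_receivable_pending", debit_cents=amount_cents),
        Entry("merchant_payable", credit_cents=amount_cents),
    ])


def post_settlement(ledger: Ledger, amount_cents: int) -> None:
    # Settlement moves funds from pending to settled.
    ledger.post([
        Entry("bank_settled", debit_cents=amount_cents),
        Entry("psp_receivable_pending", credit_cents=amount_cents),
    ])
```

Because every posting must balance, the sum of all account balances stays at zero. A refund or chargeback would be posted as the reverse of the original entries.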
To do this correctly, in a way that is going to hold up under regulatory scrutiny, there are two foundational requirements:
PSPs communicate payment events through webhooks: HTTP POST requests sent to a configured endpoint when the payment state changes.
Stripe retries webhook delivery for up to 72 hours on failure. Adyen, on the other hand, retries for 24 hours.
In practice, this means the same webhook event can arrive at your endpoint multiple times. Delivery is at-least-once, so a handler that assumes each event arrives exactly once produces duplicate refunds, duplicate ledger postings, and duplicate customer notifications at scale.
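A deduplication table keyed by event ID is one straightforward way to make a handler idempotent. The sketch below uses SQLite from the Python standard library as a stand-in for the production database. The table names, the event shape, and the `handle_webhook` function are assumptions for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE refunds (payment_id TEXT, amount_cents INTEGER)")
    conn.commit()
    return conn


def handle_webhook(conn: sqlite3.Connection, event: dict) -> int:
    """Process a refund event at most once and return the HTTP status to send."""
    try:
        with conn:  # commits on success, rolls back if anything raises
            # The primary key rejects an event ID that was already recorded.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event["id"],)
            )
            conn.execute(
                "INSERT INTO refunds (payment_id, amount_cents) VALUES (?, ?)",
                (event["payment_id"], event["amount_cents"]),
            )
    except sqlite3.IntegrityError:
        # Duplicate delivery: acknowledge it so the PSP stops retrying.
        return 200
    return 200
```

Recording the event ID and applying its effect in the same transaction matters. If the two were committed separately, a crash between them could either lose the event or apply it twice.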
Processing a webhook event typically triggers downstream effects like a state machine transition, ledger entry, customer notification, or even just an analytics update.
If any one of these fails after others succeed, the system lands in an inconsistent state.
The outbox pattern handles this.
When processing the webhook, you need to write the state transition, the ledger entry, and the outbox events in a single atomic database transaction. A separate outbox processor publishes the events to downstream consumers.
If the outbox write fails, the entire transaction rolls back, and the webhook gets re-delivered and re-processed cleanly. If the outbox write succeeds but publishing fails, the outbox processor retries publishing without needing to re-process the original webhook.
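The two halves of the outbox pattern can be sketched with SQLite standing in for the production database. The schema, topic names, and function names below are assumptions for illustration, and `publish` is any callable that hands an event to a message broker.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE payments (id TEXT PRIMARY KEY, state TEXT NOT NULL);
CREATE TABLE ledger_entries (payment_id TEXT, account TEXT, amount_cents INTEGER);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def process_capture_webhook(conn, payment_id, amount_cents,
                            topics=("customer.notification", "analytics.payment_captured")):
    """Write the state change, ledger entries, and outbox events atomically."""
    payload = json.dumps({"payment_id": payment_id, "amount_cents": amount_cents})
    with conn:  # one transaction: everything commits or everything rolls back
        conn.execute("UPDATE payments SET state = 'CAPTURED' WHERE id = ?", (payment_id,))
        conn.execute("INSERT INTO ledger_entries VALUES (?, ?, ?)",
                     (payment_id, "psp_receivable", amount_cents))
        conn.execute("INSERT INTO ledger_entries VALUES (?, ?, ?)",
                     (payment_id, "merchant_payable", -amount_cents))
        for topic in topics:
            conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                         (topic, payload))


def publish_pending(conn, publish) -> int:
    """Publish unpublished outbox rows in order and return how many were sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, topic, payload in rows:
        try:
            publish(topic, json.loads(payload))
        except Exception:
            break  # leave this row and the later ones pending for the next run
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

A crash between a successful publish and marking the row as published means that event is sent again on the next run, so downstream consumers still need to tolerate duplicates.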
Now that you understand the basic layers of a payment processing system, how can you make sure that those layers are all compliant?
Every component that stores, processes, or transmits cardholder data (the PAN, CVV, or full card expiration) falls within PCI DSS scope. That scope determines the cost and complexity of PCI certification.
The engineering goal is to minimize PCI DSS scope by never touching raw cardholder data inside your own systems.
Instead, you can use PSP-hosted payment pages or JavaScript libraries like Stripe Elements or Adyen Web Drop-in for card data collection.
With these, card data travels directly from the browser to the PSP's servers, so your application never touches the raw PAN or CVV.
On top of that, we recommend that you use tokenization for all subsequent operations, like capture, refunds, and recurring billing. The token carries no value outside the PSP relationship, so it can be stored safely in your database.
For card-on-file use cases such as subscription payments, network tokens issued by Visa and Mastercard follow the card across re-issues and expiry updates.
Authorization rates on recurring payments improve because the token stays valid when the underlying card gets replaced. PCI scope stays minimal because the raw card number never enters your infrastructure.
This architectural choice determines whether your PCI certification runs as SAQ A, the shortest self-assessment questionnaire, or as SAQ D, which requires a full security assessment across the entire application stack.
Payment processing doesn't end at capture.
Captured funds need to settle through the acquiring bank to the merchant's bank account, and this settlement process introduces timing gaps, multi-party data flows, and reconciliation requirements distinct from real-time transaction processing.
Card network transactions typically settle T+1 or T+2. In other words, funds captured on Monday arrive in the merchant account on Tuesday or Wednesday.
ACH and bank transfer settlements follow NACHA's same-day or standard windows. Instant payment rails built on ISO 20022 messaging, such as FedNow, settle in seconds.
What your production payment system needs to do is model settlement timing explicitly. A captured payment isn't a settled payment, and your ledger needs to track the difference between those two states.
Discrepancies need to surface automatically as well. Letting them accumulate until a month-end manual review means revenue leakage and fraud signals compound undetected for weeks.
Automated reconciliation running daily, immediately after each settlement window closes, catches these discrepancies the same day they appear. The finance team reviews the exceptions that the pipeline surfaces; they don't run the pipeline themselves.
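The core of a daily reconciliation run is a keyed comparison between internal capture records and the rows of the settlement file. The sketch below assumes both sides carry a shared PSP reference and a gross amount in integer cents. The function name and exception labels are hypothetical, and real settlement files usually also include fees and currency fields that this example ignores.

```python
def reconcile(captures: dict[str, int], settlement_rows: list[dict]) -> list[dict]:
    """Compare captured amounts with settled amounts and return the exceptions."""
    exceptions = []
    settled = {row["psp_reference"]: row["amount_cents"] for row in settlement_rows}

    for reference, captured_cents in captures.items():
        if reference not in settled:
            # Captured internally but absent from the settlement file.
            exceptions.append(
                {"psp_reference": reference, "type": "MISSING_FROM_SETTLEMENT"}
            )
        elif settled[reference] != captured_cents:
            exceptions.append({
                "psp_reference": reference,
                "type": "AMOUNT_MISMATCH",
                "captured_cents": captured_cents,
                "settled_cents": settled[reference],
            })

    for reference in settled:
        if reference not in captures:
            # Settled by the PSP with no matching internal capture record.
            exceptions.append({"psp_reference": reference, "type": "UNKNOWN_SETTLEMENT"})

    return exceptions
```

Only the returned exceptions need human attention. Matched rows can be marked as settled in the ledger automatically.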
The layers above map to specific architectural components, each guarding against a specific failure mode. This represents what most production-grade systems converge on, not the only valid approach.
| Layer | Failure Mode Without It | How the Layer Prevents It |
| --- | --- | --- |
| Payment intent + idempotency | Network timeout causes a retry, and the user is charged twice. | Three-layer idempotency + write-ahead log prevents duplicate PSP submission. |
| PSP abstraction | PSP API change breaks payment logic. Hard decline is treated as soft decline. | Adapter layer isolates PSP-specific parsing. Canonical decline classification drives the correct retry. |
| Payment state machine | PSP confirms authorization; database shows PENDING; payment stuck indefinitely. | Pull-based reconciliation resolves stuck states against the PSP source of truth. |
| Ledger (double-entry) | Balance drift: account shows $500, sum of entries shows $497.23. | Atomic double-entry is enforced at the DB transaction layer, so entries can no longer drift from the balance. |
| Webhook idempotency | Stripe retries the webhook 3x over 72 hours. The refund is processed three times. | Event ID dedup table. An idempotent handler returns 200 without re-processing. |
| PCI scope minimization | Internal systems store raw PAN. PCI DSS audit scope expands to the entire application stack. | PSP-hosted tokenization where raw card data never enters internal systems. |
| Settlement reconciliation | Captured transaction settles for the wrong amount, and the discrepancy is undetected for 30 days. | Daily automated matching against the PSP settlement file; exceptions surface the same day. |
Not every layer needs to be built from scratch. Building a layer you don't need to own costs time and money that you could spend on other parts of your financial application.
Modern PSPs and infrastructure platforms handle significant portions of this stack well.
The right split depends on your specific requirements, but we recommend that you let the PSP own PCI compliance for card data capture (use hosted payment pages or JS libraries), card network connectivity, fraud scoring (Stripe Radar, Adyen RevenueProtect), chargeback dispute management, currency conversion, and international payment method support.
You can build the other layers yourself: payment intent and idempotency (PSPs don't provide this layer for your internal state), PSP abstraction and adapter patterns (necessary for multi-PSP portability), payment state machines with your business-specific transitions, double-entry ledger integrations, webhook processing with outboxes for downstream events, and the settlement reconciliation pipeline.
Payment orchestration platforms also work well for teams that want pre-built multi-PSP routing (Spreedly, Primer, Corefy) or hosted reconciliation infrastructure without building the entire stack internally.
The failures covered above rarely surface in code review. Instead, they surface in production, often weeks after deployment, when the right combination of network failure, concurrent requests, and PSP retry behavior finally produces the incident.
Engineers who build payment systems correctly have encountered these failure modes before.
At Trio, we place pre-vetted engineers who have built production payment processing systems across ACH, card networks, FedNow, and open banking rails.
Request a consult.
A payment stuck in a transitional state, typically AUTHORIZING or CAPTURING, usually means the PSP call fired and the response was lost before the application could record it. A background worker solves this by periodically scanning for payments stuck in these states beyond a timeout threshold, then querying the PSP status API directly for each one.
The PSP adapter pattern wraps each payment service provider in an adapter layer that translates provider-specific API responses into a canonical internal domain model. Stripe, Adyen, and Braintree each use different response formats and error codes, and without an adapter layer, adding a second PSP requires modifying core payment logic.
Settlement reconciliation compares the payment system’s internal capture records against the PSP’s daily settlement file to detect discrepancies that may indicate revenue leakage, fraud, or unreported financial errors. Reconciliation works best when it runs as an automated engineering pipeline rather than as a manual month-end review.
A soft decline represents a payment failure that may resolve on retry, like insufficient funds at this moment, an issuer requesting 3DS authentication, or temporary issuer unavailability. A hard decline won’t resolve (a stolen card, an invalid card number, or a closed account) and must never be retried automatically.
Idempotency in a payment processing system means that retrying a failed payment request never produces a duplicate charge. It requires a payment intent record created in the database before any PSP call, a client-generated idempotency key submitted with every payment request, and enforcement at the API gateway, payment service, and PSP.