Three patterns we use for third-party API failures
Nearly every platform we build integrates at least one third-party API. Payments, analytics, email delivery, authentication, logistics. When those APIs fail, which they do, the question is not whether our code is correct. The question is whether the platform still behaves reasonably for the user in front of it.
Here are three patterns that come up in almost every project we work on. They are not novel. They are deliberately boring. The value comes from applying them consistently.
1. Circuit breakers around flaky dependencies
A circuit breaker watches for repeated failures from a dependency and, once a threshold is crossed, stops calling it for a short period. This matters for two reasons.
First, it protects the downstream service from a thundering herd when it is already struggling. Second, and often more important, it protects your own service from exhausting threads, connection pools or memory on calls that are likely to fail.
A practical starting point is three thresholds: the number of consecutive failures that open the breaker, the cooldown period, and the number of successful probes required to close it again. For most integrations we start at five failures, a 30-second cooldown and three successful probes. Tune from there based on real traffic.
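That state machine can be sketched in a few dozen lines. This is an illustration using the starting thresholds above, not a reference implementation; the class and method names are invented for this example, and a production version would also need locking for concurrent callers.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch: closed -> open -> half_open -> closed."""

    def __init__(self, failure_threshold=5, cooldown_seconds=30.0, probe_successes=3):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.probe_successes = probe_successes
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.cooldown_seconds:
                self.state = "half_open"  # cooldown over: let probes through
                self.successes = 0
                return True
            return False  # still cooling down: fail fast
        return True  # closed or half_open

    def record_success(self):
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.probe_successes:
                self.state = "closed"  # enough probes succeeded: resume normal traffic
                self.failures = 0
        else:
            self.failures = 0  # any success resets the consecutive-failure count

    def record_failure(self):
        if self.state == "half_open":
            self._open()  # a failed probe reopens immediately
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self._open()

    def _open(self):
        self.state = "open"
        self.opened_at = time.monotonic()
        self.failures = 0
```

The caller wraps each dependency call in `allow_request` and reports the outcome with `record_success` or `record_failure`; when `allow_request` returns false, it skips the call and goes straight to whatever fallback it has.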
2. Exponential backoff with jitter
When a call fails and is worth retrying, you almost always want to wait before trying again. Constant-interval retries from many clients synchronize into a retry storm that the upstream service will not appreciate.
Exponential backoff doubles the wait time between attempts. Jitter adds a random element so that many concurrent callers do not all retry at the same moment. Without jitter, clients that failed at the same time will retry at the same time, and the upstream sees the original load spike replayed at every retry interval.
Keep retry counts low for user-facing code paths. A user waiting for a checkout confirmation does not want to stare at a spinner through five retry attempts. Two attempts is often the right answer. For background jobs you can afford more.
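The delay calculation is small enough to show in full. This sketch uses the "full jitter" variant, where each wait is a uniform random time up to an exponentially growing cap; the function name and default values are illustrative.

```python
import random

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter exponential backoff.

    attempt -- zero-based retry attempt number
    base    -- initial delay ceiling in seconds (illustrative default)
    cap     -- maximum delay ceiling in seconds (illustrative default)
    """
    # Exponential ceiling: base, 2*base, 4*base, ... clamped at cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, ceiling] so concurrent
    # callers spread out instead of retrying in lockstep.
    return random.uniform(0.0, ceiling)
```

The caller sleeps for `backoff_delay(attempt)` between attempts, stopping after the retry budget for that code path (two attempts for user-facing calls, more for background jobs) is exhausted.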
3. Degraded-mode responses
The most common mistake we see is code that assumes a third-party call must succeed. Dashboards that refuse to render if the analytics API is slow. Order pages that show a blank screen if the shipping estimator is unreachable.
In almost every case, there is a degraded-mode response that is better than an error. Cached data from an earlier call. A reasonable default. A message that explains what is missing without failing the whole page. The pattern is to identify, at design time, which parts of the page or response are essential and which are enhancements.
This sounds obvious, but it is an architectural decision, not an implementation detail. It is much harder to retrofit graceful degradation into a platform that was built assuming all calls succeed.
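As a sketch of the essential-versus-enhancement split, consider a hypothetical order page where the order itself is essential and the shipping estimate is an enhancement. All names here are invented for illustration; `estimate_shipping` stands in for whatever client the real system uses.

```python
def render_order_page(order, estimate_shipping, cache):
    """Build an order page that degrades gracefully if the shipping
    estimator is down. Sketch only; the page shape is illustrative."""
    # Essential: without the order there is no page, so this is not guarded.
    page = {"order": order}
    try:
        estimate = estimate_shipping(order["id"])
        cache[order["id"]] = estimate  # refresh the fallback for next time
        page["shipping"] = estimate
    except Exception:
        # Enhancement failed: prefer stale cached data, then an honest message.
        page["shipping"] = cache.get(order["id"], "Estimate unavailable right now")
    return page
```

The important design decision is not in the `except` clause; it is that the shipping estimate was classified as an enhancement in the first place, so there is somewhere sensible for the failure to land.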
Putting it together
None of these patterns is sufficient on its own. Circuit breakers without degraded-mode responses still show users errors. Exponential backoff without a circuit breaker still hammers a failing service. Degraded-mode responses without retries give up too early.
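Glued together, the combination might look like the sketch below. To keep it self-contained, `is_open` and `record` are plain callables standing in for a circuit breaker's interface; everything here is illustrative rather than a reference implementation.

```python
import random
import time

def resilient_call(call, fallback, is_open, record, max_attempts=2, base=0.5):
    """Combine breaker check, bounded retries with jittered backoff,
    and a degraded-mode fallback. All names are illustrative.

    call         -- zero-argument function performing the dependency call
    fallback     -- zero-argument function producing the degraded response
    is_open      -- returns True when the circuit breaker is open
    record       -- called with True/False to report each outcome
    max_attempts -- retry budget (kept low for user-facing paths)
    """
    for attempt in range(max_attempts):
        if is_open():
            break  # breaker open: skip the call, go straight to the fallback
        try:
            result = call()
            record(True)
            return result
        except Exception:
            record(False)
            if attempt + 1 < max_attempts:
                # Full-jitter exponential backoff before the next attempt.
                time.sleep(random.uniform(0.0, base * (2 ** attempt)))
    return fallback()
```

Note the order of the checks: the breaker is consulted before every attempt, retries stay within the budget, and the fallback is the single exit for every failure mode, so the user never sees a raw error.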
The combination, applied with a little care and tuned to your actual traffic, makes the difference between a platform that visibly breaks when its dependencies wobble and one that keeps working.
If you are looking at an integration-heavy system and wondering whether it will hold up under real conditions, this is often a useful place to start the conversation.