
Scalable Microservices Architecture: Transforming Enterprise Applications


We Decomposed a Monolithic Layer Processing 12,000+ Daily Transactions into Seven Independent Services — With Zero Downtime

By Brainstack Technologies • Microservices Architecture • 2024
[Figure: Microservices architecture migration from monolithic ERP integration]

A B2B distribution company was processing roughly 12,000 orders per day through an integration layer that connected their ERP (SAP Business One), warehouse management system, three logistics partners, and a payment gateway — and that layer had reached a breaking point. It was a single Node.js monolith that had grown to around 180,000 lines of code over four years, deployed as one unit. Every deployment was a full-system event requiring a 45-minute maintenance window, typically scheduled for Sunday nights. A bug in the logistics rate-calculation module in February had taken down payment processing for three hours because both flows shared the same runtime.

Brainstack Technologies led a seven-month migration that decomposed this monolith into seven independently deployable microservices — without requiring any system downtime and without disrupting the 12,000+ orders flowing through the pipeline daily.

Project Overview

Client: A B2B distribution company (name withheld under NDA)
Industry: Wholesale Distribution & Logistics
Scale: ~12,000 daily order transactions, 180K LOC monolith, 3 logistics partners, 1 payment gateway
Engagement Duration: 7 months (6 weeks architecture & planning, 5 months phased extraction)
Team: 3 backend engineers, 1 DevOps/infrastructure engineer, 1 architect (part-time), 1 QA engineer
Challenge: A four-year-old Node.js monolith handling all integration flows as a single deployable unit — creating cascading failure risk, 45-minute deployment windows, and an inability to scale individual flows independently
Solution: Strangler fig migration extracting seven domain-aligned microservices behind a Kong API gateway, deployed on Kubernetes with independent CI/CD pipelines per service

The Challenge

The integration layer had started life four years earlier as a straightforward Node.js application that connected SAP Business One to a single logistics provider via REST APIs. As the company onboarded two additional logistics partners (one using SFTP file exchange, one using SOAP APIs), added a payment gateway integration, and built inventory synchronization with their warehouse management system, the codebase grew to approximately 180,000 lines — all in a single deployable unit.

By the time we were brought in, the problems were compounding:

The February outage. A rate-calculation change introduced a memory leak that exhausted the Node.js heap within two hours. Because all flows — including payment processing — ran in the same process, the leak took down the entire system. Orders couldn't be processed for three hours. The post-mortem put the cost at roughly $85,000 in delayed shipments and penalty fees.
Sunday-night maintenance windows. Every release required redeploying the entire monolith — a 45-minute window with no orders processed. The ops team had to coordinate with logistics partners to pause inbound feeds, requiring 72-hour advance notice. Minor fixes became a multi-day process.
All-or-nothing scaling. During peak seasons, order volume tripled. Payment processing needed to scale, but shared resources with logistics and inventory meant provisioning three times the infrastructure across all flows — most of which didn't need the extra capacity.
Developer velocity stalled. Four developers in the same codebase meant constant merge conflicts. A change to the payment flow required regression testing against logistics and inventory integrations. The team spent roughly 30% of their time on integration testing that only existed because of architectural coupling.
[Figure: Monolithic ERP integration architecture with scaling bottlenecks highlighted]
[Figure: Service boundary analysis mapping integration domains for decomposition]

Our Approach

Strangler Fig Migration Strategy

We ruled out a big-bang rewrite immediately. The integration layer was processing 12,000+ orders per day; it couldn't go offline, and the business couldn't tolerate running a new untested system in parallel for months. Instead, we used a strangler fig approach: extracting one service at a time from the monolith while the remaining monolith continued to handle everything else.

The first six weeks were spent on architecture and decomposition planning. We analyzed the monolith's code, database schema, and runtime call patterns (we instrumented the monolith with OpenTelemetry for two weeks to capture actual request flows, not just what the code suggested). This analysis revealed seven natural domain boundaries:

  1. Order Ingestion — receiving and validating incoming orders from the ERP
  2. Payment Processing — gateway communication, authorization, settlement
  3. Logistics: Partner A — REST-based carrier integration
  4. Logistics: Partner B — SFTP file exchange (daily batch)
  5. Logistics: Partner C — SOAP API integration
  6. Inventory Sync — bidirectional sync with the warehouse management system
  7. Notification & Alerting — order confirmations, shipment tracking, failure alerts

We intentionally split logistics into three separate services rather than one unified "logistics service." The three partners used fundamentally different protocols (REST, SFTP, SOAP), had different SLA requirements, and changed at different rates. Combining them into a single service would have recreated the coupling problem at a smaller scale.

The extraction order was deliberate: we started with Notification & Alerting (lowest risk, no transactional data, easiest to validate) and ended with Payment Processing (highest risk, regulatory requirements, most complex error handling). This gave the team progressively harder challenges rather than starting with the most dangerous one.

API Gateway and Service Communication

We deployed Kong as the API gateway in front of both the monolith and the emerging services. During migration, Kong handled the routing logic: requests for extracted domains (e.g., /notifications/*) were routed to the new service, while everything else continued to hit the monolith. As each service was extracted, we updated Kong's routing configuration — no code changes to the monolith required for the switchover.
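
The production routing lived in Kong's declarative config rather than in application code, but the rule itself is simple. As a rough illustration in Node.js (a stand-in for the Kong config, using the http-proxy-middleware package and hypothetical hostnames):

```js
// Illustration only: in production this split was expressed in Kong's
// declarative config, not application code. The rule is the same, though:
// send extracted domains to their new service, everything else to the monolith.
const express = require('express');
const { createProxyMiddleware } = require('http-proxy-middleware');

const app = express();

// Extracted domain: /notifications/* now goes to the new microservice.
app.use('/notifications', createProxyMiddleware({
  target: 'http://notification-service.internal:3000', // hypothetical host
  changeOrigin: true,
}));

// Default route: every domain not yet extracted still hits the monolith.
app.use('/', createProxyMiddleware({
  target: 'http://legacy-monolith.internal:8080', // hypothetical host
  changeOrigin: true,
}));

app.listen(8000);
```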

For inter-service communication, we used two patterns based on the consistency requirements of each flow:

Synchronous REST for the order-payment flow. When an order comes in, payment authorization must happen immediately and return success or failure before the order is confirmed. This is a hard consistency requirement — eventual consistency is not acceptable for payment authorization. These calls go through internal REST APIs with circuit breakers (we used Opossum in Node.js) to prevent cascading failures.
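
A minimal sketch of that guarded call with Opossum, assuming a hypothetical internal payment endpoint and illustrative thresholds (not the production values):

```js
// Sketch of the synchronous payment-authorization call behind an Opossum
// circuit breaker. Endpoint, timeout, and thresholds are illustrative.
const CircuitBreaker = require('opossum');
const axios = require('axios');

async function authorizePayment(order) {
  // Internal REST call to the payment service (hypothetical URL).
  const res = await axios.post('http://payment-service.internal/authorize', {
    orderId: order.id,
    amount: order.total,
  });
  return res.data; // e.g. { approved: boolean, authId: string }
}

const breaker = new CircuitBreaker(authorizePayment, {
  timeout: 3000,                // treat calls slower than 3s as failures
  errorThresholdPercentage: 50, // open the circuit at a 50% failure rate
  resetTimeout: 30000,          // send a probe request after 30s
});

// Fail fast instead of cascading: mark the order for retry when open.
breaker.fallback(() => ({ approved: false, reason: 'payment-unavailable' }));

async function confirmOrder(order) {
  const auth = await breaker.fire(order);
  if (!auth.approved) throw new Error('authorization failed: ' + (auth.reason || 'declined'));
  return { ...order, status: 'confirmed', authId: auth.authId };
}
```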

Asynchronous messaging via RabbitMQ for everything else. Inventory updates, logistics dispatch notifications, and alerting all use event-driven messaging. When an order is confirmed, an "order.confirmed" event is published to RabbitMQ, and the relevant services consume it independently. If the notification service is temporarily down, the message waits in the queue — the order isn't affected.
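
A rough sketch of that flow with amqplib; the exchange name, queue names, and event shape are assumptions for illustration:

```js
// Sketch of the "order.confirmed" event flow over RabbitMQ with amqplib.
const amqp = require('amqplib');

async function main() {
  const conn = await amqp.connect('amqp://rabbitmq.internal'); // hypothetical host
  const ch = await conn.createChannel();
  await ch.assertExchange('orders', 'topic', { durable: true });

  // Publisher side (order-ingestion service): publish and move on.
  const event = { orderId: 'ORD-1042', confirmedAt: new Date().toISOString() };
  ch.publish('orders', 'order.confirmed', Buffer.from(JSON.stringify(event)), {
    persistent: true, // survive a broker restart
  });

  // Consumer side (notification service): its own durable queue, so
  // messages simply wait here if the service is temporarily down.
  const { queue } = await ch.assertQueue('notifications.order-confirmed', { durable: true });
  await ch.bindQueue(queue, 'orders', 'order.confirmed');
  await ch.consume(queue, (msg) => {
    if (!msg) return;
    const order = JSON.parse(msg.content.toString());
    console.log('sending confirmation for', order.orderId);
    ch.ack(msg); // ack only after the side effect succeeds
  });
}

main().catch(console.error);
```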

The hardest communication problem was the logistics batch service (Partner B). This partner expected a single consolidated SFTP file every four hours, but orders trickled in continuously. We built a small aggregation service that consumed individual order events from RabbitMQ, batched them into 4-hour windows, generated the SFTP file in the partner's expected format, and uploaded it on schedule. This service was arguably the most custom piece of the entire architecture.
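
A simplified sketch of the aggregation idea, assuming a CSV stand-in for the partner's real file format and the ssh2-sftp-client package for the upload:

```js
// Sketch of the Partner B aggregation service: buffer order events and
// flush a consolidated file every four hours. File layout, queue names,
// and SFTP details are placeholders.
const amqp = require('amqplib');
const SftpClient = require('ssh2-sftp-client');

const WINDOW_MS = 4 * 60 * 60 * 1000; // 4-hour batching window
let buffer = [];

async function flush() {
  if (buffer.length === 0) return;
  const batch = buffer;
  buffer = [];

  // Render the batch in the partner's expected format (CSV as a stand-in).
  const body = batch.map((o) => `${o.orderId},${o.sku},${o.qty}`).join('\n');
  const remotePath = `/inbound/orders-${Date.now()}.csv`;

  const sftp = new SftpClient();
  await sftp.connect({ host: 'sftp.partner-b.example', username: 'acme', password: '...' });
  await sftp.put(Buffer.from(body), remotePath);
  await sftp.end();
}

async function main() {
  const conn = await amqp.connect('amqp://rabbitmq.internal');
  const ch = await conn.createChannel();
  await ch.assertExchange('orders', 'topic', { durable: true });
  const { queue } = await ch.assertQueue('logistics.partner-b', { durable: true });
  await ch.bindQueue(queue, 'orders', 'order.confirmed');
  await ch.consume(queue, (msg) => {
    if (!msg) return;
    buffer.push(JSON.parse(msg.content.toString()));
    ch.ack(msg); // a production version would ack only after a durable write
  });
  setInterval(() => flush().catch(console.error), WINDOW_MS);
}

main().catch(console.error);
```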

Containerization and Orchestration

Each of the seven services was containerized with Docker and deployed on a managed Kubernetes cluster (AWS EKS). We set up independent CI/CD pipelines using GitHub Actions — each service has its own repository, its own test suite, its own pipeline, and can be deployed to staging or production independently.

Deployment went from a 45-minute Sunday-night maintenance window to a rolling update that completes in under 4 minutes per service with zero downtime (Kubernetes rolling deployment strategy with readiness probes). The team now deploys individual services 8-12 times per week across the seven services combined, compared to the previous cadence of once per week for the entire monolith.
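
The zero-downtime part hinges on those readiness probes: Kubernetes only routes traffic to a replacement pod once it reports ready, so the old pods keep serving orders throughout the rollout. A minimal sketch of the endpoints a service might expose (paths and checks are illustrative):

```js
// Sketch of the liveness/readiness endpoints a service exposes for
// Kubernetes rolling updates. The dependency checks are illustrative.
const express = require('express');
const app = express();

let dependenciesReady = false;

// Liveness: the process is up. Kubernetes restarts the pod if this fails.
app.get('/healthz', (req, res) => res.sendStatus(200));

// Readiness: the service can take traffic. During a rolling update, a new
// pod receives requests only after this returns 200, so in-flight orders
// keep hitting the old pods until the replacement is genuinely ready.
app.get('/readyz', (req, res) => res.sendStatus(dependenciesReady ? 200 : 503));

async function start() {
  // e.g. open the RabbitMQ channel and verify the database connection here
  dependenciesReady = true;
  app.listen(3000);
}

start();
```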

For the payment service specifically, we configured more conservative deployment guardrails: canary deployments that route 5% of payment traffic to the new version for 10 minutes before proceeding, automatic rollback if the error rate exceeds 0.5%, and a mandatory staging environment test against the payment gateway's sandbox before production deployment. The February outage had made leadership understandably cautious about payment-related changes.

Technology Stack

Service Layer
  • Node.js (Express) — Order Ingestion, Payment Processing, and three Logistics services. Node was the monolith's original language, so most extraction was straightforward.
  • Python (FastAPI) — Inventory Sync service. The warehouse management system's SDK was Python-only, so this service was written in Python from scratch rather than wrapping the SDK in a Node.js child process.
Communication & Routing
  • Kong API Gateway — chosen over AWS API Gateway because Kong allowed us to run the same gateway configuration in local development and production, simplifying the dev workflow. The routing rules that split traffic between monolith and services were managed as code in Kong's declarative config.
  • RabbitMQ — chosen over Kafka because message throughput (~12K orders/day) didn't justify Kafka's operational complexity. RabbitMQ's simpler queue model was a better fit for the event patterns we needed.
Data
  • PostgreSQL 15 — each service owns its own database schema. We enforced schema-per-service at the PostgreSQL level to prevent accidental cross-service queries (see the sketch after this list).
Infrastructure
  • AWS EKS (Kubernetes), Docker, GitHub Actions for CI/CD (one pipeline per service)
Observability
  • OpenTelemetry for distributed tracing, ELK Stack for centralized logging, Prometheus + Grafana for metrics and SLO dashboards
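
On the schema-per-service enforcement mentioned under Data: a sketch of how that could look as a one-off migration run with node-postgres. The role and schema names here are assumptions, not the project's actual ones.

```js
// Sketch: give each service its own schema owned by its own database role,
// so a cross-service query fails at the database rather than in code review.
const { Client } = require('pg');

async function migrate() {
  const admin = new Client({ connectionString: process.env.ADMIN_DATABASE_URL });
  await admin.connect();

  // Hypothetical role and schema for the payment service; repeat per service.
  await admin.query(`CREATE ROLE payment_svc LOGIN PASSWORD 'change-me'`);
  await admin.query(`CREATE SCHEMA payments AUTHORIZATION payment_svc`);

  // Lock down the shared public schema so services can't drift back to it.
  await admin.query(`REVOKE ALL ON SCHEMA public FROM PUBLIC`);

  await admin.end();
}

migrate().catch(console.error);
```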

Results

Deployment Speed

Deploy time dropped from 45 minutes (full monolith) to under 4 minutes per service (rolling update, zero downtime). Maintenance windows eliminated entirely.

Deployment Frequency

From 1 deployment per week (Sunday night, coordinated) to 8-12 deployments per week across services. Individual services updated 1-3 times per week.

Incident Blast Radius

Before: a bug in any module could take down all 12,000+ daily transactions. After: failures isolated to the affected service. The payment service had two incidents — neither affected logistics or inventory.

Scaling Efficiency

First post-migration peak: Payment and Order Ingestion scaled to 3x while five services stayed at baseline — ~60% savings on peak-season infrastructure vs. scaling the entire monolith.

New Integration Speed

Onboarding a fourth logistics partner took 3 weeks post-migration. Pre-migration estimate: 8-10 weeks due to regression testing and deployment coordination.

Developer Productivity

Merge conflicts dropped substantially. Integration-related overhead down from ~30% to ~10% of developer time.

Observability & Monitoring

We established the observability stack before extracting the first service — this was one of the most valuable decisions in the project. By instrumenting the monolith with OpenTelemetry first, the team could see request flows across the monolith's internal modules. When we started extracting services, the tracing data simply reflected the new boundaries without any additional instrumentation work.
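
A minimal sketch of that bootstrap using the OpenTelemetry Node SDK with auto-instrumentation; the collector endpoint and service name are assumptions:

```js
// Sketch of the tracing bootstrap loaded before the app starts (e.g. via
// `node -r ./tracing.js server.js`). Endpoint and service name are assumptions.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const sdk = new NodeSDK({
  serviceName: 'integration-monolith', // shows up as the service node in Jaeger
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector.internal:4318/v1/traces', // hypothetical collector
  }),
  // Auto-instruments Express routes, outbound HTTP, pg, and amqplib, so
  // internal module boundaries appear as spans without hand-written code.
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```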

The observability stack includes:

Distributed tracing (OpenTelemetry + Jaeger): Every request that enters the system gets a trace ID that follows it across all seven services. When the ops team investigates a slow order, they can see exactly which service introduced the latency — including the time spent waiting for external partner APIs that Brainstack doesn't control.

Centralized logging (ELK Stack): All service logs ship to a shared Elasticsearch cluster, correlated by trace ID. Searching for a specific order ID returns logs from every service that touched that order, in chronological order.

SLO dashboards (Prometheus + Grafana): Each service has defined SLOs — for example, the Payment Processing service targets a p99 latency under 800ms and an error rate below 0.1%. When a service approaches its error budget, the dashboard alerts the on-call engineer before users are affected. The team reviews SLO compliance weekly.
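
As an illustration of where those SLO numbers come from, here is a sketch of a latency histogram with prom-client; bucket boundaries and label names are assumptions, and Grafana derives the p99 from the bucket counts with histogram_quantile:

```js
// Sketch of the latency histogram behind a p99 SLO, using prom-client.
const client = require('prom-client');
const express = require('express');

const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Request latency by route and status',
  labelNames: ['route', 'status'],
  buckets: [0.05, 0.1, 0.25, 0.5, 0.8, 1, 2], // 0.8s marks the p99 target
});

const app = express();

// Time every request and record it against its route and status code.
app.use((req, res, next) => {
  const end = httpDuration.startTimer();
  res.on('finish', () => end({ route: req.path, status: res.statusCode }));
  next();
});

// Prometheus scrapes this endpoint; Grafana computes p99 from the buckets.
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.send(await client.register.metrics());
});

app.listen(3000);
```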

The observability investment paid for itself within the first month: the team's mean time to diagnose production issues dropped from roughly 2 hours (grepping through monolith logs) to about 15 minutes (tracing the request flow visually in Jaeger).

[Figure: Observability dashboard with distributed tracing, latency, error rate, and throughput]
[Figure: CI/CD pipeline from code commit through tests and deploy to Kubernetes]
[Figure: Deployment pipeline with independent service rollout and rollback controls]
[Figure: Distributed tracing dashboard showing request flows across microservices]

Key Engineering Lessons

01

Instrument before you extract. We deployed OpenTelemetry on the monolith two weeks before extracting the first service. This gave the team baseline visibility into request flows, latency, and error rates — which meant we could immediately compare a newly extracted service's performance against its monolith-era baseline. Without this, we would have been flying blind on whether each extraction improved or degraded performance.

02

Split logistics into three services, not one. Our initial architecture proposed a single "Logistics Service" handling all three partners. During planning, we realized that Partner B (SFTP batch) and Partner A (REST real-time) had fundamentally different runtime characteristics, failure modes, and change frequencies. Combining them would have created a mini-monolith. The decision to split saved us from re-coupling the architecture we had just decoupled.

03

The shared database was the real migration bottleneck, not the code. Extracting service code was relatively straightforward. The hard part was untangling shared database tables. The monolith used a single PostgreSQL database where the payment module and the logistics module both read from an "orders" table with 47 columns. We had to decide which service owned which columns, build data synchronization events for cross-service reads, and migrate foreign key relationships — all while 12,000 orders per day continued flowing through the system.
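
As an illustration of those synchronization events, a sketch of a consumer that maintains a local read copy; the order_replica table, queue names, and event shape are all assumptions:

```js
// Sketch of a cross-service read via a synchronized local copy: the
// logistics service keeps the few order columns it needs by consuming
// change events, instead of querying the payment service's schema.
const amqp = require('amqplib');
const { Client } = require('pg');

async function main() {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();

  const conn = await amqp.connect('amqp://rabbitmq.internal');
  const ch = await conn.createChannel();
  await ch.assertExchange('orders', 'topic', { durable: true });
  const { queue } = await ch.assertQueue('logistics.order-replica', { durable: true });
  await ch.bindQueue(queue, 'orders', 'order.updated');

  await ch.consume(queue, async (msg) => {
    if (!msg) return;
    const o = JSON.parse(msg.content.toString());
    // Upsert only the columns this service has a read interest in.
    await db.query(
      `INSERT INTO order_replica (order_id, status, ship_to)
       VALUES ($1, $2, $3)
       ON CONFLICT (order_id) DO UPDATE SET status = $2, ship_to = $3`,
      [o.orderId, o.status, o.shipTo]
    );
    ch.ack(msg);
  });
}

main().catch(console.error);
```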

04

Team topology had to change alongside the architecture. The four developers who previously worked on the monolith initially continued reviewing each other's PRs across all services. This recreated the coordination overhead that microservices were supposed to eliminate. We restructured into two pairs, each responsible for a set of services end-to-end (deploy, monitor, fix). Deployment frequency doubled within two weeks of this change.

Tags: Microservices, Monolith Migration, Strangler Fig Pattern, Kubernetes, API Gateway, Distributed Systems, ERP Integration, DevOps

Spending Sunday Nights on Deployments?

If your deployments require maintenance windows, your scaling bills spike because you can't scale individual components, or a bug in one module can take down unrelated flows — the architecture is working against you. We start with a two-week instrumentation and analysis phase to map your actual request flows, identify natural service boundaries, and build a phased migration plan that doesn't require betting the business on a big-bang rewrite.
