Supercharging Campaigns: How We Hit 100x Growth using Campaign Orchestrator

Every day, Spocto X is transforming the way businesses handle debt collection, using AI and machine learning to automate smarter strategies. It powers millions of customer interactions, blending accuracy with scalability to deliver results that matter. At the core of Spocto X's success is the Spocto X Campaign Orchestrator, the nerve center that ensures every communication is timely, targeted, and effective.

Using the campaign orchestrator

This powerful tool coordinates communication across a wide range of digital channels—SMS, IVR, WhatsApp, Email—and non-digital touchpoints like telecalling, field collections, and legal strategies. With AI-guided recommendations, the Orchestrator fine-tunes every detail: from the ideal timing and preferred communication channels to the optimal schedules for reaching each customer. It even crafts personalized message templates, queues them for delivery, and continually monitors customer responses. Based on this feedback, it recalibrates future strategies, ensuring that every interaction is more precise than the last. This is the kind of intelligent automation that drives results with remarkable accuracy and effectiveness.

How does it work?

Lenders, including banks and NBFCs, provide a list of accounts requiring communication—“the allocation”. AI-generated campaign plans are tailored to each allocation and shared with the orchestrator. For every customer account, a dedicated workflow is created. Each workflow meticulously follows the campaign plan, ensuring timely and effective communication.
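
As a sketch of what "one workflow per account following a campaign plan" might look like, the snippet below expands a plan into concrete send times for a single account. The data shapes (`CampaignStep`, the field names) are illustrative assumptions, not Spocto X's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical plan step; field names are assumptions for illustration.
@dataclass
class CampaignStep:
    channel: str       # e.g. "SMS", "WHATSAPP", "IVR", "EMAIL"
    offset: timedelta  # delay relative to the start of the campaign

def build_schedule(account_id: str, plan: list[CampaignStep],
                   start: datetime) -> list[dict]:
    """Expand one account's campaign plan into concrete send times."""
    return [
        {"account": account_id, "channel": step.channel,
         "send_at": start + step.offset}
        for step in plan
    ]

plan = [CampaignStep("SMS", timedelta(hours=0)),
        CampaignStep("WHATSAPP", timedelta(days=1)),
        CampaignStep("IVR", timedelta(days=3))]
schedule = build_schedule("ACC-001", plan, datetime(2024, 1, 1, 9, 0))
```

In the real system, each such schedule would be driven by a dedicated, durable workflow rather than computed eagerly up front.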

Challenges we faced

Managing millions of accounts and billions of communications daily isn’t just a task; it’s a test of endurance and precision. The system faces stringent demands such as:

  • Load: Handling upwards of 10 million customer accounts and a billion communications daily.
  • Timing constraints: Communications must be sent within a 10-minute delay threshold.
  • Data security: PII data must be encrypted at rest and in transit.
  • Multi-tenancy: Sharing compute across multiple lenders.
  • Cost: Infrastructure costs must remain within specified bounds.
  • Reliability: Resilience, fault tolerance, regulatory compliance, observability, and auto-healing capabilities.


Using a workflow orchestration platform to streamline operations

We designed our orchestration system on Temporal—a popular workflow orchestration platform.


Why Temporal for campaign orchestration?

Built for today’s fast-paced, complex systems, Temporal offers a powerful orchestration platform designed for fault tolerance, high availability, and seamless scalability. Backed by a thriving, active community, it ensures continuous innovation and support, helping teams stay ahead of evolving demands. With deep observability, Temporal gives you unparalleled visibility into your workflows, enabling you to track each step and make real-time adjustments. It excels at handling concurrent workflows, executing multiple tasks simultaneously without skipping a beat. Whether it’s managing time-sensitive operations or coordinating complex processes, Temporal delivers precision and reliability, ensuring that everything runs smoothly, even as the challenges scale.

How Temporal works

Temporal workflows are built to store their own data and execute specific tasks (known as “activities”) within defined timeframes. Dedicated “workers” carry out these activities, pulling tasks from Temporal’s managed task queue for execution. This architecture ensures precision, scalability, and fault tolerance at every step.

How it all fits together

  1. Workflows: The core unit of execution, representing long-running, durable, and stateful logic.
  2. Activities: Modular units of work executed by workers—these can be retried, timed out, or canceled based on requirements.
  3. Task Queues: A queue managed by Temporal where activities and workflows are scheduled for execution.
  4. Workers: Independent executors that pick up tasks from the task queue and execute them.
  5. Temporal Server: The brains of the operation—managing state, durability, and task scheduling. It ensures that workflows and activities are executed reliably, even in the face of failures.
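
To make the moving parts above concrete, here is a toy, in-process model of a workflow enqueuing activities onto a task queue that a worker then drains. This is only a sketch of the concepts: the real Temporal server persists event history and replays workflows for durability, which this model does not attempt.

```python
import queue

# Toy model of Temporal concepts: a workflow enqueues activity tasks,
# a worker polls the task queue and executes them.
task_queue: "queue.Queue[tuple]" = queue.Queue()

def send_sms(account: str) -> str:
    """An 'activity': a modular, retryable unit of work."""
    return f"sms-sent:{account}"

def campaign_workflow(account: str) -> None:
    """A 'workflow': durable logic that schedules activities."""
    task_queue.put((send_sms, account))

def worker_poll() -> list[str]:
    """A 'worker': picks tasks off the queue and executes them."""
    results = []
    while not task_queue.empty():
        activity, arg = task_queue.get()
        results.append(activity(arg))
    return results

campaign_workflow("ACC-001")
campaign_workflow("ACC-002")
results = worker_poll()
```

The value of the real system over this toy is exactly what the server adds: durable state, retries, timeouts, and recovery after failures.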

With this architecture, we built Spocto X’s orchestrator to be a fault-tolerant, scalable, and highly observable system.

The first steps and the early challenges

We started with a self-hosted Temporal setup. Each workflow mirrored a customer account from the lender’s loan book, orchestrating communications as per the campaign plan. This approach worked flawlessly at low workloads, handling tens of thousands of accounts. But when we hit 50,000 workflows, cracks began to show—communications were delayed, breaching the acceptable 10-minute threshold.

Breaking through the first bottleneck

We optimized Temporal parameters and doubled the Kubernetes worker pods from 5 to 10. The bottleneck vanished, and the system stabilized… for a while. Then, as we crossed 150,000 parallel workflows, slowness returned, pushing us back to the drawing board.

Digging deeper, we discovered the root cause: our Temporal server relied on a PostgreSQL RDS cluster without sharding capabilities. To make matters worse, this RDS cluster was shared with other applications, compounding the problem.

To address scalability, we moved to a cloud-managed Temporal solution. This shift resolved our sharding woes and helped us scale to 300,000–350,000 workflows. But as the number of workflows increased, the system struggled again: we observed high latency in workflow execution due to the bursty nature of the scheduled communications.

Taming the bursts: navigating unpredictable challenges

The AI recommendation system generated communication schedules clustered around specific times of day, so the system faced sharp spikes in load around these “peak” clusters. The large volume of activities due at the same moment prevented on-time execution.

Spreading the Load

We introduced jitter, distributing workflow execution across a 15-minute window. This simple change evened out the activity load and reduced latency.
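
A minimal sketch of this idea: give each workflow a stable offset inside the 15-minute window so a burst of simultaneous schedules spreads out. Using a hash-seeded offset is an assumption for determinism here; a real implementation might just as well use random jitter.

```python
import random

WINDOW_SECONDS = 15 * 60  # spread executions across a 15-minute window

def jittered_delay(workflow_id: str) -> int:
    """Deterministic per-workflow jitter: seed an RNG with the workflow
    id so the same workflow always lands at the same offset in the
    window (illustrative choice, not necessarily the real scheme)."""
    rng = random.Random(workflow_id)
    return rng.randrange(WINDOW_SECONDS)

# A burst of 1,000 workflows all scheduled for the same instant now
# spreads across the window instead of firing together.
delays = [jittered_delay(f"wf-{i}") for i in range(1000)]
```
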

Focused Worker Pods

Next, we restructured the responsibilities of the worker pods. Instead of juggling multiple tasks, we dedicated specific pods solely to Temporal workflows (Single Responsibility Principle). This reduced overhead and improved efficiency.

Scaling Up

We allowed worker pods to scale up to a maximum of 80, which let workflows execute on time. However, CPU and memory utilization remained consistently high; the system was at its limits, and so were our infrastructure costs.

Moreover, relying on autoscaling based solely on CPU and memory just wasn’t cutting it. Despite scaling efforts, the pods stayed maxed out at full capacity, pushing costs higher and higher without delivering the efficiency needed.

Smarter Scaling

The key was choosing the right metric to scale on. We identified a Temporal metric: workflow schedule-to-start latency. In addition, we tuned thread counts per worker, the number of pollers, and cache sizes, which brought down overall utilization. These adjustments allowed us to scale down to just 10 pods during non-peak hours, drastically cutting costs without compromising performance.
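
The scaling decision can be sketched as a simple policy: scale out when schedule-to-start latency builds up (tasks waiting for a worker), scale in when it is near zero (workers idle). The thresholds and doubling/halving steps below are illustrative assumptions, not our production values.

```python
# Toy autoscaler decision driven by workflow schedule-to-start latency
# rather than CPU/memory. All thresholds are illustrative assumptions.
MIN_PODS, MAX_PODS = 10, 80
SCALE_UP_LATENCY_S = 30.0    # tasks are queuing: add workers
SCALE_DOWN_LATENCY_S = 2.0   # workers mostly idle: remove workers

def desired_replicas(current: int, p95_schedule_to_start_s: float) -> int:
    """Return the target worker-pod count for the observed latency."""
    if p95_schedule_to_start_s > SCALE_UP_LATENCY_S:
        return min(current * 2, MAX_PODS)
    if p95_schedule_to_start_s < SCALE_DOWN_LATENCY_S:
        return max(current // 2, MIN_PODS)
    return current
```

In practice this kind of policy would feed a Kubernetes autoscaler via a custom metric rather than run as standalone code.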

With smarter scaling and jitter time in place, we achieved a million concurrent workflows. But new challenges arose—storage costs and inefficiencies in workflow termination.

Optimizing Storage

With cloud-hosted Temporal, one cost factor is active storage. We realized ours was inflated by redundant notification templates. By moving these templates to an independent database and fetching them on demand, we reduced storage costs significantly. Compressing the payloads stored in Temporal further trimmed the storage footprint.
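
Both optimizations can be illustrated in a few lines: store a template ID in the workflow payload instead of the full template text, then compress what remains before handing it to Temporal. The template store and payload shapes here are hypothetical stand-ins.

```python
import json
import zlib

# Hypothetical external template store (in reality, an independent DB
# queried on demand at send time).
TEMPLATE_DB = {"T-42": "Dear customer, your EMI payment is due soon. "}

# Before: every workflow payload carries a full copy of the template.
inlined = json.dumps({"account": "ACC-001",
                      "template": TEMPLATE_DB["T-42"] * 50})

# After: the payload carries only a reference to the template.
referenced = json.dumps({"account": "ACC-001", "template_id": "T-42"})

# Compressing the (already smaller) payload trims storage further.
compressed = zlib.compress(referenced.encode())
```
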

Batch Processing

Previously, communications were sent as individual activities, driving up Temporal costs. Implementing batch processing allowed us to group communications, slashing activity overhead and boosting efficiency.
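
A minimal sketch of the batching step, assuming a simple fixed batch size (the actual grouping logic and size are implementation choices): instead of one activity per communication, one activity sends a whole batch.

```python
# Group individual communications into fixed-size batches so that one
# Temporal activity sends many messages. Batch size 100 is illustrative.
def batch(items: list, size: int) -> list[list]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

comms = [f"ACC-{i:04d}" for i in range(250)]
batches = batch(comms, 100)  # 3 activities instead of 250
```
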

Scaling to new orchestration heights

These optimizations propelled us to handle 3 million workflows and deliver more than 10 million communications daily. The system is now multi-tenant, fault-tolerant, and cost-efficient, with plenty of room left to scale.

What’s next?

The journey doesn’t stop here. We’re exploring ways to further reduce active storage by implementing checkpoint saves and transitioning to retained storage. Consolidating activities and rethinking workflow creation strategies should result in greater efficiency. With further enhancements in our AI-generated campaign plans, the future looks promising.

Building Spocto X’s orchestrator has been a great learning and an exhilarating experience, enabling us to achieve communication at scale with the right levels of observability and compliance. Through relentless optimization, we’ve turned challenges into opportunities, building a system that not only scales but does so cost-effectively.



About the Author:

Karthikeyan Seethapathy is a Senior Software Engineer at Spocto X – a Yubi Company, with over 6 years of experience in software development. He is passionate about solving complex, real-time challenges and building scalable solutions. Throughout his career, Karthikeyan has gained valuable expertise across multiple industries, including SaaS, ITES, logistics, and fintech. Specializing in Java Spring Boot applications, he has played a pivotal role at Spocto X, where he developed applications to plan and execute omni-channel campaigns, currently handling 5-10 million customer accounts. Additionally, Karthikeyan has successfully carried out code optimizations and critical security fixes, helping the company maintain its competitive edge in the industry.
