How to Scale to 1 Million Devices

Published by

Leandro Machado

October 16, 2024

Scaling USP to 1 Million devices

Achieving scalability to support a million devices requires robust load and stress testing. For the Oktopus Controller, we utilize the OBUSPA project to simulate USP devices, conducting tests to understand performance limits and introducing necessary safeguards against bursts of device connections.

From these tests, our team has developed several architectural enhancements, which are discussed later in this text. In this article you will find in-depth information on the USP Controller's configuration and the methodology behind our stress tests.

Two major cloud providers were used: Google Cloud Platform and AWS, with resources deployed across multiple regions. We used Terraform to spin up instances running OBUSPA containers. This setup allowed precise control over container deployment rates, offering valuable insights into system behavior and scalability.

The Oktopus architecture comprises Golang-based microservices interconnected through the NATS messaging system. Each Message Transfer Protocol (MTP) module includes a translator that converts USP messages from protocols like MQTT, STOMP, and WebSockets to NATS format. This modular approach facilitates the creation of scalable microservices that can handle device data independently of their connection type.
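To make the translator idea concrete, here is a minimal sketch of how a message arriving over any MTP could be mapped onto a NATS subject that does not depend on the connection type. Everything in it is an assumption for illustration: the `Inbound` type, the `usp.device.<id>.record` subject layout, and the `Publisher` interface are hypothetical and not the actual Oktopus implementation. The interface mirrors the `Publish(subject, data)` method of a NATS connection so that an in-memory stand-in can be used here.

```go
package main

import (
	"fmt"
	"strings"
)

// Publisher is the only capability the translator needs.
// It is modelled on the Publish method of a NATS connection.
type Publisher interface {
	Publish(subject string, data []byte) error
}

// Inbound is a USP record received over some MTP (hypothetical type).
type Inbound struct {
	MTP        string // "mqtt", "stomp" or "websocket"
	EndpointID string // USP endpoint ID of the device
	Payload    []byte // raw USP record, passed through untouched
}

// NATS subjects are dot-separated tokens; "*" and ">" are wildcards,
// so those characters and whitespace are replaced inside the device token.
var tokenSanitizer = strings.NewReplacer(".", "_", " ", "_", "*", "_", ">", "_")

// Subject builds an MTP-independent subject (assumed layout).
func Subject(endpointID string) string {
	return "usp.device." + tokenSanitizer.Replace(endpointID) + ".record"
}

// Translate forwards an inbound record to the messaging layer.
func Translate(pub Publisher, in Inbound) error {
	if in.EndpointID == "" {
		return fmt.Errorf("%s message without endpoint ID", in.MTP)
	}
	return pub.Publish(Subject(in.EndpointID), in.Payload)
}

// memPublisher records subjects in memory so the sketch runs without a server.
type memPublisher struct{ subjects []string }

func (m *memPublisher) Publish(subject string, _ []byte) error {
	m.subjects = append(m.subjects, subject)
	return nil
}

func main() {
	pub := &memPublisher{}
	for _, mtp := range []string{"mqtt", "stomp", "websocket"} {
		in := Inbound{MTP: mtp, EndpointID: "os::0123.AB", Payload: []byte("usp-record")}
		if err := Translate(pub, in); err != nil {
			fmt.Println("translate failed:", err)
		}
	}
	// All three protocols land on the same subject.
	fmt.Println(pub.subjects)
}
```

Because downstream services only see the subject and the payload, they can be scaled without knowing which protocol the device used.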

Communication between the system's Core and Edge is abstracted and independent. Managing millions of connections without downtime is a key challenge. To address it, Oktopus relies on multiple MTPs spread across regions close to the devices: what we call the Edge Layer. Each device has a redundant MTP connection, so even if one server goes down, another connection path is still available.

It is also possible to keep the device connected to other controllers in parallel, without compromising performance and without being tied to a single ecosystem, as can be seen in the image below.

For those who need more, Oktopus offers the Enterprise plan, where the focus is on guaranteed delivery, redundancy, security, and reliability. Our use of design patterns provides multi-site and multi-tenancy support and configuration, and the system adapts to errors immediately by rerouting traffic to alternative geographical locations if a data center goes offline.

The next example shows another scenario, with some possible failure points. Since there is only one Oktopus core, all traffic is concentrated in a single main instance.

For optimal reliability, deploying multiple site cores is crucial. A distributed architecture mitigates failure points and ensures continuous device communication, even during disruptions. The next illustration shows the benefits of this architecture:

Devices can connect through multiple MTPs across different protocols, which provides resilience. For example, if a port is accidentally blocked, alternative protocols are available. If the connection between Edge and Core fails, traffic is rerouted to the nearest operational cluster. These strategies are examples of how Oktopus delivers enterprise-grade reliability.

In the end, what are the system's resource requirements?

Following extensive testing and optimizations, our performance evaluation is segmented into Core and Edge resource usage:

Edge Layer: Responsible for maintaining device connections, each using roughly 40 kB of memory. For every 10 connections per second, CPU usage is approximately 15% of 1 vCPU, and it grows as the connection rate increases. In case of network failures, packets are rerouted to other clusters or retained until reconnection.

Core Layer: Implemented with Kubernetes for automated scalability based on demand. Each microservice handles its own responsibilities, which allows precise scaling per activity level. Resource usage therefore depends on the number of system operators, the quantity and types of devices, and the specific integrations that are present. For on-premises setups we use K3S, and for cloud deployments we use EKS, each offering distinct benefits.
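As a back-of-the-envelope check, the Edge Layer figures above can be turned into a small estimator. The sketch assumes that 1 kB is 1000 bytes and that CPU usage scales linearly with the connection rate; the article only gives a single data point, so the linear model is an assumption. The `EstimateEdge` function and its names are hypothetical.

```go
package main

import "fmt"

const (
	// About 40 kB of memory per device connection (from the article).
	bytesPerConnection = 40_000
	// 15% of 1 vCPU per 10 connections/s is 15 millicores per connection/s
	// (assumes linear scaling, which the article does not state).
	milliCPUPerConnPerSec = 15
)

// EdgeEstimate holds the rough Edge Layer resource needs.
type EdgeEstimate struct {
	MemoryBytes int64
	MilliCPU    int64
}

// EstimateEdge computes memory for open connections and CPU for the
// rate of new connections per second.
func EstimateEdge(connections, connsPerSecond int64) EdgeEstimate {
	return EdgeEstimate{
		MemoryBytes: connections * bytesPerConnection,
		MilliCPU:    connsPerSecond * milliCPUPerConnPerSec,
	}
}

func main() {
	// One million open connections, 100 new connections per second.
	e := EstimateEdge(1_000_000, 100)
	fmt.Printf("memory: %.1f GB, cpu: %.2f vCPU\n",
		float64(e.MemoryBytes)/1e9, float64(e.MilliCPU)/1000)
	// prints "memory: 40.0 GB, cpu: 1.50 vCPU"
}
```

Under these assumptions, one million open connections need about 40 GB of memory across the Edge Layer, before counting the redundant connection each device keeps.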

