Scaling Observability: How We Designed Bindplane to Manage 1,000,000 OpenTelemetry Collectors

Adnan Rahic


Platform teams tend to start with one OpenTelemetry (OTel) Collector, or in some cases a handful, usually running in gateway mode. They then embrace the benefits of a vendor-neutral, standardized telemetry collector for unified logs, metrics, and traces.

Then, almost without warning, once they migrate away from vendor-specific agents, they find themselves running thousands of agent-mode OTel Collectors across VMs and containerized workloads in Docker and Kubernetes. Herding such a fleet while keeping your sanity quickly becomes a nightmare.

We built Bindplane to streamline remote management of thousands of collectors, making config rollouts as simple as a button click. This year, we set and reached a new milestone: Bindplane now supports managing 1 million collectors. That leaves plenty of headroom to keep scaling.

1 million OTel collectors managed by Bindplane

Where the 1 million number comes from

“Manage up to 1 million collectors” isn’t a vanity metric or a fancy headline. We can back it up with field data and stress testing. One enterprise, Loblaw, runs more than 20,000 collectors on its own. Our Platform team has stress-tested the control plane with one million concurrent collectors and verified that it performs without issue. This isn’t hopeful marketing; it’s real-world scale.

How we make that scale practical

Scale shouldn’t add friction, and with Bindplane it doesn’t. Rollouts start in Incremental mode: 3 collectors first, then exponentially larger waves. Enterprise users can switch to Progressive mode to canary on 5% of the fleet (or on any tag) before widening the rollout.
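To make the wave idea concrete, here is a minimal sketch of how wave sizes could be planned. It is not Bindplane's implementation: the function names are hypothetical, and the growth factor of 5 is an assumption, since the only details given here are that rollouts start with 3 collectors and grow exponentially.

```python
from typing import List


def incremental_waves(total: int, first: int = 3, factor: int = 5) -> List[int]:
    """Return wave sizes that start at `first` and grow geometrically.

    `factor` is an assumed growth multiplier, chosen only for illustration.
    """
    if total < 0 or first < 1 or factor < 2:
        raise ValueError("total must be >= 0, first >= 1, factor >= 2")
    waves: List[int] = []
    remaining, size = total, first
    while remaining > 0:
        wave = min(size, remaining)  # the last wave takes whatever is left
        waves.append(wave)
        remaining -= wave
        size *= factor
    return waves


def canary_size(total: int, percent: float = 5.0) -> int:
    """Size of a percentage-based canary group, at least one collector."""
    if total <= 0:
        return 0
    return max(1, int(total * percent / 100))
```

Under these assumed defaults, `incremental_waves(1_000_000)` covers the whole fleet in nine waves, and `canary_size(20_000)` picks 1,000 collectors for a 5% canary.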

Each collector maintains an mTLS-secured OpAMP WebSocket connection to Bindplane, so the updated config.yaml streams down to the collector, hot-loads in memory, and confirms back. No SSH sessions, no Helm commands.
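The apply-and-confirm loop can be sketched from the collector's side. This is a simplified model, not the OpAMP wire format or Bindplane's code: `RemoteConfig` and `ConfigStatus` are hypothetical stand-ins loosely modeled on OpAMP's remote config and status messages, the SHA-256 hash is an assumption, and the WebSocket and mTLS layers are left out entirely.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    APPLIED = "applied"
    FAILED = "failed"


@dataclass
class RemoteConfig:
    body: bytes         # contents of config.yaml sent by the server
    config_hash: bytes  # identifies this exact config version


@dataclass
class ConfigStatus:
    last_config_hash: bytes  # which config this report refers to
    status: Status
    error_message: str = ""


def make_remote_config(body: bytes) -> RemoteConfig:
    """Server side: pair the config with a hash (SHA-256 is an assumption)."""
    return RemoteConfig(body, hashlib.sha256(body).digest())


def handle_remote_config(
    cfg: RemoteConfig, reload: Callable[[bytes], None]
) -> ConfigStatus:
    """Collector side: try to hot-reload, then report the outcome."""
    try:
        reload(cfg.body)  # hypothetical hook that swaps the running pipeline
    except Exception as exc:
        return ConfigStatus(cfg.config_hash, Status.FAILED, str(exc))
    return ConfigStatus(cfg.config_hash, Status.APPLIED)
```

Echoing the hash back in every status report is what lets a control plane tell which config version each collector actually applied, rather than which one it was sent.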

Because Bindplane tracks which config version every collector is running, you can see an error whenever a collector fails to apply a config. And because changes are staged, a misconfiguration is confined to the first batch of collectors, making it easy to fix.
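A short sketch shows why staging limits the blast radius. It assumes the rollout halts automatically after the first wave that reports a failure; whether Bindplane halts on its own or waits for an operator is not stated here, and `staged_rollout` and its `apply` callback are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence


def staged_rollout(
    collectors: Sequence[str],
    wave_sizes: Sequence[int],
    apply: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Apply a config wave by wave, stopping after the first wave that fails.

    `apply` returns True when a collector confirms the new config.
    """
    updated: List[str] = []
    failed: List[str] = []
    start = 0
    for size in wave_sizes:
        wave = collectors[start:start + size]
        if not wave:
            break
        for collector_id in wave:
            (updated if apply(collector_id) else failed).append(collector_id)
        start += len(wave)
        if failed:
            break  # halt: later waves never receive the bad config
    untouched = list(collectors[start:])
    return {"updated": updated, "failed": failed, "untouched": untouched}
```

If a collector in the first wave of 3 rejects the config, every collector outside that wave stays on its previous version, so the fix only has to reach those 3.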

The result? You can push a change to a million collectors as effortlessly as to a hundred!

Why 1 million matters long before you hit it

Think your 100-node cluster is safe? Add test, staging, and disaster recovery, and you’re closing in on 10,000 collectors. Growth happens in bursts, usually after leadership sees a latency heat-map and asks, “Can we get this everywhere?” With 1 million in headroom, your answer is always “Yes.”

Looking ahead

We imagine a world where every workload emits telemetry by default, and scaling collectors is a given. Managing a million collectors is today’s benchmark; the two-million-collector milestone is already on our whiteboard. If you’d like to push the limits, let us show you how Bindplane turns managing thousands of collectors into a fun afternoon activity.

Ready to try it? Spin up a free instance of Bindplane Cloud and hit the ground running right away.
