How to Build Resilient Telemetry Pipelines with the OpenTelemetry Collector: High Availability and Gateway Architecture
Do you remember when humans used to write step-by-step tutorials?


Let's bring that back. Today you'll learn how to configure high availability for the OpenTelemetry Collector so you don't lose telemetry during node failures, rolling upgrades, or traffic spikes. The guide covers both Docker and Kubernetes, with hands-on config demos.
But first, let's lay some groundwork.
What Does High Availability (HA) Mean for the OpenTelemetry Collector?
You want telemetry collection and processing to keep working even if individual Collector instances fail. It comes down to three main points:
- Avoid data loss when exporting to a dead observability backend.
- Ensure telemetry continuity during rolling updates or infrastructure failures.
- Enable horizontal scalability for load-balancing traces, logs, and metrics.
To enable high availability, it's recommended that you use the Agent-Gateway deployment pattern. This means:
- Agent Collectors run on every host, container, or node.
- Gateway Collectors are centralized, scalable back-end services receiving telemetry from Agent Collectors.
- Each layer can be scaled independently and horizontally.

Please note: an Agent Collector and a Gateway Collector are essentially the same binary. They're completely identical. The only difference is where each one runs. Think of it this way:
- An Agent Collector runs close to the workload: in Kubernetes that could be a sidecar or a deployment per namespace; in Docker, a service alongside your app in the docker-compose.yaml. The dev team will typically own this instance of the Collector.
- A Gateway Collector is a central, standalone deployment of the Collector: think a dedicated namespace or even a dedicated Kubernetes cluster, typically owned by the platform team. It's the final step of the telemetry pipeline, letting the platform team enforce policies like filtering logs, sampling traces, and dropping metrics before sending data to an observability backend.
Here's an awesome explanation on StackOverflow. Yes, it's still a thing. No, not everything is explained by AI.
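To make the split concrete, here's a minimal sketch of what the Agent role looks like in a plain, hand-written Collector config (not the Bindplane-managed configs used later). The gateway address and the logs-only pipeline are illustrative assumptions:

# agent-config.yaml (runs next to the workload)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
exporters:
  otlp:
    endpoint: gateway.example.internal:4317  # placeholder address of the Gateway Collector
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

A Gateway Collector's config looks much the same, except its exporter points at the observability backend and the platform team's policy processors (filtering, sampling, dropping) sit in the middle of the pipeline.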
To satisfy these high availability requirements, I'll walk you through how to configure:
- Multiple Collector Instances. Each instance is capable of handling the full workload with redundant storage for temporary data buffering.
- A Load Balancer. Itâll distribute incoming telemetry data and maintain consistent routing. Load balancers also support automatic failover if a collector becomes unavailable.
- Shared Storage. Persistent storage for collector state and configuration management.
Now it's time to get our hands dirty with some code.
Configure Agent-Gateway High Availability (HA) with the OpenTelemetry Collector
Let me first explain this concept using Docker and visualize it with Bindplane. The same architecture transfers to any Linux or Windows VM setup as well. More on Kubernetes further below.
There are three options: use a load balancer like Nginx or Traefik; use the loadbalancing exporter that ships with the Collector; or, if you're fully committed to a containerized environment, use Kubernetes-native load balancing with Services and a HorizontalPodAutoscaler.
Nginx Load Balancer
The Nginx option is the simplest, most out-of-the-box solution.
I'll set up the architecture with:
- Three Gateway Collectors in parallel
- One Nginx load balancer
- One Agent Collector configured to generate telemetry (app simulation)

This structure is the bare-bones minimum you'll end up using. Note that the gateway collectors run as three separate services because each collector needs its own file_storage path to store data in its persistent queue. In Docker, that means each container gets a unique volume. Let me explain how that works.
Copy the content below into a docker-compose.yaml.
version: '3.8'

volumes:
  gw1-storage:          # persistent queue for gateway-1
  gw2-storage:          # persistent queue for gateway-2
  gw3-storage:          # persistent queue for gateway-3
  telgen-storage:       # persistent queue for telemetry generator
  external-gw-storage:  # persistent queue for external gateway

services:
  # ────────────── GATEWAYS (3×) ──────────────
  gw1:
    image: ghcr.io/observiq/bindplane-agent:1.79.2
    container_name: gw1
    hostname: gw1
    command: ["--config=/etc/otel/config/config.yaml"]
    volumes:
      - ./config:/etc/otel/config
      - gw1-storage:/etc/otel/storage # 60 GiB+ queue
    environment:
      OPAMP_ENDPOINT: "wss://app.bindplane.com/v1/opamp" # point to your Bindplane server
      OPAMP_SECRET_KEY: "<secret>"
      OPAMP_LABELS: ephemeral=true
      MANAGER_YAML_PATH: /etc/otel/config/gw1-manager.yaml
      CONFIG_YAML_PATH: /etc/otel/config/config.yaml
      LOGGING_YAML_PATH: /etc/otel/config/logging.yaml

  gw2:
    image: ghcr.io/observiq/bindplane-agent:1.79.2
    container_name: gw2
    hostname: gw2
    command: ["--config=/etc/otel/config/config.yaml"]
    volumes:
      - ./config:/etc/otel/config
      - gw2-storage:/etc/otel/storage # 60 GiB+ queue
    environment:
      OPAMP_ENDPOINT: "wss://app.bindplane.com/v1/opamp" # point to your Bindplane server
      OPAMP_SECRET_KEY: "<secret>"
      OPAMP_LABELS: ephemeral=true
      MANAGER_YAML_PATH: /etc/otel/config/gw2-manager.yaml
      CONFIG_YAML_PATH: /etc/otel/config/config.yaml
      LOGGING_YAML_PATH: /etc/otel/config/logging.yaml

  gw3:
    image: ghcr.io/observiq/bindplane-agent:1.79.2
    container_name: gw3
    hostname: gw3
    command: ["--config=/etc/otel/config/config.yaml"]
    volumes:
      - ./config:/etc/otel/config
      - gw3-storage:/etc/otel/storage # 60 GiB+ queue
    environment:
      OPAMP_ENDPOINT: "wss://app.bindplane.com/v1/opamp" # point to your Bindplane server
      OPAMP_SECRET_KEY: "<secret>"
      OPAMP_LABELS: ephemeral=true
      MANAGER_YAML_PATH: /etc/otel/config/gw3-manager.yaml
      CONFIG_YAML_PATH: /etc/otel/config/config.yaml
      LOGGING_YAML_PATH: /etc/otel/config/logging.yaml

  # ────────────── OTLP LOAD-BALANCER ──────────────
  otlp-lb:
    image: nginx:1.25-alpine
    volumes:
      - ./nginx-otlp.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP/JSON
    depends_on: [gw1, gw2, gw3]

  # ────────────── TELEMETRY GENERATOR ──────────────
  telgen:
    image: ghcr.io/observiq/bindplane-agent:1.79.2
    container_name: telgen
    hostname: telgen
    command: ["--config=/etc/otel/config/config.yaml"]
    volumes:
      - ./config:/etc/otel/config
      - telgen-storage:/etc/otel/storage # 60 GiB+ queue
    environment:
      OPAMP_ENDPOINT: "wss://app.bindplane.com/v1/opamp" # point to your Bindplane server
      OPAMP_SECRET_KEY: "<secret>"
      OPAMP_LABELS: ephemeral=true
      MANAGER_YAML_PATH: /etc/otel/config/telgen-manager.yaml
      CONFIG_YAML_PATH: /etc/otel/config/config.yaml
      LOGGING_YAML_PATH: /etc/otel/config/logging.yaml

  # ────────────── EXTERNAL GATEWAY ──────────────
  external-gw:
    image: ghcr.io/observiq/bindplane-agent:1.79.2
    container_name: external-gw
    hostname: external-gw
    command: ["--config=/etc/otel/config/external-gw-config.yaml"]
    volumes:
      - ./config:/etc/otel/config
      - external-gw-storage:/etc/otel/storage # 60 GiB+ queue
    environment:
      OPAMP_ENDPOINT: "wss://app.bindplane.com/v1/opamp" # point to your Bindplane server
      OPAMP_SECRET_KEY: "<secret>"
      OPAMP_LABELS: ephemeral=true
      MANAGER_YAML_PATH: /etc/otel/config/external-gw-manager.yaml
      CONFIG_YAML_PATH: /etc/otel/config/external-gw-config.yaml
      LOGGING_YAML_PATH: /etc/otel/config/logging.yaml
Open your Bindplane instance and click the Install Agent button.

Set the platform to Linux, since I'm demoing this with Docker, and hit Next.

This screen now shows the environment variables you'll need to replace in the docker-compose.yaml.

Go ahead and replace the OPAMP_SECRET_KEY with your own secret key from Bindplane. If you're using a self-hosted instance of Bindplane, replace the OPAMP_ENDPOINT as well. Use the values after -e and -s, which represent the endpoint and secret respectively.
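If you'd rather not hard-code these values, Docker Compose can substitute them from an .env file that sits next to the docker-compose.yaml. A small sketch, with placeholder values:

# .env
OPAMP_ENDPOINT=wss://app.bindplane.com/v1/opamp
OPAMP_SECRET_KEY=replace-with-your-secret

# docker-compose.yaml (excerpt)
    environment:
      OPAMP_ENDPOINT: "${OPAMP_ENDPOINT}"
      OPAMP_SECRET_KEY: "${OPAMP_SECRET_KEY}"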
Create an nginx-otlp.conf file for the load balancer.
worker_processes auto;
events { worker_connections 1024; }

stream {
    upstream otlp_grpc {
        server gw1:4317 max_fails=3 fail_timeout=15s;
        server gw2:4317 max_fails=3 fail_timeout=15s;
        server gw3:4317 max_fails=3 fail_timeout=15s;
    }
    server {
        listen 4317;  # gRPC
        proxy_pass otlp_grpc;
        proxy_connect_timeout 1s;
        proxy_timeout 30s;
    }
}

http {
    upstream otlp_http {
        server gw1:4318 max_fails=3 fail_timeout=15s;
        server gw2:4318 max_fails=3 fail_timeout=15s;
        server gw3:4318 max_fails=3 fail_timeout=15s;
    }
    server {
        listen 4318;  # HTTP/JSON
        location / {
            proxy_pass http://otlp_http;
            proxy_next_upstream error timeout http_502 http_503 http_504;
        }
    }
}
Create a ./config directory in the same root directory as your docker-compose.yaml, and create 3 files.
> config/
    config.yaml
    telgen-config.yaml
    logging.yaml
Paste this basic config into config.yaml and telgen-config.yaml so the BDOT Collector has a base config to start with. I'll then configure it with Bindplane.
receivers:
  nop:
processors:
  batch:
exporters:
  nop:
service:
  pipelines:
    metrics:
      receivers: [nop]
      processors: [batch]
      exporters: [nop]
  telemetry:
    metrics:
      level: none
And, a base setup for the logging.yaml.
output: stdout
level: info
Start the Docker Compose services.
docker compose up -d
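Before wiring anything up in Bindplane, it's worth a quick check that all six services are up, and that the Nginx config parses cleanly inside the Compose network (where the gw1-gw3 hostnames resolve):

docker compose ps                      # all six services should be running
docker compose exec otlp-lb nginx -t   # validates nginx-otlp.conf against the live gateway hostnames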
Jump into Bindplane and create three configurations for:
- telgen
- otlp-lb-gw
- external-gw

The telgen configuration has a Telemetry Generator source.

And, an OTLP destination.

The OTLP destination is configured to send telemetry to the otlp-lb hostname, which is the hostname of the Nginx load balancer I'm running in Docker Compose.
Next, the otlp-lb-gw configuration has an OTLP source that listens on 0.0.0.0 and ports 4317 and 4318.

The destination is also OTLP, but this time sending to the external-gw hostname.

Finally, the external-gw configuration again uses an identical OTLP source.

And, a Dev Null destination.

This setup lets you drop in whatever destination you want in the list of destinations for the external-gw configuration. Go wild!
If you open the processor node for the Dev Null destination, you'll see logs flowing through the load balancer.

While in the otlp-lb-gw configuration, if you open a processor node, you'll see the load evenly distributed across all three collectors.

That's how you load balance telemetry across multiple collectors with Nginx.
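As an optional smoke test from the host, you can push an empty OTLP/HTTP payload through the load balancer and expect an HTTP 200 back from whichever gateway Nginx picks. The payload here is a minimal, illustrative request rather than real telemetry:

curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{"resourceLogs":[]}'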
If you'd rather apply these configs via the Bindplane CLI, get the files on GitHub, here.
Load Balancing Exporter
The second option is to use the dedicated loadbalancing exporter in the collector. With this exporter you can specify multiple downstream collectors that will receive the telemetry traffic equally.

One quick note about the loadbalancing exporter before we dive in: you don't always need it. Its main job is to make sure spans from the same trace stick together and get routed to the same backend collector, which matters for distributed tracing with sampling. But if you're just shipping logs and metrics, or even traces without fancy sampling rules, you can probably skip it and stick with Nginx.
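If you do care about trace-aware routing, the exporter's routing_key setting controls what the routing hash is computed on. A minimal sketch; the gateway hostnames match the Docker Compose setup used in this guide, and traceID is already the default for traces:

loadbalancing:
  routing_key: traceID   # route all spans of a trace to the same gateway
  protocol:
    otlp:
      tls:
        insecure: true
  resolver:
    static:
      hostnames:
        - gw1:4317
        - gw2:4317
        - gw3:4317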
I'll set up the architecture just as I did above, but with another collector in place of the Nginx load balancer:
- Three Gateway Collectors in parallel
- One Gateway Collector using the loadbalancing exporter
- One Agent Collector configured to generate telemetry (app simulation)
This behaves identically to the Nginx load balancer, but with one less moving part and less configuration overhead. There's no need to configure and run Nginx or manage Nginx-specific files; instead, you run one more instance of the collector with a trusty collector config.yaml that you're already familiar with.
The drop-in replacement for the use case above is as follows. In the docker-compose.yaml, replace the otlp-lb Nginx service with another OpenTelemetry Collector service named lb, and add an lb-storage entry to the top-level volumes block since this collector gets its own persistent queue.
services:

# ...

  lb:
    image: ghcr.io/observiq/bindplane-agent:1.79.2
    container_name: lb
    hostname: lb
    command: ["--config=/etc/otel/config/lb-config.yaml"]
    volumes:
      - ./config:/etc/otel/config
      - lb-storage:/etc/otel/storage
    ports:
      - "4317:4317" # OTLP gRPC - external endpoint
      - "4318:4318" # OTLP HTTP/JSON - external endpoint
    environment:
      OPAMP_ENDPOINT: "wss://app.bindplane.com/v1/opamp"
      OPAMP_SECRET_KEY: "<secret>"
      OPAMP_LABELS: ephemeral=true
      MANAGER_YAML_PATH: /etc/otel/config/lb-manager.yaml
      CONFIG_YAML_PATH: /etc/otel/config/lb-config.yaml
      LOGGING_YAML_PATH: /etc/otel/config/logging.yaml
    depends_on: [gw1, gw2, gw3]

# ...
Create a base lb-config.yaml for this collector instance in the ./config directory. Bindplane will update this remotely once you add a destination for the loadbalancing exporter.
receivers:
  nop:
processors:
  batch:
exporters:
  nop:
service:
  pipelines:
    metrics:
      receivers: [nop]
      processors: [batch]
      exporters: [nop]
  telemetry:
    metrics:
      level: none
Go ahead and restart Docker Compose.
docker compose down
docker compose up -d
This will start the new lb collector. In Bindplane, go ahead and create a new configuration called lb and add an OTLP source that listens on 0.0.0.0 and ports 4317 and 4318.

Now, create a custom destination and paste the loadbalancing exporter configuration in the input field.
loadbalancing:
  protocol:
    otlp:
      tls:
        insecure: true
      timeout: 30s
      retry_on_failure:
        enabled: true
        initial_interval: 5s
        max_elapsed_time: 300s
        max_interval: 30s
      sending_queue:
        enabled: true
        num_consumers: 10
        queue_size: 5000
  resolver:
    static:
      hostnames:
        - gw1:4317
        - gw2:4317
        - gw3:4317

Note that the hostnames correspond to the hostnames of the gateway collectors configured in Docker Compose. Save this configuration and roll it out to the new lb collector. If you open the gateway configuration in Bindplane and select a processor node, you'll see telemetry flowing through all 3 gateway collector instances.

You'll see the split even more clearly by looking at the telemetry throughput across all collectors in the Agents view.

The lb and external-gw collectors report the same throughput, with the three gateway collectors load balancing the traffic equally.
The loadbalancing exporter behaves like a drop-in replacement for Nginx. I'd call that a win: less configuration overhead, fewer moving parts, and no need to learn Nginx-specific configs. Instead, you focus only on the collector.
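One more note on the resolver: static hostnames work fine for a fixed Docker Compose setup, but the loadbalancing exporter also ships dns and k8s resolvers that re-discover backends as they come and go, which is handy once the gateways scale dynamically. A sketch with a hypothetical DNS name:

loadbalancing:
  protocol:
    otlp:
      tls:
        insecure: true
  resolver:
    dns:
      hostname: gateways.example.internal  # hypothetical name that resolves to all gateway IPs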
To get this sample up and running quickly, apply these configs via the Bindplane CLI; get the files on GitHub, here.
Now that you have a good understanding of how to configure OpenTelemetry Collector infrastructure for high availability, let's dig into resilience specifically.
Building Resilience into Your Collector
When it comes to resilience, features like retry logic, persistent queues, and batching should be handled in the Agent Collectors. These are the instances sitting closest to your workloads; they're the most at risk of losing data if something goes wrong. The Agent's job is to collect, buffer, and forward telemetry reliably, even when the backend is flaky or slow.
Here's how to configure the OpenTelemetry Collector for resilience so you don't lose telemetry during network issues or telemetry backend outages:
- Batching groups signals before export, improving efficiency.
- Retry ensures failed exports are re-attempted. For critical workloads, increase max_elapsed_time to tolerate longer outages, but be aware this will increase the buffer size on disk.
- Persistent Queue stores retries on disk, protecting against data loss if the Collector crashes. You can configure:
  - Number of consumers: how many parallel retry workers run
  - Queue size: how many batches are stored
  - Persistence: enables disk buffering for reliability
Retry & Persistent Queue
Luckily for you, Bindplane handles both retries and persistent queues out of the box for OTLP exporters.
Take a look at the telgen configuration. This is the collector we're running in agent mode to simulate a bunch of telemetry traffic.
In the telgen-config.yaml, you'll see the OTLP exporter otlp/lb is configured with both the persistent queue and retries.
exporters:
  otlp/lb:
    compression: gzip
    endpoint: gw:4317
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_elapsed_time: 300s
      max_interval: 30s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
      storage: file_storage/lb
    timeout: 30s
    tls:
      insecure: true
This is because the advanced settings for every OTLP exporter in Bindplane have this default configuration enabled.

The persistent queue directory here is the storage directory that we configured by creating a volume in Docker.
# docker-compose.yaml

...

volumes:
  gw1-storage:          # persistent queue for gateway-1
  gw2-storage:          # persistent queue for gateway-2
  gw3-storage:          # persistent queue for gateway-3
  telgen-storage:       # persistent queue for telemetry generator
  lb-storage:           # persistent queue for load-balancing gateway
  external-gw-storage:  # persistent queue for external gateway

...
Bindplane then automatically configures a storage extension in the config and enables it like this:
# telgen-config.yaml

...
extensions:
  file_storage/lb:
    compaction:
      directory: ${OIQ_OTEL_COLLECTOR_HOME}/storage
      on_rebound: true
    directory: ${OIQ_OTEL_COLLECTOR_HOME}/storage
service:
  extensions:
    - file_storage/lb
...
Note that the OIQ_OTEL_COLLECTOR_HOME environment variable maps to the /etc/otel directory.
Now your telemetry pipeline becomes resilient and HA-ready with data persistence to survive restarts, persistent queue buffering to handle temporary outages, and failover recovery to prevent data loss.
Batching
Batching is a whole other story: to enable it, you need to add a batch processor on the processor node before connecting it to the destination.
Agent-mode collectors should batch telemetry before sending it to the gateway collector. The OTLP receiver on the gateway side will receive batches and forward them to your telemetry backend of choice.
In the telgen configuration, click a processor node and add a batch processor.

This config sends a batch of telemetry signals every 200ms regardless of size, or as soon as 8192 items are buffered regardless of the timeout, whichever comes first. Applying this processor in Bindplane will generate a config like this:
# telgen-config.yaml

...

processors:
  batch/lb: null
  batch/lb-0__processor0:
    send_batch_max_size: 0
    send_batch_size: 8192
    timeout: 200ms

...
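If you're hand-writing the config for a plain upstream Collector instead of letting Bindplane generate it, the equivalent batch processor would look roughly like this; the logs pipeline shown is just for illustration:

processors:
  batch:
    send_batch_size: 8192     # flush once 8192 items are buffered...
    send_batch_max_size: 0    # 0 means no hard upper limit per batch
    timeout: 200ms            # ...or after 200ms, whichever comes first
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]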
Kubernetes-native load balancing with HorizontalPodAutoscaler
Finally, after all the breakdowns, explanations, and diagrams, it's time to show you what this looks like in the wild with a simple Kubernetes sample.
Kubernetes is the architecture preferred by the Bindplane team and the OpenTelemetry community, and K8s will maximize the benefits you get from Bindplane as well.
I'll set up the architecture with:
- One Agent-mode Collector running per node on the K8s cluster configured to generate telemetry (app simulation)
- A Gateway Collector Deployment
- Using a HorizontalPodAutoscaler scaling from 2 to 10 pods
- And a ClusterIP service
- Configured with persistent storage, sending queue, and retry
- An external Gateway Collector running on another cluster acting as a mock telemetry backend
Luckily, getting all the K8s YAML manifests for the collectors is point-and-click from the Bindplane UI. However, you need to build the configurations first, before deploying the collectors to your K8s cluster.
For the sake of simplicity, I'll show how to spin up two K8s clusters with kind and use them in this demo.
kind create cluster --name kind-2
kind create cluster --name kind-1

# make sure you set the context to the kind-1 cluster first
kubectl config use-context kind-kind-1
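A quick check that both clusters exist and the current context points at the right one:

kind get clusters                          # should list kind-1 and kind-2
kubectl cluster-info --context kind-kind-1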
Next, jump into Bindplane and create three configurations for:
- telgen-kind-1
- gw-kind-1
- external-gw-kind-2

The telgen-kind-1 configuration has a Custom source with a telemetrygeneratorreceiver.
telemetrygeneratorreceiver:
  generators:
    - additional_config:
        body: 127.0.0.1 - - [30/Jun/2025:12:00:00 +0000] \"GET /index.html HTTP/1.1\" 200 512
        severity: 9
      type: logs
  payloads_per_second: 1

And, a Bindplane Gateway destination.
Note: This is identical to any OTLP destination.

The Bindplane Gateway destination is configured to send telemetry to the bindplane-gateway-agent.bindplane-agent.svc.cluster.local hostname, which is the hostname of the Bindplane Gateway Collector service in Kubernetes that you'll start in a second.
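That hostname follows the standard <service>.<namespace>.svc.cluster.local pattern. Once the gateway Deployment from the upcoming steps is running, you can confirm the Service exists with:

kubectl -n bindplane-agent get svc bindplane-gateway-agent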
The final step for this configuration is to click a processor node and add a batch processor.

Next, the gw-kind-1 configuration has a Bindplane Gateway source that listens on 0.0.0.0 and ports 4317 and 4318.

The destination is OTLP, sending telemetry to the IP address (172.18.0.2) and port (30317) of the external gateway running on the second K8s cluster.
Note: This might differ for your clusters. If you're using kind, like I am in this demo, the IP will be 172.18.0.2.
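If you're not sure what your node IP is, you can look it up on the second cluster; the INTERNAL-IP column is the address the gateway on kind-1 should target:

kubectl get nodes -o wide --context kind-kind-2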

Finally, the external-gw-kind-2 configuration again uses an OTLP source.

And, a Dev Null destination.

Feel free to use the Bindplane CLI and these resources to apply all the configurations in one go without having to do it manually in the UI.
With the configurations created, you can install collectors easily by getting manifest files from your Bindplane account. Navigate to the install agents UI in Bindplane and select a Kubernetes environment. Use the Node platform and telgen-kind-1 configuration.

Clicking Next will show a manifest file for you to apply in the cluster.

Save this file as node-agent-kind-1.yaml. Check out below what a sample of it looks like, or see the full file on GitHub, here.
1---
2apiVersion: v1
3kind: Namespace
4metadata:
5 labels:
6 app.kubernetes.io/name: bindplane-agent
7 name: bindplane-agent
8---
9apiVersion: v1
10kind: ServiceAccount
11metadata:
12 labels:
13 app.kubernetes.io/name: bindplane-agent
14 name: bindplane-agent
15 namespace: bindplane-agent
16---
17apiVersion: rbac.authorization.k8s.io/v1
18kind: ClusterRole
19metadata:
20 name: bindplane-agent
21 labels:
22 app.kubernetes.io/name: bindplane-agent
23rules:
24- apiGroups:
25 - ""
26 resources:
27 - events
28 - namespaces
29 - namespaces/status
30 - nodes
31 - nodes/spec
32 - nodes/stats
33 - nodes/proxy
34 - pods
35 - pods/status
36 - replicationcontrollers
37 - replicationcontrollers/status
38 - resourcequotas
39 - services
40 verbs:
41 - get
42 - list
43 - watch
44- apiGroups:
45 - apps
46 resources:
47 - daemonsets
48 - deployments
49 - replicasets
50 - statefulsets
51 verbs:
52 - get
53 - list
54 - watch
55- apiGroups:
56 - extensions
57 resources:
58 - daemonsets
59 - deployments
60 - replicasets
61 verbs:
62 - get
63 - list
64 - watch
65- apiGroups:
66 - batch
67 resources:
68 - jobs
69 - cronjobs
70 verbs:
71 - get
72 - list
73 - watch
74- apiGroups:
75 - autoscaling
76 resources:
77 - horizontalpodautoscalers
78 verbs:
79 - get
80 - list
81 - watch
82---
83apiVersion: rbac.authorization.k8s.io/v1
84kind: ClusterRoleBinding
85metadata:
86 name: bindplane-agent
87 labels:
88 app.kubernetes.io/name: bindplane-agent
89roleRef:
90 apiGroup: rbac.authorization.k8s.io
91 kind: ClusterRole
92 name: bindplane-agent
93subjects:
94- kind: ServiceAccount
95 name: bindplane-agent
96 namespace: bindplane-agent
97---
98apiVersion: v1
99kind: Service
100metadata:
101 labels:
102 app.kubernetes.io/name: bindplane-agent
103 name: bindplane-node-agent
104 namespace: bindplane-agent
105spec:
106 ports:
107 - appProtocol: grpc
108 name: otlp-grpc
109 port: 4317
110 protocol: TCP
111 targetPort: 4317
112 - appProtocol: http
113 name: otlp-http
114 port: 4318
115 protocol: TCP
116 targetPort: 4318
117 selector:
118 app.kubernetes.io/name: bindplane-agent
119 app.kubernetes.io/component: node
120 sessionAffinity: None
121 type: ClusterIP
122---
123apiVersion: v1
124kind: Service
125metadata:
126 labels:
127 app.kubernetes.io/name: bindplane-agent
128 app.kubernetes.io/component: node
129 name: bindplane-node-agent-headless
130 namespace: bindplane-agent
131spec:
132 clusterIP: None
133 ports:
134 - appProtocol: grpc
135 name: otlp-grpc
136 port: 4317
137 protocol: TCP
138 targetPort: 4317
139 - appProtocol: http
140 name: otlp-http
141 port: 4318
142 protocol: TCP
143 targetPort: 4318
144 selector:
145 app.kubernetes.io/name: bindplane-agent
146 app.kubernetes.io/component: node
147 sessionAffinity: None
148 type: ClusterIP
149---
150apiVersion: v1
151kind: ConfigMap
152metadata:
153 name: bindplane-node-agent-setup
154 labels:
155 app.kubernetes.io/name: bindplane-agent
156 app.kubernetes.io/component: node
157 namespace: bindplane-agent
158data:
159 # This script assumes it is running in /etc/otel.
160 setup.sh: |
161 # Configure storage/ emptyDir volume permissions so the
162 # manager configuration can be written to it.
163 chown 10005:10005 storage/
164
165 # Copy config and logging configuration files to storage/
166 # hostPath volume if they do not already exist.
167 if [ ! -f storage/config.yaml ]; then
168 echo '
169 receivers:
170 nop:
171 processors:
172 batch:
173 exporters:
174 nop:
175 service:
176 pipelines:
177 metrics:
178 receivers: [nop]
179 processors: [batch]
180 exporters: [nop]
181 telemetry:
182 metrics:
183 level: none
184 ' > storage/config.yaml
185 fi
186 if [ ! -f storage/logging.yaml ]; then
187 echo '
188 output: stdout
189 level: info
190 ' > storage/logging.yaml
191 fi
192 chown 10005:10005 storage/config.yaml storage/logging.yaml
193---
194apiVersion: apps/v1
195kind: DaemonSet
196metadata:
197 name: bindplane-node-agent
198 labels:
199 app.kubernetes.io/name: bindplane-agent
200 app.kubernetes.io/component: node
201 namespace: bindplane-agent
202spec:
203 selector:
204 matchLabels:
205 app.kubernetes.io/name: bindplane-agent
206 app.kubernetes.io/component: node
207 template:
208 metadata:
209 labels:
210 app.kubernetes.io/name: bindplane-agent
211 app.kubernetes.io/component: node
212 annotations:
213 prometheus.io/scrape: "true"
214 prometheus.io/path: /metrics
215 prometheus.io/port: "8888"
216 prometheus.io/scheme: http
217 prometheus.io/job-name: bindplane-node-agent
218 spec:
219 serviceAccount: bindplane-agent
220 initContainers:
221 - name: setup
222 image: busybox:latest
223 securityContext:
224 # Required for changing permissions from
225 # root to otel user in emptyDir volume.
226 runAsUser: 0
227 command: ["sh", "/setup/setup.sh"]
228 volumeMounts:
229 - mountPath: /etc/otel/config
230 name: config
231 - mountPath: /storage
232 name: storage
233 - mountPath: "/setup"
234 name: setup
235 containers:
236 - name: opentelemetry-collector
237 image: ghcr.io/observiq/bindplane-agent:1.80.1
238 imagePullPolicy: IfNotPresent
239 securityContext:
240 readOnlyRootFilesystem: true
241 # Required for reading container logs hostPath.
242 runAsUser: 0
243 ports:
244 - containerPort: 8888
245 name: prometheus
246 resources:
247 requests:
248 memory: 200Mi
249 cpu: 100m
250 limits:
251 memory: 200Mi
252 env:
253 - name: OPAMP_ENDPOINT
254 value: wss://app.bindplane.com/v1/opamp
255 - name: OPAMP_SECRET_KEY
256 value: <secret>
257 - name: OPAMP_AGENT_NAME
258 valueFrom:
259 fieldRef:
260 fieldPath: spec.nodeName
261 - name: OPAMP_LABELS
262 value: configuration=telgen-kind-1,container-platform=kubernetes-daemonset,install_id=0979c5c2-bd7a-41c1-89b8-2c16441886ab
263 - name: KUBE_NODE_NAME
264 valueFrom:
265 fieldRef:
266 fieldPath: spec.nodeName
267 # The collector process updates config.yaml
268 # and manager.yaml when receiving changes
269 # from the OpAMP server.
270 #
271 # The config.yaml is persisted by saving it to the
272 # hostPath volume, allowing the agent to continue
273 # running after restart during an OpAMP server outage.
274 #
275 # The manager configuration must be re-generated on
276 # every startup due to how the bindplane-agent handles
277 # manager configuration. It prefers a manager config file
278 # over environment variables, meaning it cannot be
279 # updated using environment variables, if it is persisted).
280 - name: CONFIG_YAML_PATH
281 value: /etc/otel/storage/config.yaml
282 - name: MANAGER_YAML_PATH
283 value: /etc/otel/config/manager.yaml
284 - name: LOGGING_YAML_PATH
285 value: /etc/otel/storage/logging.yaml
286 volumeMounts:
287 - mountPath: /etc/otel/config
288 name: config
289 - mountPath: /run/log/journal
290 name: runlog
291 readOnly: true
292 - mountPath: /var/log
293 name: varlog
294 readOnly: true
295 - mountPath: /var/lib/docker/containers
296 name: dockerlogs
297 readOnly: true
298 - mountPath: /etc/otel/storage
299 name: storage
300 volumes:
301 - name: config
302 emptyDir: {}
303 - name: runlog
304 hostPath:
305 path: /run/log/journal
306 - name: varlog
307 hostPath:
308 path: /var/log
309 - name: dockerlogs
310 hostPath:
311 path: /var/lib/docker/containers
312 - name: storage
313 hostPath:
314 path: /var/lib/observiq/otelcol/container
315 - name: setup
316 configMap:
317 name: bindplane-node-agent-setup
In short, this manifest deploys the BDOT Collector as a DaemonSet on every node, using OpAMP to receive config from Bindplane. It includes:
- RBAC to read Kubernetes objects (pods, nodes, deployments, etc.)
- Services to expose OTLP ports (4317 gRPC, 4318 HTTP)
- An init container to bootstrap a starter config for the collector, which will be replaced by the telgen-kind-1 configuration once the agent connects to Bindplane
- Persistent hostPath storage for retries and disk buffering
- Prometheus annotations for metrics scraping
Your file will include the correct OPAMP_ENDPOINT, OPAMP_SECRET_KEY, and OPAMP_LABELS.
Go ahead and apply this manifest to the first K8s cluster.
kubectl config use-context kind-kind-1
kubectl apply -f node-agent-kind-1.yaml
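The DaemonSet should come up with one pod per node, and the agent should show up as connected in Bindplane shortly after:

kubectl -n bindplane-agent get daemonset,pods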
Now, install another collector in the K8s cluster, but this time choose the Gateway platform and the gw-kind-1 configuration.

You'll get another manifest file to apply, but this time it's a Deployment. Save it as gateway-collector-kind-1.yaml.

Here's the full manifest as a Deployment with a HorizontalPodAutoscaler. Or, check out what it looks like on GitHub.
1---
2apiVersion: v1
3kind: Namespace
4metadata:
5 labels:
6 app.kubernetes.io/name: bindplane-agent
7 name: bindplane-agent
8---
9apiVersion: v1
10kind: ServiceAccount
11metadata:
12 labels:
13 app.kubernetes.io/name: bindplane-agent
14 name: bindplane-agent
15 namespace: bindplane-agent
16---
17apiVersion: v1
18kind: Service
19metadata:
20 labels:
21 app.kubernetes.io/name: bindplane-agent
22 app.kubernetes.io/component: gateway
23 name: bindplane-gateway-agent
24 namespace: bindplane-agent
25spec:
26 ports:
27 - appProtocol: grpc
28 name: otlp-grpc
29 port: 4317
30 protocol: TCP
31 targetPort: 4317
32 - appProtocol: http
33 name: otlp-http
34 port: 4318
35 protocol: TCP
36 targetPort: 4318
37 - appProtocol: tcp
38 name: splunk-tcp
39 port: 9997
40 protocol: TCP
41 targetPort: 9997
42 - appProtocol: tcp
43 name: splunk-hec
44 port: 8088
45 protocol: TCP
46 targetPort: 8088
47 selector:
48 app.kubernetes.io/name: bindplane-agent
49 app.kubernetes.io/component: gateway
50 sessionAffinity: None
51 type: ClusterIP
52---
53apiVersion: v1
54kind: Service
55metadata:
56 labels:
57 app.kubernetes.io/name: bindplane-agent
58 app.kubernetes.io/component: gateway
59 name: bindplane-gateway-agent-headless
60 namespace: bindplane-agent
61spec:
62 clusterIP: None
63 ports:
64 - appProtocol: grpc
65 name: otlp-grpc
66 port: 4317
67 protocol: TCP
68 targetPort: 4317
69 - appProtocol: http
70 name: otlp-http
71 port: 4318
72 protocol: TCP
73 targetPort: 4318
74 selector:
75 app.kubernetes.io/name: bindplane-agent
76 app.kubernetes.io/component: gateway
77 sessionAffinity: None
78 type: ClusterIP
79---
80apiVersion: apps/v1
81kind: Deployment
82metadata:
83 name: bindplane-gateway-agent
84 labels:
85 app.kubernetes.io/name: bindplane-agent
86 app.kubernetes.io/component: gateway
87 namespace: bindplane-agent
88spec:
89 selector:
90 matchLabels:
91 app.kubernetes.io/name: bindplane-agent
92 app.kubernetes.io/component: gateway
93 template:
94 metadata:
95 labels:
96 app.kubernetes.io/name: bindplane-agent
97 app.kubernetes.io/component: gateway
98 annotations:
99 prometheus.io/scrape: "true"
100 prometheus.io/path: /metrics
101 prometheus.io/port: "8888"
102 prometheus.io/scheme: http
103 prometheus.io/job-name: bindplane-gateway-agent
104 spec:
105 serviceAccount: bindplane-agent
106 affinity:
107 podAntiAffinity:
108 preferredDuringSchedulingIgnoredDuringExecution:
109 - weight: 100
110 podAffinityTerm:
111 topologyKey: kubernetes.io/hostname
112 labelSelector:
113 matchExpressions:
114 - key: app.kubernetes.io/name
115 operator: In
116 values: [bindplane-agent]
117 - key: app.kubernetes.io/component
118 operator: In
119 values: [gateway]
120 securityContext:
121 runAsNonRoot: true
122 runAsUser: 1000000000
123 runAsGroup: 1000000000
124 fsGroup: 1000000000
125 seccompProfile:
126 type: RuntimeDefault
127 initContainers:
128 - name: setup-volumes
129 image: ghcr.io/observiq/bindplane-agent:1.80.1
130 securityContext:
131 runAsNonRoot: true
132 runAsUser: 1000000000
133 runAsGroup: 1000000000
134 readOnlyRootFilesystem: true
135 allowPrivilegeEscalation: false
136 seccompProfile:
137 type: RuntimeDefault
138 capabilities:
139 drop:
140 - ALL
141 command:
142 - 'sh'
143 - '-c'
144 - |
145 echo '
146 receivers:
147 nop:
148 processors:
149 batch:
150 exporters:
151 nop:
152 service:
153 pipelines:
154 metrics:
155 receivers: [nop]
156 processors: [batch]
157 exporters: [nop]
158 telemetry:
159 metrics:
160 level: none
161 ' > /etc/otel/storage/config.yaml
162 echo '
163 output: stdout
164 level: info
165 ' > /etc/otel/storage/logging.yaml
166 resources:
167 requests:
168 memory: 200Mi
169 cpu: 100m
170 limits:
171 memory: 200Mi
172 volumeMounts:
173 - mountPath: /etc/otel/storage
174 name: bindplane-gateway-agent-storage
175 containers:
176 - name: opentelemetry-container
177 image: ghcr.io/observiq/bindplane-agent:1.80.1
178 imagePullPolicy: IfNotPresent
179 securityContext:
180 runAsNonRoot: true
181 runAsUser: 1000000000
182 runAsGroup: 1000000000
183 readOnlyRootFilesystem: true
184 allowPrivilegeEscalation: false
185 seccompProfile:
186 type: RuntimeDefault
187 capabilities:
188 drop:
189 - ALL
190 resources:
191 requests:
192 memory: 500Mi
193 cpu: 250m
194 limits:
195 memory: 500Mi
196 ports:
197 - containerPort: 8888
198 name: prometheus
199 env:
200 - name: OPAMP_ENDPOINT
201 value: wss://app.bindplane.com/v1/opamp
202 - name: OPAMP_SECRET_KEY
203 value: <secret>
204 - name: OPAMP_AGENT_NAME
205 valueFrom:
206 fieldRef:
207 fieldPath: metadata.name
208 - name: OPAMP_LABELS
209 value: configuration=gw-kind-1,container-platform=kubernetes-gateway,install_id=51dbe4d2-83d2-45c0-ab4a-e0c127a59649
210 - name: KUBE_NODE_NAME
211 valueFrom:
212 fieldRef:
213 fieldPath: spec.nodeName
214 # The collector process updates config.yaml
215 # and manager.yaml when receiving changes
216 # from the OpAMP server.
217 - name: CONFIG_YAML_PATH
218 value: /etc/otel/storage/config.yaml
219 - name: MANAGER_YAML_PATH
220 value: /etc/otel/config/manager.yaml
221 - name: LOGGING_YAML_PATH
222 value: /etc/otel/storage/logging.yaml
223 volumeMounts:
224 - mountPath: /etc/otel/storage
225 name: bindplane-gateway-agent-storage
226 - mountPath: /etc/otel/config
227 name: config
228 volumes:
229 - name: config
230 emptyDir: {}
231 - name: bindplane-gateway-agent-storage
232 emptyDir: {}
233 # Allow exporters to drain their queue for up to
234 # five minutes.
235 terminationGracePeriodSeconds: 500
236---
237apiVersion: autoscaling/v2
238kind: HorizontalPodAutoscaler
239metadata:
240 name: bindplane-gateway-agent
241 namespace: bindplane-agent
242spec:
243 maxReplicas: 10
244 minReplicas: 2
245 scaleTargetRef:
246 apiVersion: apps/v1
247 kind: Deployment
248 name: bindplane-gateway-agent
249 metrics:
250 - type: Resource
251 resource:
252 name: cpu
253 target:
254 type: Utilization
255 averageUtilization: 60
Here's a breakdown of what this manifest does:
- Creates a dedicated namespace and service account for the Bindplane Gateway Collector (bindplane-agent).
- Defines two Kubernetes services:
- A standard ClusterIP service for OTLP (gRPC/HTTP) and Splunk (TCP/HEC) traffic.
- A headless service for direct pod discovery, useful in peer-to-peer setups.
- Deploys the Bindplane Agent as a scalable Deployment:
- Runs the OpenTelemetry Collector image.
- Bootstraps basic config via an initContainer.
- Secure runtime with strict securityContext settings.
- Prometheus annotations enable metrics scraping.
- Auto-scales the collector horizontally using an HPA:
- Scales between 2 and 10 replicas based on CPU utilization.
- Uses OpAMP to receive remote config and updates from Bindplane.
- Mounts ephemeral storage for config and persistent queue support using emptyDir.
Your file will include the correct OPAMP_ENDPOINT, OPAMP_SECRET_KEY, and OPAMP_LABELS.
Apply it in the first K8s cluster.
kubectl apply -f gateway-collector-kind-1.yaml
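One caveat: the HPA scales on CPU utilization, which requires the Kubernetes metrics-server, and kind clusters don't include it by default, so you may need to install it for autoscaling to actually kick in. Either way, you can watch the Deployment and HPA with:

kubectl -n bindplane-agent get deploy,hpa,pods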
Now, create a Gateway Collector identical to the one above, but use the external-gw-kind-2 configuration.

You'll get a manifest file to apply again, but this time apply it in your second cluster. Save it as gateway-collector-kind-2.yaml. Here's what it looks like on GitHub. I won't bother showing the manifest YAML since it's identical to the one above.
kubectl config use-context kind-kind-2
kubectl apply -f gateway-collector-kind-2.yaml
Finally, to expose this external Gateway Collector's service and enable OTLP traffic from cluster 1 to cluster 2, I'll use this NodePort service, saved as gateway-nodeport-service.yaml.
apiVersion: v1
kind: Service
metadata:
  name: bindplane-gateway-agent-nodeport
  namespace: bindplane-agent
  labels:
    app.kubernetes.io/name: bindplane-agent
    app.kubernetes.io/component: gateway
spec:
  type: NodePort
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
      nodePort: 30317
      protocol: TCP
    - name: otlp-http
      port: 4318
      targetPort: 4318
      nodePort: 30318
      protocol: TCP
  selector:
    app.kubernetes.io/name: bindplane-agent
    app.kubernetes.io/component: gateway
And, apply it with:
kubectl apply -f gateway-nodeport-service.yaml
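You can confirm the NodePort service is in place and, if your host can reach the kind node IP directly (typically the case on Linux), run the same kind of OTLP/HTTP smoke test as before against the node IP and port 30318:

kubectl -n bindplane-agent get svc bindplane-gateway-agent-nodeport --context kind-kind-2
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://172.18.0.2:30318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{"resourceLogs":[]}'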
Your final setup will look like this.

One Agent-mode collector sending telemetry traffic via a horizontally scaled Gateway-mode collector to an external Gateway running in a separate cluster. This can be any other telemetry backend of your choice.
You'll have 5 collectors running in total.

And 3 configurations, 2 of which will scale between 2 and 10 collector pods.

To get this sample up and running quickly, apply these configs via the Bindplane CLI; get the files on GitHub, here.
Final thoughts
At the end of the day, high availability for the OpenTelemetry Collector means one thing: don't lose telemetry when stuff breaks.
You want things to keep working when a telemetry backend goes down, a node restarts, or you're pushing out updates. That's why the Agent-Gateway pattern exists. That's why we scale horizontally. That's why we use batching, retries, and persistent queues.
Set it up once, and sleep better knowing your pipeline won't fall over at the first hiccup. Keep signals flowing. No drops. No drama.
Want to give Bindplane a try? Spin up a free instance of Bindplane Cloud and hit the ground running right away.
