Blog — observIQ

Bindplane Winter 2025 Release

Ryan Goins — Tue, 14 Jan 2025 05:32:35 GMT

Happy New Year! Winter has fully set in at Bindplane’s Michigan-based headquarters, but we’re keeping warm with the excitement from this piping-hot announcement and accompanying release. It’s just the thing to ignite the hearts, minds, and typing fingers (?) of our team and observability professionals as we kick off 2025. New Name. New Logo. Same Mission. First, observIQ is officially no more. As of today, we’ve rebranded as Bindplane (https://www.bindplane.com), aligning our identity with our flagship product and doubling down on our mission to build a powerful, OpenTelemetry-native telemetry pipeline. As a team, we’re excited to embrace 2025 as Bindplane while continuing to deliver exceptional value to our customers. Winter '25 Release In addition to the new name and logo, Bindplane’s Winter Release contains several exciting features: 1. Bindplane Agent is now BDOT Collector To kick things off: the Bindplane Agent is now officially known as the Bindplane Distro for OpenTelemetry Collector (“BDOT Collector” for short). Built with the OCB, the BDOT Collector aligns with OTel’s updated naming conventions and also marks the inclusion of a new enhancement—the Supervisor. The Supervisor is a process that oversees the operation of the OpenTelemetry Collector. Its primary responsibilities include: Starting and stopping the collector. Communicating with the OpAMP server on behalf of the collector. Managing the collector’s configuration based on OpAMP messages. Automatically restarting the collector in case of crashes. BDOT 2.0 is a big release that solidifies the foundation for the next phase of Bindplane. 2. Connector Support We’ve integrated one of the newest OpenTelemetry Collector components in Bindplane: Connectors. What is a Connector? As a quick refresher, connectors act as a bridge between telemetry pipelines, enabling the creation of complex signals like log-based metrics (data reduction) and facilitating advanced processing and routing use cases. Dan Jaglowski, a Principal Engineer and OTel Maintainer at Bindplane, helped co-develop the Connector framework and gave an in-depth talk at KubeCon EU ‘23. The Routing Connector In this release, we focused our attention on the Routing Connector first, as it provides users with a more streamlined and performant method to route their data, setting the stage for our next feature. 3. Data Routing v2 Over the past year, we’ve gathered feedback about the data routing experience in Bindplane, as it’s one of the key elements of a complete telemetry pipeline. With Data Routing v2, users can upgrade their Collector configurations and access a new stacked configuration view with streamlined controls--upgrading their pipelines on the fly. Data Routing v2 provides a few key benefits: Improved performance: lessening the performance impact on the collector Improved control: making it easier to alter your routes on the fly Streamlined view: making it easier to glean the details of your configuration across each signal type: metrics, logs, and traces 4. Collector Gateways on Topology Deploying the OpenTelemetry Collector as a gateway is powerful, as it facilitates secured and scaled telemetry, SIEM, and observability deployments. Knowing this, we’ve simplified the implementation and given them greater visibility on Bindplane’s Overview page. Previously, the Overview Page displayed source nodes on the left and their connections to destinations on the right. Now, when active data flows between a Gateway Destination and Source within the same project, Bindplane automatically generates an intermediate Gateway node. This feature simplifies tracing data paths from their source to their destination—even across multiple gateways—making complex telemetry configurations easier to monitor and optimize. 5. Processor Bundles Processor bundles simplify Bindplane’s capabilities by grouping multiple processors into a single, reusable entity. While OpenTelemetry offers powerful tools, configuring the OpenTelemetry Collector often requires chaining processors, even for simple tasks like parsing JSON logs—a process that can become tedious. This approach streamlines configuration by saving processors in the required order. For example, tasks like parsing JSON logs, severity, and timestamps can be combined into one efficient processor bundle. Winter Release wrap-up: Just the tip of the iceberg. Well, that’s a wrap on this announcement. All of the aforementioned features are available in BindPlane Cloud today, as well as beta releases of Data Routing v2 / BDOT 2.0. These will be coming to an on-prem release next week in version 1.85 of Bindplane. There’s much more to come for Bindplane in 2025 in the realm of integrations, routing, scalability, and more. Want to hear more? Join our first community call on January 15, 2025, where we discuss new features in detail in our first live stream. Happy New Year!

Bindplane Community Calls are Kicking off in Style

Adnan Rahic — Fri, 20 Dec 2024 17:06:27 GMT

What are community calls? Somebody calls them office hours. I like thinking of them as a tool for solving problems by finding points of connection between a product and its community. Most importantly, community calls serve as launchpads for communities. Community calls bring people together from all over the world. They serve as a connection. With Bindplane, we believe we’re building so much more than just a product. We’re building a community of OpenTelemetry enthusiasts and professionals. The goal is to connect all you wonderful folks who are passionate about standardizing telemetry and unifying their observability pipelines with OpenTelemetry. Enter, the Community Call! Mark your calendars for January 15th 2025, at 11:00 AM ET for our first community call. We'll be hosting these sessions on the second Wednesday of every month, making it easy for you to plan ahead and participate regularly. To ensure you don't miss out, there's no need to register. All sessions will be available live and on-demand on our YouTube channel and socials. Don’t worry about forgetting to join either, live stream details will be shared via our Slack channel and newsletter, as well as a shared Google calendar you can subscribe to, or use this iCal instead. Why Join Bindplane Community Calls? I believe community calls will become an awesome resource for our community, fostering collaboration and knowledge sharing between Bindplane users across the globe. You’ll get a chance to: Connect with fellow Bindplane users and share experiences Get early insights into upcoming features and product roadmap Participate in live Q&A sessions with the Bindplane team Get insights and help to optimize your telemetry pipelines Share best practices, troubleshoot together, and learn from top OpenTelemetry contributors Be the first to know about new updates, integrations, and tools to improve your OpenTelemetry-native telemetry pipeline Get first-hand info about upcoming hands-on workshops with the BindPlane team So… What Now? First things first, mark your calendar for January 15th and prepare your questions! We're eager to hear your feedback on the latest Bindplane releases, and answering your burning questions is our top priority. The whole team is thrilled to kick off these community calls in the new year. We can't wait to see you on the call in January! Quick Links YouTube Live Stream LinkedIn Live Shared Google Calendar (we’ll update this monthly) Slack Channel

Reduce Observability Costs with OpenTelemetry Setup

Michelle Artreche — Tue, 22 Oct 2024 18:00:21 GMT

Maintaining and visualizing telemetry data efficiently is super important for DevOps and SecOps teams. OpenTelemetry, a fantastic open-source observability framework, can really help with this without being too costly. Picture having a simple process that improves your data and helps your team make smart decisions without spending too much money. Let's chat about some budget-friendly ways to set up OpenTelemetry agents. This cost-effective approach will help you to take control of your telemetry data with ease! Make OpenTelemetry Work for Your Budget Get more from your data When it comes to using OpenTelemetry to enrich your data, it's all about being selective with what you focus on. By focusing on the most crucial parts of your application, you can cut back on unnecessary data collection and save on storage costs. Start off by pinpointing the key processes that really drive your business forward. Next, think about using sampling techniques to grab just the right amount of data instead of every little transaction. This way, you can still get all the insights you need without drowning in data. And don't forget to make sure your data stays consistent across all your different systems by using trace context propagation. On top of that, take advantage of OpenTelemetry exporters to direct your data only to the places that really need it. By enriching your data in a targeted way, you'll keep your insights top-notch without overwhelming your system. Reducing observability overheads To make OpenTelemetry more efficient, focus on handling data well. Start by compressing data before sending it, which can significantly reduce bandwidth use. Consider adjusting the frequency of data collection to balance between detail and resource use. Set up alert thresholds to reduce noise and avoid unnecessary processing of routine data. This helps prioritize critical alerts that truly need attention. Use OpenTelemetry's ability to work with your existing tools to make operations smoother and save on costs. Think about using serverless deployments to scale your observability setup with demand. This can help you save costs and keep things running smoothly. Saving on overheads not only helps your budget but also makes your observability process more agile. Key Configuration Techniques Optimizing OpenTelemetry for better performance and cost is really important. Start by customizing your agent configuration to fit your specific needs. Adjust the sampling rate to gather valuable data without overwhelming your system. You can also use adaptive sampling to automatically change the sampling rate based on traffic patterns. This way, you'll capture only the most important info during busy times, reducing unnecessary data collection. Another useful technique is to set up efficient batching and exporting intervals. By tweaking these, you can minimize the impact on your system's resources and export data less frequently, which can save you money. Remember to customize configurations for different environments like development, staging, and production. This helps you make the most of your resources and manage costs effectively. Strengthening DevOps and SecOps Teams Optimize through collaboration Building a strong collaboration between DevOps and SecOps teams is so important when it comes to making the most of OpenTelemetry implementation. It's all about setting common goals and metrics that line up with what the business wants to achieve. This helps make sure that everyone is on the same page about what success really means. Having regular get-togethers where both teams can share what they know is a great way to make sure everyone understands each other's priorities and challenges. Creating a single dashboard for monitoring can also be a big help in this, as it gives everyone one place to go for the real deal. It's also a good idea to keep those lines of communication wide open so that any issues can be dealt with super quickly. This could mean using chat platforms, having regular catch-ups, or using tools that let everyone see what's happening right now. Last but not least, it's important to involve both teams in the decision-making process for configuration and deployment. When everyone's involved, it means that the configurations are going to work well for both getting things done and keeping everything safe and sound. Ultimately, this helps create a really strong observability strategy. Practical implementation examples Seeing OpenTelemetry in action can provide valuable insights for DevOps and SecOps teams. For instance, a big e-commerce platform made its system more transparent without breaking the bank by focusing on key microservices. This helped them quickly spot and fix issues while keeping data usage in check. Then there's a fintech company that smartly adjusted their data sampling during busy times, so they could keep up with traffic without drowning in storage and analysis demands. And a healthcare provider combined OpenTelemetry with their existing monitoring tools, making their system more streamlined and efficient. This helped them improve their uptime and security measures. These stories show how using OpenTelemetry in the right way can really pay off for businesses, all while staying cost-effective. Measure ROI and success When you want to see how well OpenTelemetry is working for you, start by setting clear, measurable goals. These could be things like spotting and fixing issues faster, cutting costs, and making your systems work better. Keep track of these goals over time to see how things are getting better. Think about how your team is doing. See how well your DevOps and SecOps teams work together and handle issues now that you have better observability. If they're fixing things faster, that's a good sign. And don't forget to look at the money side of things. Compare how much you were spending on tools and storage before OpenTelemetry and how much you're spending now. Keep checking on all of this regularly to make sure you're still on track with your goals. By keeping an eye on these things, you'll see how well OpenTelemetry is working for you and make smart choices for the future. -- By adopting OpenTelemetry strategically, you'll gain better data insights while strengthening your DevOps and SecOps teamwork. As you start implementing these cost-effective techniques, remember that the real power comes from being adaptable and continuously improving. Keep refining your approaches based on feedback and changing needs to make sure that your observability practices stay relevant and strong. OpenTelemetry is more than just a tool—it's a driver for innovation and efficiency. Focusing on budget-friendly strategies will prepare you to make observability a key part of your operations.

Budget-Friendly Logging

Michelle Artreche — Mon, 14 Oct 2024 17:03:00 GMT

OpenTelemetry has quickly become a must-have tool in the DevOps toolkit. It helps us understand how our applications are performing and how our systems are behaving. As more and more organizations move to cloud-native architectures and microservices, it's super important to have great monitoring and tracing in place. OpenTelemetry provides a strong and flexible framework for capturing data that helps DevOps engineers keep our systems running smoothly and efficiently. I’m going to share some tips for using OpenTelemetry effectively so you can enhance your monitoring practices and make your applications even more reliable. Understanding OpenTelemetry Basics Key Components and Architecture OpenTelemetry is made up of three main parts: the API, SDK, and Collector. The API sets the standard for instrumenting code, allowing developers to create trace and metric data. The SDK implements the API, enabling data collection and export to different backends. It also includes processors and exporters to manage data handling and transmission. The Collector acts as a pipeline that can receive, process, and export telemetry data independently of the application code, providing flexibility in managing data flow. Understanding the architecture of OpenTelemetry helps DevOps engineers set up and configure monitoring systems effectively, ensuring they can capture essential performance insights and diagnose issues swiftly. This modular approach allows for customization and scalability, catering to the diverse needs of modern cloud-native applications. How OpenTelemetry Fits into DevOps OpenTelemetry easily fits into the DevOps system by improving observability, which is an important part of modern application management. In DevOps, keeping an eye on the application and getting quick feedback are crucial for maintaining its health and performance. OpenTelemetry provides a standard way to gather metrics and traces, allowing teams to understand the system's behavior deeply. This helps find problems, troubleshoot, and make the best use of resources. OpenTelemetry works well with different systems, so DevOps teams can choose the tools they prefer for analyzing and visualizing data. Also, it can track applications without using too many resources, which aligns with the DevOps principles of being agile and efficient. By including OpenTelemetry in the CI/CD pipeline, engineers can make sure that telemetry data is always collected and analyzed throughout the development process. This integration supports dealing with incidents proactively and constantly improving, which leads to more reliable and strong software systems. Common Use Cases in Monitoring OpenTelemetry plays a big role in monitoring as it provides valuable insights throughout the software lifecycle. One cool thing it does is distributed tracing, which helps track requests in complex, microservices-based systems. This visibility is super important for finding performance issues and understanding how different services interact. OpenTelemetry also helps monitor application metrics, like response times, error rates, and resource usage, which can help teams spot trends and unusual behavior. These metrics support planning for capacity and fine-tuning performance. Another big thing OpenTelemetry does is infrastructure monitoring, where it collects data from servers, containers, and network components. This gives a complete view that helps keep systems healthy and prevent downtime. Plus, OpenTelemetry's logging feature helps tie logs to traces and metrics, giving a full picture of the system's status. By using OpenTelemetry in these ways, DevOps teams can create stronger monitoring solutions that ultimately lead to better application performance and reliability. Best Practices for Implementation Efficient Data Collection Techniques To get the most out of OpenTelemetry, it's important to use smart ways to collect data. Sampling is a great way to achieve this. It involves capturing a smaller set of traces and metrics, which helps save on storage and makes processing more efficient without losing important insights. Setting the right level of detail for collecting metrics is also helpful. By focusing on key performance indicators and critical paths, DevOps teams can gather useful data while keeping things streamlined. Additionally, using batching, which groups data together before sending it, saves on network usage and speeds up the process. OpenTelemetry's Collector is flexible and can make handling data easier. By configuring it to filter, combine, and alter data before sending it to the backend, the entire data process can be made more efficient. Integrating with Existing Tools When you connect OpenTelemetry with your current tools, you can improve observability without causing disruptions. OpenTelemetry is free to use and works well with popular monitoring and logging platforms like Prometheus, Grafana, and Elasticsearch. To start, identify which parts of your current setup can benefit from better telemetry data. Use OpenTelemetry exporters to send collected data to these tools for smooth data flow and visualization. Also, use existing instrumentation libraries to avoid setting up the same things repeatedly and to maintain consistency across applications. Align OpenTelemetry's features with your current alerting and dashboard systems to keep a unified monitoring approach. Seek help from the community and check the documentation for advice on best practices and integration methods. By integrating OpenTelemetry carefully with your existing tools, you can improve system visibility, simplify monitoring, and manage performance more effectively without having to change your tech setup completely. Ensuring Data Privacy and Security It's important to keep data private and secure when using OpenTelemetry. First, set rules for what data to collect and how to handle it. Use encryption to protect data as it moves and when it's stored so only authorized people can see it. Put in controls and checks to limit data access based on people's roles. If possible, make the data anonymous to protect privacy while still getting useful insights. Keep an eye on who's accessing the data and make sure it's all above board. Also, stay updated on OpenTelemetry's security features and community guidelines to follow best practices. By making sure the data is safe and private, you can make the most of OpenTelemetry while making sure nothing bad happens with the data and following the rules. Optimizing OpenTelemetry Performance Reducing Overhead and Latency When using OpenTelemetry, it's important to minimize overhead and latency to keep your applications running smoothly. Start by being selective about the data you collect. Focus on the key metrics and traces that give you useful insights, instead of gathering everything and causing unnecessary processing. Try using adaptive sampling techniques to adjust the amount of data you collect based on system load or specific conditions, which can help you use your resources more efficiently. Use efficient data batching and queuing methods to cut down on network calls and transmission delays. Also, go for lightweight instrumentation libraries that won't slow down your applications. Set up the OpenTelemetry Collector to process data at the edge, which will take some load off your application servers. Remember to review and adjust your setup regularly to match your changing performance needs. Fine-Tuning Sampling Strategies Balancing data completeness and system performance in OpenTelemetry requires careful adjustment of sampling strategies. First, determine the appropriate sampling rate based on your application's needs and the importance of the monitored services. In low-traffic environments, consider using lower sampling rates to capture more detailed data, while high-traffic systems may require higher rates to manage data volume. Implement dynamic sampling to adjust the sampling rate based on real-time conditions such as system load or specific events. This approach ensures that important traces are captured during impactful incidents without overwhelming your infrastructure. Use head-based sampling for quick decision-making in trace collection or tail-based sampling to ensure the capture of specific, long-duration traces. Regularly review and adjust your sampling strategies based on insights gathered and changing application dynamics. Optimizing sampling strategies will help maintain a strong performance monitoring solution that provides valuable insights without unnecessary data collection overhead. Monitoring and Troubleshooting Tips Remember to effectively monitor and troubleshoot to optimize OpenTelemetry performance. You should begin by creating easy-to-read dashboards that combine important metrics and traces for a quick look at system health. Use alerts to quickly notify teams about any problems or performance issues. When problems come up, use distributed tracing to find bottlenecks and understand how requests move through your system. Look at logs and metrics along with trace data to get a complete view of incidents. Regularly check the data you've collected to see any trends or possible issues before they become big problems. Use root cause analysis to dig into recurring problems. Make sure you keep your monitoring setup updated to match any changes in your application's architecture or dependencies. Implementing OpenTelemetry is a smart move for any organization looking to improve the reliability and performance of their software systems. As cloud-native architectures continue to evolve and become more complex, having strong observability tools is really important. As you start using OpenTelemetry in your workflow, remember the community and available resources are valuable—use them to stay informed and adapt to evolving technology. With OpenTelemetry, you'll be well-prepared to build resilient, high-performing applications that meet the demands of modern digital environments.

OpenTelemetry Tips Every DevOps Engineer Should Know

Michelle Artreche — Fri, 11 Oct 2024 14:56:24 GMT

OpenTelemetry has quickly become a must-have tool in the DevOps toolkit. It helps us understand how our applications are performing and how our systems are behaving. As more and more organizations move to cloud-native architectures and microservices, it's super important to have great monitoring and tracing in place. OpenTelemetry provides a strong and flexible framework for capturing data that helps DevOps engineers keep our systems running smoothly and efficiently. I’m going to share some tips for using OpenTelemetry effectively so you can enhance your monitoring practices and make your applications even more reliable. Understanding OpenTelemetry Basics Key Components and Architecture OpenTelemetry is made up of three parts: the API, SDK, and Collector. The API sets the standard for instrumenting code, allowing developers to create trace and metric data. The SDK implements the API, enabling data collection and export to different backends. It also includes processors and exporters to manage data handling and transmission. The Collector acts as a pipeline that can receive, process, and export telemetry data independently of the application code, providing flexibility in managing data flow. Understanding the architecture of OpenTelemetry helps DevOps engineers set up and configure monitoring systems effectively, ensuring they can capture essential performance insights and diagnose issues swiftly. This modular approach allows for customization and scalability, catering to the diverse needs of modern cloud-native applications. How OpenTelemetry Fits into DevOps OpenTelemetry easily fits into the DevOps system by improving observability, which is an important part of modern application management. In DevOps, keeping an eye on the application and getting quick feedback are crucial for maintaining its health and performance. OpenTelemetry provides a standard way to gather metrics and traces, allowing teams to understand the system's behavior deeply. This helps find problems, troubleshoot, and make the best use of resources. OpenTelemetry works well with different systems, so DevOps teams can choose the tools they prefer for analyzing and visualizing data. Also, it can track applications without using too many resources, which aligns with the DevOps principles of being agile and efficient. By including OpenTelemetry in the CI/CD pipeline, engineers can make sure that telemetry data is always collected and analyzed throughout the development process. This integration supports dealing with incidents proactively and constantly improving, which leads to more reliable and strong software systems. Common Use Cases in Monitoring OpenTelemetry plays a big role in monitoring as it provides valuable insights throughout the software lifecycle. One cool thing it does is distributed tracing, which helps track requests in complex, microservices-based systems. This visibility is super important for finding performance issues and understanding how different services interact. OpenTelemetry also helps monitor application metrics like response times, error rates, and resource usage, which can help teams spot trends and unusual behavior. These metrics support planning for capacity and fine-tuning performance. Another big thing OpenTelemetry does is infrastructure monitoring, where it collects data from servers, containers, and network components. This gives a complete view that helps keep systems healthy and prevent downtime. Plus, OpenTelemetry's logging feature helps tie logs to traces and metrics, giving a full picture of the system's status. By using OpenTelemetry in these ways, DevOps teams can create stronger monitoring solutions that ultimately lead to better application performance and reliability. Best Practices for Implementation Efficient Data Collection Techniques To get the most out of OpenTelemetry, it's important to use smart ways to collect data. Sampling is a great way to achieve this. It involves capturing a smaller set of traces and metrics, which helps save on storage and makes processing more efficient without losing important insights. Setting the right level of detail for collecting metrics is also helpful. By focusing on key performance indicators and critical paths, DevOps teams can gather useful data while keeping things streamlined. Additionally, using batching, which groups data together before sending it, saves on network usage and speeds up the process. OpenTelemetry's Collector is flexible and can make handling data easier. By configuring it to filter, combine, and alter data before sending it to the backend, the entire data process can be made more efficient. Integrating with Existing Tools When you connect OpenTelemetry with your current tools, you can improve observability without causing disruptions. OpenTelemetry is free to use and works well with popular monitoring and logging platforms like Prometheus, Grafana, and Elasticsearch. To start, identify which parts of your current setup can benefit from better telemetry data. Use OpenTelemetry exporters to send collected data to these tools for smooth data flow and visualization. Also, use existing instrumentation libraries to avoid setting up the same things repeatedly and to maintain consistency across applications. Align OpenTelemetry's features with your current alerting and dashboard systems to keep a unified monitoring approach. Seek help from the community and check the documentation for advice on best practices and integration methods. By integrating OpenTelemetry carefully with your existing tools, you can improve system visibility, simplify monitoring, and manage performance more effectively without having to completely change your tech setup. Ensuring Data Privacy and Security It's important to keep data private and secure when using OpenTelemetry. First, set rules for what data to collect and how to handle it. Use encryption to protect data as it moves and when it's stored, so only authorized people can see it. Put in controls and checks to limit data access based on people's roles. If possible, make the data anonymous to protect privacy while still getting useful insights. Keep an eye on who's accessing the data and make sure it's all above board. Also, stay updated on OpenTelemetry's security features and community guidelines to follow best practices. By making sure the data is safe and private, you can make the most of OpenTelemetry while making sure nothing bad happens with the data and following the rules. Optimizing OpenTelemetry Performance Reducing Overhead and Latency When using OpenTelemetry, it's important to minimize overhead and latency to keep your applications running smoothly. Start by being selective about the data you collect. Focus on the key metrics and traces that give you useful insights, instead of gathering everything and causing unnecessary processing. Try using adaptive sampling techniques to adjust the amount of data you collect based on system load or specific conditions, which can help you use your resources more efficiently. Use efficient data batching and queuing methods to cut down on network calls and transmission delays. Also, go for lightweight instrumentation libraries that won't slow down your applications. Set up the OpenTelemetry Collector to process data at the edge, which will take some load off your application servers. Remember to review and adjust your setup regularly to match your changing performance needs. Fine-Tuning Sampling Strategies Balancing data completeness and system performance in OpenTelemetry requires careful adjustment of sampling strategies. First, determine the appropriate sampling rate based on your application's needs and the importance of the monitored services. In low-traffic environments, consider using lower sampling rates to capture more detailed data, while high-traffic systems may require higher rates to manage data volume. Implement dynamic sampling to adjust the sampling rate based on real-time conditions such as system load or specific events. This approach ensures that important traces are captured during impactful incidents without overwhelming your infrastructure. Use head-based sampling for quick decision-making in trace collection or tail-based sampling to ensure the capture of specific, long-duration traces. Regularly review and adjust your sampling strategies based on insights gathered and changing application dynamics. Optimizing sampling strategies will help maintain a strong performance monitoring solution that provides valuable insights without unnecessary data collection overhead. Monitoring and Troubleshooting Tips Remember to effectively monitor and troubleshoot to optimize OpenTelemetry performance. You should begin by creating easy-to-read dashboards that combine important metrics and traces for a quick look at system health. Use alerts to quickly notify teams about any problems or performance issues. When problems come up, use distributed tracing to find bottlenecks and understand how requests move through your system. Look at logs and metrics along with trace data to get a complete view of incidents. Regularly check the data you've collected to see any trends or possible issues before they become big problems. Use root cause analysis to dig into recurring problems. Make sure you keep your monitoring setup updated to match any changes in your application's architecture or dependencies. Implementing OpenTelemetry is a smart move for any organization looking to improve the reliability and performance of their software systems. As cloud-native architectures continue to evolve and become more complex, having strong observability tools is really important. As you start uisng OpenTelemetry in your workflow, remember the community and available resources are valuable—use them to stay informed and adapt to evolving technology. With OpenTelemetry, you'll be well-prepared to build resilient, high-performing applications that meet the demands of modern digital environments.

Using Trace Data for Effective Root Cause Analysis

Michelle Artreche — Mon, 07 Oct 2024 17:31:04 GMT

Solving system failures and performance issues can be like solving a tough puzzle for engineers. But trace data can make it simpler. It helps engineers see how systems behave, find problems, and understand what's causing them. So let’s chat about why trace data is important, how it's used for finding the root cause of issues, and how it can help engineers troubleshoot more effectively. Demystifying Trace Data Understanding Trace Data Basics Trace data is like a timeline of events from different parts of a system. It helps engineers understand how things happen in complex systems so they can figure out why the system behaves the way it does. Essentially, trace data is made up of time-stamped logs showing what the software or hardware components are doing and how they communicate. These logs come from all sorts of places like operating systems, applications, and network devices, and each one gives a unique view of what's going on. By looking closely at these records, engineers can spot patterns, find places where things slow down, and notice anything unusual that might be overlooked with regular monitoring tools. Understanding trace data is all about knowing how it's put together, where it comes from, and what kinds of events it tracks. Mastering this basic knowledge is crucial for using trace data to identify problems and improve systems. Key Benefits for Engineers Trace data offers so many great benefits that help engineers improve their diagnostic abilities. First off, it gives a really detailed view of how systems work, so engineers can see exactly what happened before something went wrong. This level of detail makes it easier to figure out the root cause of a problem instead of just guessing based on incomplete information. Secondly, trace data makes troubleshooting faster by pointing out areas where things aren't working as they should. This means engineers can fix problems more quickly, which reduces downtime and keeps the system running smoothly. Plus, trace data allows for keeping an eye on things in advance so potential issues can be spotted and dealt with before they become big problems. By using trace data, engineers can improve system performance, improve software quality, and enhance user experience. Overall, the information from trace data is super valuable for continuous improvement in engineering processes. Root Cause Analysis Simplified Step-by-Step Analysis Process Analyzing trace data step-by-step can really help make root cause analysis more efficient and accurate. It all starts with collecting data, where I gather all the trace logs from the different parts of the system. Then, I filter the data to focus on the important events, getting rid of any unnecessary info. Once I've got the streamlined data that I need, I can start looking for any patterns or unusual sequences that might be causing issues. Then, I dig deeper into these patterns to find out what's causing the problem, looking at how different parts of the system are interacting. Once I’ve found the root cause, I can come up with specific solutions to fix the problem. Finally, I test out these solutions to make sure they work without causing any new issues. This method really helps me use trace data effectively and makes troubleshooting a lot easier. Common Challenges and Solutions Dealing with trace data can be tricky, but there are a few ways to make it easier. One common problem is having too much data to go through, which can be overwhelming. To tackle this, engineers can use techniques to filter out the most important events, making it easier to see what's going on. Another issue is that trace data can be really complicated, so it's helpful to use special tools to help visualize and analyze it. These tools can do things like recognize patterns and spot unusual events, making the whole process a lot simpler. It can also be tough to make sure that all the trace logs from different sources are accurate and in sync. By using consistent time stamps and logging practices, engineers can make sure that the data is reliable. Lastly, understanding trace data can be hard if you're not used to it. With some training and practice, though, engineers can get the hang of it and use trace data to solve problems effectively. Enhancing Engineering Efficiency Real-World Success Stories Using trace data in root cause analysis has led to big successes in many industries. For example, a top e-commerce platform had issues with its servers going down during busy shopping times. Engineers used trace data and found a problem in the database layer that was slowing down transactions. They made targeted improvements, which not only fixed the downtime problems but also made transactions 30% faster. In another case, a car manufacturer used trace data to figure out why their electric vehicles' control system was failing sometimes. Engineers found a bug in the software that was only triggered by certain conditions. They fixed the bug with an update, which made the vehicles more reliable and made customers happier. These examples show how trace data can lead to big improvements by helping solve problems accurately. These success stories show how valuable trace data is in making engineering more efficient and creating stronger and more reliable systems for all kinds of uses. Future of Trace Data in Engineering The future of trace data in engineering looks really exciting! With advancing technologies, we can look forward to even more advanced solutions for monitoring and analyzing systems. As systems become more complex and interconnected, there will be a greater need for precise diagnostic tools. Trace data will be super important in addressing these challenges by giving us deeper insights into how systems behave and interact with each other. Plus, with the progress in machine learning and artificial intelligence, we can expect even better capabilities for analyzing trace data. These technologies can help automate pattern recognition and anomaly detection, making it easier and quicker to figure out the root cause of any issues. And when we integrate trace data with real-time analytics platforms, we'll be able to keep an eye on things and predict and prevent potential failures before they even happen. So, not only will trace data make troubleshooting smoother, but it will also help in creating more resilient and adaptive systems, showing how crucial it is for future engineering projects. Trace data provides engineers with a detailed roadmap to navigate complexities and enhance efficiencies across the board. As you dig into these insights, always keep in mind that the future of engineering depends on our ability to effectively utilize such data. So, keep exploring, keep questioning, and most importantly, keep solving. Happy debugging!

What I Wish I Knew Before Building My First OTel Collector

Michelle Artreche — Tue, 01 Oct 2024 18:56:56 GMT

Starting your journey to build your first OTel Collector can be really exciting, but it can also feel a bit overwhelming. OpenTelemetry, or OTel, is an amazing tool that can help standardize the collection of observability data, but it's normal to feel a bit lost at first. There are lots of little details and best practices that can make the whole process easier, but many of us end up learning them the hard way. I’m going to explore some important tips and insights that I wish I had known before I started working with OTel Collectors. Hopefully, these will help you avoid some common issues and set up a strong observability system with confidence. Understanding the Basics What is OpenTelemetry? OpenTelemetry, often abbreviated as OTel, is an open-source observability framework designed to provide a standardized approach to collecting telemetry data. This data includes logs, metrics, and traces that offer critical insights into the performance and behavior of your systems. OpenTelemetry provides a consistent way to gather this data, making it easier for developers to track the performance of their applications across different environments and technologies. It's part of the Cloud Native Computing Foundation, which supports projects that help with cloud-native environments. OpenTelemetry is useful because it lets engineers use a variety of observability tools without getting stuck with just one. This means businesses can choose the best tools for their needs. Understanding this framework is essential for creating an effective OTel Collector and improving system observability. Key Components of OTel OpenTelemetry is made up of a few important parts that all work together to give you a complete view of what's happening in your applications. First, there's the API, which lets developers add code to their applications to collect data about how they're running, like traces, metrics, and logs. Then there's the SDK, which actually puts the API into action, letting you customize how your data is processed and sent out. The OTel Collector is a really important part - it's like a middleman, taking in all that data, getting it ready, and sending it off to the right places so you can see what's going on. There's also something called Semantic Conventions, which makes sure that everyone is using the same names and formats for their data so everything is consistent. Finally, the Protocol sets the rules for how all this data gets sent around, making sure that everything can work together smoothly. Understanding how all these parts work together is super important for getting the most out of OpenTelemetry and making sure you can keep an eye on your applications, no matter where they're running. Why Use OTel Collector? The OTel Collector is super important in the OpenTelemetry ecosystem. It acts as a central point for processing and sending out telemetry data. One main reason I use the OTel Collector is because it separates data collection from data export. So its much easier to make configuration changes and allows more flexibility in choosing or switching between observability platforms. Additionally, the Collector can change and filter data, sending less data downstream and improving system performance. Another benefit is its ability to gather data from different sources, giving a unified view of your system's telemetry. The OTel Collector also improves security by reducing the number of external connections directly from your application, which lowers the risk of attacks. Using the OTel Collector, organizations can streamline their observability process, improve system performance, and keep flexibility in their observability strategy. Setting Up Your First Collector Installation and Configuration Tips When you're getting started with your first OTel Collector, the installation process can vary depending on your environment and needs. But don't worry. I've got some general tips to help you set it up smoothly. First things first, choose the deployment method that works best for you! You can install the Collector as a standalone binary, a container, or use an orchestration platform like Kubernetes. Each option has its own perks, so pick the one that fits your infrastructure. When you're configuring everything, start by defining your data sources and destinations in the Collector’s configuration file. This YAML file spells out how data is ingested, processed, and exported. Make sure to customize the configuration to match your specific requirements, including setting up pipelines for different telemetry data like traces and metrics. It's also important to enable logging within the Collector so you can troubleshoot any potential issues. And don't forget to keep the Collector updated to take advantage of the latest features and security patches. Getting the installation and configuration right sets the stage for efficient data collection and processing. Common Pitfalls and How to Avoid Them The OTel Collector setup can have its challenges, but knowing about common issues can help you steer clear of them. One common problem is getting the Collector's YAML file wrong. To avoid errors in data processing or export, make sure your setup is correct and use available tools to validate it. Another issue is not giving the Collector enough resources. It needs plenty of CPU and memory, especially when handling lots of telemetry data. Keep an eye on resource usage and adjust allocations as needed. Also, don't forget about security measures like securing communication channels with TLS. Always use encryption and authentication to protect data in transit. Incomplete or inconsistent instrumentation can also limit the effectiveness of your observability setup. Make sure to fully instrument your services for accurate insights. By considering these issues and planning ahead, you can create a strong and reliable OTel Collector environment, making your observability strategy even better. Best Practices for Beginners If you're new to setting up an OTel Collector, following best practices can really help make the process easier. Start by diving into the official OpenTelemetry documentation. It's full of helpful insights and examples to guide you through your initial setup. Begin with a simple configuration and gradually add more complexity as you get the hang of things. This approach will help you troubleshoot any issues more effectively. Keep track of changes to your configuration files using version control, so you can always go back to a previous version if needed. Don't hesitate to reach out to the community through forums and discussion groups for advice and solutions from experienced users. It's also a good idea to monitor the performance of the Collector itself to make sure it's running smoothly. And remember to test your setup in a staging environment before deploying it to production. Troubleshooting and Optimization Debugging OTel Collector Issues Debugging issues with the OTel Collector can be a bit tricky, but don't worry. It's manageable with a step-by-step approach. Start by taking a look at the Collector's logs – they provide detailed insights into its operations and can help identify specific errors or misconfigurations. When troubleshooting, make sure the logging level is set to DEBUG, but remember to switch to a less detailed level, like INFO in production, to keep things running smoothly. Next, double-check the configuration file for any mistakes or wrong paths, and use helpful tools if you have them. Take a peek at your network connectivity and firewall settings – sometimes issues here can stop data from getting where it needs to go. You can use tools like curl or telnet to check if everything's connected. If you're missing data or it's not quite right, take another look at how your applications are sending data. By working through these steps, you'll be able to fix common issues and make sure your OTel Collector setup is performing at its best. Performance Tuning Strategies When it comes to getting the best performance from your OTel Collector, there are a few things to keep in mind. First off, make sure your Collector has enough CPU and memory to handle the data it's expected to process. Keep an eye on how your resources are being used and adjust as needed, especially during busy times. You might also want to think about using multiple Collector instances to share the workload more evenly. Tweaking the Collector's settings, like batch size and timeouts, can also help balance performance with resource use. Another good idea is to filter and transform your data at the Collector level to cut down on the amount of data that needs to be processed and stored. This not only makes things run smoother but can also save you money. And don't forget to keep your Collector and its software up to date to take advantage of any performance improvements and bug fixes. Leveraging Community Support The OpenTelemetry community is a valuable resource for anyone working with OTel Collectors, especially when troubleshooting or optimizing setups. Engaging with this community can give you access to a wealth of knowledge and shared experiences. Start by joining online forums, mailing lists, or chat groups where people discuss challenges and solutions related to OpenTelemetry. Platforms like GitHub, Slack, and the CNCF's own channels are excellent places to ask questions, find documentation, and share your experiences. Many community members are experienced engineers who can offer insights into complex issues you might encounter. Additionally, attending webinars, meetups, or conferences focused on OpenTelemetry can help you understand and stay updated on the latest developments. Contributing back to the community by sharing your own findings or improvements can also be rewarding. When you're starting to create your first OTel Collector, it's important to first understand the basic concepts of OpenTelemetry. This involves recognizing the benefits of separating data collection from export and following best practices for the initial setup. By being aware of common issues, making strategic performance improvements, and engaging with the active OpenTelemetry community, you can confidently optimize your observability plan and create a strong system for collecting telemetry data. Keep in mind that the learning curve might be steep, but the knowledge gained will be crucial for maintaining consistent and effective observability across your systems.

How the OpenTelemetry Collector Powers Data Tracing

Michelle Artreche — Wed, 25 Sep 2024 14:59:49 GMT

OpenTelemetry, OTel, is an incredible open-source observability framework that helps you collect, process, and export trace data. It's super valuable for engineers who want to understand their systems better. At the heart of this framework lies the OpenTelemetry Collector, a pivotal component that turns raw traces into useful metrics. Let’s explore the importance of the OpenTelemetry Collector and show you how it makes it easier for engineers to make sense of data. Understanding OpenTelemetry What is OpenTelemetry? OpenTelemetry, or OTel for short, is an open-source framework that’s all about improving observability across complex systems. It provides a standardized approach to collect and process trace data so they can keep their applications running smoothly. OpenTelemetry supports multiple programming languages and fits right into various cloud-native environments. The framework makes it easier to collect traces and metrics. It provides a single API and SDK for capturing data, allowing developers to understand how well the application is working and quickly find problems. By using OTel, your team can turn raw traces into useful metrics to help make better decisions and solve issues faster. Its flexibility and ability to add new features make it a popular choice for organizations trying to improve how they are tracking systems. OpenTelemetry is now an essential part of modern performance monitoring, providing a strong solution for tracing and collecting metrics in distributed systems. Key Components of OTel OpenTelemetry has several important parts that work together to make data collection and observability efficient. First, the API defines the operations that developers can use to create and manage telemetry data. This includes generating traces and metrics to make sure everything is consistent across different platforms. SDKs complement the API by providing implementations that handle data collection and export, making it easier for developers to integrate OTel into their systems. The OTel Collector is another important part, acting as a go-between to receive, process, and export telemetry data from various sources. It helps to transform traces into metrics, which improves system observability. Lastly, semantic conventions standardize how telemetry data is tagged and formatted, making sure that data remains meaningful and easy to understand. Together, these parts create a cohesive framework that helps engineers gain valuable insights from their applications, improve performance monitoring, and make troubleshooting in complex environments more effective. Importance of Traces and Metrics Understanding and improving system performance relies on traces and metrics. Traces provide a detailed view of how requests move through different services, helping identify latency issues and user experience. On the other hand, metrics offer quantitative measurements over time, such as response times, error rates, and resource utilization, to monitor application health and performance. By using both traces and metrics, OpenTelemetry allows engineers to gain a comprehensive view of their systems, pinpointing specific issues and identifying long-term performance trends. This detailed and aggregated data enables proactive management and optimization of complex distributed systems, leading to improved reliability and user satisfaction. OpenTelemetry Collector Overview Role of OTel Collector The OpenTelemetry Collector is an essential part of the OTel ecosystem. It gathers, processes, and sends traces and metrics from different sources to ensure a smooth flow of data from applications to analysis tools. The Collector can receive data from multiple services, standardize it, and then send it to different backends for storage and visualization. This helps engineers centralize their efforts to monitor their systems. The OTel Collector also supports various processors and exporters, offering flexibility in managing and using data. By separating data collection from processing, the Collector improves scalability and resilience, making it easier to handle large volumes of data. Ultimately, the OTel Collector simplifies the monitoring process, allowing engineers to gain deeper insights and maintain system performance effectively. Transforming Traces into Metrics The ability to transform traces into metrics is one of the standout features of the OpenTelemetry Collector. This feature lets engineers create metrics from trace data, giving a high-level view of system performance while keeping the details of individual traces. For example, by looking at trace data, the Collector can make metrics like average response time, error rates, and request counts. These metrics are very useful for keeping an eye on how your applications are doing over time. The process involves putting together and summarizing trace data to make useful metrics that can be easily seen and studied. This change helps find patterns and trends that might not be clear from just looking at traces. By turning detailed trace information into useful metrics, the OpenTelemetry Collector helps engineers make decisions based on data, improve system performance, and make sure their services work well. Benefits for Engineers The OpenTelemetry Collector has so many benefits for engineers who want to improve how they observe and measure system performance. It offers a centralized platform for gathering and processing telemetry data, making it easier to manage traces and metrics across different parts of a system. This centralization simplifies dealing with various data sources and formats. The Collector can also change traces into metrics, helping engineers understand how the system is behaving and finding and fixing performance issues quickly. Its flexible design supports many different processors and exporters, allowing engineers to customize how they handle data. By using the Collector to handle data collection and processing, engineers can focus more on developing and improving their applications. This not only makes them more productive but also improves the reliability and performance of the systems they work with. Ultimately, the OpenTelemetry Collector gives engineers the tools they need to effectively observe and manage their systems. Implementing OpenTelemetry Collector Setting Up the Collector To set up the OpenTelemetry Collector, follow these simple steps to efficiently collect and analyze telemetry data. First, download the Collector binary for your operating system from the official OpenTelemetry repository. Next, configure the Collector using a YAML file, which defines the receivers, processors, and exporters you want to use. This configuration file is important as it determines how the Collector handles incoming telemetry data. After configuring the components, start the Collector process by executing the downloaded binary with the configuration file. It's recommended to test the setup in a development environment to ensure data flows as expected before deploying it in production. Additionally, monitor the Collector's performance and resource usage to ensure it scales with your system's demands. Following these steps will help you effectively implement the OpenTelemetry Collector and make the most of its capabilities for enhanced observability. Best Practices for Data Tracing When using OpenTelemetry for data tracing, it's important to follow some best practices to make sure everything runs smoothly. First off, identify the main operations and transactions in your system that need tracing, focusing on the important paths and potential bottlenecks. This way, you'll get the most relevant data without overwhelming your system with unnecessary traces. It's also a good idea to use clear and consistent names and tags for your traces to make them easy for everyone on your team to understand. And don't forget to set up the OpenTelemetry Collector to filter out any irrelevant or redundant data, which will help save storage and processing resources. Keep reviewing and updating your tracing strategy to keep up with any changes in your system and performance goals. Lastly, make sure to integrate your tracing data with visualization tools to create real-time dashboards for insights into your system's performance. By following these practices, you'll be able to get the most out of data tracing, improve system observability, and make continuous performance enhancements. Common Challenges and Solutions Implementing OpenTelemetry Collector can present several challenges, but understanding these issues and their solutions can make it easier. One common challenge is the complexity of configuration, especially when dealing with multiple receivers, processors, and exporters. To address this, start with a basic configuration and gradually add components while testing each step. Another issue is managing large volumes of telemetry data, which can strain resources. To handle this, sampling strategies should be implemented to reduce data volume and the Collector should be configured to filter out unnecessary traces. Integrating with existing monitoring tools can also be difficult due to compatibility issues. Make sure to use compatible versions of software and follow integration guides provided by OpenTelemetry. Additionally, maintaining consistent trace and metric data across distributed systems can be tough. Use semantic conventions and ensure all services adhere to the same tracing standards. By proactively addressing these challenges, engineering teams can effectively leverage the OpenTelemetry Collector for better observability and system performance.

How Telemetry Data Can Improve Your Operations

Michelle Artreche — Mon, 23 Sep 2024 20:44:29 GMT

Telemetry data, at its core, is all about transmitting real-time information from remote sources to centralized systems for analysis and action. This data is super important across different industries due to its ability to provide immediate, actionable insights that enhance operations and strategic decision-making. Understanding Telemetry Basics What is Telemetry? Telemetry is the process of automatically collecting, transmitting, and analyzing data from distant sources. It helps businesses keep track of different aspects of their operations, like machine performance and even environmental conditions. This data is sent to a central system for analysis so that decisions and adjustments can be made quickly to improve efficiency. Imagine having a complete view of your operational processes in real-time; that's the power of telemetry. Originally used in aerospace and telecommunications, telemetry is now being used in various industries, providing valuable insights into system behaviors and trends. By integrating telemetry data into your operations, you can identify bottlenecks, predict maintenance needs, and reduce downtime. This approach is not just for tech giants; small businesses can also use telemetry to improve their daily functions and increase productivity. It’s a useful tool for any organization looking to stay competitive. Evolution of Telemetry in Operations Telemetry has changed a lot over time and is now used in many different industries. As I said earlier, it was first used to watch spacecraft and satellites in fields like aerospace. But now, it's used in areas like logistics, manufacturing, and healthcare. Telemetry helps these industries by giving them real-time information so they can work better. New telemetry systems use smart sensors and IoT devices to give detailed insights into how everything is running. They can follow things like machine health and the environment, helping businesses plan better. Telemetry tech is still getting better, and soon it will offer even more advanced tools to make operations smoother and increase productivity. All this shows how important telemetry data is for shaping the future of high-quality operations. Key Benefits of Telemetry Data Telemetry data has so many benefits for businesses looking to improve operations. It provides real-time insights, giving companies the opportunity to quickly identify and address issues, reducing downtime and preventing disruptions. Telemetry data also gives a clear picture of system performance so that businesses can make well-informed decisions. Predictive maintenance is another advantage, as it allows companies to anticipate equipment failures, saving on maintenance costs and extending asset lifespan. Telemetry data highlights inefficiencies and bottlenecks, leading to targeted operational improvements. This data-driven approach boosts productivity and helps with better resource allocation. Overall, integrating telemetry data into operations empowers businesses to be more agile, responsive, and competitive. Implementing Telemetry Systems Choosing the Right Tools Choosing the right telemetry tools is extremely important for successful system implementation in your operations. The tools you choose can have a big impact on the quality and usefulness of the telemetry data you collect. When making your choice, think about things like scalability, integration capabilities, and ease of use. Scalability ensures that the system can grow with your business and handle increasing data volumes. Integration capabilities allow the telemetry system to work smoothly with your existing infrastructure, minimizing disruption. And remember to make sure the tools have user-friendly interfaces and functionalities to make it easy for your team to use. It's important to evaluate what kind of data the tools can collect and make sure it aligns with your specific operational goals. Cost is another consideration, but it should be balanced with the value that the tools provide. By analyzing these factors carefully, businesses can choose telemetry tools that not only meet their current needs but also support long-term strategic goals. Integrating Telemetry with Existing Systems Integrating telemetry systems with your current infrastructure is crucial for using telemetry data effectively. The first step is to ensure a smooth flow of information across all platforms, making it easier to see and make decisions. Start by checking your current systems thoroughly to find any compatibility issues. This will help you choose telemetry solutions that can work well with your existing technologies, minimizing disruptions. You should also think about using APIs and middleware tools that make it easier to share data between different systems. It's important to involve IT specialists early on to deal with technical challenges and make sure there are good security measures in place. Also, integrating telemetry data with existing analytics platforms can help you understand and report on data more effectively, giving you a better view of how things are working. Ultimately, successful integration will help your organization turn raw telemetry data into useful insights, leading to improvements in productivity. Overcoming Common Telemetry Challenges Implementing telemetry systems can present several challenges, but with thoughtful strategies, these can be effectively managed. A simple way to cover these challenges is to consider the following when implementing telemetry systems: Data overload: Sorting through large amounts of data can be overwhelming. To address this, clear data priorities have to be established, and advanced analytics tools must be used to filter and interpret the most relevant information. Data security: It's important to protect sensitive data against breaches and unauthorized access. Robust security protocols are essential when collecting and transmitting large volumes of data. Resistance to change: Encourage a culture of openness and provide comprehensive training to highlight the benefits of telemetry in enhancing operational efficiency. This can help overcome resistance to adopting new systems. Technical compatibility: Integrating telemetry with legacy systems may pose difficulties. To mitigate this, consider leveraging middleware solutions and seeking expert guidance for seamless integration. Analyzing Telemetry Data Data Collection Techniques Successful analysis of telemetry relies on gathering data effectively. There are different techniques for gathering telemetry data. Each technique can be suited to different operational needs. One common technique is using sensor machinery to monitor and report on performance metrics in real time. Another method involves monitoring data flow and network performance to identify bottlenecks and security threats. Log file analysis is also widely used to detect anomalies and trends over time. Additionally, using cloud-based telemetry solutions can make data more accessible and enable more comprehensive data analysis. Choosing the right data collection technique depends on your specific goals and infrastructure. Be sure your data collection methods match your goals to gather relevant and useful telemetry data. Interpreting Telemetry Insights Fully interpreting telemetry data is important for turning raw data into actionable strategies. The first step is to visually represent the data using graphs and dashboards. This makes complex data easier to understand by showing patterns and anomalies quickly. It’s important to remember to focus on the key performance indicators (KPIs) that align with your business goals. This will help you quickly narrow down the analysis to the most impactful data points. Advanced analytics tools, such as machine learning algorithms, can further improve understanding by finding hidden connections and predictive trends. It's important to regularly review these insights with different teams to encourage collaborative problem-solving and innovation. Setting up automatic alerts for important metrics can ensure a quick response to significant changes. When your organization can properly interpret telemetry insights, they’ll be set to make better business decisions, cut costs, and improve performance. Utilizing Telemetry for Decision Making Leveraging data from telemetry can greatly improve how well a company operates. Telemetry data shows what's happening in the business right now, so decisions can be made quickly and wisely. By keeping an eye on important numbers all the time, your organization can catch problems early and deal with them before they become big issues. This data helps decide which maintenance tasks are most important based on how equipment is doing, so resources can be used well and all unnecessary costs can be cut. Also, plans for the future can be based on data from telemetry, showing patterns that help make better long-term business plans. For example, knowing when the busiest times are and how resources are used can help decide on staffing and how much inventory to keep. Using tools that predict the future can make decision-making even better by showing what might happen. This way of doing things makes sure decisions are based on facts, not guesses, and it helps make the organization better all the time. Enhancing Operational Efficiency Real-Time Monitoring Advantages Real-time monitoring offers many benefits that help operations run more smoothly. It allows businesses to quickly access data and respond to changes and issues. This helps prevent small problems from becoming big ones, reducing downtime and keeping productivity high. Real-time data also helps track equipment performance and identify underutilized resources, leading to better resource management. Businesses can adjust operations based on current conditions, leading to improved efficiency. Real-time insights provide ongoing feedback about operational processes, helping businesses refine strategies for continuous improvement. This type of monitoring also ensures consistent and reliable delivery of services and products, enhancing customer satisfaction. Predictive Maintenance with Telemetry Predictive maintenance, powered by telemetry data, changes how businesses manage equipment and machinery. Unlike traditional maintenance strategies, which usually rely on scheduled checks or reactive repairs, predictive maintenance uses real-time data to predict potential equipment failures before they occur. Telemetry systems collect and analyze data from sensors in machines collect and analyze data; this constant flow of information helps organizations spot small changes that may indicate wear or potential failure. By dealing with these issues proactively, companies can prevent unexpected downtime, lower repair costs, and make their assets last longer. Predictive maintenance also improves safety by making sure equipment operates within safe limits, reducing the risk of accidents. Using this approach supports more sustainable practices by reducing waste and resource use. Optimizing Resource Allocation Using data from telemetry is important for allocating resources and making sure they are used to improve operations. It provides detailed insights into how resources are used, allowing businesses to find inefficiencies and make informed adjustments. For example, real-time data can show when machinery or equipment is not used much, so managers can move these assets to areas where they are needed more. This makes sure resources are used well, reducing waste and increasing productivity. Telemetry data also helps manage the workforce by analyzing patterns in task completion and work distribution. This can help plan shifts and make assigning tasks much easier. Also, telemetry data can improve inventory management by tracking stock levels and finding consumption trends, so stock can be replenished better and excess minimized. Future of Telemetry in Operations Emerging Technologies in Telemetry The future of telemetry in operations is being shaped by several emerging technologies that promise to enhance its capabilities and applications. One such technology is the Internet of Things (IoT), which is making it possible to gather data from more devices and places by using sensors. This helps us gather more data and make better decisions. Another new technology is edge computing, which processes data closer to where it is created. This makes it faster to respond to the data and to analyze it in real time. Also, improvements in artificial intelligence and machine learning are making it easier to predict things using operational data so we can make better decisions. Blockchain technology also has the potential to make operational data more secure and reliable. As these technologies get better, we will be able to use operational data for more advanced purposes. Telemetry Trends to Watch As telemetry continues to evolve, we want to point out several emerging trends that could reshape its role in operations: 1. Artificial intelligence and machine learning are being integrated into telemetry systems. This helps analyze large amounts of data quickly and provides better insights and predictive analytics. 2. Edge computing is altering how telemetry data is processed, allowing for faster real-time responses by handling data closer to its source. 3. There's a growing focus on cybersecurity to protect sensitive data within telemetry systems. 4. Telemetry is also being used for sustainability efforts, helping monitor and reduce environmental impact. It's important for organizations to stay updated on these trends to make the most of telemetry in operations. Preparing for Telemetry Advancements To take full advantage of upcoming improvements in telemetry, organizations need to get ready. First, check existing systems to make sure they can work with new technologies like AI, IoT, and edge computing. Invest in scalable infrastructure to make it easy to adopt new telemetry features as they come out. It's also important to train staff so they know how to use advanced telemetry tools. This could mean workshops or ongoing training focused on data analysis and technology. Don’t foget you’ll need to have a good plan for managing data as telemetry data gets bigger and more complex. This includes putting strong cybersecurity measures in place to protect sensitive information. Finally, foster a culture of innovation, so your teams are always looking for new ways telemetry can improve operations. By doing all this, businesses can be sure they can make the most of telemetry improvements in the future. Integrating telemetry data into your operational processes brings a ton of benefits! It helps you make real-time, data-driven decisions that can totally change how your organization works. By using telemetry, businesses can cut downtime, manage resources better, and predict maintenance needs more accurately. Getting immediate insights and acting fast not only makes things run smoother but also helps your company stay competitive in a fast-changing market. Embracing telemetry isn't just about keeping up with new technology — it's about making your operations smarter and more strategic for long-term success. Think of this as an investment in future-proofing your business, unlocking potential that will keep your organization flexible and ahead of the competition.

What are Connectors in OpenTelemetry?

Dan Jaglowski — Thu, 19 Sep 2024 13:28:55 GMT

Why the OpenTelemetry Collector? The OpenTelemetry Collector is a powerful tool for processing different types of telemetry data, such as metrics, traces, and logs, all in one place. This is important because traditional observability tools often require separate toolchains, which can be inconvenient and inflexible when changes are needed. Understanding the Pipeline Architecture At the heart of the Collector is a pipeline architecture where individual components manage and process telemetry data. It's important to differentiate between managing telemetry—routing, merging, and replicating data streams—and processing telemetry—filtering, annotating, transforming, etc. Introducing Connectors While the Collector was already a powerful tool for processing telemetry, it had limitations in managing data streams efficiently. Enter connectors—a feature set that overcomes these limitations and introduces new ways to manage telemetry within the Collector, while also supporting backward compatibility with the existing architecture. Connectors are a bridge between telemetry pipelines. They can forward and replicate data from one pipeline to another, condense detailed telemetry streams, and ultimately help you get the right data to the right place. If you want a crash course on the OpenTelemetry Collector and the capabilities of connectors, I think you’ll find the following talk helpful. Data Types and Pipelines In the Collector's pipeline architecture, it's important to clearly define the different data types. Each pipeline should handle just one type of data, whether it's logs, metrics, or traces. Receivers, processors, and exporters all have specific roles within the data pipeline, which helps keep things organized and easy to understand. Overcoming Limitations Now, with the introduction of connectors, the Collector supports the sequencing of data pipelines, conditional data flow, and correlated data processing. This framework allows for more advanced telemetry management than ever before. The OpenTelemetry Collector's connectors framework marks a significant step forward, providing a generalized system for telemetry processing. The result is a more powerful tool for observability, enhancing the management of telemetry streams in diverse and dynamic environments.

How OpenTelemetry is Transforming Observability

Michelle Artreche — Wed, 18 Sep 2024 13:35:39 GMT

What is OpenTelemetry? The OpenTelemetry project is changing how organizations approach observability. It aims to standardize monitoring across different systems. OpenTelemetry—commonly referred to as OTel—provides APIs, SDKs, exporters, and collectors. It is making data collection, analysis, and utilization more efficient, leading to better decision-making and technology adoption. Let’s jump in together and start exploring OpenTelemetry’s true impact on observability, highlighting its contributions, industry implications, and the excitement it generates among users and developers. Introduction to OpenTelemetry Impact on observability The impact of OpenTelemetry on observability is significant. It provides a unified approach to monitoring different systems by standardizing how telemetry data is collected and processed. This bridge between multiple platforms and services simplifies the landscape for engineers, allowing them to focus on innovation rather than integration challenges. OpenTelemetry offers APIs, SDKs, exporters, and collectors, creating a comprehensive toolkit that supports various environments, from cloud-native applications to legacy systems. Helping organizations achieve higher visibility and insight into their operations, leading to better decision-making and improved system reliability. The collaborative nature of OpenTelemetry encourages broad adoption across industries, setting a benchmark for others to follow. This collective effort enhances observability practices and drives industry-wide progress toward more efficient monitoring solutions. Industry Standards Evolution OpenTelemetry is a reference framework that is open-source and vendor-neutral. Encouraging collaboration across stakeholders to improve monitoring practices by standardizing them. This collaboration breaks down the silos typically associated with proprietary monitoring solutions. As more organizations adopt OpenTelemetry, it becomes a standard, leading to better compatibility and reduced complexity in integrating different systems. The development of these standards reflects the increasing need for a consistent way to manage and analyze telemetry data. OpenTelemetry's influence extends to shaping best practices and promoting innovation within the observability field. This evolution allows engineers to use a consistent set of tools and methods, promoting a community-driven approach to addressing complex observability challenges. Community of Providers and Customers OpenTelemetry has a strong community of contributors and users, working together to develop and adopt the initiative. This diverse community includes technology vendors, cloud providers, and end-users, who play each play a role in driving the project forward. By being part of OpenTelemetry, stakeholders can ensure that their needs are met and also influence the project's direction. This collaboration creates a robust ecosystem that benefits everyone involved. For providers, OpenTelemetry provides a platform to seamlessly integrate their services, adding value to what they offer. Customers, on the other hand, gain access to a standardized set of tools that make it easier to implement observability solutions across different platforms. The collective efforts of this community not only accelerate tech advancements but also make advanced monitoring capabilities accessible to organizations of all sizes. This ensures that organizations can benefit from cutting-edge observability practices, leveling the playing field. The Value of OpenTelemetry Industry-Wide Impact OpenTelemetry has a big industry-wide impact, helping to standardize observability in a previously fragmented landscape. By providing a common way to collect, process, and share telemetry data, making it easier for different systems and tools to work together. This helps service providers and customers by making it simpler to integrate and monitor systems. Service providers can offer more flexible solutions to their clients, and customers can more easily implement observability. This standardization also encourages more consistent and reliable monitoring practices across industries, benefiting individual businesses and the observability industry as a whole. Customer Benefits OpenTelemetry makes it easier and more efficient to make decisions by providing a standard way to observe what's happening in a system. This means it's easier to use multiple monitoring tools and systems together. By making things simpler, organizations can set up better ways to observe what's happening more quickly and with less work. This lets customers focus on using the data they collect to make better decisions about how their systems are working and how to make them better. OpenTelemetry doesn't favor one organization over another, so organizations can use the best tools they can find without being tied to one provider. This flexibility helps companies use new technology and improve their observability practices as they need to. OpenTelemetry also reduces the risks that come with using monitoring tools that are owned by specific companies, and it adapts to changes in industry standards. Because of this, customers get better information about how their systems are working, helping them improve their organization's performance and competitiveness. Data Analytics Advantage OpenTelemetry makes it easier to collect and analyze data, which helps organizations gain better insights and improve their operations. It standardizes data collection, allowing for more accurate trend analysis and anomaly detection. OpenTelemetry easily integrates with popular data processing and analytics frameworks like Prometheus and Grafana, making it simpler to turn raw data into valuable insights. This integration reduces the time and resources needed to extract meaningful information from telemetry data. By using data-driven strategies, organizations can improve their services, enhance user experiences, and maintain reliable systems. OpenTelemetry's data analytics benefits translate into making more informed strategic decisions and gaining a stronger competitive position in the marketplace. Excitement Around OpenTelemetry Front End Users’ Perspective From the front-end users’ perspective, OpenTelemetry represents a paradigm shift in how observability should be implemented. It helps us understand how our applications are performing. This is important for developers who want to improve the user experience by finding and fixing issues in real-time. With OpenTelemetry, you can monitor user interactions and see how our applications work in different situations. This is super helpful for front-end engineers who want to make sure their interfaces are smooth and responsive. By using OpenTelemetry, they can spot performance problems before they affect the user experience, so their applications become more reliable. This is exciting because it means you can improve performance without causing extra problems, making our applications even better than users expect. OpenTelemetry gives front-end developers the tools to consistently deliver high-quality digital experiences. Industry Transformation OpenTelemetry is a driving force behind the transformation of the observability industry. With the backing of over 300 companies, it is a collective effort to redefine how monitoring works in different environments. This change means businesses no longer need to build basic technologies from scratch. Instead, they can focus on creating unique features using OpenTelemetry's standard framework. This move encourages innovation because companies can devote resources to enhancing their specific offerings rather than duplicating core functions. The shift also affects how organizations approach system monitoring and performance optimization. By using OpenTelemetry, organizations get access to a flexible, scalable platform that supports different needs and environments. This adaptability is essential in a rapidly changing technological environment, ensuring businesses can maintain strong observability practices. Limitations and Risks of OpenTelemetry Challenges of Collaboration Collaborating on a project as ambitious as OpenTelemetry has several challenges. One issue is aligning the interests and priorities of a diverse group of stakeholders, including different vendors and independent contributors. This can cause conflicts or delays in decision-making because everyone needs to agree. Also, managing contributions from many groups needs strong governance to keep the project focused and doable. Another challenge is making sure the code stays good quality and secure, since so many people are working on it. We need to watch over contributions and engage with the community. Plus, as the project grows, there's a risk of adding too many features, which could make development and use more complicated. It's really important to deal with these challenges to keep OpenTelemetry growing and successful. Industry Dynamics While OpenTelemetry is a significant step forward for observability, it does not serve as a full-service solution for all monitoring needs. The dynamics of the industry require continuous integration and collaboration among different tools and platforms to provide comprehensive observability. OpenTelemetry sets the groundwork for standardization, but businesses still need to address specific use cases that may need additional tools or custom solutions. The diverse technological landscape means that organizations must navigate many options to fully achieve their observability goals. This can lead to complexity in integrating OpenTelemetry with existing systems, especially in environments with legacy infrastructure or specialized requirements. As we see the observability space evolve in real-time, new technologies and methods may emerge, which would require further adaptation and enhancement of OpenTelemetry. So, while OpenTelemetry is helpful, it emphasizes the ongoing need for innovation and cooperation in the observability ecosystem to meet the different and changing needs of the industry. The Role of BindPlane Contributions to OpenTelemetry observIQ and BindPlane play an important role in the OpenTelemetry project. Making a significant contribution to its development and adoption. One of the main contributions is the creation of the Stanza agent, a powerful tool for collecting and processing logs. This agent improves OpenTelemetry by offering a flexible and efficient way to manage log data, which is essential for full observability. Our team involvement demonstrates the collaborative nature of OpenTelemetry, where different stakeholders share their expertise to improve the framework. By taking part in this open-source initiative, our team helps to drive innovation and ensure that OpenTelemetry remains at the forefront of observability technology. Contributions allow organizations to use advanced telemetry solutions that can adapt to changing needs. Additionally, our active participation in the OpenTelemetry community shows our dedication to creating a strong and inclusive ecosystem that benefits all users and encourages wider adoption across industries. OpenTelemetry is a big step forward in the observability field. It offers a standard way for service providers and customers to work together. This open-source framework makes it easier to integrate different systems and encourages innovation by letting businesses focus on creating unique and valuable capabilities. Being able to efficiently collect, process, and analyze telemetry data helps with making better decisions and managing systems proactively. For customers, this means more reliable systems, a better understanding of performance, and the flexibility to adopt new technologies without being stuck with certain products. OpenTelemetry is a collaborative project, so it keeps evolving to meet industry needs. As organizations focus more on observability, OpenTelemetry is a strong and adaptable solution that helps them improve operations and provide better user experiences.

Best Practices for Multi-Cloud Observability

Michelle Artreche — Wed, 11 Sep 2024 15:12:16 GMT

If The Notorious BIG – the artist behind the iconic song "Mo Money Mo Problems" – had been an IT operations engineer, he might instead have labeled his hit "Mo Clouds Mo Problems." Why? Because the more clouds you have to manage and monitor, the more problems you're likely to run into. For example, when organizations opt for a multi-cloud architecture – meaning one that involves multiple public and/or private clouds – they face cloud monitoring and observability challenges that don't apply in single-cloud environments. That's why multi-cloud architectures should be accompanied by multi-cloud observability strategies. Keep reading for tips on why multi-cloud observability is important, what makes it uniquely challenging, and best practices for devising an observability strategy that conquers these challenges. What is observability? In the cloud and any other type of IT environment, observability is the ability to understand what's happening inside the environment based on external outputs, such as logs, metrics, and traces. In other words, when you observe an IT environment, you collect and analyze data from the environment to infer its internal state. Observability builds on the principles of monitoring, a practice IT teams have long used to measure the health and performance of digital resources. But whereas monitoring focuses on collecting data, observability goes deeper by correlating and analyzing multiple types of data in order to gain comprehensive visibility. The role of observability in multi-cloud operations The ability to analyze data comprehensively makes observability especially important when managing complex systems that involve multiple components, like multi-cloud architectures. Basic monitoring might suffice if you're managing a monolithic app hosted on an on-prem server, but not when you need to support a collection of distributed applications across multiple cloud environments. This is why observability and multi-cloud architectures go hand-in-hand. In most cases, it's virtually impossible to gain reliable visibility into multi-cloud infrastructure and workloads without effective observability tools and practices. Common challenges of multi-cloud observability While a multi-cloud observability strategy is important for virtually any organization that adopts a multi-cloud architecture, implementing multi-cloud observability is not usually easy. IT teams must overcome a number of challenges: Tool diversity: The monitoring and observability tools built into each cloud platform are different, and they typically do not integrate easily with each other or support competing cloud environments. As a result, multi-cloud observability sometimes requires the ability to juggle multiple tools. Disparate configurations: Each cloud platform also has its own Identity and Access Management (IAM) framework and other configuration settings. This is another factor that complicates multi-cloud observability because it means teams must be able to work across disparate configurations and understand the nuances of each cloud they are supporting. Architectural complexity: Multi-cloud architectures involve multiple cloud environments and services. This complexity can complicate observability and troubleshooting by making it challenging to pinpoint root causes. For example, imagine that an application is hosted in one cloud but processes data stored in another cloud. If the app begins experiencing high latency, you'd need to figure out whether the issue stems from a problem with the app itself, the cloud environment that hosts it, the other cloud platform where the data resides or the network that connects the two clouds. Compliance and security challenges: The more clouds you are managing, and the more data you are collecting and analyzing from them, the greater the risk that you'll accidentally expose resources to attack by, for example, storing sensitive data in an insecure location or accidentally applying a configuration that leads to a breach. Best practices for multi-cloud observability There is no "one simple trick" for solving all your multi-cloud observability woes. But there are several best practices that can help streamline the process of observing complex, multi-cloud environments: Standardizing monitoring approaches For starters, organizations should strive to standardize their monitoring tools and processes. For example, they could implement an observability pipeline that uses a standardized observability framework, like OpenTelemetry, to collect data from across all of their environments. This mitigates the challenges of having to rely on disparate tools within each cloud platform to collect observability data. Prefer open source, standards-based solutions More generally, open source, standardized observability frameworks, monitoring tools and data analytics solutions help to simplify multi-cloud observability. This is because they free organizations from becoming locked into cloud platform provider tools that don't integrate with each other or work well across clouds. Centralize data collection and analysis The greater your ability to store and analyze observability data in a central location, the easier it becomes to observe a multi-cloud architecture. This is another area where observability pipelines can help by pulling data from across all of your cloud environments and directing it to a central destination for analysis. Manage data security across clouds When working with complex multi-cloud architectures and data sets, it's critical to build security into the processes used to collect and analyze information. For example, data should typically be encrypted before you extract it from one of your clouds. You could also consider steps like anonymizing or minimizing data while it is in transit. Once again, this is an area where observability pipelines can help. Observability pipelines allow you to apply protections and transformations like data encryption, anonymization and so on while data is moving between cloud platforms – which means you can effectively secure the data even if your data collection tools don't provide these capabilities natively. Choosing the right multi-cloud observability tools When selecting cloud monitoring and observability tools capable of supporting a multi-cloud strategy, look for features like the following: Compliance with open data collection and observability standards, like OpenTelemetry, help ensure interoperability between tools. The ability to integrate with other data collection and analytics tools so that you can collect and interpret data using whichever approach works best for your team. Support for all of the cloud platforms, environments or services you need to manage. The ability to correlate data, not just collect it. Correlation is critical for gaining context on performance issues and getting quickly to root-cause problems. The ability to operate efficiently without consuming excessive amounts of CPU, memory, or disk space. This is important because the more resources your observability tools use, the greater the strain they place on your cloud environments and the higher they're likely to cost you to operate because you typically have to pay for the CPU, memory, and disk that your tools consume. Conquering multi-cloud observability with BindPlane BindPlane offers observability solutions built from the ground up for the complex, multi-cloud world we live in. You can seamlessly collect data from across all of the clouds you use, process it, secure it and analyze it using open, standards-based tooling. Learn more about how BindPlane can help solve multi-cloud observability challenges by requesting a demo.

Common Issues in OpenTelemetry Collector Contrib Configuration

Michelle Artreche — Tue, 10 Sep 2024 13:03:58 GMT

Observability has become essential for efficient system management, and OpenTelemetry is leading the way in this field. The OpenTelemetry Collector Contrib is an important tool for gathering telemetry data, providing developers and IT professionals with a flexible and powerful way to manage observability. We want to help you learn how to set up the OpenTelemetry Collector Contrib. We'll point out common issues and offer effective troubleshooting strategies. Whether you're an experienced developer or a DevOps engineer looking to improve your telemetry data collection, after reading this, you'll find value and be able to make the most of OpenTelemetry in your infrastructure. Understanding OpenTelemetry Collector Contrib What is OpenTelemetry Collector Contrib? The OpenTelemetry Collector Contrib extends the capabilities of the core OpenTelemetry Collector by providing additional components contributed by the community. These components, including receivers, processors, exporters, and extensions, offer a wider range of functionality for collecting and processing telemetry data. This allows developers to customize their observability strategies to better suit their infrastructure needs. By integrating with various telemetry data sources and destinations, it enhances the flexibility of data processing and transmission. The use of OpenTelemetry Collector Contrib optimizes observability setups to accommodate specific requirements, enabling more efficient monitoring and troubleshooting of systems. All of the Components OpenTelemetry Collector Contrib is built around several key components: receivers, processors, exporters, and extensions. They all play key roles in managing telemetry data efficiently. Receivers capture incoming telemetry data from various sources and ensure it is seamlessly ingested into the collector. Processors act on the data in transit, making transformations, filtering, or enhancing the data before it moves to the next stage. Exporters send the processed data to a destination, such as a backend service or a storage system, ensuring the telemetry data reaches its intended endpoint for analysis. Extensions provide additional functionalities that extend the collector’s capabilities beyond data handling, such as health checks or authentication mechanisms. Understanding these components will help you effectively configure and optimize OpenTelemetry Collector Contrib to suit your observability needs better, ensuring robust and reliable telemetry data management. Differences Between Core and Contrib Collector The primary distinction between the OpenTelemetry Core Collector and the Contrib Collector lies in the range of components they offer. The Core Collector provides essential components for data collection and management with minimal dependencies, focusing on reliability and basic observability tasks. While the Contrib Collector includes a broader range of community-contributed components such as specialized receivers, processors, and exporters for more advanced use cases. The Contrib version is suitable for users who need to integrate with a wider range of data sources or require advanced processing capabilities that are not available in the core package. However, it may introduce more complexity and dependencies. Users can choose between core and contrib based on their specific infrastructure requirements and objectives. Setting Up OpenTelemetry Collector Contrib Prerequisites and Installation Before you start installing OpenTelemetry Collector Contrib, make sure your system meets a few requirements. First, you'll need a supported operating system like Linux, Windows, or macOS. You'll also benefit from knowing how to work with YAML to set things up. Your network settings should allow the collector to communicate with the sources and destinations of telemetry data. Once your system is ready, you can install the collector in a few different ways. You could use pre-built software or Docker images. For a software installation, get the latest release from the OpenTelemetry GitHub repository and unzip it into a folder you like. If you're using Docker, pull the OpenTelemetry Collector Contrib image from a container registry. Make sure the collector has the right permissions to access what it needs. After installation, run some basic commands to check that the collector is working properly. This will make sure it's all setup and ready to manage the telemetry data. Configuration Guidelines Configuring OpenTelemetry Collector Contrib involves crafting a YAML configuration file that defines the desired setup of receivers, processors, exporters, and extensions. Start by clearly specifying each component in the file with the correct indentation. Configure receivers first to specify the sources of telemetry data. Then, define processors to manipulate data followed by exporters to transmit data to its final destination. Test each configuration change incrementally to catch errors early. You can use environment variables to dynamically modify configurations for different environments or deployment scenarios. After finalizing the configuration, validate it using built-in tools or commands provided by the OpenTelemetry Collector. Following these guidelines will help tailor the OpenTelemetry Collector Contrib to meet specific observability needs and optimize the flow of telemetry data through systems. Example Setup for Basic Use Case To illustrate a basic setup of OpenTelemetry Collector Contrib, consider a scenario where telemetry data is collected from an application and exported to a backend monitoring system. Start by defining a receiver in the YAML configuration to gather data from the application's telemetry endpoint. For example, use the OTLP receiver if the application exports data in OpenTelemetry Protocol (OTLP) format. Next, configure a processor to batch the incoming data, optimizing it for transmission. This setup reduces network load and enhances efficiency. Finally, an exporter should be set up to send the processed data to a monitoring backend, such as Prometheus or a cloud-based service. Specify the appropriate exporter within the configuration, ensuring the endpoint and authentication details are accurate. This example shows a simple but effective configuration that enables the flow of telemetry data from source to destination, providing a strong foundation for more complex observability tasks in various environments. Best Practices for Leveraging OpenTelemetry Collector Contrib Security Best Practices To keep the OpenTelemetry Collector Contrib setup secure and protect telemetry data, follow these steps: Only allow necessary services and users to access the collector, reducing the risk of threats. Use firewalls and VPNs to secure data while it's being transferred over a network. Make sure each component, like receivers and exporters, is properly authenticated and encrypted. Keep the collector and components up to date with security patches. Regularly monitor the collector's logs and metrics for any unusual activity and set up alerts for any problems. Performance Optimization It's important to optimize the performance of OpenTelemetry Collector Contrib to handle large amounts of telemetry data efficiently. To start, adjust batch sizes and time intervals to balance between throughput and latency, ensuring data is processed quickly without overwhelming system resources. Use processors to filter and aggregate data, reducing unnecessary information. Allocate enough CPU and memory resources for the collector's operations, especially in high-demand environments. Regularly monitor system performance metrics to identify bottlenecks or inefficiencies and make adjustments. Also, consider deploying multiple collector instances to distribute the load across different nodes. These strategies can enhance the responsiveness and efficiency of the OpenTelemetry Collector Contrib, even under heavy telemetry data loads. Customizing and Extending the Collector Customizing and extending OpenTelemetry Collector Contrib allows for the creation of tailored observability solutions that can meet your unique infrastructure needs. Start by identifying specific requirements that the default components can't meet, like custom data processing or integration with proprietary systems. You can add new components or modify existing ones by using the open-source nature of the contrib repository. This may involve developing custom receivers, processors, or exporters using the Go programming language on which the collector is based. If available, use vendor-specific components that align with your observability goals, as they can provide optimized integrations and additional functionalities. Engage with the OpenTelemetry community for guidance and to share your extensions, contributing to a broader ecosystem. By customizing and extending the collector, your organization can improve its telemetry data flow, enhance system insights, and gain a more comprehensive understanding of your operational environment, ultimately leading to better decision-making and system performance. Troubleshooting and Common Issues Common Configuration Errors When setting up OpenTelemetry Collector Contrib, you might come across common configuration mistakes that can disrupt the collection of telemetry data. One frequent error is incorrect YAML syntax, such as wrong indentation or missing colons, which can prevent the collector from understanding the configuration file properly. Always check the YAML using a linter to catch syntax issues early. Errors in the endpoints of receivers or exporters can also cause problems, often leading to failed data ingestion or transmission. Double-check URLs, ports, and authentication credentials to ensure they are accurate. Using mismatched component names in the configuration file can cause undefined behaviors; make sure each component is correctly referenced and compatible with the collector version being used. Verify that environment variables have the correct values and paths, as incorrect settings can disrupt data flows. By addressing these common errors through careful validation and testing, we can promise that the OpenTelemetry Collector Contrib will operate more smoothly, resulting in more reliable observability outcomes. Debugging Tips When debugging issues with OpenTelemetry Collector Contrib, it's important to follow a structured approach to quickly identify and resolve problems. Start by enabling detailed logging in the collector configuration to gain insights into its operations and pinpoint potential errors. Check the log files for error messages or warnings related to configuration or runtime issues. Use the otelcol command line flags to test specific components or data flows. Ensure that the collector is properly connected to telemetry data sources or backends by checking network configurations, firewall rules, and DNS settings. Use tools like curl or telnet to test endpoint accessibility and data transmission. Regularly update the collector and its components to take advantage of bug fixes and improvements. Engage with the OpenTelemetry community forums or GitHub issues for additional troubleshooting advice and support. Community Resources and Support OpenTelemetry Collector Contrib benefits from a vibrant community that offers extensive resources and support for troubleshooting and development. You can start by exploring the official OpenTelemetry documentation, which provides detailed guides and examples for setting up and configuring the collector. Engage with the OpenTelemetry community on platforms like GitHub, where you can report issues, join discussions, and access repositories for the latest updates and bug fixes. The OpenTelemetry Slack channel is another helpful resource. It provides real-time support and allows you to interact with other users and experts. You can also participate in community meetings and webinars to stay informed about new features and best practices. If you’re looking for an OTel 101 guide, I would love for you to check out our OTel Hub. There, you can learn how to master OpenTelemetry with a variety of videos, tutorials, and blog posts that provide everything you need to know.

Managing a custom distribution of the OTel collector with BindPlane

Joe Howell — Tue, 10 Sep 2024 03:32:00 GMT

Exciting news: it’s now possible to build a custom distribution of the OpenTelemetry Collector and remotely manage it with BindPlane. Though not all of BindPlane’s capabilities are available when managing a custom distribution (yet), it’s #prettycool, as it cracks open the door for teams looking to BYOF (bring your fleet), and manage them with our OTel-native telemetry pipeline. This advancement is made possible because of the significant contributions and progress made in the development OpAMP. OpAMP is the not-secret-at-all sauce that enables remote management of an OpenTelemetry Collector; it’s one of the critical components powering BindPlane’s scaled fleet management capabilities. Though we lightly touched on this in our Summer Announcement (also a fun read if you’re a fan of product updates and 16-bit era video game memes), I wanted to do a short walkthrough for users to do this themselves to familiarize themselves with the process as OpAMP’s refinement continues into 2024/25. But first, a quick refresher. What is OpAMP? OpAMP is a protocol designed to manage and configure telemetry agents at scale. It allows centralized control over agent configurations, health monitoring, remote updates, and lifecycle management, making managing agents in distributed environments easier. OpAMP improves the scalability and automation of agent management in modern observability systems. Andy Keller, a Principal Engineer at observIQ, contributed significantly to the design and implementation and gave a great talk with our friend Jacob Aranoff at ServiceNow at Kubecon 2024. Check it out if you’d like a quick deep dive by some of OpAMP’s subject matter experts. How is OpAMP implemented in the OpenTelemetry Collector? OpAMP is implemented in 2 ways in OpenTelemetry Collector: As a Collector extension that provides access to the health and configuration of the collector via OpAMP. As a Collector supervisor that relays the health and configuration of the collector and enables remote management of the collector configuration via OpAMP. Both of these implementations can be used together, providing a solution that includes information about the health of an OTel collector while also opening the door to remote configuration and management. What is the OpAMP Extension? The opampextension is an extension available in the OpenTelemetry Collector that can be configured to contact an OpAMP server, relaying read-only information such as the collector’s description, health, and configuration. It’s self-described as having limited functionality, a subset of OpAMP’s overall capabilities. What is the OpAMP Supervisor? When I mentioned progress previously, I primarily had Supervisor in-mind. The supervisor is a separate process that can run an OTel collector, opening the door to more of OpAMP’s remote management capabilities, such as: Receiving and pushing configuration from an OpAMP backend to an OTel collector Stopping and starting an OTel collector process Restarting the OTel collector in the event of a crash or failure Accepting connection and OTel collector details from the opampextension Updating the OTel collector packages/collector updates The supervisor is still very much in development (you can see the state of each of the key bits of functionality here), which means some significant functionality still needs to be introduced, and existing functionality may change. But it’s certainly ready for tinkering and consumption and ripe for additional feedback from the community. Connecting a Custom Distribution of the OpenTelemetry Collector to BindPlane Pre-reqs A running BindPlane instance and access to your SECRET_KEY and OpAMP server endpoint within your BindPlane config. You can use either of the following links to setup an instance: BindPlane On Prem https://observiq.com/download BindPlane Cloud: https://app.bindplane.com/signup Host/VM to run the custom OTel collector (using the same host as your BindPlane VM for testing purposes also works). Building a Custom Collector Distribution with the OpenTelemetry Collector Builder: First, build your custom collector with the OCB using the following steps: Step 1: Install the collector builder. You can follow the steps provided here: https://opentelemetry.io/docs/collector/custom-collector/#step-1---install-the-builde Step 2: Create a builder manifest with the steps provided here: https://opentelemetry.io/docs/collector/custom-collector/#step-2---create-a-builder-manifest-file or use the sample manifest I’ve created below: A couple notes about this manifest: The opampextension, healthcheckextension, snapshotprocessor are required for remote management with BindPlane. You can read more about Snapshots in BindPlane here. In this example, we’ve included the hostreceiver and the noop exporter for our minimal configuration. Respectively, these map to the Host Source and Dev Null destination in BindPlane. You can swap or add additional config Recently, Michelle Artreche published a great guide on building a custom distribution of the collector using the OCB, which takes you through the end-to-end process in detail--and provides information about some of the other collector distribution options. Installing the Supervisor Next, Install the supervisor on your collector host using the following commands provided below: Run the the following cmd/opampsupervisor directory of the opentelemetry-collector-contrib repository. Then run: Fill in ./local/supervisor-config.yaml with your BindPlane secret key and opamp endpoint. To help expedite, I’ve included a sample of the supervisor-config.yaml below: Running the Supervisor Lastly, start the supervisor using the following command. Since the supervisor's job is to run the collector, starting the supervisor will also launch your collector and connect the collector via the supervisor to your BindPlane instance. Viewing your Managed OpenTelemetry Collector in BindPlane Head over to your BindPlane instance, and you’ll now see your Collector on the ‘Agents’ page of BindPlane. And that’s it! Your custom collector should now appear in BindPlane with a ‘connected’ status, and the collector’s config (seen here as a no-code visualization) is now fully readable, editable, copyable, and deployable to other managed collectors in your fleet. Custom Distributions of the OTel Collector and BindPlane: Stayed Tune As OpAMP progresses, so will the connection between custom distributions of the OTel collector and BindPlane. Though some extra configuration is required to kick the tires today, the end state will be drastically simplified and seamless, something you’d expect from an OTel-native Telemetry Pipeline. If you’d like additional information about OpenTelemetry, OpAMP, or BindPlane, contact our team at info@observiq.com.

Strategies For Reducing Observability Costs With OpenTelemetry

Michelle Artreche — Fri, 06 Sep 2024 15:13:12 GMT

Keeping smooth and safe operations now relies entirely on observability. But as there's more and more data to keep track of, the costs are going up. This makes it hard for your companies to balance how well things are running and their budgets. OpenTelemetry can help by making a standard way to collect and process all the data. We're going to share how OpenTelemetry can save you money on observability and why having too much data can be costly. We'll also provide tips for simplifying your data-tracking system. Understanding Rising Costs Managing and storing telemetry data can become very expensive due to the increasing volume of data. Modern IT environments, especially those using containerized applications like Kubernetes, can create massive amounts of data. This data growth leads to higher storage, processing, and management costs. The complexity of handling various telemetry data streams is another significant cost factor. Organizations often use multiple tools and agents for data collection, requiring specialized knowledge and maintenance. The lack of standardization can lead to inefficiencies and higher operational costs. Also, being locked in with one vendor makes it super hard to switch to a different solution without spending a lot of money on moving. Dealing with large quantities in an inefficient manner, such as using inefficient agents, can lead to higher costs. It's important to be aware of these increasing costs so you can take steps to reduce them. Hidden Costs in Observability Hidden costs often surprise organizations in addition to the obvious expenses of data storage and processing. One hidden cost is the need for specialized skills as telemetry systems expand—leading to increased staffing or training costs. Another hidden cost is the challenge of transitioning to new platforms due to proprietary solutions. Inefficient use of telemetry agents at scale can lead to resource wastage. These hidden costs can undermine an organization's ability to manage its observability landscape efficiently. Identifying and mitigating these hidden costs is imperative for optimizing overall observability expenditure. The Need for Cost Reduction As the amount of telemetry data collected increases, the expenses for storing, processing, and managing it also go up. Without a proactive plan to cut costs, these expenses can get out of hand and put a strain on IT budgets. High monitoring costs can also limit an organization's ability to invest in other important areas like innovation, security, and infrastructure improvements. Implementing strategies to reduce costs can help you get the most value out of your data without overspending. Efficiently managing these costs helps with budgeting and improves operational flexibility. So, focusing on reducing costs helps organizations keep strong monitoring capabilities while being financially responsible. Causes of Rising Observability Costs Increased Management Complexity Dealing with data types from various sources often requires using multiple tools and platforms. Each tool may have its own setup and management needs, making the system hard to oversee. This complexity requires a higher level of expertise, which leads to increased training and retention costs for skilled personnel. Maintaining different systems involves significant work, such as regular updates, problem-solving, and integration efforts. This fragmented approach puts a strain on resources and reduces efficiency because teams have to work with different interfaces and processes. Expertise and Vendor Lock-In Using different tools for collecting and analyzing data requires specific expertise for each system. This can lead to added training and staffing expenses. Being tied to a specific vendor can make switching providers or integrating new technologies hard and expensive. Your organization may end up overpaying for services that no longer suit your needs. One way to reduce these challenges is to use open-source solutions and standardized tools to lower costs and decrease reliance on specific vendors. Quick Overview of OpenTelemetry OpenTelemetry, OTel, is a framework that simplifies collecting and processing telemetry data from different applications. It supports various programming languages and operating systems and allows organizations to choose and switch between observability platforms. It includes components like the OpenTelemetry Collector and instrumentation libraries to automate data collection. Benefits of OpenTelemetry OpenTelemetry has several benefits that make it a good choice for organizations looking to improve their observability strategy. It simplifies the process of collecting and managing telemetry data from different sources through its standardized approach. This reduces complexity and operational costs by eliminating the need for multiple proprietary tools. OpenTelemetry is also vendor-agnostic, allowing businesses to switch platforms or integrate new solutions without incurring significant migration costs. It supports a wide range of programming languages and environments, making it compatible and easy to implement. Organizations can centralize data processing through the OpenTelemetry Collector, streamlining operations and improving data consistency. Its robust community and open-source model ensure continuous improvements and support. OpenTelemetry makes telemetry data handling more efficient and offers a cost-effective solution for modern observability needs. Key Components of OpenTelemetry The OpenTelemetry Collector is the central component of OpenTelemetry. It gathers, processes, and exports telemetry data. It can be used in various settings, like cloud, on-premises, and containerized systems. Another important component is the Instrumentation Libraries. They help automatically generate telemetry data and support many programming languages, making it simpler for developers to add traces, metrics, and logs to their code. The Protocols used in OpenTelemetry set standard data formats, ensuring consistency and reducing the complexity of handling different types of telemetry data. OpenTelemetry also includes SDKs for custom instrumentation, offering flexibility for unique application needs. Together, these parts create a strong framework that makes it so much easier to collect and manage telemetry data. Strategies for Cost Reduction Standardizing Telemetry Ingestion By combining the tools and methods used to collect telemetry data, organizations can make operations less complicated. OpenTelemetry offers a unified way to gather telemetry data, allowing different types of data, such as logs, metrics, and traces, to be collected and processed using a single framework. This standardization removes the need for multiple specialized agents, reducing the complexity of managing them and minimizing the expertise required for this task. It also improves the consistency of the data, making it easier to analyze and draw insights from. With a standardized data-gathering process, organizations can manage their telemetry data more effectively, identify and remove duplicates, and concentrate on the most valuable data. Building a Telemetry Pipeline Constructing a telemetry pipeline reduces observability costs and optimizes data flow. A well-designed pipeline gathers, processes, and directs telemetry data, helping organizations manage large volumes efficiently. When you create a telemetry pipeline using OpenTelemetry, you'll set up the OpenTelemetry Collector as the main processing unit. This collector brings together data from different sources, makes changes to it, and sends it to the right places. By adjusting and adding to the data within the pipeline, organizations can reduce unnecessary data storage and processing costs. A telemetry pipeline can send data to low-cost storage choices (AWS S3, Google Cloud Storage, and Azure Blob Storage), keeping thorough data archives while spending less money. Building a telemetry pipeline with OpenTelemetry allows for efficient data handling, reduced costs, and improved observability in complex IT environments. Leveraging Centralized Management With a centralized management platform, businesses can control all their telemetry agents and settings from one place. This makes management easier, saving time and resources compared to managing agents across different locations. OpenTelemetry supports centralized management using protocols like OpAmp, which allows remote setup and monitoring of telemetry agents. Centralized management helps organizations quickly find and fix problems, improve data flows, and enforce consistent rules across their infrastructure. This reduces the work needed and lowers the chance of setup mistakes. Also, centralized management makes it easier to grow without complicating things. By using centralized management, your business can have more control over its observability setup, streamline operations, and cut the costs of managing different telemetry systems. Practical Implementation Filtering and Reducing Telemetry Organizations can decrease the amount of telemetry data stored and processed by using intelligent filtering mechanisms. OpenTelemetry offers tools to selectively filter data, allowing businesses to focus on important metrics while discarding redundant or low-priority information. This saves on storage and processing costs and improves the clarity and relevance of insights drawn from telemetry data. Reduction techniques like sampling or aggregation further optimize data sets by decreasing their size without compromising critical information. These strategies ensure that only the most important data reaches analytics platforms, making analysis faster and more efficient. Rerouting to Low-Cost Storage Storing less-used data in cheaper storage options can help businesses save money while still keeping important information accessible. OpenTelemetry enables this by allowing companies to route their data in a way that separates real-time data from data that can be stored more affordably. By using cloud-based storage services, organizations can find cost-effective solutions for managing large amounts of data. This helps reduce storage costs and makes sure that data storage meets compliance and retention requirements without the high cost of premium storage. The ability to easily move data from low-cost storage back to higher-tier systems when necessary provides flexibility for deeper analysis or investigations. By prioritizing cost savings while still ensuring data accessibility and compliance, rerouting to low-cost storage strikes a balance. Managing Agents at Scale OpenTelemetry offers solutions to make this process smoother through centralized configuration and control. By using tools that help with remote management, organizations can monitor thousands of agents from one interface, ensuring consistent configurations and quick deployment of updates. This centralized approach reduces the workload and minimizes the risk of configuration errors that can cause data inconsistencies or security issues. Automation plays a crucial role in managing agents at scale, ensuring optimal functionality, and maintaining a high level of observability across infrastructures. Advanced Techniques for Cost Management Intelligent Agent Management Intelligent agent management involves leveraging advanced techniques and tools to make telemetry agents work better—saving money and improving performance. You can do this by using automation and machine learning to watch and adjust how the agents work. Intelligent management systems let you change the number of agents you use based on your current needs. This means using fewer resources when you don't need as many agents. You can also use predictive analytics to predict and prevent problems so the agents work well and don't stop working. Intelligent systems also help you track how the agents are doing and quickly fix any problems. Using intelligent agent management can improve your observability systems while still controlling costs and keeping them reliable. Real-Time Problem Detection By using real-time analysis and monitoring tools, businesses can see the health and performance of their systems right away. This proactive approach involves always looking at telemetry data for anything unusual. Advanced algorithms and machine learning can make this more accurate so systems can predict problems before they happen. Real-time detection makes response times faster while reducing how much problems affect operations and customer experience. It also helps systems run well and saves money by preventing downtime and wasted resources. Strong real-time problem detection is important for managing complex IT systems and ensuring everything runs smoothly. Eliminating Unnecessary Telemetry To save money and make data processing faster, businesses should be smart about the telemetry data they collect. You can do this by setting clear rules for what data to collect and getting rid of anything that's not important. Use advanced filters to get rid of unnecessary data before processing it, which can reduce costs. Regularly check the data collection process to find ways to improve it. By collecting only the most important data, businesses can save money and make faster, better decisions. We discussed strategies for reducing observability costs using OpenTelemetry. Standardizing telemetry ingestion simplifies data management and reduces complexity. Building a telemetry pipeline centralizes data collection, processing, and storage, optimizing resource use and cost. Leveraging centralized management streamlines operations by offering a single control point for managing telemetry agents, enhancing efficiency, and reducing errors. We also talked about advanced techniques such as intelligent agent management and real-time problem detection, which improve system performance and cost management. Practical implementations like filtering and reducing telemetry, along with rerouting data to low-cost storage, were demonstrated as effective strategies for managing telemetry data economically. These key points underscore the importance of strategic planning and implementation in optimizing observability frameworks. As your organization moves forward, applying these insights will be essential in maintaining strong, cost-effective observability systems.

How Data Observability is Transforming Modern Enterprise

Michelle Artreche — Wed, 04 Sep 2024 14:03:59 GMT

Modern enterprises are more dependent than ever on data. That's why it's more important than ever for organizations to ensure that their data is accurate, reliable, and easily accessible. Data observability is a modern method that helps achieve this. It involves real-time monitoring of data to detect unusual patterns. By doing so, it ensures data quality and reliability, which boosts operational efficiency and governance. We’re going to explore the five pillars supporting its framework and explain the valuable benefits it offers to data-driven enterprises. Let's uncover the essentials of data observability and its essential role in modern enterprise data management. Introduction to Data Observability Importance in Modern Enterprises Data observability is really important for modern enterprises. It helps make sure that the data quality is good and operations run smoothly. Data-driven decisions shape business strategies, so the reliability of data is critical. Data observability lets enterprises monitor their data in real-time to quickly spot any problems and maintain data reliability. This proactive approach reduces data downtime, which can be costly and disruptive. Furthermore, data observability enhances data governance, ensuring compliance with regulations and standards. It also supports good data management by giving insights into how data moves through systems, helping enterprises fix problems and make data pipelines work better. As we start to see more and more enterprises rely on data, having solid data observability practices becomes a strategic necessity to keep the data system running smoothly and efficiently. Enabling Enterprise Data Management Data observability plays a pivotal role in effective enterprise data management. It provides a complete view of data workflows and pipelines, allowing organizations to stay in control of their data. These types of insights help track data lineage and understand how data moves and changes across systems. Transparency like this helps to better identify and fix issues within data processes. Remember that data observability also helps manage metadata, which is crucial for maintaining data quality and consistency. Real-time monitoring and abnormality detection capabilities further improve data reliability by addressing issues promptly. These types of insights help IT folks and data engineers keep their data systems strong and running smoothly. Ultimately, data observability help enterprises optimize their data management and makes sure they can use that data as a reliable asset for decision-making. What is Data Observability? Data observability involves closely monitoring, understanding, and managing the health of data systems in real-time. It's all about making sure the data is high quality, reliable, and well-governed. By using real-time monitoring and anomaly detection, data observability helps identify data issues like downtime and inconsistencies, ensuring data reliability. It involves tracking data lineage to understand how data changes across the pipeline, making sure that data changes and movements are clear and traceable. This approach is essential for enterprises looking to maintain robust data systems. By implementing data observability, enterprises can avoid operational disruptions and make better data-driven decisions. Where did data observability come from? Data observability evolved from traditional data monitoring and management to adapt to the complexity of modern data systems. Initially, organizations were focused on data testing and quality checks at specific points in the data pipeline. As data environments grew more complex, a more integrated approach called data observability emerged, drawing inspiration from software observability in DevOps. Data observability now includes advanced capabilities like anomaly detection, data lineage tracking, and comprehensive metadata management to ensure data reliability and governance at scale. Today, its recognized as an essential part of enterprise data management, maintaining high data quality, even when things are constantly changing. Related Content: What Is An Observability Pipeline, Anyway? The Five Pillars of Data Observability Freshness Freshness refers to how up-to-date the data is. Data observability tools monitor data pipelines to ensure that the data is current, alerting teams to any delays. Timely data is essential for supporting business operations and strategic initiatives. Quality Quality involves accuracy, consistency, and completeness. High-quality data is error-free and trustworthy for analysis and reporting. Data observability ensures reliability by continuously monitoring and maintaining these aspects. Volume Volume pertains to the amount of data in systems, which can affect performance and storage. Monitoring volume helps prevent issues and ensures efficient data processing. Schema Schema refers to how data is organized and its relationships. Keeping the data structure intact is vital for usability. Monitoring schema changes ensures they align with business needs and do not negatively impact data quality. Lineage Lineage is the path data takes from start to finish, including any changes it undergoes. Understanding data lineage is crucial for troubleshooting problems, ensuring the integrity of data, and maintaining compliance. Monitoring data lineage provides clear insights into data connectivity and aids in quick problem resolution. Why is Data Observability Important? Ensuring Data Quality and Reliability Data quality and reliability are primary functions of data observability, especially when enterprises rely on accurate data for decision-making. Data quality covers accuracy, consistency, and completeness, which are crucial for generating reliable insights and reports. Data observability tools continuously monitor these aspects and flag any discrepancies or errors for resolution, maintaining high data standards and preventing flawed analyses. Reliability ensures that data is available and up-to-date when needed, reducing downtime and preventing disruptions through real-time monitoring and anomaly detection. This capability is particularly important in fast-paced environments, where timely data access gives a competitive advantage. Overall, data observability enhances the trust and dependability of data, supporting robust enterprise data management strategies. Reducing Downtime and Anomaly Detection Maintaining continuous and smooth operations is so important in enterprise, and data observability helps achieve that by reducing downtime and catching problems early to avoid major disruptions and financial losses. Data observability tools monitor data pipelines in real time, quickly identifying potential issues so teams can fix them before they become major problems. Anomaly detection is important because it spots unusual patterns in data flow and performance, which might indicate underlying issues like data corruption or system failures. With advanced algorithms and machine learning, data observability platforms can find these anomalies early, giving an opportunity for prompt action and reducing the risk of extended downtime. These abilities improve the reliability and efficiency of data systems, supporting strong enterprise data management and operational continuity. Enhancing Data-Driven Decision-Making Maintaining data observability in modern enterpise allows decision-makers to have access to reliable data by continuously monitoring and maintaining data systems. This real-time insight into data health helps businesses make informed decisions quickly, identify trends and patterns, and minimize errors and inconsistencies in their data. Data observability fosters a culture of data-driven decision-making and empowers organizations to leverage their data for innovation and growth, enhancing overall business performance in today's competitive landscape. Key Features of Data Observability Tools Real-Time Monitoring and Anomaly Detection Real-time monitoring and detecting anomaly activity are important features of data observability tools. Real-time monitoring involves continuously watching data pipelines for potential issues. This helps keep data reliable and minimizes downtime. Anomaly detection focuses on spotting deviations from normal data patterns that could indicate problems. By using advanced algorithms, data observability tools can identify anomalies that might signal data issues. Early detection of these anomalies allows for quick fixes, preventing small issues from becoming big problems. Real-time monitoring and anomaly detection work together to maintain data quality and efficiency, helping enterprises make strategic decisions and stay competitive. Data Lineage Tracking and Metadata Management Data lineage tracking and metadata management are integral features of data observability tools, enhancing transparency and control over data systems. Data lineage tracking creates a map of data movement and transformation throughout the data pipeline. This type of visibility is crucial for troubleshooting issues, optimizing processes, and ensuring compliance with governance standards. Understanding the journey of data from its source to its destination helps organizations quickly identify and resolve data discrepancies. Metadata management organizes and maintains details about data, such as its origin, context, and usage. This ensures that data is easily accessible and understandable, supporting better decision-making and data governance. Tthese features help enterprises maintain high-quality data systems, providing clarity and insight into complex data environments. Data Observability vs. Other Practices Data observability and other data management practices, like data testing, monitoring, and quality assurance, have the same goal of ensuring data integrity, but they are different in scope and how they work. Data observability gives a complete view of data systems, focusing on real-time insights and end-to-end visibility across data pipelines. It covers many things, like data lineage, freshness, and volume, and provides a detailed framework for keeping data healthy. On the other hand, data testing usually involves specific checks at certain points in the data's life to make sure it's accurate and valid, but it often lacks the continuous oversight that observability gives. Monitoring, while similar to observability, usually looks at system performance and uptime instead of the data itself. Even though they are different, all these practices work together to make data management strong. Together, they make sure data stays accurate, reliable, and available, which helps with making good decisions and keeping things running smoothly in businesses. Related Content: Monitoring vs Observability Implementing Data Observability in Your Enterprise Getting Started To make sure your organization's data is easy to keep track of, start by planning how to do it. First, look at your current data setup to find where there are problems and where keeping track of your data would help. This means checking how data moves around, how it's managed, and how it's protected, to see what needs to get better. Then, set clear goals for your plan, like reducing how often data is unavailable, or making sure the data is good quality. Once you have your goals, pick the right tools and tech that match what you want and work with what you have now. It's important to get key people, like IT, data experts, and business leaders, involved so that everyone is on board and working together. Think about starting with a small project to test your plan, so you can fix and make it better before doing it everywhere. Last, create a system for monitoring and getting feedback to ensure that your plan continues to work and fits your business's changing needs. Best Practices and Overcoming Challenges To succeed in implementing data observability, it's important to follow best practices and overcome common challenges. Start by creating a culture that focuses on the importance of data quality and reliability throughout the organization. Encourage teams to work together to ensure that data observability tools and processes align with business goals. Regular training and awareness programs will help keep staff informed about new tools and practices. One challenge is integrating observability tools with existing systems, which can be addressed by choosing solutions with flexible APIs and seamless interoperability. Another challenge is managing the large volume of data generated by observability tools; make sure your infrastructure can handle this extra load without any performance issues. Establish clear metrics and KPIs to measure the success of your observability initiatives. By focusing on continuous improvement and adaptability, your organization can effectively implement data observability, enhance data management capabilities and make better decisions. Benefits of Data Observability Improved Data Quality and Reliability By continuously monitoring data workflows and systems, observability tools help organizations maintain high standards of data accuracy, consistency, and completeness. Real-time monitoring allows for immediate identification and correction of errors, ensuring that data remains trustworthy and dependable. This proactive approach reduces the risk of data corruption and minimizes downtime, both of which can disrupt business operations and decision-making processes. Enhanced data reliability supports better analytics and reporting, enabling leaders to base their decisions on accurate and timely information. With improved data quality, organizations can optimize their data-driven strategies, driving innovation and achieving competitive advantage. By integrating data observability into data management practices, enterprises ensure that their data assets are not only robust but also aligned with their strategic objectives, ultimately meeting all of those business goals. Enhanced Operational Efficiency \With continuous monitoring and real-time insights, teams can then quickly find and fix problems before they get too big. This means spending less time reacting to issues and more time on important work. Also, data observability makes data cleaning and error fixing easier by making sure data is good from the start. It can also quickly spot any unusual data, making it easier to fix problems fast. Plus, it helps teams use their resources better, so everyone and everything works well. Overall, using data observability in everyday work not only makes systems work better but it will also help your organization use data more effectively. Better Compliance Data observability is important for keeping enterprises in line with rules and standards. It gives a clear view of how data moves and where it comes from. This helps make sure the way data is handled follows industry rules. It's also important for showing who is responsible for data and tracking any changes made to it. Real-time monitoring helps find and fix rule-breaking quickly, which lowers the chance of getting in trouble. Data observability is especially useful in fields with strict rules like finance and healthcare. Adding data observability to a company's rule-following system helps create a culture of following the rules. This makes sure data is handled in a responsible and fair way. Not only does this cut legal risks, it also builds trust with people involved and helps the company compete better. Signs You Need a Data Observability Platform Frequent Data Quality Issues If you’re experiencing lots of data problems it might mean your organization could benefit from a data observability platform. Data mistakes, inconsistencies, and incomplete datasets can really hurt your ability to use analytics and make decisions. These problems often happen because of undetected errors in data pipelines or changes, which are hard to find without good monitoring tools. A data observability platform gives you real-time insights and can find unusual things, so your teams can quickly find and fix the main causes of data problems. By always watching over data processes, observability platforms help keep high standards for accuracy and reliability. High Data Downtime Frequent data downtime is another clear sign that your organization could benefit from using a data observability platform. This downtime can disrupt business operations, delay decision-making, and reduce stakeholder trust. It is often caused by undetected issues in data pipelines such as bottlenecks, system failures, or data corruption. A data observability platform provides continuous monitoring to identify these disruptions in real-time, allowing for a swift resolution. By offering insights into the health and performance of data systems, observability tools decrease the chances of prolonged downtime and improve overall system reliability. These platforms offer diagnostic information to better help you understand the causes of downtime, enabling proactive measures to prevent future occurrences. By minimizing data downtime, organizations can ensure uninterrupted access to critical data, support seamless operations, and maintain a competitive edge in data-driven environments. Inconsistent Data Across Systems If different systems have different data, your organization may benefit from using a data observability platform. When data is inconsistent, it can lead to unreliable analytics, poor business decisions, and a lack of confidence in data-driven strategies. This often happens when data is stuck in one system or doesn't match up across systems. A data observability platform will help you see how data moves and changes in all your systems. It helps you make sure the data is the same and accurate. With real-time monitoring, the platform can quickly find and tell your teams about differences so they can fix them quickly. The platform also helps you track where and how data inconsistencies happen. Fixing these differences helps you have one version of the truth, making your data more reliable and helping your organization make better decisions. Related Content: Splashing into Data Lakes: The Reservoir of Observability The Future of Data Observability Emerging Trends Data observability is being shaped by several emerging trends and technologies that promise to enhance its capabilities and reach. As data systems become more complex, observability tools are integrating artificial intelligence (AI) and machine learning (ML) for better anomaly detection and predictive analytics. This helps organizations anticipate and address issues before they affect operations. Cloud-native architectures and microservices are also driving the evolution of observability solutions, requiring tools that can handle distributed data environments effectively. We’re also seeing a growing focus on improving user interfaces and experiences to make observability tools more accessible to non-technical users. The development of open standards and APIs for data observability is promoting interoperability and flexibility across different platforms and tools, aiming to provide deeper insights, greater automation, and enhanced flexibility in managing data systems. Data observability is more than just a trend— it has emerged as a critical component of modern enterprise data management. It gives us all insights we need for our data health and performance, ensuring data quality and reliability. Helping us quickly reduce downtime and improve decision-making. Features like real-time monitoring, anomaly detection, data lineage tracking, and metadata management create a strong framework for maintaining robust data systems. As data environments become more complex, data observability's role will continue to grow, ensuring compliance, governance, and operational efficiency. For enterprises looking to enhance their data management strategies, now is the time to act. Implementing a robust data observability platform can transform how your organization handles and leverages its data assets. To start, look into data observability solutions, like BindPlane, that match your business goals and have the features you need. Ensure all relevant people are involved to successfully adopt and integrate the solution. Prioritizing data observability can give your organization an advantage, improve decision-making, and keep your data reliable in a rapidly changing enterprise environment.

What Is An Observability Pipeline, Anyway?

Michelle Artreche — Tue, 03 Sep 2024 13:15:00 GMT

When working with software monitoring and observability, there's a bit of a paradox. Here's how it goes: To understand what's happening in complex environments, you usually need to gather a lot of logs, metrics, traces, and other observability data. But, if you collect too much data without an efficient way of processing and managing it, the information becomes a hindrance more than a help. it's a problem for modern businesses committed to optimizing the performance and reliability of digital systems. Fortunately, there's a solution: observability pipelines. By routing and processing data efficiently and at scale, observability pipelines play a critical role in ensuring that having large volumes of observability data at your disposal doesn't ironically undercut visibility into software environments. What is an observability pipeline? An observability pipeline is a type of tool that moves observability data from its sources to its destinations. It can also perform tasks like data transformation, enrichment and aggregation. These are important capabilities in the context of modern application performance management (APM), as well as monitoring and observability for two main reasons. First, the volume of telemetry data – meaning logs, metrics, traces and other information that provides insight into the health and performance of software – that teams have to contend with has exploded over the past decade, due largely to the shift toward microservices and distributed architectures. A modern app could include dozens of individual microservices and containers, each producing its own logs and metrics – not to mention tracing data that tracks requests across multiple services. As a result, there is much more observability data, and more discrete data sources. As Forbes puts it, "there's exponentially more data coming from the proliferation of microservices and containers along with additional complexity and dependencies." The second key challenge is that observability data and workflows have become more complex in the age of microservices. To leverage modern observability data effectively, you need not just to collect it, but also to correlate data from different sources to gain context on performance issues and pinpoint root causes. This requires the ability to route and merge data from multiple locations. Because observability pipelines help solve these challenges, they are now essential to business and used to manage application performance. Gartner predicts that 40 percent of log telemetry data will be processed through observability pipelines by 2026, a 400 percent increase compared to 2022. Why use an observability pipeline? We just explained at a high level why observability pipelines are important in the context of modern APM and observability. But to illustrate the value further, let's take a look at more specific benefits of observability pipelines: Better security: Observability data could contain sensitive information, such as personal names stored in log files. Observability pipelines help keep this data safe by managing it in a centralized way. Plus, through features like data anonymization, pipelines can remove sensitive information to reduce security risks further. Faster incident response: By moving data as quickly and efficiently as possible, as well as by optimizing the data for analysis while it's in transit, observability pipelines help teams make sense of data quickly. This translates to faster incident response because the root causes of issues are easier to identify. Simplified data collection: With an observability pipeline, you can easily create automation that moves all relevant data from its places of origin to its destination – which is much simpler and faster than collecting and exporting data manually. Full data control: Instead of being limited by your data architecture and the features of your data analytics tools, observability pipelines allow you to remain in control of where your data comes from and what happens to it. Vendor neutrality: When you use observability pipelines based on standards like OpenTelemetry, which enables a vendor-neutral approach to telemetry data collection and management, you avoid becoming locked into certain observability tools or vendor ecosystems. Reduced storage costs: By making it possible to perform processes such as data minimization and compression before observability data even arrives at its destination, pipelines can help reduce overall data volumes – and, by extension, data storage costs. Observability pipelines help teams use observability data more efficiently and effectively, while also lowering security risks and providing observability cost advantages. Who's using observability pipelines? Observability pipelines can benefit virtually any organization that must collect, process and manage observability data on any significant scale. But they're particularly valuable for businesses that fall into at least one of the following categories: Those with tight compliance or security requirements, which pipelines help to address by reducing the security and privacy risks of observability data. Organizations that have adopted cloud-native computing strategies and architectures, which tend to increase the volume and complexity of observability data. Businesses seeking to embrace GitOps, which requires a standardized, systematic approach to data collection and management. Companies committed to open standards and open source, which are at the core of observability pipelines that manage data based on standards like OpenTelemetry. It's worth noting, too, that observability pipelines can benefit multiple types of teams and roles. IT engineers responsible for collecting and managing observability data are one obvious beneficiary of this type of tool. However, pipelines can also be useful for security analysts, who also need to collect vast quantities of data and route it to various SIEMs and other tools. Likewise, data engineers can benefit from observability pipelines as a way of streamlining the collection and processing of the data they manage from disparate sources. Getting started with observability pipelines The exact process for implementing an observability pipeline varies depending on which types of data you're collecting, what you're doing with it and which types of tools you use to work with it. In general, however, setting up an observability pipeline boils down to the following four basic steps: Identify data sources and destinations: These are the data resources that will serve as the starting and ending points of your pipeline. Identify data transformations: Determine which types of processes – such as data minimization, integration or deduplication – you need to perform within the pipeline. Choose an observability pipeline tool: Find a solution that can pull data from your sources, process it as you require and deliver it to the destinations. (In case you haven't noticed, we're partial to open, standards-based pipeline tools like BindPlane.) Deploy your pipeline: Implement the pipeline using whichever architecture – such as on-prem, cloud-based or a hybrid approach that combines the former and the latter – makes most sense based on the infrastructure you are working with. An unironic approach to observability Observability can be a real challenge when you struggle to move observability data where it needs to move, in the most efficient way possible. But with an observability pipeline at your disposal, having too much data to work with, or the inability to process data efficiently, no longer gets in the way of achieving visibility into software environments. To learn more about how observability pipelines work and how to implement one, learn about BindPlane, the vendor-agnostic observability pipeline solution that features over 200 integrations and can run on-prem, in the cloud or as part of a hybrid architecture.

observIQ Expands Advanced Support for Sumo Logic in Security and Observability Data

Jamie Gruener — Tue, 13 Aug 2024 18:19:10 GMT

We’re excited to announce that as part of our expanded alliance with Sumo Logic, observIQ extended its support for Sumo’s platform. This allows customers to send logs and metrics to Sumo Logic, leveraging our telemetry pipeline, BindPlane. We’ve also made it possible to automatically recommend processors in our pipeline that format data specifically as Sumo Logic expects—once Sumo Logic is a destination for BindPlane. Additionally, Sumo Logic customers now have the first fully integrated data pipeline based on OpenTelemetry to leverage as part of their telemetry data collection strategy. Our approach with Sumo Logic highlights the advanced capabilities of BindPlane by providing additional intelligence (notice I didn’t use the overused word AI) to simplify the configuration and management of telemetry data. While BindPlane has had the concept of processors for some time, we are now actively recommending processors as customers stream their telemetry data into specific platforms like Sumo Logic. These processors act as filters that allow you, as a customer, to decide the quantity, type, and format of the data you send to your SIEM and observability platforms. For the Sumo Logic destination, BindPlane automatically sets the _sourceCategory and datasource fields when sending the data to Sumo Logic. Sumo Logic customers looking to deploy OpenTelemetry in their environments will find the process much easier with observIQ. BindPlane, powered by OpenTelemetry, by streamlining the creation of an actionable, end-to-end telemetry pipeline. It brings together data standardization, fleet management, and control into a centralized hub that facilitates telemetry collection, processing, and transmission—all from a single location. BindPlane facilitates data migration from legacy platforms to Sumo Logic by supporting a wide range of source platforms. (A full list of BindPlane’s destinations is here) So, whether you are an existing Sumo Logic customer wanting to standardize on OpenTelemetry or a new Sumo Logic customer wanting to migrate to Sumo Logic, BindPlane can be the telemetry data pipeline to drive these initiatives. We are pumped to collaborate with Sumo Logic customers and partners to accelerate their telemetry data collection and OpenTelemetry deployments. BindPlane provides comprehensive, integrated support for Sumo Logic, offering several benefits for Sumo Logic customers: Unified Telemetry: Powered by OpenTelemetry, BindPlane enables teams to gather, process, and ship metrics, logs, and traces to any o11y or SIEM tool in a standardized way. Unified Fleet: From BindPlane, teams can deploy and manage thousands of OTel Collectors in Linux, Windows, and Kubernetes environments--all from a scalable interface. Unified Control: BindPlane streamlines a team’s ability to sculpt and direct data by coupling live previews and snapshots with intelligent processors and controls. Reduce the data volume by 40% or more before it arrives at your o11y or SIEM destination (in this case, Sumo Logic’s platform). Don’t take our word for it. Try it out yourself by requesting a BindPlane or BindPlane Enterprise trial license or contacting us, and let us know that you're a Sumo Logic customer or partner.

Navigating Open Source Software: All Your Questions Answered

Michelle Artreche — Tue, 13 Aug 2024 14:43:48 GMT

What is Open Source Software? Open source software refers to computer programs with source code available for anyone to inspect, modify, and distribute. Unlike proprietary software, open source software is developed collaboratively by a community of developers. One of the main benefits of open source software is cost savings. Because the source code is freely available, organizations can use and customize the software without paying licensing fees, reducing costs, especially for large-scale deployments. Another advantage of open source software is strong community support. With a large and active community of developers and users, bugs can be quickly identified and fixed, and new features can be added rapidly. Well-known open source projects include OpenTelemetry, the Linux operating system, the MySQL database management system, and the Python programming language. These projects have gained widespread adoption and become necessary software stack components. FAQs About Open Source Software Gartner®'s "A CTO's Guide to Open-Source Software: Answering the Top 10 FAQs" addresses common questions that organizations have when considering using open-source software. The article covers various topics, including licensing and legal considerations, security and vulnerability management, integration and customization, community support, contributions, overall ownership cost, and talent acquisition and development. Licensing and Legal Considerations Open source software comes with different licensing models, each with its own rules and requirements. Understanding these licenses is important to ensure legal compliance and mitigate potential risks. Here are some of the most common open source licenses: MIT License: This license is very permissive. It allows you to use, modify, and distribute the software without many restrictions. However, you have to include the original copyright notice and license text. Apache License: Similar to the MIT License, but with additional protections for patent rights and legal safeguards. It’s one of the more popular choices for web servers and libraries. GNU General Public License (GPL): A copyleft license requires any derivative works or modifications also to be released under the same GPL license. This can pose challenges when integrating GPL-licensed code with proprietary software. When using open source software, it's important to carefully review the license terms and make sure you follow any rules or restrictions. If you don't, your organization could face legal risks, such as breaking copyright laws or violating the license. Best practices for maintaining legal compliance include: Create an Open Source Policy: Create guidelines for evaluating, approving, and using open source components within your organization. Maintain an Inventory: Remember to keep track of all open source components used in your projects, including their licenses and any associated obligations. Implement Automated Scanning: Set up automated scanning using tools to check your codebase for open source components. Seek Legal Guidance: If you're not sure about license compatibility or have concerns about potential legal risks, ask legal experts familiar with open source licensing for help. Contribute Back: When you can, think about sharing any improvements or bug fixes with the open source community. Remember to handle open source licensing and legal considerations proactively. So you can enjoy the benefits of open source software while reducing potential risks and complying with laws and regulations. Security and Vulnerability Management Open source software can be just as secure as proprietary solutions, but it does need active security and vulnerability management. One of the main benefits of open source is that the community can review the code and find potential vulnerabilities. But, this also means that you’re putting your code at risk. Make sure you have a clear process for reporting vulnerabilities. So, security researchers and community members can report them responsibly. Many open source projects have bug bounty programs or security advisory lists that can help. Keeping your security patches and updates current is equally as important since they often fix vulnerabilities. When adopting open source software, it's important to check the project's security practices carefully. This includes how they handle security issues, review code, and how often they update for security. It's also a good idea to use automated tools to scan your open source dependencies for known security problems and bug alerts. It's important to use safe coding practices and regularly check security to reduce risks with open source software. Companies should also have clear rules for using open source and a process for reviewing and approving open source parts before adding them to their systems. Integration and Customization Open source software is flexible and can be easily customized and integrated. Many open source projects are designed to be modular and expansible, giving developers the opportunity to add different components to existing systems and workflows. This is particularly useful for organizations with unique requirements or legacy systems that are in need of an update. One of the key advantages of open source software is the ability to modify the source code to meet specific needs. With access to the codebase, developers can customize the software, add new features, fix bugs, or optimize performance for their particular use case. This level of customization is not typically possible with proprietary software, where the source code is not as easily accessible. Remember that customizing open source software may require a lot of development work and expertise. Skilled developers who understand the codebase are essential to make the necessary changes and ensure that the customizations are secure and don't cause problems with future updates. It's also important to consider how the software will integrate with existing systems. Open source software often has ways to connect with other tools and platforms, like APIs or plugins, which can make workflows more efficient. When you integrate open source components, it's important to follow best practices for software development, like testing, documentation, and version control. Organizations also need a plan for maintaining and updating the integrated components to ensure they stay compatible and secure over time. Community Support and Contributions Open source software is made strong by its community. One of the best things about open source is the ability to connect with developers, contributors, and users all around the world who actively support, maintain, and improve the software. Leveraging an open source community has many benefits like: Expertise Access: Open source communities bring together developers and experienced users with a wide range of skills and knowledge. Quick Issue Resolution: With a large community of contributors, bugs and issues are often identified and fixed quickly. Feature Requests and Improvements: Open source communities encourage users to contribute by submitting feature requests, bug reports, and code improvements. To effectively use the open source community, you need to take part and give back. Here are some guidelines for contributing: Understand the Community Guidelines: Understand each open source project's guidelines to ensure your contributions align with its goals. Report Bugs and Issues: If you encounter any bugs or issues, promptly report them to the project's issue tracker. Be sure to give detailed information, including steps to recreate the problem, any error messages, and relevant logs or screenshots. Contribute Code: If you have the skills and knowledge, consider contributing code fixes, enhancements, or new features. Follow the project's coding standards, write clear documentation, and ensure your code is well-tested before submitting a pull request. Participate in Discussions: Join the project's forums, mailing lists, or chat channels. Engage in discussions, share your experiences, and provide feedback. Contribute Documentation: Clear and up-to-date documentation is very important for open source projects. You can help by improving existing documentation, creating tutorials, or translating content. Related Content: Contributing to open source—How and when to get started Total Cost of Ownership (TCO) One of the most compelling advantages of open source software is its potential for long-term cost savings compared to proprietary alternatives. However, it's important to carefully analyze the Total Cost of Ownership (TCO) to fully understand the true financial implications. Open source software is often free to get at first. But then there are other costs like deployment, maintenance, support, and training. These hidden costs can add up fast, especially for big business solutions. Also, organizations might need to hire specialized talent or get outside help to make sure everything is done right and keeps running smoothly. On the other hand, proprietary software usually require more upfront fees and ongoing subscription costs. These expenses can increase with company growth, making it challenging to switch to other solutions because of vendor lock-in. When evaluating the TCO of open source versus proprietary software, it's important to think about a few things: Initial acquisition costs (licensing fees, subscription fees, or none for open source) Implementation and deployment costs (hardware, infrastructure, professional services) Ongoing maintenance and support costs (internal resources, external support contracts) Training and talent acquisition costs (upskilling existing staff, hiring specialized professionals) Scalability and flexibility (ability to adapt to changing business needs without incurring significant additional costs) Integration costs (compatibility with existing systems and tools) Exit costs (if deciding to switch solutions in the future) Remember to weigh all the factors to understand long-term costs carefully. Open source software may require an initial investment but can lead to significant savings over time, especially for organizations with skilled in-house teams dedicated to ongoing maintenance and support. Talent Acquisition and Development Embracing open-source software requires building a team with the right skills and mindset. Attracting and keeping talented people who are skilled in open source technologies is crucial. Here are some strategies to think about: 1. Cultivate a work environment that values open source principles, like collaboration, transparency, and knowledge sharing. 2. Invest in training programs and professional development opportunities to improve the skills of your existing workforce in open-source technologies. 3. Engage with the open source community by sponsoring events, contributing to projects, and participating in online forums. 4. Offer flexible work options, schedules, and opportunities for self-directed learning to opensource developers who value autonomy. 5. Celebrate your team's contributions to open source projects to attract like-minded individuals during recruitment. 6. Partner with universities and coding boot camps to recruit promising talent skilled in open source technologies. By creating an open source culture, providing professional development opportunities, and actively engaging with the community, your organization can attract and keep the talent you need to succeed with open source software. Open source software has so many benefits from cutting costs and community support to flexible integration and customization. But using it well means taking a thoughtful approach to licensing, security, and talent development. Understanding the potential challenges and following best practices can help mitigate risks and unlock the full potential of open source software. Whether you're new to open source software or looking to improve your existing strategy, we suggest downloading a A CTO’s Guide to Open-Source Software: Answering the Top 10 FAQs to get all of your questions answered.

How to Start Contributing to Open Source with OpenTelemetry

Michelle Artreche — Mon, 12 Aug 2024 20:20:55 GMT

Today, open source software is everywhere – from Linux-based servers, to Android smartphones, to the Firefox Web browser, to name just a handful of open source platforms in widespread use today. But the open source code driving these innovations doesn't write itself. It's developed by open source contributors – and you could be one of them. If you're passionate about helping to grow an open source tool or application, or you want to gain some valuable coding experience that you can show off to prospective employers, helping to develop open source may be an obvious thing to do. If you're new to open source, knowing where to start contributing might not be as straightfoward. This article leverages my experience with a vibrant community of open source coders who write the software that helps power BindPlane. Understanding open source projects Let's start with the basics: what is open source and what a typical open source project involves? Open source is software whose source code is publicly available. When software is open source, anyone can download, inspect and, in most cases, modify its code. This makes it different from closed source or proprietary software, which is usually released in the form of binaries – meaning source code is not publicly available. It's important to note that open source software isn’t always free of charge. While it often is, most open source licenses allow projects to charge fees for downloading or using their code, as long as the source code remains public. Most open source projects today host their code on platforms like GitHub or GitLab, where anyone can view it. In addition to storing a project's source code, these platforms also typically host a few other core resources, including: A README file, which describes how to compile and install the software. A CONTRIBUTING.md file, which includes guidelines on how to become a contributor to the project. A Code of Conduct statement, which establishes ethical guidelines for contributors to follow. Some projects offer additional resources, like documentation about how to use software. Why contribute to open source? There are two primary reasons for contributing to open source. The first involves a selfless impulse to give back by helping to develop software you use and love. If you've benefited from code written and freely shared by others, you might opt to pay it forward by sharing contributions of your own. The second reason is career advancement. Especially if you’re new to coding, contributing to open source provides real-world experience that enhances your job applications. Plus, being an open source contributor may help you network by getting to know other programmers, potentially opening up further career opportunities. Whatever your motivation, most open source projects will be happy to have you as a contributor as long as you follow their guidelines and add valuable code. Prerequisites for contributing To write valuable code, of course, you need at least basic programming skills. While you don’t need to be a top-notch hacker, some experience is crucial. If you're completely new to programming, it's advisable to develop an app or two of your own before writing code that you hope an open source project will accept. Familiarity with the tools used to manage code in open source projects is also necessary. Most projects today use Git. Git helps multiple developers work on the same codebase simultaneously by automatically helping to keep code in sync and avoid conflicts. It also provides version control features, which make it possible to track how code changes over time and revert to an earlier version if desired. Git can even be used to automate workflows using a technique called GitOps, but that's a topic for another day. Finding the right open source project Once you've confirmed that you have the prerequisite skills to contribute to open source, you'll want to find a project to contribute to. If there's a project you're passionate about because you use its software or believe in its mission, it's an obvious good candidate to consider. If not, you can browse projects on GitHub or GitLab. Sites like goodfirstissue.dev, which offers a curated list of open source projects, may also lead you to a project that fits well with your skills and goals. As you assess projects, think not just about what the project does, but also which technologies it uses. For instance, do you know the programming language or languages it uses? If the application it develops uses a microservices architecture, are you familiar with that approach to application design? Check as well how active the project is. If a project hasn't seen any new code contributions in months or years, it's likely that its developers have abandoned it, and that any contributions you attempt to make will never be reviewed. In that case, you could fork the project to revive it, but taking over someone else's project can be a tough task; you probably shouldn't do it until you've gained some experience working within open source communities. Getting ready to contribute Once you've chosen a project, read its contribution guidelines, if they exist, to learn how the developers expect you to contribute code. The guidelines might explain how to set up a development environment on your computer that is compatible with the project's tools, for example. Smaller projects may not have contribution guidelines, in which case looking at past contributions (which you can typically track through Git) is your best bet for getting a sense of how programmers contribute to the project. The project may also have a mailing list where you can ask about contributing – but be sure you've read through the project's resources first so you don't ask questions answered elsewhere. Making your first contribution When you're finally ready to make your first contribution, start by deciding what, exactly, to contribute. In most cases, it's wise to look for a request from the project, rather than developing an unsolicited feature or enhancement. Many projects describe goals using tools like GitHub Issues, so check there to see if there are specific requests you can work on. After writing the code to implement the contribution, submit a pull request. This is a formal notification to the project that you'd like it to integrate your code. As a best practice, include notes with your pull request explaining what the change does. What happens after your first contribution Open source projects have varying processes for reviewing pull requests, and some approach them in a more systematic or standardized way than others. In general, however, expect that existing contributors to the project will review your pull request, a process that could take anywhere from mere hours to weeks, depending on how much time the developers have to devote to the project and how complex your code is. The project may accept the contribution outright. If not, the developers will ideally provide feedback and identify changes they'd like you to make to improve your code. If not, though, don't take it personally or as a sign that the project doesn't want you to contribute. Most open source projects are volunteer-run, and code reviewers don't always have time to offer feedback. If your pull request was rejected without comment, consider assessing your code yourself to determine why it might not have made When your code is accepted – which we hope it will be – the project's developers will merge it into their codebase. This means your code has become an integral part of the open source application or platform – and that you should celebrate your success in making your first successful open source contribution! Continuing contributions and staying involved Making a first contribution to open source is great. What's even better is continuing to make contributions over time. The lifeblood of most projects is contributors who stick around for years and get to know the technology and culture in depth. Staying involved with a project can be as simple as continuing to make pull requests. But if you want to level up your engagement, consider applying for a leadership role, such as one where you help review code from others or plan the project's future direction. The process for becoming a leader varies because projects have different governance structures, but in many cases, you'll qualify once you have made a certain number of successful pull requests. Get started with open source! For newcomers, contributing to open source can seem challenging, but mastering some foundational concepts and practices simplifies the process. We would know. At BindPlane, open source is at the core of our approach to observability, which is why we maintain dozens of GitHub repositories where anyone can contribute to the code behind our monitoring and observability tools. It's also why our platform is powered by OpenTelemetry, the open source, community-developed standard for collecting telemetry data. If you're as passionate about open source as we are – and/or if you want some hands-on experience building software that plays a critical role in helping businesses around the world manage software performance – we'd love to work with you as an open source contributor. To find out whether one of our projects could be a fit, learn what our solution, BindPlane, is all about. Questions? Join our Slack community and chat with one of our developers.

Managing Observability Pipeline Chaos

Michelle Artreche — Wed, 07 Aug 2024 14:45:15 GMT

Optimizing Observability Pipelines The cloud environment has generated an unprecedented volume of data, making it increasingly difficult for enterprises to manage. With multiple SaaS and cloud-based applications in play, differentiating which data needs processing for analysis versus storage for regulatory compliance is a significant challenge. The growing number of data sources only complicates this further. So, getting clarity and control over this chaos is the goal, without having to overhaul your entire system. But what’s the best way to approach optimizing your current stack? We commonly see these three challenges: Simplify: Streamline the management of all your telemetry agents and data collection processes. This reduces complexity and improves operational efficiency. Standardize: Adopt open standards like OpenTelemetry to ensure a vendor-agnostic approach, making your systems more interoperable and flexible. Reduce: Lower data volumes to cut costs and drive efficiencies in data management and backend monitoring solutions. By focusing on these three challenges, enterprises can better manage observability pipelines, ensuring optimal performance and cost efficiency. Simplify Your Cloud Migration with an Observability Pipeline We work with customers embarking on cloud migration to design observability pipelines that accelerate the process. Many customers are still and will remain at least partially on-prem for various reasons and want complete control within their firewall. In both scenarios, they need to contain the growing chaos surrounding agent management and observability and be able to quickly gather, process, and transmit the telemetry data from any source to any destination. And for the actual developer teams – the hands-on keyboard folks – they want to be able to wrap their heads around managing the complexity of thousands of agents. With valuable time saved, they can focus on critical tasks. Embrace Flexibility with OpenTelemetry Choosing your preferred monitoring tools on the back end is ideal, which is why OpenTelemetry has become so popular. It simplifies the ingestion process across multi-vendor environments and enhances distribution within organizations. It initially focuses on logs, followed by metrics and traces. The flexibility counteracts the constraints of any single log management tool. For example, a major US healthcare provider was grappling with challenges related to size and complexity and looked to modernize its observability environment. They made a significant investment in enterprise tools like Splunk, New Relic, Elastic, and Datadog. By standardizing on OpenTelemetry, the eliminated vendor lock-in, giving users the freedom to choose the best monitoring solution for their specific use cases. Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane Navigating Compliance and Security While Reducing Data Complexity Many enterprise customers operate within strict compliance and regulatory environments that vary across regions and countries. This requires maintaining some amount of data in perpetuity—raising the question of which data to analyze and which to keep for compliance. Coupled with tight security requirements, the complexity increases. Of course, not all data is created equal, so having a tool to help gather, process, and route to the correct destination is critical. By sending the appropriate data to the right tool, they save on volume and costs of analyzing and storing it. Related Content: Configuration Management in BindPlane Simplify Telemetry Pipeline Management and Cut Costs with BindPlane Enterprises strive to streamline their telemetry pipelines, reduce data storage costs, and minimize the time spent managing complex tasks. But the increasing amount of data, a wide range of vendors, and more applications are making this challenging for DevOps teams and affecting the bottom line. That’s why we’re seeing so much interest in BindPlane, the industry's first OTel-native telemetry platform. It addresses these issues by providing exceptional visibility across cloud, hybrid, and on-premise environments. It standardizes telemetry creation, transmission, and processing according to the OpenTelemetry standard, while seamlessly integrating with existing telemetry streams. To learn more about getting started with BindPlane OP, visit https://observiq.com/solutions. Questions? Join our Slack community and chat with our developers here.

How to Monitor JVM with OpenTelemetry

Deepa Ramachandra — Tue, 06 Aug 2024 15:29:31 GMT

The Java Virtual Machine (JVM) is an important part of the Java programming language, allowing applications to run on any device with the JVM, regardless of the hardware and operating system. It interprets Java bytecode and manages memory, garbage collection, and performance optimization to ensure smooth execution and scalability. Effective JVM monitoring is critical for performance and stability. This is where OpenTelemetry comes into play. OpenTelemetry's Role OpenTelemetry is a tool for monitoring and diagnosing the performance of distributed systems. It collects and processes telemetry data like metrics, logs, and traces, helping developers understand their applications, identify bottlenecks, and improve performance and reliability. We are continuously adding monitoring support for different sources. The latest addition is support for JVM monitoring using the OpenTelemetry collector. You can find more details about this support in OpenTelemetry’s repository. The best part is that this receiver works with any OpenTelemetry Collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector. Let us guide you through setting up this receiver with observIQ’s distribution of the OpenTelemetry Collector and sending the metrics to Google Cloud Operations. Here, JVM monitoring is managed using the JMX metrics receiver from OpenTelemetry. Monitor JVM with OpenTelemetry Performance metrics are the most important to monitor for JVM. Here’s a list of signals to keep track of: Heap Memory: It's important to keep an eye on heap memory to understand how your application manages memory when the traffic accessing your application changes. Heap memory is where the application stores objects. Depending on the number of users, the heap keeps objects related to requests. After the request is completed, the heap memory is supposed to clear these objects. If this doesn't happen as expected due to coding issues or lack of scalability, the problem needs to be identified and addressed before it causes the application to crash. Metrics such as jvm.memory.heap. help keep track of the total heap memory used at any given time. Garbage Collection: Once the heap memory no longer references the serviced request objects, the objects are cleaned out from the heap memory using the Garbage collection process. When garbage collection happens, the application performs poorly, leading to slower responsiveness. Therefore, making the garbage collection process shorter and faster is important for better application performance. Metrics such as jvm.gc.collections.count provide the total count of garbage collections at specific intervals. Threads: Monitoring the active thread count in the JVM is crucial. A higher active thread count can slow down the application. Having more threads puts greater demand on application resources like the processor and server utilization. By analyzing the thread count over time, you can determine the best thread count based on varying request traffic. It's important to adjust the number of threads based on changing traffic levels in the application. Metrics such as jvm.memory.pool.init gives information about the thread count at specified intervals. Configuring the JMX Metrics Receiver After the installation, you can find the configuration file for the collector at: C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows) /opt/observiq-otel-collector/config.yaml(Linux) The first step is building the receiver’s configuration: We are using the JMX receiver to gather JVM metrics. The jar_path attribute allows you to specify the path to the jar file for gathering JVM metrics using the JMX receiver. This file path is automatically created when observIQ’s distribution of the OpenTelemetry Collector is installed. You should set the IP address and port for the system from which the metrics are collected as the endpoint. When we connect to JMX, there are different categories of metrics; this configuration is intended to scrape the JVM metrics. This target_system attribute specifies that. Set the time for fetching the metrics using the collection_interval attribute. The default value for this parameter is 10 seconds. However, if metrics are exported to Google Cloud operations, this value is set to 60 seconds by default. The properties attribute allows you to set arbitrary attributes. For instance, if you are configuring multiple JMX receivers to collect metrics from many JVM servers, this attribute enables you to set unique IP addresses for each endpoint system. Please note that this is not the only use of the properties option. The next step is to configure the processors: Use the resourcedetection processor to create an identifier for each JVM instance from which the metrics are collected. Add the batch processor to group the metrics from multiple receivers. It's important to use this processor in the configuration to benefit the collector's logging component. If you would like to learn more about this processor, check the documentation. To export the metrics, the next step is to set up a destination. You can find the configuration for your preferred destination in OpenTelemetry’s documentation here. Set up the pipeline. Viewing and Analyzing JVM Metrics The JMX metrics gatherer collects the specified metrics and exports them to the destination based on the detailed configuration above. We've guided you through setting up the JVM metrics receiver using observIQ’s OpenTelemetry Collector to send metrics to Google Cloud Operations. With this setup, you can monitor heap memory, garbage collection, and thread count for JVM performance. By following the steps above, you can ensure accurate collection and export of JVM metrics, enabling you to maintain optimal performance and stability for your Java applications. For more information, visit OpenTelemetry’s repository. Contact our support team at support@observIQ.com for assistance. Thank you for following along, and happy monitoring!

BindPlane Summer ‘24 Release

Joe Howell — Wed, 31 Jul 2024 19:22:00 GMT

observIQ + BindPlane: It’s Heating Up! As the summer heats up, so does innovation at observIQ. We are thrilled to announce a number of exciting updates for BindPlane, the industry’s first OTel-native telemetry pipeline. Read on for a summary of what’s new in BindPlane, themed and tuned with the excitement and energy of NBA Jam’s legendary announcer, Tim Kritzow. 1. Intelligent Controls: Boomshakalaka! At observIQ, our team is obsessed with refining and expanding BindPlane’s Total Control ™ feature set--ensuring teams have all the necessary tools and levers required to build an actionable telemetry pipeline. Here are some of the latest enhancements: Smart Processors: From the Snapshots page, BindPlane now intelligently suggests useful processors such as Remove Empty Fields, Filter by Severity, and Deduplicate Logs so teams can quickly reduce and get to value. If a known pattern is detected, the user can view, edit, and apply suggested processors to their pipeline with just a few clicks. Field Inspection: We also recently unveiled new functionality on our Snapshots page, allowing users to easily add useful processors to their pipeline to reduce and refine data in their pipeline. Snapshot Search: Teams can now easily search their incoming data to identify new reduction vectors. Filter by Condition (with OTTL): Teams can rapidly implement and filter their data by condition, simplifying OTel’s powerful (though a bit complex) OpenTelemetry Transport Language (OTTL). Check out one of our recent blog posts if you’d like to learn more about OTTL. Progressive Rollouts: With Progressive Rollout, teams can utilize attributes and labels to safely push OTel configurations in staged deployments before they go live in production. Improved Color Coding and Context: This is a simple but valuable enhancement for those of us who aren’t robots. It’s now much easier to see the before-and-after changes in your configuration and live data after a transform processor has been applied. 2. New Resources: Is it the shoes? BindPlane now features several new Sources, Processors, and Destinations, expanding BindPlane’s OOTB collection, processing, and transmission capabilities. For each, we focused on a few critical areas that our customers have been asking for: Sources Cloud AWS Cloudwatch: gathers telemetry from Cloudwatch API AWS S3: gathers and rehydrates data from S3 storage buckets Azure Blob Storage: gathers and rehydrates data from Azure blog storage buckets Testing and Validation Telemetry Generator: generates synthetic telemetry for building and validating pipelines without requiring a live source Self-Observability BindPlane Gateway Source simplifies the creation of an OpenTelemtery Gateway by providing additional context and visual distinction when compared against a generic OTLP source BindPlane Agent: exposes telemetry about your fleet of BindPlane Agents BindPlane OP: exposes telemetry about your BindPlane Instance Processors Enrichment Lookup Fields: The Lookup Fields processor can be used to add matching telemetry fields from a CSV file, facilitating complex, long-form data enrichment from external data sources. Filtering Filter by Condition: this processor drastically simplifies implementing processing rules with the OpenTelemetry Transport Language (OTTL) enabling users to filter with operators and common expressions. Parsing Parse CSV Parse Key Value Pair Parse XML Destinations: o11y Observe Snowflake InfluxDB SIEM Google Security Operations Sumo Logic 3. Resource Library: From Downtown! With BindPlane’s new Resource Library, users can now create and manage reusable Sources, Processors, and Destinations. Teams can create a resource once and easily insert it into one or many OTel configurations used by multiple teams spanning multiple deployments. BindPlane Cloud - Ready to Launch: Count It! After a successful beta phase, BindPlane Cloud is ready to launch! Cloud will soon go live, with all of BindPlane’s best features with the combined benefits that a SaaS platform provides: scalability, security, and redundancy. Cloud also adds SSO and additional authentication options. Interested? You can sign up for the free tier of BindPlane OP Cloud here to start your migration to OTel today. observIQ + BindPlane: We’re on Fire! (in a good, streaky, NBA Jam type of way) It’s been an exciting summer for us at observIQ. The interest in OpenTelemetry and telemetry pipelines is certainly heating up. We take pride in the problems we’re helping our customers solve - we’re excited to deliver more solutions to our customers in 2024. *** 16-bit End of Regulation Buzzer Sound ***

How to Ship AWS Cloudwatch Logs to Any Destination with OpenTelemetry

Keith Schmitt — Tue, 30 Jul 2024 17:45:59 GMT

Observability and log management are needed for a strong IT strategy. Two essential tools for these purposes are AWS CloudWatch and OpenTelemetry. AWS Cloudwatch provides real-time data and insights into AWS-powered applications' health, performance, and efficiency. On the other hand, OpenTelemetry is an open-source observability framework that assists developers in creating, gathering, and exporting telemetry data (such as traces, metrics, and logs) for analysis. Our team has recently contributed to OpenTelemetry, making it easier to gather logs from your entire infrastructure using free, open-source tools. You can access the latest OpenTelemetry capabilities through observIQ's distribution of the OpenTelemetry Collector, available here. This blog will teach you how to use OpenTelemetry to send logs from AWS Cloudwatch. You can use the AWS Cloudwatch receiver to send logs to popular analysis tools like Google Cloud, New Relic, OTLP, Grafana, and more. What signals matter? AWS CloudWatch is AWS’s primary logging solution. It collects logs from Lambda functions, EC2 instances, and EKS. If your system involves sources outside of AWS, or you need to analyze or store logs in a different tool, OpenTelemetry can help you manage data across different vendors. Amazon EKS Logs in CloudWatch include: API Server Component Logs Audit Logs Authenticator Logs Controller Manager Logs Scheduler Logs AWS Lambda Logs are generated by functions you create. Examples include RequestID logs, Duration logs, and Memory size and allocation logs. EC2 Instances provide flexible computing resources in the AWS cloud. The logs generated by EC2 depend on your specific computing processes. Related Content: OpenTelemetry in Production: A Primer Installing the Receiver If you do not have the latest AWS CloudWatch receiver installed with an OpenTelemetry Collector, we suggest using the observIQ OpenTelemetry Collector distribution. This distribution includes the AWS CloudWatch receiver and many others. The Installation is simple with our one-line installer. After running the installation command on your source, come back to this blog for more guidance. Configuring the Receiver To set up the Receiver, you can just open your OpenTelemetry configuration file. If you use the observIQ Collector, look for it in the following locations: /opt/observiq-otel-collector/config.yaml (Linux) C:\Program Files\Google\Cloud Operations\Ops Agent\config\config.yaml (Windows) Edit the configuration file to include the AWS Cloudwatch receiver as shown below: Below are some fields you can add or change in the config file: Resource Attributes aws.region cloudwatch.log.group.name Cloudwatch.log.stream Log Attributes ID Related Content: How to enrich data with OpenTelemetry Viewing and Analyzing Collected Logs To start receiving the specific AWS Cloudwatch logs, simply follow the steps outlined above. If you use Oracle DB, our solutions can significantly enhance your infrastructure monitoring. Start Using OpenTelemetry Want to improve your observability and log management? Test the latest OpenTelemetry tools with our version of the OpenTelemetry Collector here. Stay updated on our future posts and simplified configurations for different sources. If you have questions, requests, or suggestions, contact our support team. You can also join our open-source observability community Slack Channel.

How to Embed React in Golang

Dave Vanlaningham — Wed, 24 Jul 2024 18:08:00 GMT

In this article, we’ll learn how to embed a React single-page application (SPA) in our Go backend. If you’re itching to look at code, you can get started with our implementation here or view the final source code in the embeddable-react-final repository. In the meantime, it's worth discussing the problem we’re here to solve and why this is an excellent solution for many use cases. The Use Case Imagine you’ve built an application and API in Go – that may be used by a command line client or with REST. One day, your project manager emerges from playing Elden Ring long enough to inform you that your customers demand a graphical user interface. OK—no big deal. You can write a simple React App to use your API. Except your simple Web API, which previously was deployed with a single binary, now needs some dependencies. Some current React frameworks, like NextJS or Gatsby, are well supported but might be overkill and not as flexible as you like. Typically, you deploy a front-end application like this. Where the browser sends requests directly to endpoints on the same host, this middle-man server forwards them onto the backend API, where all of the logic is handled. In turn, this server sends the response back to the browser. This may be what you want. It would be best to obfuscate your backend from the rest of the internet. If your API is already exposed, there is an exquisite and straightforward solution that avoids node dependencies and allows multiple services to run. Prerequisites To follow along with this guide, you’ll need: Go 1.18 installed Node 16 installed Your favorite code editor Getting started You can go ahead and clone the starting point for our app. Here, we have a To-Do application. Unimaginative, yet still a cornerstone of web development tutorials. Without going into much detail, we have a REST API implemented in api/ and a React app in ui/. Let's start the API server. From the project directory: We can see we have a REST API listening on port 4000. Now, in a separate shell window, let's start our React app. Now, we’re running our React app in development mode, so go ahead and navigate to http://localhost:3000 and look at our React app. You should see some TODOs.\ And sure enough, our API got some hits: You might be asking, “How did this even work?”. Good question! Answer: Magic. Well… at least create-react-app magic. Check out ui/package.json line 5. We used create-react-app to bootstrap our UI directory to utilize a built-in development proxy server. When we run npm start behind the scenes, an express server is spun up, serving our HTML, JavaScript, and CSS. It also creates a WebSocket connection with our front end to push updates from the code when we save. While this works great in development, this “proxy server” does not exist in a production environment. We’re responsible for serving the static files ourselves. Related Content: Tracing Services Using OTel and Jaeger Embedding static files into our program We need a way to serve a built React application from our Go API. To do this, we can utilize the Go embed package to serve our file system. First, let's make our production build. In ui/ run We now have a build folder with some files in it: We’ve boiled our app to several static files by running an npm run build. From the project directory: Copy and paste this code: Now run: Let's break this down a bit. Note lines 14 and 15. This is utilizing the go:embed directive to save the contents of the build directory as a filesystem. We now need to use this so that Gin can serve as middleware, so we create a struct staticFileSystem that implements static.ServeFileSystem. To do this, we need to add the Exists method: This tells the server that when the client requests build/index.html, the file exists and needs to be served. Now we can use it in Gin middleware, line 21: Let's add this route in our API/start.go file, which now looks like this: Let's build the binary and see it in action. In the project root directory: We should see our server spin up. Now navigate to our backend server host localhost:4000, and voila! We have a React app running with no express server and no node dependencies. You can hand this off as an RPM or DEB package or make it available to Homebrew. Related Content: Creating Homebrew Formulas with GoReleaser The Refresh Problem Ok, cool; we've got a single page being hosted. But let's say we want another page. Customers demand websites with multiple pages, so we must be agile and support this ridiculous request. So, let's add an About page and utilize React Router to navigate to it. So in ui/ Let's add an About page. From the project directory: Copy this into it. Now, add a link to it in our ui/src/components/Todos.jsx file. Finally, add these routes with React Router. Our ui/App.jsx now looks like this: Now, let's rebuild our app and start it again. And when we navigate to it: Great! The only problem is to hit Refresh. This is unfortunate but not surprising. When we hit refresh, we told the server we were looking for the file at ui/build/about – which doesn’t exist. React Router manages the history state of the browser to make it appear as if we’re navigating to new pages, but the HTML of our document is still index.html. How do we get around this? Bonus: To further explain this phenomenon, check out Stijn de Witt’s answer to this stack overflow question. We should all be as thorough as Stijn. Create a fallback filesystem Essentially, we want to permanently server index.html on our / route. So, let's add some stuff to ui/ui.go. We’ve added some things here, including our newest struct, fallbackFileSystem. We’ve implemented our Exists and Open methods, ensuring they always return to index.html. Secondly, we’ve added some more middleware in AddRoutes: The order is important here. The first middleware checks to see if the file exists and ensures our CSS and JavaScript static files are available. It will serve them accordingly when the browser requests. Next, we say, “OK, we don't have that file, but we do have a nice index file.” This is the English translation of line 5 above. Let's rebuild and try again. After refreshing on /about, we see our About page in all its glory. A multi-paged, single-page React App embedded in a binary. Magic. Caveats While this app is a reasonable proof of concept, some notable subtleties must be covered in depth. Authentication – exposing an unauthenticated backend API to the broader internet is as dangerous as it sounds. Development workflow – While developing the UI, you’ll need to run both the backend server (with go run .) and the node development server (npm start). There are some tools to help you do this in a single shell window we use Concurrently. The UI/build directory must have files to compile the code. You might notice that the //go:embed build directive is unhappy if there are no files to embed in the ui/build. You’ll have to run an npm run build for the Go program to compile or satisfy it with a single file mkdir ui/build && touch ui/build/index.html. Summary We simplified development and deployment processes by embedding a static React application in our binary. It’s worth noting that this is not putting much of a strain on our backend service; it simply has to serve up some JavaScript and CSS files occasionally. The bulk of the work is still in the API routes, which is done by design in our current project. We’ve found this a valuable and elegant solution to hosting a React App with our Go-written backend. Acknowledgments My colleague Andy Keller came up with and developed the fallbackFileSytem workaround. We took inspiration from this issue in the gin repo to implement our staticFileSystem.

How to Build a Custom OpenTelemetry Collector

Michelle Artreche — Mon, 22 Jul 2024 19:29:52 GMT

Telemetry data collection and analysis are important for businesses. We're diving right in to explain the ins and outs of the OpenTelemetry Collector, including its core components, distribution selection, and customization tips for optimal data collection and integration. Whether you're new to OpenTelemetry or expanding your capabilities, this will help you effectively use the OpenTelemetry Collector in your observability strategy. Understanding the OpenTelemetry Collector The OpenTelemetry Collector is made up of several components: receivers, processors, exporters, connectors, and extensions. Each component serves a unique function in the data pipeline, facilitating the ingestion, processing, and export of telemetry data from various sources. Customizing these components allows organizations to fine-tune data collection strategies, optimize performance, and seamlessly integrate with existing infrastructures. Choosing a Collector Distribution Assess Your Needs When choosing an OpenTelemetry Collector distribution, there are a few factors to consider. These include your specific telemetry data requirements, the complexity of your environment, the level of support you need, and the extent of customization or scalability you require. Research Distributions To fully understand different distribution options, explore their features, built-in components, and available support services. Make sure to verify that the distribution supports all the platforms and languages your systems use. Documentation & Community A well-documented and easy-to-use distribution can save time and effort in setup and maintenance. Don’t forget about the valuable input and reviews from the community; research to see what your peers think about the distribution options. Options for OpenTelemetery Collector Distributions We've put together a list of common distributions of OpenTelemetry Collector for your consideration: OpenTelemetry Collector (Core/Contrib): This is the primary version provided by the OpenTelemetry community. It includes a basic set of components in the Core version and an expanded set in the Contrib version, which provides for extra receivers, processors, and exporters. AWS Distro for OpenTelemetry (ADOT): This Amazon Web Services distribution has been optimized for use in AWS environments. It includes specific enhancements and setups tailored for AWS and integrates with various AWS services and systems. Splunk Distribution of OpenTelemetry Collector: Optimized to work seamlessly with Splunk Observability Cloud and includes various improvements and extra features designed specifically for Splunk environments. Grafana Agent: The Grafana Agent, primarily designed for Prometheus, can send metrics and traces to Grafana Cloud and other backends using OpenTelemetry. Lightstep Distro for OpenTelemetry: Optimized for the Lightstep Observability platform and includes additions and optimizations for better integration and performance with the Lightstep suite. BindPlane Agent: The BindPlane Agent is a tool that can act as an agent, a gateway, or both. When used as an agent, the collector runs on the same host and collects telemetry. As a gateway, it collects telemetry from other agents and sends the data to its final destination. Building Your Collector Distribution The hypothetical scenario detailed within the webinar illustrates the process of building a custom OTel Collector using the OpenTelemetry Collector builder. The steps include downloading the OpenTelemetry Collector builder binary, refining the manifest file to remove unnecessary parts, and adding special connectors or processors for hotel data analytics. This method improves data collection accuracy and helps organizations get important insights to run operations better and satisfy customers. Critical Steps in Building a Custom Collector Distribution Install the builder Download the OTel Collector builder binary (ocb). Note: On Linux and OSX you will likely need to make the binary executable with chmod u+x ocb To check if the ocb is ready to be used, open your terminal and type ./ocb help. After pressing enter, you should see the help command output in your console. Step-by-step example of downloading and making executable on Linux; while also putting it in its own dedicated folder: You can also add this to your path, or create a symbolic link to it on the path if desired. Configure the initial manifest file The builder's manifest file is a yaml, and it acts as a blueprint to modify and compile all of the components you want to add to your Collector’s distribution — details like distribution name, description, and version. This step allows for precise integration of specialized functionalities needed for unique data collection scenarios. The dist map at the beginning of manifest contains tags to help you configure the code generation and compile process. These tags for dist are the same as the ocb command line flags. Here are the tags for the dist map: This is a friendly reminder that you can add custom values for distribution tags based on whether you want your custom Collector distribution to be available for others to use or if you're simply using the ocb for your component development and testing environment. All dist tags are optional and meant for customization. Testing and validation Before you deploy it, make sure to thoroughly test the custom collector in different situations to confirm that it works well. Testing will ensure that the collector meets performance benchmarks and captures telemetry data as intended. We will create a distribution for the Collector to help develop and test custom components. To get started, create a manifest file named builder-config.yaml with the following content: To customize the Collector distribution, start by adding modules, which are the specific components you want to include. For our development and testing collector distribution, we will add the following components: Exporters: OTLP and Debug Receivers: OTLP Processors: Batch Once you’ve added these components, the builder-config.yaml manifest file will reflect these changes: Create the Code and Establish your Collector’s distribution Just let the ocb do its job to ensure everything goes smoothly. Open your terminal and type the following command: ./ocb --config builder-config.yaml If the command runs successfully, the output should look like this: The folder otelcol-dev has been created in the dist section of your config file. It includes all the source code and the binary for your Collector’s distribution. The folder structure should look like this: You can use the code you generated to start your component development projects. This makes it easier for you to create and share your own collector distribution with your components. Automate Builds for Efficient Deployment Making builds and sending them out is important for making sure everything works well. Developers can streamline their build processes and ensure smooth and effective deployments by using tools like OpenTelemetry Collector Contrib, Manifest, and GoReleaser in a structured approach. The OpenTelemetry Collector Contrib offers a solid framework for building customized telemetry collectors. Manifest files serve as templates for defining crucial components. GoReleaser streamlines this process by automating the release of Go projects, minimizing the manual effort required for building and deploying applications. Staying Up-to-Date OpenTelemetry Releases Developers can access pre-configured workflows and templates by following the OpenTelemetry release process and using the OpenTelemetry releases repository. These resources make automating builds for different architectures easier, saving time and ensuring a standardized and reliable deployment process. Continuous Integration and Continuous Deployment (CI/CD) Deployed systems need to stay healthy and perform well. This requires using continuous integration and continuous deployment practices. These practices allow for regular updates, ensuring that builds always include the latest changes and enhancements. This helps to keep deployments secure and optimized. Tools like GitHub Actions automate builds and releases, improving overall efficiency. Why You Need Vulnerability Checks It is important to check for potential security risks before deploying software components to production. These checks are typically performed within the OpenTelemetry framework to ensure that custom components such as receivers and processors are secure and don't introduce vulnerabilities into the system. By addressing security concerns proactively, developers can prevent potential breaches and maintain a strong security posture. Integrating Vulnerability Checks into the Workflow To effectively check for vulnerabilities, adding a validation step to the development workflow is important. This step includes: Building the component: Make sure the custom component is correctly built and works. Checking for vulnerabilities: Scan the component for security issues. Meeting security standards: Verify that the component follows security guidelines before deployment. Automating Vulnerability Checks Developers can automate the scanning and validation processes by leveraging tools like GitHub Actions. This helps improve security by seamlessly integrating these processes into the deployment pipeline. Here’s how automation enhances security. Consistency: Automated processes make sure that every part undergoes the same strict security checks, reducing human errors and oversights. Efficiency: Simplified workflows speed up the scanning and validation processes, making deployments faster and more reliable. Proactive Security: Regular automated checks help find vulnerabilities early in the development cycle, allowing prompt fixes. Implementing Automated Vulnerability Checks 1. Set up automated actions: Configure your workflow to build components and perform vulnerability scans automatically. 2. Continuous monitoring: Ensure that your automation tools continuously monitor for vulnerabilities, even after deployment. 3. Regular updates: Keep the vulnerability scanning tools up to date with the latest security standards and threat intelligence to detect new vulnerabilities. Mastering the OpenTelemetry Collector empowers your organization to manage and optimize telemetry data efficiently. You can ensure a seamless and robust data collection process by customizing components, leveraging the OTel contrib distribution, and following best practices for setup, deployment, and security. Ready to take control of your telemetry data? Start building your custom OpenTelemetry Collector today and transform how you manage and analyze data.

How to Monitor SNMP with OpenTelemetry

Michelle Artreche — Tue, 09 Jul 2024 05:30:00 GMT

With observIQ’s contributions to OpenTelemetry, you can now use free, open-source tools to easily aggregate data across your entire infrastructure to any or multiple analysis tools. The easiest way to use the latest OpenTelemetry tools is with observIQ’s distribution of the OpenTelemetry collector. You can find it here. In this blog, we cover how to use OpenTelemetry to monitor SNMP. The SNMP receiver can ship metrics to many popular analysis tools, including Google Cloud, New Relic, OTLP, Grafana, and more. What is SNMP? SNMP is a network management protocol used to exchange data between network devices. There are three main versions of SNMP, all of which are supported by the SNMP OpenTelemetry receiver. The SNMP receiver is most often used to monitor local area devices on the same network, so important signals vary by what kinds of devices appear on the network. SNMP is different from other receivers because it requires more specific knowledge of the devices on the network and specific configurations for the metrics to be collected. Some data that can be collected from SNMP include: Network Data Processes Uptime Throughput Device Data Memory Usage CPU Usage Temperature Introduction to OpenTelemetry The OpenTelemetry project (OTel), incubated by the CNCF, is an open-source framework that standardizes the way observability data (metrics, logs, and traces) is gathered, processed, and exported. OTel focuses specifically on observability data and enables a vendor-agnostic pathway to nearly any backend for insight and analysis. Installing the SNMP Receiver for OpenTelemetry If you still need an OpenTelemetry Collector with the latest SNMP receiver installed, we recommend using the observIQ OpenTelemetry Collector distro, which includes the SNMP receiver and many others. You can easily install it with our one-line installer. Feel free to return to this blog after running the install command on your source. Configuring the SNMP Receiver in OpenTelemetry If you're using the observIQ Collector, you can find your OpenTelemetry configuration file in one of the following locations: For Linux: /opt/observiq-otel-collector/config.yaml For Windows: C:\Program Files\Google\Cloud Operations\Ops Agent\config\config.yaml Open the configuration file and add the SNMP receiver following the provided example. Remember that SNMP manager configurations vary, so your setup may differ. For detailed instructions, refer to the SNMP monitoring guide on GitHub. Please see the examples below. Viewing the SNMP Metrics Collected by OpenTelemetry The SNMP metrics will now be sent to your chosen destination by following the steps outlined above. If you encounter any issues, please check that all authentication fields are correct and ensure that your exporter has the intended endpoint. observIQ’s monitoring technology is a big improvement for organizations that care about performance and efficiency. If you use SNMP, our solutions can significantly improve your monitoring. Look out for our future posts and simplified configurations for various sources. Ready to enhance your network monitoring with OpenTelemetry? Download our OpenTelemetry distribution or sign up for a free trial.

How to Install and Configure an OpenTelemetry Collector

Joseph Howell — Tue, 04 Jun 2024 19:57:03 GMT

In the last 12 months, there’s been significant progress in the OpenTelemetry project, arriving in the form of contributions, stability, and adoption. As such, it felt like a good time to refresh this post and provide project newcomers with a short guide to get up and running quickly. In this post, I'll step through: A brief overview of OpenTelemetry and the OpenTelemetry Collector A simple guide to install, configure, and ship observability data to a back-end using the OpenTelemetry Collector OpenTelemetry: A Brief Overview What is OpenTelemetry? The OpenTelemetry project (“OTel”), incubated by the CNCF, is an open-source framework that standardizes the way observability data (metrics, logs, and traces) are gathered, processed, and exported. OTel squarely focuses on observability data and unlocks a vendor-agnostic pathway to nearly any back-end for insight and analysis. What is an OpenTelemetry Collector? The OpenTelemetry collector is a service responsible for ingesting, processing, and transmitting observability data. Data is shared between data sources, components, and back-ends with a standardized protocol known as the OpenTelemetry Protocol (“OTLP”). The collector can be installed locally as a traditional agent, deployed remotely as a collector, or as an aggregator, ingesting data from multiple collectors. Benefits of using an OpenTelemetry Collector OpenTelemetry offers open-source monitoring tools that gather telemetry data for understanding distributed systems and applications. It helps overcome challenges like using proprietary tools and non-standard configurations. The project promotes a vendor-neutral framework and has gained support from various organizations. It provides flexibility through its collector SDKs, integrations, and distributions, and enables the consolidation of different telemetry pipes into an observability pipeline. Related Content: OpenTelemetry in Production: A Primer observIQ's contributions to OpenTelemetry observIQ has made several significant contributions to the OpenTelemetry Project: In 2020, observIQ donated the open-source log agent Stanza to the project. This code was further developed and established as the core logging library for the OpenTelemetry Collector in 2023. observIQ has contributed to and improved over 40 Receivers, Processors, and Exporters for popular technologies such as Azure, CloudFlare, NGiNX, and Windows Events for the OpenTelemetry Collector. observIQ has made significant contributions to the development of Connectors, which are an important part of the OpenTelemetry collector, facilitating advanced routing and connection between metric, log, and trace pipelines within the collector’s configuration. observIQ played a key role in designing and implementing the OpenTelemetry Agent Management Protocol (OpAMP), which enables remote management of the OpenTelemetry collector. In 2023, observIQ launched BindPlane OP, a purpose-built observability pipeline for OpenTelemetry. What are the primary components of the OpenTelemetry collector? Receivers: ingest data into the collector Processor: enrich, reduce, and refine the data Exporters: export the data to another collector or back-end Connectors: connect two or more pipelines together Extensions: expand collector functionality in areas not directly related to data collection, processing, or transmission. You can link these components together to create a clear and understandable data pipeline for observability within the collector’s configuration. Collecting and Exporting Host Metrics and Logs Let's start by considering the basic but crucial task of monitoring the health and performance of a Linux host running any workload. This involves gathering and sending host metrics and logs to a back-end for visualization and analysis. How to Get Started You’ll need a Linux host with superuser privileges - any modern distribution will work. For this example, I’ve deployed a Debian 10 VM on GCE. You'll also need a backend. I've opted for Grafana Cloud because it has a free tier with a native OTLP endpoint for data ingestion, making the configuration easier. You'll need a Grafana Cloud , and . You can set this up by following this link (takes about 5 minutes). Installing the OpenTelemetry Collector Start by running the installation command on your host. Remember, you can subsitute ‘0.85.0’ with newer releases as they become available. Once complete, otelcol-contrib will be added and managed by systemd; the collector will start automatically. You’ll find the collector configuration file here: /etc/otelcol-contrib/config.yaml Related Content: How to Install and Configure an OpenTelemetry Collector Reviewing the Default Configuration If you’re already familiar with the default configuration, you can skip the Configuring the Collector section. The default config.yamll includes pre-configured (optional) components and a sample pipeline to better understand the syntax. Let’s quickly take a look at each section: cat /etc/otelcol-contrib/config.yaml Extensions health_check: exposes an HTTP endpoint with the collector status information pprof: exposes net/HTTP/pprof endpoint to investigate and profile the collector process zpages: exposes an HTTP endpoint for debugging the collector components Receivers otlp: ingests OTLP formatted data from an app/system or another OTel collector opencensus: ingests spans from OpenCensus instrumented applications. prometheus: ingests metrics in Prometheus format -- pre-configured to scrape the collector’s Prometheus endpoint zipkin: ingests trace data in Zipkin format jaeger: ingests trace data in Jaeger format Processors batch: transmits telemetry data in batches, instead of streaming each data point or event. Exporters logging: exports collector data to the console. Very useful for quickly determining your config is working Service service: (AKA, “the collector”) where pipelines are assembled. It’s important to know that a component won’t be enabled unless it’s been referenced here. pipelines: reference the receivers, processors, and exporters configured above. Some (but not all) components can be shared across pipelines, as seen in the example (otlp, batch, logging). extensions: here’s where you enable your extensions that you’ve configured above. Note: logging, is the third type of pipeline you can create, but has not been added to the default config Configuring the Collector Next, let's update the config: vim /etc/otelcol-contrib/config.yaml I followed these steps: I removed optional components (for clarity, totally optional) I configured the required components I constructed both a metrics and logs pipeline Here's the result (with comments): Once the config.yaml has been updated, restart the collector review the output in your console: If all is well, you’ll start to see activity like this in your console, indicating the collector has restarted and data is flowing successfully: Finding your Observability Data in Grafana Cloud To access your Grafana Cloud account, open Grafana and go to the Explore console. Grafana Cloud automatically maps and directs OTLP data to Prometheus, Loki, and Jaeger data sources for metrics, logs, and traces. Note: if you’re running a local instance of Grafana, use the Loki and Prometheus exporters in place of the otlp_http exporter. Finding your Metrics To view your metrics, choose the Prometheus data source linked to your OTLP access policy. The metric names are associated with the groups we specified in the configuration. Finding your Logs To view your logs, select the Loki data source associated with your OTLP access policy. Then select then set ‘exporter = OTLP’ as the label filter. And that’s it! You’ve successfully installed, configured, and shipped observability data to a back-end using the OpenTelemetry collector. From here, you can continue to customize your configuration, build dashboards, and create alerts. I'll dive deep into those topics in a future post. If you have any questions or feedback or would like to chat about OpenTelemetry and observability, feel free to contact us on the CNCF Slack. Also, remember to subscribe to our newsletter for more tips, updates, and insights on observability and telemetry.

How to Monitor Host Metrics with OpenTelemetry

Joe Howell — Fri, 24 May 2024 22:59:00 GMT

Today's environments often present the challenge of collecting data from various sources, such as multi-cloud, hybrid on-premises/cloud, or both. Each cloud provider has its own tools that send data to their respective telemetry platforms. OpenTelemetry can monitor cloud VMs, on-premises VMs, and bare metal systems and send all data to a unified monitoring platform. This applies across multiple operating systems and vendors. In this post, I'll walk you through installing and configuring the OpenTelemetry Collector and gathering and shipping host metrics to Google Cloud Operations from a Windows and Linux host. Pre-reqs: A Linux or Windows host running on GCE. I'm using Debian and a Windows Server 2022 image in this example. A backend ready to ship and analyze your telemetry data. For this example, I’m using Google Cloud Operations. If you choose Google Cloud Operations, you’ll need: Remember to set up a service account (and corresponding JSON key) in your project and assign the following roles to the service account Logs Writer Monitoring Admin To access the service accounts, obtain the corresponding JSON key file. Set the path to your JSON key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable on your MySQL host. You can do this using the following command: GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json" Depending on how your system is set up, other authentication methods may be available. Step 1: Install the collector On Linux: Download and install the package: Once complete, otelcol-contrib will be added and managed by config.yaml, and the collector will start automatically. You’ll find the collector configuration file here: /etc/otelcol-contrib/config.yaml On Windows: On your Windows host, you can download the latest Windows executable from the open telemetry-collector-releases repo on your host. Once downloaded, open the command prompt as an Administrator, and untar the executable using the following command: After extracting the executable, download or copy the config.yaml to the collector's root directory: For reference, here's the default config: Related Content: Rapid telemetry for Windows with OpenTelemetry and BindPlane OP Step 2: Configure the collector Next, update the config.yaml with the one I’ve provided below to use the collector components below for Windows and Linux. hostmetrics receiver resourcedetection processor googlecloud exporter Then restart the collector: On Linux: On Windows: Related Content: How to Install and Configure an OpenTelemetry Collector Step 3: viewing the metrics in Google Cloud Operations You should now be able to view the host metrics in the Metrics Explorer in Google Cloud. Metrics collected Conclusion And that’s it - you've successfully configured OpenTelemetry to send host metrics from both Windows and Linux VMs in Google Cloud. We've walked through the process of installing and setting up the OpenTelemetry Collector, as well as collecting and transmitting host metrics to Google Cloud Operations from Windows and Linux hosts. Start leveraging OpenTelemetry now to get valuable insights into your system's performance and health for optimal operation.

observIQ Earns Gartner® Nod for Cutting-Edge Observability Innovation

Michelle Artreche — Mon, 20 May 2024 16:58:00 GMT

observIQ provides a unified telemetry platform using open standards and a powerful agent to collect, enrich, and transmit data. Built on an open-source framework, OpenTelemetry, it focuses on log management, metrics, and traces for modern observability at scale. observIQ Featured in Gartner®'s 2023 Hype Cycle for Monitoring and Observability observIQ is thrilled to be recognized in Gartner®'s 2023 Hype Cycle for Monitoring and Observability report. This year, observIQ has been positioned as a Sample Vendor in the Distributed Tracing category. Our inclusion in the Distributed Tracing category highlights our strengths in providing end-to-end visibility into complex microservices environments. observIQ helps Dev and Ops teams quickly identify and solve performance issues across services, infrastructure, and code. observIQ uses strong tracing capabilities powered by OpenTelemetry, to offer customers a better understanding of transaction flows and system interdependencies. This helps solve issues faster and makes better decisions based on detailed tracing data. What Our Inclusion Means This acknowledgment demonstrates our strategic position and advanced work in the observability space. It aligns observIQ with other top players that are shaping the future of monitoring and observability. We are focused on delivering a powerful and flexible observability platform, and this focus is paying off. While we are honored by this distinction, we won't stop here. We will keep pushing boundaries and finding new ways to meet the changing challenges of monitoring and observability. Most importantly, we’ll continue our commitment to exceed customer expectations, showing that we aim to provide the best value by solving real-world observability challenges. Key Observability Industry Trends The 2023 Hype Cycle report gives important insights into the main trends shaping the observability industry. As organizations speed up their digital transformation journeys, they are understanding the importance of having complete observability across their technology systems. Several key trends highlighted in the report are directly relevant to observIQ customers looking to optimize monitoring and troubleshooting. Gartner® predicts that AIOps, which stands for artificial intelligence for IT operations, will continue to grow. AIOps is expected to be increasingly used to automate routine tasks and find insights in large amounts of data. This aligns with observIQ's focus on using machine learning to help businesses cut through the noise and identify anomalies. The shift towards unified observability platforms is growing. The report highlights the increasing need for these platforms to bring together insights from metrics, logs, and traces. observIQ achieves this through its flexible, scalable platform designed to collect all types of telemetry data. Cloud-native environments are becoming more complex as organizations shift to using technologies like containers and microservices. To keep track of performance and availability across these dynamic, distributed environments, they need strong observability. observIQ provides visibility into these complex ecosystems. Observability-driven development is becoming more popular among DevOps teams. Instead of adding observability after deployment, teams are now integrating it into the development process. With observIQ, developers can easily add code for logs, metrics, and traces from the beginning. To improve business success, focus on enhancing customer experience. Companies are now prioritizing monitoring customer journeys and touchpoints because digital experiences are crucial. observIQ can track front-end performance and identify issues that harm CX. observIQ's Innovations in Observability observIQ was created to help organizations achieve complete observability across their entire environment. Our solution provides advanced capabilities to make this vision a reality for enterprises globally. Our platform's core is an advanced analytics engine that enables real-time log analysis at incredible speeds and massive scales. We process and analyze log data from any source, uncovering insights and detecting anomalies with embedded machine-learning algorithms. This allows users to observe their systems end-to-end and troubleshoot issues quickly. We were early pioneers in delivering observability through OpenTelemetry, an open standard for collecting and exporting telemetry data. Our platform is designed to ingest OpenTelemetry data out of the box, providing a unified view across traces, metrics, and logs. This eliminates data silos and enables faster incident response.

Multi-Project Routing For Google Cloud

Dylan Myers — Fri, 10 May 2024 18:00:49 GMT

When sending data to Google Cloud, like logs, metrics, or traces, it can be beneficial to split the data up across multiple projects. This division may be necessary since each team has its own project, a central project is used for security audit logs, or for any other reason that your organization has. BindPlane has effective tools to manage this process. In this walkthrough, we will add fields to telemetry entries, allowing us to associate entries with a specific project and properly route them. Prerequisites BindPlane OP At least 2 Google Cloud projects that you want to split telemetry among Permission to create service accounts, either using Google IAM or Workload Identity Federation Criteria on how to split your telemetry, based on what is within the telemetry itself Getting Started To get started, we first need to establish the criteria for the different backend projects. For this blog, we will be monitoring three log files on a Fedora Linux VM: /var/log/messages, /var/log/secure, and /var/log/firewalld. We will be routing logs from /var/log/secure to one project, all audit logs from /var/log/messages to the secure project, everything from /var/log/firewalld to a different project, SELinux logs from /var/log/messages to the same project as the firewalld logs, and then everything else to the “default” project. The final configuration will look like this: Project dylan-alpha Default project for everything that doesn’t get routed elsewhere Project dylan-beta Audit level logs. /var/log/secure “audit:” and “audit[\d]:” pattern matching from /var/log/messages Project dylan-gamma Non-audit security type logs /var/log/firewalld “SELinux” pattern matching from /var/log/messages These projects are already set up and ready to go. Later in the blog, we will set up the credentials to allow cross-project data sending. Preparing Google Projects Now that I’ve defined my criteria and have projects, I need to prepare the Google projects. The first step is to set up a service account in project dylan-alpha, as it will function as my primary account. I have decided to grant it full permissions required for logs, metrics, and traces, despite the fact that I am currently only sending logs. You can review the credentials requirements here. Within IAM settings in your Google Cloud Console, navigate to Service Accounts. Now, click Create Service Account. For the name of the service account, I suggest using 'telemetry-input. After generating a service account, copy the generated email address for later use. Next, click on the menu consisting of three dots and then select the option Manage keys. When you reach the key management screen, you should generate a JSON key file and download it. Once the project is switched to dylan-beta, we navigate to the main IAM page and click Grant Access with the copied email address and service key. This should open a sidebar where you can grant access by entering an email address. Paste the copied address for the service account and assign the required permissions for your telemetry type(s). Finally, click save. To create more projects, simply follow the same steps that were taken for dylan-beta. Repeat the process as many times as necessary. Data Flowing To prepare our data for the Google Cloud Projects, we need to create a configuration in BindPlane and deploy an agent to our system with that configuration. To do this, click on the Configurations link on the top bar in BindPlane. Then, click the Create Configuration. On the next screen, give your configuration a name and choose the OS type you're using. For example, you could name it Multi-Project-Routing and choose Linux. By clicking the next button, we will be taken to a screen where we can add sources. To do this, we need to click on Add Source. As I want to monitor log files on my system, I will select 'File' from the resulting list. I will then input the path values as follows: /var/log/messages, /var/log/secure, and /var/log/firewalld. Additionally, under the 'Advanced' settings, I have enabled the options for creating attributes of the file name (which is on by default) and file path (which is off by default). After you have finished configuring the source, click on the Save button followed by the Next button. This will take you to the page where you need to define your destinations. Since this blog focuses on Google Cloud, we will be selecting the Google Cloud destination. To authenticate, we will be using the json authentication method. Open the JSON service key file that you downloaded earlier and copy its contents into the appropriate box. Don't forget to give this destination a name and enter "dylan-alpha" in the Project ID box. After configuring my desired settings on the Destination, I click Save twice, resulting in the creation of my configuration and pipeline as shown in the screenshot below. To install an agent, I first click on the Agents link on the top bar. This will take me to the Agents page where I can click on the Install Agent button. On the next page, I select Linux as my operating system and choose my preferred configuration from the drop-down menu. This generates a one-liner that I can use to install the agent. Once the installation is complete, I need to go back to the configuration and click the Start Rollout. This will deploy the configuration to the agent, and I should start receiving telemetry data. Getting Telemetry Where It Belongs Now that telemetry is flowing to Google from our agent, all is good. Right? Well, no. Right now, everything is flowing to the dylan-alpha project. To fix the issue, we need to go to the configuration page and add some processors to enhance the logs with metadata for multi-project routing. First, we need to click on the processor icon on the left side, which is closer to the source. We will use the Add Fields processor twice- once for routing to dylan-beta, and once for routing to dylan-gamma. Using conditionals, we can select the telemetry on which the processor operates. For the first processor, we set the conditional to: (attributes["log.file.path"] == "/var/log/secure") or (IsMatch(body, "^\\w+\\s+\\d+\\s+\\d{2}:\\d{2}:\\d{2}\\s+\\w+\\s+audit(?:\\[\\d+\\])?:.*$")). Under the Attributes section below, we add a new field named gcp.project.id, and set its value to dylan-beta. For the second processor, we do the same thing with a different conditional: (IsMatch(body, ".*SELinux.*")) or (attributes["log.file.path"] == "/var/log/firewalld"), and the value of the attribute is dylan-gamma. The completed processors can be seen in the screenshots below. After saving these processors, return to the main configuration page. Then, select the right-hand processor icon closer to the destination and add a Group By Attributes processor. Set the attribute field to gcp.project.id. This is everything that is needed to route the data to the correct destination projects. However, there’s one more step that should be taken. The “default” project should act as a safety measure for anything that is missing the metadata needed to route it to another project. Since all projects have some basic logs related to the project coming in, I use the Add Fields processor to add a new attribute called no_project with a value of true. The conditional for this processor is set to: (resource.attributes["gcp.project.id"] == nil). This allows me to search for telemetry from this agent that doesn’t have this project intentionally set. Save these processors, and click the Start Rollout button. Once the rollout is complete, and enough time has elapsed for new logs to have been transmitted, we can see that all three projects have the logs that belong to them. Conclusion It is possible to perform multi-project routing for the Google Cloud Destination by using just a few simple processors to enrich the logs with a special resource attribute. You can also apply these same techniques to other processors to either enrich or reduce your data for any purpose. This method is also effective when you are using Workload Identity Federation, although the credential steps will differ. We will cover the use of WIF to authenticate in place of a service account in a future blog post once we have added official support for it.

Monitoring vs Observability

Joe Howell — Tue, 07 May 2024 02:19:54 GMT

Monitoring vs Observability: What is Reality? Before we start, I have a confession: I absolutely love Digg (people are still Digging things, right?) errr...Reddit. It actually is my front page to the internet, where I research upgrades for my home lab/VR/other niche hobbies, watch silly videos, ingest low-effort memes, judge if people are ‘AHs’ or not on /r/amitheasshole, and occasionally talk trash to other Redditors about my Michigan-based sports teams. An aside: my 11th-grade AP English teacher wouldn’t have been happy with that run-on sentence. Sorry, Mr. Smith. But I also love Reddit because it’s a great place to understand the community’s honest feelings about a topic—providing real mild, medium, and hot takes on various subjects when you’re really curious (yes, while acknowledging its echo-chamberiness). For this post, it felt like the perfect place to see how DevOps, SREs, and IT Ops folks think about the terms Monitoring and Observability. Are Monitoring and Observability the Same Thing? Often, the terms ‘monitoring’ and ‘observability’ are used interchangeably, and for good reason: both methodologies aim to achieve the same result, more or less: keeping your business-critical systems and applications running efficiently and securely. In fact, depending on your source, each word can be found in the other term’s definition—not confusing at all. Observe ~= Watch ~= Monitor ~= Observe ~= Watch ~=... well, you get it. When Googling for a clear comparison, there are many differing, jargon-rich, scientific descriptions. This also isn’t a surprise, as ‘Observability’ is thought to have been first coined in the 1960s by Rudolf Kalman, a brilliant mathematician and engineer in his famous paper about control theory. Even as someone working in this space for more than 10 years, I find these comparisons hard to digest at a glance. If I slightly unfocus my eyes (using the technique I picked up as a kid “reading” magic eye books) it almost looks like there may be no difference at all. Admittedly, as a non-brilliant-engineer, once a definition drifts into ‘internal states’ and ‘external states,’ my regular human brain tells me it’s time to head back over to Reddit and return to my hobbies, memes, and news about new firmware updates (sidebar: I could write a blog series on my general excitement for firmware updates), and finding new ways to tweak the performance of my Plex server that yield no real-world benefits for any of my users. And now, since I’ve memed myself into browsing Reddit again, let’s see what some of my fellow Redditors think about Monitoring vs. Observability and, you know, actually proceed with this post. Monitoring vs. Observability: some perspective from Redditors Unsurprisingly, there’s some sentiment that observability is just a fancy marketing term for monitoring or just another case of semantics. User /u/SuperQue summarized this pretty well (with a bunch of upvotes, by the way): /u/teivah had a similar-ish take; perhaps Observability is more of a fashion term wrapped around the three key signals/pillars of observability: But while perusing, /u/Just_Defy described it in a way that I really liked: If the digg button still existed, I absolutely would have absolutely smashed it. But since it doesn’t, I’ll add my upvote and co-opt this idea for the rest of this blog instead. Thanks /u/Just_Defy! Now, let’s jump into the why and bring this post home. Related Content: Understanding Observability: The Key to Effective System Monitoring Defining Monitoring and Observability What is Monitoring? Webster’s dictionary defines love as, err, I mean monitoring as: to watch, keep track of, or usually check for a special purpose. This definition tracks with how I think about it in DevOps/IT Ops/SRE land: Monitoring is the act of watching key signals to understand the state of a system or application. You can monitor metrics. You can monitor logs. You can monitor traces. You can monitor events. You can monitor profiles. You can monitor transactions. You can monitor flim flams, jub jubs, or any new signal that paints a clearer picture of your system's overall state. Each type of signal (well, maybe with the exception of jubs jubs) offers useful context about the overall state of your system but in most cases, doesn’t include enough information to pain a complete picture on its own. What is Observability? Similar to /u/Just_Defy's definition, I like to think about observability this way: Observability is the ability of a system or application to be easily understood. This means that your system application needs to expose information to understand what’s going on when it’s running, offline, or somewhere in between—enough to understand the unknown unknowns, enough to breach the 'easy' threshold. I’d argue that observability doesn’t require a pre-defined set of signals or pillars (googling around, it seems the number of ‘pillars’ of observability may be growing, or there’s an additional set of pillars to stack on the current ones), but rather just information for your team to move quickly and efficiently. Observability Litmus Test Much like software quality, it can be difficult to measure and judge whether a system has achieved the ‘observable’ gold star. I think of it more like Agile software development or DevOps. Observability is more of a methodology where you gather signals you think are important and continuously iterate. It’s also a bit of a gut check—“we’re solving issues in production efficiently.” I always trust my gut, of course, unless I decide to head to Arby's for lunch. Observability Criteria Here are a few criteria that help you figure out if you’re on the path to observability: You don’t need to deploy any new tools or code to completely understand an incident that occurs. You’re able to understand failures in a timely manner. If you’re saying “shit shit shit” for more than 2 hours, there’s probably work to do. You’re able to reason about a system’s state from a centralized location (and not 5 different tools). Challenges to Implementing Observability Though observability reemerged more than 5 years ago, there have certainly been challenges preventing teams from realizing its benefits. In the 2024 Observability Pulse Report, the average MTTR actually increased despite the promises of benefits and clarity that an observable system can provide. Non scientifically, I see a few items contributing to this: Telemetry data is still split across multiple SIEM and observability tools and backends. This slows down correlation/causation analysis and adds time to incident resolution. Telemetry data is still collected with different agents/collectors in different structures/formats. This makes it more difficult to reason about and derive meaningful insights when it arrives for analysis. The volume of telemetry data continues to grow, forcing organizations to make hard choices about their data, slowing query times, and generally making it more difficult to manipulate and analyze. Observability: Brought to you by OpenTelemetry Though we’re already starting to see references to observability 2.0 or the next generation of observability, personally, I’m not sure I’m quite ready for it. I’d make the case that we’re just now starting to arrive at a point where organizations can implement observable systems and applications with the help of OpenTelemetry. Related Content: What is OpenTelemetry? OTel standardizes how data is collected, formatted, and exported - and allows for connecting these signals together with context, all with a single set of tools. This is a critical piece of creating an observable system that didn’t exist before the project's inception. We're also seeing platforms further their efforts to natively support OpenTelemetry. Honeycomb is a leader here, but Splunk, Google, Grafana, and many more have GA'd native support for OTLP and are progressing with their native support for OTLP and consolidation of tools. OTel: The Building Block of Telemetry Pipelines OpenTelemetry also has the added benefit of being the perfect building block for telemetry pipelines. Telemetry pipelines, like BindPlane OP, allow organizations to gather, reduce, and refine all the telemetry required to build an observable system or application - and the controls to make it meaningful. In fact, there's probably a case to be made that a telemetry pipeline itself may be a pillar of observability, perhaps the most important. More on that thought in a different post, though. Monitoring vs. Observability: Are they different? Does it matter? The terms mean different things, but honestly, I don’t think precision in the vernacular really matters all that much, day to day. What matters is that your team understands the terms at a high level, why they're important, and has enough information to keep your systems running and figure out “why” if not. I suppose I could have opened with this. If you have any questions about observability, monitoring, OpenTelemtery, or BindPlane, contact our team at info@observiq.com.

How to Monitor SQL Server with OpenTelemetry

Daniel Kuiper — Mon, 15 Apr 2024 21:27:00 GMT

At observIQ, we've seen steady interest in observing the health of Windows systems and applications using OpenTelemetry. Requests on the SQL Server receiver continue to garner significant interest, so let's start there. Below are steps to get up and running quickly with the contrib distribution of the OpenTelemetry collector. We'll be collecting and shipping SQL Server metrics to a popular backend, Google Cloud. What is OpenTelemetry? OpenTelemetry “OTel” is a robust and comprehensive telemetry framework designed to capture, process, and transmit telemetry data such as distributed traces, metrics, and logs from your systems to an observability or SIEM backend for analysis. OpenTelemetry's Core Components As a quick primer, the OTel collector has a few primary components that facilitate data collection, processing, and transmission of the above signals. Here’s a quick breakdown: OpenTelemetry Collector: a lightweight data collector that can be deployed as an on-host agent or as a gateway for other collectors--shipping data to one or many configured destinations. The collector has a few primary components: Receivers: collect telemetry from a specific application or system (like SQL Server) or another OpenTelemetry collector via OTLP. Processors: transform the data by providing the levers to enrich, filter, mask (PII), and other data refinement techniques. Advanced users can utilize OTTL to do really interesting transformations. Exporters: transmit telemetry to another destination: another collector, to file, to an observability/SIEM backend These components can be chained together as a logical pipeline in the collector’s configuration file, mirroring the end-to-end flow of a telemetry pipeline. Next, let’s jump into some of the key SQL Server signals. OpenTelemetry Components to Monitor SQL Server Here are a few of the key OpenTelemetry components you can use to monitor your instance: sqlserver receiver collects SQL Server database/instance metrics hostmetrics receiver collects operating system and specific process metrics windowseventlog captures, parses, and ships Windows Events in a standardized way Which Signals Matter for Monitoring SQL Server? Here’s a short list of signals to consider when implementing SQL Server monitoring in your environment: Cache Hit Ratio Monitors how quickly requests are being handled from memory. If the ratio is lower, SQL Server may need more memory allocated. Transaction Write Rate Monitors the rate of transactions in the database. This provides a valuable context for overall activity in the database, bottlenecks, and over-utilization. User Connections Monitors active user connections in the database. Page Split Rates Monitors the rate of page splits, which occur when there’s insufficient space in an index. Excess page splitting can cause excessive disk I/O and decrease performance over time, and it is incredibly impactful in clustered environments. Lock Wait Rates Monitors the rate lock waits, occurring when a transaction needs to access another transaction's resource. Monitoring lock waits can help identify blocking and deadlocking issues, which can severely impact transaction performance. Log File Size and Growth Monitoring the log file volume and growth can prevent space issues and provide more insight into transaction volume and indicators to increase transaction volume size. OS/Process Metrics Monitor SQL server process consumption on a Windows host. Monitor OS consumption metrics to understand the Windows host’s overall health. Windows Events Monitors application, system, and security events related to SQL Server. It'd help provide context to help with root cause analysis. Conveniently, all these signals (and more) can be gathered with OpenTelemetry. Related Content: How to Monitor MySQL with OpenTelemetry Setting Up the OTel Collector to Monitor SQL Server Prerequisites Access to Windows Host running SQL Server (2012 R2 or later) Download the most recent 'otelcol-contrib' tarball for Windows from the releases linked in the getting started docs. This package includes all the components we need to step through this example. Extract the tarball after you download it. I’d recommend downloading 7-zip, or you can use Windows's relatively new tar PowerShell command: Have a backend ready to ship and analyze your monitoring data. For this example, I’m using Google Cloud Operations, a destination I’ve frequently used. To send to Google Cloud Operations, you'll need a service account in your project with the following roles: Logs Writer Monitoring Admin Create a new json key for your service account and copy it into a file for the collector to access Set the full path to your JSON key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable. Depending on your setup, there are other options available too! Configuring the SQL Server receiver In a Windows environment, you must first manually create a configuration file in the collector’s directory. This file provides instructions to the collector, calling specific components you’ve identified in your pipeline. First, let's start by adding a SQL Server receiver to your pipeline: Configuring the Host Metric receiver Next, add the Host Metric Receiver to our configuration, which is configured to gather cpu and memory metrics: Configuring the Windows Events receiver After that, add the Windows Events Receiver to our configuration, configuring it to collect application, system, and security events. Configuring the Google Cloud exporter Lastly, let's add the Google Cloud Exporter to our collector configuration. It will utilize the credentials/environment variable set in the prerequisite steps. Configuring your Pipeline Now, we can assemble our pipeline in the collector configuration, referencing the components we've added above. In this example, our pipeline will include metrics and logs as we're gathering both types of signals: Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane OP Running the OTel collector Run the collector binary by specifying the path to the configuration you just created, like below. Make sure to run as admin to collect all performance counters needed for metrics. Viewing the metrics collected If you followed the steps detailed above, the following SQL Server metrics will be available in your google cloud metrics explorer. By selecting the optimal backend, designing intuitive dashboards, and configuring intelligent alerts, you're not just envisioning a more efficient SQL Server environment — you're making it happen. Need assistance, have requests, or suggestions? Feel free to contact our support team at support@observIQ.com.

Enhancing Data Ingestion: OpenTelemetry & Linux CLI Tools Mastery

Dylan Myers — Thu, 11 Apr 2024 18:26:57 GMT

While OpenTelemetry (OTel) supports a wide variety of data sources and is constantly evolving to add more, there are still many data sources for which no receiver exists. Thankfully, OTel contains receivers that accept raw data over a TCP or UDP connection. This blog unveils how to leverage Linux Command Line Interface (CLI) tools, creating efficient data pipelines for ingestion through OTel's TCP receiver. Prerequisites One or more data sources that OTel does not natively support This will currently only work for logs or metrics that can be ingested as logs and converted to metrics in the OTel pipeline. One or more systems with the BindPlane OpenTelemetry Collector on which we will apply a configuration that includes a TCP Source Appropriate firewall rules for the chosen TCP port to allow incoming connections It can be a Gateway set of collectors behind a load balancer. A Linux system that either has or can have added all the CLI tools needed for a particular use case Some commonly used tools are netcat (aka nc, this one is required!), curl, jq, awk, grep, head, tail, date, and cut. Leveraging Linux (Or UNIX) CLI for Optimization Linux command line (CLI) utilities are the tools of choice for processing and transmitting data. I will illustrate the required tools (netcat, jq) and several additional tools that can help format the data properly before sending it. Netcat The first tool we will look at is called Netcat. Most Linux distributions have the actual binary in the short form of nc. Netcat is a tool that lets you read from and write to network interfaces. We will use it to write to a port the collector listens on using the TCP Source. The syntax we will be using is straightforward: command chain | nc localhost 7777 The above is an abbreviated sample with a command chain that ends with piping the output to Netcat. Netcat is sending it to localhost on port 7777. These are simply what I have chosen. jq Another tool that isn’t required for every data set but is for many of them is called jq. This tool is a lightweight JSON processor. The most common use for jq is to print JSON files pretty. However, for our use case, we will be using it to format the data so that each single record within the JSON is on one line. For example, we can format data that is in pretty print already back to one record per line and pipe it to Netcat as discussed above: command chain | jq -c '.[]' | nc localhost 7777 This will take a command chain that gets the data, such as using curl against an API endpoint, pipe it to jq for formatting, and then pipes it to netcat to send to the collector. Combining Tools for Enhanced Ingestion Other tools can be handy. For the sake of brevity, these will be very basic examples. I will also not cover every single possible tool. Linux has so many great CLI utilities that can be beneficial to data manipulation that a 400+ page book would be required to cover them all in all possible iterations thoroughly. Head and Tail First, a quick look at the head and tail utilities. Head is a tool that returns several lines from the start of a file (or data stream). If you specify a number prefixed with a - symbol, it will instead trim that many lines off the end of the output. Tail is the opposite in that it returns lines from the end of a file. If you pass it a number prefixed with a + symbol, it will start from that file line (or data stream) instead. This is useful for removing a file's header and/or footer. For example, to remove a 5-line footer and a 25-line header from a JSON file: head -n -5 sample.json | tail -n +26 | jq -c ‘.[]’ | nc localhostIn the example, we specify -5 for head to drop the last five lines of the file. Then, we specify +26 to tail to drop the file's first 25 lines (we specify the line we want to start on). Finally, we pass it on to jq and netcat as in the previous examples. Date If our data doesn’t have a timestamp on it, we can manually add one. The OTel collector can also do this via its “observedTimestamp” field. However, there are often reasons to add a timestamp before reaching the collector. I’ll be using echo and xargs in this contrived example. command chain | xargs -I{} echo "$(date +%FT%TZ) {}" | nc localhost 7777 This time, we have a command chain to gather the data. Then we iterate through each line of the data using xargs and use echo to prefix them with a date in the format of %FT%TZ, which can also be illustrated as YYYY-MM-DDTHH:MM:SSZ and looks like this: 2024-03-27T12:02:56Z. After prepending the data with the date (what a mouthful), we will send it to our collector via Netcat again. Curl Curl is a common tool for retrieval of data from a remote server. This can be used to retrieve data from an API. Our example here is very straightforward: curl remote.host.test:5580 | jq -c '.[]' | nc localhost 7777 As we’ve just built on our previous jq example using curl as the “command chain.” Curl reaches out, gets the data, pipes it to jq, which formats it, and finally pipes it to Netcat to be sent to the collector. Data Ingestion Via TCP Source In the previous section, I referred several times to sending the data to the collector. Our configuration to support this is a simple TCP logs receiver. I’ve configured it to listen on all interfaces and port 7777. Related Content: How to Manage Sensitive Log Data An End-to-End Example: NetFlow Data We have discussed the theory and general principle of using Linux CLI tools to massage our data into a format we can efficiently work with inside the OTel Pipeline. Now, I would like to use those principles to build a real-world example. A requirement of this example is a utility called glowflow2. I have an open request to turn goflow2 into an actual receiver (source); however, until that exists, I still need to ingest my NetFlow, sFlow, and IPFIX data. My command line looks like this: goflow2 | nc localhost 7777 Which uses the default ports of 2055 for NetFlow and IPFIX and 6343 for sFlow. It then pipes it to Netcat, connecting to localhost 7777, where the above TCP source is already running. $ goflow2 | nc localhost 7777 INFO[0000] starting GoFlow2 INFO[0000] starting collection blocking=false count=1 hostname= port=6343 queue_size=1000000 scheme=sflow workers=2 INFO[0000] starting collection blocking=false count=1 hostname= port=2055 queue_size=1000000 scheme=netflow workers=2 Now, I’ve edited the TCP source to make a few changes. First, I changed the Log Type to “goflow2”. The timestamp parsing was already set to Epoch and ns (nanoseconds) in preparation for this. Lastly, I added a short description: “goflow2 NetFlow/sFlow/IPFIX”. For testing purposes, I’m sending to a custom “nop” destination (No Output/No Operation) that drops the data into the ether. My pipeline currently looks like this: The next step involves connecting up a data source on the input side of goflow2. For the purpose of this blog, I’m using a data generation tool called Flowalyzer that supports NetFlow v5, NetFlow v9, and IPFIX. It is a Windows tool, but lucky for me, it runs just fine under Linux using Wine! I’ve configured it to send NetFlow v9 data. This is what it looks like: After clicking “Start” and letting it run for a bit, I have the following data flow and recent telemetry: Using Processors to Reduce, Enhance, or Convert Data Now that we have data flowing, we should take advantage of the data reduction, enhancement, and conversion capabilities offered by BindPlane. Right away, after looking at the data, I see an easy reduction option. In my test data, as_path, bgp_communities, and bgp_next_hop are all empty maps or strings. I only want data if it isn’t empty, so I will put in a Delete Empty Values processor to filter those out. Before Processor. After Processor. Under attributes is a bunch of net information irrelevant to our actual log. It is the network information related to the connection from Netcat to OTel’s TCP source. Thus, I will drop those by using the Delete Fields processor. Before and after of the Attributes. Making these two simple reductions reduces our data by about 20-25% with the sample data I’m generating with Flowalyzer. With some creativity and the right set of tools, logs from almost any source can be piped into an OTel collector using the TCP source. Those logs can be parsed and manipulated just like any other source. Many tools are useful, especially on Linux. However, as shown, some Windows tools can be used as well. If you’re struggling with a set of data from a strange source, have a sit down with your local Linux guru, this blog, and get creative! I’m confident a solution can be found, just as I did with my NetFlow use case. Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane OP As I wrote my conclusion, I noticed that my NetFlow data had stopped coming through. A quick investigation reminded me that Netcat doesn’t handle dropped connections well. A collector restart for my config change had caused a “broken pipe” (Ncat: Broken pipe.). So, I’ve made use of Linux capabilities a little further and solved it by wrapping the Netcat portion in an infinite loop that keeps restarting it every time it shuts itself down: goflow2 | while true; do nc localhost 7777; done Just to reinforce that there is always a way. Okay, maybe not always, but almost always.

Gateways and BindPlane

Dylan Myers — Tue, 02 Apr 2024 17:37:00 GMT

The BindPlane Agent is a flexible tool that can be run as an agent, a gateway, or both. As an agent, the collector will be running on the same host it's collecting telemetry from, while a gateway will collect telemetry from other agents and forward the data to their final destination. Here are a few of the reasons you might want to consider inserting Gateways into your pipelines: Collection nodes do not have access to the final destination Limiting credentials for the final destination to a small subset, the gateways, of systems to reduce vulnerability of exposure If the gateways are on a cloud instance, ability to use instance level credentials - such as for Google Cloud destinations Offloading data processing (parsing) to the gateways to prevent overloading of collection nodes Apply universal parsing to data streams coming from disparate devices Provide correlation to those data streams, such as trace sampling Today, we will examine these reasons and some possible architectures for implementing gateways. We will also review how they appear in BindPlane. Prerequisites BindPlane OP Several BindPlane agents used as edge collectors One or more BindPlane agents used as gateway(s) A final destination, such as Google Cloud Logging/Monitoring/Trace (Optional) Load balancer for the gateways if using more than one Starting Point As a starting point, I am using the OpenTelemetry microservices demo running on GKE. In addition to this, I’ve created both a deployment and a daemonset of the BindPlane agent in the same cluster. For configuration, I have the generic OTel collector from the demo forwarding all data to my BindPlane daemonset. This is a sort of gateway in and of itself, but is only needed because I am not managing the embedded collector from the demo with BindPlane. The daemonset configuration consists of a Kubernetes Container source, a Kubernetes Kubelet Source, and an OTLP source. The OTLP source is the endpoint for the data from the embedded generic collector. And here's the configuration for BindPlane OP. Moving To Single Node Gateway Model We’re going to add a BindPlane agent into the pipeline as a gateway. Here is what the final architecture will look like. In order to convert this setup from direct to destination to an gateway model, I start by copying the configuration. Once I’ve created the duplicate configuration, I edit it to remove all the processors and replace all the destinations with a single OTLP destination. The processor removal isn’t required. However, I am doing it to illustrate the ability to offload such processing from the edge nodes to the gateway(s). Typically, gateway nodes are dedicated systems that are well-provisioned and do nothing else. Due to those higher resources dedicated entirely to the gateway agent, performing all processing on them is often desirable. This has the added benefit of simplifying the configurations present on the edge nodes. In the above screenshots, we are exporting to a single gateway on the IP 10.128.15.205. This gateway is configured with an OTLP source, the destinations previously configured on the pods, and also the processors that we removed from the pods. Using this model, we have successfully offloaded both credentials and processors from the edge nodes. This reduces our vulnerability of credential exposure by having them present on only a single system. It also reduces the workload on the k8s pods of our edge nodes. For simplicity and brevity, I showed the configuration of a single node gateway in this section. However, I did not apply these configurations and start the data flow. I will show the data flow at the end of the entire blog. Related Content: Configuration Management in BindPlane OP Moving To Multi-Node Gateway Model The multi-node gateway model is the same as a single node, with the exception of adding a load balancer and more nodes running the gateway configuration. Moving to this model can be done directly from edge node or single gateway node models. Since I wanted to show both models in this blog, I am moving from the previously demonstrated gateway model. The gateway configuration does not need any changes. It just needs to be applied to one or more additional nodes. In my case, I have a 3-node set. In front of them sits a load balancer forwarding port 4317, the GRPC OTLP port. A single minor change does need to be made to the edge configuration. This will replace the IP of the single node, 10.128.15.205, with the ip of the load balancer, 10.128.15.208. Related Content: A Step-by-Step Guide to Standardizing Telemetry with the BindPlane Observability Pipeline Verifying Data Flow Now that everything is set up and running, we can check one of our gateway nodes to validate that data is flowing. From the above screenshot, we can see that telemetry is flowing through our pipeline. We can toggle between logs, metrics, and traces to validate we’re seeing all three signals. For a final validation, we could also check our destinations. I’ll skip that today, as it has been covered in several previous posts. Next Steps Now that we have shifted final destinations and data processing to a gateway set, we could add additional processing to the gateway configuration. Any time a destination change is needed, it will only affect these few nodes. The new configuration for such a change could be rolled out very fast. Additional data inputs could be added to this configuration for direct-to-gateway sources such as syslog, raw tcp logs and metrics, and native OTLP trace instrumented applications. This sort of change would further offload work from your edge nodes. Architectures Today, we’ve examined two simple architectures and discussed ways to enhance them. However, other architectures for gateway exist. Touching on these briefly, with the two we examined today at the top of the list, we have: Single node gateway - ideal for small environments with a limited number of edge nodes Multi-node load balanced gateway set - scalable and ideal for most enterprise environments Multi-layer, multi-node gateway sets - For very large enterprise environments Has a gateway set per data center, region, or other division point These initial gateway sets will perform data processing, offloading The destination for these gateways will be a final load balanced gateway set that performs the transmission to the final destination(s) Ideally, in this large environment, the final gateway set will be distributed across multiple locations, and an intelligent load balancer will sit in front, directing traffic to the closest healthy nodes. Offers the most redundancy and data safety Traffic director gateway sets - For data segregation This could be a multi-layer gateway An initial gateway set figures out where traffic belongs and forwards that traffic Each directed traffic destination can go directly to the final destination or a destination gateway set in the multi-layer style above as appropriate for the volume of traffic There are likely other setups that I have yet to consider or think of, but these are the ones we see most frequently. Conclusion BindPlane provides users with a robust data management environment, and gateways are one of the most important tools in the arsenal. As seen today, there are many ways in which gateways can help protect your credentials, correlate, process, and route your data. Gateways provide much-needed flexibility to your data pipeline, especially when combined with other tools, creativity, and intelligent deployment strategies.

How to Manage Sensitive Log Data

Joe Howell — Fri, 29 Mar 2024 20:41:59 GMT

According to Statista, the total number of data breaches reached an all-time high of 3,205 in 2023, affecting more than 350 million individuals worldwide. These breaches primarily occurred in the Healthcare, Financial Services, Manufacturing, Professional Services, and Technology sectors. The mishandling of sensitive log data provides an on-ramp to various cyber-attacks. Compromised credentials, malicious insiders, phishing, and ransomware attacks can all be initiated with sensitive data stored in log files. As an organization, keeping sensitive logs safe is extremely critical. If you don't, it can have heavy consequences that impact your business and the people you work with, like partners, customers, and employees. Plus, as application and system architecture continue to grow, so does the amount of sensitive log data. Finding a good way to handle all this data securely is essential. In this post, I’ll guide you through: The risks of logging sensitive data Applicable regulations and standards Best practices for managing log data Tips to achieve logging compliance Tools to manage sensitive log data What is Sensitive Log Data? Sensitive log data refers to any information captured in log files that could potentially cause harm to an organization if exposed to an unauthorized party. Some of the key categories include: Personally Identifiable Information (PII) includes names, addresses, social security numbers, phone numbers, and email addresses. Financial Information: Credit card numbers, bank account numbers, or other financial transaction details. Protected Health Information (PHI): Medical records, health insurance information, or any data related to an individual's health status. Authentication Information: usernames, passwords, API keys, or session tokens that could be used to gain unauthorized access to systems or applications. Confidential Business Information: Trade secrets, proprietary algorithms, intellectual property, and other financial information. Encryption keys: Any keys used to encrypt or decrypt sensitive data, which could compromise the security of the encrypted information if exposed. Sensitive URLs or API endpoints: Log entries that reveal sensitive API paths or contain sensitive data within the URL structure. Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane OP What are the Risks of Logging Sensitive Data? When you don't handle sensitive data logs carefully, it can lead to serious problems for both organizations and individuals. Here's what could happen: Data Breaches Suppose someone gets into your sensitive logs without permission; they could access PII, financial data, and intellectual property. Breaches like this could have an enormous monetary impact on your business and hurt the trust of your customers, partners, and employees. For example, a report by Ponemon in 2023 found that the average cost of a data breach worldwide was about 4.5 million dollars, and nearly 9.5 million dollars in the United States alone. Compliance Violations Sensitive data exposure may violate laws and regulations such as GDPR, HIPAA, and SOX. Breaking these rules can result in major fines, ranging from thousands to millions of dollars, depending on how bad the breach was. For instance, since 2003, the Office of Civil Rights has settled or imposed penalties for HIPAA violations totaling more than $140 million dollars in about 140 cases. Reputational Damage If your data leaks, people will lose trust in your organization, harming your public image and competitive advantage. Another way to think about it is one and done. In fact, a report from IDC in 2017 showed that 80% of consumers in developed countries will jump ship from a business that accidentally exposed their PII. So, it’s crucial to handle sensitive data logs with care to avoid these kinds of problems. What Laws Apply to Sensitive Log Data and Logging Compliance? Several key pieces of legislation have been passed-- implementing frameworks that directly affect how sensitive log data is managed. These frameworks require businesses to comply with specific requirements or face significant penalties. Laws and Regulatory Frameworks The General Data Protection Regulation (GDPR - EU): mandates the protection of personal information through data minimization, retention limits, access controls, and encryption. The Health Insurance Portability and Accountability Act (HIPPA - US): establishes requirements for access control, audit controls, data integrity, authentication, encryption, and retention of ePHI. The Sarbanes-Oxley Act (SOX - US): mandates the maintenance and retention of audit trails and logs that could affect financial reporting and compliance. The Federal Information Security Management Act (FISMA - US): requires that federal agencies develop, document, and implement an information security and protection program. The Gramm-Leach-Bliley Act (GLBA - US) mandates that financial institutions protect confidential consumer information, including logs and audit trails. GLBA Please note that there are many state and provincial regulations to understand and consider as well. What are the Standards that Apply to Sensitive Log Data? In addition to laws and regulatory frameworks, several security standards help guide organizations with specific requirements for managing sensitive data. Compliance with these standards is not legally mandated, but vendors often require proof of certification, demonstrating a commitment to security and proper controls. Here is a list of some of the most notable standards: The Payment Card Industry Data Security Standard (PCI DSS): is an internationally agreed upon standard put forth by the ‘big 4’ credit card companies. Often confused as a federal regulation, it prohibits logging full credit card numbers and CVV codes and requires masking, encryption, and tight access controls. Service Organization Control 2 (SOC 2): is a set of standards specifically tailored for companies that store customer data in the Cloud. It’s built around five “Trust Service Criteria”: Security, Availability, Processing Integrity, Confidentiality, and Privacy. ISO/IEC 27001 is a widely recognized international standard for implementing an Information Security Management System (ISMS). It includes requirements for maintaining logs and performing regular log reviews. National Institute of Standards and Technology (NIST SP 800-92): These guidelines provide guidelines related to secure log management and sensitivity considerations for sensitive log data. Logging Compliance vs. Certification A quick note about compliance vs. certification -- as these terms are occasionally used interchangeably (incorrectly). In this context, Compliance refers to adhering to regulatory requirements or optional standards mentioned above. Certification means stepping through a defined process where a third party validates your implementation and issues a certification. For example, for HIPAA, you can acquire a certification, but only compliance is mandated by law. Although certification tends to be optional, it is a useful way to ‘check the boxes’ and ensure your organization is aligned correctly. Next, let’s move down a level and discuss some best practices for managing sensitive data and tips for achieving logging compliance. Related Content: Reducing Log Volume with Log-based Metrics Best Practices for Managing Sensitive Log Data These best practices apply to the regulations and standards mentioned above. Map Sensitive Log Data: create a map of your sensitive log data identifying its source, location, and sensitivity level; corresponding to the regulations and standards that pertain to your business. Encryption: ensure all log data is encrypted during transmission and at rest with TLS or mTLS depending on where it's being delivered. Isolate Sensitive Log Data: tag and route sensitive log data to separate buckets, indexes, or back-ends/analysis tools. This makes it easier to apply different levels of monitoring and analysis, identify anomalous behavior, and mitigate the risk of exposing sensitive information. Implement role-based access controls (RBAC): utilize role-based access controls to limit exposure to sensitive log data to only the necessary teams or tools that need it. Within each role, implement minimized, least-privileged access. Structure your Logs: Adding standardized structure to your data makes detecting and identifying sensitive information easier. These clear structures also make log output easier for your software developers. Obfuscate Sensitive Log Data in Code: Tokenize and mask sensitive information like usernames, passwords, IDs, and credit card numbers before it is logged. Obfuscate Sensitive Log Data before Analysis: most observability and SIEM tools build-in functionality to filter, mask, and route sensitive log data. If it isn’t handled in code, use the tools available in your observability Code reviews: make data sensitivity checks included in regularly scheduled code reviews. Leverage Automated Tooling: utilize automated tooling to detect potentially sensitive information in your code. Github, for example, can automatically scan and notify and potentially notify on the existence of sensitive data-exposing patterns with secret scanning patterns. Log Testing and Validation: on the backside, include manual checks to identify unencrypted or sensitive data in logs as part of regression and/or acceptance testing. Retain Audit Logs Securely (longer than required, but not forever): Regulations dictate specific requirements for how long audit logs must be stored. But for many scenarios, audit logs provide useful context to investigate anomalous behavior and compromised systems. In some instances, systems are compromised for years without detection. Having audit data available that goes 6-12 months can be useful. Focused Training: Make a concerted effort to train your employees on the importance of managing sensitive data at regular intervals. Tips for Logging Compliance and Certification Following the logging security best practices are a good way to ensure compliance, but it’s not a comprehensive list. When talking to customers, we generally recommend the following: Research the appropriate legislation and regulatory requirements that directly impact your organization's applications and systems Align to one or many of the standards outlined above that apply to the domain of your business or application, using standards as a blueprint to help guide your teams with specific actions and tasks. Seek certifications, even if they’re optional. HIPAA, SOC2, and PCI DSS have optional certifications but eliminate risks and reduce hurdles as potential customers move forward in the buying process. Start your compliance and certification efforts sooner, rather than later. For SOC 2 compliance, it often takes 6-12 months to complete a 3rd party assessment. Use compliant observability and SIEM tools with required certifications to analyze your data Use a telemetry pipeline to centralize the management of your sensitive log data. Using a Telemetry Pipeline to Manage Your Sensitive Log Data Lastly, a quick note on telemetry pipelines. One of the challenges of managing sensitive log data is pulling together and analyzing disparate streams of log data to ensure their compliance. Sensitive log data can be sourced from different proprietary agents, arriving with inconsistent structure -- with a lack of a centralized management plane, makes it significantly more complex to mask and route data appropriately. The Case for OpenTelemetry OpenTelemetry is an open framework that provides a standardized set of components to collect, process, and transmit log data to one or many destinations. Its primary goals are 1. Vendor Agnosticism and 2. Data Ownership, giving its practitioners maximum control of their log data. In the context of sensitive log data, it provides the functionality to split and isolate low and high-sensitivity log data streams, as well as filter, mask, and later re-hydrate audit log data for further analysis. Here’s a few of the notable components. OpenTelemetry Collector: a light-weight agent/collector that ingests, processes, and ships telemetry data to any back-end Processors: are the mechanisms that enable filtering, masking, and routing log data within the OpenTelemetry Collector. Connectors: enable combining one or more telemetry pipelines within an OpenTelemetry Collector OTTL: an advanced language, enabling advanced transformations of your log data. Related Content: What is OpenTelemetry? Building a Telemetry Pipeline with BindPlane OP BindPlane OP builds on top of OpenTelemetry, and provides a centralized management plane for OpenTelemetry collectors and streamlines the process of filtering, masking, and routing sensitive data. It also simplifies the processing of routing log data to one or many destinations, and provides a singular view of your telemetry pipeline, enabling visibility and actionability on your sensitive data from a single place. And that’s a wrap. If you’re interested in learning more about security best practices, OpenTelemetry, or BindPlane OP, head over to the BindPlane OP solutions or reach out to our team at info@osberviq.com.

Turning Logs into Metrics with OpenTelemetry and BindPlane OP

Joe Howell — Fri, 22 Mar 2024 18:58:56 GMT

Turning logs into metrics isn’t a new concept. A version of this functionality is implemented in most agents, visualization tools, and backends. It’s everywhere because converting logs to metrics has many practical applications and is one of the fundamental mechanisms for controlling log volume in a telemetry pipeline. In this post, I’ll briefly overview log-based metrics, explain why they matter, and provide examples of how to build them using OpenTelemetry and BindPlane OP. What is a Log-Based Metric? As its name implies, a log-based metric ('LBM') is a custom metric derived from log data. LBMs are created by extracting key bits of information from a log, aggregating those bits with an operator (average, count, sum, etc.), and outputting the result as a time-series metric. Why Log-Based Metrics are Important Use Cases for Log-Based Metrics Log-based metrics satisfy a wide range of use cases for SREs, DevOps, Product, and Compliance teams, such as: Observing an application or system by supplying one or all of the 4 Golden Signals not natively exposed by the application or system. Monitoring anomalous security behavior by mathing (definitely a word, trust me) charting logon activity and access requests. Monitoring compliance standards, where logs are often the only available signal to work with. Real User Monitoring (RUM): tracking sign-ups, onboarding friction, in-app user behavior, and usage trends. Data Reduction: shrinking large volumes of log data into bite-sized, concentrated metrics. Benefits of Log-Based Metrics Implementing log-based metrics in your pipeline can have several lasting benefits: Significantly Reduces Costs: Firsthand, we’ve witnessed log volume and licensing costs reduced by up to 80% for specific applications after implementing LBMs. Reduces Stress on Infrastructure: implementing log-based metrics minimizes the load on your network and hosts, as the size of the data is significantly reduced Facilitates vendor-neutrality: When created outside your observability backend, LBMs can be carried forward to new platforms as your organization's needs change. Where should log-based metrics be implemented? Observability/SIEM backends Some observability/SIEM backends incorporate this functionality, processing the data after it’s delivered via ingestion API. However, this approach can come with extra costs, some of which are hidden. Creating log-based metrics in-platform means your team spends more time and effort customizing proprietary software, which biases your telemetry pipeline toward a specific vendor. Consequently, the customizations must be recreated if the budget or feature set dictates a migration to a new backend. Telemetry Pipeline (recommended) Conversely, by creating log-based metrics within an OpenTelemetry-backed telemetry pipeline, users can process log-based metrics closer to the edge of their pipeline, making it easier to carry the time and effort forward. Creating Log-Based Metrics with OpenTelemetry: 2 Methods Now, let’s dive in further. There are 2 methods to be aware of when building log-based metrics in OpenTelemetry-- each leveraging a different core component. Method 1: Building Log-Based Metrics with Connectors Summary and Breakdown A Connector is a relatively new component of the OTel collector that bridges metric, log, and trace pipelines. This flexibility enables the creation of LBMs with minimal OTel components and configuration. In this example, we’re collecting Windows events with the windowseventlog receiver and using the count_connector to count login attempts and construct our log-based metric. Within the collector's config.yaml, the count_connector is defined as both an exporter in the logs pipeline and a receiver in the metrics pipeline, establishing the bridge between the two. This creates a pathway for the LBM to move through the metric pipeline and on to Google Cloud for analysis. Sample OTel Collector Configuration For more context, here’s a sample OTel config.yaml that maps to the diagram above. In this config, we’ve named the log-based metric windows_event.count and have moved the log's event_id to an attribute which appears as a metric label in Google Cloud Operations. Ideal, But Not Recommended (Yet) Connectors will soon be the recommended approach to building log-based metrics. Today, however, there’s a functionality gap. Specific operations (like counting) require a corresponding connector; the connector library isn’t comprehensive enough to cover the most common use cases, but I expect it to expand steadily in 2024. As a quick aside, if you’re interested in learning more about Connectors, Dan Jaglowski gave an excellent talk at Kubecon EU ‘23 - I highly recommend it. Method 2: Building Log-Based Metrics with OTel Processors + BindPlane (Recommended) Summary and Breakdown The second method involves using processors to construct a log-based metric. The count_logs processor can be used to count occurrences of specific log events, but creating a log-based metric also requires stringing together several other processors: moveprocessor: to move some valuable bits in the body of a message to an attribute routeprocessor: to route the LBM to the metric pipeline's exporter filterprocessor: to drop Windows Events before they’re passed along to the googlecloudexporter Recommended, but with added complexity This method has proven very effective and is what we currently recommend to our customers. If you’re new to OTel, chaining three or more processors together may seem overly complicated for the outcome we’re trying to achieve—I certainly empathize. If you create your OTel collector configuration with BindPlane OP, nearly all complexity can be avoided. Creating Log-Based Metrics with BindPlane OP From BindPlane OP’s configuration builder, we can create log-based metrics with the help of a wizard and simplified UI. Create an OTel configuration First, we need to create our configuration: Next, add a Windows Events Source to your configuration. For this example, we only need to collect events from the Security channel, which captures login activity. Lastly, I’ll add a Google Cloud Destination and save my configuration. Here’s the result. Add Processors to your OTel configuration Next, add the processors and deploy the configuration to an OTel collector from BindPlane. Move Field Processor I want to dimension the metric I’m creating with the Windows event_id. To do so, I’ll need to move the event_id from the body of the message to an attribute Count Telemetry Processor Next, add the Count Telemetry Processor. I’ve named the metric windows_event.count and have added the event_id attribute we modified above. Rollout your config and view the log-based metric Lastly, I'll push the config to an OTel collector and head over to Google Cloud to verify that the new metric has arrived. Voila! And that’s a wrap! If you’re interested in OpenTelemetry or BindPlane OP or have any general questions, contact us at info@observiq.com or join us on BindPlane OP Slack to take the next step.

What is OpenTelemetry?

Joe Howell — Mon, 11 Mar 2024 19:53:00 GMT

At observIQ, we are big believers and contributors to the OpenTelemetry project. In 2023, we noticed project awareness reached an all-time high as we attended trade shows like KubeCon and Monitorama. The project’s benefits of flexibility, performance, and vendor agnosticism have been making their rounds; we’ve seen a groundswell of customer interest. What is OpenTelemetry? OpenTelemetry (“OTel”) aims to standardize how telemetry is generated, transmitted, and processed while being flexible enough to work with the telemetry streams you already have. It’s an open-source project governed by the Cloud Native Computing Foundation (CNCF). At the core of OpenTelemetry is OTLP, a protocol for modeling and transmitting all common types of telemetry. On top of this protocol is an ecosystem of standardized instrumentation libraries that make capturing and transmitting logs, metrics, traces, and (soon) profiles easy. Additionally, OTel provides a standalone collector that can receive telemetry from various sources or actively pull it (e.g. from a Prometheus exporter or log file). The collector can then process the telemetry in a wide variety of useful ways, and finally forward it to almost any popular telemetry backend. The best part is that it’s highly interoperable with other telemetry tools. If you’re using FluentBit or Prometheus, you can redirect those data streams through the OTel collector. It will translate the data into OTel’s data format and can be forwarded using OTLP or another common protocol. The project has 2 guiding principles (quoted from OpenTelemetry.io) You own the data that you generate. There’s no vendor lock-in. You only have to learn a single set of APIs and convention. As we wade deeper into the project, you see these principles are pervasive and drive decisions that help steer the project to where it is today. Related Content: OpenTelemetry in Production: A Primer What isn’t OpenTelemtry? The project does not include tools to analyze your data directly (other than providing KPIs around collection and transmission). Instead, the project squarely focuses on data collection to delivery - leaving actionable insights to the observability platforms like Datadog, Google Cloud, New Relic, and Splunk. Benefits of OpenTelemety The OpenTelemetry provides several key benefits: Standardization Opentelelemtry provides a single set of compatible, open-source monitoring tools that collect telemetry data to understand the state of your distributed systems or applications. For those building and maintaining applications and distributed systems: proprietary tools, agent fatigue, non-standard configuration, and performance issues are familiar challenges the project addresses directly. Vendor Neutral, Vendor Agnostic The project provides the telemetry tools to implement a vendor-neutral framework. This means you can safely de-risk instrumenting your applications and infrastructure, as OTel allows you to reroute your data with minimal configuration and standardization. Buy-in & Adoption Splunk, Datadog, New Relic, Google, Honeycomb, Grafana, observIQ (and many other organizations) have all promoted and contributed, rapidly accelerating the project since 2020—further solidifying ‘neutrality’ as a core principle. We also see native OpenTelemetry Protocol support within both applications and backends. You can find a nice breakdown of all the contribution activity here. Extensible OpenTelemetry is extensible, providing flexibility through its collector SDKs, Integrations, and Distributions, which can be extended to observe any application and use case. Similarly, Golang is inherently portable and can run on Linux, MacOS, Windows, and more. Platforms like Kubernetes and Cloud Foundry are at the forefront of rapid development and iteration in the project. Disparate Pipes vs. an Observability Pipeline Logically, each OpenTelemetry Collector constructs data pipelines within its configuration. More broadly, it enables the consolidation of disparate telemetry pipes into an Observation Pipeline. This enables actionability by making it possible to standardize and centralize configurations in a way that can be easily updated as an organization's needs change. Building your Observability Pipeline with BindPlane OP observIQ’s Unified Telemetry Platform, BindPlane OP, simplifies the creation and management of observability pipelines by Remotely deploying and managing a fleet of OpenTelemetry Collectors Streamlining OTel configuration with a guided configuration builder Providing a single-pane summary of the size and cost of your observability pipeline Simple controls to refine, reduce, and route your data You can find out more about BindPlane OP here: https://observiq.com/solutions Next, let’s break down some of the critical components of the project: Key Components of OpenTelemetry Overview The project is broken down into several components, guided in detail by the OpenTelemetry Specification. Here’s a quick breakdown: OpenTelemetry Specification OpenTelemetry Semantic Conventions OpenTelemetry Protocol (OTLP) Open Agent Management Protocol (OpAMP) OpenTelemetry APIs/SDKs OpenTelemetry Collector OpenTelemetry Semantic conventions OpenTelemetry Semantic Conventions are the guidelines and standards that ensure consistent and meaningful telemetry data by providing a common understanding of how to label and structure telemetry data. Some of the items semantic conventions cover: Data Formats: define the standard formats for telemetry data, such as distributed tracing and metric formats, to ensure interoperability between different systems and tools. Attribute Naming Convention: standardize attribute names for telemetry data to ensure consistency across different services and components within an application. Semantic Context: Describe the contextual information that should be included with telemetry to provide meaningful insights into the behavior and performance of applications. OpenTelemetry Protocol (OTLP) The OpenTelemetry Protocol (OTLP) transmits telemetry data (metrics, logs, and traces) between OpenTelemetry clients and backends. In many ways, it’s the project's backbone - its implementation catalyzes the rest of the project. Here’s a quick breakdown of its feature set Supports transmission of all signal types Efficient in terms of CPU and network usage Vendor-Neutral: the protocol itself isn’t biased towards a specific application or backend Open Agent Management Protocol (OpAMP) OpAMP is a network protocol that enables remote management of a fleet of agents or collectors. It can gather collector statistics, push configs, and handle other agent lifecycle tasks. It recently entered a beta phase in the OpenTelemetry Collector contrib repository. At observIQ, we’ve spent much time considering and contributing to OpAMP’s development. We use it as a core component of BindPlane OP, deploying and managing OpenTelemetry collectors remotely. Andy Keller, Principal Engineer at observIQ, and Jacob Aronoff, Staff Software Engineer from Lightstep, recently provided an excellent overview at Kubecon. I recommend watching it if you’ve got 30 minutes to spare. You can watch it here. OpenTelemetry APIs and SDKs OpenTelemetry APIs/SDKs allow you to instrument your code to expose OTLP-compatible telemetry data. Applications can be instrumented to ship this data directly to a backend or an OpenTelemetry Collector. Here’s the current list of available SDKs -- and the maturity of each signal contained in each (pulled from Opentelemetry.io): OpenTelemetry Collector The collector is the component responsible for receiving, processing, and exporting telemetry. It’s a critical piece of the OpenTelemetry framework, often (but not always) acting as the middleware between an application and the observability backend. Collector Components Receivers Receivers are responsible for ingesting telemetry data, translating it into OTLP-compatible data, and passing it into the OpenTelemetry pipeline for processing and analysis. Receivers collect data: metrics, events, logs, traces (MELT). The collector’s contrib repository currently includes ~80 receivers, which non-scientifically cover the ‘95%’ use case for our customers. If a technology is missing from the list, please file an issue in the project or contact us at (info@observiq.com) if you need a hand. Processors Processors are responsible for processing the data that moves through your OTel pipeline. They enable data enrichment, filtering, routing rules, and much more. You can find a full list of processors that are available here: Exporters Exporters are responsible for exporting the data from an OTel pipeline and collector. Exporters translate OTLP-compatible data in a format that’s compatible with the destination it’s headed to. Over time, the need for vendor-specific exporters will decrease as native OTLP. Connectors Connectors enable the transmission of telemetry data between different types of telemetry pipelines. For example, translating logs into metrics by processing log data and exporting it to a metrics pipeline. Dan Jaglowski, a Principal Engineer at observIQ and maintainer on the OpenTelemetry Collector, recently gave a great talk on Connectors at Kubecon. You can watch it here. Extensions Connectors enable the transmission of telemetry data between different types of telemetry pipelines. For example, translating logs into metrics by processing log data and exporting it to a metrics pipeline. You’ll see more posts from me on Connectors at a later date. OpenTelemetry Transform Language (OTTL) OTTL is a transformative language used in conjunction with several processors that enables users to filter, reduce, and route their data in conjunction with several processors. Related Content: What is the OpenTelemetry Transform Language (OTTL)? Familiarizing yourself with OpenTelemetry: kicking the tires If you’re interested in trying it out for yourself, here are a few resources I’d recommend: The Official OpenTelemetry Demo provides an expansive environment demonstrating instrumentation and usage in a typical Kubernetes-backed microservices environment. OpenTelemetry Registry provides a searchable list of OpenTelemetry components, simplifying the OTel mapping process. Lastly, to get an OpenTelemetry collector up and running quickly, refer to my recent post, How to Install and Configure an OpenTelemetry Collector OpenTelemetry in 2024: A preview Here are the recorded goals for the project in 2024, broken down by Github issue: OTel Project Goals: OTel Collector v1 All SDKs and APIs as “stable” for all three signals Semantic conventions - Database Semantic conventions - RPC/gRPC Semantic conventions - Resources (k8s, containers) Semantic conventions - Messaging Event Specification File-based configuration One Logging Bridge per Language Weaver / Codegen for instrumentation And that’s a wrap. If you’re interested in OpenTelemetry or BindPlane OP or have any general questions, contact us at info@observiq.com or join us on BindPlane OP Slack to take the next step.

BindPlane Flight Plan | January '24

Joe Howell — Wed, 06 Mar 2024 20:55:47 GMT

What’s new in BindPlane At observIQ, we’re constantly shipping new features to help users get the most out of BindPlane. In case you missed it, here’s a roundup of all the BindPlane news, updates, and improvements you should know about. Feature Round-up New UI + Improved Config Editing BindPlane received a fresh coat of paint, making it much easier to access the information you want when you need it most. The workflow to edit and roll out a new configuration to your agents has also been streamlined. Advanced Extension Configuration You can now add OpenTelemetry Extensions like healthcheck and pprof directly to any of your pipelines with just a few clicks. Previously, extensions were automatically handled and applied to configs. Users who require advanced configuration can now add them manually to their OTel configurations. Summary Page + Data Reduction The brand-new summary page makes it easier to monitor your fleet of agents and view precisely how much BindPlane is reducing your data. It consolidates and provides a clear view of what's happening in your observability pipeline. Featured Resources Lastly, we wanted to highlight Phil Cook's excellent write-up on how to Explore & Remediate Consumption Costs in Google Billing and BindPlane OP More to come! For questions or feedback, feel free to reach out to us at info@observiq.com

Rapid telemetry for Windows with OpenTelemetry and BindPlane OP

Joe Howell — Mon, 04 Mar 2024 20:49:16 GMT

At observIQ, we’ve seen continuous customer interest in scalable and performant observability solutions for Windows environments. As of 2023, Windows is estimated to be deployed to 75% of desktops worldwide. Unsurprisingly, we commonly speak to CTOs, DevOps, and IT managers responsible for managing fleets of thousands of Windows-based end-user and point-of-sale systems in the Financial, Healthcare, Insurance, and Education sectors. With a well-rounded set of integrations and OTel's logging library moving to a stable status in 2023, organizations now have access to an open, performant, and standardized framework to observe Windows fleets at scale. Based on customer demand and feedback - we’ve focused on making Windows a simple and powerful experience for customers - both contributions to OTel and specialized features in BindPlane OP, understanding that Windows infrastructure is critical and here to stay well into the foreseeable future. In this post, I’ll walk you through the following: Top Windows Use Cases Useful Windows OTel Components How to use BindPlane OP to simplify… Installing an OpenTelemetry Collector on Windows Creating a single OTel config, satisfying all of the above use cases Remotely deploy the OTel config to a fleet of Windows collectors Top Use Cases for Windows Over the past six months, customers have been inquiring about these use cases the most: Observing the health of a fleet of Windows VMs or POS systems by gathering OS and Process metrics Observing Security/Logon activity by collecting and analyzing Windows Events Observing IIS using application metrics, host metrics, application logs, and Windows system events Useful OTel Components for Windows Here’s a quick list of the most valuable components I’d recommend looking at if you’re considering OTel for stack. It’s worth looking at the configuration parameters and component limitations to ensure they’ll address your needs. Here’s a short list of the most valuable components for a typical Windows environment. Receivers Active Directory Domain Services Receiver Host Metrics Receiver IIS Receiver Windows Event Log Receiver Windows Performance Counter Receiver Microsoft SQL Server Receiver Filelog Receiver Processors Filter Processor Transform Processor Resource Detection Processor Exporters Verify you can find your desired destination here. Solving Windows Use Cases with OTel and BindPlane OP Pre-reqs: If you’d like to follow along and build and deploy the configure, you’ll need a few things handy: A running BindPlane OP instance. You can install the free edition on a Linux VM or container by following the steps provided here: https://observiq.com/download Access to one more Windows Machines you’d like to observe: Windows 10, Server 2012 R2 or later. Optionally, with IIS running and steady Windows event activity, if you would like a more representative test. 15 minutes of your time. Once you have an environment handy, you can proceed to the next step. Install an OpenTelemtry collector on Windows with BindPlane OP I’ll start by logging into my BindPlane OP instance. From the Agents tab, select Install Agent, and select Windows as the operating system. RDP into your host, open an elevated CMD prompt, and run the provided single-line installation command. After a few moments, the agent will appear in the list of agents in BindPlane OP, indicating it’s been connected and managed by BindPlane. Rinse and repeat for each Windows host you’d like to observe. Related Content: Configuration Management in BindPlane OP Create an OTel configuration with BindPlane OP From BindPlane OP’s Configurations tab, select Create Configuration. Give the configuration an apt name, and select Windows as the Operating System. Then, start adding sources to the configuration via BindPlane OP’s configuration builder. Add Sources (OTel receivers) to your config For this example, we’ll add the following Sources to our Configuration. A Source can gather metrics, logs, or traces depending on the application or system you want to observe. Select the applicable metrics and log files you want to collect for each Source. Note: BindPlane OP suggests default metrics and log paths, but it is worth double-checking they meet your needs and system configuration. Host Source The Host Source gathers metrics from a host: consumption and process metrics. It requires minimal configuration - only a friendly description. To the bottom right, you'll see process metrics as well. In this example, I will leave the default selections and save the Source to my config. IIS Source The IIS Source requires a bit more configuration; it can collect metrics and logs from an IIS instance. Again, I will leave the default selections: enabling metrics and validating the default log file’s path matches my test systems. Windows Events Source Lastly, I will add the Windows Events Source to my config. The Windows Events Source collects Windows Events and turns them into structured JSON logs. The big 3 channels are gathered by default (System, Security, and Application), which is what I need for my config to satisfy my use cases above. My config now includes 3 Sources; now I need a Destination. Add a Destination (OTel Exporter) Lastly, I’ll add a Destination to my config. In this example, I’ll ship telemetry data to Google Cloud Operations. Creating a Destination in BindPlane OP provides approximately the same experience for all backends: some combination of an API key, credentials, and region. After saving my Destination, I have a fully-baked OTel configuration file :airhorn_sound:. I now also have a human-readable representation of what’s in the file and the option to export it in raw form. Now, I can deploy this configuration to my agents and start shipping telemetry to my destination for further analysis. Next, let's push our OTel configuration to our collectors. Related Content: Getting Started with BindPlane OP and Google Cloud Operations Deploy your OTel config to your OTel Collectors with BindPlane OP Lastly, I only need to scroll down to the agent’s section of the configuration page and select Add Agents. Apply the configuration to your agents, and select Start Rollout. BindPlane OP will now push the configuration I’ve built to each agent (via OpAMP, under the hood). Verify and telemetry data in your Destination Now that I’ve configured my OTel collectors to ship data to Google Cloud, I’ll hop over and verify it’s arrived successfully. Drumroll…. And there it is! All of the signals I need to satisfy my use cases: Host metrics and events observe the health of my Windows hosts. Security Events to monitor successful and failed logon activity of my fleet. IIS application metrics and logs are used to understand the health of my web server, which can be correlated with the host metrics and events above in the event of an outage. Wrapping up Well, there you have it. Creating an OTel configuration for Windows (and other platforms) is incredibly simple. If you have any questions about pairing OpenTelemetry or BindPlane OP, reach out to us at info@observiq.com.

OpenTelemetry in Production: A Primer

Joseph Howell — Wed, 28 Feb 2024 21:26:56 GMT

At observIQ, we’re big believers and contributors to the OpenTelemetry project. In 2023, we saw project awareness reach an all-time high as we attended tradeshows like KubeCon and Monitorama. The project’s benefits of flexibility, performance, and vendor agnosticism have been making their rounds; we’ve seen a groundswell of customer interest. What is OpenTelemetry? OpenTelemetry ("OTel") is a collection of standardized protocols, tools, and libraries that enable the collection, refinement, and transmittal of telemetry data (metrics, logs, traces, and events) from your applications or infrastructure. It features over a hundred built-in integrations for popular technologies (Kubernetes, MySQL, Nginx, and many others). This lets you easily instrument your applications and infrastructure without writing custom code for each component. Related Content: What is OpenTelemetry? What isn’t OpenTelemtry? The project does not include tools to derive meaning from your data (other than KPIs around collection and transmission). Instead, the project focuses on collection - leaving actionable insights to vendor platforms like Datadog, Google Cloud, and Splunk. The project provides the tools to do it all from a telemetry standpoint but can be daunting without a primer. So, let’s jump into the why and then a bit of how you can evaluate if OTel is a good fit for your organization. Why OpenTelemetry? OTel offers a few critical benefits: Standardization and simplification A huge benefit to using OpenTelemetry is that it provides a single set of compatible, open-source tools that can gather all of the necessary signals to understand the state of your system or application. For those building and maintaining applications and systems: proprietary tools, agent fatigue, non-standard configuration, and performance issues are familiar challenges the project addresses directly. Buy-in and Adoption Nearly all major vendors support the project, making contributions that push forward the idea of vendor neutrality. Splunk, Datadog, New Relic, Google, Honeycomb, and observIQ (and many other organizations) have all contributed, rapidly accelerating the project over the last few years. You can find a nice breakdown of all the contributions here. Vendor Neutrality Lastly, and most importantly, OpenTelemetry enables vendor neutrality. This means you can instrument applications and infrastructure once and route your observability data to a back-end that best meets the needs of your business. From lowering costs and a more attractive feature set - whatever the reason may be, you can repoint your data with a simple configuration change (and rolled out en masse through BindPlane OP and standard DevOps toolchains such as Chef, Puppet, and Ansible). If these benefits sound attractive to your organization, read on to understand how to take the next step with OTel in production. Getting started Taking an Inventory To standardize telemetry collection for metrics, logs, and traces - you’ll need to take a high-level inventory of your tech stack, answering a few critical questions: What business-critical applications need to be observed in your environment? What programming languages are being used in your stack? What observability tools are you currently using in your stack? What signals are being collected, and how are they being collected? Where do you intend to send and analyze your telemetry data? A high-level understanding of the answers to these questions will allow you to take the next step - mapping your organization's needs to specific OpenTelemetry components that facilitate telemetry collection, transformation, and delivery. Familiarizing Yourself with OpenTelemetry To map the necessary OTel components to your use case, taking the time to familiarize yourself with the project better is a worthy exercise, as there’s a fair amount to digest. To do so, I’d recommend taking a look at a few essential resources: The Official OpenTelemetry Demo provides an expansive environment demonstrating instrumentation and usage in a typical Kubernetes-backed microservices environment. OpenTelemetry Registry provides a searchable list of OpenTelemetry components, simplifying the OTel mapping process. Lastly, if you want to kick the tires and get an OTel collector up and running. Doing so will provide you with a guide and some context for what it’s like to configure a collector and start shipping telemetry data in about 10 minutes. Related Content: How to Install and Configure an OpenTelemetry Collector Selecting a Collector Distribution Next, it’s important to know that several distributions of the OpenTelemetry Collector are available. Choosing one that aligns with your requirements is essential - but take comfort in knowing that the available configuration and components largely remain the same across distributions. Here’s a quick breakdown of what’s available: OpenTelemetry Collector Contrib This distribution includes the most components (receivers, processors, exporters) and is where you find the newest components that have not yet made it into the OpenTelemetry Collector Core repository. Generally, this is the right place to start if you want to test the waters with a sandbox collector, but it may include more stuff than you need when you’re ready to deploy to production. OpenTelemetry Collector Core This distribution includes a minimal, hardened set of components. We typically don’t recommend it for production environments, as core is not expansive enough (yet) to address some of the most common use cases we work with customers on a day-to-day basis. Vendor Distributions As the name implies, vendor distributions are built and managed by a specific vendor. It’s important to know that these distributions can include components specific to the vendor’s platform, resulting in unnecessary vendor lock-in. If you’re considering a vendor distribution, inquire about functionality specific to the distribution to understand the impact of moving away from the distribution in the future. One advantage of vendor distribution is that they often include support as a primary benefit, providing an SLA, which can often be required for large organizations. It’s also helpful to know that observIQ offers a supported distribution. It includes support for OpAMP enabling remote agent and OTel configuration management with BindPlane OP. Building your own OTel Collector Lastly, building your collector distro is an option as well. With the OpenTelemetry Collector Builder, you can build a distro that only includes the necessary components - minimizing unnecessary bloat and simplifying configuration. Also, stating the obvious - these distributions are often self-supported and maintained. Deployment Patterns Next, let’s take a look at some common deployment patterns. These are typical patterns we’ve seen in the field - but are flexible enough to scale with any environment. Instrument app ⇒ gather with an OTel collector (separate host) In this pattern, a custom application has been instrumented to expose OTLP metrics and traces gathered by a collector running on a separate host. This pattern has a few pros and cons: Pros It’s a simple pattern, providing a 1:1 mapping between an application and a collector Mitigates risk by deploying the collector to a host separate from your application Putting a collector in between your instrumentation and backend allows you to easily filter and reroute your data without modifying your app infrastructure Cons 1:1 application-to-collector ratio doesn’t scale for larger workloads If the collector is deployed on a separate host from the application, it’s unable to gather host metrics and log data, missing out on some telemetry that could be used for root cause analysis Instrument app ⇒ OTel collector (same host) In this pattern, a custom application has been instrumented to expose OTLP-compatible metric, log, and trace data gathered by a collector running on the same host. Pros Deploying a collector locally for log and metric collection from the node/host, in addition to gathering the application-specific telemetry, providing a complete picture of your application and host it’s running on for root cause analysis Cons May require an infrastructure change if there are not any agents or collectors deployed in your environment Instrument app ⇒ OTel collector (same host) ⇒ load balancer ⇒ collector group In this pattern, a collector has been deployed to the same host as the application. The collector is forwarding its data to a group of collectors behind a load balancer. Pros Scaling and redundancy when dealing with large amounts of telemetry data or high processing needs Cons More complexity when the data volume is low Requires separate tooling/hardware to yield desired scalability redundancy Instrument app ⇒ OTel collector (same host) ⇒ collector gateway Lastly, in this pattern, collectors have been deployed in this environment to act as gateways, which allow telemetry to be aggregated before reaching an observability backend. Pros Highly scalable data plane standardized on OpenTelemetry. Easy to add new destinations with access to any/all of your telemetry data. No need to re-instrument with vendor agents/SDKs By aggregating telemetry with a gateway, a user can refine the data at a central point in the pipeline The observability pipeline is shifted further away from a specific vendor Migrating to OpenTelemtry Next, let’s talk about migration. Most organizations have large, already-existing deployments with proprietary instrumentation and agents. Thus, we typically recommend migrating to OpenTelemetry in a phased process. High-level breakdown below: Phase 1: Greenfield OTel deployment We recommend using OpenTelemetry in greenfield deployments -- working with a clean slate in a PoC environment to minimize noise and risk. It enables customers to see the value of OpenTelemetry quickly. Phase 2: Redirect existing agents using OTel Once you’ve successfully tested OTel in your greenfield environment, you can repoint your existing agents to OTel collectors. It is critical to know that FluentD, Fluent Bit, Splunk agents, and more can have their output redirected (or duplicated) to OpenTelemetry collectors. Phase 3: Replace existing agents with OTel Collectors The last phase is replacing your existing instrumentation with OTel instrumentation. Replacing your observability stack with pure OTel is not required but recommended when available. If you’re considering, have questions, or want to chat about OTel in production, contact us at info@observiq.com. Thanks for reading!

What is the OpenTelemetry Transform Language (OTTL)?

Joe Howell — Mon, 26 Feb 2024 19:40:55 GMT

What is the OpenTelemetry Transform Language (OTTL)? The OpenTelemetry Transformation Language, or OTTL for short, offers a powerful way to manipulate telemetry data within the OpenTelemetry Collector. It can be leveraged in conjunction with OpenTelemetry processors (such as filter, routing, and transform), core components of the OpenTelemetry Collector. It caters to a range of tasks from simple alterations to complex changes. Whether you're dealing with metrics, spans, or logs, OTTL equips you with the flexibility to refine and shape your data before it's dispatched to its final destination for monitoring and analysis. The language is built on the principles of clarity and efficiency, allowing developers and DevOps to write expressive statements that perform transformations effectively. In this post, I’ll step through the following: Key Benefits of Leveraging OTTL Core Concepts of OTTL Developing with OTTL Common use cases with configuration samples Related Content: What is OpenTelemtry? Keys Benefits The key benefit that OTTL provides is giving the user granular controls to sculpt and refine their observability data -- broken down into some supporting benefits: Reduced verbosity - OTTL enables powerful transformations with minimal configuration Data Routing - enables the routing of observability data to one or many destinations Noise Reduction - trimming excessive data to streamline root-cause analysis Cost Reduction - trimming excessive data to keep observability costs within your budget Data Enrichment - tagging, identifying, and elevating key data streams for deeper analysis in your observability backend Core Concepts Under the hood of OTTL, there are a few core concepts to familiarize yourself with: Statements: These define the transformations performed on your telemetry data. They are structured commands following the OTTL's grammar and are defined in the collector’s configuration file (config.yaml) Contexts: Specifies the domain (like traces or metrics) in which the statements will apply. Here’s the list of available contexts: Resource Instrumentation Scope Span SpanEvent Metric DataPoint Log Functions: Invokable operations within statements that determine the nature of the transformation, such as renaming attributes and filtering data. There are 2 primary types of functions: converters (provide the utilities to transform telemetry) and editors (used to transform telemetry). You can find a full list of available functions here. Developing with OTTL Best Practices A few things to keep in mind, as you’re building your OTTL statements Simplicity: Keep your transformation rules as simple as possible. Complex rules take more work to maintain and understand. Modularity: Write modular statements that can be easily understood and replaced. Testing: Regularly test your rules to make sure they work as intended. Remember that while OTTL is powerful, the simpler and clearer your transformation rules are, the more maintainable your configuration will be. This straightforward approach to developing with OTTL can save you time and effort in the long run. Statements and Operators Statements in OTTL are combinations of items like variables, literals, and operators that evaluate a value. The structure of statements is intuitive if you're familiar with programming. For example, you could use an expression to increase a metric's value or to choose a specific span attribute. Meanwhile, common operators in OTTL include: == (equality) != (inequality) > (greater than) < (less than) >= (greater than or equal to) <= (less than or equal to) These operators allow you to perform logical comparisons with the data fields in your telemetry. Writing a simple OTTL statement To begin writing rules with OTTL, you'll need to understand its syntax and structure. A typical OTTL rule consists of an editor (set being the most common) and a list of assignments or transformations to apply. For example: This rule sets the attribute to 200 for all telemetry data where the http.method attribute is "POST". Common Use Cases Below are several common use cases we see where OTTL comes in handy when working with OpenTelemetry in the field: Parsing ‘body’ from the contents of a JSON log: Parsing ‘Severity’ and mapping severity within a log event: Parsing ‘Timestamp’ from a log event: Renaming a field within a log event: OTTL Setup and Configuration Configuring and testing OTTL consists of a few simple steps: Install the OpenTelemetry Collector: You can check out my recent post, “How to Install and Configure an OpenTelemetry Collector” -- this guides you through the detailed installation and configuration steps. Access the Collector’s Configuration File: The first step is accessing the OpenTelemetry Collector's configuration file where you'll specify your OTTL configurations. /etc/otelcol-contrib/config.yaml Define Transformation Rules: Within the configuration file, under the transform processor section, you will define your transformation rules using the OTTL syntax. Associate with Pipelines: Decide which pipelines (metrics, traces, logs) will use the transformation rules by linking them to the corresponding processors in the configuration file. Test and Validate: After setting your rules, it's important to test and validate them to ensure they work as expected. Performance Considerations Lastly, while you’re working with OTTL, it’s important to be aware of the performance ramifications: Overhead: Introducing any transformation operations can add extra processing overhead. It's important to balance the need for data transformation against the potential impact on the OpenTelemetry Collector's performance. Complexity: The more complex your transformations, the more load is put on your host. Try to keep your OTTL statements as simple as possible to minimize performance degradation. Sampling Considerations: Implementing transformations may interfere with the telemetry data's fidelity. If transformations occur before sampling decisions, this could affect the resulting sample and, your performance measurements as indicated in the Performance Benchmark of OpenTelemetry API. Hopefully, this overview will help you get started as you begin to use OTTL. If you have questions or feedback, or want to chat about OpenTelemetry or OTTL, feel free to reach out on the CNCF Slack (jhowell), or e-mail me at joseph.howell@observiq.com.

Configuration Management in BindPlane OP

Ryan Goins — Tue, 20 Feb 2024 16:19:00 GMT

Managing configuration changes within BindPlane OP is a straightforward process when using the newly introduced Rollouts features to deploy your changes. Rollouts provides a user-friendly platform for tweaking configurations, staging modifications, and implementing them across your agent fleet only when you’re satisfied with the changes. Here’s a step-by-step guide on how to use it: Step 1: Edit New Version – Navigate to any existing configuration and click on the “Edit New Version” button in the topology window's top right corner. This action will create a new draft version of your configuration and automatically redirect you to a tab labeled “New Version”. This is the area where you can freely modify your configuration. Step 2: Implement Desired Changes – Once you’re in the “New Version” tab, you can start making the necessary changes. The best part is that you can leave the page and come back to continue editing at any time, as your work-in-progress is automatically saved. Step 3: Start Rollout – After finalizing your configuration changes, hit the “Start Rollout” button to commence the deployment of your modifications to your agents. Step 4: Monitor Progress – A progress bar will appear, providing real-time updates about the status of your Rollout. The rollout takes place in batches, starting with a small batch of just 5. If this initial batch is successful, the rollout continues to increase until it reaches a maximum batch size of 100. This incremental approach ensures that the impact on your agents remains limited if there’s a misstep in the configuration changes. Step 5: Rollout Complete – Once the rollout is complete, the “New Version” tab will vanish, and a “Rollout Complete” message will pop up. You’ll then be automatically redirected to the “Current Version” tab. See it in action!

How to Monitor MySQL Using OpenTelemetry

Daniel Kuiper — Fri, 20 Oct 2023 20:09:00 GMT

MySQL is a widely used open-source database management system that is the backbone for many desktop, mobile, web, and cloud applications. It’s best known for being the ‘M’ in the still-prevalent LAMP stack (Linux, Apache, MySQL, PHP) and is often used as a supporting database for various web applications such as E-commerce, CMS, CRM, and forums. As we move forward into 2024, it’s important to reassess your monitoring strategy. Your strategy should be designed to adapt to the changing demands of distributed hybrid and multi-cloud architectures. It should be flexible and comprehensive enough to monitor the rest of your stack. The Benefits of Monitoring MySQL There are several key benefits to be aware of as you weigh how much effort you want to dedicate to implementing a solution. Here are some of the key benefits: Provides the context to optimize database performance by monitoring resource contention and query performance Exposes user trends to pinpoint friction points in your attached application Surfaces critical vulnerabilities by capturing anomalous usage patterns Enables rapid root-cause analysis and resolution through the collection and layering of metrics of log data Provides an audit trail for database changes and user activity, provides the meat impact analysis Related Content: How to Monitor SQL Server with OpenTelemetry What Signals Matter? It’s crucial to capture the 4 golden signals to ensure uptime for your database and attached application. In the MySQL context, this means capturing a representative set of the following signals: Query Performance Query response time/latency Slow/Long running queries Query Errors Throughput: Queries per second Transactions per second Resource Utilization: CPU Usage Memory Usage Disk I/O Network I/O Connections: Max Connections Active Connections/Threads Running Connections/Threads Failed Connections Replication: Replication latency Replication errors Database Growth Database growth rate InnoDB Row Operations Storage Engine (for InnoDB): Buffer Pool Usage InnoDB Row Operations Database Errors Configuration and operational errors The Case for Monitoring MySQL with OpenTelemetry Though there are many solutions teams can implement, such as the built-in MySQL Enterprise Monitor for example, OpenTelemetry provides the means to collect signals in parity with existing solutions but pulls ahead as the best long-term solution for those mulling over a fresh monitoring and observability strategy for MySQL and the rest of their stack. The project's primary goals of data ownership and vendor-neutrality alone make it worth considering, but the maturity and comprehensive toolset of offers make it a clear leader. Any effort spent in OTel now will yield dividends later in the form of cost and time savings and flexibility as monitoring needs of your organization grow and change. Core Components of the OpenTelemetry Collector As a quick primer, the OpenTelemetry Collector has a few primary components that facilitate data collection, processing, and transmission of the above signals. Here’s a quick breakdown: OpenTelemetry Collector: a lightweight data collector that can be deployed as an on-host agent (this is how we’ll be using it to monitor MySQL) or as a gateway for other collectors, shipping data to one or many configured destinations. The collector has a few primary components: Receivers: collect telemetry from a specific application or system (like MySQL) or another OpenTelemetry collector via OTLP. Processors: transform the data by providing the levers to enrich, filter, mask (PII), and other data refinement techniques. Advanced users can utilize OTTL to do really interesting transformations. Exporters: transmit telemetry to another destination: another OpenTelemetry collector, to file, to an observability/SIEM backend Each component can be logically connected as a pipeline, the collector’s configuration file. mysqlreceiver collects MySQL database/instance metrics hostmetricsreceiver collects operating system and specific process metrics filelogreceiver captures logs from the specified file path(s). These events can be processed and turned into structured log data like JSON. Related Content: OpenTelemetry in Production: A Primer Implementation: Monitoring MySQL with OpenTelemetry Prerequisites Before starting, there’s a few things you’ll need: MySQL instance (5.7+) running on a Linux or Windows VM with admin privileges. To collect MySQL logs (optionally), you must enable the 3 log types (error, general query, slow query) in your configuration file. Here are the steps to do so: Have a backend ready to go as a destination for your MySQL monitoring data. For this example, I’m using Google Cloud Operations. If you choose Google Cloud Operations, you’ll need you'll need: Service account (and corresponding JSON key) in your project with the following roles: Logs Writer Monitoring Admin Set the full path to your JSON key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable using the methods mentioned above on your MySQL host Other means of authenticating are available depending on your setup. Installing the OpenTelemetry Collector For this example, we’ll use the contrib distribution of the OpenTelemetry collector. Generally, we recommend ‘contrib’ as it provides all the necessary components to kick the tires on the bleeding-edge components the project offers. We'll be installing the OTel Collector on the same host as our MySQL instance. Linux: Follow the steps for Linux here. When running commands, replace 'otelcol' with 'otelcol-contrib' as the otelcol version does not include the MySQL receiver or Google Cloud exporter. Here’s an example for Debian: Windows: Download the most recent 'otelcol-contrib' tarball for Windows from the releases linked in the getting started docs. Extract the tarball to a directory where you'd like the executable to run. I’d recommend downloading 7-zip, or you can use the Windows tar PowerShell command: Create an empty file in the collector's root directory called config.yaml. This will be used as the collector's configuration file. (This is a required step for Windows, but not Linux deployments) Configuring the OpenTelemetry Collector Next, you can open the config.yaml for your collector and begin adding and configuring the abovementioned components. Note, for Linux, edit and overwrite the configuration file that was automatically created: MySQL Receiver Add the mysqlreceiver to the receivers section. Set MYSQL_USERNAME and MYSQL_PASSWORD environment variables. Modify config.yaml to add this receiver configuration (see steps below) You may have to change the default endpoint to match your environment. Host Metrics Receiver Add the hostmetrics receiver to your collector configuration. Call out the specific metrics the host receiver should gather or use the defaults provided in the example. File Log Receiver Add filelog receiver to your collector configuration. Configure the receiver to point at the log files you enabled on your MySQL instance. Google Cloud Exporter Add the Google Cloud exporter to your collector configuration. Assemble the Telemetry Pipeline Now, create a pipeline in the service configuation, referencing the components we called out above. Make sure to call out the appropriate receiver in the 'metrics' or 'logs' pipeline, where applicable. Set Environment Variables and Modify Service File Lastly, set the credentials as an environment variable, matching the variable specified in the configuration file. Linux: Edit the service to add environment variables: Windows: Use the setx command to set environment variables: Running the OpenTelemetry Collector After adding each receiver/exporter and constructing your pipeline, in your config you should look like this: Just so you know, you might configure your receivers, exporters, and any processors differently depending on your environment and monitoring needs. You can always configure more exporters if you'd like to send telemetry to multiple destinations. Linux Run the collector by restarting the service: You can check the health of the service with: You can check the collector log output with: Windows Adding a processor You can differentiate multiple MySQL hosts by including the hostname gathered by the Resource Detection Processor: Viewing MySQL OpenTelemetry Data If you follow the detailed steps above, the following MySQL metrics (and logs) will be available in your Google Cloud Operations Metrics and Logs Explorer. Some may only be collected if your MySQL instance's corresponding functionality is active. Check out the MySQL receiver readme and documentation for more configuration and metric options. Follow this space to keep up with our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at info@observiq.com.

Getting Started with BindPlane OP and Google Cloud Operations

Ryan Goins — Fri, 08 Sep 2023 19:35:57 GMT

BindPlane OP and the BindPlane Agent offer a unified solution for Google customers to ship telemetry from various environments to Google Cloud Operations. This 4-part video series will guide you through setting up and using BindPlane OP effectively. Part 1: Understanding the BindPlane Architecture Learn the basics of the BindPlane OP Server and how it interacts with the BindPlane Agent. Understand the differences between operating agents at the edge compared to using them as aggregators or gateways. Using Existing Agents like Splunk Universal Forwarders: Find out how to utilize agents like Splunk Universal Forwarders within the BindPlane OP environment. Part 2: Installing BindPlane OP Server Running the Installation: A step-by-step guide to installing the BindPlane OP server. Initializing the Server: Instructions on how to initialize the server post-installation. Part 3: Installing Your First BindPlane Agent Learn how to install the BindPlane Agent on a Windows system. Building Your First Configuration: Guidance on setting up your initial configuration for the BindPlane Agent. Rolling It out to the Agent: Instructions on deploying your configuration to the agent. Part 4: Configuring a Google Cloud Destination Steps to create a service account with the necessary permissions for Google Cloud. Learn how to set up the Google Cloud destination in BindPlane OP. View Data in Google Cloud Logging: Guide to viewing and analyzing your data in Google Cloud Logging. This series provided everything you need to get started with BindPlane OP and Google Cloud Operations. If you have any questions as you get started, please join our Slack community and reach out, we'd be happy to help!

Deleting Fields from Logs: Why Less is Often More

Ryan Goins — Fri, 01 Sep 2023 20:59:00 GMT

Logs serve as an invaluable resource for monitoring system health, debugging issues, and maintaining security. But as our applications grow more complex, the volume of logs they generate is increasing exponentially. While logs are crucial, not all log data is equally valuable. With the surge in volume, costs associated with storing and analyzing logs are skyrocketing, impacting both performance and price. The need for effective log management is more urgent than ever. A common way to start reducing the size of your logs is to eliminate the noise by removing unnecessary fields from them. Why Should You Delete Fields from Logs? Cost Efficiency: High-volume logs can be expensive to store and analyze. Removing extraneous fields can reduce storage costs and speed up query times. Improved Readability: Less clutter makes logs easier to read and understand. When you're troubleshooting, every second counts, and sifting through irrelevant fields can be time-consuming. Enhanced Performance: Excessive data can slow down your log management tools. Trimming down logs can result in faster indexing and more responsive searching. Data Compliance: Reducing fields can also help with adhering to data protection regulations by eliminating personally identifiable information (PII) that isn’t necessary for your logging objectives. Related Content: How to Remove Fields with Empty Values From Your Logs Common Culprits: Log Types with Unnecessary Fields Web Server Logs: These often contain numerous fields related to client requests, many of which are not useful for most analytical purposes. Application Logs: Custom application logs may include verbose debug information that is not needed in a production environment. Security Logs: While crucial for monitoring, these can sometimes capture more information than necessary, potentially causing both performance and compliance issues. Database Logs: Query logs and transaction logs may store an exhaustive amount of details, much of which might not be relevant for day-to-day operations or auditing. This blog post aims to guide you through the steps of optimizing your logs by deleting unnecessary fields using BindPlane OP. By the end, you'll be better equipped to manage your logs effectively, saving both time and resources. So, let's get started. 1. Add the "Delete Fields" Processor to Your Pipeline Start by clicking on one of the processor nodes in your pipeline and then add the "Delete Fields" processor. This will serve as the gateway to reduce your logs. 2. Use Snapshots to Identify Attributes for Deletion Once the processor is in place, use the Snapshots feature to identify which attributes within your logs you'd like to remove. For example, you might decide to delete `os.type` from the Resource Attributes and `http_request_responseSize` from the Attributes in your Nginx logs. 3. Customize with Log Condition (Optional) By default, the "Delete Fields" processor will remove the specified fields from all logs passing through the pipeline. However, if you'd like to apply this deletion only to specific types of logs, you can set a match expression in the "Log Condition" field. 4. Confirm and Click "Done" Once you're happy with the fields you've selected for deletion and any conditional logic you've set up, click the "Done" button to save your settings. 5. Validate Changes with Live Preview Before fully committing to the changes, you can confirm that the unnecessary fields were successfully deleted by checking the Live Preview on the right-hand side of the window. 6. Rollout to BindPlane Agents Last but not least, rollout these new configurations to your BindPlane Agents. As soon as you do, you should see your data throughput drop in real-time on the topology view—a visual confirmation that you've made your logging more efficient. And there you have it! You've successfully slimmed down your logs without compromising their utility. Now you're all set to enjoy a more streamlined, cost-effective, and high-performing log management experience. Check out the video tutorial below and for questions/requests/suggestions, reach out to us or join our community slack channel.

When Two Worlds Collide: AI and Observability Pipelines

JJ Jeffries — Fri, 25 Aug 2023 20:56:00 GMT

In today's data-driven world, ensuring the stability and efficiency of software applications is not just a need but a requirement. Enter observability. But as with any evolving technology, there's always room for growth. That growth, as it stands today, is the convergence of artificial intelligence (AI) with observability pipelines. In this blog, we'll explore the idea behind this merger and its potential. Understanding Observability As we stated in our previous “What is Observability” blog, our CEO, Mike Kelly, discussed observability, emphasizing its role in understanding a system's state with The Cube at Kubecon EU. Unlike monitoring, which targets specific metrics or events, observability offers a comprehensive view of a system's state, behavior, and performance. This understanding allows teams to proactively address issues, solve problems, and enhance system reliability. Related Content: Splashing into Data Lakes: The Reservoir of Observability The Need for AI in Observability Telemetry data is growing exponentially and observability pipelines are now processing more data than ever before. According to a recent study by Gartner, “By 2026, 40% of log telemetry will be processed through a telemetry pipeline product, an increase from less than 10% in 2022”. However, even with robust visualization tools, spotting issues, trends, and anomalies in real time is becoming humanly impossible. This is where the hot topic of Artificial Intelligence comes in as we get asked frequently, how can you incorporate AI into observability pipelines? The merger of AI with observability is more than just an integration. It's about creating a symbiotic system (think yin yang symbol) where both technologies improve each other's capabilities. Here are some examples: Data Processing: Observability pipeline tools, such as BindPlane OP, can preprocess data, structuring it in ways optimal for AI models. This can involve standardizing data, filtering out noise, and deriving additional features. Feedback Loops: AI models improve with feedback. Observability pipelines can continuously feedback anomalies that were actual issues, allowing models to refine their detection capabilities. It’s like having your own personal detective, helping reduce the time spent figuring out what went wrong. Automated Responses: Based on AI's real-time analysis, observability pipelines can automate certain corrective actions, while also giving suggestions for next steps such as filtering for further data reduction. End-to-End Insights: With AI analyzing data throughout the pipeline, organizations can get easy-to-understand insights that span from infrastructure health to user behavior analytics. This helps make our systems better and ultimately aids in user satisfaction. Even though AI is the popular buzzword, the integration of AI and observability is still in its infancy but is gaining traction. The two will form an even tighter bond as AI models become more sophisticated and observability data becomes richer. This evolution will help create more efficient and intelligent systems, enabling organizations to gain a great wealth of insights and stability as they continue their own growth and gain trust in artificial intelligence’s accuracy.

Exploring & Remediating Consumption Costs with Google Billing and BindPlane OP

Phil Cook — Thu, 17 Aug 2023 14:16:00 GMT

We’ve all been surprised by our cloud monitoring bill at one time or another. If you are a BindPlane OP customer ingesting Host Metrics into Google Cloud Monitoring, you may be wondering which metrics are impacting your bill the most. You may have metrics enabled that aren’t crucial to your business, driving unnecessary costs. How do we verify that and remediate? Using Metrics Explorer and BindPlane OP, you can find metrics that are using the most consumption and turn them off in your BindPlane OP configuration to save on your cloud bill. In this blog, we’ll show you how to do just that. Tracking down costs in Google Cloud Platform What are we being charged by Google? In the report below, we can see the daily cost. To run the report, perform the following steps: Go to the Google Cloud Console. Search for “reports” in the search bar at the top of the screen. Click on “Reports - Billing” Choose the appropriate billing account to view. After selecting the billing account, you’ll see a list of filters to apply on the right side. Under Month, choose “Last Month” or “Current Month”, or pick a custom range. Under “Group By” select “Service” Under “Projects,” choose the project where your metrics are routed. Under “Services,” select “Cloud Monitoring.” Now, you will have a view of all the monitoring charges from BindPlane OP metrics ingestion to Cloud Monitoring. In our example report, we can see that Google Cloud Monitoring expenses come to about ~$40 a day. In this report, we can perform a breakdown by service to what is driving the numbers. Nearly half of that is Google Cloud Monitoring expenses. ~$40 a day. Now, going to Google Cloud monitoring, we can track down what exact metrics are driving the costs. Search Metrics Explorer in the search bar and select it. Select the appropriate project in the top left. Under “Select a Metric” type in “Billing” In the dropdown choose Global > Billing > Metric bytes ingested. The metrics bytes ingested will have consumption used by different metrics namespaces within cloud operations. 1) Delete the Group by filter to allow for looking at individual metric namespaces. 2) Use the table view and sort by b/s value to find the metric that is using the most consumption. 3) You can trace this to the host where this data is coming from, by looking at the node_id or hostname of the metrics. Remediation in BindPlane OP. Now that you know which host metric is driving the most consumption, you can find the agent in BindPlane that matches that hostname, and turn it off to see the impact. 1. Click on the agent in question from the agents tab. 2. Click on the associated configuration. 3. Edit a New Configuration. 4. Select the Host Source. 5. Deselect “process.cpu.utilization. (see image) 6. Click Save 7. Click rollout to apply those changes to the agent. After making these changes, you should notice a drop in your daily consumption, and associated charges in your cloud bill. For questions, requests, and suggestions, reach out to us or join our community slack channel.

Splashing into Data Lakes: The Reservoir of Observability

JJ Jeffries — Fri, 11 Aug 2023 15:52:00 GMT

If you're a systems engineer, SRE, or just someone with a love for tech buzzwords, you've likely heard about "data lakes." Before we dive deep into this concept, let's debunk the illusion: there aren't any floaties or actual lakes involved! Instead, imagine a vast reservoir where you store loads and loads of raw data in its natural format. Now, pair this with the idea of observability and telemetry pipelines, and we have ourselves an engaging topic. What's a Data Lake? A Data Lake is a centralized repository that allows you to store structured and unstructured data at any scale. Imagine dumping everything from logs, traces, and metrics into a massive container. No need for defining structures beforehand; just send the data in. It's like storing water from different sources (rivers, streams, rain) into one vast lake. Observability – Seeing Beyond the Surface Observability isn't just about monitoring. It’s the art and science of understanding the state of your system by looking at its outputs. It’s the magical power of saying, “Ah! This error happened because of that misconfigured server!” In the vast ocean of data, how do we make sense of it all? That's where observability pipelines come in! Related Content: When Two Worlds Collide: AI and Observability Pipelines Observability Pipelines – The BindPlane Canals of Insight Think of observability pipelines as intricate canal systems. They channel water (or in our case, data) from the lake, filter out impurities, and guide it smoothly to the places it's needed the most. An observability pipeline takes raw, unstructured data, processes it, and then sends it off to monitoring tools, dashboards, or alerting systems. Here's how Data Lakes make observability pipelines even more powerful: Volume & Variety: Data lakes can store massive amounts of data. So, whether you're collecting logs from a new service or tracing data from a legacy system, there's always room in the lake. Agility: Need to modify or introduce a new data source? With a data lake, you don't need to re-architect everything. Just introduce your new data; your pipelines can adapt to pull from it. Advanced Analysis: Because all the data resides together, you can use advanced analytics and machine learning to derive more profound insights. Want to predict when a particular service might fail? Dive into the lake of past data and let the algorithms swim! Cost-Efficient: Storage solutions for data lakes are typically designed to be scalable and cost-effective. So you’re not breaking the bank while trying to get a clearer picture of your systems. Related Content: Maximizing ROI By Reducing Cost of Downstream Observability Platforms With BindPlane OP Making Waves with Data Lakes In the rapidly evolving tech environment, the need to understand our systems in real time has never been more crucial. But as we all know, with great power (or data) comes great responsibility. Using a data lake coupled with observability pipelines ensures that your data is stored efficiently and working hard to give you the insights you need. So the next time someone mentions "data lakes", envision this vast reservoir of insights, ready to be tapped. Whether you're troubleshooting a tricky issue or trying to optimize system performance, remember that the answer might just be lurking beneath the surface. For questions, requests, and suggestions, reach out to us or join our community slack channel.

Integrating BindPlane Into Your Splunk Environment

Dylan Myers — Sat, 05 Aug 2023 18:52:00 GMT

Part 1 of 3: Connecting The Pieces Preface Splunk is a popular logging, and in the case of Splunk Cloud, it is also a metrics platform. The BindPlane Agent can integrate with Splunk; both for incoming telemetry to a Splunk Indexer and outgoing telemetry from a Splunk Forwarder. By integrating in this manner, telemetry not natively supported by Splunk can be sent in, and going the other way, the telemetry can be sent to other platforms. Prerequisites BindPlane OP & a BindPlane Agent (Custom OpenTelemetry Collector) Splunk Ecosystem Splunk Universal Forwarder (sending data into the BP Agent) Splunk Heavy Forwarder (sending data into or accepting data from the BP Agent) Splunk Indexer (accepting data from the BP Agent directly or via a Heavy Forwarder) Plan the Architecture For this blog, I will use a Splunk Universal forwarder to send data to the BindPlane Agent. The Agent will then send that data to a Splunk Heavy Forwarder and Google Cloud Logging. The Splunk Heavy Forwarder then sends the data to the Splunk Indexer. Configuring the Universal Forwarder By default, the Splunk Universal Forwarder (UF) sends data over TCP in Splunk’s proprietary Splunk to Splunk (S2S) protocol. To allow the BindPlane Agent to receive data from the UF, it will need to be sent in a raw format instead. This is accomplished by creating a Splunk output configuration stanza that disables the S2S protocol by setting the parameter sendCookedData to false. Add this stanza to the tcpout defaultGroup. Configuring the Heavy Forwarder On the Splunk Heavy Forwarder (HF), a Splunk HTTP Event Collector (HEC) data input needs to be created. This is how the BindPlane Agent will send data back into the Splunk ecosystem. Additionally, HEC needs to be enabled under Global Settings. In the screenshots below, there are references to OTEL/otel. This is the BindPlane Agent, which is an OpenTelemetry collector. BindPlane Configuration In BindPlane, we create a configuration for the agent that matches the parameters specified in the UF and HF. This configuration will have a TCP source that matches the port specified in the UF tcpout:bindplane stanza, and a destination that matches the HF HEC data input. Data Flowing Now that everything is configured correctly, data should flow through the BindPlane Agent. We can see this data flow on the topology view on the agent page in BindPlane. Additionally, the data can be viewed on the Splunk Indexer. The data is also being replicated to Google Cloud Logging, which we will dive into in part 2 of this series. Conclusion With the proper configuration in place, data is actively flowing through the BindPlane Agent. This integration gives great flexibility in data input and extraction. In part 2 of this series, additional sources will be implemented into the pipeline; which can be sent into the Splunk ecosystem. Additionally, data duplication for Google Cloud Logging will be examined. Part 3 of the series will look at deeper use cases of the integration, including breaking Splunk vendor lock, data retention compliance, and more. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com or join our community Slack Channel.

Integrating BindPlane Into Your Splunk Environment (Part 2)

Dylan Myers — Sat, 05 Aug 2023 14:10:00 GMT

Part 2 of 3: Other Sources & Destinations Preface It can often be challenging to collect data into a monitoring environment that does not natively support that data source. Bindplane can help solve this problem. As the Bindplane Agent is based on OpenTelemetry (and is also as freeform as possible), one can bring in data from disparate sources that the Splunk Universal Forwarder does not easily support. Prerequisites The environment built in Part 1 Additional data sources For the blog, I will be using /var/log/messages as an additional data source. This source could be added to the Splunk UF, but it is easier to collect it directly. Logs in /var/log often require creating custom source types or downloading community Apps/TA. New Source In BindPlane In Bindplane, we want to add a new source to our configuration. This will be a File source. The following configuration values need to be set: File(s): /var/log/messages Log Type: var_log_messages (This is optional) Parse Format: regex Regex Pattern: (?P\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(?P[^\s]+)\s(?P[^\[]+)\[(?P\d+)\]:\s(?P.+) Parse Timestamp Checkbox: Checked Timestamp Field: timestamp Timestamp Format: Manual Timestamp Layout: %b %e %T 3-letter month abbreviation (%b) Space padded day of month (%e) HH:MM:SS (%T) Timezone set to the tz of the server; for me this is: America/Detroit Save this new source, and click “Start Rollout” to apply it to the agent(s) Data Flowing We can see our updated topology view’s data flow diagram with the new source in. For consistency with the Splunk source metadata, I added a processor to add a new body field called `entry_type` set to `LinuxSystemMessages.` I extract this field on the Splunk side for easy searches. In Splunk, this will look like so: The same search in Google Cloud Logging will look like this: Conclusion Using a Bindplane Agent to collect log data, virtually any logs can be sent to Splunk. Sending these logs to Google Cloud Logging or any other supported platform can also satisfy different use cases. It can also be used when moving from Splunk to another platform or vice versa. Allowing you to, for a time, send data to both platforms. This aids transition by overlapping the two platforms and allowing you to make sure the new platform’s capabilities match or exceed the ones you are leaving. Breaking vendor lock is one of the topics we will examine in part 3 as we continue to build on our environment. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com or join our community Slack Channel.

How to Remove Fields with Empty Values From Your Logs

Ryan Goins — Sat, 29 Jul 2023 16:18:00 GMT

Much of the log data we handle doesn't offer substantial insight and can be conveniently removed from your logs, helping us reduce costs. What may seem like a small adjustment, like deleting an attribute, can have significant implications when scaled up. A typical case involves fields in your logs presenting empty values or housing data considered irrelevant. Below, we’ll take a look at a few examples of what this looks like and how you can take action in BindPlane OP. Nginx The Nginx logs presented below contain several fields where the value is simply a "-". This is a common occurrence with Nginx, providing no substantial value. It's recommended to eliminate these fields. Postgres The Postgres logs displayed below have a 'role' and 'user' field, both of which are entirely empty. Here’s how you can remove these fields in BindPlane OP Incorporate the “Delete Empty Values” processor into your pipeline. Determine the types of values you want the processor to search for. By default, it targets null values, but it can also identify empty lists or maps. Advanced configuration: If there are any fields you don’t ever want to be removed, specify them as an exclusion. Specify strings that the processor should consider as empty. For instance, in the case of the Nginx logs above, we want to regard “-” as an empty string and should include it in this list. Use Live Preview to validate the processor is working as expected! That’s it! Check out the video tutorial below if you’d like to see this in action. For questions, requests, and suggestions, reach out to us or join our community slack channel.

Transforming Your Telemetry Has Never Been Easier

Ryan Goins — Sat, 22 Jul 2023 20:13:00 GMT

As the foundation of your observability stack, BindPlane OP provides great visibility into your telemetry data, all the way from collection to its final destination. With the introduction of Live Preview in BPOP Enterprise, and a brand new processor workflow, we’ve now made this even better. Live Preview Live Preview shows you changes to your telemetry in real-time as you’re working with your data. Add fields to your logs, mask sensitive data, delete unnecessary attributes and instantly validate those changes had the intended effect. With Live Preview, you now have the power to test and experiment with your data before rolling out changes to your fleet of agents. Nice. Updated Processor Workflow Adding and editing processors has been completely re-imagined, making it easier than ever to make and preview changes to your telemetry. Here’s what’s new: We’ve added Snapshots to the left side of the workflow, giving you access to your telemetry exactly when you need it most. Snapshots has been enhanced to provide you with a view of data at that specific point in your pipeline. Previously we only showed you a sample of data as it exited the pipeline. This gives you more control and a deeper understanding of your telemetry. For BPOP Enterprise customers, Live Preview is now on the right side of the workflow giving you instant feedback as you work with the processors in your pipeline. Here’s a quick demo of these great new features in action! For questions, requests, and suggestions, reach out to us or join our community slack channel.

Super Mario World Meets Observability Pipelines: Unlocking the Power of Data with BindPlane

JJ Jeffries — Sat, 15 Jul 2023 18:25:00 GMT

You may be thinking, “What on earth is an observability pipeline? And how does the Super Mario World fit in?” Hold on to your hat, my friend, because you’re about to find out! What’s an Observability Pipeline? First things first, we need to lay down some basics. In the most (oversimplified) sense, an observability pipeline transports your data from source to destination just like how Mario gets from place to place. It’s a digital conduit through which the data about your software system’s behavior flows, carrying vital information about your IT ecosystem's overall health and performance. Now, this isn’t just about knowing if your system is working or not. It’s about how well it’s working, why it’s not working when it’s not, and whether it might be considering a career change to become a professional gamer in the near future. Observability pipelines deal with three types of telemetry data: logs (your system’s personal diary), metrics (the health meter for your software), and traces (the digital breadcrumbs Mario and Luigi wish they had). An effective observability pipeline will harvest all of this data, mull it over, and then provide actionable insights about your system’s performance. Related Content: Understanding Observability: The Key to Effective System Monitoring But Why is this Important? “But why is this so important?” you ask. Imagine throwing a party in the Mushroom Kingdom without knowing you’re about to be invaded. The guests are arriving (user traffic), the music is pumping (system processes), and in the dark corners, Bowser (usually a bug or issue) is making a mess. Observability is like having Mario, a mushroom or fireflower, and supporting crew all in one. It allows you to find and clean up those messes before your guests notice, and the party turns into a complete disaster when Princess Peach is kidnapped. The Super Star BindPlane Now, let’s talk about a little superstar in the observability world, our product, BindPlane. It’s kind of like the Konami Code of the IT world, collecting performance metrics, log data, and all sorts of useful information. This bad boy doesn’t discriminate – it’s as comfortable hanging out with your on-premises tech as it is with your fancy cloud-based applications. BindPlane then delivers all that valuable data to your favorite observability or monitoring platforms in a normalized and enriched format, making it easy to understand. It’s like the warp zone of data – it doesn’t just take you to the next stage; it presents you with several pipelines of insights ready to take you to the level of your choice! And the flag on top? With BindPlane, you don’t have to worry about compatibility issues because this master of adaptability plays nicely with almost any platform. No more digital Thwomps and compatibility Chain Chomps. Just smooth, seamless data delivery. Related Content: Configuration Management in BindPlane OP Conclusion So, in the end, what’s an observability pipeline? It’s your software system’s cheat code, helping it stay healthy, efficient, and ready for whatever digital Koopa shell the world throws at it. And BindPlane? Well, that’s your warp zone in the world of observability, which gives you choice and control. So there you have it, folks, your crash course in observability pipelines and how BindPlane takes them to a whole new level. Now, go forth and observe! Here we gooooo! For questions, requests, and suggestions, reach out to us or join our community slack channel.

How we simplified our React components using Apollo Client and JavaScript Classes

Dave Vanlaningham — Fri, 09 Jun 2023 15:22:26 GMT

With the release of BindPlane 1.15.0, observe has introduced a new rollout flow. Now, a user can apply configuration changes to agents in a safer and more observable way. This new feature posed some interesting challenges on the front end, necessitating creative programming to keep our front end simple, readable, and well-tested. The Problem Our ConfigurationEditor component lies on a single Configuration page. This component is at the core of functionality for BindPlane OP, allowing users to view their current Configuration version, edit a new One, and inspect historical versions. This component controls which tab a user sees and the selected telemetry pipeline (i.e., Logs, Metrics, or Traces). Determining the initial state of this component isn’t straightforward, as it depends on data we’ve fetched from our server via the Apollo GraphQL client. Below is a simplified version of what this logic might look like. Alright, what’s going on here? We: Determine several variables (e.g., newVersion) based on the data we receive. We used a useEffect hook to set our stateful variables when those variables change. It works, but we see some issues right away: It’s hard to test. This will require a lot of mocked GraphQL responses to make sure we’re displaying the correct state. It’s hard to read. This maze of .find and data?. is sure to be glossed over. When code isn’t read, it isn’t understood and is more likely to be broken. We’re unsure if our data is defined or not. It’s not clear that our data variable has returned from the request. It’s harder to use attributes on this data in sub-components. The Solution We will clean this up, make it more readable and testable, and be sure that our data is defined. Define a Data Class At observIQ, we’re a big Go shop. One of the most powerful paradigms with Go is the ability to define methods on data structures. We can similarly do this in JavaScript by using classes. Check out this helper class. What did we do? We defined a class that implements our GetConfigurationVersionsQuery type. That is, we now have a class with all the fields of our query, and we can add some functions to help us work with the data. We are constructing the class with a data argument that must be defined. We can be sure that this data is present and ok to work with. For example, we can add a helper class to find our newVersion. Why is this better? This class is easily unit-testable. We can test that we correctly determine these versions based upon our data rather than a mocked response in a component test. It reduces lines and logic in our component, which we want to keep simple and readable. Let’s rewrite our component using these helper functions. Hey! That’s looking a lot better. We got rid of an entire block of logic inside the components render cycle and put everything in our useEffect. However, we can still make further improvements. Change out our useEffect for the onCompleted callback React’s useEffect hook is a powerful tool to update our state based on changing variables. However, it’s not quite right in this case, as can be seen by the smelly if statement: Instead, let’s use the handy onCompleted callback available in our query. Something like this: What have we done? We created a new stateful variable that contains our VersionsData class. We are setting it based on data we receive in onCompleted. We took all the logic from our useEffect and placed it in onCompleted. Why is this better? We know data is defined. This onCompleted callback requires that our data has returned without error. We only do this logic once. We only determine the initial state when our data comes in – not on first render. Summary By utilizing Javascript classes and the onCompleted callback, we have taken our front-end logic out of the component. React components are easiest to understand when they contain only their React-y things, like stateful variables and handler functions. Sometimes complex logic in the front end is unavoidable – but we found this pattern incredibly beneficial in simplifying our React components, improving readability, and enhancing testability.

A Step-by-Step Guide to Standardizing Telemetry with the BindPlane Observability Pipeline

Ryan Goins — Sat, 03 Jun 2023 13:32:00 GMT

Adding additional attributes to your telemetry provides valuable context to your observability pipeline and enhances the flexibility and precision of your data operations. Consider, for example, the need to route data from specific geographical locations, like the EU, to a designated destination. You can seamlessly achieve this with a ‘Location’ attribute added to your logs. Additionally, attributes can filter data, improve data classification, and aid in troubleshooting by providing enriched information about data sources. Attributes can tag data with specific server types, differentiate between production and development environments, or highlight high-priority data sources, improving data management efficiency. In this article, we’ll guide you through the process of leveraging these benefits by adding custom attributes to your telemetry data with BindPlane OP. Step 1. Identify the Attributes You Wish to Add In our example, we aim to incorporate an attribute called ‘Location’ with its value set as ‘EU’. Step 2. Add the Processor You can start by navigating to the configuration page and clicking on a processor node in your pipeline. Processors offer the flexibility to be integrated either immediately after a source or just before a destination, depending on the nature of the data you wish to affect. You can choose a location within the pipeline that best aligns with your requirements. Next, select ‘Add Processor’ and then ‘Add Attribute’. Step 3. Configure the Processor This step involves deciding on the types of telemetry to which you’d like to append this attribute—options include logs, metrics, and traces. BindPlane OP supports three actions: Upsert, Insert, and Update. In this instance, we will use ‘Upsert’. Input ‘Location’ as the Key and ‘EU’ as the value. Remember, this step is entirely customizable – you can add as many attributes as you want. Step 4. Validate the Processor To confirm the successful addition of your new attribute, use the Snapshots feature to inspect your data stream. Check if the new attribute, ‘Location’, has been appropriately incorporated into your telemetry. Conclusion Following these steps, you can easily standardize and enrich your telemetry using BindPlane OP, leading to more insightful and efficient data management. Enriched telemetry allows for better routing, filtering, classification, and troubleshooting, providing more control over your observability pipeline. Moreover, this standardization process can empower your team to quickly pinpoint key insights from your data, streamlining your decision-making process and enhancing your overall operational efficiency. In the future, as you continue to scale your operations and encounter more complex data environments, the ability to add attributes in BindPlane OP can become an essential tool. By embracing this level of customization and control, you can ensure your telemetry data is always primed to provide the most relevant, actionable insights for your evolving needs.

Maximizing ROI By Reducing Cost of Downstream Observability Platforms With BindPlane OP

Phil Cook — Sat, 27 May 2023 15:50:00 GMT

When engaging with potential customers, we are often asked, “How can we reduce spend on our observability platform like Splunk or Data Dog and simultaneously justify the cost of BindPlane OP?” Let’s dive in and see how the powerful capabilities of BindPlane OP can reduce your total ingest, and get a positive ROI on your BindPlane OP investment. How Do We Define ROI? Downstream observability platforms can be very expensive. Often, huge sums of raw log data are ingested into tools like Splunk and then indexed & analyzed later on. While it would seem like having all that data at your fingertips is great, we’ve found that there is a lot of noise and erroneous information ingested into these platforms. The important and useful data that organizations depend on is often a fraction of what is ingested in total. BindPlane OP can maximize your Return on Investment by reducing, parsing, and filtering this raw data upstream, before making its way to your favorite flavor of observability platform. What ROI Factors Do We Consider? Analyze overall spend of your observability platform. This includes: Licensing costs Ingestion volume Infrastructure costs Projected growth of data volume Year-over-Year Contractual obligations + vendor lock-in Tactics We Use to Reduce Observability Costs Filtering signal from noise Dropping data you don’t need Routing to low-cost storage Converting Logs to Metrics Removing Duplicate logs Saving costs with aggregator nodes Let’s look at how we implement each of these tactics in BindPlane OP. Filtering Using a processor, you can reduce the amount of trivial logs by filtering out logs you don’t care about. Let’s say you don’t care about any logs that have a severity below ERROR. You can add a new “Severity Filter” processor to your pipeline and set the severity to ERROR. It will automatically filter out all lower severity types. Dropping Logs Using the “Log Record Attribute Filter” processor, you can reduce noise by excluding any logs that match a certain key-value pair. You can use either a strict [exact] match or regular expressions. For example, let’s say that you are currently ingesting a syslog stream from an on-premise firewall, but you don’t care about entries containing a specific IP address. If you know the key=value pair that exists in the log stream, you can filter it out easily, saving on ingest costs. Routing Let’s say you are currently ingesting raw log data, but you want the ability to take those logs and instead send them to a cheaper option like Google Cloud Storage. It’s also common for different teams to have different operational tools. From a single agent, we can consolidate your collection footprint. Where the observIQ BindPlane agent can send to New Relic for APM Traces, Google Cloud Operations for SRE Metrics, and Splunk for SIEM use cases. Logs to Metrics Conversion Converting a series of raw logs to time series metrics allows you to filter out a mass amount of log data and turn them into dashboard data points you care about. Using the Extract Metric processor, you can use Expr language to extract a certain matching field from a log entry and convert it into a metric. Let’s say you are streaming a large set of logs from an application, with a string showing when a node is unhealthy. You can use the Extract Metric processor to match that status from the log entry, convert it into a metric, and send that metric to your observability platform for analysis. In some circumstances, this can reduce the logs ingested from that stream by 90%. Deduplication As stated earlier, there is often a lot of noise sent downstream that is unnecessary. Think of all those ‘200 OK’ messages in your HTTP logs. Let’s get rid of those duplicate entries and others by using the “Deduplicate Logs” processor. This processor will check every 10 seconds for duplicate log entries and remove them. For example, if you have 500 duplicate entries within that 10-second window, the processor will remove all of those duplicates and send only one entry to the destination. As you can imagine, the potential for cost savings here is huge. Saving Costs with Aggregator Nodes Replacing hundreds of vendor agents can be costly and time-consuming, and with BindPlane OP, you don’t have to. With our approach, you can leave your existing agents in place and point them to a fleet of aggregator nodes that unlock the full capabilities of BindPlane OP. In addition, these aggregators provide a secure way of passing along the data at the edge to the destination, rather than giving each agent access to the internet or API through a firewall. Save costs by reducing your overall fleet footprint, computing overhead, and consolidating virtual infrastructure. Conclusion BindPlane OP unleashes the power of observability by reducing costs and maximizing ROI. From filtering out trivial logs to converting raw data into meaningful metrics, BindPlane OP empowers users to make every dollar count. By embracing our innovative observability pipeline, organizations can drive efficiency, enhance data quality, and unlock the true potential of their observability investments. If you could save on your observability platform costs by ingesting only the data you truly care about, would you do it? For our customers, it was a no-brainer. Find out what you could save by starting a conversation in our Slack community or by emailing sales@observiq.com. Learn more about BindPlane OP: BindPlane OP Overview: https://www.youtube.com/watch?v=Hrqvyz_CfuU BindPlane OP Enterprise Features: https://docs.bindplane.observiq.com/docs/bindplane-enterprise-edition-features Documentation Portal: https://docs.bindplane.observiq.com/docs GitHub Repo: https://github.com/observIQ/bindplane-op

Understanding Observability: The Key to Effective System Monitoring

JJ Jeffries — Fri, 19 May 2023 19:44:36 GMT

In the rapidly evolving landscape of modern tech, system reliability has become critical for businesses to succeed. To ensure the stability and performance of complex distributed systems, companies rely on observability—a concept that isn’t synonymous but goes beyond traditional monitoring approaches. In this blog post, we will explore observability, the differences between telemetry data of metrics, logs, and traces, and why observability pipelines are essential for complete visibility. What is Observability? As our CEO, Mike Kelly, defined with The Cube at KubeCon EU, “There are many answers to that question, but there’s a technical answer in that it’s the ability to know the state of a system.” Ultimately, one wants to gain insights/analysis into the internal workings of a system based on its external outputs. Unlike monitoring, which focuses on specific metrics or predefined events, observability aims to provide a complete understanding of the system’s state, behavior, and performance. It enables teams to identify issues proactively, troubleshoot problems, and make informed decisions to improve system reliability. Related Content: Monitoring vs Observability: What is Reality? Telemetry: Understanding the differences between Metrics, Logs, and Traces To achieve observability, it is crucial to clearly understand the different types of telemetry data that can be collected and analyzed. Now, there’s debate about other forms, but we’ll stick to the basics of metrics, logs, and traces: Metrics Metrics are quantitative measurements that provide insights into a system's behavior over time. They are typically numeric values representing a particular aspect of system performance, such as response time, error rate, or resource utilization. Metrics are essential for tracking trends, setting thresholds, and triggering alerts based on predefined conditions. Logs Logs are textual records that capture specific events and activities within a system. They provide detailed information about what happened when it happened, and potentially why it happened. Logs are valuable for troubleshooting issues, conducting post-incident analysis and auditing system activities. They often include timestamps, log levels, error messages, and contextual data. Traces Traces provide a way to visualize the flow of transactions or requests across a distributed system. They capture the sequence of interactions between various components and services, allowing teams to identify performance bottlenecks, latency issues, and dependencies. Traces are beneficial in microservices architectures, where understanding end-to-end request flows is crucial. Related Content: observIQ Earns Gartner® Nod for Cutting-Edge Observability Innovation The Importance of Observability Pipelines Organizations have to set up robust observability pipelines to harness the full power of observability. These pipelines are responsible for reducing, simplifying, standardizing, and helping organizations scale their telemetry data from different sources to one or multiple destinations. Below are three points as to why these pipelines are essential: Data Aggregation Data is growing exponentially, and observability pipelines gather telemetry data from various sources, including metrics, logs, and traces. By centralizing and standardizing this data, organizations can have a holistic view, all in the same format. Routing With the massive amounts of telemetry data collected, organizations can easily route to appropriate destinations based on business requirements. Whether it's for real-time analysis or storage for compliance reasons, being able to transport data is key Filtering A report from the European Commission suggested that up to 90% of the data collected within organizations is never analyzed or used strategically. With observability pipelines, companies can remove unnecessary data, sending what matters to different endpoints, reducing the amount being ingested, and ultimately saving on costs to SIEM solutions like Splunk. Conclusion In conclusion, observability is a game-changer, offering a holistic understanding of system behavior, proactive incident response, and faster problem resolution. By implementing robust observability pipelines and leveraging the power of telemetry data, organizations can enhance system reliability, mitigate risks, and ultimately deliver exceptional user experiences in today’s digital landscape. Embracing observability is no longer an option but a necessity for companies seeking to thrive in an increasingly interconnected and complex world.

How to Reduce the Volume of NGINX Logs

Ryan Goins — Fri, 12 May 2023 18:21:21 GMT

If you’ve worked with NGINX web servers, you know they’re efficient but can generate a lot of log data. While this data is valuable, sorting through it can be a challenge, and the storage and processing costs can quickly add up. This is where BindPlane OP comes in. It helps reduce log volume while still preserving the crucial information. It streamlines your data, filters out the irrelevant bits, and zeroes in on key data points, helping manage storage and keep costs under control. In this post, we’ll guide you through refining an NGINX log data stream using BindPlane OP. We’ll dive into how to extract valuable metrics and reduce log volume by filtering out unnecessary logs. By the end of this, you’ll be able to navigate your log analysis process more efficiently, saving time and money. Creating Metrics From Logs The first step in taking control of your NGINX log data stream is to squeeze out some value from those logs by crafting meaningful metrics. That’s where the Count Telemetry processor in BindPlane OP comes into play. This processor counts the number of logs that meet a certain condition and generates a new metric. This means you keep the crucial info but can let go of some logs from the pipeline. Configuring the Count Telemetry Processor Getting the most out of the Count Telemetry processor means setting it up right. We’re going to make two unique metrics. Here’s how: The first is for counting logs dimensioned by http status code: Click a processor node in your pipeline Add Processor Count Telemetry Enable Logs Configure the processor as follows: Match Expression: attributes.http_request_status != nil The match expression defines the logs you’d like to match against (and therefore count). In this example we want all logs that have an http_request_status Metric Name: nginx.requests.status Enable Attributes: Key: code Value: attributes.http_request_status Collection Interval: 60 If you’ve done this correctly, here’s what the resulting metric will look like: The second is for counting logs and dimensioned by the the path of the request: Repeat steps 1-4 above Configure the processor as follows: Match Expression: true Metric Name: nginx.requests.path Enable Attributes: Key: path Value: attributes.http_request_requestUrl Collection Interval: 60 Reducing Log Volume by Dropping Health Check Logs: After we’ve pulled metrics from our logs, we can trim down our data stream even more by cutting the log volume. A great way to do this is by dropping logs for paths you don’t find valuable. In this example we’re going to drop health checks, but you should adopt this to fit your needs. This step can lead to a substantial reduction in log volume. Configuring a Processor to Exclude Health Check Logs: To implement this, we’ll configure a new processor using the Log Record Attribute Filter. Click a processor node in your pipeline Add Processor Log Record Attribute Filter Action: Exclude This will exclude the logs the processor matches Match Type: Strict Attributes: Key: http_request_requestUrl Value: /health In our example, this process results in a 14% reduction in log volume, making our log analysis process more manageable and efficient. By following the steps outlined in this guide, you’ll be well-equipped to manage and refine your NGINX log data stream using BindPlane OP, leading to better insights, improved system performance, and significant cost savings. Get started today by installing BindPlane OP and joining our Slack community where we can help you start reducing your telemetry data.

Deciphering Complex Logs With Regex Using BindPlane OP and OpenTelemetry

Dylan Myers — Thu, 04 May 2023 21:26:07 GMT

Preface Parsing logs with regex is a valuable technique for extracting essential information from large volumes of log data. By employing this method, one can effectively identify patterns, errors, and other key insights, ultimately streamlining log analysis and enhancing system performance. Prerequisites BindPlane OP & a BindPlane Agent (Custom OpenTelemetry Collector) A complex log file needing custom parsing Knowledge of Regex A selection of log samples that match all possible variations. (Optional) A good regex testing tool such as regex101.com Here is a link to regex101 for the examples from the blog: https://regex101.com/r/6hhy6K/4 Complex Log Data Samples In this post, we’ll examine log entries that resemble the examples provided below. By utilizing a script to write these entries to a file with the current timestamps, we can effectively work with real-time data. Dissecting The Data We can now take the first log entry above, and start dissecting it into sections that we wish to parse out. First, we’ll notice that we have two timestamps: The second timestamp is the one we will preserve to become our official timestamp, because it contains more information (timezone and year are useful, while the day of week isn’t really) that we can use to achieve the highest precision. Breaking this down, we will write a non-capturing pattern to match the first timestamp. ^\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s+The caret “^” anchors to the start of the line. This is followed by “\w{3}”, which captures the 3 letter month abbreviation. After the month, is “\s\d{2}\s”, which is to capture a space; the 2 digit, 0 padded day of the month; and another space. Finally, we have “\d{2}:\d{2}:\d{2}\s+” – for 2 digit hour, 2 digit minute, 2 digit second, and 1 or more spaces. For the second timestamp, we want a named capture group. This will become a named field in the JSON blob of parsed out fields. (?P\w{3}\s\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s\w{3}\s\d{4}) We’ve named this capture group “timestamp”. It contains the same basic regex as the other timestamp, with the addition of “\w{3}\s” at the start to capture the abbreviated day of the week, and “\s\w{3}\s\d{4}” replacing the “\s+” at the end in order to capture the 3 character timezone, and the 4 digit year. Going further into the log message, we will want to parse out the hostname and the system: In this message, our hostname is loggen-app10, and our system is test-system[712]. I was not told what the [712] was when I received these logs. I made the assumption that it is the PID (process ID), but I chose not to parse it out separately for now. Parsing these fields is fairly simple, and we end up with: “(?P[^\s]+)\s+(?P.*?):\s+”. We have a pair of named capture groups, hostname and system. The pattern for hostname is “[^\s]+”, which says capture any non-space character and capture as many of them as you can (greedy). This is followed by “\s+”, which captures at least one, but as many as possible (greedy again) space(s). The capture group for system is even easier, because after the space(s) we capture everything up to a colon character. To do this, we use “.*?”. What that pattern says is, capture any character 0 or more times, but don’t be greedy. After that, we have the colon character and another 1 or more spaces greedy. These aren’t captured, but are needed to pad out between this section and the timestamp section we wrote above. This results in the following starting pattern: ^\w{3}\s*\d{2}\s*\d{2}:\d{2}:\d{2}\s+(?P[^\s]+)\s+(?P.*?):\s+(?P\w{3}\s\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s\w{3}\s\d{4}) I won’t go through the entire pattern creation process, but I continue to chunk it up as I did above. The resulting final pattern is: ^\w{3}\s*\d{2}\s*\d{2}:\d{2}:\d{2}\s+(?P[^\s]+)\s+(?P.*?):\s+(?P\w{3}\s\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s\w{3}\s\d{4})\|(?P\w*)\|((?P.*?):\s+)?\[?(?P.*)\]?\|(?P.*?)\|(?P.*?)\|(?P[\d\.:]*)\|(?P[\d\.:]*) This final pattern includes the following named capture groups, which become fields in our JSON blob of parsed data: hostname system timestamp app_name message_type message event_message username external_ip internal_ip Implementing The Regex In BindPlane, I create a File source. This looks at my generated log file in /root/complex.log. I’ve selected regex under the Parse Format. Under the Regex Pattern, I put in the final pattern above. I’ve checked the box for Parse Timestamp, chosen Manual for the format, and put in the ctime parsing codes for my timestamp’s pattern. Once done, it looks like this: Sending & Verifying The Data To complete the testing, I need to create a destination and check the data there. For my use case, I’ve chosen a Google Cloud Logging destination. Once my pipeline configuration is complete, I attach it to an agent. After it has run for a few moments, I click the “View Recent Telemetry Button” on the agent’s page. The telemetry view shows me the following parsed log: Finally, I check it on the Google Cloud Logging console as well: This displays the same log entry, and it has a jsonPayload of our body’s JSON map object from the recent telemetry view. Next Steps For next steps, I would want to look at parsing that message value. It is frequently a key/value set; as it is in the screenshots & samples above. I could pass the data onward to a processor that parses these key/value entries into another layer of JSON. In the above example, body.message would get parsed back into itself, and you could have fields such as: body.message.result=Service Access Granted body.message.service=https://innosoftfusiongo.com/sso/logi… body.message.principal=SimplePrincipal(id=dawsonb, attributes={mail=[dawson.branden@fakeuni.edu], eduPersonAffiliation=[Staff], ou=[Recreation/Student Rec Center], givenName=[Dawson], cn=[Dawson Branden], title=[Asst. Director], employeeNumber=[5000000], o=[Vice ChancellorStudent Affairs], fakeuniOrg=[Vice ChancellorStudent Affairs], casMFARequired=[YES], uid=[dawsonb], eduPersonPrimaryAffiliation=[Staff], fakeuniCid=[5000000], fakeuniSeparationDate=[99991231000000Z], UDC_IDENTIFIER=[dawsonb], sn=[Branden], organizationalStatus=[Staff]}) body.message.requiredAttributes=”” Even this could be parsed further by putting a body.message.principle through a key/value parser as well. Now, someone is bound to wonder, “Why didn’t you just use regex parsing of the body.message subfields as well?” The answer: It is too inconsistent. The regex would be incredibly, and unreasonably, complex when we have the capability to parse key/value pairs already. Conclusion Many forms of data can be found in log files. This data often needs to be parsed to make it both more easily readable for humans and easier for automation and tooling later in the chain to act upon. While the example I worked with was performed on a simple file log, the techniques herein can be used on any log stream. In addition to regex parsing, BindPlane also supports json, xml, key/value pairs, and character-separated values. With the use of processors, these parsers can be chained together to parse embedded data and manipulate it all into a usable format.

Reducing Log Volume with Log-based Metrics

Josh Williams — Sat, 29 Apr 2023 15:46:54 GMT

As the amount of telemetry being collected continues to grow exponentially, businesses are continuously seeking cost-effective ways to monitor and analyze their systems. Data collection and monitoring can be expensive, especially when dealing with large volumes of logs. One approach to maintaining visibility while reducing the amount of data collected is through creating log-based metrics. However, traditional platforms that offer this capability often perform the computation at the platform level, which still incurs storage costs for both logs and metrics. To address this issue, BindPlane OP performs the metric computation at the edge, allowing users to reduce costs and gain greater control over their data. In this blog post, we’ll explore the concept of log-based metrics, the power of edge-based processing, and the benefits this approach brings to data collection and monitoring. Understanding Log-based Metrics Log Count The first approach to creating log-based metrics involves counting the number of logs over a specified time interval and generating a metric dimensioned by attributes present in those logs. This method allows users to condense large volumes of logs into meaningful and actionable metrics. Let’s use the example of access logs to illustrate this concept. By counting the logs over an interval, we can create a metric called “http.request.count.” We can then dimension this metric by the different status codes present in the access logs. This would enable users to keep track of the frequency of HTTP requests with specific status codes. For instance, users could set up alerts when the “http.request.count” metric surpasses a certain threshold for 4xx and 5xx status codes, indicating an issue in the system that requires attention. By utilizing this method, users can reduce the amount of data collected while still maintaining visibility into their systems, leading to more efficient monitoring and quicker issue identification. Log Extraction The second approach to creating log-based metrics involves extracting numerical values from logs and using these values to generate metrics. This method allows users to derive deeper insights from their logs by visualizing and analyzing the numerical data contained within them. Using the access logs example again, we can extract the average duration of a request and create a metric based on this value. This metric would provide users with an understanding of the performance of their system in terms of request durations. By analyzing and visualizing this metric over time, users can identify patterns, trends, and potential bottlenecks within their system. The need for a cost-effective solution As we discussed, platforms that offer log-based metrics computation perform these calculations at the platform level. This means that customers are still paying for the logs they send to the platform, as well as for the metrics they create from those logs. As a result, this approach can become quite costly, particularly when dealing with large-scale systems and high volumes of data. To address this issue, a more cost-effective solution is required—one that enables users to maintain visibility into their systems while reducing data collection costs. Our solution BindPlane OP Enterprise includes two different processors that can create log-based metrics at the edge before sending to your destinations. This allows our customers to perform costly calculations from within their observability pipeline, rather than paying for this computation and storage at the platform level. The first processor, called the Count Telemetry processor, provides the ability to count all three types of telemetry (logs, metrics, and traces). Typically used for counting logs, it can either count the number of logs passing through it, regardless of content, or create individual counts for dimensions specified by the user. For example, with access logs, we would likely specify the “status_code” and “endpoint” attributes as our dimensions. In contrast, with health check logs, we would likely avoid dimensioning altogether, as these are typically repetitive. The second processor, called the Extract Metric processor, enables users to extract a numerical value from any field on a log. The resulting metric is highly configurable, allowing users to specify the name, units, and extracted dimensions. In the case of access logs, this means we could extract the duration field from a log and convert it into a “request.duration” metric. We could then specify ms as the units and even dimension this metric based on the “endpoint” attribute of the log. Using this setup, we can now easily pinpoint which endpoints have the longest request durations. Implementing log-based metrics on the edge offers several advantages that can significantly enhance the data collection and monitoring experience for users. To recap, here are the main benefits: Reduced costs: Users can drop unnecessary logs before they reach the platform and compress the value of that data within a metric. Enhanced flexibility: Users are empowered to flexibly route the bulk of their logs to more cost effective locations, while sending their metrics to a more comprehensive solution with alerting and analytics. Increased visualization: Users can visualize and extract value from their logs, even in platforms without robust logging capabilities. Get started today by installing BindPlane OP and joining our Slack community where we can help you start reducing your telemetry data. By leveraging log-based metrics, you can unlock cost-effective monitoring and make more informed decisions for your business.

How to Mask Sensitive Data in Logs with BindPlane OP Enterprise

Ryan Goins — Fri, 21 Apr 2023 19:12:58 GMT

Logs often contain sensitive data, including personally identifiable information (PII) such as names, email addresses, and phone numbers. To maintain security and comply with data protection regulations, it’s crucial to mask this data before storing it in your log analytics tool. BindPlane OP streamlines this process with the Mask Sensitive Data processor, ensuring your logs are safe and compliant. Step 1: Identify the Sensitive Data to Mask First, identify the sensitive data in your logs that you want to mask. Use the Snapshots feature in BindPlane OP Enterprise to examine the logs flowing through your pipeline: You can just navigate to an agent page. Click “View Recent Telemetry” in the top right corner. You can browse the logs for the PII data you want to mask. In our example, we’re looking for email addresses to conceal. Step 2: Add the Mask Sensitive Data Processor Once you have identified the sensitive data to mask, it’s time to add the Mask Sensitive Data processor to your pipeline: Return to the configuration page and click on a processor node in your pipeline. Processors can be added immediately after a source or before a destination, offering complete flexibility over the affected data. You can choose the location in the pipeline that best suits your needs. Click “Add Processor.” Select “Mask Sensitive Data.” Step 3: Configure the Processor By default, BindPlane OP includes rules to mask credit card numbers, email addresses, phone numbers, SSNs, and IP addresses. Customize the processor’s configuration to suit your specific requirements: Modify the rules by removing any unnecessary ones or adding custom rules using regex. Review the available masking options and choose the best fit for your needs. Click “Done,” followed by “Save.” Step 4: Validate the Masking Process After applying the Mask Sensitive Data processor to your pipeline, ensure that it is working correctly: Use Snapshots again to inspect the data stream. Verify that the sensitive data is being masked as intended. If successful, masked values should resemble the example below. Protecting sensitive data in logs is critical to data security and compliance. With BindPlane OP’s Mask Sensitive Data processor, you can quickly identify, configure, and validate the masking process, ensuring that your logs remain secure and compliant while providing valuable insights for your organization.

Tracing Services Using OTel and Jaeger

Nico Stewart — Wed, 12 Apr 2023 14:58:27 GMT

At observIQ, we use the OTel collector to collect host/container-level metrics and logs from our systems. However, to monitor our applications (APM) in more detail, we use the OTel SDK and instrumentation libraries. This post aims to provide a quick start to setting up tracing exporting to a local Jaeger instance. What are traces? A trace is a collection of spans, which represent the execution of a logical unit of work. For example, this could be an HTTP request to an API which requires the server to verify the request is authenticated and fetch data from a database; the trace could consist of two traces: verifying authentication and, if successful, querying the database. Each span can have labels/attributes added to it to provide more detail, for example, the ID of a record being requested or the number of events about to be dispatched. Because spans are organized inside a trace, attributes don’t need to be added to each one. A common pattern is creating a trace for each API request and a span for each method that relies on an external system or time-intensive computation. For more detailed explanations, I recommend reviewing OpenTelemetry’s documentation. How to configure Jaeger We’ll configure Jaeger to run locally as a container for simplicity's sake. Jaeger is now available through its web UI at localhost:16686, and we’ll export traces over the port 14250. How to configure exporter The first requirement is to set the TracerProvider using the SetTracerProvider method from OTel. For exporting to Jaeger, we’ll actually configure an OTLP exporter pointing to our collector running an OTLP receiver. This allows us to perform filtering in the collector and supports exporting to other destinations like Google Cloud Tracing or Zipkin without changing any code. The exact method for creating the exporter is in the NewOTLPExporter method, but this is a simpler version for exporting to the collector locally on port 4317: With TracerProvider set, any new traces created with otel.Tracer will automatically export to the collector. Instrumenting OTel provides some packages to automatically instrument popular HTTP libraries, such as gorilla/mux and gin-gonic/gin. Using these will create traces for each request, requiring no code changes. To add additional instrumentation, we need a Tracer instance. A simple pattern for this is to instantiate a named tracer at the package level and reuse it when needed. As a simple example, imagine we have an HTTP API that looks up orders by ID from our Redis database. You can find the full code being referenced here at GitHub. Start returns a new span, and a context containing the span. If the context already contained a span, this new span is created as the child of it. We ensure the span is marked as completed and collected by deferring the call to span.End(). We can set attributes on the span later, or when creating it. In addition to attributes, we can set the “status” of each span to indicate if an error happened. You could also set the status as successful, but to reduce lines of code we only explicitly set it for errors. RecordError differs from SetStatus in that it doesn’t modify the status of the span; it only records that an error occurred at the current time. Viewing traces Once the server is running, we can make some queries to create traces. I know the database has nothing with the key 23b, so I expect to see the trace reflect an error: Alternately, querying for the order 23 should be successful. These are obviously very basic examples, below is a real trace from BindPlane OP which shows the steps taken to process a message from a collector: Summary Tracing can give deeper insight into applications than just logs or metrics on their own. By using the OTel libraries and exporting traces as OTLP, we can collect traces for multiple destinations using the same code. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com or join our open-source community Slack Channel.

How to Monitor Cloudflare with OpenTelemetry

Sam DeHaan — Wed, 05 Apr 2023 01:01:30 GMT

With observIQ’s latest contributions to OpenTelemetry, you can now use free open source tools to easily monitor Cloudflare. The easiest way to use the latest OpenTelemetry tools is with observIQ’s distribution of the OpenTelemetry collector. You can find it here. In this blog, the Cloudflare receiver is configured to monitor logs locally with OTLP– you can use the receiver to ship logs to many popular analysis tools, including Google Cloud, New Relic, OTLP, Grafana, and more. What signals matter? Cloudflare is a web infrastructure company that provides a variety of services to websites and internet applications including content delivery, DDoS protection, SSL encryption, domain registration, and more. The receiver collects logs by accepting log uploads from a LogPush job configured via the Cloudflare API. https://developers.cloudflare.com/logs/about/ LogPush is only available to sites on a Cloudflare Enterprise Plan The receiver supports all of the datasets supported by LogPush jobs (http_requests, spectrum_events, firewall_events, nel_reports, dns_logs, for example), so whatever activity a user is looking for from Cloudflare is available. For example, you can monitor http_requests for insight into server error frequency, request throughput, request origin trends, etc. Installing the Receiver If you don’t already have an OpenTelemetry collector built with the latest Cloudflare receiver installed, we suggest using the observIQ OpenTelemetry Collector distro that includes the Cloudflare receiver (and many others). Installation is simple with our one-line installer. Come back to this blog after running the install command on your source. Configuring the Receiver Navigate to your OpenTelemetry configuration file. If you’re using the observIQ Collector, you’ll find it in one of the following location: /opt/observiq-otel-collector/config.yaml (Linux) C:Program FilesGoogleCloud OperationsOps Agentconfigconfig.yaml (Windows) Edit the configuration file to include the Cloudflare receiver, as shown in the example below. Detailed instructions for configuring Cloudflare monitoring can be found here on GitHub. A couple of items to keep in mind: Cloudflare requires that a LogPush endpoint supports HTTPS, so a fully valid (not self-signed) SSL certificate is absolutely required. The receiver has to be set up and running before configuring the LogPush job, at this point, Cloudflare will send a “test” message to the receiver to confirm the configuration. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com or join our open-source community Slack Channel.

Integrating OpenTelemetry into a Fluentbit Environment using BindPlane OP

Joe Sirianni — Wed, 29 Mar 2023 11:54:43 GMT

Fluentbit is a popular logs and metrics collector used for monitoring anything from virtual machines to containerized applications. With the rise of BindPlane OP and OpenTelemetry, it is not uncommon for organizations to begin replacing Fluentbit, or integrating OpenTelemetry with Fluentbit. An organization may have hundreds or thousands of Fluentbit agents deployed to their endpoints but they want to manage the pipeline using BindPlane OP. These organizations have two choices: Replace their Fluentbit agents with OpenTelemetry collectors Integrate OpenTelemetry into their existing architecture The second option is often desired as it allows the existing Fluentbit agents to remain in place, as they are already configured and working great. This blog will show how we can insert OpenTelemetry into the middle of the architecture using BindPlane OP. Architecture For the purpose of this blog, we will be using Google Compute Engine instances (GCE). The approach taken in this blog can be used with any backend supported by Fluentbit and OpenTelemetry, such as Elasticsearch or Grafana Loki. We will be using Google Cloud Logginghttps://cloud.google.com/logging. Deployed to our environment, we have the following GCE instances: “Api” example application and Fluentbit (x5) BindPlane OP server observIQ OTEL Collector managed by BindPlane (x1) Fluentbit to Google Cloud (Stackdriver output) Before implementing OpenTelemetry, Fluentbit is configured to send all logs straight to Google Cloud. The Fluentbit configuration looks like this. The sample application log is being read at “/opt/logs/log.json”. A “record_modifier” filter is used to add the system’s hostname to the log record. This will allow you to filter logs based on hostname. The logs can be viewed in Cloud Logging and will look like this BindPlane Configuration Now that Fluentbit is sending logs to Google Cloud, we can move on to configuring BindPlane OP and its managed observIQ OTEL collector. Once configured, we will reconfigure Fluentbit to forward to the OpenTelemetry collector instead of sending logs to Google directly. This blog does not cover BindPlane’s installation. BindPlane OP’s installation documentation can be found here. Within BindPlane, create a new configuration with the OpenTelemetry source and the Google Cloud destination. Because we are running on GCE, default options for the OTLP source and Google destination will be sufficient. OpenTelemetry Collector Configuration On the Agents page, click on “Install Agents”. Select your platform and configuration. Copy the install command to the collector system. Once installed, the agent will appear with the configuration attached. Fluentbit to OpenTelemetry With BindPlane OP and a managed agent configured, we can move onto updating Fluentbit to forward to the OpenTelemetry collector instead of sending logs directly to Google. Modify the configuration to use the OpenTelemety output instead of Stackdriver. The new configuration looks like this: Note that the Host option must point to a resolvable hostname or an IP address. The port is `4318` which matches the OTLP HTTP port configured in the BindPlane configuration’s OTLP source. Once configured, restart all Fluentbit collector processes. You will notice that the configuration’s measurements will begin showing up. We can see the throughput at all stages of the pipeline. In addition to throughput measurements, you can confirm that logs are flowing to the collector by using the Recent Telemetry feature, available on the agent’s page. The same log can be viewed in Google Cloud: But There’s More Now that logs are flowing from Fluentbit to a BindPlane managed collector, we can immediately see value with the following features: Pipeline throughput measurements Agent snapshots Pausing Telemetry Enrich logs using processors Commonly, users will want to add metadata in order to enrich their logs. BindPlane can easily solve this using processors. We can add Log Record Attributes to the logs using the Add Log Record Attribute processor. Once saved, the agent’s recent telemetry snapshot will show additional attributes being added to each log record. Adding log record attributes is one of many ways BindPlane OP can be used to enhance your telemetry. Additional Thoughts / Next Steps Redundancy In production, it is recommended to utilize multiple agents when using the OTLP source type. These agents can be load balanced to distribute load and provide redundancy. When using a load balancer, the Fluentbit OpenTelemetry output should point to the loadbalancer’s IP address instead of an individual agent. Replacing Fluentbit The architecture shown in this blog proves that Fluentbit and OpenTelemetry can live seamlessly in the same environment. It is not necessary to replace Fluentbit when moving to OpenTelemetry and BindPlane. Users at large organizations can adopt OpenTelemetry by installing BindPlane managed OpenTelemetry agents on new systems, while keeping the old Fluentbit agents in place. This allows OpenTelemetry adoption to happen without requiring large changes to existing infrastructure. To learn more, visit our docs https://docs.bindplane.observiq.com/docs or ask questions directly by joining our bindplane slack community.

Five Things to Know About Google Cloud Operations Suite and BindPlane

Craig Lee — Wed, 22 Mar 2023 14:37:40 GMT

Google Cloud Operations is a robust integrated monitoring, logging, and trace-managed service for applications and systems running on Google Cloud and beyond. As part of our partnership with Google, we help extend Cloud Operations with BindPlane OP and OpenTelemetry monitoring for a complete monitoring solution. With BindPlane OP, Google Cloud Operations becomes a single pane of glass for monitoring all aspects of your data center, whether it’s on prem or running in the cloud. Did you know with BindPlane OP and Google Cloud Operations, you can now: Gather Metrics, Logs, and Traces from any data center observIQ, Distribution for OpenTelemetry and BindPlane can gather telemetry data from any data center or cloud. These observability signals are sent directly to Google Cloud Operations’ APIs, allowing Cloud Operations to be the single monitoring platform for your entire environment. Metadata adds business context to Telemetry in Google Cloud Operations Customers often add metadata to sort logs by Function, Application, Location, Cost Center, etc. This makes it possible to trace the significance of the business to the signals collected. Related Content: Getting Started with BindPlane OP and Google Cloud Operations BindPlane Configuration makes adding metadata easy to any Telemetry pipeline Metadata is quick and easy to add to a telemetry pipeline in BindPlane OP. In this example, we are using a Log Record Attribute. This can also be used to enhance metrics and traces. observIQ’s Products for Google Cloud Operations have an integrated support stream They are supported via a mutual support motion with Google and observIQ, where customers can initiate support requests with Google. observIQ will then be alerted to any issues with our products and work mutually with Google to resolve them. Monitor GCVE (Google Cloud VMware Engine) environments Monitoring GCVE is an everyday use case where the observIQ distro for OpenTelemetry can run on the Workload VMs. We can also monitor vSphere metrics and logs for complete coverage of your GCVE environment. Related Content: Exploring & Remediating Consumption Costs with Google Billing and BindPlane OP We are constantly working to improve the customer experience, so stay tuned. To learn more, visit https://observiq.com/solutions

observIQ Announces Enterprise Edition of Open Source Observability Pipeline BindPlane OP

observIQ Media — Thu, 03 Nov 2022 19:19:42 GMT

Grand Rapids, MI (November 3, 2022) – Continuing its commitment to open source observability, observIQ announces the enterprise edition of BindPlane OP. BindPlane OP provides the ability to control observability costs and simplify the management of telemetry agents at scale while avoiding vendor lock-in. BindPlane OP Enterprise adds 24/7 support, direct access to the BindPlane OP product team, and significant roadmap influence. It launches with Active Directory and LDAP authentication support for a unified and streamlined user authentication experience and meeting compliance requirements. In coming releases, the Enterprise edition will continue to add additional security and compliance functionality, including role-based access control (RBAC), secret management and audit reports. BindPlane OP addresses the growing challenge of exponential telemetry data growth in observability. It reduces telemetry data volume, filtering and deduplicating data for greater manageability and lowering the cost of data analytics. It also provides a single control plane for managing thousands of agents, with the ability to quickly deploy new agents, manage their configurations, and monitor their health in real-time. BindPlane OP supports OpenTelemetry processors and metric toggling, and data routing, unlocking full control of telemetry data. Quote from Mike Kelly, CEO of observIQ “As observability costs continue to rise, we’re seeing a growing recognition that observability pipeline management is key to controlling these costs and simplifying data collection. BindPlane OP Enterprise provides our customers with a solution to those challenges while leveraging the latest advancements in observability with OpenTelemetry.” The OpenTelemetry collector can be deployed to all hosts to gather metrics, logs, and traces immediately. BindPlane OP can also be deployed behind a firewall, without a connection to observIQ. It works with OpenTelemetry using the new Open Agent Management Protocol (OpAMP)for agent management, and will expand to support other OSS telemetry agents. Learn more at https://observiq.com/solutions/bindplane-enterprise/ and Join our Slack community. ### About observIQ observIQ develops fast, powerful and intuitive next-generation observability technologies for DevOps and ITOps – built by engineers for engineers. Learn more at www.observiq.com. Contact: media@observiq.com P: +1 650 996 0778 Follow us: LinkedIn: @observIQ Twitter: @observIQ

BindPlane OP Enterprise Reaches GA

observIQ Media — Thu, 03 Nov 2022 12:40:43 GMT

Today, we’re excited to announce that BindPlane OP Enterprise is now generally available. With single-sign-on support for LDAP/AD authentication and 24/7 customer support from one of the largest contributors to OpenTelemetry, BindPlane OP is ready for your enterprise environment. We introduced BindPlane OP in June with the mission of building an open-source observability pipeline that makes it easy to simplify and standardize your telemetry stack while helping you control costs with powerful data reduction tools. And it’s been clear that we’re on to something, as the early use and enthusiasm we’ve seen has continued to surpass our expectations. Working with many of you who are deploying BindPlane OP into large enterprise environments, we’ve consistently heard that you need the following in order to move BindPlane OP to production: The security features required to deploy in highly regulated environments Tools to scale BindPlane OP across your enterprise Support from observability and OpenTelemetry experts With BindPlane OP Enterprise, that is exactly what we’re delivering. While starting with authentication and best-in-class support, you can expect RBAC, configuration staging, and audit reporting to follow quickly. To get started with BindPlane OP Enterprise, reach out to us on Slack or email sales@observiq.com What else is new in BindPlane OP? While we’re excited to introduce BindPlane OP Enterprise, version 1.4.0 also brings some useful new features to the open-source edition. Data Flow Topology. For the first time, Data Flow Topology gives you complete visibility into your pipelines. At a glance, you can see which configurations send data to which destinations and, most importantly, how much they send. If you see a problem area, drill into the configuration to find the source, add a processor into the pipeline to re-route or reduce the data, and watch the impact reflected in the interface. Grab the latest release from our repo! Pause/Resume Telemetry Sometimes, you need to stop sending data as quickly as possible. Perhaps you’ve noticed PII being sent to a destination it shouldn’t be, or you need to rework your configuration. When editing a source or destination, you’ll find a pause/resume button that lets you instantly stop or start the data flow. We’re eager to build the best open-source observability pipeline in the world, so please reach out and let us know what you’d like to see on the roadmap. Or better yet, help us build precisely the observability pipeline you want by contributing to the project.

How to Enrich Logs and Metrics with OpenTelemetry Using BindPlane OP

Paul Stefanski — Mon, 17 Oct 2022 08:47:00 GMT

Data enrichment is the process of adding additional context or attributes to telemetry data at the source that increases its value during analysis. OpenTelemetry, a collaborative open-source telemetry project with the largest organizations in the observability space, can be configured to enrich logs and metrics from dozens of sources. This blog will show you the basics of using BindPlane OP to quickly deploy and configure OpenTelemetry to enrich data from a source. Getting Started with OpenTelemetry and BindPlane OP BindPlane OP is an open-source tool for managing telemetry data pipelines. If you’re already using the observIQ distribution of OpenTelemetry, but haven’t used BindPlane OP to manage your agents and sources, follow this 2-minute guide on connecting existing OpenTelemetry deployments to BindPlane OP. If starting from scratch, visit the BindPlane OP GitHub page or the BindPlane OP Documentation for easy setup instructions (~3 minute setup from start to shipping telemetry). BindPlane OP works on MacOS, Windows, and Linux. It’s vendor-agnostic so that you can use any integrated source and destination. Here, you can find an updated list of supported sources and destinations. New integrations are added frequently, so check with us in the BindPlane OP Slack if you don’t see what you’re looking for. BindPlane OP is the first telemetry pipeline built to work natively with OpenTelemetry. Using Processors to Enrich Telemetry Data Configuring OpenTelemetry agents to enrich data can be tedious, but BindPlane OP simplifies it. Once you’re up and running with BindPlane OP, with an agent installed to source and ship data, it only takes a few minutes to configure the agent to enrich data. You can navigate to the Agents tab and find the source from which you want to enrich data. 2. On the Agent page, please ensure the agent is configured with a configuration from your templates. You can add a configuration by clicking “Edit” on the top right if it isn't. To move on to configure the agent to enrich data, click the configuration name highlighted in blue. 3. On the configuration page, you can see details about your agent, including visualization of your data pipeline and data flow. To enrich data, click the source you want to enhance. 4. You’ll see a pop-up to edit the source. Click “Add processor” at the bottom 5. You will see a list of processors you can add to your source. If you want to filter data instead of or in addition to enriching, check out our blog on filtering telemetry data. You can add log record attributes or resource attributes to logs and metrics. In this example, we’ll add resource attributes. 6. Select Insert, Update, or Upsert data. Add the Key and Value of the resource attribute you want to enrich. You can click “New Row” to add multiple attributes. Then click “Save” on the bottom right. 7. Click “Save” again on the bottom right of the next window. You can always navigate back to this window by repeating steps 1-4 if you want to edit your processors in the future. 8. Your enriched data is shipping to your destinations! If you navigate back to the agent on the Agents tab and click “View Recent Telemetry,” you’ll get a snapshot of the recent data collected by the agent. BindPlane OP is the first observability pipeline built for OpenTelemetry. It makes managing your telemetry infrastructure easy with no vendor lock-in, and it’s open source and free to use for non-enterprise users. If you want to get involved or learn more about observIQ’s open-source observability efforts, please join us in our community Slack channel! We love to hear from you, work with you, and help you with your observability infrastructure.

How to monitor Oracle DB with Google Cloud Platform

Paul Stefanski — Wed, 05 Oct 2022 10:01:00 GMT

Monitor Oracle DB in the Google Cloud Platform with the Google Ops Agent. The Ops Agent is available on GitHub, making it easy to collect and ship telemetry from dozens of sources directly to your Google Cloud Platform. You can check it out here! Below are steps to get up and running quickly with observIQ’s Google Cloud Platform integrations and monitor metrics and logs from Oracle DB in your Google Cloud Platform. You can check out Google’s documentation for using the Ops Agent for Oracle DB here: https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/install-index What signals matter? Oracle DB is an enterprise database service often used for large deployments, so managing resources can take time and effort. Oracle Enterprise Manager is Oracle’s solution for monitoring Oracle DB. However, if you want to scan multiple environments with the same tool or avoid the cost of Oracle Enterprise Manager, then using the ops agent with Google Cloud Platform is ideal. The ops agent leverages the sqlquery receiver from OpenTelemetry with queries specific to the ops agent. The receiver collects 27 metrics, and audit and alert logs. There are a few general areas worth paying attention to: Audit Logs Audit logs are highly tuneable. When appropriately configured to your needs, they provide valuable data about the activity in your environment. Service Response Time oracle.service.response_time The average query response time – slowdowns may indicate underlying performance issues. Waits and Wait Timeouts oracle.wait.count oracle.wait.timeouts Significant increases in waits and timeouts often indicate underlying performance issues. Rollbacks oracle.user.rollbacks Unexpected rollbacks always indicate an underlying issue, often with data integrity. The Oracle DB receiver can gather all the above categories – so let’s get started. Before you begin If you don’t already have an Ops Agent with the latest Oracle DB receiver installed, you’ll need to do that first. Check out the Google Cloud Platform Ops Agent documentation for installation methods, including the one-line installer. Configuring the Oracle DB receiver for Metrics and Logs Navigate to your Ops Agent configuration file. You’ll find it in the following location: /etc/google-cloud-ops-agent/config.yaml (Linux) Edit the configuration file for Oracle DB metrics as shown below: For Audit Logs, add the following in the same yaml config file: Restart the Ops Agent with the following command: sudo service google-cloud-ops-agent restart sleep 30 You can edit the config file for more precise control over your agent behavior, but it is unnecessary. The Service ID (SID) and/or Service Name may need to be specified for your environment and the Endpoint. The SSL configuration works through Oracle Wallet rather than raw files, like most other OpenTelemetry configurations. You can find information about Oracle Wallets here. Viewing the metrics collected If you follow the steps detailed above, the following Oracle DB metrics will now be delivered to your preferred destination. List of metrics collected: observIQ’s monitoring technology is a game changer for organizations that care about performance and efficiency. If you’re using Oracle DB, our solutions can significantly impact your infrastructure monitoring. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com. Join our open-source observability community Slack Channel.

How to Reduce Data Costs with OpenTelemetry and BindPlane OP

Paul Stefanski — Fri, 23 Sep 2022 18:59:27 GMT

Data costs fill a large column in many organizations’ accounting sheets. Data pipeline setup and management is a significant time sink for DevOps, IT, and SRE. Setting up telemetry pipelines to reduce unwanted data often takes even more time, which could better be spent creating value rather than reducing costs. This blog will show you how to quickly set up your data pipeline to filter unnecessary telemetry data. Getting Started with OpenTelemetry and BindPlane OP BindPlane OP is an open-source tool for managing telemetry data pipelines. If you’re already using the observIQ distribution of OpenTelemetry, but haven’t used BindPlane OP to manage your agents and sources, follow this 2-minute guide on connecting existing OpenTelemetry deployments to BindPlane OP. If starting from scratch, visit the BindPlane OP GitHub page or the BindPlane OP Documentation for easy setup instructions (~3 minute setup from start to shipping telemetry). BindPlane OP works on MacOS, Windows, and Linux. It’s vendor-agnostic so that you can use any integrated source and destination. Here, you can find an updated list of supported sources and destinations. New integrations are added frequently, so check with us in the BindPlane OP Slack if you don’t see what you’re looking for. BindPlane OP is the first telemetry pipeline built to work natively with OpenTelemetry. Using Snapshot to Sample Data You can sample logs directly from BindPlane OP without needing an analysis tool. Sampling logs with Snapshot is an excellent way to scan a source for unwanted or noisy data generation. The steps for using Snapshot are simple: Go to the Agents tab and click on the agent you want to Snapshot 2. Click “View Recent Telemetry” in the bottom left of the agent details Note: if the button is grayed out, make sure your agent is updated (an update button will appear next to the “VERSION” row in Details, and that the agent is running a configuration from your CONFIGS tab. Click “Edit” at the top right to add a config template from the CONFIGS tab. 3. Snapshot will display the last 100 log messages, the most recent batch of metrics, and traces. You can hit the refresh button on the top right to update the Snapshot. To expand a log message, click on the carrot on the left. 4. Select a log you would like to reduce or exclude. You can exclude or limit the volume of logs based on any field in the log message. With Snapshot, you can quickly identify log messages and metrics you want to reduce or filter. Once you’ve identified the necessary, redundant, or noisy data, copy the details onto a notepad or take a screenshot to inform your processor configuration. Using Processors to Filter Telemetry Data and Reduce Costs Processors enable OpenTelemetry agents to filter data and reduce data flow, which can dramatically reduce ingestion and analysis costs. The following steps will show how to add a processor to filter telemetry with the information collected from Snapshot. On the agent page to which you want to add the processor, click the Configuration name on the right. That will take you directly to the edit page for that configuration. 2. On the configuration page, click the source you want to add processors to 3. In the pop-up, click “Add processor” at the bottom 4. You will see many different processor types that can be added. Processors help enrich data and reduce it. We’re focused on filtering logs to reduce costs, so we’ll use the “Log Record Attribute Filter” processor. 5. Fill in the processor details using the drop downs and copying the information from Snapshot. The “Key” is any Attribute label that appears on the left when inspecting the log. The “Value” is anything that appears to the right of the Key when examining the log. You can add rows for as many filters as you want to run on the agent. 6. Click “Save”. Your agent will update automatically, and the designated logs will be filtered You can repeat that process using the “Metric Name Filter” processor and copy the metric name you want to exclude. Alternatively, if you're going to reduce the data flow but not filter any data completely, use the following steps: Add a “Log Sampling” processor. 2. Select the ratio of logs you want the agent to sample and click “Save.” Filtering metrics and sampling logs are excellent ways to reduce data costs while maximizing your data's value. BindPlane OP is the first data pipeline management tool that allows you to use OpenTelemetry on all sources with a smooth user interface to manage your entire data infrastructure in one place. To learn more about BindPlane OP, visit https://observiq.com/solutions/bindplane-op/ or chat directly in the BindPlane OP Community Slack.

How to Monitor Aerospike with OpenTelemetry

Nico Stewart — Tue, 06 Sep 2022 07:30:00 GMT

With observIQ’s latest contributions to OpenTelemetry, you can now easily use free, open-source tools to monitor Aerospike. The easiest way to use the latest OpenTelemetry tools is with observIQ’s distribution of the OpenTelemetry collector. You can find it here. In this blog, the Aerospike receiver is configured to monitor metrics locally with OTLP–you can use the Aerospike receiver to ship metrics to many popular analysis tools, including Google Cloud, New Relic, and more. For Google Cloud users, the Aerospike receiver is also available through the Google Ops Agent. What signals matter? Aerospike is a distributed, fast, noSQL database technology. It uses flash storage for predictable performance and is helpful for its ability to add new nodes without downtime. Aerospike operates in memory, so it is essential to monitor memory-related metrics. Aerospike.node.memory.free This metric monitors the percentage of memory accessible to the Aerospike node. If the value gets too low, the server reaches its memory limit. If nodes frequently use high amounts of memory, operations should consider adding new nodes or increasing memory allocation per node. Aerospike.namespace.memory.free This metric monitors the percentage of memory allocated to the specific namespace that is still available. If a namespace runs out of memory or reaches its high watermark, writing to the namespace will fail. Aerospike.node.connection.count This metric indicates the number of connections opened and closed to the Aerospike node. Anomalous values could indicate client applications being unable to connect or peer nodes being unreachable or frequently crashing. All metrics above and more are shipped when you install the Aerospike receiver. Installing the Receiver Suppose you don’t already have an OpenTelemetry collector built with the latest Aerospike receiver installed. In that case, we suggest using the observIQ OpenTelemetry Collector distro, which includes the aerospike receiver (and many others). Installation is simple with our one-line installer. Come back to this blog after running the install command on your source. Configuring the Receiver Navigate to your OpenTelemetry configuration file. The Aerospike receiver is Linux-only. If you’re using the observIQ Collector, you’ll find it in one of the following locations: /opt/observiq-otel-collector/config.yaml (Linux) Edit the configuration file to include the Aerospike receiver as shown below: Add Aerospike into your Service pipeline so it looks similar to the following. Note that your processors and exporters may be different. Below are a few editable fields you can add or adjust in the config file. Viewing the metrics collected If you follow the steps detailed above, the following Aerospike metrics will now be delivered to your OTel destination. observIQ’s monitoring technology is a game changer for organizations that care about performance and efficiency. If you’re using Vault, our solutions can significantly impact your infrastructure monitoring. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

How to monitor Vault with Google Cloud Platform

Nico Stewart — Fri, 02 Sep 2022 07:16:00 GMT

Monitor Vault in Google Cloud Platform with the Google Ops Agent. The Ops Agent is available on GitHub, which makes it easy to collect and ship telemetry from dozens of sources directly to your Google Cloud Platform. You can check it out here! Below are steps to get up and running quickly with observIQ’s Google Cloud Platform integrations, and monitor metrics and logs from Vault in your Google Cloud Platform. You can check out Google’s documentation for using the Ops Agent for Vault here: https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/install-index. What signals matter? Vault is a secrets store that can be distributed across multiple instances with a high level of encryption to handle data securely. Our integration collects metrics around the operations executed against the store and metrics related to token interactions. There are also audit logs related to the operation executed. vault.memory.usage This metric depicts the Vault RAM usage. Lower memory usage usually correlates to higher performance. If memory usage gets too high, interruptions, crashes, and data loss are possible. Vault.token.lease.count This metric verifies that leases are correctly distributed and no more leases are attempting access to the vault than expected. Operation counts Vault.storage.operation.get.count Vault.storage.operation.list.count Vault.storage.operation.put.count Vault.storage.operation.delete.count Operation counts are monitored to ensure that operations are completed correctly and that no unexpected operations are performed. The Vault receiver can gather all the above categories – so let’s get started. Related Content: Getting Started with BindPlane OP and Google Cloud Operations Before you begin If you don’t already have an Ops Agent installed with the latest Vault receiver, you’ll need to do that first. Check out the Google Cloud Platform Ops Agent documentation for installation methods, including the one-line installer. Configuring the Vault receiver for Metrics and Logs Navigate to your Ops Agent configuration file. You’ll find it in the following location: /etc/google-cloud-ops-agent/config.yaml (Linux) Edit the configuration file for Vault metrics as shown below: For Logging, add the following in the same yaml config file: Restart the Ops Agent with the following command: You can edit the config file for more precise control over your agent behavior, but it is not necessary. Here is a list of the most relevant editable fields that you can edit to adjust your agent: Metrics: Logs: Viewing the metrics collected If you follow the steps above, the following Vault metrics will now be delivered to your preferred destination. List of metrics collected: Prefix: workload observIQ’s monitoring technology is a game changer for organizations that care about performance and efficiency. If you’re using Vault, our solutions can significantly impact your infrastructure monitoring. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

How to monitor Couchbase with Google Cloud Ops

Keith Schmitt — Tue, 30 Aug 2022 10:00:00 GMT

You can now easily monitor Couchbase metrics and logs in Google Cloud. Our logging and monitoring of Google Cloud contributions are available through the Google Ops Agent GitHub repository. You can check it out here! The Google Ops Agent uses the built-in Prometheus exporter and receiver to monitor Couchbase sources running Couchbase 7.0. You can find documentation on the Prometheus exporter in the Couchbase documentation. Information on the Prometheus receiver is available in the observIQ OpenTelemetry distribution on GitHub. What signals matter? Couchbase is a distributed noSQL database. It’s easy to distribute across multiple systems and scales. Signals that are often monitored on Couchbase include evictions, errors, and memory usage, as well as access logs: Spiking Evictions If ‘bucket.item.ejection.count’ spikes, it could show unexpected memory pressure. Unrecoverable OOM Errors If ‘bucket.error.oom.count’ signals unrecoverable errors, it indicates that the couchbase server is running out of memory and is unrecoverable. High Memory Usage If the ‘bucket.memory.usage’ bytes are higher than anticipated, it could show that the bucket needs to be allocated more memory. Couchbase HTTP Access Logs These access logs indicate what kind of traffic the couchbase is undergoing via its REST API. It could be an indication of bad requests or if things are operating normally. All of the above categories can be gathered with the Couchbase receiver – so let’s get started. Related Content: Five Things to Know About Google Cloud Operations Suite and BindPlane Before you begin If you’re already a Google Cloud user, You can set up your Google Cloud workspace to receive metrics and logs by following Google’s Ops Agent documentation. You can find installation and setup instructions here. Configuring the Couchbase receiver Navigate to your Ops Agent configuration file. /etc/google-cloud-ops-agent/config.yaml(Linux) C:\Program Files\Google\Cloud Operations\Ops Agent\config\config.yaml (Windows) Edit the configuration file to include the Couchbase receiver as shown below: In the same yaml config file, add Couchbase to your service pipeline so it looks similar to the following. Note that your processors and exporters will likely differ, and you must insert your admin username and password. You can edit the config file further to include specific labels under the “processors” field, but the Google Ops Agent already has default labels for the data collected. Relevant Editable Fields in Config: Viewing the telemetry collected If you follow the steps above, the following Couchbase metrics will now be delivered to your preferred destination. List of Metrics Collected: Related Content: How to monitor Oracle DB with Google Cloud Platform observIQ’s distribution of the OpenTelemetry collector is a game-changer for companies looking to implement OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

Creating Homebrew Formulas with GoReleaser

Corbin Phelps — Sun, 28 Aug 2022 10:00:00 GMT

Starting Out We chose to use GoReleaser with our distro of the OpenTelemetry Collector to simplify how we build and support many operating systems and architectures. It allows us to build a matrix of GOOS and GOARCH targets and automate the creation of a wide range of deliverables. The ones we have utilized are building tarballs, nfpm packages, docker images, and the Homebrew formula. Our goal is to make it easy for users to install our software on macOS so that they can easily try it out. We went with Homebrew as it’s familiar to many macOS users and would allow the user to try out our software and then remove it just as quickly when they were finished. As we started setting up Homebrew in GoReleaser, we found that documentation about creating a Homebrew formula was lacking. Also, it wasn’t easy to search for solutions when we encountered a problem. Homebrew provides a Formula Cookbook but it can be confusing if you aren’t already familiar with building formulas. We went through several iterations of our Homebrew formula as we learned more and more about the correct way to do things. First, we created a public repo to be our Homebrew formula. We would specify this as the place where GoReleaser would send formula updates. We created https://github.com/observIQ/homebrew-observiq-otel-collector. As we started setting up GoReleaser, we initially used the caveats, install, and plist blocks of the brews section in GoReleaser to create our formula. Caveats Block The caveats block can relay textual information to the user after a homebrew installation. We use it to Give info on how to start/stop/restart the homebrew service that is installed How to properly uninstall the entire formula And give info on where specific configuration files live Install Block Inside the install block, you can use the same brew shortcuts for installation used in the formula file. This ultimately will copy these same lines to the ruby formula file’s install block. For example, we use prefix.install to copy files and directories to homebrew’s versioned formula directory prefix.install_symlink to create symlinks in homebrew’s versioned formula directory etc.install to copy files and directories to homebrew’s shared etc directory bin.install to copy binary executable files to homebrew’s versioned formula’s “bin” directory lib.install to copy library files to homebrew’s versioned formula’s “lib” directory Related Content: Embed React in Golang Service Blocks The plist block was where we defined a plist file to allow our software to be run as a launched service. The service block wasn’t supported in GoReleaser when we started; once it was, we shifted to using that as it was easier to define for us and allowed a more brew-native way of managing our service. Our original plist block looked like the XML below: Once we saw GoReleaser supported the service block, we were able to simplify it to the following: We originally had some trouble creating the service as some “magic” words correspond to special directories in a brew installation. The cookbook documentation used these magic words in examples but did not list them as it does in the install section. We had to search the brew source code for a list of the support “magic” words. Here are a few of the common ones we used: Initial Config Here is the brews block we initially generated that created a working formula for us. Versioning Brew Formulas One issue we eventually stumbled upon was versioning our software releases with Homebrew. We found after every release GoReleaser would update the Formula repo by overwriting the previous formula. A user could easily update the formula and run brew upgrade to get the latest version. The issue we ran into was, what if you wanted a specific version of the Collector with a specific brew formula? You would have to know which commit in the Formula corresponds to the release you wish to. It is not very user-friendly. This also made it hard for us to test pre-releases as we wanted GoReleaser to generate formulas for release candidates but not to overwrite the production one. It wasn’t easy to find out how to version Homebrew Formulas. We looked at the homebrew-core repo for examples of how other formulas do it. There are a few unique things to do when versioning a formula. The formula name needs to be of the format formula-name@major.minor.patch.rb. The added @major.minor.patch lets Homebrew know which formula to get when specified in the brew command. Inside the formula, the class name must have special formatting, too. It must be of the format FormulaNameAT{Major}{Minor}{Patch}. So an example filename and corresponding class name for our Collector is observiq-otel-collector@0.6.0.rb and ObserviqOtelCollectorAT060 respectively. That formula file will exist in the Formula directory of your repo next to the current main formula, the formula you get if you just run brew install X. You can also add a version to the main formula so users can get it by version or by the basic brew command. To do this, create an Aliases directory on the same level as your Formula directory. Inside that directory, create a symlink to the main formula with a versioned name. If that’s confusing, here’s the command we run to create the symlink: Now that we know how to create a versioned formula, we need to update our GoReleaser config to generate versioned formulas for us. This should be simple since the formula and class names are taken from the name field under the brews block. We changed our name to observiq-otel-collector@{{ .Major }}.{{ .Minor }}.{{ .Patch }}. When we ran a test release with GoReleaser, though, we saw the class for the formula wasn’t quite right. GoReleaser was generating the class name as ObserviqOtelCollectorAT0_6_0. One quick pull request to GoReleaser, and we’ve fixed that. Here’s what our brews block of our GoReleaser config now looks like to support versions. We also made some changes to the configuration of the Collector, so there are additional flags and files in the install and service blocks. Persisting Configuration Files Initially, in our install block of the GoReleaser config, we used the prefix.install to place our configuration file in the main install directory of our formula. After reinstalling this formula, we found that our configuration file would be replaced with fresh defaults, and any user changes would be lost. This wasn’t ideal, so we had to figure out how to ensure this file persisted between installations. Ultimately, the solution was to make use of Homebrew’s etc directory. This is a shared directory amongst all formulas, so we had to make an extra effort so that our configuration file would be uniquely named. Now our GoReleaser install block looks something like this: The problem was almost solved, but we still preferred to have this configuration file “exist” in the base formula install directory. We also preferred to have this configuration file have its original name without the “observiq_” prefix. Luckily, using a symlink was a simple solution. Our final install block related to the configuration looked similar to this: With this solution, our configuration lived safely in Homebrew’s etc directory with a special prefix where it would never be automatically overwritten. At the same time, it would appear to also exist in the base installation directory without any naming prefix. There is one more thing to note here. When there is a new installation on top of our formula, homebrew automatically adds a new version of our configuration file to its etc directory. In our case, the file is named something like observiq_config.yaml.default. This will contain a clean config with default settings. This is a built-in behavior by Homebrew, and we haven’t found any way to change this. Conclusion GoReleaser provides a great way to distribute your Go program via Homebrew. It allows you to focus on the installation part of your application while taking care of all the formatting and setup of your formula file. Hopefully, we’ve given some good insight into the pitfalls we encountered when simplifying GoReleaser and Homebrew.

Configuring an OpenTelemetry Collector to Connect to BindPlane OP

Ryan Goins — Thu, 25 Aug 2022 10:00:00 GMT

Bindplane OP is the first open-source, vendor-agnostic, agent, and pipeline management tool. It makes it easy to deploy, configure, and manage agents on thousands of sources and ship metrics, logs, and traces to any destination. This blog shows you how to configure an existing OpenTelemetry Collector from any source to connect to Bindplane OP without needing to remove or reinstall the collector. BindPlane OP makes installing and managing the observIQ Distro of the OpenTelemetry Collector easy, automatically configuring it to connect to the BindPlane OP server instantly. This is done on the Agents page using the one-line installer script below. If you’re installing new agents, you should use this script, as it’s the easiest way to get going. However, you may already have a collector installed and want to configure it to connect to BindPlane OP. There are two ways to do this. If you’re okay with updating your collector, you can run the one-line installer from BindPlane OP on top of your existing installation. This will update your collector and connect it to BindPlane OP. However, if you need to maintain a specific version of the collector, you can follow the steps below to connect it manually. Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane OP Here’s what you’ll need to do: Navigate to your collector home Linux: cd /opt/observiq-otel-collector Windows: “C:\Program Files\observIQ OpenTelemetry Collector” 2. Create a new file called “manager.yaml” and update it to include the following three values: The endpoint and secret_key can be found by looking at the one-line installer on the agent installation page, as shown below. You can generate a UUID for agent_id on linux using the command “uuidgen”. You can also see the server details in your BindPlane OP profile. The following command will show the default. Your finished manager.yaml should look like this: Save the manager.yaml and restart the collector. You will now see the collector connected in BindPlane OP. For more game-changing open-source observability tech tips, tools, and knowledge, follow observIQ on Twitter, LinkedIn, and stay tuned to the observIq observability blog.

How to Monitor Solr with OpenTelemetry

Deepa Ramachandra — Mon, 22 Aug 2022 10:00:00 GMT

Monitoring Solr is critical because it handles the search and analysis of data in your application. Simplifying this monitoring is necessary to gain complete visibility into Solr’s availability and ensure it performs as expected. We’ll show you how to do this using the jmxreceiver for the OpenTelemetry collector. You can utilize this receiver in conjunction with any OTel collector, including the OpenTelemetry Collector and observIQ’s collector distribution. What signals matter? Monitoring Solr includes scraping JVM metrics, such as memory utilization and JVM threads, and the metrics exposed exclusively by Solr, such as request counts and Caching-related metrics. The JMX receiver scrapes all the metrics necessary to gather the following critical inferences: Understanding request handling using request rates Solr nodes and clusters handle requests that are sent to it. Tracking the volume of received and handled requests helps fine-tune the performance and eliminate any bottlenecks. A dashboard for Solr can point to sudden dips or rises in requests received. Monitoring caching capabilities Caching is another key feature to monitor mainly because of Solr’s architecture. The caching feature facilitates easy access to cached data without buying into disk utilization. Usually, Caching leads to memory and disk expenses, which may bite into performance. Keeping all caching operations monitored ensures memory health and CPU utilization are optimized. Small caches lead to reduced hit rates, resulting in reduced node performance. At the same time, big caches deteriorate the JVM heaping performance and decrease the node performance. Request Latency The pace at which the requests are handled is another critical factor to monitor closely. The request latency clearly indicates how the queries and requests are handled. In an architecture where the search handlers are assigned specific search categories, tracking the latency across these handlers can give the difference in request latency between each handler and each data type. Also, comparing the request latency and request rates helps quickly identify issues with request handling. Configuring the JMX receiver to gather Solr metrics You can use the following configuration to gather metrics using the JMX receiver and forward the metrics to the destination of your choice. OpenTelemetry supports over a dozen destinations to which you can forward the collected metrics. More information is available about exporters in OpenTelemetry’s rehttps://github.com/open-telemetry. In this sample, the configuration for the JMX receiver is covered. Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane OP Receiver configuration: Configure the collection_interval attribute. It is set to 60 seconds in this sample configuration. Set up the endpoint attribute as the system running the Solr instance. Specify the jar_path for the JMX receiver. We are using the JMX receiver to gather Solr metrics. The jar_path attribute lets you specify the path to the jar file that facilitates gathering Solr metrics using the JMX receiver. This file path is created automatically when the observIQ OpenTelemetry collector is installed. Set the target_system attribute to Solr. When we connect to the JMX receiver, there are different categories of metrics; the Solr metrics and JVM metrics are the ones that this configuration intends to scrape. This attribute specifies that. Use resource_attributes to set the local host port number. Processor configuration: The resourcedetection processor creates a unique identity for each metric host so that you can filter between the various hosts to view the metrics specific to that host. The system attribute gathers the host information. The batch processor is used to batch all the metrics together during collection. Exporter Configuration: The metrics are exported to New Relic using the OTLP exporter in this example. If you want to forward your metrics to a different destination, you can check the destinations OpenTelemetry supports here. Setup the pipeline: Viewing the metrics All the metrics the JMX receiver scrapes are listed below. Alerting Now that you have the metrics gathered and exported to the destination of your choice, you can explore how to configure alerts for these metrics effectively. Here are some alerting possibilities for Solr: Alerts to notify that the Solr server is down Alerts based on threshold values for request rate, cache size, timeout count, cache hit count Alerts for anomaly scenarios where the values of specific metrics deviate from the baseline Set up resampling to avoid reacting to false alarms Notifying the on-call support team about any critical alerts Related Content: How to Install and Configure an OpenTelemetry Collector observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

Availability of Open Source Observability Solution BindPlane OP

observIQ Media — Thu, 11 Aug 2022 18:44:03 GMT

Manages thousands of agents in one control panel, with enhanced filters to reduce data for greater control, lower costs GRAND RAPIDS, Mich., Aug. 11, 2022 /PRNewswire/ — Open source telemetry innovator observIQ announces the general availability of BindPlane OP (observability pipeline), the first open source observability pipeline built specifically for OpenTelemetry. Enterprise-ready BindPlane OP provides the ability to control observability costs and simplify the management of telemetry agents at scale while avoiding vendor lock-in. BindPlane OP addresses the growing challenge of increased data overwhelming the ability to meaningfully monitor the pipeline. It filters data, reducing the volume, for greater manageability and lowering the cost of unused data analytics. It also provides a single control plane for managing thousands of agents, with the ability to quickly deploy new agents, manage their configurations, and monitor their health in real-time. A critical piece of an observability pipeline is the ability to augment telemetry by filtering, sampling, and enriching data easily. BindPlane OP supports OpenTelemetry processors and metric toggling, unlocking full control of telemetry data. BindPlane OP removes the need for proprietary agents with native support for OpenTelemetry. It offers over 50 total integrations – including: Windows; Linux; VMware vCenter and ESXI; MongoDB, Kafka; SQL Server; Microsoft IIS; Apache HTTP; and Zookeeper – and supports sending data to the most popular observability platforms. The OpenTelemetry collector can be deployed to all hosts and start gathering metrics, logs, and traces immediately. BindPlane OP can also be deployed behind a firewall, without a connection to observIQ. It works with OpenTelemetry using the new OpAMP protocol for agent management, with continued expanded support to other OSS agents. Quote from Mike Kelly, CEO of observIQ “One of our guiding principles with BindPlane OP is to make it simple for users to instrument once and send anywhere. Enterprises are looking for a way to manage the flood of observability data available and need a vendor-neutral solution supporting a multitude of observability tools. BindPlane OP solves these challenges.” To learn more Download the GA and Join our Slack community. About observIQ observIQ develops fast, powerful and intuitive next-generation observability technologies for DevOps and ITOps – built by engineers for engineers. Learn more at www.observiq.com.

How to Monitor SAP Hana with OpenTelemetry

Paul Stefanski — Wed, 10 Aug 2022 19:29:23 GMT

SAP Hana monitoring support is now available in the open-source OpenTelemetry collector. You can check out the OpenTelemetry repo here! You can utilize this receiver in conjunction with any OTel collector, including the OpenTelemetry Collector and observIQ’s collector distribution. Below are quick instructions for setting up observIQ’s OpenTelemetry distribution and shipping SAP Hana telemetry to a popular backend: Google Cloud Ops. You can find out more on observIQ’s GitHub page: https://github.com/observIQ/observiq-otel-collector What signals matter? SAP Hana is a column-oriented relational database management system. It functions in memory, so memory metrics are often necessary. Some specific metrics that users find valuable are: Memory Memory metrics offer information on current memory usage, minimum and maximum usage, and how memory is used between processes. Backups The age of the latest backup. It is important to monitor this in case of an error or crash. Replication Backlog Monitor the size of the replication backlog in the cluster. All of the above categories can be gathered with the SAP Hana receiver – so let’s get started. Related Content: How to Install and Configure an OpenTelemetry Collector Before you begin If you don’t already have an OpenTelemetry collector built with the latest SAP Hana receiver installed, you’ll need to do that first. We suggest using the observIQ OpenTelemetry Collector distro, which includes the SAP Hana receiver (and many others) and is simple to install with our one-line installer. Configuring the SAP Hana receiver Navigate to your OpenTelemetry configuration file. If you’re using the observIQ Collector, you’ll find it in one of the following locations: /opt/observiq-otel-collector/config.yaml (Linux) C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows) For the observIQ OpenTelemetry Collector, edit the configuration file to include the SAP Hana receiver as shown below: Set up a destination for exporting the metrics, as shown below. You can check the configuration for your preferred destination from OpenTelemetry’s documentation here. Set up the pipeline: You can find the relevant config file here if you’re using the Google Ops Agent instead. Viewing the metrics collected Following the steps detailed above, the following SAP Hana metrics will now be delivered to your preferred destination. observIQ’s distribution of the OpenTelemetry collector is a game-changer for companies looking to implement OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

BindPlane OP Reaches GA

Ryan Goins — Wed, 10 Aug 2022 16:35:39 GMT

Today we’re excited to announce BindPlane OP – the first observability pipeline built for OpenTelemetry – is out of beta and now generally available. You can download the latest version here. Two months ago we released BindPlane OP in beta, and while we were confident we had something special, the response surpassed all of our expectations. We’ve had amazing conversations with many of you as you’ve started experimenting with the beta, already providing us with invaluable feedback. In fact you’ve even started submitting PRs and educating others on the power of BindPlane OP. We also started a Slack community where we hope to curate a community that has the shared vision of using and building open source observability. Thank you for joining us in this endeavor. It’s been great to see so much early participation and excitement for what we’re doing. What’s new in GA? Quite a bit! Certainly take a look at the full changelog, but here are the highlights: Enhanced Flow Control Perhaps the most critical piece of an observability pipeline is having the ability to augment your telemetry. Filtering, sampling, and enriching your data should all be at your fingertips. With this release, we’re introducing support for both metric toggling and OpenTelemetry processors, unlocking full control of your telemetry data. Metric toggles give you the power to enable and disable the individual metrics being sent to your observability platform, increasing the signal in your data and providing an additional lever for controlling costs. Processors are inserted between a source and destination and make it possible to manage the flow of your data. We’re starting with 9 processors covering a wide range of use cases; these fall into a few categories: Filter Processor – When drilling down into an issue in your environment, refining and sculpting the data in your pipeline is critical. With the Filter processor, you can include or exclude metrics, logs, and traces based on a keyword or regular expression – keeping the data that matters most. Log Enrichment Processor – Do you need to enrich your logs with additional metadata? These processors let you define new key/value pairs to insert, update, upsert, or delete from your logs Log Sampling Processor – The thing about logs is that there are a lot of them, and you rarely need them all. This processor let’s you set a sample rate so you can get the monitoring you need while reducing costs Raw Processor – Is there an OpenTelemetry processor you want to use but we haven’t built into the UI yet? This will let you add the processor YAML right into your config New Sources & Destinations We’ve significantly expanded the number of integrations. We now have over 50 total sources, up from the 20 that were available at beta. This includes support for many that have been heavily requested, such as: Aerospike VMware vCenter and ESXI Apache Wildfly Kafka SQL Server Microsoft IIS Apache HTTP Zookeeper Cassandra One of our guiding principles with BindPlane OP is to make it simple for you to “Instrument once. Send anywhere.” Supporting as many destinations as possible is what will make that reality. With GA we now support the following 10 destinations: Elasticsearch Google Cloud Jaeger NEW KafkaNEW Logz.ioNEW New Relic OpenTelemetry (OTLP) Prometheus SplunkNEW ZipkinNEW Do you have additional sources or destinations you’d like to see added? Let us know in Slack or open a PR. Agent Update Agent update, one of the most critical aspects of agent management, is now available in BindPlane OP. Beginning with version 1.6 of observIQ’s distribution of the OpenTelemetry Collector you can bulk update your agents with just a single click. This release of BindPlane OP is a crucial milestone as we work toward building open source and vendor neutral products for the collection, processing, and transport of telemetry data. We’d love for you to help us shape the future of open source telemetry by: Downloading the latest build of BindPlane OP Joining the conversation on Slack And, of course, contribute to OpenTelemetry and BindPlane OP

observIQ CEO Mike Kelly Talks Telemetry With TFiR

observIQ Media — Wed, 03 Aug 2022 15:36:12 GMT

observIQ CEO Mike Kelly is a telemetry industry veteran, and we’re excited to see him sharing his knowledge on the TFiR show. In this episode, Mike breaks down the basics of Telemetry, the tech, the state of the industry, and where he thinks it’s headed. He also discusses some of observIQ’s new technologies and open-source initiatives.

How to Monitor Apache Flink with OpenTelemetry

Jonathan Wamsley — Fri, 29 Jul 2022 20:08:02 GMT

Apache Flink monitoring support is now available in the open-source OpenTelemetry collector. You can check out the OpenTelemetry repo here! You can utilize this receiver in conjunction with any OTel collector, including the OpenTelemetry Collector and observIQ’s collector distribution. Below are quick instructions for setting up observIQ’s OpenTelemetry distribution and shipping Apache Flink telemetry to a popular backend: Google Cloud Ops. You can find out more on observIQ’s GitHub page: https://github.com/observIQ/observiq-otel-collector What signals matter? Apache Flink is an open-source, unified batch processing and stream processing framework. The Apache Flink collector records 29 unique metrics, so there is a lot of data to pay attention to. Some specific metrics that users find valuable are: Uptime and restarts Two different metrics record the duration a job has continued uninterrupted and the number of full restarts a job has committed, respectively. Checkpoints Several metrics monitoring checkpoints can tell you the number of active checkpoints, the number of completed and failed checkpoints, and the duration of ongoing and past checkpoints. Memory Usage Memory-related metrics are often relevant to monitor. The Apache Flink collector ships metrics that can tell you about total memory usage, both present and over time, mins and maxes, and how the memory is divided between different processes. The Apache Flink receiver can gather all the above categories – so let’s get started. Before you begin If you don’t already have an OpenTelemetry collector built with the latest Apache Flink receiver installed, you’ll need to do that first. We suggest using the observIQ OpenTelemetry Collector distro that includes the Apache Flink receiver (and many others), and is simple to install with our one-line installer. Configuring the Apache Flink receiver Navigate to your OpenTelemetry configuration file. If you’re using the observIQ Collector, you’ll find it in one of the following locations: /opt/observiq-otel-collector/config.yaml (Linux) For the observIQ OpenTelemetry Collector, edit the configuration file to include the Apache Flink receiver as shown below: You can find the relevant config file here if you’re using the Google Ops Agent instead. Viewing the metrics collected The Apache Flink metrics will now be delivered to your desired destination following the steps detailed above. observIQ’s distribution of the OpenTelemetry collector is a game-changer for companies looking to implement OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

BindPlane OP Build Process Using Goreleaser

Joe Sirianni — Thu, 28 Jul 2022 13:07:53 GMT

Intro BindPlane OP is written in Go. It is a single http webserver that serves REST, Websocket, and Graphql clients. It includes embedded react applications for serving the user interface. Go provides us with the ability to produce a single binary program that has no external dependencies. The binary is not dynamically linked to external libraries, making it easy to build, deploy, and run on any platform supported by the Go compiler. BindPlane OP officially supports Linux, Windows, and macOS. Unofficially, BindPlane OP can be built and run on any platform supported by Go. Generally, the build process of a Go program is straightforward. Run “go build” to quickly build your application. BindPlane OP’s build process is more complex. Because of this, we leverage a tool called Goreleaser for our build platform. Why Goreleaser? Why is BindPlane OP’s build process so complex? Technically, it is not. Nothing stops developers from running `go build` within the `cmd/bindplane` directory. This will output a perfectly functional `bindplane` binary, assuming they have already built the `ui` portion of the application. We leverage Goreleaser to streamline this build process. We have several requirements: Multiple binaries: BindPlane consists of the `bindplane` and `bindplanectl` commands. Goreleaser can build both of these. Pre-build commands: Goreleaser can run `make` targets that build the embedded react application (BindPlane OP’s web interface). Packages: Goreleaser lets us quickly build Debian (deb) and Red Hat (rpm) packages. macOS (Homebrew) packages. Container images: Gorelease can build and tag images for multiple CPU architectures. Each BindPlane release contains tags for major, major.minor, and major.minor.patch tags. BindPlane OP’s container images can multi-architecture (amd64 / arm64). Generated release notes: Gorelease can create release notes based on pull request names. Automatic release publishing: Goreleaser can create a Github release when a new tag is pushed. This release will contain the generated release notes and archives containing both commands and Linux packages. Without a tool like Goreleaser, we would need to maintain complicated `make` targets and shell scripts. Goreleaser Usage BindPlane OP’s Goreleaser configuration can be found here. Goreleaser has many features; some of the features I find notable will be detailed below. Before hooks Build Linux Packages Container Images Homebrew Before Hooks At the top of the configuration, you will find `before.hooks`. This section allows Goreleaser to run arbitrary commands to ensure the build environment is prepared for the actual build. Goreleaser will run `make ci` and `make ui-build` in this case. These commands will generate the Node JS web interface, which will then be bundled into the binary and served by the Go web server. You can learn more about this process here. Build The builds section is for building binaries. BindPlane OP has two binaries: `bindplane` (the server component), and `bindplanectl`, the command line interface. The build’s section allows us to specify a GOOS and GOARCH matrix and exclusion rules. For example, we want to build for Linux, Windows, Darwin, ARM, ARM64, and AMD64. We exclude ARM on Windows builds. Additionally, we can specify an environment and compiler flags. Linux Packages The `nfpms` section allows us to build Linux packages. Nfpm supports Debian (deb), Red Hat (rpm), and Alpine (apk) packages. Bindplanectl is straightforward to package. It includes only the compiled binary, and places it in `/usr/local/bin`. Bindplane’s binary is much more complicated, including pre and post-install scripts, directory creation, default configuration, and a system service file. The pre-install script handles setting up a system user, while the post-installation script handles setting up the system service. Packages are a great way to distribute a service intended to run on Linux. Container Images Goreleaser has excellent support for building container images that support multiple architectures. This is done by building independent images “observiq/bindplane-amd64” and “observiq/bindplane-arm64”. These images contain the correct binary for their architecture. Next, Goreleaser uses the `docker_manifest` to define a multi-architecture container manifest containing AMD64 and ARM64 images. When your container runtime pulls the image “observiq/bindplane”, it will detect the correct underlying image for your cpu architecture. The configuration for building BindPlane OP’s container images looks like this: Note that the tagging strategy involves tagging `major`, `major.minor`, and `major.minor.patch`. This allows users to pin to a given major or minor release without relying on something like “latest”. Users wishing to pin to a given release can use the `major.minor.patch` tag to prevent automatic updates. Homebrew Goreleaser supports generating Homebrew configurations. The configuration is straightforward: point Goreleaser to a repository and let it generate the correct ruby code. Challenges Some excellent features are hidden behind a paywall (Goreleaser Pro). In my experience, the Goreleaser developers have been responsive to issues and feature requests. Supporting their efforts with a Github sponsorship or paying for Goreleaser Pro is reasonable. Homebrew support is great, but you must take extra steps if you have a private homebrew repo. Homebrew removed the built-in ability to download from a private repository. This is not a Goreleaser issue, but it does complicate things. Building within CI is time consuming. You'll need to perform a full build to test packages/container images within your CI pipeline. This is correct as of 7/12/2022. Conclusion Building a Go application is generally a simple process. Goreleaser brings value in allowing you to standardize your build across many applications. At observIQ, we use Goreleaser for many public and private repositories. Goreleaser makes it easy to generate Linux packages and container images. The configuration is so simple that we have a hard time justifying not building these extra artifacts. Useful links Goreleaser github: https://github.com/goreleaser/goreleaser Goreleaser docs: https://goreleaser.com/ Goreleaser pro: https://goreleaser.com/pro/

How to Monitor Jetty using OpenTelemetry

Miguel Rodriguez — Tue, 26 Jul 2022 18:02:32 GMT

You can now monitor Jetty for free using top-of-the-line open-source monitoring tools in OpenTelemetry. If you are as excited as we are, look at the details of this support in OpenTelemetry’s repo. The best part is that this receiver works with any OpenTelemetry collector, including the OpenTelemetry Collector and observIQ’s collector distribution. Jetty uses the JMX receiver. In this post, we take you through the steps to set up the JMX receiver with the observIQ OpenTelemetry collector, configure it for Jetty, and send the metrics to Google Cloud. What signals matter? Jetty only produces seven metrics–you can see a complete list below near the end of the blog. You likely want to pay the closest attention to these three metrics: the jetty.select.count, jetty.session.count, and jetty.session.time.total. Select Count Monitors the number of select calls to the server. Session Count Monitors the number of sessions created in the server. Session Time Monitors the total amount of time sessions are active. Related Content: How to Install and Configure an OpenTelemetry Collector Configuring the JMX metrics receiver After the installation, the config file for the collector can be found at: C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows) /opt/observiq-otel-collector/config.yaml(Linux) The first step is building the receiver’s configuration: We are using the JMX receiver to gather jetty metrics. The jar_path attribute lets you specify the path to the jar file that facilitates gathering jetty metrics using the JMX receiver. This file path is created automatically when the observIQ OpenTelemetry collector is installed. Set the IP address and port for the system from which the metrics are gathered as the endpoint. When we connect to JMX, there are different categories of metrics; the jetty metrics and JVM metrics are the ones that this configuration intends to scrape. This target_system attribute specifies that. Set the time interval for fetching the metrics for the collection_interval attribute. The default value for this parameter is 10s. However, if metrics are exported to Google Cloud, this value is set to 60s by default. The Properties attribute allows you to set arbitrary attributes. For instance, if you are configuring multiple JMX receivers to collect metrics from many jetty servers, this attribute enables you to set the unique IP addresses for each endpoint system. So that you know, this is not the only use of the properties option. Set up a destination for exporting the metrics, as shown below. You can check the configuration for your preferred destination from OpenTelemetry’s documentation here. Set up the pipeline. Viewing the metrics collected Based on the config detailed above, the JMX metrics gatherer scrapes the following metrics and exports them to the destination. Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane OP observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

Serverless Monitoring In The Cloud With The observIQ Distro for OpenTelemetry

Dylan Myers — Mon, 25 Jul 2022 21:09:19 GMT

Part 1: Google Cloud Run In this part 1 of a blog series on serverless monitoring, we will learn how to run the observIQ Distro For OpenTelemetry Collector, referred to as “oil-otel-collector”, in Google Cloud Run. For many reasons, someone may want to run monitoring in a serverless state. In our example, we will monitor MongoDB Atlas, a cloud-hosted version of MongoDB. Environment Prerequisites: MongoDB Atlas Target This target is already set up correctly with the API Access Keys Access to Google Cloud Run Access to Google Cloud Secret Manager Secrets Created for config, public key, and private key oiq-otel-collector configuration with the MongoDB Atlas receiver Container Images in the GCR repository Resources https://github.com/observIQ/observiq-otel-collector/blob/main/docs/google-cloud-run.md https://github.com/observIQ/observiq-otel-collector/tree/main/config/google_cloud_exporter/mongodb_atlas Setting Up Prerequisites The first task on our agenda is to get our container image transferred from Docker Hub to the Google Container Repository (GCR). To do this, we need a system with docker installed. Additionally, we need to have the project already created in Google Cloud. For this blog, we’ve created a temporary project called dm-cloudrun-blog. Now that we’re ready with docker and our Google Cloud project, we can run the following commands to import the image into gcr: Our second prerequisite task is to set up our secrets. For this piece of the puzzle, I go to the Google Cloud Secret Manager and create three secrets: mongo-atlas-priv-key, mongo-atlas-pub-key, mongo-otel-config. The values of the secrets are the ones set up on the MongoDB Atlas site for the 2 keys and the configuration we’ve written for the config secret. This is the configuration we’re using today: Creation of the Cloud Run Deployment Now that we have the prerequisites out of the way, we can focus on creating our deployment. We click “Create Service” in the Google Cloud Console under Cloud Run. On the next page, we will need to fill in several values. Under the initial display, fill in the following: Container Image URL (we created above): gcr.io/dm-cloudrun-blog/observiq-otel-collector:1.4.0 Service Name: I’m using oiq-otel-mongo Check “cpu always allocated” Container port: 8888 (collector’s metrics port) Set autoscaling min 1 and max 1 Ingress: Allow internal traffic only Authentication: Require authentication Now, we need to expand the Container, Variables & Secrets, Connections, and Security section by clicking the dropdown arrow to the right of that heading. Once it grows, we can access the VARIABLES & SECRETS tab. Click the REFERENCE A SECRET link. Using the Secret dropdown, select mongo-atlas-priv-key, and change the Reference Method to Exposed as the environment variable. Finally, the Name should be set to MONGODB_ATLAS_PRIVATE_KEY. Repeat this process for mongo-atlas-pub-key with MONGODB_ATLAS_PUBLIC_KEY. One more time, we click the REFERENCE A SECRET link. This time, we set the secret to mongo-otel-config and the Reference Method to Mounted as volume. For the Mount path, we put it to etc/otel/config.yaml. The preceding slash is a permanent part of the input textbox, so do not provide it. This will insert the config file into the appropriate place inside the container’s file system. We are now finished with container parameters. All other parameters can be left at the default, and we can click the blue CREATE button at the bottom of the page. Reviewing the Container Now that our image is deployed, we can click on it in the list of Cloud Run services. Doing so brings us to a dashboard of metrics for the container. We can choose from other tabs, such as logs, revisions, and triggers. The metrics here can tell us if our container needs to be edited to have more CPU and/or Memory. The logs will display the logs from inside the container, where we can see what is happening with the collector and rectify any issues it has by editing the configuration file secret. Conclusion At some point, most tech teams will need to monitor a serverless computing resource. Running an instance of a telemetry collector inside another serverless computing platform can often be an inexpensive and effective way to address this need. I look forward to the next installment of this three-part series: AWS Elastic Container Service. I'll be sure to repeat what we achieved with Google Cloud Run over in AWS in that installment. I look forward to seeing you there.

How to Monitor Hadoop with OpenTelemetry

Deepa Ramachandra — Mon, 18 Jul 2022 06:01:00 GMT

We are back with a simplified configuration for another critical open-source component, Hadoop. Monitoring Hadoop applications helps to ensure that the data sets are distributed as expected across the cluster. Although Hadoop is considered to be very resilient to network mishaps, monitoring Hadoop clusters is inevitable. Hadoop is monitored using the JMX receiver. The configuration detailed in this post uses observIQ’s distribution of the OpenTelemetry collector. We are simplifying the use of OpenTelemetry for all users. If you are as excited as we are, take a look at the details of this support in our repo. The JMX receiver used in this configuration works with any Open Telemetry collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector. What signals matter? Monitoring performance metrics for Hadoop is necessary to ensure that all the jobs are running as expected and the clusters humming. The following categories of metrics are monitored using this configuration: HDFS Metrics: It is critical to monitor the Apache Hadoop Distributed File System(HDFS) to ensure the disc space availability and data storage optimization and to track the capacity of the file system. There are two types of HDFS metrics, namely, NameNode and DataNode. In the HDFS architecture, there is a single NameNode with multiple Datanodes. Metrics related to the NameNode are the most important metrics to monitor; any failure in the NameNode renders the data in that cluster inaccessible. The most critical metrics to scrape are: Use the metrics to gauge the overall performance of the Hadoop system. Keep track of anomalies in data directory growth and optimize data storage across the entire Hadoop system. All metrics related to the categories above can be gathered with the JMX receiver – so let’s get started! Related Content: How to monitor Solr with OpenTelemetry Configuring the jmxreceiver After the installation, the config file for the collector can be found at: C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows) /opt/observiq-otel-collector/config.yaml(Linux) Receiver Configuration: Specify the jar_path for the JMX receiver. We are using the JMX receiver to gather Hadoop metrics. The jar_path attribute lets you specify the path to the jar file, which facilitates gathering Hadoop metrics. This file path is created automatically when observIQ’s distribution of the OpenTelemetry Collector is installed. Setup the endpoint attribute as the system that is running the Hadoop instance Set the target_system attribute to Hadoop and JVM. When we connect to the JMX receiver, there are different metrics categories; the Hadoop and JVM metrics are the ones that this configuration intends to scrape. This attribute specifies that. Configure the collection_interval attribute. It is set to 60 seconds in this sample configuration. Related Content: Configuration Management in BindPlane OP Use resource_attributes to set the local host port number. The properties option allows you to set arbitrary attributes. For instance, if you are configuring multiple JMX receivers to collect metrics from many Hadoop servers, this attribute enables you to set the unique IP addresses for each endpoint system. So that you know, this is not the only use of the properties option. Processors: The resource detection processor is used to create a distinction between metrics received from multiple Hadoop systems. This helps filter metrics from specific Redis hosts in the monitoring tool, such as Google Cloud operations. Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the collector's logging component. If you would like to learn more about this processor, please check the documentation. Exporters: In this example, the metrics are exported to Google Cloud Operations using the googlecloudexporter. If you want to forward your metrics to a different destination, you can check the destinations OpenTelemetry supports here. Set up the pipeline: Viewing the metrics collected All the metrics the JMX receiver scrapes are listed below. Alerting: With these metrics and dashboards created for the Hadoop server, here are some alerting and monitoring steps that you can implement: In addition to the metrics specific to the Hadoop server, the OS and JVM metrics are tracked to give a complete view of the data usage capacity and projects for the HDFS. Set alerting for thresholds for capacity, blocks missing, block corrupt, and volume failures. Avoid false alarms by resampling the metrics. Set up alerts for failures related to individual data nodes. Set up alerts for memory shortage-related metric thresholds. observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com or join the conversation on Slack!

How to Collect and Ship Windows Events Logs with OpenTelemetry

Paul Stefanski — Fri, 15 Jul 2022 06:30:00 GMT

If you're using Windows, you'll want to monitor Windows Events. With our latest contribution to observIQ’s distribution of the OpenTelemetry Collector, you can easily monitor Windows Events with OpenTelemetry. You can utilize this receiver in conjunction with any OTel Collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector. Below are steps to get up and running quickly with observIQ’s distribution and shipping Windows Event logs to a popular backend: Google Cloud Ops. You can find out more about it on observIQ’s GitHub page. What signals matter? Windows Events logs record many operating system processes, application activity, and account activity. Some relevant log types to monitor include: Application Status Contains information about applications installed or running on the system. If an application crashes, these logs may include an explanation for the crash. Security Logs Contains information about the system’s audit and authentication processes. If a user attempts to log into the system or use administrator privileges System Logs Contains information about Windows-specific processes, such as driver activity. All of the above categories can be gathered with the Windows Events receiver – so let’s get started. Related Content: How to Install and Configure an OpenTelemetry Collector Before you begin If you don’t already have an OpenTelemetry collector built with the latest Windows Events receiver installed, you’ll need to do that first. We suggest using observIQ’s distribution of the OpenTelemetry Collector, which includes the Windows Events receiver (and many others) and is simple to install with our one-line installer. Configuring the Windows Events receiver You can go ahead and navigate to your OpenTelemetry configuration file. If you’re using the observIQ Collector, you’ll find it at the following location: C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows) Edit the configuration file to include the Windows Events receiver as shown below: You can edit the specific output by adding/editing the following directly below the receiver name and channel: Configuring the Log Fields You can adjust the following fields in the configuration to adjust what types of logs you want to ship: Related Content: Configuration Management in BindPlane OP Operators Each operator performs a simple responsibility, such as parsing a timestamp or JSON. Chain together operators to process logs into a desired format. Every operator has a type. Every operator can be given a unique ID. If you use the same type of operator more than once in a pipeline, you must specify an ID—otherwise, the ID defaults to the value of type. Operators will output to the next operator in the pipeline. The last operator in the pipeline will emit from the receiver. Optionally, the output parameter can specify the ID of another operator to which logs will be passed directly. Only parsers and general-purpose operators should be used. observIQ’s distribution of the OpenTelemetry collector is a game-changer for companies looking to implement OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

How to Monitor Zookeeper with OpenTelemetry

Deepa Ramachandra — Thu, 14 Jul 2022 06:30:00 GMT

We are back with a simplified configuration for another critical open-source component, Zookeeper. Monitoring Zookeeper applications helps to ensure that the data sets are distributed as expected across the cluster. Although Zookeeper is considered very resilient to network mishaps, monitoring is inevitable. To do so, we’ll set up monitoring using the Zookeeper receiver from OpenTelemetry. The configuration detailed in this post uses observIQ’s distribution of the OpenTelemetry collector. We are simplifying the use of OpenTelemetry for all users. If you are as excited as we are, look at the details of this support in our repo. You can utilize this receiver in conjunction with any OTel Collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector. Monitoring performance metrics for Zookeeper is necessary to ensure that all the jobs are running as expected and the clusters are humming. The following categories of metrics are monitored using this configuration: Znodes: Automatically discover Zookeeper Clusters, monitor memory (heap and non-heap) on the Znode, and get alerts of changes in resource consumption. Automatically collect, graph, and get alerts on garbage collection iterations, heap size and usage, and threads. ZooKeeper hosts are deployed in a cluster, and as long as most hosts are up, the service will be available. Make sure the total node count inside the ZooKeeper tree is consistent. Latency and throughput: A consistent view of the performance of your servers, regardless of whether they change roles from Followers to Leader or back – you’ll get a meaningful view of the history. Configuring the Zookeeper Receiver After the installation, the config file for the collector can be found at: C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows) /opt/observiq-otel-collector/config.yaml(Linux) Related Content: How to Install and Configure an OpenTelemetry Collector Receiver Configuration: Configure the collection_interval attribute. It is set to 60 seconds in this sample configuration. Setup the endpoint attribute as the system that is running the Hadoop instance Processor Configuration: The resource detection processor is used to create a distinction between metrics received from multiple Hadoop systems. This helps filter metrics from specific Redis hosts in the monitoring tool, such as Google Cloud operations. Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the collector's logging component. If you would like to learn more about this processor check the documentation. Exporter Configuration: In this example, the metrics are exported to New Relic using the OTLP exporter. If you want to forward your metrics to a different destination, you can check the destinations OpenTelemetry supports here. Set up the pipeline: Viewing the metrics All the metrics the Zookeeper receiver scrapes are listed below. Related Content: Managing Observability Pipeline Chaos and the Bottomline Alerting Now that you have the metrics gathered and exported to the destination of your choice, you can explore how to configure alerts for these metrics effectively. Here are some alerting possibilities for ZooKeeper: observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com or join the conversation on Slack!

How to Monitor Varnish with Google Cloud Platform

Jonathan Wamsley — Wed, 13 Jul 2022 04:49:00 GMT

We’re excited to announce that we’ve recently added Varnish monitoring support for the Google Cloud Platform. You can check it out here! Below are steps to get up and running quickly with observIQ’s Google Cloud Platform integrations and monitor metrics and logs from Varnish in your Google Cloud Platform. You can check out Google’s documentation for using the Ops Agent for Varnish here. What signals matter? Varnish is a popular content streaming technology. Important metrics include information from clients, performance, threads, and backend. Client Metrics Connections Requests Performance Metrics Cache Hits Evictions Thread Metrics Creations Failures Backend Metrics Success Failures General Health The Varnish receiver can gather all the above categories – so let’s get started. Related Content: Exploring & Remediating Consumption Costs with Google Billing and BindPlane OP Before you begin If you don’t already have an Ops Agent installed with the latest Varnish receiver, you’ll need to do that first. Check out the Google Cloud Platform Ops Agent documentation for installation methods, including the one-line installer. Configuring the Varnish receiver for Metrics and Logs Navigate to your Ops Agent configuration file. You’ll find it in the following location: /etc/google-cloud-ops-agent/config.yaml (Linux) Edit the configuration file for Varnish metrics as shown below: For Logging, add the following in the same yaml config file: Related Content: Getting Started with BindPlane OP and Google Cloud Operations Restart the Ops Agent with the following command: sudo service google-cloud-ops-agent restart sleep 30 Viewing the metrics collected So that you know – the following Varnish metrics will now be delivered to your preferred destination following the steps detailed above. Varnish is a high-performance web application accelerator that caches requests and delivers content. observIQ’s monitoring technology is a game changer for organizations that care about performance and efficiency. If you’re using Varnish, our solutions can significantly impact your infrastructure monitoring. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com or join us on Slack!

How to Monitor Cassandra using OpenTelemetry

Deepa Ramachandra — Fri, 08 Jul 2022 19:00:00 GMT

We are constantly working on contributing monitoring support for various sources; the latest in that line is support for Cassandra monitoring using the OpenTelemetry collector. If you are as excited as we are, take a look at the details of this support in OpenTelemetry’s repo. The best part is that this receiver works with any OpenTelemetry Collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector. In this post, we take you through the steps to set up the JMX receiver with observIQ’s distribution of the OpenTelemetry Collector and send out the metrics to New Relic. What signals matter? Performance metrics are the most important to monitor for Cassandra. Here’s a list of signals to keep track of: Availability of resources: Monitoring the physical resources and their utilization is critical to Cassandra’s performance. Standard JVM metrics, such as memory usage, thread count, garbage collection, etc., are good to monitor. If there’s a decrease in the computing resources, the Cassandra database’s performance will be affected. Volume of client requests: As with monitoring other databases, monitoring the time taken to send, receive, and fulfill requests is necessary. The volume of requests is also an indicator of unforeseen spikes in traffic, possibly an issue with the application/ database. Latency: Latency is a critical metric to monitor for Cassandra databases. Continuous monitoring helps identify performance issues and latency issues originating from a cluster. Values of read and write requests are monitored to create a holistic view of execution speed. Related Content: How to Install and Configure an OpenTelemetry Collector Configuring the JMX metrics receiver After the installation, the config file for the collector can be found at: C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows) /opt/observiq-otel-collector/config.yaml(Linux) The first step is building the receiver’s configuration: We are using the JMX receiver to gather Cassandra metrics. The jar_path attribute lets you specify the path to the jar file, which facilitates gathering Cassandra metrics using the JMX receiver. This file path is created automatically when observIQ’s distribution of the OpenTelemetry Collector is installed. Set the IP address and port for the system from which the metrics are gathered as the endpoint. When we connect to JMX, there are different categories of metrics; the Cassandra metrics and JVM metrics are the ones that this configuration intends to scrape. This target_system attribute specifies that. Set the time interval for fetching the metrics for the collection_interval attribute. The default value for this parameter is 10s. However, if exporting metrics to New Relic, this value is set to 60s by default. The Properties attribute allows you to set arbitrary attributes. For instance, if you are configuring multiple JMX receivers to collect metrics from many Cassandra servers, this attribute will enable you to set the unique IP addresses for each endpoint system. So that you know, this is not the only use of the properties option. Related Content: Configuration Management in BindPlane OP The next step is to configure the processors: Use the resourcedetection processor to create an identifier value for each Cassandra instance from which the metrics are scraped. Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the collector's logging component. If you would like to learn more about this processor check the documentation. Finally, as shown below, we’ll set up a destination for exporting the metrics. You can check the configuration for your preferred destination from OpenTelemetry’s documentation here. Set up the pipeline. Viewing the metrics collected Based on the config detailed above, the JMX metrics gatherer scrapes the following metrics and exports them to the destination. observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

How to Monitor Tomcat with OpenTelemetry

Deepa Ramachandra — Tue, 28 Jun 2022 18:51:07 GMT

We are constantly working on contributing monitoring support for various sources; the latest in that line is support for Tomcat monitoring using the JMX Receiver in the OpenTelemetry collector. If you are as excited as we are, look at the details of this support in OpenTelemetry’s repo. You can utilize this receiver in conjunction with any OTel Collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector. In this post, we take you through the steps to set up this receiver with observIQ’s distribution of the OpenTelemetry Collector and send out the metrics to Google Cloud Operations. What signals matter? Performance metrics are the most important to monitor for Tomcat servers. Here’s a list of signals to keep track of: Application metrics: Metrics related to each application that is deployed. Metrics such as tomcat.sessions and tomcat.processing_time gives insight into the number of active sessions and the processing times for the application since startup. Request Processor Metrics: Monitoring the request processing times helps gauge the hardware needs to enable the Tomcat server to handle the required number of requests in a specific period. Metrics such astomcat.request_count and tomcat.max_time gives insights into the total number of requests processed since the start time and the maximum time taken to process a request. Managing the traffic to the server: Tracking requests sent and received gives a good idea of the volume of traffic the server is handling at any time. This is essential, especially during peak traffic times, as it closely monitors the server’s performance based on traffic volumes. The tomcat.traffic metric shows the request received and the response sent at any time. A number of threads: By default, Tomcat servers create 200 threads; as the limit is reached, Tomcat continues accommodating a certain number of concurrent connections. However, keeping track of the total number of threads created is necessary. The tomcat.threads metric gives the total number of threads. All metrics related to the categories above can be gathered with the JMX receiver – so let’s get started! The first step in this configuration is to install observIQ’s distribution of the OpenTelemetry Collector. For installation instructions and the collector's latest version, check our GitHub repo. Related Content: How to Install and Configure an OpenTelemetry Collector Enabling JVM for Tomcat Tomcat, by default, does not have JVM enabled. To enable JVM, follow the instructions linked here. Configuring the jmxreceiver After the installation, the config file for the collector can be found at: C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows) /opt/observiq-otel-collector/config.yaml(Linux) The first step is the receiver’s configuration: We are using the JMX receiver to gather Tomcat metrics. The jar_path attribute lets you specify the path to the jar file, which facilitates gathering Tomcat metrics using the JMX receiver. This file path is created automatically when observIQ’s distribution of the OpenTelemetry Collector is installed. Set the IP address and port for the system from which the metrics are gathered as the endpoint. When we connect to JMX, there are different metrics categories; the Tomcat and JVM metrics are the ones that this configuration intends to scrape. This target_system attribute specifies that. Set the time interval for fetching the metrics for the collection_interval attribute. The default value for this parameter is 10s. However, if exporting metrics to Google Cloud operations, this value is set to 60s by default. The Properties attribute allows you to set arbitrary attributes. For instance, if you are configuring multiple JMX receivers to collect metrics from many Tomcat servers, this attribute will enable you to set the unique IP addresses for each endpoint system. Please note that this is not the only use of the properties option. The next step is to configure the processors: Use the resourcedetection processor to create an identifier value for each Tomcat system from which the metrics are scraped. Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the collector's logging component. To learn more about this processor, check the documentation. The next step is to set a destination for exporting the metrics, as shown below. You can check the configuration for your preferred destination from OpenTelemetry’s documentation here. Set up the pipeline. Viewing the metrics collected Based on the above configuration, the JMX metrics gatherer scrapes the following metrics and exports them to Google Cloud Operations. Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane OP observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

Filtering Metrics with the observIQ Distro for OpenTelemetry Collector

Dylan Myers — Mon, 20 Jun 2022 17:16:31 GMT

This post will address the common monitoring use case of filtering metrics within observIQ’s OpenTelemetry (OTEL) Collector distribution. Whether the metrics are deemed unnecessary or filtered for security concerns, the process is straightforward. We will use MySQL on Red Hat Enterprise Linux 8 for our sample environment. The destination exporter will be to Google Cloud Operations, but the process is exporter agnostic. We use this exporter to provide visual charts showing the metrics before and after filtering. Environment Prerequisites Suitable operating system observIQ Distro for OTEL Collector installed MySQL installed MySQL Least Privilege User (LPU) setup OTEL is configured to collect metrics from MySQL Resources observIQ Distro for OTEL Collector Download MySQL Receiver Documentation MySQL Metadata File (Lists the Metrics) Initial Metrics Once configured using the LPU I created, MySQL metrics should be flowing. We will focus on the specific metric `mysql.buffer_pool.limit` for our purposes. Currently our config.yaml MySQL section looks like this: After waiting for at least 5 minutes to get a good amount of data, metrics will look something like this in Google’s Metrics Explorer: Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane OP Filtering Now that metrics are flowing, we can filter them. First, let us discuss the reasons for filtering this specific metric. The answer is simple: it isn’t really all that useful or necessary. Barring a configuration change by the DBA, it will be a flat line. Even after a configuration change, it simply steps that flat line up or down. To filter, we first need to look at the metadata file for the MySQL receiver. In this file, we find a listing of the attributes and metrics associated with this receiver. If we go to the metrics section of the file and see our pool limit metric, we learn it looks like this: This lets us know it is enabled by default and describes the metric and other essential data. As these are the defaults, we can interpret from it that if we set the `enabled` parameter to false, it should disable this metric–aka filter. It will not be collected; since it isn’t collected, it will also not be sent to the exporter. To achieve this in our configuration file, we make the following changes: This replicates the structure from the metadata file but with everything else trimmed other than the bare minimum number of lines needed to achieve our goal. Once this has been changed and the collector restarted, I again wait at least 5 minutes and check Google’s Metrics Explorer to see what has changed: The screenshot shows that data was last sent to Google at 10:48, now 11:13. Related Content: How to Monitor MySQL with OpenTelemetry Conclusion While the information needed is located in a few different places, filtering is straightforward. Also, one can always contact observIQ support if they need help finding the necessary documents to provide the information. Finally, don’t forget that the metadata we looked at also includes other helpful information for understanding your data.

Introducing BindPlane OP

Ryan Goins — Wed, 15 Jun 2022 20:47:21 GMT

We’re pleased to announce the beta release of BindPlane OP (Observability Pipeline), the first open-source observability pipeline built specifically for OpenTelemetry. When we launched BindPlane in 2018, we set out to build a best-in-class observability management platform. With over 150 high-fidelity metrics and log integrations, intelligent alerting, and a centralized collector, BindPlane has been the solution IT teams have depended on for managing complex observability data streams and agent configurations. We’ve worked closely with many of you over the years, and a standard set of challenges and requests have arisen time and again: You have stringent compliance requirements and want an option to deploy BindPlane. Open source has gone from being a nice-to-have to a requirement. It would be best to have more control over where your data flows and the ability to adjust it to control costs. Full integration with your DevOps automation stack is required for today’s scale. Over the last six months, our team has been working diligently to solve each of these challenges, shifting our focus from building proprietary solutions to open-source ones. A huge part of that effort has been contributing to the OpenTelemetry project as we realize our vision of delivering a completely open-source telemetry stack. Here are just a few of the contributions we’ve made this year: We contributed our highly performant logging agent, Stanza, to the OpenTelemetry community, which recently became stable as OpenTelemetry Logging. We contributed 30 metric receivers to the OpenTelemetry project. We created an OpenTelemetry plugin framework, allowing simplified configuration to gather telemetry from popular technologies. We released a distribution of the OpenTelemetry Collector that leverages the Open Telemetry Agent Management Protocol (OpAMP), the first to do so. Building on those efforts, we’re very excited to introduce the next piece of the stack – BindPlane OP, the first open-source observability pipeline built for OpenTelemetry. Enterprise-ready, BindPlane OP allows you to control your observability costs and simplify the deployment and management of telemetry agents at scale. With the beta releasing today, we’ve solidified the foundation of BindPlane OP. Why BindPlane OP? Open Source First off, BindPlane OP will be completely open source. It’s built to work with OpenTelemetry using the new OpAMP protocol for agent management, and over time, we plan to expand support beyond OpenTelemtry to other OSS agents. As we’re still in beta, the repo isn’t open-sourced quite yet, but that’s coming very soon. Agent Management BindPlane OP provides you with a single control plane for managing your agents. Want to deploy a configuration to thousands of agents? BindPlane OP won’t blink. Do you have an agent that just disconnected? You’ll instantly see that in BindPlane. Do you need help building a new configuration? No problem, BindPlane will walk you through it. Data on Demand Switch an integration between high and low flow modes to collect only the data you need when you need it. Increase the fidelity and flow of your data to quickly solve problems in your environment or reduce fidelity to save on cost. This helps increase the signal in your logging data and reduce your log analytics bills. Instrument Once. Send Anywhere. No need for proprietary agents. Deploy the OTel collector to all your hosts and gather metrics, logs, and traces in minutes. Changed your mind about that analytics platform? No problem. OpenTelemetry plus BindPlane means no vendor lock-in. Change the destination in minutes with no need to redeploy. Enterprise Ready You asked for a version of BindPlane you could manage internally. We’re delivering that with BindPlane OP, a single Go binary that can be deployed behind your firewall and requires no connection to observIQ. Try it Now Today, we’re releasing BindPlane OP in beta, and we’d love to have you try it out as we work toward a general release later this year. Get started now: Download the beta and start kicking the tires. Join our Slack community and let us know what you think. We’re deeply committed to OpenTelemetry and building the solutions that allow the industry to operate on a completely open-source telemetry stack. We’re looking forward to working with you as we realize that vision.

How to monitor Elasticsearch with OpenTelemetry

Deepa Ramachandra — Wed, 15 Jun 2022 14:36:15 GMT

Some popular monitoring tools in the market can complicate and create blind spots in your Elasticsearch monitoring. That’s why we made monitoring Elasticsearch simple, straightforward, and actionable. Read along as we dive into the steps to monitor Elasticsearch using observIQ’s distribution of the OpenTelemetry collector. To monitor Elasticsearch, we will configure two OpenTelemetry receivers, the Elasticsearch receiver and the JVM receiver. Broken image It is always good to stick to industry standards, and when it comes to monitoring, OpenTelemetry is the standard. We are simplifying the use of OpenTelemetry for all users. If you are as excited as we are, look at the details of this support in our repo. You can utilize this receiver in conjunction with any OTel Collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector. What signals matter? Elasticsearch has clusters, nodes, and masters, which are concepts specific to Elasticsearch. When monitoring a cluster, you collect metrics from a single Elasticsearch node or multiple nodes in the cluster. Some of the most critical elastic search metrics to monitor: Cluster health based on node availability and shards: Elasticsearch’s most favorable feature is its scalability, which depends on optimized cluster performance. Metrics deliver valuable data such as cluster status, node status, and shard numbers split categorically as active shards, initializing shards, relocating shards, and unassigned shards. In addition to this, the elasticsearch.node.shards.size metrics give the size of shards assigned to a specific node. Node health based on disk space availability, CPU, and memory usage percentages: Elasticsearch’s performance depends on how efficiently its memory is used, specifically the memory health of each node. Constant node reboots could lead to an increased read from disk activity, reducing performance. CPU usage is another critical component of Elasticsearch monitoring. Heavy search or indexing workloads can increase CPU usage, resulting in degraded performance. Metrics such as elasticsearch.node.fs.disk.available, elasticsearch.node.cluster.io helps chart these values and derive valuable inferences. Related Content: How to Install and Configure an OpenTelemetry Collector JVM metrics for JVM heap, garbage collection, and thread pool: Elasticsearch is Java-based and runs within a JVM(Java Virtual Machine). Cluster performance depends on the efficiency of the JVM heap usage. All Java objects exist within a JVM heap, created when the JVM application starts, and the objects are retained in the heap until it is complete. JVM heap is tracked using the metrics jvm.memory.heap.max,jvm.memory.heap.used and jvm.memory.heap.committed. Once the JVM heap is full, garbage collection is initiated. JVM’s garbage collection is an ongoing process; it is critical to ensure that it does not retard the application’s performance in any way. JVM’s garbage collection capabilities are tracked using the metrics jvm.gc.collections.count and jvm.gc.collections.elapsed. Each node maintains thread pools of all types; the thread pools,, in turn,, have worker threads that reduce the overhead on the overall performance. Threadpools queue the requests and serve them when the node has available bandwidth to accommodate the request. All metrics related to the categories above can be gathered with the Elasticsearch receiver – so let’s get started! Configuring the Elasticsearch receiver You can use the following configuration to gather metrics using the Elasticsearch receiver and forward the metrics to the destination of your choice. OpenTelemetry supports over a dozen destinations to which you can forward the collected metrics. More information is available about exporters in OpenTelemetry’s repo. In this sample, the configuration for the elastic receiver is covered. For details on the JVM receiver, check OpenTelemetry’s repo. Receiver configuration: Use the nodes attribute to specify the node that is being monitored. Set up the endpoint attribute as the system running the elasticsearch instance. Configure the collection_interval attribute. It is set to 60 seconds in this sample configuration. Processor configuration: The resourcedetection processor creates a unique identity for each metric host so that you can filter between the various hosts to view the metrics specific to that host. The resource processor is used to set and identify these parameters. The resourceattributetransposer processor enriches the metrics data with the cluster information. This makes it easier to drill down to the metrics for each cluster. The batch processor is used to batch all the metrics together during collection. Related Content: What is the OpenTelemetry Transform Language (OTTL)? Exporter Configuration: In this example, the metrics are exported to Google Cloud Operations. If you would like to forward your metrics to a different destination, check the destinations that OpenTelemetry supports at this time, here. Set up the pipeline. Viewing the metrics All the metrics the Elasticsearch receiver scrapes are listed below. In addition to those, the attributes and their usage are also listed. It helps to understand the attributes used if your usage requires enriching the metrics data further with these attributes. List of attributes: observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

How to Monitor Active Directory with OpenTelemetry

Brandon Johnson — Wed, 25 May 2022 20:54:43 GMT

We’re excited to announce that we’ve recently contributed Active Directory Domain Services (abbreviated Active Directory DS) monitoring support to the OpenTelemetry collector. You can check it out here! You can utilize this receiver in conjunction with any OTel Collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector. Below are steps to get up and running quickly with observIQ’s distribution and shipping Active Directory DS metrics from Windows to a popular backend: Google Cloud Monitoring. You can find out more about it on observIQ’s GitHub page. What signals matter? Monitoring an Active Directory DS instance can be daunting, but we’ve focused the performance metrics on just a few key components: The Directory Replication Agent (DRA) The Directory Replication Agent controls the replication of domains across multiple domain controllers. This component is essential for keeping your directory data safe and available during outages. LDAP LDAP (Lightweight Directory Access Protocol) is used to access your directory. The performance of this component is critical to accessing data in your directory over the network. The Domain Controller The domain controller itself manages directory data. The performance of this component is critical to accessing the data in your directory. A table with the complete list of the Active Directory metrics automatically tracked with OpenTelemetry can be found at the end of the article, but first, let’s install the collector! Related Content: How to Install and Configure an OpenTelemetry Collector Installing to the Source You'll need to do that first if you don’t already have an OpenTelemetry collector built with the latest Active Directory receiver installed. We suggest using observIQ’s distribution of the OpenTelemetry Collector, which includes the Active Directory receiver (and many others) and is simple to install with our one-line installer. Configuring the Active Directory DS receiver After the installation, the config file for the collector can be found at C:\Program Files\observIQ OpenTelemetry Collector\config.yaml Edit the configuration file and use the following configuration. In the example above, the Active Directory DS receiver configuration is set to: Receive Active Directory metrics from the Windows performance counters. Set the time interval for fetching the metrics. The default value for this parameter is 10s. However, if metrics are exported to Google Cloud operations, this value should be set to 60s. Export metrics to Google Cloud. By default, the version of the Google Cloud exporter provided with the observIQ collector exports as the “generic_node” resource. “node_id” is the hostname of the machine the collector is running on. “location” is “global” as default. “namespace” is the default hostname of the machine. We override the default namespace and set it to “active_directory”. You can view the full range of configuration options for observIQ’s version of the Google Cloud exporter here. Related Content: OpenTelemetry in Production: A Primer Viewing the metrics You should see the following metrics exported to Metrics Explorer: To view the metrics, follow the steps outlined below: In the Google Cloud Console, head to Metrics Explorer. Select the resource as a generic node. Follow the namespace equivalent in the table above and filter the metric to view the chart. Broken image observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

How to monitor MongoDB with OpenTelemetry

Deepa Ramachandra — Thu, 19 May 2022 13:50:26 GMT

MongoDB is a document-oriented and cross-platform database that maintains its documents in the binary-encoded JSON format. Mongo’s replication capabilities and horizontal capability using database sharding make MongoDB highly available. An effective monitoring solution can make it easier for you to identify issues with MongoDB, such as resource availability, execution slowdowns, and scalability. observIQ recently built and contributed a MongoDB metric receiver to the OpenTelemetry contrib repo. You can check it out here! You can utilize this receiver in conjunction with any OTel Collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector. Below are steps to get up and running quickly with observIQ’s distribution, shipping MongoDB metrics to any popular backend. You can find out more about it on observIQ’s GitHub page. You can find OTel config examples for MongoDB and other applications shipping to Google Cloud here. Let’s get started! What signals matter? The most critical MongoDB-related metrics to monitor are: The status of processes and memory utilization: Monitoring MongoDB’s server processes helps identify slowness in its activity or health. Unresponsive processes during command execution are an example of a scenario that needs further analysis. The mongodb.collection.count metric helps determine the stability, restart numbers, and backup performance related to the collections in that MongoDB instance. The mongodb.data.size gives the value of the storage space consumed by the data in your current MongoDB instance. Broken image Operations and connections metrics: When there are performance issues in the application, it is necessary to rule out if the problem stems from the database layer. In this case, monitoring the connections and operations patterns becomes very critical. Metrics such as mongodb.cache.operations and mongodb.connection.count gives insights into the connections’ operation and count. By monitoring the operations, you can draw a pattern and set thresholds and alerts for those thresholds. Broken image Query Optimization: For a query, the MongoDB query optimizer chooses and caches the most efficient query plan given the available indexes. The most efficient query plan is evaluated based on the number of “work units” ( works ) performed by the query execution plan when the query planner evaluates candidate plans. For instance, metrics such as mongodb.global_lock.time show the trends in lock time for query optimization. Broken image Before creating your configuration, you should have observIQ’s distribution of the OpenTelemetry Collector installed. For installation instructions and the collector's latest version, check our GitHub repo. Configuring the mongoDB receiver After the installation, the config file for the collector can be found at: C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows) /opt/observiq-otel-collector/config.yaml(Linux) Let’s begin with the configuration for the receiver. Here, we set up the host as the endpoint, essentially the IP address and port of the Mongo system. For all configurations using the Google Cloud Operations as an endpoint, the collection interval is set to 60s, which is the requirement. Disable TLS. This is done to remove any restriction from TLS to transmit the metrics data to the third party, in this case, Google Cloud Operations. Next up, the processors: Please note that these processors are optional. You may choose to use any of the available processors documented here. The resourcedetection processor will create a unique identifier for each MongoDB instance monitored using this configuration. Use the Normalize Sums Processor to average the initial metrics received for better visualization. Use the batch processor to collate the metrics from multiple receivers and send them to the exporter destination. We recommend using this processor with all receiver configurations when applicable. In this example, we are showing you a sample config for exporting metrics to Google Cloud. However, you may choose to export the metrics to any of the available destinations documented here. The configuration below exports the metrics to Google Cloud. Finally, set up the pipeline. Viewing the metrics collected The following metrics are fetched using the configuration above: To view the metrics, follow the steps outlined below: In the Google Cloud Console, head to Metrics Explorer. Select the resource as a generic node. Follow the namespace equivalent in the table above and filter the metric to view the chart. Broken image Broken image observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

observIQ Expands Partnership with Google Cloud to Support OpenTelemetry

observIQ Media — Wed, 18 May 2022 06:01:00 GMT

observIQ provides unified view into workloads on Google Compute Engine and on premise Grand Rapids, MI (May 18, 2022) – Open source telemetry innovator observIQ has expanded its partnership with Google Cloud and Google Cloud’s Operations Suite, to improve telemetry for managing all business-critical applications and environments. The partnership will increase visibility into workloads running on Google Compute Engine. Google Cloud customers can now use OpenTelemetry logs and metrics collection with support from Google Cloud and observIQ to unify observability data from Google Cloud, on-premise data centers, and other cloud environments. OpenTelemetry is a collection of open-source telemetry tools, APIs, and SDKs. observIQ is a major contributor, developing high-performance log and metric collection technology that can be easily configured and scaled to fit customers' monitoring needs. Many Google Cloud customers are already using OpenTelemetry to achieve significant cost savings to replace legacy telemetry and observability technologies. observIQ will integrate over 100 OpenTelemetry receivers into Google Ops Agent making it possible to collect logs and metrics for all of the major technologies customers wish to observe. Google Ops Agent is the primary agent for collecting telemetry from Compute Engine instances, while OpenTelemetry provides support for many other platforms. observIQ will also integrate other analysis layer components, such as dashboards and alerts, to provide customers with a true out-of-the-box observability experience. Google Cloud Operations (GCO) Suite is an integrated monitoring, logging, and trace-managed service in Google Cloud that businesses rely on to monitor their applications and systems. Google Cloud is expanding GCO to provide a greater level of insight into business-critical applications and environments. Currently, DevOps and ITOps teams must use legacy tools and approaches to achieve visibility into workloads. The expanded collaboration positions both companies at the forefront of a shift towards open source in the observability industry and reflects the growth of hybrid and multi-cloud environments and the resulting complexity in monitoring performance across them. Quote from Manvinder Singh, Director, IaaS/PaaS Partnerships, Google Cloud “As organizations build out their multi-cloud or hybrid cloud environments, the need for solutions that help streamline telemetry collection for any cloud environment continues to rise. With OpenTelemetry available on Google Cloud alongside observIQ capabilities, customers will be able to easily unify their observability data across cloud environments.” Quote from Mike Kelly, CEO of observIQ “The open source community and OpenTelemetry capture the very best innovation, and expanding our work with the Google Cloud team is a great way to put it into the hands of even more DevOps and ITOps teams quickly. We have a shared vision of telemetry's future and look forward to continuing this collaboration.” ### About ObservIQ At observIQ, we develop fast, powerful, and intuitive next-generation observability technologies for DevOps and ITOps – built by engineers for engineers. Learn more at www.observiq.com. Contact: Ann.oleary@observiq.com P: +1 650 996 0778

How to Monitor Microsoft IIS with OpenTelemetry

Miguel Rodriguez — Sat, 14 May 2022 18:44:18 GMT

The OpenTelemetry members at observIQ are excited to add Microsoft IIS metric monitoring support to OpenTelemetry! You can now easily monitor your IIS web servers with the oIQ OpenTelemetry Collector. You can add the IIS metric receiver to any OpenTelemetry collector. This post demonstrates just one configuration for shipping metrics with OpenTelemetry components. This configuration and many other observIQ OpenTelemetry configurations are available in the oIQ Opentelemetry Collector. Installation and configuration are simple, but you can refine your configuration once the metric receiver is up and running. The configuration is easily editable as a yaml file. You can find more documentation, example configurations for other receivers, and observability tools on GitHub and our blog. What Matters for Microsoft IIS Metrics Microsoft IIS is a general-purpose platform for web servers and applications. The possible scale and scope of IIS monitoring are vast, so the specific metrics that matter most to you might vary. Uptime, data flow, and request metrics are the most commonly monitored. Step 1: Installing the Collector The oIQ OpenTelemetry Collector can be installed on Windows, MacOS, and Linux using single-line install commands that can be copied directly from GitHub. Be sure that you have administrator privileges on the device or VM you are running the installation on. Since Microsoft IIS is a Windows-based product, only the Windows installation steps are shown below. Related Content: How to Install and Configure an OpenTelemetry Collector Step 2: Prerequisites and Authentication Credentials In the following example, we use Google Cloud Operations as the destination. However, OpenTelemtry offers exporters for many destinations. Check out the list of exporters here. Setting up Google Cloud exporter prerequisites: If running outside of Google Cloud (On-prem, AWS, etc.) or without the Cloud Monitoring scope, the Google Exporter requires a service account. Create a service account with the following roles: Metrics: roles/monitoring.metricWriter Logs: roles/logging.logWriter Create a service account JSON key and place it in the system running the collector. Windows In this example, the key is placed at C:/observiq/collector/sa.json. Set the GOOGLE_APPLICATION_CREDENTIALS with the command prompt setx command. Run the following command. Restart the service using the services application. Related Content: Rapid telemetry for Windows with OpenTelemetry and BindPlane OP Step 3: Configure the Microsoft IIS Receiver After installation, the config file for the collector can be found at: Windows: C:\Program Files\observIQ OpenTelemetry Collector\config.yaml MacOS/Linux: /opt/observiq-otel-collector/config.yaml Edit the config file with the following configuration: The configuration is set to receive metrics from the Microsoft IIS server to Google Cloud. You can specify your destination and insert any necessary credentials near the top of the “receivers” section of the config file. The following notes apply to Google Cloud: The interval for fetching metrics is 60 seconds by default. In the Google Cloud exporter, do the following mapping: Set the target type to a generic node to simplify filtering metrics from the collector in cloud monitoring. Set node_id, location, and namespace for the metrics. The resource processor sets the location and namespace. The project ID is not set in the configuration. Google automatically detects the project ID. Add the normalizesums processor to exclude the first metric with a zero value when the configuration is done and the collector is restarted. Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the collector's logging component. Step 4: View Your Metrics Below is a table of the metrics that OpenTelemetry collects on Microsoft IIS. You can exclude metrics by adding the following line to your config file with the metric name you want to exclude: observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

How to Monitor Riak Metrics with OpenTelemetry

Mitchell Armstrong — Tue, 03 May 2022 23:48:21 GMT

observIQ’s OpenTelemetry members contributed Riak metric monitoring support to OpenTelemetry! You can now monitor your Riak agent performance with OpenTelemetry and deploy it simply with the oIQ OpenTelemetry Collector. You can add the Riak metric receiver to any OpenTelemetry collector. This post demonstrates a configuration for shipping metrics to Google Cloud Operations with OpenTelemetry components. This configuration and many other observIQ OpenTelemetry configurations are available in the oIQ Opentelemetry Collector. Installation and configuration are simple, but you can refine your configuration once the metric receiver is up and running. The configuration is easily editable as a yaml file. You can find more documentation, example configurations for other receivers, and observability tools on GitHub and our blog. What Matters for Riak Metrics Riak deployments can get complicated and tedious. Large environments stress throughput, and monitoring metrics offer insight into resource usage, stability, and overall health. Step 1: Installing the Collector The oIQ OpenTelemetry Collector can be installed on Windows, MacOS, and Linux using single-line install commands that can be copied directly from GitHub. Please make sure you have administrator privileges on your device or VM when running the installation. Windows: MacOS/Linux: Step 2: Prerequisites and Authentication Credentials In the following example, we are using Google Cloud Operations as the destination. However, OpenTelemtry offers exporters for many destinations. Check out the list of exporters here. Setting up Google Cloud exporter prerequisites: If running outside of Google Cloud (On-prem, AWS, etc) or without the Cloud Monitoring scope, the Google Exporter requires a service account. Create a service account with the following roles: Metrics: roles/monitoring.metricWriter Logs: roles/logging.logWriter Create a service account JSON key and place it in the system running the collector. Related Content: Rapid telemetry for Windows with OpenTelemetry and BindPlane OP MacOS/Linux In this example, the key is placed at /opt/observiq-otel-collector/sa.json, and its permissions are restricted to the user running the collector process. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable by creating a systemd override. A systemd override allows users to modify the systemd service configuration without changing the service directly. This allows package upgrades to happen seamlessly. You can learn more about systemd units and overrides here. Run the following command. If this is the first time an override is being created, paste the following contents into the file: If an override is already in place, insert the Environment parameter into the existing Service section. Restart the collector Windows In this example, the key is placed at C:/observiq/collector/sa.json. Set the GOOGLE_APPLICATION_CREDENTIALS with the command prompt setx command. Run the following command. Restart the service using the services application. Related Content: How to Monitor MySQL with OpenTelemetry Step 3: Configure the Riak Receiver After installation, the config file for the collector can be found at: Windows: C:\Program Files\observIQ OpenTelemetry Collector\config.yaml MacOS/Linux: /opt/observiq-otel-collector/config.yaml Edit the config file with the following configuration: The configuration is set to receive metrics from the Riak system to Google Cloud. You can specify your destination and insert any necessary credentials near the top of the “receivers” section of the config file. The following notes apply to Google Cloud: The interval for fetching metrics is 60 seconds by default. In the Google Cloud exporter, do the following mapping: Set the target type to a generic node to simplify filtering metrics from the collector in cloud monitoring. Set node_id, location, and namespace for the metrics. The resource processor sets the location and namespace. The project ID is not set in the configuration. Google automatically detects the project ID. Add the normalizesums processor to exclude the first metric with a zero value when the configuration is done and the collector is restarted. Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the collector's logging component. Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane OP Step 4: View Your Metrics Below is a list of metrics that are collected by the OpenTelemetry Riak receiver. The metrics are sent to Google Cloud, or the destination designated during setup, for analysis. Go to the Google Cloud Console—Head to Metrics Explorer to view the metrics. Select the resource as a generic node. You can filter by namespace to view specific metrics. Broken image observIQ’s OpenTelemetry distribution is an easy way for anyone looking to implement OpenTelemetry observability standards in their IT environments. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com.

How to Monitor Redis with OpenTelemetry

Deepa Ramachandra — Fri, 29 Apr 2022 19:07:24 GMT

We’re excited to announce that we’ve recently contributed Redis monitoring support to the OpenTelemetry collector. You can check it out here! You can utilize this receiver in conjunction with any OTel Collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector. Below are steps to get up and running quickly with observIQ’s distribution and shipping Redis metrics to a popular backend: Google Cloud Ops. You can find out more about it on observIQ’s GitHub page. What signals matter? Unlike other databases, monitoring the performance of Redis is relatively simple, focusing on the following categories of KPIs: Memory Utilization Database Throughput Cache hit ratio and evicted cache data Number of connections Replication All of the above categories can be gathered with the Redis receiver – so let’s get started. Step 1: Installing the collector The simplest way to get started is with one of the single-line installation commands shown below. For more advanced options, you’ll find various installation options for Linux, Windows, and macOS on GitHub. You can use the following single-line installation script to install the observIQ distribution of the OpenTelemetry Collector. Just so you know, the collector must be installed on the Redis system. Windows: Related Content: Rapid telemetry for Windows with OpenTelemetry and BindPlane OP MacOS/Linux: Step 2: Setting up pre-requisites and authentication credentials In the following example, we are using Google Cloud Operations as the destination. However, OpenTelemtry offers exporters for many destinations. Check out the list of exporters here. Setting up Google Cloud exporter prerequisites: If running outside of Google Cloud (On-prem, AWS, etc.) or without the Cloud Monitoring scope, the Google Exporter requires a service account. Create a service account with the following roles: Metrics: roles/monitoring.metricWriter Logs: roles/logging.logWriter Create a service account JSON key and place it in the system running the collector. MacOS/Linux In this example, the key is placed at /opt/observiq-otel-collector/sa.json, and its permissions are restricted to the user running the collector process. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable by creating a systemd override. A systemd override allows users to modify the systemd service configuration without changing the service directly. This allows package upgrades to happen seamlessly. You can learn more about systemd units and overrides here. Run the following command. If this is the first time an override is being created, paste the following contents into the file: If an override is already in place, insert the Environment parameter into the existing Service section. Restart the collector. Windows In this example, the key is placed at C:/observiq/collector/sa.json. Set the GOOGLE_APPLICATION_CREDENTIALS with the command prompt setx command. Run the following command. Restart the service using the services application. Related Content: How to Install and Configure an OpenTelemetry Collector Step 3: Configuring the Redis receiver After the installation, the config file for the collector can be found at C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows) /opt/observiq-otel-collector/config.yaml(Linux) Edit the configuration file and use the following configuration. In the example above, the Redis receiver configuration is set to: Receive metrics from the Redis system at the specified endpoint. Set the time interval for fetching the metrics. The default value for this parameter is 10s. However, if exporting metrics to Google Cloud operations, this value is set to 60s by default. The resource detection processor is used to create a distinction between metrics received from multiple Redis systems. This helps with filtering metrics from specific Redis hosts in the monitoring tool, in this case, Google Cloud operations. In the Google Cloud exporter here, do the following mapping: Set the target type to a generic node to simplify filtering metrics from the collector in cloud monitoring. Set node_id, location, and namespace for the metrics. The resource processor sets the location and namespace. It is important to note that the project ID is not set in the googlecloud exporter configuration. Google automatically detects the project ID. Add the normalizesums processor to exclude the first metric with a zero value when the configuration is done and the collector is restarted. Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the collector's logging component. It is recommended that the retry_on_failure be set to false. If this is not set, the retry attempts to fall into a loop for five attempts. Step 4: Viewing the metrics collected in Google Cloud operations Following the steps detailed above, you should see the following metrics exported to Metrics Explorer. To view the metrics, follow the steps outlined below: In the Google Cloud Console, head to Metrics Explorer. Select the resource as a generic node. Follow the namespace equivalent in the table above and filter the metric to view the chart. Related Content: Exploring & Remediating Consumption Costs with Google Billing and BindPlane OP observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.

How to Collect Metrics and Logs for NGINX Using the OpsAgent

Deepa Ramachandra — Tue, 22 Mar 2022 14:53:02 GMT

Why use OpsAgent? The Ops agent is Google’s recommended agent for collating your application’s telemetry data and forwarding them to GCP for visualization, alerting, and monitoring. The Ops agent collates logs and a metrics collector into one single powerhouse. Some of the key advantages of using the Ops agent are outlined below: Ability to monitor and parse logs that are written to file or console. It is a lightweight agent. The installation and configuration are straightforward and seamless. How to install it? Before you install: Identify the most recent version of the Ops agent from Google Cloud documentation. Please note that the Ops agent version installed on the VM to monitor Nginx should be ver 2.1.0 or higher. Get the deployment link for the VM’s operating system. You may install the Ops agent in the following methods: Use the gcloud/ agent policies to install the agent to a fleet of VMs simultaneously. Use automation tools such as Ansible, Terraform, etc., to install the agent of a fleet of VMs. Install the agent to a single VM from the Google Cloud Console. Related Content: Getting Started with BindPlane OP and Google Cloud Operations This post details the steps to install the Ops agent on a single VM(that has Nginx installed) from the Google Cloud Console. Copy the installation code from Google’s documentation, which is linked here. Please ensure that you have the installation code for the VM’s operating system. In this case, for a Linux Debian 10 VM, the installation code is: curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh sudo bash add-google-cloud-ops-agent-repo.sh –also-install Configuring the Ops agent to collect Nginx logs and metrics These components must be enabled in the ops agent config to collect logs and metrics. Access the Ops agent’s config file in the /etc/google-cloud-ops-agent/config.yaml directory and add the following configuration: Broken image Viewing Nginx logs and metrics You can view the logs ingested from Nginx under the Logs Explorer. View the metrics forwarded to Google Cloud Monitoring. What logs are ingested from Nginx? The Ops agents ingest and forward the following logs to the Logs viewer. There are two types of Nginx logs: access logs and error logs. Request type: The protocol used in every request sent is logged. Referrer header: Nginx blocks access to a site if the request received has invalid values in the header field. The contents of this header field are logged. Client IP address: This log contains the IP address of the client from which the request is received. HTTP Method: The HTTP method in the request header. Request URL: The URL in the request received. Response size: The response sent to the client from Nginx. HTTP status code: The HTTP status code is sent as part of the response to the client. User-agent header: The contents of the user-agent proxy sent to the client. JSON payload username: The authenticated username sending the request Timestamp: The time that the request is received Related Content: How to monitor Vault with Google Cloud Platform What metrics does the Ops agent collect from the instance? Nginx collects and forwards the following metrics from Nginx. Requests received Connections accepted based on requests Connections handled by Nginx The current active connections Our telemetry solutions' ease of use is a must for any organization looking to refine and expand their observability. Please get in touch with our customer support team with your questions and requests.

Kubernetes Logging Simplified – Pt 2: Kubernetes Events

Joe Howell — Mon, 05 Apr 2021 15:06:00 GMT

Overview In my first post in the Kubernetes Logging Simplified blog series, I touched on some of the ‘need to know’ concepts and architectures to effectively manage your application logs in Kubernetes – providing steps on how to implement a Cluster-level logging solution to debug and analyze your application workloads. In my second post, I will touch on another signal to keep an eye on Kubernetes events. Kubernetes events are essential objects that can provide visibility into your cluster resources and help correlate with your application and system logs. What is a Kubernetes Event? Kubernetes events are JSON objects made accessible via the Kubernetes API that signify a state change of a Kubernetes resource. These changes are reported to the API using their related component. For example, suppose a pod is evicted or created. In that case, a container fails to start, or a node restarts – all these state changes would generate a Kubernetes Event, made accessible via API via kubectl commands. Unlike container logs, Kubernetes events don’t ultimately get logged to a file somewhere; Kubernetes lacks a built-in mechanism to ship these events to an external backend. As a result, attempting to utilize a typical node-level log agent architecture to grab these events may not work. These events can be captured with a custom application, several OSS tools, or an observIQ Event Collector, which I’ll walkthrough below. What information does a Kubernetes Event Contain? In addition to helpful environment metadata, a Kubernetes Event contains the following key bits of information. When the event occurred Severity of the event (info, warning, error) Reason the event occurred (abbreviated description of the event) Kind of Kubernetes resource (node, pod, container) Description of the event Component that reported the event (kubelet, kube-proxy, kube-API, etc.) Why is a Kubernetes Event Useful for Capturing? Postmortems Tracking Kubernetes Events can help you understand what’s happening in your cluster over time, which can be particularly helpful when reviewing during a postmortem. Digging into the ‘when’ and ‘why’ over time can reveal useful trends and a good discussion point when an application or service fails. Custom Kubernetes Events dashboard in observIQ Real-Time Environment Awareness Suppose you’re using a complete Cluster-Level Logging solution likeobservIQ. In that case, KubernetesEvents can be used to create informational or error-level alerts that notify Slack, for example, that provide real-time notifications that can keep your entire time in the loop as the state of your cluster resources changes. Container Log Correlation Having visibility into the state of your resources can help provide valuable hints as to what’s happening with your applications. Kubernetes Events gathered by observIQ are automatically enriched with Kubernetes Metadata like namespace, deployment, and pod names – all of which allow you to correlate an application log directly to a resource event with a single filter. Correlating container logs and Kubernetes events with a resource filter in observIQ How do I get Kubernetes Events? By default, events are stored in etcd for a limited time, typically ~60 minutes, and are made accessible by kubectl commands. Though the commands are helpful to learn and employ in certain situations, utilizing a custom application or implementing a complete Cluster-level logging solution that captures, ships, and stores events for long-term analysis is highly recommended. Accessing Kubernetes Events with Kubectl Here are a few commands that will allow you to see your events: kubectl describe pods Describing a pod will provide you with related Kubernetes event information, if available: kubectl get events Provides a list of current Kubernetes events for all resources: kubectl get events -o json Same as above, but each Kubernetes Event is presented as the raw JSON object: Accessing Kubernetes Events with OSS Tools: Both kube-events and Kubernetes-event-exporter are nifty, highly customizable tools that can capture and forward Kubernetes events to a preferred output or sink (e.g., S3, Kafka, etc.) Kube Events https://github.com/kubesphere/kube-events Kubernetes Event Explorer https://github.com/opsgenie/kubernetes-event-exporter Accessing Kubernetes Events with observIQ With observIQ, you can easily enable Kubernetes Event collection by deploying the observIQ log agent as an Event Collector. Just select the option on your Kubernetes Template. See the steps below: Deploying the observIQ Agent as a Kubernetes Event Collector In my first post in the series, I explored how to create a Kubernetes Template and enable container log collection. With observIQ, you can easily enable or disable logging options from your template, even after you’ve deployed agents to your cluster. In this example, we’ll allow event collection in our existing template, re-apply the observ-agent.yaml, and add an observIQ Event Collector to our existing deployment. This will leave us with 1) an observIQ agent daemonset that will gather the application’s logs and 2) a single Event Collector deployment that will gather the Kubernetes Events, running side by side. Update your Kubernetes Template in observIQ You can just navigate to the Fleet > Templates page and choose your previously created Kubernetes template. Select the ‘Enable Cluster Events’ option, then click ‘Update’. Enabling Kubernetes Events in observIQ Kubernetes Template Next, click ‘Add Agents’. On the Install Kubernetes Agents page, download and copy the newly-generated observiq-agent.yaml to your cluster, and apply by running kubectl apply -f observiq-agent.yaml command. After 15-30 seconds, you’ll see the new Event Collector in the discovery panel below. observIQ daemonset agents + event collector View your Kubernetes Events in observIQ After a few minutes, you’ll see Kubernetes Events on the observIQ Explore page. The messages will be typed as k8s.events. Opening up one of the k8s.events, you can see parsed JSON objects and valuable labels and metadata that have been automatically added to the event to help correlate to a specific application. Kubernetes event in with Kubernetes Labels and Metadata Wrapping up Gathering your Kubernetes events is essential if you want a complete understanding of what’s going on in your cluster. Kubernetes Events are easily accessible with kubectl commands but are short-lived. Container logs and Kubernetes events can be correlated together – but it can be challenging without the right tool. For a complete log management solution that will capture your Kubernetes Events, Container logs, and more, sign up here.

Kubernetes Logging Simplified – Pt 1: Applications

Joe Howell — Thu, 04 Mar 2021 14:30:00 GMT

If you’re running a fleet of containerized applications on Kubernetes, aggregating and analyzing your logs can be a bit daunting without the proper knowledge and tools. Thankfully, there’s plenty of helpful documentation to help you get started; observIQ provides the tools to quickly gather and analyze your application logs. In the first part of this blog series, Kubernetes Logging Simplified, I’ll highlight a few ‘need to know’ concepts so you can start digging into your application logs quickly. Kubernetes Logging Architecture – A Few Things You Need to Know Standard Output and Error streams The simplest logging method for containerized applications is writing to stdout and stderr. If you’re deploying an application, it’s best practice to enable logging to stdout and stderr or build this functionality into your custom application. Doing so will streamline your overall Kubernetes logging configuration and help facilitate the implementation of a Cluster-Level Logging solution. Cluster-Level Logging Out of the box, Kubernetes and container engines do not provide a complete Cluster-Level Loggingsolution, so it’s essential to implement a logging backend like ELK, Google Cloud Logging, or observIQ to ensure you can gather, store and analyze your application logs as the state and scale of your cluster changes. Node-Level Logging For applications that log to stdout and stderr, the Kubelet will detect and hand them off to the container engine and write the streams to a path on your node. This behavior is determined by the logging driver you’ve configured. For Docker and containers, this path typically defaults to /var/log/containers. A Node Log Agent architecture is recommended to gather these logs, which I’ll touch on below. Node-Level Log Rotation* As application logs will ultimately be written to your nodes, it’s essential to administer a Node log rotation solution, as filling Node storage could impact the overall health of your cluster. Depending on how you deploy your Cluster, node log rotation may or may not be configured by default. For example, if you deploy using kube-up.sh, logrotate will be configured automatically. If you’re using Docker, you can set max-size and max-file options using log-opt. Where Can I Find More? The Kubernetes docs outline logging architecture in a pretty straightforward and concise way. This blog focuses on application logs, but If you’re just getting started with Kubernetes, I’d encourage you to check out the following links better to understand container, system, and audit logging. https://kubernetes.io/docs/concepts/cluster-administration/logging/ https://kubernetes.io/docs/concepts/cluster-administration/system-logs/ https://kubernetes.io/docs/tasks/debug-application-cluster/audit/ How Do I Get Application Logs From Kubernetes? You can gather your application in several ways, manually via the command line or implementing a Cluster-level logging architecture described below. Manual Commands Before implementing a complete Cluster-level loggingsolution, it’s helpful to familiarize yourself with some basic commands to access, stream and dump your application logs manually. Cheat Sheet For a quick list, check out the kubectl cheat sheet here: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#logs Complete list For a complete list of kubectl commands, check out the docs here: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#logs Custom utilities worth checking out Stern – https://github.com/wercker/stern Kubetail – https://github.com/johanhaleby/kubetail Kail – https://github.com/boz/kail Cluster-Level Logging Architecture When you are ready to implement Cluster-level logging, there are a few primary architectures you should consider: Node Log Agents (recommended) To best leverage Node-level logging, you can deploy a log agent like Fluentd, Logstash, or observIQ log agent to the nodes in your cluster to read your application logs and ship logs to your preferred backend. Typically, it’s recommended that the agent be run as a Dameonset, which deploys an agent for each node in the cluster. At observIQ, we recommend deploying Node Log Agents as the most straightforward and efficient method to gather your application logs. Stream Sidecar If your application can’t log to stdout and stderr, you can use a Stream Sidecar. The Stream Sidecar can grab logs from an application container’s filesystem and then stream them to its own stdout and stderr streams. Like Node log agents, this is another path to get the application logs written on the Node. Agent Sidecar: If your application can’t log to stdout and stderr, you can deploy a log agent as a sidecar, which can grab the logs from your application container’s filesystem and send them to your preferred backend. Deploying Kubernetes Cluster-level Logging with observIQ Now that we’ve completed the basic architectures let’s walk through setting up Cluster-level logging with observIQ. With observIQ, you can quickly implement Node-level logging agent architecture, deploying the observIQ log agent as a Daemonset and gathering the logs from a single, many, or all of your containerized applications in a few simple steps. Create a Kubernetes Template What is a Template? A template in observIQ is an object that allows you to manage the same logging configuration across multiple agents, all from a single place in the UI. It also allows you to define and update logging configuration before and after you deploy observIQ agents, which I’ll be exploring more in my next post in the series. Add a new Template To create a Template, navigate to the Fleet > Templates page in observIQ, select ‘Add Template,’ and then select ‘Kubernetes’ as the platform. In this example, specify a friendly name for your cluster, GKE US East 1, and choose ‘Enable Container Logs’. From here, you can specify a specific pod or container, leave the default option, and gather logs from all pods and containers. In this case, I will leave the default options and gather all the application logs from my cluster. Then click ‘Create’. Creating a Kubernetes Template Deploy observIQ to your Kubernetes Cluster Once you have your Template created, click ‘Add Agents’. On the Install Kubernetes Agents page, download and copy the observiq-agent.yaml to your cluster, and apply by running kubectl apply -f observiq-agent.yaml command. Install the Kubernetes Agents page After a few minutes, observIQ agents will be running in your cluster. If you run kubectl get pods | grep observing-agent, you’ll see an observIQ Agent for each node in your cluster. If you return to your template, you’ll also see each of these agents related to your Template. A good thing to know is that if you want to make configuration changes to your agents, you can now modify Agent configuration directly from the Templates. Kubectl get pods | grep observiq-agent Kubernetes agents associated with Template View your Application Logs in observIQ After a few minutes, your application logs appear on the observIQ Explore page. The messages will be labeled with the type k8s.container. When opening one of the application logs, you can see application messages, proper labels, and metadata automatically added to help trace the message to your specific application. Application log in observIQ with Kubernetes Labels and Metadata Wrapping up Gathering your application logs is critical to understanding and debugging application workloads. Knowing manual commands is helpful, but as your application and cluster scales, it’s essential to implement a Cluster-level logging solution that fits your environment and requirements. In my next post, I’ll dive into System and Cluster events and step through how to quickly ship and analyze your logs with observIQ.

observIQ’s Stanza Log Agent Now Part Of OpenTelemetry Project

Mike Kelly — Thu, 28 Jan 2021 01:46:34 GMT

Today, I’m happy to announce that observIQ’s Stanza Log Agent will become vital to the OpenTelemetry project. This has been in the works for many months, and the team at observIQ is thrilled to see it becoming a reality. We’re particularly pleased to see it happening just as we launch our log management platform, which will be the first platform to utilize the log agent technology that has now been fully incorporated into OpenTelemetry. Our mission since launching observIQ has been to deliver a simple-to-use, powerful, and performant log management experience. We found that the most significant source of complexity for the customers we worked with was the ingestion pipeline itself. We’ve changed that with observIQ Cloud and started with the log agent. We launched Stanza alongside observIQ. It was a critical component for the log management pipeline, and it required starting from scratch to build a performant, highly configurable agent with a focus on flexibility. Stanza is a small footprint, high capability, log shipping agent. It uses roughly 10% of the CPU and memory of other popular log agents. We launched it as an open-source project and are committed to keeping it that way. From the beginning, we’ve felt strongly that there was an opportunity to take a big step forward in log agent capabilities, and we wanted everyone to benefit from it, whether using observIQ or another platform. I’ve also strongly believed in the mission of OpenTelemetry since its start in 2019. Like much of the industry, we recognize the value of observability and monitoring in general, which will be realized by a standardized telemetry system used by all of the industry’s best platforms. An end-to-end vendor lock-in is critical to unlocking innovation. OpenTelemetry started with tracing and soon moved to add metrics. Together with the OpenTelemetry team, we saw the opportunity to accelerate the log component with the addition of the Stanza project, and we quickly moved to make it a reality. Today, Stanza is a component of OpenTelemetry. Over the next few months, we’ll be making improvements to a more tightly coupled Stanza with the OpenTelemetry collector. You can find more information on Stanza and the OpenTelemetry project here. Or sign up for a free trial of observIQ and see Stanza in action as a native component.