As containerized environments become more complex, troubleshooting networking issues within a Kubernetes cluster can be a daunting task. Intermittent failures and performance bottlenecks are hard to diagnose without comprehensive visibility into the networking infrastructure. To address these challenges, we are excited to announce the availability of Network Observability for Azure Kubernetes Service (AKS). This feature gives customers enhanced visibility into their container network traffic, so administrators and developers can troubleshoot networking issues effectively and optimize the performance of their containerized applications.
What is Network Observability for AKS?
The Network Observability feature in AKS is a distributed monitoring solution that works in both Linux and Windows hosting environments. The add-on uses eBPF on Linux, and the Virtual Filtering Platform (VFP) and Host Networking Service (HNS) on Windows, to gain insights into the networking infrastructure. The data points it collects in real time are then exposed to Prometheus and Grafana for consumption.
Visualizing Network Observability Data
Azure offers managed Prometheus and Grafana, which simplifies the setup and management of monitoring and visualization. Azure Monitor provides a managed instance of Prometheus that collects and stores metrics from various sources, including the network observability add-on. Grafana, a popular open-source data visualization platform, is integrated with Azure Monitor. Users can leverage pre-configured dashboards and templates specifically designed for AKS and the network observability add-on. These dashboards offer a comprehensive view of network metrics, so users can monitor and analyze the data at a glance.
Setting up network observability with the Azure-managed Prometheus and Grafana approach is covered in the Azure documentation. Once configured, users can access the Grafana interface to explore predefined dashboards or create custom visualizations tailored to their specific requirements. The integration between Azure Monitor, Prometheus, and Grafana streamlines the process of visualizing network observability data, enabling users to gain valuable insights into their AKS cluster’s network performance.
Alternatively, users have the option to set up and manage their own Prometheus and Grafana instances, providing more flexibility and control over the configuration and customization of the monitoring and visualization stack. Prometheus and Grafana can be deployed as separate components within the infrastructure or as containerized versions running alongside the AKS cluster.
Setting up a Bring-Your-Own (BYO) Prometheus involves configuring Prometheus to scrape the metrics exposed by the network observability add-on. Users can define scrape configurations to collect relevant metrics and store them in Prometheus’s time-series database. Grafana can then be connected to Prometheus to create custom dashboards and visualizations. Users have the freedom to design their own Grafana dashboards or import community-provided templates to visualize the network observability metrics based on their monitoring needs and preferences. The Azure documentation provides guidance on enabling the network observability add-on and visualizing it using BYO Prometheus and Grafana.
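To make the scraping step more concrete, the following is a minimal Python sketch of what a scraper does with the text it pulls from a metrics endpoint: it splits each sample line into a metric name, a label set, and a value. The metric names (`example_drop_count`, `example_forward_count`) and labels are hypothetical placeholders, not the names the add-on actually exposes, and the parser handles only the simple subset of the Prometheus text format shown here.

```python
import re

# Hypothetical sample of a scrape response; the metric and label names are
# illustrative assumptions, not the add-on's real metric names.
SAMPLE_SCRAPE = """\
# HELP example_drop_count Total dropped packets (hypothetical metric)
# TYPE example_drop_count counter
example_drop_count{direction="ingress",reason="policy"} 12
example_drop_count{direction="egress",reason="policy"} 3
example_forward_count{direction="ingress"} 4096
"""

# metric_name{label="value",...} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_scrape(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_scrape(SAMPLE_SCRAPE):
        print(name, labels, value)
```

In a real deployment Prometheus performs this parsing itself; the sketch only illustrates the shape of the data that ends up in the time-series database and that Grafana later queries.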
Using BYO Prometheus and Grafana gives users complete control over the deployment, configuration, and customization of their monitoring and visualization stack. This approach allows for more advanced and tailored visualizations of network observability data, empowering users to design insightful dashboards aligned with their unique monitoring requirements.
Use Cases
Customer Scenario 1: Network Policy Drops
Debugging network policies in large and intricate clusters with multiple namespaces can be challenging. The network observability add-on addresses this by leveraging eBPF in Linux to collect crucial information about dropped packets. By attaching kprobes at critical locations in the Linux kernel, such as the netfilter drop function and the netfilter nat function, the add-on effectively determines if a packet is being dropped.
When a dropped packet is detected, associated eBPF programs generate an event that includes packet metadata, drop reasons, and location. A userspace program processes this event, parsing the data and converting it into Prometheus metrics. These metrics provide valuable insights into dropped packets, aiding in the identification and resolution of network policy configuration issues.
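The userspace step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the add-on's actual implementation: the event fields (`reason`, `direction`, `location`), their values, and the metric name `example_drop_count` are all hypothetical. The sketch aggregates drop events into counters keyed by reason and direction, then renders them in the Prometheus text format.

```python
from collections import Counter

# Hypothetical drop events as a userspace program might see them after
# decoding; field names and values are assumptions for illustration.
EVENTS = [
    {"reason": "policy_denied", "direction": "ingress", "location": "netfilter"},
    {"reason": "policy_denied", "direction": "ingress", "location": "netfilter"},
    {"reason": "nat_failure", "direction": "egress", "location": "netfilter_nat"},
]


def aggregate_drops(events):
    """Count drop events per (reason, direction) pair."""
    counts = Counter()
    for event in events:
        counts[(event["reason"], event["direction"])] += 1
    return counts


def render_prometheus(counts, metric="example_drop_count"):
    """Render the aggregated counts as Prometheus text-format lines."""
    lines = [f"# TYPE {metric} counter"]
    for (reason, direction), count in sorted(counts.items()):
        lines.append(
            f'{metric}{{direction="{direction}",reason="{reason}"}} {count}'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_prometheus(aggregate_drops(EVENTS)))
```

Keeping the drop reason and direction as labels, rather than folding them into the metric name, is what lets a dashboard break drops down by cause when narrowing in on a misconfigured policy.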
In Windows, the VFP and HNS provide counters for Access Control List (ACL) or endpoint rule drops. The network observability add-on scrapes these counters and converts the data into Prometheus metrics, ensuring consistent and comprehensive monitoring across different platforms.
Customer Scenario 2: Receive Cache Full
In Azure, accelerated networking is enabled by default for almost all Linux virtual machines (VMs). Each network interface is allocated dedicated memory space for receiving packets. The network observability add-on plays a crucial role in monitoring this memory allocation by examining the Rx Cache full statistic on each interface and converting it into Prometheus metrics. This allows users to gain valuable insights into the performance of their network interfaces.
For example, when a VM operates at its maximum capacity, receiving packets at the line rate, intermittent latency spikes or packet drops may occur. By correlating these symptoms with the graph of the “Rx buffer full” metric (the Rx Cache full statistic described above), users can quickly see that when the metric spikes, the network interface’s receive buffer is saturated, potentially leading to packet drops or increased latency for packets awaiting processing.
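As a rough illustration of how such a spike could be picked out of the raw data, the sketch below turns successive readings of a monotonically increasing counter into per-second rates and flags the intervals that exceed a threshold. The sample values, the 60-second interval, and the threshold are made-up assumptions; in practice this calculation would normally be expressed as a Prometheus query or alert rule rather than hand-written code.

```python
def counter_rates(samples):
    """Convert (timestamp_seconds, counter_value) readings into per-second rates.

    Returns a list of (timestamp, rate) pairs, one per interval, stamped with
    the end of the interval.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        # A decrease means the counter was reset (for example after a reboot);
        # in that case treat the new value as the increase for the interval.
        increase = v1 - v0 if v1 >= v0 else v1
        rates.append((t1, increase / (t1 - t0)))
    return rates


def find_spikes(rates, threshold):
    """Return the timestamps whose rate exceeds the threshold."""
    return [timestamp for timestamp, rate in rates if rate > threshold]


# Hypothetical readings of a receive-buffer-full counter, one per minute.
READINGS = [(0, 0), (60, 0), (120, 600), (180, 660), (240, 660)]

if __name__ == "__main__":
    rates = counter_rates(READINGS)
    print(rates)
    print(find_spikes(rates, threshold=5.0))
```

The flagged timestamps can then be lined up against application latency or drop reports for the same period to confirm or rule out receive-buffer saturation as the cause.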
Benefits
- Enhanced network visibility: The network observability add-on empowers users to gain deep visibility into their network infrastructure, enabling them to identify and troubleshoot issues related to network policies, packet drops, latency spikes, and other performance-related issues.
- Improved debugging capabilities: Leveraging eBPF and other monitoring mechanisms, the add-on provides valuable insights into network policy configurations, enabling efficient debugging and troubleshooting. Users can quickly identify misconfigured network policies and promptly resolve them.
- Real-time monitoring and alerting: By converting network observability metrics into Prometheus metrics, users can monitor their network performance in real-time. They can set up alerts and notifications to proactively address any anomalies, ensuring high availability and optimal performance of their network infrastructure.
- Platform compatibility: The network observability add-on is designed to seamlessly work across different platforms, including Linux and Windows. This compatibility allows users to maintain a consistent monitoring experience across their infrastructure, regardless of the underlying operating system.
- Multi-cluster historical view: Enabling the network observability add-on on multiple clusters and connecting them to the same Azure-managed Prometheus and Grafana provides a single-pane-of-glass view of the networking performance of all clusters over time.
The Network Observability add-on in AKS provides organizations with powerful capabilities to gain enhanced visibility into their container network traffic, enabling effective troubleshooting and optimization of Kubernetes networking. With comprehensive network observability, organizations can address networking challenges and ensure the smooth operation of their containerized applications.