Meet Kubernetes History Inspector, a log visualization tool for Kubernetes clusters
Kubernetes, the container orchestration platform, is by nature a complex, distributed system. It offers scalability and robustness, but it can also add operational complexity, especially when troubleshooting. Even with Kubernetes’ self-healing capabilities, determining the root cause of a problem often requires digging deep into the logs of many different components.
Engineers at Google Cloud have been tackling this Kubernetes troubleshooting challenge head-on for years while supporting large-scale, complex deployments. By routinely working through a large volume of customer support tickets, digging into user environments, and pooling that knowledge to identify root causes, the Google Cloud Support team has built extensive expertise in diagnosing problems in Kubernetes environments. To address this widespread challenge, the team created an internal tool called the Kubernetes History Inspector (KHI) and released it as open source today.
The Kubernetes troubleshooting challenge
Each pod, deployment, service, node, and control-plane component in Kubernetes generates its own stream of logs. Effective troubleshooting requires gathering, correlating, and analysing these varied log streams. However, manually configuring log collection for each component takes time and requires a comprehensive understanding of the Kubernetes environment. Fortunately, managed Kubernetes services like Google Kubernetes Engine (GKE) simplify log collection. For instance, GKE aggregates logs from every part of the Kubernetes environment through its built-in integration with Cloud Logging. This centralised repository is a vital first step.
But gathering the logs solves only half of the problem. Analysing them effectively is the real challenge. Many of the problems you will face in a Kubernetes deployment do not show up as a single clear error message. Instead, they appear as a chain of events, and making sense of them requires a thorough understanding of the causal relationships between many log entries from different components.
Think about the scale: a reasonably sized Kubernetes cluster can quickly produce terabytes of log data, made up of tens of thousands of individual entries. For human operators, manually sifting through this much data to find the root cause of a configuration error, intermittent failure, or performance degradation is extremely time-consuming at best and nearly impossible at worst. The signal-to-noise ratio is simply too low.
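As a rough illustration of what correlating log streams means in practice, the sketch below merges several per-component streams into one chronological view. The line format (`<ISO-8601 timestamp> <message>`), the component names, and the functions `parse_line` and `merge_streams` are all assumptions made for this example; this is not how Cloud Logging or KHI represent logs.

```python
import heapq
from datetime import datetime


def parse_line(line, component):
    # Assumed (hypothetical) format: "<ISO-8601 timestamp> <message>".
    timestamp, _, message = line.partition(" ")
    return (datetime.fromisoformat(timestamp), component, message)


def merge_streams(streams):
    """Merge per-component log streams into one chronological list.

    `streams` maps a component name to a list of log lines that are
    already sorted by time within that component.
    """
    parsed = [
        [parse_line(line, name) for line in lines]
        for name, lines in streams.items()
    ]
    # heapq.merge lazily merges already-sorted inputs by the given key.
    return list(heapq.merge(*parsed, key=lambda entry: entry[0]))


if __name__ == "__main__":
    example = {
        "kubelet": [
            "2025-01-01T10:00:00 readiness probe failed",
            "2025-01-01T10:00:10 readiness probe succeeded",
        ],
        "scheduler": ["2025-01-01T10:00:05 pod bound to node"],
    }
    for timestamp, component, message in merge_streams(example):
        print(timestamp.isoformat(), component, message)
```

Even with the streams merged, a human still has to read every entry, which is why gathering the logs is only a first step.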
Introducing the Kubernetes History Inspector
Kubernetes History Inspector is a powerful tool that analyses logs collected by Cloud Logging, extracts state information for each component, and displays it as a chronological timeline. KHI also links this timeline back to the raw log data, so you can see how each element changed over time.
The Google Cloud Support team frequently helps users in urgent, time-sensitive situations. A tool that required lengthy configuration or agent installation would be impractical. For this reason, the team packaged Kubernetes History Inspector as a container image, which can be run with a single command and requires no prior setup.
It is easier to show than to tell. Consider a situation in which end users are getting “Connection Timed Out” errors from a service running on your Google Kubernetes Engine (GKE) cluster. When you launch Kubernetes History Inspector, you may see something like this:
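To make the idea of extracting state information and laying it out chronologically more concrete, here is a minimal sketch that turns a list of state-change events into per-resource intervals. The event shape `(timestamp, resource, state)`, the resource names, and the function `build_timeline` are hypothetical and chosen for illustration; KHI's actual data model may differ.

```python
from collections import defaultdict


def build_timeline(events):
    """Group state-change events into per-resource intervals.

    `events` is an iterable of (timestamp, resource, state) tuples.
    Returns a dict mapping each resource to a list of
    (start, end, state) tuples; `end` is None while a state is current.
    """
    timeline = defaultdict(list)
    for timestamp, resource, state in sorted(events, key=lambda e: e[0]):
        spans = timeline[resource]
        if spans and spans[-1][2] == state:
            continue  # Same state reported again: no new interval.
        if spans:
            # Close the previous interval at the moment the state changed.
            start, _, previous_state = spans[-1]
            spans[-1] = (start, timestamp, previous_state)
        spans.append((timestamp, None, state))
    return dict(timeline)


if __name__ == "__main__":
    events = [
        (0, "pod/web", "Pending"),
        (5, "pod/web", "Running"),
        (3, "node/a", "Ready"),
        (7, "pod/web", "Running"),
    ]
    for resource, spans in build_timeline(events).items():
        print(resource, spans)
```

Each interval corresponds roughly to one rectangle on a timeline row, with its state deciding the colour.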

Look first at the colourful horizontal rectangles on the left. These are derived from the logs and show how each component’s state has changed over time. This timeline gives you a macroscopic view of your Kubernetes infrastructure. The right side of the interface, by contrast, shows microscopic detail for the selected timeline component: raw logs, manifests, and their historical changes. By offering both macroscopic and microscopic views, Kubernetes History Inspector makes it easier to explore your logs.
Now let’s return to the hypothetical issue. Take note of the alternating green and orange sections in the timeline’s “Ready” row:

This shows that the readiness probe is flapping between failure (orange) and success (green). That is a smoking gun! You can now direct your troubleshooting efforts precisely where they are needed.
Kubernetes History Inspector also excels at visualising the relationships between components at any given point in the past. It presents the intricate interdependencies within a Kubernetes cluster in an easy-to-understand way.
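The alternating pattern described above can also be expressed as a simple check. The sketch below counts how often a sequence of readiness results changes value and flags it as flapping once it crosses a threshold. The input (a list of booleans, `True` for success), the function names, and the default threshold of 3 are assumptions for illustration only, not something KHI exposes.

```python
def count_transitions(results):
    """Count how many times consecutive readiness results differ."""
    return sum(1 for a, b in zip(results, results[1:]) if a != b)


def is_flapping(results, threshold=3):
    # The threshold is an arbitrary illustrative value, not a KHI setting.
    return count_transitions(results) >= threshold


if __name__ == "__main__":
    ready_row = [True, False, True, False, True]
    print(count_transitions(ready_row), is_flapping(ready_row))
```

A single transition usually just means a pod started or stopped; many transitions in a short window are what the alternating colours make visible at a glance.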

Next steps for troubleshooting Kubernetes and KHI
KHI can do far more than this post has covered. There is much more beneath the surface, such as what the small diamond markers mean, how the timeline colours actually work, and numerous other features that can help you debug more quickly. KHI has been open-sourced so that everyone can access it.
Visit the Kubernetes History Inspector GitHub page for detailed specifications, a full explanation of the visual elements, and instructions for deploying KHI against your own managed Kubernetes cluster. KHI currently supports only Google Kubernetes Engine (GKE) and Kubernetes on Google Cloud in conjunction with Cloud Logging, but the team intends to extend support to standard open-source Kubernetes setups soon.
Although Kubernetes History Inspector is a major advancement in Kubernetes log analysis, it is meant to supplement your existing expertise, not to replace it. A thorough grasp of Kubernetes principles and of your application’s design is still necessary for effective troubleshooting. KHI gives you, the engineer, a powerful map of your logs, helping you navigate the complexity and identify problems more quickly and effectively.