Saturday, April 20, 2024

How BigQuery ML Sorted Palo Alto Networks’ Data

BigQuery ML and Its Applications

The goal at Palo Alto Networks is to make secure digital transformation possible for everyone. The company's growth through mergers and acquisitions has produced a large, distributed organization with many engineering teams that work together to build its well-known products. On Google Cloud, those teams run more than 170,000 projects, each with its own resource hierarchy and naming scheme.

The company’s central cloud operations are run by its Cloud Center of Excellence team. The team took responsibility for this complex brownfield landscape, which has been expanding rapidly. Its job is to ensure that this growth is secure, follows good cloud hygiene, and stays cost-effective, all while enabling the Palo Alto Networks product engineering teams to perform at their best.

The first step in the team’s work is determining which project belongs to which team, cost center, and environment, and this proved difficult. Three years ago, they started a large automated labeling effort that reached over 95% coverage for team, owner, cost center, and environment tags. The final 5% proved more challenging. At that point, they concluded that machine learning could make their operations more efficient and their lives easier. This is the story of how they used BigQuery ML, BigQuery's built-in machine learning feature, to do that.

Cutting the two-week turnaround time for ML prototyping to two hours

The sheer volume of cloud projects and their disparate naming conventions made it difficult to determine the owner, environment, and cost center for each one. The team frequently found mislabeled projects that were assigned to the wrong team or to no team at all. This made it hard to work out how much each team was spending on cloud resources.

To assign team owners accurately on dashboards and reports, a member of the finance team had to sort hundreds of projects manually and contact potential owners. This process took weeks. If the investigation was inconclusive, the project was labeled “undecided.” As this list grew longer, they investigated only the expensive projects, leaving low-cost projects without an assigned owner.

When questions about project ownership arose, the team searched a project’s name or path for terms that hinted at which team was involved. In effect, they were trusting a gut feeling based on keywords, and they knew that machine learning could replicate this behavior. The time had come to automate this tedious procedure.
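The manual keyword habit described above can be sketched in a few lines of Python. This is only an illustration of the heuristic, not the team's actual rules: the keyword-to-team mapping, the team names, and the sample paths are all hypothetical.

```python
import re

# Hypothetical keyword-to-team hints; the article does not list the real ones.
TEAM_KEYWORDS = {
    "firewall": "network-security",
    "billing": "finance-platform",
    "ml": "data-science",
}


def guess_team(project_path: str) -> str:
    """Return the team whose keyword appears as a whole token in the path."""
    # Split on anything that is not a letter or digit, so "html" never matches "ml".
    tokens = re.split(r"[^a-z0-9]+", project_path.lower())
    for keyword, team in TEAM_KEYWORDS.items():
        if keyword in tokens:
            return team
    # Mirror the manual process: inconclusive projects stay "undecided".
    return "undecided"


if __name__ == "__main__":
    for path in ("org/prod/firewall-logs-01", "sandbox/tmp-1234"):
        print(path, "->", guess_team(path))
```

A fixed lookup like this breaks down quickly across thousands of naming schemes, which is the gap a trained classifier is meant to close.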

At first, it took nearly two weeks to build a working model and begin training end-to-end prediction algorithms. They wrote the code from scratch in Python, using scikit-learn for machine learning. Although the results were good, the small-scale prototype could not handle the volume of data they needed to process.

BigQuery is already widely used at Palo Alto Networks, so accessing the data for this project was simple. It made sense to follow the Google Cloud team’s suggestion to prototype the project with BigQuery ML instead. Prototyping the entire project with BigQuery ML took a few hours. That same afternoon, they were up and running with 99.9% accuracy. They tested it on hundreds of projects and consistently obtained accurate label predictions.

Increasing developer output and democratizing AI

Once BigQuery ML was in place, the team could try a number of models from its library right away, ultimately settling on the boosted trees model as the best fit for their project. Previously, with Python and scikit-learn, each time a model turned out not to be accurate enough they spent up to three hours training another algorithm to test. BigQuery ML shortens that trial-and-error loop considerably. To test a new model, they just swap out the keyword and run about an hour of training.
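To illustrate what "swapping out the keyword" can look like, here is a minimal Python sketch that builds a BigQuery ML `CREATE MODEL` statement in which only the `model_type` option changes. The dataset, table, and column names are hypothetical, and the team's real feature preparation is not described in the article, so it is left out.

```python
# A few BigQuery ML classifier types that could be compared.
ALLOWED_MODEL_TYPES = {
    "LOGISTIC_REG",
    "BOOSTED_TREE_CLASSIFIER",
    "RANDOM_FOREST_CLASSIFIER",
    "DNN_CLASSIFIER",
}


def build_create_model_sql(model_type: str) -> str:
    """Return a CREATE MODEL statement for the given BigQuery ML model type."""
    if model_type not in ALLOWED_MODEL_TYPES:
        raise ValueError(f"unsupported model type: {model_type}")
    # Hypothetical names: ops_dataset, project_team_model, labeled_projects,
    # and the columns project_name, folder_path, team.
    return f"""
CREATE OR REPLACE MODEL `ops_dataset.project_team_model`
OPTIONS (
  model_type = '{model_type}',
  input_label_cols = ['team']
) AS
SELECT project_name, folder_path, team
FROM `ops_dataset.labeled_projects`
""".strip()


if __name__ == "__main__":
    print(build_create_model_sql("BOOSTED_TREE_CLASSIFIER"))
```

Calling the function with `"LOGISTIC_REG"` instead produces the same statement with one option changed, which is the whole experiment loop in this sketch.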

The developer time this project requires has dropped just as sharply. The previous iteration had over 300 lines of Python code. That is now down to ten lines of SQL in BigQuery, which is much simpler to read, understand, and maintain.

That leads to the democratization of AI. A project like this one previously required an extensive background in Python and machine learning, so the team initially gave the prototype to an experienced colleague. No one else on the team could have taken it on, because reading and explaining 300 lines of ML Python code would have taken a long time.

With BigQuery ML, however, they can walk through the code and explain it in five minutes. Anyone on the team who knows even a little about how each algorithm works in theory can understand and modify it. BigQuery ML makes this work much more accessible, even for people without years of machine learning experience.

Maximizing visibility while maintaining 99.9% accuracy

The label prediction project at Palo Alto Networks now supports the backend infrastructure for every cloud operations team. It gives finance teams visibility into cloud costs by sorting out mislabeled projects and determining which team each project belongs to. With minimal manual effort, the new labeling system provides precise, trustworthy information about the company's cloud projects.

Today, this solution can identify the cost center, environment, and team that a particular project belongs to with 99.9% accuracy. To the team, this feels like a gateway to more. Now that they have seen BigQuery ML's value and how quickly it produces results, they have been discussing how to extend its benefits to more teams and use cases.

For example, they plan to offer this model as a service for information security and finance teams that may need more information about a particular project. If a project hasn’t been mapped yet and is being used suspiciously, the model can quickly determine who owns it and who has been compromised. They have mappings for 95–98% of their projects, but the last bit of unmapped ground is the riskiest. If something happens somewhere and no one knows who is responsible, how can it be fixed? In the end, BigQuery ML will help solve that.
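As a rough sketch of how such an ownership lookup might work, the Python below wraps a BigQuery ML `ML.PREDICT` query for a single unmapped project. Everything named here is an assumption for illustration: the dataset, model, table, and column names are hypothetical, and the helper assumes the `google-cloud-bigquery` client library is installed and credentials are configured.

```python
# Hypothetical names throughout: ops_dataset, project_team_model,
# unlabeled_projects, and the columns project_id, project_name, folder_path.
# BigQuery ML names the output column "predicted_<label>", here predicted_team.
PREDICT_OWNER_SQL = """
SELECT project_id, predicted_team
FROM ML.PREDICT(
  MODEL `ops_dataset.project_team_model`,
  (
    SELECT project_id, project_name, folder_path
    FROM `ops_dataset.unlabeled_projects`
    WHERE project_id = @project_id
  )
)
""".strip()


def predict_owner(project_id: str):
    """Return the predicted team for one project, or None if it is not found."""
    # Imported lazily so the SQL above can be inspected without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("project_id", "STRING", project_id)
        ]
    )
    rows = client.query(PREDICT_OWNER_SQL, job_config=job_config).result()
    for row in rows:
        return row["predicted_team"]
    return None
```

Passing the project ID as a query parameter rather than formatting it into the SQL string keeps the lookup safe to expose to other teams.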

Looking forward to fascinating advancements in generative AI

They are also excited about a project that uses BigQuery and generative AI to give non-technical users natural language answers to business questions. Their goal is to develop a financial operations companion that can provide all the necessary cost, asset, and optimization information from their data lake stored in BigQuery. It will do this by learning about each employee, their team, the projects they own, and the cloud resources they use.

Finding this kind of data used to require knowing where to look and how to write a BigQuery query. Now anyone who isn’t familiar with SQL, from directors to interns, can ask questions in plain language and get an appropriate response. By using a natural language prompt to write queries and combining data from several BigQuery tables to surface a contextualized response, generative AI democratizes access to information. The alpha version of this project is now available and delivering positive results. They are eager to incorporate it into every one of their financial operations tools.

Thota Nithya
Thota Nithya has been writing cloud computing articles for govindhtech since April 2023. She is a science graduate and a cloud computing enthusiast.

