Sunday, July 21, 2024

Use Descriptive Lineage To Boost Your Data Lineage Tracking

Automation is often at the forefront when discussing data lineage and how to attain it. This makes sense, since understanding and maintaining a reliable system of data pipelines depends on automating the process of calculating and establishing lineage. Ultimately, lineage tracing aims to become a hands-off process with no human involvement, automated through a variety of approaches.

Descriptive or manually generated lineage, often known as custom technical lineage or custom lineage, is a crucial but rarely discussed tool for building a thorough lineage framework. Sadly, Descriptive Lineage rarely receives the credit or attention it merits. Mention “manual stitching” to data specialists and most of them shudder and flee.

In her book, Data Lineage from a Business Perspective, Dr. Irina Steenbeek presents the idea of Descriptive Lineage as “a method to record metadata-based data lineage manually in a repository.”

A look at the history of lineage

In the 1990s, lineage solutions were narrow in scope, usually centered on a single technology or use case. ETL tools, largely used for business intelligence and data warehousing, dominated data integration at the time.

Vendor solutions for impact and lineage analysis were limited to that one solution’s domain. This simplified matters. Lineage analysis ran in a closed sandbox and produced a matrix of connected paths that applied a standardized method of connectivity with a limited number of operators and controls.

When everything is consistent, comes from a single provider, and has few unknown patterns, automated lineage is easier to accomplish. But that would be like being in a closet with a blindfold on.

That strategy and point of view are impractical today and, to be honest, pointless. Lineage solutions must be significantly more adaptable and able to handle a wide variety of technologies in order to meet the demands of the modern data stack. When no other method is available, a lineage solution must supply the nuts and bolts needed to join objects by hand.

Use cases for Descriptive Lineage

When discussing Descriptive Lineage use cases, consider the target user community for each one. The first two use cases are intended largely for a technical audience, since their lineage definitions pertain to actual physical assets.

The latter two use cases are higher level, more abstract, and directly target non-technical people who are interested in the big picture. Nonetheless, even low-level lineage for physical assets is valuable to all parties, since lineage tools distill that information into “big picture” insights that benefit the entire company.

Quick, critical bridges

The need for lineage extends far beyond specialized systems like the ETL example. In that single-tool context, automated lineage is common, but even there you find cases that are not amenable to automation.

Examples include rarely seen usage patterns that only highly skilled users of a particular tool understand, odd new syntax that defies parsers, sporadic but unavoidable anomalies, missing sections of source code, and intricate wrappers around legacy routines and processes. This use case also covers simple sequential (flat) files that are copied manually or by script.

Descriptive Lineage lets you join items that are not otherwise automatically associated. This covers resources that cannot be scanned because of technical constraints, genuinely missing links, or restricted access to the source code.

In this use case, Descriptive Lineage fills in the blanks and bridges gaps in existing lineage, making it more comprehensive. The result, often called hybrid lineage, maximizes automation while supplementing it with additional assets and points of interaction.
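As a rough illustration of hybrid lineage, the sketch below merges edges found by an automated scanner with edges declared by hand, then walks the combined graph. Everything in it is an assumption made for the example: the `Edge` class, the `merge_lineage` and `downstream` helpers, and the asset names are hypothetical and do not come from any particular lineage product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    origin: str  # "automated" or "descriptive" (hypothetical labels)


def merge_lineage(automated, descriptive):
    """Combine scanner-produced edges with manually declared ones.

    When both describe the same source/target pair, the automated edge
    wins, so manual stitching only fills gaps.
    """
    merged = {(e.source, e.target): e for e in descriptive}
    merged.update({(e.source, e.target): e for e in automated})
    return sorted(merged.values(), key=lambda e: (e.source, e.target))


def downstream(edges, start):
    """Return every asset reachable from `start`, sorted by name."""
    children = {}
    for e in edges:
        children.setdefault(e.source, []).append(e.target)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in children.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(seen)


automated = [
    Edge("crm.customers", "staging.customers", "automated"),
    Edge("warehouse.dim_customer", "report.churn", "automated"),
]
descriptive = [
    # A hand-copied flat file that no scanner can see.
    Edge("staging.customers", "files/customers.csv", "descriptive"),
    Edge("files/customers.csv", "warehouse.dim_customer", "descriptive"),
]

hybrid = merge_lineage(automated, descriptive)
print(downstream(automated, "crm.customers"))
print(downstream(hybrid, "crm.customers"))
```

With only the automated edges, the trace from `crm.customers` stops at the staging table. Once the two descriptive edges are added, it reaches the churn report. Keeping an `origin` label on each edge lets a reader see which links were stitched by hand.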

Support for new tools

Emerging technology portfolios offer the next significant application for Descriptive Lineage. As the industry explores new areas and approaches to get more value from data, environments in which everything touches that data are on the rise.

A site with just one dedicated toolset is uncommon. Numerous solutions, such as databases, data lakehouses, and on-premises and cloud transformation tools, touch and manipulate data. New reporting tools and resources from both active and retired legacy systems are also involved.

The vast array of technology available today is astounding and constantly expanding. Automated lineage across the whole spectrum may be the goal, but there aren’t enough suppliers, experts, and solution providers to deliver a perfect automation “easy button” for such a complicated landscape.

Descriptive Lineage is therefore required to identify new systems, new data assets, and new points of connection, and to link them to lineage that automation has already parsed or recorded.

Lineage at the application level

Descriptive Lineage is also used for higher-level or application-level lineage, often known as business lineage. Because application-level lineage lacks established industry standards, automating it can be challenging.

Your lead data architects may have different ideas about the ideal high-level lineage than another user or set of users. You can specify the lineage you desire at any depth by using Descriptive Lineage.

This is a fully purpose-driven lineage, usually kept at a high level of abstraction and going no deeper than naming an application area or a particular database cluster. For example, lineage for particular areas of a financial organisation may be generic, leading into a target area known as “risk aggregation.”
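One way to picture application-level lineage is as a roll-up of asset-level edges into named areas. The sketch below assumes a hand-maintained mapping from assets to application areas. The `roll_up` function, the asset names, and the area names other than “risk aggregation” are hypothetical and chosen only for illustration.

```python
def roll_up(edges, asset_to_area):
    """Collapse asset-level edges into area-level edges.

    Edges that stay inside one area are dropped. Assets with no declared
    area are grouped under the label "unmapped".
    """
    area_edges = set()
    for source, target in edges:
        src_area = asset_to_area.get(source, "unmapped")
        tgt_area = asset_to_area.get(target, "unmapped")
        if src_area != tgt_area:
            area_edges.add((src_area, tgt_area))
    return sorted(area_edges)


physical_edges = [
    ("trading.positions", "trading.positions_clean"),
    ("trading.positions_clean", "risk.exposure"),
    ("lending.loans", "risk.exposure"),
    ("risk.exposure", "risk.aggregate"),
]

# The mapping is the descriptive part: someone decides which area owns what.
asset_to_area = {
    "trading.positions": "trading",
    "trading.positions_clean": "trading",
    "lending.loans": "lending",
    "risk.exposure": "risk aggregation",
    "risk.aggregate": "risk aggregation",
}

business_lineage = roll_up(physical_edges, asset_to_area)
print(business_lineage)
```

Four physical edges reduce to two business-level ones, both ending in “risk aggregation”. Changing the mapping changes the depth of the view without touching the underlying edges, which is how different audiences can each get the abstraction level they want.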

Future lineage

“To-be” or future lineage is an additional use case for Descriptive Lineage. The capacity to model future application lineage (particularly when realized in hybrid form with current lineage definitions) facilitates work effort assessment, prospective impact measurement on current teams and systems, and progress tracking for the organisation.

Descriptive Lineage for future applications is possible even when the source code exists only on a whiteboard, isn’t in production, or hasn’t been checked in or released. Future lineage can coexist with existing lineage in the hybrid model described earlier, or exist independently of it.
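A small sketch of “to-be” lineage, assuming each edge carries a status flag so that planned and current definitions can live in the same graph. The `Edge` tuple, the status values, the helper functions, and the asset names are all hypothetical.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    target: str
    status: str  # "current" (exists today) or "planned" (to-be design)


def assets(edges, status):
    """All asset names that appear on an edge with the given status."""
    return {a for e in edges if e.status == status for a in (e.source, e.target)}


def impacted_existing_assets(edges):
    """Existing assets that at least one planned edge reads from or writes to."""
    return sorted(assets(edges, "current") & assets(edges, "planned"))


def new_assets(edges):
    """Assets that appear only in the planned design."""
    return sorted(assets(edges, "planned") - assets(edges, "current"))


lineage = [
    Edge("orders_db", "warehouse.orders", "current"),
    Edge("warehouse.orders", "report.sales", "current"),
    # Designed on a whiteboard, with no code checked in yet.
    Edge("warehouse.orders", "ml.demand_forecast", "planned"),
    Edge("ml.demand_forecast", "report.sales", "planned"),
]

print(impacted_existing_assets(lineage))  # ['report.sales', 'warehouse.orders']
print(new_assets(lineage))                # ['ml.demand_forecast']
```

The first list points to the existing assets, and by extension the teams, that the planned work will touch. The second lists what still has to be built. Flipping an edge from "planned" to "current" as work lands gives a simple way to track progress.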

These are only a few ways that Descriptive Lineage enhances overarching goals for lineage awareness throughout the organisation. By filling in the blanks, bridging gaps, supporting future designs, and enhancing your overall lineage solutions, Descriptive Lineage gives you deeper insights into your environment, which fosters trust and improves your capacity for making sound business decisions.

Add Descriptive Lineage to your applications to enhance them. Gain insight and improve your decision-making.

Thota Nithya
Thota Nithya has been writing cloud computing articles for govindhtech since April 2023. She is a science graduate and a cloud computing enthusiast.
