Channel: Dynamics Communities

Enhanced Data Quality in Microsoft Fabric with Great Expectations

Microsoft Fabric UG

One of my favorite topics within the world of data is data quality. Who hasn’t experienced a bad report or KPI because the underlying data was incorrectly formatted or out of date? Now that Great Expectations, an open-source Python library for validating data, works with semantic link in Microsoft Fabric, let’s take a look at how to enhance data quality across the platform.

What Is Data Quality?

Data quality is the degree to which data meets the expectations and requirements of its intended users. Data quality is essential for any data-driven organization, as it affects the reliability, accuracy, and usability of data analysis and decision making.

One of the key features of Microsoft Fabric is semantic link, which allows users to establish a connection between semantic models and Synapse Data Science. What is so great about semantic link? It bridges the gap between Power BI and the data science experience, enabling users to access and augment data with Power BI measures and metadata. Semantic link also propagates semantic information into the data science environment, such as data categories, relationships, and hierarchies. So, in addition to the data model, it is possible to work with DAX measures and other types of metadata.

How to Use Great Expectations within Fabric with Semantic Link

Great Expectations is a Python library that helps users to define, test, and monitor data quality expectations. Great Expectations allows users to create data quality tests, or expectations, that describe how data should look, behave, or relate to other data.
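To make the idea of an expectation concrete, here is a minimal plain-Python sketch of a range check. It does not use the Great Expectations API: the function name expect_values_between and the keys of the returned dictionary are assumptions chosen for illustration, loosely modeled on the success flag and unexpected-value counts that validation tools typically report.

```python
from typing import Optional, Sequence


def expect_values_between(
    values: Sequence[Optional[float]],
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> dict:
    """Check that every non-missing value lies within [min_value, max_value]."""
    present = [v for v in values if v is not None]
    unexpected = [
        v
        for v in present
        if (min_value is not None and v < min_value)
        or (max_value is not None and v > max_value)
    ]
    return {
        "success": not unexpected,
        "element_count": len(values),
        "missing_count": len(values) - len(present),
        "unexpected_count": len(unexpected),
        "unexpected_values": unexpected,
    }


# Hypothetical sample data: one negative amount and one missing value
sales_amounts = [120.0, 35.5, -10.0, None, 0.0]
result = expect_values_between(sales_amounts, min_value=0)
print(result["success"], result["unexpected_values"])
```

With this sample data the check fails and reports -10.0 as the single unexpected value, while the missing value is counted separately rather than treated as a violation.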

Great Expectations can be used within Fabric with semantic link to validate data from many sources, including Power BI semantic models. Here is an example of how to use Great Expectations in a Fabric notebook:

  • Install the semantic-link and great-expectations Python libraries (in a notebook cell, use the %pip magic so the packages are installed into the running session):

%pip install semantic-link great-expectations

  • Import the libraries and initialize a Great Expectations data context:

# Semantic link is installed as semantic-link but imported as sempy; sl is a local alias
import sempy.fabric as sl

import great_expectations as ge

# Initialize an in-memory (ephemeral) data context
context = ge.get_context(mode="ephemeral")

  • Retrieve data from a Power BI semantic model using semantic link:

# Retrieve data from a table called Sales
# "Sales Model" is a placeholder; replace it with the name of your semantic model
sales_df = sl.read_table("Sales Model", "Sales")

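  • Validate expectations on the pandas DataFrame. For example, to create an expectation that the SalesAmount column should have a minimum value of 0, and validate it against the sales_df DataFrame, execute the following code in a notebook cell:

# Register sales_df with the data context as a whole-DataFrame batch
# (this assumes the Great Expectations 1.x API; 0.x releases use different calls)
data_source = context.data_sources.add_pandas("fabric_pandas")
data_asset = data_source.add_dataframe_asset(name="sales")
batch_definition = data_asset.add_batch_definition_whole_dataframe("all_sales")
batch = batch_definition.get_batch(batch_parameters={"dataframe": sales_df})

# Create an expectation that the SalesAmount column should have a minimum value of 0
expectation = ge.expectations.ExpectColumnValuesToBeBetween(
    column="SalesAmount",
    min_value=0,
)

# Validate the expectation against the sales_df DataFrame
validation_result = batch.validate(expectation)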

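  • Use the validation result as the basis for data quality documentation and reports. Great Expectations can also render results as HTML Data Docs when validations run through a checkpoint; the simplest starting point is the result object itself:

# Serialize the validation result for the sales_df DataFrame into a plain dictionary
report = validation_result.to_json_dict()

# True if every SalesAmount value met the expectation
print(report["success"])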

By using Great Expectations with semantic link inside Microsoft Fabric, you can validate data from semantic models and other sources, check the measures and metadata those models expose, and document the resulting data quality.
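A data quality report is ultimately a summary of individual validation results. The sketch below formats a list of result dictionaries as plain text. The dictionaries are hand-written sample data, shaped like the serialized results Great Expectations typically produces (a success flag, an expectation_config, and a result section with counts); the exact keys vary by version and result format, so treat the structure, and the helper name format_report, as assumptions.

```python
def format_report(results: list) -> str:
    """Render one line per validation result, followed by a totals line."""
    lines = []
    for item in results:
        config = item.get("expectation_config", {})
        # Newer releases store the name under "type", older ones under "expectation_type"
        name = config.get("type") or config.get("expectation_type", "unknown")
        column = config.get("kwargs", {}).get("column", "-")
        stats = item.get("result", {})
        status = "PASS" if item.get("success") else "FAIL"
        lines.append(
            f"{status}  {name}  column={column}  "
            f"unexpected={stats.get('unexpected_count', 0)}"
            f"/{stats.get('element_count', 0)}"
        )
    passed = sum(1 for item in results if item.get("success"))
    lines.append(f"{passed} of {len(results)} expectations passed")
    return "\n".join(lines)


# Hypothetical, hand-written results; not output captured from a real run
sample_results = [
    {
        "success": False,
        "expectation_config": {
            "type": "expect_column_values_to_be_between",
            "kwargs": {"column": "SalesAmount", "min_value": 0},
        },
        "result": {"element_count": 1000, "unexpected_count": 3},
    },
    {
        "success": True,
        "expectation_config": {
            "type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "CustomerId"},
        },
        "result": {"element_count": 1000, "unexpected_count": 0},
    },
]

report_text = format_report(sample_results)
print(report_text)
```

Keeping the report as plain text makes it easy to log from a notebook or attach to a pipeline run; the same loop could just as well emit rows for a table.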

How to Move from Data Quality to Data Governance within Fabric

Data quality is a key component of data governance, which is the process of defining, implementing, and enforcing policies, standards, and practices to ensure the effective and efficient use of data. The Fabric Data Quality Framework consists of the following elements:

  • Data Quality Rules: These are the rules that define the data quality expectations and validations for data sources and data assets within Fabric. Data quality rules can be created and managed using the Fabric Data Quality Studio.
  • Data Quality Metrics: These are the metrics that measure the data quality performance and compliance of data sources and data assets within Fabric. Data quality metrics can be calculated and monitored using the Fabric Data Quality Engine.
  • Data Quality Dashboards: These are the dashboards that visualize and report the data quality metrics and trends for data sources and data assets within Fabric. Data quality dashboards can be built in Power BI on top of the data from Fabric Data Quality Studio.
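The relationship between rules, metrics, and dashboards can be sketched in a few lines of plain Python. This is an illustration only and does not call any Fabric API: the names RULES and compute_metrics, the rule definitions, and the sample rows are all hypothetical. Rules are declared as data, each rule is evaluated into a pass-rate metric, and the resulting rows are the kind of flat records a dashboard could chart over time.

```python
from datetime import date

# Rules: declarative checks, each a (name, column, predicate) triple
RULES = [
    ("amount_non_negative", "SalesAmount", lambda v: v is not None and v >= 0),
    ("customer_present", "CustomerId", lambda v: v not in (None, "")),
]


def compute_metrics(asset: str, rows: list, run_date: date) -> list:
    """Return one metric row per rule: the share of rows that satisfy it."""
    metrics = []
    for name, column, predicate in RULES:
        passed = sum(1 for row in rows if predicate(row.get(column)))
        metrics.append(
            {
                "asset": asset,
                "rule": name,
                "run_date": run_date.isoformat(),
                # An empty asset is treated as fully compliant in this sketch
                "pass_rate": passed / len(rows) if rows else 1.0,
            }
        )
    return metrics


# Hypothetical sample rows standing in for a Sales table
sales_rows = [
    {"SalesAmount": 120.0, "CustomerId": "C1"},
    {"SalesAmount": -5.0, "CustomerId": "C2"},
    {"SalesAmount": 40.0, "CustomerId": ""},
    {"SalesAmount": 15.0, "CustomerId": "C4"},
]

metrics = compute_metrics("Sales", sales_rows, date(2024, 1, 31))
for row in metrics:
    print(row)
```

Storing one row per asset, rule, and run date keeps the metric history append-only, which is what makes trend lines on a dashboard straightforward.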

The post Enhanced Data Quality in Microsoft Fabric with Great Expectations appeared first on Dynamics Communities.

