Enhanced Data Quality in Microsoft Fabric with Great Expectations

One of my favorite topics within the world of data is data quality. Who hasn't experienced a bad report or KPI because the underlying data was incorrectly formatted or out of date? Great Expectations is an open-source Python library for improving data quality, and it works in Microsoft Fabric alongside semantic link. So let's take a look at how to enhance data quality across the platform.

What Is Data Quality?

Data quality is the degree to which data meets the expectations and requirements of its intended users. It is essential for any data-driven organization, because it affects the reliability, accuracy, and usability of data analysis and decision-making.
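The definition above becomes measurable once you score records against explicit requirements. Below is a minimal sketch in plain Python. It assumes a handful of hypothetical order records and two illustrative requirements: Country must be present (completeness) and SalesAmount must be valid (validity). It reports the share of records that meet each requirement. It does not use any Fabric or Great Expectations API.

```python
# Minimal sketch: data quality as the share of records meeting each requirement.
# The records and requirements are hypothetical examples, not a Fabric or GX API.
from typing import Any, Callable

Record = dict[str, Any]


def quality_scores(records: list[Record],
                   requirements: dict[str, Callable[[Record], bool]]) -> dict[str, float]:
    """Return, for each named requirement, the fraction of records that satisfy it."""
    if not records:
        raise ValueError("no records to score")
    return {
        name: sum(1 for record in records if check(record)) / len(records)
        for name, check in requirements.items()
    }


orders = [
    {"OrderID": 1, "SalesAmount": 120.0, "Country": "US"},
    {"OrderID": 2, "SalesAmount": -5.0, "Country": "DE"},
    {"OrderID": 3, "SalesAmount": 40.0, "Country": None},
    {"OrderID": 4, "SalesAmount": 75.5, "Country": "US"},
]

requirements = {
    # Completeness: every order should have a country
    "country_present": lambda r: r.get("Country") is not None,
    # Validity: sales amounts should be numeric and never negative
    "amount_non_negative": lambda r: (
        isinstance(r.get("SalesAmount"), (int, float)) and r["SalesAmount"] >= 0
    ),
}

print(quality_scores(orders, requirements))
# {'country_present': 0.75, 'amount_non_negative': 0.75}
```

Each score is simply the number of passing records divided by the total. Here, three of the four records pass each requirement.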

One of the key features of Microsoft Fabric is semantic link, which connects Power BI semantic models to the Synapse Data Science experience in Fabric. What is so great about semantic link? It bridges the gap between Power BI and data science, enabling users to access and augment data with Power BI measures and metadata. Semantic link also propagates semantic information into the data science environment, such as data categories, relationships, and hierarchies. So, in addition to the data model itself, it is possible to work with DAX measures and other types of metadata.

How to Use Great Expectations within Fabric with Semantic Link

Great Expectations is a Python library that helps users define, test, and monitor data quality expectations. It allows users to create data quality tests, called expectations, that describe how data should look, behave, or relate to other data.
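To show what declaring an expectation means in practice, here is a simplified, hypothetical model in plain Python. It loosely mirrors how Great Expectations structures an expectation: an expectation type plus keyword arguments such as column, min_value, and mostly. The Expectation, Result, and validate names are assumptions made for illustration. They are not the library's API.

```python
# Simplified, hypothetical model of an expectation: a declarative rule
# (expectation type + keyword arguments) evaluated against a column.
# It loosely mirrors Great Expectations' structure but is NOT its API.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Expectation:
    expectation_type: str
    kwargs: dict[str, Any] = field(default_factory=dict)


@dataclass
class Result:
    success: bool
    element_count: int            # all values in the column, nulls included
    unexpected_values: list[Any]  # values that broke the rule


def _between(values, min_value=None, max_value=None, mostly=1.0):
    # Nulls are skipped, as Great Expectations does for this expectation type
    non_null = [v for v in values if v is not None]
    unexpected = [
        v for v in non_null
        if (min_value is not None and v < min_value)
        or (max_value is not None and v > max_value)
    ]
    passing = 1.0 if not non_null else 1 - len(unexpected) / len(non_null)
    return passing >= mostly, unexpected


def _not_null(values, mostly=1.0):
    unexpected = [v for v in values if v is None]
    passing = 1.0 if not values else 1 - len(unexpected) / len(values)
    return passing >= mostly, unexpected


# Dispatch table from expectation type to the function that checks it
CHECKS = {
    "expect_column_values_to_be_between": _between,
    "expect_column_values_to_not_be_null": _not_null,
}


def validate(table: dict[str, list[Any]], expectation: Expectation) -> Result:
    kwargs = dict(expectation.kwargs)  # copy so the stored rule is not mutated
    values = table[kwargs.pop("column")]
    success, unexpected = CHECKS[expectation.expectation_type](values, **kwargs)
    return Result(success, len(values), unexpected)


# Usage: the SalesAmount rule from this article, applied to a tiny hypothetical column
sales = {"SalesAmount": [120.0, -5.0, None, 75.5]}
amount_rule = Expectation(
    "expect_column_values_to_be_between", {"column": "SalesAmount", "min_value": 0}
)
result = validate(sales, amount_rule)
print(result.success, result.unexpected_values)  # False [-5.0]
```

Keeping rules as data rather than code is what makes them easy to store, reuse, and document. Adding "mostly": 0.6 to the kwargs would make this rule pass, because two of the three non-null values are in range. Great Expectations offers a parameter with the same name for tolerating a share of failing rows.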

Great Expectations can be used within Fabric together with semantic link. This lets you validate data from Power BI semantic models, as well as from any other source that can be loaded into a pandas DataFrame. Here is an example of how to use Great Expectations:

Install the semantic-link and great-expectations Python libraries in a notebook cell:

%pip install semantic-link great-expectations

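Import the libraries and initialize a Great Expectations data context. The semantic-link package is imported in Python as sempy. In Great Expectations 1.x, the gx.get_context() function replaces the older BaseDataContext class:

import sempy.fabric as fabric
import great_expectations as gx

# Initialize a data context (replaces BaseDataContext from older releases)
context = gx.get_context()

Retrieve data from a Power BI semantic model using semantic link. The fabric.read_table function takes the semantic model name and the table name. It returns a FabricDataFrame, which is a subclass of the pandas DataFrame:

# Retrieve the Sales table; "Sales Model" is a placeholder for your semantic model's name
sales_df = fabric.read_table("Sales Model", "Sales")

Validate expectations on the DataFrame. For example, suppose you want an expectation that the SalesAmount column has a minimum value of 0, validated against sales_df. Execute the following code in a notebook cell:

# Register sales_df with Great Expectations as a pandas data source, asset, and batch
data_source = context.data_sources.add_pandas(name="semantic_model_tables")
data_asset = data_source.add_dataframe_asset(name="sales")
batch_definition = data_asset.add_batch_definition_whole_dataframe("sales_batch")
batch = batch_definition.get_batch(batch_parameters={"dataframe": sales_df})

# Create an expectation that the SalesAmount column should have a minimum value of 0
expectation = gx.expectations.ExpectColumnValuesToBeBetween(column="SalesAmount", min_value=0)

# Validate the expectation against the batch built from sales_df
validation_result = batch.validate(expectation)

Inspect the validation result to see whether the data met the expectation and which values did not:

# True if every non-null SalesAmount value is at least 0
print(validation_result.success)
# Details such as element_count, unexpected_count, and partial_unexpected_list
print(validation_result.result)

Great Expectations can also render validation results as HTML documentation, called Data Docs. Data Docs are updated when validations run through a Checkpoint configured with an UpdateDataDocsAction. They can be rebuilt with context.build_data_docs().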

Combining Great Expectations with semantic link inside Microsoft Fabric lets you validate data and metadata from semantic models. You can also validate data from any other source that can be loaded into a DataFrame, and document the results as part of your data quality reporting.

How to Move from Data Quality to Data Governance within Fabric

Data quality is a key component of data governance: the process of defining, implementing, and enforcing policies, standards, and practices to ensure the effective and efficient use of data. Within Fabric, a data quality framework can be organized around the following elements:

Data Quality Rules: These are the rules that define the data quality expectations and validations for data sources and data assets within Fabric, for example Great Expectations expectations maintained in notebooks.
Data Quality Metrics: These are the metrics that measure the data quality performance and compliance of data sources and data assets within Fabric, such as the share of rules that pass on each validation run. They can be calculated and tracked over time by storing validation results.
Data Quality Dashboards: These are the dashboards that visualize and report the data quality metrics and trends for data sources and data assets within Fabric. They can be built in Power BI on top of the stored data quality metrics.
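As a rough illustration of how rules feed metrics and dashboards, the sketch below aggregates hypothetical rule outcomes into one metrics row per data asset and validation run. Each row includes a pass rate and a compliance flag. The RuleOutcome type, asset names, dates, and 95% threshold are all assumptions. In practice, the outcomes could come from Great Expectations validation results, and the rows could be written to a table that a Power BI report reads.

```python
# Minimal sketch: turn per-rule validation outcomes into data quality metrics rows.
# RuleOutcome, the asset names, the dates, and the 0.95 threshold are hypothetical.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleOutcome:
    asset: str      # data asset the rule ran against, e.g. a table
    run_date: str   # ISO date of the validation run
    rule: str       # name of the rule / expectation
    success: bool   # whether the rule passed on this run


def quality_metrics(outcomes: list[RuleOutcome], threshold: float = 0.95) -> list[dict]:
    """Aggregate outcomes into one row per (asset, run_date), sorted by that key."""
    groups: dict[tuple[str, str], list[bool]] = defaultdict(list)
    for outcome in outcomes:
        groups[(outcome.asset, outcome.run_date)].append(outcome.success)

    rows = []
    for (asset, run_date), results in sorted(groups.items()):
        pass_rate = sum(results) / len(results)  # True counts as 1, False as 0
        rows.append({
            "asset": asset,
            "run_date": run_date,
            "rules_evaluated": len(results),
            "pass_rate": pass_rate,
            "compliant": pass_rate >= threshold,
        })
    return rows


outcomes = [
    RuleOutcome("Sales", "2024-05-01", "amount_non_negative", True),
    RuleOutcome("Sales", "2024-05-01", "order_id_not_null", True),
    RuleOutcome("Sales", "2024-05-02", "amount_non_negative", False),
    RuleOutcome("Sales", "2024-05-02", "order_id_not_null", True),
    RuleOutcome("Customers", "2024-05-02", "email_not_null", True),
]

for row in quality_metrics(outcomes):
    print(row)
```

A trend line of pass_rate per asset over run_date is the kind of view a data quality dashboard would show. The compliant flag is one simple way to express a governance threshold.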

For more “for user, by user” Microsoft Fabric innovation, education, and training content, join us at Community Summit North America 2024!

The post Enhanced Data Quality in Microsoft Fabric with Great Expectations appeared first on Dynamics Communities.
