UtilitiesDQMaRC

overall_quality_fx(avg_prop_good)

Determines the overall quality level based on the average proportion of ‘good’ data.

Parameters:

avg_prop_good (float) – The average proportion of ‘good’ quality data across all metrics, expressed as a percentage.

Returns:

A string representing the overall quality level: one of “Outstanding”, “Good”, “Requires Improvement”, or “Inadequate”. Each level has corresponding background and text colours.

Return type:

str
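The mapping from an average percentage to a quality level can be pictured with the minimal sketch below. It is not the library's implementation: the function name overall_quality_sketch and the cut-offs (90, 80, 60) are assumptions chosen for illustration, and the colour handling mentioned above is left out because this page does not specify it.

```python
def overall_quality_sketch(avg_prop_good: float) -> str:
    """Map an average percentage of 'good' data to a quality label.

    Hypothetical stand-in for overall_quality_fx; the cut-offs below
    are illustrative assumptions, not the library's actual values.
    """
    # Cut-offs are checked from highest to lowest; the first match wins.
    thresholds = [
        (90.0, "Outstanding"),
        (80.0, "Good"),
        (60.0, "Requires Improvement"),
    ]
    for cutoff, label in thresholds:
        if avg_prop_good >= cutoff:
            return label
    # Anything below the lowest cut-off falls into the final band.
    return "Inadequate"


print(overall_quality_sketch(95.0))
print(overall_quality_sketch(42.5))
```

Keeping the cut-offs in one ordered list makes the band boundaries easy to audit and adjust in a single place.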

class DonutChartGenerator(data)

Bases: object

A class for generating donut charts to visualise data quality metrics.

data

The data containing quality metrics to be visualised.

Type:

pandas.DataFrame

plot_donut_charts()

Generates a subplot of donut charts for each quality metric in the data.

Returns:

A Plotly Figure object containing the subplot of donut charts.

Return type:

plotly.graph_objs._figure.Figure

class BarPlotGenerator(data, chosen_metric)

Bases: object

A class for generating bar plots to visualise data quality metrics for a chosen metric.

data

The data containing quality metrics to be visualised.

Type:

pandas.DataFrame

chosen_metric

The metric for which to generate the bar plot.

Type:

str

plot_bar()

Generates a bar plot for the chosen metric.

Returns:

A Plotly Figure object containing the bar plot.

Return type:

plotly.graph_objs._figure.Figure

class MetricCalculator(data)

Bases: object

A class designed to calculate and compile data quality metrics from a provided dataset.

data

The input dataset containing various quality metrics and fields.

Type:

pandas.DataFrame

result

A DataFrame initialised to store the calculated metrics, including counts and proportions of good, bad, and N/A data.

Type:

pandas.DataFrame

calculate_metrics()

Calculates aggregate metrics for each field and metric combination present in the input data, updating the result attribute.
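As a rough illustration of aggregating counts and proportions per field and metric combination, the sketch below uses pandas grouping. It is not the library's implementation. The input layout (one row per checked value, with columns 'Field', 'Metric' and 'Status'), the output column names, and the function name summarise_quality are all assumptions, since this page does not document the expected schema.

```python
import pandas as pd


def summarise_quality(data: pd.DataFrame) -> pd.DataFrame:
    """Count Good/Bad/N/A rows per (Field, Metric) and add percentages.

    Hypothetical helper: column names and layout are assumptions.
    """
    statuses = ["Good", "Bad", "N/A"]
    counts = (
        data.groupby(["Field", "Metric", "Status"])
        .size()                                   # rows per combination
        .unstack("Status", fill_value=0)          # one column per status
        .reindex(columns=statuses, fill_value=0)  # guarantee all three columns
    )
    counts["Total"] = counts[statuses].sum(axis=1)
    for status in statuses:
        # Proportions are expressed as percentages of the total.
        counts[f"Prop_{status}"] = counts[status] / counts["Total"] * 100
    counts.columns.name = None
    return counts.reset_index()


rows = [
    ("age", "Completeness", "Good"),
    ("age", "Completeness", "Good"),
    ("age", "Completeness", "Bad"),
    ("age", "Completeness", "N/A"),
    ("age", "Validity", "Good"),
    ("age", "Validity", "Bad"),
]
example = pd.DataFrame(rows, columns=["Field", "Metric", "Status"])
summary = summarise_quality(example)
print(summary)
```

The reindex step ensures that a status which never occurs still appears as a zero-count column, so downstream charts can rely on a fixed set of columns.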

col_bad(row)

Returns the colour code used to show a data quality metric's “bad” quality status.

Parameters:

row (pandas.Series) – A row from a DataFrame, expected to contain a ‘Metric’ column specifying the data quality metric.

Returns:

The hexadecimal colour code for the “bad” quality status of the specified metric.

Return type:

str

Notes

The function maps each data quality metric to a specific colour code so that metrics are visually distinct in charts.

col_good(row)

Returns the colour code used to show a data quality metric's “good” quality status.

Parameters:

row (pandas.Series) – A row from a DataFrame, expected to contain a ‘Metric’ column specifying the data quality metric.

Returns:

The hexadecimal colour code for the “good” quality status of the specified metric.

Return type:

str

Notes

Like col_bad, this function maps each data quality metric to a specific colour code, here for the “good” quality status, so that metrics are visually distinct in charts.
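The lookup performed by col_good and col_bad can be sketched as a pair of dictionaries keyed by metric name. The metric names, hex values, grey fallback, and the helper name metric_colour are all hypothetical; the actual palette is not given on this page.

```python
# Hypothetical metric names and hex colours, for illustration only.
GOOD_COLOURS = {"Completeness": "#1b9e77", "Validity": "#7570b3"}
BAD_COLOURS = {"Completeness": "#a6dbc9", "Validity": "#c7c5e0"}
FALLBACK = "#808080"  # assumed neutral grey for unrecognised metrics


def metric_colour(row, status: str) -> str:
    """Return a hex colour for row['Metric'] and the given status.

    `row` can be any mapping with a 'Metric' key, such as a
    pandas.Series taken from a DataFrame row or a plain dict.
    """
    palette = GOOD_COLOURS if status == "good" else BAD_COLOURS
    return palette.get(row["Metric"], FALLBACK)


print(metric_colour({"Metric": "Completeness"}, "good"))
print(metric_colour({"Metric": "Validity"}, "bad"))
```

With a DataFrame df that has a 'Metric' column, a function of this shape can be applied row by row, for example df.apply(lambda r: metric_colour(r, "good"), axis=1).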