Completeness

class Completeness(df, test_params)

A subclass of Dimension that assesses the completeness of a dataset, one aspect of its data quality.

This class identifies and quantifies missing or incomplete data points in a given dataset. It uses predefined tests to detect null values, empty strings, and encoded missing values.

Parameters:
  • df (pandas.DataFrame) – The dataset to be evaluated, imported via pandas’ read_csv() function.

  • test_params (pandas.DataFrame) – The test parameters that are either initialised by the Data Quality (DQ) tool or uploaded via pandas’ read_csv() function.

  • tests (dict) – A dictionary mapping test names to their relevant information and methods. It includes tests for null values, empty strings, and encoded missing values.

test_null(test)

Counts the number of NULL values in specified columns of the dataset.

Parameters:

test (dict) – The test configuration.
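
As an illustration of the kind of check described above, the following is a minimal sketch of counting NULL values per column with pandas. It is not the tool's actual implementation: the sample data and the `columns` list are hypothetical, and how the real method reads its columns from the test configuration is not shown here.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ann", None, "Cy"],
    "age": [31, 42, None],
})

# Hypothetical list of columns to check (an assumption, not the tool's format).
columns = ["name", "age"]

# isna() flags None and NaN cells; sum() counts the flagged cells per column.
null_counts = df[columns].isna().sum()
print(null_counts)
```

Note that `isna()` does not treat empty strings as missing, which is why a separate empty-string test is useful.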

test_empty(test)

Identifies empty strings in specified columns of the dataset.

Parameters:

test (dict) – The test configuration.
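
A minimal sketch of an empty-string check with pandas, under stated assumptions: the sample data and `columns` list are hypothetical, and only exact empty strings (`""`) are counted. Whether the actual method also treats whitespace-only strings as empty is not stated in this documentation.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "city": ["Leeds", "", "York", ""],
    "code": ["A1", "B2", "", "C3"],
})

# Hypothetical list of columns to check.
columns = ["city", "code"]

# Element-wise comparison against the empty string, then count matches per column.
empty_counts = (df[columns] == "").sum()
print(empty_counts)
```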

test_na_strings(test)

Detects strings that represent missing values, as defined in the test parameters, in specified columns of the dataset.

Parameters:

test (dict) – The test configuration, including the encoding used to represent missing data.
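
A minimal sketch of detecting encoded missing values with pandas. The `na_strings` list, the sample data, and the `columns` list are all hypothetical; in the tool itself the encodings come from the test parameters, whose exact structure is not shown here.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "status": ["ok", "N/A", "missing", "ok"],
    "score": ["7", "-999", "3", "N/A"],
})

# Hypothetical encodings that stand in for missing data.
na_strings = ["N/A", "missing", "-999"]

# Hypothetical list of columns to check.
columns = ["status", "score"]

# isin() flags cells that exactly match any of the encodings; sum() counts them per column.
na_counts = df[columns].isin(na_strings).sum()
print(na_counts)
```

The match here is exact and case-sensitive, so `"n/a"` would not be counted unless it is also listed.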