Uniqueness

class Uniqueness(df, test_params)

A subclass of Dimension that evaluates the uniqueness of data within a dataset.

This class performs uniqueness tests by identifying duplicate rows based on specified columns. It can be configured to check the entire row or a subset of columns for duplicates.
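The difference between a whole-row check and a column-subset check can be illustrated with plain pandas. This is a minimal sketch that assumes duplicates are detected with `DataFrame.duplicated`; the source does not say which call the class uses internally, and the sample data and column names are made up for illustration.

```python
import pandas as pd

# Illustrative data only; the column names are hypothetical.
df = pd.DataFrame(
    {
        "id": [1, 2, 2, 3],
        "name": ["a", "b", "b", "c"],
        "city": ["x", "y", "z", "x"],
    }
)

# Whole-row check: a row counts as a duplicate only if every column matches.
# keep=False flags all members of a duplicate group, not just the repeats.
full_row_dupes = df[df.duplicated(keep=False)]

# Subset check: only the listed columns are compared.
subset_dupes = df[df.duplicated(subset=["id", "name"], keep=False)]

print(len(full_row_dupes))          # no rows match on all three columns
print(subset_dupes.index.tolist())  # rows sharing the same (id, name) pair
```

Here the rows at index 1 and 2 differ in `city`, so they pass the whole-row check but are flagged once the comparison is limited to `id` and `name`.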

df

The dataset to be evaluated, imported via pandas’ read_csv() function.

Type:

pandas.DataFrame

test_params

The parameters defining how tests should be conducted, including which columns to consider for checking uniqueness.

Type:

pandas.DataFrame

tests

A dictionary mapping test names to their relevant information and methods. Currently, only a row uniqueness test is supported.

Type:

dict

test_row_uniqueness(test)

Identifies duplicate rows in the dataset based on the specified subset of columns.

Parameters:

test (str) – The name of the test to be executed.

run_metric(test, func)

Executes the given test function and updates the results attribute with the test’s outcomes.

Parameters:
  • test (str) – The name of the test to be executed.

  • func (callable) – The test function to execute.
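The pattern described here, a registry of named tests plus a runner that stores each outcome, might look roughly like the sketch below. `MiniDimension`, `check_row_uniqueness`, and the dictionary shape of `results` are hypothetical stand-ins chosen for illustration; they are not the library's actual Dimension class or its internal data layout.

```python
import pandas as pd


class MiniDimension:
    """Hypothetical stand-in for illustration; not the library's Dimension."""

    def __init__(self, df):
        self.df = df
        # Assumed shape: test name -> outcome of that test.
        self.results = {}
        # Assumed shape: test name -> bound method implementing the test.
        self.tests = {"row_uniqueness": self.check_row_uniqueness}

    def check_row_uniqueness(self, test):
        # Return every row that has at least one exact duplicate.
        return self.df[self.df.duplicated(keep=False)]

    def run_metric(self, test, func):
        # Execute the test function and record its outcome under the test name.
        self.results[test] = func(test)


dim = MiniDimension(pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]}))
for name, func in dim.tests.items():
    dim.run_metric(name, func)

duplicate_rows = dim.results["row_uniqueness"]
print(duplicate_rows.index.tolist())
```

Keeping the runner separate from the test functions means a new test only needs an entry in the registry; the bookkeeping of results stays in one place.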

get_uniqueness_errors()

Returns the results of uniqueness tests performed on the dataset.

Returns:

The results of the uniqueness tests, indicating duplicated rows based on the specified columns.

Return type:

pandas.DataFrame