Uniqueness#

class Uniqueness(df, test_params)[source]#

A subclass of Dimension focused on evaluating the uniqueness of data within a dataset.

This class performs uniqueness tests by identifying duplicate rows based on specified columns. It can be configured to check the entire row or a subset of columns for duplicates.
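The difference between checking the entire row and checking a subset of columns can be sketched with plain pandas. This is only an illustration built on `DataFrame.duplicated`; the sample data and variable names are made up, and the class itself may detect duplicates differently.

```python
import pandas as pd

# Illustrative data only; not taken from the library.
df = pd.DataFrame(
    {
        "id": [1, 2, 2, 3],
        "name": ["a", "b", "b", "c"],
        "city": ["X", "Y", "Z", "X"],
    }
)

# Whole-row check: a row is flagged only if every column matches another row.
full_row_dupes = df[df.duplicated(keep=False)]

# Subset check: a row is flagged if it shares the same (id, name) pair
# with another row, regardless of the remaining columns.
subset_dupes = df[df.duplicated(subset=["id", "name"], keep=False)]
```

With `keep=False`, every member of a duplicate group is flagged rather than only the later occurrences, which is usually what an error report needs.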

df#

The dataset to be evaluated, imported via pandas’ read_csv() function.

Type: pandas.DataFrame

test_params#

The parameters defining how tests should be conducted, including which columns to consider for checking uniqueness.

Type: pandas.DataFrame

tests#

A dictionary mapping test names to their relevant information and methods. Currently supports a row uniqueness test.

Type: dict

test_row_uniqueness(test)[source]#

Identifies duplicate rows in the dataset based on the specified subset of columns.

Parameters: test (str) – The name of the test to be executed.

run_metric(test, func)[source]#

Executes the given test function and updates the results attribute with the test’s outcomes.

Parameters:

test (str) – The name of the test to be executed.
func (callable) – The test function to execute.
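One way a test registry and `run_metric` could fit together is sketched below. `UniquenessSketch`, its `columns` argument, the `run_all` helper, and the dict-shaped `results` are all hypothetical stand-ins chosen for illustration; the real class takes `test_params` and may store its results differently.

```python
import pandas as pd


class UniquenessSketch:
    """Hypothetical stand-in; not the library's Uniqueness class."""

    def __init__(self, df, columns):
        self.df = df
        self.columns = columns  # assumed: the subset of columns to check
        self.results = {}       # assumed: test name -> duplicated rows
        # Registry mapping each test name to the method that implements it.
        self.tests = {"row_uniqueness": self.test_row_uniqueness}

    def test_row_uniqueness(self, test):
        # Return every row whose values in self.columns occur more than once.
        return self.df[self.df.duplicated(subset=self.columns, keep=False)]

    def run_metric(self, test, func):
        # Execute the test function and record its outcome under the test name.
        self.results[test] = func(test)

    def run_all(self):
        # Hypothetical helper: run every registered test through run_metric.
        for name, func in self.tests.items():
            self.run_metric(name, func)
        return self.results


df = pd.DataFrame({"id": [1, 1, 2], "value": [10, 10, 20]})
sketch = UniquenessSketch(df, columns=["id"])
results = sketch.run_all()
```

Routing every test through a single `run_metric` keeps result bookkeeping in one place, so adding another test only requires a new entry in the registry.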

get_uniqueness_errors()[source]#

Returns the results of uniqueness tests performed on the dataset.

Returns: The results of the uniqueness tests, indicating duplicated rows based on the specified columns.
Return type: pandas.DataFrame
