Validity#

class Validity(df, test_params)[source]#

A subclass of Dimension focused on evaluating the validity of data within a dataset.

This class performs validity checks that include verifying NHS numbers, postcodes, future dates, and date ranges, and confirming that values match specified lookup tables, fall within given numeric ranges, or conform to a user-specified pattern.

df#

The dataset to be evaluated, imported via pandas’ read_csv() function.

Type:

pandas.DataFrame

test_params#

The parameters defining how tests should be conducted, including specifics like date formats, lookup table names, and numeric ranges for validity checks.

Type:

pandas.DataFrame

date_format#

The format used for parsing dates within the dataset. This should match the actual date format for accurate comparisons.

Type:

str

tests#

A dictionary mapping test names to their relevant information and methods, supporting various types of validity checks.

Type:

dict

test_nhs_numbers(test)[source]#

Checks if NHS numbers in the dataset are valid according to the modulus 11 algorithm.

Parameters:
  • test (str) – The name of the test to be executed.

The method checks whether NHS numbers conform to the modulus 11 algorithm, marking them as valid or invalid accordingly.

test_postcode(test)[source]#

Validates the format of postcodes in the dataset using a regular expression.

Parameters:
  • test (str) – The name of the test to be executed, indicating the column to be checked for postcode validity.

Uses a regular expression to match UK postcode formats. Postcodes that do not match the pattern are flagged as invalid.
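The regular expression used by test_postcode is not shown here, so the following is only a minimal sketch of regex-based postcode checking. The pattern UK_POSTCODE and the function looks_like_uk_postcode are hypothetical, and the pattern is a simplified approximation of UK postcode formats rather than a complete specification.

```python
import re

# Assumption: a simplified UK postcode shape (outward code, optional space,
# inward code). The pattern used by the class itself may differ.
UK_POSTCODE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}")


def looks_like_uk_postcode(value) -> bool:
    """Return True if value matches the simplified postcode pattern."""
    if value is None:
        return False
    # Normalise case and surrounding whitespace before matching.
    candidate = str(value).strip().upper()
    return UK_POSTCODE.fullmatch(candidate) is not None


flags = [looks_like_uk_postcode(v) for v in ["SW1A 1AA", "m1 1ae", "12345"]]
```

Using fullmatch rather than search means that trailing characters after an otherwise valid postcode cause the value to be rejected.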

test_against_lookup_tables(test)[source]#

Verifies if values in the dataset match against specified lookup tables.

Parameters:
  • test (str) – The name of the test to be executed, indicating the column and the associated lookup table for validation.

The lookup table is expected to be a CSV file containing valid codes or values. Values not found in the lookup table are flagged as invalid.
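As an illustration of the lookup idea, the sketch below reads valid codes from CSV text and flags values that are absent. The names load_lookup and in_lookup, the column name "code", and the in-memory CSV are all assumptions made for the example; how the class locates and reads its lookup files is not documented here.

```python
import csv
import io


def load_lookup(csv_text: str, column: str) -> set:
    """Collect the valid codes from one column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader}


def in_lookup(values, valid_codes: set) -> list:
    """Return one boolean per value: True if it appears in the lookup."""
    return [str(v).strip() in valid_codes for v in values]


# Hypothetical lookup content standing in for a CSV file on disk.
lookup_csv = "code\nA\nB\nC\n"
valid_codes = load_lookup(lookup_csv, "code")
flags = in_lookup(["A", "Z", " B"], valid_codes)
```

Holding the codes in a set keeps each membership test constant time, which matters when the dataset column is long.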

test_ranges(test)[source]#

Checks if numeric values in the dataset fall within specified ranges.

Parameters:
  • test (str) – The name of the test to be executed, indicating the column and the numeric range for validation.

The valid range is specified as two numbers separated by '||'. Values outside this range are flagged as invalid.
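A range written as two numbers separated by '||' could be parsed and applied roughly as follows. The helper names parse_range and in_range are hypothetical, and treating the bounds as inclusive and non-numeric values as invalid are assumptions, since the source does not state either.

```python
def parse_range(spec: str) -> tuple:
    """Split a 'low||high' specification into two floats."""
    low, high = (float(part) for part in spec.split("||"))
    return low, high


def in_range(value, spec: str) -> bool:
    """Return True if value is numeric and lies within the range."""
    low, high = parse_range(spec)
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Assumption: values that cannot be read as numbers are invalid.
        return False
    # Assumption: both bounds are inclusive.
    return low <= number <= high


flags = [in_range(v, "0||120") for v in [50, 121, "abc"]]
```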

validate_nhs_number(nhs_number)[source]#

A helper method to validate a single NHS number using the modulus 11 algorithm.

Parameters:

nhs_number (str or int) – The NHS number to validate. It can be a string or integer; any spaces or non-numeric characters will be ignored.

Returns:

True if the NHS number is valid, otherwise False.

Return type:

bool

Notes

The method first checks for null or empty values, then verifies the length of the number (10 digits). For a valid NHS number:
  • Each of the first 9 digits is multiplied by its weight (10 down to 2), and the products are summed.
  • The remainder of that sum divided by 11, subtracted from 11, should equal the 10th digit (check digit).
  • If the result of the subtraction is 11, it is replaced with 0 to match the check digit.
  • A result of 10 indicates an invalid NHS number.
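The modulus 11 check can be sketched as below. This is an illustrative stand-in, not the source of validate_nhs_number: the function name is_valid_nhs_number is hypothetical, and the way non-digit characters are discarded is an assumption based on the parameter description.

```python
def is_valid_nhs_number(nhs_number) -> bool:
    """Apply the modulus 11 check to a single NHS number."""
    if nhs_number is None:
        return False
    # Keep ASCII digits only, so "943 476 5919" and 9434765919 behave alike.
    digits = [int(ch) for ch in str(nhs_number) if "0" <= ch <= "9"]
    if len(digits) != 10:
        return False
    # Weights 10, 9, ..., 2 are applied to the first nine digits.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:
        # No single digit can represent 10, so the number is invalid.
        return False
    return check == digits[9]
```

For example, for 943 476 5919 the weighted sum of the first nine digits is 299, the remainder after dividing by 11 is 2, and 11 minus 2 gives 9, which matches the final digit.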

test_future_dates(test)[source]#

Identifies dates in the dataset that are in the future relative to the current date.

Parameters:
  • test (str) – The name of the test to be executed, indicating the column to be checked for future dates.

Dates that are beyond the current datetime are considered invalid for this test.
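A future-date check of this kind might look like the sketch below, which parses a value with a given date format and compares it with the current datetime. The function is_future and its optional now argument are hypothetical; the now argument exists only so the example can be run against a fixed reference point.

```python
from datetime import datetime


def is_future(value: str, date_format: str, now=None) -> bool:
    """Return True if the parsed date lies after the reference datetime."""
    # Default to the current datetime when no reference is supplied.
    reference = now or datetime.now()
    # strptime raises ValueError if value does not match date_format;
    # how the class treats unparseable dates is not documented here.
    return datetime.strptime(value, date_format) > reference


fixed_now = datetime(2024, 1, 1)
flags = [
    is_future("01/01/2999", "%d/%m/%Y", now=fixed_now),
    is_future("31/12/2023", "%d/%m/%Y", now=fixed_now),
]
```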

min_max_dates(test)[source]#

Validates if dates in the dataset fall within a specified minimum and maximum date range.

Parameters:
  • test (str) – The name of the test to be executed, indicating the column and the date range for validation.

Dates outside the specified minimum and maximum range are flagged as invalid. The range is defined by the 'min_date' and 'max_date' test parameters.
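The date-range check can be illustrated with a small helper that parses the value and both bounds using the same format. The name in_date_range is hypothetical. The sketch assumes that 'min_date' and 'max_date' are written in the dataset's date format and that both bounds are inclusive; neither is confirmed by the source.

```python
from datetime import datetime


def in_date_range(value: str, min_date: str, max_date: str,
                  date_format: str) -> bool:
    """Return True if value falls between min_date and max_date."""
    parsed = datetime.strptime(value, date_format)
    lower = datetime.strptime(min_date, date_format)
    upper = datetime.strptime(max_date, date_format)
    # Assumption: the bounds themselves count as valid.
    return lower <= parsed <= upper


fmt = "%Y-%m-%d"
flags = [
    in_date_range("2020-06-15", "2020-01-01", "2020-12-31", fmt),
    in_date_range("2021-01-01", "2020-01-01", "2020-12-31", fmt),
]
```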

test_pattern_validity(test)[source]#

Checks if values conform to an expected user-specified pattern.

Parameters:

test (str) – The name of the test to be executed. It is expected that test_params will include the regex pattern for validation.
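As a rough illustration, a user-supplied regex could be applied to each value as shown below. The function matches_pattern is hypothetical, and requiring the whole value to match (fullmatch) is an assumption; the class may instead match from the start of the value or search within it.

```python
import re


def matches_pattern(value, pattern: str) -> bool:
    """Return True if the entire value matches the given regex."""
    if value is None:
        return False
    # Assumption: the whole value must match, not just a prefix.
    return re.fullmatch(pattern, str(value)) is not None


# Hypothetical pattern: two capital letters followed by three digits.
pattern = r"[A-Z]{2}\d{3}"
flags = [matches_pattern(v, pattern) for v in ["AB123", "AB12", "AB1234"]]
```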