Validity#
- class Validity(df, test_params)[source]#
A subclass of Dimension focused on evaluating the validity of data within a dataset.
This class performs various validity checks, including verifying NHS numbers, postcodes, future dates, date ranges, and ensuring data matches against specified lookup tables or falls within given numeric ranges.
- df#
The dataset to be evaluated, imported via pandas’ read_csv() function.
- Type:
pandas.DataFrame
- test_params#
The parameters defining how tests should be conducted, including specifics like date formats, lookup table names, and numeric ranges for validity checks.
- Type:
pandas.DataFrame
- date_format#
The format used for parsing dates within the dataset. This should match the actual date format for accurate comparisons.
- Type:
str
- tests#
A dictionary mapping test names to their relevant information and methods, supporting various types of validity checks.
- Type:
dict
- test_nhs_numbers(test)[source]#
Checks if NHS numbers in the dataset are valid according to the modulus 11 algorithm.
- Parameters:
test (str) – The name of the test to be executed.
The method checks whether NHS numbers conform to the modulus 11 algorithm, marking them as valid or invalid accordingly.
- test_postcode(test)[source]#
Validates the format of postcodes in the dataset using a regular expression.
- Parameters:
test (str) – The name of the test to be executed, indicating the column to be checked for postcode validity.
Uses a regular expression to match UK postcode formats. Postcodes that do not match the pattern are flagged as invalid.
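The regular expression used by test_postcode is not shown on this page. As a minimal sketch of the idea, the pattern below is a simplified, hypothetical approximation of common UK postcode shapes; the names POSTCODE_PATTERN and is_valid_postcode are illustrative and not part of the class.

```python
import re

# Simplified, hypothetical pattern: 1-2 letters, a digit, an optional
# letter or digit, an optional space, then a digit and two letters.
# It does not cover every special case in the official postcode rules.
POSTCODE_PATTERN = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


def is_valid_postcode(value):
    """Return True if the value matches the simplified postcode pattern."""
    if value is None:
        return False
    # Normalising case and outer whitespace is an assumption of this sketch.
    candidate = str(value).strip().upper()
    return POSTCODE_PATTERN.match(candidate) is not None
```

Under these assumptions, "SW1A 1AA" and "m1 1ae" would pass, while "12345" would be flagged as invalid.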
- test_against_lookup_tables(test)[source]#
Verifies if values in the dataset match against specified lookup tables.
- Parameters:
test (str) – The name of the test to be executed, indicating the column and the associated lookup table for validation.
The lookup table is expected to be a CSV file containing valid codes or values. Values not found in the lookup table are flagged as invalid.
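The lookup check can be illustrated with the standard library alone. This is a sketch under assumptions, not the class's implementation: the column name "code" and the helpers load_valid_codes and flag_invalid are hypothetical, and an in-memory CSV stands in for the lookup file.

```python
import csv
import io


def load_valid_codes(csv_file, column="code"):
    """Read one column of a lookup CSV into a set of valid values."""
    reader = csv.DictReader(csv_file)
    return {row[column].strip() for row in reader if row[column]}


def flag_invalid(values, valid_codes):
    """Return one boolean per value: True means not found in the lookup."""
    return [str(value).strip() not in valid_codes for value in values]


# In-memory stand-in for a lookup table file (hypothetical contents).
lookup_csv = io.StringIO("code\nA\nB\nC\n")
valid_codes = load_valid_codes(lookup_csv)
flags = flag_invalid(["A", "Z", "C"], valid_codes)
```

Here only "Z" is flagged, because it does not appear in the lookup column.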
- test_ranges(test)[source]#
Checks if numeric values in the dataset fall within specified ranges.
- Parameters:
test (str) – The name of the test to be executed, indicating the column and the numeric range for validation.
The valid range is specified as two numbers separated by '||'. Values outside this range are flagged as invalid.
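A range specification such as "0||120" could be parsed and applied as follows. This is a sketch only: the helper names are hypothetical, and treating the bounds as inclusive and non-numeric values as out of range are assumptions not confirmed by this page.

```python
def parse_range(spec):
    """Split a 'low||high' specification into two floats."""
    low_text, high_text = spec.split("||")
    return float(low_text), float(high_text)


def is_out_of_range(value, spec):
    """Return True if the value falls outside the range in spec."""
    low, high = parse_range(spec)
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Assumption: values that cannot be read as numbers are invalid.
        return True
    # Assumption: both bounds are inclusive.
    return not (low <= number <= high)
```

With the specification "0||120", a value of 50 is in range, while 121 and "abc" would both be flagged.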
- validate_nhs_number(nhs_number)[source]#
A helper method to validate a single NHS number using the modulus 11 algorithm.
- Parameters:
nhs_number (str or int) – The NHS number to validate. It can be a string or integer; any spaces or non-numeric characters will be ignored.
- Returns:
Returns True if the NHS number is valid, otherwise False.
- Return type:
bool
Notes
The method first checks for null or empty values, then verifies the length of the number. For a valid NHS number:
- Each of the first 9 digits is multiplied by its weight (10 down to 2), and the products are summed.
- The remainder of that sum on division by 11, subtracted from 11, should equal the 10th digit (check digit).
- If the result of the subtraction is 11, it is replaced with 0 to match the check digit.
- A result of 10 indicates an invalid NHS number.
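The check-digit steps in the Notes can be sketched as a standalone function. This illustrates the modulus 11 algorithm rather than reproducing the method's source; the name is_valid_nhs_number is hypothetical.

```python
def is_valid_nhs_number(nhs_number):
    """Validate a 10-digit NHS number with the modulus 11 check."""
    if nhs_number is None:
        return False
    # Keep only the digits, ignoring spaces and other characters.
    digits = [int(ch) for ch in str(nhs_number) if ch in "0123456789"]
    if len(digits) != 10:
        return False
    # Weights 10 down to 2 apply to the first nine digits.
    weights = range(10, 1, -1)
    total = sum(digit * weight for digit, weight in zip(digits[:9], weights))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:
        # No valid NHS number produces a check value of 10.
        return False
    return check == digits[9]
```

For "943 476 5919" the weighted sum is 299, the remainder modulo 11 is 2, and 11 - 2 = 9 matches the final digit, so the number is valid. Note that an integer input would lose any leading zero and therefore fail the length check in this sketch.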
- test_future_dates(test)[source]#
Identifies dates in the dataset that are in the future relative to the current date.
- Parameters:
test (str) – The name of the test to be executed, indicating the column to be checked for future dates.
Dates that are beyond the current datetime are considered invalid for this test.
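A future-date check might look like the sketch below, which parses a string with a date format and compares it with the current time. The function name, the default format, and the now parameter (added so the comparison point can be fixed) are all hypothetical.

```python
from datetime import datetime


def is_future_date(value, date_format="%d/%m/%Y", now=None):
    """Return True if the parsed date is later than the reference time."""
    # Fall back to the current datetime when no reference is supplied.
    reference = now if now is not None else datetime.now()
    parsed = datetime.strptime(value, date_format)
    return parsed > reference
```

The date_format argument must match how dates are actually written in the data; otherwise strptime raises a ValueError.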
- min_max_dates(test)[source]#
Validates if dates in the dataset fall within a specified minimum and maximum date range.
- Parameters:
test (str) – The name of the test to be executed, indicating the column and the date range for validation.
Dates outside the specified minimum and maximum range are flagged as invalid. The range is defined by 'min_date' and 'max_date' test parameters.
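The minimum and maximum comparison can be sketched in the same way. This assumes that min_date and max_date are strings in the same format as the data and that both bounds are inclusive; neither detail is confirmed here, and the function name is hypothetical.

```python
from datetime import datetime


def is_outside_date_range(value, min_date, max_date, date_format="%Y-%m-%d"):
    """Return True if the parsed date lies outside [min_date, max_date]."""
    parsed = datetime.strptime(value, date_format)
    low = datetime.strptime(min_date, date_format)
    high = datetime.strptime(max_date, date_format)
    # Assumption: dates equal to either bound count as valid.
    return not (low <= parsed <= high)
```

For example, with a range of "2000-01-01" to "2020-12-31", the date "2021-01-01" would be flagged and "2010-06-15" would not.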