Data Quality Package

Inheritance

Submodules

Models

SEED Platform (TM), Copyright (c) Alliance for Sustainable Energy, LLC, and other contributors. See also https://github.com/SEED-platform/seed/blob/main/LICENSE.md

exception seed.models.data_quality.ComparisonError

Bases: Exception

class seed.models.data_quality.DataQualityCheck(*args, **kwargs)

Bases: Model

Stores the high-level, per-organization configuration of the data quality checks.

exception DoesNotExist

Bases: ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

REQUIRED_FIELDS = {'PropertyState': ['address_line_1', 'custom_id_1', 'pm_property_id'], 'TaxLotState': ['address_line_1', 'custom_id_1', 'jurisdiction_tax_lot_id']}
add_invalid_geometry_entry_provided(row_id, rule, display_name, value)
add_result_comparison_error(row_id, rule, display_name, value, rule_check)
add_result_dimension_error(row_id, rule, display_name, value)
add_result_is_null(row_id, rule, display_name, value)
add_result_max_error(row_id, rule, display_name, value, rule_max)
add_result_min_error(row_id, rule, display_name, value, rule_min)
add_result_missing_and_none(row_id, rule, display_name, value)
add_result_missing_req(row_id, rule, display_name, value)
add_result_string_error(row_id, rule, display_name, value)
add_result_type_error(row_id, rule, display_name, value)
add_rule(rule)

Add a new rule to the Data Quality Checks

Parameters:

rule – dict to be added as a new rule

Returns:

None

add_rule_if_new(rule)

Add a new rule to the Data Quality Checks only if the rule does not already exist

Parameters:

rule – dict to be added as a new rule

Returns:

None
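A minimal plain-Python sketch of the "add only if new" check. It assumes, hypothetically, that rules are dicts kept in a list and that "already exists" means an identical dict is present. The real method works against the `rules` relation and returns None; the helper name and the bool return value here are illustrative only.

```python
from typing import Any


def add_rule_if_new(rules: list[dict[str, Any]], rule: dict[str, Any]) -> bool:
    """Append `rule` unless an identical rule is already stored.

    Hypothetical stand-in: the real method stores Rule rows in the database.
    Returns True if the rule was added.
    """
    if any(existing == rule for existing in rules):
        return False
    rules.append(dict(rule))  # store a copy so later edits by the caller don't leak in
    return True


rules: list[dict[str, Any]] = []
rule = {"table_name": "PropertyState", "field": "address_line_1", "condition": "not_null"}
add_rule_if_new(rules, rule)
add_rule_if_new(rules, rule)  # duplicate, ignored
print(len(rules))  # 1
```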

static cache_key(identifier, organization_id)

Static method that returns the location of the data quality results in Redis.

Parameters:
  • identifier – Import file primary key

  • organization_id – ID of the organization

Returns:

the cache key under which the results are stored

check_data(record_type, rows)

Run the data quality checks on rows passed in as a queryset built from the Property/TaxLot IDs.

Parameters:
  • record_type – one of PropertyState | TaxLotState

  • rows – rows of data to be checked for data quality

Returns:

None
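The core of a check like this can be sketched as a loop over rows and rules. This is a simplified, hypothetical stand-in: `check_rows`, the row dicts with an `id` key, and the result layout are assumptions, and only the `not_null` and `range` conditions from DEFAULT_RULES are handled (no unit conversion, type casting, or status labels).

```python
from typing import Any, Optional

# Mirrors Rule.SEVERITY: 0 = error, 1 = warning, 2 = valid.
SEVERITY = {0: "error", 1: "warning", 2: "valid"}


def check_rows(record_type: str, rows: list[dict[str, Any]],
               rules: list[dict[str, Any]]) -> dict[Any, list[dict[str, str]]]:
    """Hypothetical, simplified stand-in for check_data.

    Applies not_null and range rules to each row and returns
    {row_id: [{"field", "message", "severity"}, ...]} for rows that fail.
    """
    results: dict[Any, list[dict[str, str]]] = {}
    for row in rows:
        for rule in rules:
            if rule["table_name"] != record_type:
                continue  # a rule only applies to its own record type
            value = row.get(rule["field"])
            message: Optional[str] = None
            if rule["condition"] == "not_null" and value in (None, ""):
                message = "is null"
            elif rule["condition"] == "range" and value is not None:
                if rule.get("min") is not None and value < rule["min"]:
                    message = f"< {rule['min']}"
                elif rule.get("max") is not None and value > rule["max"]:
                    message = f"> {rule['max']}"
            if message is not None:
                results.setdefault(row["id"], []).append(
                    {"field": rule["field"], "message": message,
                     "severity": SEVERITY[rule["severity"]]}
                )
    return results


# Three rules from DEFAULT_RULES, trimmed to the keys this sketch uses.
rules = [
    {"condition": "not_null", "field": "address_line_1", "severity": 0, "table_name": "PropertyState"},
    {"condition": "range", "field": "energy_score", "max": 100, "min": 0, "severity": 0, "table_name": "PropertyState"},
    {"condition": "range", "field": "energy_score", "min": 10, "severity": 1, "table_name": "PropertyState"},
]
rows = [
    {"id": 1, "address_line_1": "", "energy_score": 5},
    {"id": 2, "address_line_1": "1 Main St", "energy_score": 75},
]
results = check_rows("PropertyState", rows, rules)
```

Here row 1 fails the not_null rule (an error) and the severity-1 minimum on energy_score (a warning), while row 2 passes every rule and therefore has no entry.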

get_fieldnames(record_type)

Get fieldnames to apply to results.

id

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

static initialize_cache(identifier, organization_id)

Initialize the cache for storing the results. This is called before the celery tasks are chunked up.

The cache_key differs from the identifier. The cache_key is where all the results of the data quality checks are stored; the identifier is the random number (or specified value) used to identify both the progress and the data storage.

Parameters:

identifier – Identifier for the cache; if None, a random one is created

Returns:

list, [cache_key and the identifier]
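A sketch of the cache_key/identifier relationship, with a plain dict standing in for Redis. The key format `data_quality_results__{organization_id}__{identifier}` and the use of a UUID as the random identifier are assumptions for illustration, not SEED's documented layout.

```python
import uuid
from typing import Optional


def cache_key(identifier: str, organization_id: int) -> str:
    """Where results live in the cache. Hypothetical key format."""
    return f"data_quality_results__{organization_id}__{identifier}"


def initialize_cache(identifier: Optional[str], organization_id: int,
                     cache: dict) -> list:
    """Create an empty results slot and return [cache_key, identifier].

    `cache` is a plain dict standing in for Redis in this sketch.
    """
    if identifier is None:
        identifier = uuid.uuid4().hex  # random identifier (a UUID here, for illustration)
    key = cache_key(identifier, organization_id)
    cache[key] = []  # results are cached as a list of dicts
    return [key, identifier]


cache: dict = {}
key, ident = initialize_cache(None, 42, cache)
```

Keeping the two values separate lets one identifier serve both the progress tracker and the results storage, while the cache_key alone names the storage location.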

initialize_rules()

Initialize the default rules for a DataQualityCheck object

Returns:

None

name

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

objects = <django.db.models.manager.Manager object>
organization

Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

Child.parent is a ForwardManyToOneDescriptor instance.

organization_id
remove_all_rules()

Removes all the rules associated with this DataQualityCheck instance.

Returns:

None

remove_status_label(label_class, rule, linked_id)

Remove the label because the value did not match any of the range exceptions

Parameters:
  • label_class – statuslabel object, either property label or taxlot label

  • rule – rule object

  • linked_id – id of propertystate or taxlotstate object

Returns:

boolean, whether the label was applied

reset_all_rules()

Delete all rules and reinitialize the default set of rules

Returns:

None

reset_default_rules()

Reset only the default rules

Returns:

reset_results()
classmethod retrieve(organization_id)

DataQualityCheck was previously a simple object but has been migrated to a Django model. This method keeps the data quality model backwards compatible.

This is the preferred method to initialize a new object.

Parameters:

organization_id – ID of the Organization

Returns:

obj, DataQualityCheck

retrieve_result_by_address(address)

Retrieve the results of the data quality checks for a specific address.

Parameters:

address – string, address to find the result for

Returns:

dict, results of data quality check for specific building

retrieve_result_by_tax_lot_id(tax_lot_id)

Retrieve the results of the data quality checks by the jurisdiction tax lot ID.

Parameters:

tax_lot_id – string, jurisdiction tax lot id

Returns:

dict, results of data quality check for specific building
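Both lookups amount to scanning the results for a row whose field matches. The sketch below assumes, hypothetically, that results are kept as a dict of dicts keyed by row id and that each inner dict carries `address_line_1` and `jurisdiction_tax_lot_id`; the helper `retrieve_result_by` and the `data_quality_results` key are not part of SEED.

```python
from typing import Any, Optional


def retrieve_result_by(results: dict[Any, dict[str, Any]], field: str,
                       value: str) -> Optional[dict[str, Any]]:
    """Return the first result whose `field` equals `value`, else None (hypothetical helper)."""
    return next((r for r in results.values() if r.get(field) == value), None)


results = {
    1: {"address_line_1": "1 Main St", "jurisdiction_tax_lot_id": "TL-9",
        "data_quality_results": []},
}
by_address = retrieve_result_by(results, "address_line_1", "1 Main St")
by_tax_lot = retrieve_result_by(results, "jurisdiction_tax_lot_id", "TL-9")
```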

rules

Accessor to the related objects manager on the reverse side of a many-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

Parent.children is a ReverseManyToOneDescriptor instance.

Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.

save_to_cache(identifier, organization_id)

Save the results to the cache database. The data in the cache are stored as a list of dictionaries, whereas the data in this class are stored as a dict of dicts. Keep this in mind: the data from the cache cannot simply be loaded back into the class's structure.

Parameters:
  • identifier – Import file primary key

  • organization_id – ID of the organization

Returns:

None
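Because the cache holds a list of dicts while the class holds a dict of dicts, moving results in either direction needs an explicit conversion. A minimal sketch, assuming the row id is stored under a hypothetical `id` key in each cached dict; both helper names are illustrative.

```python
from typing import Any


def results_to_cache_rows(results: dict[Any, dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten {row_id: {...}} into [{..., "id": row_id}, ...] for the cache."""
    return [{**data, "id": row_id} for row_id, data in results.items()]


def cache_rows_to_results(rows: list[dict[str, Any]]) -> dict[Any, dict[str, Any]]:
    """Rebuild {row_id: {...}} from cached rows without mutating them."""
    out: dict[Any, dict[str, Any]] = {}
    for row in rows:
        data = dict(row)  # shallow copy so the cached row keeps its "id"
        out[data.pop("id")] = data
    return out


results = {7: {"address_line_1": "1 Main St", "data_quality_results": []}}
cached = results_to_cache_rows(results)
restored = cache_rows_to_results(cached)
```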

update_status_label(label_class, rule, linked_id, row_id, add_to_results=True)
Parameters:
  • label_class – statuslabel object, either propertyview label or taxlotview label

  • rule – rule object

  • linked_id – id of propertyview or taxlotview object

  • row_id

  • add_to_results – bool

Returns:

boolean, whether the label was applied

exception seed.models.data_quality.DataQualityTypeCastError

Bases: Exception

class seed.models.data_quality.Rule(*args, **kwargs)

Bases: Model

Rules for DataQualityCheck

DATA_TYPES = [(0, 'number'), (1, 'string'), (2, 'date'), (3, 'year'), (4, 'area'), (5, 'eui')]
DEFAULT_RULES = [
    {'condition': 'not_null', 'data_type': 1, 'field': 'address_line_1', 'not_null': True, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'not_null', 'data_type': 1, 'field': 'pm_property_id', 'not_null': True, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'not_null', 'field': 'custom_id_1', 'not_null': True, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'not_null', 'field': 'jurisdiction_tax_lot_id', 'not_null': True, 'rule_type': 0, 'severity': 0, 'table_name': 'TaxLotState'},
    {'condition': 'not_null', 'data_type': 1, 'field': 'address_line_1', 'not_null': True, 'rule_type': 0, 'severity': 0, 'table_name': 'TaxLotState'},
    {'condition': 'range', 'data_type': 4, 'field': 'conditioned_floor_area', 'max': 7000000, 'min': 0, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'ft**2'},
    {'condition': 'range', 'data_type': 4, 'field': 'conditioned_floor_area', 'min': 100, 'rule_type': 0, 'severity': 1, 'table_name': 'PropertyState', 'units': 'ft**2'},
    {'condition': 'range', 'data_type': 0, 'field': 'energy_score', 'max': 100, 'min': 0, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 0, 'field': 'energy_score', 'min': 10, 'rule_type': 0, 'severity': 1, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 2, 'field': 'generation_date', 'max': '20241231', 'min': 18890101, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 0, 'field': 'gross_floor_area', 'max': 7000000, 'min': 100, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'ft**2'},
    {'condition': 'range', 'data_type': 0, 'field': 'occupied_floor_area', 'max': 7000000, 'min': 100, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'ft**2'},
    {'condition': 'range', 'data_type': 2, 'field': 'recent_sale_date', 'max': '20241231', 'min': 18890101, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 2, 'field': 'release_date', 'max': '20241231', 'min': 18890101, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 5, 'field': 'site_eui', 'max': 1000, 'min': 0, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 5, 'field': 'site_eui', 'min': 10, 'rule_type': 0, 'severity': 1, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 5, 'field': 'site_eui_weather_normalized', 'max': 1000, 'min': 0, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 5, 'field': 'source_eui', 'max': 1000, 'min': 0, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 5, 'field': 'source_eui', 'min': 10, 'rule_type': 0, 'severity': 1, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 5, 'field': 'source_eui_weather_normalized', 'max': 1000, 'min': 10, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 3, 'field': 'year_built', 'max': '2024', 'min': 1700, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 2, 'field': 'year_ending', 'max': '20241231', 'min': 18890101, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'}
]
exception DoesNotExist

Bases: ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

RULE_EXCLUDE = 'exclude'
RULE_INCLUDE = 'include'
RULE_NOT_NULL = 'not_null'
RULE_RANGE = 'range'
RULE_REQUIRED = 'required'
RULE_TYPE = [(0, 'default'), (1, 'custom')]
RULE_TYPE_CUSTOM = 1
RULE_TYPE_DEFAULT = 0
SEVERITY = [(0, 'error'), (1, 'warning'), (2, 'valid')]
SEVERITY_ERROR = 0
SEVERITY_VALID = 2
SEVERITY_WARNING = 1
TYPE_AREA = 4
TYPE_DATE = 2
TYPE_EUI = 5
TYPE_NUMBER = 0
TYPE_STRING = 1
TYPE_YEAR = 3
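The integer codes in DEFAULT_RULES stand for labels: data_type, rule_type, and severity index into DATA_TYPES, RULE_TYPE, and SEVERITY. On a Rule instance, get_data_type_display(), get_rule_type_display(), and get_severity_display() perform this lookup. The sketch below repeats it for a raw rule dict with a hypothetical `display` helper that, like Django's display methods, falls back to the raw value for unknown codes.

```python
DATA_TYPES = [(0, 'number'), (1, 'string'), (2, 'date'), (3, 'year'), (4, 'area'), (5, 'eui')]
RULE_TYPE = [(0, 'default'), (1, 'custom')]
SEVERITY = [(0, 'error'), (1, 'warning'), (2, 'valid')]


def display(choices, code):
    """Label for an integer code in a Django-style choices list (hypothetical helper)."""
    return dict(choices).get(code, code)


# One entry from DEFAULT_RULES.
rule = {'condition': 'range', 'data_type': 5, 'field': 'site_eui', 'min': 10, 'rule_type': 0,
        'severity': 1, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'}
print(display(DATA_TYPES, rule['data_type']),
      display(RULE_TYPE, rule['rule_type']),
      display(SEVERITY, rule['severity']))  # eui default warning
```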
condition

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

data_quality_check

Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

Child.parent is a ForwardManyToOneDescriptor instance.

data_quality_check_id
data_type

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

description

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

enabled

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

field

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

for_derived_column

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

format_strings(value)
get_data_type_display(*, field=<django.db.models.fields.IntegerField: data_type>)
get_rule_type_display(*, field=<django.db.models.fields.IntegerField: rule_type>)
get_severity_display(*, field=<django.db.models.fields.IntegerField: severity>)
id

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

max

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

maximum_valid(value)

Validate that the value is not greater than the maximum specified by the rule.

Parameters:

value – Value to validate rule against

Returns:

bool, True if valid, False if the value is out of range

min

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

minimum_valid(value)

Validate that the value is not less than the minimum specified by the rule.

Parameters:

value – Value to validate rule against

Returns:

bool, True if valid, False if the value is out of range
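Both range checks reduce to a comparison that passes when the bound is unset. In this standalone sketch the bounds are passed in as arguments (the real methods read the rule's own min and max), and value and bounds are assumed to be numbers already. In DEFAULT_RULES some bounds, such as '20241231', appear as strings, and comparing an int with a str raises TypeError in Python 3, so real code has to cast them first.

```python
from typing import Optional, Union

Number = Union[int, float]


def minimum_valid(value: Number, rule_min: Optional[Number]) -> bool:
    """Valid unless a minimum is set and the value falls below it."""
    return rule_min is None or value >= rule_min


def maximum_valid(value: Number, rule_max: Optional[Number]) -> bool:
    """Valid unless a maximum is set and the value exceeds it."""
    return rule_max is None or value <= rule_max


# gross_floor_area default rule: min 100, max 7000000 (ft**2)
print(minimum_valid(50, 100), maximum_valid(50, 7000000))  # False True
```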

name

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

not_null

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

objects = <django.db.models.manager.Manager object>
required

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

rule_type

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

severity

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

status_label

Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

Child.parent is a ForwardManyToOneDescriptor instance.

status_label_id
str_to_data_type(value)

If the value comes from a field in the database, it is already typed correctly; for extra_data, however, the values are typically strings. Therefore, the values are cast using the rule’s data type definition before they are checked.

Parameters:

value – variant, value to type

Returns:

typed value
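A sketch of the casting step, keyed on the TYPE_* codes listed above. The conversions are assumptions, not SEED's code: numbers, areas, and EUIs become floats (commas stripped), years become ints, blank strings become None, and dates are parsed from an assumed ISO YYYY-MM-DD string into a YYYYMMDD integer so they line up with bounds such as 18890101 in DEFAULT_RULES.

```python
from datetime import datetime
from typing import Any

# Same codes as Rule.TYPE_*.
TYPE_NUMBER = 0
TYPE_STRING = 1
TYPE_DATE = 2
TYPE_YEAR = 3
TYPE_AREA = 4
TYPE_EUI = 5


def str_to_data_type(value: Any, data_type: int) -> Any:
    """Cast a raw (usually string) extra_data value using a rule's data_type."""
    if not isinstance(value, str):
        return value  # values from database fields are already typed
    value = value.strip()
    if value == "":
        return None  # assumption: blank text means "no value"
    if data_type in (TYPE_NUMBER, TYPE_AREA, TYPE_EUI):
        return float(value.replace(",", ""))  # tolerate thousands separators
    if data_type == TYPE_YEAR:
        return int(value)
    if data_type == TYPE_DATE:
        # Assumed ISO input; returned as a YYYYMMDD int to match bounds like 18890101.
        return int(datetime.strptime(value, "%Y-%m-%d").strftime("%Y%m%d"))
    return value  # TYPE_STRING (and unknown codes) stay as text


print(str_to_data_type("1,250.5", TYPE_AREA))  # 1250.5
```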

table_name

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

text_match

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

units

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

valid_text(value)

Validate that the value matches the text specified by the rule. Text is matched by regex.

Parameters:

value – Value to validate rule against

Returns:

bool, True if valid, False if the value does not match
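A minimal sketch of regex text validation. Treating an empty pattern as always valid and using re.search (match anywhere in the string, case-sensitive) are assumptions; the source only says that text is matched by regex. The pattern is passed in directly here, standing in for the rule's text_match field.

```python
import re
from typing import Optional


def valid_text(value: Optional[str], text_match: Optional[str]) -> bool:
    """True if the regex `text_match` is found anywhere in `value`."""
    if not text_match:
        return True  # assumption: a rule without a pattern accepts everything
    if value is None:
        return False
    return re.search(text_match, str(value)) is not None


print(valid_text("Office", r"^Off"), valid_text("Retail", r"^Off"))  # True False
```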

exception seed.models.data_quality.UnitMismatchError

Bases: Exception

seed.models.data_quality.format_pint_violation(rule, source_value)

Format a pint min, max violation for human readability.

Parameters:
  • rule

  • source_value – Quantity, value to format into range

Returns:

(formatted_value, formatted_min, formatted_max), (String, String, String)
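The shape of the return value can be illustrated without pint. In this hypothetical sketch a Quantity is reduced to a plain magnitude plus a unit string, a missing bound formats as an empty string, and the number format (thousands separators, one decimal) is an arbitrary choice; `format_violation` is not SEED's function.

```python
from typing import Optional, Tuple


def format_violation(value: float, units: str, rule_min: Optional[float],
                     rule_max: Optional[float]) -> Tuple[str, str, str]:
    """Return (formatted_value, formatted_min, formatted_max) as display strings.

    Hypothetical stand-in for format_pint_violation: no pint Quantity involved.
    """
    def fmt(x: Optional[float]) -> str:
        return "" if x is None else f"{x:,.1f} {units}"

    return fmt(value), fmt(rule_min), fmt(rule_max)


print(format_violation(7500000, "ft**2", 100, 7000000))
# ('7,500,000.0 ft**2', '100.0 ft**2', '7,000,000.0 ft**2')
```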

Tests

Views