Data Quality Package
Inheritance
Submodules
Models
SEED Platform (TM), Copyright (c) Alliance for Sustainable Energy, LLC, and other contributors. See also https://github.com/SEED-platform/seed/blob/main/LICENSE.md
- exception seed.models.data_quality.ComparisonError
Bases: Exception
- class seed.models.data_quality.DataQualityCheck(*args, **kwargs)
Bases: Model
Object that stores the high-level, per-organization configuration of the data quality checks.
- exception DoesNotExist
Bases: ObjectDoesNotExist
- exception MultipleObjectsReturned
Bases: MultipleObjectsReturned
- REQUIRED_FIELDS = {'PropertyState': ['address_line_1', 'custom_id_1', 'pm_property_id'], 'TaxLotState': ['address_line_1', 'custom_id_1', 'jurisdiction_tax_lot_id']}
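The sketch below shows one way to read REQUIRED_FIELDS: it reports which required fields are absent or blank in a single row. It assumes rows are plain dicts keyed by field name and that whitespace-only strings count as missing. The helper `missing_required_fields` is hypothetical and is not part of SEED.

```python
# Copied from DataQualityCheck.REQUIRED_FIELDS
REQUIRED_FIELDS = {
    'PropertyState': ['address_line_1', 'custom_id_1', 'pm_property_id'],
    'TaxLotState': ['address_line_1', 'custom_id_1', 'jurisdiction_tax_lot_id'],
}


def missing_required_fields(record_type, row):
    """Hypothetical helper: return the required fields absent or blank in row.

    Assumes row is a plain dict; SEED's own checks operate on model data.
    """
    missing = []
    for field in REQUIRED_FIELDS[record_type]:
        value = row.get(field)
        # Assumption: None and whitespace-only strings are both "missing".
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


row = {'address_line_1': '123 Main St', 'custom_id_1': '', 'pm_property_id': None}
print(missing_required_fields('PropertyState', row))  # ['custom_id_1', 'pm_property_id']
```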
- add_invalid_geometry_entry_provided(row_id, rule, display_name, value)
- add_result_comparison_error(row_id, rule, display_name, value, rule_check)
- add_result_dimension_error(row_id, rule, display_name, value)
- add_result_is_null(row_id, rule, display_name, value)
- add_result_max_error(row_id, rule, display_name, value, rule_max)
- add_result_min_error(row_id, rule, display_name, value, rule_min)
- add_result_missing_and_none(row_id, rule, display_name, value)
- add_result_missing_req(row_id, rule, display_name, value)
- add_result_string_error(row_id, rule, display_name, value)
- add_result_type_error(row_id, rule, display_name, value)
- add_rule(rule)
Add a new rule to this data quality check.
- Parameters:
rule – dict to be added as a new rule
- Returns:
None
- add_rule_if_new(rule)
Add a new rule to this data quality check only if the rule does not already exist.
- Parameters:
rule – dict to be added as a new rule
- Returns:
None
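The difference between add_rule and add_rule_if_new can be sketched with an in-memory list of rule dicts. This is a minimal sketch that assumes two rules are duplicates when their dicts compare equal. SEED's actual duplicate test and its use of the Rule model may differ, and both functions are hypothetical stand-ins.

```python
def add_rule(rules, rule):
    """Hypothetical stand-in for DataQualityCheck.add_rule: always append a copy."""
    rules.append(dict(rule))


def add_rule_if_new(rules, rule):
    """Hypothetical stand-in for add_rule_if_new: append only unseen rules.

    Assumes a rule is a duplicate when an existing dict compares equal.
    Returns True if the rule was added (the documented method returns None).
    """
    if rule in rules:
        return False
    add_rule(rules, rule)
    return True


rules = []
rule = {'table_name': 'PropertyState', 'field': 'site_eui', 'condition': 'range', 'min': 0, 'max': 1000}
add_rule_if_new(rules, rule)
add_rule_if_new(rules, rule)  # ignored: an identical rule is already present
print(len(rules))  # 1
```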
- static cache_key(identifier, organization_id)
Return the cache key under which the data quality results are stored in Redis.
- Parameters:
identifier – Import file primary key
organization_id – Organization primary key
- Returns:
the cache key for the results
- check_data(record_type, rows)
Run the data quality checks on the given rows, which are passed in as a queryset built from the Property/TaxLot IDs.
- Parameters:
record_type – one of PropertyState | TaxLotState
rows – rows of data to be checked for data quality
- Returns:
None
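The overall flow of check_data can be sketched as a loop that applies each matching rule to each row. This is a simplified sketch under stated assumptions. Rows are plain dicts with an 'id' key, values are already typed, and only the not_null and range conditions are handled. The real method returns None and records results on the instance; this hypothetical version returns them for clarity.

```python
def check_data(record_type, rows, rules):
    """Hypothetical sketch: apply enabled rules for record_type to each row.

    Returns {row_id: [messages]} for rows with at least one failure.
    """
    results = {}
    for row in rows:
        messages = []
        for rule in rules:
            # Skip rules for the other table and disabled rules.
            if rule['table_name'] != record_type or not rule.get('enabled', True):
                continue
            value = row.get(rule['field'])
            if rule['condition'] == 'not_null':
                if value is None:
                    messages.append(f"{rule['field']} is null")
            elif rule['condition'] == 'range' and value is not None:
                if 'min' in rule and value < rule['min']:
                    messages.append(f"{rule['field']} < {rule['min']}")
                if 'max' in rule and value > rule['max']:
                    messages.append(f"{rule['field']} > {rule['max']}")
        if messages:
            results[row['id']] = messages
    return results


rules = [
    {'condition': 'not_null', 'field': 'address_line_1', 'table_name': 'PropertyState'},
    {'condition': 'range', 'field': 'site_eui', 'min': 0, 'max': 1000, 'table_name': 'PropertyState'},
]
rows = [
    {'id': 1, 'address_line_1': '1 Main St', 'site_eui': 80.0},
    {'id': 2, 'address_line_1': None, 'site_eui': 1500.0},
]
print(check_data('PropertyState', rows, rules))  # {2: ['address_line_1 is null', 'site_eui > 1000']}
```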
- get_fieldnames(record_type)
Get the field names to apply to the results.
- id
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- static initialize_cache(identifier, organization_id)
Initialize the cache for storing the results. This is called before the Celery tasks are chunked up.
The cache_key is different from the identifier. The cache_key is where all of the data quality check results are stored; the identifier is the random number (or specified value) that is used to identify both the progress and the data storage.
- Parameters:
identifier – Identifier for the cache; if None, a random one is created
organization_id – Organization primary key
- Returns:
list, [cache_key, identifier]
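The relationship between the identifier and the cache_key can be sketched as follows. This is a minimal sketch that assumes a plain dict in place of the Redis-backed cache and uses a hypothetical key layout (`'dq_results:<organization_id>:<identifier>'`). SEED's real key format and random-identifier range are not shown in this reference.

```python
import random

# In-memory stand-in for the Redis-backed cache (assumption for illustration).
CACHE = {}


def cache_key(identifier, organization_id):
    # Hypothetical key layout; SEED defines its own format.
    return f'dq_results:{organization_id}:{identifier}'


def initialize_cache(identifier, organization_id):
    """Return [cache_key, identifier], generating a random identifier if None."""
    if identifier is None:
        identifier = random.randint(100, 100000)  # assumption: random integer id
    key = cache_key(identifier, organization_id)
    CACHE[key] = []  # results are later stored here as a list of dicts
    return [key, identifier]


key, identifier = initialize_cache(None, 7)
print(key == cache_key(identifier, 7), CACHE[key])  # True []
```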
- initialize_rules()
Initialize the default rules for a DataQualityCheck object
- Returns:
None
- name
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>
- organization
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- organization_id
- remove_all_rules()
Removes all the rules associated with this DataQualityCheck instance.
- Returns:
None
- remove_status_label(label_class, rule, linked_id)
Remove the label because the value did not match any of the range exceptions.
- Parameters:
label_class – StatusLabel object, either a property label or a tax lot label
rule – rule object
linked_id – ID of the PropertyState or TaxLotState object
- Returns:
boolean, whether the label was applied
- reset_all_rules()
Delete all rules and reinitialize the default set of rules
- Returns:
None
- reset_default_rules()
Reset only the default rules
- Returns:
- reset_results()
- classmethod retrieve(organization_id)
DataQualityCheck was previously a simple object but has been migrated to a Django model. This method ensures that the data quality model remains backwards compatible.
This is the preferred method to initialize a new object.
- Parameters:
organization_id – ID of the Organization
- Returns:
obj, DataQualityCheck
- retrieve_result_by_address(address)
Retrieve the results of the data quality checks for a specific address.
- Parameters:
address – string, address to find the result for
- Returns:
dict, results of data quality check for specific building
- retrieve_result_by_tax_lot_id(tax_lot_id)
Retrieve the results of the data quality checks by the jurisdiction tax lot ID.
- Parameters:
tax_lot_id – string, jurisdiction tax lot id
- Returns:
dict, results of data quality check for specific building
- rules
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance. Most of the implementation is delegated to a dynamically defined manager class built by Django's create_reverse_many_to_one_manager().
- save_to_cache(identifier, organization_id)
Save the results to the cache database. The data in the cache are stored as a list of dictionaries, whereas the data in this class are stored as a dict of dicts. This is important to remember because the data from the cache cannot simply be loaded back into this class's dict-of-dicts structure.
- Parameters:
identifier – Import file primary key
organization_id – Organization primary key
- Returns:
None
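Because the class holds results as a dict of dicts while the cache holds a list of dicts, a conversion is needed in each direction. The sketch below assumes the outer dict is keyed by row ID and that each cached dict carries that ID under an 'id' key. The helper names and the inner field names are hypothetical, and SEED's actual layout may differ.

```python
def results_to_cache_list(results):
    """Flatten {row_id: result_dict} into a list of dicts for the cache."""
    return [dict(result, id=row_id) for row_id, result in results.items()]


def cache_list_to_results(items):
    """Rebuild {row_id: result_dict} from the cached list (inverse of the above)."""
    results = {}
    for item in items:
        item = dict(item)  # copy so the cached data is not mutated
        row_id = item.pop('id')
        results[row_id] = item
    return results


results = {11: {'address_line_1': '123 Main St', 'messages': []}}
cached = results_to_cache_list(results)
print(cached)  # [{'address_line_1': '123 Main St', 'messages': [], 'id': 11}]
```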
- update_status_label(label_class, rule, linked_id, row_id, add_to_results=True)
- Parameters:
label_class – StatusLabel object, either a PropertyView label or a TaxLotView label
rule – rule object
linked_id – ID of the PropertyView or TaxLotView object
row_id –
add_to_results – bool
- Returns:
boolean, whether the label was applied
- exception seed.models.data_quality.DataQualityTypeCastError
Bases: Exception
- class seed.models.data_quality.Rule(*args, **kwargs)
Bases: Model
Rules for DataQualityCheck
- DATA_TYPES = [(0, 'number'), (1, 'string'), (2, 'date'), (3, 'year'), (4, 'area'), (5, 'eui')]
- DEFAULT_RULES = [
    {'condition': 'not_null', 'data_type': 1, 'field': 'address_line_1', 'not_null': True, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'not_null', 'data_type': 1, 'field': 'pm_property_id', 'not_null': True, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'not_null', 'field': 'custom_id_1', 'not_null': True, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'not_null', 'field': 'jurisdiction_tax_lot_id', 'not_null': True, 'rule_type': 0, 'severity': 0, 'table_name': 'TaxLotState'},
    {'condition': 'not_null', 'data_type': 1, 'field': 'address_line_1', 'not_null': True, 'rule_type': 0, 'severity': 0, 'table_name': 'TaxLotState'},
    {'condition': 'range', 'data_type': 4, 'field': 'conditioned_floor_area', 'max': 7000000, 'min': 0, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'ft**2'},
    {'condition': 'range', 'data_type': 4, 'field': 'conditioned_floor_area', 'min': 100, 'rule_type': 0, 'severity': 1, 'table_name': 'PropertyState', 'units': 'ft**2'},
    {'condition': 'range', 'data_type': 0, 'field': 'energy_score', 'max': 100, 'min': 0, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 0, 'field': 'energy_score', 'min': 10, 'rule_type': 0, 'severity': 1, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 2, 'field': 'generation_date', 'max': '20241231', 'min': 18890101, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 0, 'field': 'gross_floor_area', 'max': 7000000, 'min': 100, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'ft**2'},
    {'condition': 'range', 'data_type': 0, 'field': 'occupied_floor_area', 'max': 7000000, 'min': 100, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'ft**2'},
    {'condition': 'range', 'data_type': 2, 'field': 'recent_sale_date', 'max': '20241231', 'min': 18890101, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 2, 'field': 'release_date', 'max': '20241231', 'min': 18890101, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 5, 'field': 'site_eui', 'max': 1000, 'min': 0, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 5, 'field': 'site_eui', 'min': 10, 'rule_type': 0, 'severity': 1, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 5, 'field': 'site_eui_weather_normalized', 'max': 1000, 'min': 0, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 5, 'field': 'source_eui', 'max': 1000, 'min': 0, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 5, 'field': 'source_eui', 'min': 10, 'rule_type': 0, 'severity': 1, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 5, 'field': 'source_eui_weather_normalized', 'max': 1000, 'min': 10, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'},
    {'condition': 'range', 'data_type': 3, 'field': 'year_built', 'max': '2024', 'min': 1700, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'},
    {'condition': 'range', 'data_type': 2, 'field': 'year_ending', 'max': '20241231', 'min': 18890101, 'rule_type': 0, 'severity': 0, 'table_name': 'PropertyState'}
  ]
- exception DoesNotExist
Bases: ObjectDoesNotExist
- exception MultipleObjectsReturned
Bases: MultipleObjectsReturned
- RULE_EXCLUDE = 'exclude'
- RULE_INCLUDE = 'include'
- RULE_NOT_NULL = 'not_null'
- RULE_RANGE = 'range'
- RULE_REQUIRED = 'required'
- RULE_TYPE = [(0, 'default'), (1, 'custom')]
- RULE_TYPE_CUSTOM = 1
- RULE_TYPE_DEFAULT = 0
- SEVERITY = [(0, 'error'), (1, 'warning'), (2, 'valid')]
- SEVERITY_ERROR = 0
- SEVERITY_VALID = 2
- SEVERITY_WARNING = 1
- TYPE_AREA = 4
- TYPE_DATE = 2
- TYPE_EUI = 5
- TYPE_NUMBER = 0
- TYPE_STRING = 1
- TYPE_YEAR = 3
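The integer codes in DEFAULT_RULES map to the DATA_TYPES, RULE_TYPE, and SEVERITY choices listed above. On Rule instances, Django's get_data_type_display(), get_rule_type_display() and get_severity_display() do this lookup. The sketch below uses a hypothetical helper that works on plain rule dicts and renders one default rule as readable text. It shows an unset bound as an open end ('-inf' or 'inf'), which is a presentation choice for this example.

```python
# Choice tuples copied from the Rule class attributes above.
DATA_TYPES = [(0, 'number'), (1, 'string'), (2, 'date'), (3, 'year'), (4, 'area'), (5, 'eui')]
RULE_TYPE = [(0, 'default'), (1, 'custom')]
SEVERITY = [(0, 'error'), (1, 'warning'), (2, 'valid')]


def describe_rule(rule):
    """Hypothetical helper: summarize a rule dict using the choice labels."""
    data_type = dict(DATA_TYPES).get(rule.get('data_type'), 'unspecified')
    severity = dict(SEVERITY)[rule['severity']]
    kind = dict(RULE_TYPE)[rule['rule_type']]
    target = f"{rule['table_name']}.{rule['field']}"
    if rule['condition'] == 'range':
        # An unset min or max is shown as an open end of the range.
        bounds = f"[{rule.get('min', '-inf')}, {rule.get('max', 'inf')}]"
        return f"{kind} {severity}: {target} ({data_type}) must be in {bounds}"
    return f"{kind} {severity}: {target} ({data_type}) {rule['condition']}"


# One entry from DEFAULT_RULES
rule = {'condition': 'range', 'data_type': 5, 'field': 'site_eui', 'min': 10,
        'rule_type': 0, 'severity': 1, 'table_name': 'PropertyState', 'units': 'kBtu/ft**2/year'}
print(describe_rule(rule))  # default warning: PropertyState.site_eui (eui) must be in [10, inf]
```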
- condition
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- data_quality_check
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- data_quality_check_id
- data_type
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- description
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- enabled
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- field
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- for_derived_column
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- format_strings(value)
- get_data_type_display(*, field=<django.db.models.fields.IntegerField: data_type>)
- get_rule_type_display(*, field=<django.db.models.fields.IntegerField: rule_type>)
- get_severity_display(*, field=<django.db.models.fields.IntegerField: severity>)
- id
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- max
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- maximum_valid(value)
Validate that the value is not greater than the maximum specified by the rule.
- Parameters:
value – Value to validate rule against
- Returns:
bool, True if valid, False if the value is out of range
- min
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- minimum_valid(value)
Validate that the value is not less than the minimum specified by the rule.
- Parameters:
value – Value to validate rule against
- Returns:
bool, True if valid, False if the value is out of range
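The minimum_valid/maximum_valid contract can be sketched as two comparisons that a range rule combines. This is a minimal sketch assuming an unset bound (None) always passes. These hypothetical stand-alone functions take the bound as an argument. The real methods read the rule's own min and max and may also handle units (see UnitMismatchError), which this sketch ignores.

```python
def minimum_valid(value, rule_min):
    """True if value is not less than rule_min; an unset minimum always passes."""
    if rule_min is None:
        return True
    return value >= rule_min


def maximum_valid(value, rule_max):
    """True if value is not greater than rule_max; an unset maximum always passes."""
    if rule_max is None:
        return True
    return value <= rule_max


def in_range(value, rule_min=None, rule_max=None):
    # A 'range' rule passes only when both bounds are satisfied.
    return minimum_valid(value, rule_min) and maximum_valid(value, rule_max)


# Default site_eui error rule: min 0, max 1000 (kBtu/ft**2/year)
print(in_range(1200, 0, 1000))  # False
print(in_range(5, 10))          # False: below the warning-level minimum of 10
```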
- name
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- not_null
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- objects = <django.db.models.manager.Manager object>
- required
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- rule_type
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- severity
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- status_label
Accessor to the related object on the forward side of a many-to-one or one-to-one (via ForwardOneToOneDescriptor subclass) relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Child.parent is a ForwardManyToOneDescriptor instance.
- status_label_id
- str_to_data_type(value)
If the value being checked comes from a database field, it is already typed correctly; extra_data values, however, are typically strings. Therefore, values are cast using the rule's data type definition before they are checked.
- Parameters:
value – variant, value to type
- Returns:
typed value
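Casting extra_data strings by data type can be sketched as follows. The sketch makes several assumptions: numeric types (number, area, eui) parse as float with thousands separators removed, years parse as int, and dates use a YYYYMMDD or YYYY-MM-DD layout. It also assumes that failures surface as a local stand-in for DataQualityTypeCastError. The accepted formats and return types of SEED's real str_to_data_type may differ.

```python
from datetime import datetime

# Integer codes copied from Rule.DATA_TYPES.
TYPE_NUMBER, TYPE_STRING, TYPE_DATE, TYPE_YEAR, TYPE_AREA, TYPE_EUI = range(6)

# Assumption: the date layouts accepted for extra_data strings.
DATE_FORMATS = ('%Y%m%d', '%Y-%m-%d')


class DataQualityTypeCastError(Exception):
    """Local stand-in for seed.models.data_quality.DataQualityTypeCastError."""


def str_to_data_type(value, data_type):
    """Cast a string value to the rule's data type; non-strings pass through."""
    if not isinstance(value, str):
        return value  # values from typed database fields are used as-is
    text = value.strip()
    try:
        if data_type in (TYPE_NUMBER, TYPE_AREA, TYPE_EUI):
            return float(text.replace(',', ''))  # assumption: drop thousands separators
        if data_type == TYPE_YEAR:
            return int(text)
        if data_type == TYPE_DATE:
            for fmt in DATE_FORMATS:
                try:
                    return datetime.strptime(text, fmt).date()
                except ValueError:
                    continue  # try the next accepted layout
            raise ValueError(f'unrecognized date: {text!r}')
    except ValueError as exc:
        # Assumption: cast failures surface as DataQualityTypeCastError.
        raise DataQualityTypeCastError(str(exc)) from exc
    return value  # TYPE_STRING (and unknown types) are returned unchanged


print(str_to_data_type('1,250.5', TYPE_AREA))  # 1250.5
print(str_to_data_type('20241231', TYPE_DATE))  # 2024-12-31
```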
- table_name
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- text_match
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- units
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
- valid_text(value)
Validate that the value matches the text specified by the rule. Text is matched by regular expression.
- Parameters:
value – Value to validate rule against
- Returns:
bool, True if valid, False if the value does not match
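Regex-based text validation can be sketched with Python's re module. This minimal sketch makes three assumptions: the pattern comes from the rule's text_match field, matching is a case-insensitive search anywhere in the value, and an empty pattern passes. SEED's actual matching options may differ, and the stand-alone function signature is hypothetical.

```python
import re


def valid_text(value, text_match):
    """True if value matches the text_match regex (searched anywhere, case-insensitive)."""
    if not text_match:
        return True  # assumption: no pattern configured means nothing to check
    return re.search(text_match, str(value), re.IGNORECASE) is not None


print(valid_text('Office Building', r'office'))  # True
print(valid_text('Retail', r'^office$'))         # False
```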
- exception seed.models.data_quality.UnitMismatchError
Bases: Exception
- seed.models.data_quality.format_pint_violation(rule, source_value)
Format a pint min/max violation for human readability.
- Parameters:
rule –
source_value – Quantity, value to format into range
- Returns:
(formatted_value, formatted_min, formatted_max), (String, String, String)
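The sketch below illustrates the documented return shape, (formatted_value, formatted_min, formatted_max), without depending on pint. It assumes the Quantity has already been reduced to a (magnitude, units) pair and that the rule is a dict with optional min and max. The function name, two-decimal formatting, and 'N/A' placeholder are illustrative choices, not SEED's.

```python
def format_violation(rule, magnitude, units):
    """Hypothetical, pint-free stand-in for format_pint_violation.

    Returns (formatted_value, formatted_min, formatted_max) as strings.
    """
    def fmt(number):
        if number is None:
            return 'N/A'  # assumption: unset bounds render as a placeholder
        return f'{number:,.2f} {units}'

    return fmt(magnitude), fmt(rule.get('min')), fmt(rule.get('max'))


rule = {'field': 'site_eui', 'min': 0, 'max': 1000}
print(format_violation(rule, 1234.5, 'kBtu/ft**2/year'))
# ('1,234.50 kBtu/ft**2/year', '0.00 kBtu/ft**2/year', '1,000.00 kBtu/ft**2/year')
```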