CalTRACK Compliance
Checklist for CalTRACK compliance:
Section 2.2: Data Constraints
Section 2.2.1: Missing Values/Data Sufficiency
- 2.2.1.1: eemeter.get_baseline_data must set max_days=365.
- 2.2.1.2: eemeter.caltrack_sufficiency_criteria must set min_fraction_daily_coverage=0.9.
- 2.2.1.3: (Data Preparation) Missing values in input data have been represented as float('nan'), np.nan, or anything recognized as null by the method pandas.isnull.
- 2.2.1.4: (Data Preparation) Values of 0 in electricity data have been converted to np.nan.
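The data preparation in 2.2.1.4 and the coverage threshold in 2.2.1.2 can be illustrated with a minimal standard-library sketch. The helper names `zeros_to_nan` and `fraction_daily_coverage` are hypothetical and are not part of eemeter; the sketch only assumes that coverage means the share of days with a non-missing value.

```python
import math

def zeros_to_nan(values):
    """Replace zero readings with NaN (electricity data only, per 2.2.1.4)."""
    return [math.nan if v == 0 else v for v in values]

def fraction_daily_coverage(values):
    """Share of days that carry a non-missing value (hypothetical helper)."""
    if not values:
        return 0.0
    present = sum(1 for v in values if v is not None and not math.isnan(v))
    return present / len(values)

# Ten days of illustrative electricity usage, including zeros and one gap.
daily_kwh = [12.0, 0, 9.5, math.nan, 11.2, 10.1, 0, 13.4, 12.8, 9.9]
cleaned = zeros_to_nan(daily_kwh)
coverage = fraction_daily_coverage(cleaned)

# Compare against the 0.9 threshold named in 2.2.1.2.
is_sufficient = coverage >= 0.9
```

In this illustrative series three of ten days end up missing, so the data would fall short of the 0.9 threshold.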
Section 2.2.2: Daily Data Sufficiency
- 2.2.2.1: (Data Preparation) Input meter data has been appropriately downsampled to daily values.
- 2.2.2.2: (Data Preparation) Estimated reads in input data have been combined with subsequent reads.
- 2.2.2.3: eemeter.create_caltrack_daily_design_matrix sets percent_hourly_coverage_per_day=0.5 for eemeter.compute_temperature_features.
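As a rough illustration of what a 0.5 hourly coverage requirement per day implies, the sketch below keeps a daily mean temperature only when at least half of the 24 expected hourly readings are present. The helper `daily_mean_temperature` is hypothetical, and treating exactly 50% as sufficient is an assumption; eemeter's own handling may differ in detail.

```python
import math

def daily_mean_temperature(hourly_temps, min_coverage=0.5, expected_hours=24):
    """Mean of one day's hourly temperatures, or NaN if coverage is too low."""
    present = [t for t in hourly_temps if t is not None and not math.isnan(t)]
    # Assumption: exactly min_coverage counts as sufficient.
    if len(present) / expected_hours < min_coverage:
        return math.nan
    return sum(present) / len(present)

well_covered = [60.0] * 12 + [math.nan] * 12  # 12 of 24 hours present
sparse = [60.0] * 11 + [math.nan] * 13        # 11 of 24 hours present

kept = daily_mean_temperature(well_covered)
dropped = daily_mean_temperature(sparse)
```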
Section 2.2.3: Billing Data Sufficiency
- 2.2.3.1: (Data Preparation) Estimated reads in input data have been combined with subsequent reads up to a 70 day limit. Estimated reads count as missing values when evaluating the sufficiency criteria defined in 2.2.1.2.
- 2.2.3.2: eemeter.create_caltrack_billing_design_matrix must set percent_hourly_coverage_per_billing_period=0.9 for eemeter.compute_temperature_features, and eemeter.caltrack_sufficiency_criteria must set min_fraction_daily_coverage=0.9.
- 2.2.3.3: (Data Preparation) Input meter data that represents billing periods less than 25 days long has been converted to np.nan.
- 2.2.3.4: (Data Preparation) Input meter data that represents billing periods greater than 35 days long for pseudo-monthly billing period calculations, or greater than 70 days long for bi-monthly billing period calculations, has been converted to np.nan.
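The period-length rules in 2.2.3.3 and 2.2.3.4 can be sketched as follows. The function `mask_invalid_billing_periods` and the tuple layout are hypothetical; the sketch assumes a period's length is the number of days between its start and end dates.

```python
import math
from datetime import date

def mask_invalid_billing_periods(periods, max_days, min_days=25):
    """Return usage values, with NaN for periods that are too short or too long.

    periods: list of (start_date, end_date, usage) tuples (hypothetical layout).
    """
    cleaned = []
    for start, end, usage in periods:
        n_days = (end - start).days
        if n_days < min_days or n_days > max_days:
            cleaned.append(math.nan)
        else:
            cleaned.append(usage)
    return cleaned

periods = [
    (date(2018, 1, 1), date(2018, 1, 31), 900.0),    # 30 days
    (date(2018, 1, 31), date(2018, 2, 20), 610.0),   # 20 days: too short
    (date(2018, 2, 20), date(2018, 4, 11), 1500.0),  # 50 days
]

monthly = mask_invalid_billing_periods(periods, max_days=35)
bimonthly = mask_invalid_billing_periods(periods, max_days=70)
```

The 50-day period is masked under the pseudo-monthly limit but kept under the bi-monthly limit.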
Section 2.2.X: Other Data Sufficiency Requirements
- 2.2.4: When calling eemeter.caltrack_sufficiency_criteria, set requested_start_date and requested_end_date to receive critical warnings related to data outside of the requested period of analysis.
- 2.2.5: (Data Preparation) Projects have been removed if the status of net metering has changed during the baseline or reporting periods.
- 2.2.6: (Data Preparation) Projects have been removed if EV charging has been installed during the baseline or reporting periods.
Section 2.3: Data Quality
Section 2.3.1: Impossible Dates
- 2.3.1.1: (Data Preparation) For billing analysis, input meter data containing invalid dates for a valid month have been converted to the first date of that month.
- 2.3.1.2: (Data Preparation) Input meter data containing invalid months/years have been removed and a warning has been generated.
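One way to picture the two rules above is a small helper that takes a raw year/month/day triple and either repairs or rejects it. The function `repair_billing_date` is hypothetical; it returns the warning as a message string rather than using any eemeter warning type.

```python
import calendar
from datetime import date

def repair_billing_date(year, month, day):
    """Return (date_or_None, warning_or_None) for a raw year/month/day triple."""
    valid_year = date.min.year <= year <= date.max.year
    if not (1 <= month <= 12) or not valid_year:
        # 2.3.1.2: invalid month/year, so drop the record and report it.
        return None, f"removed record with invalid month/year: {year}-{month}"
    last_day = calendar.monthrange(year, month)[1]
    if not (1 <= day <= last_day):
        # 2.3.1.1: invalid day in a valid month snaps to the first of the month.
        return date(year, month, 1), None
    return date(year, month, day), None

raw_records = [(2018, 2, 30), (2018, 13, 5), (2018, 3, 15)]
repaired = [repair_billing_date(*record) for record in raw_records]
```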
Section 2.3.2: Duplicate Records
- 2.3.2.1: (Data Preparation) Meter usage and temperature data has used matching time zone information to ensure that the upsampled values represent the same periods of time.
- 2.3.2.2: (Data Preparation) If duplicate rows are found for meter data, then the project must be flagged as it may have sub-metering/multiple meters.
Section 2.3.X: Other Data Quality Requirements
- 2.3.3: For eemeter.merge_temperature_data, meter_data and temperature_data must be timezone-aware and have matching timezones.
- 2.3.4: If NOAA weather data was used (which is roughly hourly), it has been normalized to hourly using eeweather.ISDStation.fetch_isd_hourly_temp_data.
- 2.3.5: Warnings are generated in eemeter.caltrack_sufficiency_criteria if negative meter values are discovered, as they indicate the possible presence of unreported net metering.
- 2.3.6: (Data Preparation) A warning must be generated for values that are more than three interquartile ranges larger than the median usage.
- 2.3.7: (Audit) Resulting dataset of meter runs has been compared with expected counts of sites, meters, and projects.
- 2.3.8: (Data Preparation) Meter data has been downsampled according to the desired frequency for analysis using eemeter.as_freq before merging of temperature data or modeling.
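The outlier rule in 2.3.6 can be sketched with the standard library. This is only an illustration: `extreme_value_warning` is a hypothetical helper, the quartile method ("inclusive") is an assumption, and the input is assumed to contain no missing values.

```python
import statistics

def extreme_value_warning(values, n_iqr=3):
    """Describe values above median + n_iqr * IQR, or return None if there are none."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    threshold = median + n_iqr * (q3 - q1)
    extremes = [v for v in values if v > threshold]
    if not extremes:
        return None
    return {"threshold": threshold, "extreme_values": extremes}

# Illustrative daily usage with one suspiciously large reading.
daily_usage = [10, 12, 11, 13, 12, 11, 10, 14, 12, 60]
warning = extreme_value_warning(daily_usage)
```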
Section 2.4: Matching Sites to Weather Stations
- 2.4.1: When matching weather stations to sites, eeweather.select_station should use an appropriate ranking from eeweather.rank_stations and/or eeweather.combine_ranked_stations.
- 2.4.2: When matching a particular site to a weather station, eeweather.select_station generates a warning if the weather station is greater than 200 km from the site.
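To make the 200 km rule in 2.4.2 concrete, the sketch below estimates the site-to-station distance with the haversine formula on a spherical Earth. This is an approximation chosen for illustration and is not claimed to be how eeweather computes distance; `haversine_km` and `distance_warning` are hypothetical names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_warning(site, station, max_km=200):
    """Return a warning message if the station is farther than max_km, else None."""
    distance = haversine_km(site[0], site[1], station[0], station[1])
    if distance > max_km:
        return f"station is {distance:.0f} km from site (limit {max_km} km)"
    return None

near = distance_warning((0.0, 0.0), (1.0, 0.0))  # roughly 111 km apart
far = distance_warning((0.0, 0.0), (2.0, 0.0))   # roughly 222 km apart
```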
Section 3.2: Balance Points
- 3.2.1: When calculating cooling and heating degree days, eemeter.compute_temperature_features must set heating_balance_points to be any list ranging from 30 to 90 with a maximum gap of 3 degrees Fahrenheit. For electricity meter use data, cooling_balance_points must also be any list ranging from 30 to 90 with a maximum gap of 3 degrees Fahrenheit. For natural gas meter use data, the function eemeter.fit_caltrack_usage_per_day_model must set fit_cdd=False and cooling_balance_points=None so that models using cooling degree days are not considered. Alternatively, design matrix functions can be used directly: see eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix.
- 3.2.2.1: eemeter.get_cdd_hdd_candidate_models only generates cdd_hdd candidate models where the cooling balance point is greater than or equal to the heating balance point.
- 3.2.2.2: For daily data, eemeter.fit_caltrack_usage_per_day_model must set minimum_non_zero_cdd=10, minimum_non_zero_hdd=10, minimum_total_cdd=20, minimum_total_hdd=20. For billing data, eemeter.fit_caltrack_usage_per_day_model must set use_billing_presets=True.
- 3.2.3: See the description above regarding the gap between candidate balance points in 3.2.1.
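The balance point constraints in 3.2.1 and 3.2.2.1 can be checked with a short sketch. The helper `is_valid_balance_point_grid` is hypothetical, and reading "ranging from 30 to 90" as requiring the list to start at 30 and end at 90 is an assumption.

```python
def is_valid_balance_point_grid(points, low=30, high=90, max_gap=3):
    """True if points span low..high with no gap larger than max_gap."""
    ordered = sorted(points)
    # Assumption: the grid must start exactly at low and end exactly at high.
    if not ordered or ordered[0] != low or ordered[-1] != high:
        return False
    return all(b - a <= max_gap for a, b in zip(ordered, ordered[1:]))

heating_balance_points = list(range(30, 91, 3))
cooling_balance_points = list(range(30, 91, 3))

# 3.2.2.1: only pair a cooling balance point with a heating balance point
# that is less than or equal to it.
candidate_pairs = [
    (hbp, cbp)
    for hbp in heating_balance_points
    for cbp in cooling_balance_points
    if cbp >= hbp
]
```

For gas data the cooling list would simply be omitted, matching cooling_balance_points=None in 3.2.1.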
Section 3.3: Design Matrix (for Daily and Billing Methods)
- 3.3.1: eemeter.fit_caltrack_usage_per_day_model is used for model candidate creation and model selection. It constructs models with the formula meter_value ~ hdd_X + cdd_Y, where X is the heating balance point and Y is the cooling balance point, using one of the following functions: eemeter.get_single_cdd_only_candidate_model, eemeter.get_single_hdd_only_candidate_model, eemeter.get_single_cdd_hdd_candidate_model.
- 3.3.1.1: For billing methods, eemeter.create_caltrack_billing_design_matrix must set use_mean_daily_values=True.
- 3.3.1.2: eemeter.create_caltrack_billing_design_matrix or eemeter.create_caltrack_daily_design_matrix must set degree_day_method='daily' when it calls eemeter.compute_temperature_features.
- 3.3.1.3: The output of eemeter.fit_caltrack_usage_per_day_model is an eemeter.CalTRACKUsagePerDayModelResults. If a model has been selected, then eemeter.CalTRACKUsagePerDayModelResults contains an attribute model which is a CalTRACKUsagePerDayCandidateModel. This CalTRACKUsagePerDayCandidateModel contains an attribute model_params which is a dictionary containing model parameters. It can potentially contain the following parameters: intercept, beta_cdd, cooling_balance_point, beta_hdd, and heating_balance_point, depending on whether this_model_results.model_type is intercept_only, cdd_only, hdd_only, or cdd_hdd.
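To show how the parameters listed in 3.3.1.3 relate to the formula in 3.3.1, here is a minimal sketch that evaluates a fitted model for one day from a plain dictionary shaped like model_params. The function `predict_usage_per_day` and the parameter values are hypothetical, and degree days are assumed to be computed from the mean daily temperature.

```python
def predict_usage_per_day(model_params, mean_daily_temp):
    """Evaluate intercept + beta_hdd * HDD + beta_cdd * CDD for a single day."""
    usage = model_params.get("intercept", 0.0)
    if "beta_hdd" in model_params:
        # Heating degree days: how far the temperature falls below the balance point.
        hdd = max(model_params["heating_balance_point"] - mean_daily_temp, 0.0)
        usage += model_params["beta_hdd"] * hdd
    if "beta_cdd" in model_params:
        # Cooling degree days: how far the temperature rises above the balance point.
        cdd = max(mean_daily_temp - model_params["cooling_balance_point"], 0.0)
        usage += model_params["beta_cdd"] * cdd
    return usage

# Illustrative cdd_hdd parameters (not taken from a real fit).
cdd_hdd_params = {
    "intercept": 10.0,
    "beta_hdd": 0.5,
    "heating_balance_point": 60,
    "beta_cdd": 0.8,
    "cooling_balance_point": 70,
}

cold_day = predict_usage_per_day(cdd_hdd_params, 50.0)
mild_day = predict_usage_per_day(cdd_hdd_params, 65.0)
hot_day = predict_usage_per_day(cdd_hdd_params, 80.0)
```

An intercept_only model would carry only the intercept key, so the same function returns a constant for any temperature.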
Section 3.4: Fit Candidate Models
- 3.4.1: For daily methods, eemeter.fit_caltrack_usage_per_day_model must set weight_cols=None.
- 3.4.2: For billing methods, eemeter.fit_caltrack_usage_per_day_model must set weight_cols='n_days_kept'.
- 3.4.3.1: eemeter.fit_caltrack_usage_per_day_model must set fit_cdd=True, fit_intercept_only=True, fit_cdd_only=True, fit_hdd_only=True, fit_cdd_hdd=True for electricity data, and fit_cdd=False, fit_intercept_only=True, fit_cdd_only=False, fit_hdd_only=True, fit_cdd_hdd=False for gas data.
- 3.4.3.2: eemeter.fit_caltrack_usage_per_day_model calls the following functions to generate candidate models (given that the correct parameters are set to true as defined in 3.4.3.1): eemeter.get_single_cdd_only_candidate_model, eemeter.get_single_hdd_only_candidate_model, eemeter.get_single_cdd_hdd_candidate_model, eemeter.get_intercept_only_candidate_models. Within each of these functions, the status of the model is set to ‘DISQUALIFIED’ and a warning is generated if any model parameters are negative.
- 3.4.3.3: eemeter.fit_caltrack_usage_per_day_model calls eemeter.select_best_candidate to select the best of the model candidates. This function selects the candidate with the highest adjusted r-squared value.
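The disqualification and selection logic in 3.4.3.2 and 3.4.3.3 can be sketched with plain dictionaries. Everything here is illustrative: `status_of`, `select_best`, the 'QUALIFIED' label, and the candidate layout are assumptions rather than eemeter's actual objects. The adjusted r-squared helper uses the standard textbook definition.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R-squared for n_obs observations and n_predictors regressors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

def status_of(candidate):
    """'DISQUALIFIED' if any model parameter is negative, else 'QUALIFIED' (assumed label)."""
    if any(value < 0 for value in candidate["model_params"].values()):
        return "DISQUALIFIED"
    return "QUALIFIED"

def select_best(candidates):
    """Pick the qualified candidate with the highest adjusted R-squared, or None."""
    qualified = [c for c in candidates if status_of(c) == "QUALIFIED"]
    if not qualified:
        return None
    return max(qualified, key=lambda c: c["r_squared_adj"])

# Illustrative candidates; the cdd_hdd model has a negative beta_cdd.
candidates = [
    {"model_type": "intercept_only",
     "model_params": {"intercept": 20.0},
     "r_squared_adj": 0.0},
    {"model_type": "hdd_only",
     "model_params": {"intercept": 8.0, "beta_hdd": 0.6,
                      "heating_balance_point": 60},
     "r_squared_adj": 0.71},
    {"model_type": "cdd_hdd",
     "model_params": {"intercept": 9.0, "beta_hdd": 0.5,
                      "heating_balance_point": 60, "beta_cdd": -0.2,
                      "cooling_balance_point": 72},
     "r_squared_adj": 0.80},
]

best = select_best(candidates)
```

Although the cdd_hdd candidate has the highest adjusted r-squared in this example, its negative coefficient disqualifies it, so the hdd_only candidate is selected.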
Section 3.5: Computing Derived Quantities
- 3.5.1: eemeter.metered_savings returns a pandas.DataFrame with a column metered_savings which contains the avoided energy use values.
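As a rough picture of what "avoided energy use" means, the sketch below subtracts observed reporting-period usage from the baseline model's counterfactual prediction for the same period. The function `avoided_energy_use` and the input names are hypothetical; they illustrate the arithmetic only and do not reproduce the eemeter.metered_savings call.

```python
def avoided_energy_use(counterfactual_usage, observed_usage):
    """Per-period avoided energy use: predicted baseline usage minus observed usage."""
    if len(counterfactual_usage) != len(observed_usage):
        raise ValueError("series must cover the same periods")
    return [c - o for c, o in zip(counterfactual_usage, observed_usage)]

# Illustrative daily values for a three-day reporting period.
counterfactual = [30.0, 28.0, 35.0]  # what the baseline model predicts
observed = [25.0, 29.0, 30.0]        # what the meter recorded

savings = avoided_energy_use(counterfactual, observed)
total_savings = sum(savings)
```

A negative value for a period means observed usage exceeded the counterfactual for that period.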