API Docs
CalTRACK
CalTRACK design matrix creation
These functions are designed as shortcuts to common CalTRACK design matrix inputs.
-
eemeter.
create_caltrack_hourly_preliminary_design_matrix
(meter_data, temperature_data)[source]¶ A helper function which calls basic feature creation methods to create an input suitable for use in the first step of creating a CalTRACK hourly model.
Parameters: - meter_data (
pandas.DataFrame
) – Hourly meter data in eemeter format. - temperature_data (
pandas.Series
) – Hourly temperature data in eemeter format.
Returns: design_matrix – A design matrix with meter_value, hour_of_week, hdd_50, and cdd_65 features.
Return type: pandas.DataFrame
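As a rough illustration of the features named above, the following standard-library sketch derives hour_of_week, hdd_50, and cdd_65 for a single hourly reading. The helper name `hourly_features`, the convention that Monday 00:00 is hour 0, and the unscaled degree arithmetic are assumptions made for illustration; eemeter's own feature functions may number hours or scale degree values differently.

```python
from datetime import datetime


def hourly_features(timestamp, temperature_f):
    """Hypothetical helper: features for one hourly temperature reading (deg F)."""
    # Assumed convention: Monday 00:00 is hour 0, Sunday 23:00 is hour 167.
    hour_of_week = timestamp.weekday() * 24 + timestamp.hour
    return {
        "hour_of_week": hour_of_week,
        # Heating degrees below a 50 F balance point, floored at zero.
        "hdd_50": max(50.0 - temperature_f, 0.0),
        # Cooling degrees above a 65 F balance point, floored at zero.
        "cdd_65": max(temperature_f - 65.0, 0.0),
    }


# 2018-01-03 is a Wednesday; the reading is 42 F at 14:00.
features = hourly_features(datetime(2018, 1, 3, 14), 42.0)
```

A cold reading contributes only to hdd_50 and a hot reading only to cdd_65; readings between the two balance points contribute to neither.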
-
eemeter.
create_caltrack_hourly_segmented_design_matrices
(preliminary_design_matrix, segmentation, occupancy_lookup, occupied_temperature_bins, unoccupied_temperature_bins)[source]¶ A helper function which calls basic feature creation methods to create a design matrix suitable for use with segmented CalTRACK hourly models.
Parameters:
- preliminary_design_matrix (pandas.DataFrame) – A dataframe of the form returned by eemeter.create_caltrack_hourly_preliminary_design_matrix.
- segmentation (pandas.DataFrame) – Weights for each segment. This is a dataframe of the form returned by eemeter.segment_time_series on the preliminary_design_matrix.
- occupancy_lookup (pandas.DataFrame) – Occupancy for each segment. This is a dataframe of the form returned by eemeter.estimate_hour_of_week_occupancy.
- occupied_temperature_bins (pandas.DataFrame) – Occupied temperature bin settings for each segment. This is a dataframe of the form returned by eemeter.fit_temperature_bins.
- unoccupied_temperature_bins (pandas.DataFrame) – Unoccupied temperature bin settings for each segment, in the same form as occupied_temperature_bins.
Returns: design_matrix – A dict of design matrices created using eemeter.caltrack_hourly_fit_feature_processor.
Return type: dict of pandas.DataFrame
-
eemeter.
create_caltrack_daily_design_matrix
(meter_data, temperature_data)[source]¶ A helper function which calls basic feature creation methods to create a design matrix suitable for use with CalTRACK daily methods.
Parameters: - meter_data (
pandas.DataFrame
) – Hourly meter data in eemeter format. - temperature_data (
pandas.Series
) – Hourly temperature data in eemeter format.
Returns: design_matrix – A design matrix with mean usage_per_day, hdd_30-hdd_90, and cdd_30-cdd_90 features.
Return type: pandas.DataFrame
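To make the hdd_30-hdd_90 and cdd_30-cdd_90 column family concrete, here is a minimal standard-library sketch that builds one row of degree-day features from a day of hourly temperatures. The helper name `daily_degree_days`, the use of the daily mean temperature, and the choice of every integer balance point from 30 to 90 are assumptions for illustration, not a description of eemeter's internals.

```python
def daily_degree_days(hourly_temps_f, balance_points=range(30, 91)):
    """Hypothetical helper: degree-day features for one day of hourly temps (deg F)."""
    # Assumed method: degree days are taken from the daily mean temperature.
    mean_temp = sum(hourly_temps_f) / len(hourly_temps_f)
    row = {}
    for bp in balance_points:
        # Heating degree days accrue when the mean is below the balance point.
        row[f"hdd_{bp}"] = max(bp - mean_temp, 0.0)
        # Cooling degree days accrue when the mean is above the balance point.
        row[f"cdd_{bp}"] = max(mean_temp - bp, 0.0)
    return row


# Twelve hours at 55 F and twelve at 65 F give a daily mean of 60 F.
row = daily_degree_days([55.0] * 12 + [65.0] * 12)
```

Each balance point yields one heating and one cooling column, which is what lets a later fitting step search over balance points.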
-
eemeter.
create_caltrack_billing_design_matrix
(meter_data, temperature_data)[source]¶ A helper function which calls basic feature creation methods to create a design matrix suitable for use with CalTRACK Billing methods.
Parameters: - meter_data (
pandas.DataFrame
) – Hourly meter data in eemeter format. - temperature_data (
pandas.Series
) – Hourly temperature data in eemeter format.
Returns: design_matrix – A design matrix with mean usage_per_day, hdd_30-hdd_90, and cdd_30-cdd_90 features.
Return type: pandas.DataFrame
CalTRACK Hourly
These classes and functions are designed to assist with running the CalTRACK Hourly methods. See also Quickstart for CalTRACK Hourly.
-
class
eemeter.
CalTRACKHourlyModel
(segment_models, occupancy_lookup, occupied_temperature_bins, unoccupied_temperature_bins)[source]¶ An object which holds CalTRACK Hourly model data and metadata, and which can be used for prediction.
-
segment_models
¶ Dictionary of models for each segment, keys are segment names.
Type: dict
of eemeter.CalTRACKSegmentModel
-
occupancy_lookup
¶ A dataframe with occupancy flags for each hour of the week and each segment. Segment names are columns, occupancy flags are 0 or 1.
Type: pandas.DataFrame
-
occupied_temperature_bins
¶ A dataframe of bin endpoint flags for each segment. Segment names are columns.
Type: pandas.DataFrame
-
unoccupied_temperature_bins
¶ Ditto for the unoccupied mode.
Type: pandas.DataFrame
-
classmethod
from_json
(data)[source]¶ Loads a JSON-serializable representation into the model state.
The input of this function is a dict which can be the result of
json.loads
.
-
json
()[source]¶ Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with
json.dumps
.
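The json() and from_json() pair is meant to be combined with json.dumps and json.loads. The sketch below shows that round trip using a hypothetical stand-in class, `SketchModel`, whose fields are invented for illustration; it is not the eemeter class and does not reflect its actual serialized layout.

```python
import json


class SketchModel:
    """Hypothetical stand-in for a model class exposing json() and from_json()."""

    def __init__(self, segment_names, settings=None):
        self.segment_names = list(segment_names)
        self.settings = settings or {}

    def json(self):
        # Return only plain dicts, lists, strings, and numbers.
        return {"segment_names": self.segment_names, "settings": self.settings}

    @classmethod
    def from_json(cls, data):
        # `data` is a dict, for example the result of json.loads.
        return cls(data["segment_names"], settings=data.get("settings"))


original = SketchModel(["jan", "feb"], settings={"alpha": 0.1})
serialized = json.dumps(original.json())  # dict to string
restored = SketchModel.from_json(json.loads(serialized))  # string to dict to object
```

Note that json() returns a dict rather than a string, so the caller chooses when and how to serialize.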
-
-
class
eemeter.
CalTRACKHourlyModelResults
(status, method_name, model=None, warnings=[], metadata=None, settings=None)[source]¶ Contains information about the chosen model.
-
status
¶ A string indicating the status of this result. Possible statuses:
- 'NO DATA': No baseline data was available.
- 'NO MODEL': A complete model could not be constructed.
- 'SUCCESS': A model was constructed.
Type: str
-
model
¶ The selected model, if any.
Type: eemeter.CalTRACKHourlyModel or None
-
warnings
¶ A list of any warnings reported during the model selection and fitting process.
Type: list of eemeter.EEMeterWarning
-
metadata
¶ An arbitrary dictionary of metadata to be associated with this result. This can be used, for example, to tag the results with attributes like an ID:
{ 'id': 'METER_12345678', }
Type: dict
-
totals_metrics
¶ A ModelMetrics object, if one is calculated and associated with this model. (This initializes to None.) The ModelMetrics object contains model fit information and descriptive statistics about the underlying data, with that data expressed as period totals.
Type: ModelMetrics
-
avgs_metrics
¶ A ModelMetrics object, if one is calculated and associated with this model. (This initializes to None.) The ModelMetrics object contains model fit information and descriptive statistics about the underlying data, with that data expressed as daily averages.
Type: ModelMetrics
-
classmethod
from_json
(data)[source]¶ Loads a JSON-serializable representation into the model state.
The input of this function is a dict which can be the result of
json.loads
.
-
json
(with_candidates=False)[source]¶ Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with
json.dumps
.
-
predict
(prediction_index, temperature_data, **kwargs)[source]¶ Predict over a particular index using temperature data.
Parameters:
- prediction_index (pandas.DatetimeIndex) – Time period over which to predict.
- temperature_data (pandas.DataFrame) – Hourly temperature data to use for prediction. Time period should match the prediction_index argument.
- **kwargs – Extra keyword arguments to send to self.model.predict.
Returns: prediction – The predicted usage values.
-
-
eemeter.
caltrack_hourly_fit_feature_processor
(segment_name, segmented_data, occupancy_lookup, occupied_temperature_bins, unoccupied_temperature_bins)[source]¶ A function that takes in temperature data and returns a dataframe of features suitable for use with
eemeter.fit_caltrack_hourly_model_segment. Designed for use with eemeter.iterate_segmented_dataset.
Parameters:
- segment_name (str) – The name of the segment.
- segmented_data (pandas.DataFrame) – Hourly temperature data for the segment.
- occupancy_lookup (pandas.DataFrame) – A dataframe with occupancy flags for each hour of the week and each segment. Segment names are columns, occupancy flags are 0 or 1.
- occupied_temperature_bins (pandas.DataFrame) – A dataframe of bin endpoint flags for each segment. Segment names are columns.
- unoccupied_temperature_bins (pandas.DataFrame) – Ditto for the unoccupied mode.
Returns: features – A dataframe of features with the following columns:
- 'meter_value': the observed meter value
- 'hour_of_week': 0-167
- 'bin_<0-6>_occupied': temp bin feature, or 0 if unoccupied
- 'bin_<0-6>_unoccupied': temp bin feature, or 0 if occupied
- 'weight': 0.0 or 0.5 or 1.0
Return type: pandas.DataFrame
-
eemeter.
caltrack_hourly_prediction_feature_processor
(segment_name, segmented_data, occupancy_lookup, occupied_temperature_bins, unoccupied_temperature_bins)[source]¶ A function that takes in temperature data and returns a dataframe of features suitable for use inside
eemeter.CalTRACKHourlyModel. Designed for use with eemeter.iterate_segmented_dataset.
Parameters:
- segment_name (str) – The name of the segment.
- segmented_data (pandas.DataFrame) – Hourly temperature data for the segment.
- occupancy_lookup (pandas.DataFrame) – A dataframe with occupancy flags for each hour of the week and each segment. Segment names are columns, occupancy flags are 0 or 1.
- occupied_temperature_bins (pandas.DataFrame) – A dataframe of bin endpoint flags for each segment. Segment names are columns.
- unoccupied_temperature_bins (pandas.DataFrame) – Ditto for the unoccupied mode.
Returns: features – A dataframe of features with the following columns:
- 'hour_of_week': 0-167
- 'bin_<0-6>_occupied': temp bin feature, or 0 if unoccupied
- 'bin_<0-6>_unoccupied': temp bin feature, or 0 if occupied
- 'weight': 1
Return type: pandas.DataFrame
-
eemeter.
fit_caltrack_hourly_model_segment
(segment_name, segment_data)[source]¶ Fit a model for a single segment.
Parameters:
- segment_name (str) – The name of the segment.
- segment_data (pandas.DataFrame) – A design matrix for CalTRACK hourly, of the form returned by eemeter.caltrack_hourly_fit_feature_processor.
Returns: segment_model – An object representing the fitted segment model.
Return type: eemeter.CalTRACKSegmentModel
-
eemeter.
fit_caltrack_hourly_model
(segmented_design_matrices, occupancy_lookup, occupied_temperature_bins, unoccupied_temperature_bins)[source]¶ Fit a CalTRACK hourly model
Parameters:
- segmented_design_matrices (dict of pandas.DataFrame) – A dictionary of dataframes of the form returned by eemeter.create_caltrack_hourly_segmented_design_matrices.
- occupancy_lookup (pandas.DataFrame) – A dataframe with occupancy flags for each hour of the week and each segment. Segment names are columns, occupancy flags are 0 or 1.
- occupied_temperature_bins (pandas.DataFrame) – A dataframe of bin endpoint flags for each segment. Segment names are columns.
- unoccupied_temperature_bins (pandas.DataFrame) – Ditto for the unoccupied mode.
Returns: model – Has a model.predict method which takes input data and makes a prediction using this model.
CalTRACK Daily and Billing (Usage per Day)
These classes and functions are designed to assist with running the CalTRACK Daily and Billing methods. See also Quickstart for CalTRACK Billing/Daily.
-
class
eemeter.
CalTRACKUsagePerDayCandidateModel
(model_type, formula, status, model_params=None, model=None, result=None, r_squared_adj=None, warnings=None)[source]¶ Contains information about a candidate model.
-
formula
¶ The R-style formula for the design matrix of this model, e.g.,
'meter_value ~ hdd_65'.
Type: str
-
status
¶ A string indicating the status of this model. Possible statuses:
- 'NOT ATTEMPTED': Candidate model not fitted due to an issue encountered in data before attempt.
- 'ERROR': A fatal error occurred during model fit process.
- 'DISQUALIFIED': The candidate model fit was disqualified from the model selection process because of a decision made after candidate model fit completed, e.g., a bad fit, or a parameter out of acceptable range.
- 'QUALIFIED': The candidate model fit is acceptable and can be considered during model selection.
Type: str
-
model_params
¶ A flat dictionary of model parameters which must be serializable using the
json.dumps method.
Type: dict, default None
-
warnings
¶ A list of any warnings reported during creation of the candidate model.
Type: list of eemeter.EEMeterWarning
-
classmethod
from_json
(data)[source]¶ Loads a JSON-serializable representation into the model state.
The input of this function is a dict which can be the result of
json.loads
.
-
json
()[source]¶ Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with
json.dumps
.
-
-
class
eemeter.
CalTRACKUsagePerDayModelResults
(status, method_name, interval=None, model=None, r_squared_adj=None, candidates=None, warnings=None, metadata=None, settings=None)[source]¶ Contains information about the chosen model.
-
status
¶ A string indicating the status of this result. Possible statuses:
- 'NO DATA': No baseline data was available.
- 'NO MODEL': No candidate models qualified.
- 'SUCCESS': A qualified candidate model was chosen.
Type: str
-
model
¶ The selected candidate model, if any.
Type: eemeter.CalTRACKUsagePerDayCandidateModel or None
-
candidates
¶ A list of any model candidates encountered during the model selection and fitting process.
Type: list of eemeter.CalTRACKUsagePerDayCandidateModel
-
warnings
¶ A list of any warnings reported during the model selection and fitting process.
Type: list of eemeter.EEMeterWarning
-
metadata
¶ An arbitrary dictionary of metadata to be associated with this result. This can be used, for example, to tag the results with attributes like an ID:
{ 'id': 'METER_12345678', }
Type: dict
-
totals_metrics
¶ A ModelMetrics object, if one is calculated and associated with this model. (This initializes to None.) The ModelMetrics object contains model fit information and descriptive statistics about the underlying data, with that data expressed as period totals.
Type: ModelMetrics
-
avgs_metrics
¶ A ModelMetrics object, if one is calculated and associated with this model. (This initializes to None.) The ModelMetrics object contains model fit information and descriptive statistics about the underlying data, with that data expressed as daily averages.
Type: ModelMetrics
-
classmethod
from_json
(data)[source]¶ Loads a JSON-serializable representation into the model state.
The input of this function is a dict which can be the result of
json.loads
.
-
json
(with_candidates=False)[source]¶ Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with
json.dumps
.
-
plot
(ax=None, title=None, figsize=None, with_candidates=False, candidate_alpha=None, temp_range=None)[source]¶ Plot a model fit.
Parameters: - ax (
matplotlib.axes.Axes
, optional) – Existing axes to plot on. - title (
str
, optional) – Chart title. - figsize (
tuple
, optional) – (width, height) of chart. - with_candidates (
bool
) – If True, also plot candidate models. - candidate_alpha (
float
between 0 and 1) – Transparency at which to plot candidate models. 0 fully transparent, 1 fully opaque.
Returns: ax – Matplotlib axes.
Return type: matplotlib.axes.Axes
-
-
class
eemeter.
DataSufficiency
(status, criteria_name, warnings=None, data=None, settings=None)[source]¶ Contains the result of a data sufficiency check.
-
status
¶ A string indicating the status of this result. Possible statuses:
- 'NO DATA': No baseline data was available.
- 'FAIL': Data did not meet criteria.
- 'PASS': Data met criteria.
Type: str
-
criteria_name
¶ The name of the criteria method used to check for baseline data sufficiency.
Type: str
-
warnings
¶ A list of any warnings reported during the check for baseline data sufficiency.
Type: list of eemeter.EEMeterWarning
-
json
()[source]¶ Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with
json.dumps
.
-
-
class
eemeter.
ModelPrediction
(result, design_matrix, warnings)¶ -
design_matrix
¶ Alias for field number 1
-
result
¶ Alias for field number 0
-
warnings
¶ Alias for field number 2
-
-
eemeter.
fit_caltrack_usage_per_day_model
(data, fit_cdd=True, use_billing_presets=False, minimum_non_zero_cdd=10, minimum_non_zero_hdd=10, minimum_total_cdd=20, minimum_total_hdd=20, beta_cdd_maximum_p_value=1, beta_hdd_maximum_p_value=1, weights_col=None, fit_intercept_only=True, fit_cdd_only=True, fit_hdd_only=True, fit_cdd_hdd=True)[source]¶ CalTRACK daily and billing methods using a usage-per-day modeling strategy.
Parameters:
- data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and 1 to n columns each of the form hdd_<heating_balance_point> and cdd_<cooling_balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods. Should have a pandas.DatetimeIndex.
- fit_cdd (bool, optional) – If True, fit CDD models unless overridden by fit_cdd_only or fit_cdd_hdd flags. Should be set to False for gas meter data.
- use_billing_presets (bool, optional) – Use presets appropriate for billing models. Otherwise defaults are appropriate for daily models.
- minimum_non_zero_cdd (int, optional) – Minimum allowable number of non-zero cooling degree day values.
- minimum_non_zero_hdd (int, optional) – Minimum allowable number of non-zero heating degree day values.
- minimum_total_cdd (float, optional) – Minimum allowable total sum of cooling degree day values.
- minimum_total_hdd (float, optional) – Minimum allowable total sum of heating degree day values.
- beta_cdd_maximum_p_value (float, optional) – The maximum allowable p-value of the beta cdd parameter. The default value is the most permissive possible (i.e., 1). This is here for backwards compatibility with CalTRACK 1.0 methods.
- beta_hdd_maximum_p_value (float, optional) – The maximum allowable p-value of the beta hdd parameter. The default value is the most permissive possible (i.e., 1). This is here for backwards compatibility with CalTRACK 1.0 methods.
- weights_col (str or None, optional) – The name of the column (if any) in data to use as weights. Weight must be the number of days of data in the period.
- fit_intercept_only (bool, optional) – If True, fit and consider intercept_only model candidates.
- fit_cdd_only (bool, optional) – If True, fit and consider cdd_only model candidates. Ignored if fit_cdd=False.
- fit_hdd_only (bool, optional) – If True, fit and consider hdd_only model candidates.
- fit_cdd_hdd (bool, optional) – If True, fit and consider cdd_hdd model candidates. Ignored if fit_cdd=False.
Returns: model_results – Results of running CalTRACK daily method. See eemeter.CalTRACKUsagePerDayModelResults for more details.
Return type: eemeter.CalTRACKUsagePerDayModelResults
-
eemeter.
caltrack_sufficiency_criteria
(data_quality, requested_start, requested_end, num_days=365, min_fraction_daily_coverage=0.9, min_fraction_hourly_temperature_coverage_per_period=0.9)[source]¶ CalTRACK daily data sufficiency criteria.
Note
For CalTRACK compliance, min_fraction_daily_coverage must be set at 0.9 (section 2.2.1.2), and requested_start and requested_end must not be None (section 2.2.4).
Parameters:
- data_quality (pandas.DataFrame) – A DataFrame containing at least the column meter_value and the two columns temperature_null, containing a count of null hourly temperature values for each meter value, and temperature_not_null, containing a count of not-null hourly temperature values for each meter value. Should have a pandas.DatetimeIndex.
- requested_start (datetime.datetime, timezone aware (or None)) – The desired start of the period, if any, especially if this is different from the start of the data. If given, warnings are reported on the basis of this start date instead of data start date. Must be explicitly set to None in order to use data start date.
- requested_end (datetime.datetime, timezone aware (or None)) – The desired end of the period, if any, especially if this is different from the end of the data. If given, warnings are reported on the basis of this end date instead of data end date. Must be explicitly set to None in order to use data end date.
- num_days (int, optional) – Exact number of days allowed in data, including extent given by requested_start or requested_end, if given.
- min_fraction_daily_coverage (float, optional) – Minimum fraction of days of data in total data extent for which data must be available.
- min_fraction_hourly_temperature_coverage_per_period (float, optional) – Minimum fraction of hours of temperature data coverage in a particular period. Anything below this causes the whole period to be considered missing.
Returns: data_sufficiency – An object containing sufficiency status and warnings for this data.
Return type: eemeter.DataSufficiency
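The daily coverage criterion can be pictured with a small standard-library sketch. The helper `coverage_check` below is hypothetical and covers only the min_fraction_daily_coverage idea; the real function also considers the requested start and end dates, the number of days, and hourly temperature coverage, none of which are modeled here.

```python
def coverage_check(daily_values, min_fraction_daily_coverage=0.9):
    """Hypothetical helper: status based on the fraction of days that have data."""
    if not daily_values:
        return "NO DATA", 0.0
    # A day counts as covered when its value is present (not None).
    n_valid = sum(1 for value in daily_values if value is not None)
    fraction = n_valid / len(daily_values)
    status = "PASS" if fraction >= min_fraction_daily_coverage else "FAIL"
    return status, fraction


# 330 of 365 days present is just above the 0.9 threshold.
status, fraction = coverage_check([1.0] * 330 + [None] * 35)
```

Lowering the number of covered days to 300 of 365 drops the fraction to roughly 0.82, which this sketch reports as a failure.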
-
eemeter.
caltrack_usage_per_day_predict
(model_type, model_params, prediction_index, temperature_data, degree_day_method='daily', with_disaggregated=False, with_design_matrix=False)[source]¶ CalTRACK predict method.
Given a model type, parameters, hourly temperatures, and a pandas.DatetimeIndex index over which to predict meter usage, return model predictions as totals for the period (so billing period totals, daily totals, etc.). Optionally include the computed design matrix or disaggregated usage in the output dataframe.
Parameters:
- model_type (str) – Model type (e.g., 'cdd_hdd').
- model_params (dict) – Parameters as stored in eemeter.CalTRACKUsagePerDayCandidateModel.model_params.
- temperature_data (pandas.DataFrame) – Hourly temperature data to use for prediction. Time period should match the prediction_index argument.
- prediction_index (pandas.DatetimeIndex) – Time period over which to predict.
- with_disaggregated (bool, optional) – If True, return results as a pandas.DataFrame with columns 'base_load', 'heating_load', and 'cooling_load'.
- with_design_matrix (bool, optional) – If True, return results as a pandas.DataFrame with columns 'n_days', 'n_days_dropped', 'n_days_kept', and 'temperature_mean'.
Returns:
- prediction (pandas.DataFrame) – Columns are as follows:
  - predicted_usage: Predicted usage values computed to match prediction_index.
  - base_load: modeled base load (only for with_disaggregated=True).
  - cooling_load: modeled cooling load (only for with_disaggregated=True).
  - heating_load: modeled heating load (only for with_disaggregated=True).
  - n_days: number of days in period (only for with_design_matrix=True).
  - n_days_dropped: number of days dropped because of insufficient data (only for with_design_matrix=True).
  - n_days_kept: number of days kept because of sufficient data (only for with_design_matrix=True).
  - temperature_mean: mean temperature during given period (only for with_design_matrix=True).
- predict_warnings – A list of EEMeterWarning, if any.
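The relationship between predicted_usage and the disaggregated base_load, heating_load, and cooling_load columns can be sketched for a single period as follows. The helper `predict_usage` and the parameter keys it reads (intercept, beta_hdd, beta_cdd, heating_balance_point, cooling_balance_point) are assumptions chosen for illustration; check the actual contents of model_params before relying on these names.

```python
def predict_usage(model_params, avg_temp_f, n_days=1):
    """Hypothetical helper: period totals from a cdd_hdd-style linear model."""
    # Average daily degree days relative to the assumed balance points.
    hdd = max(model_params["heating_balance_point"] - avg_temp_f, 0.0)
    cdd = max(avg_temp_f - model_params["cooling_balance_point"], 0.0)
    # Scale per-day quantities by the number of days to get period totals.
    base_load = model_params["intercept"] * n_days
    heating_load = model_params["beta_hdd"] * hdd * n_days
    cooling_load = model_params["beta_cdd"] * cdd * n_days
    return {
        "predicted_usage": base_load + heating_load + cooling_load,
        "base_load": base_load,
        "heating_load": heating_load,
        "cooling_load": cooling_load,
    }


params = {
    "intercept": 10.0,
    "beta_hdd": 0.5,
    "beta_cdd": 1.0,
    "heating_balance_point": 60.0,
    "cooling_balance_point": 70.0,
}
# A 30-day period averaging 50 F: heating only, no cooling.
prediction = predict_usage(params, avg_temp_f=50.0, n_days=30)
```

In this sketch the three disaggregated loads always sum to predicted_usage, which is the property the disaggregated output is meant to expose.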
-
eemeter.
plot_caltrack_candidate
(candidate, best=False, ax=None, title=None, figsize=None, temp_range=None, alpha=None, **kwargs)[source]¶ Plot a CalTRACK candidate model.
Parameters: - candidate (
eemeter.CalTRACKUsagePerDayCandidateModel
) – A candidate model with a predict function. - best (
bool
, optional) – Whether this is the best candidate or not. - ax (
matplotlib.axes.Axes
, optional) – Existing axes to plot on. - title (
str
, optional) – Chart title. - figsize (
tuple
, optional) – (width, height) of chart. - temp_range (
tuple
, optional) – (min, max) temperatures to plot model. - alpha (
float
between 0 and 1, optional) – Transparency, 0 fully transparent, 1 fully opaque. - **kwargs – Keyword arguments for
matplotlib.axes.Axes.plot
Returns: ax – Matplotlib axes.
Return type: matplotlib.axes.Axes
-
eemeter.
get_too_few_non_zero_degree_day_warning
(model_type, balance_point, degree_day_type, degree_days, minimum_non_zero)[source]¶ Return an empty list or a single warning wrapped in a list regarding non-zero degree days for a set of degree days.
Parameters:
- model_type (str) – Model type (e.g., 'cdd_hdd').
- balance_point (float) – The balance point in question.
- degree_day_type (str) – The type of degree days ('cdd' or 'hdd').
- degree_days (pandas.Series) – A series of degree day values.
- minimum_non_zero (int) – Minimum allowable number of non-zero degree day values.
Returns: warnings – Empty list or list of single warning.
Return type: list of eemeter.EEMeterWarning
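The "empty list or a single warning wrapped in a list" convention makes it easy to concatenate the results of several checks. A minimal sketch of the non-zero count check follows; the helper name `too_few_non_zero_warning` and the plain-dict warning shape are hypothetical stand-ins, not the eemeter warning class.

```python
def too_few_non_zero_warning(degree_days, minimum_non_zero):
    """Hypothetical helper: [] if enough non-zero values, else one warning in a list."""
    n_non_zero = sum(1 for value in degree_days if value > 0)
    if n_non_zero < minimum_non_zero:
        # A plain dict stands in for a warning object in this sketch.
        return [
            {
                "description": "Too few non-zero degree day values.",
                "data": {"n_non_zero": n_non_zero, "minimum_non_zero": minimum_non_zero},
            }
        ]
    return []


warnings = []
warnings.extend(too_few_non_zero_warning([0.0, 0.0, 1.5, 2.0], minimum_non_zero=10))
warnings.extend(too_few_non_zero_warning([1.0] * 12, minimum_non_zero=10))
```

Because every check returns a list, callers can extend one accumulator without testing for None.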
-
eemeter.
get_total_degree_day_too_low_warning
(model_type, balance_point, degree_day_type, avg_degree_days, period_days, minimum_total)[source]¶ Return an empty list or a single warning wrapped in a list regarding the total summed degree day values.
Parameters:
- model_type (str) – Model type (e.g., 'cdd_hdd').
- balance_point (float) – The balance point in question.
- degree_day_type (str) – The type of degree days ('cdd' or 'hdd').
- avg_degree_days (pandas.Series) – A series of degree day values.
- period_days (pandas.Series) – A series containing day counts.
- minimum_total (float) – Minimum allowable total sum of degree day values.
Returns: warnings – Empty list or list of single warning.
Return type: list of eemeter.EEMeterWarning
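One plausible reading of this check is that the total is the sum of average degree days multiplied by the number of days in each period. The sketch below implements that reading with a hypothetical helper, `total_degree_day_warning`; both the formula and the plain-dict warning shape are assumptions for illustration.

```python
def total_degree_day_warning(avg_degree_days, period_days, minimum_total):
    """Hypothetical helper: [] if the total is high enough, else one warning in a list."""
    # Assumed formula: per-period average times day count, summed over periods.
    total = sum(avg * days for avg, days in zip(avg_degree_days, period_days))
    if total < minimum_total:
        return [
            {
                "description": "Total degree days below minimum.",
                "data": {"total": total, "minimum_total": minimum_total},
            }
        ]
    return []


# Two billing periods: 0.5 * 30 + 1.0 * 31 = 46.0 total degree days.
ok = total_degree_day_warning([0.5, 1.0], [30, 31], minimum_total=20)
too_low = total_degree_day_warning([0.5, 1.0], [30, 31], minimum_total=50)
```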
-
eemeter.
get_parameter_negative_warning
(model_type, model_params, parameter)[source]¶ Return an empty list or a single warning wrapped in a list indicating whether model parameter is negative.
Parameters:
- model_type (str) – Model type (e.g., 'cdd_hdd').
- model_params (dict) – Parameters as stored in eemeter.CalTRACKUsagePerDayCandidateModel.model_params.
- parameter (str) – The name of the parameter, e.g., 'intercept'.
Returns: warnings – Empty list or list of single warning.
Return type: list of eemeter.EEMeterWarning
-
eemeter.
get_parameter_p_value_too_high_warning
(model_type, model_params, parameter, p_value, maximum_p_value)[source]¶ Return an empty list or a single warning wrapped in a list indicating whether model parameter p-value is too high.
Parameters:
- model_type (str) – Model type (e.g., 'cdd_hdd').
- model_params (dict) – Parameters as stored in eemeter.CalTRACKUsagePerDayCandidateModel.model_params.
- parameter (str) – The name of the parameter, e.g., 'intercept'.
- p_value (float) – The p-value of the parameter.
- maximum_p_value (float) – The maximum allowable p-value of the parameter.
Returns: warnings – Empty list or list of single warning.
Return type: list of eemeter.EEMeterWarning
-
eemeter.
get_single_cdd_only_candidate_model
(data, minimum_non_zero_cdd, minimum_total_cdd, beta_cdd_maximum_p_value, weights_col, balance_point)[source]¶ Return a single candidate cdd-only model for a particular balance point.
Parameters:
- data (pandas.DataFrame) – A DataFrame containing at least the columns meter_value and cdd_<balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
- minimum_non_zero_cdd (int) – Minimum allowable number of non-zero cooling degree day values.
- minimum_total_cdd (float) – Minimum allowable total sum of cooling degree day values.
- beta_cdd_maximum_p_value (float) – The maximum allowable p-value of the beta cdd parameter.
- weights_col (str or None) – The name of the column (if any) in data to use as weights.
- balance_point (float) – The cooling balance point for this model.
Returns: candidate_model – A single cdd-only candidate model, with any associated warnings.
Return type: eemeter.CalTRACKUsagePerDayCandidateModel
-
eemeter.
get_single_hdd_only_candidate_model
(data, minimum_non_zero_hdd, minimum_total_hdd, beta_hdd_maximum_p_value, weights_col, balance_point)[source]¶ Return a single candidate hdd-only model for a particular balance point.
Parameters:
- data (pandas.DataFrame) – A DataFrame containing at least the columns meter_value and hdd_<balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
- minimum_non_zero_hdd (int) – Minimum allowable number of non-zero heating degree day values.
- minimum_total_hdd (float) – Minimum allowable total sum of heating degree day values.
- beta_hdd_maximum_p_value (float) – The maximum allowable p-value of the beta hdd parameter.
- weights_col (str or None) – The name of the column (if any) in data to use as weights.
- balance_point (float) – The heating balance point for this model.
Returns: candidate_model – A single hdd-only candidate model, with any associated warnings.
Return type: eemeter.CalTRACKUsagePerDayCandidateModel
-
eemeter.
get_single_cdd_hdd_candidate_model
(data, minimum_non_zero_cdd, minimum_non_zero_hdd, minimum_total_cdd, minimum_total_hdd, beta_cdd_maximum_p_value, beta_hdd_maximum_p_value, weights_col, cooling_balance_point, heating_balance_point)[source]¶ Return and fit a single candidate cdd_hdd model for a particular selection of cooling balance point and heating balance point
Parameters:
- data (pandas.DataFrame) – A DataFrame containing at least the columns meter_value, hdd_<heating_balance_point>, and cdd_<cooling_balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
- minimum_non_zero_cdd (int) – Minimum allowable number of non-zero cooling degree day values.
- minimum_non_zero_hdd (int) – Minimum allowable number of non-zero heating degree day values.
- minimum_total_cdd (float) – Minimum allowable total sum of cooling degree day values.
- minimum_total_hdd (float) – Minimum allowable total sum of heating degree day values.
- beta_cdd_maximum_p_value (float) – The maximum allowable p-value of the beta cdd parameter.
- beta_hdd_maximum_p_value (float) – The maximum allowable p-value of the beta hdd parameter.
- weights_col (str or None) – The name of the column (if any) in data to use as weights.
- cooling_balance_point (float) – The cooling balance point for this model.
- heating_balance_point (float) – The heating balance point for this model.
Returns: candidate_model – A single cdd-hdd candidate model, with any associated warnings.
Return type: eemeter.CalTRACKUsagePerDayCandidateModel
-
eemeter.
get_intercept_only_candidate_models
(data, weights_col)[source]¶ Return a list of a single candidate intercept-only model.
Parameters:
- data (pandas.DataFrame) – A DataFrame containing at least the column meter_value. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
- weights_col (str or None) – The name of the column (if any) in data to use as weights.
Returns: candidate_models – List containing a single intercept-only candidate model.
Return type: list of eemeter.CalTRACKUsagePerDayCandidateModel
-
eemeter.
get_cdd_only_candidate_models
(data, minimum_non_zero_cdd, minimum_total_cdd, beta_cdd_maximum_p_value, weights_col)[source]¶ Return a list of all possible candidate cdd-only models.
Parameters:
- data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and 1 to n columns with names of the form cdd_<balance_point>. All columns with names of this form will be used to fit a candidate model. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
- minimum_non_zero_cdd (int) – Minimum allowable number of non-zero cooling degree day values.
- minimum_total_cdd (float) – Minimum allowable total sum of cooling degree day values.
- beta_cdd_maximum_p_value (float) – The maximum allowable p-value of the beta cdd parameter.
- weights_col (str or None) – The name of the column (if any) in data to use as weights.
Returns: candidate_models – A list of cdd-only candidate models, with any associated warnings.
Return type: list of eemeter.CalTRACKUsagePerDayCandidateModel
-
eemeter.
get_hdd_only_candidate_models
(data, minimum_non_zero_hdd, minimum_total_hdd, beta_hdd_maximum_p_value, weights_col)[source]¶ Return a list of all possible candidate hdd-only models.
Parameters:
- data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and 1 to n columns with names of the form hdd_<balance_point>. All columns with names of this form will be used to fit a candidate model. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
- minimum_non_zero_hdd (int) – Minimum allowable number of non-zero heating degree day values.
- minimum_total_hdd (float) – Minimum allowable total sum of heating degree day values.
- beta_hdd_maximum_p_value (float) – The maximum allowable p-value of the beta hdd parameter.
- weights_col (str or None) – The name of the column (if any) in data to use as weights.
Returns: candidate_models – A list of hdd-only candidate models, with any associated warnings.
Return type: list of eemeter.CalTRACKUsagePerDayCandidateModel
-
eemeter.
get_cdd_hdd_candidate_models
(data, minimum_non_zero_cdd, minimum_non_zero_hdd, minimum_total_cdd, minimum_total_hdd, beta_cdd_maximum_p_value, beta_hdd_maximum_p_value, weights_col)[source]¶ Return a list of candidate cdd_hdd models for a particular selection of cooling balance point and heating balance point
Parameters: - data (
pandas.DataFrame
) – A DataFrame containing at least the column meter_value
and 1 to n columns each of the form hdd_<heating_balance_point>
and cdd_<cooling_balance_point>
. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix
or eemeter.create_caltrack_billing_design_matrix
methods. - minimum_non_zero_cdd (
int
) – Minimum allowable number of non-zero cooling degree day values. - minimum_non_zero_hdd (
int
) – Minimum allowable number of non-zero heating degree day values. - minimum_total_cdd (
float
) – Minimum allowable total sum of cooling degree day values. - minimum_total_hdd (
float
) – Minimum allowable total sum of heating degree day values. - beta_cdd_maximum_p_value (
float
) – The maximum allowable p-value of the beta cdd parameter. - beta_hdd_maximum_p_value (
float
) – The maximum allowable p-value of the beta hdd parameter. - weights_col (
str
or None) – The name of the column (if any) in data
to use as weights.
Returns: candidate_models – A list of cdd_hdd candidate models, with any associated warnings.
Return type: - data (
-
eemeter.
select_best_candidate
(candidate_models)[source]¶ Select and return the best candidate model based on r-squared and qualification.
Parameters: candidate_models ( list
of eemeter.CalTRACKUsagePerDayCandidateModel
) – Candidate models to select from.
Returns: (best_candidate, warnings) – The candidate model with the highest r-squared, or None if none meet the requirements, and a list of warnings about this selection (or lack of selection).
Return type: tuple
of eemeter.CalTRACKUsagePerDayCandidateModel
or None
and list
of eemeter.EEMeterWarning
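The selection rule described above (keep only qualified candidates, then take the highest r-squared) can be sketched in plain Python. The `Candidate` tuple, its `qualified` flag, and the warning string are hypothetical stand-ins for illustration, not eemeter objects.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    """Hypothetical stand-in for a fitted candidate model."""
    name: str
    r_squared: float
    qualified: bool


def pick_best(candidates: List[Candidate]) -> Tuple[Optional[Candidate], List[str]]:
    # Only candidates that passed their qualification checks are eligible.
    eligible = [c for c in candidates if c.qualified]
    if not eligible:
        # Nothing to select: return None plus a (hypothetical) warning message.
        return None, ["no qualified candidate models"]
    # Among eligible candidates, prefer the highest r-squared.
    return max(eligible, key=lambda c: c.r_squared), []


candidates = [
    Candidate("cdd_only", 0.71, True),
    Candidate("hdd_only", 0.64, True),
    Candidate("cdd_hdd", 0.93, False),  # best fit, but disqualified
]
best, selection_warnings = pick_best(candidates)
```

In this sketch the disqualified `cdd_hdd` candidate is skipped even though its fit is best, so `cdd_only` is selected.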
Savings¶
These methods are designed for computing metered and normal year savings.
-
eemeter.
metered_savings
(baseline_model, reporting_meter_data, temperature_data, with_disaggregated=False, confidence_level=0.9, predict_kwargs=None)[source]¶ Compute metered savings, i.e., savings in which the baseline model is used to calculate the modeled usage in the reporting period. This modeled usage is then compared to the actual usage from the reporting period. Also compute two measures of the uncertainty of the aggregate savings estimate, a fractional savings uncertainty (FSU) error band and an OLS error band. (To convert the FSU error band into FSU, divide by total estimated savings.)
Parameters: - baseline_model (
eemeter.CalTRACKUsagePerDayModelResults
) – Object to use for predicting pre-intervention usage. - reporting_meter_data (
pandas.DataFrame
) – The observed reporting period data (totals). Savings will be computed for the periods supplied in the reporting period data. - temperature_data (
pandas.Series
) – Hourly-frequency timeseries of temperature data during the reporting period. - with_disaggregated (
bool
, optional) – If True, calculate baseline counterfactual disaggregated usage estimates. Savings cannot be disaggregated for metered savings. For that, use eemeter.modeled_savings
. - confidence_level (
float
, optional) – The two-tailed confidence level used to calculate the t-statistic for the error bands.
Ignored if not computing error bands.
- predict_kwargs (
dict
, optional) – Extra kwargs to pass to the baseline_model.predict method.
Returns: results (
pandas.DataFrame
) – DataFrame with metered savings, indexed with reporting_meter_data.index
. Will include the following columns:
counterfactual_usage
(baseline model projected into reporting period)
reporting_observed
(given by reporting_meter_data)
metered_savings
If with_disaggregated is set to True, the following columns will also be in the results DataFrame:
counterfactual_base_load
counterfactual_heating_load
counterfactual_cooling_load
error_bands (
dict
, optional) – If baseline_model is an instance of CalTRACKUsagePerDayModelResults, will also return a dictionary of FSU and OLS error bands for the aggregated energy savings over the post period.
- baseline_model (
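The arithmetic behind the metered_savings column and the FSU conversion mentioned above can be sketched without the library. The usage values and the error band below are made-up numbers, and the function name is hypothetical.

```python
def metered_savings_sketch(counterfactual_usage, reporting_observed):
    # Savings per period: baseline-model prediction minus metered usage.
    return [c - o for c, o in zip(counterfactual_usage, reporting_observed)]


counterfactual = [30.0, 28.5, 31.0]  # hypothetical counterfactual_usage values
observed = [25.0, 27.0, 26.5]        # hypothetical reporting_observed values

savings = metered_savings_sketch(counterfactual, observed)
total_savings = sum(savings)

# Converting an FSU error band into FSU: divide by total estimated savings.
fsu_error_band = 1.1  # hypothetical value, not computed here
fsu = fsu_error_band / total_savings
```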
-
eemeter.
modeled_savings
(baseline_model, reporting_model, result_index, temperature_data, with_disaggregated=False, confidence_level=0.9, predict_kwargs=None)[source]¶ Compute modeled savings, i.e., savings in which baseline and reporting usage values are based on models. This is appropriate for annualizing or weather normalizing models.
Parameters: - baseline_model (
eemeter.CalTRACKUsagePerDayCandidateModel
) – Model to use for predicting pre-intervention usage. - reporting_model (
eemeter.CalTRACKUsagePerDayCandidateModel
) – Model to use for predicting post-intervention usage. - result_index (
pandas.DatetimeIndex
) – The dates for which usage should be modeled. - temperature_data (
pandas.Series
) – Hourly-frequency timeseries of temperature data during the modeled period. - with_disaggregated (
bool
, optional) – If True, calculate modeled disaggregated usage estimates and savings. - confidence_level (
float
, optional) – The two-tailed confidence level used to calculate the t-statistic for the error bands.
Ignored if not computing error bands.
- predict_kwargs (
dict
, optional) – Extra kwargs to pass to the baseline_model.predict and reporting_model.predict methods.
Returns: results (
pandas.DataFrame
) – DataFrame with modeled savings, indexed with the result_index. Will include the following columns:
modeled_baseline_usage
modeled_reporting_usage
modeled_savings
If with_disaggregated is set to True, the following columns will also be in the results DataFrame:
modeled_baseline_base_load
modeled_baseline_cooling_load
modeled_baseline_heating_load
modeled_reporting_base_load
modeled_reporting_cooling_load
modeled_reporting_heating_load
modeled_base_load_savings
modeled_cooling_load_savings
modeled_heating_load_savings
error_bands (
dict
, optional) – If baseline_model and reporting_model are instances of CalTRACKUsagePerDayModelResults, will also return a dictionary of FSU and error bands for the aggregated energy savings over the normal year period.
- baseline_model (
Exceptions¶
These exceptions are used in the package to indicate various common issues.
Features¶
These methods are used to compute features that are used in creating CalTRACK models.
-
eemeter.
compute_usage_per_day_feature
(meter_data, series_name='usage_per_day')[source]¶ Compute average usage per day for billing/daily data.
Parameters: - meter_data (
pandas.DataFrame
) – Meter data for which to compute usage per day. - series_name (
str
) – Name of the output pandas series
Returns: usage_per_day_feature – The usage per day feature.
Return type: - meter_data (
-
eemeter.
compute_occupancy_feature
(hour_of_week, occupancy)[source]¶ Given an hour of week feature, determine the occupancy for that hour of week.
Parameters: - hour_of_week (
pandas.Series
) – Hour of week feature as given by eemeter.compute_time_features
. - occupancy (
pandas.Series
) – Boolean occupancy assignments for each hour of week as determined by eemeter.estimate_hour_of_week_occupancy
Returns: occupancy_feature – Occupancy labels for the timeseries.
Return type: - hour_of_week (
-
eemeter.
compute_temperature_features
(meter_data_index, temperature_data, heating_balance_points=None, cooling_balance_points=None, data_quality=False, temperature_mean=True, degree_day_method='daily', percent_hourly_coverage_per_day=0.5, percent_hourly_coverage_per_billing_period=0.9, use_mean_daily_values=True, tolerance=None, keep_partial_nan_rows=False)[source]¶ Compute temperature features from hourly temperature data using the
pandas.DatetimeIndex
meter data. Creates a
pandas.DataFrame
with the same index as the meter data.
Note
For CalTRACK compliance (2.2.2.3), must set
percent_hourly_coverage_per_day=0.5
, cooling_balance_points=range(30,90,X)
, and heating_balance_points=range(30,90,X)
, where X is either 1, 2, or 3. For natural gas meter use data, must set fit_cdd=False
.
Note
For CalTRACK compliance (2.2.3.2), for billing methods, must set
percent_hourly_coverage_per_billing_period=0.9
.
Note
For CalTRACK compliance (2.3.3),
meter_data_index
and temperature_data
must both be timezone-aware and have matching timezones.
Note
For CalTRACK compliance (3.3.1.1), for billing methods, must set
use_mean_daily_values=True
.
Note
For CalTRACK compliance (3.3.1.2), for daily or billing methods, must set
degree_day_method='daily'
.
Parameters: - meter_data_index (
pandas.DatetimeIndex
) – A pandas.DatetimeIndex
corresponding to the index over which to compute temperature features. - temperature_data (
pandas.Series
) – Series with pandas.DatetimeIndex
with hourly ('H'
) frequency and a set of temperature values. - cooling_balance_points (
list
of int
or float
, optional) – List of cooling balance points for which to create cooling degree days. - heating_balance_points (
list
of int
or float
, optional) – List of heating balance points for which to create heating degree days. - data_quality (
bool
, optional) – If True, compute data quality columns for temperature, i.e., temperature_not_null
and temperature_null
, containing for each meter value - temperature_mean (
bool
, optional) – If True, compute temperature means for each meter period. - degree_day_method (
str
,'daily'
or'hourly'
) – The method to use in calculating degree days. - percent_hourly_coverage_per_day (
float
, optional) – Percent hourly temperature coverage per day for heating and cooling degree days to not be dropped. - use_mean_daily_values (
bool
, optional) – If True, meter and degree day values should be mean daily values, not totals. If False, totals will be used instead. - tolerance (
pandas.Timedelta
, optional) – Do not merge more than this amount of temperature data beyond this limit. - keep_partial_nan_rows (
bool
, optional) – If True, keeps data in the resultant pandas.DataFrame
that has missing temperature or meter data. Otherwise, these rows are overwritten entirely with numpy.nan
values.
Returns: data – A dataset with the specified parameters.
Return type: - meter_data_index (
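To make the degree-day and coverage parameters concrete, here is a minimal sketch for a single day. It assumes the 'daily' method means "average the day's hourly temperatures, then take the difference from the balance point"; the function name and the `None`-for-missing convention are illustrative assumptions, not the eemeter implementation.

```python
def daily_degree_days(hourly_temps, heating_bp, cooling_bp, min_coverage=0.5):
    """Return (hdd, cdd) for one day of 24 hourly values (None = missing)."""
    valid = [t for t in hourly_temps if t is not None]
    # Drop the day if too few hourly readings are present
    # (compare percent_hourly_coverage_per_day).
    if len(valid) / 24 < min_coverage:
        return None, None
    mean_temp = sum(valid) / len(valid)
    # Heating degree days accrue below the heating balance point,
    # cooling degree days above the cooling balance point.
    hdd = max(heating_bp - mean_temp, 0.0)
    cdd = max(mean_temp - cooling_bp, 0.0)
    return hdd, cdd


cold_day = [40.0] * 24
sparse_day = [40.0] * 11 + [None] * 13  # only 11 of 24 hours present

cold_result = daily_degree_days(cold_day, heating_bp=60, cooling_bp=65)
sparse_result = daily_degree_days(sparse_day, heating_bp=60, cooling_bp=65)
```

With a balance point of 60, the 40-degree day yields 20 heating degree days and no cooling degree days, while the sparse day is dropped because its coverage is below 0.5.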
-
eemeter.
compute_temperature_bin_features
(temperatures, bin_endpoints)[source]¶ Compute temperature bin features.
Parameters: - temperatures (
pandas.Series
) – Hourly temperature data. - bin_endpoints (
list
of int
or float
) – List of bin endpoints to use when assigning features.
Returns: temperature_bin_features – A dataframe with the input index and one column per bin. The sum of each row (with all of the temperature bins) equals the input temperature. More details on this bin feature are available in the CalTRACK documentation.
Return type: - temperatures (
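One way to build bin features whose row sum equals the input temperature is to let each bin absorb the part of the temperature that falls within it. The sketch below works on a single temperature value and assumes at least one endpoint; it is an illustration consistent with the row-sum property stated above, not necessarily the library's exact algorithm.

```python
def temperature_bin_features(temperature, bin_endpoints):
    """Split one temperature into per-bin amounts that sum to the temperature."""
    edges = sorted(bin_endpoints)
    features = []
    lower = None
    for upper in edges:
        if lower is None:
            # First bin: everything up to the lowest endpoint.
            features.append(min(temperature, upper))
        else:
            # Interior bin: the portion of the temperature between two endpoints.
            features.append(min(max(temperature - lower, 0), upper - lower))
        lower = upper
    # Last bin: whatever exceeds the highest endpoint.
    features.append(max(temperature - lower, 0))
    return features


endpoints = [30, 45, 55, 65, 75, 90]
mild = temperature_bin_features(60, endpoints)
hot = temperature_bin_features(100, endpoints)
```

Six endpoints give seven bins; a 60-degree reading fills the first three bins completely and the fourth partially.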
-
eemeter.
compute_time_features
(index, hour_of_week=True, day_of_week=True, hour_of_day=True)[source]¶ Compute hour of week, day of week, or hour of day features.
Parameters: - index (
pandas.DatetimeIndex
) – Datetime index with hourly frequency. - hour_of_week (
bool
) – Include the hour_of_week feature. - day_of_week (
bool
) – Include the day_of_week feature. - hour_of_day (
bool
) – Include the hour_of_day feature.
Returns: time_features – A dataframe with the input datetime index and up to three columns
- hour_of_week : Label for hour of week, 0-167, 0 is 12-1am Monday
- day_of_week : Label for day of week, 0-6, 0 is Monday.
- hour_of_day : Label for hour of day, 0-23, 0 is 12-1am.
Return type: - index (
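The labels above follow directly from Python's Monday-is-zero weekday convention. A minimal sketch for a single timestamp (the function name and dict layout are illustrative assumptions):

```python
from datetime import datetime


def time_features(ts):
    """Compute the three labels for one timestamp."""
    day_of_week = ts.weekday()  # 0 is Monday, 6 is Sunday
    hour_of_day = ts.hour       # 0 is 12-1am
    return {
        "hour_of_week": day_of_week * 24 + hour_of_day,  # 0-167
        "day_of_week": day_of_week,
        "hour_of_day": hour_of_day,
    }


monday_midnight = time_features(datetime(2017, 1, 2, 0))  # a Monday
sunday_late = time_features(datetime(2017, 1, 1, 23))     # a Sunday
```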
-
eemeter.
estimate_hour_of_week_occupancy
(data, segmentation=None, threshold=0.65)[source]¶ Estimate occupancy features for each segment.
Parameters: - data (
pandas.DataFrame
) – Input data for the weighted least squares (“meter_value ~ cdd_65 + hdd_50”) used to estimate occupancy. Must contain meter_value, hour_of_week, cdd_65, and hdd_50 columns with an hourly pandas.DatetimeIndex
. - segmentation (
pandas.DataFrame
, default None) – A segmentation expressed as a dataframe which shares the timeseries index of the data and has named columns of weights, which are of the form returned by eemeter.segment_time_series
. - threshold (
float
, default 0.65) – To be marked as unoccupied, the ratio of points with negative residuals in the weighted least squares in a particular hour of week must exceed this threshold. Said another way, in the default case, if more than 35% of values are greater than the basic degree day model for any particular hour of the week, that hour of week is marked as being occupied.
Returns: occupancy_lookup – The occupancy lookup has a categorical index with values from 0 to 167 - one for each hour of the week, and boolean values indicating an occupied (1, True) or unoccupied (0, False) for each of the segments. Each segment has a column labeled by its segment name.
Return type: - data (
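The threshold rule can be illustrated once residuals are in hand. This sketch assumes residuals are observed minus modeled usage, already grouped by hour of week; the regression itself and the segment handling are omitted, and the function name is hypothetical.

```python
def occupancy_from_residuals(residuals_by_hour, threshold=0.65):
    """Map each hour of week to True (occupied) or False (unoccupied)."""
    lookup = {}
    for hour, residuals in residuals_by_hour.items():
        # Share of points where usage fell below the degree-day model.
        negative_ratio = sum(1 for r in residuals if r < 0) / len(residuals)
        # Unoccupied only when that share exceeds the threshold.
        lookup[hour] = not (negative_ratio > threshold)
    return lookup


residuals_by_hour = {
    0: [-1.0, -0.5, -2.0, 0.3],  # 75% negative: unoccupied
    1: [-1.0, 0.4, 0.9, 1.2],    # 25% negative: occupied
}
lookup = occupancy_from_residuals(residuals_by_hour)
```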
-
eemeter.
fit_temperature_bins
(data, segmentation=None, occupancy_lookup=None, default_bins=[30, 45, 55, 65, 75, 90], min_temperature_count=20)[source]¶ Determine appropriate temperature bins for a particular set of temperature data given segmentation and occupancy.
Parameters: - data (
pandas.Series
) – Input temperature data with an hourly pandas.DatetimeIndex
- segmentation (
pandas.DataFrame
, default None) – A dataframe containing segment weights with one column per segment. If left off, segmentation will not be considered. - occupancy_lookup (
pandas.DataFrame
, default None) – A dataframe of the form returned byeemeter.estimate_hour_of_week_occupancy
containing occupancy for each segment. If None, occupancy will not be considered. - default_bins (
list
offloat
orint
) – A list of candidate bin endpoints to begin the search with. - min_temperature_count (
int
) – The minimum number of temperatre values that must be included in any bin. If this threshold is not met, bins are dropped from the outside in following the algorithm described in the CalTRACK documentation.
Returns: - temperature_bins (
pandas.DataFrame
or, if occupancy_lookup is provided, a two-tuple
of pandas.DataFrame
) – A dataframe with boolean values indicating whether or not a bin was kept, with a categorical index for each candidate bin endpoint and a column for each segment.
- data (
-
eemeter.
get_missing_hours_of_week_warning
(hours_of_week)[source]¶ Warn if any hours of week (0-167) are missing.
Parameters: hours_of_week ( pandas.Series
) – Hour of week feature as given by eemeter.compute_time_features
.
Returns: warning – Warning with qualified name “eemeter.hour_of_week.missing”
Return type: eemeter.EEMeterWarning
-
eemeter.
merge_features
(features, keep_partial_nan_rows=False)[source]¶ Combine dataframes of features which share a datetime index.
Parameters: - features (
list
of pandas.DataFrame
) – List of dataframes to be concatenated to share an index. - keep_partial_nan_rows (
bool
, default False) – If True, don’t overwrite partial rows with NaN, otherwise any row with a NaN value gets changed to all NaN values.
Returns: merged_features – A single dataframe with the index of the input data and all of the columns in the input feature dataframes.
Return type: - features (
Input and Output Utilities¶
These functions are used for reading and writing meter and temperature data.
-
eemeter.
meter_data_from_csv
(filepath_or_buffer, tz=None, start_col='start', value_col='value', gzipped=False, freq=None, **kwargs)[source]¶ Load meter data from a CSV file.
Default format:
start,value
2017-01-01T00:00:00+00:00,0.31
2017-01-02T00:00:00+00:00,0.4
2017-01-03T00:00:00+00:00,0.58
Parameters: - filepath_or_buffer (
str
or file-handle) – File path or object. - tz (
str
, optional) – E.g., 'UTC'
or 'US/Pacific'
- start_col (
str
, optional, default 'start'
) – Date period start column. - value_col (
str
, optional, default 'value'
) – Value column, can be in any unit. - gzipped (
bool
, optional) – Whether file is gzipped. - freq (
str
, optional) – If given, apply frequency to data using pandas.DataFrame.resample
. - **kwargs – Extra keyword arguments to pass to
pandas.read_csv
, such as sep='|'
.
- filepath_or_buffer (
-
eemeter.
meter_data_from_json
(data, orient='list')[source]¶ Load meter data from json.
Default format:
[
    ['2017-01-01T00:00:00+00:00', 3.5],
    ['2017-02-01T00:00:00+00:00', 0.4],
    ['2017-03-01T00:00:00+00:00', 0.46],
]
records format:
[
    {'start': '2017-01-01T00:00:00+00:00', 'value': 3.5},
    {'start': '2017-02-01T00:00:00+00:00', 'value': 0.4},
    {'start': '2017-03-01T00:00:00+00:00', 'value': 0.46},
]
Parameters: - data (
list
) – A list of meter data, with each row representing a single record. - orient –
- Format of data parameter:
- list (a list of lists, with the first element as start date)
- records (a list of dicts)
Returns: df – DataFrame with a single column (
'value'
) and a pandas.DatetimeIndex
. A second column ('estimated'
) may also be included if the input data contained an estimated boolean flag.
Return type: - data (
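The two orient layouts carry the same information. As a rough illustration using only the standard library (both helper names are hypothetical), the list layout can be converted to the records layout and its timestamps parsed as timezone-aware datetimes:

```python
from datetime import datetime


def list_to_records(rows):
    # 'list' orient: [start, value] pairs -> 'records' orient: dicts.
    return [{"start": start, "value": value} for start, value in rows]


def parse_records(records):
    # Parse ISO 8601 timestamps (with UTC offset) and sort by start time.
    parsed = [(datetime.fromisoformat(r["start"]), float(r["value"])) for r in records]
    return sorted(parsed, key=lambda pair: pair[0])


rows = [
    ["2017-02-01T00:00:00+00:00", 0.4],
    ["2017-01-01T00:00:00+00:00", 3.5],
    ["2017-03-01T00:00:00+00:00", 0.46],
]
records = list_to_records(rows)
parsed = parse_records(records)
```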
-
eemeter.
meter_data_to_csv
(meter_data, path_or_buf)[source]¶ Write meter data to CSV. See also
pandas.DataFrame.to_csv
.
Parameters: - meter_data (
pandas.DataFrame
) – Meter data DataFrame with 'value'
column and pandas.DatetimeIndex
. - path_or_buf (
str
or file handle, default None) – File path or object, if None is provided the result is returned as a string.
- meter_data (
-
eemeter.
temperature_data_from_csv
(filepath_or_buffer, tz=None, date_col='dt', temp_col='tempF', gzipped=False, freq=None, **kwargs)[source]¶ Load temperature data from a CSV file.
Default format:
dt,tempF
2017-01-01T00:00:00+00:00,21
2017-01-01T01:00:00+00:00,22.5
2017-01-01T02:00:00+00:00,23.5
Parameters: - filepath_or_buffer (
str
or file-handle) – File path or object. - tz (
str
, optional) – E.g., 'UTC'
or 'US/Pacific'
- date_col (
str
, optional, default 'dt'
) – Date period start column. - temp_col (
str
, optional, default 'tempF'
) – Temperature column. - gzipped (
bool
, optional) – Whether file is gzipped. - freq (
str
, optional) – If given, apply frequency to data using pandas.Series.resample
. - **kwargs – Extra keyword arguments to pass to
pandas.read_csv
, such as sep='|'
.
- filepath_or_buffer (
-
eemeter.
temperature_data_from_json
(data, orient='list')[source]¶ Load temperature data from json. (Must be given in degrees Fahrenheit).
Default format:
[
    ['2017-01-01T00:00:00+00:00', 3.5],
    ['2017-01-01T01:00:00+00:00', 5.4],
    ['2017-01-01T02:00:00+00:00', 7.4],
]
Parameters: data ( list
) – Each list element is a row of data.
Returns: series – DataFrame with a single column ( 'tempF'
) and a pandas.DatetimeIndex
.
Return type: pandas.Series
-
eemeter.
temperature_data_to_csv
(temperature_data, path_or_buf)[source]¶ Write temperature data to CSV. See also
pandas.DataFrame.to_csv
.
Parameters: - temperature_data (
pandas.Series
) – Temperature data series with pandas.DatetimeIndex
. - path_or_buf (
str
or file handle, default None) – File path or object, if None is provided the result is returned as a string.
- temperature_data (
Metrics¶
This class is used for computing model metrics.
-
class
eemeter.
ModelMetrics
(observed_input, predicted_input, num_parameters=1, autocorr_lags=1, confidence_level=0.9)[source]¶ Contains measures of model fit and summary statistics on the input series.
Parameters: - observed_input (
pandas.Series
) – Series with pandas.DatetimeIndex
with a set of electricity or gas meter values. - predicted_input (
pandas.Series
) – Series with pandas.DatetimeIndex
with a set of electricity or gas meter values. - num_parameters (
int
, optional) – The number of parameters (excluding the intercept) used in the regression from which the predictions were derived. - autocorr_lags (
int
, optional) – The number of lags to use when calculating the autocorrelation of the residuals. - confidence_level (
float
, optional) – Confidence level used in fractional savings uncertainty computations.
-
merged_length
¶ The length of the dataframe resulting from the inner join of the observed_input series and the predicted_input series.
Type: int
-
r_squared
¶ The r-squared of the model from which the predicted_input series was produced.
Type: float
-
r_squared_adj
¶ The r-squared of the predicted_input series relative to the observed_input series, adjusted by the number of parameters in the model.
Type: float
-
cvrmse
¶ The coefficient of variation (root-mean-squared error) of the predicted_input series relative to the observed_input series.
Type: float
-
cvrmse_adj
¶ The coefficient of variation (root-mean-squared error) of the predicted_input series relative to the observed_input series, adjusted by the number of parameters in the model.
Type: float
-
mape
¶ The mean absolute percent error of the predicted_input series relative to the observed_input series.
Type: float
-
mape_no_zeros
¶ The mean absolute percent error of the predicted_input series relative to the observed_input series, with all time periods dropped where the observed_input series was not greater than zero.
Type: float
-
num_meter_zeros
¶ The number of time periods for which the observed_input series was not greater than zero.
Type: int
-
nmae
¶ The normalized mean absolute error of the predicted_input series relative to the observed_input series.
Type: float
-
nmbe
¶ The normalized mean bias error of the predicted_input series relative to the observed_input series.
Type: float
-
autocorr_resid
¶ The autocorrelation of the residuals (where the residuals equal the predicted_input series minus the observed_input series), measured using a number of lags equal to autocorr_lags.
Type: float
-
n_prime
¶ The number of baseline inputs corrected for autocorrelation – used in fractional savings uncertainty computation.
Type: float
-
single_tailed_confidence_level
¶ The adjusted confidence level for use in single-sided tests.
Type: float
-
degrees_of_freedom
¶ Maximum number of independent variables which have the freedom to vary.
Type: float
-
cvrmse_auto_corr_correction
¶ Correction factor to apply to cvrmse to account for autocorrelation of inputs.
Type: float
-
approx_factor_auto_corr_correction
¶ Approximation factor used in ASHRAE Guideline 14 for uncertainty computation.
Type: float
-
classmethod
from_json
(data)[source]¶ Loads a JSON-serializable representation into the model state.
The input of this function is a dict which can be the result of
json.loads
.
-
json
()[source]¶ Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with
json.dumps
.
- observed_input (
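For orientation, several of the fit measures listed above can be written out with textbook definitions. This is only a sketch: it takes residuals as predicted minus observed (as stated for autocorr_resid), normalizes by the mean of the observed series, and applies no degrees-of-freedom adjustment, so the values may differ from what ModelMetrics reports.

```python
import math


def fit_metrics(observed, predicted):
    """Unadjusted textbook versions of a few fit metrics."""
    n = len(observed)
    mean_observed = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return {
        # Root-mean-squared error relative to mean observed usage.
        "cvrmse": rmse / mean_observed,
        # Mean bias and mean absolute error, normalized the same way.
        "nmbe": (sum(residuals) / n) / mean_observed,
        "nmae": (sum(abs(r) for r in residuals) / n) / mean_observed,
        # Mean absolute percent error; undefined if any observed value is 0.
        "mape": sum(abs(r / o) for r, o in zip(residuals, observed)) / n,
    }


metrics = fit_metrics(observed=[10.0, 20.0, 30.0], predicted=[12.0, 18.0, 33.0])
```

The division by each observed value in `mape` is the reason a separate mape_no_zeros measure is useful when the observed series contains zeros.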
Sample Data¶
These sample data are provided to make things easier for new users.
-
eemeter.
samples
()[source]¶ Load a list of sample data identifiers.
Returns: samples – List of sample identifiers for use with eemeter.load_sample
.
Return type: list
of str
-
eemeter.
load_sample
(sample)[source]¶ Load meter data, temperature data, and metadata associated with a particular sample identifier. Note: samples are simulated, not real, data.
Parameters: sample ( str
) – Identifier of sample. Complete list can be obtained with eemeter.samples
.
Returns: meter_data, temperature_data, metadata – Meter data, temperature data, and metadata for this sample identifier.
Return type: tuple
of pandas.DataFrame
, pandas.Series
, and dict
Segmentation¶
These methods are used within CalTRACK hourly to support building multiple partial models and combining them into one full model.
-
eemeter.
iterate_segmented_dataset
(data, segmentation=None, feature_processor=None, feature_processor_kwargs=None, feature_processor_segment_name_mapping=None)[source]¶ A utility for iterating over segments which allows providing a function for processing outputs into features.
Parameters: - data (
pandas.DataFrame
, required) – Data to segment. - segmentation (
pandas.DataFrame
, default None) – A segmentation of the input dataframe expressed as a dataframe which shares the timeseries index of the data and has named columns of weights, which are iterated over to create the outputs (or inputs to the feature processor, which then creates the actual outputs). - feature_processor (
function
, default None) – A function that transforms raw inputs (temperatures) into features for each segment. - feature_processor_kwargs (
dict
, default None) – A dict of keyword arguments to be passed as **kwargs to the feature_processor function. - feature_processor_segment_name_mapping (
dict
, default None) – A mapping from the default segmentation segment names to alternate names. This is useful when prediction uses a different segment type than fitting.
- data (
-
eemeter.
segment_time_series
(index, segment_type='single', drop_zero_weight_segments=False)[source]¶ Split a time series index into segments by applying weights.
Parameters: - index (
pandas.DatetimeIndex
) – A time series index which gets split into segments. - segment_type (
str
) – The method to use when creating segments.
- “single”: creates one big segment with the name “all”.
- “one_month”: creates up to twelve segments, each of which contains a single month. Segment names are “jan”, “feb”, … “dec”.
- “three_month”: creates up to twelve overlapping segments, each of which contains three calendar months of data. Segment names are “dec-jan-feb”, “jan-feb-mar”, … “nov-dec-jan”.
- “three_month_weighted”: creates up to twelve overlapping segments, each of which contains three calendar months of data, with the first and third month in each segment having weights of one half. Segment names are “dec-jan-feb-weighted”, “jan-feb-mar-weighted”, … “nov-dec-jan-weighted”.
Returns: segmentation – A segmentation of the input index expressed as a dataframe which shares the input index and has named columns of weights.
Return type: pandas.DataFrame
- index (
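The “three_month_weighted” scheme can be pictured as a per-month weight table for each segment. The sketch below builds the name and month weights for one segment centered on a given month; it works on month labels only, whereas the library returns weights for every timestamp in the index, and the helper name is hypothetical.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def three_month_weighted_segment(center_month):
    """Return (segment_name, {month: weight}) for one weighted segment."""
    i = MONTHS.index(center_month)
    prev_month = MONTHS[i - 1]         # index -1 wraps jan back to dec
    next_month = MONTHS[(i + 1) % 12]  # dec wraps forward to jan
    name = f"{prev_month}-{center_month}-{next_month}-weighted"
    weights = {month: 0.0 for month in MONTHS}
    weights[center_month] = 1.0
    # First and third months count at half weight.
    weights[prev_month] = 0.5
    weights[next_month] = 0.5
    return name, weights


name, weights = three_month_weighted_segment("jan")
```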
-
class
eemeter.
CalTRACKSegmentModel
(segment_name, model, formula, model_params, warnings=None)[source]¶ An object that captures the model fit for one segment.
-
classmethod
from_json
(data)[source]¶ Loads a JSON-serializable representation into the model state.
The input of this function is a dict which can be the result of
json.loads
.
-
json
()[source]¶ Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with
json.dumps
.
-
classmethod
-
class
eemeter.
SegmentedModel
(segment_models, prediction_segment_type, prediction_segment_name_mapping=None, prediction_feature_processor=None, prediction_feature_processor_kwargs=None)[source]¶ Represent a model which has been broken into multiple model segments (for CalTRACK Hourly, these are month-by-month segments, each of which is associated with a different model).
Parameters: - segment_models (
dict
of eemeter.CalTRACKSegmentModel
) – Dictionary of segment models, keyed by segment name. - prediction_segment_type (
str
) – Any segment_type that can be passed to eemeter.segment_time_series
, currently “single”, “one_month”, “three_month”, or “three_month_weighted”. - prediction_segment_name_mapping (
dict
of str
) – A dictionary mapping the segment names for the segment type used for predicting to the segment names for the segment type used for fitting, e.g., {“<predict_segment_name>”: “<fit_segment_name>”}. - prediction_feature_processor (
function
) – A function that transforms raw inputs (temperatures) into features for each segment. - prediction_feature_processor_kwargs (
dict
) – A dict of keyword arguments to be passed as **kwargs to the prediction_feature_processor function.
-
json
()[source]¶ Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with
json.dumps
.
-
predict
(prediction_index, temperature, **kwargs)[source]¶ Predict over a prediction index by combining results from all models.
Parameters: - prediction_index (
pandas.DatetimeIndex
) – The index over which to predict. - temperature (
pandas.Series
) – Hourly temperatures. - **kwargs – Extra arguments will be ignored.
- prediction_index (
- segment_models (
Transformation utilities¶
These functions are used to perform various common data transformations based on pandas inputs.
-
eemeter.
as_freq
(data_series, freq, atomic_freq='1 Min', series_type='cumulative', include_coverage=False)[source]¶ Resample data to a different frequency.
This method can be used to upsample or downsample meter data. The assumption it makes to do so is that meter data is constant and averaged over the given periods. For instance, to convert billing-period data to daily data, this method first upsamples to the atomic frequency (1 minute frequency, by default), “spreading” usage evenly across all minutes in each period. Then it downsamples to the requested frequency (daily, in this example) and returns that result. With instantaneous series, the data is copied to all contiguous time intervals and the mean over freq is returned.
Caveats:
- This method gives a fair amount of flexibility in resampling as long as you are OK with the assumption that usage is constant over the period (this assumption is generally broken in observed data at large enough frequencies, so this caveat should not be taken lightly).
Parameters: - data_series (
pandas.Series
) – Data to resample. Should have a pandas.DatetimeIndex
. - freq (
str
) – The frequency to resample to. This should be given in a form recognized by the pandas.Series.resample
method. - atomic_freq (
str
, optional) – The “atomic” frequency of the intermediate data form. This can be adjusted to a higher atomic frequency to increase speed or memory performance. - series_type (
str
, {‘cumulative’, ‘instantaneous’}, default ‘cumulative’) – Type of data sampling. ‘cumulative’ data can be spread over smaller time intervals and is aggregated using addition (e.g. meter data). ‘instantaneous’ data is copied (not spread) over smaller time intervals and is aggregated by averaging (e.g. weather data). - include_coverage (
bool
, default False) – Option of whether to return a series with just the resampled values or a dataframe with a column that includes percent coverage of source data used for each sample.
Returns: resampled_data – Data resampled to the given frequency (optionally as a dataframe with a coverage column if include_coverage is used).
Return type:
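The “spreading” assumption for cumulative data can be shown for a single billing period. This sketch spreads a period total evenly over whole days directly, rather than through a one-minute intermediate series; the two agree when period boundaries fall on day boundaries. The function name is hypothetical.

```python
from datetime import date, timedelta


def spread_period_to_days(period_start, period_end, total_usage):
    """Spread one billing period's total evenly across its days."""
    n_days = (period_end - period_start).days
    per_day = total_usage / n_days
    # The end date belongs to the next period, so it is excluded.
    return {period_start + timedelta(days=i): per_day for i in range(n_days)}


daily = spread_period_to_days(date(2017, 1, 1), date(2017, 1, 31), total_usage=300.0)
```

Because usage is assumed constant within the period, every day receives the same value, which is the caveat noted above for long periods.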
-
eemeter.
day_counts
(index)[source]¶ Days between DatetimeIndex values as a
pandas.Series
.
Parameters: index ( pandas.DatetimeIndex
) – The index for which to get day counts.
Returns: day_counts – A pandas.Series
with counts of days between periods. Counts are given on start dates of periods.
Return type: pandas.Series
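A rough stand-in for the day-count computation, using plain datetimes instead of a DatetimeIndex. The last start date has no following value, so this sketch leaves its count as None; how the library represents that final entry is not stated here.

```python
from datetime import datetime


def day_counts_sketch(starts):
    """Days from each period start to the next; None for the final start."""
    counts = []
    for start, following in zip(starts, starts[1:]):
        counts.append((following - start).total_seconds() / 86400)
    counts.append(None)  # assumption: no end is known for the last period
    return counts


starts = [datetime(2017, 1, 1), datetime(2017, 2, 1), datetime(2017, 3, 1)]
counts = day_counts_sketch(starts)
```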
-
eemeter.
get_baseline_data
(data, start=None, end=None, max_days=365, allow_billing_period_overshoot=False, n_days_billing_period_overshoot=None, ignore_billing_period_gap_for_day_count=False)[source]¶ Filter down to baseline period data.
Note
For compliance with CalTRACK, set
max_days=365
(section 2.2.1.1).
Parameters: - data (
pandas.DataFrame
or pandas.Series
) – The data to filter to baseline data. This data will be filtered down to an acceptable baseline period according to the dates passed as start and end, or the maximum period specified with max_days. - start (
datetime.datetime
) – A timezone-aware datetime that represents the earliest allowable start date for the baseline data. The stricter of this or max_days is used to determine the earliest allowable baseline period date. - end (
datetime.datetime
) – A timezone-aware datetime that represents the latest allowable end date for the baseline data, i.e., the latest date for which data is available before the intervention begins. - max_days (
int
, default 365) – The maximum length of the period. Ignored if end is not set. The stricter of this or start is used to determine the earliest allowable baseline period date. - allow_billing_period_overshoot (
bool
, default False) – If True, count max_days from the end of the last billing data period that ends before the end date, rather than from the exact end date. Otherwise use the exact end date as the cutoff. - n_days_billing_period_overshoot (
int
, default None) – If allow_billing_period_overshoot is set to True, this determines the number of days of overshoot that will be tolerated. A value of None implies that any number of days is allowed. - ignore_billing_period_gap_for_day_count (
bool
, default False) –If True, instead of going back max_days from either the end date or end of the last billing period before that date (depending on the value of the allow_billing_period_overshoot setting) and excluding the last period that began before that date, first check to see if excluding or including that period gets closer to a total of max_days of data.
For example, with max_days=365, if an exact 365-day period would target Feb 15, but the billing period went from Jan 20 to Feb 20, exclude that period for a total of ~360 days of data, because that’s closer to 365 than ~390 days, which would be the total if that period were included. If, on the other hand, that period started Feb 10 and went to Mar 10, include the period, because ~370 days of data is closer to 365 than ~340.
Returns: baseline_data, warnings – Data for only the specified baseline period and any associated warnings.
Return type: tuple of (pandas.DataFrame or pandas.Series, list of eemeter.EEMeterWarning)
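As a rough illustration of how start, end, and max_days interact, the sketch below computes the earliest allowable baseline date as the later ("stricter") of start and end minus max_days. It uses only the standard library. The helper name baseline_window is hypothetical and not part of eemeter, and the billing-period options are deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def baseline_window(start, end, max_days=365):
    """Return the (earliest, latest) allowable baseline datetimes.

    Hypothetical helper for illustration only; not eemeter's implementation.
    """
    if end is None:
        # max_days is ignored when end is not set.
        return start, None
    earliest_by_length = end - timedelta(days=max_days)
    if start is None:
        return earliest_by_length, end
    # "Stricter" here means the later of the two candidate start dates.
    return max(start, earliest_by_length), end


# Last date with data before the intervention begins (made-up value).
end = datetime(2018, 6, 1, tzinfo=timezone.utc)
# Earliest date the caller is willing to accept (made-up value).
start = datetime(2016, 1, 1, tzinfo=timezone.utc)

earliest, latest = baseline_window(start, end, max_days=365)
print(earliest.date(), latest.date())
```

Here max_days is the binding constraint, so the window begins 365 days before end rather than at start.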
-
eemeter.
get_reporting_data
(data, start=None, end=None, max_days=365, allow_billing_period_overshoot=False, ignore_billing_period_gap_for_day_count=False)[source]¶ Filter down to reporting period data.
Parameters:
- data (pandas.DataFrame or pandas.Series) – The data to filter to reporting data. This data will be filtered down to an acceptable reporting period according to the dates passed as start and end, or the maximum period specified with max_days.
- start (datetime.datetime) – A timezone-aware datetime that represents the earliest allowable start date for the reporting data, i.e., the earliest date for which data is available after the intervention begins.
- end (datetime.datetime) – A timezone-aware datetime that represents the latest allowable end date for the reporting data. The stricter of this or max_days is used to determine the latest allowable reporting period date.
- max_days (int, default 365) – The maximum length of the period. Ignored if start is not set. The stricter of this or end is used to determine the latest allowable reporting period date.
- allow_billing_period_overshoot (bool, default False) – If True, count max_days from the start of the first billing data period that starts after the start date, rather than from the exact start date. Otherwise use the exact start date as the cutoff.
- ignore_billing_period_gap_for_day_count (bool, default False) – If True, instead of going forward max_days from either the start date or the start of the first billing period after that date (depending on the value of the allow_billing_period_overshoot setting) and excluding the first period that ended after that date, first check to see if excluding or including that period gets closer to a total of max_days of data.
For example, with max_days=365, if an exact 365-day period would target Feb 15, but the billing period went from Jan 20 to Feb 20, include that period for a total of ~370 days of data, because that’s closer to 365 than ~340 days, which would be the total if that period were excluded. If, on the other hand, that period started Feb 10 and went to Mar 10, exclude the period, because ~360 days of data is closer to 365 than ~390.
Returns: reporting_data, warnings – Data for only the specified reporting period and any associated warnings.
Return type: tuple of (pandas.DataFrame or pandas.Series, list of eemeter.EEMeterWarning)
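The include-or-exclude choice described for ignore_billing_period_gap_for_day_count reduces to comparing two candidate day counts against max_days. The sketch below replays the two reporting-period cases from the example above. The function name is hypothetical, and resolving ties toward exclusion is an assumption, since the documentation does not say how ties are handled.

```python
def include_boundary_period(days_if_included, days_if_excluded, max_days=365):
    """Return True if including the boundary billing period lands closer to max_days.

    Hypothetical helper; ties resolve to excluding the period, which is an
    assumption rather than documented eemeter behavior.
    """
    gap_included = abs(days_if_included - max_days)
    gap_excluded = abs(days_if_excluded - max_days)
    return gap_included < gap_excluded


# Period Jan 20 to Feb 20 around a Feb 15 target: ~370 days with it, ~340 without.
first_case = include_boundary_period(370, 340)
# Period Feb 10 to Mar 10: ~390 days with it, ~360 without.
second_case = include_boundary_period(390, 360)
print(first_case, second_case)  # True False
```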
-
class
eemeter.
Term
(index, label, target_start_date, target_end_date, target_term_length_days, actual_start_date, actual_end_date, actual_term_length_days, complete)[source]¶ The term object represents a subset of an index.
-
index
¶ The index of the term. Includes a period at the end meant to be NaN-value.
Type: pandas.DatetimeIndex
-
target_start_date
¶ The start date inferred for this term from the start date and target term lengths.
Type: pandas.Timestamp
or datetime.datetime
-
target_end_date
¶ The end date inferred for this term from the start date and target term lengths.
Type: pandas.Timestamp
or datetime.datetime
-
actual_start_date
¶ The first date in the index.
Type: pandas.Timestamp
-
actual_end_date
¶ The last date in the index.
Type: pandas.Timestamp
-
-
eemeter.
get_terms
(index, term_lengths, term_labels=None, start=None, method='strict')[source]¶ Breaks a pandas.DatetimeIndex into consecutive terms of specified lengths.
Parameters:
- index (pandas.DatetimeIndex) – The index to split into terms, generally meter_data.index or temperature_data.index.
- term_lengths (list of int) – The lengths (in days) of the terms into which to split the data.
- term_labels (list of str, default None) – Labels to use for each term. List must be the same length as the term_lengths list.
- start (datetime.datetime, default None) – A timezone-aware datetime that represents the earliest allowable start date for the terms. If None, use the first element of the index.
- method (one of ['strict', 'nearest'], default 'strict') – The method to use to get terms.
- ”strict”: Ensures that the term end will come on or before the length of
Returns: terms – A dataframe of term labels with the same pandas.DatetimeIndex given as index. This can be used to filter the original data into terms of approximately the desired length.
Return type: list of eemeter.Term
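The idea of cutting an index into consecutive terms of given day lengths can be sketched with the standard library alone. This is a simplified illustration, not eemeter's implementation: split_into_terms and its default label format are hypothetical, it works on a plain list of datetimes rather than a pandas.DatetimeIndex, and it does not model the method option or the eemeter.Term attributes.

```python
from datetime import datetime, timedelta, timezone


def split_into_terms(index, term_lengths, term_labels=None, start=None):
    """Group timestamps into consecutive terms of the given lengths (in days).

    Hypothetical helper for illustration only.
    """
    if term_labels is None:
        # Arbitrary default labels chosen for this sketch.
        term_labels = ["term_{}".format(i + 1) for i in range(len(term_lengths))]
    if len(term_labels) != len(term_lengths):
        raise ValueError("term_labels must be the same length as term_lengths")

    ordered = sorted(index)
    # If no start is given, begin at the first element of the index.
    term_start = ordered[0] if start is None else start

    terms = {}
    for label, length in zip(term_labels, term_lengths):
        term_end = term_start + timedelta(days=length)
        # Each term is the half-open interval [term_start, term_end).
        terms[label] = [ts for ts in ordered if term_start <= ts < term_end]
        term_start = term_end
    return terms


first_day = datetime(2018, 1, 1, tzinfo=timezone.utc)
daily_index = [first_day + timedelta(days=i) for i in range(100)]

terms = split_into_terms(daily_index, [30, 60], term_labels=["first", "second"])
print({label: len(stamps) for label, stamps in terms.items()})
```

With 100 daily timestamps and term lengths of 30 and 60 days, the last 10 timestamps fall outside both terms and are left unassigned.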
-
eemeter.
remove_duplicates
(df_or_series)[source]¶ Remove duplicate rows or values by keeping the first of each duplicate.
Parameters: df_or_series (pandas.DataFrame or pandas.Series) – Pandas object from which to drop duplicate index values.
Returns: deduplicated – The deduplicated pandas object.
Return type: pandas.DataFrame or pandas.Series
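The keep-first rule can be illustrated without pandas by walking a list of (index, value) pairs and skipping any index value already seen. The helper keep_first_by_index and the sample readings are hypothetical and serve only to show the behavior described above.

```python
def keep_first_by_index(pairs):
    """Drop entries whose index value was already seen, keeping the first."""
    seen = set()
    deduplicated = []
    for index_value, value in pairs:
        if index_value in seen:
            # A later duplicate of an index value is discarded.
            continue
        seen.add(index_value)
        deduplicated.append((index_value, value))
    return deduplicated


readings = [("2018-01-01", 1.0), ("2018-01-02", 2.0), ("2018-01-01", 9.0)]
deduplicated = keep_first_by_index(readings)
print(deduplicated)  # [('2018-01-01', 1.0), ('2018-01-02', 2.0)]
```

With pandas objects, a common idiom for the same effect is obj[~obj.index.duplicated(keep="first")]. Whether eemeter uses exactly this expression is not stated in the documentation above.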
Visualization¶
These functions are used to visualize models and the meter and temperature data inputs.
-
eemeter.
plot_time_series
(meter_data, temperature_data, **kwargs)[source]¶ Plot meter and temperature data in dual-axes time series.
Parameters:
- meter_data (pandas.DataFrame) – A pandas.DatetimeIndex-indexed DataFrame of meter data with the column value.
- temperature_data (pandas.Series) – A pandas.DatetimeIndex-indexed Series of temperature data.
- **kwargs – Arbitrary keyword arguments to pass to plt.subplots.
Returns: axes – Tuple of (ax_meter_data, ax_temperature_data).
-
eemeter.
plot_energy_signature
(meter_data, temperature_data, temp_col=None, ax=None, title=None, figsize=None, **kwargs)[source]¶ Plot meter and temperature data as an energy signature.
Parameters:
- meter_data (pandas.DataFrame) – A pandas.DatetimeIndex-indexed DataFrame of meter data with the column value.
- temperature_data (pandas.Series) – A pandas.DatetimeIndex-indexed Series of temperature data.
- temp_col (str, default 'temperature_mean') – The name of the temperature column.
- ax (matplotlib.axes.Axes) – The axis on which to plot.
- title (str, optional) – Chart title.
- figsize (tuple, optional) – (width, height) of chart.
- **kwargs – Arbitrary keyword arguments to pass to matplotlib.axes.Axes.scatter.
Returns: ax – Matplotlib axes.
Warnings¶
-
class
eemeter.
EEMeterWarning
(qualified_name, description, data)[source]¶ An object representing a warning and data associated with it.
-
data
¶ Data that reproducibly shows why the warning was issued. Data should be JSON serializable.
Type: dict
-
json
()[source]¶ Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with json.dumps.
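A minimal sketch of the serialization pattern, using a stand-in class rather than eemeter.EEMeterWarning itself. The constructor arguments mirror the signature above, but the dictionary layout returned by json() and the example warning values are assumptions made for illustration.

```python
import json


class WarningSketch:
    """Stand-in with the same three constructor arguments as EEMeterWarning."""

    def __init__(self, qualified_name, description, data):
        self.qualified_name = qualified_name
        self.description = description
        self.data = data

    def json(self):
        # Return plain built-in types so json.dumps can serialize the result.
        # The key layout is an assumption, not eemeter's documented output.
        return {
            "qualified_name": self.qualified_name,
            "description": self.description,
            "data": self.data,
        }


warning = WarningSketch(
    qualified_name="example.hypothetical_warning",
    description="Illustrative warning only.",
    data={"n_days": 340},
)
serialized = json.dumps(warning.json(), sort_keys=True)
print(serialized)
```

Keeping data limited to JSON-serializable values (strings, numbers, lists, dicts) is what allows the final json.dumps call to succeed.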
-