API Docs
CalTRACK
CalTRACK design matrix creation

eemeter.create_caltrack_hourly_preliminary_design_matrix(meter_data, temperature_data)

eemeter.create_caltrack_hourly_segmented_design_matrices(preliminary_design_matrix, segmentation, occupancy_lookup, temperature_bins)

eemeter.create_caltrack_daily_design_matrix(meter_data, temperature_data)

eemeter.create_caltrack_billing_design_matrix(meter_data, temperature_data)
CalTRACK Hourly

eemeter.caltrack_hourly_fit_feature_processor(segment_name, segmented_data, occupancy_lookup, temperature_bins)

eemeter.caltrack_hourly_prediction_feature_processor(segment_name, segmented_data, occupancy_lookup, temperature_bins)

eemeter.fit_caltrack_hourly_model_segment(segment_name, segment_data)

eemeter.fit_caltrack_hourly_model(segmented_design_matrices, occupancy_lookup, temperature_bins)
CalTRACK Daily and Billing (Usage per Day)

class eemeter.CalTRACKUsagePerDayCandidateModel(model_type, formula, status, model_params=None, model=None, result=None, r_squared_adj=None, warnings=None)
Contains information about a candidate model.

formula
str – The R-style formula for the design matrix of this model, e.g., 'meter_value ~ hdd_65'.

status
str – A string indicating the status of this model. Possible statuses:
  'NOT ATTEMPTED': Candidate model not fitted due to an issue encountered in data before attempt.
  'ERROR': A fatal error occurred during model fit process.
  'DISQUALIFIED': The candidate model fit was disqualified from the model selection process because of a decision made after candidate model fit completed, e.g., a bad fit, or a parameter out of acceptable range.
  'QUALIFIED': The candidate model fit is acceptable and can be considered during model selection.

model_params
dict, default None – A flat dictionary of model parameters which must be serializable using the json.dumps method.

model
object – The raw model (if any) used in fitting. Not serialized.

result
object – The raw modeling result (if any) returned by the model. Not serialized.

warnings
list of eemeter.EEMeterWarning – A list of any warnings reported during creation of the candidate model.

json()
Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with json.dumps.

plot(best=False, ax=None, title=None, figsize=None, temp_range=None, alpha=None, **kwargs)
Plot

predict(prediction_index, temperature_data, with_disaggregated=False, with_design_matrix=False, **kwargs)
Predict
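
As a minimal sketch of what "flat" and "serializable using json.dumps" mean for model_params, the snippet below round-trips a hypothetical parameter dictionary through JSON. The key names and values are illustrative assumptions, not output from a fitted eemeter model.

```python
import json

# Hypothetical flat parameter dictionary; keys and values are illustrative only.
model_params = {
    "intercept": 12.5,
    "beta_hdd": 0.8,
    "heating_balance_point": 65,
}

# A flat dict of plain numbers and strings survives a JSON round trip unchanged.
serialized = json.dumps(model_params, sort_keys=True)
restored = json.loads(serialized)
```

Nested objects such as a raw fitted model would not serialize this way, which is consistent with model and result being described as "Not serialized".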


class eemeter.CalTRACKUsagePerDayModelResults(status, method_name, interval=None, model=None, r_squared_adj=None, candidates=None, warnings=None, metadata=None, settings=None)
Contains information about the chosen model.

status
str – A string indicating the status of this result. Possible statuses:
  'NO DATA': No baseline data was available.
  'NO MODEL': No candidate models qualified.
  'SUCCESS': A qualified candidate model was chosen.

model
eemeter.CalTRACKUsagePerDayCandidateModel or None – The selected candidate model, if any.

candidates
list of eemeter.CalTRACKUsagePerDayCandidateModel – A list of any model candidates encountered during the model selection and fitting process.

warnings
list of eemeter.EEMeterWarning – A list of any warnings reported during the model selection and fitting process.

metadata
dict – An arbitrary dictionary of metadata to be associated with this result. This can be used, for example, to tag the results with attributes like an ID: {'id': 'METER_12345678'}

settings
dict – A dictionary of settings used by the method.

totals_metrics
ModelMetrics – A ModelMetrics object, if one is calculated and associated with this model. (This initializes to None.) The ModelMetrics object contains model fit information and descriptive statistics about the underlying data, with that data expressed as period totals.

avgs_metrics
ModelMetrics – A ModelMetrics object, if one is calculated and associated with this model. (This initializes to None.) The ModelMetrics object contains model fit information and descriptive statistics about the underlying data, with that data expressed as daily averages.

json(with_candidates=False)
Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with json.dumps.

plot(ax=None, title=None, figsize=None, with_candidates=False, candidate_alpha=None, temp_range=None)
Plot a model fit.
Parameters:
  ax (matplotlib.axes.Axes, optional) – Existing axes to plot on.
  title (str, optional) – Chart title.
  figsize (tuple, optional) – (width, height) of chart.
  with_candidates (bool) – If True, also plot candidate models.
  candidate_alpha (float between 0 and 1) – Transparency at which to plot candidate models. 0 fully transparent, 1 fully opaque.
Returns: ax – Matplotlib axes.


class eemeter.DataSufficiency(status, criteria_name, warnings=None, settings=None)
Contains the result of a data sufficiency check.

status
str – A string indicating the status of this result. Possible statuses:
  'NO DATA': No baseline data was available.
  'FAIL': Data did not meet criteria.
  'PASS': Data met criteria.

warnings
list of eemeter.EEMeterWarning – A list of any warnings reported during the check for baseline data sufficiency.

settings
dict – A dictionary of settings (keyword arguments) used.

json()
Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with json.dumps.


class eemeter.ModelPrediction(result, design_matrix, warnings)

design_matrix
Alias for field number 1

result
Alias for field number 0

warnings
Alias for field number 2


eemeter.fit_caltrack_usage_per_day_model(data, fit_cdd=True, use_billing_presets=False, minimum_non_zero_cdd=10, minimum_non_zero_hdd=10, minimum_total_cdd=20, minimum_total_hdd=20, beta_cdd_maximum_p_value=1, beta_hdd_maximum_p_value=1, weights_col=None, fit_intercept_only=True, fit_cdd_only=True, fit_hdd_only=True, fit_cdd_hdd=True)
CalTRACK daily and billing methods using a usage-per-day modeling strategy.
Parameters:
  data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and 1 to n columns each of the form hdd_<heating_balance_point> and cdd_<cooling_balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods. Should have a pandas.DatetimeIndex.
  fit_cdd (bool, optional) – If True, fit CDD models unless overridden by fit_cdd_only or fit_cdd_hdd flags. Should be set to False for gas meter data.
  use_billing_presets (bool, optional) – Use presets appropriate for billing models. Otherwise defaults are appropriate for daily models.
  minimum_non_zero_cdd (int, optional) – Minimum allowable number of non-zero cooling degree day values.
  minimum_non_zero_hdd (int, optional) – Minimum allowable number of non-zero heating degree day values.
  minimum_total_cdd (float, optional) – Minimum allowable total sum of cooling degree day values.
  minimum_total_hdd (float, optional) – Minimum allowable total sum of heating degree day values.
  beta_cdd_maximum_p_value (float, optional) – The maximum allowable p-value of the beta cdd parameter. The default value is the most permissive possible (i.e., 1). This is here for backwards compatibility with CalTRACK 1.0 methods.
  beta_hdd_maximum_p_value (float, optional) – The maximum allowable p-value of the beta hdd parameter. The default value is the most permissive possible (i.e., 1). This is here for backwards compatibility with CalTRACK 1.0 methods.
  weights_col (str or None, optional) – The name of the column (if any) in data to use as weights. Weight must be the number of days of data in the period.
  fit_intercept_only (bool, optional) – If True, fit and consider intercept_only model candidates.
  fit_cdd_only (bool, optional) – If True, fit and consider cdd_only model candidates. Ignored if fit_cdd=False.
  fit_hdd_only (bool, optional) – If True, fit and consider hdd_only model candidates.
  fit_cdd_hdd (bool, optional) – If True, fit and consider cdd_hdd model candidates. Ignored if fit_cdd=False.
Returns: model_results – Results of running CalTRACK daily method. See eemeter.CalTRACKUsagePerDayModelResults for more details.

eemeter.caltrack_sufficiency_criteria(data_quality, requested_start, requested_end, num_days=365, min_fraction_daily_coverage=0.9, min_fraction_hourly_temperature_coverage_per_period=0.9)
CalTRACK daily data sufficiency criteria.
Note
For CalTRACK compliance, min_fraction_daily_coverage must be set at 0.9 (section 2.2.1.2), and requested_start and requested_end must not be None (section 2.2.4).
TODO: add warning for outliers (CalTRACK 2.3.6)
Parameters:
  data_quality (pandas.DataFrame) – A DataFrame containing at least the column meter_value and the two columns temperature_null, containing a count of null hourly temperature values for each meter value, and temperature_not_null, containing a count of not-null hourly temperature values for each meter value. Should have a pandas.DatetimeIndex.
  requested_start (datetime.datetime, timezone aware (or None)) – The desired start of the period, if any, especially if this is different from the start of the data. If given, warnings are reported on the basis of this start date instead of data start date. Must be explicitly set to None in order to use data start date.
  requested_end (datetime.datetime, timezone aware (or None)) – The desired end of the period, if any, especially if this is different from the end of the data. If given, warnings are reported on the basis of this end date instead of data end date. Must be explicitly set to None in order to use data end date.
  num_days (int, optional) – Exact number of days allowed in data, including extent given by requested_start or requested_end, if given.
  min_fraction_daily_coverage (float, optional) – Minimum fraction of days of data in total data extent for which data must be available.
  min_fraction_hourly_temperature_coverage_per_period (float, optional) – Minimum fraction of hours of temperature data coverage in a particular period. Anything below this causes the whole period to be considered missing.
Returns: data_sufficiency – An object containing sufficiency status and warnings for this data.
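
The daily coverage criterion can be illustrated with a small standard-library sketch. It assumes daily meter values held in a list where None marks a missing day. The function name daily_coverage_status is hypothetical, and this is not eemeter's implementation; the returned strings only mirror the 'NO DATA', 'FAIL' and 'PASS' statuses of eemeter.DataSufficiency.

```python
def daily_coverage_status(daily_values, min_fraction_daily_coverage=0.9):
    """Classify a list of daily values (None = missing day) by coverage."""
    if not daily_values:
        return "NO DATA"
    n_valid = sum(1 for value in daily_values if value is not None)
    fraction = n_valid / len(daily_values)
    # Assumption: coverage exactly at the threshold counts as sufficient.
    return "PASS" if fraction >= min_fraction_daily_coverage else "FAIL"


# Nine of ten days present gives a coverage fraction of 0.9.
values = [1.0] * 9 + [None]
status = daily_coverage_status(values)
```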

eemeter.caltrack_usage_per_day_predict(model_type, model_params, prediction_index, temperature_data, degree_day_method='daily', with_disaggregated=False, with_design_matrix=False)
CalTRACK predict method.
Given a model type, parameters, hourly temperatures, and a pandas.DatetimeIndex over which to predict meter usage, return model predictions as totals for the period (so billing period totals, daily totals, etc.). Optionally include the computed design matrix or disaggregated usage in the output dataframe.
Parameters:
  model_type (str) – Model type (e.g., 'cdd_hdd').
  model_params (dict) – Parameters as stored in eemeter.CalTRACKUsagePerDayCandidateModel.model_params.
  temperature_data (pandas.DataFrame) – Hourly temperature data to use for prediction. Time period should match the prediction_index argument.
  prediction_index (pandas.DatetimeIndex) – Time period over which to predict.
  with_disaggregated (bool, optional) – If True, return results as a pandas.DataFrame with columns 'base_load', 'heating_load', and 'cooling_load'.
  with_design_matrix (bool, optional) – If True, return results as a pandas.DataFrame with columns 'n_days', 'n_days_dropped', 'n_days_kept', and 'temperature_mean'.
Returns:
  prediction (pandas.DataFrame) – Columns are as follows:
    predicted_usage: Predicted usage values computed to match prediction_index.
    base_load: modeled base load (only for with_disaggregated=True).
    cooling_load: modeled cooling load (only for with_disaggregated=True).
    heating_load: modeled heating load (only for with_disaggregated=True).
    n_days: number of days in period (only for with_design_matrix=True).
    n_days_dropped: number of days dropped because of insufficient data (only for with_design_matrix=True).
    n_days_kept: number of days kept because of sufficient data (only for with_design_matrix=True).
    temperature_mean: mean temperature during given period (only for with_design_matrix=True).
  predict_warnings – List of EEMeterWarning, if any.
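
To make the relationship between predicted_usage and the disaggregated columns concrete, here is a hedged sketch of a usage-per-day prediction for a single period. It assumes a linear model of the form intercept + beta_hdd * HDD + beta_cdd * CDD per day, scaled by the number of days. The helper name predict_period and the parameter keys are assumptions made for illustration, not a description of eemeter internals.

```python
def predict_period(model_params, n_days, mean_daily_hdd, mean_daily_cdd):
    """Return period totals for one period from per-day model parameters."""
    # Missing coefficients are treated as zero (e.g. no cooling term in hdd_only).
    base_load = model_params.get("intercept", 0.0) * n_days
    heating_load = model_params.get("beta_hdd", 0.0) * mean_daily_hdd * n_days
    cooling_load = model_params.get("beta_cdd", 0.0) * mean_daily_cdd * n_days
    return {
        "predicted_usage": base_load + heating_load + cooling_load,
        "base_load": base_load,
        "heating_load": heating_load,
        "cooling_load": cooling_load,
    }


# Hypothetical parameters for a 30-day winter billing period.
params = {"intercept": 10.0, "beta_hdd": 0.5, "beta_cdd": 1.0}
row = predict_period(params, n_days=30, mean_daily_hdd=4.0, mean_daily_cdd=0.0)
```

In this sketch the three load components always sum to predicted_usage, which is the property the with_disaggregated columns are meant to convey.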

eemeter.plot_caltrack_candidate(candidate, best=False, ax=None, title=None, figsize=None, temp_range=None, alpha=None, **kwargs)
Plot a CalTRACK candidate model.
Parameters:
  candidate (eemeter.CalTRACKUsagePerDayCandidateModel) – A candidate model with a predict function.
  best (bool, optional) – Whether this is the best candidate or not.
  ax (matplotlib.axes.Axes, optional) – Existing axes to plot on.
  title (str, optional) – Chart title.
  figsize (tuple, optional) – (width, height) of chart.
  temp_range (tuple, optional) – (min, max) temperatures to plot model.
  alpha (float between 0 and 1, optional) – Transparency, 0 fully transparent, 1 fully opaque.
  **kwargs – Keyword arguments for matplotlib.axes.Axes.plot
Returns: ax – Matplotlib axes.

eemeter.get_too_few_non_zero_degree_day_warning(model_type, balance_point, degree_day_type, degree_days, minimum_non_zero)
Return an empty list or a single warning wrapped in a list regarding non-zero degree days for a set of degree days.
Parameters:
  model_type (str) – Model type (e.g., 'cdd_hdd').
  balance_point (float) – The balance point in question.
  degree_day_type (str) – The type of degree days ('cdd' or 'hdd').
  degree_days (pandas.Series) – A series of degree day values.
  minimum_non_zero (int) – Minimum allowable number of non-zero degree day values.
Returns: warnings – Empty list or list of single warning.
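
The "empty list or single warning wrapped in a list" convention can be sketched as follows. The warning is represented here as a plain dict with made-up keys. The real function returns eemeter.EEMeterWarning objects whose fields are not described in this section, so treat the structure below as an assumption.

```python
def too_few_non_zero_degree_days(degree_days, minimum_non_zero):
    """Return [] if enough non-zero degree days exist, else [warning]."""
    n_non_zero = sum(1 for value in degree_days if value > 0)
    if n_non_zero < minimum_non_zero:
        # Hypothetical warning payload; keys are illustrative only.
        return [{
            "description": "Too few non-zero degree day values.",
            "n_non_zero": n_non_zero,
            "minimum_non_zero": minimum_non_zero,
        }]
    return []


warnings_found = too_few_non_zero_degree_days([0, 0, 3.5, 1.2], minimum_non_zero=10)
no_warnings = too_few_non_zero_degree_days([1.0] * 12, minimum_non_zero=10)
```

Returning a list in both cases lets a caller concatenate the results of several checks without testing for None.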

eemeter.get_total_degree_day_too_low_warning(model_type, balance_point, degree_day_type, avg_degree_days, period_days, minimum_total)
Return an empty list or a single warning wrapped in a list regarding the total summed degree day values.
Parameters:
  model_type (str) – Model type (e.g., 'cdd_hdd').
  balance_point (float) – The balance point in question.
  degree_day_type (str) – The type of degree days ('cdd' or 'hdd').
  avg_degree_days (pandas.Series) – A series of degree day values.
  period_days (pandas.Series) – A series containing day counts.
  minimum_total (float) – Minimum allowable total sum of degree day values.
Returns: warnings – Empty list or list of single warning.
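
Because this check receives average degree days together with period day counts, one plausible reading is that the total is the sum of average degree days multiplied by the days in each period. The sketch below works through that arithmetic under this assumption, with plain lists standing in for pandas.Series and a hypothetical helper name.

```python
def total_degree_days(avg_degree_days, period_days):
    """Sum of (average degree days per day) * (days in period)."""
    return sum(avg * days for avg, days in zip(avg_degree_days, period_days))


# Two hypothetical billing periods of 30 and 31 days.
total = total_degree_days([0.5, 0.25], [30, 31])
minimum_total = 20
total_is_too_low = total < minimum_total
```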

eemeter.get_parameter_negative_warning(model_type, model_params, parameter)
Return an empty list or a single warning wrapped in a list indicating whether model parameter is negative.
Parameters:
  model_type (str) – Model type (e.g., 'cdd_hdd').
  model_params (dict) – Parameters as stored in eemeter.CalTRACKUsagePerDayCandidateModel.model_params.
  parameter (str) – The name of the parameter, e.g., 'intercept'.
Returns: warnings – Empty list or list of single warning.

eemeter.get_parameter_p_value_too_high_warning(model_type, model_params, parameter, p_value, maximum_p_value)
Return an empty list or a single warning wrapped in a list indicating whether model parameter p-value is too high.
Parameters:
  model_type (str) – Model type (e.g., 'cdd_hdd').
  model_params (dict) – Parameters as stored in eemeter.CalTRACKUsagePerDayCandidateModel.model_params.
  parameter (str) – The name of the parameter, e.g., 'intercept'.
  p_value (float) – The p-value of the parameter.
  maximum_p_value (float) – The maximum allowable p-value of the parameter.
Returns: warnings – Empty list or list of single warning.

eemeter.get_single_cdd_only_candidate_model(data, minimum_non_zero_cdd, minimum_total_cdd, beta_cdd_maximum_p_value, weights_col, balance_point)
Return a single candidate cdd-only model for a particular balance point.
Parameters:
  data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and cdd_<balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  minimum_non_zero_cdd (int) – Minimum allowable number of non-zero cooling degree day values.
  minimum_total_cdd (float) – Minimum allowable total sum of cooling degree day values.
  beta_cdd_maximum_p_value (float) – The maximum allowable p-value of the beta cdd parameter.
  weights_col (str or None) – The name of the column (if any) in data to use as weights.
  balance_point (float) – The cooling balance point for this model.
Returns: candidate_model – A single cdd-only candidate model, with any associated warnings.

eemeter.get_single_hdd_only_candidate_model(data, minimum_non_zero_hdd, minimum_total_hdd, beta_hdd_maximum_p_value, weights_col, balance_point)
Return a single candidate hdd-only model for a particular balance point.
Parameters:
  data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and hdd_<balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  minimum_non_zero_hdd (int) – Minimum allowable number of non-zero heating degree day values.
  minimum_total_hdd (float) – Minimum allowable total sum of heating degree day values.
  beta_hdd_maximum_p_value (float) – The maximum allowable p-value of the beta hdd parameter.
  weights_col (str or None) – The name of the column (if any) in data to use as weights.
  balance_point (float) – The heating balance point for this model.
Returns: candidate_model – A single hdd-only candidate model, with any associated warnings.

eemeter.get_single_cdd_hdd_candidate_model(data, minimum_non_zero_cdd, minimum_non_zero_hdd, minimum_total_cdd, minimum_total_hdd, beta_cdd_maximum_p_value, beta_hdd_maximum_p_value, weights_col, cooling_balance_point, heating_balance_point)
Fit and return a single candidate cdd_hdd model for a particular selection of cooling balance point and heating balance point.
Parameters:
  data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and hdd_<heating_balance_point> and cdd_<cooling_balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  minimum_non_zero_cdd (int) – Minimum allowable number of non-zero cooling degree day values.
  minimum_non_zero_hdd (int) – Minimum allowable number of non-zero heating degree day values.
  minimum_total_cdd (float) – Minimum allowable total sum of cooling degree day values.
  minimum_total_hdd (float) – Minimum allowable total sum of heating degree day values.
  beta_cdd_maximum_p_value (float) – The maximum allowable p-value of the beta cdd parameter.
  beta_hdd_maximum_p_value (float) – The maximum allowable p-value of the beta hdd parameter.
  weights_col (str or None) – The name of the column (if any) in data to use as weights.
  cooling_balance_point (float) – The cooling balance point for this model.
  heating_balance_point (float) – The heating balance point for this model.
Returns: candidate_model – A single cdd-hdd candidate model, with any associated warnings.

eemeter.get_intercept_only_candidate_models(data, weights_col)
Return a list of a single candidate intercept-only model.
Parameters:
  data (pandas.DataFrame) – A DataFrame containing at least the column meter_value. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  weights_col (str or None) – The name of the column (if any) in data to use as weights.
Returns: candidate_models – List containing a single intercept-only candidate model.

eemeter.get_cdd_only_candidate_models(data, minimum_non_zero_cdd, minimum_total_cdd, beta_cdd_maximum_p_value, weights_col)
Return a list of all possible candidate cdd-only models.
Parameters:
  data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and 1 to n columns with names of the form cdd_<balance_point>. All columns with names of this form will be used to fit a candidate model. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  minimum_non_zero_cdd (int) – Minimum allowable number of non-zero cooling degree day values.
  minimum_total_cdd (float) – Minimum allowable total sum of cooling degree day values.
  beta_cdd_maximum_p_value (float) – The maximum allowable p-value of the beta cdd parameter.
  weights_col (str or None) – The name of the column (if any) in data to use as weights.
Returns: candidate_models – A list of cdd-only candidate models, with any associated warnings.

eemeter.get_hdd_only_candidate_models(data, minimum_non_zero_hdd, minimum_total_hdd, beta_hdd_maximum_p_value, weights_col)
Return a list of all possible candidate hdd-only models.
Parameters:
  data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and 1 to n columns with names of the form hdd_<balance_point>. All columns with names of this form will be used to fit a candidate model. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  minimum_non_zero_hdd (int) – Minimum allowable number of non-zero heating degree day values.
  minimum_total_hdd (float) – Minimum allowable total sum of heating degree day values.
  beta_hdd_maximum_p_value (float) – The maximum allowable p-value of the beta hdd parameter.
  weights_col (str or None) – The name of the column (if any) in data to use as weights.
Returns: candidate_models – A list of hdd-only candidate models, with any associated warnings.

eemeter.get_cdd_hdd_candidate_models(data, minimum_non_zero_cdd, minimum_non_zero_hdd, minimum_total_cdd, minimum_total_hdd, beta_cdd_maximum_p_value, beta_hdd_maximum_p_value, weights_col)
Return a list of candidate cdd_hdd models for a particular selection of cooling balance point and heating balance point.
Parameters:
  data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and 1 to n columns each of the form hdd_<heating_balance_point> and cdd_<cooling_balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  minimum_non_zero_cdd (int) – Minimum allowable number of non-zero cooling degree day values.
  minimum_non_zero_hdd (int) – Minimum allowable number of non-zero heating degree day values.
  minimum_total_cdd (float) – Minimum allowable total sum of cooling degree day values.
  minimum_total_hdd (float) – Minimum allowable total sum of heating degree day values.
  beta_cdd_maximum_p_value (float) – The maximum allowable p-value of the beta cdd parameter.
  beta_hdd_maximum_p_value (float) – The maximum allowable p-value of the beta hdd parameter.
  weights_col (str or None) – The name of the column (if any) in data to use as weights.
Returns: candidate_models – A list of cdd_hdd candidate models, with any associated warnings.

eemeter.select_best_candidate(candidate_models)
Select and return the best candidate model based on r-squared and qualification.
Parameters: candidate_models (list of eemeter.CalTRACKUsagePerDayCandidateModel) – Candidate models to select from.
Returns: (best_candidate, warnings) – Return the candidate model with highest r-squared or None if none meet the requirements, and a list of warnings about this selection (or lack of selection).
Return type: tuple of eemeter.CalTRACKUsagePerDayCandidateModel or None and list of eemeter.EEMeterWarning
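
The selection rule described above (highest r-squared among qualified candidates, or None with a warning) can be sketched with the standard library. The Candidate tuple and the select_best helper are hypothetical stand-ins for eemeter.CalTRACKUsagePerDayCandidateModel and select_best_candidate, and the warning is a plain string rather than an EEMeterWarning.

```python
from collections import namedtuple

# Hypothetical stand-in exposing only the fields needed for selection.
Candidate = namedtuple("Candidate", ["model_type", "status", "r_squared_adj"])


def select_best(candidates):
    """Return (best_candidate_or_None, warnings) among QUALIFIED candidates."""
    qualified = [c for c in candidates if c.status == "QUALIFIED"]
    if not qualified:
        return None, ["No qualified candidate models."]
    best = max(qualified, key=lambda c: c.r_squared_adj)
    return best, []


candidates = [
    Candidate("intercept_only", "QUALIFIED", 0.0),
    Candidate("hdd_only", "QUALIFIED", 0.82),
    Candidate("cdd_hdd", "DISQUALIFIED", 0.91),
]
best, selection_warnings = select_best(candidates)
```

Note that the disqualified cdd_hdd candidate is skipped even though its r-squared is higher, which is the point of filtering on status before comparing fit.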
Derivatives

eemeter.metered_savings(baseline_model, reporting_meter_data, temperature_data, with_disaggregated=False, confidence_level=0.9, predict_kwargs=None)
Compute metered savings, i.e., savings in which the baseline model is used to calculate the modeled usage in the reporting period. This modeled usage is then compared to the actual usage from the reporting period. Also compute two measures of the uncertainty of the aggregate savings estimate, a fractional savings uncertainty (FSU) error band and an OLS error band. (To convert the FSU error band into FSU, divide by total estimated savings.)
Parameters:
  baseline_model (eemeter.CalTRACKUsagePerDayModelResults) – Object to use for predicting pre-intervention usage.
  reporting_meter_data (pandas.DataFrame) – The observed reporting period data (totals). Savings will be computed for the periods supplied in the reporting period data.
  temperature_data (pandas.Series) – Hourly-frequency timeseries of temperature data during the reporting period.
  with_disaggregated (bool, optional) – If True, calculate baseline counterfactual disaggregated usage estimates. Savings cannot be disaggregated for metered savings. For that, use eemeter.modeled_savings.
  confidence_level (float, optional) – The two-tailed confidence level used to calculate the t-statistic used in calculation of the error bands. Ignored if not computing error bands.
  predict_kwargs (dict, optional) – Extra kwargs to pass to the baseline_model.predict method.
Returns:
  results (pandas.DataFrame) – DataFrame with metered savings, indexed with reporting_meter_data.index. Will include the following columns:
    counterfactual_usage (baseline model projected into reporting period)
    reporting_observed (given by reporting_meter_data)
    metered_savings
  If with_disaggregated is set to True, the following columns will also be in the results DataFrame:
    counterfactual_base_load
    counterfactual_heating_load
    counterfactual_cooling_load
  error_bands (dict, optional) – If baseline_model is an instance of CalTRACKUsagePerDayModelResults, will also return a dictionary of FSU and OLS error bands for the aggregated energy savings over the post period.
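
The arithmetic behind the results columns and the FSU conversion can be worked through with made-up numbers. This sketch assumes savings are computed as counterfactual usage minus observed usage, and it uses a hypothetical error band value. It does not call eemeter or reproduce its uncertainty calculation.

```python
# Made-up period totals for three reporting periods.
counterfactual_usage = [100.0, 120.0, 90.0]
reporting_observed = [80.0, 100.0, 85.0]

# Assumption: savings per period = counterfactual - observed.
metered_savings = [
    modeled - observed
    for modeled, observed in zip(counterfactual_usage, reporting_observed)
]
total_savings = sum(metered_savings)

# Hypothetical FSU error band in the same energy units as the savings.
fsu_error_band = 9.0
# "To convert the FSU error band into FSU, divide by total estimated savings."
fsu = fsu_error_band / total_savings
```

With these numbers the total savings are 45.0 and the FSU is 0.2, i.e. the error band is 20% of the estimated savings.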

eemeter.modeled_savings(baseline_model, reporting_model, result_index, temperature_data, with_disaggregated=False, predict_kwargs=None)
Compute modeled savings, i.e., savings in which baseline and reporting usage values are based on models. This is appropriate for annualizing or weather normalizing models.
Parameters:
  baseline_model (eemeter.CalTRACKUsagePerDayCandidateModel) – Model to use for predicting pre-intervention usage.
  reporting_model (eemeter.CalTRACKUsagePerDayCandidateModel) – Model to use for predicting post-intervention usage.
  result_index (pandas.DatetimeIndex) – The dates for which usage should be modeled.
  temperature_data (pandas.Series) – Hourly-frequency timeseries of temperature data during the modeled period.
  with_disaggregated (bool, optional) – If True, calculate modeled disaggregated usage estimates and savings.
  predict_kwargs (dict, optional) – Extra kwargs to pass to the baseline_model.predict and reporting_model.predict methods.
Returns: results – DataFrame with modeled savings, indexed with the result_index. Will include the following columns:
  modeled_baseline_usage
  modeled_reporting_usage
  modeled_savings
If with_disaggregated is set to True, the following columns will also be in the results DataFrame:
  modeled_baseline_base_load
  modeled_baseline_cooling_load
  modeled_baseline_heating_load
  modeled_reporting_base_load
  modeled_reporting_cooling_load
  modeled_reporting_heating_load
  modeled_base_load_savings
  modeled_cooling_load_savings
  modeled_heating_load_savings
Exceptions

exception eemeter.EEMeterError
Base class for EEmeter library errors.

exception eemeter.NoBaselineDataError
Error indicating lack of baseline data.

exception eemeter.NoReportingDataError
Error indicating lack of reporting data.

exception eemeter.MissingModelParameterError
Error indicating missing model parameter.

exception eemeter.UnrecognizedModelTypeError
Error indicating unrecognized model type.
Features

eemeter.compute_usage_per_day_feature(meter_data, series_name='usage_per_day')

eemeter.compute_occupancy_feature(hour_of_week, occupancy)

eemeter.compute_temperature_features(meter_data_index, temperature_data, heating_balance_points=None, cooling_balance_points=None, data_quality=False, temperature_mean=True, degree_day_method='daily', percent_hourly_coverage_per_day=0.5, percent_hourly_coverage_per_billing_period=0.9, use_mean_daily_values=True, tolerance=None, keep_partial_nan_rows=False)
Compute temperature features from hourly temperature data using the pandas.DatetimeIndex of the meter data.
Creates a pandas.DataFrame with the same index as the meter data.
Note
For CalTRACK compliance (2.2.2.3), must set percent_hourly_coverage_per_day=0.5, cooling_balance_points=range(30,90,X), and heating_balance_points=range(30,90,X), where X is either 1, 2, or 3. For natural gas meter use data, must set fit_cdd=False.
Note
For CalTRACK compliance (2.2.3.2), for billing methods, must set percent_hourly_coverage_per_billing_period=0.9.
Note
For CalTRACK compliance (2.3.3), meter_data_index and temperature_data must both be timezone-aware and have matching timezones.
Note
For CalTRACK compliance (3.3.1.1), for billing methods, must set use_mean_daily_values=True.
Note
For CalTRACK compliance (3.3.1.2), for daily or billing methods, must set degree_day_method='daily'.
Parameters:
  meter_data_index (pandas.DatetimeIndex) – A pandas.DatetimeIndex corresponding to the index over which to compute temperature features.
  temperature_data (pandas.Series) – Series with pandas.DatetimeIndex with hourly ('H') frequency and a set of temperature values.
  cooling_balance_points (list of int or float, optional) – List of cooling balance points for which to create cooling degree days.
  heating_balance_points (list of int or float, optional) – List of heating balance points for which to create heating degree days.
  data_quality (bool, optional) – If True, compute data quality columns for temperature, i.e., temperature_not_null and temperature_null, containing for each meter value counts of not-null and null hourly temperature values.
  temperature_mean (bool, optional) – If True, compute temperature means for each meter period.
  degree_day_method (str, 'daily' or 'hourly') – The method to use in calculating degree days.
  percent_hourly_coverage_per_day (float, optional) – Percent hourly temperature coverage per day for heating and cooling degree days to not be dropped.
  use_mean_daily_values (bool, optional) – If True, meter and degree day values should be mean daily values, not totals. If False, totals will be used instead.
  tolerance (pandas.Timedelta, optional) – Do not merge more than this amount of temperature data beyond this limit.
  keep_partial_nan_rows (bool, optional) – If True, keeps data in resultant pandas.DataFrame that has missing temperature or meter data. Otherwise, these rows are overwritten entirely with numpy.nan values.
Returns: data – A dataset with the specified parameters.
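
As a rough illustration of how heating and cooling degree days relate to a balance point and to the hourly coverage threshold, the sketch below computes one day's values from 24 hourly temperatures. It assumes the 'daily' method means "degree days from the mean daily temperature" and that a day is dropped when coverage falls below the threshold. The function name daily_degree_days is hypothetical and this is not the library's code.

```python
def daily_degree_days(hourly_temps, balance_point, min_coverage=0.5):
    """Return {'hdd': ..., 'cdd': ...} for one day, or None if dropped.

    hourly_temps is a list of hourly readings where None marks a missing hour.
    """
    if not hourly_temps:
        return None
    valid = [t for t in hourly_temps if t is not None]
    # Drop the day when too few hourly readings are available.
    if len(valid) / len(hourly_temps) < min_coverage:
        return None
    mean_temp = sum(valid) / len(valid)
    return {
        "hdd": max(balance_point - mean_temp, 0.0),
        "cdd": max(mean_temp - balance_point, 0.0),
    }


cold_day = daily_degree_days([55.0] * 24, balance_point=65)
sparse_day = daily_degree_days([55.0] * 11 + [None] * 13, balance_point=65)
```

In a design matrix these values would populate columns named like hdd_65 and cdd_65, one pair per balance point.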

eemeter.compute_temperature_bin_features(temperatures, bin_endpoints)

eemeter.compute_time_features(index, hour_of_week=True, day_of_week=True, hour_of_day=True)

eemeter.estimate_hour_of_week_occupancy(data, segmentation=None, threshold=0.65)

eemeter.fit_temperature_bins(data, segmentation=None, default_bins=[30, 45, 55, 65, 75, 90], min_temperature_count=20)

eemeter.get_missing_hours_of_week_warning(hours_of_week)

eemeter.merge_features(features, keep_partial_nan_rows=False)
Input and Output Utilities

eemeter.meter_data_from_csv(filepath_or_buffer, tz=None, start_col='start', value_col='value', gzipped=False, freq=None, **kwargs)
Load meter data from a CSV file.
Default format:
    start,value
    2017-01-01T00:00:00+00:00,0.31
    2017-01-02T00:00:00+00:00,0.4
    2017-01-03T00:00:00+00:00,0.58
Parameters:
filepath_or_buffer (str or file handle) – File path or object.
tz (str, optional) – E.g., 'UTC' or 'US/Pacific'.
start_col (str, optional, default 'start') – Date period start column.
value_col (str, optional, default 'value') – Value column, can be in any unit.
gzipped (bool, optional) – Whether file is gzipped.
freq (str, optional) – If given, apply frequency to data using pandas.DataFrame.resample.
**kwargs – Extra keyword arguments to pass to pandas.read_csv, such as sep='|'.
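To make the default CSV layout concrete, here is a minimal standard-library sketch that parses the same two-column format into timezone-aware timestamps and float values. The helper read_meter_rows is hypothetical and only mirrors the start_col and value_col arguments; the real function returns a pandas object and also handles tz, gzipped, and freq.

```python
import csv
import io
from datetime import datetime

# The default layout shown above: an ISO 8601 start timestamp and a value.
SAMPLE_CSV = (
    "start,value\n"
    "2017-01-01T00:00:00+00:00,0.31\n"
    "2017-01-02T00:00:00+00:00,0.4\n"
    "2017-01-03T00:00:00+00:00,0.58\n"
)


def read_meter_rows(text, start_col="start", value_col="value"):
    """Parse CSV text into a list of (timezone-aware datetime, float) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # fromisoformat keeps the +00:00 offset, so the result is tz-aware.
        start = datetime.fromisoformat(row[start_col])
        rows.append((start, float(row[value_col])))
    return rows


meter_rows = read_meter_rows(SAMPLE_CSV)
```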

eemeter.meter_data_from_json(data, orient='list')
Load meter data from JSON.
Default format:
    [
        ['2017-01-01T00:00:00+00:00', 3.5],
        ['2017-02-01T00:00:00+00:00', 0.4],
        ['2017-03-01T00:00:00+00:00', 0.46],
    ]
Parameters: data (list) – Each list element is a row of data.
Returns: df – DataFrame with a single column ('value') and a pandas.DatetimeIndex.
Return type: pandas.DataFrame

eemeter.meter_data_to_csv(meter_data, path_or_buf)
Write meter data to CSV. See also pandas.DataFrame.to_csv.
Parameters:
meter_data (pandas.DataFrame) – Meter data DataFrame with 'value' column and pandas.DatetimeIndex.
path_or_buf (str or file handle, default None) – File path or object. If None is provided, the result is returned as a string.

eemeter.temperature_data_from_csv(filepath_or_buffer, tz=None, date_col='dt', temp_col='tempF', gzipped=False, freq=None, **kwargs)
Load temperature data from a CSV file.
Default format:
    dt,tempF
    2017-01-01T00:00:00+00:00,21
    2017-01-01T01:00:00+00:00,22.5
    2017-01-01T02:00:00+00:00,23.5
Parameters:
filepath_or_buffer (str or file handle) – File path or object.
tz (str, optional) – E.g., 'UTC' or 'US/Pacific'.
date_col (str, optional, default 'dt') – Date period start column.
temp_col (str, optional, default 'tempF') – Temperature column.
gzipped (bool, optional) – Whether file is gzipped.
freq (str, optional) – If given, apply frequency to data using pandas.Series.resample.
**kwargs – Extra keyword arguments to pass to pandas.read_csv, such as sep='|'.

eemeter.temperature_data_from_json(data, orient='list')
Load temperature data from JSON. Temperatures must be given in degrees Fahrenheit.
Default format:
    [
        ['2017-01-01T00:00:00+00:00', 3.5],
        ['2017-01-01T01:00:00+00:00', 5.4],
        ['2017-01-01T02:00:00+00:00', 7.4],
    ]
Parameters: data (list) – Each list element is a row of data.
Returns: series – Series of temperature values ('tempF') with a pandas.DatetimeIndex.
Return type: pandas.Series

eemeter.temperature_data_to_csv(temperature_data, path_or_buf)
Write temperature data to CSV. See also pandas.DataFrame.to_csv.
Parameters:
temperature_data (pandas.Series) – Temperature data series with pandas.DatetimeIndex.
path_or_buf (str or file handle, default None) – File path or object. If None is provided, the result is returned as a string.
Metrics

class eemeter.ModelMetrics(observed_input, predicted_input, num_parameters=1, autocorr_lags=1)
Contains measures of model fit and summary statistics on the input series.
Parameters:
observed_input (pandas.Series) – Series with pandas.DatetimeIndex with a set of electricity or gas meter values.
predicted_input (pandas.Series) – Series with pandas.DatetimeIndex with a set of electricity or gas meter values.
num_parameters (int, optional) – The number of parameters (excluding the intercept) used in the regression from which the predictions were derived.
autocorr_lags (int, optional) – The number of lags to use when calculating the autocorrelation of the residuals.

merged_length
int – The length of the dataframe resulting from the inner join of the observed_input series and the predicted_input series.

r_squared_adj
float – The r-squared of the predicted_input series relative to the observed_input series, adjusted by the number of parameters in the model.

cvrmse
float – The coefficient of variation (root-mean-squared error) of the predicted_input series relative to the observed_input series.

cvrmse_adj
float – The coefficient of variation (root-mean-squared error) of the predicted_input series relative to the observed_input series, adjusted by the number of parameters in the model.

mape
float – The mean absolute percent error of the predicted_input series relative to the observed_input series.

mape_no_zeros
float – The mean absolute percent error of the predicted_input series relative to the observed_input series, with all time periods dropped where the observed_input series was not greater than zero.

num_meter_zeros
int – The number of time periods for which the observed_input series was not greater than zero.

nmae
float – The normalized mean absolute error of the predicted_input series relative to the observed_input series.

nmbe
float – The normalized mean bias error of the predicted_input series relative to the observed_input series.

autocorr_resid
float – The autocorrelation of the residuals (where the residuals equal the predicted_input series minus the observed_input series), measured using a number of lags equal to autocorr_lags.

json()
Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with json.dumps.
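The attribute descriptions above can be read against the following pure-Python sketch. It uses common textbook definitions (errors normalized by the mean of the observed series, and n - num_parameters - 1 degrees of freedom for the adjusted variant); these formulas and the function name fit_metrics_sketch are assumptions for illustration and may differ in detail from what ModelMetrics computes.

```python
import math


def fit_metrics_sketch(observed, predicted, num_parameters=1):
    """Compute a few fit metrics on two equal-length lists of numbers."""
    n = len(observed)
    # Residuals follow the convention stated above: predicted minus observed.
    resid = [p - o for o, p in zip(observed, predicted)]
    observed_mean = sum(observed) / n
    squared_error = sum(r * r for r in resid)

    rmse = math.sqrt(squared_error / n)
    # Adjusted variant: divide by degrees of freedom instead of n (assumed form).
    rmse_adj = math.sqrt(squared_error / (n - num_parameters - 1))

    # Periods where the observed value is not greater than zero are dropped
    # before computing the percent error, mirroring mape_no_zeros.
    positive = [(o, r) for o, r in zip(observed, resid) if o > 0]

    return {
        "cvrmse": rmse / observed_mean,
        "cvrmse_adj": rmse_adj / observed_mean,
        "nmae": sum(abs(r) for r in resid) / n / observed_mean,
        "nmbe": sum(resid) / n / observed_mean,
        "mape_no_zeros": sum(abs(r / o) for o, r in positive) / len(positive),
        "num_meter_zeros": n - len(positive),
    }


metrics = fit_metrics_sketch([10, 20, 30, 0], [12, 18, 33, 1], num_parameters=1)
```

The zero reading in the observed series is why mape_no_zeros exists: a plain percent error would divide by zero for that period.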
Sample Data

eemeter.samples()
Load a list of sample data identifiers.
Returns: samples – List of sample identifiers for use with eemeter.load_sample.
Return type: list of str

eemeter.load_sample(sample)
Load meter data, temperature data, and metadata associated with a particular sample identifier. Note: samples are simulated, not real, data.
Parameters: sample (str) – Identifier of sample. Complete list can be obtained with eemeter.samples.
Returns: meter_data, temperature_data, metadata – Meter data, temperature data, and metadata for this sample identifier.
Return type: tuple of pandas.DataFrame, pandas.Series, and dict
Segmentation

eemeter.iterate_segmented_dataset(data, segmentation=None, feature_processor=None, feature_processor_kwargs=None, feature_processor_segment_name_mapping=None)

eemeter.segment_time_series(index, segment_type='single', drop_zero_weight_segments=False)

class eemeter.SegmentModel(segment_name, model, formula, model_params, warnings=None)

class eemeter.SegmentedModel(segment_models, prediction_segment_type, prediction_segment_name_mapping=None, prediction_feature_processor=None, prediction_feature_processor_kwargs=None)
Transformation utilities

eemeter.as_freq(meter_data_series, freq, atomic_freq='1 Min')
Resample meter data to a different frequency.
This method can be used to upsample or downsample meter data. To do so, it assumes that usage is constant (evenly averaged) over each given period. For instance, to convert billing-period data to daily data, this method first upsamples to the atomic frequency (1 minute frequency, by default), “spreading” usage evenly across all minutes in each period. Then it downsamples to the target frequency (daily, in this example) and returns that result.
Caveats:
 This method gives a fair amount of flexibility in resampling as long as you are OK with the assumption that usage is constant over the period (this assumption generally breaks down in observed data when periods are long, so this caveat should not be taken lightly).
 This method should be used only for recorded data (e.g., meter data), not for sampled data (e.g., temperature data), as sampled data cannot be “spread” in the same way.
Parameters:
meter_data_series (pandas.Series) – Meter data to resample. Should have a pandas.DatetimeIndex.
freq (str) – The frequency to resample to. This should be given in a form recognized by the pandas.Series.resample method.
atomic_freq (str, optional) – The “atomic” frequency of the intermediate data form. This can be set to a coarser atomic frequency to improve speed or reduce memory use.
Returns: resampled_meter_data – Meter data resampled to the given frequency.
Return type:
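The "spreading" idea described for as_freq can be sketched in plain Python. This illustration assumes each period is given as an explicit (start, end, total) triple whose length is a whole number of atomic steps; the function name spread_to_daily is hypothetical and the code is not eemeter's implementation, which operates on a pandas.Series.

```python
from datetime import datetime, timedelta, timezone


def spread_to_daily(periods, atomic=timedelta(minutes=1)):
    """Spread each period's total evenly over atomic steps, then sum by day.

    periods: iterable of (start, end, total_usage) with tz-aware datetimes.
    Returns a dict mapping each calendar date to its share of usage.
    """
    daily = {}
    for start, end, total in periods:
        n_steps = (end - start) // atomic  # whole atomic steps in the period
        per_step = total / n_steps         # constant-usage assumption
        moment = start
        for _ in range(n_steps):
            day = moment.date()
            daily[day] = daily.get(day, 0.0) + per_step
            moment += atomic
    return dict(sorted(daily.items()))


# One ten-day billing period with 240 units of usage.
billing = [
    (
        datetime(2017, 1, 1, tzinfo=timezone.utc),
        datetime(2017, 1, 11, tzinfo=timezone.utc),
        240.0,
    )
]
daily_usage = spread_to_daily(billing)
```

Each day receives roughly 24 units and the total is preserved, which is the property that makes this approach suitable for recorded totals and unsuitable for sampled readings such as temperature.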

eemeter.day_counts(index)
Days between DatetimeIndex values as a pandas.Series.
Parameters: index (pandas.DatetimeIndex) – The index for which to get day counts.
Returns: day_counts – A pandas.Series with counts of days between periods. Counts are given on start dates of periods.
Return type: pandas.Series
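As a rough illustration of "counts given on start dates", the sketch below pairs each timestamp with the number of days until the next one. The name day_counts_sketch is hypothetical, and how the final index value is treated here (it simply gets no entry) is an assumption rather than documented behavior.

```python
from datetime import datetime, timezone


def day_counts_sketch(index):
    """Map each period start to the number of days until the next timestamp."""
    counts = {}
    for start, following in zip(index, index[1:]):
        counts[start] = (following - start).total_seconds() / 86400
    return counts


billing_starts = [
    datetime(2017, 1, 1, tzinfo=timezone.utc),
    datetime(2017, 2, 1, tzinfo=timezone.utc),
    datetime(2017, 3, 1, tzinfo=timezone.utc),
]
counts = day_counts_sketch(billing_starts)
```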

eemeter.get_baseline_data(data, start=None, end=None, max_days=365)
Filter down to baseline period data.
Note
For compliance with CalTRACK, set max_days=365 (section 2.2.1.1).
Parameters:
data (pandas.DataFrame or pandas.Series) – The data to filter to baseline data. This data will be filtered down to an acceptable baseline period according to the dates passed as start and end, or the maximum period specified with max_days.
start (datetime.datetime) – A timezone-aware datetime that represents the earliest allowable start date for the baseline data. The stricter of this or max_days is used to determine the earliest allowable baseline period date.
end (datetime.datetime) – A timezone-aware datetime that represents the latest allowable end date for the baseline data, i.e., the latest date for which data is available before the intervention begins.
max_days (int) – The maximum length of the period. Ignored if end is not set. The stricter of this or start is used to determine the earliest allowable baseline period date.
Returns: baseline_data, warnings – Data for only the specified baseline period and any associated warnings.
Return type: tuple of (pandas.DataFrame or pandas.Series, list of eemeter.EEMeterWarning)
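The "stricter of start or max_days" rule can be read as taking the later of two candidate start dates. The following sketch shows only that date arithmetic; baseline_bounds is a hypothetical helper, and the real function additionally filters the data and collects warnings.

```python
from datetime import datetime, timedelta, timezone


def baseline_bounds(start=None, end=None, max_days=365):
    """Return the (earliest, latest) allowable baseline dates.

    max_days only applies when end is set, matching the description above.
    """
    if end is None or max_days is None:
        return start, end
    limit = end - timedelta(days=max_days)
    # The stricter constraint is whichever start date is later.
    earliest = limit if start is None else max(start, limit)
    return earliest, end


intervention = datetime(2018, 1, 1, tzinfo=timezone.utc)
long_history = datetime(2016, 1, 1, tzinfo=timezone.utc)
short_history = datetime(2017, 6, 1, tzinfo=timezone.utc)

bounds_capped = baseline_bounds(start=long_history, end=intervention)
bounds_by_start = baseline_bounds(start=short_history, end=intervention)
```

With two years of history the 365-day cap wins and the baseline begins on 2017-01-01; with only seven months of history the explicit start wins instead.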

eemeter.get_reporting_data(data, start=None, end=None, max_days=365)
Filter down to reporting period data.
Parameters:
data (pandas.DataFrame or pandas.Series) – The data to filter to reporting data. This data will be filtered down to an acceptable reporting period according to the dates passed as start and end, or the maximum period specified with max_days.
start (datetime.datetime) – A timezone-aware datetime that represents the earliest allowable start date for the reporting data, i.e., the earliest date for which data is available after the intervention begins.
end (datetime.datetime) – A timezone-aware datetime that represents the latest allowable end date for the reporting data. The stricter of this or max_days is used to determine the latest allowable reporting period date.
max_days (int) – The maximum length of the period. Ignored if start is not set. The stricter of this or end is used to determine the latest allowable reporting period date.
Returns: reporting_data, warnings – Data for only the specified reporting period and any associated warnings.
Return type: tuple of (pandas.DataFrame or pandas.Series, list of eemeter.EEMeterWarning)

eemeter.remove_duplicates(df_or_series)
Remove duplicate rows or values by keeping the first of each duplicate.
Parameters: df_or_series (pandas.DataFrame or pandas.Series) – Pandas object from which to drop duplicate index values.
Returns: deduplicated – The deduplicated pandas object.
Return type: pandas.DataFrame or pandas.Series
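The keep-first behavior can be shown on plain (index, value) pairs. This is a standard-library analogue with a hypothetical name, keep_first_by_index, not the pandas-based function itself.

```python
def keep_first_by_index(pairs):
    """Drop later entries whose index value was already seen, preserving order."""
    seen = set()
    deduplicated = []
    for index_value, value in pairs:
        if index_value not in seen:
            seen.add(index_value)
            deduplicated.append((index_value, value))
    return deduplicated


readings = [
    ("2017-01-01", 1.0),
    ("2017-01-02", 2.0),
    ("2017-01-01", 9.9),  # duplicate index value, dropped
]
unique_readings = keep_first_by_index(readings)
```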

eemeter.overwrite_partial_rows_with_nan(df)
Visualization

eemeter.plot_time_series(meter_data, temperature_data, **kwargs)
Plot meter and temperature data in dual-axes time series.
Parameters:
meter_data (pandas.DataFrame) – A pandas.DatetimeIndex-indexed DataFrame of meter data with the column value.
temperature_data (pandas.Series) – A pandas.DatetimeIndex-indexed Series of temperature data.
**kwargs – Arbitrary keyword arguments to pass to plt.subplots.
Returns: axes – Tuple of (ax_meter_data, ax_temperature_data).
Return type: tuple

eemeter.plot_energy_signature(meter_data, temperature_data, temp_col=None, ax=None, title=None, figsize=None, **kwargs)
Plot meter and temperature data in energy signature.
Parameters:
meter_data (pandas.DataFrame) – A pandas.DatetimeIndex-indexed DataFrame of meter data with the column value.
temperature_data (pandas.Series) – A pandas.DatetimeIndex-indexed Series of temperature data.
temp_col (str, default 'temperature_mean') – The name of the temperature column.
ax (matplotlib.axes.Axes) – The axis on which to plot.
title (str, optional) – Chart title.
figsize (tuple, optional) – (width, height) of chart.
**kwargs – Arbitrary keyword arguments to pass to matplotlib.axes.Axes.scatter.
Returns: ax – Matplotlib axes.
Return type: matplotlib.axes.Axes
Warnings

class eemeter.EEMeterWarning(qualified_name, description, data)
An object representing a warning and data associated with it.

data
dict – Data that reproducibly shows why the warning was issued. Data should be JSON serializable.

json()
Return a JSON-serializable representation of this result.
The output of this function can be converted to a serialized string with json.dumps.
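To show what "JSON-serializable" means in practice, here is a stand-in class with the same three constructor arguments. The class WarningSketch, the key names returned by its json() method, and the example warning name are all assumptions for illustration; they are not taken from eemeter's source.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WarningSketch:
    """Hypothetical stand-in for a warning object with attached data."""

    qualified_name: str
    description: str
    data: dict = field(default_factory=dict)

    def json(self):
        # Only plain dicts, strings, and numbers, so json.dumps can handle it.
        return {
            "qualified_name": self.qualified_name,
            "description": self.description,
            "data": self.data,
        }


example_warning = WarningSketch(
    qualified_name="example.hypothetical_warning",
    description="Illustrative warning with reproducible data attached.",
    data={"n_days": 400, "max_days": 365},
)
serialized = json.dumps(example_warning.json())
```

Keeping data limited to JSON-friendly types is what allows the round trip through json.dumps and json.loads without custom encoders.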
