Project Overview
Objective
Your assignment is to clean data obtained from an electrical meter and calculate the total of a utility bill.
Tariff Description
There are three components for this specified tariff: demand charges, energy charges, and other charges. Demand charges are dependent on the highest energy usage in any 15 min interval (12:00 - 12:15, 12:15 - 12:30, 12:30 - 12:45, etc.). This value is in kW (power). Energy charges are based on energy usage during different time of use periods defined below, using the unit kWh. Other charges include a customer charge, which is charged for every day in the billing period, and a tax on total electricity consumed.
Data
Stored in data.csv is data received from an electrical meter. Columns: epochMilliseconds = UTC time in milliseconds. cumulativeEnergy = cumulative energy consumption in kWh Notes: Convert times into Pacific time. cumulativeEnergy may need to undergo cleaning processes.
###Tariff Details
Demand Charges
$14.31 /kW
Energy Charges
Peak : $0.19472 /kWh
Hours : 16,17,18,19,20
Months : 1,2,3,4,5,6,7,8,9,10,11,12
Off Peak : $0.15924 /kWh Hours : 9,10,11,12,13 Months : 3,4,5
Super Off Peak : $0.1229 /kWh
Hours : all other energy usage not included in peak and off peak usage
Other Charges
Customer Charge : $5.47664 /day Tax : $0.003 /kWh
Project Report
Data Analysis Answer
Jiashu Miao 8/22/2021
DATA PREPROCESSING AND EXPLORATION
Import Packages and Change Setting
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import numpy as np
import pandas as pd
import seaborn as sns
import datetime
Load and Inspect the Data
energy = pd.read_csv('data.csv')
energy['date'] = energy['epochMilliseconds']
energy.ndim
energy.shape
energy.size
energy.head(10)
energy.describe()
2
(73756, 3)
221268
epochMilliseconds | cumulativeEnergy | date | |
---|---|---|---|
0 | 1612166412948 | 162.05 | 1612166412948 |
1 | 1612166444265 | 162.05 | 1612166444265 |
2 | 1612166472972 | 162.05 | 1612166472972 |
3 | 1612166502960 | 162.05 | 1612166502960 |
4 | 1612166534365 | 162.05 | 1612166534365 |
5 | 1612166564315 | 162.05 | 1612166564315 |
6 | 1612166592978 | 162.05 | 1612166592978 |
7 | 1612166624265 | 162.05 | 1612166624265 |
8 | 1612166654255 | 162.05 | 1612166654255 |
9 | 1612166682983 | 162.05 | 1612166682983 |
epochMilliseconds | cumulativeEnergy | date | |
---|---|---|---|
count | 7.375600e+04 | 73756.000000 | 7.375600e+04 |
mean | 1.614570e+12 | 132.536504 | 1.614570e+12 |
std | 1.355367e+09 | 48.314213 | 1.355367e+09 |
min | 1.612166e+12 | 0.150000 | 1.612166e+12 |
25% | 1.613307e+12 | 120.650000 | 1.613307e+12 |
50% | 1.614599e+12 | 158.850000 | 1.614599e+12 |
75% | 1.615731e+12 | 161.850000 | 1.615731e+12 |
max | 1.617260e+12 | 190.450000 | 1.617260e+12 |
Check Special Values, Missings, Nulls
# check null
energy.isnull().sum()
# check na
energy.isna().sum()
# check 0 and None
energy.isin([0,None]).sum()
epochMilliseconds 0
cumulativeEnergy 0
date 0
dtype: int64
epochMilliseconds 0
cumulativeEnergy 0
date 0
dtype: int64
epochMilliseconds 0
cumulativeEnergy 0
date 0
dtype: int64
Data Exploration on distribution, frequency, etc.
sns.distplot(energy.cumulativeEnergy, hist=True, kde=False,
color = 'blue',
hist_kws={'edgecolor':'black'})
<AxesSubplot:xlabel='cumulativeEnergy'>
energy[['cumulativeEnergy']].value_counts().iloc[0:10]
cumulativeEnergy
158.85 12874
161.85 9206
120.65 6385
164.85 6332
159.05 6295
151.65 1211
10.85 810
156.65 793
177.65 730
155.05 694
dtype: int64
Data Operation and Manipulation
Convert epoch time to UTC first of all
energy['newdate'] = energy['epochMilliseconds']/1000
for i in range(0,len(energy)):
energy['newdate'][i] = datetime.datetime.utcfromtimestamp(energy['newdate'][i]).strftime('%Y-%m-%d %H:%M:%S')
/Users/MichaelMiao/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
This is separate from the ipykernel package so we can avoid doing imports until
/Users/MichaelMiao/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py:670: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value)
energy.head(3)
energy.tail(3)
epochMilliseconds | cumulativeEnergy | date | newdate | |
---|---|---|---|---|
0 | 1612166412948 | 162.05 | 1612166412948 | 2021-02-01 08:00:12 |
1 | 1612166444265 | 162.05 | 1612166444265 | 2021-02-01 08:00:44 |
2 | 1612166472972 | 162.05 | 1612166472972 | 2021-02-01 08:01:12 |
epochMilliseconds | cumulativeEnergy | date | newdate | |
---|---|---|---|---|
73753 | 1617260336317 | 70.45 | 1617260336317 | 2021-04-01 06:58:56 |
73754 | 1617260366397 | 70.45 | 1617260366397 | 2021-04-01 06:59:26 |
73755 | 1617260396158 | 70.45 | 1617260396158 | 2021-04-01 06:59:56 |
Convert Time Zone to Pacific
#Modules needed
import datetime
import pytz
import pytz
energy.index = pd.to_datetime(energy.newdate)
pacific = pytz.timezone('US/Pacific')
energy.index = energy.index.tz_localize(pytz.utc).tz_convert(pacific)
Find Total Energy Consumption
- find max cumulative energy if there is energy use in that day
energy_max_day=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].between_time('8:00','23:59').resample("1440T").max().fillna(0)
energy_min_day=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].resample("1440T").min().fillna(0)
energy_new_day = (energy_max_day - energy_min_day)
energy_new_day = energy_new_day.where(energy_new_day['cumulativeEnergy']==0,energy_max_day)
energy_new_day = energy_new_day.where(energy_max_day > energy_min_day,0)
pd.set_option('display.max_rows', None)
energy_new_day.sum()
cumulativeEnergy 2827.95
dtype: float64
ee = pd.DataFrame()
ee[['max']]= energy_max_day
ee[['min']]= energy_min_day
ee[['day']]= energy_new_day
ee
pd.set_option('display.max_rows', 10)
max | min | day | |
---|---|---|---|
2021-02-15 00:00:00-08:00 | 161.85 | 161.85 | 0.00 |
2021-02-16 00:00:00-08:00 | 171.25 | 0.65 | 171.25 |
2021-02-17 00:00:00-08:00 | 150.65 | 0.15 | 150.65 |
2021-02-18 00:00:00-08:00 | 145.25 | 0.45 | 145.25 |
2021-02-19 00:00:00-08:00 | 0.00 | 145.25 | 0.00 |
2021-02-20 00:00:00-08:00 | 0.00 | 0.00 | 0.00 |
2021-02-21 00:00:00-08:00 | 0.00 | 0.00 | 0.00 |
2021-02-22 00:00:00-08:00 | 183.25 | 0.35 | 183.25 |
2021-02-23 00:00:00-08:00 | 125.85 | 0.45 | 125.85 |
2021-02-24 00:00:00-08:00 | 140.45 | 0.65 | 140.45 |
2021-02-25 00:00:00-08:00 | 150.85 | 0.45 | 150.85 |
2021-02-26 00:00:00-08:00 | 164.85 | 0.65 | 164.85 |
2021-02-27 00:00:00-08:00 | 164.85 | 164.85 | 0.00 |
2021-02-28 00:00:00-08:00 | 164.85 | 164.85 | 0.00 |
2021-03-01 00:00:00-08:00 | 135.25 | 0.25 | 135.25 |
2021-03-02 00:00:00-08:00 | 136.45 | 0.35 | 136.45 |
2021-03-03 00:00:00-08:00 | 144.45 | 0.45 | 144.45 |
2021-03-04 00:00:00-08:00 | 155.25 | 0.45 | 155.25 |
2021-03-05 00:00:00-08:00 | 120.65 | 0.65 | 120.65 |
2021-03-06 00:00:00-08:00 | 120.65 | 120.65 | 0.00 |
2021-03-07 00:00:00-08:00 | 120.65 | 120.65 | 0.00 |
2021-03-08 00:00:00-08:00 | 166.45 | 0.65 | 166.45 |
2021-03-09 00:00:00-08:00 | 150.05 | 0.15 | 150.05 |
2021-03-10 00:00:00-08:00 | 153.25 | 0.45 | 153.25 |
2021-03-11 00:00:00-08:00 | 123.25 | 0.15 | 123.25 |
2021-03-12 00:00:00-08:00 | 158.85 | 0.45 | 158.85 |
2021-03-13 00:00:00-08:00 | 158.85 | 158.85 | 0.00 |
2021-03-14 00:00:00-08:00 | 158.85 | 158.85 | 0.00 |
2021-03-15 01:00:00-07:00 | 151.65 | 0.25 | 151.65 |
Find total peak hour usage
energy_max_peak=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].between_time('16:00','20:59').resample("1440T").max().fillna(0)
energy_min_peak=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].between_time('16:00','20:59').resample("1440T").min().fillna(0)
energy_peak = (energy_max_peak - energy_min_peak)
pd.set_option('display.max_rows', 10)
energy_peak.sum()
cumulativeEnergy 19.5
dtype: float64
Find total offpeak usage
energy_max_offpeak=energy[['cumulativeEnergy']].loc['2021-03-01':'2021-03-15'].between_time('09:00','13:59').resample("1440T").max().fillna(0)
energy_min_offpeak=energy[['cumulativeEnergy']].loc['2021-03-01':'2021-03-15'].between_time('09:00','13:59').resample("1440T").min().fillna(0)
energy_offpeak = (energy_max_offpeak - energy_min_offpeak)
pd.set_option('display.max_rows', 10)
energy_offpeak.sum()
cumulativeEnergy 0.0
dtype: float64
Find the other super offpeak usage
energy_max_spoffpeak=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-02-28'].between_time('09:00','13:59').resample("1440T").max().fillna(0)
energy_min_spoffpeak=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-02-28'].between_time('09:00','13:59').resample("1440T").min().fillna(0)
energy_spoffpeak = (energy_max_spoffpeak - energy_min_spoffpeak)
pd.set_option('display.max_rows', 10)
energy_spoffpeak.sum()
cumulativeEnergy 0.0
dtype: float64
energy_max_spoffpeak2=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].between_time('00:00','08:59').resample("1440T").max().fillna(0)
energy_min_spoffpeak2=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].between_time('00:00','08:59').resample("1440T").min().fillna(0)
energy_spoffpeak2 = (energy_max_spoffpeak2 - energy_min_spoffpeak2)
pd.set_option('display.max_rows', 10)
energy_spoffpeak2.sum()
cumulativeEnergy 1.2
dtype: float64
energy_max_spoffpeak3=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].between_time('21:00','23:59').resample("1440T").max().fillna(0)
energy_min_spoffpeak3=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].between_time('21:00','23:59').resample("1440T").min().fillna(0)
energy_spoffpeak3 = (energy_max_spoffpeak3 - energy_min_spoffpeak3)
pd.set_option('display.max_rows', 10)
energy_spoffpeak3.sum()
cumulativeEnergy 2805.8
dtype: float64
energy_max_spoffpeak4=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].between_time('14:00','15:59').resample("1440T").max().fillna(0)
energy_min_spoffpeak4=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].between_time('14:00','15:59').resample("1440T").min().fillna(0)
energy_spoffpeak4 = (energy_max_spoffpeak4 - energy_min_spoffpeak4)
pd.set_option('display.max_rows', 10)
energy_spoffpeak4.sum()
cumulativeEnergy 0.0
dtype: float64
Find the number of days
energy_count_day=energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].resample("1440T").max().fillna(0).count()
energy_count_day
cumulativeEnergy 29
dtype: int64
Find the highest usage for each 15-min
- highest usage in each 15 min each day as the demand
- count the total demand
demand = energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].resample("15T").max().fillna(0) - energy[['cumulativeEnergy']].loc['2021-02-15':'2021-03-15'].resample("15T").min().fillna(0)
total_demand= demand.groupby([demand.index.date]).max().sum()
total_demand
cumulativeEnergy 444.3
dtype: float64
Calculate the Results
Find total cost
total_cost =total_demand * 14.31 + energy_peak.sum() * 0.19472 + (energy_spoffpeak2.sum() + energy_spoffpeak3.sum()) * 0.1229 + energy_count_day * 5.47664 + energy_new_day.sum() * 0.003
print("The total cost for the energy would be: ", "$",round(total_cost[0],2),"","\n")
The total cost for the energy would be: $ 6874.02
edited by Jiashu miao edited by Jiashu Miao :) 08/22/2021