Prophet Time Series
Learning Goals
Understanding Facebook Prophet for time series prediction
PART 1: Chicago Crime Rate
Description
Explore questions such as what time of day thieves are most often caught and when crime peaks, then use Prophet to forecast future crime counts.
Observing the dataset
Dataset contains the following columns:
- ID: Unique identifier for the record.
- Case Number: The Chicago Police Department RD Number (Records Division Number), which is unique to the incident.
- Date: Date when the incident occurred.
- Block: Address where the incident occurred.
- IUCR: The Illinois Uniform Crime Reporting code.
- Primary Type: The primary description of the IUCR code.
- Description: The secondary description of the IUCR code, a subcategory of the primary description.
- Location Description: Description of the location where the incident occurred.
- Arrest: Indicates whether an arrest was made.
- Domestic: Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.
- Beat: Indicates the beat where the incident occurred. A beat is the smallest police geographic area – each beat has a dedicated police beat car.
- District: Indicates the police district where the incident occurred.
- Ward: The ward (City Council district) where the incident occurred.
- Community Area: Indicates the community area where the incident occurred. Chicago has 77 community areas.
- FBI Code: Indicates the crime classification as outlined in the FBI’s National Incident-Based Reporting System (NIBRS).
- X Coordinate: The x coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection.
- Y Coordinate: The y coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection.
- Year: Year the incident occurred.
- Updated On: Date and time the record was last updated.
- Latitude: The latitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
- Longitude: The longitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
- Location: The location where the incident occurred in a format that allows for creation of maps and other geographic operations on this data portal. This location is shifted from the actual location for partial redaction but falls on the same block.
Datasource: https://www.kaggle.com/currie32/crimes-in-chicago
Loading the dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
import seaborn as sns
from fbprophet import Prophet # in Prophet 1.0 and later the package is named prophet: from prophet import Prophet
# training and testing datasets
chicago_df_1 = pd.read_csv('Chicago_Crimes_2005_to_2007.csv', on_bad_lines='skip') # on_bad_lines='skip' ignores malformed lines (pandas 1.3+; older versions used error_bad_lines=False)
chicago_df_2 = pd.read_csv('Chicago_Crimes_2008_to_2011.csv', on_bad_lines='skip')
chicago_df_3 = pd.read_csv('Chicago_Crimes_2012_to_2017.csv', on_bad_lines='skip')
chicago_df = pd.concat([chicago_df_1, chicago_df_2, chicago_df_3], ignore_index=False, axis=0) # concatenate the DataFrames row-wise
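To make the loading step concrete, here is a minimal sketch on two tiny in-memory CSVs. The CSV contents are hypothetical stand-ins for the real files. It assumes pandas 1.3 or later, where malformed rows are skipped with `on_bad_lines='skip'`.

```python
import io
import pandas as pd

# Hypothetical in-memory CSVs standing in for the yearly crime files.
# The second data row of csv_a has one field too many, so it is malformed.
csv_a = "ID,Primary Type\n1,THEFT\n2,BATTERY,extra_field\n3,ASSAULT\n"
csv_b = "ID,Primary Type\n4,THEFT\n"

# on_bad_lines='skip' drops rows with too many fields instead of raising.
df_a = pd.read_csv(io.StringIO(csv_a), on_bad_lines="skip")
df_b = pd.read_csv(io.StringIO(csv_b), on_bad_lines="skip")

# ignore_index=True renumbers the rows 0..n-1; with ignore_index=False
# each file keeps its own row labels, so labels can repeat.
combined = pd.concat([df_a, df_b], ignore_index=True)
print(combined)
```

Because the post replaces the index with the Date column later on, repeated row labels from `ignore_index=False` do no harm there.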
Organizing the dataset
# Drop unnecessary columns
chicago_df.drop(['Unnamed: 0', 'Case Number', 'IUCR', 'X Coordinate', 'Y Coordinate','Updated On','Year', 'FBI Code', 'Beat','Ward','Community Area', 'Location', 'District', 'Latitude' , 'Longitude'], inplace=True, axis=1)
inplace=True: modifies chicago_df itself instead of returning a new DataFrame, so the unnecessary columns are actually removed from it.
axis=1: drops columns rather than rows.
The code below is the preprocessing needed for the time series analysis in this project.
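The effect of `inplace` and `axis` can be checked on a small frame. This is only an illustrative sketch; the toy values are made up.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2], "IUCR": ["0486", "0820"], "Arrest": [True, False]})

# axis=1 targets columns. Without inplace=True, drop returns a new DataFrame
# and leaves the original untouched.
trimmed = df.drop(["IUCR"], axis=1)
print(list(df.columns))       # still contains 'IUCR'
print(list(trimmed.columns))  # 'IUCR' is gone

# With inplace=True the original DataFrame is modified and None is returned.
df.drop(["IUCR"], axis=1, inplace=True)
print(list(df.columns))
```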
# Parse the Date strings into datetime values
chicago_df.Date = pd.to_datetime(chicago_df.Date, format='%m/%d/%Y %I:%M:%S %p')
# Use Date as the index
chicago_df.index = pd.DatetimeIndex(chicago_df.Date)
DatetimeIndex: an index for time series data whose rows are timestamps, each recorded at a specific moment.
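As a quick check of the format string, the sketch below parses two made-up timestamps written in the same 12-hour style and uses them as the index.

```python
import pandas as pd

# Hypothetical sample values in the same month/day/year 12-hour format.
raw = pd.DataFrame({"Date": ["04/02/2006 01:00:00 PM", "01/15/2005 12:30:00 AM"]})

# %I is the 12-hour clock and %p is the AM/PM marker.
raw["Date"] = pd.to_datetime(raw["Date"], format="%m/%d/%Y %I:%M:%S %p")
raw.index = pd.DatetimeIndex(raw["Date"])

print(raw.index[0])       # 1 PM becomes hour 13
print(raw.index[1].hour)  # 12:30 AM becomes hour 0
```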
Data Visualization
plt.figure(figsize=(10,10))
sns.heatmap(chicago_df.isnull(), cbar = False, cmap = 'YlGnBu')
# Which crime types occurred most often (top 15)
plt.figure(figsize = (15, 10))
sns.countplot(y= 'Primary Type', data = chicago_df, order = chicago_df['Primary Type'].value_counts().iloc[:15].index)
'MOTOR VEHICLE THEFT': roughly 200,000 vehicles were stolen.
# Which locations saw the most crime (top 15)
plt.figure(figsize = (15, 10))
sns.countplot(y= 'Location Description', data = chicago_df, order = chicago_df['Location Description'].value_counts().iloc[:15].index)
The plot shows that the largest number of incidents occurred on the 'street'.
# How many crimes occurred in each year
plt.plot(chicago_df.resample('Y').size()) # resample by year ('Y') and count the rows (incidents) that fall in each year
plt.title('Crimes Count Per Year')
plt.xlabel('Years')
plt.ylabel('Number of Crimes')
Date
2005-12-31 455811
2006-12-31 794684
2007-12-31 621848
2008-12-31 852053
2009-12-31 783900
2010-12-31 700691
2011-12-31 352066
2012-12-31 335670
2013-12-31 306703
2014-12-31 274527
2015-12-31 262995
2016-12-31 265462
2017-12-31 11357
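The counting behaviour of `resample(...).size()` is easier to see on a few rows. The sketch below uses made-up timestamps and the `'MS'` (month start) alias, an assumption chosen because recent pandas versions spell the month-end alias `'ME'` rather than `'M'`. The counts are the same either way; only the bin labels differ.

```python
import pandas as pd

# Four hypothetical incidents: two in January, one in February, one in April.
stamps = pd.to_datetime([
    "2005-01-03 10:00", "2005-01-20 22:15",
    "2005-02-07 08:30", "2005-04-01 00:05",
])
toy = pd.DataFrame({"Primary Type": ["THEFT", "BATTERY", "THEFT", "ASSAULT"]},
                   index=stamps)

# Each bin's value is the number of rows whose timestamp falls in that month.
# Months with no rows (March here) appear with a count of 0.
monthly = toy.resample("MS").size()
print(monthly)
```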
# How many crimes occurred in each month
plt.plot(chicago_df.resample('M').size())
plt.title('Crimes Count Per Month')
plt.xlabel('Months')
plt.ylabel('Number of Crimes')
# How many crimes occurred in each quarter
plt.plot(chicago_df.resample('Q').size())
plt.title('Crimes Count Per Quarter')
plt.xlabel('Quarters')
plt.ylabel('Number of Crimes')
Data Preprocessing
chicago_prophet = chicago_df.resample('M').size().reset_index() # reset_index turns the Date index back into an ordinary column
chicago_prophet.columns = ['Date', 'Crime Count']
chicago_prophet_df = pd.DataFrame(chicago_prophet)
chicago_prophet_df_final = chicago_prophet_df.rename(columns={'Date':'ds', 'Crime Count':'y'})
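Prophet expects a two-column frame with a date column named `ds` and a value column named `y`. The preprocessing above can be condensed as in the following sketch, which uses made-up incidents and the `'MS'` alias as an assumption (see the pandas documentation for the alias your version expects).

```python
import pandas as pd

# Hypothetical incidents indexed by timestamp, mirroring chicago_df's layout.
stamps = pd.DatetimeIndex(
    pd.to_datetime(["2005-01-03", "2005-01-20", "2005-02-07", "2005-03-11"]),
    name="Date",
)
events = pd.DataFrame({"Primary Type": ["THEFT", "BATTERY", "THEFT", "ASSAULT"]},
                      index=stamps)

# Count incidents per month, move the dates into a column, and name the
# columns the way Prophet requires.
counts = events.resample("MS").size().reset_index()
counts.columns = ["ds", "y"]
print(counts)
```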
Prediction
m = Prophet() # model used to forecast future crime counts
m.fit(chicago_prophet_df_final)
# Forecasting into the future
future = m.make_future_dataframe(periods=365) # extend the dates 365 days (one year) past the end of the data for Prophet to predict
forecast = m.predict(future)
| | ds | trend | yhat_lower | yhat_upper | trend_lower | trend_upper | additive_terms | additive_terms_lower | additive_terms_upper | yearly | yearly_lower | yearly_upper | multiplicative_terms | multiplicative_terms_lower | multiplicative_terms_upper | yhat |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2005-01-31 | 60454.550849 | 38579.873129 | 72458.344880 | 60454.550849 | 60454.550849 | -4762.896867 | -4762.896867 | -4762.896867 | -4762.896867 | -4762.896867 | -4762.896867 | 0.0 | 0.0 | 0.0 | 55691.653982 |
1 | 2005-02-28 | 60322.147047 | 34221.041714 | 67421.454566 | 60322.147047 | 60322.147047 | -9500.949898 | -9500.949898 | -9500.949898 | -9500.949898 | -9500.949898 | -9500.949898 | 0.0 | 0.0 | 0.0 | 50821.197149 |
2 | 2005-03-31 | 60175.557124 | 42823.370306 | 75698.921616 | 60175.557124 | 60175.557124 | -1224.296867 | -1224.296867 | -1224.296867 | -1224.296867 | -1224.296867 | -1224.296867 | 0.0 | 0.0 | 0.0 | 58951.260257 |
3 | 2005-04-30 | 60033.695908 | 44555.300811 | 79384.772219 | 60033.695908 | 60033.695908 | 1182.976012 | 1182.976012 | 1182.976012 | 1182.976012 | 1182.976012 | 1182.976012 | 0.0 | 0.0 | 0.0 | 61216.671919 |
4 | 2005-05-31 | 59887.105985 | 49410.229024 | 81613.777868 | 59887.105985 | 59887.105985 | 5498.632207 | 5498.632207 | 5498.632207 | 5498.632207 | 5498.632207 | 5498.632207 | 0.0 | 0.0 | 0.0 | 65385.738191 |
5 | 2005-06-30 | 59745.244769 | 47098.245873 | 78678.030782 | 59745.244769 | 59745.244769 | 3577.501610 | 3577.501610 | 3577.501610 | 3577.501610 | 3577.501610 | 3577.501610 | 0.0 | 0.0 | 0.0 | 63322.746379 |
6 | 2005-07-31 | 59598.654838 | 47806.382068 | 80665.905762 | 59598.654838 | 59598.654838 | 4583.361194 | 4583.361194 | 4583.361194 | 4583.361194 | 4583.361194 | 4583.361194 | 0.0 | 0.0 | 0.0 | 64182.016032 |
7 | 2005-08-31 | 59452.064908 | 47368.296071 | 80756.033012 | 59452.064908 | 59452.064908 | 4499.375562 | 4499.375562 | 4499.375562 | 4499.375562 | 4499.375562 | 4499.375562 | 0.0 | 0.0 | 0.0 | 63951.440470 |
8 | 2005-09-30 | 59310.203685 | 44535.626188 | 77006.014932 | 59310.203685 | 59310.203685 | 1749.549105 | 1749.549105 | 1749.549105 | 1749.549105 | 1749.549105 | 1749.549105 | 0.0 | 0.0 | 0.0 | 61059.752790 |
9 | 2005-10-31 | 59163.613755 | 45431.878281 | 78473.609767 | 59163.613755 | 59163.613755 | 2397.346677 | 2397.346677 | 2397.346677 | 2397.346677 | 2397.346677 | 2397.346677 | 0.0 | 0.0 | 0.0 | 61560.960432 |
10 | 2005-11-30 | 59021.752529 | 39855.843322 | 73045.342043 | 59021.752529 | 59021.752529 | -2065.033670 | -2065.033670 | -2065.033670 | -2065.033670 | -2065.033670 | -2065.033670 | 0.0 | 0.0 | 0.0 | 56956.718858 |
11 | 2005-12-31 | 58875.162595 | 35993.544575 | 71131.998977 | 58875.162595 | 58875.162595 | -5992.119657 | -5992.119657 | -5992.119657 | -5992.119657 | -5992.119657 | -5992.119657 | 0.0 | 0.0 | 0.0 | 52883.042938 |
12 | 2006-01-31 | 58728.572661 | 36668.886036 | 70839.471318 | 58728.572661 | 58728.572661 | -4772.659269 | -4772.659269 | -4772.659269 | -4772.659269 | -4772.659269 | -4772.659269 | 0.0 | 0.0 | 0.0 | 53955.913392 |
13 | 2006-02-28 | 58596.168850 | 32430.561382 | 65806.466898 | 58596.168850 | 58596.168850 | -9503.051717 | -9503.051717 | -9503.051717 | -9503.051717 | -9503.051717 | -9503.051717 | 0.0 | 0.0 | 0.0 | 49093.117133 |
14 | 2006-03-31 | 58449.578916 | 39193.541836 | 71919.946197 | 58449.578916 | 58449.578916 | -1224.434198 | -1224.434198 | -1224.434198 | -1224.434198 | -1224.434198 | -1224.434198 | 0.0 | 0.0 | 0.0 | 57225.144718 |
15 | 2006-04-30 | 58307.717686 | 42625.987661 | 77057.400277 | 58307.717686 | 58307.717686 | 1187.100547 | 1187.100547 | 1187.100547 | 1187.100547 | 1187.100547 | 1187.100547 | 0.0 | 0.0 | 0.0 | 59494.818233 |
16 | 2006-05-31 | 58161.127748 | 45990.584105 | 80277.571713 | 58161.127748 | 58161.127748 | 5451.418874 | 5451.418874 | 5451.418874 | 5451.418874 | 5451.418874 | 5451.418874 | 0.0 | 0.0 | 0.0 | 63612.546621 |
17 | 2006-06-30 | 58019.266517 | 45778.759963 | 79069.169730 | 58019.266517 | 58019.266517 | 3564.138248 | 3564.138248 | 3564.138248 | 3564.138248 | 3564.138248 | 3564.138248 | 0.0 | 0.0 | 0.0 | 61583.404765 |
18 | 2006-07-31 | 57872.676579 | 47002.201998 | 79782.570772 | 57872.676579 | 57872.676579 | 4563.254349 | 4563.254349 | 4563.254349 | 4563.254349 | 4563.254349 | 4563.254349 | 0.0 | 0.0 | 0.0 | 62435.930927 |
19 | 2006-08-31 | 57726.086594 | 45447.102442 | 79160.703585 | 57726.086594 | 57726.086594 | 4479.990711 | 4479.990711 | 4479.990711 | 4479.990711 | 4479.990711 | 4479.990711 | 0.0 | 0.0 | 0.0 | 62206.077306 |
20 | 2006-09-30 | 57584.225319 | 41994.861861 | 76377.523816 | 57584.225319 | 57584.225319 | 1829.842795 | 1829.842795 | 1829.842795 | 1829.842795 | 1829.842795 | 1829.842795 | 0.0 | 0.0 | 0.0 | 59414.068114 |
21 | 2006-10-31 | 57437.635335 | 43300.623636 | 77873.098798 | 57437.635335 | 57437.635335 | 2439.830765 | 2439.830765 | 2439.830765 | 2439.830765 | 2439.830765 | 2439.830765 | 0.0 | 0.0 | 0.0 | 59877.466100 |
22 | 2006-11-30 | 57295.774060 | 38210.477095 | 73388.386714 | 57295.774060 | 57295.774060 | -2045.360906 | -2045.360906 | -2045.360906 | -2045.360906 | -2045.360906 | -2045.360906 | 0.0 | 0.0 | 0.0 | 55250.413154 |
23 | 2006-12-31 | 57149.184075 | 34675.761615 | 67310.501555 | 57149.184075 | 57149.184075 | -6013.413267 | -6013.413267 | -6013.413267 | -6013.413267 | -6013.413267 | -6013.413267 | 0.0 | 0.0 | 0.0 | 51135.770808 |
24 | 2007-01-31 | 56994.480254 | 35873.268789 | 69294.200682 | 56994.480254 | 56994.480254 | -4783.036614 | -4783.036614 | -4783.036614 | -4783.036614 | -4783.036614 | -4783.036614 | 0.0 | 0.0 | 0.0 | 52211.443640 |
25 | 2007-02-28 | 56854.747771 | 29801.353956 | 62906.688914 | 56854.747771 | 56854.747771 | -9501.921423 | -9501.921423 | -9501.921423 | -9501.921423 | -9501.921423 | -9501.921423 | 0.0 | 0.0 | 0.0 | 47352.826348 |
26 | 2007-03-31 | 56700.043950 | 38154.033064 | 72707.777440 | 56700.043950 | 56700.043950 | -1225.266871 | -1225.266871 | -1225.266871 | -1225.266871 | -1225.266871 | -1225.266871 | 0.0 | 0.0 | 0.0 | 55474.777078 |
27 | 2007-04-30 | 56550.330574 | 40491.474353 | 74706.900200 | 56550.330574 | 56550.330574 | 1190.223261 | 1190.223261 | 1190.223261 | 1190.223261 | 1190.223261 | 1190.223261 | 0.0 | 0.0 | 0.0 | 57740.553835 |
28 | 2007-05-31 | 56395.626753 | 45020.807451 | 77993.605388 | 56395.626753 | 56395.626753 | 5402.206493 | 5402.206493 | 5402.206493 | 5402.206493 | 5402.206493 | 5402.206493 | 0.0 | 0.0 | 0.0 | 61797.833246 |
29 | 2007-06-30 | 56230.582698 | 43009.450294 | 77307.962960 | 56230.582698 | 56230.582698 | 3551.457755 | 3551.457755 | 3551.457755 | 3551.457755 | 3551.457755 | 3551.457755 | 0.0 | 0.0 | 0.0 | 59782.040452 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
480 | 2018-01-02 | 10430.753220 | -10645.820107 | 20038.001128 | 10287.038125 | 10556.745840 | -5814.639079 | -5814.639079 | -5814.639079 | -5814.639079 | -5814.639079 | -5814.639079 | 0.0 | 0.0 | 0.0 | 4616.114140 |
481 | 2018-01-03 | 10417.686680 | -12131.618633 | 21312.023151 | 10273.332005 | 10544.184093 | -5725.854616 | -5725.854616 | -5725.854616 | -5725.854616 | -5725.854616 | -5725.854616 | 0.0 | 0.0 | 0.0 | 4691.832063 |
482 | 2018-01-04 | 10404.620140 | -11264.733448 | 21481.058839 | 10259.615204 | 10531.622345 | -5640.534387 | -5640.534387 | -5640.534387 | -5640.534387 | -5640.534387 | -5640.534387 | 0.0 | 0.0 | 0.0 | 4764.085753 |
483 | 2018-01-05 | 10391.553601 | -11293.427706 | 21583.702828 | 10245.910236 | 10519.060598 | -5560.875533 | -5560.875533 | -5560.875533 | -5560.875533 | -5560.875533 | -5560.875533 | 0.0 | 0.0 | 0.0 | 4830.678068 |
484 | 2018-01-06 | 10378.487061 | -10571.821161 | 21827.517378 | 10232.514272 | 10506.498851 | -5488.665623 | -5488.665623 | -5488.665623 | -5488.665623 | -5488.665623 | -5488.665623 | 0.0 | 0.0 | 0.0 | 4889.821439 |
485 | 2018-01-07 | 10365.420522 | -11213.789750 | 21737.667858 | 10218.899641 | 10493.937103 | -5425.240870 | -5425.240870 | -5425.240870 | -5425.240870 | -5425.240870 | -5425.240870 | 0.0 | 0.0 | 0.0 | 4940.179652 |
486 | 2018-01-08 | 10352.353982 | -11068.421737 | 22756.628202 | 10205.056042 | 10481.449293 | -5371.460788 | -5371.460788 | -5371.460788 | -5371.460788 | -5371.460788 | -5371.460788 | 0.0 | 0.0 | 0.0 | 4980.893194 |
487 | 2018-01-09 | 10339.287442 | -11751.596928 | 22353.659111 | 10191.219798 | 10469.065920 | -5327.699997 | -5327.699997 | -5327.699997 | -5327.699997 | -5327.699997 | -5327.699997 | 0.0 | 0.0 | 0.0 | 5011.587446 |
488 | 2018-01-10 | 10326.220903 | -12214.265224 | 21555.904799 | 10177.503749 | 10456.812418 | -5293.857323 | -5293.857323 | -5293.857323 | -5293.857323 | -5293.857323 | -5293.857323 | 0.0 | 0.0 | 0.0 | 5032.363580 |
489 | 2018-01-11 | 10313.154363 | -13379.641545 | 21442.105207 | 10163.873381 | 10444.834064 | -5269.381789 | -5269.381789 | -5269.381789 | -5269.381789 | -5269.381789 | -5269.381789 | 0.0 | 0.0 | 0.0 | 5043.772575 |
490 | 2018-01-12 | 10300.087824 | -11407.652737 | 22014.507343 | 10150.269457 | 10432.577518 | -5253.314515 | -5253.314515 | -5253.314515 | -5253.314515 | -5253.314515 | -5253.314515 | 0.0 | 0.0 | 0.0 | 5046.773309 |
491 | 2018-01-13 | 10287.021284 | -11767.060141 | 23105.051714 | 10136.676775 | 10420.474065 | -5244.345070 | -5244.345070 | -5244.345070 | -5244.345070 | -5244.345070 | -5244.345070 | 0.0 | 0.0 | 0.0 | 5042.676214 |
492 | 2018-01-14 | 10273.954744 | -12453.383825 | 22490.477042 | 10122.988290 | 10408.031781 | -5240.880304 | -5240.880304 | -5240.880304 | -5240.880304 | -5240.880304 | -5240.880304 | 0.0 | 0.0 | 0.0 | 5033.074441 |
493 | 2018-01-15 | 10260.888205 | -11792.410668 | 21383.620828 | 10109.359927 | 10395.589498 | -5241.123306 | -5241.123306 | -5241.123306 | -5241.123306 | -5241.123306 | -5241.123306 | 0.0 | 0.0 | 0.0 | 5019.764899 |
494 | 2018-01-16 | 10247.821665 | -10884.794995 | 21807.204144 | 10095.446589 | 10383.147214 | -5243.159787 | -5243.159787 | -5243.159787 | -5243.159787 | -5243.159787 | -5243.159787 | 0.0 | 0.0 | 0.0 | 5004.661878 |
495 | 2018-01-17 | 10234.755126 | -11117.490200 | 21908.112254 | 10081.328118 | 10370.704930 | -5245.048924 | -5245.048924 | -5245.048924 | -5245.048924 | -5245.048924 | -5245.048924 | 0.0 | 0.0 | 0.0 | 4989.706201 |
496 | 2018-01-18 | 10221.688586 | -11471.233652 | 22612.102358 | 10067.579986 | 10358.232019 | -5244.915571 | -5244.915571 | -5244.915571 | -5244.915571 | -5244.915571 | -5244.915571 | 0.0 | 0.0 | 0.0 | 4976.773015 |
497 | 2018-01-19 | 10208.622046 | -12616.078056 | 22604.293475 | 10053.863185 | 10345.699704 | -5241.040633 | -5241.040633 | -5241.040633 | -5241.040633 | -5241.040633 | -5241.040633 | 0.0 | 0.0 | 0.0 | 4967.581413 |
498 | 2018-01-20 | 10195.555507 | -11085.109200 | 21862.639163 | 10040.146384 | 10333.148577 | -5231.946498 | -5231.946498 | -5231.946498 | -5231.946498 | -5231.946498 | -5231.946498 | 0.0 | 0.0 | 0.0 | 4963.609009 |
499 | 2018-01-21 | 10182.488967 | -12116.756382 | 22124.561839 | 10026.429583 | 10320.597449 | -5216.474494 | -5216.474494 | -5216.474494 | -5216.474494 | -5216.474494 | -5216.474494 | 0.0 | 0.0 | 0.0 | 4966.014473 |
500 | 2018-01-22 | 10169.422428 | -11799.236660 | 22128.247318 | 10012.712782 | 10308.046322 | -5193.851637 | -5193.851637 | -5193.851637 | -5193.851637 | -5193.851637 | -5193.851637 | 0.0 | 0.0 | 0.0 | 4975.570791 |
501 | 2018-01-23 | 10156.355888 | -11397.718675 | 21488.804694 | 9998.995980 | 10295.688169 | -5163.744193 | -5163.744193 | -5163.744193 | -5163.744193 | -5163.744193 | -5163.744193 | 0.0 | 0.0 | 0.0 | 4992.611695 |
502 | 2018-01-24 | 10143.289348 | -12170.143585 | 21137.408443 | 9985.289993 | 10283.354063 | -5126.296039 | -5126.296039 | -5126.296039 | -5126.296039 | -5126.296039 | -5126.296039 | 0.0 | 0.0 | 0.0 | 5016.993309 |
503 | 2018-01-25 | 10130.222809 | -11855.103144 | 21615.603916 | 9971.447744 | 10271.019957 | -5082.150230 | -5082.150230 | -5082.150230 | -5082.150230 | -5082.150230 | -5082.150230 | 0.0 | 0.0 | 0.0 | 5048.072579 |
504 | 2018-01-26 | 10117.156269 | -12025.376518 | 22220.684386 | 9957.493001 | 10258.685851 | -5032.452727 | -5032.452727 | -5032.452727 | -5032.452727 | -5032.452727 | -5032.452727 | 0.0 | 0.0 | 0.0 | 5084.703542 |
505 | 2018-01-27 | 10104.089730 | -12326.781502 | 21628.736366 | 9943.939751 | 10246.351744 | -4978.837801 | -4978.837801 | -4978.837801 | -4978.837801 | -4978.837801 | -4978.837801 | 0.0 | 0.0 | 0.0 | 5125.251929 |
506 | 2018-01-28 | 10091.023190 | -12000.100061 | 22722.886587 | 9930.400801 | 10234.017638 | -4923.395183 | -4923.395183 | -4923.395183 | -4923.395183 | -4923.395183 | -4923.395183 | 0.0 | 0.0 | 0.0 | 5167.628007 |
507 | 2018-01-29 | 10077.956651 | -11021.460045 | 21882.372752 | 9916.765027 | 10221.727326 | -4868.619657 | -4868.619657 | -4868.619657 | -4868.619657 | -4868.619657 | -4868.619657 | 0.0 | 0.0 | 0.0 | 5209.336993 |
508 | 2018-01-30 | 10064.890111 | -11496.619265 | 20772.074491 | 9903.067605 | 10209.526120 | -4817.344316 | -4817.344316 | -4817.344316 | -4817.344316 | -4817.344316 | -4817.344316 | 0.0 | 0.0 | 0.0 | 5247.545795 |
509 | 2018-01-31 | 10051.823571 | -13360.948215 | 21954.650132 | 9889.370183 | 10197.222692 | -4772.659269 | -4772.659269 | -4772.659269 | -4772.659269 | -4772.659269 | -4772.659269 | 0.0 | 0.0 | 0.0 | 5279.164302 |
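The table switches from month-end rows to daily rows because `make_future_dataframe` uses a daily frequency unless its `freq` argument says otherwise. The pandas-only sketch below imitates the resulting date column. It is not Prophet's own code, and the variable names are illustrative.

```python
import pandas as pd

# Month-end dates of the fitted history, as in the table: 2005-01-31 .. 2017-01-31.
history = pd.date_range("2005-01-01", "2017-01-01", freq="MS") + pd.offsets.MonthEnd(0)

# 365 daily steps after the last historical date.
last_date = history[-1]
future_days = pd.date_range(last_date + pd.Timedelta(days=1), periods=365, freq="D")

all_dates = history.append(future_days)
print(len(history), len(future_days), len(all_dates))
print(all_dates[-1])
```

This gives 145 monthly rows plus 365 daily rows, which matches the table's last index of 509 and its final date of 2018-01-31. For monthly forecast rows, one option is to pass a monthly `freq` with `periods=12`.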
figure = m.plot(forecast, xlabel='Date', ylabel='Crime Rate')
As the plot shows, Prophet also produces values for dates beyond the end of the data (early 2017).
# Plot the components of the forecast (trend and yearly seasonality)
figure3 = m.plot_components(forecast)
For Chicago, the components plot shows crime rising until around July (summer) and then falling as the weather turns colder toward winter.
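The components plot splits the forecast into a trend and a yearly term. In the forecast table these appear as the `trend`, `yearly` and `yhat` columns, with the multiplicative terms all zero. The short check below copies three rows from that table and shows that `yhat` equals `trend + yearly` up to rounding of the printed values.

```python
# (trend, yearly, yhat) copied from the forecast table rows 0, 4 and 509.
rows = {
    "2005-01-31": (60454.550849, -4762.896867, 55691.653982),
    "2005-05-31": (59887.105985, 5498.632207, 65385.738191),
    "2018-01-31": (10051.823571, -4772.659269, 5279.164302),
}

matches = {}
for date, (trend, yearly, yhat) in rows.items():
    rebuilt = trend + yearly
    # Allow for the table's six-decimal rounding.
    matches[date] = abs(rebuilt - yhat) < 1e-5
    print(date, round(rebuilt, 6), matches[date])
```

The yearly term is negative in January and positive in May, which is the seasonal pattern described above.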
PART 2: Avocado Market
Description
Use Facebook Prophet to forecast future prices.
Observing the dataset
Some relevant columns in the dataset:
- Date - The date of the observation
- AveragePrice - the average price of a single avocado
- type - conventional or organic
- year - the year
- region - the city or region of the observation
- Total Volume - Total number of avocados sold
- 4046 - Total number of avocados with PLU 4046 sold
- 4225 - Total number of avocados with PLU 4225 sold
- 4770 - Total number of avocados with PLU 4770 sold
Loading the dataset
# import libraries
import pandas as pd # Import Pandas for data manipulation using dataframes
import numpy as np # Import Numpy for data statistical analysis
import matplotlib.pyplot as plt # Import matplotlib for data visualisation
import random
import seaborn as sns
from fbprophet import Prophet # in Prophet 1.0 and later the package is named prophet: from prophet import Prophet
avocado_df = pd.read_csv('avocado.csv')
# Avocado price over time
avocado_df = avocado_df.sort_values("Date") # sort chronologically
plt.figure(figsize=(10,10))
plt.plot(avocado_df['Date'], avocado_df['AveragePrice'])
# Number of observations per region (countplot counts rows, not prices)
plt.figure(figsize=[25,12])
sns.countplot(x = 'region', data = avocado_df)
plt.xticks(rotation = 45)
# Number of observations per year (countplot counts rows, not prices)
plt.figure(figsize=[25,12])
sns.countplot(x = 'year', data = avocado_df)
plt.xticks(rotation = 45)
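`sns.countplot` shows how many rows fall in each category. If the goal is to compare prices rather than row counts, a grouped mean is one option. The sketch below uses a small made-up frame with the same column names as `avocado_df`.

```python
import pandas as pd

# Hypothetical rows with the same column names as the avocado dataset.
toy = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016, 2016],
    "region": ["Albany", "Boston", "Albany", "Boston", "Boston"],
    "AveragePrice": [1.0, 1.5, 1.2, 1.4, 1.6],
})

# What countplot visualises: the number of rows per year.
rows_per_year = toy["year"].value_counts().sort_index()

# The average price per year, which describes the prices themselves.
mean_price_per_year = toy.groupby("year")["AveragePrice"].mean()

print(rows_per_year)
print(mean_price_per_year)
```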
Prediction
avocado_prophet_df = avocado_df[['Date', 'AveragePrice']] # keep only the columns Prophet needs
avocado_prophet_df = avocado_prophet_df.rename(columns={'Date':'ds', 'AveragePrice':'y'}) # rename to the column names Prophet expects (ds, y)
avocado_prophet_df
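The avocado data has one row per date, region and type, so `ds` contains repeated dates. One possible extra preprocessing step, not part of the original notebook, is to parse `ds` as datetimes and average the price per date. The frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical rows: two regions observed on each of two dates.
avocado_toy = pd.DataFrame({
    "Date": ["2015-01-11", "2015-01-04", "2015-01-04", "2015-01-11"],
    "AveragePrice": [1.10, 1.00, 1.40, 1.30],
    "region": ["Albany", "Albany", "Boston", "Boston"],
})

prophet_df = avocado_toy[["Date", "AveragePrice"]].rename(
    columns={"Date": "ds", "AveragePrice": "y"}
)
prophet_df["ds"] = pd.to_datetime(prophet_df["ds"])

# One row per date: the mean price across all regions on that date.
per_date = prophet_df.groupby("ds", as_index=False)["y"].mean()
print(per_date)
```

Whether to aggregate or to filter to a single region and type depends on the question being asked. The original code fits Prophet on all rows as they are.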
# Applying the Prophet
m = Prophet()
m.fit(avocado_prophet_df)
# Forecasting into the future
future = m.make_future_dataframe(periods=365) # extend the dates 365 days (one year) ahead to forecast avocado prices
forecast = m.predict(future)
figure = m.plot(forecast, xlabel='Date', ylabel='Price')
figure3 = m.plot_components(forecast)