Forecasting Iran's maximum daily electricity demand in different climate zones using a deep learning approach

Electricity network management is a complicated task that requires a forecast of consumption. This prediction is generally hard and challenging due to the nonlinearity and constant fluctuation of load in the network. Although linear time-series models may give an estimate, linearizing nonlinear data for forecasting strips out the very properties that reveal anomalies in electricity demand. Newer artificial neural network (ANN) approaches make it possible to build a forecasting model while preserving these properties. In this paper, we categorize maximum daily load data in Iran by climate zone in order to forecast the following 2 months. A Long Short-Term Memory (LSTM) network from the deep learning family is trained, and its outputs are compared across different numbers of epochs. The results show that more epochs do not necessarily improve prediction accuracy; the best MADP for the cold, tropical, and temperate zones is 1.26%, 2.5%, and 10.49%, respectively.

comparison with RNN) is clarified. After that, our experimental work and a discussion of our results are presented, and the last part is devoted to the conclusion.

Related Work
Several studies have applied LSTM to power demand forecasting. Kim and Cho (2019) proposed a deep-learning-based technique consisting of a projector, which defines an appropriate state for a given situation, and a predictor, which forecasts energy demand from that state; it achieved better performance (MSE = 0.384) than conventional models. Zheng et al. (2019) showed that if household-level electricity demand is first forecast at the appliance level and then aggregated to the residential level, forecast performance can be significantly improved by LSTM. Kandananond (2019) found that ARX (Autoregressive with Exogenous input) outperformed ARIMA in electricity demand prediction accuracy. Weinberg (2019) used multi-seasonal ARIMA models for long- and short-range load forecasts. A comparison of SVM and RNN models showed that the SVM model had marginally higher precision and a lower standard error of the mean; the conclusion was that, in that specific case, SVM outperformed RNN in prediction accuracy, although both implementations leave room for improvement, and in terms of specificity and sensitivity the choice between an SVM and an RNN depends heavily on the real-world application. Muzaffar and Afshari (2019) compared LSTM prediction accuracy over horizons of 1, 2, 7, and 30 days with traditional methods, using electrical load data with exogenous variables including temperature, humidity, and wind speed. The results showed that the trained LSTM network outperforms the other methods and has the potential to further improve forecast accuracy.

Methodology

Long Short-Term Memory
Long Short-Term Memory is a deep learning architecture with memory, capable of learning long-term dependencies in sequential data. Its typical building block is a memory cell together with "controllers" (the input, output, and forget gates). LSTM is designed for applications where the input is an ordered sequence. It is essentially an RNN with more elaborate internals. The great thing about an RNN is that it remembers everything: its long-range dependencies can capture an entire sequence, but at the same time, that is a drawback. In an LSTM, we let the network selectively choose what it remembers and what it forgets. It effectively decides how much information the network carries over from the previous time step and how much should be used at the current time step. Like every neural network node, it performs a computation on its inputs and returns an output value. The LSTM reuses the output from a previous step as an input for the following step, while in a plain RNN, the output is used together with the next element as the input for the next step. A typical inner neuron of an RNN and an LSTM is shown below.

Diagram 1: An inner neuron of an RNN (left) and LSTM (right) (Olah, 2015)

Diagram 1 shows that the RNN takes the previous hidden state (h_{t-1}) as its first input, combines it with the current time-step vector (x_t) as its second input, and passes the result through a nonlinearity (tanh) that squashes values into [-1, 1] to regulate the network. It then produces a new hidden state (h_t) and moves on to the next time step (t+1). By repeating this procedure, it can build a forecasting model at the end of the layers (depending on what we need). The LSTM has more components inside; it is based on gates. Gates decide whether information is important enough to keep or should be forgotten during training.
Each gate has a sigmoid (σ) activation, which forms a smooth curve in the range 0 to 1 and keeps the model differentiable. Values closer to 0 should be forgotten, while values closer to 1 keep updating the cell state. Diagram 1 shows three sigmoid (σ) symbols, which are responsible for the forget, input, and output gates, respectively. The sigmoid either lets information through or blocks it, and this decision depends on the input it receives. At the forget gate, the network decides whether to remember the previous state, as a function of the previous hidden state (h_{t-1}) and the current input vector (x_t). The equation is given below. The LSTM equations were first presented in 1997 (Hochreiter and Schmidhuber, 1997).

f_t = σ(W_f · [h_{t-1}; x_t] + b_f)
The other decision the LSTM must make is how much of the current time step it will add to its representation. The equations for this are given below. They are again functions of both the hidden state (h_{t-1}) and the current input (x_t), plus a bias term, passed through a nonlinearity. There are two components to this step: the first (i_t) says how much the network is going to contribute, and the second (C̃_t) says what it is going to contribute. The tanh function distributes gradients well and thus helps prevent vanishing or exploding.

i_t = σ(W_i · [h_{t-1}; x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}; x_t] + b_C)

The next step is an interpolation (as in an RNN) between the current and the previous state:

C_t = f_t * C_{t-1} + i_t * C̃_t

where f_t determines how much the previous state contributes and i_t determines how much the current input adds to the new state description. The following equations give the output gate (o_t) at each time step and the new hidden state h_t, which is a function of the cell state C_t, not of h_{t-1}:

o_t = σ(W_o · [h_{t-1}; x_t] + b_o)
h_t = o_t * tanh(C_t)
A notable property of LSTM is that there is less concern about getting stuck in a poor local optimum. The vanishing (fading) gradient is a typical issue in ANNs trained with gradient-based learning techniques. Training applies the partial derivative of the error function to each parameter at each iteration, but the gradient values (propagating back toward the top of the network) can gradually become so small that the weight changes are negligible. The learning process then slows down and, in more extreme cases, stops entirely. In feed-forward networks (MLP, CNN, ...) this happens because of the many layers, while in RNNs it is caused by the many time steps. An LSTM can learn to set explicit temporal dependencies: its units keep memory over short or long periods, and neurons (inside their pipeline) maintain a memory state that supports processing sequential and temporal data. This is achieved by allowing information to pass from the previous cell without change, instead of being modified iteratively at every time step or layer, so the weights can still reach their desired values. In this way, by adding the three gates (input, output, forget) and the cell state C_t, the network mitigates the vanishing gradient problem and helps control errors during backpropagation.
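The gate equations described above can be sketched as a single LSTM time step in NumPy. This is a minimal illustration, not the Keras implementation used in the experiments; the function name `lstm_step` and the weight-matrix layout (one matrix per gate acting on the concatenation [h_{t-1}; x_t]) are assumptions made for clarity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step. z is the concatenation [h_{t-1}; x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)        # forget gate: how much of c_{t-1} to keep
    i_t = sigmoid(W_i @ z + b_i)        # input gate: how much new content to add
    c_tilde = np.tanh(W_c @ z + b_c)    # candidate cell contents
    c_t = f_t * c_prev + i_t * c_tilde  # cell state update (interpolation)
    o_t = sigmoid(W_o @ z + b_o)        # output gate
    h_t = o_t * np.tanh(c_t)            # new hidden state, a function of c_t
    return h_t, c_t
```

Because o_t lies in (0, 1) and tanh(c_t) in (-1, 1), every component of the new hidden state is strictly inside (-1, 1), which is the "squashing" behaviour described above.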

Experimental Work
The LSTM models are run in Python on the same hardware, so that comparisons are made under equivalent conditions.

Cross-validation and running model
Although cross-validation is an approach for avoiding over- and under-fitting, the amount of data here is sufficient to avoid overfitting (Rashid et al., 2018). We therefore split the data into training and test sets to evaluate predictive power in the following part.

Data Processing
For any network, we must first import the libraries and prepare the data for our task. Here we load NumPy, matplotlib.pyplot, and pandas at the start. Then we must rescale the data into the same range; this is called feature scaling, and MinMaxScaler is the tool called for it. The next step is building the data structure. This is probably the most important part of building the model, in which we find the best number of time steps. It is obtained through repeated trials, and 60 gave the best result in this study. This means the network has a memory of 60 steps and one output. We must arrange the data into dependent and independent parts: with 60 time steps, samples 1 to 60 produce the first output, samples 2 to 61 produce the second output, and so on. After that, we must reshape the data into the input format the RNN expects.
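The preprocessing above can be sketched as follows. This is a minimal, self-contained illustration assuming the series is a 1-D NumPy array of daily max-load values; an explicit min-max rescale stands in for sklearn's MinMaxScaler, and the function names are hypothetical.

```python
import numpy as np

def scale_minmax(series):
    """Feature scaling: map values into [0, 1]."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo), (lo, hi)

def make_windows(series, timesteps=60):
    """Each sample is 60 consecutive values; the target is the next value."""
    X, y = [], []
    for i in range(timesteps, len(series)):
        X.append(series[i - timesteps:i])
        y.append(series[i])
    # reshape to (samples, timesteps, features) as the LSTM input expects
    return np.array(X).reshape(-1, timesteps, 1), np.array(y)

# placeholder series standing in for the real load data
scaled, bounds = scale_minmax(np.arange(100, dtype=float))
X_train, y_train = make_windows(scaled, timesteps=60)
```

With 100 placeholder values and 60 time steps, this yields 40 training samples of shape (60, 1), matching the sliding-window scheme described in the text.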

Building Network
Now we need to import Sequential, LSTM, Dense, and Dropout from Keras (backed by TensorFlow or PlaidML). We use a sequential model and then define LSTM layers with 50 neurons each; the network has 2 hidden layers. The dropout for each layer is 20%, and the dense layer serves only as the output in the last layer. Note that, since we have only one output, it is defined with a single unit. For compiling the network, which is the next step, we use "Adam" as the optimizer, which works well on non-stationary objectives and problems (Kingma and Ba, 2014), and "mean_squared_error" as the loss. The next step is fitting the model on the training set. Here we set the number of epochs (training passes) and the batch size. The batch size specifies how much of the training data is used at once: we do not give all the data to the network at one time; instead, the network receives the data in several stages and smaller batches. The number of batches equals the number of iterations needed to complete one pass over the training data. With a batch size of 50 in our model, the number of iterations for the entire data set is 44. An epoch represents the total number of times a learning algorithm sees the complete dataset. "Since deep learning utilizes gradient descent to optimize the model, it makes sense to pass the whole dataset through a single network multiple times to update the weights and thus obtain a better and more accurate prediction model" (Siami-Namini and Namin, 2018). Our algorithm is implemented in Python using Keras (TensorFlow back-end) and organized in 3 parts with the details as below:
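The architecture described above can be sketched with the Keras API. The layer counts and hyperparameters (two LSTM layers of 50 units, 20% dropout, one-unit dense output, Adam optimizer, MSE loss, batch size 50) follow the text; everything else, including the function name, is an assumption.

```python
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dense, Dropout

def build_model(timesteps=60, features=1):
    model = keras.Sequential([
        keras.Input(shape=(timesteps, features)),
        LSTM(50, return_sequences=True),  # first hidden layer, 50 neurons
        Dropout(0.2),                     # 20% dropout
        LSTM(50),                         # second hidden layer
        Dropout(0.2),
        Dense(1),                         # single output: next day's max load
    ])
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

# Training would then be, e.g.:
# model = build_model()
# model.fit(X_train, y_train, epochs=100, batch_size=50)
```

The `fit` call is commented out here because the epoch count is precisely the variable the experiments sweep over (100 to 500).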

Making the expectations and picturing the outcomes
This is the last part of the model, which compares the actual and predicted load and visualizes the results. Figure 1 presents the data (train + test) categorized into the three climate zones, with the hour of max load in the second row. It shows that the trend of consumption and max load differs between cases. Figure 2 exhibits the pattern of prediction in the networks as the number of epochs increases. The number of epochs need not equal the number of iterations, and the best number of epochs to train a model is not clear; however, the trend of the estimate improves with more epochs in most steps. The table below compares the LSTM forecast accuracy using MADP:

MADP = (100/n) Σ |A_t - F_t| / A_t

where A_t and F_t are the actual and forecast values, respectively. As the table shows, for Ardabil (cold zone) the prediction accuracy at epochs 400 and 500 is the same, while for Khuzestan (tropical zone) the best answer is at epoch 300. Tehran, on the other hand, does not show a direct relation with increasing epochs: the trend fluctuates, the best result occurs at epoch 500, and even in the best case the prediction accuracy does not match the other zones.

Improvements
In this paper, daily demand data was used for the forecasts, while using additional variables in a panel of data might improve the performance of the models. Finally, hybrid models or other advanced techniques, for example extrapolation or other metaheuristic algorithms, could yield interesting results.
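The accuracy measure reported in the results can be sketched as below. This assumes MADP is the mean absolute deviation percent, i.e. the average of |actual - forecast| / actual expressed as a percentage; the function name is hypothetical.

```python
import numpy as np

def madp(actual, forecast):
    """Mean absolute deviation percent between actual and forecast loads."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs(actual - forecast) / actual)

# e.g. madp([100, 200], [99, 204]) -> 100 * mean(0.01, 0.02) = 1.5
```

Under this definition, a MADP of 1.26% for the cold zone means the forecast missed the actual max load by about 1.26% per day on average.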

Conclusion
Developing countries need a prediction of the demand and supply of electricity due to increasing consumption, especially of the maximum load of the day. In this paper, we used daily power load data for Iran and, after filtering for the maximum load of each day, forecast the load with an LSTM deep learning model. Since climatic variation in Iran has the greatest effect on consumption, the predictions were categorized into 3 different climate zones and the models were run with 6 different epoch settings. The results showed that, for the data in this research, increasing the number of epochs did not always give the best answer; MADP is 1.26%, 2.5%, and 10.49% in the cold, tropical, and temperate zones, which is quite acceptable.

References: