Comparative Evaluation of Prediction Models for Forecasting Spectral Opportunities

Abstract: Predicting the behavior of the primary user (PU) in cognitive radio networks significantly reduces the interference caused by the secondary user (SU) when it changes channel. This article presents a comparative evaluation of three time-series models, AR, MA and ARMA, for predicting the behavior of the primary user and the resulting spectral opportunities in cognitive radio networks in the GSM frequency band. The performance of the three models is contrasted with a purely reactive (non-predictive) model under two scenarios, two traffic levels and six evaluation metrics. The results show that the moving average model has the best overall performance, although it is not the best in each of the four testing scenarios.

The authors in [8] propose a model with several variables for channel prediction, including a prediction model of PU behavior to avoid interference and a multi-user model to control collisions between SUs. Collision control between SUs is one of the most difficult aspects to model accurately because of their random behavior. To address this, [8] proposes a Common Hopping coordination scheme for the design of the SH protocol, in which all SUs are synchronized and hop through the channels following the same hopping sequence. For spectrum detection, the cognitive device is assumed to have two antennas: one for transmission and control, and another dedicated exclusively to spectrum detection. The results show that the proactive strategy is efficient when the PU load is low, reducing the number of handoffs and collisions; when PU demand is high, however, collision control is maintained but the number of handoffs increases.

II. MODELS BASED ON TIME SERIES
These methods model a time series by studying the correlation structure that time, index or distance induces in the random variables that generate the series. The strategy consists of three steps: 1) stabilize the variance and remove the trend and seasonality of the series through transformations and/or differencing, which yields a stationary series; 2) estimate a model for the resulting series that explains its correlation structure; 3) apply the inverse transformations to the model obtained in step 2 so that the variance, trend and seasonality of the original series are restored [9]-[11].
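Steps 1 and 3 of this strategy can be illustrated with a minimal sketch, assuming that differencing is the only transformation applied. The function names `difference` and `undifference` and the toy series are hypothetical and are not taken from the original study.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing.

    Returns the differenced series and the leading value of each round,
    which is what is needed to undo the differencing later.
    """
    heads = []
    current = list(series)
    for _ in range(d):
        heads.append(current[0])
        current = [b - a for a, b in zip(current, current[1:])]
    return current, heads


def undifference(diffed, heads):
    """Invert `difference` by cumulative summation, last round first."""
    current = list(diffed)
    for head in reversed(heads):
        restored = [head]
        for step in current:
            restored.append(restored[-1] + step)
        current = restored
    return current


# Toy series with a growing trend (illustrative values only)
series = [3, 5, 8, 12, 17, 23]
stationary, heads = difference(series, d=2)   # step 1: remove the trend
restored = undifference(stationary, heads)    # step 3: inverse transformation
```

In a full pipeline, step 2 (model estimation) would operate on `stationary`, and the model's forecasts would be passed through `undifference` to return to the original scale.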
The three fundamental time-series models of the Auto-Regressive Integrated Moving Average (ARIMA) family are the Auto-Regressive (AR), the Moving Average (MA) and the Auto-Regressive Moving Average (ARMA) models.

A. AR Model
This model considers that the value of the stationary series at the present time t depends on all the past values the series has taken, each weighted by a factor φj that measures the influence of that past value on the present, and on a present random perturbation [12].
The AR model is described in equation (1),

$$X_t = \sum_{j=1}^{\infty} \phi_j X_{t-j} + \varepsilon_t \qquad (1)$$

where X_t is the value of the series at time t, φj are the parameters of the model and ε_t is an error term (white Gaussian noise), i.e., random variables with zero mean and constant variance that are uncorrelated with each other and with the past values of the series.
The AR process is a regression model whose explanatory variables are lagged values of the dependent variable itself. A standard condition for the AR model to be stationary is that the roots of its characteristic polynomial lie outside the unit circle, which for AR(1) reduces to |φ1| < 1 [13]. When only the last p past values of the series significantly affect the present value, the model is called AR of order p, denoted AR(p), and the upper limit of the sum in equation (1) is p. The value of p is determined with the Partial Auto-Correlation Function (PACF).
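The role of the PACF in choosing p can be sketched as follows. This is a minimal illustration, not the estimation procedure used by the authors: it assumes the Yule-Walker equations are solved with the Durbin-Levinson recursion, and the function names and the synthetic AR(1) data are hypothetical.

```python
import random


def autocorrelation(x, max_lag):
    """Sample autocorrelation r[0..max_lag] of a sequence."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def levinson_durbin(r, p):
    """Solve the Yule-Walker equations up to order p.

    Returns (phi, pacf): phi holds the AR(p) weights phi_1..phi_p and
    pacf holds the partial autocorrelations at lags 1..p.
    """
    phi, pacf = [], []
    for k in range(1, p + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        kappa = num / den  # partial autocorrelation at lag k
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        pacf.append(kappa)
    return phi, pacf


def ar_forecast(x, phi):
    """One-step-ahead AR forecast around the sample mean."""
    mean = sum(x) / len(x)
    return mean + sum(w * (x[-1 - j] - mean) for j, w in enumerate(phi))


# Synthetic AR(1) series with phi_1 = 0.7 (illustrative data only)
random.seed(0)
series = [0.0]
for _ in range(4999):
    series.append(0.7 * series[-1] + random.gauss(0.0, 1.0))

r = autocorrelation(series, 3)
phi, pacf = levinson_durbin(r, 3)
next_value = ar_forecast(series, phi)
```

For data generated by an AR(1) process, `pacf` should be clearly nonzero at lag 1 and close to zero at lags 2 and 3, which is the cut-off pattern used to select p.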

B. MA Model
This model considers that the value of the stationary series oscillates around its mean μ. It further assumes that the displacement of the series from μ at the present time t is caused by the infinitely many perturbations that occurred in the past, each weighted by a factor θj that measures the influence of that perturbation on the present value of the series [12].
The MA model is described in equation (2),

$$X_t = \mu + \varepsilon_t + \sum_{j=1}^{\infty} \theta_j \varepsilon_{t-j} \qquad (2)$$

where X_t is the value of the series at time t, θj are the parameters of the model and ε_t is an error term (white Gaussian noise), i.e., random variables with zero mean and constant variance that are uncorrelated with each other and with the past values of the series.
The MA model assumes that all observations of the time series are equally important for estimating the predicted parameter. When only the last q past perturbations significantly affect the present value of the series, the model is called MA of order q, denoted MA(q), and the sum in equation (2) has q as its upper limit. The average of the q most recent values of the time series is used as the forecast for the next period. The value of q is determined with the Auto-Correlation Function (ACF).
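The forecasting rule described here (the mean of the q most recent values) can be sketched as below. Note that this implements the window-average forecast stated in the text, not an estimation of the θj weights of equation (2). The function name and the sample durations are hypothetical.

```python
def moving_average_forecast(series, q):
    """Forecast the next value as the mean of the q most recent values."""
    if q <= 0 or q > len(series):
        raise ValueError("q must be between 1 and len(series)")
    window = series[-q:]
    return sum(window) / q


# Hypothetical durations (in seconds) of recent spectral opportunities
durations = [4.0, 6.0, 5.0, 7.0, 8.0]
forecast = moving_average_forecast(durations, q=3)  # mean of 5.0, 7.0, 8.0
```

A larger q smooths the forecast but reacts more slowly to changes in PU activity; a smaller q does the opposite.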

C. ARMA Model
This model combines the AR(p) and MA(q) models to produce the ARMA(p, q) model, described by equation (3), written here for a mean-centered series:

$$X_t = \sum_{j=1}^{p} \phi_j X_{t-j} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \qquad (3)$$

where φj and θj are the AR and MA parameters and ε_t is the white-noise error term.
In general, time series are not stationary, but they can be made stationary through variance-stabilizing transformations and differencing. The ARIMA(p, d, q) models result from integrating into the ARMA(p, q) model the differences and transformations needed to convert the initial series into a stationary one. The number of differences applied to the series defines the parameter d of the model [12].
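As an illustration of how an ARMA(p, q) model produces a prediction, the sketch below computes a one-step-ahead forecast from given weights. It assumes the usual zero-mean ARMA form (present value = weighted past values + present perturbation + weighted past perturbations), treats all pre-sample values and perturbations as zero, and takes the weights as known. The function name and the numbers are hypothetical, and parameter estimation is out of scope.

```python
def arma_one_step(series, phi, theta):
    """One-step-ahead forecast of a zero-mean ARMA(p, q) model.

    phi   : AR weights phi_1..phi_p
    theta : MA weights theta_1..theta_q
    Returns (forecast, residuals), where residuals[t] is the estimated
    perturbation at time t (observed value minus in-sample prediction).
    """
    eps = []
    for t in range(len(series) + 1):
        ar = sum(phi[j] * series[t - 1 - j]
                 for j in range(len(phi)) if t - 1 - j >= 0)
        ma = sum(theta[j] * eps[t - 1 - j]
                 for j in range(len(theta)) if t - 1 - j >= 0)
        pred = ar + ma
        if t == len(series):
            return pred, eps          # prediction for the unseen next value
        eps.append(series[t] - pred)  # perturbation recovered at time t


# Hypothetical mean-centered observations and weights
observed = [1.0, 0.5, 0.25]
forecast, residuals = arma_one_step(observed, phi=[0.5], theta=[0.4])
```

Setting `theta=[]` reduces the sketch to a pure AR(p) forecast, and `phi=[]` to a pure MA(q) forecast.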

III. EVALUATION METHODOLOGY
To evaluate the performance of the proactive predictive algorithms (PPTS), also proposed in this investigation, six evaluation metrics (EM) were used; they are described in Table 1.

Table 1. Evaluation metrics, each measured during the 10 minutes of transmission of the SU.
AAPH: total number of predictive handoffs carried out before the arrival of the PU.
AAIH (number of average accumulated handoffs with interference): total number of reactive handoffs carried out once the PU arrives.
AAEH (number of average accumulated perfect handoffs): number of AAPH carried out very close to the PU's arrival but without interfering with it.
AAUH (number of average accumulated anticipated handoffs): number of AAPH carried out well before the PU's arrival. Type: cost.

To perform a fair comparative evaluation, each absolute value of the AAPH, AAIH, AAEH and AAUH metrics was divided by the absolute value of the AAH of the corresponding evaluation scenario, i.e., the values were expressed relative to AAH.
AAPH represents the SH carried out before the PU's arrival, while AAIH represents the SH carried out after its arrival; therefore AAH = AAPH + AAIH. The AAPH metric is dual: it is a benefit-type metric, since it is desirable to perform the SH before the PU's arrival to avoid interference between the PU and the SU, and it is a cost-type metric when the prediction is imprecise and the SH is performed too far ahead of the PU's arrival, which increases the AAH.
For this reason, the AAEH and AAUH evaluation metrics were created as subsets of AAPH. AAEH represents the SH performed very close to, but before, the PU's arrival; under this metric they are considered perfect SH because they make the best use of the available time of the SO they occupy. AAUH represents the SH performed well before the PU's arrival, which increases the AAH.
A prediction is classified as AAEH when the SH is performed after 80% of the availability time of the current SO has elapsed, and as AAUH when the SH is performed before 20% of that availability time has elapsed. This implies not only that AAPH is not equal to the sum of AAEH and AAUH, but also that there is a number of intermediate SH performed between 20% and 80% of the availability time of the SO, which can be calculated as AAPH - AAEH - AAUH.
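The classification rule can be sketched as follows. This is a minimal illustration that assumes the availability time of an SO runs from the moment the SU starts using it until the PU arrives, and that a handoff at or after the PU's arrival counts as AAIH. The function name, the time units and the sample events are hypothetical.

```python
from collections import Counter


def classify_handoff(handoff_time, so_start, pu_arrival, late=0.8, early=0.2):
    """Label one handoff according to the 20% / 80% rule."""
    if handoff_time >= pu_arrival:
        return "AAIH"                      # reactive: the PU has already arrived
    elapsed = (handoff_time - so_start) / (pu_arrival - so_start)
    if elapsed > late:
        return "AAEH"                      # close to the arrival, no interference
    if elapsed < early:
        return "AAUH"                      # far too early
    return "intermediate"                  # predictive, between 20% and 80%


# Hypothetical events: (handoff time, SO start, PU arrival), in seconds
events = [(1.0, 0.0, 10.0), (5.0, 0.0, 10.0), (9.0, 0.0, 10.0), (10.0, 0.0, 10.0)]
counts = Counter(classify_handoff(*e) for e in events)

aaih = counts["AAIH"]
aaph = counts["AAEH"] + counts["AAUH"] + counts["intermediate"]
aah = aaph + aaih                          # AAH = AAPH + AAIH
intermediate = aaph - counts["AAEH"] - counts["AAUH"]
```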
To facilitate the comparative analysis of the algorithms, relative values (in percent) were calculated for each EM. For benefit-type metrics, the relative value (Rel) of algorithm i was calculated from its absolute value (Abs) and the maximum value (Max) of the EM, as described in equation (4). For cost-type metrics, the relative value (Rel) of algorithm i was calculated from its absolute value (Abs) and the minimum value (Min) of the EM, as described in equation (5).
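A minimal sketch of this normalization is given below. It assumes the common forms Rel = 100 · Abs / Max for benefit-type metrics and Rel = 100 · Min / Abs for cost-type metrics; these forms are an assumption, since only the quantities involved in equations (4) and (5) are stated. The function names and sample values are hypothetical, and all absolute values are assumed to be positive.

```python
def relative_benefit(abs_values):
    """Benefit-type metric: the largest absolute value scores 100%.

    Assumed form: Rel_i = 100 * Abs_i / Max.
    """
    best = max(abs_values.values())
    return {name: 100.0 * value / best for name, value in abs_values.items()}


def relative_cost(abs_values):
    """Cost-type metric: the smallest absolute value scores 100%.

    Assumed form: Rel_i = 100 * Min / Abs_i.
    """
    best = min(abs_values.values())
    return {name: 100.0 * best / value for name, value in abs_values.items()}


# Hypothetical absolute values per algorithm (not results from the study)
benefit = relative_benefit({"AR": 40.0, "MA": 50.0, "ARMA": 45.0})
cost = relative_cost({"AR": 20.0, "MA": 25.0})
```

With either form, the best algorithm for a metric obtains 100% and the others obtain proportionally lower values, so benefit and cost metrics can be combined on the same scale.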
To calculate the global score (PG), the weighting in equation (6) was used, where w1 to w6 denote the weights assigned to each EM:

$$PG = w_1\,AAFH + w_2\,AAPH + w_3\,AAH + w_4\,AAIH + w_5\,AAEH + w_6\,AAUH \qquad (6)$$
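The global score can be sketched as a weighted sum of the six relative values. The weights used below are equal weights chosen purely for illustration; they are an assumption and not the weights used by the authors. The function name and the sample relative values are also hypothetical.

```python
METRICS = ("AAFH", "AAPH", "AAH", "AAIH", "AAEH", "AAUH")


def global_score(relative, weights):
    """Weighted sum of the relative values (in percent) of the six EM."""
    if abs(sum(weights[m] for m in METRICS) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    return sum(weights[m] * relative[m] for m in METRICS)


# Equal weights: an illustrative assumption, not the authors' weighting
equal_weights = {m: 1.0 / len(METRICS) for m in METRICS}

# Hypothetical relative values (percent) for one algorithm in one scenario
relative = {"AAFH": 90.0, "AAPH": 80.0, "AAH": 100.0,
            "AAIH": 70.0, "AAEH": 60.0, "AAUH": 50.0}
score = global_score(relative, equal_weights)
```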

IV. SIMULATION
In order to assess the performance of each developed VHDA, a simulation environment progressively reconstructs the behavior of the spectrum occupancy from data traces captured in the GSM frequency band. This makes it possible to evaluate the behavior of the PUs accurately and to assess and validate the performance of each VHDA. The spectral occupancy data correspond to a week-long observation captured in Bogotá, Colombia [14]. The energy detection technique was used to determine the occupancy or availability of each of the 124 channels of the analyzed GSM band, with a decision threshold 5 dB above the noise power. The decision threshold used to determine whether a frequency channel is busy is based on the average noise floor of the frequency band. The specifications of the GSM band, the standard configuration of the spectrum analyzer and the measurements were considered to establish the noise floor and the guard level. The average noise floor was obtained from spectrum analyzer measurements, and the guard level was fixed at 5 dB above the noise floor in order to minimize false alarms. Thus, the average noise floor is -113 dBm and the decision threshold is set to -113 + 5 = -108 dBm.
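The threshold decision can be sketched as below, using the noise floor and guard level stated above. The function names, the strict "greater than" comparison and the sample power readings are assumptions made for illustration.

```python
NOISE_FLOOR_DBM = -113.0   # average noise floor reported in the text
GUARD_DB = 5.0             # guard level above the noise floor
THRESHOLD_DBM = NOISE_FLOOR_DBM + GUARD_DB


def occupancy(power_dbm_samples, threshold_dbm=THRESHOLD_DBM):
    """Energy detection: 1 (busy) if the measured power exceeds the threshold.

    A reading exactly at the threshold is treated as idle (assumption).
    """
    return [1 if p > threshold_dbm else 0 for p in power_dbm_samples]


def duty_cycle(states):
    """Fraction of samples in which the channel was busy."""
    return sum(states) / len(states)


# Hypothetical power readings (dBm) for one GSM channel
readings = [-112.0, -107.5, -108.0, -95.0]
states = occupancy(readings)
busy_fraction = duty_cycle(states)
```

Applying `occupancy` to each of the 124 channels over time yields the binary availability traces that prediction models such as AR, MA and ARMA take as input.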

V. RESULTS
The aforementioned EM were calculated for the HT and LT traces and for the RT and BE approaches, which yields four evaluation scenarios for each metric: GSM-RT-LT, GSM-RT-HT, GSM-BE-LT and GSM-BE-HT (see Fig. 1 to 6). Analyzing the performance of the time-series-based SH predictive algorithms (AR, MA and ARMA) together with the reactive version, the following was observed. With respect to AAH, the reactive model has the best performance, followed by MA. With respect to AAFH, the reactive model has the best performance, followed by MA. With respect to AAPH, the ARMA model has the best performance, followed by AR. With respect to AAIH, the ARMA model has the best performance. With respect to AAEH, the ARMA model has the best performance. With respect to AAUH, the reactive model has the best performance, followed by MA.
When each SH algorithm is compared globally across the four scenarios defined in the methodology for the GSM network, the overall global score shows that the MA model has the best performance, with a margin of 0.73% over the second-best model. Given such a small margin, it is interesting to analyze which algorithm is best in each scenario: for RT in HT and for BE in HT, the AR model has the best performance; for RT in LT, the best model is MA; and for BE in LT, the ARMA model is the best. If the results are averaged by traffic level, the AR model is the best for HT, with a margin of 2.03% over the second-best, and the MA model is the best for LT, with a margin of 5.6% over the second-best.

VI. CONCLUSION
The most significant advantage of the prediction models is their capacity to reduce the level of interference; in the GSM network, the ARMA model has the best performance in this respect, with a margin of only 1.97% over the AR model.
Spectrum assignment algorithms are the tools that solve the problem of using the radio-electric spectrum efficiently, and they contribute to different matters such as channel characterization, local policies and user requirements. The advantages and disadvantages of adopting one spectrum assignment algorithm or another depend on the specific needs of its purpose; hence, its implementation depends on requirements in terms of signal processing, response time, data availability, storage capacity, learning capacity and robustness, among other factors.