Benchmarking of Algorithms to Forecast Spectrum Occupancy by Primary Users in Wireless Networks

One of the most relevant aspects in the performance of wireless cognitive communications is the interference between users, especially the interference that the secondary user can cause to the primary user. A proactive handoff strategy considerably reduces this interference, but it requires highly accurate prediction models. This article compares the performance of four algorithms in forecasting the spectral occupancy of the primary user during a secondary user's communication. The performance of the algorithms is assessed using five metrics: handoffs, failed handoffs, bandwidth, delay and throughput. The simulation scenario involves a 10-minute communication of a secondary user in a Wi-Fi network.


I. INTRODUCTION
The growing number of applications, the scarcity of the radio-electric spectrum and its underuse are current problems of wireless networks, which have fostered the use of strategies for dynamic and optimal access to the spectrum [1]-[4]. Cognitive radio (CR) is a promising approach that includes efficient and adaptive methodologies for the dynamic spectrum allocation (DSA) of existing radio [2], [5], [6]. In contrast with traditional networks, there are two types of users in cognitive radio: the primary user (PU), which pays for the frequency band, and the secondary user (SU), which uses spectral spaces opportunistically when the PU is not using them [7]-[9]. Joseph Mitola III coined the concept of CR in 1999 as "the point in which wireless Personal Digital Assistants (PDA) and the related networks are smart enough in computational terms in comparison to radio resources and the corresponding communications from one computer to another. They must be able to detect the user needs as a function of the context of use and offer radio resources and wireless services which are more suitable in that particular instant." [10].
The process in which the SU changes from one frequency channel to another is called spectral handoff, during which the communication between users is inevitably interrupted for a short time. There are various techniques to minimize the effect of this interruption. This work describes four methodologies based on probabilistic models, regression models and evolutionary artificial intelligence [11].
Due to their mathematical formality, their performance in prediction scenarios and the complexity imposed by traditional techniques, Markov chains and Naïve Bayes are chosen as probabilistic models [12], [13]. The Logistic Regression model is also selected since it uses several explanatory variables simultaneously and performs well in prediction among regression models [14], [15]. Seeking to optimize the spectrum allocation for better performance, the genetic algorithm is incorporated as an evolutionary artificial intelligence technique. Previous investigations have revealed a superior performance of this method in comparison to others [16]. The comparison strategy establishes five metrics: handoffs, failed handoffs, bandwidth, delay and throughput. Performance is assessed based on the simulation of a 10-minute communication in a Wi-Fi network.
The article is organized in six sections, beginning with this introduction. Section II describes spectral handoff. Section III presents the generalized mathematical model for each strategy. Section IV describes the chosen methodology. Section V presents the results obtained. Finally, Section VI draws a set of conclusions.

II. SPECTRAL HANDOFF
Spectrum mobility, or spectral handoff, can be defined as the process in which a cognitive radio user (secondary user, SU) changes its operating frequency when the conditions of a channel worsen or when a primary user (PU) appears on the licensed channel that the SU is using [7], [8].

A. Causes of spectral handoff
The need to carry out spectral handoff in cognitive radio networks can be explained by one of the following causes [17]:
• A PU is occupying the target channel: In proactive strategies, the backup channel is chosen beforehand and its occupancy status is not verified when changing channels. This means that an SU can find the channel occupied by another SU or by a PU.
• Arrival of a PU on a channel occupied by the SU: During the data transmission of an SU in a licensed channel, a PU can arrive and demand the immediate availability of said channel.
• The channel occupied by the SU is degraded: Even in the absence of a PU, the SU may have to change channels because the quality of the channel currently in use has degraded.
• The SU interferes with the PU: Spectral handoff is necessary when the opportunistic use of a licensed channel by an SU interferes with the activity of the PU.
• Traffic variation: If the amount of traffic in the frequency band increases significantly, the SU may need to change channels in order to balance the load and guarantee better performance levels.
• Movement of the SU: If the SU moves geographically outside the coverage area of the node in a centralized system, spectral handoff is necessary.

B. Requirements of spectral handoff
Spectral handoff can affect the performance and quality of service in cognitive radio networks. Therefore, spectral handoff is subject to the following requirements [18]:
• Speed: The delay in spectral handoff must be sufficiently small to avoid quality degradation or interruption of communication.
• Handoff rate: A high number of unnecessary channel changes directly affects the performance of data transmission, so the handoff rate needs to be minimized.
• Reliability: The effect of handoff on service quality must be minimized. For instance, in mobile networks, the probability of blocking new calls and the probability of dropping current calls must be minimized, and traffic must be balanced among adjacent cells.
• Signaling: It is important to minimize signaling since a high volume of signaling can affect communication performance.
• Success: Channels and resources must be available to guarantee successful handoff [19].
• Multiple handoff criteria: The new access network must be selected intelligently based on multiple criteria, since choosing the best spectral opportunity can avoid multiple handoffs [19].

C. Phases and procedure of spectral handoff
The fundamental purpose of any spectral handoff model is the transition from one frequency to another with the minimum degradation of quality [20]. Spectral handoff is developed in three phases [21]: measuring, decision-making and execution.
• Measuring: This phase includes the discovery of wireless networks and the detection of spectral opportunities in said networks. This can be achieved through a centralized or distributed approach.
• Decision-making: In this phase, the decision of when and where to perform spectral handoff is made based on multiple criteria and chosen metrics.
• Execution: In this phase, the transfer from the current connection to the new one is carried out, keeping in mind the previously mentioned requirements of spectral handoff.
The procedure of spectral handoff can be illustrated with two secondary users, SU1 and SU2, that communicate in channel Ch1. When a PU appears in Ch1, SU1 detects it and prepares to perform spectral handoff. SU1 pauses its current communication for a predefined duration, and SU2 is notified of the interruption before another fixed time interval elapses. Afterwards, SU1 and SU2 resume communication in the selected channel Ch2 or in the same channel Ch1. Finally, since the transmission of a block of data can be interrupted many times, spectral handoff can be executed repeatedly.

D. Impact of spectral mobility in Cognitive Radio
Spectral mobility has a significant impact on the performance of cognitive radio. Depending on the spectral handoff strategy adopted, the performance of cognitive radio networks (CRN) can be affected by any of these factors: latency, throughput, reliability, signaling, PU interference, energy efficiency, bandwidth, SINR, quality of service and error rate.
• Latency: The magnitude of latency increases with reactive handoff strategies due to the detection time of spectral opportunities.
• Throughput: The effective data rate can be reduced by the limited capacity of the channel in the frequency band selected by the spectral handoff strategy.
• Reliability: An inadequate decision-making process in the spectral handoff strategy can contribute to a higher imbalance of the data traffic load in the CRN. This can affect quality of service parameters such as the probability of blocking new calls and the probability of dropping current calls.
• Signaling: Depending on the spectral handoff strategy, the amount of signaling information can increase considerably, especially in Common Control Channel (CCC) strategies. This additional information reduces the effective data transfer rate.
• PU interference: Reactive strategies of spectral handoff always cause temporary interference to the PU, which is proportional to the time the SU takes to detect spectral opportunities. The decision to increase the transmission power in order to increase throughput also increases the interference caused to the PU or SU in adjacent frequency channels.
• Energy efficiency: The execution of complex algorithms, the unnecessary increase of transmission power and prolonged detection times, among other factors, reduce the energy efficiency of the SU.
• Bandwidth: Using multiple frequency channels for the transmission of a single SU can be beneficial for its bandwidth but can also reduce the potential bandwidth of other SUs if no metric ensures that the network traffic is served fairly.
• SINR: An improper decision-making process in the spectral handoff strategy can affect the SINR of both the SU and the PU. This can be caused by a poor choice of channel, an increase in the transmission power, the chosen transmission mode or poor load balancing.
• Quality of service: A poor choice of frequency channel in the spectral handoff strategy can leave delay-sensitive applications with low QoS and QoE parameters.
• Error rate: In data communication, the error rate is a function of parameters such as the operating frequency, modulation, transmission power and communication technology. The spectral handoff strategy must redefine certain parameters when changing channels.

III. MODELS
A. Markov Chain
Five elements are required to define a Markov chain: the transition diagram, the states and state space, the transitions, the transition probabilities and the representation.
Markov chains are a stochastic technique based on the analysis of the internal dynamics of the system, in which the state at a given time is predicted from the previous states. A Markov chain is a random process with the property that, given the present value of the process $X_t$, the future values $X_s$ for $s > t$ are independent of the past values $X_u$ for $u < t$.
A state characterizes the system at a given instant; formally, it is a variable whose values can only belong to the set of states of the system. The chain is a sequence of random variables $X = \{X_n : n \ge 0\}$ that take values in a finite or countable set $\varepsilon$, the state space, and that satisfies the Markov condition for all $n$ and any states $i_0, i_1, \ldots, i_n, j$ in $\varepsilon$ (equations (1) and (2)).
The probability that $X_{n+1}$ is in state $j$ given that $X_n$ is in state $i$ is the one-step transition probability from $i$ to $j$ (equation (3)) and is denoted as $P_{ij}^{n,n+1}$.
The transition probabilities depend on the states and on the instant at which the transition is made. When the probabilities are independent of time (they are not a function of $n$), the chain has stationary transition probabilities and is known as a time-homogeneous chain (equations (4) and (5)).
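As a compact reference, the relations described above can be written in a conventional form. The notation below is an assumption based on the surrounding definitions and may differ in detail from the paper's equations (1) to (5).

```latex
% Markov condition: the next state depends only on the present state
P(X_{n+1} = j \mid X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i)
  = P(X_{n+1} = j \mid X_n = i)

% One-step transition probability from state i to state j at step n
P_{ij}^{n,n+1} = P(X_{n+1} = j \mid X_n = i)

% Stationary (time-homogeneous) transition probabilities
P_{ij}^{n,n+1} = P_{ij} \quad \text{for all } n \ge 0
```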
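The $P_{ij}$ values are referred to as the transition probabilities and satisfy a probability distribution (equation (6)).
$$\sum_{j=1}^{m} P_{ij} = 1, \qquad P_{ij} \ge 0 \qquad (6)$$
All values are combined and form the transition matrix $T$ of size $m \times m$ (equation (7)).
$$T = \begin{bmatrix} P_{11} & \cdots & P_{1m} \\ \vdots & \ddots & \vdots \\ P_{m1} & \cdots & P_{mm} \end{bmatrix} \qquad (7)$$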
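A minimal sketch of how such a transition matrix could be estimated from an observed occupancy sequence and used for one-step prediction is given below. It is an illustration only: the two-state encoding (0 = available, 1 = occupied), the function names and the example trace are assumptions and are not taken from the study.

```python
from collections import Counter

def estimate_transition_matrix(trace, n_states=2):
    """Estimate a time-homogeneous transition matrix from a state sequence."""
    # Count every observed one-step transition (i -> j).
    counts = Counter(zip(trace[:-1], trace[1:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # Assumption: a state never seen as a source gets a uniform row.
            matrix.append([1.0 / n_states] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix

def predict_next_state(matrix, current_state):
    """Return the most probable next state given the current one."""
    row = matrix[current_state]
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical occupancy trace: 0 = channel available, 1 = occupied by a PU.
trace = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
T = estimate_transition_matrix(trace)
next_state = predict_next_state(T, current_state=0)
```

Each row of the estimated matrix sums to one, as required of a probability distribution over the next state.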

B. Logistic Regression Algorithm
The main advantage of logistic regression is that it can use several explanatory variables at the same time. Although this may seem trivial, it is important because of the great interest in knowing the impact of these variables on the response variable. If the explanatory variables were examined independently, ignoring the covariance between them could lead to confounding.
A logistic regression models the probability of the outcome based on individual characteristics and is given by equation (8).
Here, π is the event probability, βi are the regression coefficients, β0 is the coefficient associated with the reference group, and xi are the explanatory variables. The reference group represented by β0 comprises the individuals that have the reference level for each variable x1, x2, ..., xm.
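For reference, the logistic model is conventionally written in the following form, using the symbols defined above. This is the standard textbook expression and is an assumption about the layout of equation (8), which may be presented differently in the original.

```latex
% Log-odds (logit) as a linear function of the explanatory variables
\ln\!\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m

% Equivalent expression for the event probability
\pi = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m)}}
```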
For the specific case of the present research, the following explanatory variables were defined: the signal-to-interference-plus-noise ratio (SINR), the availability probability (PD, referred to as AP in Section III-D) and the average availability time (TED, referred to as AAT in Section III-D), since they are all related and their simultaneous use is required to predict channel availability. Therefore, equation (8) turns into equation (9).
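The following sketch illustrates how a logistic model with these three explanatory variables could be fitted and evaluated. It is a toy example under stated assumptions: the features are taken as already scaled to the interval [0, 1], the training samples are invented for illustration, and plain batch gradient ascent is used as the fitting method, which is not necessarily the procedure followed in the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(beta, x):
    """Probability that the channel is available; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * xi for b, xi in zip(beta[1:], x)))

def fit_logistic(X, y, learning_rate=0.5, epochs=2000):
    """Fit the coefficients by batch gradient ascent on the log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        gradient = [0.0] * len(beta)
        for x, target in zip(X, y):
            error = target - predict_probability(beta, x)
            gradient[0] += error
            for k, xi in enumerate(x):
                gradient[k + 1] += error * xi
        beta = [b + learning_rate * g / n for b, g in zip(beta, gradient)]
    return beta

# Hypothetical, normalized samples: (SINR, PD, TED) -> 1 if the channel was available.
X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.9], [0.7, 0.7, 0.8],
     [0.2, 0.3, 0.1], [0.3, 0.1, 0.2], [0.1, 0.2, 0.3]]
y = [1, 1, 1, 0, 0, 0]

beta = fit_logistic(X, y)
p_good = predict_probability(beta, [0.9, 0.8, 0.7])
p_poor = predict_probability(beta, [0.2, 0.3, 0.1])
```

A channel would then be predicted as available when the estimated probability exceeds a chosen threshold, for example 0.5.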

C. Genetic Algorithms
Genetic algorithms are optimization models that emulate genetic and evolutionary processes. The basic model consists of an initial population and a set of operations that define how this population and its descendant generations interact. According to the optimization model shown in equation (10), simple genetic algorithms are used to solve optimization problems that include continuous parameters [22]-[25].
The population consists of individuals, each represented by a binary number. This representation is known as a chromosome, and each bit in the chromosome is called a gene. Genetic algorithms are often characterized by these five concepts: allele, gene, chromosome, position and index. Figure 1a shows the graphical representation of a specific population.
From a systematic standpoint, generations can be considered as iterations that lead to the evolution of initial populations into new populations born with better genetic material. New generations are the result of three operators acting on the current population: selection, crossover and mutation [22].
Selection is the most impactful operation and is in charge of transmitting the genetic code of the fittest individuals to future generations. A new population, called the intermediate population, is generated from the current population and must have the same size as the initial one. However, the genetic structure must remain diverse, so selection does not only take the best codes but also transmits a number of codes with lower performance. Algorithmically, there are diverse selection methods [23]-[26].
The crossover operator involves choosing two 'parent' chromosomes and assigning a crossing point to them. Then, the crossing is carried out between the chosen individuals in order to create new combinations labelled as children. This operator, analogous to sexual reproduction, allows the exchange of genetic material for the production of new descendants. The aim is to match parents that have different genetic codes. Figure 1b shows the matching performed with the crossover operator for two individuals chosen at random [27], [28].
The mutation operator plays a minor role in simple genetic algorithms with low mutation rates. Mutation is carried out by randomly modifying, to a certain degree, the genetic pool of a population. Low mutation rates ensure that the genetic code of new populations differs only slightly from that of previous populations [28], [29].
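The three operators can be sketched as follows for binary chromosomes. This is an illustrative sketch only: tournament selection is used as one of several possible selection methods, the fitness function (the number of ones in the chromosome) is a toy stand-in for the spectrum allocation objective of equation (10), and all names and parameter values are assumptions.

```python
import random

def tournament_select(population, fitness, rng, k=2):
    """Selection: keep the fitter of k individuals drawn at random."""
    return max(rng.sample(population, k), key=fitness)

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: exchange the tails after a random crossing point."""
    point = rng.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, rng, rate):
    """Mutation: flip each gene (bit) independently with a small probability."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]

def evolve(population, fitness, generations=50, mutation_rate=0.01, seed=0):
    """Run the generational loop and return the best chromosome seen so far."""
    rng = random.Random(seed)
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = []
        while len(next_population) < len(population):
            parent_a = tournament_select(population, fitness, rng)
            parent_b = tournament_select(population, fitness, rng)
            for child in crossover(parent_a, parent_b, rng):
                next_population.append(mutate(child, rng, mutation_rate))
        # The new population keeps the size of the initial one.
        population = next_population[:len(population)]
        best = max(population + [best], key=fitness)
    return best

# Hypothetical initial population: 20 chromosomes of 16 genes each.
init_rng = random.Random(1)
initial = [[init_rng.randint(0, 1) for _ in range(16)] for _ in range(20)]
best = evolve(initial, fitness=sum)
```

Because the best chromosome seen so far is tracked across generations, the returned individual is never worse than the best member of the initial population.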

D. Naïve Bayes algorithm
When choosing the prediction model, it is paramount to keep in mind that it involves multiple characteristics and criteria that can be refined further. The training process of prediction models may factor in information such as the availability probability (AP) and the average availability time (AAT), as well as other metrics.
Thus, applying Bayes' theorem under the naïve assumption, the independent variables (also known as predictors) are the availability probability and the average availability time, whereas the dependent variable is the channel availability. The Naïve Bayes prediction model works adequately for predicting the class as long as the predictors can be assumed to be independent of each other.
A Naïve Bayes classifier essentially assumes that the presence of one specific characteristic does not imply the presence of other characteristics. Even when these characteristics depend on each other, or one depends on the existence of another, the classifier treats their contributions as independent. This facilitates the operation on large datasets, and the classifier can even outperform highly sophisticated classification methods.
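A minimal sketch of such a classifier with the two predictors mentioned above is shown below. It is an assumption-laden illustration: AP and AAT are discretized into hypothetical levels ("high"/"low" and "long"/"short"), the training samples are invented, and add-one (Laplace) smoothing is introduced here to avoid zero probabilities; none of these details are taken from the study.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Count class frequencies and per-class frequencies of each predictor value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (predictor index, class) -> value counts
    value_sets = defaultdict(set)         # predictor index -> distinct values seen
    for features, label in zip(samples, labels):
        for index, value in enumerate(features):
            value_counts[(index, label)][value] += 1
            value_sets[index].add(value)
    return class_counts, value_counts, value_sets

def posterior(model, features):
    """Posterior probability of each class, treating the predictors as independent."""
    class_counts, value_counts, value_sets = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for index, value in enumerate(features):
            # Likelihood of this predictor value given the class, with add-one smoothing.
            score *= (value_counts[(index, label)][value] + 1) / (count + len(value_sets[index]))
        scores[label] = score
    evidence = sum(scores.values())  # normalizing constant
    return {label: score / evidence for label, score in scores.items()}

# Hypothetical training data: (AP level, AAT level) -> channel state.
samples = [("high", "long"), ("high", "long"), ("high", "short"),
           ("low", "short"), ("low", "short"), ("low", "long")]
labels = ["available", "available", "available",
          "occupied", "occupied", "occupied"]

model = train_naive_bayes(samples, labels)
result = posterior(model, ("high", "long"))
```

For the query ("high", "long"), the unnormalized scores are 0.5 × 0.8 × 0.6 = 0.24 for "available" and 0.5 × 0.2 × 0.4 = 0.04 for "occupied", so the posterior probability of "available" is 0.24 / 0.28 = 6/7.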
Bayes' theorem defines the calculation of the posterior probability P(b|z) from P(b), P(z) and P(z|b), as shown in equation (11).
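In the notation used here, with b denoting the class and z the predictor, Bayes' theorem takes its usual form:

```latex
P(b \mid z) = \frac{P(z \mid b)\, P(b)}{P(z)}
```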
where:
• P(b|z) is the posterior probability of the class (b, target) given the predictor (z, attributes);
• P(b) is the prior probability of the class;
• P(z|b) is the probability of the predictor given the class;
• P(z) is the prior probability of the predictor.
Based on equation (11), considering the independent variables AP and AAT and the dependent variable of channel availability (denoted as either occupied or available) leads to equations (12) and (13).

IV. METHODOLOGY
To assess the performance of the discussed algorithms (Markov chain, Logistic Regression, Genetic Algorithms and Naïve Bayes), five assessment metrics are described in Table I. To analyze the performance for each handoff-related component, a simulated environment progressively reconstructs the behavior of spectrum occupancy based on data traces captured within the Wi-Fi frequency band. These traces make it possible to assess the behavior of PUs and validate the performance of each handoff variable. The spectral occupancy data come from a week-long observation recorded in the city of Bogotá, Colombia [30].

V. RESULTS
Figs. 2 to 6 show the performance of the metrics for the Wi-Fi network: handoffs, failed handoffs, bandwidth, delay and throughput.
Table II summarizes the performance of each algorithm in the cost assessment metrics (handoffs, failed handoffs and delay) with the maximum values obtained in figures 1, 2 and 4. Based on the obtained results, the model with the best performance is the Markov chain model, followed by the Logistic Regression method in second place and the Naïve Bayes method in third place. The worst performance corresponds to the Genetic Algorithm.
Comparing the best results with the worst ones, the values increase by a factor of 2.8 for failed handoffs, 2.5 for the total number of handoffs and 2 for the delay.
For the bandwidth metric (Figure 3), Logistic Regression, Naïve Bayes and Markov chains show variations between 1078 kHz and 1253 kHz, with the lowest range observed for Markov chains. During the 10 minutes of transmission, Genetic Algorithms have the lowest limits of average bandwidth, with a range between 793.8 kHz and 954.5 kHz.
For the throughput metric, Genetic Algorithms have the lowest limits, varying between 5105 kbps and 5632 kbps. The strategies of Logistic Regression, Naïve Bayes and Markov chains show similar behaviors for times exceeding 5 minutes. Their most representative variation occurs in the first minute: 7117 kbps for Markov chains, 8047 kbps for Logistic Regression and 8239 kbps for Naïve Bayes.

VI. CONCLUSION
The metrics obtained for each strategy reveal that, although Genetic Algorithms have the worst performance in all metrics, no single algorithm has the best performance for all metrics. However, weighing the results across the metrics, the Markov chain model has the best relative performance.
The growing number of wireless applications sets new challenges for future wireless communication systems. Spectral handoff strategies, especially predictive ones, provide methodologies to improve spectral efficiency by optimizing relevant parameters of the communication system such as quality of service, delay, throughput, reliability, energy efficiency, bandwidth, SNR and, last but not least, interference.