Optimization of IDEA Key-Schedule Algorithm for Safe Use in Cloud

Abstract: Thanks to its significant advantages, cloud computing is now one of the most attractive modern technologies and is used mostly for data storage. Cloud storage servers are targeted by various attacks, which are countered with various methods; encryption is the most widely used method for protecting information. The IDEA encryption algorithm, introduced in 1990, is a symmetric-key block cipher whose key schedule generates sub-keys only by cyclically shifting the secret key, and it is therefore very weak. In this paper, we try to improve the security of the IDEA key schedule by utilizing the strengths of the Twofish and Blowfish key-schedule algorithms, and we then evaluate the new algorithm, named TB-IDEA, using the CT1 software.


B. Twofish Algorithm
Twofish, introduced in 1998 by Bruce Schneier, is a 128-bit block cipher that accepts keys of different lengths up to 256 bits. The algorithm consists of 16 rounds of a Feistel network with a bijective F function and a carefully designed key schedule, with only 1-bit rotations as its non-Feistel elements. The design of the rounds and the key schedule balances speed, software size, key setup time and memory [8]. The algorithm is well suited to large microprocessors and smart cards and offers a high level of security: the best attack reported breaks only 5 of its rounds.

C. Blowfish Algorithm
Blowfish (BF), a symmetric block cipher designed in 1993 by Bruce Schneier, has a fixed block size of 64 bits and a variable key length from 32 to 448 bits. It uses a Feistel network built from P-boxes, S-boxes and XOR operations [9].
III. PROBLEM STATEMENT
We have a cloud storage server hosting a database that contains user data. As already mentioned, this remote server is exposed to various attacks that threaten the confidentiality, integrity, availability, etc. of the user data. An attacker can access and misuse the data in different ways. A malicious cloud service provider can also easily access the data. Thus, the best policy is to encrypt the data before sending them to cloud storage servers.
As already mentioned in Section 2.1, the IDEA encryption algorithm is a symmetric-key block cipher in which the key is rotated and divided into 16-bit sub-keys repeatedly until all sub-keys for the eight rounds have been generated. Because sub-keys are generated only by cyclically shifting the secret key, the IDEA key-schedule algorithm is very weak, and keys of this algorithm can therefore fall into classes of weak keys.
In the research conducted by Shazia Afzal et al. [10], various tests were carried out to evaluate the independence of the sub-keys of various algorithms at the sub-key, byte and bit levels. The frequency test result for the Twofish scheduler is higher than 90%, which shows a balance of '0' and '1' bits in the generated sequence, and the results show that the generated sub-keys are independent at the sub-key level. In contrast, the frequency test result for the IDEA scheduler shows an imbalance of '0' and '1' bits in the sequence, so the IDEA scheduler does not meet the independence criterion at the sub-key level. Likewise, the results of the byte-level independence test show that the bytes of the sub-keys generated by Twofish are statistically independent, while the bytes of the IDEA sub-keys are strongly related. Twofish uses strong components such as the MDS matrix and S-boxes in its key-schedule algorithm and therefore has a strong key schedule. The weak results of IDEA in all the tests show that this algorithm also has a high level of sub-key dependence at the bit level.
Due to the simplicity of the IDEA key-schedule algorithm, a chosen-key differential attack can be mounted on 3 rounds of it. There is also a chosen-key ciphertext-only timing attack on 8 rounds of IDEA that needs 5*2 related-key queries, each of which needs to encrypt 2 random unknown plaintext blocks [11]. Therefore, to secure the IDEA algorithm, we need a way to improve its key-schedule algorithm.
IV. PROPOSED ALGORITHM
Encryption can be traced back to the early works of Shannon in the late 1940s and early 1950s, and it has undergone great developments since then. When using encryption in different systems, we should be aware of the features, strengths and weaknesses of the algorithms. Although encryption has progressed significantly, encryption algorithms are constantly attacked; these attacks have undermined their security and prompt us to enhance their security or even to design new algorithms.
In our proposed TB-IDEA algorithm, we use the strengths of the Twofish and Blowfish key-schedule algorithms to produce the IDEA sub-keys, with the aim of boosting the security of IDEA. We use the P-array of the Blowfish key-schedule algorithm and the MDS matrices and Pseudo-Hadamard Transform (PHT) of the Twofish key-schedule algorithm, which are considered the merits of these algorithms. As shown in Table I, the Blowfish P-array has eighteen 32-bit entries. First, the 128-bit user input key is divided into four 32-bit blocks. Then, as shown in Fig. 1, the first, second, third and fourth 32-bit blocks are XORed with P1, P2, P3 and P4, respectively. Starting again from the first block, the first, second, etc. 32-bit blocks are XORed with P5, P6, etc., respectively, until every P-array entry has been XORed with key bits. Finally, the resulting 32-bit blocks are XORed together to obtain a single 32-bit output.
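The P-array step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name fold_key_into_word is hypothetical, the big-endian byte order is an assumption, and the constants are the standard initial Blowfish P-array values (hexadecimal digits of pi), which are assumed to match Table I.

```python
# Standard initial Blowfish P-array, P1..P18 (hexadecimal digits of pi).
# Assumed here to be the values listed in Table I.
P_ARRAY = [
    0x243F6A88, 0x85A308D3, 0x13198A2E, 0x03707344, 0xA4093822, 0x299F31D0,
    0x082EFA98, 0xEC4E6C89, 0x452821E6, 0x38D01377, 0xBE5466CF, 0x34E90C6C,
    0xC0AC29B7, 0xC97C50DD, 0x3F84D5B5, 0xB5470917, 0x9216D5D9, 0x8979FB1B,
]


def fold_key_into_word(key: bytes) -> int:
    """XOR the four 32-bit key blocks cyclically with P1..P18, then XOR
    all eighteen results together into one 32-bit word."""
    if len(key) != 16:
        raise ValueError("expected a 128-bit (16-byte) key")
    # Split the key into four 32-bit blocks (big-endian is an assumption).
    blocks = [int.from_bytes(key[i:i + 4], "big") for i in range(0, 16, 4)]
    word = 0
    for index, p in enumerate(P_ARRAY):
        # Block 1 meets P1, P5, ...; block 2 meets P2, P6, ...; and so on.
        word ^= p ^ blocks[index % 4]
    return word
```

For example, fold_key_into_word(b"New IDEA SubKeys") returns a single 32-bit integer that feeds the next step.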
The main key is stored in reverse order and XORed with the 32-bit output of the previous step. The resulting 128-bit output is then divided into four 32-bit blocks. As shown in Fig. 1 and Eq. (1), each of the 32-bit blocks is multiplied by the MDS matrix separately [8].
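As an illustration of the MDS multiplication, the sketch below multiplies one 32-bit block by the 4-by-4 MDS matrix of the Twofish specification over GF(2^8), using the Twofish reduction polynomial x^8 + x^6 + x^5 + x^3 + 1. The function names gf_mul and mds_multiply are hypothetical, and treating the block as four little-endian bytes is an assumption rather than something stated in this paper.

```python
# MDS matrix from the Twofish specification (entries are GF(2^8) elements).
MDS = [
    [0x01, 0xEF, 0x5B, 0x5B],
    [0x5B, 0xEF, 0xEF, 0x01],
    [0xEF, 0x5B, 0x01, 0xEF],
    [0xEF, 0x01, 0xEF, 0x5B],
]


def gf_mul(a: int, b: int, poly: int = 0x169) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^6 + x^5 + x^3 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly  # reduce as soon as the degree reaches 8
        b >>= 1
    return result


def mds_multiply(word: int) -> int:
    """Multiply a 32-bit block, viewed as a vector of 4 bytes, by MDS."""
    x = word.to_bytes(4, "little")  # byte order is an assumption
    y = [0, 0, 0, 0]
    for row in range(4):
        for col in range(4):
            y[row] ^= gf_mul(MDS[row][col], x[col])
    return int.from_bytes(bytes(y), "little")
```

Because the operation is linear over GF(2), mds_multiply(a ^ b) equals mds_multiply(a) ^ mds_multiply(b), which is a convenient sanity check.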
The output for the first 32-bit block is rotated 8 bits to the left and fed into the PHT together with the output for the second 32-bit block, where the two are combined. The second 32-bit output of the PHT is then rotated 9 bits to the left. The same procedure is applied to the third and fourth 32-bit blocks.
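The rotate-and-combine step can be sketched as below. Note the assumption: the sketch uses the standard Twofish PHT, a' = (a + b) mod 2^32 and b' = (a + 2b) mod 2^32, not the modified formulas used in this paper, and the names rol32, pht and combine_pair are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rol32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def pht(a: int, b: int) -> tuple:
    """Standard Twofish pseudo-Hadamard transform on two 32-bit words."""
    return (a + b) & MASK32, (a + 2 * b) & MASK32


def combine_pair(first: int, second: int) -> tuple:
    """Rotate the first word left by 8, apply the PHT to the pair,
    then rotate the second PHT output left by 9."""
    a, b = pht(rol32(first, 8), second)
    return a, rol32(b, 9)
```

Applying combine_pair to blocks one and two, and again to blocks three and four, yields the four 32-bit words of one round.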
As Eq. (2) and Eq. (3) show, the PHT is a simple operation for combining the two outputs of the previous stage [8]. The inputs are 'a' and 'b' (to optimize the computation, we have made some changes to the formulas):
a' = (a + b) mod 2 * 1023    (2)
b' = (a + 2b) mod 2 * 1023    (3)
If the lengths of 'a' and 'b' are less than 32 bits, padding is applied: a '1' bit is appended, followed by '0' bits until the desired length is reached. Finally, four 32-bit blocks are formed. Each of these 32-bit outputs is divided into two 16-bit halves, which are the sub-keys, so the first round generates eight 16-bit sub-keys. Then, the user's input key is rotated 25 bits to the left and used as the primary key for the next round: all the above steps are repeated on the rotated key and the next 8 sub-keys are produced. In every round except the first, in which the user's key enters the algorithm directly, the key of the previous round is rotated 25 bits to the left before it undergoes the process. All in all, the steps are repeated 7 times, which yields 7 × 8 = 56 sub-keys and thus covers the 52 sub-keys required by IDEA. Fig. 2 gives an overview of the TB-IDEA scheduler.
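The outer loop of the scheduler (seven rounds, a 25-bit left rotation of the key between rounds, and splitting each 32-bit word into two 16-bit sub-keys) can be sketched as follows. The parameter mix is a hypothetical stand-in for the P-array, MDS and PHT steps; the big-endian layout and keeping the first 52 of the 56 generated sub-keys are assumptions. With the default mix, which simply splits the key into four words, the loop reduces to the original IDEA key schedule.

```python
from typing import Callable, List

MASK128 = (1 << 128) - 1


def rotl128(key: int, n: int = 25) -> int:
    """Rotate a 128-bit integer left by n bits."""
    n %= 128
    return ((key << n) | (key >> (128 - n))) & MASK128


def key_words(key: int) -> List[int]:
    """Default stand-in for the mixing stage: the four 32-bit key words."""
    return [(key >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]


def split_to_subkeys(words: List[int]) -> List[int]:
    """Split each 32-bit word into its high and low 16-bit halves."""
    subkeys = []
    for w in words:
        subkeys.append((w >> 16) & 0xFFFF)
        subkeys.append(w & 0xFFFF)
    return subkeys


def generate_subkeys(key: bytes,
                     mix: Callable[[int], List[int]] = key_words) -> List[int]:
    """Run 7 rounds of 8 sub-keys each and keep the 52 that IDEA needs."""
    state = int.from_bytes(key, "big")
    subkeys: List[int] = []
    for _ in range(7):
        subkeys.extend(split_to_subkeys(mix(state)))
        state = rotl128(state, 25)  # key for the next round
    return subkeys[:52]  # assumption: the first 52 of 56 are used
```

Passing a function that implements the TB-IDEA mixing stage as mix would turn this skeleton into the proposed scheduler; the default only demonstrates the rotation and splitting logic.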

V. EVALUATION SOFTWARE
Cryptanalysis serves two purposes: 1) reducing or removing the security of an encryption system in order to access information, and 2) evaluating an encryption algorithm and examining its strengths and weaknesses.
In this paper, the CrypTool 1 (CT1) software has been used for evaluation. CT1 is an open-source Windows application for encryption and decryption that was originally designed as an internal business program for information security training and has since grown into an important open-source project for encryption and IT security awareness. Some of the methods this software provides for cryptanalysis include:

E. Entropy
How easily the value of a random variable can be guessed is an important criterion of its quality [13]. The confidentiality of a key is defined as the adversary's lack of information about it. The desirable type of entropy for encryption is min-entropy, which measures how difficult the information is to guess. In fact, researchers argue that the traditional Shannon entropy is not a suitable measure for encryption and that min-entropy is a good alternative. Min-entropy is used in this article too. Entropy is measured in bits per character: the higher the entropy, the closer the text is to an even occurrence of all 26 characters.
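For reference, the two entropy measures mentioned here can be computed per character as in the sketch below. The function names are hypothetical, the formulas are the textbook definitions (Shannon entropy and min-entropy of the empirical character distribution), and the sketch is not claimed to reproduce CT1's internal computation. The text must be non-empty.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character: -sum(p * log2(p))."""
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def min_entropy(text: str) -> float:
    """Min-entropy in bits per character: -log2 of the largest probability."""
    n = len(text)
    return -math.log2(max(Counter(text).values()) / n)


def entropy_ratio(entropy: float, alphabet_size: int = 26) -> float:
    """Entropy as a fraction of the maximum log2(alphabet_size)."""
    return entropy / math.log2(alphabet_size)
```

With a 26-letter alphabet the maximum is log2(26), about 4.70 bits per character, so entropy_ratio converts a measured entropy into the percentage form used in the evaluation sections.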

F. Autocorrelation
In the autocorrelation test, a text is compared with shifted versions of the same text, and the characters that match in both texts are counted. In effect, it is a mathematical tool for finding repeating patterns.
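A minimal sketch of this comparison is given below, assuming a non-cyclic shift (characters shifted past the end are ignored); the function name autocorrelation is hypothetical and the exact counting rule used by CT1 may differ.

```python
from typing import List


def autocorrelation(text: str) -> List[int]:
    """For each offset t = 1 .. len(text) - 1, count the positions where
    the text matches a copy of itself shifted by t characters."""
    n = len(text)
    return [
        sum(1 for i in range(n - t) if text[i] == text[i + t])
        for t in range(1, n)
    ]
```

A text with period 2, such as "abab", gives its largest count at offset 2, which is how repeating patterns show up in the result.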

G. N-Gram
In computational linguistics, a contiguous sequence of n elements taken from a given sequence of text is called an n-gram. According to the software, the elements can be phonemes, syllables, letters or words. An n-gram of size 1 is called a unigram, of size 2 a bigram and of size 3 a trigram; larger sizes are referred to as four-gram, five-gram and so on. This test also displays the frequency of the letters. In a plain text that is long enough, each character occurs with a certain frequency that depends on the language. Each character of the plain text is mapped to one or more encrypted characters. To break a ciphertext, the frequency of characters in the encrypted text is compared with the frequency of characters in the plain text.
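Counting character n-grams and their percentages can be sketched as follows; the function name ngram_table is hypothetical and the output layout (n-gram, count, percentage) is only an assumption about how such a table might be arranged.

```python
from collections import Counter
from typing import List, Tuple


def ngram_table(text: str, n: int) -> List[Tuple[str, int, float]]:
    """Return (n-gram, count, percentage) rows, most frequent first."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    return [
        (gram, count, 100.0 * count / total)
        for gram, count in Counter(grams).most_common()
    ]
```

For instance, a letter that occurs 13 times among 23 characters appears in the unigram table with a percentage of about 56.5217.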

H. Poker
This test examines the uniform distribution of m-bit patterns in the entire sequence; that is, each m-bit block should appear the same number of times in the sequence [10]. Accordingly, the sequence is divided into blocks of length m, and the frequency of each of the 2^m possible subsequences of length m is determined. The poker test then determines whether all subsequences of length m occur evenly in the sequence.
This test has a parameter called the significance level alpha (importance degree), with values of 0.01, 0.05 and 0.10. If its value is too high, the test may reject a sequence that was in fact generated by a random bit generator; if its value is too low, there is a risk that the test accepts a sequence that was not produced by a random bit generator. The maximum test value is a threshold that depends on the significance level alpha, and the test result is a statistic computed by the test that is compared against this maximum value. A k-tuple is a group of k elements and is used as a parameter of the poker test. For example, tuple 3 gives 2^3 = 8 different vectors of length 3, each of which is analyzed by the test.
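The sketch below shows one common formulation of the poker test, the chi-square statistic X = (2^m / k) * sum(n_i^2) - k over k non-overlapping m-bit blocks, compared against the chi-square critical value with 2^m - 1 degrees of freedom. Whether CT1 uses exactly this formulation is an assumption; the function names are hypothetical, and only the critical values for alpha = 0.05 with tuple 1 and tuple 3 are included.

```python
from collections import Counter

# Chi-square critical values at alpha = 0.05, keyed by degrees of freedom
# (2**m - 1): tuple 1 has 1 degree of freedom, tuple 3 has 7.
CHI2_CRITICAL_005 = {1: 3.841, 7: 14.067}


def poker_statistic(bits: str, m: int) -> float:
    """Chi-square poker statistic over non-overlapping m-bit blocks."""
    k = len(bits) // m  # number of complete blocks
    counts = Counter(bits[i * m:(i + 1) * m] for i in range(k))
    return (2 ** m / k) * sum(c * c for c in counts.values()) - k


def poker_passes(bits: str, m: int) -> bool:
    """True if the statistic does not exceed the alpha = 0.05 threshold."""
    return poker_statistic(bits, m) <= CHI2_CRITICAL_005[2 ** m - 1]
```

A perfectly balanced sequence gives a statistic of 0, while a constant sequence gives a large statistic and fails the test.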
For evaluation, we encrypted the phrase "New IDEA SubKeys", which is 128 bits long, as follows:
1) First, we took a 128-bit key as the primary key and generated two sets of sub-keys, one using the IDEA key schedule and one using the TB-IDEA key schedule.
2) Next, we put the 128-bit plain text into binary form and divided it into two 64-bit parts.
3) Then, we encrypted each of the two 64-bit parts, once using the IDEA sub-keys and once using the TB-IDEA sub-keys.
4) Finally, we gave the two encrypted texts to the software in base64 and evaluated them.
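The data handling in this procedure (splitting the 16-character phrase into two 64-bit blocks and base64-encoding the result) can be sketched as follows. The parameter encrypt_block is a hypothetical stand-in for IDEA encryption with a given set of sub-keys; no cipher is implemented here.

```python
import base64
from typing import Callable


def encrypt_phrase(plaintext: bytes,
                   encrypt_block: Callable[[bytes], bytes]) -> str:
    """Split a 128-bit plaintext into two 64-bit blocks, encrypt each with
    the supplied block function, and return the result in base64."""
    if len(plaintext) != 16:
        raise ValueError("expected a 128-bit (16-byte) plaintext")
    blocks = [plaintext[0:8], plaintext[8:16]]
    ciphertext = b"".join(encrypt_block(block) for block in blocks)
    return base64.b64encode(ciphertext).decode("ascii")
```

Calling encrypt_phrase(b"New IDEA SubKeys", f) once with an IDEA-based f and once with a TB-IDEA-based f would give the two base64 strings to be analyzed.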

VI. EVALUATION OF IDEA CIPHER TEXT
Encrypted form of the phrase by IDEA scheduler in base64 is as follows: 77+9Z3rvv73vv73Yk2Pvv73vv73Llu+/vRDvv73vv70=

I. Entropy Test
The input text includes 9 characters and its entropy is 2.25 (Fig. 3). The maximum entropy for a text over 26 characters is log2(26) ≈ 4.70, so the measured entropy is 47% of the maximum (2.25 / 4.70 ≈ 0.479), which indicates a low to medium difficulty of guessing the information.

J. Autocorrelation Test
The input block has been divided into 10 offsets and compared (Fig. 4). For example, at the first offset, 10 characters of the two texts match, which is the maximum correlation, and the lowest correlation occurs at the fourth offset. The lower the correlation, the greater the independence of the '0' and '1' bits in the input text, and hence the greater the security.

K. N-Gram Test
Table II shows the unigram, bigram, trigram and four-gram results for the IDEA ciphertext, giving the number of repetitions of each letter or group of letters as a count and as a percentage. For example, the most frequent unigram is the letter 'v', with a percentage of 56.5217, that is, 13 repetitions.

L. Poker Test
For IDEA we chose a significance level alpha of 0.05 and tuple 3 and, as Fig. 5 shows, the test failed. Changing to tuple 1 resulted in test success (Fig. 6). The results indicate that the values have been selected correctly and that subsequences occur evenly in the whole sequence with a probability of 75%.

VII. EVALUATION OF TB-IDEA CIPHER TEXT

M. Entropy Test
The input text contains 19 characters, more than the IDEA ciphertext (Fig. 7), and its entropy is 4.05, that is, 86% of the maximum of 4.70. This indicates that, unlike IDEA, this algorithm offers a high degree of difficulty in guessing the information.

N. Autocorrelation Test
The input block is divided into 18 offsets and compared (Fig. 8). For example, at the 12th offset, 8 characters of the two texts match, which is the maximum correlation. Offsets 2, 3, 7, 10, 13, 14, 17 and 18 have the minimum correlation.
Comparing the results of the two correlation tests on the encrypted texts, we see that the correlation between the bits of the text encrypted with the IDEA sub-keys is higher than that of the text encrypted with the TB-IDEA sub-keys. In other words, the bits of the IDEA ciphertext have a low level of independence, while the bits of the TB-IDEA ciphertext have a high level of independence.

O. N-Gram Test
Table III shows the unigram, bigram, trigram and four-gram results for the TB-IDEA ciphertext. For example, the most frequent unigram is the letter 'w', with a percentage of 13.1579, that is, 5 repetitions. As can be seen, the TB-IDEA ciphertext has more characters than the IDEA ciphertext. In addition, the percentages and frequencies in the TB-IDEA ciphertext are lower than those in the IDEA ciphertext, which shows an improvement in encryption.

P. Poker Test
For TB-IDEA we chose a significance level alpha of 0.05 and tuple 3, and the test succeeded (Fig. 9). This indicates that the values have been selected correctly and that subsequences of tuple 3 occur evenly in the whole sequence with a probability of 63%. Since the poker test on the IDEA ciphertext failed with tuple 3, we compare the two test results with tuple 1 (Fig. 10). The TB-IDEA ciphertext has a test result of 20%, which shows that subsequences occur evenly with a relatively low probability and is a significant improvement over the 75% obtained for the IDEA ciphertext.

VIII. CONCLUSION
As already mentioned, the world of encryption needs new algorithms and improvements to the existing ones to stay secure. The security of an encryption algorithm depends on both its encryption operations and its key-schedule algorithm. In this paper, we attempted to improve the IDEA key-schedule algorithm using the Twofish and Blowfish schedulers, which are considered secure enough, and proposed the TB-IDEA algorithm. We then encrypted the phrase "New IDEA SubKeys" once using the sub-keys generated by the IDEA scheduler and once using the sub-keys generated by the TB-IDEA scheduler, and evaluated the results using the CT1 software, which provides entropy, autocorrelation, N-Gram and poker tests.
According to Table II, which summarizes the results of these tests on the two encrypted texts, we conclude that we have not only optimized IDEA but also improved its security significantly. TB-IDEA can be used along with other encryption algorithms to enhance the security of the data stored on cloud servers.