Offering a Combined Approach to the Analysis of a News Database Using the RapidMiner Software. Case Study: Persian News Agency

Online news agencies and the data they produce are among the most widely used resources in the management of Web-based systems. An online news agency acts like the heart of a community and is one of the most critical information resources in a society. The topic is doubly important for military organizations. A sound method and a coherent conceptual approach for analyzing and categorizing these resources can therefore give considerable support to decision-makers within an organization, especially a military one. In this study, we first review the concepts of data mining methods. Second, we analyze different text processing methods from several perspectives in order to identify the strengths of each. Finally, we analyze a news database in order to uncover the information hidden in the data and to show how it can be used.


INTRODUCTION
News today is often stored in databases purely for archiving, even though these archives contain useful knowledge and hidden relationships. Discovering this knowledge can be of practical value to decision-makers and analysts in many fields, and to news analysts in particular. Given the huge volume of these data, they must be analyzed systematically so that the hidden relationships and knowledge can be extracted and made accessible to managers and decision-makers. This makes the process of analyzing and identifying news more efficient, because the analysis no longer depends on manual work by users and can instead be carried out with far more productive data mining methods. With these methods, organizations can analyze incoming news according to their own model and pass the results to their decision-makers.
Therefore, this paper explains how data mining methods and techniques apply to the analysis of published news and how relationships and outcomes can be extracted from a news system. Finally, based on the hidden knowledge available in the database, which is analyzed and interpreted, a framework is presented for analyzing news and for uncovering hidden relationships among news items using predictive data mining methods. The goals of this method can be summarized as follows:
- Determine the keywords and nature of the news based on the data mining model, and provide analytical reports.
- Discover hidden knowledge in the news based on the time series available in it.
- Detect and analyze hidden relationships between keywords and the nature of past news.
- Discover the important words in published news that strongly influence readers, and determine the orientation of the news using data mining techniques.
- Correctly categorize and cluster the information and data.

DEFINITION OF CONCEPTS
2.1. Data mining definition
Text mining can be said to use techniques from information retrieval, information extraction, and natural language processing, and to link them with the algorithms and methods of KDD, data mining, machine learning, and statistics [1]. Because it draws on different research areas, text mining can be defined differently in each of them:
- Text mining = information extraction: under this definition, text mining is equated with information extraction, that is, extracting facts from text.
- Text mining = text data mining: text mining can be seen as applying methods and algorithms from machine learning and statistics to texts with the aim of finding useful patterns. For this purpose, preprocessing the texts is essential. Many approaches use information extraction, natural language processing, or some simple preprocessing to extract data from the texts. Data mining algorithms can then be applied to the extracted data [1][2].
- Text mining = KDD process.

Discovering Knowledge and Its Relationship with Text-mining
Knowledge Discovery in Databases (KDD) refers to the overall process of searching for relationships and regularities in data. The term covers all stages of extracting information from a database, including the primary task of deriving decision rules. In general, KDD is the process of finding useful information and patterns in data, and data mining is the use of algorithms to find that information within the KDD process [3][4].
The quality of the patterns found in the data can be measured by features such as human comprehensibility, validity under statistical criteria, novelty, and usefulness. Knowledge discovery in databases can be regarded as a process made up of several processing steps, which are applied to the dataset in order to extract useful patterns. The steps are performed iteratively, and some require feedback from the user. To select the correct subset of the data, a KDD user needs a good understanding of the data domain, a suitable class of patterns, and a good measure of how interesting a pattern is. A KDD system should therefore offer interactive tools rather than only automated analysis. According to [3][4], the steps are: 1) business understanding, 2) data understanding, 3) data preparation, 4) modeling, 5) evaluation, 6) deployment. Preprocessing is often among the most time-consuming steps, yet it is essential for achieving the desired result. This is especially true in text mining, which requires special preprocessing methods to convert text data into a format suitable for data mining algorithms.
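As an illustration of the preprocessing step described above, the following minimal sketch turns raw texts into a document-term matrix that a data mining algorithm could consume. It is not the preprocessing used in this study: the stop-word list, the function names, and the plain word-count weighting are all assumptions made for the example. Python's `\w` pattern is Unicode-aware, so the same tokenizer also splits Persian text, although real Persian preprocessing would need additional normalization.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a language-specific one.
STOP_WORDS = {"the", "a", "of", "in", "and", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]

def document_term_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] counts vocabulary[j] in documents[i]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows
```

Each row of the matrix is a fixed-length numeric vector, which is the format that clustering and classification algorithms generally expect.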

Related research areas
There are three basic ways to deal with this vast amount of unstructured information: information retrieval, information extraction, and natural language processing.
Information retrieval: Information retrieval is essentially concerned with retrieving documents. Its usual task is to match the user's information need with the most relevant texts and documents. It is not a search for knowledge; it only returns the items considered most relevant to the searcher's information need. On its own, this method does not yield any new knowledge or even any new information [5].
Natural language processing: The overall purpose of NLP is to enable computers to understand natural language better. It uses simple, robust techniques for fast text processing, together with linguistic analysis techniques.
Information extraction: The purpose of information extraction methods is to extract specific information from text documents. Information extraction can serve as a preprocessing step in text mining. It involves mapping natural language texts (e.g., reports, articles, journals, newspapers, emails, web pages, and any text database) onto a predefined representation, or onto templates that are filled in with the key information selected from the original text. Once the information is extracted, it can be stored in a database for future use [6].

[Figure: process diagram. Recoverable stage labels: collecting text documents, building the document matrix, dimension reduction, text clustering, cluster optimization.]

3. RELATED WORKS
There are many methods for the knowledge extraction phase, but all of them can be divided into two main categories: performance-based methods and knowledge-based methods. In the first approach, designers are concerned with system performance and therefore design for the best performance and speed. The most common methods in this approach are statistical methods and neural networks. Statistical methods rely on any kind of statistical information that can be extracted from texts, such as the frequency of individual words and the co-occurrence of words. Knowledge-based approaches look at the problem from another angle. They try first to extract as many existing concepts as possible from the body of texts and second to establish relationships between these concepts. This approach depends heavily on NLP; indeed, it shares the goal that NLP pursues, which is understanding the text. Systems that use these methods are currently few, but DR-LINK from Syracuse University is one of them [7].
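The two statistics named above for statistical methods, the frequency of single words and the co-occurrence of word pairs, can be sketched as follows. This is only an illustrative example: the function name is hypothetical, and counting co-occurrence once per document (rather than within a sliding window) is an assumption.

```python
from collections import Counter
from itertools import combinations

def word_statistics(tokenized_docs):
    """Count how often each word occurs and how often two words share a document."""
    word_freq = Counter()
    pair_freq = Counter()
    for tokens in tokenized_docs:
        word_freq.update(tokens)
        # Each unordered pair is counted once per document, in sorted order.
        pair_freq.update(combinations(sorted(set(tokens)), 2))
    return word_freq, pair_freq
```

The input is a list of already tokenized documents, so the sketch is independent of how tokenization is done.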

Discotex Method
This method was proposed by Kanya in 2007 [8]. It provides a framework for mining information from text by integrating an information extraction (IE) system with a standard KDD module. IE converts text documents into more structured data: it searches for specific pieces of data in natural language documents and converts a set of semi-structured text documents into a more structured database. In this method, RAPIER and BWI are used to build the IE module. The database built by the IE module is then used by the KDD module to discover further knowledge. In an improved version of the method, the rules derived by the KDD module are used to predict missing information and improve the accuracy of the IE module. Apriori and Ripper have been used to build the KDD module.
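The idea of using mined rules to predict information that the IE module missed can be sketched as below. This is a deliberately reduced stand-in and not the method of [8]: instead of Apriori or Ripper it mines only one-to-one rules between field values, and the record layout, function names, and thresholds are assumptions made for the example.

```python
from collections import Counter
from itertools import permutations

def mine_rules(records, min_support=2, min_confidence=0.8):
    """Find rules (field_a, value_a) -> (field_b, value_b) among filled-in fields."""
    item_count = Counter()
    pair_count = Counter()
    for record in records:
        items = [(f, v) for f, v in record.items() if v is not None]
        item_count.update(items)
        pair_count.update(permutations(items, 2))
    rules = {}
    for (antecedent, consequent), support in pair_count.items():
        confidence = support / item_count[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules[antecedent, consequent] = confidence
    return rules

def fill_missing(record, rules):
    """Return a copy of record with empty fields filled by the most confident matching rule."""
    filled = dict(record)
    best = {}
    for (antecedent, (field, value)), confidence in rules.items():
        if record.get(antecedent[0]) == antecedent[1] and record.get(field) is None:
            if confidence > best.get(field, (0.0, None))[0]:
                best[field] = (confidence, value)
    for field, (_, value) in best.items():
        filled[field] = value
    return filled
```

Here a field that the extractor could not fill is represented by `None`, and a rule is only applied when its antecedent matches a value that is actually present in the record.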

Textminer method
This method consists of two components: text analysis and data mining. The first component converts semi-structured data, such as documents, into more structured data stored in a database. The second component applies data mining techniques to the output of the first. Most text mining methods apply mining algorithms to labels attached to each document. These labels may be keywords extracted from the document or simply a list of the words in it. In textminer, the mining algorithms are instead applied to terms (meaningful sequences of words, such as "department of computation") combined with events (meaningful sets of terms, for example, in a financial domain, a purchase between company A and company B) extracted from the documents. The authors believe that the most important features describing a document are the terms and events expressed in it. This information is stored in a table called EvantType.
Information extraction is an important technology in this approach: once the information is extracted, it can be stored in a database, queried, and summarized in natural language. The first necessary step is linguistic preprocessing, which involves a number of linguistic techniques such as tokenization and part-of-speech tagging. The general objectives of the method are: 1) managing the information stored in the text database (document collections), that is, assigning documents to appropriate categories, and 2) extracting useful knowledge by mining the data.
In this method, therefore, terms and events are first extracted and stored in the database. Then a suitable clustering algorithm (using the ROCK algorithm and the lnc concept) is applied to the generated database and the documents are grouped, so that similar documents fall into the same group. Next, an appropriate classification algorithm (a decision tree) is used to validate the clustering results and to make better use of the discovered knowledge. Further details of this approach can be found in [9].
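The ROCK algorithm mentioned above measures how close two items are by counting their links, that is, their common neighbours. The sketch below computes these link counts for documents represented as sets of terms. It assumes that the "lnc concept" in the text refers to ROCK links, which the source does not confirm; the threshold `theta`, the function names, and the convention that a document counts as its own neighbour are also assumptions, and the merging phase of ROCK is omitted.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets of terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rock_links(documents, theta=0.5):
    """Count common neighbours (links) for every pair of documents.

    Two documents are neighbours when their Jaccard similarity is at least
    theta. In this sketch a non-empty document is its own neighbour.
    """
    n = len(documents)
    neighbours = [
        {j for j in range(n) if jaccard(documents[i], documents[j]) >= theta}
        for i in range(n)
    ]
    return {
        (i, j): len(neighbours[i] & neighbours[j])
        for i, j in combinations(range(n), 2)
    }
```

Pairs with many links are the candidates a ROCK-style clustering would merge first, which is what lets it group documents that share neighbours even when their direct similarity is moderate.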

RESEARCH METHODOLOGY AND PROPOSED FRAMEWORK
Research today uses a variety of methods. These are either quantitative, relying on statistics and numbers to reach a result, or qualitative. If research methods are viewed as a spectrum, quantitative methods lie at one end and qualitative methods at the other. Quantitative methods are carried out using statistics and figures, while qualitative methods collect information through different types of observation and interviewing techniques. What the two have in common is that they equip researchers, at every stage of the research, with a variety of techniques for gathering and analyzing information. Quantitative methods deal with counting and measuring aspects of social life, whereas qualitative methods deal with producing reasoned descriptions and discovering the meanings and changes of social actors [10].
However, not all research methods are purely quantitative or purely qualitative; depending on the type of research, one can adopt an approach that uses both. Such methods are called hybrid [11][12][13], and they have emerged as a way to strengthen research. This study uses the hybrid method because it works with news text databases, which are gathered both quantitatively and qualitatively from a variety of news stories. We also use the CRISP standard methodology to structure the research framework with data mining techniques. Figure 2 shows the proposed research framework [14]. To support the analysis, the literature on topics such as data mining, news analysis, the k-means algorithm, and text mining and its various methods is reviewed in detail. This review was conducted to cover the topics more completely and to gather ideas. The next phase of the research examines the existing system in which the analysis is to be implemented. For this purpose, various text and news analyses were considered as a case study, because this area offers high-volume data and is closely connected with analytical topics.
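For readers unfamiliar with the k-means algorithm covered in the literature review, the following minimal sketch shows its two alternating steps, assignment and centroid update, on numeric vectors such as rows of a document-term matrix. It is not the implementation used in this study, which relies on the software's own operator. Taking the first k points as initial centroids is an assumption made to keep the example deterministic, and the function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iterations=100):
    """Cluster numeric vectors with Lloyd's algorithm.

    The first k points serve as initial centroids (a simplification that
    keeps the sketch deterministic; random seeding is more common).
    """
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean(members)
    return labels, centroids
```

The result depends on the initial centroids, so in practice the algorithm is usually run several times with different seeds and the best clustering is kept.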
To understand the domain of the case study, various news web pages and literature reviews of the field were used. The next step is to identify the data and prepare it for the implementation phase. At this stage, we examine which data are needed and which data, in which format, are appropriate for implementation. The resulting table is built by joining the news table and the text table and contains the news ID, text ID, type of text, text, group ID, source ID, status, and type identifier. The number of columns was reduced from 20 to 8, mainly because the dropped columns carried insufficient information or were redundant.
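The data preparation step described above, joining the news and text tables and keeping only eight columns, can be sketched as follows. The column names are hypothetical renderings of the fields listed in the text, and joining on the news ID is an assumption, since the source does not state which key links the two tables.

```python
# Hypothetical column names for the eight retained fields.
KEPT_COLUMNS = ["news_id", "text_id", "text_type", "text",
                "group_id", "source_id", "status", "type_id"]

def join_and_project(news_rows, text_rows, kept=KEPT_COLUMNS):
    """Inner-join news and text rows on news_id, keeping only the chosen columns."""
    news_by_id = {row["news_id"]: row for row in news_rows}
    joined = []
    for text_row in text_rows:
        news_row = news_by_id.get(text_row["news_id"])
        if news_row is None:
            continue  # text without a matching news record is dropped
        merged = {**news_row, **text_row}
        joined.append({col: merged.get(col) for col in kept})
    return joined
```

Rows are plain dictionaries here, so the same logic applies regardless of how the tables are exported from the database.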
6. CONCLUSION
In this paper, a hybrid method for analyzing the news data of a Persian news agency has been presented. The proposed method follows the CRISP standard and, in its implementation, uses qualitative and quantitative data mining algorithms and the k-means algorithm.
In the course of this research, the concepts needed in the project were first described in detail. The next step was to examine the existing system in which the analysis is implemented, followed by identifying the data and preparing it for the implementation phase. Finally, at the implementation stage, we tried to explain a simple implementation of the research in the RapidMiner software, with simple and understandable code, on the database and with the proposed method. Besides providing an adequate analysis of the news and finding hidden relationships in it, this hybrid analysis method can bring several further advantages, such as establishing a native method for analyzing news that is compatible with the Persian language, enabling more semantic analyses of common vocabulary in Persian news, and ultimately helping decision-makers in civilian and military organizations make the right decisions.