Storage and Retrieval of Software Components Using Hadoop and MapReduce

The meta-data acquired during the development of a software component is valuable and should be stored and reused in later software projects. An effective and efficient storage and retrieval system for software components is essential to the component reuse process. This paper presents a meta-data model and a Hadoop environment for software component reuse that considers semantic information based on ontologies and taxonomies. Because ontology and taxonomy characteristics are incorporated into the Hadoop environment, the proposed model can recommend interrelated components. The proposed method helps the software developer store and retrieve the needed software artifacts easily. Storage of software components is managed by the Hadoop Distributed File System (HDFS), and retrieval of the relevant software components is managed by MapReduce. The ontology and meta-data help to extract knowledge from the legacy system and to understand the semantic meaning of both the user requirements and the reusable software components. The results show that the new software component reuse method can improve the precision of software component retrieval while accounting for the full range of the search results.


I. INTRODUCTION
Software component reuse is the use of existing software components, meta-data, and knowledge or artifacts to build new software. It has been one of the major goals of software engineering research in recent years, and component reusability has become an important factor in the software development process. Reuse reduces software development effort, time, and cost, and increases reliability, efficiency, and flexibility. For software component reuse to succeed, it is fundamental to choose appropriate reusable components from a collection of available components. The availability of reusable software assets during the development phases provides valuable support for design and implementation with software architectures by improving productivity, quality, reliability, and speed of delivery, and by decreasing the long-term costs of software development and maintenance. Thus, it is desirable to have a repository based on the Hadoop Distributed File System (HDFS) that supports the storage, query, and retrieval of components and makes reuse possible. A technique for reusable software component repositories is needed that supports the retrieval of semantically interrelated software components. This paper presents an ontology and HDFS based meta-data repository and component repository for the storage and retrieval of software components. The metadata repository integrates expert knowledge of the related domains and of reusable software components, and generalizes the crucial concepts, and the relations among them, in these domains and components [3] [10]. Query terms formed from this metadata knowledge can improve software component retrieval precision.
The function of a software component reuse (SCR) system is to construct a model of software component retrieval in which the functions, application domains, work environments, and working, static, and dynamic behaviors of a software component can be accurately expressed, so that the component can be stored, searched, and reused [3] [19].

II. SOFTWARE COMPONENT REUSE (SCR) SYSTEM
The proposed SCR system uses a multi-agent and meta-data approach to implement several reasoning and searching mechanisms, such as knowledge access, process workflow automation, automated code generation, and generation of software component knowledge. It uses three different ontologies: the Meta-data Ontology, the Component Ontology, and the Describing Ontology. The SCR system has a three-tier architecture consisting of a data layer (Model), an application layer (Controller), and a view layer (View); the Model-View-Controller pattern is well established for software component code reuse. The system works as a platform for storing software development and component reuse knowledge in the ontology-based meta-data repository in the form of NameNodes. Reusable software components are stored in HDFS and are accessible through an interface framework. The different reusable elements should be described using a software artifact ontology in the describing and meta-data repository, and classified through the Meta-data and Describing Ontologies. The meta-data and describing knowledge representation structures are used to empower the search mechanisms and to make the reuse of stored components easier.
Figure 1 presents the architecture of the software component reuse system. The system is structured in three logical layers: Data (Model), View, and Application (Controller). The Data layer corresponds to the knowledge base and meta-data; it stores all the reusable software components and the knowledge needed for the reuse system's reasoning, including the knowledge representation structures and the components. The Controller layer implements the searching mechanisms, including the search, suggestion, and browsing facilities. The View layer provides the interface for software component retrieval and comprises the semantic web service and the applications that use the system's functionalities through this semantic web service.
The proposed system combines the multi-agent and meta-data approach with a Hadoop architecture to implement several reasoning mechanisms for storing and retrieving reusable software components, such as knowledge access, process workflow automation, automated code generation, and effective reuse of software components. The Hadoop model does not include the applications and end-user presentation of reusable software components. The representation of components is broken down into the Hadoop use cases above: compute, storage, retrieval, and database workloads. Each workload has specific characteristics for operations, deployment, architecture, and management. The proposed Hadoop architecture for reusable component retrieval has a variety of node types within each Hadoop cluster; these include DataNodes, NameNodes, and EdgeNodes. Hadoop's architecture for software component storage and retrieval is modular, allowing individual components to be scaled up and down as the needs of the environment change. The NameNode is the central location for information about the file system; clients of HDFS contact it to locate information within the file system and to provide updates for data they have added, moved, manipulated, or deleted. The DataNode serves two functions: it contains a portion of the reusable component data in HDFS, and it acts as a compute platform for storing and retrieving components, some of which will use the local data within HDFS. The EdgeNode provides the interface and access point for the external applications, tools, and users that need to use the Hadoop environment for storing and retrieving components. The Controller works as a search engine and acts as middleware between the Model and the View; it is where the component search techniques are implemented.
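The division of labour between the NameNode and the DataNodes described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: it does not call any Hadoop API, and the class names, the 16-byte block size, the replication factor of 2, and the round-robin placement are all hypothetical choices made for illustration (real HDFS uses much larger blocks and rack-aware placement).

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy stand-in for an HDFS DataNode: holds the bytes of component blocks."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class NameNode:
    """Toy stand-in for an HDFS NameNode: keeps metadata only, never file content."""
    data_nodes: list
    block_size: int = 16   # assumed tiny block size so the example splits visibly
    replication: int = 2   # assumed number of copies kept of every block
    index: dict = field(default_factory=dict)  # path -> [(block_id, [node names])]

    def put(self, path: str, payload: bytes) -> None:
        """Split a component into blocks and record where each replica lives."""
        entries = []
        chunks = [payload[i:i + self.block_size]
                  for i in range(0, len(payload), self.block_size)]
        for n, chunk in enumerate(chunks):
            block_id = f"{path}#{n}"
            # Round-robin placement (an assumption; not the real HDFS policy).
            targets = [self.data_nodes[(n + r) % len(self.data_nodes)]
                       for r in range(self.replication)]
            for node in targets:
                node.blocks[block_id] = chunk
            entries.append((block_id, [node.name for node in targets]))
        self.index[path] = entries

    def get(self, path: str) -> bytes:
        """Reassemble a component by reading each block from a live replica."""
        by_name = {node.name: node for node in self.data_nodes}
        parts = []
        for block_id, locations in self.index[path]:
            holder = next(by_name[loc] for loc in locations
                          if block_id in by_name[loc].blocks)
            parts.append(holder.blocks[block_id])
        return b"".join(parts)


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
name_node = NameNode(nodes)
source = b"class Invoice { void computeTotal() {} }"
name_node.put("/components/Invoice.java", source)
print(name_node.get("/components/Invoice.java") == source)
```

The point of the sketch is that the NameNode answers "where is this component?" while the DataNodes answer "what are its bytes?", which is why losing one replica of a block does not prevent retrieval.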

III. SOFTWARE COMPONENT SEARCHING & RETRIEVAL PROCESS
Component retrieval is implemented on the architecture of the SCR system shown in Figure 1. Searching in the SCR system is based on ontology-based metadata. Ontology characteristics are suitable for software component retrieval, as they allow domain semantics to be captured and interrelated software components to be recommended. Meta-data elements permit components to be retrieved and recommended based on the analysis of semantic information. Ontology with metadata based search uses the domain ontology to identify the domain knowledge and to enrich the query so that more relevant components are obtained from the repository. The ontology repository contains the concept metadata and can be used to extract the code concepts in the domain and sub-domain of a particular context. After receiving the patterns from the ontology repository, we apply concept and semantic similarity based search mechanisms to extract the component that matches the user requirement. All the software components retrieved from the repository are listed and the most relevant one is chosen. The relevant component can be used fully, partially, or after adaptation to the current need. Adapting means enhancing or expanding the component according to the current need of the project. The first layer of the searching system is the user interface layer, where the user interacts with the system by uploading the Software Requirements Specification (SRS) document in order to search the repository for relevant code and design. Text mining with meta-data is the concept of mining a text document and identifying the most needed information according to context and similarity. In this model, text mining with meta-data is applied to the SRS document, whose whole text is mined. The noun part specifies the needed class or entity in the project domain, and the verb part indicates the needed service or method in the system.
The list of classes and methods is identified from the SRS document using text mining. This list is given as the search query to retrieve the relevant components from the cloud-based repository. The classes and methods in the SRS help to identify the needed functionality of the current project.
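The noun and verb extraction step described above can be sketched as follows. This is a minimal illustration only: the paper does not name a part-of-speech tagger, so a tiny hand-labelled lexicon (`POS_LEXICON`, hypothetical) stands in for one, and the function name `mine_srs` and the sample sentence are likewise assumptions.

```python
import re

# Hypothetical hand-labelled lexicon standing in for a real part-of-speech tagger.
POS_LEXICON = {
    "customer": "NOUN", "invoice": "NOUN", "payment": "NOUN",
    "create": "VERB", "calculate": "VERB", "send": "VERB",
}


def mine_srs(text):
    """Return (class candidates, method candidates) found in an SRS text.

    Nouns become candidate classes or entities; verbs become candidate
    services or methods. First-occurrence order is kept, duplicates dropped.
    """
    classes, methods = [], []
    for token in re.findall(r"[a-z]+", text.lower()):
        tag = POS_LEXICON.get(token)
        if tag == "NOUN" and token not in classes:
            classes.append(token)
        elif tag == "VERB" and token not in methods:
            methods.append(token)
    return classes, methods


srs = "The system shall create an invoice for each customer and calculate the payment."
print(mine_srs(srs))
```

In a full system the lexicon lookup would be replaced by a proper tagger; the sketch only shows how tagged words are separated into the two lists that later form the search query.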
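Once the query terms are available, the retrieval step can be pictured as a map phase that emits a match for every query term found in a component's metadata and a reduce phase that sums the matches per component. The sketch below imitates that flow in plain Python; it is not a Hadoop job and does not reflect the exact job used in this work. The component records, the ontology fragment, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical metadata records: component id -> descriptive terms.
COMPONENTS = {
    "InvoiceService": {"invoice", "total", "calculate", "tax"},
    "CustomerRepository": {"customer", "store", "find"},
    "TaxCalculator": {"tax", "calculate", "rate"},
}

# Hypothetical ontology fragment: term -> semantically related terms.
ONTOLOGY = {
    "bill": {"invoice"},
    "compute": {"calculate"},
}


def enrich(query_terms):
    """Enrich the query with ontology-related terms."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= ONTOLOGY.get(term, set())
    return expanded


def map_phase(component_id, terms, query):
    """Emit (component_id, 1) for every metadata term that is in the query."""
    for term in terms:
        if term in query:
            yield component_id, 1


def reduce_phase(component_id, counts):
    """Sum the matches emitted for one component."""
    return component_id, sum(counts)


def retrieve(query_terms):
    """Rank components by the number of enriched query terms they match."""
    query = enrich(query_terms)
    grouped = defaultdict(list)  # shuffle step: group mapper output by key
    for component_id, terms in COMPONENTS.items():
        for key, value in map_phase(component_id, terms, query):
            grouped[key].append(value)
    scored = [reduce_phase(key, values) for key, values in grouped.items()]
    # Highest score first; ties are broken by name for a stable order.
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


print(retrieve({"bill", "compute", "tax"}))
```

Without the enrichment step the query terms "bill" and "compute" would match nothing; with it, they are mapped onto "invoice" and "calculate", which is the role the domain ontology plays in the search.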

IV. RESULTS ANALYSIS
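Retrieval quality in this section is measured by precision and recall. As a minimal sketch, both can be computed from the set of retrieved components and the set of relevant components; the function name and the sample component identifiers are hypothetical and are not experimental data from this work.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant in the index
    An empty denominator yields 0.0 instead of raising an error.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative values only: 4 components retrieved, 3 relevant in the index.
print(precision_recall(["A", "B", "C", "D"], ["A", "C", "E"]))
```

In the illustrative call, two of the four retrieved components are relevant, so precision is 2/4, and two of the three relevant components were found, so recall is 2/3.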
Precision: Precision is defined as the number of relevant components retrieved divided by the total number of components retrieved.
Recall: Recall is defined as the number of relevant components retrieved divided by the total number of relevant components in the index.

V. CONCLUSION
The benefits of a Hadoop environment for reusable software components are best realized when it is teamed with a good archival solution that provides cost-effective, scalable, and secure storage and searching of reusable software components, allowing for speedy data retrieval and analysis. By using Hadoop as an archival platform, organizations can benefit from multiple source connectivity, scalability, cost benefits, and high accessibility of archived data for reuse and analytics.
In this research work we described the SCR system, which explores new ways to store and reuse software components and knowledge. The system uses cloud-based storage and semantic-based retrieval. The SCR platform provides several ways to reuse software components and share knowledge, including a hands-on way of suggesting relevant components and knowledge to the software developer based on the user context. Semantic web technologies enable the association of semantics with software artifacts, which inference engines can use to provide new functionalities such as semantic retrieval or suggestion of relevant knowledge. SCR provides a way to integrate with client applications through the semantic web service. Thus, tools for software engineering and MVC (Model-View-Controller architecture) that use metadata can be integrated into SCR as knowledge providers. Combining software reuse, cloud, and metadata is an emerging trend in the software development process. Combining these technologies helps the software development process by locating preexisting components at design and implementation time, which decreases the total effort of software development.