e-ISSN : 0975-3397
Print ISSN : 2229-5631
Home | About Us | Contact Us

ARTICLES IN PRESS

Articles in Press

ISSUES

Current Issue
Archives

CALL FOR PAPERS

CFP 2021

TOPICS

IJCSE Topics

EDITORIAL BOARD

Editors

Indexed in

oa
 

ABSTRACT

Title : Efficient Processing of XML Documents in Hadoop Map Reduce
Authors : Dmitry Vasilenko, Mahesh Kurapati
Keywords : XML, Apache Hadoop, Apache Hive, Map-Reduce, VTD-XML, XPath.
Issue Date : September 2014.
Abstract :
XML has dominated the enterprise landscape for fifteen years and still remains the most commonly used data format. Despite its popularity the usage of XML for "Big Data" is challenging due to its semi-structured nature as well as rather demanding memory requirements and lack of support for some complex data structures such as maps. While a number of tools and technologies for processing XML are readily available the common approach for map-reduce environments is to create a "custom solution" that is based, for example, on Apache Hive User Defined Functions (UDF). As XML processing is the common use case, this paper describes a generic approach to handling XML based on Apache Hive architecture. The described functionality complements the existing family of Hive serializers/deserializers for other popular data formats, such as JSON, and makes it much easier for users to deal with the large amount of data in XML format.
Page(s) : 329-333
ISSN : 0975–3397
Source : Vol. 6, Issue.09

All Rights Reserved © 2009-2024 Engg Journals Publications
Page copy protected against web site content infringement by CopyscapeCreative Commons License