A Grid-based architecture for nearest neighbor based condensation of huge datasets

Fabrizio Angiulli, Gianluigi Folino

January 2008

Abstract

Grid computing provides users with services to discover, transfer, and manipulate large datasets distributed across different locations. Classifying large datasets without resorting to a centralized approach is a key problem in this kind of architecture; it is essential, for instance, for scientists dealing with ever-growing bioinformatics datasets. To this end, Grid-FCNN, a grid-enabled architecture for classifying huge datasets using the nearest neighbor rule, is introduced. To cope with the communication overhead typical of distributed environments and to reduce memory requirements, two different strategies are presented, namely Grid-FCNN1 and Grid-FCNN2, and their performance in grid environments is analyzed. An analysis of the experimental results, performed on both synthetic and real very large datasets, revealed that these techniques can be used in Grid environments. Finally, the paper illustrates how the Grid-based algorithm can be applied to a real bioinformatics scenario. Copyright 2008 ACM.
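The abstract does not spell out the condensation step itself. As a rough illustration, here is a minimal, centralized Python sketch in the style of the FCNN (Fast Condensed Nearest Neighbor) rule that, judging by its name, Grid-FCNN distributes. The subset is first seeded with one point per class, namely the point closest to that class's centroid. Then, for every prototype, the closest training point in its Voronoi cell that the current subset misclassifies is added. This repeats until the subset classifies the whole training set correctly under the 1-NN rule. Assumptions: Euclidean distance, data held in memory, and the hypothetical function names `fcnn_condense` and `nn_classify`. It does not reproduce the distributed Grid-FCNN1 or Grid-FCNN2 strategies or their communication scheme.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical, centralized sketch of FCNN-style condensation (Euclidean distance).
# It is NOT the distributed Grid-FCNN1/Grid-FCNN2 protocol described in the paper.
Point = Sequence[float]


def nn_classify(query: Point, points: List[Point], labels: List[int], subset: List[int]) -> int:
    """1-NN rule restricted to the prototypes indexed by `subset` (first minimum wins ties)."""
    best = min(subset, key=lambda s: math.dist(query, points[s]))
    return labels[best]


def fcnn_condense(points: List[Point], labels: List[int]) -> List[int]:
    """Return indices of a training-set-consistent subset of `points`."""
    n, dim = len(points), len(points[0])

    # Seeds: for each class, the training point closest to the class centroid.
    subset: List[int] = []
    for c in sorted(set(labels)):
        members = [i for i in range(n) if labels[i] == c]
        centroid = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
        subset.append(min(members, key=lambda i: math.dist(points[i], centroid)))

    nearest = [-1] * n       # index of each point's current nearest prototype
    delta = list(subset)     # prototypes added in the last round
    while delta:
        # Incremental update: only the newly added prototypes need to be checked.
        for i in range(n):
            for p in delta:
                if nearest[i] < 0 or math.dist(points[i], points[p]) < math.dist(points[i], points[nearest[i]]):
                    nearest[i] = p

        # For each prototype p, among the points in its Voronoi cell whose label
        # differs from p's, keep the one closest to p as its representative.
        in_subset = set(subset)
        rep: Dict[int, int] = {}
        for i in range(n):
            if i in in_subset:
                continue
            p = nearest[i]
            if labels[i] != labels[p]:
                if p not in rep or math.dist(points[i], points[p]) < math.dist(points[rep[p]], points[p]):
                    rep[p] = i

        # No misclassified points left: the subset is consistent and we stop.
        delta = sorted(set(rep.values()))
        subset.extend(delta)
    return subset
```

For instance, take the points (0,), (1,), (2,), (10,), (11,), (12,) labelled 0, 0, 0, 1, 1, 1. The two seeds (1,) and (11,) already classify every training point correctly, so `fcnn_condense` returns `[1, 4]`. Later queries are then answered by `nn_classify` against just those two prototypes. Restricting the distance update to the newly added prototypes keeps each round cheaper than recomputing all nearest neighbors from scratch.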

Type

Conference paper

Publication

High Performance Distributed Computing - Proceedings of the 3rd International Workshop on Use of P2P, Grid and Agents for the Development of Content Networks 2008, UPGRADE'08
