Grid computing provides users with services to discover, transfer, and manipulate large datasets distributed across different locations. Classifying large datasets without a centralized approach is a key problem in this kind of architecture; for instance, it is essential for scientists facing ever-growing bioinformatics datasets. To this aim, Grid-FCNN, a grid-enabled architecture for classifying huge datasets using the nearest neighbor rule, is introduced. To cope with the communication overhead typical of distributed environments and to reduce memory requirements, two different strategies, namely Grid-FCNN1 and Grid-FCNN2, are presented, and their performance in grid environments is analyzed. Experimental results on both synthetic and real very large datasets show that these techniques are suitable for use in a Grid. Furthermore, the paper illustrates how the Grid-based algorithm can be applied to a real bioinformatics scenario. Copyright 2008 ACM.
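As background, the nearest neighbor rule assigns a query point the label of its closest labeled training point. The following is a generic Python sketch, not the Grid-FCNN algorithm or either of its two strategies. It assumes Euclidean distance and a dataset already split into hypothetical per-node partitions (`node_a`, `node_b`), and it shows why the rule suits decentralized data: the global nearest neighbor is simply the best of the per-partition nearest neighbors. Because of this, each node only needs to return one small (distance, label) pair rather than ship its data to a central site.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Labeled = Tuple[Point, str]  # (coordinates, class label)


def local_nearest(query: Point, partition: Sequence[Labeled]) -> Tuple[float, str]:
    """Return (distance, label) of the nearest labeled point in one partition."""
    return min((math.dist(query, p), label) for p, label in partition)


def nn_classify(query: Point, partitions: Sequence[Sequence[Labeled]]) -> str:
    """1-NN rule over data split across several (hypothetical) nodes.

    Each partition contributes only its local best (distance, label) pair;
    taking the minimum of those pairs gives the same answer as searching the
    union of all partitions at a single site. Ties on distance are broken by
    label order, which keeps the result deterministic.
    """
    candidates: List[Tuple[float, str]] = [
        local_nearest(query, part) for part in partitions if part
    ]
    return min(candidates)[1]


# Hypothetical toy data held by two different nodes.
node_a = [((0.0, 0.0), "neg"), ((1.0, 0.0), "neg")]
node_b = [((5.0, 5.0), "pos"), ((6.0, 5.0), "pos")]

print(nn_classify((4.5, 4.0), [node_a, node_b]))  # pos
print(nn_classify((0.2, 0.1), [node_a, node_b]))  # neg
```

The design choice is deliberately minimal: a real grid deployment also has to reduce how much training data each query touches and how often nodes communicate, which is the concern the two Grid-FCNN strategies address.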