Partitioned-based Fuzzy Clustering to Learn Documents' Triadic Similarity

Sonia Alouane Ksouri, Minyar Sassi Hidri, Kamel Barkaoui

Abstract


With the growth of the Web and the wide availability of storage space, more and more documents are becoming accessible. As a result, similarity learning suffers from a scalability problem, in both memory use and computation time, when the data set is large. This paper proposes a fuzzy triadic similarity measure for computing memberships in the context of document co-clustering. The measure simultaneously computes fuzzy co-similarity matrices between documents/sentences and sentences/words, each one built on the basis of the others. The proposed model is then extended to handle large data sets through a splitting architecture that uses the new fuzzy triadic similarity to distribute both memory use and computation across several computers. This architecture relies on fuzzy clustering to partition a data set into groups (or clusters) of similar items, in order to create more coherent sub-sets.
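The partitioning step can be illustrated with a minimal sketch. The abstract does not say which fuzzy clustering algorithm is used, so the code below assumes standard fuzzy c-means as a stand-in, applied to plain numeric vectors in place of real document representations. The function names `fuzzy_c_means` and `split_by_membership`, the fuzzifier `m`, and the membership `threshold` are all hypothetical choices made for illustration and are not taken from the paper.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means (an assumed stand-in, not the paper's method).

    Returns (u, centers), where u[i][k] is the membership of point i
    in cluster k and every row of u sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    # Random initial memberships, normalized per point.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by u[i][k] ** m.
        centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update: proportional to distance ** (-2 / (m - 1)).
        for i, p in enumerate(points):
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers
            ]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            u[i] = [v / total for v in inv]

    return u, centers


def split_by_membership(u, threshold=0.5):
    """Group point indices into sub-sets, one per cluster.

    A point joins every cluster in which its membership reaches the
    threshold, so sub-sets may overlap when the threshold is low.
    """
    n_clusters = len(u[0])
    return [
        [i for i, row in enumerate(u) if row[k] >= threshold]
        for k in range(n_clusters)
    ]


if __name__ == "__main__":
    # Toy vectors standing in for document representations.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    memberships, _ = fuzzy_c_means(data, n_clusters=2)
    for subset in split_by_membership(memberships):
        print(subset)
```

In a splitting architecture of the kind the abstract describes, each resulting sub-set could then be handed to a separate worker, so that the similarity matrices are computed on smaller and more homogeneous groups. How the paper actually assigns sub-sets to machines is not stated in the abstract.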

International Journal of Recent Contributions from Engineering, Science & IT (iJES) – eISSN: 2197-8581