Applying Optimized Algorithms and Technology for Interconnecting Big Data Resources in Government Institutions
DOI:
https://doi.org/10.3991/ijoe.v19i08.39661
Keywords:
data quality assessment, Levenshtein distance (LV) algorithm, data quality improvement
Abstract
The quality of data in core electronic registers has steadily declined. The growing number of databases created to provide electronic services for public administration, combined with the lack of data harmonization and interoperability among them, has introduced numerous errors and inconsistencies.
Evaluating and improving data quality by matching and linking records across multiple sources is exceedingly difficult, because these sources hold very large volumes of data, use different data architectures, and share no unique field through which they can be interconnected.
Various algorithms have been developed to address these issues. Our focus is on algorithms that can handle large amounts of data, such as the Levenshtein distance (LV) algorithm and the Damerau-Levenshtein distance (DL) algorithm.
In this paper we conduct experiments on large data sets of more than 1 million records in order to evaluate how effectively these algorithms assess data quality and to improve the algorithms themselves.
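For readers unfamiliar with the two distance measures named above, the following is a minimal textbook sketch in Python, not the optimized implementation evaluated in the paper. The function names `levenshtein` and `osa_distance` are assumptions made for illustration, and the second function implements the restricted Damerau-Levenshtein variant (optimal string alignment), which may differ from the variant the authors use.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Adds transposition of two adjacent characters as a single edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ja" versus "aj"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

The practical difference for record matching is that a swapped pair of adjacent characters, a common typing error in names, counts as two edits under LV but one under DL. For the hypothetical pair "Hamzaj" and "Hamzja", `levenshtein` returns 2 while `osa_distance` returns 1.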
License
Copyright (c) 2023 Genc Hamzaj, Artan Mazrekaj, Isak Shabani
This work is licensed under a Creative Commons Attribution 4.0 International License.