Phylogenetic Tree Construction Using K-Mer Forest-Based Distance Calculation

Authors

  • Gihan Gamage, University of Moratuwa, Sri Lanka
  • Nadeeshan Gimhana, University of Moratuwa, Sri Lanka
  • Indika Perera, University of Moratuwa, Sri Lanka
  • Shanaka Bandara, University of Moratuwa, Sri Lanka
  • Thilina Pathirana, University of Moratuwa, Sri Lanka
  • Anuradha Wickramarachchi, Australian National University, Australia
  • Vijini Mallawaarachchi, Australian National University, Australia

DOI:

https://doi.org/10.3991/ijoe.v16i07.13807

Keywords:

Phylogenetics, Genetic Relatedness, Genetic Distance, k-mer forest, k-medoid clustering

Abstract


Phylogenetics is one of the dominant research disciplines in data engineering that is based on biological information. Here, we consider raw DNA sequences and perform comparative analysis in order to draw meaningful conclusions. Phylogenetic trees help significantly in representing the evolutionary relationships among different organisms in a concise manner. The elementary step in constructing a phylogenetic tree is calculating the genetic distance among species. Alignment-based and alignment-free methods are the two main approaches used to compute these distances and thereby determine the genetic relatedness of different species. In this paper, we propose a novel alignment-free, pairwise distance calculation method based on k-mers, together with a state-of-the-art, machine learning-based phylogenetic tree construction mechanism. With the proposed approach, long DNA sequences can be converted into compact k-mer forests, which improve the efficiency of comparison. We then construct the phylogenetic tree from the calculated distances using an algorithm built upon k-medoid clustering, which showed significant improvements in efficiency and accuracy compared with traditional phylogenetic tree construction methods.
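As a rough illustration of the two stages described above (alignment-free pairwise distances from k-mers, then tree construction with k-medoid clustering), the Python sketch below is offered under stated assumptions. It is not the authors' implementation: the k-mer forest is replaced here by a plain set of k-mers compared with the Jaccard distance, and the tree step is a hypothetical divisive scheme that recursively splits each cluster with an exhaustive 2-medoid search. All function names (`kmer_set`, `jaccard_distance`, `distance_matrix`, `two_medoid_split`, `build_newick`, `kmer_newick`) and the toy sequences are illustrative assumptions.

```python
from itertools import combinations


# Hypothetical stand-in: a plain k-mer set replaces the paper's k-mer forest.
def kmer_set(seq: str, k: int) -> set[str]:
    """Return the distinct k-mers (length-k substrings) of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Alignment-free distance: 1 - |A & B| / |A | B| (0.0 means identical sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(seqs: dict[str, str], k: int) -> dict[tuple[str, str], float]:
    """Symmetric pairwise distances keyed by (name, name), including self-pairs."""
    profiles = {name: kmer_set(s, k) for name, s in seqs.items()}
    dist = {(n, n): 0.0 for n in seqs}
    for x, y in combinations(seqs, 2):
        d = jaccard_distance(profiles[x], profiles[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


def two_medoid_split(names: list[str], dist) -> tuple[list[str], list[str]]:
    """Pick the medoid pair with the lowest total assignment cost (exhaustive search),
    then assign every other name to its nearer medoid (ties go to the first medoid)."""
    best_cost, m1, m2 = None, None, None
    for a, b in combinations(names, 2):
        cost = sum(min(dist[(n, a)], dist[(n, b)]) for n in names)
        if best_cost is None or cost < best_cost:
            best_cost, m1, m2 = cost, a, b
    # Each medoid is forced into its own cluster, so both halves are non-empty.
    left = [n for n in names
            if n == m1 or (n != m2 and dist[(n, m1)] <= dist[(n, m2)])]
    right = [n for n in names if n not in left]
    return left, right


def build_newick(names: list[str], dist) -> str:
    """Divisive clustering: split recursively until each cluster is a single leaf."""
    if len(names) == 1:
        return names[0]
    left, right = two_medoid_split(names, dist)
    return f"({build_newick(left, dist)},{build_newick(right, dist)})"


def kmer_newick(seqs: dict[str, str], k: int = 3) -> str:
    """Distance matrix from k-mer content, then an unweighted Newick topology."""
    return build_newick(list(seqs), distance_matrix(seqs, k)) + ";"


# Toy input: A/B share most 3-mers, C/D share most 3-mers, and the pairs share none.
example = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "TTGGCCAATT",
    "D": "TTGGCCAATG",
}
print(kmer_newick(example, k=3))  # → ((A,B),(C,D));
```

The exhaustive medoid search costs O(n³) per split, so it only suits a handful of sequences; a practical version would more likely use a PAM-style heuristic. The Jaccard distance on k-mer sets is chosen here only because it is simple and symmetric, and the output is a topology without branch lengths.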

Author Biographies

Gihan Gamage, University of Moratuwa, Sri Lanka

Junior Consultant, Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka.

Nadeeshan Gimhana, University of Moratuwa, Sri Lanka

Undergraduate, Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka.

Indika Perera, University of Moratuwa, Sri Lanka

Senior Lecturer, Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka.

Shanaka Bandara, University of Moratuwa, Sri Lanka

Undergraduate, Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka.

Thilina Pathirana, University of Moratuwa, Sri Lanka

Undergraduate, Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka.

Anuradha Wickramarachchi, Australian National University, Australia

PhD Candidate, Australian National University, Australia.

Vijini Mallawaarachchi, Australian National University, Australia

PhD Candidate, Australian National University, Australia.

Published

2020-06-19

How to Cite

Gamage, G., Gimhana, N., Perera, I., Bandara, S., Pathirana, T., Wickramarachchi, A., & Mallawaarachchi, V. (2020). Phylogenetic Tree Construction Using K-Mer Forest-Based Distance Calculation. International Journal of Online and Biomedical Engineering (iJOE), 16(07), pp. 4–20. https://doi.org/10.3991/ijoe.v16i07.13807

Section

Papers