A Preliminary Study for Building an Arabic Corpus of Pair Questions-texts from the Web: AQA-WebCorp

Wided Bakari; Patrice Bellot; Mahmoud Neji

doi:10.3991/ijes.v4i2.5345

Abstract

With the development of electronic media and the heterogeneity of Arabic data on the Web, building a clean corpus for natural language processing applications, including machine translation, information retrieval, and question answering, is becoming more and more pressing. In this manuscript, we seek to create and develop our own corpus of question-text pairs, which will provide a better base for our experimentation step. We model this construction with a method for Arabic that recovers texts from the web that could prove to be answers to our factual questions. To do this, we developed a java script that extracts a list of HTML pages for a given query. We then clean these pages to obtain a database of texts and a corpus of question-text pairs. In addition, we give preliminary results of our proposed method. Some investigations into the construction of Arabic corpora are also presented in this document.
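The cleaning and pairing steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' script: it is written in Python with the standard library only, the retrieval step is left out (the pages are assumed to be already downloaded as HTML strings), and the names `TextExtractor`, `clean_html`, and `build_pairs` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def clean_html(html):
    """Return the visible text of one HTML page (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_pairs(question, html_pages):
    """Pair a question with the cleaned text of each non-empty page."""
    pairs = []
    for html in html_pages:
        text = clean_html(html)
        if text:  # drop pages with no visible text left after cleaning
            pairs.append((question, text))
    return pairs
```

Because the parser works on Python strings, Arabic text passes through unchanged as long as the pages were decoded with the correct encoding before being handed to `clean_html`.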

Published

2016-07-05

How to Cite

Bakari, W., Bellot, P., & Neji, M. (2016). A Preliminary Study for Building an Arabic Corpus of Pair Questions-texts from the Web: AQA-WebCorp. International Journal of Recent Contributions from Engineering, Science & IT (iJES), 4(2), pp. 38–45. https://doi.org/10.3991/ijes.v4i2.5345

Issue

Vol. 4 No. 2 (2016)

Section

Papers

License

The submitting author warrants that the submission is original and that she/he is the author of the submission together with the named co-authors; to the extent the submission incorporates text passages, figures, data or other material from the work of others, the submitting author has obtained any necessary permission.
Articles in this journal are published under the Creative Commons Attribution Licence (CC-BY). This provides more legal certainty about what readers can do with published articles and thus enables wider dissemination and archiving, which in turn makes publishing with this journal more valuable for you, the authors.
By submitting an article the author grants to this journal the non-exclusive right to publish it. The author retains the copyright and the publishing rights for the article without any restrictions.
This journal has been awarded the SPARC Europe Seal for Open Access Journals.
