Text Mining: Design of Interactive Search Engine Based Regular Expressions of Online Automobile Advertisements

Ahmed Adeeb Jalal

Abstract


The world of technology has evolved greatly over the past decades, which has led to an inflation in data volume. This progress in digital technology has generated texts scattered across millions of web pages. Unstructured texts contain a vast amount of textual data, and discovering useful and interesting relations in them requires additional processing by computers. Text mining and information extraction have therefore become an exciting research field for obtaining structured and valuable information. This paper focuses on text pre-processing in the automotive advertisement domain to build a structured database. The structured database was created by extracting information from unstructured automotive advertisements, a task that falls within natural language processing. Information extraction deals with finding factual information in text using regular expressions. We manually craft domain-specific, rule-based approaches to extract structured information from unstructured web pages. The structured information is made available through a user-friendly search engine designed for topic-specific knowledge. Consequently, the information extracted from these advertisements is used to perform a structured search over certain attributes of interest. The tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries.
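As a rough illustration of the rule-based extraction the abstract describes, the sketch below pulls a few attributes out of a free-text advertisement with hand-written regular expressions. The abstract does not give the paper's actual rules, so everything here is an assumption made for illustration: the attribute set (make, year, price, mileage), the patterns, the list of makes, and the helper names `extract` and `to_int`. The probability assignment and indexing mentioned in the abstract are not modeled.

```python
import re

# Hypothetical hand-crafted rules; the paper's real patterns are not
# given in the abstract.
YEAR = re.compile(r"\b(19[89]\d|20[0-4]\d)\b")
PRICE = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)")
MILEAGE = re.compile(r"\b(\d{1,3}(?:,\d{3})+|\d+)\s?(?:km|miles|mi)\b",
                     re.IGNORECASE)

# Illustrative list of makes (an assumption, not taken from the paper).
MAKES = ("Toyota", "Honda", "Ford", "BMW")
MAKE = re.compile(r"\b(" + "|".join(MAKES) + r")\b", re.IGNORECASE)
CANONICAL_MAKE = {m.lower(): m for m in MAKES}


def to_int(text):
    """Convert a digit string such as '85,000' to an integer."""
    return int(text.replace(",", ""))


def extract(ad):
    """Return one structured record for a free-text advertisement.

    Attributes that no rule matches are set to None.
    """
    make = MAKE.search(ad)
    year = YEAR.search(ad)
    price = PRICE.search(ad)
    mileage = MILEAGE.search(ad)
    return {
        "make": CANONICAL_MAKE[make.group(1).lower()] if make else None,
        "year": int(year.group(1)) if year else None,
        "price": to_int(price.group(1)) if price else None,
        "mileage": to_int(mileage.group(1)) if mileage else None,
    }


ad = ("For sale: 2015 Toyota Corolla, 85,000 km, excellent condition. "
      "Price $9,500 or best offer.")
record = extract(ad)
print(record)
```

Records of this shape could then be stored in a table and filtered by attribute, which is the kind of structured search over advertisements that the abstract refers to.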

Keywords


Information Extraction; Information Retrieval; Natural Language Processing; Text Mining; Web Crawler

International Journal of Engineering Pedagogy (iJEP) – eISSN: 2192-4880