Combining Different Approaches to Improve Arabic Text Documents Classification

Abuhaiba, Ibrahim and Dawoud, Hassan (2017) Combining Different Approaches to Improve Arabic Text Documents Classification. International Journal of Intelligent Systems and Applications (IJISA), 9 (4). pp. 39-52. ISSN 2074-904X (Print), 2074-9058 (Online)

Full text not available from this repository.
Official URL: http://www.mecs-press.org/ijisa/v9n4.html

Abstract

The objective of this research is to improve the classification of Arabic text documents by combining different classification algorithms. To achieve this objective, we build four models using different combination methods. The first combined model is built using fixed combination rules: five rules are used, and each rule is tested with different numbers of classifiers. The best classification accuracy, 95.3%, is achieved using the majority voting rule with seven classifiers, and the model takes 836 seconds to build. The second combination approach is stacking, which consists of two stages of classification: the first stage is performed by base classifiers and the second by a meta classifier. In our experiments, we used different numbers of base classifiers and two different meta classifiers: Naïve Bayes and linear regression. Stacking achieved very high classification accuracy: 99.2% with Naïve Bayes and 99.4% with linear regression as the meta classifier. Because it consists of two stages of learning, stacking needed a long time to build the models: 1963 seconds with Naïve Bayes and 3718 seconds with linear regression. The third model uses AdaBoost to boost a C4.5 classifier with different numbers of iterations. Boosting improves the classification accuracy of the C4.5 classifier: the accuracy is 95.3% with 5 iterations, requiring 1175 seconds to build the model, and 99.5% with 10 iterations, requiring 1966 seconds. The fourth model uses bagging with a decision tree. The accuracy is 93.7% with 5 iterations, achieved in 296 seconds, and 99.4% with 10 iterations, achieved in 471 seconds. We used three datasets to test the combined models: BBC Arabic, CNN Arabic, and OSAC. The experiments were performed using the Weka and RapidMiner data mining tools on a platform with a 2.2 GHz Intel Core i3 CPU and 4 GB of RAM.
The results of all models showed that combining classifiers can effectively improve the accuracy of Arabic text documents classification.
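
As an illustration of the majority voting rule mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation (the experiments were run in Weka and RapidMiner). The function name `majority_vote`, the class labels, the seven example votes, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the largest number of base classifiers.

    `predictions` holds one predicted label per base classifier for a
    single document. Ties are broken in favor of the label that appears
    first in the list (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in input order so the tie-breaking rule is deterministic.
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of seven base classifiers for one document.
votes = ["sports", "politics", "sports", "sports",
         "economy", "sports", "politics"]
combined = majority_vote(votes)
print(combined)
```

In this example four of the seven hypothetical classifiers agree, so the combined prediction follows them even though three classifiers disagree.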

Item Type: Article
Subjects: T Technology > T Technology (General)
Divisions: Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
Depositing User: م. حسن محمد حسن داود
Date Deposited: 11 Mar 2018 09:06
Last Modified: 11 Mar 2018 09:06
URI: http://scholar.alaqsa.edu.ps/id/eprint/360
