Peer-Reviewed

Qualifying Articles of Persian Wikipedia Encyclopedia Through J48 Algorithm, ANFIS and Subtractive Clustering

Received: 20 December 2015     Accepted: 4 January 2016     Published: 15 January 2016
Abstract

Because Wikipedia is one of the most popular websites on the internet, the accuracy of its information is of great importance. This research identifies the variables that affect the quality of Persian Wikipedia articles and then designs a system for assigning articles to one of three quality levels: high quality, cleanup needed, and deletion. First, the variables are collected for articles in the lists of featured articles, good articles, articles needing cleanup, and articles listed for deletion. Two methods are then used to analyze the data. In the first, a decision tree expresses the relationships among the collected variables as rules, which are implemented in an adaptive neuro-fuzzy inference system (ANFIS). In the second, the data are modeled with the subtractive clustering algorithm. Finally, the error of both methods is measured and compared. The results indicate that average daily hits, total views, page length, total number of edits, total number of authors, and number of templates used are directly related to the quality of Persian articles, while the recent number of authors is inversely related to it.
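
As a rough illustration of the second method, the sketch below shows the generic textbook form of subtractive clustering (Chiu's cluster-estimation idea): each data point receives a "potential" that grows with the number of nearby points, the point with the highest potential becomes a cluster centre, and that centre's influence is then subtracted before the next centre is picked. This is not the authors' implementation. The function name, the parameter names (radius, squash, stop_ratio), their default values, the single-threshold stopping rule, and the sample data are all assumptions made for this example, and the inputs are assumed to be scaled to the range 0 to 1 beforehand.

```python
import math


def subtractive_clustering(points, radius=0.5, squash=1.5, stop_ratio=0.15):
    """Return cluster centres chosen from `points` (values assumed scaled to [0, 1]).

    Simplified sketch: a single stopping threshold (stop_ratio) replaces the
    accept/reject tests found in fuller versions of the method.
    """
    if not points:
        return []

    alpha = 4.0 / radius ** 2             # neighbourhood width used for the potential
    beta = 4.0 / (squash * radius) ** 2   # wider neighbourhood used when subtracting

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Potential of each point: high when many other points lie close to it.
    potential = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points
    ]

    centres = []
    first_peak = max(potential)
    while True:
        best = max(range(len(points)), key=potential.__getitem__)
        peak = potential[best]
        # Stop once the best remaining potential is small relative to the first peak.
        if peak < stop_ratio * first_peak:
            break
        centre = points[best]
        centres.append(centre)
        # Subtract the new centre's influence so its neighbours are not picked again.
        potential = [
            p - peak * math.exp(-beta * sq_dist(x, centre))
            for p, x in zip(potential, points)
        ]
    return centres


# Hypothetical, already-normalised feature vectors forming two tight groups.
sample = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12),
          (0.9, 0.9), (0.88, 0.9), (0.9, 0.88)]
print(subtractive_clustering(sample))
```

In a fuzzy-modelling setting, each centre found this way would typically seed one fuzzy rule; a smaller radius tends to yield more centres and therefore more rules.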

Published in Automation, Control and Intelligent Systems (Volume 3, Issue 6)
DOI 10.11648/j.acis.20150306.18
Page(s) 141-153
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2016. Published by Science Publishing Group

Keywords

Wikipedia Encyclopedia, Quality of Articles, J48 Decision Tree, ANFIS, Subtractive Clustering Algorithm

Cite This Article
  • APA Style

    Seyedtaha Seyedsadr, Mohammadali Afsharkazemi, Hashem Nikoomaram. (2016). Qualifying Articles of Persian Wikipedia Encyclopedia Through J48 Algorithm, ANFIS and Subtractive Clustering. Automation, Control and Intelligent Systems, 3(6), 141-153. https://doi.org/10.11648/j.acis.20150306.18

    ACS Style

    Seyedtaha Seyedsadr; Mohammadali Afsharkazemi; Hashem Nikoomaram. Qualifying Articles of Persian Wikipedia Encyclopedia Through J48 Algorithm, ANFIS and Subtractive Clustering. Autom. Control Intell. Syst. 2016, 3(6), 141-153. doi: 10.11648/j.acis.20150306.18

    AMA Style

    Seyedtaha Seyedsadr, Mohammadali Afsharkazemi, Hashem Nikoomaram. Qualifying Articles of Persian Wikipedia Encyclopedia Through J48 Algorithm, ANFIS and Subtractive Clustering. Autom Control Intell Syst. 2016;3(6):141-153. doi: 10.11648/j.acis.20150306.18

  • BibTeX

    @article{10.11648/j.acis.20150306.18,
      author = {Seyedtaha Seyedsadr and Mohammadali Afsharkazemi and Hashem Nikoomaram},
      title = {Qualifying Articles of Persian Wikipedia Encyclopedia Through J48 Algorithm, ANFIS and Subtractive Clustering},
      journal = {Automation, Control and Intelligent Systems},
      volume = {3},
      number = {6},
      pages = {141-153},
      doi = {10.11648/j.acis.20150306.18},
      url = {https://doi.org/10.11648/j.acis.20150306.18},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.acis.20150306.18},
     year = {2016}
    }
    

  • RIS

    TY  - JOUR
    T1  - Qualifying Articles of Persian Wikipedia Encyclopedia Through J48 Algorithm, ANFIS and Subtractive Clustering
    AU  - Seyedtaha Seyedsadr
    AU  - Mohammadali Afsharkazemi
    AU  - Hashem Nikoomaram
    Y1  - 2016/01/15
    PY  - 2016
    N1  - https://doi.org/10.11648/j.acis.20150306.18
    DO  - 10.11648/j.acis.20150306.18
    T2  - Automation, Control and Intelligent Systems
    JF  - Automation, Control and Intelligent Systems
    JO  - Automation, Control and Intelligent Systems
    SP  - 141
    EP  - 153
    PB  - Science Publishing Group
    SN  - 2328-5591
    UR  - https://doi.org/10.11648/j.acis.20150306.18
    VL  - 3
    IS  - 6
    ER  - 

Author Information
  • Department of Management, Electronic Branch, Islamic Azad University, Tehran, Iran

  • Department of Management, Tehran Central Branch, Islamic Azad University, Tehran, Iran

  • Department of Management and Economics, Sciences and Research Branch, Islamic Azad University, Tehran, Iran
