Original Research

Automatic genre classification for Afrikaans

Dirk Snyman, Gerhard van Huyssteen, Walter Daelemans

Suid-Afrikaanse Tydskrif vir Natuurwetenskap en Tegnologie; Vol 33, No 1 (2014), 12 pages. doi: 10.4102/satnt.v33i1.759

Submitted: 08 August 2013
Published: 24 November 2014

Abstract

In text processing, metadata about a particular text plays an important role. Metadata is often generated by automatic text classification systems, which assign a text to one or more predefined classes or categories based on its contents. One of the dimensions along which a text can be classified is its genre. This study presents the development of an automatic genre classification system in a resource-scarce environment. It aimed to investigate the techniques and approaches generally used for automatic genre classification, and to identify the best approach for Afrikaans (a resource-scarce language). When developing an automatic genre classification system, a set of variables must be considered because they influence the performance of machine learning approaches: the algorithm used, the amount of training data, and the representation of the data as features. If these variables are handled correctly, an optimal combination of them can be identified to develop a genre classification system successfully. In this article, a genre classification system is developed using a multinomial naive Bayes (MNB) algorithm with a bag-of-words feature set. The resulting system achieves an f-score (performance measure) of 0.929.
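
To make the approach named in the abstract more concrete, the following is a minimal sketch of a multinomial naive Bayes classifier over bag-of-words counts, together with an f-score computation. It is not the authors' implementation: the class `TinyMultinomialNB`, the helper functions, the whitespace tokenizer, the smoothing value `alpha=1.0`, the two genre labels, and the toy English sentences are all assumptions made purely for illustration, and the output has no relation to the 0.929 reported above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Crude bag-of-words tokenizer (assumption): lowercase, split on whitespace."""
    return text.lower().split()


class TinyMultinomialNB:
    """Illustrative multinomial naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.n_docs = len(labels)
        self.class_doc_counts = Counter(labels)
        # Per-class word frequencies: the bag-of-words feature representation.
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_doc_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_doc_counts[label] / self.n_docs)
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f1_score(gold, predicted, positive):
    """F-score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for this sketch (not from the study's corpus).
train_docs = [
    "the minister announced a new budget today",
    "police confirmed the arrest on monday",
    "mix the flour and sugar then bake",
    "stir the sauce and add salt to taste",
]
train_labels = ["news", "news", "recipe", "recipe"]
test_docs = ["the minister confirmed the budget", "add sugar and stir the flour"]
test_gold = ["news", "recipe"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
predictions = [model.predict(d) for d in test_docs]
news_f1 = f1_score(test_gold, predictions, positive="news")
print(predictions, news_f1)
```

The sketch touches each of the three variables the abstract lists: the algorithm (the scoring loop in `predict`), the amount of training data (the size of `train_docs`), and the feature representation (`tokenize` plus the per-class counts). In a real experiment each of these would be varied and evaluated on held-out data.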

Full Text: HTML (Afrikaans) | EPUB (Afrikaans) | XML (Afrikaans) | PDF (Afrikaans) (805KB)

Author affiliations

Dirk Snyman, Centre for Text Technology, North-West University, South Africa
Gerhard van Huyssteen, Centre for Text Technology, North-West University, South Africa
Walter Daelemans, Computational Linguistics and Psycholinguistics Research Group, University of Antwerp, Belgium

Keywords

Genre classification, Resource-scarce languages, Machine learning, Human language technology, Natural language processing

ISSN: 0254-3486 (print) | ISSN: 2222-4173 (online)

All articles published in this journal are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.
