A Machine Learning Approach to Persian Text Readability Assessment Using a Crowdsourced Dataset
An automated text readability assessment system is an essential resource for any language and can be a powerful tool for improving the understandability of texts written and published in that language. However, Persian, which is spoken by more than 110 million people 28, lacks such a system. Unlike for languages such as English, French, and Chinese, very little research has been conducted on developing an accurate and reliable text readability assessment system for Persian.
In the present research, the first Persian dataset for text readability assessment was gathered, and the first machine learning model for Persian text readability assessment was introduced. In the experiments, the best classifier, a linear support vector machine, achieved a weighted F1-score of 0.9 under ten-fold cross-validation, indicating that the model can assess the readability of Persian texts with a high degree of confidence. The results of this study can be used in several applications, such as evaluating the readability of medical and educational texts, and have the potential to become the cornerstone of future studies in Persian text readability assessment.
Keywords: text readability, machine learning, Persian language, readability dataset
1 Introduction
With the advent of the World Wide Web, the volume of digital content such as text is growing rapidly. One of the main properties of a text is its readability: the readability, or difficulty, of a text indicates how easily a human reader can understand it. The massive amount of accessible text makes it hard for humans to find texts at a given readability level or to assess the readability of available texts accurately. The Internet, however, is not the only place where text readability needs to be assessed. Readability measurement is indispensable in education, where it can help teachers and instructors find suitable reading material for students based on their reading skills, or help textbook authors evaluate whether their books suit the intended students. Second language learners can also benefit from an automated text readability assessment system to find texts suited to their educational purposes Xia et al. (2019). Another application of readability measurement is in medical texts. Studies have shown that improving the readability of a text can significantly enhance readers' understanding of it Leroy et al. (2013). By measuring the readability of educational medical content written for patients, it is possible to ensure that the content is understandable to the public. Text readability measurement has many other applications in areas such as advertising Pancer et al. (2019) and publishing. Computers can facilitate text readability assessment; however, an accurate and reliable measure of text readability is needed.
Early automated text readability assessment relied on readability formulae, which measure the readability of a text based on a few simple characteristics. One of the most popular and widely used readability formulae is the Flesch-Kincaid formula Kincaid et al. (1975), which measures the readability of a text based on the average number of words per sentence and syllables per word. However, several studies have found readability formulae to be insufficiently accurate Heydari and Riazi (2012); Begeny and Greene (2013); Crossley et al. (2017). A more effective approach to text readability assessment is to use machine learning techniques, and in recent years several studies have designed and tested text readability assessment systems based on machine learning. One of the first studies of this kind was the work of Schwarm and Ostendorf (2005), which introduced a model for English text readability assessment using machine learning.
The only known method for Persian text readability assessment is the Flesch-Dayani formula Dayani (2000), a recalculated version of the Flesch-Kincaid formula adapted to the Persian language.
As mentioned earlier, traditional approaches to text readability assessment have not been accurate enough, yet no machine learning model has been available for Persian text readability assessment. The main reason for the absence of such a model is the lack of text readability datasets for Persian. In this paper, a text readability dataset for Persian is gathered using a novel method, and the first machine learning model for Persian text readability assessment is designed and tested.
The rest of this paper is organized as follows: (i) previous research in this field is discussed (section 2); (ii) our approach to gathering a Persian text readability dataset, and the dataset's characteristics, are described (section 3); (iii) the details of a machine learning model for Persian text readability assessment are explained (section 4); (iv) test results are presented (section 5); and (v) conclusions and future directions are discussed (section 6).
2 State of the Art
Previously published research on text readability assessment for the Persian language is limited. As mentioned before, the only well-known text readability measure for Persian is the Flesch-Dayani measure Dayani (2000). This formula is shown in Eq. 1.
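A minimal Python sketch of the Flesch-Dayani computation follows. It rests on an assumption: the coefficients are those commonly quoted for Dayani (2000) in Persian readability studies, namely 262.835 − 0.846·WL − 1.015·SL, where WL is the number of syllables per 100 words and SL the average number of words per sentence; these should be checked against the original source. Persian syllable counting is not shown, so the counts are passed in directly.

```python
def flesch_dayani(num_words: int, num_sentences: int, num_syllables: int) -> float:
    """Flesch-Dayani readability score (higher means easier).

    ASSUMPTION: uses the commonly quoted form 262.835 - 0.846 * WL - 1.015 * SL,
    where WL = syllables per 100 words and SL = average words per sentence.
    """
    if num_words <= 0 or num_sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    wl = 100.0 * num_syllables / num_words  # syllables per 100 words
    sl = num_words / num_sentences          # average sentence length in words
    return 262.835 - 0.846 * wl - 1.015 * sl


print(flesch_dayani(num_words=100, num_sentences=5, num_syllables=150))
```

For 100 words in 5 sentences with 150 syllables, the sketch returns about 115.6; as with the original Flesch Reading Ease score, higher values are meant to indicate easier text.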
Other studies of Persian text readability have been limited to assessing Persian text resources, such as medical resources Jabbari and Saghari (2011); Ahmadzadeh and Ahmadzadeh (2015), or to evaluating the readability of some Persian texts translated from English Kolahi and Shirvani (2012). By contrast, many studies have been conducted on text readability assessment for other languages such as English, French, and Chinese.
Approaches to text readability assessment can be divided into two major classes: (i) traditional approaches and (ii) machine learning approaches. Traditional approaches are composed of a few simple variables and metrics that are easy to compute; in other words, they mostly rely on surface features of a text to assess its readability. In contrast, machine learning approaches combine traditional surface features with more complex, deeper features to extract more information from the text.
One of the earliest studies of traditional text readability assessment was the work of Flesch (1943), who published a readability formula. This formula was later recalculated under contract with the U.S. Navy Kincaid et al. (1975), becoming the Flesch-Kincaid formula, one of the most popular text readability measures. Many popular text processing programs now include this formula as a built-in readability measure 12. This formula is presented in Eq. 2.
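Both standard Flesch variants are sketched below in Python, in their widely published forms: the original Flesch Reading Ease score and the Flesch-Kincaid Grade Level recalculated by Kincaid et al. (1975). Syllable counts are passed in, since syllable counting is language-specific and is assumed to happen elsewhere.

```python
from __future__ import annotations


def _averages(num_words: int, num_sentences: int, num_syllables: int) -> tuple[float, float]:
    """Average sentence length (words) and average syllables per word."""
    if num_words <= 0 or num_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return num_words / num_sentences, num_syllables / num_words


def flesch_reading_ease(num_words: int, num_sentences: int, num_syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    asl, asw = _averages(num_words, num_sentences, num_syllables)
    return 206.835 - 1.015 * asl - 84.6 * asw


def flesch_kincaid_grade(num_words: int, num_sentences: int, num_syllables: int) -> float:
    """Flesch-Kincaid Grade Level: an approximate U.S. school grade."""
    asl, asw = _averages(num_words, num_sentences, num_syllables)
    return 0.39 * asl + 11.8 * asw - 15.59


print(flesch_reading_ease(100, 5, 150), flesch_kincaid_grade(100, 5, 150))
```

For 100 words, 5 sentences, and 150 syllables, the two functions give about 59.6 (reading ease) and 9.9 (grade level).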
As shown in Eq. 2, the Flesch-Kincaid formula considers only the numbers of syllables, words, and sentences. Another text readability measure is the Gunning Fog index Gunning (1952), which combines the average sentence length with the percentage of complex words in the text; complex words are identified using a list of such words. This formula is displayed in Eq. 3.
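The Gunning Fog index can be sketched as follows. As described above, complex words come from a caller-supplied list (Gunning's own definition counts words of three or more syllables). The regex-based sentence and word splitting, which also recognizes the Persian question mark (؟), is a simplifying assumption rather than a proper tokenizer.

```python
from __future__ import annotations

import re

# Sentence terminators, including the Persian/Arabic question mark (U+061F)
_SENTENCE_END = re.compile(r"[.!?\u061F]+")
_WORD = re.compile(r"\w+")


def gunning_fog(text: str, complex_words: set[str]) -> float:
    """Gunning Fog index: 0.4 * (average sentence length + percentage of complex words)."""
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    words = [w.lower() for w in _WORD.findall(text)]
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    n_complex = sum(1 for w in words if w in complex_words)
    avg_sentence_length = len(words) / len(sentences)
    pct_complex = 100.0 * n_complex / len(words)
    return 0.4 * (avg_sentence_length + pct_complex)


print(gunning_fog("The cat sat. The extraordinary dog barked loudly.", {"extraordinary"}))
```

In the example, 8 words in 2 sentences with one complex word give 0.4 × (4 + 12.5) ≈ 6.6.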
The Dale-Chall readability formula estimates text readability from the proportion of words that do not appear in a list of familiar words, combined with the average sentence length. Lexile Stenner (1996) is another text readability measure; it was the first readability measure to use word frequency as a feature for measuring the readability of a text.
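A sketch of the Dale-Chall score follows, assuming the widely quoted classic coefficients (0.1579 per percent of difficult words, 0.0496 per word of average sentence length, plus 3.6365 when more than 5% of the words are difficult); the 1995 revision listed in the references may differ in detail. The familiar-word list is supplied by the caller, and the regex tokenization is a simplifying assumption.

```python
from __future__ import annotations

import re

_SENTENCE_END = re.compile(r"[.!?\u061F]+")
_WORD = re.compile(r"\w+")


def dale_chall(text: str, familiar_words: set[str]) -> float:
    """Dale-Chall raw score using the classic coefficients (an assumption, see above)."""
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    words = [w.lower() for w in _WORD.findall(text)]
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    # "Difficult" words are those missing from the familiar-word list
    pct_difficult = 100.0 * sum(1 for w in words if w not in familiar_words) / len(words)
    avg_sentence_length = len(words) / len(sentences)
    score = 0.1579 * pct_difficult + 0.0496 * avg_sentence_length
    if pct_difficult > 5.0:
        score += 3.6365  # adjustment applied when more than 5% of words are difficult
    return score


familiar = {"the", "cat", "sat", "ran"}
print(dale_chall("the cat sat. the zyx ran.", familiar))
```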
Another study that used statistical properties of a text to predict its readability level was the work of Collins-Thompson and Callan (2005), who used unigram language models to measure text readability.
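The idea of scoring a text against per-level unigram language models can be sketched as below. This is a simplified illustration, not the model of Collins-Thompson and Callan (2005): it uses add-one (Laplace) smoothing, ignores class priors, and assigns the level whose model gives the text the highest log-likelihood. The toy training texts are made up.

```python
from __future__ import annotations

import math
from collections import Counter


def train_unigram_models(labelled_texts):
    """Build one unigram count table per readability level.

    labelled_texts: iterable of (tokens, level) pairs.
    Returns (counts_per_level, vocabulary).
    """
    counts: dict[str, Counter] = {}
    vocab: set[str] = set()
    for tokens, level in labelled_texts:
        counts.setdefault(level, Counter()).update(tokens)
        vocab.update(tokens)
    return counts, vocab


def log_likelihood(tokens, counter: Counter, vocab_size: int, alpha: float = 1.0) -> float:
    """Log-probability of the tokens under an add-alpha smoothed unigram model."""
    total = sum(counter.values())
    return sum(math.log((counter[t] + alpha) / (total + alpha * vocab_size)) for t in tokens)


def predict_level(tokens, counts, vocab) -> str:
    """Return the level whose unigram model assigns the text the highest likelihood."""
    vocab_size = len(vocab) + 1  # one extra slot reserves probability mass for unseen words
    return max(counts, key=lambda level: log_likelihood(tokens, counts[level], vocab_size))


counts, vocab = train_unigram_models([
    ("the cat sat".split(), "easy"),
    ("epistemological ontology paradigm".split(), "hard"),
])
print(predict_level("the cat".split(), counts, vocab))
```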
Despite their simplicity, traditional approaches lack the accuracy required to assess the readability of a text reliably. Their main problem is that they take into account only a very small number of text features Petersen and Ostendorf (2009); Hartley (2016), and this limited feature set makes them less accurate. Another weakness is that they need a long text to reach an accurate conclusion about its readability Kidwell et al. (2011); for example, this makes them unreliable for assessing the readability of snippets, short chat messages, and other short texts. Their low accuracy in assessing the readability of web pages is yet another drawback Collins-Thompson and Callan (2005); Petersen and Ostendorf (2006); Feng et al. (2009). More generally, a considerable number of studies have compared human judgments with readability formulae and concluded that traditional readability assessments can differ widely from human evaluations Heydari and Riazi (2012); Begeny and Greene (2013); Crossley et al. (2017).
The next class of text readability assessment approaches consists of machine learning-based approaches, which outperform traditional approaches thanks to their intensive use of natural language processing (NLP) features and machine learning techniques François and Miltsakaki (2012). Text readability assessment can be framed as either a regression or a classification problem; however, studies have suggested that classification approaches can yield better text readability assessment Feng et al. (2010).
One of the most common classifiers in text readability assessment studies is the Support Vector Machine (hereafter SVM). The reason for this choice is that accurately assessing the readability of a text requires extracting many features from it, which increases the dimensionality of the classifier's input, and SVMs are generally regarded as performing well on such high-dimensional data compared with other classifiers such as neural networks. This makes SVM an appropriate choice for text classification. The most noticeable difference between machine learning studies on text readability assessment is the set of features each study uses; feature selection depends on a number of criteria, such as the intended application of the texts, their language, and many other parameters.
Some of the first attempts to apply machine learning to text readability assessment were the works of Schwarm and Ostendorf (2005) and Petersen and Ostendorf (2006). They used statistical language models, average sentence length, the average number of syllables per word, parse features, and several other features to train and test their classification models, and also used traditional readability scores such as the Flesch-Kincaid score as model inputs. Other studies, such as that of Kate et al. (2010), used syntactic features of the text, which improved the accuracy of the assessment. Cohesion features were used in studies by Sung et al. (2014) and Vajjala and Meurers (2015). More recently, Putra and Tokunaga (2017) introduced latent semantic analysis (LSA) features that capture the contextual usage of words and can be useful for assessing text readability. On the other hand, a recent study demonstrated that a simpler feature such as word frequency can matter more than cohesion features for improving text readability assessment results Todirascu et al. (2016).
Unlike traditional approaches, machine learning approaches can be applied to short texts: the models designed by Vajjala and Meurers (2015) and Stajner et al. (2017) can assess the readability of texts as short as a single sentence.
Another machine learning approach to text readability assessment is the ranking-based approach. Instead of classifying a text into one of several readability classes, a classifier is trained to compare two texts and decide which of them is more readable. Using this classifier as a comparison function, the ranking-based approach sorts all texts in a collection by readability, which can be more useful for some applications. Examples of recent studies introducing ranking-based machine learning models for text readability assessment are the works of Tanaka-Ishii et al. (2010), Ma et al. (2012), and Vajjala and Meurers (2014). The present paper introduces a machine learning approach to assessing the readability of Persian text.
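The ranking-based approach can be sketched with Python's `functools.cmp_to_key`, which turns a pairwise comparison into a sort key. The comparator `toy_is_easier` is a hypothetical stand-in for a trained pairwise classifier: it simply treats the text with the shorter average word length as easier.

```python
from __future__ import annotations

from functools import cmp_to_key
from typing import Callable


def rank_by_readability(texts: list[str], is_easier: Callable[[str, str], bool]) -> list[str]:
    """Sort texts from easiest to hardest using a pairwise 'is a easier than b?' function."""
    def compare(a: str, b: str) -> int:
        if is_easier(a, b):
            return -1
        if is_easier(b, a):
            return 1
        return 0
    return sorted(texts, key=cmp_to_key(compare))


def toy_is_easier(a: str, b: str) -> bool:
    """Hypothetical comparator: a shorter average word length counts as easier."""
    def avg_len(t: str) -> float:
        words = t.split()
        return sum(map(len, words)) / len(words)
    return avg_len(a) < avg_len(b)


print(rank_by_readability(["aaaa bbbb", "a b c", "aa bb"], toy_is_easier))
```

Sorting assumes the comparator is consistent (transitive); a learned pairwise classifier may violate this, in which case the resulting order can depend on the sorting algorithm.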
3 Persian Text Readability Dataset
As mentioned in section 2, no machine learning study or dataset for Persian text readability assessment was previously available. Therefore, a Persian text readability dataset was collected so that machine learning could be used to automate Persian text readability assessment. The dataset was built as a multi-class dataset suitable for classification models, since classification models have been reported to outperform regression models for this task Feng et al. (2010).
There are two main approaches to collecting a text readability dataset: using texts labeled by readability experts, or crowdsourcing the labels. This study adopts crowdsourcing because it is more accessible and better reflects the actual readability of Persian texts, as the labels are determined by a large number of Persian speakers. These labels represent the voters' opinions about the readability of each Persian text.
The texts for the dataset were gathered from various sources and cover different topics. Some were collected from Persian websites such as fa.wikipedia, beytoote.com, koodakan.org, tebyan.net, akhlagh.porsemani.ir, dastanak.com, shahrekhabar.ir, and zoomit.ir, and some from several Telegram (Telegram.org) messenger channels such as Sedanet, Vivaphilosophy, and Filmosophy. Finally, some texts were selected from several Persian books (e.g., Akhlagh Naseri by Nasir al-Din al-Tusi, Tarikh-e Jahangosha-ye Joveini by Ata Malik Joveini, Kelileh o Demneh by Ibn al-Muqaffa, and Gulistan by Saadi Shirazi). These sources were chosen based on two criteria. First, most of them are common sources of online information for Persian users. Second, together they cover a wide range of readability levels, from children's stories to novels that are difficult to comprehend. The texts span various genres such as news, children's stories, novels, sports, history, science, and philosophy. The number of texts in each genre is presented in Table 1.
|Topic||Number of texts|
To gather readability information from Persian speakers, the Telegram messenger platform was selected. Telegram is an open-source, cross-platform messenger that is popular among Iranians; in addition to its large user base in Iran, it can host third-party chatbots. A Telegram chatbot was therefore designed that presented Persian texts to Telegram users and asked them to give their opinion about each text's readability. Users submitted their opinion by choosing one of three options: easy, medium, or hard. Three readability levels were chosen because a larger number of classes might have confused users when selecting the appropriate level, since it is not possible to give each of many levels a clear and understandable definition. Conversely, fewer levels would not yield readability information detailed enough to build a machine learning model for text readability assessment with useful real-world applications.
To give each chatbot user a clear understanding of each readability level, four criteria were introduced to them: the number of unfamiliar words, grammatical complexity, text and sentence length, and overall understandability. Each readability level was roughly described in terms of these criteria, and users were asked to evaluate each text based on them. After the chatbot was designed and implemented, it was promoted in several popular Telegram channels so that Persian users could interact with it. Chatbot users fell into two groups: the first consisted of initial collaborators, who were undergraduate college students; the second consisted of public Telegram users of different ages, genders, and levels of education.
The main shortcomings of collecting labels through crowdsourcing are the likelihood of human error, malicious users, and, most importantly, disagreement between voters. Three measures were implemented to mitigate these problems. First, the chatbot was designed to collect at least three labels per text from distinct users; as shown in Table 2, an average of 3.5 labels was collected per text. Still, three labels cannot prevent every possible error. Second, around one hundred gold standard texts were chosen and labeled by the volunteers. These texts made it possible to evaluate each volunteer's reading ability and to detect possible malicious users. However, the information gathered from them was noisy: because users engaged with the chatbot only briefly, only a limited number of gold standard texts could be shown to each voter, and each user was evaluated using nine gold standard texts, three from each readability level. Third, a user reading level metric was introduced, defined as the percentage of easy, medium, and hard labels provided by a voter. Using this metric, malicious users can be identified as outliers in user reading level, and each voter's reading skill can be characterized. This metric is discussed further in section 4.
A vital step in ensuring an accurate, high-quality dataset was to select only the texts with more than 80% agreement among voters on their labels. Since the agreement percentage was rounded down and the average number of labels per text is 3.5, most selected texts have 100% agreement on their labels. This selected portion of the dataset is used to test the Persian text readability assessment model presented in section 4. The labels were gathered over approximately three months.
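The agreement filter described above can be sketched as follows: a text is kept only if it has at least three labels and its most common label receives strictly more than 80% of the votes. The exact rounding rule is not modeled, and the function and parameter names are illustrative.

```python
from __future__ import annotations

from collections import Counter


def select_agreed_texts(labels_per_text: dict[str, list[str]],
                        min_labels: int = 3,
                        threshold: float = 0.8) -> dict[str, str]:
    """Keep texts whose majority label gets strictly more than `threshold` of the votes."""
    selected = {}
    for text_id, labels in labels_per_text.items():
        if len(labels) < min_labels:
            continue  # not enough independent votes for this text
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) > threshold:
            selected[text_id] = label
    return selected


votes = {
    "t1": ["easy", "easy", "easy"],
    "t2": ["easy", "hard", "easy"],
    "t3": ["hard", "hard", "hard", "hard", "medium"],
    "t4": ["medium"] * 5 + ["easy"],
}
print(select_agreed_texts(votes))
```

With three to five labels per text, a strict 80% threshold keeps only unanimous texts (4 of 5 votes is exactly 80%), which is consistent with most selected texts having 100% agreement.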
Some information regarding the collected dataset is presented in Table 2.
|Total number of texts in the dataset||12780|
|Total number of collaborators||400|
|Average number of texts labeled by each collaborator||127|
|Average number of labels per text||3.5|
|Average text length||37 words, 173 characters|
|Total number of labels gathered||45368|
|Portion of easy labels||54 percent|
|Portion of medium labels||32 percent|
|Portion of hard labels||14 percent|
4 Proposed Approach
One of the crucial steps in designing and implementing a machine learning model is finding and selecting suitable features. The present research aims to design a machine learning model that automates text readability assessment for the Persian language (also called Farsi), which belongs to the Indo-European language family. One distinctive property of Persian is its extensive use of prefixes and suffixes, which allows different meanings to be derived from a word by adding a prefix or suffix. Persian also has a relatively free word order, so the same meaning can be expressed with different word orders. Nevertheless, these properties do not make Persian very different from other Indo-European languages. To create a useful machine learning model for Persian text readability assessment, the features should be selected carefully. Because of the similarities between Persian and other Indo-European languages, the most beneficial features from readability studies on similar languages can be reused to obtain the desired results. Therefore, a list of features was assembled from related studies. Although more complex features have been proposed for similar models, studies have shown that they contribute little to the accuracy of text readability models Todirascu et al. (2016); François and Fairon (2012). Consequently, well-established features were selected to obtain reliable results, such as frequency and POS language models Todirascu et al. (2016), word and sentence length (which are part of the Flesch-Kincaid formula), and other similar features, while more experimental features were left for future studies. The list of selected features is reported in Table 3.
|Average length of sentences in the text|
|Variance of sentence length in the text|
|Average length of words in the text|
|Variance of word length in the text|
|Average word n-gram model frequency (n = 1 to 5)|
|Average character n-gram model frequency (n = 1 to 5)|
|Variance of word n-gram model frequency (n = 1 to 5)|
|Variance of character n-gram model frequency (n = 1 to 5)|
|Number of sentences in the text|
|Number of words in the text|
|Number of characters in the text|
|Number of unique words in the text|
|Entropy (defined here as the number of unique words divided by the total number of words, i.e., the type-token ratio)|
|Average of n-max unigram model frequency words (n = 1 to 5)|
|Average of n-min unigram model frequency words (n = 1 to 5)|
|Percentage of words tagged with each part of speech, relative to the total number of words|
|Average n-gram part of speech model frequency (n = 1 to 5)|
|Variance of n-gram part of speech model frequency (n = 1 to 5)|
|User reading ability|
In Table 3, a word N-gram is a sequence of N consecutive words, and a character N-gram is a sequence of N consecutive characters.
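The simpler, count-based rows of Table 3 can be computed with a few lines of standard-library Python. This sketch relies on assumptions the table does not state: sentence length is measured in words, word length in characters, variances are population variances, the character count includes spaces and punctuation, and splitting uses simple regular expressions rather than a Persian tokenizer such as Hazm's.

```python
from __future__ import annotations

import re
import statistics

_SENTENCE_END = re.compile(r"[.!?\u061F]+")  # includes the Persian question mark
_WORD = re.compile(r"\w+")


def surface_features(text: str) -> dict[str, float]:
    """Length- and count-based features in the spirit of Table 3."""
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    words = _WORD.findall(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    sentence_lengths = [len(_WORD.findall(s)) for s in sentences]  # words per sentence
    word_lengths = [len(w) for w in words]                         # characters per word
    unique_words = set(words)
    return {
        "avg_sentence_length": statistics.mean(sentence_lengths),
        "var_sentence_length": statistics.pvariance(sentence_lengths),
        "avg_word_length": statistics.mean(word_lengths),
        "var_word_length": statistics.pvariance(word_lengths),
        "num_sentences": len(sentences),
        "num_words": len(words),
        "num_characters": len(text),
        "num_unique_words": len(unique_words),
        "unique_word_ratio": len(unique_words) / len(words),  # the "Entropy" row of Table 3
    }


print(surface_features("the cat sat. a dog ran fast."))
```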
Another statistical model used here was a part-of-speech (POS) N-gram model. To build it, a version of the Hamshahri Persian corpus AleAhmad et al. (2009) was first created in which every word was replaced with its part-of-speech tag, and a word N-gram model was then built from this modified corpus. To calculate the average POS N-gram frequency of a text, each word of the text was likewise replaced with its part-of-speech tag, and the previously built POS N-gram model was used to look up the frequency of each N-gram.
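The average and variance of N-gram model frequencies can be sketched in a way that works unchanged for word, character, and part-of-speech sequences. Assumptions: "model frequency" is taken to be the relative frequency of an N-gram in a reference corpus (Hamshahri in this study; a toy tagged corpus with hypothetical tags here), unseen N-grams get frequency 0, and no smoothing is applied.

```python
from __future__ import annotations

import statistics
from collections import Counter
from collections.abc import Iterable, Sequence


def ngrams(seq: Sequence, n: int) -> list[tuple]:
    """All contiguous n-grams of a sequence (a word list, a string, or a POS tag list)."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def build_ngram_model(corpus: Iterable[Sequence], n: int) -> dict[tuple, float]:
    """Relative frequency of every n-gram observed in a reference corpus."""
    counts: Counter = Counter()
    for seq in corpus:
        counts.update(ngrams(seq, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def ngram_frequency_features(seq: Sequence, model: dict[tuple, float], n: int) -> tuple[float, float]:
    """Average and population variance of the model frequencies of a text's n-grams."""
    freqs = [model.get(gram, 0.0) for gram in ngrams(seq, n)]  # unseen n-grams count as 0
    if not freqs:
        return 0.0, 0.0
    return statistics.mean(freqs), statistics.pvariance(freqs)


# Toy POS-tagged reference corpus (hypothetical tags) standing in for a tagged Hamshahri
corpus_tags = [["N", "V"], ["N", "V", "ADJ"]]
bigram_model = build_ngram_model(corpus_tags, 2)
# mean = 0.5, variance = 1/36 (up to floating-point rounding)
print(ngram_frequency_features(["N", "V", "ADJ"], bigram_model, 2))
```

The same functions apply to character N-grams by passing a string, and to word N-grams by passing a token list.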
All statistical language models in Table 3 were built from the Hamshahri Persian corpus, and part-of-speech tagging was performed with the Hazm Python library 33. To achieve higher accuracy, the texts in the dataset were preprocessed with steps such as normalization and stopword removal.
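Normalization and stopword removal can be sketched with the standard library alone. This is not Hazm's `Normalizer`, which does considerably more; the character mappings below (Arabic yeh and kaf to their Persian forms, removal of Arabic diacritics, whitespace collapsing) and the five-word stopword list are illustrative assumptions.

```python
import re

# Map Arabic code points to their Persian counterparts (a small, illustrative subset)
_CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian keheh
})
_DIACRITICS = re.compile(r"[\u064B-\u0652]")  # Arabic short-vowel marks (fathatan .. sukun)

# Hypothetical tiny stopword list; a real system would use a full Persian list
STOPWORDS = {
    "\u0648",        # va ("and")
    "\u062F\u0631",  # dar ("in")
    "\u0628\u0647",  # be ("to")
    "\u0627\u0632",  # az ("from")
    "\u06A9\u0647",  # ke ("that")
}


def normalize(text: str) -> str:
    """Unify character variants, drop diacritics, and collapse whitespace."""
    text = text.translate(_CHAR_MAP)
    text = _DIACRITICS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def remove_stopwords(tokens: list, stopwords: set = STOPWORDS) -> list:
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in stopwords]


print(remove_stopwords(normalize("\u0643\u062A\u0627\u0628  \u0648 \u0642\u0644\u0645").split()))
```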
Furthermore, users' reading ability was taken into account in order to increase the accuracy of the predicted text readability. A user's reading ability is defined as the proportions of easy, medium, and hard labels that the user assigned to a preselected, uniform set of easy, medium, and hard texts, and it was extracted from the information gathered from each chatbot user. To make these values comparable across users, these pre-labeled texts were shown to every chatbot user. For a text labeled by several users, the feature value is the average reading ability of the users who labeled it. This feature makes it possible to assess text readability for a particular reader once that reader's reading ability is known.
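The reading-ability feature described above can be sketched as a three-component vector per user, averaged over the users who labeled a text. The user IDs and labels in the example are made up.

```python
from __future__ import annotations

from collections import Counter

LEVELS = ("easy", "medium", "hard")


def reading_ability(gold_labels: list[str]) -> tuple[float, ...]:
    """Share of easy, medium, and hard labels a user gave on the gold-standard texts."""
    if not gold_labels:
        raise ValueError("user has not labeled any gold-standard texts")
    counts = Counter(gold_labels)
    return tuple(counts[level] / len(gold_labels) for level in LEVELS)


def text_reading_ability(labeler_ids: list[str],
                         abilities: dict[str, tuple[float, ...]]) -> tuple[float, ...]:
    """Average reading-ability vector of the users who labeled one text."""
    vectors = [abilities[u] for u in labeler_ids]
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(len(LEVELS)))


abilities = {
    "u1": reading_ability(["easy"] * 6 + ["medium"] * 3),       # tends to rate texts as easy
    "u2": reading_ability(["easy", "medium", "hard"] * 3),      # matches the balanced gold set
}
print(text_reading_ability(["u1", "u2"], abilities))
```

Because the gold set is balanced across the three levels, a vector far from (1/3, 1/3, 1/3) suggests either a much stronger or weaker reader or a careless user, which is how such outliers can be flagged.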
Because the features have different scales, feature scaling was performed on the data points to improve the accuracy of the model, using the tools available in the Scikit-learn machine learning Python library Pedregosa et al. (2011). The processed features and the difficulty levels derived from the chatbot were then fed to the following Scikit-learn classifiers: support vector machine, linear support vector machine, random forest, decision tree, and Gaussian naive Bayes (hereafter GNB). The test results are discussed in section 5.
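The specific Scikit-learn scaler is not named; a common choice is standardization (zero mean and unit variance per feature), which scikit-learn provides as `StandardScaler`. The standard-library sketch below assumes standardization and fits the statistics on the training rows only, then applies them to any rows, so that no information leaks from the test fold.

```python
from __future__ import annotations

import statistics


def fit_standardizer(train_rows: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-feature mean and population standard deviation, computed on training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(col) for col in columns]
    # A constant feature has zero spread; use 1.0 so it maps to 0 instead of dividing by zero
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows: list[list[float]], means: list[float], stds: list[float]) -> list[list[float]]:
    """Apply (x - mean) / std feature by feature."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


train = [[1.0, 10.0], [3.0, 10.0]]
means, stds = fit_standardizer(train)
print(standardize(train, means, stds))         # training rows
print(standardize([[5.0, 12.0]], means, stds)) # a test row, scaled with training statistics
```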
5 Test Results
To test the model, ten-fold cross-validation was used: the labeled texts are split into ten folds, and in each of ten runs, nine folds (90 percent of the labeled texts) are used for training and the remaining fold (10 percent) for testing, so that every text is used for testing exactly once. The final results are reported in terms of precision, recall, and F1-score. The equations for these measures are presented in Eq. 5, Eq. 6, and Eq. 7, respectively.
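For reference, the standard per-class definitions of these measures for a class A, written in terms of its true positives (TP_A), false positives (FP_A), and false negatives (FN_A), are:

```latex
\begin{aligned}
\mathrm{Precision}_A &= \frac{TP_A}{TP_A + FP_A} \\[4pt]
\mathrm{Recall}_A &= \frac{TP_A}{TP_A + FN_A} \\[4pt]
F_{1,A} &= \frac{2\,\mathrm{Precision}_A\,\mathrm{Recall}_A}{\mathrm{Precision}_A + \mathrm{Recall}_A}
\end{aligned}
```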
To compute the precision, recall, and F1-score of the classifier for class A, a true positive is a text of class A that the classifier correctly predicts as A; a false positive is a text from another class that the classifier incorrectly predicts as A; and a false negative is a text of class A that the classifier incorrectly assigns to another class.
In the experiments, the support vector machine, linear support vector machine, decision tree, and Gaussian naive Bayes classifiers were used with their default settings, and the random forest classifier was used with 50 estimators. The test results are shown in Table 4. The reported precision, recall, and F1-scores are weighted measures: each total score is the average of the per-class scores, weighted by the number of data points in each class. Because the classes in the gathered dataset are imbalanced (Table 2), each class contributes differently to the final results.
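Weighted averaging of per-class scores, as described above, can be sketched in plain Python. The sketch is intended to mirror the convention of scikit-learn's `average="weighted"` option (weights are the number of true instances per class, and an undefined score counts as 0); the labels in the example are made up.

```python
from __future__ import annotations

from collections import Counter


def weighted_prf(y_true: list[str], y_pred: list[str]) -> tuple[float, float, float]:
    """Support-weighted precision, recall, and F1 over all classes present in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for c in support:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[c] / n  # share of the data belonging to class c
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


y_true = ["easy", "easy", "easy", "medium", "hard"]
y_pred = ["easy", "easy", "medium", "medium", "hard"]
print(weighted_prf(y_true, y_pred))
```

Weighted recall always equals overall accuracy, since each class's recall is weighted by its share of the data; in the example both are 0.8.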
|Classifier||Precision (train/test)||Recall (train/test)||F1-score (train/test)||ROC_AUC (train/test)|
As shown in Table 4, most classifiers achieved high precision, recall, and F1-score. The linear support vector machine outperformed the other classifiers, with an F1-score of 0.9. These results suggest that the model can accurately label Persian texts by their readability level. The random forest and decision tree models reached a precision, recall, and F1-score of 1 on the training data, which suggests that these models overfit. For a more in-depth view of the classifiers' performance, a class-level classification report of the SVM classifier is shown in Table 5.
|Class||Precision (train/test)||Recall (train/test)||F1-score (train/test)|
As reported in Table 5, the support vector machine model effectively classified texts in the easy and hard classes. The medium class, however, behaved differently: the model's precision on this class was high, but its recall was lower than that of the other classes.
To analyze the low recall of the medium class further, the features extracted from the dataset were visualized using Embedding Projector, a visualization tool from the TensorFlow library Abadi et al. (2015). Here, Embedding Projector used the t-SNE technique Maaten and Hinton (2008), which maps high-dimensional data points into a 2- or 3-dimensional space, to reduce the dimensionality of the dataset. The visualization results are demonstrated in Figures 4 to 4.
In Figure 4, blue denotes easy-class data points, green denotes medium-class data points, and red denotes hard-class data points. In Figures 4 to 4, 0 denotes easy-class data points, 1 medium-class data points, and 2 hard-class data points. As depicted in Figures 4 to 4, some data points from medium-class texts are mixed with data points of the other classes. The problem of low recall in the medium class was not resolved in this study. Nonetheless, other studies have shown that the medium class is highly opinion-based and heavily dependent on each reader's definition of a text with medium readability. To improve medium-class recall, a more concrete definition of the medium readability class, or new features that capture each reader's notion of it, are required.
6 Conclusions and Future Work
In this paper, two important goals were fulfilled: (i) the first Persian text readability dataset was gathered using a novel crowdsourcing method; and (ii) the first machine learning model for Persian text readability assessment was introduced. The best model introduced in this research achieved a weighted F1-score of 0.9 and could be employed in many applications, such as text simplification, automated assessment of medical and educational texts, and finding suitable content for second language learners. Future research will investigate new features, such as LSA and TF-IDF, to improve the accuracy of the proposed model. Moreover, the large number of texts in the gathered dataset makes it suitable for training deep learning models for text readability assessment, which is another interesting direction for future study.
- TensorFlow: large-scale machine learning on heterogeneous systems. Note: Software available from tensorflow.org External Links: Cited by: §5.
- Assessing the readability level of patient educational resources distributed in shiraz health centers by flesch dayani formula. Journal of Modern Medical Information Sciences 1 (2), pp. 62–69. Cited by: §2.
- Hamshahri: a standard persian text collection. Knowledge-Based Systems 22 (5), pp. 382–387. External Links: Cited by: §4.
- CAN READABILITY FORMULAS BE USED TO SUCCESSFULLY GAUGE DIFFICULTY OF READING MATERIALS?. Psychology in the Schools 51 (2), pp. 198–215. External Links: Cited by: §1, §2.
- Readability revisited: the new dale-chall readability formula. Brookline Books. Cited by: §2.
- Predicting reading difficulty with statistical language models. Journal of the American Society for Information Science and Technology 56 (13), pp. 1448–1462. External Links: Cited by: §2.
- Predicting text comprehension, processing, and familiarity in adult readers: new approaches to readability formulas. Discourse Processes 54 (5-6), pp. 340–359. External Links: Cited by: §1, §2.
- A criteria for assessing the persian texts’ readability. Journal of Social Science and Humanities 10, pp. 35–48. Cited by: §1, §2.
- Cognitively motivated features for readability assessment. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics on - EACL '09, External Links: Cited by: §2.
- A comparison of features for automatic readability assessment. In Coling 2010: Posters, pp. 276–284. External Links: Cited by: §2, §3.
- Marks of readable style; a study in adult education.. Teachers College Contributions to Education. Cited by: §2.
-  (2018-09) Flesch–kincaid readability tests. Wikimedia Foundation. External Links: Cited by: §2.
- An ai readability formula for french as a foreign language. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 466–477. Cited by: §4.
- Do nlp and machine learning improve traditional readability formulas?. In Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations, pp. 49–57. External Links: Cited by: §2.
- The technique of clear writing. McGraw-Hill, New York. Cited by: §2.
- Is time up for the Flesch measure of reading ease? Scientometrics 107 (3), pp. 1523–1526. Cited by: §2.
- Readability of texts: human evaluation versus computer index. Mediterranean Journal of Social Sciences 3 (1), pp. 177–190. Cited by: §1, §2.
- A comparison between the difficulty level (readability) of English medical texts and their Persian translations. International Journal of English Linguistics 1 (1). Cited by: §2.
- Learning to predict readability using diverse linguistic features. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 546–554. Cited by: §2.
- Statistical estimation of word acquisition with application to readability prediction. Journal of the American Statistical Association 106 (493), pp. 21–30. Cited by: §2.
- Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel. Technical report, Defense Technical Information Center. Cited by: §1, §2.
- A comparative study of the readability of English textbooks of translation and their Persian translations. International Journal of Linguistics 4 (4). Cited by: §2.
- A user-study measuring the effects of lexical simplification and coherence enhancement on perceived and actual text difficulty. International Journal of Medical Informatics 82 (8), pp. 717–730. Cited by: §1.
- Ranking-based readability assessment for early primary children’s literature. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 548–552. Cited by: §2.
- Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605. Cited by: §5.
- How readability shapes social media engagement. Journal of Consumer Psychology 29 (2), pp. 262–270. Cited by: §1.
- Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830. Cited by: §4.
- Persian language (2018-09). Wikimedia Foundation. Cited by: A Machine Learning Approach to Persian Text Readability Assessment Using a Crowdsourced Dataset.
- Assessing the reading level of web pages. In Ninth International Conference on Spoken Language Processing. Cited by: §2.
- A machine learning approach to reading level assessment. Computer Speech & Language 23 (1), pp. 89–106. Cited by: §2.
- Evaluating text coherence based on semantic similarity graph. In Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing, pp. 76–85. Cited by: §2.
- Reading level assessment using support vector machines and statistical language models. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics - ACL '05. Cited by: §1, §2.
- Sobhe/hazm: Python library for digesting Persian text (2017-08). Retrieved September 1, 2017. Cited by: §4.
- Automatic assessment of absolute sentence complexity. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. Cited by: §2.
- Measuring reading comprehension with the Lexile Framework. ERIC. Cited by: §2.
- Constructing and validating readability models: the method of integrating multilevel linguistic features with machine learning. Behavior Research Methods 47 (2), pp. 340–354. Cited by: §2.
- Sorting texts by readability. Computational Linguistics 36 (2), pp. 203–227. Cited by: §2.
- Are cohesive features relevant for text readability evaluation? In 26th International Conference on Computational Linguistics (COLING 2016), pp. 987–997. Cited by: §2, §4.
- Assessing the relative reading level of sentence pairs for text simplification. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. Cited by: §2.
- Readability assessment for text simplification: from analysing documents to identifying sentential simplifications. Recent Advances in Automatic Readability Assessment and Text Simplification 165 (2), pp. 194–222. Cited by: §2.
- Text readability assessment for second language learners. arXiv preprint arXiv:1906.07580. Cited by: §1.