A Review on Non-Supervised Approaches for Cyberbullying Detection

Abstract
Social networks now dominate daily life. Although they are useful for social gathering and communication, they also present new opportunities for harmful and criminal acts. Cyber-harassment is one example, in which the internet is misused as a means of harassing or bullying others virtually. Despite considerable effort and research aimed at overcoming cyberbullying, it remains a problem. To reduce its occurrence, computer-based methods for detecting cyber-harassment have been investigated. This literature survey shows that supervised learning methods have mostly been used for cyberbullying detection. However, some non-supervised methods and other techniques have also been shown to be effective in terms of accuracy. This paper therefore surveys recent research on non-supervised techniques and summarizes the accuracy results reported in several papers, in order to discuss the significance of non-supervised learning approaches in comparison with traditional learning methods.


Introduction
In 2011, Jamey Rodemeyer [1], a fourteen-year-old American adolescent, took his own life after facing years of cyberbullying over his sexuality. In 2012, the Canadian adolescent Amanda Todd [2] took her own life because of constant online bullying. In 2017, the Swedish model and artist Arvida Byström [3] was subjected to cyberbullying after appearing in an advertisement with unshaven legs; she subsequently began receiving rape threats. In 2018, the Australian adolescent Dolly Everett [4] took her own life after being a victim of cyber-harassment. These cases are only a small sample of what happens every day on social media. Since the internet was introduced, new sociological phenomena have arisen, and cyberbullying is one disturbing development among them. Suicides linked to cyber-harassment are reported in the media with increasing frequency. Anti-harassment groups are trying to raise awareness of the problem, but cyberbullying unfortunately persists and causes intense harm to a person's social, mental and physical well-being. Beyond its immediate negative impact, it can leave lasting scars on victims and lead to outcomes such as depression or suicide.
A recent survey can be found in [5]. Whereas supervised machine learning techniques have traditionally been used to identify textual cyberbullying, unsupervised techniques are becoming more common. Importantly, predator-victim conversations tend to be written in slang, which is hard for a computer to interpret and therefore complicates detection. Moreover, despite all the measures taken to control or restrict cyberbullying, internet abusers may find subtle ways to circumvent them.
Another issue facing the research community is the lack of data sets with high-quality features. People may lie on the internet, for example about their age, so a data set is likely to contain false content. Data annotation is also a problem, since it requires manual labelling by a human to mark whether or not a text indicates cyberbullying, and annotators may disagree on which samples count as cyberbullying. Importantly, these data sets suffer from class imbalance: the number of benign texts is much higher than the number of aggressive cyberbullying texts. All of this makes clear that removing or reducing the problem still requires more effort, as the internet leaves plenty of space for cyberbullying to take place.
These issues have motivated researchers to explore tools and techniques that are not purely supervised, including semi-supervised, deep learning and unsupervised learning techniques. The paper reviews the methods examined by the authors, including support vector machines (SVMs) and neural networks such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), long short-term memory networks (LSTMs) and other deep neural network models. Within this paper, non-supervised learning refers to approaches that do not rely exclusively on labelled training data. Common techniques such as Gaussian naïve Bayes, linear SVMs and linear regression are therefore excluded. The paper is organized as follows: the next section explores the methods used in non-supervised approaches, including auto-encoders, unsupervised learning, semi-supervised learning, deep learning, time series modelling and clustering. Section four summarizes the survey and offers a number of recommendations for future research in this field. Section five gives a summary of the results, and the conclusion is given in the final section.

Non Supervised Approaches Survey
This survey explores scientific papers that have used methods other than supervised techniques for detecting cyberbullying in textual content. Six groups of methods are covered: unsupervised approaches, auto-encoders, deep learning, semi-supervised learning, time series modelling and clustering. The following subsections describe each of these approaches.

Unsupervised approaches
The authors in [13] built an unsupervised model based on Growing Hierarchical Self-Organizing Maps (GHSOMs) to detect traces of bullying on Facebook, YouTube and Formspring. To find examples of abusive content in text, they carried out semantic and syntactic analysis of the signs of bullying. Their solution avoided manual labelling, and they moved away from the bag-of-words (BoW) model because it does not capture word position. Instead, they constructed a model based on semantic, sentiment, social and syntactic features. The syntactic features included density and vulgarity, upper-case letters and exclamation marks. For semantic features, bigrams and trigrams of words were used. For sentiment features, they considered the emotional polarity of a sample and its emojis.
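The syntactic and semantic features listed above can be illustrated with a minimal sketch. It is not the implementation used in [13]: the function names, the tokenization rule and the `vulgar_words` lexicon are assumptions made for illustration, and "vulgar density" is only one possible reading of the density and vulgarity features.

```python
import re

def syntactic_features(text, vulgar_words):
    """Return simple surface features for one message.

    vulgar_words is a hypothetical, user-supplied lexicon; the word list
    used in the surveyed work is not given in this review.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    letters = [c for c in text if c.isalpha()]
    n_tokens = max(len(tokens), 1)    # avoid division by zero on empty text
    n_letters = max(len(letters), 1)
    return {
        # share of tokens found in the vulgar-word lexicon
        "vulgar_density": sum(t.lower() in vulgar_words for t in tokens) / n_tokens,
        # share of alphabetic characters that are upper case
        "upper_ratio": sum(c.isupper() for c in letters) / n_letters,
        # raw count of exclamation marks
        "exclamations": text.count("!"),
    }

def word_ngrams(text, n):
    """Return the word n-grams (as tuples) in order of appearance."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

message = "You are SO stupid!!"
features = syntactic_features(message, vulgar_words={"stupid"})
bigrams = word_ngrams(message, 2)
trigrams = word_ngrams(message, 3)
```

In a pipeline of this kind, such feature values would be combined with sentiment and social features into one vector per message before being passed to the unsupervised model.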
Lastly, for social features, they used author profiling and direct user tagging. In addition, the authors in [6] relied on weak supervision in the form of expert-provided key phrases characteristic of bullying. They used Twitter, Ask.fm and Instagram data to detect bullying with an ensemble of two learners: one learner analyzes the content of the message, and the other examines the social structure. The authors introduced distributed word and graph-node representations by training nonlinear deep models. They identified two types of harassment detection classifiers: message classifiers and user-relationship classifiers. User classifiers indicate whether one user is being threatened by another. Message classifiers take the text as input and decide whether it is an indication of harassment. For the message classifiers, four alternatives were used: a bag-of-n-grams model, bag-of-words, an RNN and embeddings.

Auto-encoders
The semantic marginalized stacked denoising autoencoder (SMSDA) was applied to MySpace and Twitter data by the authors in [7], [8], [9] and [10]. To speed up their work, however, the authors in [10] followed a linear rather than a nonlinear approach. In [7], the writers used demographic, text and social features to identify cyberbullying messages. For parameter optimization, they made use of "fuzzy rules" to label cyberbullying messages, and they used a genetic algorithm to perform the detection. They extracted characteristics such as adjectives, nouns and pronouns from the output and based their results on frequent word occurrences. In [8], the writers used bag-of-words representations as inputs in order to learn robust features. In the first stage they made use of word embeddings and expert knowledge, comparing a list of abusive words with the features in the corpus. Semantic similarity between words was measured by the cosine similarity between their word embeddings. They used "Fisher ratings" to pick bullying features for subsequent layers. The writers in [9] found that semantic information can boost cyberbullying detection performance, as their SMSDA-based approach achieved a strong improvement.
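The idea of comparing a list of abusive words with corpus features through cosine similarity can be sketched as follows. This is an illustration only, not the procedure from [8]: the three-dimensional vectors, the threshold value and the helper name `expand_word_list` are all made up for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)

def expand_word_list(seed_words, embeddings, threshold):
    """Add every vocabulary word whose embedding is close to a seed word.

    Hypothetical helper: seed_words stands in for an expert-provided list
    of abusive words, embeddings maps each word to its vector.
    """
    expanded = set(seed_words)
    for word, vec in embeddings.items():
        for seed in seed_words:
            if seed in embeddings and cosine_similarity(vec, embeddings[seed]) >= threshold:
                expanded.add(word)
                break
    return expanded

# Toy embeddings; real ones would come from a trained embedding model.
toy_embeddings = {
    "idiot":  [0.9, 0.1, 0.0],
    "moron":  [0.8, 0.2, 0.1],
    "flower": [0.0, 0.1, 0.9],
}
expanded = expand_word_list({"idiot"}, toy_embeddings, threshold=0.8)
```

With these toy vectors, "moron" lies close to the seed word and is added, while "flower" is not.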

Deep learning
The writers in [11] applied a neural network with four hidden layers to cyberbullying detection. The writers in [12] drew on deep learning approaches, using a hybrid CNN-LSTM, a simple CNN, and a mixture of DNN, LSTM and RNN. They trained three text representations, from Twitter, Formspring and Google News, using the word2vec model. They used word embeddings, and their input data required no feature engineering, which was the strong point of their research. The writers in [13] tested an ensemble technique consisting of two deep neural models: first, a CNN capturing low-level syntactic information from the character sequence; second, an LRCNN capturing high-level semantic information using word embeddings generated from word sequences. After the word vectors were extracted, the temporal and spatial features of cyberbullying messages were obtained through convolution and pooling operations. They considered each word sequence along with the letter sequences, and summarised cyberbullying commentary into syntactic features such as indicative informal words. A CNN approach was followed by the authors in [28] to detect cyberbullying in comments. In addition to features such as text and images, they leveraged novel features such as topics derived from image captions. They gathered user-created image captions and detailed information about the person who posted the material, such as the number of people they follow, their total number of posts, their followers and their username. They made use of bags of words containing pronouns that indicate grammatical and bullying dependencies. They structured the data with the objective of clustering similar images that might match similar bullying signatures.
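As a rough illustration of the convolution and pooling operations mentioned above, the sketch below slides a single filter over a sequence of word vectors and keeps the largest activation. It is a generic one-filter example with a ReLU activation, not the architecture of any surveyed paper; the function name, vectors and weights are assumptions.

```python
def conv1d_max_pool(word_vectors, kernel, bias=0.0):
    """Slide `kernel` over consecutive word vectors and max-pool the result.

    word_vectors: list of equal-length lists, one embedding per word.
    kernel: list of `width` weight vectors, same dimension as the embeddings.
    Returns the largest ReLU activation over all window positions.
    """
    width = len(kernel)
    if len(word_vectors) < width:
        raise ValueError("sequence shorter than kernel width")
    activations = []
    for start in range(len(word_vectors) - width + 1):
        total = bias
        for offset in range(width):
            vec = word_vectors[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(vec, weights))
        activations.append(max(0.0, total))  # ReLU
    return max(activations)  # max-over-time pooling

# Three toy 2-dimensional "word embeddings" and one filter of width 2.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
pooled = conv1d_max_pool(sentence, kernel)
```

A real CNN would learn many such filters and feed the pooled values to further layers; this sketch only shows the arithmetic of one filter.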

Semi-supervised learning
The writers in [29] argued that supervised techniques are not feasible in real-world applications. To alleviate this issue, they adopted a new semi-supervised method on Perverted Justice data, implementing a one-class SVM anomaly detection technique that does not need non-predatory samples for training. A semi-supervised approach was also adopted by the authors in [21], but on a selection of social networks. Using kernel fuzzy c-means clustering together with a fuzzy SVM algorithm, they augmented the training data samples to account for unclear or irrelevant features. For each comment, they made use of linguistic features such as tonality and swear words. They used capital letters to detect anger-like emotions, and they also looked at the presence of pronouns.
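To give a sense of what fuzzy clustering contributes, the following sketch computes standard fuzzy c-means membership degrees for a single point. It uses plain Euclidean distance and is therefore a simplification: the kernel variant mentioned above replaces this distance with a kernel-induced one. The function name and the toy coordinates are assumptions, not taken from [21].

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Return the fuzzy c-means membership of `point` in each cluster.

    m > 1 is the fuzzifier; memberships are non-negative and sum to 1.
    """
    distances = [math.dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for d_i in distances:
        denom = sum((d_i / d_k) ** exponent for d_k in distances)
        memberships.append(1.0 / denom)
    return memberships

# A point at distance 1 from the first center and 2 from the second.
u = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

Unlike hard clustering, each sample receives a graded membership in every cluster, which is what allows ambiguous samples to be down-weighted.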

Time series modelling (TSM)
The authors in [13] and [14] applied a time series modelling (TSM) approach to data extracted from Perverted Justice (PJ), in which predator posts (i.e. questions) were mapped to numerical labels. The authors in [13] used singular value decomposition (SVD) to reduce dimensionality. They analyzed the questions asked by each bully in each conversation and modelled them as a multivariate time series. In their later work [14], they changed the representation from time series data to symbolic strings. They were influenced by the multiple sequence alignment (MSA) method, which is widely used for analysing DNA sequences. They converted the signals using symbolic aggregate approximation (SAX), which transforms a signal into a string of symbols. The resulting sequences are fed as input to the MSA algorithm, which tries to reveal regions of similarity between the examined sequences and thus to extract hidden patterns from the original time series. The researchers characterized each predator's behavioural patterns and used them as input attributes to cluster the data set. In [15], the authors formulated cyberbullying identification as a sequential hypothesis testing problem. Their algorithm tests each feature sequentially, starting with the most informative, and then determines when to stop. As soon as it stops, a media session can be classified.
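The SAX step described above can be sketched in a few lines. This is a textbook version with a four-letter alphabet, offered under assumptions: the segment count, the alphabet and the input series are illustrative and do not come from [14].

```python
import statistics

# Breakpoints that split a standard normal distribution into
# four equal-probability regions (its quartiles).
BREAKPOINTS_4 = [-0.6745, 0.0, 0.6745]

def sax(series, n_segments, breakpoints=BREAKPOINTS_4, alphabet="abcd"):
    """Convert a numeric series into a SAX word of length n_segments."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # z-normalize; a constant series maps to all zeros
    normalized = [(x - mean) / std if std > 0 else 0.0 for x in series]
    seg_len = len(series) // n_segments
    word = []
    for i in range(n_segments):
        segment = normalized[i * seg_len:(i + 1) * seg_len]
        paa_value = sum(segment) / seg_len  # piecewise aggregate approximation
        # symbol index = number of breakpoints at or below the segment mean
        index = sum(paa_value >= b for b in breakpoints)
        word.append(alphabet[index])
    return "".join(word)

word = sax([1, 1, 2, 2, 8, 8, 9, 9], n_segments=4)
```

Strings produced this way can then be compared or aligned with sequence methods, which is the role MSA plays in the approach described above.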

Clustering
The writers in [16] and [17] used K-means clustering, and [17] also used deep learning, on Twitter to find traces of bullying. Eight clusters of tweets were found in [16], each representing a function or category of bullying-related words and grouping tweets with similar characteristics. The writers in [17] categorized messages as clean or abusive. The writers in [18], however, used a different technique, the Apriori algorithm, to analyze abusive tweets in the Malay language.
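For readers unfamiliar with K-means, a minimal version of the algorithm (Lloyd's iterations) is sketched below. The two-dimensional points stand in for tweet feature vectors; the feature representations used in [16] and [17] are not specified in this review, so the data and the choice of initial centers are purely illustrative.

```python
import math

def kmeans(points, initial_centers, n_iterations=10):
    """Run Lloyd's algorithm and return (centers, labels)."""
    centers = [list(c) for c in initial_centers]
    labels = [0] * len(points)
    for _ in range(n_iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, labels

# Toy feature vectors forming two obvious groups.
points = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
centers, labels = kmeans(points, initial_centers=[points[0], points[2]])
```

In practice the number of clusters (eight in [16]) has to be chosen in advance, and the result depends on the initial centers.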

A Common Pattern
A common pattern was observed across the surveyed work: specific techniques are required for each platform. For instance, a network such as Instagram features many complex images, and Japanese or Arabic data sets are more complicated for a machine to interpret; in such cases it is better to use deep learning algorithms to combat cyber-harassment. For Ask.fm and YouTube, both of which feature many informal words, unsupervised methods are likely to be used. Many authors chose semi-supervised approaches for multiple social networks; combining data sets from different social networks makes it harder for different evasion strategies to get past the system. Most writers chose auto-encoders for a site such as MySpace. Because of MySpace's unique nature, machine bugs could be identified, driving offenders to other social networking sites. Finally, for Twitter, many writers settled on clustering, because Twitter differs from other social networks in that text is limited in length and time. In addition, each social network needs to be considered separately from the others to classify bullies effectively, because clustering the networks together does not distinguish how one network is exploited from another, potentially making victims easier targets for the hacker or abuser.

Analysis of Results
This survey suggests that unsupervised machine learning methods can be useful in improving performance. Table 1 summarizes the results. The highest precision, 0.98, was obtained by unsupervised learning using lexical-syntactic analysis with matching rules as features. Among semi-supervised methods, the highest precision achieved was 0.73, which is still reasonable. The other techniques showed results ranging from 0.65 to 0.96. The table also reveals, however, that unsupervised techniques perform much better than previous results.
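Since the comparison above rests on precision, it may help to recall how precision and recall are computed. The sketch below uses made-up labels (1 for bullying, 0 for benign) on an imbalanced toy sample; none of the numbers come from the surveyed papers.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Imbalanced toy sample: 2 bullying messages among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

In this toy case every flagged message is truly bullying, yet half of the bullying messages are missed, which is why precision is best read alongside recall on imbalanced data.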

Conclusion
Detecting cyberbullying is a difficult task to manage and resolve, because internet offenders have many ways to manipulate the system and deliver malicious messages without being caught. Cyberbullying can be life-threatening: as the cases cited in this paper indicate, it can drive victims to suicide. It is an excruciating experience for anyone exposed to it, and its detection is therefore necessary. Non-supervised methods for cyberbullying prevention were discussed in this paper because such strategies have received little emphasis. Papers have clearly emphasized supervised learning techniques, and very few have tested non-supervised approaches, which leaves more room for research, and more room for harassers to manipulate victims. While supervised approaches have previously been dominant in the detection of cyberbullying, unsupervised approaches are increasingly gaining attention. It can also be said that supervised techniques do not deal with class imbalance, while unsupervised approaches can overcome it. The paper investigated recent research on non-supervised approaches to cyberbullying detection in text and suggests some areas for future consideration. It also highlights the gravity of cyber harassment, which underlines the need for more research on non-supervised techniques.

Gerard McKee is a graduate of the University of Manchester, UK, where he received both his BSc in Electronics and his PhD. He is a member of the IEEE, IET, ACM and AAAI and a Fellow of the British Computer Society. His research interests are in the general areas of robotics and artificial intelligence, with a particular focus on networked robotics, space robotics and human-robot interaction, and more recently on swarm robotics systems and high performance computing. Email: Gerard.mckee@bue.edu.eg

Article submitted 2020-03-13. Resubmitted 2020-04-27. Final acceptance 2020-04-28.
Final version published as submitted by the authors.