Detection of Depression Using Machine Learning Algorithms

csanthosh@kluniversity.in

Abstract: Online media outlets such as Facebook, Twitter, and Instagram have permanently changed how we live. People are now more connected than ever before, and they have developed sophisticated online identities. According to ongoing research, there is a link between excessive use of social media and depression. Depression is a mood disorder, characterized by sadness, loss, or anger that interferes with a person's day-to-day activities. It expresses itself in different ways in different people. It can disrupt daily routines, resulting in lost time and lower productivity. It can also affect relationships as well as some chronic conditions. It has become a serious illness in our generation, with the number of those affected increasing by the day. Some people can acknowledge that they are depressed, while others are entirely unaware of it. For the great majority, social media has evolved into a "diary," allowing them to share their mental


Introduction
The expansion of internet and communication technologies, particularly online social networks, has reshaped how people interact and communicate electronically. Facebook, Twitter, Instagram, and other social media platforms not only hold textual and multimedia information, but also allow users to express their feelings, emotions, and sentiments about a topic, subject, or issue online. On the one hand, this lets users of social networking sites openly and freely share and comment on any issue online; on the other hand, it allows health professionals to gain insight into what might be going on in the mind of someone who responded to a topic in a particular way. Machine learning techniques can help provide such insight by examining the patterns hidden in online communication and processing them to reveal the mental state (such as "happiness," "sadness," "anger," "anxiety," and "depression") of social network users. Furthermore, there is a growing body of studies addressing the role of social networks in areas such as relationship breakups, mental illness ("depression," "anxiety," "bipolar," etc.), smoking and drinking relapse, sexual harassment, and suicidal thoughts [1,2].

Literature survey
This section reviews depression, existing depression detection systems that employ a variety of ML algorithms, and known research gaps that the proposed project addresses. Depression is a mental health disorder that has become an increasingly common topic of conversation in the context of everyday health concerns [3]. A large number of people suffer from the negative impacts of depression, yet only a small percentage receive adequate treatment each year. Earlier work also investigated the idea of using social media to detect and assess signs of major depression in people. It measured behavioral attributes associated with social engagement, emotion, language and linguistic styles, ego network, and mentions of antidepressant medications in users' social media postings. Ignoring depression symptoms or neglecting to treat depression can have serious, even life-threatening, consequences [4]. Depression is produced by a complex mix of social, biological, and psychological factors. It can be caused by a variety of serious and complex conditions. Clinical depression and bipolar disorder are the two most common types of depression [5]. Through dataset exploration, machine learning helps discover interesting patterns and information. Previous depression research has relied on publicly accessible social media data. The researchers used emotional and linguistic patterns of word usage to conduct their research. The classification was also carried out using the SVM technique with various kernels [6].

Proposed methodology
Work-flow
─ Step 1: The dataset is fed to the model using the pandas library.
─ Step 2: Data visualization is done using the seaborn library.
─ Step 3: The re module is used. A regular expression (RE) describes a set of strings that match it; the functions in this module let you check whether a given string matches a given regular expression.
─ Step 4: The model is trained using the NLTK library so that it can interpret the input data and respond accordingly.
─ Step 5: Using the wordcloud library, the words associated with the depressed and non-depressed classes are visualized.
─ Step 6: Logistic Regression and multinomial Naive Bayes models are applied and their accuracy is checked.
─ Step 7: The output is checked by giving input data to the model.

Project execution
This project was implemented in Python in a Jupyter notebook. Notebook files are stored as JSON (JavaScript Object Notation), an open standard file and data-interchange format that stores and transmits data objects made up of attribute-value pairs and arrays as human-readable text.

Procedure
• Open the command prompt (press Windows+R, type cmd, and press Enter).
• Type jupyter notebook and press Enter.
• Jupyter Notebook opens in the default browser.
• Select the data file and upload it, then click New and choose Python 3.
• Load the libraries and the data, and run the code cells in order.

Libraries
NLTK. The Natural Language Toolkit is a collection of libraries and programs for statistical natural language processing. It is one of the most powerful NLP libraries, featuring packages that help machines process human language and respond appropriately [7].
Word cloud. A word cloud (or tag cloud) is a data visualization tool for text data in which words are drawn at varying sizes, with the size of each word representing its frequency or relevance. A word cloud can be used to emphasize important textual data points. Data from social networking websites is frequently analyzed using word clouds [8].
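The word sizes in a word cloud are driven by word counts. As a minimal sketch of that underlying counting step, using only the Python standard library rather than the wordcloud package itself, the frequencies could be computed as follows. The sample posts and the function name word_frequencies are illustrative assumptions, not data or code from this project.

```python
import re
from collections import Counter

def word_frequencies(texts):
    """Count how often each lowercase word occurs across a list of texts."""
    counts = Counter()
    for text in texts:
        # Treat runs of letters (and apostrophes) as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Hypothetical sample posts, for illustration only.
posts = ["I feel so tired and sad", "so sad today", "tired of everything"]
freqs = word_frequencies(posts)

# The most frequent words would be drawn largest in a word cloud.
print(freqs.most_common(3))
```

A mapping like freqs is the kind of input from which a word cloud can be drawn, one per class (depressed and not depressed).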
Pandas. Pandas is a widely used open source Python library for data science, data analysis, and machine learning tasks. It is built on top of NumPy, a library that supports multi-dimensional arrays. Its primary data structure is the DataFrame, a tabular structure that stores and manipulates data in rows of observations and columns of variables. In particular, pandas includes data structures and methods for manipulating numerical tables and time series [9].
Seaborn. Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for building attractive and informative data visualizations. Seaborn works easily with pandas DataFrames, and the resulting graphs can be readily customized [10].
Regular expression. A regular expression is a sequence of characters that uses a specialized syntax to match or locate other strings or sets of strings. The functions in Python's re module let you check whether a given string matches a regular expression (or whether a given regular expression matches a particular string, which comes down to the same thing). Regular expressions are widely used in the UNIX world [11].
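As an illustration of how the re module can be applied to tweets, the following is a minimal sketch of a cleaning step. The specific rules (removing URLs, @mentions, and non-letter characters) and the function name clean_tweet are assumptions made for this example; the source does not list the exact patterns used in the project.

```python
import re

def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace

# Hypothetical tweet, for illustration only.
cleaned = clean_tweet("Feeling LOW again... @friend see https://example.com #tired")
print(cleaned)
```

Cleaning before tokenization keeps links and user names out of the vocabulary that the classifiers later see.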

Logistic Regression Model (LRM)
• This type of analysis can help predict the likelihood of an event happening or a choice being made.
• A logistic model is used to model the probability of a certain class or event, such as win/lose or healthy/sick.
• It is a supervised learning classification algorithm used to predict the probability of a target variable.
Figure 1 shows the sigmoid (S-shaped) curve of the LRM.
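To make the probability output concrete, the following minimal sketch shows how a logistic model turns a weighted sum of features into a probability through the logistic (sigmoid) function and then into a class label. The weights, bias, features, and the 0.5 threshold are hypothetical values chosen for illustration; they are not parameters learned in this project.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued score to a value in (0, 1).

    Not hardened against overflow for very large negative z.
    """
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Hypothetical parameters and input, for illustration only.
weights = [1.5, -2.0]
bias = 0.5
p = predict_proba(weights, bias, [1.0, 0.0])   # score = 0.5 + 1.5 = 2.0

# Assumed decision rule: label 1 (positive class) when p >= 0.5.
label = 1 if p >= 0.5 else 0
print(round(p, 3), label)
```

Training a logistic regression model amounts to choosing the weights and bias so that these probabilities agree with the labeled examples as closely as possible.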

Naive bayes model
• The Naive Bayes classifier is one of the simplest and most effective classification algorithms; it helps in building fast ML models that can make quick predictions.
• Naive Bayes classification is a form of supervised learning.
• It was initially introduced for text categorization tasks.
• It is used for a wide variety of classification tasks, such as sentiment prediction.
• It is easy and fast to predict the class of the test data set. It also performs well in multi-class prediction.
Multinomial Naive Bayes model. The multinomial Naive Bayes algorithm is a probabilistic learning method that is mostly used in Natural Language Processing (NLP). Naive Bayes is a family of algorithms that share one common principle: each feature is assumed to be independent of every other feature, given the class.
The classifier is based on Bayes' theorem:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

where P(c|x) is the posterior probability of class c (the target) given predictor x (the attributes), P(c) is the prior probability of the class, P(x|c) is the likelihood, that is, the probability of the predictor given the class, and P(x) is the prior probability of the predictor.
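The following is a minimal from-scratch sketch of a multinomial Naive Bayes text classifier, intended only to show how the prior P(c) and the per-word likelihoods P(x|c) combine. It assumes add-one (Laplace) smoothing and works in log space; P(x) is omitted because it is the same for every class and does not change which class scores highest. The class name TinyMultinomialNB and the toy training data are hypothetical and are not the project's model or dataset.

```python
import math
from collections import Counter, defaultdict

class TinyMultinomialNB:
    """Small multinomial Naive Bayes over tokenized documents (illustrative only)."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha                       # smoothing strength (assumed: 1.0)
        self.class_counts = Counter(labels)      # documents per class, for P(c)
        self.word_counts = defaultdict(Counter)  # word counts per class, for P(x|c)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, tokens):
        """Return log P(c) + sum of log P(word | c) for every class c."""
        n_docs = sum(self.class_counts.values())
        scores = {}
        for c, n_c in self.class_counts.items():
            total = sum(self.word_counts[c].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(n_c / n_docs)
            for w in tokens:
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, tokens):
        """Return the class with the highest log score."""
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)

# Hypothetical toy training data, for illustration only.
docs = [["sad", "tired", "alone"], ["hopeless", "sad"],
        ["happy", "great", "day"], ["fun", "happy"]]
labels = ["depressed", "depressed", "not depressed", "not depressed"]
model = TinyMultinomialNB().fit(docs, labels)
print(model.predict(["so", "sad", "and", "alone"]))
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow on longer texts, and the smoothing term keeps a word that never occurred in one class from forcing that class's probability to zero.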

Flow chart
In this study, for the detection and processing of depression data received as Twitter posts, we first concentrated on four types of factors: emotional process, temporal process, linguistic style, and all three (emotional, temporal, and linguistic style) together. We then used supervised machine learning techniques to investigate each factor type separately. Decision tree, k-Nearest Neighbor, Support Vector Machine, and ensemble methods are considered appropriate classification approaches for each category. The flowchart of the work is shown in Figure 2.

[Figures: Output (1), Output (2)]

As the outputs above show, the model analyzed the given input data and predicted that the person is depressed. The model is able to predict whether a person is depressed or not depressed.
Case 2: Using the non-depressed data as input to the model.

[Figures: Input (1), Output (1), Output (2)]

As the outputs above show, the model gave the output as not depressed. The model predicts whether a person is depressed or not depressed based on the input data given to it.
For the 1000-tweet dataset categorized using the Naive Bayes algorithm, the data is accurately classified in 92.34 percent of the cases [12][13][14][15]. For the 1000-tweet dataset using Logistic Regression classification, the data is accurately classified in 92.34 percent of the cases [16].
For the dataset categorized using the Naive Bayes algorithm, the results show that 97.31% of the data has been accurately categorized [17]. For the 3000-tweet dataset using Logistic Regression classification, the results show that 97.31% of the data has been accurately categorized.
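The percentages above are classification accuracies. Assuming accuracy is taken as the share of tweets whose predicted label matches the true label (the source does not state the evaluation protocol or the train/test split), it could be computed as in the following sketch. The label lists are hypothetical and are not the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels: 1 = depressed, 0 = not depressed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
print(f"{accuracy(y_true, y_pred):.2%}")
```

Accuracy alone can be misleading when one class is much more common than the other, so per-class measures such as precision and recall are often reported alongside it.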

Conclusions
In this paper, we demonstrated the potential of using Twitter as a tool for assessing and detecting major depression among its users. Several research problems were outlined at the beginning of this paper to provide a clear picture of this work. These research problems are addressed by the analytics performed on the chosen dataset. The following is a synopsis of our findings: While we all experience mood swings, sadness, or low mood from time to time, some people experience these feelings on a regular basis, for long periods of time (weeks, months, or even years), and sometimes for no obvious reason. Depression is more than just a bad mood; it is a real illness that affects a person's physical and mental well-being. Depression can strike anyone at any time. Some periods or situations, however, make us more vulnerable to depression. Growing older, losing a loved one, starting a family, and retiring can all cause physical and mental changes that might contribute to depression in a small number of people.