Autism Spectrum Disorder Classification Using Deep Learning

The goal of this paper is to evaluate a deep learning algorithm for classifying people with Autism Spectrum Disorder (ASD). ASD is a developmental disability that causes significant communication, social, and behavioural challenges. People with autism are saddled with communication problems, difficulties in social interaction, and repetitive behaviours. Several methods have been used to distinguish ASD from non-ASD people. However, there is a need to explore more algorithms that can yield better classification performance. Recently, deep learning methods have significantly advanced the state of the art in a wide range of artificial intelligence tasks, including object detection, speech recognition, and machine translation. In this research, the convolutional neural network (CNN) is employed to classify ASD with a higher level of accuracy. The image data is pre-processed; the CNN algorithm is then applied to classify ASD and non-ASD, and the steps of implementing the CNN algorithm are clearly stated. Finally, the effectiveness of the algorithm is evaluated based on its accuracy. The support vector machine (SVM) is used for comparison. The CNN algorithm produces better results than the SVM algorithm, with an accuracy of 97.07%. In the future, different types of deep learning algorithms need to be applied, and different datasets can be tested with different hyper-parameters to produce more accurate ASD classifications.


Introduction
The human brain is a highly complex biological organ: the neural assemblies inside it synchronise and form functional connections that can be modelled as a network. These networks share features with networks from other biological and physical systems, and can therefore be characterised as complex networks (Jamal et al., 2014). ASD is a brain disorder with the following characteristic symptoms: difficulties in social communication; restricted, stereotypic, and repetitive patterns of behaviour; and a lack of interest in social activities (Zaky, 2017). Autism occurs in all socioeconomic, ethnic, and racial groups. The disorder, which appears to have a strong genetic component, is about 4 to 5 times less common among girls than boys. The symptoms of autism can be summarised as follows: repetitive behaviours; excessive interest in certain actions, such as moving or spinning objects; poor eye contact; little attention to listening to or looking at other people; difficulties in conversation; no response to their own names; repeating phrases that they hear; and problems in understanding other people's views (M., Prasath., & K., 2015).
Deep learning is a subfield of machine learning based on artificial neural networks. It uses large neural networks whose complex structure combines many non-linear transformations. According to LeCun (2015), deep learning has improved the performance of many recent artificial intelligence tasks, such as machine translation, speech recognition, and object detection. Deep learning allows computational models that consist of multiple processing layers to learn representations of data with multiple levels of abstraction. Besides that, the deep nature of these architectures helps to solve many complicated artificial intelligence problems (Bengio, 2009). Deep learning is good at discovering complex structures in high-dimensional data. Consequently, it is applicable to problems in many domains of government, business, and science.
In previous research, ASD and non-ASD subjects have been classified using features extracted from 128-channel EEG signals. That research found that the most frequently occurring synchrostate holds the most discriminative information, and that its measured quality index can be considered a useful biomarker for detecting autism (Jamal et al., 2014). Figure 1 shows the classification of ASD with fMRI data done by previous researchers.
Nielsen et al. (2013) studied multisite functional connectivity MRI classification of autism and concluded that classification across multiple sites has to accommodate additional sources of variance in subjects, scanning procedures, and equipment, in contrast to single-site datasets. That research used the NITRC dataset to classify autistic and non-autistic subjects according to brain connectivity measurements. According to Heinsfeld (2018), such variation adds noise to the brain imaging data, which makes it harder to extract from brain activation the biomarkers that can classify disease states. However, achieving reliable classification accuracy in spite of the noise introduced by different equipment and demographics shows promise for applying machine learning to clinical datasets for identifying mental disorders. According to Koyamada et al. (2015), Deep Neural Networks (DNN) can be used to examine brain states through measurable brain activities. Compared with supervised learning methods like Support Vector Machine and Linear Regression, which reached an average accuracy of 47.97%, deep models achieved better results, with an average accuracy of 50.74%. These figures suggest that deep learning algorithms can perform better than methods like Support Vector Machine and Linear Regression.
ASD is becoming more prevalent. According to Lord et al. (2018), ASD is more closely related to a genetic component than to other causes, and the term is used to describe early-appearing difficulties in social communication and repetitive sensorimotor behaviours. According to O'Reilly and Wicks (2016), autism affects occupational, social, and other vital areas of functioning in a person's life. Besides that, autism affects individuals throughout their lives and is frequently accompanied by other health issues, such as sleep disturbances, epilepsy, and gastrointestinal problems (National Institutes of Health, 2016). People with ASD are heterogeneous, and the symptoms of ASD can change over time (Lord et al., 2016). According to Plitt et al. (2015), assessments that use rs-fMRI data to classify ASD still fail to attain biomarker standards, and the issue was not resolved in previous studies.
In this research, ASD is classified using a deep learning algorithm based on the neuroimaging data of the patients. The neuroimaging data are retrieved from the Neuroimaging Tools and Resources Collaboratory (NITRC) website. The aims of this research are to pre-process the ASD data, to illustrate how deep learning can be applied to ASD classification, and to assess the effectiveness of the algorithm based on accuracy. The identification of ASD using deep learning has seldom been investigated by previous researchers. Therefore, this study helps to fill the research gap and furthers the understanding of ASD and deep learning algorithms. It also illustrates the applicability of deep learning methods in neuroscience research projects. The rest of this paper discusses the methods used in this study, the findings and analysis, and the conclusions.

Methods
This section discusses the general framework for using the deep learning method to identify ASD. A general framework is an analytical tool that can be adapted to different settings and applied wherever a high-level description is needed. The framework consists of several phases. The first phase is data pre-processing, which prepares the data for implementation. The second phase is feature extraction and training, followed by the third, which is testing. Then, the output is generated and shown in a statistical graph. Lastly, the performance analysis of the algorithm is carried out. Figure 2 shows the general framework of the deep learning method to identify ASD.
Data pre-processing is a procedure that transforms the original data into a format the algorithm can use. In this research, the image data are retrieved from the Neuroimaging Tools & Resources Collaboratory (NITRC) website. NITRC is an award-winning, free, web-based resource that provides comprehensive information on an ever-expanding scope of neuroinformatics software and data. NITRC also gives unrestricted access to the stored data and enables pay-per-use, cloud-based access to extensive computing power. Figure 3 shows the anatomical range of the rs-fMRI data. After the data are collected, the image data are separated into training and testing datasets in a ratio of 8:2, that is, about 80% of the data for training and 20% for testing. The next task is to normalise the data in a Jupyter Notebook. To normalise the image data, an image data generator is imported from the Keras library and used to rescale the data. The rescale factor is 1/255, so every pixel value is mapped from the range [0, 255] to the range [0, 1].
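The 8:2 split and the 1/255 rescaling can be illustrated with a minimal sketch. It uses plain NumPy rather than the Keras image data generator mentioned in the text, so that it runs without a deep learning installation; the function name `split_and_rescale`, the random seed, and the tiny synthetic images are all assumptions made for illustration and are not taken from the study.

```python
import numpy as np

def split_and_rescale(images, labels, train_fraction=0.8, seed=0):
    """Shuffle the samples, split them 80/20, and rescale pixels to [0, 1]."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_train = int(round(train_fraction * len(images)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    # Multiplying by 1/255 maps the pixel range [0, 255] onto [0, 1].
    scaled = images.astype(np.float32) / 255.0
    return (scaled[train_idx], labels[train_idx]), (scaled[test_idx], labels[test_idx])

# Hypothetical stand-in data: 10 greyscale "images" of 4x4 pixels.
images = np.random.default_rng(1).integers(0, 256, size=(10, 4, 4), dtype=np.uint8)
labels = np.array([0, 1] * 5)

(train_x, train_y), (test_x, test_y) = split_and_rescale(images, labels)
print(train_x.shape, test_x.shape)
```

Shuffling before the split is a design choice of this sketch; the paper does not state how the samples were assigned to the two sets.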
After data pre-processing, the CNN (Convolutional Neural Network) algorithm is applied to identify ASD. A CNN is similar to a traditional Artificial Neural Network (ANN): both are made up of neurons that self-optimise through learning (Shea, 2015). A CNN contains three types of layers: convolutional, pooling, and fully connected layers. To detect ASD from resting-state fMRI brain images, the Keras library is used. Keras is a deep learning library that provides a high-level neural network API and can run on top of TensorFlow, CNTK, or Theano. To get the best accuracy, several experiments that vary the hyper-parameters are carried out in order to choose the best values for determining the network structure. This study conducts experiments that vary the number of epochs and the batch size. The findings are shown in tables and a statistical graph for comparison, and the best values are chosen. To assess the effectiveness of the CNN algorithm in detecting ASD, a comparative study compares the accuracy of the CNN algorithm with that of another standard algorithm. The results of both algorithms are shown clearly for evaluation.
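To make the three layer types concrete, the following sketch runs one forward pass through a convolutional layer, a pooling layer, and a fully connected layer using NumPy only. It is an illustration of the layer types, not the network used in this study: the image size, kernel size, random weights, and helper names (`conv2d_valid`, `max_pool2x2`, `dense`) are all assumptions.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over a 2-D image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so this is strictly a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h = feature_map.shape[0] // 2 * 2
    w = feature_map.shape[1] // 2 * 2
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def dense(vector, weights, bias):
    """Fully connected layer: every input feeds every output unit."""
    return vector @ weights + bias

rng = np.random.default_rng(0)
image = rng.random((6, 6))                                # hypothetical 6x6 input
kernel = rng.standard_normal((3, 3))                      # hypothetical 3x3 filter
features = np.maximum(conv2d_valid(image, kernel), 0.0)   # convolution + ReLU
pooled = max_pool2x2(features)                            # pooling
scores = dense(pooled.ravel(), rng.standard_normal((4, 2)), np.zeros(2))
print(features.shape, pooled.shape, scores.shape)
```

In practice a library such as Keras provides these layers ready-made and learns the kernel and weight values during training; the explicit loops here are only meant to show what each layer computes.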
Hyper-parameters are the variables that determine the network structure and how the network is trained. Bengio (2012) explained that the choice of hyper-parameter values is essentially similar to the problem of model selection, i.e. given a family or group of learning algorithms, how do we choose the most appropriate one within the group? For a learning algorithm Z, we define a hyper-parameter as a variable that is set before Z is actually applied to the data, and that is not directly selected by the learning algorithm itself.
In addition, there is no one-stop solution for working out how hyper-parameters should be changed to achieve a better loss; it is normally done through trial and error. When defining a machine learning model's architecture, finding an optimal one is usually not obvious (Bergstra & Bengio, 2012).
Therefore, in our study, these hyper-parameters are set based on the best results obtained after each experiment.
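One way to organise such trial-and-error experiments is sketched below. The candidate values, the function names, and in particular `placeholder_evaluate` are hypothetical: in a real experiment the evaluation function would train the network with the given settings and return its accuracy, whereas here an arbitrary formula stands in so that the sketch runs on its own. The scores it produces are not results from this study.

```python
from itertools import product

def search_hyper_parameters(epoch_options, batch_size_options, evaluate):
    """Try every (epochs, batch_size) pair and return the best pair and all scores."""
    scores = {}
    for epochs, batch_size in product(epoch_options, batch_size_options):
        scores[(epochs, batch_size)] = evaluate(epochs, batch_size)
    best = max(scores, key=scores.get)
    return best, scores

def placeholder_evaluate(epochs, batch_size):
    # Hypothetical stand-in for "train the model and return its accuracy".
    # The formula is arbitrary and is NOT data from the paper.
    return 0.9 - 0.01 * abs(epochs - 10) - 0.001 * abs(batch_size - 64)

best, scores = search_hyper_parameters([5, 10, 20], [16, 64, 128], placeholder_evaluate)
print(best)
```

The sketch tries every combination of the two settings. Varying one hyper-parameter at a time while holding the other fixed is a cheaper alternative that needs fewer training runs.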
The comparison of the number of epochs and batch size is shown in Table 1 and Table 2. In this study, 15 is the best number of epochs, and it is used for the CNN algorithm to get the best performance in detecting ASD. According to Table 2, the best batch size is 32, with the highest accuracy of 96.97%. Thus, a batch size of 32 is used for the CNN algorithm to obtain the best accuracy in detecting ASD.
This research uses 'ReLU' and 'Softplus' as the activation functions. 'ReLU', or Rectified Linear Unit, is the most commonly used activation function in neural networks, especially in CNNs. The 'ReLU' function is f(x) = max(0, x). Usually, 'ReLU' is applied elementwise to the output of another function, such as a matrix-vector product. In the fully connected layer, the activation function is 'Softplus', which is used in the output layer when making predictions. In this study, ASD and non-ASD are classified as the output after the fully connected layer. Moreover, 'Adam' is used as the optimiser. 'Adam' is an adaptive learning rate optimisation algorithm designed specifically for training deep neural networks. It computes individual learning rates for different parameters, using estimates of the first and second moments of the gradient to adapt the learning rate for each weight of the neural network. In addition, the number of epochs is the number of times the whole training dataset is shown to the network during training. In this study, the number of epochs is set at 15, and the number of steps per epoch is 1000. The last thing to discuss is the batch size, which is the number of training examples used in one iteration. The batch size set in this research is 32, which is a good default value. Fig. 4 gives a schematic representation of the proposed CNN algorithm and is used to explain its implementation.
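The two activation functions and a single Adam update can be written out in a few lines. This is a sketch under stated assumptions: Softplus is taken in its usual form log(1 + exp(x)), and the Adam settings (learning rate 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the commonly cited defaults, not values reported in this study. The toy objective (w - 3)^2 and the learning rate of 0.1 used in the loop are likewise chosen only to show the update rule at work.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def softplus(x):
    """Softplus: log(1 + exp(x)), computed in a numerically stable way."""
    return np.logaddexp(0.0, x)

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for weight(s) w at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-weight adaptive step
    return w, m, v

# Toy example (an assumption for illustration): minimise (w - 3)^2 from w = 0.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)

print(relu(np.array([-2.0, 0.0, 1.5])), softplus(0.0), w)
```

Dividing by the square root of the second-moment estimate is what gives each weight its own effective learning rate: weights with consistently large gradients take proportionally smaller steps.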
According to Figure 4, the CNN algorithm begins with importing the model: the Keras library and its packages are imported. The next step is creating the convolutional layers, which take in the input and process it to generate the output. Some of the hyper-parameters are set while the convolutional layers are created. After that, the data are loaded, read, and pre-processed by setting the shear range, zoom range, and so on. The 'ReLU' activation is used when the algorithm is trained and tested, and the accuracy is generated as the output. Accuracy is the overall proportion of correct classifications. The statistical significance of this overall accuracy can be assessed using non-parametric tests, for example permutation testing, which estimates how likely it is that the observed accuracy would be obtained by chance (Vieira et al., 2017). Table 3 shows the results for the CNN algorithm. Based on the results, the highest accuracy is 97.07%, at epoch 13. The accuracy increases from epoch 1 (86.31%) to epoch 15 (96.82%). On the other hand, the loss decreases from the first epoch to the last, from 0.0996 to 0.0194. The loss function is a significant part of artificial neural networks; it measures the inconsistency between the actual label and the predicted value. It is a non-negative value, and a decrease in the loss goes along with an increase in the robustness of the model.
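The accuracy measure and the idea behind permutation testing can be sketched as follows. This is a simplified variant, stated as an assumption: it shuffles the true labels against a fixed set of predictions, whereas a full permutation test would usually retrain the classifier on each shuffled labelling. The labels, predictions, and function names are hypothetical and are not data from this study.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Overall proportion of correct classifications."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels reach the observed accuracy."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    observed = accuracy(y_true, y_pred)
    hits = 0
    for _ in range(n_permutations):
        if accuracy(rng.permutation(y_true), y_pred) >= observed:
            hits += 1
    # Adding 1 to both counts includes the observed labelling itself.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical labels: 20 samples, 2 of them misclassified.
y_true = np.array([0, 1] * 10)
y_pred = y_true.copy()
y_pred[0], y_pred[1] = 1, 0

print(accuracy(y_true, y_pred), permutation_p_value(y_true, y_pred))
```

A small p-value means that an accuracy as high as the observed one is rarely reached when the link between labels and predictions is broken, which is the sense in which the result is unlikely to be due to chance.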
In this study, the same dataset is also applied to the SVM algorithm. The accuracy of ASD classification using the SVM algorithm is presented in the classification report in Table 4. Based on the classification report, class 1 has a precision of 0.63 and a recall of 0.61. For class 2, the precision is about 0.69 and the recall is 0.71. According to the confusion matrix, the accuracy is 0.6636, or 66.36%. This shows that SVM does not perform very well in classifying ASD and non-ASD using the brain imaging dataset. To evaluate the effectiveness of both algorithms, a statistical graph is created to compare their accuracy. The statistical graph is shown in Figure 5.
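For readers unfamiliar with how precision, recall, and accuracy follow from a confusion matrix, a minimal sketch is given here. The counts in `confusion` are hypothetical and chosen for easy arithmetic; they are not the confusion matrix of this study, which is not reproduced in the text. The function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(confusion):
    """Per-class precision and recall, plus overall accuracy.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n_classes))
    per_class = {}
    for c in range(n_classes):
        predicted_as_c = sum(confusion[i][c] for i in range(n_classes))
        actually_c = sum(confusion[c])
        per_class[c] = {
            "precision": confusion[c][c] / predicted_as_c if predicted_as_c else 0.0,
            "recall": confusion[c][c] / actually_c if actually_c else 0.0,
        }
    return per_class, correct / total

# Hypothetical counts for two classes (not the paper's data).
confusion = [[40, 10],
             [20, 30]]
per_class, overall_accuracy = classification_metrics(confusion)
print(per_class, overall_accuracy)
```

Precision asks how many of the samples assigned to a class truly belong to it, while recall asks how many of the samples that truly belong to a class were found; accuracy pools both classes into a single proportion.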

Fig. 5. Accuracy performance of the CNN and SVM algorithms
The highest accuracy achieved by the CNN algorithm is 97.07%, which is 30.71 percentage points higher than that of the SVM algorithm (66.36%). This indicates that the CNN algorithm, a deep learning method, classifies ASD better than the SVM algorithm on this dataset. For further comparison, reference is made to several research papers in the literature that take different approaches to classifying ASD using the ABIDE dataset. Their findings have been added to Table 5. Our proposed algorithm achieves the highest accuracy among the reported algorithms.  (Liu et al., 2017). In the field of ASD classification, Bi et al. (2018) used multiple SVMs to distinguish ASD patients from typical controls (TC).
One of the limitations of machine learning methods like SVM is the complexity of the models (Hyde, 2019), which can interpret the data with a low rate of accuracy. On the other hand, using a deep learning method such as a CNN for image classification does not guarantee better performance than other machine learning methods, as in the cases of (Wang et al.

Conclusion
The main goal of this article is to classify ASD using a deep learning algorithm. The results show that deep learning performs well in classifying ASD. The image data are pre-processed in this sequence: collecting the data, separating the data, and normalising the dataset. The CNN algorithm is then applied to the pre-processed dataset in a Jupyter Notebook to classify ASD and non-ASD, and the steps of implementing the CNN algorithm are clearly stated. Lastly, the effectiveness of the algorithm, based on its accuracy, is evaluated using a statistical graph and a classification report. This study achieved a highest accuracy of 97.07%, which is higher than the rates reported in previous studies, including the study of Bi et al. (2018), which reported 96.15%. Using more validation techniques, such as cross-validation, is recommended to test the performance of the algorithms. In addition, more comparisons with several other machine learning algorithms and various performance metrics will be done for further validation. Finally, it is good practice to include statistical hypothesis testing along with expert analysis. This research supports deep learning as a preferred method for classifying ASD and non-ASD.