Deep Learning in Retinal Image Segmentation and Feature Extraction: A Review

Hqenam.unimas@gmail.com
Abstract— Image recognition and understanding is a prominent subfield of Artificial Intelligence (AI). In practice, retinal image data are high-dimensional, which leads to very large datasets. Because morphological retinal image datasets can be analyzed in an expansive and non-invasive way, AI, and more precisely Deep Learning (DL), methods are facilitating the development of intelligent retinal image analysis tools. A widely used DL technique, the Convolutional Neural Network (CNN), has shown remarkable efficiency in identifying, localizing, and quantifying the complex and hierarchical image features that are responsible for severe cardiovascular diseases. Different deep layered CNN architectures such as LeNet, AlexNet, and ResNet have been developed by exploiting CNN morphology. This wide variety of CNN structures can iteratively learn the complex data structures of different datasets through supervised or unsupervised learning and independently perform detailed analysis for feature recognition to diagnose threatening cardiovascular diseases. In modern ophthalmic practice, DL-based automated methods are being used in retinopathy screening and grading, and in identifying and quantifying pathological features to guide further therapeutic approaches, offering wide potential to reduce the complexity of ophthalmic systems. In this review, the recent advances of DL technologies in retinal image segmentation and feature extraction are extensively discussed. To accomplish this study, the pertinent materials were extracted from different publicly available databases and online sources using relevant keywords including retinal imaging, artificial intelligence, deep learning, and retinal database. For the associated publications the reference lists of selected articles were further


Introduction
In recent years, retinal imaging has drawn considerable attention from ophthalmologists and scientists dedicated to developing novel diagnostic tools, as retinal imaging is important for predicting cardiovascular diseases. The large-scale acquisition of retinal images has produced a volume of data that challenges clinicians to analyze and manage it [1]. To meet this big data challenge, developing intelligent tools has become crucially important for the efficient and adequate management of such data [2], [3]. Moreover, most existing methods for retinal image analysis are manual, time-consuming, and require the involvement of many individuals. The development of automated retinal image analysis methods is therefore highly significant for ophthalmic diagnostic systems to detect severe cardiovascular diseases such as Diabetic Retinopathy (DR) and Hypertensive Retinopathy (HR). AI technology, especially DL, is being employed widely to develop smart tools for diagnosing severe disease through retinal image analysis. In this regard, promising AI tools are supporting ophthalmologists with high-quality analysis and effective management of retinal image data in clinical practice.
One of the most important sub-fields of biomedical engineering is the analysis of fundus retinal images, which has become key to diagnosing life-threatening cardiovascular diseases such as DR, HR, and stroke because of the simple and non-invasive visualization of the retinal microvascular structure [4]-[8]. These cardiovascular diseases are related to changes in the microvasculature of the human retina [9]. Several studies report a close relationship between ocular funduscopic abnormalities and acute stroke even when blood pressure and other vascular risk factors are stable [10], [11]. Damage to the retinal arterioles and venules causes HR, which can lead to blindness [12]. HR and the risk of stroke are closely associated [13]. Some notable features of the retinal microvascular structure, such as Cotton Wool Spots (CWS), microaneurysms, hard exudates, focal retinal arteriolar narrowing, changes in vessel diameter and bifurcation angle, and arteriovenous nicking, are found to be associated with diabetes, hypertension, acute stroke, and stroke mortality even in people who are free from other stroke risk factors [14]-[16].
In this paper, a brief overview of the latest DL-based approaches for retinal image analysis, segmentation, and feature extraction is presented. The large-scale acquisition of retinal image data continues to pose a big data challenge to ophthalmic practitioners. The implementation of DL techniques in retinal imaging is still in its infancy and needs extensive empirical exploration to create novel automated methods for retinal image analysis and for dealing with large amounts of retinal image data. This paper is a preliminary step towards our future work on developing a new DL algorithm that optimizes accurate feature detection of the retinal microvasculature in retinal images. The subsequent sections of this paper present 1) the background of DL, 2) the latest advancements of DL and CNN technologies in biomedical imaging and ophthalmology, and 3) the contributions, performances, limitations, and challenges of recently introduced DL algorithms for retinal image segmentation and feature detection. A concise summary follows the empirical discussion of the existing DL methods for retinal imaging and the potential scope of this research area.

Deep learning (DL)
In the recent revolution of computer science, especially in AI research, DL has produced advancements that are impacting a wide range of scientific areas such as signal and information processing and the development of AI machines for healthcare. DL refers to a class of algorithms that exploit multiple layers to process non-linear information for analyzing and classifying large data patterns and for extracting and transforming data features in a supervised or unsupervised manner. DL models consist of a deep architecture, more generally a Deep Neural Network (DNN), built on the Artificial Neural Network (ANN) technique. Such a model can analyze a hierarchy of features in which higher-level concepts are defined from lower-level concepts and vice versa [17]-[19]. Figure 1 demonstrates the fundamental working procedure of the traditional Machine Learning (ML) algorithm and the DL algorithm for retinal image processing. The Convolutional Neural Network (CNN) is a specific class of Neural Network (NN) that imitates the processing of the visual cortex. In general, a CNN is a form of feed-forward ANN that is capable of learning complex hierarchies of features and patterns automatically and adaptively using the backpropagation technique. A CNN uses fewer parameters than a fully connected ANN, in part because its pooling and non-linearity layers have no learnable parameters. The hierarchical feature extraction ability of a CNN allows it to extract features at different levels, such as high-, mid-, and low-level features [20]. Generally, CNNs are trained through supervised learning using stochastic gradient descent with the backpropagation algorithm. To optimize the performance of CNN models, regulatory units such as batch normalization and dropout are also integrated into different learning stages [20]. Batch normalization rescales the feature-map values to zero mean and unit variance, which counteracts the problem known as internal covariate shift. Batch normalization also acts as a regularizing factor and eases the flow of gradients.
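As a minimal illustration of the batch normalization step described above, the following sketch normalizes the values of a single feature channel (flattened over the batch and spatial positions) to zero mean and unit variance and then applies a learnable scale and shift. The function name `batch_norm`, the parameters `gamma`, `beta`, and `eps`, and the sample values are assumptions made for illustration; they are not taken from any of the reviewed works.

```python
import math
from statistics import fmean, pvariance


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature channel to zero mean and unit variance.

    values: activations of a single channel across the mini-batch (hypothetical data).
    gamma, beta: learnable scale and shift (fixed here for illustration).
    eps: small constant that avoids division by zero.
    """
    mean = fmean(values)
    var = pvariance(values, mu=mean)  # population variance over the mini-batch
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


channel = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(channel)
print([round(v, 3) for v in normalized])
```

In a full CNN the same computation is applied independently to every channel, and `gamma` and `beta` are learned during training.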
The fundamental differences between the traditional Multilayer Perceptron (MLP) and the CNN are the weight sharing and limited connectivity properties of CNN models. Improvements to CNN models are classified as structural reformulation, parameter optimization, and regularization. "LeNet" and "AlexNet" are said to be the most popular CNN configurations [21], [22]. Following the introduction of ResNet [23] to train deep CNNs, other robust CNN models were introduced such as ResNet [24], WideResNet [25], Inception-ResNet [26] and Pyramidal-ResNet [27]. This advancement also led to the idea of integrating attention-based information processing, channel boosting, and spatial and channel-wise exploitation in CNN models [20]. As a powerful data processing tool, CNN is being employed extensively in medical imaging, especially in eye image processing tasks.

DL in ophthalmology
The ophthalmic diagnostic system is mostly dependent on eye image analysis. Human retinal images can be analyzed in a fast and non-invasive manner using DL to extract, localize, and quantify the pathological features responsible for different retinal diseases [28], [29]. Most of the recently developed DL-based retinal image analysis algorithms have been evaluated on public retinal datasets. Some of these public datasets are Digital Retinal Images for Vessel Extraction (DRIVE) [30], Structured Analysis of the Retina (STARE) [31], Child Heart and Health Study in England (CHASE_DB1) [32], Kaggle, and Messidor [33]. These datasets contain images from healthy individuals as well as pathological images from retinopathy patients. Generally, Sensitivity (Se), Specificity (Sp), and Accuracy (Acc) are used as the metrics to evaluate the performance of retinal vessel segmentation and feature detection methods. For a visual understanding of the quantitative measurements, a graphical representation, the Receiver Operating Characteristic (ROC) curve, is used. Generally, the ROC curve plots Se against the False Positive (Fp) fraction for different threshold values. The Area Under the ROC Curve (AUC) is also used to evaluate a method's performance, where a value of 1 corresponds to a perfect predictor [32], [34]-[37].
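The metrics named above can be made concrete with a short sketch. It computes Se, Sp, and Acc from binary labels and predictions, and estimates the AUC with the standard rank-based formulation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, which equals the area under the ROC curve). The function names, the 0.5 threshold, and the toy labels and scores are illustrative assumptions, not data from any cited study.

```python
def se_sp_acc(truth, pred):
    """Return (sensitivity, specificity, accuracy) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(truth)


def roc_auc(truth, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = vessel pixel) and classifier scores.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

se, sp, acc = se_sp_acc(truth, pred)
auc = roc_auc(truth, scores)
print(f"Se={se:.3f} Sp={sp:.3f} Acc={acc:.3f} AUC={auc:.3f}")
```

Because vessel pixels are usually a small minority of a fundus image, Acc alone can look high even when Se is poor, which is one reason the reviewed works report Se, Sp, and AUC alongside it.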
The recently introduced DL algorithms for ophthalmic disease detection can be categorized into lesion-based and image-based (black-box) detection systems. To train a lesion-based system, previously known features such as haemorrhages, microaneurysms, and exudates are given to the DL model as input. The black-box models are trained with manually graded fundus photographs [38]. Grassmann et al. (2018) developed a DL model integrating six different CNN architectures, AlexNet, Visual Geometry Group (VGG), GoogLeNet, Inception-V3, Inception-ResNet-V2, and ResNet, to classify Age-related Macular Degeneration (AMD) [39]. Their model was evaluated on the Cooperative Health Research in the Region of Augsburg (KORA) dataset and showed promising accuracy in AMD classification, obtaining 84.20% Se and 94.30% Acc. The CNN model of [40] [41]. Although the performance of this DL system was thought to be acceptable to clinicians, there was uncertainty about integrating it into ophthalmic diagnostic tools. This is because the dataset used for training was not completely graded by experienced ophthalmologists and macular edema had not been identified appropriately. Retinal lesions such as microaneurysms and haemorrhages were not considered in the analysis, although the system employed multiple levels of representation for learning. This black-box issue can make clinicians hesitant to integrate the system into clinical practice [41].
A DL algorithm reported by [42] was developed using the CNN architecture Inception-V3 to detect DR by analyzing retinal fundus images. [42] considered macular edema as the feature representing DR, without training the network on DR features such as microaneurysms and haemorrhages. The distributed stochastic gradient descent algorithm was used to train the network, and batch normalization was combined with weights pre-trained on the ImageNet dataset to speed up training. This model had an AUC of 0.991 and 0.990 for EyePACS-1 and Messidor-2 respectively. It obtained 90.3% Se and 98.10% Sp for EyePACS-1, and 87% Se and 98.50% Sp for Messidor-2 at the first operating point, while 97.50% Se and 93.40% Sp for EyePACS-1, and 96.10% Se and 93.90% Sp for Messidor-2 were obtained at the second operating point [42].
A DL model introduced by [43] employed the CNN architectures AlexNet and VGG to detect DR lesions such as haemorrhages, exudates, and neovascularization, as well as the anatomy of the retinal microvasculature, from retinal images. The network was trained with samples containing the lesions to be detected, which were extracted from DR patients and then annotated manually by several experts. The AUC of this DL model was 0.980, and it obtained 96.8% Se and 87% Sp [43]. Another DL algorithm proposed by [44] employed a customized CNN to detect DR and achieved 0.97 AUC with 94% Se and 98% Sp on 5-fold cross-validation using a private dataset. Takahashi et al. modified the GoogLeNet DL network for grading DR from fundus images [45]. Burlina et al. introduced a Deep CNN (DCNN) algorithm to detect AMD from fundus images, applied it to a 2-class classification problem, and obtained accuracy ranging from 88.4% to 91.6% and AUC ranging from 0.94 to 0.96 [46]. Zhao et al. proposed an automatic patch- and image-based CNN model that can detect branch retinal vein occlusion from fundus images and reported 97% accuracy [47].
Schlegl et al. developed an automated method to detect Intraretinal Cystoid Fluid (IRC) and Subretinal Fluid, and Grassmann et al. developed an algorithm to predict the severity of age-related macular degeneration based on DL [48], [49]. A method to diagnose DR was developed by [50] following the CNN architecture; it can classify microaneurysms, exudates, and haemorrhages in retinal images. DL-based methods were also proposed for retinal vessel segmentation and feature detection [51] and for cardiovascular risk factor prediction from retinal images [52]. Jiewei Jiang et al. developed an ophthalmic disease diagnosis method employing a deep residual CNN classifier [53]. An automated microaneurysm detection method was proposed by [54]. Niemeijer et al. developed an automated ML-based system to detect CWS and differentiate them from drusen in colour images collected from diabetic patients [40].

DL in retinal image segmentation and feature detection
Segmentation of the retinal image is a crucial step in ophthalmic image analysis, as the output of this step is used in further analysis to extract qualitative and quantitative features. The recent advancement of DL offers a platform for developing DL-based automated retinal image segmentation algorithms in combination with conventional image processing. Table 1 gives an overview of recently developed DL-based retinal image segmentation methods. In Table 1, columns 2 and 3 give the authors of the developed methods and the types of methods applied, respectively. Columns 4 and 5 show the datasets used for performance evaluation and the results of the developed DL methods, respectively. Table 1 shows that most of the newly developed DL-based retinal image segmentation methods followed supervised learning, while [55] and [56] followed the unsupervised learning approach. [57] and [58] proposed retinal image segmentation methods based on a deep max-pooling CNN utilizing a Graphics Processing Unit (GPU) and on an Extreme Learning Machine (ELM), respectively. [59] named their retinal vessel segmentation model Deep-Vessel; it combines a multi-scale and multi-layered CNN with side output layers that are responsible for learning rich hierarchical representations. [59] also integrated a Conditional Random Field (CRF) that models long-range interactions between pixels. Dense U-net was introduced as a model for semantic segmentation, and [60] developed and trained a Dense U-net model following an image patch-based technique for retinal image segmentation. To train the network, patches were obtained by a random extraction strategy, and the test images were also divided into patches to test the model. At the output end of the network, the test patches predicted by the trained model were reassembled using a sequential reconstruction strategy to generate the segmented output image from the overlapping patches [60].
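The patch-based workflow described for [60] can be illustrated with a small sketch: a test image is divided into overlapping patches, each patch would be passed through a trained network, and the per-patch predictions are merged back into a full-size map by averaging wherever patches overlap. This is a generic sliding-window illustration under stated assumptions, not the implementation of [60]; the patch size, stride, and function names are hypothetical, and the network is replaced by an identity step so that the example stays self-contained.

```python
def extract_patches(image, size, stride):
    """Slide a size x size window over a 2D list and return (row, col, patch) tuples."""
    height, width = len(image), len(image[0])
    patches = []
    for r in range(0, height - size + 1, stride):
        for c in range(0, width - size + 1, stride):
            patch = [row[c:c + size] for row in image[r:r + size]]
            patches.append((r, c, patch))
    return patches


def reconstruct(patches, height, width):
    """Merge (possibly overlapping) patches by averaging the values at each pixel."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for r, c, patch in patches:
        for i, row in enumerate(patch):
            for j, value in enumerate(row):
                total[r + i][c + j] += value
                count[r + i][c + j] += 1
    return [
        [total[i][j] / count[i][j] if count[i][j] else 0.0 for j in range(width)]
        for i in range(height)
    ]


# Hypothetical 6 x 6 "image"; a real fundus image would be far larger.
image = [[i * 6 + j for j in range(6)] for i in range(6)]
patches = extract_patches(image, size=4, stride=2)

# Placeholder for the trained model: each patch is returned unchanged.
predicted = [(r, c, patch) for r, c, patch in patches]
rebuilt = reconstruct(predicted, height=6, width=6)
```

With a stride smaller than the patch size, interior pixels are predicted several times; averaging those predictions is one common way to reduce seams at patch borders.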
A DNN incorporating multilevel Deep Supervision (DS) layers was introduced by [61], and [62] developed a model for arteriovenous (AV) classification and retinal microvasculature segmentation based on an optimized deep CNN. Two different DNN models based on supervised learning were proposed by [63] and [34], while [56] proposed an unsupervised learning-based DNN for the detection of retinal blood vessels. Liskowski and Krawiec [63] used global contrast normalization and zero-phase whitening for data pre-processing and geometric transformations and gamma corrections for data augmentation, while [56] combined denoising auto-encoders (DAE) and Random Forest (RF) in their method. Hajabdollahi et al. [64] proposed a CNN approach for retinal image segmentation that prunes the convolutional layers and quantizes the fully connected layers to simplify the network. Alom et al. [65] developed semantic segmentation methods, Recurrent CNN (RCNN) and Recurrent Residual CNN (RRCNN), based on U-Net, and [66] used a rotation operation for data augmentation and prediction in their Fully CNN approach. A multilevel and multiscale deeply supervised CNN was developed by [37], who used short connections to transfer semantic information between low and high levels in both directions. Lahiri et al. [55] proposed an unsupervised Deep Neural Ensemble Network, and [36] developed a three-stage DL model that segments thick and thin vessels separately to avoid the issue of the imbalanced ratio of pixels in the input image space. S. Wang et al. [67] combined CNN and RF to design their retinal vessel segmentation model, where the CNN served as the trainable hierarchical feature extractor and the RF as the trainable classifier. A supervised learning-based model employing both classification and regression tree (CART) and AdaBoost was proposed by [35]. A deeply supervised CNN developed by [32] showed robustness in accurate segmentation and faster processing speed.
To validate their findings, a cross-training experiment was carried out and showed that their proposed model obtained better performance.
Table 1 shows that the recently developed retinal image segmentation methods take the form of DNNs modified using the CNN technique. U-net, which was developed as a semantic segmentation method for medical image segmentation, was used in the models of [60] and [65]. Most of the retinal image segmentation algorithms listed in Table 1 were evaluated on DRIVE, STARE, and CHASE_DB1, while two other algorithms used the AVRDB and AV-Classification datasets for their performance evaluation. Among the listed algorithms, [59], [65], [66], [36], [32], and [34] evaluated their performance on DRIVE, STARE, and CHASE_DB1 together, while [60], [61], [63], and [67] were evaluated on both DRIVE and STARE. The DRIVE database alone was used for the performance evaluation of [57], [58], [37], [55], and [56], and [64] was evaluated on the STARE database. [62] used the DRIVE, AVRDB, and AV-Classification datasets, while [35] used the DRIVE and RIS databases. According to Table 1, most of the methods obtained better accuracy on the STARE database, and the performance on DRIVE deviated partially from that on the other databases. This deviation may be due to the quality of the DRIVE images, which contain DR signs. The cross-training findings of [32] demonstrate that the proposed model showed slightly lower accuracy when trained on CHASE_DB1 and tested on DRIVE. The reason for this deviation was assumed to be that the CHASE_DB1 images contain non-uniform background illumination, wider arterioles, and poorer contrast of blood vessels compared to the DRIVE images [32]. The authors of [66] also performed cross-training for further validation of their work and found a slight deviation in sensitivity when training the model on the STARE database and testing on the DRIVE database, whereas the sensitivity was satisfactory when training on DRIVE and testing on STARE.
The methods of [62] and [67] obtained the best accuracy among the methods described in Table 1. The work of [62] obtained 96%, 98%, and 97% Acc on the DRIVE, AVRDB, and AV-Classification datasets respectively, while [67] [61] was recorded as 96.09% and 96.46% accurate on the DRIVE and STARE databases respectively. According to the performance analysis, the method of [66] obtained the best accuracy on the STARE database, 96.94%.
The works of [55], [56], and [34] obtained 95.33%, 93.27%, and 95.27% Acc respectively, [57] achieved 94.66% Acc and 0.974 AUC, and [58] achieved 96% Acc, 0.714 Se, and 0.986 Sp on the DRIVE database. In addition, [37] obtained 0.789 Se, 0.980 Sp, 0.956 Acc, and 0.980 AUC, while [35] obtained 0.746 Se, 0.983 Sp, and 0.961 Acc on the DRIVE database. [36] obtained its best outcome on STARE, namely 0.773 Se, 0.985 Sp, 0.963 Acc, and 0.983 AUC. The U-Net based method of [65] obtained better accuracy than [60]; among the DNN-based methods, the accuracy of [61] was higher, while the method of [56] obtained lower accuracy than the others. The ELM method of [58] exceeded the methods of [57], [59], [37], and [55] in terms of accuracy. Although [58] claimed that their method was time-efficient on the new Retinal Images for Screening (RIS) database, according to Table 1 the Deep-Vessel method of [59] is the most time-efficient, with a recorded run time of 1.3 seconds.

Discussion
The development of AI-assisted automated applications and tools for medical image analysis is currently a topic of considerable interest, as it offers potential improvements to disease diagnosis and treatment systems. The relevant AI techniques need to be understood clearly, because implementing and repeating the training procedures to obtain the expected outcomes automatically is complex. The available learning methods, namely supervised, unsupervised, and reinforcement learning, are the core techniques for training an intelligent tool. To develop AI machines, the algorithm should be selected empirically, considering the characteristics of the data, the length of the training period, the number of parameters and features, and the training curve. For supervised learning, the two most popular algorithms in terms of data processing efficiency are ANN and Support Vector Machine (SVM), among existing algorithms such as Decision Tree, Random Forest, Naive Bayes classifier, K-nearest neighbor (KNN), and Fuzzy logic [68]. In unsupervised learning, association rule and clustering algorithms are mostly used to develop DL models for medical data processing, as DL can deal with noisy and low-quality data. Among recently introduced DL models such as LSTM, DNN, CNN, and RNN, CNN and RNN [68] have shown their efficiency in medical image processing for predicting the state of cardiovascular diseases and detecting the responsible abnormal features in brain MRI and fundus retinal images.
Experts from the ophthalmic discipline are responsible for the manual segmentation of retinal images. Manual delineation of the retinal microvasculature is challenging and time-consuming due to its complex hierarchical structure and its highly varied pixel intensity and vessel width, with width generally ranging from 1 to 20 pixels depending on both the image resolution and the anatomical width of the vessel. The presence of pathological signs also makes retinal image segmentation cumbersome [32], [34], [36]. Subtle features such as lesions lying in the microvasculature of the retina can significantly affect the performance of a DL approach. The existing DL-based retinal image segmentation models were designed to segment vessels irrespective of pixel intensity, using a unified pixel-wise loss function. Due to the unequal distribution of pixels in the image space, thick vessels dominate the pixel-wise loss while thin vessels contribute very little, and this affects segmentation accuracy [36]. The result of an unsupervised model can be biased by inefficient identification of the initial cluster pattern of the data, as the final clusters depend on the initial cluster patterns [68]. Consequently, the development of more robust DL-based vessel segmentation methods deserves close attention. To secure consistency in automatic retinal disease detection, accurate segmentation is critically significant, as the complex microvascular structure needs to be understood more clearly.
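The imbalance noted above, where thick-vessel pixels dominate a unified pixel-wise loss, can be shown numerically. The sketch below computes the share of the total loss contributed by each pixel group, first unweighted and then with simple inverse-frequency weights. Inverse-frequency weighting is a generic balancing technique swapped in here purely for illustration; it is not the three-stage method of [36]. The pixel counts, the per-pixel loss value, and all function names are assumptions.

```python
from collections import Counter


def group_loss_share(losses, groups, weights=None):
    """Return the fraction of the (optionally weighted) total loss owed to each group."""
    weights = weights or {}
    totals = {}
    for loss, group in zip(losses, groups):
        totals[group] = totals.get(group, 0.0) + weights.get(group, 1.0) * loss
    grand_total = sum(totals.values())
    return {group: value / grand_total for group, value in totals.items()}


def inverse_frequency_weights(groups):
    """Weight each group by n / (k * count), so rarer groups receive larger weights."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return {group: n / (k * count) for group, count in counts.items()}


# Hypothetical vessel pixels: 90 belong to thick vessels, 10 to thin vessels,
# and every pixel incurs the same per-pixel loss of 0.5.
groups = ["thick"] * 90 + ["thin"] * 10
losses = [0.5] * 100

unweighted = group_loss_share(losses, groups)
weights = inverse_frequency_weights(groups)
balanced = group_loss_share(losses, groups, weights)
print(unweighted, balanced)
```

In the unweighted case the thin vessels account for only a tenth of the loss even though every pixel is equally wrong, so gradient updates are driven mainly by thick vessels; after reweighting, both groups contribute equally.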
According to Table 1 under section 3.1, there is no individual DL-based method that can segment and extract both quantitative and qualitative features together. All the existing segmentation and feature extraction methods are dedicated to segmenting and detecting a single feature such as exudates, haemorrhages, or microaneurysms. To make the task easier and less time-consuming for the ophthalmic expert, it is worthwhile to develop a DL-based algorithm that can handle segmentation and the detection of multiple features simultaneously. Although most of the existing DL-based retinal image segmentation methods showed above 95% accuracy, further development of these methods is important to secure consistency in diagnostic performance. Inappropriate image acquisition leads to highly varied datasets. It is important to annotate the characteristic features in retinal images accurately. Using low-quality datasets and poorly annotated images for training degrades system accuracy.
Different CNN architectures, such as deep CNN and recurrent CNN, exploiting different non-linear functions have proved their robustness in retinal image segmentation and feature extraction and obtained higher processing performance compared to logistic regression. Exudates, haemorrhages, and microaneurysms are the significant features for DR detection, and some of the existing DR feature detection methods such as [69], [70], [71], and [72] examined their proposed methods under both image-based and lesion-based criteria for appraising detection results. These methods obtained their best detection results for the image-based criterion. The pixel intensity of each lesion, such as an exudate, is comparatively low, and because lesions need to be annotated individually within a whole image, accurate pixel-based ground truth estimation is more complex in this instance. In this case, the detection accuracy under the lesion-based criterion is compared with the manual grading of an ophthalmologist, which can lead to poor exudate detection performance. On the other hand, for the image-based criterion the pathological features responsible for DR, such as exudates, are annotated in the training images during ground truth estimation, and the algorithms examine the test image to determine the presence or absence of exudates in the entire image [72]. CNN offers advantages over conventional statistical analysis, but attention is still needed to the problems of overfitting the training data and of using more parameters, which lengthens the computational process. To avoid overfitting, the training data can be increased through data augmentation and the number of hidden layers in the DL architecture can be reduced. Developing DL models for rare diseases such as ocular tumors, or even for common diseases such as cataract that are not screened routinely for clinical purposes, is challenging due to an inadequate amount of data [73]. Besides, training DL or CNN architectures is time-consuming and not suitable for integration into real-time mobile applications or low-memory environments, whereas testing the algorithms takes less time and is cost-effective [74].
To speed up the computational process of a DL method, the hardware needs to be well configured with powerful graphical computing units. The integration of DL-based eye care tools is efficiently supporting ophthalmic diagnostic systems and achieving promising outcomes, but human involvement is still needed. Medical practitioners need to have a clear and comprehensive idea of the relevance of using DL applications in the context of the patient's history. Clinicians need to enrich their understanding of how DL methods work to maximize confidence in the authenticity of a diagnosis [75]. There are also technical and clinical challenges in developing and validating DL algorithms, as they are completely data-oriented. Issues arise in data acquisition related to patients' consent and confidentiality, highly varying standards and regulations among different institutions, and the lack of opportunities to test algorithms on different population-based datasets [73]. Another challenge is that DL technology in healthcare is still at an initial stage, which leads patients and medical practitioners to regard DL as a 'Black Box'. As ophthalmic and cardiovascular disease diagnostic systems are mostly dependent on medical imaging, DL can be investigated as a means of improving life in urban as well as remote areas by providing reliable medical facilities and reducing obstacles.