Early Alzheimer's Disease Detection Using Different Techniques Based on Microarray Data: A Review

Alzheimer's disease (AD) is a degenerative brain disease in which the death of brain cells leads to memory loss. AD becomes more common as people get older. Memory loss progresses over time until the person loses the ability to respond appropriately to their surroundings. Microarray technology has become a major tool in genetic research, and many researchers use it to study changes in gene expression in particular organisms. In medicine, microarray experiments can be used in various ways, including the prediction and detection of disease. However, large amounts of unprocessed raw gene expression profiles raise computational and analytical difficulties, including selecting the relevant features of a dataset and classifying samples into the appropriate group or class. The high dimensionality, small sample size, and noise of gene expression data make it difficult to achieve good AD classification accuracy using the entire set of genes, so classification requires careful feature reduction. This paper therefore presents a comprehensive review of microarray-based AD studies, focusing on feature selection techniques.


Introduction
AD, a common form of dementia, causes memory loss, intellectual impairment, and the impairment of other mental functions over time, together with structural alterations in the brain. The symptoms appear gradually and steadily worsen. Patients first develop mild cognitive impairment (MCI), an intermediate stage of the disease, which can then progress to AD; however, not all MCI patients develop AD [1][2][6]. Although AD is currently incurable, its progression may be slowed if it is detected early [3]. In 2006, an estimated 26.6 million people worldwide had AD; according to current estimates, by 2050 about 1 in 85 people will be living with the disease, with around 43% of cases requiring a high level of care [4]. In Alzheimer's patients, the transition from a healthy state to AD can take many years [5]. For these reasons, current research focuses mostly on predicting the conversion from MCI to AD.
In the last decade, AD diagnosis has benefited greatly from machine learning techniques [7][8]. The most widely used classification methods are support vector machines (SVMs), artificial neural networks (ANNs), and deep learning (DL). The major difference between SVM and ANN lies in the nature of the optimization problem they solve [9][10][11]: SVM training yields a globally optimal solution [12][13][14], whereas ANN training may yield only a locally optimal one. Feature extraction is a key stage for both SVM and ANN. The authors of [15][16] suggested that applying neural networks combined with intelligent agents to medical image processing could be advantageous. Deep learning, by contrast, builds the feature extraction procedure directly into the learning model [17][18][19], and it has proven particularly beneficial for image data when large datasets are available [17][20]. Some researchers have employed ensemble approaches to increase AD classification accuracy [21][22][23].
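To make the global-versus-local distinction concrete, the following minimal Python sketch (assuming scikit-learn and NumPy are available, and using synthetic data in place of real expression profiles) refits a linear SVM and a small multilayer perceptron with two different random seeds. Because the linear SVM objective is convex, both refits return the same weights; the MLP's non-convex training ends at a different solution for each seed. The dataset sizes and network width are arbitrary assumptions.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a small, high-dimensional expression matrix
# (assumption: 60 samples x 200 features, two classes).
X, y = make_classification(n_samples=60, n_features=200, n_informative=10,
                           random_state=0)


def svm_weights(seed):
    # Linear SVM training is convex, so the solution is unique; SVC ignores
    # random_state unless probability=True.
    clf = SVC(kernel="linear", C=1.0, random_state=seed).fit(X, y)
    return clf.coef_.ravel()


def mlp_weights(seed):
    # MLP training is non-convex: where it ends up depends on the random
    # weight initialization controlled by random_state.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)
        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300,
                            random_state=seed).fit(X, y)
    return clf.coefs_[0].ravel()


svm_same = np.allclose(svm_weights(0), svm_weights(1))
mlp_same = np.allclose(mlp_weights(0), mlp_weights(1))
print("SVM refits identical:", svm_same)
print("MLP refits identical:", mlp_same)
```

Identical SVM weights reflect the uniqueness of the convex solution; differing MLP weights show only that the result depends on initialization, not by themselves that one run found a worse optimum.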
In recent decades, in-depth exploration of molecular pathways has become an increasingly common research strategy for discovering effective treatments for complex diseases such as Alzheimer's, cancer, and diabetes. Microarrays and next-generation sequencing (NGS) are the technologies most often used in this research. Studies using these in-depth approaches have concentrated on the processing stages, which involve a variety of feature selection (FS) and dimension reduction techniques [24][25]. Work in this area falls into two groups. The first group of studies [26][27] applied image recognition methods to brain scan data (such as MRI). The second group used gene expression (GE) data to estimate the risk of developing AD [28][29]. Gene expression data, however, are high-dimensional and include irrelevant, redundant, and noisy genes that do not contribute to disease diagnosis. When artificial intelligence and data mining techniques are applied to such data, diagnostic accuracy is hampered by the prevalence of redundant expression data and by the small number of samples relative to the huge number of genes (features) [30][31]. As a result, dimensionality reduction is an active research topic in data mining, statistical pattern recognition, and machine learning.
Dimensionality reduction (DR) aims to increase a classification algorithm's accuracy by removing redundant and meaningless data from the microarray dataset. There are numerous dimensionality reduction methods, and the appropriate choice depends on the application domain and the peculiarities of the dataset. Feature selection strategies fall into four types: filter, wrapper, embedded, and hybrid approaches [32]. Filter algorithms select attributes based on intrinsic characteristics of the data, independently of any classifier [33][34]. Wrapper techniques select a subset of features by using a machine learning algorithm, or a population-based search, to evaluate candidate subsets.
Filter methods are well known for their computational speed, at the cost of accuracy, whereas wrapper approaches achieve superior accuracy but require more computation. In domains with large datasets, filter-based methods have proved faster than wrapper methods. A drawback of both approaches is that they may not account for how the classifier interacts with the dependencies between features, so classification accuracy varies depending on which features are used. Embedded techniques, on the other hand, perform feature selection as part of training the learning algorithm itself. Whereas wrapper approaches carry a high processing cost, embedded methods retain the benefit of interacting with the classification model at lower cost [32].
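The three families can be compared side by side with scikit-learn. In this sketch, which uses synthetic data rather than a real microarray set, an ANOVA F-test stands in for a generic filter, recursive feature elimination (RFE) around a linear SVM stands in for a wrapper (some authors class RFE as embedded), and L1-regularized logistic regression serves as the embedded method. The choice of k = 20 genes and the regularization strengths are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic "microarray": 80 samples x 500 genes (assumed sizes).
X, y = make_classification(n_samples=80, n_features=500, n_informative=8,
                           n_redundant=4, random_state=42)
k = 20  # number of genes to keep (illustrative)

# Filter: score each gene on its own with an ANOVA F-test; no classifier involved.
filt = SelectKBest(f_classif, k=k).fit(X, y)

# Wrapper-style: RFE repeatedly refits a linear SVM and drops the 10% of
# remaining genes with the smallest weights until k are left.
wrap = RFE(LinearSVC(C=0.1, dual=False), n_features_to_select=k,
           step=0.1).fit(X, y)

# Embedded: the L1 penalty drives many weights to exactly zero during
# training; max_features caps the selection at k genes.
emb = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
    max_features=k,
).fit(X, y)

for name, sel in [("filter", filt), ("wrapper", wrap), ("embedded", emb)]:
    print(name, np.flatnonzero(sel.get_support())[:10])
```

The filter is the cheapest (one pass over the genes), RFE refits the SVM many times, and the embedded model selects genes in a single training run, which mirrors the cost ordering described above.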
Many feature selection studies aim to select useful, relevant features that increase classification rates while decreasing computational cost; few of them, however, examine the full range of available methods. The large majority of reviews focus on a specific feature selection strategy [35][36][37] or on the general use of feature selection in medicine [38]. Earlier reviews did not offer a thorough state-of-the-art account of all available methods and their taxonomy, nor did they address the issues specific to microarray data and experiments. For example, [36] reviewed filter strategies for microarray gene analysis, whereas the survey of feature selection and feature-level fusion algorithms in [37] did not focus on microarray data. [39] gave a brief overview of matched-pairs feature selection for gene expression microarray data, and [40] gave a quick overview of currently popular feature extraction and feature selection approaches. [41] provided an overview of feature selection methodologies for problems in the medical field, covering biological data, medical imaging, signal processing, and DNA microarray analysis. The study by [42] is the closest to this review; however, it discussed only feature selection, not the sources of microarray datasets or the problems they raise.
The objectives of this paper are to describe the difficulties and problems associated with microarray Alzheimer's datasets, to survey the current feature selection methods applied to them, to describe the microarray experiment in detail, and to point out the limitations of current methods. The paper also outlines important opportunities for future research in this field.
The rest of the paper is organized as follows. Section 2 gives an overview of the microarray technique and the data it produces. Section 3 focuses on gene selection. Section 4 classifies dimension reduction approaches according to their taxonomy. Section 5 discusses open research concerns, which were taken into account when the data were collected. Section 6 presents the literature review, and Section 7 concludes the paper.

Microarray technique
Since its introduction, microarray technology (MT) has driven revolutionary advances in the biological sciences and is seen as a launching pad for important new studies, because it makes it possible to examine the activity of tens of thousands of genes at once. Even so, most biologists and other researchers have difficulty mining and working with this type of data. The results of microarray experiments are available in various databases.
The use of microarrays in scientific research dates back to the mid-1980s [43]. DNA microarrays were first described by Augenlicht et al. (1987), who arrayed over 4000 complementary DNA (cDNA) sequences on nitrocellulose [44]. Microarrays allow biologists to examine and measure the expression of tens of thousands of genes at the same time [43][38]. Bioinformatics, medicine, and microarray research itself have all profited from advances in the technology [40]. A DNA microarray, often referred to as a biochip or DNA chip, consists of a large number of small spots of genetic material attached to a solid surface. Scientists use DNA microarrays as a platform for studying the expression levels of many genes simultaneously, and for examining the numerous components of a person's genotype [43].

Microarray data
Microarray experiment data are commonly arranged and stored in large $M \times N$ matrices. As shown in Table 1, each row of a microarray data matrix corresponds to a sample and each column to a gene (feature).
These matrices are very large. In an $M \times N$ matrix $X = (x_{ij})$, $M$ is the number of rows (samples) and $N$ the number of columns (genes), and each cell holds the expression value of one gene in one sample [45][46]. The entry $x_{ij}$ gives the expression level of gene $j$ under condition (or in sample) $i$, where $i = 1, \dots, M$ and $j = 1, \dots, N$.
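In code, such a matrix maps naturally onto a two-dimensional NumPy array with one row per sample and one column per gene. The sketch below uses simulated values, toy sizes, and hypothetical gene names; note that the text's 1-based indices $i$, $j$ must be shifted by one for NumPy's 0-based indexing.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 6, 10  # toy sizes: M samples (rows) x N genes (columns)
gene_ids = [f"gene_{j}" for j in range(1, N + 1)]  # hypothetical gene names
X = rng.normal(loc=8.0, scale=1.5, size=(M, N))    # simulated expression values
y = np.array(["AD", "AD", "AD", "HC", "HC", "HC"])  # one class label per sample

# x_ij in the text uses 1-based indices; NumPy arrays are 0-based.
i, j = 2, 4
x_ij = X[i - 1, j - 1]
print(f"x_{i}{j} = level of {gene_ids[j - 1]} in sample {i}: {x_ij:.2f}")
print("shape (M, N):", X.shape)
```

Keeping the class labels in a separate vector `y` aligned with the rows is the layout most machine learning libraries expect.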

Microarray data analysis
Because of the vast amount of data that can be extracted from tissue samples, the biomedical field has become an increasingly important application area for machine learning over the last few decades. DNA microarray datasets in particular have opened a new and dynamic area of research combining bioinformatics and machine learning. Microarray data pose an unusual challenge for machine learning: there are very few samples (generally fewer than 100) but a very large number of features (on the order of thousands).
Machine learning research groups face significant obstacles with data of this type, because "false positives" are possible and problems can arise when selecting the features (genes) relevant to the prediction model [38]. Research published in the literature indicates that only a few of the genes on a DNA microarray are useful for classification. Removing redundant and irrelevant information is therefore critical, both for classification and for helping experts uncover the basic links between gene expression and particular diseases.
As a result, the gene dataset must be reduced in order to lower the cost of finding the optimal genes for differentiating between cell types (normal or abnormal cells). Clustering and classification [47] are well-known approaches for in-depth microarray data analysis. Clustering is an unsupervised strategy that sorts large amounts of data into smaller groups of genes or samples that share common traits or patterns. Classification, in contrast, is supervised: given a batch of pre-classified (labeled) samples, the classifier learns to assign unlabeled sample instances to one of the predefined classes [48].
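The difference between the two approaches can be sketched with scikit-learn on synthetic data (an assumption standing in for a real microarray set): k-means groups the samples without looking at the labels, whereas a k-nearest-neighbor classifier is trained on labeled samples and evaluated on held-out ones. The number of clusters, neighbors, and folds are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: 60 samples x 50 genes, two classes (assumed sizes).
X, y = make_classification(n_samples=60, n_features=50, n_informative=5,
                           random_state=1)

# Unsupervised: k-means partitions the samples by similarity; y is never used.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised: k-NN learns from labeled training folds and predicts the
# classes of held-out samples (5-fold cross-validation).
acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)

print("cluster sizes:", np.bincount(clusters))
print("5-fold k-NN accuracy: %.2f" % acc.mean())
```

The cluster indices carry no class meaning by themselves; only the supervised model produces predictions that can be scored against the known labels.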

Background and advances in gene selection
Gene selection is a technique for removing redundant and/or uninformative genes from gene expression data such as DNA microarray data. It applies machine-learning-based feature selection to genes and is well suited to applications involving thousands of features [49][50]. Gene selection approaches aim to locate and eliminate redundant genes in the original feature space while retaining the most useful information. As the number of genes grows relative to the number of samples, overfitting becomes more likely, which reduces generalization and degrades model performance. Current work on gene selection (GS) is primarily concerned with identifying the most important genes, with less attention paid to removing irrelevant or redundant genes [51]. Relevance, redundancy, and complementarity must all be considered if meaningful outcomes are to be achieved. A gene is relevant if it carries information about the class label. According to [52], features can be divided into three categories: highly relevant, marginally relevant, and irrelevant. Marginally relevant features are of two types, redundant and non-redundant, and the non-redundant and highly relevant features contain the vast majority of the useful information [53]. Figure 1 shows the general family of algorithms used to select genes from microarray data.
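Relevance and redundancy can be handled together in a simple greedy filter. The hypothetical helper below ranks genes by the absolute two-sample t-statistic (relevance) and skips any gene whose absolute Pearson correlation with an already chosen gene exceeds a threshold (redundancy). The t-test criterion, the 0.8 threshold, and the toy data are assumptions for illustration, not the method of any study cited here.

```python
import numpy as np
from scipy.stats import ttest_ind


def select_relevant_nonredundant(X, y, k, max_corr=0.8):
    """Hypothetical greedy filter: most relevant genes first, redundant ones skipped."""
    t, _ = ttest_ind(X[y == 1], X[y == 0], axis=0)  # per-gene relevance
    order = np.argsort(-np.abs(t))                   # most relevant first
    corr = np.abs(np.corrcoef(X, rowvar=False))      # gene-gene |Pearson r|
    chosen = []
    for g in order:
        # Keep g only if it is not strongly correlated with a chosen gene.
        if all(corr[g, c] < max_corr for c in chosen):
            chosen.append(int(g))
        if len(chosen) == k:
            break
    return chosen


rng = np.random.default_rng(0)
n = 40
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 50))
X[:, 0] += 2.0 * y                             # relevant gene
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)  # near copy of gene 0: redundant
X[:, 2] += 1.5 * y                             # second, independent relevant gene

chosen_genes = select_relevant_nonredundant(X, y, k=3)
print(chosen_genes)
```

In the toy data, gene 1 is almost a copy of gene 0, so only one of the pair is kept, while the independently informative gene 2 survives.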

Dimension reduction methods
When dealing with high-dimensional data, classification algorithms face numerous computational and memory challenges [5]. There are two approaches to dimensionality reduction (DR): feature selection and feature extraction (also called feature transformation, FT). Feature selection keeps a subset of the original features and discards the rest; because a feature that looks unimportant on its own may be informative in combination with others, information can be lost when such features are omitted from the selected subset. Feature extraction, in contrast, reduces the feature set by constructing new features from the original ones without losing much of the information they carry. The type of data and the application domain determine the choice between feature extraction and feature selection.

Feature selection
High-dimensional datasets include features that are redundant, misleading, or both; such features make the data harder to interpret and add nothing to the learning process. Selecting the best features from all those that can be used to differentiate between classes is called feature subset selection. A feature selection algorithm is a statistical procedure driven by a particular definition of relevance, and many such methods have been empirically evaluated [54][55]. Feature selection is commonly framed as a search problem guided by an evaluation criterion. The search organization can be exponential, sequential, or random. Five operators can be used to generate successor subsets, including weighted, compound, and random operators. As Figure 2 shows, successors can be evaluated with measures such as probability of error, divergence, dependence, interclass distance, information (uncertainty), and stability.
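As an illustration of search organization, successor generation, and evaluation, the hypothetical function below implements sequential forward search: each successor adds one unused feature to the current subset, and each subset is evaluated by the cross-validated accuracy of a k-NN classifier, which makes it a wrapper. The stopping rule (halt when no successor improves the score), the classifier, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def forward_selection(X, y, estimator, max_features, cv=5):
    """Hypothetical sequential forward search evaluated by CV accuracy."""
    selected, best_score = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Successor generation: try adding each unused feature.
        scores = {
            f: cross_val_score(estimator, X[:, selected + [f]], y, cv=cv).mean()
            for f in remaining
        }
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:  # no successor improves: stop
            break
        best_score = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_score


X, y = make_classification(n_samples=60, n_features=30, n_informative=3,
                           n_redundant=0, random_state=3)
subset, score = forward_selection(X, y, KNeighborsClassifier(n_neighbors=5),
                                  max_features=5)
print("selected features:", subset, "CV accuracy: %.2f" % score)
```

Each round costs one cross-validated fit per remaining feature, which is why wrappers of this kind become expensive on microarray-sized feature spaces.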
Feature selection methods fall into three main types: filters, wrappers, and embedded/hybrid techniques. Wrapper-based approaches tend to outperform filter methods because their feature selection (FS) process is tailored to the classifier being employed. However, wrapper techniques are very expensive in large feature spaces, because a classifier must be trained to evaluate each candidate feature set, which makes feature selection time-consuming. Filter techniques are more efficient and faster to compute than wrapper approaches, but their classification performance is less reliable; their speed makes them better suited to large, complex datasets. Hybrid and embedded methods, which combine the strengths of filters and wrappers, have been developed more recently. A hybrid approach combines classifier-independent tests with a performance assessment function for the feature subset [56][57]. As Figure 2 shows, filter techniques fall into two classes: feature weighting methods and subset selection methods. Feature weighting methods consider each feature separately and assign it a weight according to its relevance to the target [58].
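A classic feature weighting criterion is the Fisher score, which several of the studies reviewed later rely on. For feature $j$ with classes $k$ of size $n_k$, class means $\mu_{kj}$, class variances $\sigma_{kj}^2$, and overall mean $\mu_j$, one common form is $F_j = \sum_k n_k(\mu_{kj}-\mu_j)^2 / \sum_k n_k\sigma_{kj}^2$. The NumPy sketch below computes it for every feature at once; the small toy matrix is made up for illustration.

```python
import numpy as np


def fisher_score(X, y):
    """Fisher score per feature: between-class scatter / within-class scatter."""
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        num += n_c * (Xc.mean(axis=0) - mu) ** 2  # between-class part
        den += n_c * Xc.var(axis=0)               # within-class part
    return num / np.maximum(den, 1e-12)           # guard constant features


# Toy data: feature 0 separates the classes, feature 1 is constant,
# feature 2 varies within each class but not between them.
X = np.array([[1.0, 2.0, 0.0],
              [2.0, 2.0, 1.0],
              [5.0, 2.0, 0.0],
              [6.0, 2.0, 1.0]])
y = np.array([0, 0, 1, 1])

scores = fisher_score(X, y)
print(scores)
print("ranking:", np.argsort(-scores))
```

Because each feature is scored independently, this is a weighting filter: it says nothing about redundancy between highly ranked features.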
Feature selection offers the following advantages:
• It reduces the dimensionality of the feature space, which lowers storage requirements and speeds up algorithms.
• It discards redundant, irrelevant, or noisy data.
• It shortens the execution time of learning algorithms, which directly benefits data analysis activities.
• It improves data quality.
• It improves performance, increasing the predictive accuracy of the resulting model.
• It reduces the feature set, conserving resources in the next round of data collection or during use.
• It aids understanding of the data, revealing the process that generated it or simply how it looks.

Feature transformation (extraction)
Feature extraction transforms the original features into new, more informative ones. In this sense, "feature extraction" primarily refers to creating linear combinations $\alpha^{T}x$ of continuous features that have high discriminative power between groups. Finding a good representation for multivariate data is a significant topic in artificial intelligence (AI) and neural networks (NN). Here, feature extraction simplifies the data by representing each new feature as a linear combination of the original input variables [59][60]. Principal component analysis (PCA) is the most widely used feature extraction method, and a plethora of variations have been proposed. PCA is a simple, non-parametric method for extracting the most important information from a confusing, redundant dataset: it reduces redundancy (measured by covariance) while retaining as much information (measured by variance) as possible [61][62][63].
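The PCA computation itself is short. The NumPy sketch below, which assumes a samples-by-genes matrix and synthetic data driven by two hidden factors, centers each feature, takes a singular value decomposition, and projects onto the leading right singular vectors. The resulting component scores are mutually uncorrelated (redundancy removed), and the squared singular values give the share of variance each component retains.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD of the column-centered data matrix (samples x features)."""
    Xc = X - X.mean(axis=0)                 # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]          # principal axes (unit-length rows)
    scores = Xc @ components.T              # projections: linear combinations a^T x
    explained = S**2 / np.sum(S**2)         # variance share per component
    return scores, components, explained[:n_components]


rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))           # two hidden factors (assumption)
mixing = rng.normal(size=(2, 100))
X = latent @ mixing + 0.1 * rng.normal(size=(50, 100))  # 100 correlated "genes"

scores, comps, ratio = pca(X, n_components=2)
print(scores.shape, "variance explained by 2 PCs: %.3f" % ratio.sum())
```

Here 100 correlated features collapse onto two components that keep almost all of the variance, which is the kind of compression the text describes.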
Several dimensionality reduction strategies, including information gain, wrapper approaches, and feature extraction with several PCA variants, have been examined on two different kinds of data (e-mail data and medical detection data) to see how they affect classification performance [64][65]. The results of PCA feature extraction (transformation) depended strongly on the type of data. For both types of data, wrapper-based feature selection had a moderately greater effect on classification accuracy than information gain.
This research makes clear that dimensionality reduction is critical. When feature selection methods are compared with feature extraction methods, wrappers produce smaller feature subsets with better classification accuracy; although computationally more expensive than feature extraction algorithms, they are a viable alternative [66][67]. The authors of [56][68] proposed a two-level dimensionality reduction method that combines feature selection and feature extraction to improve classification performance. At the first level, features are selected according to how closely they relate to one another; at the second level, PCA and locality preserving projections (LPP) are applied to the selected features to extract new features. The method was tested on a variety of widely used datasets, and the results suggest that it outperforms single-level dimensionality reduction approaches.
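A two-level scheme of this kind can be sketched as a scikit-learn pipeline. In this sketch, the first level keeps the genes most strongly related to the class (an ANOVA F-test, which for two classes ranks genes by their squared correlation with the label) and the second level applies PCA; LPP is omitted because scikit-learn does not provide it. The filter criterion, k = 100, ten components, and the synthetic data are assumptions, not the configuration of [56][68].

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=80, n_features=1000, n_informative=10,
                           random_state=7)

# Single level: PCA on all genes.
single = make_pipeline(StandardScaler(), PCA(n_components=10), SVC())

# Two levels: filter first, then extract features from the survivors.
two_level = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=100),  # level 1: feature selection
    PCA(n_components=10),           # level 2: feature extraction
    SVC(),
)

results = {}
for name, model in [("PCA only", single), ("filter + PCA", two_level)]:
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {results[name]:.2f}")
```

Keeping the filter inside the pipeline means it is refit on each training fold, which avoids leaking test-fold information into gene selection.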

Literature review
Many approaches to Alzheimer's disease (AD) detection have been employed in the literature. This section presents examples of the most recent research in the field.
In 2011, B. Booij et al. [69] developed a disease classifier using jackknife gene selection (GS) and partial least squares regression (PLSR); it produces a test score indicating whether AD is present (positive) or not (negative). The algorithm, which relies on 1239 probes, was validated on an independent test group of 63 people: 31 AD patients, 25 age-matched healthy controls (HC), and 7 young controls. It correctly classified 55 of the 63 subjects (AUC 87%).
In 2012, L. Scheubert et al. [70] performed feature selection with three different methods: information gain (IG), random forest (RF) accuracy, and a genetic algorithm (GA) with a support vector machine (SVM) wrapper. When the outputs were compared, the GA/SVM approach reached an accuracy of 85%. The main limitations are the small sample sizes and the instability of the presented algorithm.
In 2013, K. Lunnon et al. [71] presented two approaches for testing hypotheses: t-tests using Meng scores, and backward elimination. They obtained a 75% accuracy rate in the validation group when distinguishing AD from controls. The study is limited by its small sample sizes.
In 2014, P. Johnson et al. [72] used genetic algorithms (GA) to predict the onset of AD. They reported cross-validated accuracies of 0.90 for predicting HC and 0.86 for predicting MCI conversion at 36 months. The limitations of the paper are that the resulting model is difficult to interpret and that, with the limited data available, overfitting is a concern.
In 2015, F. Sherif et al. [73] showed that a Bayesian network (BN) can identify SNPs causally related to AD with a respectable level of precision. Their results indicate that the SNP set found with the Markov blanket technique is strongly associated with AD, and the approach outperforms both naive Bayes (NB) and tree-augmented naive Bayes. Building on this idea to develop methods for drug discovery remains to be done. The accuracy and sensitivity of the minimal enhanced Markov blanket are 66.13% and 88.87%, respectively, compared with 61.58% and 59.43% for naive Bayes.
In 2015, S. Sood et al. [74] used Bayesian statistics (on the ULSAM ageing cohort) and KNN to predict conversion from HC to MCI/AD, with an AUC of 0.73. As with most microarray data, the data are high-dimensional, with small sample sizes and a large number of variables that are irrelevant to the study and generate a lot of noise. As a result, understanding the datasets and finding correlations between features can be challenging.
In 2015, S. Paylakhi et al. [75] built a gene selection strategy from a GA and an SVM. First, the Fisher criterion is used to eliminate noisy and redundant genes from the high-dimensional microarray data. A GA-SVM is then used to choose distinct subsets of maximally informative genes from different training sets. Combining the Fisher score with GA-SVM exploits the advantages of both a filter and a wrapper technique. The technique was evaluated on AD DNA microarray data and performed strongly in both gene selection and classification, reaching a classification accuracy of 100% with only 15 genes. A limitation is that gene expression (GE) data can be erroneous or missing.
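The filter-then-wrapper structure can be sketched in a few dozen lines. Below, an ANOVA F-test ranks genes (the F statistic equals the Fisher score multiplied by the constant $(n-K)/(K-1)$ for $n$ samples and $K$ classes, so the ranking is the same), the top 50 are kept, and a toy genetic algorithm searches binary gene masks, scoring each mask by the 3-fold cross-validated accuracy of a linear SVM. The function name `ga_svm`, population size, mutation rate, tournament selection, one-point crossover, and the synthetic data are illustrative assumptions; unlike [75], the sketch does not repeat the search on different training sets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def ga_svm(X, y, n_pop=20, n_gen=15, p_mut=0.05, seed=0):
    """Hypothetical toy GA over binary gene masks; fitness = CV accuracy."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]

    def fitness(mask):
        if not mask.any():
            return 0.0
        return cross_val_score(SVC(kernel="linear"), X[:, mask], y, cv=3).mean()

    pop = rng.random((n_pop, n_feat)) < 0.2      # random initial subsets
    fit = np.array([fitness(m) for m in pop])
    for _ in range(n_gen):
        children = []
        for _ in range(n_pop):
            # Tournament selection: the fitter of two random individuals.
            a = rng.choice(n_pop, 2, replace=False)
            b = rng.choice(n_pop, 2, replace=False)
            p1, p2 = pop[a[np.argmax(fit[a])]], pop[b[np.argmax(fit[b])]]
            cut = rng.integers(1, n_feat)        # one-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            child ^= rng.random(n_feat) < p_mut  # bit-flip mutation
            children.append(child)
        children = np.array(children)
        child_fit = np.array([fitness(m) for m in children])
        # Elitism: keep the best n_pop of parents and children.
        all_pop = np.vstack([pop, children])
        all_fit = np.concatenate([fit, child_fit])
        keep = np.argsort(-all_fit)[:n_pop]
        pop, fit = all_pop[keep], all_fit[keep]
    return np.flatnonzero(pop[0]), fit[0]


X, y = make_classification(n_samples=60, n_features=2000, n_informative=10,
                           random_state=0)
top = np.argsort(-f_classif(X, y)[0])[:50]       # stage 1: filter
genes, acc = ga_svm(X[:, top], y)                # stage 2: GA-SVM wrapper
print(len(genes), "genes:", top[genes], "CV accuracy: %.2f" % acc)
```

Running the cheap filter first shrinks the GA's search space from 2000 to 50 genes, which is what makes the expensive wrapper stage affordable.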
In 2016, S. Zahra Paylakhi et al. [76] combined the Fisher score, significance analysis of microarrays (SAM), and a GA-SVM. The Fisher technique is employed to remove redundant and noisy genes from the microarray data, the GA-SVM selects subsets of highly informative genes using different training sets, and the SAM approach is then applied. The technique was tested on microarray data from AD patients and performed well in both selection and classification, achieving a classification accuracy of 94.55% with just 44 genes. Biologically, at least 24 (55%) of these genes are associated with dementia, specifically Alzheimer's disease. Small sample sizes and low precision limit the ability to combine datasets from various sources to improve precision.
In 2016, N. Voyle et al. [77] used random forest (RF) with recursive feature elimination for prediction. All analyses included age and APOE ε4 genotype as variables. The resulting models predicted correctly about 70% of the time. The authors found that a lack of homogeneity within the control group may have lowered prediction accuracy.
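Recursive feature elimination wrapped around a random forest is available in scikit-learn as `RFECV`, which also picks the number of features by cross-validation. The sketch below uses synthetic data; in [77], age and APOE ε4 genotype were included in all analyses, which here would mean appending them as extra columns of `X`. The step size, forest size, fold count, and minimum subset size are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=100, n_features=200, n_informative=8,
                           random_state=2)
# Covariates such as age or APOE e4 status could be appended as extra
# columns of X; they are not simulated here.

rf = RandomForestClassifier(n_estimators=100, random_state=0)
selector = RFECV(
    rf,
    step=0.2,                   # drop 20% of the original features per round
    min_features_to_select=5,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
selector.fit(X, y)

print("genes kept:", selector.n_features_)
print("first indices:", selector.get_support(indices=True)[:10])
```

The random forest's impurity-based importances decide which genes are removed at each round, and cross-validation decides where to stop.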
In 2016, M. Barati et al. [78] weighted features with several methods, including SVM, information gain, deviation, the Gini coefficient, and the gain ratio. Sequences that receive a weight greater than 0.5 from at least two of these algorithms are considered important. Neural network approaches (such as auto multilayer perceptron, neural net, and perceptron) were then applied to 11 sets of data; with the weighted perceptron technique, the overall performance reached 97%. The approach does, however, introduce some issues: even when features have been selected, they do not provide the same level of confidence as a bidirectional stepwise selection process.
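The voting rule described above (keep a feature when at least two weighting schemes give it a normalized weight above 0.5) can be sketched with scikit-learn. Because not all weighting operators of [78] are available there (deviation and gain ratio are not), this sketch substitutes four stand-ins: absolute linear SVM weights, mutual information (an information-gain analogue), the ANOVA F statistic, and random forest Gini importance, each min-max scaled to [0, 1]. The substitutions and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.svm import LinearSVC


def minmax(w):
    """Scale a weight vector to [0, 1]; constant vectors map to zeros."""
    w = np.asarray(w, dtype=float)
    span = w.max() - w.min()
    return (w - w.min()) / span if span > 0 else np.zeros_like(w)


X, y = make_classification(n_samples=120, n_features=40, n_informative=5,
                           random_state=4)

# Stand-in weighting schemes (assumed substitutes for those in [78]).
weights = {
    "svm": minmax(np.abs(LinearSVC(C=0.1, dual=False).fit(X, y).coef_.ravel())),
    "info_gain": minmax(mutual_info_classif(X, y, random_state=0)),
    "anova_f": minmax(f_classif(X, y)[0]),
    "gini": minmax(RandomForestClassifier(n_estimators=200, random_state=0)
                   .fit(X, y).feature_importances_),
}

# Count, per feature, how many schemes give it a weight above 0.5.
votes = sum((w > 0.5).astype(int) for w in weights.values())
important = np.flatnonzero(votes >= 2)
print("weight > 0.5 in at least two schemes:", important)
```

Requiring agreement between schemes trades some sensitivity for robustness: a feature favored by a single criterion alone is not kept.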
In 2017, M. Balamurugan et al. [79] proposed a KNN classification algorithm based on dimensionality reduction for diagnosing and classifying AD and MCI. The RDD-UDS dataset, provided by the National Alzheimer's Coordinating Center (NACC), enables researchers to analyze clinical and statistical data. The drawbacks of the KNN method stem from its dependence on the features of the data: with large datasets, the prediction step can be slow, and it is sensitive to the size of the data and to irrelevant features.
In 2017, K. Nishiwaki et al. [80] used the random forest machine learning technique to develop a gene selection method, which they applied to AD microarray data to score genes, achieving an accuracy of 0.83. The main weakness is that all the datasets used are microarrays, whereas RNA-seq data would be more accurate and less noisy.
In 2017, H. Li et al. [81] proposed the Ref-REO method to identify leukocyte-specific expression changes in blood samples containing both white and red blood cells. Using Ref-REO, they found 42 and 45 differentially expressed genes (DEGs) in two datasets comparing AD blood samples with normal peripheral whole blood (PWB), with an AUC greater than 0.73 for predicting AD. A remaining difficulty is choosing an appropriate feature combination from small, high-dimensional DNA microarray datasets.
In 2018, L. Xu et al. [82] developed a computational method for detecting AD at an early stage by analyzing protein sequence data. Their technique represents each sequence by the number of times each pair of amino acids appears consecutively, and an SVM then classifies the data. Previous research relied on magnetic resonance imaging, which is more expensive and time-consuming than this approach. Experiments showed that their approach achieves an accuracy of 85.7%. Their work also produced a dataset for classifying AD. The main weakness of their system is that it does not consider how features interact with one another to improve predictions.
In 2018, X. Li et al. [83] conducted the first large systematic analysis to discover DEGs in blood samples from 245 AD patients, 143 MCI patients, and 182 HC. A gene-based genome-wide association analysis of two different AD blood-sample datasets was conducted to identify novel risk genes. The resulting test distinguished AD patients from healthy controls with a precision of 85.7%. A limitation is the small number of features.
In 2019, K. Sekaran et al. [84] compared the gene expression profiles of AD patients and healthy individuals using numerical methods and machine learning (ML) techniques. Identifying differentially expressed genes contributes significantly to finding the most useful genes. Using the Rhinoceros Search Algorithm (RSA), a meta-heuristic global optimization algorithm, they identified 24 new gene biomarkers. Four supervised ML techniques, namely support vector machines, random forest, naive Bayes, and a multilayer perceptron neural network (MLP-NN), were used to classify the two groups of samples. One of these models, RSA-MLP-NN, was 100% accurate in distinguishing AD from normal samples, demonstrating its usefulness. The study's weakness is that the training set may contain a large amount of noise, which could affect model performance.
In 2020, T. Lee et al. [85] used five feature selection approaches and five classifiers to identify genes associated with AD and to distinguish AD patients. The best average AUC values for ADNI, ANM1, and ANM2 were 0.657, 0.874, and 0.804, respectively. For external validation, the best values were 0.697 (training on ADNI, testing on ANM1), 0.76 (ADNI to ANM2), 0.61 (ANM1 to ADNI), 0.79 (ANM1 to ANM2), 0.655 (ANM2 to ADNI), and an AUC of 0.859 (ANM2 to ANM1). Because of sample size limits and low accuracy, a combination of feature selection approaches and local search methods was used to improve accuracy.
In 2020, H. Ahmed et al. [86] focused on using ML approaches to identify AD biomarkers. Random forest (RF), naive Bayes (NB), logistic regression (LR), and support vector machine (SVM) algorithms were applied to AD genetic data from the ADNI-1 imaging project datasets. With ADNI-1's whole-genome approach, NB, RF, SVM, and LR achieved overall accuracies of 98.1%, 97.97%, 95.88%, and 83%, respectively. The findings suggest that classification algorithms are effective in detecting AD early. A limitation is that locating the best features within a given budget takes a lot of time.
In 2020, R. Saputra et al. [87] used particle swarm optimization (PSO) with the Alzheimer OASIS 2 dataset from kaggle.com to test several decision tree algorithms with feature selection. Results from 10-fold cross-validation of the decision tree approaches show that random forest (RF) achieves the highest accuracy, 91.15%. When PSO is used for feature selection and the tests are repeated several times with the decision tree (DT) algorithms, the PSO-based RF method achieves a kappa of 0.884 and a precision of 93.56%. The limitations of this paper are the small number of samples and low accuracy; to boost accuracy, a combination of different feature selection approaches and local search methods is used.
In 2020, C. Park et al. [88] proposed a deep learning approach that uses DNA methylation data and large-scale gene expression (GE) data to predict AD. Modeling AD with a multi-omics dataset is difficult because it requires integrating multiple omics data and dealing with large quantities of small-sample data. To address this, they devised a simple yet innovative strategy that reduces the number of features in the multi-omics dataset based on differentially expressed genes and differentially methylated positions (reported AUC values: 0.797, 0.756, 0.773, and 0.775). A limitation of the paper is its high computational demand.
In 2020, K. Muhammed Niyas et al. [89] suggested an efficient combination of greedy search and Fisher Score (FS) for selecting features for Alzheimer's diagnosis. For classifying Normal Controls and MCI, the suggested technique achieves balanced classification accuracies of 90% and 91% and area under the curve (AUC) values of 0.97/0.98 using SVM, K-Nearest Neighbor (KNN), and other classifiers. It also provides greater sensitivity and specificity (84 percent and 82.5 percent, respectively). According to the results, the proposed strategy for early Alzheimer's disease detection via effective feature selection is promising and may even outperform current methods in some cases. Its limitation is determining the criterion for the optimal combination of attributes based on the ranking.
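Below is a minimal sketch of Fisher-score ranking followed by a greedy pass. The Fisher score of feature j is the between-class scatter of its class means divided by its within-class variance, F_j = sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k sigma_kj^2. Here n_k is the size of class k, mu_kj and sigma_kj^2 are the mean and variance of feature j in class k, and mu_j is its overall mean. Two parts are illustrative assumptions rather than the exact procedure of [89]: the greedy rule (keep a ranked feature only if it improves a criterion) and the leave-one-out nearest-centroid criterion. The toy data are hypothetical.

```python
from statistics import mean, pvariance

def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter / within-class scatter."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall = mean(col)
        num = den = 0.0
        for c in sorted(set(y)):
            vals = [v for v, label in zip(col, y) if label == c]
            num += len(vals) * (mean(vals) - overall) ** 2
            den += len(vals) * pvariance(vals)
        scores.append(num / den if den > 0 else (float("inf") if num > 0 else 0.0))
    return scores

def loo_nearest_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier (needs >= 2 samples per class)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == c]
            centroids[c] = [mean(r[j] for r in rows) for j in features]
        point = [X[i][j] for j in features]
        pred = min(centroids, key=lambda c: sum((p - q) ** 2
                                                for p, q in zip(point, centroids[c])))
        correct += pred == y[i]
    return correct / len(X)

def greedy_fisher_selection(X, y, evaluate):
    """Visit features in descending Fisher score; keep one only if it improves evaluate."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    selected, best = [], float("-inf")
    for j in ranked:
        candidate = selected + [j]
        score = evaluate(X, y, candidate)
        if score > best:
            selected, best = candidate, score
    return selected, best

# Hypothetical data: feature 0 separates the classes, feature 1 is noise,
# feature 2 is weakly informative.
X = [[1.0, 5.0, 2.0], [1.2, 3.0, 2.5], [0.8, 4.0, 1.5],
     [3.0, 4.0, 3.0], [3.2, 5.0, 3.5], [2.8, 3.0, 2.5]]
y = [0, 0, 0, 1, 1, 1]
print([round(s, 2) for s in fisher_scores(X, y)])
print(greedy_fisher_selection(X, y, loo_nearest_centroid_accuracy))  # ([0], 1.0)
```

The greedy pass needs only one evaluation per feature, but its result depends on the stopping rule. That dependence is the "criterion for the optimal combination of attributes" named as the limitation above.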
In 2021, N. Le et al. [90] trained a machine learning model on 35 expression features derived from gene expression microarray data. The 35-feature model outperformed the other classifiers, with an average AUC of 98.3 percent. The paper's limitations stem from the approach adopted, which is insufficient for predicting survival outcomes and can even produce a prognosis that is the opposite of the actual event. Table 2 summarizes the most recent progress in Alzheimer's disease prediction systems (2011-2021), focusing on the feature selection approach, and gives a quick review of the work done in this crucial medical domain.
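Several of the studies above report the area under the ROC curve (AUC). The sketch below shows what that number measures by computing AUC directly as the probability that a randomly chosen AD sample receives a higher classifier score than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). The labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2.

    labels: 1 for AD (positive), 0 for control; scores: classifier outputs.
    This pairwise version is O(P * N), fine for small microarray cohorts.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))  # 8 of 9 pairs ordered correctly, about 0.889
```

An AUC of 0.5 therefore corresponds to random ranking, and 1.0 to a score that places every AD sample above every control.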

Discussion
As time goes on, more research on Alzheimer's Disease prediction using various methodologies has been published. Because the topic is so important, comprehensive reviews of the current state of research and implementation are needed. The purpose of this study is therefore to present a comprehensive overview of the most recent research on Alzheimer's Disease detection using various techniques. Since the end of 2011, there has been a great deal of research on Alzheimer's Disease. Based on a review of the works under consideration, feature extraction approaches were found to be far more suitable than feature selection techniques for the automated identification of Alzheimer's Disease, because most biomedical datasets contain noisy data rather than useless or redundant data. Feature selection is a tool for removing irrelevant and/or redundant features in a variety of applications, but no single feature selection method works across all applications. Some techniques remove unimportant features while also avoiding redundant ones. A feature weighting algorithm based solely on relevance does not adequately address the need for feature selection, because features that are individually relevant can still be redundant with one another. Subset search algorithms instead search for candidate feature subsets, guided by an evaluation metric that measures how good each subset is.
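One subset evaluation metric that accounts for redundancy is the merit used by correlation-based feature selection (CFS): Merit(S) = k * r_cf / sqrt(k + k(k-1) * r_ff). Here r_cf is the mean absolute feature-class correlation and r_ff the mean absolute feature-feature correlation over the k features in S. CFS appears here only as an illustration; it is not claimed to be the metric used in the reviewed works. The data below are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((z - mb) ** 2 for z in b))
    return cov / (sa * sb)

def cfs_merit(columns, y, subset):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|).

    High feature-class correlation raises the merit; high feature-feature
    correlation (redundancy) lowers it.
    """
    k = len(subset)
    r_cf = mean(abs(pearson(columns[j], y)) for j in subset)
    r_ff = (mean(abs(pearson(columns[i], columns[j])) for i, j in combinations(subset, 2))
            if k > 1 else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]            # class labels as numbers
g1 = [1.0, 2.0, 1.5, 4.0, 5.0, 4.5]           # hypothetical relevant gene
columns = [g1, list(g1)]                      # column 1 is an exact duplicate of g1

print(cfs_merit(columns, y, [0]), cfs_merit(columns, y, [0, 1]))  # the two merits are equal
print(sum(abs(pearson(columns[j], y)) for j in [0, 1]))           # relevance-only sum doubles
```

Duplicating a feature leaves the CFS merit unchanged, whereas a relevance-only score (the sum of feature-class correlations) doubles. This is the weakness of pure relevance weighting described above.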
The consistency measure and the association measure are two evaluation measures that have been shown to eliminate both irrelevant and redundant features effectively. However, empirical studies show that the number of iterations needed to find the optimal feature subset is usually at least quadratic in the number of features. As a result, existing subset search methods whose time complexity is quadratic or higher in the dimensionality do not scale adequately to high-dimensional data. Feature selection strategies are commonly divided into filters and wrappers. Wrapper approaches typically outperform filter methods because the selection process is tailored to the classification technique being used. However, when there are many features, wrappers are usually too expensive to employ, because every candidate feature set must be evaluated by training the model separately. Filter techniques are much faster than wrapper methods, which makes them better suited to large datasets. To deal with high-dimensional data, hybrid methods that combine the benefits of both models have recently been proposed. By contrast, only a few strategies exist for dealing with noisy data; feature extraction approaches have been proposed as a preprocessing phase to reduce the impact of noise on the learning process. Research indicates that the classification accuracy achieved with different feature reduction algorithms depends strongly on the type of data. Compared with approaches that handle feature redundancy and irrelevance separately, techniques that handle irrelevant and redundant features simultaneously are far more robust and beneficial to the learning process. Moreover, work based on a small amount of data would not be considered a significant contribution to this discipline. The identification of Alzheimer's Disease using various approaches has three major limitations.
The first is data imbalance, which future work could address by adding more features or knowledge-based characteristics to the model. The second is handling large amounts of data; here, cloud computing would be preferable to training locally on a large amount of data, which requires more technical and manual effort. The last is the lack of available datasets, which is currently the most serious challenge in this field: there are only a few reliable gene expression datasets for Alzheimer's Disease.
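The scalability argument in this discussion (subset search needing at least quadratically many evaluations) can be made concrete with a count. The sketch below runs sequential forward selection (SFS), a common wrapper search used here purely for illustration, to completion and counts subset evaluations: n + (n - 1) + ... + 1 = n(n + 1)/2. In a wrapper, each evaluation means training and validating a classifier. The `toy_score` function is a hypothetical stand-in for that step.

```python
def sequential_forward_selection(n_features, score):
    """Greedy wrapper search: at each step add the feature that maximizes score(subset).

    Runs until every feature has been added and returns the visit order, the best
    subset seen, and the number of subset evaluations performed.
    """
    selected, remaining = [], list(range(n_features))
    best_subset, best_score = [], float("-inf")
    evaluations = 0
    while remaining:
        step_scores = {}
        for j in remaining:
            step_scores[j] = score(selected + [j])  # in a wrapper: train + validate a model
            evaluations += 1
        j_best = max(step_scores, key=step_scores.get)
        selected.append(j_best)
        remaining.remove(j_best)
        if step_scores[j_best] > best_score:
            best_subset, best_score = selected[:], step_scores[j_best]
    return selected, best_subset, evaluations

# Hypothetical score: rewards features 0 and 3, penalizes subset size.
useful = {0, 3}

def toy_score(subset):
    return len(useful & set(subset)) - 0.1 * len(subset)

for n in (10, 50, 100):
    _, _, evals = sequential_forward_selection(n, toy_score)
    print(n, evals, n * (n + 1) // 2)  # prints 10 55 55, then 50 1275 1275, then 100 5050 5050
```

With tens of thousands of probes on a microarray, n(n + 1)/2 model fits is far beyond a practical budget. This is one reason filters, or hybrids that filter first, are preferred for such data.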

Conclusions
Gene expression datasets and machine learning algorithms are commonly employed for the detection of Alzheimer's Disease (AD). Because of their high-dimensional features and small sample sizes, DNA microarray data present numerous hurdles for machine learning research. Feature selection as a preprocessing method is important not only for lowering the number of input features and saving computing time and memory, but also for improving classification accuracy. Researchers must also deal with the uneven distribution of classes in the data. A variety of training and test datasets are available, but apart from the problem of having too many features for few samples, the presence of outliers and dataset shift remains a concern. Every year, researchers develop many new strategies to improve on the classification accuracy of earlier methods and overcome their limitations. Researchers also hope to help biologists discover and understand the fundamental pathways that connect gene expression to disease.
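Microarray cohorts often contain unequal numbers of AD and control samples. With such data, plain accuracy can look high for a classifier that simply favours the majority class. One common way to account for this is balanced accuracy, the mean of per-class recalls, which is the metric reported as balanced classification accuracy in [89]. A minimal sketch with hypothetical labels:

```python
from collections import Counter

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of its size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Hypothetical imbalanced cohort: 8 controls (NC), 2 AD; the classifier always predicts NC.
y_true = ["NC"] * 8 + ["AD"] * 2
y_pred = ["NC"] * 10
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain)                              # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The majority-class predictor scores 0.8 in plain accuracy but only 0.5 in balanced accuracy, which is no better than chance for two classes.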
This challenge is being tackled through feature selection, and the results have been encouraging. Researchers are increasingly turning to hybrid feature selection strategies. Feature selection approaches can broadly be categorized as filter, wrapper, or embedded strategies. Given the enormous computing resources that massive datasets require, filter algorithms are the most common, and wrapper and embedded techniques have been deliberately avoided. These techniques have improved both the robustness of the selected genes and the accuracy of Alzheimer's Disease classification models.