A Study of QSAR Based on Polynomial Modeling in Matlab

—Mu-opioid receptor (MOR) is an attractive target for in silico docking experiments. Many potent analgesics currently in use act through the MOR. The main objective of the present work was to find the polynomial function for modelling of the structure-activity relationship of a series of MOR analogues and the results of the molecular docking with MOR (PDBid:4dkl). The relationship of the biological activity of the ligands with the ChemScore function and with the total energy (MolDock function) was modelled with first-to third-degree polynomials and surface fitted method, assessed by least squares method. The finding, established in the paper, suggests that the third order polynomial could be successfully used for modelling of the relationship between the biological effect of the MOR analogues and results from docking procedure. Analysis and comparison of the data from in vitro tests and docking studies could help to understand better the relationship between in vitro biological effects and docking studies and to answer whether the models of the biological macromolecules (in our case MOR) correspond to the real 3D structure.


Introduction
The opiates are among the oldest drugs for manage a number of medical problems. Morphine is the most active ingredient of opium. It is a widely used painkiller in medicine, despite side effects such as respiratory depression, constipation, drowsiness, tolerance, and dependence.
Endogenous opioid system (EOS) plays a critical role in modulating a large number of sensory, motivational, emotional, and cognitive functions [1,2]. Endogenous opioid peptides (EOP) are small molecules that are naturally produced in the central nervous system (CNS) and in various glands throughout the body. They function both as hormones and as neuromodulators [3,4]. Many peptides with opioid like effects have been found in the CNS and in peripheral tissues, that fall into three categories -endorphins, enkephalins, and dynorphins.

Ligands
Data for MOR analogues from in vivo test used in the study are presented in Table 1. Table 1. Data for the ligands from in vivo test [22]. Tyr-Pro-Leu-Gly-NH2 246,00

Computational tools
Molecular docking experiments with MOR and the investigated ligands, presented in Table 1 were performed by software GOLD 5.2 (Genetic Optimization for Ligand Docking), run on Scientific LINUX 5.5 operating system. All optimization functions embedded in the program were used (GoldScore, ChemScore, ASP and ChemPLP) [23][24][25][26]. The values of the ChemScore function shown in Table 2 indicate that there is a signification correlation between the ability of the ligands to bind to the MOR and its analgesic effect [22]. Ligands preparation were done by the program Molegro Molecular Viewer V 2.5 (MMV), (www.clcbio.com). The prepared structures were used for molecular docking by software GOLD 5.2. The total energies of formed ligand-receptor complex after docking were calculated by MMV 2.5, using MolDock scoring function (Eq.1) [27]. For analyzing the docking results, it was used Ligand Energy Inspector tool of the program. It allows getting detailed information about the energy interactions for the protein-ligand complex.
Where 45678 is a docking scoring function, +/987 is ligand-protein interaction energy, and +/97; is internal energy of the ligand [33]. Where the values ( . , Q … , 9 ) of the dependent variable represent the values of the biological activity of the investigated ligands; the values ( . , Q … , 9 ) of the independent variable represent the results from the docking -values of ChemScore optimization function (calculated by GOLD); the values ( . , Q … , 9 ) of the independent variable represent the total energies for the ligand-receptor complex formed after docking -the values of MolDock function (calculated by MMV); +? are the parameters of the model; is the degree of the polynomial (0 < + < ); is the number of ligand-receptor complexes (data points). The coefficients of Eq.1 were determined by the least squares method (Eq.3).
The Surface Fitting Toolbox of Matlab was applied for analyzing the behavior of one variable which depended on more independent variables and the individual model could be interpreted as a surface fitting function of the experimental data by least squares method. (http://www.mathworks.com/products/matlab). The following statistic parameters are used to evaluate the goodness of fit: -SSE (Sum of squares due to error) -it represents the total deviation of the response values from the curve fit to the response values, where the value of SSE near to 0 shows that the model has a smaller random error component and then the fit will be more useful for prediction. = = + ( + − Z + ) Q / +-.

(4)
Where + is the observed data value, Z + is the predicted value from the curve fitting and + -the weighting applied to each data point, ( + = 1).
-R-Square (R 2 ) -it measures how successful the fit is in explaining the variation of the data and it is defined as the ratio of the sum of squares of the regression and the total sum of squares about the mean. The values of R-Square closer to 1 indicates that a greater proportion of variance is accounted by the model. -Adjusted R-square -it is the best indicator of the fit quality when two models are compare. It can take on any value less than or equal to 1, with a value closer to 1 indicating a better fit.
-RMSE (Root Mean Squared Error) -it represents the standard error of the regression and an estimate of the standard deviation of the random component in the data. The values of RMSE closer to 0 indicates a fit that is more useful for prediction.

Results and discussion
The neuropeptide hormones are biologically active peptides that affect the CNS. Some investigators reported that after applying the Paw-pressure test to 22 investigated peptides [22], in contrast to the native MIF-1, having a weak analgesic effect, the analogues (with the exception of the MIF-1 analogues, modified in a 2 -nd position with arginine, lysine and canavanine) have pronounced antinociceptive effect at 15 th min of the experiment, which is antagonized by Naloxone (1 mg/kg, i.p.). The authors concluded that a modified structure of MIF-1 and Tyr-MIF-1 with the non-protein amino acids as well as analogues coupled with cinnamic acid exhibited analgesic effect at the 15th min of the experiment. This effect, in most cases, is significantly stronger than the effect of MIF-1, and in some cases even stronger than the effect of Tyr-MIF-1. Brominated analogue of Tyr-MIF-1 and the analogue MIF-1 modified at position 2 with norsulfoleucine exhibit stronger analgesic effect than Tyr-MIF-1. The effects of other analogues of the MIF-1 and Tyr-MIF-1 are in the range of 101-246 g/cm 2 . MIF-1 has the lowest effect of all investigated ligands -101g/cm 2 . In the present work, we continue to investigate the relationship between the chemical structure and the biological activity of 12 selected ligands from all presented in the publication [22]. The structures of the ligands are shown in Table 1.
For the purpose of the in silico docking the crystal structure of MOR was obtained from RCSB (PDB id: 4dkl). From the literature, it is known binding sites of the MOR [22]. These are the residues within a radius of about 10 Å of the asparagin (Asp) acid residue located in the transmembrane helix 3 -Asp147. Computer modeling and molecular docking experiments with MOR and the investigated ligands (Table 4) were carried out with software GOLD 5.2 and all optimization functions embedded in the program: GoldScore, ChemScore, ASP and ChemPLP [23][24][25][26]. А significant correlation between the values of ChemScore optimization function and the values of the biological activity of the investigated ligands was established [22].
The values of the scoring function indicate that there is a correlation between the ability of the ligand to bind to the MOR. In order to establish the relationship between the docking results (scoring function) and the values of the biological activity of the investigated analogues, a correlation between these two sets of data is searching for. The correlation analysis is performed by software GraphPad Prism 3.0 (http://www.graphpad.com/scientific-software/prism).
The value of the Pearson's correlation coefficient is = 0,837. It indicates that the correlation between the experimental data are significant and we can talk about the linear relationship between the values of the scoring function and the values of the in vivo test (the pain tolerance). We can conclude that an increase of the value of the optimization function increases the affinity of the ligand to the MOR, respectively, and hence increases its biological effects. The values of the ChemScore function shown in Table 1 indicate that there is a correlation between the ability of the ligands to bind to the MOR and their analgesic effect.
In order to establish the relationship between the values of the docking, the values of the minimum energy conformation for each obtained ligand-receptor complex after docking procedure and the data obtained for biological activity it was applied the Surface Curve Fitting Toolbox in software Matlab between the three sets of data. The program provides applications and functions for fitting curves and surfaces to data [28][29][30][31][32][33][34][35][36]. The total energies of the ligand-receptor complexes, which are formed after molecular docking with MOR and the best pose of the corresponding ligands, were calculated by MolDock optimization function by MMV [27]. Modelling of the structures of ligands from Table 1 have been performed using MMV [27]. Rendering of MOR and ligands, labeling of amino acids of proteins residues of highest potent one were also performed for better visualization. The modelling results are shown in Figure 1.
The investigated ligands interact with Asp147 which is crucial for the analgesic effect [22]. Moreover, they form multiple hydrogen bonds with residues of the receptor binding site.  The obtained models were evaluated on how well they fitted the data and how precisely they could predict. The experimental data can be represented as follows: • the values of represent the values of the biological activity of the ligands [22]; • the values of represent the result from the in silico docking -the values of ChemScore scoring function [22]; • the values of represent the total energies for ligand-receptor complex -the values of MolDock scoring function in MMV 2.5 [27] for the ligand-receptor complexes forming after the docking in GOLD 5.2. The values of the main parameters used for surface fitting in Matlab for MOR are presented in Table 2. All polynomial models from first to third degree were evaluated on how well they fitted the data and how precisely they could predict. The models were estimated with the statistical criteria of goodness of fit -SSE, R 2 , adjusted R 2 , RMSE. The results obtained for the statistic parameters are presented in Table 3.
As it can be seen from the results in Table 3 the goodness of fit statistics shows that the obtained model for fitting of the data for MOR with the third degree for and the third degree for is a good one. The polynomial model of third degree is with the highest value of Q = 1.0 and the value closer to 1 indicating that a greater proportion of variance is explained by the model. The value of = 0.6290 for the cubic polynomial are close to 0. Therefore, this value shows that the model of third-degree has a smaller random error component and then the fit will be more useful for prediction. The value of Q = 0.9999 for the cubic polynomial are less than 1. This statistic parameter is a good indicator of the fit quality when two models are compared and with a value closer to 1 indicating a better fit. The value of the = 0.5608 for the third degree of polynomial are closer to 0 and indicate a fit that is more useful for prediction. This shows that the obtained polynomial model for the surface fitting data is a good model and it explains a high proportion of the variability in experimental data, and it is able to predict new observations with high certainty [30][31][32][33][34][35][36][37]. The surface fitting starting from the first-degree to the third-degree polynomial of the experimental data (Table 2) is presented in Figure 2 (A, B, C). The biological activity of the ligands as a function of the values of ChemScore function of docking procedure and the values of the total energies was presented in Figure  3 (A, B and C) with a polynomial surface fitting of first to third order in Matlab. As it can be seen from the graphical representation of the experimental data ( Table  2) the best polynomial surface fitting is obtained for third order of polynomial model (Figure 3 C).
A graphic representation of the relationship between the three numeric variables in 2D is presented in Figure 3  The residual plot for obtained polynomial model is presented in Figure 4. It provides visual displays for assessing how well the model fits the data, for evaluating the distribution of the residuals, and for identifying influential observations. The top plot of residual diagram shows that the residuals are calculated as the vertical distance from the data point to the fitted curve. The bottom plot displays the residuals relative to the fit, which is the zero line. As it can be seen in Figure 4 (C) the obtained model is randomly scattered around zero. This indicates that the polynomial model of third degree describes the experimental data in a good way.
The best results for fitting of experimental data according to the results in Table 3 were obtained for surface fitting by a cubic polynomial in three-dimensional for determining the relationship between biological activities and docking results of investigated compounds. By using a polynomial least squares surface fitting technique, a third order polynomial was fitted to the data and it is represented as follow: ( , ) = @@ + .@ * + @. * + Q@ * Q + .. * * + @Q * Q + 2@ * 2 + + Q. * Q * + .Q * * Q + @2 * 2 (9) Where is normalized by mean 13.21 and standard deviation 6.494 and where is normalized by mean -80.69 and standard deviation 28.75. According the terms of goodness of fit the empirical testing found that the third order polynomial surface fit was the best in the overall representation of the experimental data. Тhe mean surface was calculated for all investigated ligands. The confidence bounds on the coefficients determine their accuracy. In the present work they are with 95% confidence bounds and are relevant to evaluate and compare fits. The coefficients of the polynomial model with two variables are given in Table 4.    Curve fitting of experimental data by polynomial models are commonly used because they have a simple form and they are computationally easy to use. The regression function of the models is linear in terms of the unknown parameters and this allows easy to find the optimal regression coefficients using least squares method.
The obtained polynomial model of third degree for fitting the experimental data showed good fitting properties and significant predictive ability( Q = 1.0, = 0.6290, Q = 0.9999, = 0.5608). Therefore, this model is suitable for determination the relationship structure-biological activity. This would be helpful in shortening the drug design process.
The in silico docking experiments were applied to determine the interactions between the studied MIF-1 and Tyr-MIF-1 ligands and the MOR. From the relationship found between the docking results and the in vivo test, we could predict the biological effect of newly synthesized analogues, which is much higher than other compounds in the tested series. For some work along these lines, see [38][39][40][41][42][43][44][45][46][47][48].

Conclusion
The docking procedure was carried out with the model of MOR with crystal structure (PDBid:4dkl) and mu-opiod ligands. The total energies for the obtained ligandreceptor complexes were calculated. The biological activity of the investigated ligands is compared with the results from docking (scoring function) in order to find correlation. ChemScore function correlates (Pearson's correlation coefficient = 0,837) with the biological activity of the ligands. From the obtained correlation, we could predict the biological effect of newly synthesized analogues, which is much higher than other compounds in the tested series. Using in silico docking, we could make prediction about the biological effects of newly synthesized compounds, and we could propose structures with higher biological activity.
The best results for fitting of experimental data according to the results in Table 1 were obtained for surface fitting by a cubic polynomial in three-dimensional for determining the relationship between biological activities and docking results of investigated compounds. The surface fitting by polynomial of the third order has the best fit, assessed by least squares method. The finding, established in the current study, suggests that: the third order polynomial could be successfully used for modelling the relationship between the biological activity of the mu-opioid analogues and the docking results; and the combination of the ligand-based and the structure-based approaches of virtual screening is a reliable search of effective mu-opioid candidates. The polynomial surface fitting model was obtained by Surface Fitting Toolbox in Matlab.
Analysis and comparison of the data from in vivo and docking experiments could help to understand better the relationship between the biological effects of the compounds and in silico docking and to answer whether the models of the biological macromolecules (MOR) correspond to the real 3D structure. The obtained model is applicable for predicting the efficacy of compounds with known score and total energy. This opens the scope for further research.