Curve Estimation Models for Estimation and Prediction of Impact Factor and CiteScore Using the Journal Percentiles: A Case Study of Telecommunication Journals

The impact factor and CiteScore of journals are known to be positively correlated with journal percentile, but the use of the latter to predict the former is scarcely discussed, especially for journals in a specific subject classification based on the Web of Science. This paper proposes different curve estimation models for predicting the impact factor and CiteScore of 89 telecommunication journals from their corresponding percentiles. Of the 11 models, the logistic, exponential, growth and compound models are the best for predicting the impact factor and CiteScore from the corresponding journal percentiles. These models were chosen because of their high values of R Square and Adjusted R Square and low values of the standard error of the estimate. In addition, strong, significant positive correlations were obtained between the impact factor and the CiteScore of the journals. The findings will help authors and editors in decision making with regard to manuscript submission and planning.
Keywords: Impact factor, CiteScore, Quartiles, Percentiles, curve estimation, ranking analytics, statistics.


Introduction
Of all the bibliometric metrics used in journal evaluation, impact factor and CiteScore are the leading ones. Impact factor is exclusive to the Web of Science, which is managed by Clarivate Analytics. Journals with impact factors are those indexed in the Science Citation Index, Social Science Citation Index, and Arts and Humanities Index. On the other hand, CiteScore is exclusive to Scopus and is managed by Elsevier. Both impact factor and CiteScore are employed in the evaluation of impact, prestige and quality of 2

Literature Review
Impact factor has survived many criticisms despite the fact that it is the first bibliometric parameter created to evaluate journal articles [14]. The advent of CiteScore has helped to break the monopoly of impact factor, and the combination of the two parameters is needed to evaluate research activities effectively across two different platforms. In addition, the h-index from the Scopus database is now increasingly used to evaluate researchers for academic recruitment, promotion and grant assessment.
Whether the impact factor or CiteScore gives the more transparent, comprehensive, reliable and timely evaluation of journals is a major area of intense debate among researchers [15][16][17], although the two have been adjudged far different from some untrusted and competing metrics [18]. The arguments have centred on the capability of the bibliometric parameters to evaluate journals effectively within and outside the different subject areas [19][20][21], ensure maximum coverage while maintaining consistent impact measurement [22][23][24], avoid underrating the influence of smaller journals [25] and ensure adequate inclusion of conferences, books, book chapters and trade publications [26][27]. Both bibliometric parameters have been accused of focusing on journals with high impact factor or CiteScore without taking into consideration other factors that can influence authors' choice of academic outlets [28]. For example, some authors may prefer to publish in journals of their core subject areas (areas of specialization) or in association or university-based journals, which may not rank high on impact factor or CiteScore. Hence, overdependence on the metrics can impair sound judgement in the analysis and evaluation of scholarly output [29]. Measures should be put in place to ensure that the metrics are applied appropriately and objectively [30], especially in presenting the true quality of journals [31]. This will guide authors in tracking the growth and progress made by journals over a period [32].
Although the two bibliometric parameters are related to their respective percentiles, the relationship is yet to graduate to predictive modelling for specific subject areas [33]. What is available considers the relationship across all journals, which may not show the true picture, for the following reasons. Firstly, citation patterns differ between subject classifications. Secondly, some journals have more than two subject classifications, and they can rank high in one and low in another [34]. Thirdly, journals are unevenly distributed across the different subjects. Lastly, the quartiles of journals often differ between the impact factor and CiteScore [35]. Percentiles therefore represent a viable alternative for predicting the two bibliometric parameters.

Materials and Methods
Descriptive statistics were used to present the statistical moments of the impact factor (IF) and its journal percentile (JP (SCIE)), and of the CiteScore and its percentile (JP (Scopus)). Correlation and curve estimation models were also used. The curve estimation models used are inverse, S, logarithmic, linear, quadratic, cubic, power, compound, growth, exponential and logistic. The model summary, analysis of variance (ANOVA) and coefficients of each model were computed to identify the model with the best fit. The models were applied in two separate cases, with the IF and CiteScore as the respective dependent variables and JP (SCIE) and JP (Scopus) as the respective independent variables. The fitted models can then be used to predict the dependent variables from the independent variables.
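The paper does not print the equations or name the software behind its curve estimation models. As a minimal sketch, assuming the conventional two-parameter forms of several of these families and ordinary least squares on the linearised scale, the fit statistics named above (R Square, Adjusted R Square and standard error of the estimate) could be computed as follows. The names fit_curve and TRANSFORMS are hypothetical, and the data are synthetic stand-ins, not the 89 journals.

```python
import numpy as np

# Linearising transforms for some of the curve families, written with the
# conventional two-parameter forms (an assumption, not taken from the paper).
# Each entry maps (x, y) to (u, v) so that v = a + b*u is a straight line.
TRANSFORMS = {
    "linear":      (lambda x: x,         lambda y: y),          # y = a + b*x
    "logarithmic": (lambda x: np.log(x), lambda y: y),          # y = a + b*ln(x)
    "inverse":     (lambda x: 1.0 / x,   lambda y: y),          # y = a + b/x
    "power":       (lambda x: np.log(x), lambda y: np.log(y)),  # ln(y) = a + b*ln(x)
    "s":           (lambda x: 1.0 / x,   lambda y: np.log(y)),  # ln(y) = a + b/x
    "exponential": (lambda x: x,         lambda y: np.log(y)),  # ln(y) = a + b*x
}


def fit_curve(x, y, model):
    """Least-squares fit of one curve family on its linearised scale.

    Returns the intercept a, slope b, R^2, adjusted R^2 and the standard
    error of the estimate, all computed on the transformed scale.
    """
    fx, fy = TRANSFORMS[model]
    u = fx(np.asarray(x, dtype=float))
    v = fy(np.asarray(y, dtype=float))
    b, a = np.polyfit(u, v, 1)            # polyfit returns slope, then intercept
    resid = v - (a + b * u)
    ss_res = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((v - v.mean()) ** 2))
    n, k = len(u), 1                      # k = number of predictors
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    se = float(np.sqrt(ss_res / (n - k - 1)))
    return {"a": float(a), "b": float(b), "r2": r2, "adj_r2": adj_r2, "se": se}


# Synthetic data (NOT the paper's journals): percentiles from 1 to 99 and a
# metric that grows exponentially with percentile, with mild noise.
rng = np.random.default_rng(0)
percentile = np.linspace(1, 99, 89)
metric = 0.5 * np.exp(0.03 * percentile) * np.exp(rng.normal(0, 0.05, 89))

results = {name: fit_curve(percentile, metric, name) for name in TRANSFORMS}
for name, res in sorted(results.items(), key=lambda kv: kv[1]["r2"]):
    print(f"{name:12s} R2={res['r2']:.3f} adjR2={res['adj_r2']:.3f} SE={res['se']:.3f}")
```

Statistics for the log-transformed families are computed on the ln(y) scale, so they are not strictly comparable with those of families fitted on the raw scale.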

Descriptive statistics, robust estimators and correlation
The descriptive statistics for the IF, CiteScore, JP (SCIE) and JP (Scopus) of the 89 telecommunication journals are presented in Table 1. The mean and median of the CiteScore are greater than those of the impact factor. The same applies to the percentiles of the two metrics. The sum of the impact factors of the journals is less than the sum of their CiteScores. The means of the percentiles indicate that most of the journals are in the second quartile. Table 2 presents the values of Huber's M-estimator, Tukey's biweight, Hampel's M-estimator and Andrews' wave, which gave almost the same results as the median for all the metrics. This indicates that there are no outliers that could adversely affect the results of the curve estimation models. Strong, significant positive correlations were obtained between the impact factor and the CiteScore of the journals, as shown in Table 3. The same was obtained between JP (SCIE) and JP (Scopus), as shown in Table 4.
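The comparison of robust estimators with the median can be illustrated with one of the four estimators. The following is a minimal sketch of a Huber M-estimate of location, assuming a fixed scale based on the median absolute deviation and a tuning constant of 1.339 (a common default; the paper does not state the constant it used), together with a Pearson correlation of the kind reported for the two metrics. The function names and all data values are hypothetical.

```python
import numpy as np


def huber_location(values, c=1.339, tol=1e-8, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted averaging.

    The scale is fixed at the normalised median absolute deviation (MAD).
    The tuning constant c is an assumed default, not taken from the paper.
    """
    x = np.asarray(values, dtype=float)
    mu = float(np.median(x))
    scale = 1.4826 * float(np.median(np.abs(x - mu)))
    if scale == 0:
        return mu
    for _ in range(max_iter):
        z = np.abs(x - mu) / scale
        # Weight 1 inside the threshold, c/|z| outside (down-weights the tails).
        w = np.where(z <= c, 1.0, c / np.maximum(z, 1e-12))
        new_mu = float(np.sum(w * x) / np.sum(w))
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length samples."""
    return float(np.corrcoef(a, b)[0, 1])


# Hypothetical metric values, with and without one extreme value.
clean = np.array([1.2, 1.5, 1.8, 2.0, 2.1, 2.4, 2.7, 3.0, 3.3])
with_outlier = np.append(clean, 25.0)

# A second hypothetical metric that tracks the first with small deviations.
paired = 1.3 * clean + np.array([0.1, -0.2, 0.0, 0.2, -0.1, 0.1, 0.0, -0.2, 0.1])

print("mean  :", with_outlier.mean())
print("median:", np.median(with_outlier))
print("Huber :", huber_location(with_outlier))
print("r     :", pearson_r(clean, paired))
```

When the robust estimate stays close to the median while the mean does not, an extreme value is present; close agreement among all three is the pattern the text reads as the absence of influential outliers.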

Curve estimation models
Eleven curve estimation models were fitted in two parts: first, the estimation of the impact factor using JP (SCIE) and, second, the estimation of the CiteScore using JP (Scopus).
The model summaries of the eleven curve estimation models are presented in increasing order of model fit in Table 5. The logistic, exponential, growth and compound models are the best models for predicting the impact factor from the journal percentile, judging from their high values of R Square and Adjusted R Square and low values of the standard error of the estimate. The remaining seven models would give poor predictions with high variance if used to estimate the impact factor, because of their low values of R Square and Adjusted R Square and high values of the standard error of the estimate, even though the models are significant, as shown in Table 6. The coefficients of the four best-fit models for estimating the impact factor from the journal percentile are presented as follows: compound model (Table 7), growth model (Table 8), exponential model (Table 9) and logistic model (Table 10).
The model summaries of the eleven curve estimation models for predicting the CiteScore from the journal percentile are presented in increasing order of model fit in Table 11. The logistic, exponential, growth and compound models are the best models by virtue of their high values of R Square and Adjusted R Square and low values of the standard error of the estimate. The remaining seven models (power, cubic, S, quadratic, linear, logarithmic and inverse) give an average to poor fit, even though the models are significant at the 0.05 level of significance, as shown in Table 12. The coefficients of the four best-fit models for estimating the CiteScore from the journal percentile are presented as follows: compound model (Table 13), growth model (Table 14), exponential model (Table 15) and logistic model (Table 16).
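To predict a metric from a percentile, the tabulated coefficients are substituted into the corresponding model equation. The sketch below assumes the conventional forms of the four best-fit families as defined in common curve estimation software, since the paper does not print its equations; the function names and coefficient values are hypothetical and are not the paper's estimates. Under these assumed forms, and with no upper bound set for the logistic model, the four families are reparameterisations of one log-linear relationship, which would be consistent with their sharing the best fit.

```python
import numpy as np

# Assumed conventional forms (not quoted from the paper), with t the journal
# percentile and y the predicted metric:
#   exponential: y = b0 * exp(b1 * t)
#   compound:    y = b0 * b1 ** t
#   growth:      y = exp(b0 + b1 * t)
#   logistic:    y = 1 / (1/u + b0 * b1 ** t), u = upper bound (none if infinite)


def predict_exponential(t, b0, b1):
    return b0 * np.exp(b1 * t)


def predict_compound(t, b0, b1):
    return b0 * np.power(b1, t)


def predict_growth(t, b0, b1):
    return np.exp(b0 + b1 * t)


def predict_logistic(t, b0, b1, upper=np.inf):
    # With upper = infinity the 1/upper term vanishes.
    return 1.0 / (1.0 / upper + b0 * np.power(b1, t))


# Hypothetical exponential coefficients, chosen for illustration only.
b0_exp, b1_exp = 0.5, 0.03

t = np.array([10.0, 25.0, 50.0, 75.0, 90.0])   # example percentiles

y_exp = predict_exponential(t, b0_exp, b1_exp)
# Equivalent coefficients for the other three families.
y_cmp = predict_compound(t, b0_exp, np.exp(b1_exp))
y_grw = predict_growth(t, np.log(b0_exp), b1_exp)
y_log = predict_logistic(t, 1.0 / b0_exp, np.exp(-b1_exp))

for name, y in [("exponential", y_exp), ("compound", y_cmp),
                ("growth", y_grw), ("logistic", y_log)]:
    print(f"{name:12s}", np.round(y, 4))
```

If the software used in the study sets a finite upper bound for the logistic model, or defines any family differently, the equivalence shown here would not hold exactly.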

Conclusion
The paper has obtained predictive models for impact factor and CiteScore using journal percentiles, extending the observed correlation between the metrics to predictive models. The coefficients of the percentiles in the various models are significant, and the best models give the smallest errors between the actual and predicted values. The research can be extended to other, larger subject classifications.