A Practical Evaluation of ML Algorithms for a Tag-Based BLE Indoor Positioning System

Abstract: In this paper, we evaluate the performance of machine learning (ML) algorithms employed in a commercial Bluetooth Low Energy (BLE) Indoor Positioning (IP) solution, relying on practical measurements taken in a commercial office space. A tag-based BLE IP system offers an economical approach for large facilities with a limited number of tracking elements (gateways). In this investigation, data collection campaigns were conducted in an indoor facility fitted with BLE gateways to aggregate Received Signal Strength Indicator (RSSI) fingerprints. The performance of a collection of well-known ML algorithms was evaluated in terms of the positioning accuracy of the desired objects, training complexity, and online tracking speed. The most accurate and efficient ML algorithms were identified and tabulated for both the offline and online phases. It is also envisaged that the results of this practical study will help identify proper economical topologies and configurations for real-life installations of tag-based BLE IP systems.


Introduction
Satellite navigation systems, such as the Global Positioning System (GPS) [1], GLONASS [2], Galileo [3], and BeiDou [4], provide positioning coordinates with great accuracy for various outdoor applications. Indoors, however, the resulting location estimates are very coarse because satellite signals are strongly attenuated inside buildings. Hence, alternative systems have been devised and researched to provide more accurate user locations inside buildings and in other obstructed environments [5]. Over the last two decades, location-based services (LBSs) have gained a great deal of popularity, and most consumer devices and goods are now equipped with a user-location feature. By 2027, the global indoor positioning and navigation market is anticipated to grow to $50.35 billion [6]. Indoor positioning and navigation systems enable a central platform to determine the geographic location of assets or personnel using a deployed wireless technology such as Bluetooth Low Energy (BLE) [7], WiFi [8], or Radio Frequency Identification (RFID) [9]. As a result, end users can be offered solutions that include fast access to precise positioning, asset tracking, location analytics, mapping, wayfinding, and a plethora of other applications. In addition, indoor location-based services, in conjunction with asset and personnel tracking, are expected to facilitate proximity marketing and optimize workflows. Furthermore, the widespread adoption of BLE tags/beacons, accompanied by a surge in Internet of Things (IoT) [10] deployments in indoor settings, is among the factors expected to further accelerate the deployment of such systems. These solutions will serve shopping malls, hotels, airports, warehouses, hospitals, university and school buildings, and museums [11], among numerous other venues. Smart homes [10] offering intelligent features will implement enhanced systems that rely on indoor positioning and will represent an additional prominent segment of these settings.
Wireless signal fingerprinting [12] is among the most prominent emerging techniques for indoor positioning [5], [13], owing to the widespread deployment of wireless networks. Moreover, positioning methods based on Received Signal Strength Indicator (RSSI) [14] fingerprinting are attractive for their accuracy and their independence from radio propagation models [13], [15].
Within the broad domain of indoor positioning, BLE is currently a widely used technology in pervasive computing as well as in numerous IoT applications. Recently, the Bluetooth Special Interest Group (SIG) introduced the Bluetooth 5 protocol, which came just 14 years after its initial introduction. This new protocol offers four times the range, twice the speed, and eight times the capacity of broadcast messages.
BLE technology offers several attractive features for indoor positioning (IP) when deployed in conjunction with fingerprinting. These include the common support of BLE by smartphones and mobile devices; the portability of lightweight transmitters (such as tags); ease of installation; the relative simplicity of gathering RSSI measurements, which translates into greater accuracy and precision [16]; and, most prominently, low energy consumption.
Among indoor positioning systems that incorporate BLE and rely on fingerprinting, the authors of [16], [17] performed experiments on fine-grained BLE IP aimed at analyzing the key factors affecting positioning accuracy with BLE radio signals. They used the Euclidean distance to compute a score associated with every cell in their positioning grid; in turn, a Gaussian kernel was used to compute a probability for each cell. They were among the first groups to perform such tests [17] and demonstrated improved positioning over WiFi counterparts. In [18], the RSS of BLE beacons, fingerprinting, and template matching via the squared Euclidean distance were shown to form a feasible combined positioning approach. The authors of [19] presented a scheme incorporating BLE RSSI that improved the mean positioning accuracy compared with alternative IP techniques using k-nearest neighbor (kNN), weighted kNN (WKNN), or similar algorithms. Reference [20] proposed a fingerprinting-based BLE beacon IP system that employed the kNN and support vector machine (SVM) ML algorithms and suggested key performance parameters to make the IP system more effective. In [21], a set of performance bounds on the number of beacons required for resolvable user positioning was established theoretically. The authors of [22] alleviated fast-fading effects on BLE signals by employing channel diversity, which also mitigated interference in the recorded RSSI data of their BLE IP system.

The BLE IP system described in this work is a tag-based implementation of a commercial proof-of-concept prototype that incorporates the fingerprinting approach. It will be shown experimentally that this BLE IP system attains high practical accuracy in combination with a candidate group of ML algorithms. In our investigation, data collection campaigns were conducted in an indoor facility fitted with BLE gateways to aggregate the signal measurements of BLE-tagged items at known locations (fingerprints). The performance of the existing basic algorithms was evaluated, and the accuracy of tracking the locations of the desired objects was found to meet industrial-grade requirements.

The rest of the paper is organized as follows. Section 2 describes the system architecture, topologies, methodologies, and working principles, as well as the experimental testing at various phases. The obtained results are discussed in greater detail in Section 3. Finally, Section 4 concludes the paper by highlighting our findings.
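The cell-scoring scheme summarized above for [16], [17] can be sketched as follows: the Euclidean distance between an observed RSSI vector and each cell's stored fingerprint is turned into a normalized probability with a Gaussian kernel. This is a minimal illustration under assumed values; the kernel width `sigma`, the cell names, and the RSSI numbers are hypothetical and are not taken from those works.

```python
import math

def cell_probabilities(observed, fingerprints, sigma=4.0):
    """Score every grid cell with a Gaussian kernel of its Euclidean RSSI distance.

    observed     -- RSSI vector (dBm), one entry per gateway
    fingerprints -- dict: cell name -> stored RSSI vector of the same length
    sigma        -- kernel width in dB (an illustrative assumption)
    """
    sq = {cell: math.dist(observed, ref) ** 2 for cell, ref in fingerprints.items()}
    d_min = min(sq.values())
    # Subtracting the smallest squared distance cancels out in the normalization
    # and keeps the best cell's weight at 1, so the total never underflows to 0.
    weights = {cell: math.exp(-(s - d_min) / (2 * sigma ** 2)) for cell, s in sq.items()}
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}

# Hypothetical three-cell radio map observed by three gateways
radio_map = {"A": [-60, -75, -80], "B": [-70, -65, -78], "C": [-82, -70, -62]}
probs = cell_probabilities([-62, -74, -79], radio_map)
best = max(probs, key=probs.get)  # most probable cell
```

Here the observation lies close to cell A's fingerprint, so nearly all of the probability mass goes to A.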

System Architecture and Methodology
To perform micro-positioning of BLE tags with the fingerprinting technique, a radio signal map must be defined that combines geographical coordinates (a 2-D Cartesian space) with the RSSI values that the BLE gateways, shown in Fig. 1-(a), receive from the BLE tag(s), shown in Fig. 1-(b) [23]. The location of the BLE tag is a key consideration: it must be chosen such that, at any time, the tag is within the radio range of at least one BLE gateway. The generic RSSI vector received by the server is denoted as $\mathbf{r} = (r_1, r_2, \ldots, r_j, \ldots, r_N)$, where $r_j$ ($j = 1, \ldots, N$) is the RSSI value from the $j$-th BLE gateway and $N$ is the number of BLE gateways in the area of interest. However, RSSI-based schemes are generally susceptible to large signal fluctuations and fading caused by multipath propagation [22]. Hence, to cope with these imperfections of indoor signal propagation, advanced signal processing techniques and ML algorithms are needed.
This implementation process comprises an offline phase, in which the fingerprint database is constructed, and an online phase, in which a suitable ML positioning algorithm is employed to localize the BLE tags within the area under consideration.
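The two phases can be illustrated with a minimal Python sketch. It assumes hypothetical gateway identifiers (`GW1` to `GW9`) and uses a 1-nearest-neighbor lookup as a stand-in for "a suitable ML positioning algorithm"; it is not the commercial system's implementation. The offline phase stores labeled RSSI vectors in a fixed gateway order, and the online phase returns the label of the closest stored fingerprint. The fill value for gateways without a reading is a parameter that defaults to 0, mirroring the zero shown for out-of-range gateways in the tracking application described later; because 0 lies above any real dBm reading, a value below receiver sensitivity (for example -100 dBm) may be preferable in practice.

```python
import math

# Hypothetical identifiers for the N = 9 gateways; the order fixes r_1, ..., r_N.
GATEWAYS = [f"GW{i}" for i in range(1, 10)]

def rssi_vector(readings, gateways=GATEWAYS, missing=0):
    """Order per-gateway readings {gateway_id: dBm} into r = (r_1, ..., r_N)."""
    return [readings.get(g, missing) for g in gateways]

class FingerprintDB:
    """Offline phase: store labeled RSSI vectors. Online phase: nearest-fingerprint lookup."""

    def __init__(self):
        self.samples = []  # list of (rssi_vector, sub-zone label)

    def add(self, readings, label):
        self.samples.append((rssi_vector(readings), label))

    def locate(self, readings):
        r = rssi_vector(readings)
        # 1-NN in Euclidean RSSI space stands in for the trained ML classifier
        return min(self.samples, key=lambda s: math.dist(r, s[0]))[1]

db = FingerprintDB()
db.add({"GW1": -60, "GW2": -80}, "Z11")  # offline fingerprints (made-up values)
db.add({"GW2": -58, "GW3": -70}, "Z21")
estimate = db.locate({"GW1": -62, "GW2": -79})  # online query
```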

Fingerprinting (offline phase)
A set of nine BLE gateways, like the one shown in Fig. 1-(a), was installed in spatially diverse rooms. Their positions within each room were chosen to give the designated BLE gateway the best visibility within that room and to achieve the expected spatial path diversity. The nine zones and the rooms in which BLE gateways were installed are depicted in Fig. 2, where black dots mark the gateway positions and the zones are labeled Z1 through Z9. Note that zone Z9, which represents the hallway, does not have a designated BLE gateway as the other zones do.
In this offline phase, fingerprinting data were collected for a selected BLE tag, shown in Fig. 1-(b), consisting of the RSSI measurements received from this tag at each of the BLE gateways. Measurements were collected on a Saturday, when most company employees were absent. This is not necessarily ideal, since a nearly empty office may not reflect typical operating conditions, but it was done to minimize disruption to company employees. Data were collected so as to give an almost equal amount of data per unit area. Therefore, in each zone (Z1 through Z9), the BLE tag was placed for 60 seconds at each position of a grid with 40 cm spacing in the horizontal and vertical directions. As a result, the measurement grid in each zone forms a square tile pattern with 40 cm sides. Large zones (e.g., the hallway) were divided into smaller sub-zones so that each zone had a similar number of measurements. This was done to eliminate bias in the training stage, so that it would not influence the prediction accuracy. These subdivisions are illustrated in the building plan of Fig. 3. A fingerprinting software with a graphical user interface (GUI), shown in Fig. 4, was used to query the server every second for the RSSI reading of each of the nine BLE gateways. The measurement data (fingerprints) for each zone were saved in a separate file. Note that the server takes 15 seconds on average to update the RSSI reading of a given gateway.
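The grid arithmetic in the procedure above can be made concrete with a short sketch. It assumes a hypothetical rectangular 4.0 m x 3.2 m room and places measurement points at the centres of 40 cm tiles (the text does not state whether points sit at tile corners or centres). The dwell time, query period, and average server update interval are the values stated above; their ratio shows that each 60-second dwell yields 60 queried rows but only about four fresh readings per gateway.

```python
def grid_positions(width_cm, depth_cm, step_cm=40):
    """Centres of step_cm x step_cm tiles covering a width_cm x depth_cm zone."""
    xs = range(step_cm // 2, width_cm, step_cm)
    ys = range(step_cm // 2, depth_cm, step_cm)
    return [(x, y) for y in ys for x in xs]

DWELL_S = 60        # tag dwell time per position (stated in the text)
QUERY_PERIOD_S = 1  # GUI query period (stated in the text)
UPDATE_S = 15       # average server update interval per gateway (stated in the text)

points = grid_positions(400, 320)  # hypothetical 4.0 m x 3.2 m room
rows_per_point = DWELL_S // QUERY_PERIOD_S  # queried rows per position
fresh_per_point = DWELL_S // UPDATE_S       # approximate new readings per gateway
total_minutes = len(points) * DWELL_S / 60  # dwell time for the whole zone
```

For this hypothetical room, the 10 x 8 grid gives 80 positions and 80 minutes of dwell time.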

Dataset creation
The measurement data were stored for each sub-zone individually. There were approximately 1100 data points per sub-zone, distributed across 21 separate CSV files. Time-domain averaging (filtering) was performed on the data points of each sub-zone to suppress radio signal fluctuations due to fast fading [22]. A 5-second window was determined empirically to yield acceptable performance for this particular location, considering that the server takes about 15 seconds on average to update the BLE gateway RSSI measurements.
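A minimal sketch of the 5-second time-domain averaging, assuming the 1 Hz queried rows are averaged in non-overlapping 5-sample blocks (the text does not specify whether the window slides). If out-of-range gateways were recorded as 0, averaging them together with real dBm values would bias the result, so a real pipeline would need an explicit policy for such entries.

```python
from statistics import fmean

def window_average(rows, window=5):
    """Average 1 Hz RSSI rows over non-overlapping blocks of `window` samples.

    rows -- list of equal-length RSSI vectors (one per second)
    Returns one averaged vector per complete block; a trailing partial block is dropped.
    """
    averaged = []
    for start in range(0, len(rows) - window + 1, window):
        block = rows[start:start + window]
        averaged.append([fmean(column) for column in zip(*block)])
    return averaged

rows = [[-60, -70], [-62, -72], [-64, -74], [-66, -76], [-68, -78], [-50, -50]]
smoothed = window_average(rows)  # the sixth row starts an incomplete block and is dropped
```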
Subsequently, the resulting data for each sub-zone were labeled with the sub-zone's alphanumeric label, as shown in Fig. 3, to create one class per sub-zone. The filtered and labeled data from the 21 files were then combined into a single fingerprint dataset file, which was used for training and for the initial ML algorithm performance comparisons.
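The labeling and merging step can be sketched with Python's standard `csv` module. The function name is hypothetical, and the sketch assumes each filtered sub-zone file holds headerless rows of RSSI values; in-memory streams stand in for the 21 files. With real files, each source would be opened with `open(path, newline="")`.

```python
import csv
import io

def combine_subzones(sources, out):
    """Append each sub-zone's label to its rows and write one combined CSV.

    sources -- dict: sub-zone label (e.g. "Z11") -> readable text stream of filtered rows
    out     -- writable text stream for the combined fingerprint dataset
    Returns the number of rows written.
    """
    writer = csv.writer(out)
    n_rows = 0
    for label, stream in sources.items():
        for row in csv.reader(stream):
            writer.writerow(row + [label])  # the label becomes the class column
            n_rows += 1
    return n_rows

# Two made-up sub-zone files with two gateways each
sources = {"Z11": io.StringIO("-60,-71\n-61,-70\n"), "Z12": io.StringIO("-75,-58\n")}
combined = io.StringIO()
n = combine_subzones(sources, combined)
```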

ML algorithm training and evaluation
The fingerprint dataset was first imported into the Classification Learner application of the MATLAB® software package [24] and used to evaluate the performance of our BLE IP system. We started by evaluating two basic ML algorithms, namely the generic kNN [25] and classification tree [26] algorithms, which are commonly employed in IP applications. Our goal was to obtain baseline performance benchmarks using 5-fold cross-validation. Fig. 5 illustrates this step and the obtained results. As presented in Table 1, the basic kNN classifier achieved an accuracy of 78.3%, and the basic classification tree achieved an accuracy of 80.3%. This relatively low performance was attributed to the hallway sub-zones, where no BLE gateways were installed. Here, accuracy is the percentage of correctly classified sub-zones (locations) out of the total number of observations included in the classification process.
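The evaluation protocol can be illustrated with a minimal, dependency-free sketch of k-fold cross-validation for a kNN classifier, using the accuracy definition above (correctly classified observations over all observations). This is not MATLAB's Classification Learner implementation; the shuffling, the fold assignment, and the synthetic two-sub-zone data are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training fingerprints closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda s: math.dist(x, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def kfold_accuracy(data, k_neighbors=1, folds=5, seed=0):
    """Pooled k-fold cross-validation accuracy = correctly classified / all observations.

    data -- list of (rssi_vector, sub-zone label)
    """
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    correct = 0
    for f in range(folds):
        held_out = set(order[f::folds])  # every folds-th shuffled index forms one fold
        train = [data[i] for i in order if i not in held_out]
        for i in held_out:
            x, label = data[i]
            correct += knn_predict(train, x, k_neighbors) == label
    return correct / len(data)

# Synthetic, well-separated fingerprints for two sub-zones (made-up values)
data = ([([-60 - i % 3, -80], "Z1") for i in range(10)]
        + [([-80, -60 - i % 3], "Z2") for i in range(10)])
acc = kfold_accuracy(data)
```

Every observation is predicted exactly once while held out, so the pooled ratio matches the accuracy definition in the text.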

Investigation of advanced ML algorithms
A collection of ML classification algorithms was evaluated to identify the best performers for this particular IP application (i.e., fingerprinting-based IP). Various classification algorithms were used to predict the location of the BLE tag (beacon). Among the investigated algorithms, those with the best accuracy were the ensemble learners (Subspace kNN [27] and Bagged Trees [28]), Weighted kNN [29], Fine kNN, Cubic SVM, and Fine Gaussian SVM [30]. These results were obtained and verified using MATLAB, as stated earlier, and are presented in Table 2.
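For readers unfamiliar with the Subspace kNN ensemble, the random-subspace idea can be sketched as follows: each learner is a kNN classifier restricted to a random subset of the gateway features, and the ensemble output is a majority vote. The number of learners, the subspace dimension, and the data below are illustrative assumptions, not the settings used in MATLAB to produce Table 2.

```python
import math
import random
from collections import Counter

def subspace_knn_predict(train, x, n_learners=30, subspace_dim=4, k=1, seed=0):
    """Random-subspace ensemble of kNN learners combined by majority vote.

    train -- list of (rssi_vector, label); x -- query RSSI vector of the same length
    Each learner sees only `subspace_dim` randomly chosen gateway features.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_learners):
        feats = rng.sample(range(len(x)), subspace_dim)
        xq = [x[j] for j in feats]
        # kNN restricted to the chosen features
        nearest = sorted(train, key=lambda s: math.dist(xq, [s[0][j] for j in feats]))[:k]
        votes[Counter(label for _, label in nearest).most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0]

# Made-up nine-gateway fingerprints for two sub-zones
train = ([([-60 + d] * 9, "Z1") for d in (0, 1, 2)]
         + [([-80 + d] * 9, "Z2") for d in (0, 1, 2)])
prediction = subspace_knn_predict(train, [-61] * 9)
```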

ML algorithm online performance investigation
To test the real-life performance of the top performers identified in the previous section (2.4), the same BLE tag that was used to collect the fingerprinting data was placed at arbitrarily selected locations, and the RSSI fingerprints at the nine gateways were then retrieved from the server. Fig. 6 plots the RSSI recorded over time at one of the gateways for the stationary BLE tag, demonstrating the random fluctuations of the RSSI measurements over time. For completeness, a standard spectrum of the BLE tag (beacon) is shown in Fig. 7, overlaid on the spectrum of WiFi channel 1 in the 2.4 GHz ISM band [31]. The newly collected location fingerprints were labeled with the corresponding sub-zone (e.g., Z11) and fed to each of the trained ML classifiers from the previous section to independently predict the corresponding sub-zone. The obtained results are illustrated in Fig. 8 for each of the top six ML algorithms presented in Section 2.4. The figure depicts histograms of zone predictions when the BLE tag was placed in zone Z11. It can be seen that the online accuracy falls well short of that obtained with offline data.

Fig. 7. Spectrum for Wi-Fi Channel 1 versus BLE Beacon Advertisement Channels [31].
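The histograms of Fig. 8 amount to counting, for each classifier, how often each sub-zone was predicted while the tag stayed at a known location. A minimal sketch, with made-up predictions for a tag placed in Z11; the function name is hypothetical and the algorithm keys are borrowed from the abbreviations used later for the real-time application:

```python
from collections import Counter

def online_summary(predictions_by_algo, true_label):
    """Per-classifier histogram of predicted sub-zones and online hit rate."""
    summary = {}
    for algo, preds in predictions_by_algo.items():
        counts = Counter(preds)  # histogram of predicted sub-zones
        summary[algo] = (counts, counts[true_label] / len(preds))
    return summary

# Hypothetical online predictions while the tag sat in sub-zone Z11
predictions = {
    "SSKNN": ["Z11", "Z11", "Z12", "Z11", "Z21"],
    "BagdTrz": ["Z12", "Z11", "Z12", "Z12", "Z11"],
}
summary = online_summary(predictions, "Z11")
```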

Effects of reducing the number of deployed BLE gateways
We investigated the possibility of using fewer BLE gateways in the IP system without significantly sacrificing positioning accuracy. To this end, we empirically tested the gradual, systematic elimination of BLE gateways while observing the resulting prediction accuracy in each scenario.
Scenario A: The ML algorithms were tested on the original offline fingerprints, but with the RSSI contribution (feature) of BLE gateway #5 excluded from the dataset. The resulting prediction accuracy showed no degradation. This was then validated using online data, which produced similar results.

iJOE - Vol. 16, No. 8, 2020

Scenario B: Here, only four gateways were kept, one corner and three center gateways (gateways 1, 3, 7, and 9). As illustrated in Fig. 9, the accuracy of the kNN family of algorithms degraded significantly. This was especially evident for the subspace kNN, which had previously shown relatively superior performance. On the other hand, the accuracy of the other classifiers changed only slightly relative to their online prediction accuracy with all gateways.
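Both scenarios amount to re-running the same classifiers on a reduced set of RSSI features. A minimal sketch of the feature selection, with gateways numbered 1 to 9 as in the text; the scenario keys and function names are hypothetical. The removed fraction makes the extent of Scenario B explicit: five of nine gateways, about 56%.

```python
ALL_GATEWAYS = list(range(1, 10))  # gateways numbered 1..9 as in the text

SCENARIOS = {
    "all": ALL_GATEWAYS,
    "A": [g for g in ALL_GATEWAYS if g != 5],  # gateway #5 excluded
    "B": [1, 3, 7, 9],                          # only four gateways kept
}

def select_gateways(vectors, keep):
    """Keep only the RSSI features of the listed 1-based gateway numbers."""
    cols = [g - 1 for g in keep]
    return [[v[c] for c in cols] for v in vectors]

def removed_fraction(keep, total=len(ALL_GATEWAYS)):
    """Share of the installed gateways that a scenario removes."""
    return 1 - len(keep) / total

sample = [[-60, -71, -80, -66, -74, -90, -85, -77, -69]]  # one made-up RSSI vector
reduced_b = select_gateways(sample, SCENARIOS["B"])
```

Each reduced dataset would then be passed through the same training and cross-validation steps used for the full feature set.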

Evaluation of the ML algorithms in online real-time tracking
We aimed to evaluate and demonstrate the online real-time performance (on-the-fly predictions) of the selected trained ML algorithms, replicating an actual deployment in a commercial software application. For this purpose, a proprietary application was developed in MATLAB®. It obtains the most recent RSSI measurements of the IP system's BLE gateways from the server in real time and feeds this set of RSSI values as features to a collection of previously trained ML algorithms of primary interest to obtain location predictions (sub-zone labels). The application interface is shown in Fig. 10, displaying the measured RSSI values (dBm) at the nine gateways together with the predicted location (sub-zone label, e.g., Z913). Note that a zero (0) indicates a very weak (out-of-range) signal at the corresponding BLE gateway. Each algorithm independently outputs its best estimate of the location sub-zone. The trained ML algorithms interfaced in this application are: basic kNN with 10 nearest neighbors (10-NN), Weighted kNN (WKNN), Fine kNN (FKNN), Subspace kNN (SSKNN), 10-NN with voting (10-NN-V), and Bagged Trees (BagdTrz). Given the superior performance of the kNN-family variants in this application, they were selected for the real-time application and compared with the second-best performer (Bagged Trees). Note the incorrectly reported location (Z912) by the Bagged Trees classifier (BagdTrz).
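The polling loop behind such an application can be sketched in Python. This is not the proprietary MATLAB tool: `fetch_readings` is a hypothetical callable standing in for the server query, the one-second polling period is an assumption, and the classifiers are any callables that map an RSSI vector to a sub-zone label. Out-of-range gateways are filled with 0, matching the convention of the interface in Fig. 10.

```python
import time

def track(fetch_readings, classifiers, gateways, period_s=1.0, steps=3, sleep=time.sleep):
    """Poll the newest RSSI readings and collect each classifier's sub-zone estimate.

    fetch_readings -- hypothetical callable returning {gateway_id: rssi_dbm}
    classifiers    -- dict: short name (e.g. "WKNN") -> callable(vector) -> sub-zone label
    gateways       -- gateway ids in the same order used for training
    """
    history = []
    for _ in range(steps):
        readings = fetch_readings()
        # Gateways with no current reading are reported as 0 (out of range)
        vector = [readings.get(g, 0) for g in gateways]
        history.append({name: clf(vector) for name, clf in classifiers.items()})
        sleep(period_s)
    return history

# Usage with a stubbed server and a toy classifier (both hypothetical)
stub = iter([{"GW1": -60}, {"GW1": -80}, {"GW1": -65}])
toy = {"toy": lambda v: "Z11" if v[0] > -70 else "Z12"}
history = track(stub.__next__, toy, ["GW1", "GW2"], sleep=lambda s: None)
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a deployment would use the default `time.sleep` and a real server query.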

Discussion of Results
Most of the results were presented directly within the methodology section. Nevertheless, we revisit some of them here for comparison and discussion. As shown in Table 1, the basic classification tree algorithm had slightly better accuracy than the basic kNN algorithm, although neither reached a sufficient performance level. Table 1 also shows that, once trained, the classification tree algorithm has superior prediction speed. Table 2 shows that the subspace kNN classifier has the highest accuracy, with good prediction speed and intermediate training time. The bagged trees algorithm has somewhat lower accuracy than the subspace kNN, but double its prediction speed. The SVM family has much faster prediction speed, with accuracy close to that of the bagged trees algorithm.
An examination of the sub-zone prediction accuracy using the online data of Section 2.5 indicates that prediction accuracy was significantly reduced compared with the accuracy obtained in the offline phase of Section 2.4. This can be attributed to a form of overfitting in the application of the ML algorithms, although the employed 5-fold cross-validation helps reduce its effect. In addition, the resolution and quality of the measurement data acquired from the servers could not be verified. Moreover, fluctuations in the recorded RSSI data undermined the repeatability and uniqueness of the fingerprints at the same location. Removing one of the gateways had no significant effect on system accuracy. The results of Scenario B, in which BLE gateways were systematically eliminated as described in Section 2.6, are listed in Table 4. Keeping only four of the original nine BLE gateways had contradictory effects on the six investigated ML algorithms in the online mode of operation. Some suffered a minor reduction in accuracy, while others, such as WKNN, suffered major degradation. Conversely, an increase in accuracy was observed for the group of algorithms that had less favorable accuracy in the online mode. Finally, the real-time investigation of selected algorithms, which allowed movement and on-the-fly reporting of the predicted location (sub-zone label), demonstrated the viability of the employed ML algorithms. It also revealed that the most error-prone areas lie near sub-zone borders and in zones without installed BLE gateways.

Conclusion
This paper focused on evaluating the performance of ML classification algorithms in a tag-based BLE IP system to identify optimal performers for a real-life practical deployment. Data collection campaigns were conducted at a commercial office facility fitted with a number of BLE gateways, which allowed the successful aggregation of RSSI fingerprints of the BLE tag. These fingerprints were used in a comparative analysis of a collection of ML algorithms in both offline and online position-tracking experiments. Among the investigated algorithms, those with the best accuracy were the ensemble classification learners (Subspace kNN and Bagged Trees), WKNN, Fine kNN, Cubic SVM, and Fine Gaussian SVM. The subspace kNN had the best overall accuracy. Online phase results indicated that prediction accuracy degraded compared with the offline phase. This degradation was attributed to a form of overfitting in the application of the ML algorithms, which the 5-fold cross-validation helped mitigate. Systematically removing five of the nine installed BLE gateways (about 56%) produced conflicting results: the accuracy of some of the tested ML algorithms increased, while it decreased for others, suggesting that the viability of this approach is both site-specific and algorithm-specific. The real-time implementation of selected algorithms, which allowed movement and on-the-fly reporting of the predicted location, demonstrated the viability of integrating the selected ML algorithms into commercial BLE IP solutions.

Fig. 2. An illustration of the nine zones and the BLE gateway locations.

Fig. 3. A floor plan illustrating all sub-zones for indoor positioning.

Fig. 4. The GUI of the utilized fingerprinting software.

Fig. 5. Evaluating the performance of the BLE IP system using the Classification Learner application.

Fig. 6. Random fluctuations of the observed RSSI versus time at a BLE gateway.

Fig. 10. Application software interface for online real-time tracking.

Table 1. Performance of the Basic kNN and Classification Tree ML Algorithms.

Table 2. Performance Summary of Advanced ML Algorithms.

Table 3. Performance Summary of Advanced ML Algorithms in the Online Phase.

Table 4. Results of Scenario B with Only Four BLE Gateways.