Application Programming Interface for Flood Forecasting from Geospatial Big Data and Crowdsourcing Data

— Natural disasters are becoming more frequent and more severe, affecting the lives and property of great numbers of people. One such disaster, occurring almost every year in every region of the world, is flooding. A preparatory measure for coping with upcoming floods is flood forecasting for each particular area, so that the acquired data can be used for monitoring and for warning the public and involved personnel, thereby reducing damage. With advances in computer and remote sensing technology, large amounts of applicable data from various sources are now available for flood forecasting. Current flood forecasting is performed computationally by different techniques; the most popular is machine learning, whose main limitation is the need to acquire large amounts of big data. The approach currently in use still requires manpower to download and record data, causing delays and failures in real-time flood forecasting. This research therefore proposed the development of a system that automatically downloads big data from various sources through an application programming interface (API) for flood forecasting by machine learning. Four techniques were compared: maximum likelihood classification (MLC), fuzzy logic, self-organizing map (SOM), and an artificial neural network with an RBF kernel. According to the accuracy assessment of flood forecasting, the most accurate technique was MLC (99.2%), followed by fuzzy logic (97.8%), SOM (96.6%), and RBF (83.3%), respectively.


Introduction
Floods are a kind of natural disaster that happens almost every year in every region of the world, with effects over wide areas [1]. In the past, it was not possible to predict or forecast floods accurately, e.g., in which areas and at what time floods would happen, owing to a shortage of data or insufficient data for flood forecasting [2]. Technological limitations were also a factor, e.g., insufficient computer capacity for processing big data, and the factors suitable for forecasting differed from area to area due to different spatial contexts. Previous studies further revealed that using factors suited to the spatial context could increase forecasting accuracy. This research therefore relied on machine learning techniques for flood forecasting. The techniques widely used for such forecasting include decision trees (DT) [21][22][23], ANN [15][16][17], MLC [24][25][26], fuzzy logic [18][19][20], and SOM [27][28][29]. These techniques achieve high accuracy, yet big data has not been used much.
According to the literature review of big data use for flood forecasting and flood management, one study collected data from rainfall and water-level measurement devices based on the Internet of Things (IoT) [41]. The measured values included rainfall, water level, and water flow. In addition, meteorological data was collected from a government website, and weather data from the NOAA satellite was collected for flood forecasting. The forecasting technique used was K-means with Holt-Winters' method, an unsupervised learning technique. For error measurement, the errors of the forecasted water level were measured on different dates over one, three, and six months by MAPE; the one-month measurement showed a MAPE of 1.39%, while the longer horizons showed higher errors. Nonetheless, this error measurement assessed flood errors at a single point only, not across different areas (only one area was tested in that research). The highlight of that research was the use of big data for forecasting, resulting in high accuracy and low error. Its limitation was that the big data was still acquired by traditional methods: no API was developed for automatic downloading, and no crowdsourced data was used. Another line of research used big data for flood risk analytics and assessment [42][43]. The data included crowdsourced data acquired from Twitter, rainfall sensors, and NASA. An API was developed to download and display the data on a website, where users could see the water level, rainfall, and soil moisture in each particular area. Still, that research did not describe methods or techniques for flood risk assessment; it only supported users in flood monitoring through the data presented on the website. Its highlight was API development for automatic downloading.
Its limitation was that the results presented to users did not come from forecasting whether or not floods would happen in each area; users had to rely on their own experience to analyze or assess flood possibilities in those areas.
However, most research papers relying on machine learning for flood forecasting still use data collected by manpower or involved agencies in the traditional way, i.e., files recorded or downloaded manually, without automatic downloading. That is why they fail to achieve real-time flood forecasting. It also raises implementation cost, as more labor must be hired, and causes implementation delays. Fortunately, software development technology today makes such manpower unnecessary, since an API can be developed to replace manual work. An API is a connecting channel between one website and another, between a user and a server, or from server to server. It acts like a computer language that lets computers communicate and exchange data freely with one another. It enables access to, and downloading of, data from websites that provide API services. The scope of access to data services can be restricted automatically, with no need for manpower to record data for future use, and data can be updated or downloaded in real time. With such capabilities, the number of providers offering API services has grown. In the hydrology literature, APIs have been developed to assess building damage caused by floods: users set hypothetical flood levels for damage assessment in order to see whether those levels would affect their buildings, and how many buildings would be affected. One API, developed in Python [44], offers users high-resolution elevation data for flood modelling [45]. So far, however, APIs have not been used by involved agencies to download the factors influencing floods in the form of big data for flood forecasting. From the studies of related works, the advantages and disadvantages of the different flood forecasting methods can be summarized as demonstrated in Table 1.
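As a minimal sketch of the automatic-downloading idea described above (written in Python rather than the PHP used in this research, and with a hypothetical endpoint and parameter names), replacing manual data collection amounts to composing a request URL for a data-service API and fetching it on a schedule:

```python
from urllib.parse import urlencode
from urllib.request import urlopen  # used by download(); not exercised here


def build_request_url(base_url: str, params: dict) -> str:
    """Compose a GET request URL for a data-service API."""
    return f"{base_url}?{urlencode(params)}"


def download(base_url: str, params: dict) -> bytes:
    """Fetch the resource; a real system would run this on a timer or cron job."""
    with urlopen(build_request_url(base_url, params)) as resp:
        return resp.read()


# Hypothetical endpoint and parameter names, for illustration only.
url = build_request_url(
    "https://example.org/api/precipitation",
    {"date": "2019-01-03", "format": "geotiff"},
)
print(url)
```

Scheduling the `download` call (e.g., once per forecast period) is what removes the manual recording step and enables real-time use of the data.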

Study areas
The study areas in this research were in the southern part of Thailand: two provinces with regular floods, namely Surat Thani Province and Nakhon Si Thammarat Province. According to surveys of repeatedly flooded areas over the past 10 years by the Department of Disaster Prevention and Mitigation (DDPM) and the Geo-Informatics and Space Technology Development Agency (Public Organization) (GISTDA), there are areas with regular repeating floods (8-10 times in 10 years), areas with frequent floods (4-7 times in 10 years), and areas with occasional floods (fewer than 3 times in 10 years). Nevertheless, the flood forecasting methods in this research can also be applied in other areas.
This research used data from various sources with big data storage. A key quality of the big data in this research was its high volume: the data was large in amount and available offline or online. This research used online data through API downloading for real-time flood forecasting. The data also had high variety; in other words, it was diverse and either structured or unstructured. Furthermore, the data used for flood forecasting had high velocity: it changed rapidly and continuously, with transfer in real time. This imposed limitations on manual data analysis. For example, rainfall changes over time, so if traditional methods of data transfer had been used, it would not have been possible to use measured or forecasted rainfall for flood forecasting in time; the delays would have made the data unusable. Hence, this research proposed the development of data downloading in the form of an API. The data was divided into 3 main parts. Part 1 was meteorological and hydrological data, automatically downloaded from GLOFAS by API, including accumulated precipitation; probability forecasting, i.e., the probability of precipitation at different levels (50 mm, 150 mm, and 300 mm) in each area; flood hazard 10-year return period; 5-year return period exceedance; and rainfall forecasting from the big data of TMD. The API downloaded data automatically for the particular periods to be forecasted. Part 2 was geospatial data, downloaded from the databases of involved agencies through a Web Feature Service (WFS), including height above sea level, slope, land use and land cover, 10-year repeating floods, and flow direction. Part 3 was crowdsourced or volunteer data: a program written to acquire data from users in a certain area so that they could report the real situation in each area.
This increases flood forecasting accuracy, because it is real-time data. Users can send data via mobile devices such as smartphones and tablets. The conceptual framework of this research is demonstrated in Figure 2; each part is described below under the development of the data downloading system in the form of an API, flood forecasting using machine learning techniques, and accuracy assessment.

The development of application programming interface for flood forecasting
The development of the data downloading system in the form of an API in this research was divided into 3 main parts. Part 1 was meteorological and hydrological data, automatically downloaded from GLOFAS and TMD. Downloading from GLOFAS covered 5 factors: accumulated precipitation; probability forecasting, i.e., the probability of precipitation at different levels (50 mm, 150 mm, and 300 mm) in each area; flood hazard 10-year return period; 5-year return period exceedance; and rainfall forecasting. PHP was used for the development; the program code is demonstrated in Figure 3. The code for downloading the big data of TMD is demonstrated in Figure 4. Part 2 was downloaded geospatial data, covering 5 factors: height above sea level, slope, land use and land cover, 10-year repeating floods, and flow direction. The development of this part relied on WFS because most of the data was stored as Shapefiles. This differed from Part 1, whose data was stored as images, for which the API used a Web Map Service (WMS) instead. Part 3 was crowdsourced or volunteer data, covering 3 factors: rainfall duration, rainfall intensity level, and drainage ability problems. PHP, Google Maps, and an API were used for the development of this part. The system downloaded data from these 3 developed parts, and the data of all 13 factors was downloaded for flood forecasting; the details of each factor are demonstrated in Figure 5. The data of each factor was divided into multiple classes based on its qualities before being used for forecasting with the machine learning techniques in the next step.
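The WFS access described for Part 2 can be sketched as a standard OGC WFS 2.0 GetFeature request. This is a generic Python sketch (the paper's implementation is in PHP), and the server URL and layer name below are hypothetical:

```python
from urllib.parse import urlencode


def wfs_getfeature_url(base_url: str, type_name: str,
                       out_format: str = "application/json") -> str:
    """Build an OGC WFS 2.0 GetFeature request URL for a vector layer."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,      # e.g. a land-use or slope layer
        "outputFormat": out_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer name, for illustration only.
url = wfs_getfeature_url("https://example.org/geoserver/wfs", "flood:land_use")
print(url)
```

Raster factors (as in Part 1) would instead be requested through a WMS GetMap request with the same query-string pattern.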

Flood forecasting using machine learning techniques
At the flood forecasting step, the machine learning techniques compared to determine forecasting efficiency in this research were MLC, fuzzy logic, SOM, and RBF. The input factors were divided into 3 main parts: Part 1, meteorological and hydrological data (5 factors); Part 2, geospatial data (5 factors); and Part 3, crowdsourced or volunteer data acquired from users (3 factors). In total there were 13 input factors for training and testing, as stated above under "The Development of Application Programming Interface for Flood Forecasting." The forecasting results comprised 4 classes: non flood, low floods (below 20 cm), moderate floods (between 20-49 cm), and heavy floods (50 cm or over). The results were then compared with the actual data from involved agencies, and a field survey was also conducted for data collection. The study areas were in Surat Thani Province and Nakhon Si Thammarat Province.
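The four-class discretization above can be sketched as a simple threshold function (a hypothetical helper, using the depth boundaries stated in this section):

```python
def flood_class(depth_cm: float) -> str:
    """Map a forecast water depth (cm) to the study's four flood classes."""
    if depth_cm <= 0:
        return "non flood"
    if depth_cm < 20:
        return "low"       # below 20 cm
    if depth_cm < 50:
        return "moderate"  # 20-49 cm
    return "heavy"         # 50 cm or over
```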
Flood forecasting in this research involved the development of a forecasting algorithm, as demonstrated in Table 2. The input data comprised the 13 factors, and the output was a flood forecast at the current coordinates of the user. The first step was to read the current GPS coordinates from the mobile device. Next, TMD rainfall was examined: if the rainfall at those coordinates was below 40 mm and the accumulated precipitation value was below 4, the forecast for that area was "Non Flood." If not, the values of the other factors were downloaded and used for forecasting by machine learning to produce the forecast at the user's location. The criteria for TMD rainfall and accumulated precipitation were derived from flood statistics in the study areas over the past 10 years, which showed that the mean rainfall causing floods in each area was never below 40 mm/day. This research therefore applied these statistical values in the algorithm to screen out non-flood areas first, making forecasting faster.
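The screening step described above can be sketched as follows. The thresholds (40 mm/day rainfall, accumulated-precipitation value 4) come from the paper; the function names, the stubbed factor download, and the dummy model are hypothetical:

```python
def download_remaining_factors(lat: float, lon: float) -> dict:
    """Stub for the API calls that would fetch the other factors (hypothetical)."""
    return {"lat": lat, "lon": lon}


class DummyModel:
    """Stand-in for a trained classifier (MLC, fuzzy logic, SOM, or RBF)."""
    def predict(self, factors: dict) -> str:
        return "Low Flood"


def forecast_at(lat: float, lon: float, tmd_rainfall_mm: float,
                accum_precip: float, model) -> str:
    # Pre-screening rule from the paper's algorithm (Table 2): areas below
    # both thresholds are classed "Non Flood" without invoking the model.
    if tmd_rainfall_mm < 40 and accum_precip < 4:
        return "Non Flood"
    # Otherwise download the remaining factors and run the ML model.
    return model.predict(download_remaining_factors(lat, lon))
```

The pre-screen avoids downloading the full set of factors and running the classifier in areas where the 10-year statistics make flooding implausible, which is what speeds up forecasting.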

Accuracy assessment
This research relied on accuracy assessment, in which the proportion of correctly classified instances was calculated. The data compared with the flood forecasts of MLC, fuzzy logic, SOM, and RBF was data from past floods, acquired from field surveys by the researchers and from DDPM. The equation for accuracy calculation is demonstrated in Equation (1). The data used for accuracy assessment comprised 1,000 points in the study areas, acquired by an automatic point randomization process. In addition, this research conducted forecasting error assessment using the mean absolute percentage error (MAPE), calculated as demonstrated in Equation (2).

Correctly Classified Instances = (True Forecasts) / (Total Forecasts)    (1)

MAPE = (100 / n) * Σ_{t=1..n} |(A_t − F_t) / A_t|    (2)

where n is the size of the sample, F_t is the value forecasted by the model for time point t, and A_t is the value observed at time point t.
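Both measures can be computed directly from paired actual and forecasted values; a minimal sketch (with hypothetical function names):

```python
def accuracy(actual_classes, forecast_classes):
    """Equation (1): fraction of correctly classified instances."""
    correct = sum(a == f for a, f in zip(actual_classes, forecast_classes))
    return correct / len(actual_classes)


def mape(actual, forecast):
    """Equation (2): mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))
```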

Result and Discussion
The results of this research were divided into 2 parts: the results of API development for flood forecasting, and the results of flood forecasting itself. The first part refers to the system developed to download data from GLOFAS and TMD Big Data, as demonstrated in Figures 6-8. In Figure 6, accumulated precipitation between 3-6 January 2019 was downloaded. The accumulated precipitation differed each day and is represented by pixel colors showing its value in each particular area. Periods can be chosen for data use, from 1 January 2011 to the present date, so besides flood forecasting, accumulated precipitation can also be used for analysis in other respects. Figure 7 demonstrates the results of downloading the probability forecast, i.e., the probability of precipitation at 150 mm. The data was downloaded from GLOFAS in raster form, which supports analysis with GIS, remote sensing, and image processing programs. Figure 8 shows the forecasted rainfall downloaded from TMD. Rainfall is represented by pins of different colors: yellow pins refer to rainfall below 40 mm, orange pins to rainfall between 41-80 mm, and red pins to rainfall over 81 mm. The rainfall display can be chosen by exact date. The second part, the results of flood forecasting, is demonstrated in Figures 9-10. Figure 9 shows the results for Surat Thani Province from each method in Figures 9(a)-9(d), by MLC, fuzzy logic, SOM, and RBF, respectively, demonstrated per pixel. Forecasts are represented by different colors in 4 classes.
To illustrate, non flood is represented by green; low floods (water levels below 20 cm) by yellow; moderate floods (water levels between 20-49 cm) by orange; and heavy floods (water levels of 50 cm or over) by red. According to forecasting analysis by visual inspection, MLC, fuzzy logic, and SOM produced similar results: most areas in Surat Thani Province showed non flood, most flooded areas showed low floods, and the areas with moderate and heavy floods were of similar sizes. Forecasting by these 3 techniques differed from that by RBF, which showed more flooded areas than the other techniques; the additional areas, in Khiri Rat Nikhom District, showed heavy floods. The results for Nakhon Si Thammarat Province in Figures 10(a)-10(d) conformed with those of Surat Thani Province: MLC, fuzzy logic, and SOM produced similar results, with most areas showing non flood, whereas RBF produced different results from the other techniques. Overall, assessed by visual inspection, the forecasts produced by MLC, fuzzy logic, and SOM in both provinces were similar, while those by RBF differed. Nevertheless, for accuracy assessment and comparison of the techniques, 1,000 points in the study areas were randomized and compared with flood situations in the real areas; the results are demonstrated in Figures 11-13. Figures 11(a)-(d) show the flood forecasts for Surat Thani Province by MLC, fuzzy logic, SOM, and RBF, respectively.
Figures 12(a)-(d) show the flood forecasts for Nakhon Si Thammarat Province by MLC, fuzzy logic, SOM, and RBF, respectively. The results in both provinces are represented by points, each point being the coordinate of a particular location or area, with the forecast represented by the point's color in the same 4 classes: non flood in green; low floods (water levels below 20 cm) in yellow; moderate floods (water levels between 20-49 cm) in orange; and heavy floods (water levels of 50 cm or over) in red. The accuracy assessment of flood forecasting by MLC, fuzzy logic, SOM, and RBF is demonstrated in Figure 13, and the accuracy of the 4 techniques is demonstrated in Figure 14(a). The assessment results were in line with the visual inspection: the most accurate technique was MLC (99.2%), followed by fuzzy logic (97.8%), SOM (96.6%), and RBF (83.3%), respectively. In addition, this research conducted forecasting error assessment using MAPE; the results are demonstrated in Figure 14(b). The technique with the least error was MLC (2.13%), followed by fuzzy logic (2.31%), SOM (2.88%), and RBF (19.85%), respectively. When analyzing the errors from each technique against the standard interpretation criteria, a MAPE below 10 refers to highly accurate forecasting, between 10-20 to good forecasting, between 20-50 to reasonable forecasting, and over 50 to inaccurate forecasting. Thus, 3 techniques, i.e., MLC, fuzzy logic, and SOM, showed highly accurate forecasting with very low errors, while RBF was the only technique that showed good forecasting. This research was implemented with a survey of real flood data in each particular area.
A field survey was conducted and data was obtained from involved agencies. Crowdsourced data was also taken into account from flood notifications on thaiflood.org, which the researchers had developed for users to report flood situations in each area, both for flood management in the area and to verify the flood forecasts in this research. Examples of real flooded areas are demonstrated in Figure 15: points with red flags represent real heavy floods, in accordance with the results of the flood forecasting methods presented in this research; likewise, points with orange flags represent real moderate floods, and points with yellow flags represent real low floods.
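The standard MAPE interpretation bands cited in the results above can be expressed as a small lookup (a hypothetical helper; the behavior at exactly 10, 20, and 50 is not specified by the cited criteria and is an assumption here):

```python
def interpret_mape(mape_percent: float) -> str:
    """Map a MAPE value (%) to the standard interpretation bands."""
    if mape_percent < 10:
        return "highly accurate"
    if mape_percent < 20:
        return "good"
    if mape_percent <= 50:
        return "reasonable"
    return "inaccurate"
```

Applied to the reported values, MLC (2.13%), fuzzy logic (2.31%), and SOM (2.88%) fall in the highly accurate band, and RBF (19.85%) in the good band, matching the discussion above.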

Conclusion
This research developed an API for downloading big data automatically. The data comprised factors influencing floods and was downloaded from involved agencies to be used for flood forecasting in the areas of Surat Thani Province and Nakhon Si Thammarat Province, Thailand. Four machine learning techniques were used: MLC, fuzzy logic, SOM, and RBF. The flood forecasting results revealed that using big data with machine learning produced highly accurate forecasts, with MLC the most accurate technique. Additionally, the methods presented in this research made the flood forecasting process faster, with advantages over the traditional method with traditional data, and helped save costs in terms of the labor and time needed to collect or download data for forecasting.
Future work is the development of a flood forecasting system for real-time forecasting that presents results for each area in the form of a web application or a mobile application. The purposes are to allow users to follow the situation and get ready to cope with upcoming floods, and to provide supporting devices for flood warnings to the public and involved persons. As a consequence, flood management will be done more efficiently and damage will be reduced.