Optimization Research and Exploration of New Rural Cooperative Medical Information System

In its early days, the New Rural Cooperative Medical System (NRCMS) information system stored most of its data in a single relational database. That design is hard to scale, handles little concurrency, and is poorly suited to semi-structured and unstructured data. Distributed technologies such as Hadoop followed, with the advantage of storing and processing data on inexpensive machines. However, Hadoop was designed to distribute data equally among the nodes of a cluster for storage and processing, ignoring differences in the storage and computing capabilities of the nodes themselves, so the performance of an individual node can limit the performance of the whole system. Its master-slave processing model also leaves data open to tampering, and the system offers too few kinds of user terminal. This paper contributes an overall structure and technical route for the new rural cooperative medical system, together with a data-flow optimization method for heterogeneous nodes that effectively reduces cluster scale and cost. It uses blockchain technology to ensure the security and credibility of data, and it optimizes the front-end cluster service architecture to support multi-channel, high-concurrency push of key information. The approach has good potential for wider adoption.
Keywords: New rural cooperative medical system optimization, Hadoop, blockchain, load balancing


Introduction
The New Rural Cooperative Medical System [1][2] (hereinafter the new rural cooperative system) is organized, supported, and guided by the government, with farmers participating actively. It is a mutual-assistance scheme, jointly funded by individual farmers and the government, whose main purpose is to pool funds for the treatment of major diseases. To implement it, the Ministry of Health and other government departments issued a series of technical specifications for the new rural cooperative medical information system. These mainly require the health authorities of each province and city to plan the development of the system well and, in order to save construction costs and in light of local conditions, to establish and upgrade an information system that meets actual needs, so that the majority of farmers can benefit from the policy. When construction of the information system began, a single relational database was widely used to store data. As the volume of data in the system has grown and data types have multiplied, this storage scheme has become difficult to expand, software and hardware upgrades have become increasingly costly, and maintenance has become increasingly difficult. Semi-structured data such as medical examination records, and unstructured data such as images and video, are especially difficult to store and analyze.
As the scale and complexity of the data in the new rural cooperative information system continue to grow, the time required by traditional single-node data processing has become unacceptable. To improve processing efficiency, many researchers have studied parallel processing of big data. In the past, developing parallel programs required extensive knowledge of parallel computing and was extremely difficult, and parallel computing frameworks were designed for environments with large memory and high network bandwidth, such as supercomputers. In the past ten years, distributed frameworks such as Hadoop [3], shown in Figure 1, have made it easy for users to develop massively parallel programs. The new rural cooperative information system has also begun to adopt this distributed parallel processing technology to coordinate, organize, analyze, and process data on enrollment, outpatient visits, examinations, and medicines. However, the current system suffers from inappropriate cluster scale, high construction cost, medical data that is easy to tamper with, delayed push of data to end users, poorly targeted functionality, and poor concurrent performance. This article focuses on three optimizations:
1. Build a cluster of appropriate size and optimize the data flow among heterogeneous nodes, to save cluster scale and energy costs and to speed up MapReduce [4] execution.
2. Improve system security and data credibility by using blockchain technology at the upper layer, so that critical data of the new rural cooperative medical system is credible and cannot be tampered with, and the interests of farmers, the government, and the healthcare system are not infringed.
3. Optimize the concurrent performance of the system, improve the front-end cluster service architecture, expand the forms of terminal, and use data mining and related technologies to push role-specific data in time, improving the user experience of the new rural cooperative system.

Related Research
The new rural cooperative medical system has been upgraded from a traditional relational database to a Hadoop platform. Engineers and researchers have mainly explored three issues.
1. Data storage capacity and construction cost: optimizing Hadoop's underlying file storage system. Performance differs greatly, and is unstable, across different distributed scenarios, because the servers used in the new rural cooperative medical system differ markedly in performance. Research in this area mainly addresses the performance degradation of Hadoop on heterogeneous clusters whose nodes have different capabilities [5][6]. Because the throughput of each node may vary, the computing nodes cannot all process the same number of blocks in the same time. Idle nodes (nodes that have finished their work) therefore keep receiving the data of uncompleted tasks over the network, which causes network congestion and seriously degrades overall performance. In a heterogeneous environment there may be a mismatch between the computing power of a node and the number of blocks allocated to it, so data locality cannot be maintained. Researchers in China and abroad have studied this problem. For example, Xie et al. [7] proposed a data placement scheme based on the performance of each node. The scheme has two steps: an initial data placement, followed by a redistribution in which data is reassigned in proportion to a per-node performance metric based on normalized application response time. The redistribution is needed because the initial placement may break down after blocks are deleted or added.
Although their scheme alleviates part of the performance degradation, it cannot adequately assess the workload of each node in a heterogeneous environment, because it relies on a simple performance metric. Wang et al. [8] studied the various information collected by master and slave nodes to evaluate workloads in heterogeneous environments, focusing on storage performance. Qureshi et al. [9] proposed a storage-tier-aware data placement scheme and introduced a new performance metric, referred to as the "category calculated ratio", which accounts for differences in processor, memory, and storage capacity.
Guerrero et al. [10] proposed a migration-aware data placement scheme based on a genetic algorithm. Most existing data placement algorithms focus on complex performance evaluation [11], and their results cannot be applied consistently to all tasks. They are therefore limited in maintaining data locality [12][13], and block replication schemes are still needed. At the same time, existing file-based replication schemes incur heavy disk overhead. This paper proposes an improved HDFS data layout scheme that improves the overall performance of Hadoop in a heterogeneous environment. It reduces the scale and energy consumption of the new rural cooperative system cluster, makes full use of the cluster's capacity, speeds up its operation, and obtains the greatest benefit at the least cost [14][15].
2. Data security and credibility of the information system. The new rural cooperative system involves funds, original medical records, and other data that demand rigor. These data are the basis on which farmers' reimbursements are settled, they directly reflect national policies that benefit the people, and they underpin deeper data analysis. It is therefore essential that the data cannot be modified. Blockchain technology can be applied on top of the data storage layer to ensure that critical NRCMS data is trustworthy and cannot be tampered with. Much research exists in this area, and decentralized systems are receiving growing attention from researchers in distributed and parallel computing. Blockchain is still a relatively new field, studied by many researchers in China and abroad. The technology is expected to be a disruptive innovation for the Internet, and its characteristics include decentralization, tamper resistance, and traceability of information.
It can be widely used in product supply chains, securities trading, electronic banking, government affairs systems, medical management, and other fields. In the past two years the technology has received considerable attention in academia and industry, and research and development of blockchain application scenarios has progressed in many fields. For example, Xia Xinyue et al. [16] used blockchain-related technology to develop an equity asset system that makes the entire equity transaction process tamper-proof. Huang Yonggang et al. [17] proposed applying blockchain technology to residents' health records, to address the credibility of electronic health records. Cai Weide et al. [18] studied the data consistency and scalability of blockchains in depth and designed an architecture based on a permissioned chain, providing a direction for developing concrete blockchain application systems. Xue et al. [19] studied a blockchain-based model for secure, tamper-proof sharing of medical data, to address the challenge of secure and credible data sharing among medical institutions. Wang et al. [20] surveyed the prospects of blockchain technology in the medical field, with emphasis on specific application directions. Ni Peikun et al. [21] examined the value of blockchain technology in certain aspects of the medical field. They described where blockchain and medical applications can be combined, how to build an intelligent medical assistance platform, and how to build a unified, open shared data center for health care resources, and they further analyzed the difficulties encountered in applying blockchain technology in the medical field.
In summary, blockchain technology mainly addresses the problem of data credibility in different fields, and it enables transactions across time and space that are faster and more convenient.

Blockchain technology is not limited to the economic and financial fields. It can be applied in any field that requires transactions to be authentic, tamper-proof, traceable, safe, and reliable.
Although decentralized distributed systems have gradually begun to be applied in other industries, a search of the relevant literature finds few reports so far on an overall decentralized distributed architecture and design for the new rural cooperative system. If we learn from the successful experience of this type of architecture design in e-commerce, draw together the scattered applications of blockchain technology in the medical field, and integrate them into a comprehensive new rural cooperative information system, the system can significantly reduce the cost of expanding software and hardware, enhance security and credibility, and also push information intelligently to policy makers, health workers, and farmers, improving the quality of medical services for farmers in the region.
3. High concurrency, system friendliness, and mining and active push of key information. The front end uses the Web, apps, WeChat Mini Programs, WeChat Official Accounts, and other channels to connect to the new rural cooperative medical information system. Concurrent performance is improved by optimizing data storage and the related Web service functions and by adopting load balancing strategies. At the same time, data mining technology is used to capture and push key information, which helps establish an intelligent platform. By improving traditional data mining algorithms, different kinds of data can be mined and an intelligent information push platform for different audiences can be established. Users can obtain relevant information through WeChat, SMS, email, and other platforms. For details, refer to the technical route diagram in Figure 2. Related research on data mining and information push is mostly theoretical, and there are few comprehensive applications in the new rural cooperative medical system.
For example, Zhu Zhengwang et al. [22], in a data-mining-based analysis of the characteristics of Chinese patent medicines containing jujube, mainly used simple data mining software such as Clementine 12.0 and the relevant mining methods to analyze Chinese medicine prescription data that met their criteria. Tanding Guo et al. [23] studied how pushing medication prescription information through a hospital's internal WeChat platform influences rational clinical drug use, mainly describing how the platform was used to push medication learning material so as to reduce unreasonable prescriptions. These studies almost all center on data mining and information push within a single system. This paper builds multiple business systems on the Hadoop platform and mainly studies data mining and information push across information platforms, together with multi-channel display.
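As a concrete illustration of the mining step behind such information push, the following minimal sketch computes support and confidence for pairwise association rules, the two measures that Apriori-style algorithms apply thresholds to. It is not a full Apriori implementation (there is no level-wise candidate pruning), and the function names, records, and thresholds are invented for illustration rather than taken from the system's data.

```python
from itertools import combinations


def support(itemset: frozenset, transactions: list[set]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pair_rules(transactions: list[set], min_support: float, min_confidence: float):
    """Return rules (lhs, rhs, support, confidence) over pairs of items.

    Only two-item rules are considered; a real Apriori run would grow
    frequent itemsets level by level.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_sup = support(frozenset((a, b)), transactions)
        if pair_sup == 0 or pair_sup < min_support:
            continue  # the pair is too rare to be worth pushing
        for lhs, rhs in ((a, b), (b, a)):
            conf = pair_sup / support(frozenset((lhs,)), transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_sup, conf))
    return rules


# Hypothetical visit records: each set lists diagnoses seen together.
visits = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension"},
    {"diabetes"},
    {"flu"},
]
for lhs, rhs, sup, conf in pair_rules(visits, min_support=0.5, min_confidence=0.8):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

A rule that passes both thresholds could then drive a targeted push, for example a reminder sent to users whose records match the left-hand side of the rule.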

Overall System Architecture
To give the system good scalability, credibility, security, and intelligence, distributed data storage is used to solve the scalability problem, blockchain technology is used at the layer above to establish trusted authentication, and at the application layer the mined data is pushed with high concurrency. The specific system architecture is shown in Figure 3.

Fig. 3. System overall architecture diagram
The NRCMS information system as a whole adopts a three-tier structure. The application layer handles the business logic for different user groups. Combined with sensing devices such as mobile phones, smart wearable devices, and RFID, it obtains auxiliary data and pushes reminders, warnings, and summary reports on matters closely related to farmers' health. The network layer, in addition to transmitting data, has critical data such as hospital data and diagnostic data authenticated jointly by blockchain network nodes, so that the data is trustworthy and cannot be tampered with. The data storage layer uses different stores for different data: cached data goes into Redis, data involving monetary amounts and the like goes into MySQL, and other unstructured and semi-structured data goes into HBase, MongoDB, and other databases for integrated processing.
Construction of the new rural cooperative medical information management system is based on the Ministry of Health's "Guidelines on construction of new rural cooperative medical information system" and a series of related documents. On the basis of the new rural cooperative medical system, the information system should provide intelligent cost estimation, enrollment management and fund allocation, reimbursement and compensation payment, supervision and auditing, decision analysis for medical authorities, and announcement of related policies. It can effectively improve the operational efficiency of the New Rural Cooperative Fund, close loopholes in fund supervision, and greatly improve the service quality and supervisory capacity of government departments. The information system is generally built through bidding in each province.
The basic functions of the provincial management information system are data processing and exchange, statistical reports, accounting reports, business monitoring, fund supervision, referral management, analysis and evaluation, configuration maintenance, and the portal website. The basic functions of the county-level business system include enrollment management, compensation management, fund management, accounting, query and statistics, monitoring and analysis, configuration and maintenance, and public services. To reduce storage, overall power consumption, and personnel management costs, the data storage layer can be optimized first.

HDFS Bottom-Layer Optimization and Related Processing Results
HDFS is mainly used to store and process very large data sets on a computer cluster, whose node count may range from a few to several thousand. When the new rural cooperative information system processes data, it generally divides a large file into multiple blocks (128 MB by default) and distributes them among the nodes. HDFS then replicates the blocks according to its rack-awareness policy to improve availability and fault tolerance. The rack-aware policy governs where replicas are stored: two replicas of a block are kept on different nodes of the same rack, which gives high network bandwidth between them, and another replica is kept on a different rack, which gives high fault tolerance. Figure 4 shows a typical example of HDFS block storage. In this project the replication factor is set to the Hadoop default of 3, so each data block has three replicas stored on three different nodes. Two replicas are stored on different nodes of the same rack, and the remaining one is stored on a node in a different rack, which reduces the risk that a power or hardware failure in a single rack brings down the entire cluster.
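The block splitting and replica layout described above can be sketched as follows. This is a simplified illustration, not HDFS's actual placement code: the function names, the rack dictionary, and the round-robin choice of rack and node are assumptions made for the example. Only the 128 MB block size, the replication factor of 3, and the "two replicas on one rack, one on another" layout come from the text.

```python
import math

BLOCK_SIZE_MB = 128       # HDFS default block size cited in the text
REPLICATION_FACTOR = 3    # Hadoop default replication factor


def block_count(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of HDFS blocks needed to hold a file of the given size."""
    return max(1, math.ceil(file_size_mb / block_size_mb))


def place_replicas(block_id: int, racks: dict[str, list[str]]) -> list[str]:
    """Pick three nodes: two on one rack, one on a different rack.

    `racks` maps a rack name to its node names. The round-robin choice of
    rack and node is an illustrative assumption, not HDFS's real chooser.
    """
    rack_names = sorted(racks)
    if len(rack_names) < 2:
        raise ValueError("need at least two racks")
    shared = rack_names[block_id % len(rack_names)]
    other = rack_names[(block_id + 1) % len(rack_names)]
    if len(racks[shared]) < 2 or not racks[other]:
        raise ValueError("need two nodes on the shared rack and one on the other")
    first = racks[shared][block_id % len(racks[shared])]
    second = racks[shared][(block_id + 1) % len(racks[shared])]
    third = racks[other][block_id % len(racks[other])]
    return [first, second, third]


# Hypothetical two-rack cluster holding a 1 GB file.
cluster = {"rackA": ["a1", "a2"], "rackB": ["b1", "b2"]}
for block_id in range(block_count(1024)):
    print(block_id, place_replicas(block_id, cluster))
```

Because every block keeps one replica outside the shared rack, losing a whole rack in this sketch still leaves at least one copy of each block.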

Fig. 4. Example of HDFS underlying file storage
In the existing system, data is generally distributed equally, which raises two important problems. First, the usable storage capacity of the cluster is limited by the node with the smallest storage capacity. The current remedy is to expand the cluster, which greatly increases the system's operating cost. Second, if some nodes have a lower hardware configuration and slower processing speed while data is distributed uniformly, a job must wait until the slowest node finishes its share, which drags down the performance of the whole cluster. Even if the cluster is expanded, this "weakest link" problem remains. Hadoop 2.0 and later versions include a new resource management framework, YARN, which performs resource allocation and scheduling well. It consists of three parts:
1. ResourceManager, which handles client requests and starts and monitors the ApplicationMaster and NodeManager.
2. ApplicationMaster, which requests resources for an application and assigns them to its internal tasks, and which also handles task scheduling, monitoring, and fault tolerance.
3. NodeManager, which manages the resources on a single node and mainly processes commands from the ResourceManager and ApplicationMaster.
Because the performance of a single node is determined jointly by CPU, memory, and other resources, it cannot be quantified by a simple proportion formula. We therefore designed a simple estimation model that uses the container as the unit of dynamic resource allocation. Each container encapsulates a certain amount of CPU, memory, disk, and other resources. In the experiments, CPU was allocated through Oracle VM VirtualBox virtual machines.
Memory usage ratios are set in the same way, and the performance of nodes is distinguished by their number of container units, assuming that every container unit has the same storage and computing capability. The scheduling process is illustrated by the simple data-flow model in Figure 5. In the model, each container unit accepts one data block. DATANODE2 has fewer container units but is allocated three blocks, while DATANODE1 has more container units but is allocated two. We move the data waiting to be processed on DATANODE2 to DATANODE1 to improve the execution speed of the cluster. A real environment is more complex than Figure 5, and more advanced and sophisticated scheduling algorithms could be applied to the system. Here, to explain the principle clearly, we use a simple example model in which a node table and a data-flow calculation algorithm define the data-flow process. To simplify the design of the scheduling algorithm, we consider CPU frequency, allocatable memory, network bandwidth, and disk read/write speed. The number of nodes is N (N-1 computing nodes, excluding the name node). For a node, the maximum CPU frequency, allocatable memory, network bandwidth, and disk read/write speed are Tcpu, Tmem, Tnet, and Tdisk, and the cluster averages of these quantities are AVGcpu, AVGmem, AVGnet, and AVGdisk. The computing capability of a single node, Cresource, is given by equation (1); the average computing capability of the cluster, Cavgresource, by equation (2); and the number of container units, K, by equation (3).
Based on this simplified model, we built a distributed environment (one NameNode and four DataNodes). The machines have the same configuration: each has a quad-core Core i5-2650M CPU, 16 GB of memory, and an SSD. Resource shares of 20%, 60%, 40%, and 80% were allocated to the virtual machines, and Hadoop was built on Ubuntu. Using equations (1), (2), and (3), these allocations were converted into container units, as shown in Table 1, and the HDFS node scheduling algorithm was applied. Data-Processer generated random sample data, which was converted into words, and WordCount jobs were run on inputs of 10 GB, 20 GB, and 30 GB. "Old" denotes the runs without data-flow optimization and "New" the runs with it. The comparative results are shown in Figure 6. In the experiments with MapReduce, the average processing speed of the cluster increased by about 22%; in the experiments with Spark, it increased by about 30%. In summary, expansion and optimization, and especially data storage and processing, should be fully considered from the start of the architecture design of the new rural cooperative system. Only by understanding advanced international optimization technologies and methods can the system be continuously improved and its underlying data storage and processing problems be solved. The next step is to study the optimization of key data so that it cannot be tampered with.
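The idea of converting node capability into container units and then sending each node a proportional share of blocks can be sketched as follows. Equations (1) to (3) and Table 1 are not reproduced in this text, so the formulas here are assumptions chosen only for illustration: a node's score is taken as the mean of its four resource values divided by the cluster averages, an average node receives `base_units` containers, and blocks are split in proportion to container units. All function and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu: float    # CPU share or clock speed (any consistent unit)
    mem: float    # allocatable memory
    net: float    # network bandwidth
    disk: float   # disk read/write speed


RESOURCES = ("cpu", "mem", "net", "disk")


def capacity_scores(nodes: list[Node]) -> dict[str, float]:
    """Assumed score: mean of each resource divided by its cluster average."""
    avg = {r: sum(getattr(n, r) for n in nodes) / len(nodes) for r in RESOURCES}
    return {
        n.name: sum(getattr(n, r) / avg[r] for r in RESOURCES) / len(RESOURCES)
        for n in nodes
    }


def container_units(scores: dict[str, float], base_units: int = 10) -> dict[str, int]:
    """Assumed conversion: a node with score 1.0 gets `base_units` containers."""
    return {name: max(1, round(base_units * s)) for name, s in scores.items()}


def assign_blocks(total_blocks: int, units: dict[str, int]) -> dict[str, int]:
    """Split blocks in proportion to container units (largest-remainder rounding)."""
    total_units = sum(units.values())
    exact = {n: total_blocks * u / total_units for n, u in units.items()}
    plan = {n: int(v) for n, v in exact.items()}
    leftover = total_blocks - sum(plan.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - plan[n], reverse=True)
    for n in by_remainder[:leftover]:
        plan[n] += 1
    return plan


# Four DataNodes with CPU and memory shares of 20%, 60%, 40%, and 80%,
# and identical network and disk (values are illustrative).
nodes = [
    Node("dn1", 20, 20, 1, 1),
    Node("dn2", 60, 60, 1, 1),
    Node("dn3", 40, 40, 1, 1),
    Node("dn4", 80, 80, 1, 1),
]
units = container_units(capacity_scores(nodes))
print(units)
print(assign_blocks(80, units))
```

Under these assumed formulas the four nodes receive 7, 11, 9, and 13 container units, and 80 blocks are split 14, 22, 18, and 26, instead of 20 each under uniform distribution. These numbers illustrate the mechanism only and are not the values in Table 1.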

Blockchain Technology Application
To ensure that key data such as NRCMS enrollment information, hospital costs, and settlement ratios is credible and cannot be tampered with, we provide guarantees at the technical level. Blockchain nodes are established across multiple departments, and a combination of on-chain transmission and ordinary transmission is used to communicate with the underlying data storage layer, forming an information security management model for the new rural cooperative system, as shown in Figure 7. Blockchains come in three common forms: public chains, private chains, and consortium (alliance) chains. A public chain is accessible to everyone, and its main representatives are Bitcoin, Ethereum, and similar applications. Its openness far exceeds the range of roles involved in the new rural cooperative system. A private chain is mainly used inside a single enterprise, and authority over it is generally held by a single organization or institution. Because the NRCMS involves multiple participants, this option does not suit it either. A consortium chain is well suited to multi-party participation and joint supervision. It adopts a consensus algorithm, joining and leaving the chain must be approved by the relevant organizations, and it retains the characteristics of decentralization. Its representative is Hyperledger. The new rural cooperative medical system can therefore adopt a consortium chain, and Hyperledger technology can be used to build it.
A consortium chain offers high credibility and clear advantages in protecting private information. It gives the health supervision department supervisory power, and it can open the appropriate level of access to medical and health departments and to farmers, which meets the requirements of the new rural cooperative system. For the specific construction process, see the experimental procedure described by Xu Jiping et al. [24] for their blockchain-based prototype system for information security management of the grain, oil, and food supply chain.
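The tamper-evidence property that motivates the use of a chain can be illustrated with a minimal hash-chained ledger. This sketch is not Hyperledger and has no consensus, membership control, or networking; it only shows why changing a stored settlement record becomes detectable once each block commits to the hash of its predecessor. The record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, record: dict) -> str:
    """SHA-256 over the block's position, its predecessor's hash, and its record."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "record": record}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain: list[dict], record: dict) -> None:
    """Append a record, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev": prev_hash,
        "record": record,
        "hash": block_hash(index, prev_hash, record),
    })


def verify(chain: list[dict]) -> bool:
    """Recompute every hash and link; any edited record breaks the check."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["record"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical settlement records.
ledger: list[dict] = []
append_record(ledger, {"patient": "P001", "cost": 1200, "ratio": 0.6})
append_record(ledger, {"patient": "P002", "cost": 800, "ratio": 0.5})
print(verify(ledger))                  # chain is intact

ledger[0]["record"]["ratio"] = 0.9     # simulate tampering with a stored ratio
print(verify(ledger))                  # tampering is detected
```

In a consortium deployment, the copies held by the participating departments and the consensus step are what prevent a single party from simply recomputing all hashes after an edit; the sketch above omits that part.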

Multi-Channel Front-End Interface and Advanced Service Architecture Application
At a press conference on May 28, 2020, Premier Li Keqiang cited figures that then circulated widely online: per capita annual income is 30,000 yuan, but 600 million people have a monthly income of only 1,000 yuan. Most of those 600 million people are farmers, and the state is bound to offer them preferential medical insurance policies in the future. Scientific and technical personnel should respond to this national call and, when designing the system, fully consider multi-channel display and friendly information push for the new rural cooperative system. The system should push targeted data to regulators, provide disease early-warning data to the health sector, and push health-related knowledge to farmers, with the ultimate aims of preventing disease and saving state funds. The technical optimization is reflected in the following three points:
1. Different information platforms expose data interfaces and exchange data, and intelligent data mining algorithms are used to discover valuable information. Following the corresponding data collection standards, interfaces are established and information is obtained or crawled; optimized data mining algorithms such as Apriori set confidence levels intelligently; and valuable data is pushed to each role according to the relevance of the information obtained, providing data support for decision-making.
2. Front-end technology should be diversified, supporting client/server mode, Web mode, smartphone apps, WeChat and Enterprise WeChat platforms, TV terminals, and other modes. The diversity of the platforms through which users obtain information should be fully considered. End-user data and monitoring data should be collected well, and more relevant data should be pushed. For example, patients with diabetes can upload blood glucose and blood pressure monitoring data, and step counts can be collected automatically and converted into energy consumption, with exercise reminders, while privacy is protected throughout. Reminders about epidemics and high-incidence diseases, such as the risk of COVID-19, can play a role in disease prevention. The system should also pay attention to people whose spending on illness is excessive, optimize the reimbursement ratio intelligently, and establish a database of people who need priority assistance.
3. Load balancing [25] should be done well on the front-end web servers to improve concurrent performance. Some researchers have studied this. For example, Zhang Jiandong et al. [26], in their work on applying an improved AHP algorithm to load balancing in Web cluster systems, showed that the AHP algorithm can build a front-end server cluster with multi-channel access and improve the concurrency and access speed of the web servers. However, the cluster still wastes resources and is not efficient, and differences in the performance of individual servers lead to differences in their utilization, which can even bring down the cluster's front-end services.
We can learn from the successful architectures of advanced e-commerce platforms, such as the popular Docker + K8s microservice approach, to structure the front-end service cluster of the new rural cooperative system. Docker is an open-source container engine. It can package the programs, packages, and dependent libraries for different pieces of new rural cooperative business logic into containers. The containers are isolated from one another and subject to resource access control, so that an individual business function does not affect the performance of the whole cluster, and container images can be pushed to remote registries. K8s (Kubernetes) is a container orchestration system that provides automated rollout and rollback, service discovery, automatic scaling, secret and configuration management, storage orchestration, and self-healing. With this approach, development costs are low, later upgrades and maintenance are easy, the systems integrate well and make full use of the underlying hardware, resource scheduling is highly elastic, and the cluster has strong concurrency and disaster recovery capabilities.
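To make the load-balancing point concrete for servers of unequal capability, the following sketch uses a weighted least-connections rule. This is a standard technique swapped in for illustration, not the improved AHP algorithm of [26] and not how Kubernetes routes traffic. The backend names and weights are hypothetical, and the weights are assumed to be known in advance.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int       # relative capacity of the server (assumed known)
    active: int = 0   # requests currently being served


def pick_backend(backends: list[Backend]) -> Backend:
    """Weighted least-connections: the lowest active/weight ratio wins.

    Ties go to the server with the larger weight.
    """
    usable = [b for b in backends if b.weight > 0]
    if not usable:
        raise ValueError("no usable backend")
    chosen = min(usable, key=lambda b: (b.active / b.weight, -b.weight))
    chosen.active += 1
    return chosen


def release(backend: Backend) -> None:
    """Mark one request on `backend` as finished."""
    backend.active = max(0, backend.active - 1)


# Hypothetical front-end pool: one small server and one three times larger.
pool = [Backend("weak", weight=1), Backend("strong", weight=3)]
order = [pick_backend(pool).name for _ in range(4)]
print(order)
```

With these weights the stronger server takes three of the first four requests, so its share of work follows its capacity instead of being fixed at one half.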

Summary
The new rural cooperative information system is a comprehensive medical business system that serves ordinary people. Its data grows quickly, its business processes are complicated, and the funds involved are large. This article has studied the optimization of the underlying data storage architecture, the trusted network, multi-channel concurrent access, and the front-end service architecture. The results can provide ideas for the further construction and optimization of the new rural cooperative information system, and can serve as a technical reference for building similar information systems. The data used in this article was generated by a simulator, is small in volume, and was demonstrated only in an experimental environment. Future work will continue to follow and optimize the HDFS file block scheduling algorithm and its application, developments in blockchain technology, and updates in front-end server cluster architecture, with a focus on distributed underlying data storage and trust methods. The aim is to carry out further demonstrations in laboratory and real-world environments, and to integrate and optimize the relevant engineering-level technologies and apply them to the new rural cooperative medical system and other systems.
Thanks to Wang Ling, researcher, New Rural Policy Consultation, and Yin Min, director of the Department of Liver and Intestine, Tongzhou District Hospital of Traditional Chinese Medicine, for providing medical knowledge and data support. Thanks also to colleagues at the Nantong Tongzhou District Health and Family Planning Commission for policy support, and to Sun Guofu and other teachers of the big data lab at Nanjing Normal University for their full support.