Experiential Learning in Bioinformatics

Introduction
Bioinformatics is an emerging field in STEM education. It combines biological data and computer science to support research in biology and medicine. The subject has a steep and long learning curve because of its wide scope and its integration of two core scientific and technical fields, biosciences and computing; a student must develop strong knowledge in both areas to be competent in bioinformatics. Even though acquiring one part of this knowledge can be straightforward, acquiring the aggregate knowledge is challenging. One such combined concept in bioinformatics curricula is complex workflow modelling and analysis: a student should have a thorough understanding of biological datasets, particularly sequencing and analysis, while knowing fundamental computer science concepts such as data structures and algorithms, scheduling and resource management, and data analytics. These are advanced concepts in their respective fields; hence, introductory-level biology courses face many challenges with respect to teaching and learner support. Proper educational tools and guidelines are therefore necessary to teach the subject effectively. Furthermore, a solid set of tools for experimenting with, visualizing and simulating the subject content helps students learn efficiently and effectively by developing their confidence.
In the present context, it is challenging to find organized courses and tools that integrate to provide bioinformatics-related learning facilities. While some tools and learning content resources exist, it is difficult to learn independently by referring to these materials alone because they are weakly connected: existing tools presume prior knowledge of bioinformatics, whereas course content and learning materials lack support for practical experience. According to [2], bioinformaticians often analyse big data sets, which requires knowledge of formatting and parsing datasets by scripting computer programs that integrate available software tools. "They have to apply software tools that do not have graphical user interfaces, navigate the use of high-power computer clusters, and often have to have at least basic system administration knowledge".
In a 2017 survey of more than 700 biologists funded by the American National Science Foundation, more than 90% of respondents stated that they were, or soon would be, working with large data sets that required high-performance computing. These same researchers listed training in data analysis tools and bioinformatics as the most urgent and unmet need they had to address to successfully complete their research projects [3].
This research intends to bridge the gap between the available tools and the required course content by carefully designing and introducing a new tool named BioWorkflow, which offers significantly enhanced user interaction through a rich user interface, to improve the learning experience of bioinformatics students. Section 2 summarizes the literature review and Section 3 describes the experimental methodology. Section 4 elaborates on the evaluation, followed by the results and discussion in Section 5 and concluding remarks in Section 6.

Literature Review
The field of technology enhanced learning (TEL) is continuously expanding its tools and techniques, supporting a range of course modules and study domains in the education sector. The primary mode of incorporating TEL into the education system is the use of computer-based tools and services to make the learning process usable, efficient and interactive. Many studies have been conducted in the field of education using models of learning and in the field of computer science using computer-aided tools.
Oettingen et al. [4], writing on educational psychology, present the challenges faced by students in attaining the learning outcomes intended by the course outline. Student motivation has been identified as one of the most important steps in mastering this challenge. Motivation must be attained by heightening both the incentive value of academic achievement and the relevant expectations. It must also be ensured that the tasks are not so challenging that students pull back from the desired path. The concept of constructive alignment given by Biggs [5] can be an effective avenue to align the intended learning outcomes (ILOs) with teaching and learning activities (TLAs) and assessment tasks (ATs). Students can then easily understand the objectives, forms of participation and expected achievements that they must engage in during the module; this can motivate them to learn. In such a context, appropriate tool support can be essential for higher levels of successful student learning.
Magana et al. [6] present a survey conducted with undergraduate and graduate students in the fields of bioinformatics education and educational research in bioinformatics. The research addressed challenges faced during the integration of computer science and biology in the early curriculum to support teaching bioinformatics. Their work mainly focused on areas such as DNA sequencing, pairwise alignments using BLAST [7], microarray technology and the use of PubMed and NCBI [8] resources. Furthermore, the survey covered areas of genomics, proteomics, and structural biology. They concluded that the use of appropriate tools and other resources significantly improved students' grasp of complex concepts in the course domain. Their findings indicate that incorporating resources such as tools improves the outcome of bioinformatics education.
OpenHelix by J. M. Williams et al. [9] is a service to assist a range of people who are interested in learning bioinformatics as a subject. The presented work focuses on informal and less formal means of education, such as public articles and other scholarly articles, rather than following a strict curriculum. The work focuses on education techniques based on biological data with the advancements in DNA sequencing. They propose four mandatory factors to enhance such education via less formal resources, including raising awareness of resources, evaluating resource functionality, and lowering the barrier between awareness and utilization of resources. In their view, one of the main types of resources is tools to support bioinformatics learning. Therefore, a certain extent of formal education would be more effective along with tool support for better visualization of DNA-related computations and simulations, as presented in this paper.
Saravanan et al. [10] present the use of E-Learning as a new tool to teach bioinformatics. The research introduces four phases of E-Learning to fit the domain of bioinformatics: content development, an instructional phase, a multimedia and web design phase, and a testing and execution phase. The flow is defined such that courses have a design phase with course content, preparation of quizzes and test methods, incorporation of multimedia into the learning environment and finally a testing phase. The research incorporates videos, websites and other digital media in order to enhance the learning experience. However, less attention has been paid to the use of simulation software to practice bioinformatics-related computations and visualization.
Brame [11] presents the Cognitive Theory of Multimedia Learning, which explains the learning pattern of students and the understanding of content as passing through three stages, namely sensory memory, working memory and long-term memory. Furthermore, the research presents three kinds of load a student may face: intrinsic load due to the complexity of the subject, germane load due to the required cognitive capacity, and extraneous load, which is the part of cognitive capacity that does not help in achieving the desired learning outcome. The research concludes that, among other factors, the use of visual support in the form of tools is important. It supports the importance of our work, which provides a tool with visualizations to help students with the intrinsic, germane and extraneous loads that are present in bioinformatics subject matter.
Galaxy [12] and Taverna [13] are two popular tools which provide an environment for students to conduct experiments in bioinformatics. These tools enable users to combine several bioinformatics-related computations through a user interface and execute them to obtain results. Both tools are available off-the-shelf as desktop and web-based solutions, but their usability makes it challenging to conduct learning sessions with them. Furthermore, the online tool is not available free of charge and the desktop tool cannot be installed easily; the process is cumbersome for students who have limited knowledge of computer science. Biopipe [14] is another web-based tool, which is available as a standalone Java application. It allows users to create and edit workflows with user-friendly web interfaces and automates workflow synthesis. The tool lacks comprehensive support for learners and is focused more on subject experts who have deep knowledge of how the algorithms work. Unipro UGENE [15] is a similar desktop tool which enables DNA viewing; it does not support designing comprehensive workflows.
Tomandl et al. [16] demonstrate the benefits of using simulated interactive research experiments to enhance the learning procedure. They emphasize the importance of using tools to perform laboratory experiments and report that results improved significantly after students were exposed to tools. Students were able to conduct experiments related to the subject matter and they scored better. Furthermore, Ortega et al. [17] elaborate on a Multidimensional Educational Tool for developing mental models. The research was primarily based on the growth of cancer cells. Students were given mind maps to explore cancer growth step by step.
Practical bioinformatics skills are becoming increasingly essential for biologists; however, the incorporation of such skills into the curriculum is still not fully practiced, in part due to the lack of learning resources and tool support for students and instructors. Madlung [2] successfully conducted an undergraduate-level bioinformatics course designed with sufficient resources and tools to support both instructors and students; the module was developed by expressing ILOs, TLAs and ATs in natural language while using publicly available computing tools and data to teach and learn next-generation sequence analysis. Along with the course, various tools were developed and employed to support effective student learning in applied bioinformatics.
Numerous research studies have been carried out on TEL and tool support for learning. Most of the successful learning tools incorporate Kolb's experiential learning framework [18]. This includes four cyclic steps: Abstract Conceptualization, Active Experimentation, Concrete Experience and Reflective Observation. Furthermore, educational tools can be supported with the SOLO (Structure of Observed Learning Outcome) taxonomy [5] by Biggs and Collis. The learning process from the student perspective can be studied using the ZPD (Zone of Proximal Development) [19] by Vygotsky. In this research we identified Kolb's model as the most appropriate for our tool design since it covers both theoretical and practical aspects of the related learning activities. We also noted, however, that it is challenging to find research conducted to improve bioinformatics learning using software tools. Even though a few tools are available, as presented above, they are not designed around a suitable learning model. This is an important area of improvement needed in bioinformatics teaching and learner support. Our research focus, provisioning a learning tool aligned with an appropriate learning model so that bioinformatics students can experiment with, visualize and understand relevant concepts, is therefore justified.

Application of Experiential Learning to enhance learner experience
Fig. 1 demonstrates Kolb's experiential learning cycle, which was used as the basis for the development of the course outline. The BioWorkflow tool was used specifically to improve the aspects of active experimentation and concrete experience. Therefore, the prepared tutorial guided the students on workflow modelling, emphasizing the practical aspects of bioinformatics processes and the linking of individual processes to produce a useful outcome. The course was designed incorporating the following subject content.
• Introduction to pairwise sequence alignment
• How to perform pairwise sequence alignment
• Tools which utilize pairwise sequence alignment algorithms
─ FASTA
─ BLAST
This is one of the most important topics in the course setting as it maps directly to the main learning outcomes of the course, i.e., the application of algorithms to derive the similarity of different genes. Students were provided with the workflow tool to match sample sequences and understand how to represent the matches visually or textually. For the evaluation (in Section 3.3), the group of students who did not have access to the tool were free to follow any publicly accessible online resource. This section is intended to improve students' capability for active experimentation in Kolb's cycle.
The above content (partly shown in Fig. 2) was delivered to the target student groups using traditional learning facilities and using the tool. To provide a similar learning experience on the common course material, both groups were provided with the same content through a web page. The multiple sequence matching section focuses on matching several DNA sequences with each other and on the theoretical background of the study area. The following sections were provided under the course material.
• Introduction to Multiple Sequence Alignment (MSA)
• How to perform multiple sequence alignment
• MSA tools
─ Clustal Omega
─ T-Coffee
─ DIALIGN
Section 4 - Bioinformatics Workflows. The last section of the course module introduces bioinformatics workflows and how they are visualized. The following sections were introduced to the course material:
• What is a Workflow?
• What are Bioinformatics Workflows?
• Illustration of a sample workflow
Workflow modelling requires students to have significant experience of various operations on gene sequences. Therefore, this section is intended to improve students' capability to go beyond the course content, using their cognitive abilities to derive solutions to practical problems such as identifying matching species for a certain gene sequence and then deriving the similarity among those species by a multiple sequence alignment. This relates to the concrete experience phase in Kolb's model.

BioWorkflow Tool
BioWorkflow is a workflow design tool in which users can design and execute various bioinformatics workflows and visualize the outputs. The tool integrates various services for manipulating gene data in the form of components. The user can add components to the canvas, change the component parameters, connect them accordingly and visualize the final result. Furthermore, students can use the tool as a means of gaining practical exposure to various bioinformatics tasks such as sequence alignment and sequence matching. This capability is provided in the tool to improve students' ability to conceptualize based on theoretical knowledge and to experiment on their own. Fig. 3 shows a student-created workflow instance and its simulated result visualization.
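The idea of connecting components on a canvas and executing them in order can be illustrated with a minimal sketch. It is not BioWorkflow's implementation: the function `run_workflow`, the component names and the two toy components are hypothetical, and the sketch simply assumes that a workflow forms a directed acyclic graph whose components run once their upstream components have finished.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def reverse_complement(seq):
    """Toy component: reverse complement of a DNA string."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def gc_content(seq):
    """Toy component: fraction of G and C bases in a DNA string."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def run_workflow(components, connections, inputs):
    """Execute components in dependency order.

    components:  component name -> callable (hypothetical structure)
    connections: component name -> list of upstream names feeding it
    inputs:      source name -> initial value
    """
    results = dict(inputs)
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(connections).static_order():
        if name in results:  # source nodes already carry a value
            continue
        upstream = [results[dep] for dep in connections.get(name, [])]
        results[name] = components[name](*upstream)
    return results


# Hypothetical two-step workflow: input -> reverse complement -> GC content
components = {"revcomp": reverse_complement, "gc": gc_content}
connections = {"revcomp": ["input_seq"], "gc": ["revcomp"]}
results = run_workflow(components, connections, {"input_seq": "ATGC"})
print(results["revcomp"], results["gc"])
```

Keeping the connections separate from the component functions mirrors the description above, where a user first adds components and then wires them together; a real tool would also need parameter handling and error reporting, which this sketch leaves out.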

Experiment Design
The experiment was conducted by preparing a course to enhance the learning process, using Kolb's experiential learning cycle [18] as the basis. The course content was delivered to the students by means of an online tutorial. Two groups of 40 students each took part in the experiment.
• Group 1: Students were given only the online tutorial and were allowed to use other resources on the Internet as well.
• Group 2: Students were provided with the online tutorial, allowed to use other resources on the Internet, and given the BioWorkflow tool.
Half an hour was given as the study time to both the student groups to go through the content provided. Afterwards, the students of both groups were asked to answer a quiz based on what they learned. They were given half an hour to answer the quiz. Finally, separate feedback forms were given to the two student groups to get their opinion regarding the material they used and suggestions for improvement.

Quiz
The quiz provided to both student groups contained the same set of questions, which tested each of the sections in the course. The quiz consisted of five multiple-choice questions and five structured questions. The multiple-choice questions were based on the introductory section and the basics of molecular genetics. The structured questions were based on sequence alignment and workflow modelling, where the students had to calculate alignment scores, draw sequence alignments and draw workflows to perform a given process. Fig. 4 shows an example structured question from the quiz, where the student had to model a workflow.
There were 40 students in each session, one group using only the tutorial and the other also using the tool to study the designed course content. Fig. 6 shows students trying out the BioWorkflow tool to simulate workflows and obtain visualizations.
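To make the "calculate alignment scores" task concrete, the following is a minimal sketch of a global (Needleman-Wunsch style) alignment score computed by dynamic programming. The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen for illustration; the scheme used in the actual quiz is not stated in this paper.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b under a linear gap penalty.

    The scoring parameters are illustrative assumptions, not the quiz's scheme.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Row 0: aligning an empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a aligned against nothing
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


# "ACGT" vs "AGT": three matches and one gap under the assumed scheme
print(global_alignment_score("ACGT", "AGT"))  # 1
```

A traceback over the full table would additionally recover the alignment itself, which corresponds to the "draw sequence alignments" part of the structured questions.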

Measures Obtained
Different measures were obtained to evaluate the efficacy of the learning experience.
Scores obtained for the quiz. The quizzes of the students from the two groups were graded out of 100. Questions were assigned marks according to their relative complexity and the time required to answer them. Higher weights were given to structured questions as they required the students to do calculations and model workflows.
Feedback. Two separate feedback forms with five questions were given to the students of the two groups to obtain feedback regarding the learning experience they had. Three questions were designed with a 1 to 5 points Likert scale (1: strongly disagree to 5: strongly agree). These three questions were focused on the learnability of the material and ability to visualize the theories included. The last two questions were open ended questions which were based on the overall experience, suggestions and improvements.
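The way responses on a 1 to 5 Likert scale can be reduced to disagree, neutral and agree percentages is sketched below. The function name, the grouping of 1-2 as disagreement and 4-5 as agreement, and the sample responses are assumptions for illustration only; the responses are placeholder values, not data from this study.

```python
from collections import Counter


def likert_summary(responses):
    """Percentage of disagree (1-2), neutral (3) and agree (4-5) responses."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    total = len(responses)
    return {
        "disagree": 100 * (counts[1] + counts[2]) / total,
        "neutral": 100 * counts[3] / total,
        "agree": 100 * (counts[4] + counts[5]) / total,
    }


# Placeholder responses to one question (NOT the study's data)
sample = [1, 2, 3, 4, 5, 5, 4, 2]
print(likert_summary(sample))
```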

Results and Analysis
Fig. 7 shows the percentage of students who were actively using the provided learning material during the experiment period. The students did not find the tutorial alone useful for answering practical questions; the tutorial was not helpful towards the latter part of the experiment because it lacked support for the experiential learning presented in Kolb's model. However, the students with access to the workflow tool were continuously engaged with the tutorial and the workflow tool; they used the full time period to experiment actively with different bioinformatics workflows in the tool.

Quantitative Analysis
The quizzes of students from the two groups were graded and the final scores were used to calculate relevant statistical measures. These are compared as given in Table 1.
The results indicate that the students who used the tool scored significantly better than the students who followed only the course material. Furthermore, the scores are more stable among the students who used the tool than among those who did not, as indicated by the lower standard deviation.
Fig. 8 demonstrates the distribution of scores of the two student groups. Marks of the two groups are distributed normally, as shown in the graph. It was observed that 42.5% of the students in the group that did not use BioWorkflow could not score more than 50% of the marks for the assessment. These students could not successfully complete the application-oriented questions, where they were required to align sample sequences and develop small flowcharts to represent a workflow that would appear in a real-world scenario. Marks obtained by the other group, i.e., the students who used the BioWorkflow tool, indicated a negatively skewed distribution, representing higher marks. In this sample, 87.5% of the students scored 70% or above in the quiz. The improved outcome of these students was due to better scores obtained in the more practically oriented questions. All the students were at a similar level of bioinformatics knowledge at the beginning of the experiment; however, the results suggest that students who used the tool performed better. This implies that the practical exposure received from the BioWorkflow tool contributed to students gaining concrete experience and provided the opportunity to experiment with the acquired knowledge through reflective observation.
Fig. 9 compares the marks obtained by the two student groups for each question. Question Q1 was used to gather the students' opinions of the session. Questions Q2 to Q6 are based on the theoretical knowledge the students learnt, whereas questions Q7 to Q10 examined the advanced skills with which a student applies that knowledge to solve a practical problem. For the theoretical questions (Q2-Q6) both groups performed equally well, which suggests that the tutorial resources helped both groups with the abstract conceptualization step in Kolb's cycle. However, for questions Q7-Q10, which are based on practical applications, there is a significant gap between the two groups; the group that used the tool outperformed the control group in all of these questions (as illustrated in Fig. 9). This set of questions is based on the remaining three steps of Kolb's cycle: active experimentation, concrete experience, and reflective observation. This indicates that the practical exposure required by students was provided through the tool and that the support it gives for student learning is complementary in nature.
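A comparison of this kind can be sketched with the Python standard library: descriptive statistics per group, plus Welch's t statistic as one common way to compare two group means without assuming equal variances. The paper does not state which statistical test was used, so the choice of Welch's statistic, the function names and the score lists are all assumptions; the scores are placeholders, not the study's data.

```python
import math
import statistics


def summarize(scores):
    """Descriptive statistics of the kind typically tabulated per group."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
    }


def welch_t(group_a, group_b):
    """Welch's t statistic for mean(group_b) - mean(group_a)."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error


# Placeholder quiz scores out of 100 (NOT the study's data)
tutorial_only = [45, 52, 48, 60, 55]
with_tool = [72, 80, 75, 78, 85]
print(summarize(tutorial_only))
print(summarize(with_tool))
print(welch_t(tutorial_only, with_tool))
```

Turning the statistic into a p-value additionally requires the Welch-Satterthwaite degrees of freedom and a t distribution, which the standard library does not provide.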

Qualitative Analysis
Group 1 Feedback. The feedback received from student group 1, who used only the online tutorial, suggested that 52.5% of the students found it hard to follow the online tutorial alone, 32.5% found it easy to learn by using the online tutorial and 15% had neutral opinions. 47.5% found it difficult to visualize the relevant theories without tool support, whereas 22.5% were comfortable visualizing the concepts without support. In general, the feedback showed an overwhelmingly positive opinion of the importance of tool support for learning; these students found it hard to visualize and learn the relevant theories without such support. Significantly, 92.5% of the students agreed that it would have been better if a visualization tool had been provided to learn bioinformatics.
The students in group 1 also gave open-ended feedback on what additional material could have been added to improve the course material. When analysing this feedback, it was evident that the majority of the students felt it would have been better if an interactive experience had been provided, along with the ability to execute and visualize outputs of the sequence alignment techniques and workflows in bioinformatics.
Group 2 Feedback. The feedback of group 2 participants, i.e., who had access to BioWorkflow tool, indicated strongly supportive views towards our tool. 87.5% of the students found BioWorkflow tool easy to learn and usable whereas only 12.5% had neutral opinions. No one had negative views towards the tool or its value for learning. 82.5% agreed that BioWorkflow tool was helpful to visualize and learn the relevant theories in bioinformatics. Furthermore, 87.5% of the students mentioned that they would probably use a tool such as BioWorkflow to learn bioinformatics in the future.
The students in group 2 also gave open-ended feedback on their experience of using the BioWorkflow tool. After analysing this feedback, it was clear that the majority of the students were able to visualize what happens within the theories and so understand them better. Furthermore, the students appreciated the idea of an interactive learning platform such as BioWorkflow and stated that such interactive learning tools should be used in other subjects as well.

Conclusion
The research was aimed at exploring the need for tools that support learning in bioinformatics as an emerging domain in STEM education. Kolb's cycle of experiential learning was adopted as the basis for the experiment, in which we intended to improve the active experimentation and concrete experience of students. The evaluation demonstrated significant gains in the outcome of the students who utilized the workflow tool during their learning, whereas the students who were not given access to the tool performed poorly in the quiz. The control group without the tool showed low practical knowledge due to the lack of support for concrete experimentation scenarios such as bioinformatics workflow structures. Furthermore, the students without the tool had difficulty connecting the steps of a bioinformatics process, which demonstrates the lack of practical experience offered by traditional learning techniques. Throughout the experiment it was clear that, for a subject that demands high intellectual capacity and interdisciplinary knowledge, opportunities for practical exposure are essential. The use of a proper tool such as BioWorkflow can significantly improve students' ability to experiment with and experience the practical aspects of STEM subjects such as bioinformatics, enabling proper application of acquired knowledge in a practical scenario.
The experiment was conducted by narrowing the subject scope to an introduction to bioinformatics and sequence alignment. The session was limited to a period of one hour; more significant results might have been obtained had the tool been introduced across the entire course module. Moreover, the prepared quiz and course material covered content that both groups could attempt, and no subject material that could only be taught by using a software tool was tested.
While the BioWorkflow tool presented in this paper is complete in its implementation, one possible direction for future work is to integrate it with existing learning platforms such as Moodle; such integration would allow teachers to streamline the tool usage as part of their learning activities and link it to student gradebooks for summative assessments. Furthermore, we intend to integrate lab experimentation functionality into the tool so that students can actively conduct lab experiments with chemicals and gene samples. Eventually, the tool will provide a complete platform for preliminary bioinformatics teaching and more advanced gene experimentation and analysis.