Black Box Metadata Creation: The Academic Problem

Adaptive learning systems attempt to adapt learning content to suit the needs of the learners using the system. Most adaptive techniques, however, are constrained by the pedagogical preference of the author of the system, and are invariably tied to the system and the domain content they were developed for. Understanding the environmental constructs of a learning environment is critical before content can be adapted to individual learners. A sample personal profile is described that can be used to automatically generate instructional content suited to the pedagogical preference and cognitive ability of a learner in real time, in an online learning environment. This paper introduces a Content Analyser (CA) that automatically generates metadata to encapsulate the cognitive resources within instructional content. The analyser is designed to bridge the perceived gap found within instructional repositories between the inconsistent metadata created for instructional content and the multiple metadata standards in use. All instructional content that is analysed is repackaged as Sharable Content Object Reference Model (SCORM) conforming content.


I. INTRODUCTION
Currently there are roughly seventy million people in higher education worldwide. This number is expected to more than double before 2025, to over 160 million people [1]. One possible way to cater for the expected influx of people entering higher education is to automate the process of learning using online learning environments. Learning Management Systems (LMS) such as Moodle, Sakai, Blackboard and Desire2Learn act as a framework for educational providers to organize and deliver their instructional content in a standard way. They also offer some blended learning facilities to promote a constructivist approach to learning, for example discussion forums. However, no content adaptation is taken into consideration; consequently, these platforms merely transfer the educational sector into an online environment, adding an easy-to-use interface for managing educational material.

A. Technologies using adaptive strategies
There are many other types of learning technologies, for example Adaptive Hypermedia Systems (AHS) [2] and Intelligent Tutoring Tools (ITT) [3] [4] [5] [6], that focus on developing the learning potential of a learner. In particular, AHS are designed to adapt to the needs of the learner with respect to their domain experience, while recent ITT help to develop the cognitive skills of a learner [3]. Although these learning technologies have their strengths and weaknesses, they are constrained by the pedagogic preference of the author of the learning technology, and all are designed for integration into a custom-built system.
This paper investigates the foundation of the Advanced Distributed Learning (ADL) initiative and its production of SCORM, a standardized reference model for referencing instructional material as learning objects. To bridge the perceived gap between traditional adaptive learning technologies and SCORM, explicit consideration is given to the different environmental contexts of a learning experience [7]. These include the type of learning objects, the level at which the knowledge is being taught and the various methods of delivering the content to users.

II. MAPPING COGNITIVE RESOURCES TO EDUCATIONAL CONTENT
In order to develop individual content, a suitable personal profile associated with the environmental contexts of an online environment is required. The profile should include the cognitive ability of the learner, so that adaptation can occur across multiple domains rather than being constrained by the domain adaptation typically found in AHS. The Cattell-Horn-Carroll (CHC) definitions project classifies human cognitive abilities into a taxonomy of broad and narrow categories. These are:
• Auditory Processing
Further reductions can be applied to this list of categories. First, the personal profile should be independent of domain. Second, regarding auditory processing, the effect of robotic voices in online learning environments is unknown; however, it can be assumed that there would not be enough robotic voices to suit each individual learner, which would place some learners at a disadvantage when using the learning component. Third, Fluid Reasoning can be eliminated, as it is associated with the mental operations used to solve problems and would be more suitable for specific domains or gaming applications. The reduced set of categories is defined as follows:
• Long-term Storage and Retrieval
Considering that the vast majority of learning objects currently available do not have appropriate associated metadata, each of the identified abilities must be identifiable by an automated process in an online environment. The VARK element represents the visual-spatial category; because learning experiences are conducted within an online learning environment, the VARK learning style is restricted to the visual constructs of the learning unit. The Long-term Storage and Retrieval (long-term memory) category is removed, as the learning component will initially generate content that is independent of educational history. This category would be of great benefit when considering the associative learning skill of the learner; however, as there are not yet enough learning experiences recorded for each student, the associative learning skill cannot be used. The reading/writing ability category is defined by the readability level and the information processing speed of a learner. These elements, along with the working memory of the learner, identify the constructs for determining a chunk of memory when interacting in an online learning environment. In particular, the readability level of instructional content is used as a minor indicator of the suitability of that content for a given learner.
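The reduced profile can be illustrated as a simple data structure. The sketch below is hypothetical: the field names, the units (a Flesch Reading Ease score for readability, words per minute for processing speed, a word count for the working-memory estimate) and the 10-point tolerance are assumptions chosen for illustration, not values taken from the paper. It shows how the readability level could act as a minor suitability indicator.

```python
from dataclasses import dataclass


@dataclass
class PersonalProfile:
    """Hypothetical learner profile built from the reduced set of categories."""
    visual_preference: float   # VARK visual element: preferred share of visual material, 0.0 to 1.0
    readability_level: float   # reading/writing ability: comfortable Flesch Reading Ease score
    processing_speed: float    # information processing speed (IPS), e.g. words per minute
    working_memory: int        # working-memory estimate: chunk size in words


def readability_suitable(profile: PersonalProfile, content_score: float,
                         tolerance: float = 10.0) -> bool:
    """Minor indicator: is the content's readability close to the learner's level?

    The default tolerance of 10 points is an illustrative assumption.
    """
    return abs(content_score - profile.readability_level) <= tolerance


learner = PersonalProfile(visual_preference=0.6, readability_level=65.0,
                          processing_speed=200.0, working_memory=40)
print(readability_suitable(learner, 70.0))  # True: within 10 points
print(readability_suitable(learner, 30.0))  # False: much harder text
```

Keeping readability as a tolerance check rather than an exact match reflects its role as a minor indicator: it filters out clearly unsuitable content without ranking candidates on its own.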

III. CONTENT ANALYSER
There exist many instructional content repositories, for example the Multimedia Educational Resource for Learning and Online Teaching (MERLOT), Jorum and the National Digital Learning Repository (NDLR). These repositories contain various types of instructional content, including text files, Word documents, PDF documents, presentations, complete SCORM packages and individual SCOs. Metadata can be defined as data describing other data, and it is typically produced separately from the creation of instructional content, in a black-box fashion. This method of metadata generation is insufficient, as there is no guarantee of consistency between the actual content and the metadata describing it. Norm Friesen [8] found that only 57% of content authors complete the keyword field within the Learning Object Metadata (LOM) files associated with SCORM content; consequently, a large number of learning objects have insufficient metadata for search and discovery. In general, the goal of creating suitable metadata is to allow a process to identify instructional content for reuse. Two essential categories of metadata should be included with learning objects to support reuse and the identification of suitable learning experiences: firstly, metadata that makes the instructional content easily recognisable in domain-specific searches (domain relevance), and secondly, metadata that reflects the cognitive stimulus required for interacting with the learning object in an optimal learning experience. Without metadata reflecting the internal design of the instructional content, it would be impossible to develop an automated process for content adaptation. Neither of these conditions is common practice, resulting in inconsistencies within learning object repositories and insufficient consistent metadata for search and discovery.
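To make the keyword statistic concrete, the following sketch measures how many LOM records in a collection carry at least one keyword. It assumes the records use the IEEE LOM XML binding, in which keywords appear as general/keyword/string; namespaces are ignored by comparing local tag names, so the namespace URI in the sample data does not affect the result. The helper names (`local`, `has_keywords`, `keyword_completion`) are hypothetical, and this illustrates the kind of check involved rather than the survey method used in [8].

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop an XML namespace prefix: '{ns}keyword' -> 'keyword'."""
    return tag.rsplit("}", 1)[-1]


def has_keywords(lom_xml: str) -> bool:
    """True if the record's general category holds a non-empty keyword string."""
    root = ET.fromstring(lom_xml)
    for general in (e for e in root.iter() if local(e.tag) == "general"):
        for keyword in (e for e in general if local(e.tag) == "keyword"):
            for node in keyword.iter():
                if local(node.tag) == "string" and (node.text or "").strip():
                    return True
    return False


def keyword_completion(records: list[str]) -> float:
    """Percentage of records that carry at least one keyword."""
    if not records:
        return 0.0
    return 100.0 * sum(has_keywords(r) for r in records) / len(records)


NS = "http://ltsc.ieee.org/xsd/LOM"
with_kw = f'<lom xmlns="{NS}"><general><keyword><string language="en">SCORM</string></keyword></general></lom>'
without_kw = f'<lom xmlns="{NS}"><general><title><string language="en">Intro</string></title></general></lom>'
print(keyword_completion([with_kw, without_kw]))  # 50.0
```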
The Content Analyser (CA) is focused on bridging the perceived gap between repositories, standards and inconsistent learning objects. It concentrates on the second category of metadata described above, enhancing the learning experience through the identification of cognitive resources within instructional content. The CA was designed to automatically generate metadata describing how instructional content stimulates the cognitive traits and pedagogic preference of each learner (as discussed in Section II), thus addressing the second condition stated above.

A. Inside the Content Analyser
The CA takes as input some instructional content (.txt, .doc, .html or .zip files), decouples the content and generates Sharable Content Objects (SCOs) with added metadata describing the type of information, the amount of information, the size of the instructional space, the readability level of the content and the VARK representation of the instructional material. The output is encapsulated as XML metadata files. These metrics enable search strategies to find content based on both domain relevance and cognitive stimulus.
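One way to picture the CA's output is a per-SCO XML record holding the five metrics listed above. The sketch below uses Python's standard `xml.etree.ElementTree`; the element names (`scoMetadata`, `amount`, `instructionalSpace` and so on) and the sample values are hypothetical and are not taken from the CA's actual schema, IEEE LOM or SCORM.

```python
import xml.etree.ElementTree as ET


def build_sco_metadata(sco_id: str, content_type: str, word_count: int,
                       screen_pixels: int, flesch: float, vark_visual: float) -> str:
    """Serialise the five metrics for one SCO as an XML string.

    Element names are hypothetical, chosen only to mirror the metrics in the text.
    """
    root = ET.Element("scoMetadata", {"id": sco_id})
    ET.SubElement(root, "type").text = content_type                      # type of information
    ET.SubElement(root, "amount").text = str(word_count)                 # amount of information (words)
    ET.SubElement(root, "instructionalSpace").text = str(screen_pixels)  # size of instructional space
    ET.SubElement(root, "fleschReadingEase").text = f"{flesch:.1f}"      # readability level
    ET.SubElement(root, "vark").text = f"{vark_visual:.2f}"              # VARK (visual) representation
    return ET.tostring(root, encoding="unicode")


xml_text = build_sco_metadata("sco-001", "text/html", 245, 480000, 62.3, 0.35)
print(xml_text)
```

Storing every metric as a separate element keeps each one individually queryable, which is what allows a search to combine domain relevance with cognitive stimulus.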

B. Content produced by the Content Analyser
The CA automatically produces metadata describing the cognitive metrics found within instructional content that relate to the personal profile. In addition to identifying these metrics, the CA identifies the author of the instructional content and keeps track of this information. Metadata 1 gives an example of a metadata file that was generated by the Content Analyser (CA) and, in particular, shows the author contact information.
iJET -Volume 9, Issue 5, 2014
The information processing speed (IPS) indicator is used as an estimation of the working memory of an individual. The cognitive metrics found within instructional content that stimulate a learner's personal profile are the amount of content, the readability of the instructional material and the VARK representation of the content. These metrics are described below:
• Amount: the amount is an indicator of the volume of words found within the instructional content.
o This metric is used to calculate an approximation of the working memory capacity (WMC) of a learner. Multiple file formats are catered for by using the Java OpenDocument (JOD) libraries to interface with OpenOffice.
o The working memory of an individual has been extensively researched. Three models of working memory that have emerged from this area are Baddeley's model, Cowan's model and the theory of Ericsson and Kintsch [10]. Unfortunately, the three models differ, and each interprets the capacity associated with the WMC of a learner differently. The concept of a chunk of information is discussed without reference to a specific definition of a chunk, especially in general terms. Within online learning the problem is compounded, as the task is not to remember several digits but to comprehend text, which requires all of the following to take place: perceptual features, linguistic features, propositional structure, macrostructure, situation model, control structure, goals, lexical knowledge, frames, general knowledge and episodic memory for prior text [9]. Held separately in working memory, these components would together exceed any limitation of working memory; however, Kintsch et al. [10] argue that every reader is able to form episodic text structures during text comprehension. Furthermore, a single sentence, constructed using suitable visual stimulus (suited to a learner's pedagogic preference) and written at a readability level approximating the learner's own, establishes the foundation for understanding a chunk within an online learning environment. Additionally, if the granularity of the learning content is at the concept level, as previously stated, this will further support the working memory of the learner, as a single concept should contain only information relating to that concept, without too many external interruptions diverging from the overall meaning of the instructional content.
• FleschReadingEase: the Flesch Reading Ease score is used as an indicator of the readability level of the instructional content, to be matched against the readability level of the learner. All readability formulas are limited, especially when applied to specific learners and settings. The readability level is used as a metric in the adaptation process to enhance the WMC metric.
• VARK: This method takes as input an absolute file name and returns a double value indicating the percentage of the screen that is composed of visual elements. These visual elements identify the visual resources described by Neil Fleming in his VARK model of learning preferences.
o The image/object constructs are defined by the tags "IMG" or "img", "AREA" or "area", "MAP" or "map", "OBJECT" or "object", and "PARAM" or "param".
o The value of the VARK representation is calculated as follows:
where:
o totalVisual = the total number of visual constructs as defined above
o words = the total number of words found within the instructional content, as defined above
o pixel = the total screen area (in pixels) covered by the image or object constructs defined above
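The Amount, FleschReadingEase and VARK bullets above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CA's own implementation: it counts words with a simple regular expression, estimates syllables with a rough vowel-group heuristic, applies the standard Flesch Reading Ease formula (206.835 − 1.015 × words/sentences − 84.6 × syllables/words), and counts the visual tags listed above to obtain totalVisual. Because the VARK equation itself is not reproduced in the text, the sketch stops at its inputs (totalVisual and words); measuring the pixel term would require rendering the page. The names `analyse`, `count_syllables` and `VisualTokenCounter` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Visual constructs listed for the VARK metric; HTMLParser lower-cases tag
# names, so "IMG" and "img" are both matched.
VISUAL_TAGS = {"img", "area", "map", "object", "param"}


class VisualTokenCounter(HTMLParser):
    """Counts visual tags (totalVisual) and collects the text between tags."""

    def __init__(self) -> None:
        super().__init__()
        self.total_visual = 0
        self.text_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VISUAL_TAGS:
            self.total_visual += 1

    def handle_data(self, data):
        # Script/style contents would also land here; a fuller analyser would skip them.
        self.text_parts.append(data)


def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic (an assumption; dictionary-based counters are more accurate)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # treat a trailing 'e' as silent
    return max(count, 1)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def analyse(html: str) -> dict:
    """Return the amount (word count), totalVisual and Flesch score for one page."""
    parser = VisualTokenCounter()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return {
        "amount": len(words),
        "totalVisual": parser.total_visual,
        "flesch": flesch_reading_ease(len(words), sentences, syllables) if words else None,
    }


page = '<p>The cat sat on the mat.</p><IMG src="cat.png"><map name="m"><area shape="rect"></map>'
print(analyse(page))
```

Scores above 100 are possible for very short, monosyllabic text such as the sample, which is one reason readability is treated only as a minor indicator.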

C. The Importance of Structure
Laurillard discussed the problems associated with decoupling instructional material, which can modify the intended meaning of instructional content [11], as discussed in Chapter three. However, when the granularity of the learning material is at a conceptual level and enough learning resources exist, it should be possible to insert or remove images (with their associated textual information) without destroying the overall meaning of the instructional content. To ensure that no meaning is lost when an image is added or removed, all references and text associated with the image must also be added or removed. The metadata in Metadata 3 allows an automated process to insert or remove images, and provides all the metadata required to update the cognitive metrics of the instructional content. As Metadata 3 shows, each image has an associated name, dimensions, word count and visual tokens. These metrics are used to calculate the impact that the image will have on the evaluation of the instructional content against the personal profile of a given learner.
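The insertion and removal logic can be sketched as follows, assuming, for illustration only, that the metrics recorded for an image (name, dimensions, word count and visual tokens) combine additively with those of the surrounding content, with the image's pixel area taken as width × height. The class and field names are hypothetical; the actual layout of Metadata 3 is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Per-image metadata: name, dimensions, word count and visual tokens."""
    name: str
    width: int
    height: int
    word_count: int     # words in the text and references tied to the image
    visual_tokens: int  # visual constructs the image contributes


@dataclass(frozen=True)
class ContentMetrics:
    """Running totals for one unit of instructional content."""
    words: int
    visual_tokens: int
    image_pixels: int

    def insert(self, img: ImageRecord) -> "ContentMetrics":
        """Metrics after adding the image together with its associated text."""
        return ContentMetrics(self.words + img.word_count,
                              self.visual_tokens + img.visual_tokens,
                              self.image_pixels + img.width * img.height)

    def remove(self, img: ImageRecord) -> "ContentMetrics":
        """Metrics after removing the image together with its associated text."""
        return ContentMetrics(self.words - img.word_count,
                              self.visual_tokens - img.visual_tokens,
                              self.image_pixels - img.width * img.height)


base = ContentMetrics(words=300, visual_tokens=2, image_pixels=120_000)
diagram = ImageRecord("diagram.png", width=400, height=300, word_count=45, visual_tokens=1)
after = base.insert(diagram)
print(after)                          # ContentMetrics(words=345, visual_tokens=3, image_pixels=240000)
print(after.remove(diagram) == base)  # True
```

Returning new objects rather than mutating the totals keeps the original metrics available, so a candidate insertion or removal can be evaluated against a learner's profile before it is committed.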

IV. CONCLUSION
In conclusion, the Content Analyser was designed and constructed to bridge the perceived gap between the inconsistencies found in instructional content within content repositories and the lack of consistency in metadata creation. These inconsistencies create an environment in which traditional Adaptive Hypermedia Systems (AHS) cannot be used in the real world, as their closed-loop approach is too restrictive; even without a closed-loop approach, AHS would not be ready for widespread adoption, because the available information is inconsistent (multiple referencing standards, for example) and lacks sufficient metadata. The personal profile described above included the cognitive traits and pedagogical preference of a learner, which had associated cognitive metrics within instructional content designed for an online learning environment. Once the metadata is created, the content is repackaged as SCORM compliant content.
Metadata 3: Metadata produced by the Content Analyser to enable the insertion and deletion of images.