Declarative Memory-based Structure for the Representation of Text Data
In the era of intelligent computing, progress in text processing is an essential consideration. Many systems have been developed to process text in different languages. Despite considerable development, they still fall short in understanding text: instead of keeping text as knowledge, many treat it as mere data. In this work we introduce a text representation scheme influenced by the infrastructure of human memory. Since texts are declarative in nature, a structural organization fosters efficient computation over text. We exploit long-term episodic memory to keep text information observed over time. This not only keeps fragments of text in an organized fashion but also reduces redundancy and stores the temporal relations among them. WordNet is used to imitate semantic memory; it works at the word level to facilitate the understanding of individual words within the text. We report experimental results for various operations performed over the episodic memory and for the growth of the knowledge infrastructure over time.
Keywords: Cognitive Psychology, Knowledge Representation, Episodic Memory, Text Processing
Knowledge can be identified as an ingredient of the computational process of thinking, irrespective of whether the computation is rational or irrational. A rational computation yields intelligent behaviour by making the best use of knowledge, whereas unintelligent behaviour can be understood as follows:
When we say that someone has behaved unintelligently, like when someone has used a lit match to see if there is any gas in a car’s gas tank, what we usually mean is not that there is something that person did not know, but rather that person has failed to use what he or she did know.
Brachman and Levesque [brachman1992knowledge]
Vast knowledge has been available to us since the advent of life. But all this knowledge is useless until it can be utilized to determine its implications and draw relevant inferences from it [mccarthy2008well]. Evidence [hodges1999and; pushp2018cognitively] suggests that, in order to do so, the human brain structures information. A multi-store model of the human memory system [atkinson1968human] was proposed as early as 1968. The model claims that a memory system specifies the underlying infrastructure for rational behaviour in animals, and human intelligence relies immensely on such infrastructure. Among the types of memory structure, Long Term Memory (LTM) is of special interest because it can store information for an unlimited period of time. LTM is logically separated into declarative and non-declarative memory: the former stores knowledge that someone can tell others, and the latter contains knowledge that someone can show by doing [anderson2004integrated; saikia2016cbdi]. Two types of declarative memory, semantic memory and episodic memory, keep general world knowledge and experience gained over time, respectively. Briefly, memory includes those aspects of human knowledge that are gained through one's own observation and experience and that change over time to revise beliefs about an object, event, or action of the world.
Research in the field of knowledge representation has been pursued since the conception of the modern computer, and there has been much development in the area over the last few decades [davis2015commonsense]. In the context of intelligent processing, most existing knowledge representation techniques are effective for processing a particular language for a particular problem, but most such systems lack a significant objective: generalization.
Text processing usually involves performing various computations on unstructured text. Text could be processed more efficiently by first transforming it into a structured format and then feeding it to the processing stage. There is growing interest in knowledge representation techniques for natural language [davis2015commonsense], with systems that store information in a structured format so that it can be called knowledge.
The remainder of this paper is organized as follows: Section 2 provides the background review and the motivation for the representation, followed by desiderata in Section 3. Section 4 contains the formal structure that illustrates the fundamental intuition of the representation, followed by experiments and evaluation in Section 5. Finally, Section 6 briefly concludes the paper.
2 Background and Motivation
Human memory has long been studied, and attempts have been made to map it, but it is a very complex system whose complete structure has not yet been obtained. Recent research [radvansky2015human] has shown that human memory is not located in one particular place in the brain but is instead a distributed structure in which different parts of the brain act in coordination with one another. Thus, an actual representation might be a large complex network in which nodes symbolize the various elements, joined by edges to form a memory. A memory system specifies the underlying infrastructure for rational behaviour in animals, and human intelligence relies immensely on such infrastructure. Briefly, memory includes those aspects of human knowledge that are gained through one's own observation and experience and that change over time to revise beliefs about objects, events, or actions of the world.
Psychological studies fundamentally classify the human memory system into two classes based on temporal persistence: (1) Short Term Memory (STM) and (2) Long Term Memory (LTM). Short Term Memory holds current observations and survives for very short periods of time, assumed to be on the order of seconds [laird1], whereas Long Term Memory can retain acquired knowledge for an unlimited duration [anderson]. LTM is further divided into two subcategories [atkinson1968human], declarative and non-declarative storage: the former, which is of special interest to us, stores knowledge that someone can tell others, and the latter contains knowledge that someone can show by doing [baddeley1997human]. Evidence suggests that declarative memory contains two different types of declarative knowledge, which are stored separately in different storage infrastructures known as semantic memory and episodic memory [anderson].
Semantic memory refers to the facts, information, and features about objects that the brain uses internally to determine what an object is. In AI, semantic memory is used to determine the meaning of a word or fact so that a computer system can understand it and perform computations on it. Episodic memory represents the chronological record of a person's experience, in which only some specific events are stored for a long interval of time [bliss1993synaptic]. Episodic memory can be explicitly accessed, and an episode can be reconstructed from it. When the personal context is shared from episodic memory, it becomes a part of the semantic memory of the person. This generally happens when facts or information in episodic memory are learnt repeatedly over time [mckoon1986critical; greenberg2010interdependence].
Beyond representation, it is also important to define how operations are performed on knowledge. While retrieving instances of knowledge, it is important to determine which information is relevant to the task at hand. Insignificant knowledge needs to be removed to avoid accumulating non-essential information, and important parts must be reinforced so that the knowledge is not forgotten. Five operations that a knowledge representation (KR) technique should be able to perform to enable this are listed below [nuxoll2007enhancing]:
Encoding deals with determining how new data will be transformed following the rules of the KR such that the original structure of the KR is maintained. It must also determine when encoding is initiated.
Storage handles the internal storage structure for the KR. Further, it needs to ascertain how the dynamics of the storage will change when any other operation is performed, so as to reflect the operation but at the same time retain the structural rules.
Retrieval manages when and how retrieval will be triggered. Most importantly, the process involved in the retrieval should be defined in detail, which involves initiation condition, selection methodology, and similarity determination.
Forgetting tackles the removal of insignificant knowledge so as to avoid accumulating non-essential information.
Consolidation takes care of reinforcing important information so that knowledge is not forgotten while it is in use. Further, knowledge that is used frequently should become permanent memory after reaching a frequency threshold.
The essential requirement for any system to exhibit human-like intelligence is the ability to draw conclusions from the knowledge it already possesses. This requires the system to represent relationships between various beliefs, which includes not only inferring any new rules encountered but also updating existing ones. To establish an intelligent conversation, the system must be capable of determining the feasible choices by associating them with existing patterns and then, on the basis of their feasibility, choosing an alternative.
A human-machine interaction system can work towards this goal only when it supports the varied capabilities required for a system to achieve human-level intelligence [pat; pushp2017cognitive]. Desiderata for the completeness and evaluation of a cognitive agent for human-machine communication, whether directly or through an embedded process, are:
Adaptation can be bolstered if one can properly categorize knowledge and can recognize and extract rational knowledge. Therefore, a proper knowledge structure along with a satisfactory retrieval mechanism is essential.
The essential requirement for any system to exhibit human-like intelligence is to be able to draw conclusions from the knowledge the system already possesses. This requires the system to be able to represent relationships between various beliefs, which includes not only inferring any new rules encountered but also to be able to update the existing rules.
The system must possess the ability to encode and store the results of previous experiences and to retrieve them later, along with the inferences drawn from them at the earlier stage. Additionally, the rules must be generalized in memory so that the system can learn by applying them to similar problems or other tasks in the same domain.
The aforementioned desiderata express an abstract notion of the underlying requirements. Throughout this work we have emphasized developing a knowledge infrastructure that fulfils part of the requirements discussed above. The proposed mechanism is psychologically plausible and does not rule out other, better methodologies. The design perspective centres on human-centred functionality.
4 Episodic Memory and Knowledge Representation
Knowledge representation is fundamentally a function that takes input from one domain and returns output belonging to another. In other words, it is a mapping of entities between two domains. There is an enormous variety of structural methods through which such a mapping could be performed. Owing to a human-centric bias, this section gives the formal description of an episodic-memory-based knowledge representation structure for text data.
Episodic memory in general is a network of experience [anderson] gained by an individual, and it is certainly different for different individuals. Since experiences are gained over time, their temporal order must be maintained. Multiple experiences gained over time can have a variety of interconnected contexts, which is again a crucial challenge for modeling.
Episodic memory is a 5-tuple consisting of a chronological sequence of episodes and the temporal relationships between them.
It comprises: the unique identification associated with an individual episode; the temporal parameter recorded for each instance of an individual episode, which can further be used to establish the temporal occurrence of individual episodes; the reference to the next episode; the significance of a particular instance of an episode; and the similarity connection between instances of episodes. The similarity connection is itself composed of a two-tuple whose pair of parameters defines the strength of similarity between episodes. It comes into play when a particular episode is linked with a number of other episodes due to context similarity.
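The five elements described above can be sketched as a small record type. This is only an illustrative sketch: the field names, the default significance of 1, and the choice to model the similarity two-tuple as (identifier of the similar episode, strength) are assumptions made here, not definitions taken from the formal structure.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class EpisodeEntry:
    """One entry of the episodic memory (all field names are hypothetical)."""
    episode_id: int                      # unique identification of the episode
    timestamp: datetime                  # temporal parameter of the instance
    next_episode: Optional[int] = None   # reference to the next episode, if any
    significance: float = 1.0            # ASSUMPTION: initial significance of 1
    # ASSUMPTION: each similarity connection is (similar episode id, strength)
    similar: List[Tuple[int, float]] = field(default_factory=list)


def chronological_ids(entries: List[EpisodeEntry]) -> List[int]:
    """Return episode ids ordered by timestamp, oldest first."""
    return [e.episode_id for e in sorted(entries, key=lambda e: e.timestamp)]


def link_chronologically(entries: List[EpisodeEntry]) -> None:
    """Point each entry at the entry that follows it in time."""
    ordered = sorted(entries, key=lambda e: e.timestamp)
    for current, following in zip(ordered, ordered[1:]):
        current.next_episode = following.episode_id
    if ordered:
        ordered[-1].next_episode = None  # the newest entry has no successor
```

Keeping the chronological reference and the similarity connections in separate fields mirrors the distinction the text draws between temporal occurrence and context similarity.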
The knowledge representation generates a graph in which episodes are linked in chronological order, as shown in Figure 1. In memory, it can serve to find the most recently gained knowledge about a particular context as well as to answer a query whose context can be inferred from previous experience.
Episodic memory is composed of instances of episodes linked together depending on the strength of similarity and temporal occurrence. Among such instances, a few could be more significant than others in a particular context.
Each episode in turn consists of instances, which have smaller data chunks stored in the form of nodes. Instances are connected to previous instances on the basis of similarity. An episode from node s to node g is a sequence of nodes that begins at s and ends at g. An individual episode can be formally expressed as a 5-tuple entity.
It comprises: a unique identification for individual nodes within an instance of the episode; the sequence information about the nodes, i.e. the order of occurrence of nodes within an instance of the episode; the temporal parameter of individual nodes within each instance of the episode; the reference to the next node within the episode; and the significance, or importance, of the context of a particular node. The significance is initially a fixed constant value for any newly created episode and can change according to the use or priority of the episode.
Individual episodes are further composed of a series of nodes, which are organized as a collection of sub-nodes expanded in three logical layers: primary, secondary, and tertiary.
A node is the elementary data unit of episodic memory. It holds the information vital for decision making. A node is a 5-tuple structure.
It includes the unique identification of the node and the type of the node; depending on its type, a node contains the relevant information. It further includes the collection of keywords associated with the node in a context and the identification of the episode to which the node belongs.
As an elementary data unit, and in line with the complexity of text representation, the three types of nodes are a logical separation of text based on the building blocks of a particular language, which makes this a language-dependent functionality.
The primary node indicates the subject of the sentence. It acts as an anchor to the underlying context, so that it can be used as a reference point whenever that context is called upon in future. The primary node is chosen from the tags such that it contains the information regarding the main subject of the instance. Table 1 lists the tags associated with the primary node.
Table 1: Tags associated with the primary node.
| Tag | Description |
|---|---|
| NN | Noun, singular or mass |
| NNP | Proper noun, singular |
| NNPS | Proper noun, plural |
The secondary node works as a sub-node of the primary node and keeps the information about the subject of the primary node. This helps in determining the attributes of the subject. Table 2 lists the possible tags that identify a secondary node.
Table 2: Tags associated with the secondary node.
| Tag | Description |
|---|---|
| VB | Verb, base form |
| VBD | Verb, past tense |
| VBG | Verb, gerund or present participle |
| VBN | Verb, past participle |
| VBP | Verb, non-3rd person singular present |
| VBZ | Verb, 3rd person singular present |
Further, the tertiary node indicates the adverbs related to the property referred to in the secondary node. Table 3 shows the two tags observed for tertiary nodes.
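The tag-based separation into three node layers can be sketched as a simple lookup over already tagged tokens. The primary and secondary tag sets follow Tables 1 and 2; the tertiary set is an assumption (the adverb tags RB and RBR), since the two tags of Table 3 are not listed in the text. The function name and the output layout are hypothetical, and the input is assumed to come from any part-of-speech tagger that emits these tags.

```python
from typing import Dict, Iterable, List, Tuple

PRIMARY_TAGS = {"NN", "NNP", "NNPS"}                        # Table 1
SECONDARY_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}  # Table 2
TERTIARY_TAGS = {"RB", "RBR"}  # ASSUMPTION: adverb tags; Table 3 entries are not given


def classify_tokens(tagged: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Split (word, tag) pairs into primary, secondary and tertiary layers.

    Tokens whose tag belongs to none of the three sets are ignored.
    """
    layers: Dict[str, List[str]] = {"primary": [], "secondary": [], "tertiary": []}
    for word, tag in tagged:
        if tag in PRIMARY_TAGS:
            layers["primary"].append(word)
        elif tag in SECONDARY_TAGS:
            layers["secondary"].append(word)
        elif tag in TERTIARY_TAGS:
            layers["tertiary"].append(word)
    return layers
```

Because the sketch applies no further filtering (for example of auxiliary verbs), its output for a sentence need not coincide with the node contents reported in the experiments.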
Beyond the primitive structure, we present how a variety of operations act over the dynamic structure to transform informal input into a computable knowledge structure. While retrieving instances of knowledge, it is also important to determine which information is relevant to the task at hand. Insignificant knowledge should be removed to avoid accumulating non-essential information, and important parts should be reinforced so that the knowledge is not forgotten. The operations [nuxoll2007enhancing] that a knowledge representation technique should be able to perform to enable this are:
Encoding deals with determining how new data will be transformed following the rules of the knowledge representation such that the original structure of the knowledge is maintained.
Whenever a new action sequence is observed, it must be examined whether it should be considered a new episode or an instance of the ongoing episode. We assert in general that if the time elapsed between the timestamp of the current instance and the timestamp of the last observed instance is greater than the time elapsed between the last observed instance and the first instance of that episode, then the new action sequence is considered the start of a new episode.
Episode determination is modelled as a Boolean-valued function. The function returns 1 if a new episode must be started and 0 if the current instance continues the current episode. It considers three timestamps: the start timestamp and the last timestamp of the episode to which the last instance belongs, and the timestamp of the current instance for which the episode is to be determined.
The time-stamp constant used in this function is a tuning factor that may be set according to the application requirement; for example, when an application demands that a new time-stamp be created within a small time interval, a smaller value of the constant is preferable.
Nodes are the elementary units of the proposed episodic memory and are formed by classifying the input text according to the tags it acquires. Tag features of the input text are used to organize the nodes as primary, secondary, and tertiary nodes.
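The episode determination rule can be sketched as follows. The comparison of the two elapsed times follows the prose above; how the time-stamp constant enters the comparison is not recoverable from the text, so this sketch assumes it acts as an additive slack in seconds. The function and parameter names (`starts_new_episode`, `alpha`) are hypothetical.

```python
from datetime import datetime, timedelta


def starts_new_episode(start: datetime, last: datetime, current: datetime,
                       alpha: float) -> int:
    """Return 1 if `current` should start a new episode, otherwise 0.

    start   -- timestamp of the first instance of the ongoing episode
    last    -- timestamp of the last observed instance of that episode
    current -- timestamp of the instance being placed
    alpha   -- ASSUMPTION: time-stamp constant used as additive slack (seconds)
    """
    gap = (current - last).total_seconds()   # time since the last instance
    span = (last - start).total_seconds()    # duration of the episode so far
    return 1 if gap > span + alpha else 0


# Usage: an episode that has lasted 60 s absorbs an instance arriving 30 s later.
t0 = datetime(2018, 12, 9, 10, 0, 0)
same = starts_new_episode(t0, t0 + timedelta(seconds=60),
                          t0 + timedelta(seconds=90), alpha=0.1)
```

Under this assumption a smaller constant makes a new episode start after a shorter pause, which agrees with the tuning guidance above; a multiplicative constant would be an equally plausible reading.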
Storage handles the internal storage structure for the KR. It also handles the changes that occur when any other operation is performed, so as to reflect the operation while retaining the structural rules.
The structure used to store an episode influences the efficiency of adding or modifying knowledge as well as of relevant retrieval. It has been experimentally established that graphs best accommodate a non-linear structure such as an episodic memory [nuxoll2007enhancing]. Therefore, to handle such a structure, the storage maintains an interaction graph.
The interaction graph is a directed graph with a two-tuple structure (E, L), where E is a finite, nonempty set of episodes and L is a set of links between pairs of episodes.
Conventionally, the sets E and L represent the vertices and edges of the graph, respectively.
Encoded tags are used to classify the input text into three node types: primary, secondary, and tertiary. The storage structure treats the primary node of the input text as the root for that node instance. Secondary nodes are directly connected to the primary node, whereas tertiary nodes are directly connected to secondary nodes. Therefore, an evolutionary model of the knowledge structure is stored in the form of a node for each input instance. A symbolic structure is shown in Figure 2.
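The two storage structures described above can be sketched together: a directed interaction graph (E, L) over episodes and a three-layer tree for one node instance. The class and function names are hypothetical, the initial link weight of 1 follows the retrieval description, and which tertiary node attaches to which secondary node is assumed to be supplied by the caller, since the text does not specify it.

```python
from typing import Dict, List, Set, Tuple


class InteractionGraph:
    """Directed graph (E, L): E holds episode ids, L holds weighted links."""

    def __init__(self) -> None:
        self.episodes: Set[int] = set()                 # E, the vertices
        self.links: Dict[Tuple[int, int], float] = {}   # L, (source, target) -> weight

    def add_episode(self, episode_id: int) -> None:
        self.episodes.add(episode_id)

    def link(self, source: int, target: int, weight: float = 1.0) -> None:
        """Add a directed link; both endpoints must already be in E."""
        if source not in self.episodes or target not in self.episodes:
            raise KeyError("both episodes must be stored before linking")
        self.links[(source, target)] = weight

    def successors(self, episode_id: int) -> List[int]:
        """Episodes reachable from `episode_id` over one link, in sorted order."""
        return sorted(t for (s, t) in self.links if s == episode_id)


def build_node_tree(primary: str,
                    secondary_to_tertiary: Dict[str, List[str]]
                    ) -> Dict[str, Dict[str, List[str]]]:
    """Root the instance at the primary node; secondary nodes hang off the
    root and tertiary nodes hang off their secondary node."""
    return {primary: {sec: list(ter) for sec, ter in secondary_to_tertiary.items()}}
```

A dictionary keyed by (source, target) keeps link lookup and weight updates cheap, which matters because forgetting and consolidation both revisit link weights.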
Retrieval manages when and how the retrieval of knowledge is triggered. The retrieval process defines how past episodes and instances may be retrieved from storage. It involves the initiation condition, selection, and similarity determination.
Spontaneous retrieval is initiated when an episode is retrieved in order to link similar instances, whereas deliberate retrieval is initiated when the conversation demands an answer that might be present in memory. The knowledge structure backtracks and finds the first node it encounters whose primary node is the same as, or similar to, that of the current node. Here WordNet [WinNT] plays a significant role in determining similar knowledge.
Whenever it encounters a node that satisfies this condition, it applies the function S to node i and computes the value for both the current node and the node being compared, j, as a combination of three terms weighted by the constants k, l, and m.
The constants k, l, and m satisfy k + l + m = 1 and are ordered so that maximum priority is given to the primary node.
The three terms are: the inverse of the distance between the primary nodes, calculated by finding the similarity between them using WordNet; the inverse of the least distance between the secondary nodes; and the inverse of the least distance between the tertiary nodes.
If the threshold condition, governed by a threshold constant, holds, the difference between the two nodes is acceptable and they are linked; otherwise, previous nodes are searched until a similar node is found or the start of time is reached. The weight of any new link is initialized to 1 at the time the episode is linked.
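A minimal sketch of the similarity computation and the backward search is given below. It assumes that the three inverse-distance terms have already been computed (for instance from WordNet) and are passed in as numbers, that S is their weighted sum with k + l + m = 1 and the primary weight largest, and that a node is accepted when its score reaches the threshold. The exact form of S and of the threshold test is not recoverable from the text, so all of these, along with the function names, are assumptions.

```python
from typing import Optional, Sequence


def similarity_score(primary_term: float, secondary_term: float,
                     tertiary_term: float, k: float, l: float, m: float) -> float:
    """ASSUMED form of S: k*primary + l*secondary + m*tertiary."""
    if abs(k + l + m - 1.0) > 1e-9:
        raise ValueError("k + l + m must equal 1")
    if k < l or k < m:
        raise ValueError("the primary weight k must be the largest")
    return k * primary_term + l * secondary_term + m * tertiary_term


def find_similar(scores: Sequence[float], threshold: float) -> Optional[int]:
    """Backtrack from the newest stored node to the oldest and return the
    index of the first node whose score reaches the threshold, or None if
    the start of time is reached without a match."""
    for index in range(len(scores) - 1, -1, -1):
        if scores[index] >= threshold:
            return index
    return None
```

Searching from the newest node backwards matches the stated behaviour of stopping at the latest similar instance instead of scanning the whole memory.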
Forgetting performs the removal of insignificant knowledge to avoid accumulating non-essential information. The organization of the episodic memory therefore changes over time [nuxoll2010comparing]. Forgetting usually targets the episodes that are least used: it weakens their links and decreases the utility value of those instances over time.
The link weight has to decrease to weaken the significance of an instance. At a given time, the new weight depends on the time elapsed between the target instance and the current instance and on the utility of the target instance, and the link weight is updated accordingly.
The update uses the difference between the timestamps of the target instance and the current one, the utility of the particular instance, and three constants: the link weight constant x, the forgetting constant y, and the utility constant z.
The constants x, y, z are tuning factors whose values may be chosen according to the application. The value of x can be chosen in the range (0,1), where values close to 1 imply the slowest forgetting and values closer to 0 imply rapid forgetting. The value of the forgetting constant y must be chosen so that the effect of the passage of time is reflected on the link; it must be kept in the range (0,1), with a greater value signifying a more rapid decrease in link weights with time. The weight of any new link is initialized to 1 at the time the episode is linked, while the utility of an instance is fixed at 1 at the time of instance creation.
The utility value of an instance changes over time with respect to the elapsed time, and the new utility is computed accordingly.
Here y is the forgetting constant, which is kept consistent with the forgetting value used for the links with respect to time.
If the weight of any link is lower than link threshold, or the utility of an instance is lower than utility threshold, then the link should be severed or the instance be deleted, respectively. The deletion of instances or severing of links must be done while the application is idle.
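The thresholding step above can be sketched independently of the update formulas: given current link weights and instance utilities, it severs weak links and deletes low-utility instances. The function name and data layout are hypothetical, and dropping links that touch a deleted instance is an added assumption, made so that no link is left dangling.

```python
from typing import Dict, Tuple

LinkWeights = Dict[Tuple[int, int], float]   # (source, target) -> weight
Utilities = Dict[int, float]                 # instance id -> utility


def prune(link_weights: LinkWeights, utilities: Utilities,
          link_threshold: float, utility_threshold: float
          ) -> Tuple[LinkWeights, Utilities]:
    """Return the links and instances that survive forgetting.

    An instance is deleted when its utility is below `utility_threshold`;
    a link is severed when its weight is below `link_threshold` or
    (ASSUMPTION) when either endpoint has been deleted.
    """
    kept_instances = {i: u for i, u in utilities.items()
                      if u >= utility_threshold}
    kept_links = {(a, b): w for (a, b), w in link_weights.items()
                  if w >= link_threshold
                  and a in kept_instances and b in kept_instances}
    return kept_links, kept_instances
```

Returning new dictionaries instead of mutating the inputs makes it straightforward to run the step only while the application is idle and to swap the result in atomically.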
Consolidation takes care of reinforcing important information so that the knowledge is not forgotten while it is in use. In addition, knowledge that is used frequently should become permanent memory after reaching a frequency threshold.
Whenever the difference is acceptable and the instances are linked, the utility values of both instances and the weight of the link are increased.
The weight of the link is increased using the same parameters as those used to weaken the link during forgetting.
The utility value at time t is increased with reference to its previous value, the threshold constant, and the forgetting constant. Consolidation is performed when the system is idle.
5 Experiments, Evaluation and Result
In this section, we present the experiments conducted to observe the working of each operation. At different instants of time, different input paragraphs are introduced to examine snapshots of the episodic memory coupled with the various operators. Table 4 presents a snapshot of the existing episode before the input is introduced. The creation of episodes, the instances of nodes within individual episodes, and the effectiveness of the operators are observed to examine the working of the knowledge structure. First, three different paragraphs are given as input at three separate instants of time, and observations are made thereafter.
Input: The sun is a huge ball of gases.
The Sun is mainly made up of hydrogen and helium gas.
The surface of the Sun is known as the photosphere.
| 4300673472 | None | 0 | 2018-12-09 10:09:08.532000 | 4300679876 | None | 1 |
Table 5 shows the node instances corresponding to the individual sentences. Each row in Table 5 gives the unique identification mark of an individual node, the fragments of the input belonging to each node type, the time-stamp, and the next linked node.
| Node ID | Primary | Secondary | | Tertiary | Time-stamp | Next node |
|---|---|---|---|---|---|---|
| 4300679876 | ['sun', 'ball', 'gases'] | ['huge'] | | | 2018-12-09 10:09:08.485000 | 4300679886 |
| 4300679886 | ['Sun', 'hydrogen', 'helium', 'gas'] | ['made'] | | ['mainly'] | 2018-12-09 10:09:08.485000 | 4300679896 |
| 4300679896 | ['Sun', 'surface', 'photosphere'] | ['known'] | | | 2018-12-09 10:09:09.485000 | None |
In continuation, Table 6 presents the newly created episodes based on the acceptable time difference, with the time-stamp constant set to 0.1.
| 4300673472 | None | 0 | 2018-12-09 10:09:08.532000 | 4300679876 | 4300673572 | 1 |
| 4300673572 | None | 0 | 2018-12-09 10:03:09.452000 | 4300679234 | None | 1 |
Broadening of the knowledge infrastructure has been designed to work implicitly: the system learns through experience to scale up knowledge acquisition. The episodic memory has been developed in such a way that the system learns over time as more and more information is fed to it.
5.1 Learning Experience
The estimated enrichment of the episodic memory is shown graphically in Figure 3. Initially, episodes are created and linked to each other only through chronological sequence, as they rarely have anything in common. Later, slowly but steadily, the number of links starts to rise as the system starts finding knowledge in common. After sufficient accumulation of knowledge, there is a steep growth in links, since for almost every new memory a similar memory can be found in storage.
Learning of this kind eliminates redundant knowledge, which significantly reduces the dense search space to a sparse one. This should improve the chance of successfully retrieving the answer to a query.
5.2 Retrieval Time
The time to retrieve older memories changes with the accumulation of knowledge. The best case is observed when continuous knowledge on the same topic is given. The average case is observed when sufficient knowledge has been obtained and the topic on which knowledge is obtained is already present in memory. The worst case is observed when the topic is not already present, so that the search for the topic must go back to the start of time.
A graphical representation of the memory retrieval rate with respect to the knowledge acquired is given in Figure 4. Here, retrieval time denotes the time taken to find a similar node. It is based on the observation that, as the knowledge network grows, the system needs to go back to the start of time less and less, because it only has to find the latest instance of the topic and update the links in the episodic memory accordingly.
5.3 Question Answering
To evaluate the working of the episodic memory knowledge infrastructure, it was integrated with a different application. The objective was to demonstrate the functioning of retrieval.
Analysis of the System
First, an analysis is done in which the system is given a set of simple and complex sentences jumbled together. A set of simple questions is then posed on the simple text as well as the complex text. Similarly, a set of complex questions is posed on both texts. The questions are asked only for cases where the answer is present in the text previously given to the system. The results are recorded in Table 7.
|Simple Question||Complex Question|
Comparison with Cleverbot
To demonstrate the capabilities of our system relative to other artificial intelligence question answering machines, we compare our system with "Cleverbot" [carpenter2015cleverbot]. Cleverbot is a very popular web application developed by Rollo Carpenter. The reason for selecting Cleverbot, when so many question answering machines are available, is its unique feature of building its database by having conversations with people. At its launch it had 200 million conversations, which has now increased to 265 million. When asked a question, Cleverbot tries to match it to an exact phrase. If no exact phrase is found, it searches for keywords in the input and then retrieves the best match from the database.
Therefore, to compare the two, knowledge-related questions were fed to our system. As shown in Table 8, the results were found to be comparable when our system was fed appropriate data. However, our system could not answer any questions on a new topic, because it depends entirely upon its accumulated knowledge. Cleverbot performs better for an arbitrary topic. Nevertheless, in the case of a known topic, keeping in mind the vast difference between the databases of the two, the observations were quite satisfactory.
6 Conclusion
In this paper we have presented a psychologically plausible knowledge representation infrastructure to organize text data. The intuition was to treat text as knowledge. The formal structure of an artificial episodic memory and a number of operators were defined. WordNet was used in place of semantic memory. The proper functioning of episodes and operators was examined. Finally, an evaluation of the knowledge structure was presented to support the claim. Finding ways to deal with topics not seen before is part of ongoing research.