Capturing Knowledge Triggering in Collaborative Settings
In collaborative knowledge building settings, the existing knowledge in the system is perceived to set the stage for the manifestation of more knowledge, a phenomenon termed triggering. Although the literature points to a few theories supporting the existence of this phenomenon, these have never been validated in real collaborative environments, which leaves their general prevalence in question. In this work, we provide a mechanized way to observe the presence of triggering in knowledge building environments. We implement the method on the most-edited articles of Wikipedia and show how the existing factoids lead to the inclusion of more factoids in these articles. The proposed technique may also be used in other collaborative knowledge building settings. The insights obtained from the study will help portal designers build portals that enable optimal triggering.
Human knowledge is perceived to evolve with time through the creation of new artifacts of knowledge. An important aspect of this evolution is that the development of new artifacts depends on the existing ones [1, p. 86]. This belief is also supported by constructivist learning theories as well as other theories on genetic epistemology [3, 4, 5, 6, 7, 8]. As per these theories, how we construct knowledge depends on what is already known. Understanding this evolutionary process has interested researchers for a long time [9, 10]. Many of these works point to the phenomenon of triggering as being responsible for the creation of new knowledge. Here, triggering is a process by which an idea or a comment spearheads the generation of another idea or thought. Although this phenomenon has been described in different texts in diverse ways, they all refer to the same underlying idea. For instance, classical theories suggest that in a social system such as a collaborative knowledge building system, people add more content due to cognitive conflicts or perturbations. These conflicts arise when they see content that is incomplete or does not match what is already there in their cognitive systems (i.e., minds).
A few other works point to theories that support the existence of an underlying network among the pieces of knowledge concerning a knowledge artifact [9, 10, 11]. For example, it is perceived that knowledge is organized into frames, each of which holds a particular concept. These frames may be of varying sizes, and those that are related to each other are linked together in the network. Therefore, when a frame is triggered, the other frames that are linked to it are also likely to be triggered [10, p. 55]. These frames may be linked sequentially or in any non-linear fashion. One can imagine a forest of nodes where each node is a knowledge frame, and the attached frames form the connected components in the forest. Figure 1 shows an example of the underlying network of knowledge frames (concepts): the frames are shown as nodes, and a link between two frames indicates that they are associated with each other. Further, these frames are connected by condition-action rules that determine which frames to trigger next. When the triggering conditions for a frame are met, that frame is brought into the system. Figure 1 captures this through the thickness of the edges, which represents the strength of association between the nodes. For instance, the concept ‘A’ is associated with four more concepts, namely ‘B’, ‘C’, ‘D’ and ‘E’, where the association of ‘A’ with ‘B’ is stronger than that with ‘C’ and ‘E’, which in turn is stronger than that with ‘D’. Hence, when ‘A’ gets introduced, the chances of inclusion of ‘B’, ‘C’, ‘D’ and ‘E’ also increase in accordance with the strength of their edges. This leads to a ubiquitous and self-regulating process in which the existing knowledge frames lead to the inclusion of more knowledge into the system, making it an autopoietic system.
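The edge-strength idea described above can be illustrated with a minimal sketch. The numeric strengths below are hypothetical values chosen only to respect the ordering stated for Figure 1 (‘B’ strongest, ‘C’ and ‘E’ equal, ‘D’ weakest); the function name `trigger_probabilities` is likewise an assumption made for illustration and is not part of the original work.

```python
# Hypothetical edge strengths for the frame 'A'; the numbers are illustrative
# assumptions that only preserve the ordering B > C = E > D described in the text.
ASSOCIATION = {"A": {"B": 4.0, "C": 2.0, "E": 2.0, "D": 1.0}}


def trigger_probabilities(frame, association=ASSOCIATION):
    """Normalise the edge strengths of `frame` into relative trigger chances."""
    neighbours = association.get(frame, {})
    total = sum(neighbours.values())
    if total == 0:
        return {}  # an isolated frame triggers nothing
    return {other: strength / total for other, strength in neighbours.items()}
```

Under these assumed strengths, calling `trigger_probabilities("A")` ranks ‘B’ above ‘C’ and ‘E’, which in turn rank above ‘D’, mirroring the edge thickness in the figure.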
A limitation of the existing works on understanding the evolution of knowledge is that the theories they suggest have not been validated with real-world data. The lack of access to the right kind of data has meant that mostly theoretical work has been pursued in this direction. However, in the recent past, advances in tools enabling large-scale collaboration for building knowledge have made it possible to acquire the required underlying data, because these tools store all the footprints of the knowledge built through them in digital format.
In this study, we use Wikipedia, which is one of the most successful portals that enable building knowledge through the combined effort of a large group of users. On this portal, the content available in any given article does not reach its eventual state in a single step. Rather, a few factoids
The development of mechanized ways to observe the evolution of a collaborative piece of knowledge may pave the way for advancements in the fundamental research on knowledge building. This will further lead to better mechanism design of the collaborative tools for building knowledge.
II Related Work
Past research has attempted to develop models that mimic the triggering phenomenon. However, these models have focused mainly on the growth properties of the knowledge rather than exploring how the existing knowledge frames steer the inclusion of more knowledge.
Some of these models are based on Polya’s urn model and its extensions [17, 18], where the knowledge units are represented as colored balls in an urn. In the basic model, one ball is drawn from the urn uniformly at random and its color is observed. This ball, along with an extra ball of the same color, is then inserted into the urn. This process is repeated and the growth in the number of balls of each color is observed. In such a model, a color that has been observed is more likely to be observed again. Tria et al. present a mathematical model to emulate the occurrence of a new invention. The model is a generalization of Polya’s urn model and is based on the idea that the space of existing novelties expands as a new invention occurs. It uses the concept of the adjacent possible introduced by Kauffman, where the adjacent possible contains all those concepts that are one step away from the existing concepts. The authors show that the rate of occurrence of novelties follows Heaps’ law. As per this law, the growth rate of knowledge reduces with time. The same was then verified using data taken from various sources such as Wikipedia, an annotation system and an online music catalog. The Wikipedia dataset used by the authors contained a collection of wiki pages, and the first edit by a user to a wiki page was considered equivalent to an invention. This model was further extended by Loreto et al., who provided a class of probabilistic models using Simon’s model. In a recent work, Iacopini et al. modeled the dynamics of innovation processes using edge-reinforced random walks on the network of ideas. An edge-reinforced random walk is one in which the weights of the edges are incremented as they are visited, thereby changing the weights of the edges in the network dynamically. The authors kept the probability of traversing an edge directly proportional to the weight of the edge.
They showed that the rate of increase of innovations in their model follows Heaps’ law, which past literature has shown to hold in such settings. In another work, the knowledge networks of questions and answers were studied by Andjelković et al. Using the concept of triggering from the classical cognitive theories, Chhabra et al. developed a mathematical model that computes the knowledge produced in a system due to the effect of triggering. The model uses the concept of diversity in the activity selection behavior of users in a collaborative environment [28, 29].
All these models have mainly focused on either finding the rate at which the inventions occur or some other statistical property of the process rather than understanding how the knowledge evolves through the process of triggering. To the best of our knowledge, how the existing knowledge sets the stage for the manifestation of more knowledge has not been explored using real-world data so far.
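For readers unfamiliar with the basic Polya urn model referred to in this section, the following is a minimal simulation sketch. It covers only the basic scheme (draw a ball uniformly at random, return it with one extra ball of the same color), not the extensions by Tria et al. or Loreto et al.; the function name `polya_urn` and the fixed random seed are assumptions made for illustration.

```python
import random
from collections import Counter


def polya_urn(initial_colors, steps, seed=0):
    """Simulate the basic Polya urn and return the final count per color."""
    rng = random.Random(seed)  # seeded so that repeated runs are reproducible
    urn = list(initial_colors)
    for _ in range(steps):
        drawn = rng.choice(urn)  # uniform draw; the ball itself stays in the urn
        urn.append(drawn)        # add one extra ball of the observed color
    return Counter(urn)
```

Each step increases the urn size by exactly one, and a color that has already been drawn becomes more likely to be drawn again, which is the reinforcement property the text describes.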
The data set
IV Article Evolution: Tracking the Factoids
A Wikipedia article is always in flux, i.e., it keeps changing with time as new units of information, i.e., factoids, keep getting added. The introduction of these factoids is what leads to the evolution of the article. Therefore, these factoids need to be identified within the entire content of the article. A manual assessment of an article may give a domain expert a clue about which parts of the content may be considered factoids. However, in order to capture the evolution through automated techniques, it is essential to devise some measure to identify these important units of information.
Every Wikipedia article contains a number of internal links which point to other Wikipedia articles. These links may contain either a single word such as ‘Bible’ or a phrase such as ‘Second Samoan Civil War’. We posit that an article is created for a given word or phrase only if it is important. Therefore, in our analysis, we use the internal links of an article as a proxy for its important terms or factoids. Moreover, out of all the terms, those that stay till the end, i.e., remain in the final version of the article, are even more important. Keeping track of the time of introduction of these factoids and of the inter-dependency among them may provide insights into the evolution of the article. Further, the frequency with which factoids are introduced may also help in revealing which phase an article is currently in. For example, when an article is newly created, this frequency can be expected to be higher than in the later phases.
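As a rough illustration of using internal links as a proxy for factoids, the sketch below pulls link targets out of raw wikitext with a regular expression. It is a simplified assumption of how such extraction could be done, not the authors' tooling: it ignores templates, redirects and title normalisation, and the helper name `extract_factoids` and the excluded namespace prefixes are chosen for illustration.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]],
# capturing only the link target (the linked article title).
_LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Assumption: links into these namespaces are not treated as factoids.
_NON_ARTICLE_PREFIXES = ("file:", "image:", "category:")


def extract_factoids(wikitext):
    """Return the set of internal-link targets found in a piece of wikitext."""
    factoids = set()
    for match in _LINK.finditer(wikitext):
        target = match.group(1).strip()
        if target and not target.lower().startswith(_NON_ARTICLE_PREFIXES):
            factoids.add(target)
    return factoids
```

For example, a sentence containing `[[Bible]]` and `[[Second Samoan Civil War|the war]]` yields the two targets ‘Bible’ and ‘Second Samoan Civil War’, regardless of the display label.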
We were interested to observe how the articles in the data set evolved to reach their final state. For that, we gathered all the factoids present in the latest version of the articles. We then recorded the revisions where these factoids were first introduced. Figure 2 shows the number of factoids introduced with respect to their revision numbers for two of the articles: ‘Abraham Lincoln’ and ‘Jesus’. These articles were arbitrarily chosen out of the data set and a similar pattern was observed for the other articles in the dataset as well. The revisions on the X-axis are in the order of their timestamps. As expected, a large number of factoids are added in the first few revisions, whereas the frequency of addition of new factoids reduces in the later revisions.
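The bookkeeping described above, recording the revision in which each factoid of the final version first appeared, can be sketched as follows. The input representation (one set of factoids per revision, ordered by timestamp) and the name `first_introduction` are assumptions made for this illustration.

```python
def first_introduction(revisions, final_factoids):
    """Map each factoid of the final version to the index of the first
    revision that contains it.

    `revisions` is a list of sets of factoids, ordered by timestamp.
    """
    first_seen = {}
    for index, factoids in enumerate(revisions):
        for factoid in factoids:
            # Only factoids surviving in the final version are tracked,
            # and only their earliest occurrence is kept.
            if factoid in final_factoids and factoid not in first_seen:
                first_seen[factoid] = index
    return first_seen
```

Counting how many factoids map to each revision index gives the kind of per-revision curve plotted in Figure 2.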
Further, in order to determine an aggregate behavior for all the articles in the data set, we divided the lifespan of each article into four quadrants. We then computed the fraction of factoids that were introduced in each quadrant. Figure 3 shows the overall percentage of factoids introduced across all the articles. Intuitively, when an article is in its inception, there is more scope for the inclusion of new pieces of information; hence, the first quadrant has the maximum fraction of factoids. Two of the remaining quadrants were found to have comparatively fewer factoids, whereas the third showed an increase relative to those two. We feel that around this period, the articles were competing for the status of a ‘Featured/Good Article’, leading to a relative increase in the addition of new pieces of information.
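The quadrant-wise aggregation can be sketched as below. The text does not state whether the lifespan is split by revision count or by elapsed time; this sketch assumes equal-sized ranges of revision indices, and the name `quadrant_fractions` is hypothetical.

```python
def quadrant_fractions(first_seen, total_revisions):
    """Fraction of factoids first introduced in each quarter of an article's
    revision history (assumption: quarters are equal ranges of revision index).

    `first_seen` maps factoid -> index of the revision that introduced it.
    """
    counts = [0, 0, 0, 0]
    for index in first_seen.values():
        quadrant = min(4 * index // total_revisions, 3)  # clamp the last index
        counts[quadrant] += 1
    total = sum(counts)
    if total == 0:
        return [0.0, 0.0, 0.0, 0.0]
    return [count / total for count in counts]
```

Averaging these four fractions over all articles would give an aggregate view comparable to Figure 3.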
V Measuring Triggering Among Factoids
Triggering is a cognitive and sometimes an individual-specific phenomenon. Automatically perceiving the presence of triggering among the factoids, as well as discerning what may have instigated a user to add their own content, is a challenging task. The non-trivial nature of this analysis is why the existing literature offers only theoretical evidence of the triggering phenomenon.
We propose the use of Normalized Google Distance (NGD) to measure triggering among factoids and show that it may help in automatically measuring triggering in a collaborative environment to a reasonably good extent. There are a few other association measures, such as network distance or semantic distance techniques such as word2vec; however, these methods may not serve our purpose. Network distance may not help because, in our analysis, factoids represent Wikipedia articles, and the distance between any two articles on the Wikipedia network takes values in a very small range only. This is because the Wikipedia network exhibits the small-world phenomenon, which leads to a very small average distance between any two nodes on the network.
NGD is a ‘crowdsourced’ way of computing the semantic similarity between two words or phrases. It is based on the number of hits returned by a Google search for these phrases. It exploits the idea that phrases which are semantically similar will be found together on more web pages than phrases that are not closely related. NGD is given by:
$$\mathrm{NGD}(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}$$
where $x$ and $y$ are the phrases between which the semantic distance has to be computed. Here, $f(x)$, $f(y)$ and $f(x, y)$ are the number of hits returned by the Google search on the phrases $x$, $y$, and $x$ and $y$ together, respectively. Further, $N$ is the total number of web pages examined by the Google query, multiplied by the average number of words on any web page. An estimate of the total number of web pages is found by searching for a word such as ‘the’, which is found on almost every page; at the time of the study, this came out to be 25,270,000,000. Further, for our analysis, we took the average number of words on any web page to be 1,000. Although the value of NGD between two phrases $x$ and $y$ can vary from $0$ to $\infty$, if it is greater than 1, $x$ and $y$ are considered to be reasonably dissimilar. A value of $0$ for the metric indicates that the phrases are very closely related and always occur together. Next, we explain how we used NGD to analyze triggering in Wikipedia articles.
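The NGD computation described above can be sketched as a small function, following the definition by Cilibrasi and Vitányi. How the hit counts are obtained is left out; the function name `ngd`, its parameter names, and the convention of returning infinity when any hit count is zero are assumptions made for this illustration.

```python
import math

# N as described in the text: estimated number of indexed pages multiplied by
# the assumed average number of words per page.
N = 25_270_000_000 * 1_000


def ngd(hits_x, hits_y, hits_xy, total=N):
    """Normalized Google Distance from raw hit counts.

    hits_x, hits_y: hits for each phrase alone; hits_xy: hits for both together.
    """
    if min(hits_x, hits_y, hits_xy) <= 0:
        return math.inf  # assumption: phrases that never co-occur are maximally distant
    log_x, log_y, log_xy = math.log(hits_x), math.log(hits_y), math.log(hits_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(total) - min(log_x, log_y))
```

Two phrases that always appear together (all three hit counts equal) give a distance of 0, while a small joint count relative to the individual counts pushes the value up.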
For each Wikipedia article, we first created a list of all the internal links, i.e., factoids, that were present in the final version of the article. It should be noted that in this analysis, we considered only the factoids present in the final version; for a more comprehensive analysis, the factoids introduced in all the revisions, including those that became extinct and did not make it to the final version, may also be considered. Subsequently, for every revision, we prepared a list of factoids that were added in that revision. For each revision, the user id of the user who made the revision was also recorded. The next step was to prepare a list of lists, which we named RFFR, each member of which was of the following form:
$[r_i, F_i, F_j, r_j]$
where $F_i$ and $F_j$ were the sets of factoids added in $r_i$ and $r_j$ respectively. Further, $r_j$ was among the subsequent revisions after $r_i$ such that the user of $r_j$ was not the same as the user of $r_i$. Essentially, for each revision $r_i$, we checked the subsequent few
We computed RFFR and RFFR_cross for all the articles in the data set. The maximum number of rows in RFFR was 139, for the ‘New York City’ article, whereas the maximum number of rows in RFFR_cross was 596, for the ‘Lionel Messi’ article. The average numbers of rows in RFFR and RFFR_cross were found to be 39.3 and 146.92 respectively. We then computed the NGD values for each factoid-pair from RFFR_cross and manually observed the association between the factoids against their NGD values. It was interesting to find a good association between the factoids where the NGD values were less than 0.5, whereas there was little or no association between the factoids with high values of NGD. To illustrate how the existing factoids increase the likelihood of the inclusion of related factoids in the subsequent revisions, we discuss here in detail the results obtained for two of the articles: ‘India’ and ‘New York City’. The article ‘India’ was chosen due to the domain knowledge of the authors, and ‘New York City’ was chosen because it had the maximum number of entries in RFFR.
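The construction of RFFR-style rows can be sketched as follows. This is one possible reading, not the authors' code: it assumes each row has the form [r_i, F_i, F_j, r_j], that the window consists of the next five revisions made by users other than the author of r_i (as the footnote on the revision window suggests), and that revisions adding no factoids produce no row. The names `build_rffr` and `factoid_pairs`, and the dictionary keys `id`, `user` and `added`, are hypothetical; `factoid_pairs` simply expands a row into individual factoid pairs for scoring.

```python
from itertools import product


def build_rffr(revisions, window=5):
    """Build rows [r_i, F_i, F_j, r_j] from timestamp-ordered revisions.

    Each revision is a dict with keys 'id', 'user' and 'added' (set of factoids).
    """
    rows = []
    for i, rev_i in enumerate(revisions):
        if not rev_i["added"]:
            continue  # nothing in r_i that could have triggered later additions
        other_user_revisions = 0
        for rev_j in revisions[i + 1:]:
            if rev_j["user"] == rev_i["user"]:
                continue  # same editor: skip and look further ahead
            other_user_revisions += 1
            if rev_j["added"]:
                rows.append([rev_i["id"], rev_i["added"], rev_j["added"], rev_j["id"]])
            if other_user_revisions == window:
                break
    return rows


def factoid_pairs(rows):
    """Expand each row into (r_i, f_i, r_j, f_j) tuples, one per factoid pair."""
    return [
        (r_i, f_i, r_j, f_j)
        for r_i, F_i, F_j, r_j in rows
        for f_i, f_j in product(sorted(F_i), sorted(F_j))
    ]
```

Each resulting pair can then be scored with NGD, so that only pairs contributed by different users in nearby revisions are ever compared.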
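Table I: Factoid-pairs from the top-most rows of the NGD-sorted list for the ‘India’ article
|Revision $r_i$||Factoid added in $r_i$ ($f_i$)||Revision $r_j$||Factoid added in $r_j$ ($f_j$)||NGD($f_i$, $f_j$)|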
|17030||NKP Salve Challenger Trophy||17031||2003 Afro-Asian Games||0.098|
|392||Indian Coast Guard||398||Sino-Indian War||0.099|
|19235||Street cricket||19237||1936 Summer Olympics||0.122|
|17031||2003 Afro-Asian Games||17034||Hockey India||0.154|
|52||Indian subcontinent||57||Arabian Sea||0.156|
|392||Indian Army||398||Sino-Indian War||0.166|
|572||Manmohan Singh||577||Atal Bihari Vajpayee||0.168|
|145||Mahatma Gandhi||148||Mohandas Karamchand Gandhi||0.216|
|392||Indian Air Force||398||Sino-Indian War||0.235|
|52||Indian subcontinent||57||Indian Ocean||0.322|
|244||Kolkata||246||East India Company||0.359|
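Table II: Factoid-pairs from the bottom-most rows of the NGD-sorted list for the ‘India’ article
|Revision $r_i$||Factoid added in $r_i$ ($f_i$)||Revision $r_j$||Factoid added in $r_j$ ($f_j$)||NGD($f_i$, $f_j$)|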
|19235||Bagepalli||19237||1936 Summer Olympics||0.568|
|30||United Kingdom||32||Satyameva Jayate||0.568|
|13411||Current Science||13420||Cambridge University Press||0.572|
|18643||Telecom Regulatory Authority of India||18667||Kedarnath Temple||0.710|
|19235||Juara||19237||1936 Summer Olympics||0.974|
|19231||UNESCO World Heritage List||19235||Juara||1.261|
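Table III: Factoid-pairs from the top-most rows of the NGD-sorted list for the ‘New York City’ article
|Revision $r_i$||Factoid added in $r_i$ ($f_i$)||Revision $r_j$||Factoid added in $r_j$ ($f_j$)||NGD($f_i$, $f_j$)|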
|14999||Battle of Long Island||15021||New-York Historical Society||0.061|
|1187||Wagner College||1189||Manhattan College||0.086|
|77||Long Island Rail Road||79||LaGuardia Airport||0.093|
|7||Staten Island||11||Central Park||0.094|
|77||Roosevelt Island||79||LaGuardia Airport||0.101|
|355||Metropolitan Opera||357||New York City Public Schools||0.101|
|19847||New York City Pride March||19855||Riverside Church||0.104|
|15005||Conference House Park||15021||New-York Historical Society||0.113|
|77||Port Authority of New York and New Jersey||79||JFK International Airport||0.125|
|18||New York University||19||New York Botanical Gardens||0.128|
|2037||Throgs Neck Bridge||2043||Triborough Bridge||0.131|
|77||Port Authority Bus Terminal||79||LaGuardia Airport||0.136|
|1097||City park||1103||Battery Park City||0.148|
|57||Immigration||65||United States Census Bureau||0.150|
|77||People mover||79||LaGuardia Airport||0.152|
|495||News Corporation||497||Television production||0.154|
|71||Bronx Zoo||77||Long Island Rail Road||0.155|
|7||George Washington||13||Columbia University||0.158|
|355||World War II||357||City University of New York||0.167|
|11||Central Park||18||Washington Square Park||0.171|
|7||financial center||12||New York Stock Exchange||0.179|
|10578||General American||10581||Italian American||0.181|
|77||Long Island Rail Road||79||JFK International Airport||0.181|
|77||Port Authority of New York and New Jersey||79||LaGuardia Airport||0.188|
|1186||Fordham University||1187||Wagner College||0.203|
|14999||Lord Howe||15021||New-York Historical Society||0.208|
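Table IV: Factoid-pairs from the bottom-most rows of the NGD-sorted list for the ‘New York City’ article
|Revision $r_i$||Factoid added in $r_i$ ($f_i$)||Revision $r_j$||Factoid added in $r_j$ ($f_j$)||NGD($f_i$, $f_j$)|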
|691||Metro-North Railroad||697||United Kingdom||0.582|
|7||port||13||New York Yankees||0.601|
|355||Brazil||357||Fashion Institute of Technology||0.634|
The total numbers of factoid-pairs for the ‘India’ and ‘New York City’ articles were found to be 305 and 533 respectively. The NGD values for all these pairs were computed, and the pairs were sorted by their NGD values. The minimum and maximum NGD values observed for ‘India’ were 0.042 and 1.26 respectively, whereas for ‘New York City’, these were 0.044 and 0.63 respectively. We observed the association between the factoids against their NGD values. For the pairs with low NGD values, i.e., the top entries of the sorted list, a strong association was found between the factoids. However, as we went down the list, the association between the factoids weakened. In fact, the factoids in the bottom-most rows were found to have a very high conceptual distance. Tables I and II show a few entries from the top-most rows and the bottom-most rows respectively of the sorted list for the ‘India’ article. Tables III and IV show the corresponding values for the ‘New York City’ article. It can be seen that the inclusion of the terms ‘Sachin Tendulkar’ and ‘Cricket’ led to the introduction of the terms ‘Kabaddi’, ‘Chess’ and ‘Gilli danda’, which are other games played in India, in the very next revision. The inclusion of ‘Islam’ led to the inclusion of the terms ‘Christianity’, ‘Sikhism’ and ‘Jainism’ in the next few revisions. In the ‘New York City’ article, ‘Wagner College’ led to the inclusion of another competitive college of a similar rank, i.e., ‘Manhattan College’. Similarly, ‘Metropolitan Opera’, which is engaged in deepening student experiences with opera in the schools of New York City, triggered ‘New York City public schools’. It should be noted that in our analysis, we considered only those factoid-pairs where the first and second factoids were added by different users.
If we observe the entries in Tables II and IV, we find that NGD values higher than 0.5 belong to factoid-pairs with little apparent association. Therefore, it may be a good idea to set the NGD threshold to 0.5 in this case and remove the remaining rows from the obtained factoid-pairs. For the ‘India’ and ‘New York City’ articles, 52 and 39 pairs respectively were found to have NGD values greater than 0.5. We believe that the choice of a threshold for NGD depends on the context. If we wish to find only pairs with a very high association, this threshold may be tuned to a lower value accordingly.
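The thresholding step can be sketched as a simple filter over scored pairs. The tuple layout `(f_i, f_j, ngd_value)` and the name `filter_by_ngd` are assumptions for illustration; the sample values are taken from Tables III and IV.

```python
def filter_by_ngd(pairs, threshold=0.5):
    """Keep pairs whose NGD does not exceed `threshold`, sorted by increasing NGD.

    Each pair is a tuple (f_i, f_j, ngd_value).
    """
    kept = [pair for pair in pairs if pair[2] <= threshold]
    return sorted(kept, key=lambda pair: pair[2])


sample_pairs = [
    ("Metro-North Railroad", "United Kingdom", 0.582),
    ("Staten Island", "Central Park", 0.094),
    ("Wagner College", "Manhattan College", 0.086),
]
strong_pairs = filter_by_ngd(sample_pairs)
```

Lowering `threshold` keeps only the most strongly associated pairs, which corresponds to the context-dependent tuning mentioned above.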
We also created the triggering networks for these articles (see Figure 5), similar to the one shown in Figure 1, to get an overall picture of the triggered terms. Here, the nodes represent the factoids, and there is an edge between two factoids if they were added in close-by revisions and one of them had likely led to the inclusion of the other. The size of a node depicts the number of other factoids that a given factoid is connected to. The strength of the association between the factoids is represented by the darkness of the color of the edges, which was computed based on their NGD values. In other words, this strength represents the probability of a factoid getting added to the content of the article when its connected factoid is already present in the article. This probability is inversely proportional to the value of NGD between the two factoids $f_i$ and $f_j$, i.e.,
$$P(f_j \mid f_i) \propto \frac{1}{\mathrm{NGD}(f_i, f_j)}$$
Therefore, the computation of NGD values among all the related terms of a given knowledge artifact can help us understand its underlying network and hence its evolution.
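A minimal sketch of assembling such a triggering network from scored factoid-pairs is given below. It uses plain dictionaries rather than a graph library; the name `build_triggering_network`, the tuple layout `(f_i, f_j, ngd_value)`, the 0.5 cut-off default and the small `eps` guard against division by zero are assumptions made for illustration. Edge strength is taken as the reciprocal of NGD, which is one simple choice consistent with the inverse proportionality described above.

```python
from collections import defaultdict


def build_triggering_network(pairs, threshold=0.5, eps=1e-6):
    """Return (edges, degree) for a triggering network.

    edges:  {(f_i, f_j): strength}, with strength = 1 / NGD
    degree: {factoid: number of distinct neighbours}, usable as node size
    """
    edges = {}
    for f_i, f_j, distance in pairs:
        if distance > threshold:
            continue  # weakly associated pairs are left out of the network
        edges[(f_i, f_j)] = 1.0 / max(distance, eps)

    neighbours = defaultdict(set)
    for f_i, f_j in edges:
        neighbours[f_i].add(f_j)
        neighbours[f_j].add(f_i)
    degree = {node: len(adjacent) for node, adjacent in neighbours.items()}
    return edges, degree
```

With the ‘India’ pairs from Tables I and II, for instance, ‘Sino-Indian War’ ends up connected to the three armed-forces factoids added a few revisions earlier, while a pair such as ‘United Kingdom’ and ‘Satyameva Jayate’ (NGD 0.568) is dropped.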
VII Discussion and Future Work
This paper is a first step towards instigating work in a domain that has remained dormant despite its importance, given the extensive usage of crowdsourced portals for building knowledge these days. Triggering is the basis of these portals, and an understanding of this phenomenon may help in building interfaces that facilitate optimal triggering. It was interesting to see how the inclusion of a few terms in the articles led to the insertion of more terms in the subsequent revisions. The analysis shows that the introduction of a few key terms acts as a milestone in the evolution of the article. When a factoid is added, more knowledge related to that factoid is likely to be added. However, different users get triggered differently, leading to the inclusion of diversified knowledge into the articles.
To the best of our knowledge, there has not been any automated way to capture triggering among knowledge units in a collaborative setting. Given the difficulty of finding what may have triggered a human mind to add a particular piece of knowledge to an article, it is challenging to devise a foolproof method. Therefore, the current analysis has some limitations as well. For instance, there may be some false positives: despite a small value of NGD, it cannot be guaranteed that one factoid was triggered by the other. For example, the NGD between the factoids ‘American English’ and ‘Italian American’ was 0.079; however, in this case, we are not sure that the former necessarily led to the inclusion of the latter. At the same time, given the cognitive nature of the triggering phenomenon, there may sometimes be an indirect and not-so-obvious connection between the factoids for a particular user, which is difficult to capture objectively through any automated means. Nevertheless, a clear take-away from the proposed analysis is that NGD provides a probability value, where a small NGD value indicates a high probability of one factoid leading to the inclusion of another. In other words, the proposed method does a good job of automating the process of obtaining the underlying network for an artifact of knowledge that gets built incrementally. This information may provide deeper insights into the dynamics of the creation of a knowledge artifact.
There may be various extensions to this work. As an example, the current analysis assumes that triggering happens due to the terms added in the immediately preceding revisions only; hence, it checks for the triggered terms in the subsequent few revisions. The method may be extended to include all the successive revisions. The decision regarding how many subsequent revisions should be checked also depends on the portal’s interface. For example, a portal that sends notifications to its users about any changes made to the content will most likely have triggered terms being added in close-by revisions, as compared to another portal which does not have this facility. Also, the current analysis has been performed only on the terms that remain in the final version. An extended analysis may further be performed on the terms that were present in the article at some point of time but later became extinct. The underlying network may provide additional insights into the reasons for their extinction. Further, apart from the important terms, the same analysis may be performed for all the nouns or all the non-stop-words as well. The properties of the underlying triggering network may also be studied to get deeper insights into article creation. For instance, this network may provide a clue about demarcating the independent and triggered terms in an article. Additionally, the timestamps of the terms’ introduction may provide a directed acyclic graph which may answer many questions regarding the evolution of the articles. The proposed method may also be used in other collaborative settings such as Q&A portals.
In this work, we first suggest a proxy for capturing the important pieces of information in a Wikipedia article. We then show, through the analysis performed on some of the most-edited articles of Wikipedia, that the semantic distance between important terms of a knowledge artifact may help in the automatic detection of triggering. We propose the use of Normalized Google Distance as one of the potential measures for computing semantic distance. The analysis may help in understanding the evolution of a piece of knowledge that goes through multiple refinement steps. It may pave the way for examining the dynamics of knowledge building on collaborative portals, which has so far remained in the theoretical realm only. This will in turn help in building better crowdsourced portals.
- A preliminary version of this paper is available in the proceedings of OpenSym 2018.
- An autopoietic system is one in which “subsequent operations build on the results of the preceding operations” .
- A factoid may refer to a standalone piece of information about the topic of the article.
- Collected in November 2017.
- It has been observed that the Wikipedia network is a classic example of a small-world network, so densely hyperlinked that, on average, it takes only 4.5 clicks to go from one article to another.
- In the current analysis, we considered the subsequent five revisions such that the users of these revisions were different. In cases where the same user made the next revision, we considered further revisions so that the revisions of at least the next five users were analyzed.
[1] W. F. Ogburn and D. Thomas, “Are inventions inevitable? A note on social evolution,” Political Science Quarterly, vol. 37, no. 1, pp. 83–98, 1922.
[2] T. Anderson and H. Kanuka, “Online social interchange, discord, and knowledge construction,” 1998.
[3] J. Piaget, The Construction of Reality in the Child. Routledge, 2013.
[4] P. A. Cooper, “Paradigm shifts in designed instruction: From behaviorism to cognitivism to constructivism,” Educational Technology, vol. 33, no. 5, pp. 12–19, 1993.
[5] P. A. Ertmer and T. J. Newby, “Behaviorism, cognitivism, constructivism: Comparing critical features from an instructional design perspective,” Performance Improvement Quarterly, vol. 6, no. 4, pp. 50–72, 1993.
[6] L. S. Vygotsky, Mind in Society: The Development of Higher Psychological Processes. Cambridge, MA: Harvard University Press, 1978.
[7] N. Luhmann, Social Systems. Stanford University Press, 1995.
[8] J. Piaget, Piaget’s Theory. Springer, 1976.
[9] M. Minsky, “Frame-system theory,” Thinking: Readings in Cognitive Science, pp. 355–376, 1977.
[10] K. Fisher and J. I. Lipson, “Information processing interpretation of errors in college science learning,” Instructional Science, vol. 14, no. 1, pp. 49–74, 1985.
[11] D. A. Norman, “Categorization of action slips,” Psychological Review, vol. 88, no. 1, p. 1, 1981.
[12] U. Cress and J. Kimmerle, “A systemic and cognitive view on collaborative knowledge building with wikis,” International Journal of Computer-Supported Collaborative Learning, vol. 3, no. 2, pp. 105–122, Jan. 2008. [Online]. Available: http://link.springer.com/10.1007/s11412-007-9035-z
[13] S. Nunes, C. Ribeiro, and G. David, “Wikichanges: exposing Wikipedia revision activity,” in Proceedings of the 4th International Symposium on Wikis. ACM, 2008, p. 25.
[14] H. P. de Vladar, M. Santos, and E. Szathmáry, “Grand views of evolution,” Trends in Ecology & Evolution, vol. 32, no. 5, pp. 324–334, 2017.
[15] A. Chhabra and S. Iyengar, “How does knowledge come by?” arXiv preprint arXiv:1705.06946, 2017.
[16] G. Dosi, Y. Ermoliev, and Y. Kaniovski, “Generalized urn schemes and technological dynamics,” Journal of Mathematical Economics, vol. 23, no. 1, pp. 1–19, 1994.
[17] H. Mahmoud, “Pólya urn models. Texts in statistical science,” 2008.
[18] L. Marengo and P. Zeppini, “The arrival of the new,” Journal of Evolutionary Economics, vol. 26, no. 1, pp. 171–194, 2016.
[19] F. Tria, V. Loreto, V. D. P. Servedio, and S. H. Strogatz, “The dynamics of correlated novelties,” Scientific Reports, vol. 4, p. 5890, 2014.
[20] S. A. Kauffman, “Investigations: The nature of autonomous agents and the worlds they mutually create,” Santa Fe Institute, 1996.
[21] J. Tebbe, “Where good ideas come from: The natural history of innovation,” 2011.
[22] H. S. Heaps, Information Retrieval, Computational and Theoretical Aspects. Academic Press, 1978.
[23] V. Loreto, V. D. Servedio, S. H. Strogatz, and F. Tria, “Dynamics on expanding spaces: modeling the emergence of novelties,” in Creativity and Universality in Language. Springer, 2016, pp. 59–83.
[24] I. Iacopini, S. Milojević, and V. Latora, “Network dynamics of innovation processes,” Physical Review Letters, vol. 120, no. 4, p. 048301, 2018.
[25] M. S. Keane, S. W. Rolles et al., “Edge-reinforced random walks on finite graphs,” Verhandelingen KNAW, vol. 52, 2000.
[26] M. Andjelković, B. Tadić, M. M. Dankulov, M. Rajković, and R. Melnik, “Topology of innovation spaces in the knowledge networks emerging through questions-and-answers,” PLoS ONE, vol. 11, no. 5, p. e0154655, 2016.
[27] A. Chhabra, S. Iyengar, and J. S. Saini, “Skillset distribution for accelerated knowledge building in crowdsourced environments,” CoRR, 2015.
[28] A. Chhabra, S. Iyengar, P. Saini, and R. S. Bhat, “Presence of an ecosystem: a catalyst in the knowledge building process in crowdsourced annotation environments,” Proceedings of the 2015 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2015), 2015.
[29] A. Chhabra, S. Iyengar, P. Saini, R. S. Bhat, and V. Kumar, “Ecosystem: A characteristic of crowdsourced environments,” arXiv preprint arXiv:1502.06719, 2015.
[30] R. L. Cilibrasi and P. M. Vitanyi, “The Google similarity distance,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, 2007.
[31] D. J. Watts and S. H. Strogatz, “Collective dynamics of small-world networks,” Nature, vol. 393, no. 6684, p. 440, 1998.
[32] S. Dolan, “Six Degrees of Wikipedia,” http://mu.netsoc.ie/wiki/, 2008, [Online; accessed 17-May-2018].