Uncovering Relations for Marketing Knowledge Representation


Online behaviors of consumers and marketers generate massive marketing data, which ever more sophisticated models attempt to turn into insights that aid marketers' decisions. Yet, in making decisions, human managers bring to bear marketing knowledge that resides outside of data and models. This motivates the creation of an automated marketing knowledge base that can interact with data and models. Currently, marketing knowledge is dispersed across large corpora, and no definitive knowledge base for marketing exists. Of the two broad aspects of marketing knowledge, representation and reasoning, this work focuses on the former. Specifically, we focus on creating a marketing knowledge graph from corpora, which requires identification of entities and relations. The relation identification task is particularly challenging in marketing because of the non-factoid nature of much marketing knowledge and the difficulty of forming rules that govern relations. We define a set of relations to capture marketing knowledge, propose a pipeline for creating the knowledge graph from text, and propose a rule-guided semi-supervised relation prediction algorithm to extract relations between marketing entities from sentences.

1 Introduction

Effective decision making to choose marketing actions is much more than utilization of the data, reporting tools and models offered by today’s advanced analytics capabilities. Decision making is part art, part science (see footnote 1). While the “science” of marketing decision making captures research imagination and offers great advances, the “art” of marketing decision making lags behind. The art includes the knowledge that humans overlay on structured information from data, tools, and models to make decisions and choices. Part of this knowledge resides inside humans, and part lies in corpora of textbooks, business articles, experts’ writings, research papers, and case studies. Focusing on the latter, our objective is to give this knowledge shape in the form of a Marketing Knowledge Representation (MKR) so that it can interact with structured information from data, tools, and models, and eventually advance decision making. The paper exposes the specific challenges of creating an MKR, relative to other forms of KR, and then addresses some of them.

For concreteness, consider segmentation, a regularly occurring, fundamental task in marketing decision making. Data from transaction and clickstream capture consumer behavior and are added to data on marketing actions and consumer demographics. Models attempt to estimate the differential effects of marketing actions across consumers on business-desired outcomes and map demographics to those effects to divide the consumers into different segments. When presented with these results, a seasoned human marketer’s knowledge suggests that for effective segmentation she needs to look beyond demographics, into other characteristics, say, psychographics. In this paper, we demonstrate an approach to encoding such knowledge into an MKR. Once an MKR is built, the next step involves building a reasoning engine on the MKR to move toward automated decisions. Staying within segmentation, a reasoning engine can explain whether psychographics segmentation is the way to go, given findings from data and models. In this paper we focus on knowledge representation, but not on reasoning.

Our representation takes the form of a knowledge graph (KG), where the graph “mainly describes real world entities and their interrelations” [17]. Our objective is to organize marketing information, non-factoid concepts and results from the marketing academic literature in a marketing domain-specific KG (MKG). A KG consists of nodes and edges, where nodes are subjects and objects, and edges are relations. The problem of extracting triples, defined as (subject, relation, object), from marketing corpora is challenging for multiple reasons: (1) much of marketing knowledge is non-factoid; (2) entities do not have a taxonomy; (3) typical corpora are not tightly worded, leading to non-informative content; (4) entities are longer sequences of words; (5) relations are marketing domain specific and cannot necessarily be drawn from existing sources of relations such as ConceptNet; (6) supervised approaches for relation prediction cannot be used due to severe labeling limitations. Specifically, the current effort addresses the challenge of predicting relations using a semi-supervised approach based on a relatively small set of labeled relations. The experiments in this work are based on our efforts to create an MKG from the chapter on Segmentation in a marketing textbook.

Our main contributions in relation prediction are: (1) demonstrating an approach in creating a Marketing Knowledge Graph, with (2) semi-supervised Relation prediction, using (3) Rule-regularization, given relatively few labeled relations.

2 Related Literature

Our work is closely related to efforts in commonsense knowledge representation, automatic knowledge base construction, relation extraction from text and knowledge integration in deep neural networks.

Commonsense Knowledge Representation: Proposing task-independent knowledge representation for any domain has been a central challenge for the KR&R community. Many commonsense KGs that capture ontological, causal, and other types of common-sense relations between general-domain concepts have been fairly popular such as ConceptNet [21], Cyc [11] and WordNet [13]. Domain-specific knowledge bases such as AURA-KB [1] built on top of the Knowledge-machine ontology for encoding knowledge in biology books have seen some adoption. For marketing domain, the semantics of general-world relations and concepts become ambiguous. Also, our search did not produce any KG specifically for marketing. Our experiments show that existing KGs do a poor job of representing knowledge in Marketing domain, due to the nuanced and non-factoid nature of knowledge in this domain, and the emphasis of these KGs on representing general world knowledge. This makes our effort necessary.

Automatic Knowledge Base Construction: KGs have traditionally been constructed using curated (WordNet), semi-curated (ConceptNet [21]), or fully automated (YAGO, NELL) approaches. Curated approaches prove very costly for the marketing domain. Hence, automated knowledge base construction or completion cannot be avoided. KB construction has made strides with the use of knowledge graph embedding [24]. A continuous vector representation in low dimensions allows capturing latent semantic relations and applying vector algebra for inference about relations. In turn, this affords flexibility for tasks ranging from relation prediction, to entity resolution, to knowledge graph completion. One class of methods performs the embedding task by matching embeddings to facts available in the knowledge graph. Another class of approaches uses additional available information [24], including entity types, descriptions and logical rules. These embeddings consider either instances of real-world entities in the knowledge graph or ontological concepts of the knowledge graph, but not both. More recent work [6] advances representation learning by capturing knowledge jointly in real-world entities and in ontological concepts, as well as in the links that connect them. With its focus on relation prediction, our work follows in this tradition of using knowledge graph embeddings.

Relation Extraction: For the task of relation extraction from text, the relation between two concepts or entity mentions in a sentence is mapped to one of the classes in a predetermined closed set of relations. The relevant literature on methods can be grouped as: i) rule-based, ii) supervised and semi-supervised, iii) link prediction. Research in relation extraction has moved from applying hand-coded rules to extract relations [20], to using hand-engineered features and strong classifiers [8, 14] to classify relations between entities. However, given the brittleness of manually designed rules or features, and the availability of large amounts of data, the focus has shifted to end-to-end neural models such as convolutional neural networks [26], recursive neural networks [3], and long short-term memory networks [15]. Work in link prediction [16] has also inspired the use of information from available knowledge graphs for relation prediction tasks [25]. One obstacle in employing successful supervised classifiers is the dearth of large human-annotated data sets of labels. Hence, semi-supervised approaches are receiving attention. Some works model this problem as a multi-instance learning problem [19], and improve the overall accuracy through distant supervision and active learning [22]. Under distant supervision, the problem of predicting relations from noisy annotations is tackled by [4] using reinforcement learning. A recent paper [12] takes an important step forward by jointly optimizing the dual tasks of retrieving sentences given a relation and predicting a relation in a given sentence (hereafter, DualRE). Rather than relying on self-selection, the prediction and retrieval modules both annotate unlabeled sentences and provide data to each other, thus potentially curbing the limited supervision issue.

Annotations of relations for sentences in marketing corpora are generally not available. Marketing expertise is needed to annotate relations and obtain high-quality labels, so relatively few labels can be annotated, and only at significant cost in time and money. Given the unusually small number of labels, we look towards encoding knowledge using rules that govern the relations, and take inspiration from the knowledge integration work in deep neural networks [7, 5, 23]. However, to the best of our knowledge, these works do not integrate weighted First Order Logic rules in a semi-supervised scenario. Given our goal of relation prediction in a marketing corpus, and faced with a small set of labeled relations and a large unlabeled corpus, we improve upon the DualRE approach by integrating knowledge from weighted logical rules.

3 Background: Markov Logic Network

Markov Logic Network (MLN) [18] is a popular probabilistic logical framework that uses weighted First Order Logic (FOL) formulas to encode an undirected, grounded probabilistic graphical model (i.e., a Markov Network). The rules in an MLN are weighted, which relaxes the strict constraints of hard rules (rules that are always satisfied) so that the real world can be modeled more efficiently. The framework retains the flexibility of modeling hard FOL rules by adding hard constraints as well. Formally, an MLN $L$ is a set of pairs $(F_i, w_i)$, where $F_i$ is a first-order formula and $w_i$ is either a real number or a symbol denoting a hard weight. Together with a finite set of constants $C$, it defines a Markov Network $M_{L,C}$ containing: i) one binary node for each grounding of each predicate appearing in $L$; and ii) one feature for each grounding of each formula $F_i$ in $L$. The value of a feature is 1 if the grounded formula is true, and 0 otherwise. The probability distribution over possible worlds $x$ specified by the ground Markov Network is given by:

$$P(X = x) = \frac{1}{Z} \exp\left(\sum_{i=1}^{F} w_i \, n_i(x)\right)$$

where $F$ is the number of formulas, $n_i(x)$ is the number of true groundings of the formula $F_i$ in the world $x$, and $Z$ is the normalizing constant. MLN inference is equivalent to finding the most probable world according to the above probability formulation. Weight learning is done by maximizing the pseudo-likelihood.
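As an illustration of how an MLN assigns probabilities to possible worlds, each world being weighted by the exponentiated sum of rule weights times counts of true groundings, the following minimal sketch enumerates all worlds for a toy setting. The two ground atoms, the single rule (causes implies enables) and the weight 1.5 are hypothetical choices made for the example, not values from this work.

```python
import itertools
import math

# Hypothetical toy setup: two ground atoms and one weighted rule.
ATOMS = ["causes(A,B)", "enables(A,B)"]


def n_true_groundings(world):
    """Return [n_1(x)]: true groundings of causes(A,B) => enables(A,B)."""
    implied = (not world["causes(A,B)"]) or world["enables(A,B)"]
    return [1 if implied else 0]


def mln_distribution(weights):
    """Return a list of (world, probability) over all possible worlds."""
    worlds = [
        dict(zip(ATOMS, values))
        for values in itertools.product([False, True], repeat=len(ATOMS))
    ]
    # Unnormalized score of a world: exp(sum_i w_i * n_i(x)).
    scores = [
        math.exp(sum(w * n for w, n in zip(weights, n_true_groundings(x))))
        for x in worlds
    ]
    z = sum(scores)  # normalizing constant Z
    return [(x, s / z) for x, s in zip(worlds, scores)]


dist = mln_distribution([1.5])  # 1.5 is an assumed rule weight
for world, prob in dist:
    print(world, round(prob, 3))
```

The only world that violates the rule (causes true, enables false) remains possible but receives a lower probability than the others, which is the sense in which a weighted rule acts as a soft constraint.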

4 Marketing Knowledge Representation

| Sentence | Triplets |
|---|---|
| If preferences are relatively homogenous within a segment, the positions of competing brands will be relatively similar, and the quantity of advertising and promotion will be critical competitive weapons. | homogeneous preferences; LeadsTo(competing brands, …; HasProperty(competitive weapons, quantity of advertising and promotion) |
| Segments often overlap, making it difficult to position products in different segments independently. | ObstructedBy(position product, segments often overlap); MotivatedByGoal(position products, different segments) |
| We must balance the costs of positioning with price and share changes to identify the strategy that will achieve maximum long-run profitability. | DependsOn(costs of positioning, share changes); DependsOn(costs of positioning, maximum long-run profitability) |

Table 1: Triplets from illustrative sentences

We first note the idiosyncrasies of marketing corpora to argue that (i) the semantics of marketing concepts do not map to the common notion of entities, and (ii) relations in marketing are not adequately captured in sources such as ConceptNet5. Marketing concepts are compound, and much of the information is not commonsense knowledge. Consider the following sentence on the vital topic of positioning: “If we are to make good positioning decisions, we need to know what dimensions do consumers use to evaluate competitive marketing programs.” For an MKR, the marketer-relevant information in this sentence is a set of triplets: HasPrerequisite(positioning decisions, know what dimensions), UsedBy(know what dimensions, consumers), UsedFor(know what dimensions, evaluate competitive marketing programs). A few points are notable: “know what dimensions” implicitly means “knowledge of product dimensions”; those “product dimensions” are UsedBy “consumers” and UsedFor “evaluate competitive marketing programs”; and “evaluate” is shorthand for “evaluation.” The last concept, “evaluate competitive marketing programs,” is an amalgam of three entities: “evaluate”, “competitive”, and “marketing programs”. Splitting it into three entities explodes the set of nodes without adding to the generality of the representation. Moreover, entities do not form an ontology that can be exploited. Coming to relations in the above sentence, we use HasPrerequisite and UsedFor from ConceptNet5, and add a new relation, UsedBy, as needed for marketing corpora. See Table 4 for some of the new and ConceptNet5 relations used. Yet another sentence reads, “Product positioning takes place within a target market segment and tells us how we can compete most effectively in that market segment.” In essence, the sentence states that “product positioning” is important to understand the “target market segment,” and guides competition in the market segment. A KR shows UsedFor(product positioning, target market segment) and RelatedTo(target market segment, compete most effectively). RelatedTo is not a precise association; however, as a form of general knowledge, it captures the essence of the association.

Consider another compound sentence, “Segmentation analysis tells us how the market is defined and allows us to target one or more market opportunities.” A KR takes the form of UsedFor(segmentation analysis, how the market is defined) and UsedFor(segmentation analysis, market opportunities). Since segmentation analysis only makes sense within the context of a market, we can add clarity, without losing any generality, by prefixing “market.” This gives UsedFor(market segmentation analysis, how the market is defined) and UsedFor(market segmentation analysis, target market opportunities). Note that how the market is defined and target market opportunities are key aspects of segmentation performed by marketers, and important concepts to be represented in a KR, along with their relations to segmentation analysis. Additionally, “market segmentation analysis” is equivalent to “market segment analysis,” and a KR must recognize such similarities since both renditions appear in corpora. More examples of annotating triplets from sentences are found in Table 1. Sentences in marketing corpora are often written in an indirect style, making extraction with any existing parser prone to significant inaccuracies. The challenge is in devising a pipeline that can predict relations among these non-factoid, compound concepts and also recognize when different variations of a concept mean the same thing. Importantly, we want relatively few relations that capture more general rules governing association among different concepts. The complete set of relations, their semantics and the examples used for our experiments are shown in Table 4.

5 Marketing Knowledge Acquisition Pipeline

We adopt a pipeline-based approach, which has four stages: i) definition sentence extraction, ii) candidate triplets prediction, iii) relation extraction, and iv) merging. A book chapter can be divided into definitions of important marketing terms and the rest of the content. The pipeline is described with respect to our example of the topic of Segmentation.

1. Definition Sentences: For each definition of a marketing term such as “segmentation”, we process them sentence by sentence.

2. Candidate Triplets: For each sentence, we parse using the Stanford syntactic dependency parser [2] to get the syntactic parse tree and part-of-speech tags. We then use the parse tree (induced by syntactic dependency relations) and the part-of-speech tags to collect the set of all noun-phrases (NP), which do not include verb phrases or prepositional phrases. We treat each pair of NPs as a candidate for the next step. For example, “Product positioning takes place within a target market segment and tells us how we can compete most effectively in that market segment”, produces NPs “product positioning”, “target market segment” and “market segment”.

3. Relation Extraction: For relation extraction, we pre-train a relation classifier which takes two noun-phrases, the sentence and positional part-of-speech and named entity tags. To train this classifier, we first consult a marketing expert to annotate correct relations for a small set of NP-pairs for the sentences from the textbook. Table 1 shows a few examples. We use this small labeled data and a large set of unlabeled data to train a semi-supervised relation classifier. This is a significant benefit of our approach.

4. Merging: Using this classifier, we identify relations between all pairs of NPs from the previous step. The same classifier also indicates which NP-pairs are not related by any relation. This step has two substeps. 4(a) To concentrate on the important entity1-relation-entity2 triplets, in the first sentence we extract the list of NPs that are connected (via a path in the dependency graph) to the defined term, such as “segmentation”. This list becomes the set of important entities for the next sentence. We only concentrate on entities that are connected to the list of important entities. 4(b) For the rest of the corpus, the hierarchical assumption over sentences is withdrawn. We extract entity1-relation-entity2 triplets using a method similar to Steps 2 and 3. Given the graphs from 4(a) and 4(b), we merge them using overlapping entities to arrive at the MKG.

The complexity of recovering the interrelations between entities and mapping them to a chosen set of well-defined relations is pushed to the relation extraction phase (Stage 3), which we describe next.
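The candidate-generation and merging stages of the pipeline can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, candidate pairs are treated as unordered, and "overlapping entities" is approximated by exact string match, none of which is specified in the text above.

```python
from itertools import combinations


def candidate_pairs(noun_phrases):
    """Stage 2 (sketch): every unordered pair of distinct NPs is a candidate."""
    unique = list(dict.fromkeys(noun_phrases))  # drop duplicates, keep order
    return list(combinations(unique, 2))


def merge_graphs(*triplet_sets):
    """Stage 4 (sketch): union the triplets; graphs join on identical entities."""
    graph = {}
    for triplets in triplet_sets:
        for subj, rel, obj in triplets:
            graph.setdefault(subj, set()).add((rel, obj))
            graph.setdefault(obj, set())  # make sure the object is a node too
    return graph


nps = ["product positioning", "target market segment", "market segment"]
pairs = candidate_pairs(nps)

mkg = merge_graphs(
    [("product positioning", "UsedFor", "target market segment")],
    [("target market segment", "RelatedTo", "compete most effectively")],
)
```

Here the two single-triplet graphs are joined through the shared node "target market segment"; a fuller implementation would also need to reconcile variants such as "market segmentation analysis" and "market segment analysis".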

6 Marketing Relation Prediction

Figure 1: Rule-regularized Selection in a Semi-Supervised Relation Prediction Framework

Relation prediction is the task of predicting a set of structured triplets (subject, relation, object) from a sentence encoding marketing knowledge. Figure 1 shows the framework. The process is performed in two steps: i) candidate relation-mention extraction, i.e., extracting relation-mentions $x = (s, e_1, e_2)$ from the corpus, where $s$ is a sentence and $e_1$ and $e_2$ are marketing terms; ii) relation extraction, i.e., predicting a relation $y$ given a relation-mention $x$.

Relation-mention Extraction

We get candidate NPs for each sentence from the second stage of our pipeline. We heuristically eliminate NP-pairs that are connected via a path longer than a fixed threshold in the parse tree. This provides a set of unlabeled relation-mentions. We sample from this set and consult a marketing expert to provide correct labels for a small set of relation-mentions, which finally creates the set of labeled relation-mentions $L$ and the set of unlabeled relation-mentions $U$.
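The path-length heuristic can be sketched as a breadth-first search over the parse tree, viewed as an undirected graph. The edge list, the node labels and the threshold value of 3 below are hypothetical; the text does not state the threshold used.

```python
from collections import deque


def path_length(edges, start, goal):
    """Number of edges on the shortest path between two nodes, or None."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two nodes are not connected


def filter_pairs(edges, pairs, max_len):
    """Keep only NP-pairs whose connecting path has at most max_len edges."""
    kept = []
    for a, b in pairs:
        d = path_length(edges, a, b)
        if d is not None and d <= max_len:
            kept.append((a, b))
    return kept


# Hypothetical, simplified tree for the "product positioning" sentence.
edges = [
    ("takes", "product positioning"),
    ("takes", "target market segment"),
    ("takes", "tells"),
    ("tells", "compete"),
    ("compete", "market segment"),
]
pairs = [
    ("product positioning", "target market segment"),
    ("product positioning", "market segment"),
    ("target market segment", "market segment"),
]
kept = filter_pairs(edges, pairs, max_len=3)  # threshold is an assumption
```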

DualRE: Semi-Supervised Relation Extraction

Given a set of labeled relation-mentions ($L$) and a set of unlabeled relation-mentions ($U$), our goal is to learn a relation prediction model that represents the training data $L$ and captures the information from the unlabeled data $U$. We follow the framework proposed in [12]. It consists of a prediction module $P_\theta$ and a retrieval module $Q_\phi$, where $\theta$ and $\phi$ are the model parameters. The prediction module’s task is to predict the relation $y$ given the relation-mention $x$. It models the conditional probability $p_\theta(y \mid x)$ for a mention-label pair $(x, y)$. The retrieval module complements this by retrieving relevant relation-mentions given a specific relation. Hence, it models $q_\phi(x \mid y)$ for a mention-label pair. For a given relation $y$, the retrieval module estimates the joint probability $q_\phi(x, y)$ and induces a ranking over different mentions $x$ for the label $y$. The overall objective function is given by

$$O = O_P + O_R + O_U, \qquad O_P = \mathbb{E}_{(x,y)\in L}\left[\log p_\theta(y \mid x)\right], \quad O_R = \mathbb{E}_{(x,y)\in L}\left[\log q_\phi(x \mid y)\right], \quad O_U = \mathbb{E}_{x\in U}\left[\log p(x)\right].$$

$O_P$ can be calculated using a cross-entropy loss between the ground truth and predicted labels, as shown in Equation 1. The objective $O_R$ is approximated using a ranking loss that scores a labeled pair in $L$ against an incorrect relation paired with the same relation-mention; the scores are computed from the mention encoding and the embeddings of the correct and incorrect relations. Lastly, $O_U$ is approximated by the lower bound:

$$\log p(x) \ge \mathbb{E}_{p_\theta(y \mid x)}\left[\log \frac{q_\phi(x \mid y)\, p(y)}{p_\theta(y \mid x)}\right].$$

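DualRE Learning Algorithm: As proposed in [12], an Expectation Maximization approach is used to jointly learn the modules. In the E-step, the prediction module is learned by fixing $\phi$. Calculating the gradient of $O$ with respect to $\theta$ amounts to:

$$\nabla_\theta O = \mathbb{E}_{(x,y)\in L}\left[\nabla_\theta \log p_\theta(y \mid x)\right] + \mathbb{E}_{x\in U}\,\mathbb{E}_{y \sim q_\phi(y \mid x)}\left[\nabla_\theta \log p_\theta(y \mid x)\right] \tag{1}$$

where the first and second terms correspond to $O_P$ and $O_U$ respectively. Similarly, in the M-step, the retrieval module is updated by fixing $\theta$. The gradient with respect to $\phi$ is calculated as:

$$\nabla_\phi O = \mathbb{E}_{(x,y)\in L}\left[\nabla_\phi \log q_\phi(x \mid y)\right] + \mathbb{E}_{x\in U}\,\mathbb{E}_{y \sim p_\theta(y \mid x)}\left[\nabla_\phi \log q_\phi(x \mid y)\right] \tag{2}$$

where the first and second terms correspond to $O_R$ and $O_U$ respectively. Both steps require sampling from unannotated data. It is assumed that sampling from the average of the two modules’ distributions is less noisy. Hence, samples are annotated using the intersection of these two modules before every iteration. For each iteration, the labeled dataset $L$ is augmented with the two modules’ annotations (best predictions). Then $\theta$ and $\phi$ are updated according to the E-step and M-step equations.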
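The selection of new training instances by the intersection of the two modules can be sketched as follows. This is one plausible reading, stated as an assumption: a mention is kept only if both modules rank it among their top-k most confident predictions and agree on its label. The function name, the score format and the value of k are hypothetical and are not taken from [12].

```python
def select_by_intersection(pred_scores, retr_scores, k):
    """Each argument maps mention_id -> (label, confidence).

    Returns mention_id -> label for mentions that appear in the top-k of
    both modules with the same label.
    """
    def top_k(scores):
        ranked = sorted(scores.items(), key=lambda kv: kv[1][1], reverse=True)
        return {mention: label for mention, (label, _) in ranked[:k]}

    top_pred, top_retr = top_k(pred_scores), top_k(retr_scores)
    return {
        mention: label
        for mention, label in top_pred.items()
        if top_retr.get(mention) == label
    }


# Hypothetical outputs of the prediction and retrieval modules.
pred = {"m1": ("Causes", 0.9), "m2": ("PartOf", 0.8), "m3": ("UsedFor", 0.4)}
retr = {"m1": ("Causes", 0.85), "m2": ("HasA", 0.9), "m3": ("UsedFor", 0.3)}
selected = select_by_intersection(pred, retr, k=2)
```

In this example "m2" is dropped because the modules disagree on its label, and "m3" is dropped because neither module is confident about it.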

| Rule | Semantics |
|---|---|
| R1 | causes implies enables. |
| R2 | a2 can not be both first and last sub-event. |
| R3 | affects implies causes. |
| R4 | relatedTo is symmetric. |
| R5 | partOf implies hasA. |
| R6 | synonym is symmetric. |

Table 2: Set of rules used to act as constraints over the world of grounded predicates.
dependsOn(Lifecycle, Trial)
dependsOn(Segmentation, Selection)
causes(Free_samples, Potential_purchasestate)
partOf(Product_class, Brand)
dependsOn(Diffusion, Environmental_change)
partOf(Home_ownership, Religion)
hasProperty(Physical_product, Distribution)
leadsTo(Product, Dimensions)
partOf(Sex, Demographic)
dependsOn(Trial, Sampling)
motivatedByGoal(Price_reduction, Encroachment)
relatedTo(Place, Target_market_segment)
leadsTo(Product, Growth_phase)
affects(Government_regulations, Product_lifecycle)
Table 3: Some examples of annotated ground-truth relations treated as predicates in Markov Logic Network.

Rule-Regularized Semi-supervised Relation Prediction

Given the dearth of annotations in the marketing domain, we observe that prior rules over the relations can act as (global) constraints. A major drawback of assuming independence between samples is that the predictor ($P_\theta$) is free to predict conflicting relations between two concepts; for example, A cannot be both the first and the last sub-event of B (if B has more than one sub-event). Hence, weighted simplified rules can act as constraints. This requires us to solve two problems: i) how to acquire the rules, and ii) how to integrate these rules with the predictor.

Firstly, these rules might be incomplete and should not be modeled as hard constraints. To model this ambiguity, probabilistic logical mechanisms such as MLN [18] become a natural choice. The rules can be learned from the set of expert-provided ground truth relations using MLN’s standard structure learning algorithms [9]. In our case, the closed-world assumption and sparse annotations force the MLN structure learner to learn only unary clauses. Instead, we write the rules ourselves and then use the MLN weight learning algorithm to learn the weights. We treat the ground-truth annotated relations as predicates of truth-value 1 (examples in Table 3), and use a few rules that can act as constraints (Table 2). Let the set of rules be denoted by $R$, with individual rules $R_j$. Using MLN’s weight learning algorithm, we then learn the weights for each rule in $R$.

input : Labeled data $L$, unlabeled data $U$, weighted rules $R$
Pre-train the prediction and retrieval modules using $L$;
Compute the teacher $q$ from $p_\theta$ and $R$ (Eqn. 3);
while stopping criterion not met do
       Retrieve instances using the intersection of the prediction and retrieval modules;
       Remove the retrieved instances from $U$ and add them to $L$;
       Optimize the prediction module (Eqn. 1);
       Optimize the retrieval module (Eqn. 2);
       Compute $q$ again using $p_\theta$ and $R$ (Eqn. 4);
end while
Algorithm 1 Rule-Regularized DualRE Learning Algorithm
| Relation | Explanation | Example(s) |
|---|---|---|
| LeadsTo | A results in occurrence of B. The occurrence can be through other states, not necessarily direct. | Homogeneous preference (among consumers) (A) leads to competition (more competition) (B). |
| UsedBy | Usage of A by B for achieving some end state. Applies to both companies and consumers. | Product dimension (A) is used by consumer (B) to make a choice. |
| ImportantTo | A is a quality or characteristic that is salient to / for B. For a marketer, it is valuable to highlight some characteristics as particularly important, more than merely identifying them as a characteristic. | New dimension of product (A) is important to consumers (B). New dimension of product (A) is important for product positioning (B) by marketer. |
| Affects | A can have impact on B; does not mean it will have an impact [inverse - AffectedBy]. | Government regulation (A) affects product life cycle (B). Style and fashion affects product life cycle. Political influence affects government regulation. |
| Enables | A can facilitate the occurrence of B [inverse - EnabledBy]. | Good positioning is enabled by strong advertising claims. Perception and choice consumers form are enabled by product attribute. |
| PartOf | A is a characteristic, which marketer associates with B. | Demographics (A) is a part of consumer (B). Price sensitivity (A) is a part of consumer preference (B). |
| HasFirstSubevent | A can start to happen when B starts to occur. | For diffusion of innovation (A) to occur the first subevent of adopt[ing] new product [by consumers] (B) is necessary. |
| HasA | A possesses certain traits B. A may not possess always. | Company (A) has strong patents (B). Consumer (A) has a higher price elasticity (B). |
| Synonym | A and B are often considered similar in what they convey. | Attitude segmentation (A) is synonymous with psychographics (B). |
| UsedFor | Purpose of A is to achieve B. | Perceptual map (A) is used by marketer to identify gaps (B) in marketplace. |
| RelatedTo | As in ConceptNet5, interpreted as a general relation. In marketing, many relations take this form, since pinpointing directionality is very difficult without considering many other factors of context and environment. | Maturity stage of product (A) in life cycle is related to product’s ease of use by consumers (B). |
| Causes | A can cause B, although not always. In marketing, causal relations are soft in scope, that is, it does not mean A implies B [inverse - CausedBy]. | Good positioning (A) of a product causes high trial rate (B) of the product. |

Table 4: Illustrative relations with explanation and examples. The top five relations are new, while the others come from ConceptNet5. We also use CausesDesire, HasPrerequisite, MotivatedByGoal, HasProperty, DependsOn and CapableOf, which are omitted here. These will be included in the appendix.

Knowledge Integration: For integrating the knowledge in these soft rules, we follow the idea of projecting the learnt predictor function into a rule-regularized subspace [7]. The authors propose a generic way to learn a teacher distribution from a student distribution and a set of rules. Essentially, the teacher ($q$) is learned by optimizing the KL-divergence with the student ($p_\theta$) under the constraints imposed by the grounded rules, as follows (in the notation of [7], $\lambda_l$ is the weight of rule $l$, $r_{l,g}$ is the truth value of its $g$-th grounding, $\xi_{l,g}$ are slack variables, and $C$ is a regularization constant):

$$\min_{q,\ \xi \ge 0}\ \mathrm{KL}\big(q(Y \mid X)\,\|\,p_\theta(Y \mid X)\big) + C \sum_{l,g} \xi_{l,g} \quad \text{s.t.} \quad \lambda_l \big(1 - \mathbb{E}_q[r_{l,g}(X, Y)]\big) \le \xi_{l,g} \tag{3}$$

As hard rules evaluate to 1.0, these constraints try to ensure that $\mathbb{E}_q[r_{l,g}(X, Y)]$ is as close to 1 as possible. Solving the above problem amounts to computing a closed-form solution, given as Equation 4 in [7], which we reproduce here for convenience:

$$q^*(Y \mid X) \propto p_\theta(Y \mid X)\, \exp\Big\{-\sum_{l,g} C \lambda_l \big(1 - r_{l,g}(X, Y)\big)\Big\} \tag{4}$$

To calculate the second term (the exponential factor), we use concepts from MLN inference and T-norm equations. Primarily, for a candidate relation predicate over the input entity pair (i.e., ignoring the sentence information), we assume the truth-value of the predicate to be 1 and calculate the value for each grounding of each rule. Essentially, this provides an estimate of the number of grounded rules satisfied by the query. Here, the truth value of a grounded rule is computed using Lukasiewicz’s T-norm equations. This is a sharp departure from the way this equation is computed in practice by [7] (see footnote 2). Overall, we slightly change the DualRE learning algorithm to obtain Algorithm 1. Equation 4 is computed using the current labeled data and the set of weighted rules.
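The closed-form teacher update of [7], in which the student probability of each candidate label is scaled by the exponential of minus C times the weighted rule violations and then renormalized, can be sketched as below, with rule truth values computed by the Lukasiewicz implication min(1, 1 - a + b). The toy student distribution, the rule weight of 2.0, the value C = 1.0 and the assumed background truth value are hypothetical; this is an illustration, not the implementation used in the experiments.

```python
import math


def luk_implies(a, b):
    """Lukasiewicz truth value of (a => b) for a, b in [0, 1]."""
    return min(1.0, 1.0 - a + b)


def teacher_distribution(student, rule_values, weights, C=1.0):
    """Project the student distribution with weighted soft rules.

    student: dict label -> probability under the student.
    rule_values: function label -> list of rule truth values in [0, 1],
        computed as if that label were true for the entity pair.
    weights: one weight per rule, aligned with rule_values(label).
    """
    unnormalized = {}
    for label, p in student.items():
        penalty = sum(w * (1.0 - r) for w, r in zip(weights, rule_values(label)))
        unnormalized[label] = p * math.exp(-C * penalty)
    z = sum(unnormalized.values())
    return {label: v / z for label, v in unnormalized.items()}


# Hypothetical background knowledge: enables(a, b) is taken to be false.
ENABLES_TRUTH = 0.0


def rule_values(label):
    """Single rule: causes(a, b) => enables(a, b)."""
    causes_truth = 1.0 if label == "causes" else 0.0
    return [luk_implies(causes_truth, ENABLES_TRUTH)]


student = {"causes": 0.6, "relatedTo": 0.4}
teacher = teacher_distribution(student, rule_values, weights=[2.0], C=1.0)
```

Because predicting "causes" would violate the rule in this toy setting, the teacher moves probability mass from "causes" to "relatedTo", while a label that satisfies every rule keeps its relative weight.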

7 Experiments and Results

To evaluate the pipeline for relation prediction we use an annotated data set. In creating this ground truth from a well-regarded marketing text corpus, out of a total of 1748 candidate triples in 231 sentences, 415 triples are annotated by hand. The annotation is done by a marketing expert with more than two decades of consulting and managerial-teaching experience in marketing in the US. In doing this annotation, the expert is provided with relations from ConceptNet. The relation semantics are altered to fit the needs of the domain. A total of 19 relations are used (18 in Table 4 and one for no relation). Given this annotated dataset, we first extract a set of features, such as tokenized words, part-of-speech tags, and subject and object position indicators, for each labeled and unlabeled relation-mention. Unlike the DualRE implementation, we ignore the object and subject types (and NER tags), as the concepts in our MKG are not named entities and, to the best of our knowledge, there is no well-defined ontology. We use 53 triplets each for the validation and test sets, and the rest of the annotated and unlabeled data are used to create the train set. We use the annotated part as train ($L$) and the unannotated part as raw ($U$), according to the script in [12] (see footnote 3). For the baseline DualRE, we run the DualRE-pointwise variant. For the rule-regularized version, we run the MLN weight learning algorithm a priori and then provide the weighted rules (in Table 2) as inputs to Algorithm 1. We use a similar EM-based algorithm and run it for 10 iterations. We report the final precision, recall and F1 scores for the validation and test sets in Table 5.

| Model | dev P | dev R | dev F1 | test P | test R | test F1 |
|---|---|---|---|---|---|---|
| DualRE | 96.4 | 50.9 | 66.6 | 59.4 | 41.5 | 48.8 |
| DualRE+Rules | 88.9 | 60.4 | 71.9 | 28.6 | 56.6 | 37.9 |

Table 5: Results on the Segmentation chapter. We report precision, recall and F1 scores for both validation and test set.
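As a quick consistency check on Table 5, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported P and R values; small differences in the last digit are expected, since the table entries are themselves rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from Table 5.
table5 = {
    "DualRE dev": (96.4, 50.9, 66.6),
    "DualRE test": (59.4, 41.5, 48.8),
    "DualRE+Rules dev": (88.9, 60.4, 71.9),
    "DualRE+Rules test": (28.6, 56.6, 37.9),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in table5.items()}
for name, value in recomputed.items():
    print(f"{name}: recomputed F1 = {value:.2f}, reported = {table5[name][2]}")
```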

Ablation Study

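| Model | dev P | dev R | dev F1 | test P | test R | test F1 |
|---|---|---|---|---|---|---|
| DualRE+Rules | 88.9 | 60.4 | 71.9 | 28.6 | 56.6 | 37.9 |
| DualRE+R/{Rj} | 88.9 | 60.4 | 71.9 | 28.6 | 56.6 | 37.9 |
| DualRE+R/{R2,4,6} | 91.17 | 58.5 | 71.3 | 31.4 | 50.9 | 38.8 |
| DualRE+R5 | 92.0 | 59.6 | 72.3 | 32.5 | 50.5 | 39.54 |

Table 6: Ablation study to see the effect of removing each of the rules from the set.

| Relation | Model | P | R | F1 |
|---|---|---|---|---|
| Affects | B | 75 | 100 | 85.71 |
| Affects | R | 50 | 100 | 66.7 |
| DependsOn | B | 54.55 | 46.15 | 50.00 |
| DependsOn | R | 44.4 | 61.5 | 51.6 |
| LeadsTo | B | 33.33 | 37.5 | 35.39 |
| LeadsTo | R | 11.3 | 75 | 19.7 |
| MotivatedByGoal | B | 50 | 33.3 | 40.0 |
| MotivatedByGoal | R | 25 | 33.3 | 28.5 |
| PartOf | B | 90.0 | 60 | 72 |
| PartOf | R | 50 | 80 | 61.5 |

Table 7: DualRE baseline (B) and rule-regularized (R) results for the relations in the test set.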

One of the contributions of this work is to learn the weights of rules using MLN and integrate this knowledge to improve accuracy in our relation extraction task. As an ablation study, we therefore experiment with removing each rule and observing the impact on overall scores. The scores are reported in Table 6. While the final scores after removing individual rules do not differ significantly, removing a subset of rules makes the end-to-end difference in precision and recall more prominent. We observe that as we decrease the number of rules, precision increases and recall decreases. In fact, as the set of rules shrinks, we are less restricted in selecting new samples from $U$. For convenience, we also show per-relation statistics on the test set in Table 7. As our test set is relatively small (because of the limited annotations), most of the other relations occur at most twice, and hence we omit them from the table.

8 Discussion and Conclusion

For human managers, marketing decision making is often a complex combination of years of experience in the field, knowledge from text and case studies, and insights from current data. Current technologies provide a peek into utilizing the massive amount of analytics data often available to corporations, but interpreting the data without the lens of knowledge can often send incorrect signals. We intend to bridge this gap by creating a marketing knowledge graph that captures the knowledge in marketing text. In doing so, the dearth of annotations raises a well-known, although less-addressed, challenge: predicting relations in a semi-supervised setting. We investigate the effects of integrating hand-coded rules with learned weights as (global) constraints in a semi-supervised relation prediction method and observe improvements in recall. We also observe that, when learning rules from a small set of annotated triples using MLN, the closed-world assumption forces the learner to learn only unary clauses. With our current rule integration method, removing a single rule does not affect the results much (and often not at all). Even though adding rule-based constraints seems to be the intuitive way of integrating prior knowledge into the prediction formulation, the final results are not always conclusive. These results call for future research in these directions.


  1. (https://hbswk.hbs.edu/item/making-right-choices-art-or-science)
  2. As discussed in [10], the mathematical equations do not fully match the code released by authors of [7]
  3. Code: https://github.com/INK-USC/DualRE


  1. K. Barker, V. K. Chaudhri, S. Chaw, P. E. Clark, D. Hansch, B. E. John, S. Mishra, J. Pacheco, B. Porter, A. Spaulding and M. Weiten (2007) AURA: enabling subject matter experts to construct declarative knowledge bases from science textbooks. In Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 2, AAAI’07, pp. 1960–1961. ISBN 978-1-57735-323-2.
  2. D. Chen and C. Manning (2014) A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 740–750.
  3. J. Ebrahimi and D. Dou (2015) Chain based RNN for relation classification. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1244–1249.
  4. J. Feng, M. Huang, L. Zhao, Y. Yang and X. Zhu (2018) Reinforcement learning for relation classification from noisy data. In Thirty-Second AAAI Conference on Artificial Intelligence.
  5. S. Guo, Q. Wang, L. Wang, B. Wang and L. Guo (2018) Knowledge graph embedding with iterative guidance from soft rules. In Thirty-Second AAAI Conference on Artificial Intelligence.
  6. J. Hao, M. Chen, W. Yu, Y. Sun and W. Wang (2019) Universal representation learning of knowledge bases by jointly embedding instances and ontological concepts.
  7. Z. Hu, X. Ma, Z. Liu, E. Hovy and E. Xing (2016) Harnessing deep neural networks with logic rules. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 2410–2420.
  8. N. Kambhatla (2004) Combining lexical, syntactic, and semantic features with maximum entropy models for information extraction. In Proceedings of the ACL Interactive Poster and Demonstration Sessions, pp. 178–181.
  9. S. Kok and P. Domingos (2005) Learning the structure of Markov logic networks. In Proceedings of the 22nd International Conference on Machine Learning, pp. 441–448.
  10. K. Krishna, P. Jyothi and M. Iyyer (2018) Revisiting the importance of encoding logic rules in sentiment classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4743–4751.
  11. D. B. Lenat, R. V. Guha, K. Pittman, D. Pratt and M. Shepherd (1990) Cyc: toward programs with common sense. Communications of the ACM 33 (8), pp. 30–49. ISSN 0001-0782.
  12. H. Lin, J. Yan, M. Qu and X. Ren (2019) Learning dual retrieval module for semi-supervised relation extraction. In The World Wide Web Conference, pp. 1073–1083.
  13. G. A. Miller (1995) WordNet: a lexical database for English. Communications of the ACM 38 (11), pp. 39–41.
  14. A. Minard, A. Ligozat, A. Ben Abacha, D. Bernhard, B. Cartoni, L. Deléger, B. Grau, S. Rosset, P. Zweigenbaum and C. Grouin (2011) Hybrid methods for improving information access in clinical documents: concept, assertion, and relation identification. Journal of the American Medical Informatics Association 18 (5), pp. 588–593.
  15. M. Miwa and M. Bansal (2016) End-to-end relation extraction using LSTMs on sequences and tree structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1105–1116.
  16. N. Ostapuk, J. Yang and P. Cudré-Mauroux (2019) ActiveLink: deep active learning for link prediction in knowledge graphs. In The World Wide Web Conference, pp. 1398–1408.
  17. H. Paulheim (2017) Knowledge graph refinement: a survey of approaches and evaluation methods. Semantic Web 8 (3), pp. 489–508.
  18. M. Richardson and P. Domingos (2006) Markov logic networks. Machine Learning 62 (1-2), pp. 107–136.
  19. S. Riedel, L. Yao and A. McCallum (2010) Modeling relations and their mentions without labeled text. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 148–163.
  20. G. Rosemblat, D. Shin, H. Kilicoglu, C. Sneiderman and T. C. Rindflesch (2013) A methodology for extending domain coverage in SemRep. Journal of Biomedical Informatics 46 (6), pp. 1099–1107.
  21. R. Speer, J. Chin and C. Havasi (2017) ConceptNet 5.5: an open multilingual graph of general knowledge. In Thirty-First AAAI Conference on Artificial Intelligence.
  22. L. Sterckx, T. Demeester, J. Deleu and C. Develder (2014) Using active learning and semantic clustering for noise reduction in distant supervision. In 4th Workshop on Automated Knowledge Base Construction at NIPS2014 (AKBC-2014), pp. 1–6.
  23. P. Wang, D. Dou, F. Wu, N. de Silva and L. Jin (2019) Logic rules powered knowledge graph embedding. arXiv preprint arXiv:1903.03772.
  24. Q. Wang, Z. Mao, B. Wang and L. Guo (2017) Knowledge graph embedding: a survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29 (12), pp. 2724–2743.
  25. P. Xu and D. Barbosa (2019) Connecting language and knowledge with heterogeneous representations for neural relation extraction. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 3201–3206.
  26. D. Zeng, K. Liu, S. Lai, G. Zhou and J. Zhao (2014) Relation classification via convolutional deep neural network. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 2335–2344.