Katecheo: A Portable and Modular System for Multi-Topic Question Answering
We introduce a modular system that can be deployed on any Kubernetes cluster for question answering via REST API. This system, called Katecheo, includes three configurable modules that collectively enable identification of questions, classification of those questions into topics, document search, and reading comprehension. We demonstrate the system using publicly available knowledge base articles extracted from Stack Exchange sites. However, users can extend the system to any number of topics, or domains, without the need to modify any of the model serving code or train their own models. All components of the system are open source and available under a permissive Apache 2 License.
1 Introduction
When people interact with chatbots, smart speakers, or digital assistants (e.g., Siri), one of their primary modes of interaction is asking questions Lovato and Piper (2015).
Developers could support question answering using publicly available chatbot platforms, such as Watson Assistant. However, these platforms generally require developers to anticipate user questions and manually program the corresponding intents and curated responses.
To overcome the burden of programming intents, developers might look towards more advanced question answering systems that are built using open domain question and answer data (e.g., from Stack Exchange or Wikipedia), reading comprehension models, and document search techniques. In particular, Chen et al. previously demonstrated a two-step system, called DrQA, that matches an input question to a relevant article from a knowledge base and then uses a recurrent neural network (RNN) based comprehension model to detect an answer within the matched article. This more flexible method was shown to produce promising results for questions related to Wikipedia articles, and it performed competitively on the SQuAD benchmark Rajpurkar et al. (2016).
However, if developers want to integrate this sort of reading comprehension based methodology into their applications, how would they currently go about this? At the very least, they would need to wrap pre-trained models in their own custom code and compile similar knowledge base articles. At most, they may need to re-train reading comprehension models on open domain question and answer data (e.g., SQuAD) and/or implement their own knowledge base search algorithms.
In this paper, we present Katecheo, a portable and modular system for reading comprehension based question answering that attempts to ease this development burden. The system provides a quickly deployable and easily extendable way for developers to integrate question answering functionality into their applications. Katecheo includes three configurable modules that collectively enable identification of questions, classification of those questions into topics, document search, and reading comprehension. The modules are tied together in a single inference graph that can be invoked via a REST API call. We demonstrate the system using publicly available knowledge base articles extracted from Stack Exchange sites.
The rest of the paper is organized as follows. In the next section, we provide an overview of the system logic and its modules. In Section 3, we outline the architecture and configuration of Katecheo, including extending the system to an arbitrary number of topics. In Section 4, we report some results using public knowledge base articles. Then in conclusion, we summarize the system, its applicability, and future development work.
2 System Overview
Katecheo is partially inspired by the work of Chen et al. on DrQA. That previously developed method has two primary phases of question answering: document retrieval and reading comprehension. Together these functionalities enable open domain question answering. However, many dialog systems are not completely open domain. For example, developers might want to create a chatbot that has targeted conversations about restaurant reservations and movie times. It would be advantageous for such a chatbot to answer questions about food and entertainment, but the developers might not want to allow the conversation to stray into other topics.
One of the goals of Katecheo was to create a question answering system that is more flexible than those relying on curated responses while remaining more targeted than a completely open domain question answering system. The system includes document retrieval (or what we refer to as “knowledge base search”) and reading comprehension, but only within sets of curated knowledge base articles each corresponding to a particular topic (e.g., food and/or entertainment).
When a question text is input into the Katecheo system, it is processed through three modules: (1) question identification, (2) knowledge base search, and (3) reading comprehension. This overall logic is depicted in Figure 1.
2.1 Question Identification
The first module in Katecheo, question identification, determines whether the input text (labeled Q in Figure 1) is actually a question. In our experience, users of dialog systems provide a huge number of unexpected inputs; some of these are questions and some are just statements. Before going to the trouble of matching a knowledge base article and generating an answer, Katecheo completes this initial step to ensure that the input is a question. If it is, the question identification module (henceforth the “question identifier”) passes a positive flag to the next module, indicating that it should continue processing the question. Otherwise, it passes an error flag that ends the processing.
The question identifier uses a rule-based approach to question identification. As suggested in Li et al., we utilize the presence of question marks and 5W1H words to determine if the input is a question. To test this question identification technique, we extracted 3000 questions from the SQuAD dev data set Rajpurkar et al. (2018). We also extracted an equal number of statements from the contexts in the SQuAD dev data set using sentence segmentation. The overall accuracy of our rule-based approach to question identification on this test data is 95.1%, and the question identifier tends to predict very few false negatives as compared to false positives (as shown in Table 1).
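A minimal sketch of this rule-based check is shown below. The exact 5W1H word list and precedence used by Katecheo's question identifier are not specified in this paper, so these are assumptions for illustration:

```python
# 5W1H interrogative words (who, what, when, where, why, how).
# This particular word list is an assumption for illustration.
WH_WORDS = {"who", "what", "when", "where", "why", "how"}

def is_question(text: str) -> bool:
    """Heuristically decide whether `text` is a question based on the
    presence of a question mark or a leading 5W1H word."""
    stripped = text.strip().lower()
    if "?" in stripped:
        return True
    words = stripped.split()
    return bool(words) and words[0].strip(".,!") in WH_WORDS
```

A rule-based check like this is cheap to run on every input, which matters because it gates the more expensive search and comprehension stages.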
2.2 Knowledge Base Search
Assuming the input text is identified as a question, the system searches for a matching knowledge base article among one or more user supplied sets of articles, where each set corresponds to a user supplied topic. In this way, the system also assigns the question to a particular topic. The matched article is then used in the next stage of processing to generate an answer.
The user supplies one or more sets of knowledge base articles, where each set is tagged with a topic name. Each set is formatted as a JSON file containing an array of knowledge base articles, and each article in the array has a title field, a body field, and an article ID field.
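A small sketch of loading and validating one such file is shown below. The exact JSON key names ("article_id", "title", "body") are assumptions; the real field names are defined in the Katecheo repository:

```python
import json

# A hypothetical knowledge base file: an array of articles, each with
# a title, a body, and an article ID. Key names are assumptions.
kb_json = """
[
  {"article_id": "ms-001",
   "title": "Cold sores: why do we get them on the lips?",
   "body": "Cold sores are caused by the herpes simplex virus..."},
  {"article_id": "ms-002",
   "title": "Pain in knee joint",
   "body": "Applying cold compresses can reduce swelling..."}
]
"""

def load_kb(raw: str) -> list:
    """Parse a topic's knowledge base file and check required fields."""
    articles = json.loads(raw)
    for article in articles:
        missing = {"article_id", "title", "body"} - article.keys()
        if missing:
            raise ValueError(f"article missing fields: {missing}")
    return articles
```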
In the knowledge base search module of Katecheo (henceforth the “KB Search” module), articles and questions are compared using TF-IDF vectors and cosine similarity. However, given the potential for multiple sets of knowledge base articles (corresponding to multiple topics), these articles can be searched in (1) a segmented manner (where each set of knowledge base articles is searched in isolation), or (2) a combined manner (where the sets of knowledge base articles are concatenated and one search is performed on the concatenated sets).
To investigate the implications of using segmented or combined searching, we compiled a set of 6,110 Medical Sciences related articles from the WebMD site and a set of Christianity related articles from the GotQuestions site, for a total of 12,228 articles.
[Table 2: search accuracy for the segmented and combined search strategies, evaluated when searching 3,000 articles from each set and when searching all 12,228 articles]
The combined search method is the most accurate for smaller numbers of articles (roughly 6,000 or fewer in total), while larger sets of articles seem to benefit from a segmented search strategy. Given that Katecheo users are expected to upload their own knowledge base articles for particular topics, we decided to employ the combined search strategy and assume that users will be able to curate their knowledge base articles into appropriately sized sets. In the future, we anticipate extending Katecheo to optionally support both segmented and combined search depending on the size of the input knowledge bases.
In addition to matching a question to an article and topic, the KB search module also uses a cosine similarity threshold to filter out completely off topic questions (i.e., those that are dissimilar to all of the user supplied topics). This threshold helps Katecheo achieve the goal of being flexible while not completely open domain, which is often a goal of chatbot and assistant applications. The threshold value is optionally configurable by the user and can range from 0 to 1. Higher threshold values result in very strict enforcement of questions aligning with the supplied topics, while lower threshold values result in more tolerance.
To choose an appropriate default threshold for the TF-IDF based search, we performed another experiment using the WebMD and GotQuestions data. In this case, we added 50 additional off topic questions that were dissimilar to both the Christianity and Medical Sciences topics, such as questions about sports or movies. We then varied the cosine similarity threshold while observing the number of correctly identified on-topic and off-topic questions. The results of this analysis are shown in Figure 2. Because we use the combined approach to search and want to filter out the majority of off topic questions, we chose 0.15 as the default cosine similarity threshold.
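The threshold sweep can be sketched as follows. The similarity scores below are hypothetical illustrative values, not the measurements behind Figure 2:

```python
# Hypothetical best-match similarity scores for a handful of on-topic
# and off-topic questions (illustrative values only).
on_topic_scores = [0.42, 0.31, 0.18, 0.12]
off_topic_scores = [0.09, 0.05, 0.21, 0.03]

def sweep(thresholds, on_scores, off_scores):
    """For each candidate threshold, count on-topic questions kept
    (score >= threshold) and off-topic questions correctly rejected
    (score < threshold), mirroring the analysis behind Figure 2."""
    results = {}
    for t in thresholds:
        kept = sum(s >= t for s in on_scores)
        rejected = sum(s < t for s in off_scores)
        results[t] = (kept, rejected)
    return results
```

Raising the threshold rejects more off-topic questions at the cost of dropping weakly matched on-topic ones, which is exactly the trade-off the 0.15 default balances.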
2.3 Reading Comprehension
The final module of the Katecheo system is the reading comprehension (or just “comprehension”) module. This module takes as input the original input question plus the matched knowledge base article body text and uses a reading comprehension model to select an appropriate answer from within the article.
Users of Katecheo can configure the system to utilize one of two different pre-trained reading comprehension models. In the current release, users can choose between: (1) a Bi-Directional Attention Flow, or BiDAF, model Seo et al. (2017); and (2) a large BERT Devlin et al. (2018) whole-word-masking model fine-tuned on SQuAD 1.0 Rajpurkar et al. (2016). We use the pre-trained version of BiDAF available in the AllenNLP Gardner et al. (2017) library and the pre-trained version of BERT available in the transformers library Wolf et al. (2019). By default, Katecheo uses the BERT model.
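Using the transformers library, the comprehension step can be sketched as below. The model identifier is our assumed Hugging Face name for the BERT large whole-word-masking SQuAD model (loading it downloads the weights), and the example question and article body are illustrative:

```python
from transformers import pipeline

# Load an extractive question-answering pipeline. The model name below
# is an assumed identifier for the BERT whole-word-masking SQuAD model.
qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

article_body = (
    "Cold sores are caused by an infection with the type 1 or type 2 "
    "herpes simplex virus, often triggered by stress, fever, or illness."
)

# The pipeline selects an answer span from within the article body.
result = qa(question="Why do we get cold sores?", context=article_body)
# result is a dict with "answer", "score", "start", and "end" keys.
```

Because the answer is always a span of the matched article, answer quality is bounded by how well the KB search module picked that article.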
3 Architecture and Configuration
All three of the Katecheo modules are containerized with Docker Merkel (2014) and deployed as pods on top of Kubernetes Hightower et al. (2017) (see Figure 3). In this way, Katecheo is completely portable to any standard Kubernetes cluster, including hosted versions in AWS, GCP, Digital Ocean, Azure, etc. and on-premises versions that use vanilla Kubernetes, OpenShift, CaaS, etc.
To provide developers with a familiar interface to the question answering system, we provide a REST API. Developers can call Katecheo via a single endpoint, with ingress to the system provided by Ambassador, a Kubernetes-native API gateway.
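A hedged sketch of such a call is shown below. The endpoint path and the request/response field names are assumptions, not Katecheo's documented interface; working examples ship with the repository:

```python
import json

# Hypothetical request payload for the single REST endpoint.
payload = {"question": "Why do we get cold sores?"}

# With the requests library, the call might look like (not executed here):
#   resp = requests.post("http://<cluster-ip>/katecheo/predict",
#                        json=payload)

# Parsing a hypothetical response body with assumed field names:
canned_response = json.dumps({
    "topic": "Medical Sciences",
    "article_title": "Cold sores: why do we get them on the lips?",
    "answer": "an infection with the type 1 or type 2 herpes simplex virus",
})
result = json.loads(canned_response)
```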
To specify the topic names and topic knowledge base JSON files (as mentioned in reference to Figure 1), the user need only fill out a JSON configuration file template listing the topic name and URL link for each knowledge base file. These could be static files or served via a separate API as appropriate. Once the configuration file is created, a Bash deploy script can be executed to automatically deploy all of the Katecheo modules to a Seldon-enabled Kubernetes cluster.
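A hypothetical configuration file following this template might look as follows; the key names are assumptions for illustration, and the actual template ships with the repository:

```json
{
  "topics": [
    {
      "name": "Medical Sciences",
      "kb_url": "https://example.com/kb/medical_sciences.json"
    },
    {
      "name": "Christianity",
      "kb_url": "https://example.com/kb/christianity.json"
    }
  ]
}
```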
4 Example Usage
|Question|Matched Topic|Matched Article Title|Answer (BERT)|Answer (BiDAF)|
|---|---|---|---|---|
|Why do we get cold sores?|Med. Sciences|Cold sores: why do we get them on the lips?|In times of stress, fever, illness or even over exposure to sunlight,|an infection with the type 1 or Type 2 herpes simplex virus|
|How should you treat people with high risk factors for coronary heart disease?|Med. Sciences|Can Prediabetes cause coronary heart disease?|aspirin and/or statins|aspirin and/or statins|
|What is the best way to reduce pain and swelling in a knee joint?|Med. Sciences|Pain in knee joint|Applying cold compresses|Applying cold|
|Which would kill you first, hypothermia or frost bite?|None|None|None|None|
|What does LDS theology and official teaching say about who goes to Hell?|Christianity|Who goes to hell in LDS theology?|all who have died without the knowledge of truth|rejected the truth will|
|What is the Messianic Secret?|Christianity|Jesus concealing his identity|a prohibition to make known the messianic character of Jesus.|a prohibition to make known the messianic character of Jesus|
|What did Bart Ehrman say about Church scribes and the Bible?|Christianity|How do apologists defend against Bart Ehrman’s arguments that Church scribes corrected and changed the Bible to fit their theology?|corrected and changed the Bible to fit their theology?|to fit their theology|
|What is the biblical basis for God being omnipresent?|None|None|None|None|
We demonstrated the utility of Katecheo by deploying the system for question answering in two topics, Medical Sciences and Christianity. These topics are diverse enough to warrant different curated sets of knowledge base articles, and we can easily retrieve knowledge base articles for each of these subjects from the Medical Sciences and Christianity Stack Exchange sites.
Example inputs and outputs of the deployed system are included in Table 3. As can be seen, the system is able to match many questions with an appropriate topic and article and subsequently generate an answer using the implemented comprehension models. Not all of the answers are natural enough for conversational question answering, but many show promise.
In our experience, the number and curation of the knowledge base articles directly influences the performance of the system more than any other single variable. The KB search module in our example deployment consistently matched more Medical Sciences questions to relevant articles than it did Christianity questions. The Christianity articles seem to be much more similar to one another in terms of vocabulary than the Medical Sciences articles. This vocabulary overlap creates problems for the KB Search module and may reduce the cosine similarity scores for any Christianity related question. In turn, reduced similarity scores cause many relevant questions to be classified as off-topic.
We recommend careful curation of knowledge base articles for input to Katecheo. Smaller topical knowledge bases with high quality, diverse articles will result in better question-to-article matching than large quantities of articles acquired indiscriminately.
5 Conclusion
In conclusion, Katecheo is a portable and modular system for reading comprehension based question answering. It is portable because it is built on cloud native technologies (i.e., Docker and Kubernetes) and can be deployed to any cloud or on-premises environment. It is modular because it is composed of three configurable modules that collectively enable identification of questions, classification of those questions into topics, search of knowledge base articles, and reading comprehension.
Initial usage of the system indicates that it provides a flexible and developer friendly way to enable question answering functionality for multiple topics or domains via REST API. That being said, the current configuration of Katecheo performs best with smaller sets of knowledge base articles (roughly 6,000 or fewer). We plan to overcome this limitation by upgrading the TF-IDF based document search implementation using, e.g., n-grams and/or more sophisticated language models. In addition, future development of Katecheo will include features that allow users to (i) dynamically adjust the cosine similarity threshold and reading comprehension model, (ii) utilize other or even custom reading comprehension models, and (iii) return multiple answers or an ensembled answer generated from multiple comprehension models.
The complete source code, configuration information, deployment scripts, and examples for Katecheo are available at https://github.com/cvdigitalai/katecheo. A screencast demonstration of Katecheo is also available. An example Streamlit application that interacts with the system is included as well.
- Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to answer open-domain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1870–1879, Vancouver, Canada. Association for Computational Linguistics.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew Peters, Michael Schmitz, and Luke S. Zettlemoyer. 2017. AllenNLP: A deep semantic natural language processing platform.
- Kelsey Hightower, Brendan Burns, and Joe Beda. 2017. Kubernetes: Up and Running Dive into the Future of Infrastructure, 1st edition. O’Reilly Media, Inc.
- Baichuan Li, Xiance Si, Michael R. Lyu, Irwin King, and Edward Y. Chang. 2011. Question identification on twitter. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11, pages 2477–2480, New York, NY, USA. ACM.
- Silvia Lovato and Anne Marie Piper. 2015. “Siri, is this you?”: Understanding young children’s interactions with voice input systems. In Proceedings of the 14th International Conference on Interaction Design and Children, IDC ’15, pages 335–338, New York, NY, USA. ACM.
- Dirk Merkel. 2014. Docker: Lightweight linux containers for consistent development and deployment. Linux J., 2014(239).
- Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018. Know what you don’t know: Unanswerable questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 784–789, Melbourne, Australia. Association for Computational Linguistics.
- Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.
- Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Bidirectional attention flow for machine comprehension. ArXiv, abs/1611.01603.
- Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace’s transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771.