Fast Online ’Next Best Offers’ using Deep Learning

Rekha Singhal (TCS Research, Mumbai), Gautam Shroff (TCS Research, Delhi), Mukund Kumar (TCS Research, Mumbai), Sharod Roy Choudhury (TCS Research, Mumbai), Sanket Kadarkar (TCS Research, Mumbai), Rupinder Virk (TCS Research, Mumbai), Siddharth Verma (TCS Research, Delhi) and Vartika Tewari (TCS Research, Delhi)

In this paper we present iPrescribe, a scalable low-latency architecture for recommending ’next-best-offers’ in an online setting. The paper presents the design of iPrescribe and compares its performance for implementations using different real-time streaming technology stacks. iPrescribe uses an ensemble of deep learning and machine learning algorithms for prediction. We describe the scalable real-time streaming technology stack and the optimised machine-learning implementations that achieve a 90th percentile recommendation latency of 38 milliseconds. The optimizations include a novel mechanism to deploy recurrent Long Short Term Memory (LSTM) deep learning networks efficiently.

Conference: 6th ACM IKDD CoDS and 24th COMAD; January 3–5, 2019; Kolkata, India. DOI: 10.1145/3297001.3297029. ISBN: 978-1-4503-6207-8/19/01.

1. Introduction

Data analytics has evolved from descriptive, diagnostic and predictive analytics to prescriptive analytics for effective business operations. Prescriptive analytics answers the question ’what shall I do’ to engage customers in Business to Consumer (B2C) systems using recommendations and/or campaigns. Next Best Offer (NBO) is an extension of a recommendation system that keeps business objectives in focus. NBO refers to the ’right’ offer, given to a customer at the ’right’ time in an online setting while maximizing business objectives. iPrescribe is a high performance implementation of NBO which co-locates with a B2C system.

A B2C system processes millions of transactions per second with low latency and thus requires high scale. Any recommendation system interfacing with a B2C system in real time has to cope with its performance and workload. Therefore, iPrescribe needs to support high throughput and very low latency (a few milliseconds) for recommendation model inference, with high accuracy.

iPrescribe uses an analytical model to predict a customer’s repeat probability, which needs to be built using machine and/or deep learning techniques on transaction data, social feeds and data from other business channels. This prediction model is used to assign offers to users while optimizing business objective functions (Kazmi et al., 2016).

Machine learning models work best when used with the right features. Deep learning models give higher accuracy; however, model inference using RNNs (such as LSTMs) requires each user’s history for each prediction, leading to higher model inference time (Salehinejad and Rahnamayan, 2016). Combining multiple models’ predictions yields better accuracy (Cheng Ju, 2017) than traditional collaborative filtering and segmentation techniques (Ekstrand et al., 2011).

iPrescribe uses an ensemble of machine learning (gradient boosting, XGBoost (xgb, 2018)) and deep learning (Long Short Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997)), with an optimisation that avoids having to store and process user histories for every prediction. We have achieved reasonable accuracy for the domains in question: for our ensemble model, the Area Under Curve (AUC, where True Positive Rates are plotted against False Positive Rates) is 0.67 and the F-Score (harmonic mean of precision and recall) is 0.3843. The model is configurable and can be built for any business domain, since iPrescribe uses meta-models which define business transaction data sets and functions for creating business specific features. The details of the meta-model are out of the scope of this paper; here, we focus only on the design, implementation and optimizations of the iPrescribe system.

We have employed optimizations for XGBoost and LSTM model inference on transaction data to reduce inference time. Further reductions in recommendation latency are obtained by tuning the open source stream processing architecture stack used to build iPrescribe. The use of an open source modular technology stack brings agility to the iPrescribe architecture: it can be built with different technology choices for each layer in the stack, as discussed later in Section 4, comparable to systems such as (rs, 2017) that also use open source big data technology. The proposed architecture allows iPrescribe to be autonomous, so that it can detect degradation in system performance and scale out.

1.1. Contributions

Our key contributions are as follows:

  • Real-time high performance streaming architecture for ‘next-best-offer’ recommendations, achieving low recommendation latency (90th percentile 38 milliseconds) and scalable high throughput using open source big data technology.

  • Optimizations in the implementations of the XGBoost and LSTM algorithms to reduce model inference time.

The rest of this paper is organized as follows: Section 2 presents the related work, Section 3 presents the requirements for designing iPrescribe architecture. Section 4 describes components of iPrescribe architecture using open source technology. Section 5 presents performance optimizations in iPrescribe architecture which includes both model specific and technology stack specific optimizations for design exploration. Model specific optimization subsection also discusses the accuracy of the ensemble model. Section 6 presents the performance evaluation of iPrescribe architecture on two publicly available e-commerce data sets. Finally the paper concludes in Section 7.

2. Related Work

A lot of work has been done in the area of personalized recommendation (Verbitskiy et al., 2015) and campaign management (HenryChan, 2008). Recommendation engines have existed for over a decade. (rs, 2017) lists the open source recommender systems. Their focus has been on model accuracy (Levandoski et al., 2011). Traditionally, collaborative filtering has been used to build customer repeat probability prediction models (Verbitskiy et al., 2015; Chantat Eksombatchai and Leskovec, 2017; Chongxiao Cao and Waddington, 2013). However, predictive accuracy is substantially improved when blending multiple predictors (Cheng Ju, 2017), which may require continually updated features for better accuracy.

Among open source recommendation systems, (Sel, 2016), which is based on big data technology, is quite close to the iPrescribe architecture and has shown a 99th percentile recommendation latency of around 200 ms, whereas iPrescribe shows 4 times better latency in Fig 4. (Agarwal et al., 2011) performs multiple passes over data to build a model using Hadoop technology, but iPrescribe requires a single pass over the data to build the feature dictionary, which is then used to build the concatenated one-hot vectors and the model in less time. (Davidson et al., 2010) uses Bigtable to build the recommendation model offline; therefore, unlike iPrescribe, its features are not updated in real time. (Sheth and Kaiser, 2013) uses caching to reduce latency and increase throughput of a recommendation system, similar to the in-memory feature store that iPrescribe uses for the same performance gains. (HengTze Cheng and Shah, 2016) is closest to our work. They also employ an ensemble of machine and deep learning algorithms to build recommender systems. They have published a recommendation latency of around 14 ms; however, architecture details are not shared. Moreover, their system is not claimed to be configurable for different data sets. (Srinivasan et al., 2016) proposed an architecture with similar requirements for a real time operational DBMS.

3. Design Requirements

For real time deployment of iPrescribe, online users’ actions on the B2C system are processed and applied to the already built model in real time to decide the right offer. Therefore, iPrescribe, as shown in Fig 1, requires two types of interfaces: batch and real-time.

Figure 1. iPrescribe with B2C System

In this paper, we consider an ensemble of LSTM and XGBoost algorithms, implemented in Keras and Python respectively, to build the customer repeat probability prediction model (Python was chosen because it is the most popular language among data scientists). Each of these techniques requires creating features from transaction data and then building the respective model. Transaction data depicts the temporal behaviour of users, which is captured in features that need to be updated continuously. For example, a feature indicating a ”user’s buying behaviour in the last two hours” will be different in the morning and the evening of a day. The current value of such features improves the accuracy of the XGBoost algorithm. These features are kept updated by getting all users’ actions from the B2C system through one of the real time interfaces of iPrescribe.

3.1. Interfaces Performance Metrics

iPrescribe, as shown in Fig 1, requires two types of interfaces: a batch interface to train the XGBoost and LSTM models, and real time interfaces to capture users’ online actions and/or transactions on the B2C system and return relevant offers. In total, iPrescribe has three interfaces: one batch interface, ’Startup’, to build the model from transaction data and deploy it in memory, and two real time interfaces, ’Recommend’ and ’FeatureUpdate’, for getting the best offer for a user and updating model features respectively. Fig 2 shows the detailed processing involved at each layer of the architecture stack for these interfaces; the timing notations ’Tx’ refer to the time required by each component for processing and also the time taken for data transfer from one component to another.

Figure 2. iPrescribe Interfaces Processing

3.1.1. Startup Interface

This interface performs the complete pipeline of building the model from transaction data: feature creation, training and testing the model. The feature creation process reads the transaction data file and builds concatenated feature one-hot vectors, which are also stored in the data store. These concatenated one-hot vectors for each target (e.g. user) are used to build the XGBoost model. Similarly, the LSTM model is built using the transaction sequences for each user. The performance metric for this interface is the model building time, which includes the time to create the concatenated feature one-hot vectors and the time to train the model.


3.1.2. Recommend Interface

This real time interface is invoked through the iPrescribe connector for every action message which needs to trigger an offer for a user. This is a closed loop system; therefore, the performance metrics of interest for this interface are recommendation latency (RL) and throughput. The workflow involves retrieving the user’s context from an incoming message, fetching the user’s features for both models, preparing the concatenated feature one-hot vectors for the XGBoost model and the input vector/matrix for the LSTM model, and running ensemble inference of both models to get the best offer for the user. Finally, the assigned offer is sent back to the B2C system through the connector. RL is the time taken to send an offer to the B2C system once iPrescribe receives a message.


Throughput is measured as the number of messages serviced per second. Ideally, throughput should increase linearly with the message ingestion rate until the system is fully utilized. Therefore,


where and are number of cores at message processing and model inference layers respectively.

3.1.3. FeatureUpdate Interface

This real time interface is invoked through the iPrescribe connector for every action on the B2C system. Its purpose is to keep the features updated for every action. This is an open system; therefore, the only performance metric for this interface is throughput. The workflow involves retrieving user details from an incoming message, fetching the user’s existing features, updating the features with the current context and storing them back in the feature store. Throughput is measured as the maximum number of messages serviced per second while maximally utilizing the underlying system.


where is number of cores in stream processing layer and


4. Architecture

The performance of the real time interfaces depends on the feature store access time, processing time and model inference time. The key components of the architecture are the design of the feature storage for faster access and the technology stack encapsulating the multi-layer architecture, as discussed in this section.

4.1. Feature Storage Structure

Features are created for both the XGBoost and LSTM models by processing transaction history. For the XGBoost model, to capture a user’s persona and temporal behaviour, we define two categories of features for each user, non-temporal and temporal respectively, stored in a structure similar to the JSON (jso, 2017) data type. We reduce feature access time by sharding and creating indexes on ’user id’, so the data for a user can be accessed in O(1) rather than by a sequential scan. Each user’s features are calculated in two passes.

The first pass over the transaction history creates cumulative feature values for each user, e.g. the total count of product views in the last 3 days; this is referred to as the ’feature dictionary’. The second pass, over the cumulative feature values, creates feature one-hot vectors for users, e.g. the favorite category of a user, and then concatenates them; the result is referred to as the ’concatenated feature one-hot vectors’. To reduce the processing time at the ’Recommend’ interface, both the feature dictionary and the concatenated feature one-hot vectors are kept in the in-memory store.
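The two passes can be illustrated with a minimal Python sketch. The record layout, the category vocabulary and the view-count buckets below are hypothetical choices made for illustration; they are not taken from the iPrescribe meta-model.

```python
from collections import Counter, defaultdict

# Hypothetical transaction records: (user_id, action, category).
TRANSACTIONS = [
    ("u1", "view", "shoes"),
    ("u1", "view", "shoes"),
    ("u1", "order", "bags"),
    ("u2", "view", "bags"),
]
CATEGORIES = ["bags", "shoes"]   # assumed fixed vocabulary
NUM_VIEW_BUCKETS = 3             # assumed buckets: 0 views, 1 view, 2+ views


def build_feature_dictionary(transactions):
    """Pass 1: accumulate cumulative values per user in one scan of the history."""
    features = defaultdict(lambda: {"views": 0, "by_category": Counter()})
    for user_id, action, category in transactions:
        entry = features[user_id]
        if action == "view":
            entry["views"] += 1
        entry["by_category"][category] += 1
    return features


def one_hot(index, size):
    vec = [0] * size
    vec[index] = 1
    return vec


def build_one_hot_vectors(feature_dictionary):
    """Pass 2: derive categorical features and concatenate their one-hot encodings."""
    vectors = {}
    for user_id, entry in feature_dictionary.items():
        favourite = max(CATEGORIES, key=lambda c: entry["by_category"][c])
        bucket = min(entry["views"], NUM_VIEW_BUCKETS - 1)
        vectors[user_id] = (one_hot(CATEGORIES.index(favourite), len(CATEGORIES))
                            + one_hot(bucket, NUM_VIEW_BUCKETS))
    return vectors


feature_dictionary = build_feature_dictionary(TRANSACTIONS)
one_hot_vectors = build_one_hot_vectors(feature_dictionary)
```

Keeping both dictionaries keyed by user id mirrors the O(1) lookup described above: a real time update only touches one user’s entry in the feature dictionary and recomputes that user’s vector.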

4.2. Technology Stack Choices

We have used Python to build the models used in iPrescribe, so a naive approach is to build the model offline and deploy it using Python web based frameworks (PWF) (pwf, 2018) such as Flask (FriendFeed, 2018a) and Tornado (FriendFeed, 2018b). The iPrescribe connector could capture online activities of the B2C system and send them to the PWF to get model inference. Since Python is an interpreted language, this architecture has challenges, including large disk access time, limited scalability and an impact on B2C system performance. Thus, iPrescribe is implemented as a five layer architecture stack outside the B2C system.

4.2.1. Message and Persistent Store Layers

The open source technologies Kafka (kaf, 2018) and Hadoop Distributed File System (HDFS) (hdf, 2013) are considered for the horizontally scalable message layer and the persistent store respectively. All the actions and transactions of users on the B2C system are captured as real time messages through the real time interfaces. These are stored asynchronously in the persistent store for future model rebuilding. Received messages are also correlated with actual conversions for a given offer and are asynchronously stored as ground truth, both in the persistent store for model rebuilding and in the in-memory store for updating the feature store in real time.

4.2.2. In-memory Store Layer

The feature store schema depends on the data sets; therefore, the technology for the in-memory layer must support dynamic schema creation and JSON data types. The ’Recommend’ and ’FeatureUpdate’ interfaces access the feature store concurrently, the former only for reading and the latter only for updating; therefore, the iPrescribe data store need not have strong transaction consistency. The ’Recommend’ interface may read feature values that do not yet reflect the updates of a few recent actions, which may not impact model inference accuracy. We explored MongoDB (mon, 2018) and Ignite (ign, 2018) for the in-memory store; their impact on performance is discussed later in Section 5.

4.2.3. Stream Processing Layer

For scalable parallel data processing, Spark (spa, 2018) and Ignite (ign, 2018) are explored. Spark supports Python, as PySpark (pys, 2018), but has no in-memory store, whereas Ignite does not support Python but has an in-memory store. Spark being Java based, it has additional Python workers, which leads to double serialization overheads. Moreover, Spark is a micro-batch stream processing engine and is therefore bounded by the batch window size. Ignite supports per message processing and is a single technology for both stream processing and in-memory store; this reduces the message processing time to only a few milliseconds. This is discussed in detail later in Section 6.

4.2.4. Python Web Framework Layer (PWF)

The Python web framework is used only for model inference. Real time messages are processed in parallel by the stream processing layer and sent to the PWF for model inference using HTTP REST API calls. Each Python process executes independently of any other process; therefore, the PWF layer can be scaled out with more resources when the workload increases, to ensure constant model inference time.

5. Performance Optimizations

This section discusses tuning of iPrescribe architecture. This includes XGBoost and LSTM model specific optimizations and the parameter tuning of various technology stacks for design exploration to achieve high scale iPrescribe with low recommendation latency.

5.1. Model Specific Optimizations

We have employed an ensemble of XGBoost and LSTM algorithms to build the prediction model. Model inference time can be reduced by batching users’ concatenated feature one-hot vectors. This means that messages arriving at the ’Recommend’ interface within a few milliseconds of each other can be processed in parallel and sent to the PWF together for model inference.
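The batching idea can be sketched as follows. This is an illustrative Python sketch, not the iPrescribe implementation: the message format `(arrival_ms, user_id)`, the window length and the helper names are assumptions.

```python
import numpy as np


def batch_by_window(messages, window_ms):
    """Group (arrival_ms, user_id) messages into batches spanning less than window_ms."""
    batches, current, start = [], [], None
    for arrival_ms, user_id in sorted(messages):
        # Open a new batch when the current window has elapsed.
        if start is None or arrival_ms - start >= window_ms:
            if current:
                batches.append(current)
            current, start = [], arrival_ms
        current.append(user_id)
    if current:
        batches.append(current)
    return batches


def stack_batch(batch, vectors):
    """Stack the users' concatenated one-hot vectors into one matrix for a single inference call."""
    return np.vstack([vectors[user_id] for user_id in batch])


# Hypothetical messages and feature vectors.
messages = [(0, "u1"), (2, "u2"), (4, "u3"), (11, "u4")]
vectors = {"u1": [0, 1, 0], "u2": [1, 0, 0], "u3": [0, 0, 1], "u4": [1, 0, 0]}

batches = batch_by_window(messages, window_ms=5)
matrices = [stack_batch(batch, vectors) for batch in batches]
```

Each matrix would then be sent in one request, so the per-call overhead of the model server is shared by all users in the batch, at the cost of a delay of up to one window for the earliest message.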

5.1.1. Model Building

A transaction history captures static information about entities and dynamic information about actions or transactions on entities. We have used the PAKDD Recobell challenge (pak, 2017) and Kaggle Instacart challenge (ins, 2017) data sets, detailed in Table 1, to build the model.

Statistic              PAKDD Recobell            Kaggle Instacart
# event samples        4,80,26,835               34,21,083
# users                21,18,678                 2,06,209
# products             4,22,880                  49,688
Dataset duration       1 Aug ’16 - 1 Oct ’16     1 year
Imbalance in target    10% positive class        4% positive class

Table 1. Statistics of Data Sets

Our machine learning model uses gradient boosting (XGBoost), where grid search was used to select the optimal values for the following XGBoost model parameters: colsample_bylevel, colsample_bytree, learning_rate, max_depth, min_child_weight, n_estimators and subsample. A deep learning algorithm can encapsulate many hidden features which cannot be captured using programmed feature engineering. We have used an LSTM deep neural network. The model has an LSTM layer with 150 nodes for PAKDD Recobell and 20 nodes for Instacart, followed by a dense layer with 2 nodes and a softmax activation function. We have also used an L2 regularizer and RMSprop as the optimizer, with a learning rate of 0.001. We apply a weighted ensemble of the predictions from the gradient boosting algorithm and the LSTM model to cover the spectrum of features, which together can improve the accuracy. The weights given to the predictions of the two algorithms are chosen to optimize the Area Under Curve (AUC). A threshold function is then applied to the ensemble probabilities to optimize the F-Score of the final predictions. We obtained an AUC score of 0.67 on the PAKDD data set and an F-Score of 0.3843 on the Instacart data set.
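A minimal sketch of the weighting and thresholding steps is given below, assuming a simple grid search over a single blending weight and a single threshold. The search procedure, the grid and the toy data are assumptions for illustration; the text above does not specify how the weights and the threshold were searched.

```python
import numpy as np

GRID = np.linspace(0.0, 1.0, 21)  # assumed search grid for weight and threshold


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


def f_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def fit_ensemble(y_true, p_xgb, p_lstm):
    """Pick the blend weight that maximizes AUC, then the threshold that maximizes F-Score."""
    y_true, p_xgb, p_lstm = map(np.asarray, (y_true, p_xgb, p_lstm))
    weight = max(GRID, key=lambda w: auc(y_true, w * p_xgb + (1 - w) * p_lstm))
    blended = weight * p_xgb + (1 - weight) * p_lstm
    threshold = max(GRID, key=lambda t: f_score(y_true, (blended >= t).astype(int)))
    return weight, threshold


# Toy validation labels and model probabilities (made up for this sketch).
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])
p_xgb = np.array([0.2, 0.6, 0.7, 0.4, 0.1, 0.9, 0.5, 0.3])
p_lstm = np.array([0.3, 0.2, 0.4, 0.8, 0.6, 0.7, 0.1, 0.6])

weight, threshold = fit_ensemble(y, p_xgb, p_lstm)
blended = weight * p_xgb + (1 - weight) * p_lstm
```

Because the grid contains the weights 0 and 1, the blended AUC on the tuning data can never be lower than that of either model alone; whether the gain carries over to unseen data has to be checked on a held-out set.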

5.1.2. LSTM Optimizations

Since LSTM is a sequence based model, model inference requires passing the whole sequence of transaction history through the network. This leads to a large model inference time, which may increase over time as the sequences grow. With the naive approach, the LSTM takes 36 hours to train on 22 million records, and model inference takes 831 ms for a user with a history of 10,000 samples. This is due to the looping back of the last hidden states and cell states for each new sequence vector. The loop in the network can be unfolded as a sequence of LSTM units (Hochreiter and Schmidhuber, 1997), each feeding into the next. According to the LSTM equations (Hochreiter and Schmidhuber, 1997), only the hidden state $h_{t-1}$ and the cell state $c_{t-1}$ are passed to the next time step. At any point in time, the values of $h_{t-1}$ and $c_{t-1}$ together represent the state of the LSTM network trained with the historical data till ’t-1’. Therefore, LSTM model inference for a message at time ’t’ can be done in constant time on an LSTM network loaded with the values of $h_{t-1}$ and $c_{t-1}$. (Zhang et al., 2014) has used these states for LSTM model inference; however, they have not exploited them for performance gains in real time model inference.

The iPrescribe feature store also stores the hidden state and cell state ($h_{t-1}$ and $c_{t-1}$) for each user during model training in the ’Startup’ interface. These values are updated for each message in the ’FeatureUpdate’ interface. Moreover, the Keras library predict function incurs 67 ms and 25 ms for model inference for one user, using a small size with more matrix multiplications and a large size with fewer matrix multiplications respectively; most of the time is taken up by the core TensorFlow back-end built-in methods TF_ExtendGraph and TF_Run, and other internal TensorFlow calls. iPrescribe has its own implementation of the ’predict’ function in Java, using JBLAS 1.2.4 (Braun et al., 2015), LAPACK and ATLAS for optimized matrix multiplication. The mapping of categorical columns to unique integers is done using hashing instead of a data store lookup, to reduce inference time further.
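The constant-time inference idea can be checked with a small NumPy sketch: one LSTM step started from a stored (hidden state, cell state) pair gives the same result as replaying the whole history. This is an illustrative re-implementation of the standard LSTM equations, not the Java code used in iPrescribe; the gate ordering, the random weights and the `state_store` dictionary are assumptions of the sketch.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. Shapes: W (4H, D), U (4H, H), b (4H,).
    Gate order in the stacked weights is assumed to be input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])
    f = sigmoid(z[H:2 * H])
    o = sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:4 * H])
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def run_sequence(xs, W, U, b):
    """Naive inference: replay the full history from a zero state."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h, c


rng = np.random.default_rng(0)
D, H = 4, 3  # toy input and hidden sizes
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = rng.normal(size=4 * H)

history = [rng.normal(size=D) for _ in range(50)]
x_new = rng.normal(size=D)

# Stored once (e.g. at startup) and refreshed on every feature update.
state_store = {"u1": run_sequence(history, W, U, b)}

# Constant-time path: a single step from the stored state.
h_cached, c_cached = lstm_step(x_new, *state_store["u1"], W, U, b)

# Naive path: replay all 51 steps.
h_naive, c_naive = run_sequence(history + [x_new], W, U, b)
```

After serving the prediction, the new pair `(h_cached, c_cached)` would replace the stored entry for the user, so the cost per message stays independent of the length of the history.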

5.2. Design Exploration with Technology Stack Optimizations

The iPrescribe architecture, as discussed in Section 4, has been implemented with various technology choices to explore a high scale and low latency technology stack; the choices are given in Table 2.

Architecture Layer          Technology
Messaging Layer             Kafka
Stream Processing Layer     Ignite
In-memory Store             Ignite
Python Web Framework        Tornado
Persistent Store Layer      HDFS

Table 2. Technology Choices for iPrescribe

Kafka, the messaging layer, is partitioned with multiple topics to support a higher ingestion rate; however, the performance gains from increasing the number of partitions are limited by the disk access overheads in Kafka for message persistence. The performance optimizations in the rest of the technology stack are discussed below.

Figure 3. IP Architecture

The model is hosted as a web service on a Tornado server, with support for multiple processes, which listens on a particular port. However, using the same port to serve a larger number of requests limits the throughput. Therefore, multiple Tornado processes are started independently, each one listening on a different port.

Multi-process deployment of PWF (Tornado) leads to sub-linear speed-up with vertical scaling of a machine and hence increases the model inference latency, unlike a single server which incurs only 12 ms; i.e. the system is not scalable. This is because the XGBoost library spawns multiple threads and these threads wait on the Python GIL. We limit the number of XGBoost threads to one by setting the environment variable OMP_NUM_THREADS=1. Further, to avoid context switching on cores across different Tornado processes, each Tornado process is pinned to a core using the Linux taskset utility. In the Spark and MongoDB based architecture, after these optimizations, the average RL is 78 ms, of which message processing in Spark takes 50 ms, including 28 ms spent accessing MongoDB, which is quite high. So we have used Ignite instead, and call the resulting architecture the Ignite+PWF (IP) architecture. Ignite provides both per message stream processing and an in-memory store. Ignite Cache, a key-value store, is used to store features. The architecture is shown in Fig 3. The user identifier is used as the key to store and partition data across nodes in the Ignite cluster. Ignite StreamVisitor is used for processing each key-value tuple from the incoming data streams. In an Ignite cluster, StreamVisitor collocates processing on the node where the data is cached, to avoid data shuffling. This reduces the message processing time to 14 ms. However, the Ignite-Kafka connector introduces an overhead of 45 ms. Therefore, web sockets are used to send messages from the iPrescribe connector to the Ignite client, which reduces the communication delay to 2 ms at the cost of Kafka’s reliability and availability. In this architecture, after optimization, the average RL is 29.52 ms. To support a high ingestion rate, multiple Ignite client instances are launched, with every instance listening on a separate web socket.
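The per-process settings described above (one port, one core and one OpenMP thread per Tornado process) can be sketched as a launch plan. The script name `model_server.py`, the `--port` flag and the base port are hypothetical; `taskset -c` is the standard Linux CPU affinity utility and is assumed to be available on the host.

```python
import os


def tornado_launch_plan(n_processes, base_port=8000, script="model_server.py"):
    """Build one (env, argv) pair per server process: own port, own core, one OpenMP thread."""
    plan = []
    for core in range(n_processes):
        # Limit XGBoost (OpenMP) to a single thread inside each process.
        env = dict(os.environ, OMP_NUM_THREADS="1")
        # Pin the process to one core and give it a dedicated port.
        argv = ["taskset", "-c", str(core), "python", script,
                f"--port={base_port + core}"]
        plan.append((env, argv))
    return plan


plan = tornado_launch_plan(3)
for env, argv in plan:
    print(" ".join(argv))
```

On a Linux host, each entry could be started with `subprocess.Popen(argv, env=env)`; the stream processing layer would then spread its REST calls across the resulting ports.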

6. Experimental Evaluation

In this section, we discuss the performance evaluation of the optimized iPrescribe architecture.

6.1. Deployment System Details

We have deployed the technology stack of iPrescribe on a six node cluster, each node with the following configuration: Intel CPU dual core with 56 cores and 256 GB RAM with 1 GB NIC. The technology choices made during design exploration of the iPrescribe architecture and their deployment details with tuning are given in Table 2 and Table 3 respectively. Kafka and MongoDB have low resource utilization in this case; therefore, they are deployed on shared nodes.

Technology      Deployment         Tuning
Kafka 2.10      3 nodes cluster    10 partitions at each node
Ignite 1.9      2 nodes cluster    Cache Memory = Off-Heaped, Indexing on UserId, Java Heap Size of Server = 4GB, Cache-Partition = UserId
HDFS 2.6        1 node             Default
Tornado 4.5     2 shared nodes     Default

Table 3. Technology Deployment and Tuning

6.2. Benchmark Workload

The performance of the iPrescribe architecture is evaluated on the PAKDD (pak, 2017) and Instacart (ins, 2017) data sets. PAKDD has 22 million transaction records, including 0.3 million impression records. The built model predicts whether a customer will click the given advertisement. Instacart has 2,06,209 users and 50,000 products, and the built model predicts whether a particular customer will buy a particular product. The model can be applied to all products to predict the products in a customer’s basket for the next order. The model is built on the initial data set in the Startup interface discussed in Section 3.1. We have extrapolated the impression records of PAKDD to generate a large number of impression records. These impression records are played as a stream and fed to the real time interfaces of iPrescribe. The records are sorted on clock time, so the stream simulates the behaviour of user clicks on a transaction system. Similarly, the test data of Instacart, consisting of user records, is played as a stream to benchmark the real time interfaces. We control the ingestion rate of the records and measure system throughput and utilization. For example, for the PAKDD data set, the Recommend interface is fed a stream of impression records, and the FeatureUpdate interface is fed a stream with a mix of all order, view and impression records. We have benchmarked iPrescribe with 100% of the workload on the Recommend interface, with 100% of the workload on the FeatureUpdate interface, and with a controlled ingestion rate on both interfaces in the ratio of 80% on FeatureUpdate and 20% on Recommend.

6.3. Performance Results

The Startup interface reads the CDF file, prepares users’ concatenated feature one-hot vectors in parallel by processing transaction records using the meta-model, and builds the ensemble of XGBoost and LSTM models in the Python framework. The execution time for this interface is 183 minutes and 269 minutes for the PAKDD and Instacart data sets respectively. Table 4 gives the breakdown into feature creation and training time for each model. The XGBoost model for Instacart is trained for 10x fewer users than the PAKDD model; therefore, the XGBoost startup execution time is lower for Instacart. The LSTM model requires the history of each sample during training, and the LSTM model for Instacart is trained for all user-product pairs, which is 30x more than in PAKDD; this leads to a higher LSTM model training time.

Data Set     XGBoost                  LSTM
             Features    Training     h, c states    Training
PAKDD        90          6            20             67
Instacart    6           21           12             230

Table 4. Startup Interface Execution Time (in minutes)

Stack    T4     T5     T6     T7    T8     T9     T10      T11
IP       7.3    0.5    0.5    14    0.5    0.5    10.52    3

Table 5. PAKDD: Average Timings as shown in Fig 2

Figure 4. iPrescribe Performance for Recommend Interface

The Recommend interface has two performance metrics, recommendation latency (RL) and throughput, as given in Section 3.1. RL is measured on a single node, starting with an ingestion rate of 100 msg/sec and gradually increasing to 1000 msg/sec. In our experimental setup, several of the time components shown in Fig 2 are equal to 1 ms. For the Instacart data set, the model inference time per user-product pair is similar; however, model inference needs to be done for all user-product pairs to predict a user’s basket. Therefore, for an average of 40 products per user in a basket, the time to fetch the concatenated feature one-hot vectors for the XGBoost model is 81 ms and the time for LSTM model inference is 161 ms in the stream processing layer.

This implies a stream processing time of 81+161=242 ms, and the XGBoost model inference time in PWF per user is 480 ms. The feature fetch time does not increase linearly with the number of products, as opposed to the model inference time. The feature store is indexed on user id, so a single fetch from Ignite Cache gets the concatenated feature one-hot vectors for a user on all products. Using the equation in Section 3.1.2, the average recommendation latency per user for the PAKDD and Kaggle Instacart challenges is 29.52 ms and 842 ms respectively. Fig 4 shows the throughput and average recommendation latency of the iPrescribe architecture with increasing data ingestion rate. We see a linear increase in throughput with increasing workload, while the recommendation latency remains the same as we increase the ingestion rate. However, our experiments have shown that Ignite supports a linear increase in throughput, as we keep increasing the message ingestion rate, only until the CPU utilization of the cluster reaches 80%. There is a sharp increase in the processing time of each message after the CPU utilization hits 80%. Therefore, for high scale iPrescribe, the Ignite layer should scale out at 80% utilization. The FeatureUpdate interface processing time, using the equation in Section 3.1.3 and Table 5, is 7.3 ms, which includes 6 ms for updating the hidden and cell states for the LSTM model and 1.3 ms for updating the feature store. FeatureUpdate processing is done in Ignite in parallel across all the available cores; therefore, FeatureUpdate throughput increases linearly with the number of cores and the data ingestion rate, as shown in Fig 5. Fig 5 also shows that the throughput of the Recommend interface does not degrade in the presence of processing on the FeatureUpdate interface.
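The reported totals can be cross-checked against Tables 4 and 5 with a few lines of Python. The split used below, with T4 as the FeatureUpdate processing time and T5 to T11 as the Recommend path, is an inference from the numbers (7.3 ms and 29.52 ms) rather than something stated explicitly in the text.

```python
# Table 5: average timings for the IP stack on PAKDD, in milliseconds.
table5_ip = {"T4": 7.3, "T5": 0.5, "T6": 0.5, "T7": 14,
             "T8": 0.5, "T9": 0.5, "T10": 10.52, "T11": 3}

# Assumption of this sketch: T4 belongs to FeatureUpdate, T5..T11 to Recommend.
feature_update_ms = table5_ip["T4"]
recommend_ms = round(sum(v for k, v in table5_ip.items() if k != "T4"), 2)

# Table 4: startup times in minutes (XGBoost features, XGBoost training,
# LSTM state computation, LSTM training).
table4 = {"PAKDD": [90, 6, 20, 67], "Instacart": [6, 21, 12, 230]}
startup_minutes = {name: sum(parts) for name, parts in table4.items()}

print(feature_update_ms, recommend_ms, startup_minutes)
```

Under this reading, the sums reproduce the 29.52 ms average recommendation latency for PAKDD and the 183 and 269 minute startup times quoted in the text.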

Figure 5. iPrescribe Throughput for Mix workload

7. Conclusions and Future Work

We have presented the design and different technology stacks for iPrescribe, a scalable system with low recommendation latency for next best offers in an online setting for B2C scenarios. We have shown the performance of iPrescribe on two publicly available e-commerce data sets. The prediction model has been built as an ensemble of XGBoost and a deep learning LSTM network. The accuracy of the model has been shown to be AUC=0.67 and F-score=0.3843 on these data sets. We have discussed model specific optimizations and the tuning of various technology stacks to achieve low recommendation latency. LSTM network deployment is optimized by storing the hidden state $h_{t-1}$ and cell state $c_{t-1}$ for inference, which are updated with each transaction message. The optimized iPrescribe architecture, using a technology stack of Kafka, Ignite, Tornado and HDFS, is shown to support high scale throughput and a 90th percentile recommendation latency of 38 ms.

iPrescribe needs ground truth (labelled data) to build a model, so it has a cold start problem. In future, we shall augment iPrescribe with a user behaviour model to capture user personas, and with Snorkel (Alexander Ratner and Ré, 2017) to generate labels for data sets to build models.


  • hdf (2013) 2013. HDFS. [Online; accessed 27. Feb. 2018].
  • Sel (2016) 2016. Seldon: Open source recommendation system.
  • jso (2017) 2017. JSON. [Online; accessed 27. Feb. 2018].
  • ins (2017) 2017. Kaggle Instacart Challenge.
  • rs (2017) 2017. List of Open Source Recommendation Systems.
  • pak (2017) 2017. PAKDD Recobell Challenge.
  • ign (2018) 2018. Apache Ignite - Open source memory-centric distributed database, caching, and processing platform. [Online; accessed 27. Feb. 2018].
  • kaf (2018) 2018. Apache Kafka - A distributed streaming platform. [Online; accessed 27. Feb. 2018].
  • spa (2018) 2018. Apache Spark™- Lightning-Fast Cluster Computing. [Online; accessed 27. Feb. 2018].
  • mon (2018) 2018. MongoDB for GIANT Ideas. [Online; accessed 27. Feb. 2018].
  • pwf (2018) 2018. Web Frameworks for Python. [Online; accessed 27. Feb. 2018].
  • pys (2018) 2018. Welcome to Spark Python API Docs. [Online; accessed 27. Feb. 2018].
  • xgb (2018) 2018. XGBoost - Extreme Gradient Boosting. [Online; accessed 27. Feb. 2018].
  • Agarwal et al. (2011) Alekh Agarwal, Olivier Chapelle, Miroslav Dudík, and John Langford. 2011. A Reliable Effective Terascale Linear Learning System. CoRR abs/1110.4198 (2011). arXiv:1110.4198
  • Alexander Ratner and Ré (2017) Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. 2017. Snorkel: Rapid Training Data Creation with Weak Supervision.
  • Braun et al. (2015) Mikio L. Braun, Johannes Schaback, Matthias L. Jugel, Nicolas Oury, et al. 2015. Jblas: Linear Algebra for Java. [Online; accessed 27. Feb. 2018].
  • Chantat Eksombatchai and Leskovec (2017) Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, and Jure Leskovec. 2017. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time.
  • Cheng Ju (2017) Cheng Ju, Aurélien Bibaut, and Mark J. van der Laan. 2017. The Relative Performance of Ensemble Methods with Deep Convolutional Neural Networks for Image Classification.
  • Chongxiao Cao and Waddington (2013) Chongxiao Cao, Fengguang Song, and Daniel G. Waddington. 2013. Implementing a high-performance recommendation system using Phoenix++. In IEEE 8th International Conference for Internet Technology and Secured Transactions (ICITST).
  • Davidson et al. (2010) James Davidson, Palash Nandy, Benjamin Liebald, Taylor Van Vleet, and Junning Liu. 2010. The YouTube video recommendation system. In Proceedings of the fourth ACM conference on Recommender systems, RecSys ’10. ACM, 293–296.
  • Ekstrand et al. (2011) Michael D Ekstrand, John T Riedl, Joseph A Konstan, et al. 2011. Collaborative filtering recommender systems. Foundations and Trends® in Human–Computer Interaction 4, 2 (2011), 81–173.
  • FriendFeed (2018a) FriendFeed. 2018a. Flask web development framework. [Online; accessed 27. Feb. 2018].
  • FriendFeed (2018b) FriendFeed. 2018b. Tornado Web Server. [Online; accessed 27. Feb. 2018].
  • HengTze Cheng and Shah (2016) HengTze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide and Deep Learning for Recommender Systems.
  • HenryChan (2008) Chu Chai HenryChan. 2008. Intelligent value-based customer segmentation method for campaign management: A case study of automobile retailer. Expert Systems with Applications 34, 4 (May 2008), 2754–2762.
  • Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.
  • Kazmi et al. (2016) Auon Haidar Kazmi, Gautam Shroff, and Puneet Agarwal. 2016. Generic Framework to Predict Repeat Behavior of Customers Using Their Transaction History. In IEEE/WIC/ACM International Conference on Web Intelligence (WI). IEEE Proceedings.
  • Levandoski et al. (2011) Justin J. Levandoski, Michael D. Ekstrand, Michael Ludwig, Ahmed Eldawy, Mohamed F. Mokbel, and John Riedl. 2011. RecBench: Benchmarks for Evaluating Performance of Recommender System Architectures. PVLDB 4, 11 (2011), 911–920.
  • Salehinejad and Rahnamayan (2016) H. Salehinejad and S. Rahnamayan. 2016. Customer shopping pattern prediction: A recurrent neural network approach. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI). 1–6.
  • Sheth and Kaiser (2013) Swapneel Sheth and Gail Kaiser. 2013. Towards Using Cached Data Mining for Large Scale Recommender Systems. Springer Berlin Heidelberg, Berlin, Heidelberg, 349–357.
  • Srinivasan et al. (2016) V. Srinivasan, Brian Bulkowski, Wei-Ling Chu, Sunil Sayyaparaju, Andrew Gooding, Rajkumar Iyer, Ashish Shinde, and Thomas Lopatic. 2016. Aerospike: Architecture of a Real-time Operational DBMS. Proc. VLDB Endow. 9, 13 (Sept. 2016), 1389–1400.
  • Verbitskiy et al. (2015) Ilya Verbitskiy, Patrick Probst, and Andreas Lommatzsch. 2015. Development and Evaluation of a Highly Scalable News Recommender System. In Working Notes of the 6th International Conference of the CLEF Initiative. CEUR Workshop Proceedings, 10. Vol-1391, urn:nbn:de:0074-1391-8.
  • Zhang et al. (2014) Yuyu Zhang, Hanjun Dai, Chang Xu, Jun Feng, Taifeng Wang, Jiang Bian, Bin Wang, and Tie-Yan Liu. 2014. Sequential Click Prediction for Sponsored Search with Recurrent Neural Networks. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI’14). AAAI Press, 1369–1375.