DNN-based with Click-sequence-aware Mechanism for Notifications/Pop-ups Recommendation
I Know You’ll Click
With the emergence of mobile and wearable devices, push notifications have become a powerful tool for connecting with app users and maintaining that relationship, but sending inappropriate messages, or too many messages at the wrong time, may lead users to uninstall the App. To maintain both the retention rate and the advertisement delivery rate, we adopt a Deep Neural Network (DNN) to develop a notification/pop-up recommendation system, the "Click-sequence-aware Deep Neural Network (CDNN)", enabled by hybrid user behavioral analysis based on collaborative filtering. We verified the system with real data collected from the products Security Master, Clean Master and CM Browser, supported by Leopard Mobile Inc. (Cheetah Mobile Taiwan Agency). In this way, we can precisely learn users' preferences and how often they click push notifications/pop-ups, efficiently reduce annoyance to users, and at the same time increase the click-through rate of push notifications/pop-ups.
Smartphones have become a necessity in daily life. According to a 2016 report by International Data Corporation (IDC), Android's share of the smartphone market reached 86.8% in Q3 2016, making it the operating system used by the large majority of users. According to statistics from the app analytics company App Annie, global downloads reached 26 billion in Q3 2017, excluding App updates and repeated downloads. Spending also broke the historical record: users spent about 325 billion hours in total, an increase of about 40% over 2016. The App economy shows no sign of slowing, and this growth drives the online advertising industry, in which targeted advertising and click-through rate prediction play an important role in both the user experience and the revenue of a system. Meanwhile, Apps commonly send pop-ups as a push notification service to inform users of important information or alerts. To avoid disturbing users, the current practice is to show a simple graphic and a short message in the status bar, revealing the full content only when the user drags the notification panel down; the user opens the message simply by tapping it, as shown in Fig. 1. By reminding users of their smartphone's operating condition, we can increase the open rate and the number of advertisement impressions. Used properly, push notifications increase App usage and, in turn, the number of advertisement impressions. Used inappropriately, however, they annoy users and may lead them to remove the App. Before thinking about how to improve targeted advertising and its click-through rate, we should therefore think about how to reduce this annoyance, which requires precisely predicting each user's preferences and how often they click push notifications.
In recent years, deep learning has made important progress in many fields, including image recognition, speech recognition and natural language processing, and it is now in a period of rapid development. Its essence is to extract deep, abstract features from large amounts of data in order to learn effective feature representations and complex mappings, and thereby build effective data models. In the era of big data, users' individual needs keep growing, and a major research challenge is how to help users efficiently obtain the information they need. At present, the most common type of information processing system is the search engine: users submit queries and the system returns search results. In the other type, users do not need to explicitly submit queries or state their interests; instead, the system uses automated algorithms to push information to them. The classic example is a recommendation system.
Meanwhile, online advertising has been a hot topic, with a large number of research papers and patents to date. Many startup companies have also entered this area and generated revenue, and their forecasting techniques have improved over time. Because smartphones are mobile and hold information such as personal data, browsing history, shopping history and financial details, users' online behavior can be tracked to build personalized knowledge and to deliver advertisements based on their preferences, interests and needs. A common method for this is Collaborative Filtering, which builds behavioral models of users on the premise that users with similar profiles tend to respond similarly to advertisements. The data collected by our system show that user behavior toward push notifications closely resembles behavior toward online advertisements. Exploiting the similarity between users, we aim to use this observation to increase the advertisement display rate. However, to our knowledge, little research has addressed this specific topic, especially targeting and forecasting for push notification services. This is because the revenue generated by push notifications is far less than the direct revenue from online advertising. The main contribution of this work is to verify our "Click-sequence-aware Deep Neural Network (CDNN)" and demonstrate its effect in reducing annoying pop-ups and increasing the display and click-through rates of advertisements.
II Related Work
II-A Deep Learning
Recent success in deep learning research and development has attracted wide attention. AlphaGo from Google DeepMind achieved great success in computer Go, and the deep learning behind AlphaGo received huge attention from both the public and academia. In 2015, Google released TensorFlow, which provides a flexible framework for experimenting with all kinds of deep neural networks, including large-scale distributed training. Deep learning is a specific type of machine learning. More specifically, deep learning uses artificial neural networks in which multiple layers of neurons are interconnected with different weights and activation functions to learn the hidden relationship between input and output. Intuitively, input data is fed to the first layer, which generates different combinations of the input. These combinations, after the activation function, are fed to the second layer, and so on. In this procedure, the different combinations of the outputs of each previous layer can be seen as different representations of features. The weights on the links between layers are adjusted by backpropagation according to the distance, or loss function, between the true output label and the label computed by the neural network. Note that deep learning can be seen as a neural network with a large number of layers. After this learning process over multiple layers, we obtain a better understanding and representation of distinguishable features, which enhances detection accuracy. Note also that the effectiveness of deep learning generally increases with the network size.
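The weight-update rule described above can be illustrated with a minimal sketch, assuming the simplest possible network: a single sigmoid neuron trained by stochastic gradient descent on binary cross-entropy loss. This is not the CDNN training code; the toy data, the learning rate `lr` and the epoch count are hypothetical choices made only to show how the error between the true label and the predicted label drives each weight change.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Output of one sigmoid neuron for input vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def bce_loss(samples, w, b):
    """Mean binary cross-entropy between true labels and predicted probabilities."""
    total = 0.0
    for x, y in samples:
        p = predict(w, b, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(samples)

def train_neuron(samples, lr=0.5, epochs=200):
    """Stochastic gradient descent for a single sigmoid neuron.

    With p = sigmoid(w.x + b) and binary cross-entropy loss, the gradient of the
    loss with respect to the pre-activation is (p - y), so each weight moves by
    -lr * (p - y) * x_i. This is backpropagation collapsed to a single layer.
    """
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = predict(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Hypothetical toy data: the label is 1 exactly when the first feature is 1.
data = [([1.0, 0.0], 1), ([1.0, 1.0], 1), ([0.0, 1.0], 0), ([0.0, 0.0], 0)]
w, b = train_neuron(data)
```

Starting from all-zero weights, the loss begins at ln 2 and falls as training proceeds; a deep network applies the same chain-rule update layer by layer, from the output back to the input.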
Besides fully connected deep neural networks, the best-known deep networks are convolutional neural networks (CNNs). Representative CNNs include AlexNet, VGG, GoogLeNet, Inception-v3 and ResNet. More specifically, a CNN is composed of hidden layers, fully connected layers, convolution layers and pooling layers. The hidden layers are used to increase the complexity of the model. Because each convolutional neuron is connected only to a local region of the input image and shares its weights across all positions, the number of parameters is significantly reduced compared with a fully connected layer on the same input, which suits the spatial structure of images much better.
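The parameter saving from local connectivity and weight sharing can be made concrete with simple arithmetic. The sketch below assumes a hypothetical 224x224 RGB input and 64 output units or filters, and compares the weight count of a fully connected layer with that of a 3x3 convolutional layer; the sizes are illustrative and not taken from any specific network cited above.

```python
def dense_params(in_h, in_w, in_c, out_units):
    """Fully connected layer: every output unit sees every input value, plus a bias."""
    return in_h * in_w * in_c * out_units + out_units

def conv_params(kernel_h, kernel_w, in_c, out_c):
    """Convolutional layer: each filter sees only a local kernel window and is
    shared across all spatial positions, so the count is independent of image size."""
    return kernel_h * kernel_w * in_c * out_c + out_c

# Hypothetical 224x224 RGB input, 64 output units (dense) or 64 filters (conv).
print(dense_params(224, 224, 3, 64))  # 9633856
print(conv_params(3, 3, 3, 64))       # 1792
```

The convolutional layer still produces a full 64-channel feature map, yet needs more than 5,000 times fewer parameters.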
II-B Recommendation System
A popular recent topic is profiling user behavior from various historical records and click-throughs in order to make forecasts and improve the system. One method improves the user experience of a recommendation system (e.g., in e-commerce) through explicit feedback such as ratings or satisfaction votes; another stores users' browsing activities and actively tracks their behavior as implicit feedback. Collaborative Filtering (CF) is a further method: it relates new users to existing ones and makes forecasts from the relationships between users and the interdependencies between items, and once a user has shown enough purchasing behavior, it can recommend directly from that behavior. Koren et al. considered CF-based methods to be better than content-based methods. Similar CF problems arise on social networking sites, in photo tagging, in the websites visited during a browsing session, in the articles bought by a customer, and so on. Verstrepen and Goethals reformulated the user-based and item-based nearest-neighbor algorithms for one-class collaborative filtering, and used this reformulation to propose a novel algorithm that incorporates the best of both worlds and outperforms state-of-the-art algorithms. Aiolli focused on memory-based collaborative filtering algorithms similar to the well-known neighbor-based technique for explicit feedback; starting from the definition of suitable similarity and scoring functions, together with suggestions on how to aggregate multiple ranking strategies, the overall recommendation is defined.
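To make the neighbor-based idea concrete, the following is a minimal sketch of textbook user-based nearest-neighbor CF for one-class (click-only) feedback, using cosine similarity between users' sets of clicked items. It is not the unified algorithm of Verstrepen and Goethals nor the method of Aiolli, and the user IDs, item names and neighborhood size `k` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two users' sets of clicked items (binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(target, clicks, k=2):
    """User-based nearest-neighbor CF on implicit click feedback.

    Each item the target has not clicked is scored by summing the similarities
    of the k most similar users who did click it; the highest score comes first.
    """
    neighbors = sorted(
        ((cosine(clicks[target], items), user)
         for user, items in clicks.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbors:
        for item in clicks[user] - clicks[target]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical click histories.
clicks = {
    "u1": {"memory_boost", "junk_clean"},
    "u2": {"memory_boost", "junk_clean", "cpu_cooler"},
    "u3": {"memory_boost", "wifi_boost"},
    "u4": {"battery_saver"},
}
ranked = recommend("u1", clicks)
```

Here `u2` shares two of `u1`'s clicks and `u3` shares one, so `cpu_cooler` is ranked above `wifi_boost`, while `u4`, with no overlap, contributes nothing.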
Combining deep learning with recommendation systems, Covington et al. proposed a collaborative-filtering-based deep neural network recommendation system for YouTube. The system consists of two neural networks, one for generating candidate videos and the other for ranking them. Together, these two networks and their inputs decide what users see on YouTube, such as the recommended next video, the recommended video list and the browsing video list. Google also proposed Wide & Deep learning, in which the wide part provides memorization and the deep part provides generalization. Taking our system as an example (see Fig. 2), suppose that, at a specific time and phone status, the user clicked the "over memory usage" pop-up for background App cleaning, then also clicked the "garbage removal" pop-up for cleaning App cache files that our system recommended, but did not click "the handset is too warm". Our system needs to record how much the user prefers each such pairing. If we want to recommend pop-ups that let users explore similar functions, we can map each pop-up into a feature space, find the closest ones the user might like, and recommend them. For example, "junk clean up" and "scrapbook clean up" are very close, so if we also recommend "scrapbook clean up", the user may click it as well. By training on both logics together, the model learns to strike a balance between these two different needs.
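The two needs in the Fig. 2 example can be sketched separately, assuming hypothetical pop-up identifiers and hand-written embedding vectors. Memorization is shown as a table of click-through rates for crossed "previously clicked pop-up x candidate pop-up" pairs, and generalization as a nearest-neighbor lookup in an embedding space. In actual Wide & Deep learning, the weights of the crossed features and the embeddings are learned jointly by one model; this sketch only counts clicks and compares fixed vectors.

```python
import math
from collections import Counter

# Wide part (memorization): how often each crossed pair was shown and clicked.
shown, clicked = Counter(), Counter()

def record(prev, cand, was_clicked):
    """Log one impression of candidate pop-up `cand` after the user clicked `prev`."""
    shown[(prev, cand)] += 1
    clicked[(prev, cand)] += int(was_clicked)

def pair_ctr(prev, cand):
    n = shown[(prev, cand)]
    return clicked[(prev, cand)] / n if n else 0.0

# Deep part (generalization): hypothetical embeddings for pop-up types.
EMBEDDINGS = {
    "junk_clean_up": [0.9, 0.1, 0.0],
    "scrapbook_clean_up": [0.8, 0.2, 0.1],
    "handset_too_warm": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(popup):
    """Closest other pop-up in embedding space, a candidate for exploration."""
    others = [p for p in EMBEDDINGS if p != popup]
    return max(others, key=lambda p: cosine(EMBEDDINGS[popup], EMBEDDINGS[p]))

# The Fig. 2 scenario, with hypothetical identifiers.
record("over_memory_usage", "garbage_removal", True)
record("over_memory_usage", "handset_too_warm", False)
```

With these vectors, `most_similar("junk_clean_up")` returns `"scrapbook_clean_up"`, mirroring the example in the text.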
Overall, AI technologies based on deep learning and machine learning are yielding new insights into the world. For example, based on business data, we can forecast future sales to improve sales performance and productivity. According to a 2017 IDC forecast, global spending on cognitive and AI systems would reach 12.5 billion US dollars in 2017, a growth of 59.3% over 2016. The goal of this work is to describe our motivation and implementation decisions, and to identify the challenges and strengths of our approach.
III Our Proposed System
There are three stages in developing our system: data generation, system building and model deployment. As in all machine-learning scenarios, feature engineering is a time- and resource-consuming process. The chosen features determine how well the model predicts the click-through rate, what potential problems arise, and how accurate the model is on unseen data. More data is not always better. In addition, our features range from categorical to continuous or ordinal, and their cardinalities differ enormously. Some are binary (for example, "is_charging", "applock_enabled", "notification_cleaner_enabled"), while others may have millions of possible values (for example, remaining_storage, remaining_ram, storage, ram). It is more important to find the relationships among combinations of features.
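The paper does not state how such heterogeneous features are encoded, so the following is only a hypothetical sketch of one common option. Binary flags are kept as 0/1, and high-cardinality counts are mapped to log-scale buckets and scaled to [0, 1], so that a value with millions of possibilities does not dominate the binary inputs. The feature names come from the text; the assumption that `remaining_storage` and `remaining_ram` are integer byte counts, and the cap `max_bucket=40`, are illustrative choices.

```python
BINARY = ["is_charging", "applock_enabled", "notification_cleaner_enabled"]
NUMERIC = ["remaining_storage", "remaining_ram"]  # assumed to be integer byte counts

def encode(raw, max_bucket=40):
    """Turn one raw device record into a fixed-length numeric vector (sketch).

    Binary flags become 0.0/1.0. Numeric values become a log2 bucket index
    (int.bit_length() equals floor(log2(v)) + 1 for v >= 1, and 0 for v == 0),
    clipped at max_bucket and divided by it so every entry lies in [0, 1].
    """
    vec = [1.0 if raw.get(name) else 0.0 for name in BINARY]
    for name in NUMERIC:
        bucket = max(0, int(raw.get(name, 0))).bit_length()
        vec.append(min(bucket, max_bucket) / max_bucket)
    return vec

# Hypothetical record: charging, no app lock, 8 GiB storage and 1 GiB RAM free.
example = {
    "is_charging": True,
    "applock_enabled": False,
    "notification_cleaner_enabled": True,
    "remaining_storage": 2 ** 33,
    "remaining_ram": 2 ** 30,
}
features = encode(example)
```

Bucketing keeps the ordering of large values while compressing their range; an embedding per bucket, which a deep model could learn, is a natural next step.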
Model accuracy is a critical metric, but in our system we must be especially aware of the no-information rate. Across a large sample of users, the click-through rate is only about 10% (as shown in Fig. 3). A model trained on such imbalanced data learns to predict that users will not click any recommended pop-up, and still reaches an accuracy of 90%. To address this, we use a Random Forest model, which reports feature importance, to screen features in advance.
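The 90% figure follows directly from the no-information rate, i.e., the accuracy of always predicting the majority class. The sketch below computes it for a hypothetical sample with a 10% click-through rate. It also computes "balanced" class weights (n / (2 * n_c)), a common remedy for class imbalance that the paper does not mention; it is included here only as an illustration alongside the Random Forest screening described above.

```python
def no_information_rate(labels):
    """Accuracy of a model that always predicts the majority class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)

def balanced_class_weights(labels):
    """Weights n / (2 * n_c) so that clicks and non-clicks contribute equally to the loss."""
    n = len(labels)
    pos = sum(labels)
    neg = n - pos
    return {0: n / (2 * neg), 1: n / (2 * pos)}

# Hypothetical sample: 10 clicks (1) out of 100 displayed pop-ups.
labels = [1] * 10 + [0] * 90
print(no_information_rate(labels))     # 0.9
print(balanced_class_weights(labels))  # class 0 -> 100/180, class 1 -> 5.0
```

Any model must clearly beat 0.9 accuracy on such data before it has learned anything about clicks, which is why precision, recall or AUC on the click class are more informative here.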
Users differ slightly across countries and regions, and even between individuals. In this study, we preprocess the data according to the different conditions and mechanisms of our system, and we also categorize the attributes associated with users' contextual behavior. For example, "noti_display_30" and "noti_click_30" are the numbers of pop-ups displayed and clicked in the preceding 30 minutes, "noti_display_60" and "noti_click_60" are the corresponding counts for the preceding 60 minutes, and further features include noti_click_last, noti_click_last2 and notification_cancel_count. Our ranking mechanism assigns an independent score to each pop-up and feeds the result back to the user. Our ultimate goal is to adjust continuously according to real-time A/B test results. With this ranking mechanism, users receive pop-up notifications that not only reflect their smartphone usage but also remind them at the right time to protect their smartphone with our core features.
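The windowed click-sequence features can be sketched as simple counts over a user's event log. This assumes each window covers the N minutes before the current request and that the log is a list of `(timestamp, kind)` pairs with kind `"display"` or `"click"`; both the log format and the window interpretation are assumptions, and features whose exact definition the text does not give (such as noti_click_last) are omitted.

```python
from datetime import datetime, timedelta

def window_features(events, now, windows=(30, 60, 120)):
    """Count pop-up displays and clicks in the preceding N minutes.

    events: list of (timestamp, kind) pairs, kind being "display" or "click".
    Returns keys such as noti_display_30 and noti_click_30 for each window.
    """
    feats = {}
    for minutes in windows:
        start = now - timedelta(minutes=minutes)
        recent = [kind for ts, kind in events if start <= ts < now]
        feats[f"noti_display_{minutes}"] = recent.count("display")
        feats[f"noti_click_{minutes}"] = recent.count("click")
    return feats

# Hypothetical event log for one user.
now = datetime(2017, 10, 1, 12, 0)
events = [
    (now - timedelta(minutes=10), "display"),
    (now - timedelta(minutes=9), "click"),
    (now - timedelta(minutes=45), "display"),
    (now - timedelta(minutes=100), "display"),
]
feats = window_features(events, now)
```

Because the windows are nested, the 120-minute counts always include the 30- and 60-minute ones, letting the model see both short-term and longer-term click behavior.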
Our hybrid model is briefly described below:
Device Layer: smartphone usage conditions such as remaining_storage, remaining_ram, remaining_battery, installed_day_count;
Process Layer: active or passive use of our product's functions, such as active_scan_count, passive_scan_count, active_clean_count, passive_clean_count, active_boost_count, passive_boost_count, active_battery_saver_count, passive_battery_saver_count, applock_enabled, notification_cleaner_enabled, private_browsing_count, wifi_test_count, wifi_boost_count, notification_display_count, notification_click_count;
Ranking Layer: features adjusted according to the pop-up click-through results, such as noti_display_30, noti_click_30, noti_display_60, noti_click_60, noti_click_120, noti_display_120, notification_cancel_count, is_null.
Fig. 4 shows the flow chart of our system, which consists of the following six steps:
Collecting user’s preference for modeling;
Analyzing and initializing the model;
Uploading data to backend;
Training via Tensorflow and Keras;
Calculating the pop-up parameters of each user to control the number of pop-ups;
Adjusting the model according to whether the user clicks the pop-ups.
Our model is briefly described as follows:
the activation function of the hidden layers is ReLU.
for the first layer, the number of input neurons is 80 while the number of output neurons is 40.
for the second layer, the number of input neurons is 40 while the number of output neurons is 20.
for the third layer, the number of input neurons is 20 while the number of output neurons is 10.
for the fourth layer, the number of input neurons is 10 while the number of output neurons is 5.
for the fifth layer, the number of input neurons is 5 while the number of output neurons is 1.
the last layer is the output layer with the sigmoid activation function.
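The layer list above can be sketched as a plain forward pass, assuming ReLU on the four hidden layers and a sigmoid on the final single unit, as stated. In Keras this architecture corresponds to a stack of Dense layers; the pure-Python version below only shows the shapes and the data flow. The Glorot-uniform initialization and the zero biases are assumptions, since the text does not specify initialization, and the weights here are untrained.

```python
import math
import random

LAYER_SIZES = [80, 40, 20, 10, 5, 1]  # input width, then each layer's output width

def init_params(sizes, seed=0):
    """Random (weights, biases) for each fully connected layer (Glorot-uniform, assumed)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        limit = math.sqrt(6.0 / (n_in + n_out))
        weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params

def forward(params, x):
    """ReLU on hidden layers, sigmoid on the output unit -> click probability."""
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(params) - 1:
            x = [max(0.0, v) for v in z]
        else:
            x = [1.0 / (1.0 + math.exp(-v)) for v in z]
    return x[0]

def count_params(sizes):
    """Weights plus biases over all layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

params = init_params(LAYER_SIZES)
p_click = forward(params, [0.5] * 80)  # hypothetical 80-dimensional feature vector
```

Under these assumptions the network has 4,331 trainable parameters, which is small enough to score a request within the millisecond-level latency budget of online serving.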
Finally, once the model has been trained and validated, we deploy it on the backend server. For each user request, the server receives the candidate options from the App, scores them according to the status uploaded from the user's smartphone, which takes 10 milliseconds per request, and then displays them to the user from the highest score to the lowest.
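The serving step, scoring every candidate against the uploaded phone status and returning them from highest to lowest score, can be sketched as below. The `toy_score` function, the status keys and the pop-up names are hypothetical stand-ins; in the deployed system the score would come from the trained model rather than from hand-written rules.

```python
def rank_popups(candidates, status, score_fn):
    """Score every candidate pop-up for the uploaded status and sort highest first."""
    scored = [(score_fn(status, popup), popup) for popup in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [popup for _, popup in scored]

def toy_score(status, popup):
    """Hypothetical stand-in for the trained model's click probability."""
    if popup == "memory_boost":
        return 1.0 - status["remaining_ram_ratio"]
    if popup == "junk_clean":
        return 1.0 - status["remaining_storage_ratio"]
    return 0.1

status = {"remaining_ram_ratio": 0.2, "remaining_storage_ratio": 0.6}
order = rank_popups(["cpu_cooler", "junk_clean", "memory_boost"], status, toy_score)
```

Passing the scorer as a function keeps the ranking logic independent of the model, so a retrained model can be swapped in without changing the serving code.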
IV Experimental Results
Our data was collected from the products Security Master, Clean Master and CM Browser, supported by Leopard Mobile Inc. (Cheetah Mobile Taiwan Agency). By December 2016, the company's core products had reached 3,810 million installations globally, with 623 million monthly active users. We selected a subset of notifications and countries for our experiment. The period from September 24th to 30th, 2017 is regarded as Week 1, and the period from October 7th to 14th as Week 2. We observed the growth rate of the number of pop-up displays and the overall click-through rate. From all the notification pop-ups, we picked "over-used in memory" (noti 1), "mobile temperature is too high" (noti 2) and "cleaning the garbage" (noti 3) for data collection and training. For noti 1, as shown in Fig. 5, the Week 1 results indicate that the number of pop-ups decreased dramatically, and annoyance to users decreased with it. Through automatic adjustment by our ranking mechanism, the number of pop-ups continued to decrease in Week 2, and, as shown in Fig. 6, the click-through rate in Week 2 continued to increase compared with Week 1. To verify that the initial, process and ranking mechanisms of our system can adjust automatically and effectively, we applied the model to noti 2 and noti 3 (as shown in Fig. 7 and Fig. 8). Between 2:00 AM and 7:00 AM, the number of pop-ups was similar to that of noti 1 pop-ups, with no significant decrease. However, the number of noti 2 and noti 3 pop-ups in Week 2 was lower than in Week 1: for both noti 2 and noti 3, the number of pop-ups decreased from Week 1 to Week 2 between 8:00 AM and 6:00 PM as well as between 7:00 PM and 12:00 AM. At the same time, the click-through rate increased.
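The two quantities tracked in this comparison, week-over-week growth in pop-up displays and the click-through rate, reduce to simple ratios. The sketch below uses made-up counts purely for illustration; they are not the values behind Figs. 5 to 8.

```python
def ctr(clicks, displays):
    """Click-through rate: clicks divided by displayed pop-ups."""
    return clicks / displays if displays else 0.0

def growth_rate(before, after):
    """Relative change from one week to the next; -0.2 means a 20% drop."""
    return (after - before) / before

# Illustrative counts only (not the paper's data).
week1 = {"displays": 1000, "clicks": 80}
week2 = {"displays": 800, "clicks": 80}
display_growth = growth_rate(week1["displays"], week2["displays"])
ctr1 = ctr(week1["clicks"], week1["displays"])
ctr2 = ctr(week2["clicks"], week2["displays"])
```

In this made-up case, showing 20% fewer pop-ups while keeping the same number of clicks raises the click-through rate from 8% to 10%, which is the kind of shift the experiment looks for.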
Besides reducing the number of annoying pop-ups and increasing the click-through rate, it is also important to maintain the retention rate of a mobile App. The 7-day retention rate is a critical indicator: it is the number of a given day's new users who open the App at least once in the following 7 days, divided by the number of new users on that day. As shown in Fig. 9, the 7-day retention rate in Week 1 and Week 2 fluctuated between 44.05% and 44.6%. In Week 3 (2017/10/08-2017/10/14) and Week 4 (2017/10/15-2017/10/21), however, it increased and remained at about 46.7%.
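The 7-day retention definition can be written directly as a small function. The sketch assumes the "following 7 days" are the seven calendar days after the install day (day + 1 through day + 7) and that activity is available as a set of active dates per user; both are assumptions, and the user IDs and dates are hypothetical.

```python
from datetime import date, timedelta

def seven_day_retention(install_day, new_users, activity):
    """Share of users first seen on install_day who open the App at least once
    during install_day + 1 .. install_day + 7 (window boundaries assumed)."""
    window = {install_day + timedelta(days=d) for d in range(1, 8)}
    retained = [u for u in new_users if activity.get(u, set()) & window]
    return len(retained) / len(new_users) if new_users else 0.0

# Hypothetical cohort of four new users on 2017-10-08.
day = date(2017, 10, 8)
new_users = ["a", "b", "c", "d"]
activity = {
    "a": {date(2017, 10, 9)},   # day 1: retained
    "b": {date(2017, 10, 15)},  # day 7: retained
    "c": {date(2017, 10, 16)},  # day 8: outside the window
    "d": set(),                 # never returned
}
rate = seven_day_retention(day, new_users, activity)
```

The result for this cohort is 0.5; the weekly figures in Fig. 9 would correspond to this ratio computed per install day and summarized over each week.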
We have described our recommendation system, the "Click-sequence-aware Deep Neural Network (CDNN)", and its feature engineering process. The system has been tested and verified with our partners' products in several countries. The results show that our system effectively decreased the number of pop-ups and increased both the click-through rate and the 7-day retention rate. We are now deploying the system to more products (as shown on the user interface in Fig. 10), and we expect to provide more convenient user scenarios to end users and enterprises. Future work includes improving our deep learning model, simplifying complicated tasks, and training a high-performance advertisement recommendation system. For reference, we keep our research results and experimental materials, including any updates, at http://Notification.TWMAN.ORG.
This work would not have been possible without the valuable dataset offered by Leopard Mobile Inc. and Cheetah Mobile Inc.
-  D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, pp. 484-489, 2016.
-  M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, et al., "TensorFlow: a system for large-scale machine learning," in Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation, Savannah, GA, USA, 2016.
-  Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.
-  I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA: MIT Press, 2016.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems 25 (NIPS 2012), Harrahs and Harveys, Lake Tahoe, 2012, pp. 1097-1105.
-  K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in International Conference on Learning Representations (ICLR 2015), San Diego, CA, 2015.
-  C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 2016.
-  K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, 2016.
-  R. Pan, Y. Zhou, B. Cao, N. Liu, R. Lukose, M. Scholz, and Q. Yang, "One-class collaborative filtering," in Proc. IEEE International Conference on Data Mining (ICDM), 2008, pp. 502-511.
-  Y. Koren, R. Bell and C. Volinsky, "Matrix Factorization Techniques for Recommender Systems," in Computer, vol. 42, no. 8, pp. 30-37, Aug. 2009.
-  K. Verstrepen and B. Goethals, "Unifying nearest neighbors collaborative filtering," in Proceedings of the 8th ACM Conference on Recommender Systems, New York, NY, USA: ACM, 2014, pp. 177-184.
-  F. Aiolli, "Efficient top-n recommendation for very large scale binary rated datasets," in Proceedings of the 7th ACM Conference on Recommender Systems, New York, NY, USA: ACM, 2013, pp. 273-280.
-  P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 2016.
-  H.-T. Cheng et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 2016.