# Learning from multivariate discrete sequential data using a restricted Boltzmann machine model

###### Abstract

A restricted Boltzmann machine (RBM) is a generative neural-network model with many novel applications such as collaborative filtering and acoustic modeling. An RBM lacks the capacity to retain memory, making it inappropriate for modeling dynamic data such as time series. In this paper we address this issue by proposing the p-RBM model, a generalization of the regular RBM model that is capable of retaining memory of p past states. We further show how to train the p-RBM model using contrastive divergence, and we test our model on the problem of predicting the direction of the stock market, considering 100 stocks of the NASDAQ-100 index. The obtained results show that the p-RBM offers promising prediction potential.


978-1-5386-6740-8/18/$31.00 ©2018 IEEE

## I Introduction

A restricted Boltzmann machine (RBM) is a generative neural-network model used to represent the distribution of random observations. In an RBM, the independence structure between its variables is modeled through the use of latent variables (see Figure 1). RBMs were introduced in [smolensky_information_1986], although it was not until Hinton proposed contrastive divergence (CD) as a training technique [hinton_training_2002] that their true potential was unveiled.

Restricted Boltzmann machines have proven powerful enough to be effective in diverse settings. Applications of RBM include: collaborative filtering [salakhutdinov_restricted_2007], acoustic modeling [dahl_phone_2010], human-motion modeling [taylor_modeling_2006, taylor_factored_2009], and music generation [boulanger-lewandowski_modeling_2012].

Restricted Boltzmann machine variations have gained popularity over the past few years. Two variations relevant to this work are the RNN-RBM [dahl_phone_2010] and the conditional RBM [taylor_factored_2009]. The RNN-RBM estimates the density of multivariate time-series data by pre-training an RBM and then training a recurrent neural network (RNN) to make predictions; this allows the parameters of the RBM to be kept constant, serving as a prior for the data distribution, while the biases are modified by the RNN to convey temporal information. Conditional RBMs, on the other hand, estimate the density of multivariate time-series data by connecting past and present units to hidden variables. However, this makes the conditional RBM unsuitable for traditional CD training [mnih_conditional_2012].

In this paper, we focus on the use of a modified RBM model that does not keep its parameters constant in each time step (unlike the RNN-RBM), and that adds hidden units for past interactions and lacks connections between past and future visible units (unlike the conditional RBM). Our model is advantageous because of two factors: (1) the topology of the model can be changed easily because it is controlled by a single set of parameters; and (2) the structure of our model allows the modeling of auto-correlation within a time series and of correlation between multiple time series. These two factors allow many models to be readily tested and compared.

We show the performance of our model by applying it to the problem of forecasting stock market directions, i.e. predicting whether the value of a stock will rise or fall after a pre-defined period of time. Previous work on the problem includes [huang_forecasting_2005], where a support vector machine model was trained on the NIKKEI 225 index, the Japanese analog of the Dow Jones Industrial Average index.

The rest of the paper is organized as follows. Section II presents a review of RBMs, describing their energy function and their training method through CD. Section III introduces our proposed model, called the p-RBM, which can be viewed as an ensemble of RBMs with the property of recalling past interactions. In Section IV we apply our model to 100 stocks of the NASDAQ-100 index and show its prediction results. Conclusions and future research directions are provided in Section V.

## II The Restricted Boltzmann Machine Model

Restricted Boltzmann machines are formed by n visible units, which we represent as \textbf{v}\in\{0,1\}^{n}; and m hidden units, which we represent as \textbf{h}\in\{0,1\}^{m}. The joint probability of these units is modeled as

\displaystyle p(\textbf{v},\textbf{h})=\frac{1}{Z}e^{-E(\textbf{v},\textbf{h})}; | (1) |

where the energy function E(\textbf{v},\textbf{h}) is given by

\displaystyle E(\textbf{v},\textbf{h})=-\textbf{a}^{\intercal}\textbf{v}-\textbf{b}^{\intercal}\textbf{h}-\textbf{v}^{\intercal}\textbf{W}\textbf{h}; | (2) |

matrix \textbf{W}\in\mathbb{R}^{n\times m} represents the interaction between visible and hidden units; \textbf{a}\in\mathbb{R}^{n} and \textbf{b}\in\mathbb{R}^{m} are the biases for the visible and hidden units, respectively; and Z is the partition function defined by

\displaystyle Z(\textbf{a},\textbf{b},\textbf{W})=\sum_{\textbf{v},\textbf{h}}e^{-E(\textbf{v},\textbf{h})}. | (3) |
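To make equations (1)–(3) concrete, the following sketch evaluates the energy and the exact joint distribution of a toy RBM by brute-force enumeration of all states. The sizes and parameter values are illustrative assumptions, not anything from the paper; exhaustive computation of Z is only feasible at toy scale, which is precisely why it is intractable in general.

```python
import itertools
import numpy as np

# Toy RBM with n = 3 visible and m = 2 hidden units; illustrative parameters.
n, m = 3, 2
rng = np.random.default_rng(0)
a = rng.normal(size=n)        # visible biases
b = rng.normal(size=m)        # hidden biases
W = rng.normal(size=(n, m))   # visible-hidden interaction weights

def energy(v, h):
    """E(v, h) = -a^T v - b^T h - v^T W h, as in equation (2)."""
    return -a @ v - b @ h - v @ W @ h

# Partition function Z of equation (3), by summing over all 2^n * 2^m states.
states_v = [np.array(s) for s in itertools.product([0, 1], repeat=n)]
states_h = [np.array(s) for s in itertools.product([0, 1], repeat=m)]
Z = sum(np.exp(-energy(v, h)) for v in states_v for h in states_h)

def joint_p(v, h):
    """p(v, h) = exp(-E(v, h)) / Z, as in equation (1)."""
    return np.exp(-energy(v, h)) / Z

# Sanity check: the joint probabilities sum to one.
total = sum(joint_p(v, h) for v in states_v for h in states_h)
print(round(total, 6))  # → 1.0
```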

The bipartite structure of an RBM is convenient because it implies that the visible units are conditionally independent given the hidden units, and vice versa. This allows the model to capture statistical dependencies between the visible units while remaining tractable.
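This conditional independence means both conditionals factorize into per-unit logistic terms: p(h_j = 1 | \textbf{v}) = \sigma(b_j + \textbf{v}^{\intercal}\textbf{W}_{\cdot j}) and p(v_i = 1 | \textbf{h}) = \sigma(a_i + \textbf{W}_{i\cdot}\textbf{h}), where \sigma is the sigmoid function. A minimal sketch, with illustrative parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative parameters (not from the paper): n = 4 visible, m = 3 hidden.
rng = np.random.default_rng(1)
a = rng.normal(size=4)
b = rng.normal(size=3)
W = rng.normal(size=(4, 3))

def p_h_given_v(v):
    # Factorizes over hidden units: p(h_j = 1 | v) = sigmoid(b_j + v^T W[:, j])
    return sigmoid(b + v @ W)

def p_v_given_h(h):
    # Factorizes over visible units: p(v_i = 1 | h) = sigmoid(a_i + W[i, :] h)
    return sigmoid(a + W @ h)

v = np.array([1.0, 0.0, 1.0, 1.0])
probs = p_h_given_v(v)
print(probs)  # m independent Bernoulli probabilities, one per hidden unit
```

Each conditional is a vector of independent Bernoulli probabilities, which is what makes block Gibbs sampling between the two layers cheap.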

Training an RBM is done via CD, which yields a learning rule equivalent to subtracting two expected values: one with respect to the data and the other with respect to the model. For instance, the update rule for W is

\displaystyle\Delta\textbf{W}=\langle\textbf{v}\textbf{h}^{\intercal}\rangle_{\mbox{Data}}-\langle\textbf{v}\textbf{h}^{\intercal}\rangle_{\mbox{Model}}. | (4) |

The first term in equation (4) is the expected value with respect to the data and the second is the expected value with respect to the model. For a detailed introduction to RBMs the reader is referred to [fischer_training_2014].
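As a rough sketch, one CD-1 training step might look like the following, with the model expectation in equation (4) approximated by a single Gibbs reconstruction, as is common in CD training. The sizes, learning rate, and sampling details are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, a, b, W, lr=0.1):
    """One CD-1 update of (a, b, W) from a single data vector v0."""
    # Positive phase: hidden probabilities driven by the data.
    ph0 = sigmoid(b + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    # Negative phase: reconstruct the visibles, then recompute hidden probs.
    pv1 = sigmoid(a + W @ h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(b + v1 @ W)
    # Equation (4): <v h^T>_Data minus the one-step reconstruction estimate
    # of <v h^T>_Model; analogous rules update the biases.
    W = W + lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a = a + lr * (v0 - v1)
    b = b + lr * (ph0 - ph1)
    return a, b, W

n, m = 6, 4
a, b = np.zeros(n), np.zeros(m)
W = 0.01 * rng.normal(size=(n, m))
v0 = (rng.random(n) < 0.5).astype(float)  # one binary training vector
a, b, W = cd1_step(v0, a, b, W)
print(W.shape)  # parameter shapes are preserved by the update
```

In practice the expectations are averaged over a mini-batch of training vectors rather than a single one.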

## III The p-RBM Model

We generalize the RBM model by constructing an ensemble of RBMs, each representing the state of the system at one of p connected moments in time. One way to visualize this is as p distinct RBMs connected together so as to model the correlation between moments in time. Each RBM contains a representation of the object of interest at a different time step, for example the pixels of a video frame or the value of a dynamic economic index. We then add connections between all the visible and hidden units across time to model their autocorrelation (see the p-RBM figure).

Our model resembles a Markov chain of order p, because the RBM at time t is conditionally independent of the past given the previous p RBMs. We show that, even with these newly added time connections, the model remains tractable and can be trained in a fashion similar to that of a single RBM.

For convenience, we bundle the visible and hidden units in block vectors denoted \tilde{\textbf{v}} and \tilde{\textbf{h}}, respectively; we also include a vector of ones of appropriate size to account for the bias interactions, giving

\displaystyle\tilde{\textbf{v}}=\left[\begin{array}{c|c|c|c|c}\textbf{v}_{t}&\textbf{v}_{t-1}&\cdots&\textbf{v}_{t-p}&\mathbf{1}\end{array}\right] |
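A minimal sketch of assembling the block vector \tilde{\textbf{v}} from the current state, the p past states, and a trailing ones block; the length of the ones block (here taken equal to the state length n) is an assumption made purely for illustration:

```python
import numpy as np

# Illustrative sizes: each state v_{t-k} has n = 4 binary entries, p = 3.
n, p = 4, 3
rng = np.random.default_rng(3)

# history[0] is v_t, history[k] is v_{t-k}, for k = 0, ..., p.
history = [(rng.random(n) < 0.5).astype(float) for _ in range(p + 1)]

# tilde{v} = [v_t | v_{t-1} | ... | v_{t-p} | 1]; the ones block absorbs
# the bias interactions (length n assumed here for illustration).
v_tilde = np.concatenate(history + [np.ones(n)])
print(v_tilde.shape)  # (n * (p + 2),)
```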