Stacking with Neural network for Cryptocurrency investment
Abstract
Predicting the direction of asset prices has long been an active and difficult area of study, and machine learning has been used to build robust models for this task. Ensemble methods are one such approach, generally performing better than any single supervised method. In this paper, we use generative and discriminative classifiers to create a stack, specifically 3 generative and 9 discriminative classifiers, combined through a one-layer neural network, to model the price direction of cryptocurrencies. The features are technical indicators, including but not limited to trend, momentum, volume and volatility indicators, combined with sentiment analysis for additional insight. For cross-validation, purged walk-forward cross-validation is used. In terms of accuracy, we present a comparative analysis of the ensemble method with stacking versus the ensemble method with blending. We also develop a methodology for combined feature importance in the stacked model, and identify important indicators based on feature importance.
Keywords: Generative Models, Discriminative Models, Stacked Generalization, Xgboost, LightGBM, Bitcoin
1 Introduction
Today, there are more than 1000 cryptocurrencies, with nearly $200 billion of market capitalization and daily volume of nearly $15 billion. Bitcoin, Ethereum, Ripple, Bitcoin Cash and Stellar are the top 5 cryptocurrencies by market capitalization. Previous studies include the price formation of Bitcoin and the identification of important features driving its price [1].
The crash of cryptocurrencies in 2018 made it evident that this market is complex, dynamic and nonlinear. Its behavior is not very different from stock markets, where sharp rises in measures of collective behavior have been observed [2]. Predictability of asset direction has been a key area of study for portfolio management, and the complex, dynamic and nonlinear nature of the market makes it difficult to develop robust strategies. Many authors have developed machine learning models for financial trading, and their success in stock market prediction makes them suitable for predicting the price direction of cryptocurrencies.
Deep learning has been applied to forecasting stock returns [3, 4] and shown to be more successful than shallow neural networks. Other machine learning models have also shown strong performance, such as gradient boosting [5], Bayesian neural networks [6], LSTM [7], naive Bayes [8], random forests [7] and many more. [9] predicts stock market movement direction with SVM combined with LDA, QDA and neural networks, but does not address the problem from the perspective of discriminative versus generative models [10]; there are trade-offs to using each class of model.
Combining different models can lead to better prediction results [11], and there are two ways to combine them: blending and stacking. Stacked generalization [12] introduces the concept of meta-learners; it combines different models, unlike bagging and boosting. With newly developed machine learning models such as xgboost [13] and LightGBM [14], diverse base learners can be included.
Technical indicators are usually combined with fundamental indicators to develop trading strategies or price-prediction models. Since cryptocurrencies are used here, fundamental indicators are not included; along with technical indicators, sentiment indicators are used instead. Tweets by Coindesk, considered a leading news provider for blockchain, are used to create the sentiment indicators. Twitter data can be used to analyze investor sentiment, which feeds into the price formation of stocks and of Bitcoin.
The main contributions are: (1) we formulate the problem of predicting the direction of bitcoin's price; (2) we create features using technical indicators covering momentum, trend, volume and volatility, together with sentiment indicators built from tweets by Coindesk; (3) we mix discriminative and generative models to create a class of base learners, including nonlinear models to capture non-separability; (4) we tune model hyperparameters using purged time-series cross-validation to estimate robust models; (5) we improve model performance by stacking the base learners, where the stacking model is a 1-layer feed-forward neural network; (6) we find important features using partial dependence plots, which is valuable to day traders.
The remainder of the paper is structured as follows. In section Materials and Methods, we describe the data, the indicators, the comparison between discriminative and generative models, the models themselves, the cross-validation technique, and stacking using a 1-layer neural network. In section Results, we present the hyperparameter tuning of the different models and the corresponding performance of each model in terms of log-loss, accuracy, recall and F1-score; feature importance is calculated using partial dependence plots for each model, and a methodology is developed to calculate feature importance for the stacked model. In section Conclusion, we conclude and highlight the key results.
2 Materials and Methods
2.1 Data description and preprocessing
Bitcoin data is downloaded from Quandl, which offers bitcoin data from different exchanges so that the true price of the digital asset can be captured. We consider four exchanges, KRAKEN, BITSTAMP, ITBITUSD and COINBASE, to remove ambiguity, and the final price is a volume-weighted price across them. Missing prices are imputed with an exponential-average technique. We consider the period from Aug 2017 to Jul 2018 with end-of-day data. This period covers both the peak and the downturn, so it is a good period for testing a strategy, as it includes a bull period and a bear period.
Data dredging and cherry-picking are common pitfalls of backtesting any strategy; we avoid them by picking a diverse time period and testing only price direction prediction.
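As an illustrative sketch of the volume-weighted combination across exchanges (the quotes below are made-up numbers and `weighted_price` is a hypothetical helper, not the paper's code):

```python
# Sketch: volume-weighted average price across exchanges.
# Exchange quotes here are illustrative, not the paper's data.

def weighted_price(quotes):
    """quotes: list of (price, volume) pairs, one per exchange."""
    total_volume = sum(v for _, v in quotes)
    if total_volume == 0:
        return None  # no trades on any exchange that day
    return sum(p * v for p, v in quotes) / total_volume

# e.g. same-day closes on three exchanges (illustrative numbers)
day = [(6500.0, 120.0), (6510.0, 80.0), (6495.0, 200.0)]
print(weighted_price(day))  # → 6499.5
```

The higher-volume exchange pulls the combined price toward its own quote, which is the intended behavior.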
Next, we use technical indicators as features in the model. We consider four types of technical indicators: volume, volatility, trend and momentum.
Technical Indicators
Notations used:
H_t — highest price for the day
L_t — lowest price for the day
C_t — close price for the day
O_t — open price for the day
V_t — volume for the day
EMA_n(X) — exponential moving average of X with window n
MA_n(X) — moving average of X with window n
Technical Indicators

Volume: Accumulation Distribution Index (ADI), On balance volume, On balance volume mean, Chaikin money flow, Force index, Volume Price Trend, Negative volume index.

Volatility: Average true range (ATR), Bollinger Moving Average, Bollinger Lower Band, Bollinger Higher Band, Bollinger Higher Band Indicator, Bollinger Lower Band Indicator, Keltner Channel Central, Keltner Channel Higher Band, Keltner Channel Lower Band, Keltner Channel Higher Band Indicator, Keltner Channel Lower Band Indicator, Donchian Channel Higher Band, Donchian Channel Lower Band, Donchian Channel Higher Band Indicator, Donchian Channel Lower Band Indicator.

Trend: Moving Average Convergence Divergence (MACD), Moving Average Convergence Divergence Signal, Moving Average Convergence Divergence Diff, Exponential Moving Average, Vortex Indicator Negative (VI), Trix (TRIX), Mass Index (MI), Commodity Channel Index (CCI), Detrended Price Oscillator (DPO), KST Oscillator (KST), KST Oscillator (KST Signal).

Momentum: Relative Strength Index (RSI), True strength index (TSI), Stochastic Oscillator, Williams %R, Awesome Oscillator.
Next, we have used tweets to create the sentiment indicator.
2.2 Model
Let us say we have a feature vector x_t used to build a model with dependent variable y_t, where y_t is defined based on the return of the asset:

y_t = 1 if r_t > 0, and y_t = 0 otherwise, where r_t = (C_t − C_{t−1}) / C_{t−1}   (1)
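A minimal sketch of this labeling rule (the price series below is illustrative, and `direction_labels` is a hypothetical helper name):

```python
# Sketch: build 0/1 direction labels from a close-price series.
# y_t = 1 if the return from t-1 to t is positive, else 0.

def direction_labels(closes):
    labels = []
    for prev, curr in zip(closes, closes[1:]):
        r = (curr - prev) / prev  # simple return
        labels.append(1 if r > 0 else 0)
    return labels

print(direction_labels([100.0, 101.0, 99.0, 99.0, 105.0]))  # → [1, 0, 0, 1]
```

Note that a flat day (zero return) is labeled 0 under this rule.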
Building a robust model on a time-series data set can be difficult, as the nature of the asset can vary a lot from one time period to another. It is therefore important to consider a wide range of predictive models to capture both linear and nonlinear relationships between the feature vector and the dependent variable. To cover different model families, we consider both discriminative and generative models. The following are the key points where discriminative and generative models differ:

Fitting Technique — Generative models generally require less sophisticated techniques to fit (e.g. naive Bayes and LDA), while discriminative models require more complex techniques such as convex and non-convex optimization for lasso, ridge, logistic regression and elastic net.

Class-based Training — Discriminative models require retraining of the complete model when classes change, while generative models train each class separately.

Missing Value Treatment — Missing values are more difficult to treat for discriminative models, since we estimate parameters given x, whereas generative models have simple methods to deal with this problem.

Backward Computation — With a generative model we can infer the inputs given y, since we make assumptions about the distribution of x; this is not possible for discriminative models.

Training Data Requirement — If its assumptions are correct, a generative model requires less training data to attain similar performance compared to a discriminative model; when the assumptions are wrong, discriminative models can provide better results.

Feature Engineering — Since generative models make assumptions about the input features, meaningful transformations of the x distribution are difficult, as transformed features may become correlated or violate the assumptions; such transformations are feasible for discriminative models.

Probability Calibration — Generative models tend to produce extreme probabilities because of their assumptions, while discriminative models produce better-calibrated probability estimates.
We have considered the following discriminative and generative models:

Discriminative: Xgboost, Support Vector Machines, K-Nearest-Neighbor, Logistic Elastic Net Classifier, LightGBM, Random Forest.
Generative: Naive Bayes, Linear Discriminant Analysis, Quadratic Discriminant Analysis.

We briefly describe these models below:
Extreme Gradient Boosting:
It is a tree-ensemble model in which we regularize over the leaves of the trees, and each tree is created by fitting the residuals left after adding the previous trees. The model therefore outputs a weighted sum of the predictions of multiple regression/classification trees. Notation: ŷ_i is the model's prediction for the i-th observation, f_k is the k-th classification tree, and Ω is the regularization over the leaves of the trees. The formulas are:

Estimation function:
ŷ_i = Σ_{k=1}^{K} f_k(x_i),  f_k ∈ F   (2)

Loss function:
L(φ) = Σ_i l(ŷ_i, y_i) + Σ_k Ω(f_k)   (3)

where
Ω(f) = γT + (1/2) λ ||w||²   (4)

with T the number of leaves and w the vector of leaf weights.
Support Vector Machines:
It is a machine learning model in which we find the hyperplane that gives the maximum margin between the classes. The training points that determine the hyperplane are called support vectors. Assuming two linearly separable classes, the hyperplane can be represented by the equation

w · x + b = 0   (5)

and the maximum-margin hyperplane is obtained by the optimization

min_{w,b} (1/2) ||w||²  subject to  y_i (w · x_i + b) ≥ 1   (6)

For non-separable classes, we can use the kernel trick, replacing inner products with a kernel function

K(x_i, x_j) = φ(x_i) · φ(x_j)   (7)

The above optimization is equivalent to solving a linearly constrained quadratic program with a lower bound of 0 for the dual variables α_i.
K Nearest Neighbor
The K-nearest-neighbor classifier is considered a lazy learning method; no model fitting is needed. Given a query point x_0, we find the k nearest training points to x_0 and classify it based on a majority vote. We estimate k using the cross-validation technique, choosing the value that leads to the best performance. Different kinds of distances can be considered, such as Manhattan and Euclidean distance; Euclidean distance has been used.
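The classification step above can be sketched in a few lines (toy 2-D points; `knn_predict` is a hypothetical helper, not the paper's implementation):

```python
# Sketch: k-nearest-neighbor classification with Euclidean distance
# and majority vote over the k closest training points.
from collections import Counter
import math

def knn_predict(train_X, train_y, x0, k=3):
    # sort training points by Euclidean distance to the query
    dists = sorted((math.dist(x, x0), y) for x, y in zip(train_X, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
y = [0, 0, 1, 1, 1]
print(knn_predict(X, y, (1.0, 0.9), k=3))  # → 1
```

In the paper's setting the rows would be indicator vectors rather than 2-D points, but the vote mechanics are identical.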
Logistic Elastic Net Classifier
It is a regularized classifier with L1 and L2 penalties, useful for automatic variable selection. The L1 penalty works best when the solution is sparse in the variables, and the L2 penalty works best with multicollinear variables.

min_β −ℓ(β) + λ [ α ||β||₁ + ((1−α)/2) ||β||₂² ]   (8)

where ℓ(β) is the logistic log-likelihood, λ the penalty strength and α the mixing parameter.
LightGBM
LightGBM is similar to other tree-based models. It uses histogram-based algorithms, which bucket continuous feature (attribute) values into discrete bins; this speeds up training and reduces memory usage. It grows trees leaf-wise rather than level-wise. It deals with categorical features differently, sorting the categories by their gradient statistics and then finding the best split over the sorted histogram.
Random Forest
Random forests are an ensemble learning method for classification. They avoid overfitting by fitting multiple trees, reducing variance through the bagging idea at the cost of some increase in bias. The training algorithm applies the general technique of bootstrap aggregating: it randomly selects B samples with replacement, fits a deep tree to each, and the prediction on unseen data is the average (majority vote) over the B trees.
Naive Bayes
Naive Bayes is a classification method based on Bayes' theorem in which the features are assumed to be independent of each other given y. It is easy to fit, as there are no parameters to optimize iteratively. Let y be the class variable and x_1, …, x_n the feature vector.

P(y | x_1, …, x_n) = P(y) P(x_1, …, x_n | y) / P(x_1, …, x_n)   (9)

Assuming independent features,

P(x_i | y, x_1, …, x_{i−1}, x_{i+1}, …, x_n) = P(x_i | y)   (10)

P(y | x_1, …, x_n) ∝ P(y) Π_{i=1}^{n} P(x_i | y)   (11)

The above is equivalent to the decision rule

ŷ = argmax_y P(y) Π_{i=1}^{n} P(x_i | y)   (12)
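A minimal Gaussian naive Bayes sketch implementing the decision rule above (function names and the toy data are illustrative; the paper's implementation is not shown):

```python
# Sketch: Gaussian naive Bayes. Fits a class prior plus per-feature
# class-conditional normals, then picks argmax_y log P(y) + sum log P(x_i|y).
import math

def fit_gaussian_nb(X, y):
    model = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        prior = len(rows) / len(X)
        means = [sum(col) / len(col) for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / len(col) or 1e-9
                 for col, m in zip(zip(*rows), means)]
        model[cls] = (prior, means, vars_)
    return model

def nb_predict(model, x):
    def log_lik(cls):
        prior, means, vars_ = model[cls]
        ll = math.log(prior)
        for v, m, s2 in zip(x, means, vars_):
            ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return ll
    return max(model, key=log_lik)

X = [[1.0, 2.0], [1.2, 1.9], [3.0, 0.5], [3.1, 0.4]]
y = [0, 0, 1, 1]
model = fit_gaussian_nb(X, y)
print(nb_predict(model, [1.1, 2.0]))  # → 0
```

The per-feature factorization is exactly the independence assumption of equation (10); the log transform only avoids underflow.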
Linear Discriminant Analysis
Linear discriminant analysis is a generative classification method. It makes assumptions of normality and independence among features; with the strong normality assumption, it does not support categorical features. We fit the distribution for each class separately.
Let x be the features for each sample, with y being the response. We model P(x | y = 0) and P(x | y = 1) separately as normal distributions with means μ_0 and μ_1 and a common covariance Σ, and based on the Bayes-optimal threshold we calculate the boundary of the classifier. A given observation belongs to the second class when

x^T Σ^{−1} (μ_1 − μ_0) > (1/2)(μ_1 + μ_0)^T Σ^{−1} (μ_1 − μ_0) − log(π_1 / π_0)   (13)

where π_0 and π_1 are the class priors.
Quadratic Discriminant Analysis
This method makes similar assumptions to linear discriminant analysis, with the only exception that each class has its own covariance matrix. It can also accommodate interaction features.
2.3 Cross-Validation
Cross-validation is used to estimate robust hyperparameters, to predict on test data and to calculate the generalization error of the algorithm. There are different ways of doing cross-validation, such as K-fold, stratified cross-validation, leave-one-out and many more. A good cross-validation scheme is one that does not overfit and performs well in the production time period; in finance, however, we tend to overfit through cross-validation. We split the data into a training set and a validation set, with each observation belonging to exactly one group to prevent data leakage.
It has been shown that K-fold CV provides lower cross-validation loss but may lead to worse performance during production or live trading sessions. One of the leading reasons is that the observations are assumed to be IID; in addition, when doing K-fold CV the testing data set gets used multiple times, leading to selection bias.
One K-fold CV variant suited to this setting is the time-series nested cross-validation technique.
[Figure 7: vanilla K-fold cross-validation vs. purged cross-validation]
The first panel shows the vanilla K-fold cross-validation method. The second is the purged cross-validation method, where we delete part of the data that interferes between the training and test sets; this reduces leakage during model development.
The fold-wise time-period distribution is as follows:

Fold No | Training Period     | Test Period
Fold 1  | Aug–Oct 2017        | Nov 2017
Fold 2  | Aug–Nov 2017        | Dec 2017
Fold 3  | Aug–Dec 2017        | Jan 2018
Fold 4  | Aug 2017 – Jan 2018 | Feb 2018
Fold 5  | Aug 2017 – Feb 2018 | Mar 2018

We delete one week of data between the training period and the test period to incorporate purging.
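The purged walk-forward splits described above can be sketched as a small generator (the function name and the window sizes below are illustrative, chosen so the shape of the folds mirrors the table):

```python
# Sketch: purged walk-forward splits over daily observation indices.
# Expanding training window, fixed test window, and a purge gap
# (e.g. one week) deleted between train and test.

def purged_walk_forward(n_obs, first_train, test_size, purge):
    """Yield (train_idx, test_idx) pairs over range(n_obs)."""
    start_test = first_train + purge
    while start_test + test_size <= n_obs:
        train = list(range(0, start_test - purge))  # everything before the gap
        test = list(range(start_test, start_test + test_size))
        yield train, test
        start_test += test_size

folds = list(purged_walk_forward(n_obs=120, first_train=60, test_size=20, purge=7))
for train, test in folds:
    print(len(train), test[0], test[-1])
```

Each fold trains on everything up to the purge gap and tests on the next block, so no observation leaks from test back into training.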
2.4 Stacking — Using One Hidden Layer
[6] describes a technique to reduce the generalization error rate. It aims to achieve generalization accuracy by combining weak learners, and is considered more sophisticated than a winner-takes-all strategy. Let G_k be one of the generalizers; we combine the generalizers by taking the output of each one and making it a new feature space, and we learn the estimates by in-sample/out-of-sample techniques. It can also be considered a flexible version of cross-validation.
It generally creates different levels of models, with the output of one level being the input for the next. Primarily, it removes the biases of the individual models, leading to a generalization over all of them. Many consider it a "black art", since there are many choices of which diverse models to keep at each level, but it has been very effective in producing stable and effective models.
Similarly, stacked regression [7] was introduced, where different predictors are linearly combined to improve accuracy; it also introduces non-negativity constraints on the coefficients. Assume we have K predictors or generalizers f_1, …, f_K, with (y, x) being the given data. We combine the generalizers to create level-1 data, leaving one fold out when fitting each generalizer, obtaining

z_{ik} = f_k^{(−j(i))}(x_i)   (14)

leading to a K-variable input space, where f_k^{(−j(i))} denotes the k-th generalizer trained without the fold containing observation i. Finally, the level-one data is {(y_i, z_{i1}, …, z_{iK})}. [7] proposes a linear combination of the generalizers, with parameters estimated by least squares; using level-one data and adding non-negativity constraints on the coefficients addresses the two problems of overfitting and multicollinearity:

ŷ = Σ_{k=1}^{K} β_k z_k,  β_k ≥ 0   (15)

We have a similar motivation, but we use a nonlinear combination of the generalizers by introducing a hidden layer; it has been shown that a hidden layer can be very effective in learning nonlinear functions.
Level 0 has 7 models and level 1 is a hidden layer with 6 nodes. We first trained the level-0 models from Aug 2017 to Mar 2018, then trained the complete model with the level-1 hidden layer from Apr 2018 to June 2018. Each level-0 model outputs a 0/1 prediction, and the final loss is log-loss.
The flow for the stacked generalizer is as follows:
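The construction of the level-1 data can be sketched as follows; the paper then trains a 1-hidden-layer network on this matrix, which is omitted here. The two threshold "models" are illustrative stand-ins, not the paper's trained classifiers:

```python
# Sketch: building level-1 data for stacking. Each row of the level-1
# matrix is the vector of 0/1 predictions from the base generalizers.

def make_level1(X, base_models):
    """One column per base model, one row per observation."""
    return [[model(x) for model in base_models] for x in X]

# Two toy generalizers over a single feature (illustrative rules).
model_a = lambda x: 1 if x > 0.5 else 0
model_b = lambda x: 1 if x > 0.2 else 0

X = [0.1, 0.3, 0.6, 0.9]
Z = make_level1(X, [model_a, model_b])
print(Z)  # → [[0, 0], [0, 1], [1, 1], [1, 1]]
```

In practice the base predictions must be out-of-fold (equation (14)) so the level-1 learner never sees in-sample outputs.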
2.5 Feature Selection
For tree models, feature importance can be used as a feature-selection methodology. Feature importance can be calculated in the following ways:

Gain — When splitting on a feature, we calculate the decrease in Gini impurity or entropy, which is aggregated into a combined gain for each feature.

Real Cover — Similarly, each split on a feature occurs over a set of observations; counting the observations involved in splits on the feature and aggregating gives the real cover.

[8] and [9] discuss visualizing feature importance by estimating the variability in the estimated function when one particular variable is varied and the other variables are kept at their average; this is called a partial dependence plot.
The partial dependence function for regression is defined below:

f̂_t(x_t) = E_{x_c}[ f̂(x_t, x_c) ] = ∫ f̂(x_t, x_c) p(x_c) dx_c   (16)

Since calculating the integral is difficult, we use Monte Carlo to estimate the partial function:

f̂_t(x_t) ≈ (1/n) Σ_{i=1}^{n} f̂(x_t, x_c^{(i)})   (17)

This comes with the rigid assumption that feature t is not correlated with the rest of the features. For classification, we output probabilities rather than 0/1 labels.
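The Monte Carlo estimate can be sketched directly (the toy linear model and helper name are illustrative):

```python
# Sketch: Monte Carlo partial dependence for feature t of a model.
# For each grid value v, average f over the data with feature t forced to v.

def partial_dependence(model, X, t, grid):
    """model: f(row) -> float; X: list of rows; t: feature index."""
    pd_values = []
    for v in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[t] = v  # force feature t to the grid value
            total += model(modified)
        pd_values.append(total / len(X))
    return pd_values

# Toy model f(x) = 2*x0 + x1, so the PD curve over x0 has slope 2.
f = lambda x: 2 * x[0] + x[1]
X = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
print(partial_dependence(f, X, t=0, grid=[0.0, 1.0]))  # → [3.0, 5.0]
```

The slope of the resulting curve is what the importance measure in the next paragraph summarizes.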
Recently, [10] proposed a partial-dependence-based variable importance measure: for a continuous predictor, the importance i of the variable is the standard deviation of its partial dependence values; for a categorical predictor, it is the range statistic (maximum minus minimum) divided by 4.
For stacked generalization, we first analyze level 1 to calculate a variable importance (weight) w_k for each generalizer; in a second step, each important generalizer is analyzed further to calculate the importance i_{kt} of each feature t within it; finally, the overall variable importance is taken as the average over all generalizers. With K generalizers, the combined importance of feature t is calculated as

I_t = (1/K) Σ_{k=1}^{K} w_k · i_{kt}

where w_k is calculated by applying the partial dependence measure above to the level-1 model, and i_{kt} by applying it within generalizer k.
This is a model-independent variable importance for each feature. It is important to include the weights w_k here, as it is a 2-level model.
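As a small sketch of the weighted averaging step (the weights and importances below are made-up numbers, not the paper's results):

```python
# Sketch: combine per-generalizer feature importances into one score,
# weighting each generalizer by its importance in the level-1 model.

def combined_importance(level1_weights, per_model_importance):
    """level1_weights: {model: w}; per_model_importance: {model: {feature: i}}."""
    features = {f for imp in per_model_importance.values() for f in imp}
    k = len(level1_weights)
    return {
        f: sum(level1_weights[m] * per_model_importance[m].get(f, 0.0)
               for m in level1_weights) / k
        for f in features
    }

weights = {"knn": 0.6, "lda": 0.4}            # illustrative w_k
imps = {"knn": {"rsi": 1.0},                   # illustrative i_kt
        "lda": {"rsi": 0.5, "macd": 1.0}}
result = combined_importance(weights, imps)
print(result)
```

A feature absent from a generalizer simply contributes zero for that model, which matches treating unimportant features as zero-importance.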
3 Results
3.1 Performance
Extreme Gradient Boosting (Xgboost) requires several hyperparameters to be estimated, such as the learning rate, maximum tree depth, subsampling ratios and regularization terms. There are many methods to tune these hyperparameters, among them grid-search cross-validation, random search and Bayesian optimization. Grid-search cross-validation and Bayesian optimization generally require a lot of time, so we use random search, with 100 iterations across each fold to estimate the best hyperparameters. Cross-validation results are provided in the supplementary materials. As mentioned previously, the level-0 data is created first, its output is used for level 1, and the final performance of stacking is compared with the rest of the models.
The Apr–May 2018 period is used to create the level-1 data in stacking, and the stacked generalizer's performance is compared to the rest of the models during June–July 2018. Below is the performance for Xgboost:

Xgboost
Metric    | Apr–May 2018 | June–July 2018
AUC       | 0.59         | 0.46
Accuracy  | 0.57         | 0.46
Precision | 0.59         | 0.48
Recall    | 0.59         | 0.59
F1        | 0.59         | 0.53
Support Vector Machines (SVM) require the kernel, gamma and cost hyperparameters to be tuned. For non-separating hyperplanes, the kernel trick is important: radial basis function and sigmoid kernels map the data into a different space to make it separable. Cross-validation results are provided in the supplementary materials. Below is the performance for SVM:

SVM
Metric    | Apr–May 2018 | June–July 2018
AUC       | 0.39         | 0.53
Accuracy  | 0.48         | 0.50
Precision | 0.51         | 0.53
Recall    | 0.62         | 0.59
F1        | 0.56         | 0.56
K-Nearest-Neighbor (KNN) requires the number of neighbors to be tuned. Cross-validation results are provided in the supplementary materials. Below is the performance for KNN:

KNN
Metric    | Apr–May 2018 | June–July 2018
AUC       | 0.53         | 0.52
Accuracy  | 0.59         | 0.52
Precision | 0.59         | 0.52
Recall    | 0.66         | 0.52
F1        | 0.62         | 0.52
Light Gradient Boosting (LightGBM) requires several hyperparameters, such as the number of leaves, learning rate, feature and bagging fractions, and regularization terms. Cross-validation results are provided in the supplementary materials. Below is the performance for LightGBM:

LightGBM
Metric    | Apr–May 2018 | June–July 2018
AUC       | 0.62         | 0.52
Accuracy  | 0.61         | 0.52
Precision | 0.63         | 0.52
Recall    | 0.69         | 0.52
F1        | 0.66         | 0.52
Random Forest (RF) requires several hyperparameters, such as the number of trees, maximum depth and the number of features considered per split. Cross-validation results are provided in the supplementary materials. Below is the performance for RF:

RF
Metric    | Apr–May 2018 | June–July 2018
AUC       | 0.55         | 0.50
Accuracy  | 0.55         | 0.50
Precision | 0.55         | 0.50
Recall    | 0.76         | 0.59
F1        | 0.64         | 0.54
Logistic Elastic Net (LogisticENet) requires the penalty strength λ and the mixing parameter α to be tuned. Cross-validation results are provided in the supplementary materials. Below is the performance for LogisticENet:

LogisticENet
Metric    | Apr–May 2018 | June–July 2018
AUC       | 0.53         | 0.50
Accuracy  | 0.52         | 0.50
Precision | 0.52         | 0.48
Recall    | 0.48         | 0.44
F1        | 0.50         | 0.46
Naive Bayes (NB) does not require any hyperparameters to be tuned. Below is the performance for NB:

NB
Metric    | Apr–May 2018 | June–July 2018
AUC       | 0.73         | 0.41
Accuracy  | 0.52         | 0.50
Precision | 0.68         | 0.40
Recall    | 0.66         | 0.44
F1        | 0.67         | 0.42
Linear Discriminant Analysis (LDA) does not require any hyperparameters to be tuned. Below is the performance for LDA:

LDA
Metric    | Apr–May 2018 | June–July 2018
AUC       | 0.62         | 0.48
Accuracy  | 0.54         | 0.48
Precision | 0.61         | 0.50
Recall    | 0.66         | 0.48
F1        | 0.63         | 0.49
Quadratic Discriminant Analysis (QDA) does not require any hyperparameters to be tuned. Below is the performance for QDA:

QDA
Metric    | Apr–May 2018 | June–July 2018
AUC       | 0.59         | 0.55
Accuracy  | 0.55         | 0.52
Precision | 0.60         | 0.56
Recall    | 0.52         | 0.81
F1        | 0.56         | 0.67
The performance of the generative and discriminative models is very similar, with the best individual performance coming from quadratic discriminant analysis (QDA).
Stacked Generalization (SG) requires the number of layers and the number of nodes in the hidden layer to be tuned. Below is the performance for SG:

SG
Metric    | Apr–May 2018 | June–July 2018
AUC       | 0.61         | 0.50
Accuracy  | 0.52         | 0.54
Precision | 0.61         | 0.52
Recall    | 0.59         | 0.59
F1        | 0.60         | 0.55

In terms of accuracy, the stacked generalizer achieves the best out-of-sample result (0.54) during June–July 2018.
3.2 Feature Selection
In the previous section, we presented the methodology to calculate feature importance for stacked generalization based on partial dependence plots. We first show the model importance based on the stacked generalization, and then the top-10 individual feature importances for each model; the overall feature importance is calculated as defined above.
Naive Bayes (NB) and Support Vector Machines (SVM) do not contribute to the stacked model; the highest contributions come from K-nearest-neighbor (KNN) and linear discriminant analysis (LDA).
For Xgboost and LightGBM, importance is concentrated in a small number of indicators. Logistic ElasticNet has only two important features, KNN only one, and Random Forest similarly few. Naive Bayes (NB) has mostly volatility features as its important features. Linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) have important features distributed across momentum, trend, volume, volatility and sentiment.
4 Conclusions
Cryptocurrency direction prediction can be improved by using different generative and discriminative models. The challenge has been to identify the domain of features and to produce a generalized model that performs well across time periods of differing character. Cross-validation is very important for building robust models, and it is trickier with time-series data sets; purged cross-validation addresses these problems. The stacked generalization technique is used to create a generalized model that carries more information than the individual models, leading to better accuracy. Interpreting machine learning models has become an essential exercise: partial dependence plots (PDPs) are used to uncover each model's important features as well as the contribution of each model to the stacked model. Having multiple models, we use a new PDP-based definition to create a combined feature importance. Finally, the important features identified can also be used for day trading.
5 Appendix
5.1 Volume Technical Indicators
Accumulation/Distribution Index (ADI) —
It is a leading indicator combining price and volume.

CLV_t = ((C_t − L_t) − (H_t − C_t)) / (H_t − L_t)   (18)
ADI_t = ADI_{t−1} + CLV_t × V_t   (19)

On balance volume — It is based on total cumulative volume.

OBV_t = OBV_{t−1} + V_t if C_t > C_{t−1};  OBV_{t−1} − V_t if C_t < C_{t−1};  OBV_{t−1} otherwise   (20)

On balance volume mean — It is the 10-day rolling mean of the on balance volume indicator.
Chaikin money flow — It measures the amount of money flow volume over a specific period; we use a 20-day sum.

MFM_t = ((C_t − L_t) − (H_t − C_t)) / (H_t − L_t)   (21)
MFV_t = MFM_t × V_t   (22)
CMF_t = Σ_{i=t−19}^{t} MFV_i / Σ_{i=t−19}^{t} V_i   (23)

Force index — It shows the buying and selling pressure present.

FI_t = (C_t − C_{t−1}) × V_t   (24)

Ease of movement — It relates an asset's price change to its volume.

EMV_t = [ (H_t + L_t)/2 − (H_{t−1} + L_{t−1})/2 ] × (H_t − L_t) / V_t   (25)
EMV signal_t = MA_14(EMV)   (26)

Volume Price Trend — It is a running cumulative volume that adds or subtracts a multiple of the change in close price.

VPT_t = VPT_{t−1} + V_t × (C_t − C_{t−1}) / C_{t−1}   (27)

Negative volume index — It aims to detect when "smart money" is active, using volume.

NVI_t = NVI_{t−1} × (1 + (C_t − C_{t−1}) / C_{t−1}) if V_t < V_{t−1}, else NVI_{t−1}   (28)
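The on balance volume recursion above can be sketched directly (prices and volumes below are toy numbers; `obv` is a hypothetical helper name):

```python
# Sketch: on balance volume from close and volume series.
# Volume is added on up-days, subtracted on down-days, carried on flat days.

def obv(closes, volumes):
    out = [0.0]  # conventional starting value
    for i in range(1, len(closes)):
        if closes[i] > closes[i - 1]:
            out.append(out[-1] + volumes[i])
        elif closes[i] < closes[i - 1]:
            out.append(out[-1] - volumes[i])
        else:
            out.append(out[-1])
    return out

print(obv([10.0, 11.0, 10.5, 10.5, 12.0], [0, 100, 50, 70, 30]))
# → [0.0, 100.0, 50.0, 50.0, 80.0]
```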
5.2 Volatility Technical Indicators
Average true range — The indicator provides an indication of the degree of price volatility.

TR_t = max(H_t − L_t, |H_t − C_{t−1}|, |L_t − C_{t−1}|)   (29)
ATR_t = (ATR_{t−1} × (n − 1) + TR_t) / n, with n = 14   (30)

Bollinger Moving Average — It is the moving average of the close price.

BMA_t = MA_20(C)   (31)

Bollinger Lower Band — It is the lower band, 2 times the 20-period standard deviation below the 20-day moving average.

BMA_t = MA_20(C)   (32)
σ_t = StdDev_20(C)   (33)
BLB_t = BMA_t − 2σ_t   (34)

Bollinger Higher Band — It is the higher band, 2 times the 20-period standard deviation above the 20-day moving average of the close price.

BMA_t = MA_20(C)   (35)
σ_t = StdDev_20(C)   (36)
BHB_t = BMA_t + 2σ_t   (37)

Bollinger Higher Band Indicator — It returns 1 if close is higher than the Bollinger higher band, else 0.
Bollinger Lower Band Indicator — It returns 1 if close is lower than the Bollinger lower band, else 0.
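The Bollinger band construction can be sketched as follows (the paper uses a 20-day window; a 5-day window and toy prices are used here for brevity, and `bollinger` is a hypothetical helper):

```python
# Sketch: Bollinger bands for the last point of a close series.
# Band = moving average ± k standard deviations over the window.
import math

def bollinger(closes, n=20, k=2.0):
    window = closes[-n:]
    ma = sum(window) / len(window)
    var = sum((c - ma) ** 2 for c in window) / len(window)
    sd = math.sqrt(var)
    return ma - k * sd, ma, ma + k * sd

low, mid, high = bollinger([1.0, 2.0, 3.0, 4.0, 5.0], n=5)
print(low, mid, high)
```

The two 0/1 band indicators are then just comparisons of the latest close against `low` and `high`.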
Keltner Channel Central — It is the 10-day simple moving average of the typical price.

TP_t = (H_t + L_t + C_t) / 3   (38)
KC_t = MA_10(TP)   (39)

Keltner Channel Higher Band — It is a simple moving average of a high-weighted typical price.

TP^h_t = (4H_t − 2L_t + C_t) / 3   (40)
KCH_t = MA_10(TP^h)   (41)

Keltner Channel Lower Band — It is a simple moving average of a low-weighted typical price.

TP^l_t = (−2H_t + 4L_t + C_t) / 3   (42)
KCL_t = MA_10(TP^l)   (43)

Keltner Channel Higher Band Indicator — It returns 1 if the close price is greater than KCH, else 0.
Keltner Channel Lower Band Indicator — It returns 1 if the close price is lower than KCL, else 0.
Donchian Channel Higher Band — The upper band is the highest price of the asset over the last 20 periods.

DCH_t = max(H_{t−19}, …, H_t)   (44)

Donchian Channel Lower Band — The lower band is the lowest price of the asset over the last 20 periods.

DCL_t = min(L_{t−19}, …, L_t)   (45)

Donchian Channel Higher Band Indicator — It returns 1 if close is greater than DCH, else 0.
Donchian Channel Lower Band Indicator — It returns 1 if close is lower than DCL, else 0.
5.3 Trend
Moving Average Convergence Divergence (MACD) — It is a trend-following momentum indicator that shows the relationship between fast and slow moving averages of prices.

MACD_t = EMA_12(C) − EMA_26(C)   (46)
Signal_t = EMA_9(MACD)   (47)
Diff_t = MACD_t − Signal_t   (48)

Moving Average Convergence Divergence Signal — It is the EMA of the MACD.
Moving Average Convergence Divergence Diff — It is the difference between the MACD and the MACD signal.
Exponential Moving Average — It is the exponential moving average of the close price.
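The EMA recursion and the MACD line can be sketched together (the 12/26 windows follow the standard MACD definition; the price series is illustrative):

```python
# Sketch: EMA with smoothing factor alpha = 2/(n+1), and the MACD line
# as the difference of a fast and a slow EMA of the closes.

def ema(values, n):
    alpha = 2.0 / (n + 1)
    out = [values[0]]  # seed with the first observation
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd_line(closes, fast=12, slow=26):
    fast_ema, slow_ema = ema(closes, fast), ema(closes, slow)
    return [f - s for f, s in zip(fast_ema, slow_ema)]

closes = [float(x) for x in range(1, 41)]  # steadily rising toy prices
print(round(macd_line(closes)[-1], 4))
```

On a steadily rising series, the fast EMA tracks price more closely than the slow one, so the MACD line is positive.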
Average Directional Movement Index (ADX) — It is a 14-day average of the difference between +DI and −DI, and indicates the strength of the trend.

UpMove_t = H_t − H_{t−1}   (49)
DownMove_t = L_{t−1} − L_t   (50)
+DM_t = UpMove_t if UpMove_t > DownMove_t and UpMove_t > 0, else 0   (51)
−DM_t = DownMove_t if DownMove_t > UpMove_t and DownMove_t > 0, else 0   (52)
TR_t = max(H_t − L_t, |H_t − C_{t−1}|, |L_t − C_{t−1}|)   (53)
ATR_t = 14-period smoothed average of TR   (54)
+DI_t = 100 × (14-period smoothed +DM) / ATR_t   (55)
−DI_t = 100 × (14-period smoothed −DM) / ATR_t   (56)
DX_t = 100 × |+DI_t − (−DI_t)| / (+DI_t + (−DI_t))   (57)
ADX_t = (13 × ADX_{t−1} + DX_t) / 14   (58)

Average Directional Movement Index Positive — It is +DI.
Average Directional Movement Index Negative — It is −DI.
Average Directional Movement Index Indicator — It returns 1 if the difference between +DI and −DI is greater than 0, else 0.
Vortex Indicator Positive (VI) — It captures a bullish signal when the positive oscillator trend crosses the negative oscillator trend.

VM+_t = |H_t − L_{t−1}|   (59)
TR_t = max(H_t − L_t, |H_t − C_{t−1}|, |L_t − C_{t−1}|)   (60)
ΣVM+ = Σ_{i=t−13}^{t} VM+_i   (61)
ΣTR = Σ_{i=t−13}^{t} TR_i   (62)
VI+_t = ΣVM+ / ΣTR   (63)

Vortex Indicator Negative (VI) — It captures a bearish signal when the negative oscillator trend crosses the positive oscillator trend.

VM−_t = |L_t − H_{t−1}|   (64)
TR_t = max(H_t − L_t, |H_t − C_{t−1}|, |L_t − C_{t−1}|)   (65)
ΣVM− = Σ_{i=t−13}^{t} VM−_i   (66)
ΣTR = Σ_{i=t−13}^{t} TR_i   (67)
VI−_t = ΣVM− / ΣTR   (68)
Trix — It shows the percent rate of change of a triple exponentially smoothed moving average.

EMA1_t = EMA_15(C)   (69)
EMA2_t = EMA_15(EMA1)   (70)
EMA3_t = EMA_15(EMA2)   (71)
TRIX_t = 100 × (EMA3_t − EMA3_{t−1}) / EMA3_{t−1}   (72)
Mass Index (MI) — It uses the high–low range to identify trend reversals based on range expansions, identifying range bulges that can foreshadow a reversal of the current trend.

R_t = H_t − L_t   (73)
EMA1_t = EMA_9(R)   (74)
EMA2_t = EMA_9(EMA1)   (75)
Ratio_t = EMA1_t / EMA2_t   (76)
MI_t = Σ_{i=t−24}^{t} Ratio_i   (77)
Commodity Channel Index (CCI) — CCI measures the difference between a security's price change and its average price change. High positive readings indicate that prices are well above their average, which is a show of strength; low negative readings indicate that prices are well below their average, which is a show of weakness.

TP_t = (H_t + L_t + C_t) / 3   (78)
CCI_t = (TP_t − MA_20(TP)) / (0.015 × MD_t)   (79)

where MD_t is the mean absolute deviation of TP over the 20-period window.
Detrended Price Oscillator (DPO) — It is an indicator designed to remove trend from price and make it easier to identify cycles.

DPO_t = C_{t−(n/2+1)} − MA_n(C), with n = 20   (80)
KST Oscillator (KST) — It is useful for identifying major stock market cycle junctures, because its formula is weighted to be more greatly influenced by the longer and more dominant time spans, in order to better reflect the primary swings of the stock market cycle. We use r1 = 10, r2 = 15, r3 = 20, r4 = 30, n1 = 10, n2 = 10, n3 = 10 and n4 = 15.

ROCMA1_t = MA_{n1}( (C_t − C_{t−r1}) / C_{t−r1} )   (81)
ROCMA2_t = MA_{n2}( (C_t − C_{t−r2}) / C_{t−r2} )   (82)
ROCMA3_t = MA_{n3}( (C_t − C_{t−r3}) / C_{t−r3} )   (83)
ROCMA4_t = MA_{n4}( (C_t − C_{t−r4}) / C_{t−r4} )   (84)
KST_t = 100 × (ROCMA1_t + 2·ROCMA2_t + 3·ROCMA3_t + 4·ROCMA4_t)   (85)

KST Oscillator (KST Signal) — It is a moving average of the KST line, used for the same cycle-identification purpose.

KST Signal_t = MA_9(KST)   (86)
Ichimoku Kinkō Hyō A (Ichimoku) — It identifies the trend and looks for potential signals within that trend.

Conv_t = (max_9(H) + min_9(L)) / 2   (87)
Base_t = (max_26(H) + min_26(L)) / 2   (88)
SpanA_t = (Conv_t + Base_t) / 2   (89)
SpanA is plotted 26 periods ahead   (90)

Ichimoku Kinkō Hyō B (Ichimoku) — It identifies the trend and looks for potential signals within that trend.

SpanB_t = (max_52(H) + min_52(L)) / 2   (91)
SpanB is plotted 26 periods ahead   (92)
5.4 Momentum
Relative Strength Index (RSI) — It compares the magnitude of recent gains and losses over a specified time period to measure the speed and change of price movements of a security.

AvgGain_t = smoothed 14-period average of max(C_t − C_{t−1}, 0)   (93)
AvgLoss_t = smoothed 14-period average of max(C_{t−1} − C_t, 0)   (94)
RSI_t = 100 − 100 / (1 + AvgGain_t / AvgLoss_t)   (95)
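A simplified RSI sketch following the formula above (a plain average is used over the window instead of Wilder's recursive smoothing, for brevity; `rsi` is a hypothetical helper):

```python
# Sketch: RSI from gains/losses averaged over the last n changes.

def rsi(closes, n=14):
    gains, losses = [], []
    for prev, curr in zip(closes, closes[1:]):
        change = curr - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    avg_gain = sum(gains[-n:]) / n
    avg_loss = sum(losses[-n:]) / n
    if avg_loss == 0:
        return 100.0  # no losses in the window: maximally overbought
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

rising = [float(x) for x in range(1, 17)]  # strictly increasing closes
print(rsi(rising))  # → 100.0
```

A series alternating up and down by equal amounts gives equal average gain and loss, hence RSI = 50.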
True strength index (TSI) — It shows both trend direction and overbought/oversold conditions.

m_t = C_t − C_{t−1}   (96)
num_t = EMA_13(EMA_25(m))   (97)
den_t = EMA_13(EMA_25(|m|))   (98)
TSI_t = 100 × num_t / den_t   (99)
Ultimate Oscillator — A momentum oscillator designed to capture momentum across three different timeframes.

BP_t = C_t − min(L_t, C_{t−1})   (100)
TR_t = max(H_t, C_{t−1}) − min(L_t, C_{t−1})   (101)
Avg7_t = Σ_7 BP / Σ_7 TR   (102)
Avg14_t = Σ_14 BP / Σ_14 TR   (103)
Avg28_t = Σ_28 BP / Σ_28 TR   (104)
UO_t = 100 × (4·Avg7_t + 2·Avg14_t + Avg28_t) / 7   (105)
Stochastic Oscillator — Developed in the late 1950s by George Lane, the stochastic oscillator presents the location of the closing price of a stock relative to the high–low range of the stock's price over a period of time, typically 14 days.

L14_t = min_14(L)   (106)
H14_t = max_14(H)   (107)
%K_t = 100 × (C_t − L14_t) / (H14_t − L14_t)   (108)

Stochastic Oscillator Signal — It is the SMA of the stochastic oscillator, typically a 3-day SMA.

%D_t = MA_3(%K)   (109)
Williams %R — Developed by Larry Williams, Williams %R is a momentum indicator that is the inverse of the Fast Stochastic Oscillator. Also referred to as %R, Williams %R reflects the level of the close relative to the highest high for the lookback period; in contrast, the Stochastic Oscillator reflects the level of the close relative to the lowest low. %R corrects for the inversion by multiplying the raw value by −100. As a result, the Fast Stochastic Oscillator and Williams %R produce exactly the same line, only the scaling differs: Williams %R oscillates from −100 to 0. lbp = 14.

H14_t = max_{lbp}(H)   (110)
L14_t = min_{lbp}(L)   (111)
%R_t = −100 × (H14_t − C_t) / (H14_t − L14_t)   (112)
Awesome Oscillator — The Awesome Oscillator is an indicator used to measure market momentum. AO calculates the difference between a 34-period and a 5-period simple moving average; the simple moving averages are calculated not on closing prices but on each bar's midpoint. AO is generally used to affirm trends or to anticipate possible reversals.

MP_t = (H_t + L_t) / 2   (113)
AO_t = MA_5(MP) − MA_34(MP)   (114)