
# Forward Thinking: Building Deep Random Forests

Kevin Miller, Chris Hettinger, Jeffrey Humpherys, Tyler Jarvis, and David Kartchner
Department of Mathematics
Brigham Young University
Provo, Utah 84602
millerk5@byu.edu, hettinger@math.byu.edu, jeffh@math.byu.edu,
jarvis@math.byu.edu, david.kartchner@math.byu.edu

## 2 Mathematical description of forward thinking

The main idea of forward thinking is that neurons can be generalized to any type of learner and then, once trained, the input data are mapped forward through the layer to create a new learning problem. The process is then repeated, transforming the data through multiple layers, one at a time, rendering a new dataset, which is expected to be better behaved, and on which a final output layer can achieve good performance.

### The input layer

The data are given as the set of input values $\{x^{(0)}_i\}_{i=1}^N$ from a set $X^{(0)}$ and their corresponding outputs $\{y_i\}_{i=1}^N$ in a set $Y$. In many learning problems $X^{(0)} = \mathbb{R}^d$, which means that there are $d$ real-valued features. If the inputs are images, we can stack them as large vectors where each pixel is a component. In some deep learning problems, each input is a stack of images. For example, color images can be represented as three separate monochromatic images, or three separate channels of the image. For binary classification problems, the output space can be taken to be $Y = \{0, 1\}$. For multi-class problems with $k$ classes we often set $Y = \{1, \dots, k\}$.
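This vectorization can be sketched with NumPy; the array sizes below are the standard MNIST dimensions, and the variable names are illustrative, not from the paper.

```python
import numpy as np

# Illustrative sketch: stacking image inputs as feature vectors.
gray = np.zeros((28, 28))        # one monochromatic image
x_gray = gray.reshape(-1)        # 784 real-valued features, one per pixel

color = np.zeros((3, 28, 28))    # a stack of three channels (R, G, B)
x_color = color.reshape(-1)      # 3 * 784 = 2352 features
```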

### The first hidden layer

Let $C^{(1)} = \{C^{(1)}_1, \dots, C^{(1)}_{m_1}\}$ be a set of $m_1$ learning functions, $C^{(1)}_j : X^{(0)} \to Y^{(1)}$, for some codomain $Y^{(1)}$, each with its own parameters. This layer of learning functions (or learners) can be regression, classification, or kernel functions and can be thought of as defining new features. Let $X^{(1)} = (Y^{(1)})^{m_1}$ and transform the inputs to $X^{(1)}$ according to the map

$$x^{(1)}_i = \Bigl(C^{(1)}_1\bigl(x^{(0)}_i\bigr),\, C^{(1)}_2\bigl(x^{(0)}_i\bigr),\, \dots,\, C^{(1)}_{m_1}\bigl(x^{(0)}_i\bigr)\Bigr) \in X^{(1)}, \qquad i = 1, \dots, N.$$

This gives a new dataset $\{(x^{(1)}_i, y_i)\}_{i=1}^N$. In many learning problems $Y^{(1)} = [0, 1]$, in which case the new domain $X^{(1)} = [0,1]^{m_1}$ is a hypercube. It is also common for $Y^{(1)} = [0, \infty)$, in which case $X^{(1)}$ is the nonnegative $m_1$-dimensional orthant. The goal is to choose $C^{(1)}$ to make the new dataset "more separable," or better behaved, than the previous dataset. As we repeat this process iteratively, the data should become increasingly better behaved so that in the final layer, a single learner can finish the job.

Let $C^{(\ell)} = \{C^{(\ell)}_1, \dots, C^{(\ell)}_{m_\ell}\}$ be a set (layer) of learning functions. This layer is again trained on the data. This would usually be done in the same manner as the previous layer, but it need not be the same; for example, if the new layer consists of different kinds of learners, then the training method for the new layer might also need to differ. As with the first layer, the inputs are transformed to a new domain $X^{(\ell)}$ according to the map

$$x^{(\ell)}_i = \Bigl(C^{(\ell)}_1\bigl(x^{(\ell-1)}_i\bigr),\, C^{(\ell)}_2\bigl(x^{(\ell-1)}_i\bigr),\, \dots,\, C^{(\ell)}_{m_\ell}\bigl(x^{(\ell-1)}_i\bigr)\Bigr), \qquad i = 1, \dots, N.$$

This gives a new dataset, and the process is repeated.

#### Final layer

After passing the data through the last hidden layer, we train the final layer, which consists of a single learning function $C^{(L)}$, on the resulting dataset to determine the outputs $\hat{y}_i = C^{(L)}\bigl(x^{(L-1)}_i\bigr)$, where $\hat{y}_i$ is expected to be close to $y_i$ for each $i$.
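The layer-by-layer procedure above can be sketched with scikit-learn decision trees standing in for the learners $C^{(\ell)}_j$. The toy data, layer width, tree depth, and number of layers below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy binary-classification data: N = 200 samples with 10 features.
X, y = rng.normal(size=(200, 10)), rng.integers(0, 2, size=200)

def train_layer(X, y, n_learners=5):
    """Train one layer of learners, then map the data forward:
    each learner's class probabilities become the new features."""
    layer = [DecisionTreeClassifier(max_depth=3, random_state=j).fit(X, y)
             for j in range(n_learners)]
    X_new = np.hstack([c.predict_proba(X) for c in layer])
    return layer, X_new

# Train two hidden layers one at a time, transforming the data each time.
for _ in range(2):
    _, X = train_layer(X, y)

# Final layer: a single learner finishes the job on the transformed data.
final = DecisionTreeClassifier(random_state=0).fit(X, y)
y_hat = final.predict(X)
```

Each layer here emits 5 learners × 2 class probabilities = 10 new features, so the transformed dataset keeps shape (200, 10) while (ideally) becoming better behaved.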

## 3 Forward thinking deep random forest architecture

### 3.2 gcForest comparison

Unlike Zhou and Feng's architecture for gcForest, our deep architecture of decision trees only requires the previous layer's output. In ZhouF17, each layer passes both the class probabilities predicted by the random forests (not the individual decision trees) and the original data to each subsequent layer. Our model, on the other hand, passes only the output of the previous layer of individual decision trees to the next layer, which reduces the spatial complexity of network training and testing. Moreover, FTDRF seems to need fewer trees in each layer. For example, in our FTDRF described in Section 5 we obtained results comparable to ZhouF17 on MNIST while using substantially fewer decision trees in each layer, whereas ZhouF17 uses 4 random forests of 500 trees each (or 2,000 trees per layer). Another distinction is that our final routine uses information gain (entropy) to calculate node splits, whereas gcForest implements Gini impurity. We also ran some tests with Gini impurity to determine node splits, but found that entropy usually performed better.
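For illustration, scikit-learn (cited in the references) exposes both split criteria through a single parameter; this minimal sketch uses the iris data as an arbitrary stand-in classification task, not the paper's experiment.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# FTDRF-style splits: information gain (entropy) criterion.
entropy_tree = DecisionTreeClassifier(criterion="entropy",
                                      random_state=0).fit(X, y)

# gcForest-style splits: Gini impurity criterion.
gini_tree = DecisionTreeClassifier(criterion="gini",
                                   random_state=0).fit(X, y)
```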

### 3.3 Half-half random forest layers

As is standard in random forests, a node split in a given decision tree is determined from a random subset of the features of the input data passed to the layer. In a given layer, the collection of decision trees representing the layer contains both random decision trees and extra random trees, the latter to introduce more variety into the layer. This is similar to the layers of ZhouF17, where, of the 4 random forests in a given layer, 2 of them are completely random forests Liu_2008, closely related to extra random forests. An extra random forest increases tree randomization by choosing a random splitting value for each of the features in the subset used to determine the node split. In our scheme, we randomly assign trees to be of this type based on a Bernoulli draw with probability 1/2.
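A minimal sketch of such a half-half layer, using scikit-learn's `splitter="random"` as a stand-in for extra random trees; the tree count, toy data, and helper name are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 8)), rng.integers(0, 2, size=100)

def build_half_half_layer(X, y, n_trees=10, p_extra=0.5):
    """Build one layer whose trees are assigned, via a Bernoulli(p_extra)
    draw, to be either ordinary random decision trees or extra random
    trees (random split thresholds on the sampled features)."""
    layer = []
    for j in range(n_trees):
        extra = rng.random() < p_extra          # Bernoulli draw
        layer.append(DecisionTreeClassifier(
            splitter="random" if extra else "best",
            max_features="sqrt",   # random feature subset at each split
            random_state=j).fit(X, y))
    return layer

layer = build_half_half_layer(X, y)
```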

## 4 Preprocessing of image data

The decision tree structure of FTDRF requires sufficient training data to avoid overfitting in the first few layers. The state-of-the-art algorithms for dealing with image data in classification use preprocessing and transforming techniques, such as convolutions. Accordingly, we experimented with two of these techniques for FTDRF: a single-pixel "wiggle" and multi-grained scanning (MGS).

### 4.1 Single-pixel wiggle

For the dataset used here, we augmented the training data by a single-pixel "wiggle" technique. That is, for each training image in the MNIST training set, we include four copies of the image, shifted in the diagonal directions (up-left, up-right, down-left, and down-right) by one pixel; see Figure 2. This data augmentation yields the results seen in Table 1. A further way to augment the feature representation of the images is presented in Section 4.2, via a routine called Multi-Grained Scanning ZhouF17.
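The shift operation can be sketched as follows; the helper name and the zero-padding convention at the image border are assumptions, since the paper does not specify how edge pixels are handled.

```python
import numpy as np

def wiggle(img):
    """Return the four single-pixel diagonal shifts of an image
    (up-left, up-right, down-left, down-right), zero-padding edges."""
    shifts = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    out = []
    for dr, dc in shifts:
        shifted = np.zeros_like(img)
        # np.roll would wrap pixels around; slicing pads with zeros instead.
        src = img[max(0, -dr):img.shape[0] - max(0, dr),
                  max(0, -dc):img.shape[1] - max(0, dc)]
        shifted[max(0, dr):img.shape[0] - max(0, -dr),
                max(0, dc):img.shape[1] - max(0, -dc)] = src
        out.append(shifted)
    return out

img = np.arange(16.0).reshape(4, 4)
augmented = wiggle(img)   # four shifted copies of the original image
```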

### 4.2 Multi-grained scanning (MGS)

In ZhouF17, a scheme similar to convolution is proposed, termed Multi-Grained Scanning (MGS), which we implemented for the FTDRF architecture. We use the exact same process that Zhou and Feng do in their MGS scheme ZhouF17 so as to be able to compare the results of our architecture in the subsequent network structure FTDRF. We view this MGS process as a preprocessing transformation akin to the convolutions of convolutional neural networks (CNNs), with the benefits and strengths that such transformations provide.

In MGS, windows of a set number of sizes are obtained inside the training set images (for the MNIST dataset the window sizes are 7×7, 9×9, and 14×14). For a given window size, the corresponding windows contained inside of all training images are used as a training set to construct a random forest and an extra random forest whose outputs are the class probabilities. Unlike our routine for building the FTDRF layers, which outputs the class probabilities determined by each individual decision tree in the layer, this scheme outputs the class probabilities determined by the whole random forest. Hence, for a given window size, the output of the random forest for each image window is a vector of the class probabilities. For all samples fed through these random forests, the outputs of all image windows are concatenated together to produce a feature vector representing the classification probabilities of each of the windows (see Figure 3). With the window sizes specified, the outputs of each of the random forests for the respective window sizes are all concatenated together. This feature vector is the new representation of each given sample fed through the MGS process. With this transformation of the training data (and subsequently the testing data), we train the FTDRF layers as previously described in Section 3.
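The raw scanning step can be sketched as below; the stride, the zero test image, and the function name are illustrative assumptions, and the random forests that score each window are omitted.

```python
import numpy as np

def extract_windows(img, size, stride=1):
    """Slide a size x size window over the image and return the
    flattened window contents (the raw scanning step of MGS)."""
    H, W = img.shape
    return np.array([img[r:r + size, c:c + size].ravel()
                     for r in range(0, H - size + 1, stride)
                     for c in range(0, W - size + 1, stride)])

# A 28x28 MNIST image scanned with a 7x7 window at stride 1 yields
# (28 - 7 + 1)^2 = 484 windows of 49 pixel features each; in MGS each
# window is then scored by a random forest and an extra random forest,
# and the per-window class probabilities are concatenated.
img = np.zeros((28, 28))
wins = extract_windows(img, size=7)
```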

## 5 FTDRF results on MNIST

We present results for an FTDRF on the MNIST handwritten digit recognition dataset, where each sample is a 28×28 black and white image of an isolated digit, written by different people. The dataset is split into a training set with 60,000 (see note below) samples and a testing set of 10,000 samples.

### 5.1 Results with single-pixel wiggle

For each training image, we created four more images via the single-pixel wiggling technique to augment the size of the training data. The layers of the FTDRF contained a mix of random decision trees and extra random trees. Layers were grown until the relative gain fell below a set threshold. Node splits were determined by calculating the information gain (entropy). We cite our results and the results of Zhou and Feng ZhouF17 to compare, as their architecture is most relevant to ours. We note, however, that Zhou and Feng do not augment the data in this test. The results are reported in Table 1.

### 5.2 Results with MGS

Table 2 presents the results of our architecture compared to Zhou's gcForest ZhouF17, including the MGS preprocessing routine. Note that here we do not augment the dataset with the single-pixel augmentation as we did previously. In this test, window sizes of 7, 9, and 14 were used for the MGS step, creating a total of 6 random forests (3 random forests and 3 extra random forests) to transform the data for the FTDRF training. Then, the training data were passed through to the FTDRF step, where layers consisted of both random decision trees and extra random trees, but in this step fewer layers were necessary to achieve the desired relative validation error threshold. The results are reported in Table 2.

## 6 Related work

### Reproducibility

All Python code used to produce our results is available in our GitHub repository at https://github.com/tkchris93/ForwardThinking.

#### Acknowledgments

This work was supported in part by the National Science Foundation, Grant Number 1323785, and the Defense Threat Reduction Agency, Grant Number HDTRA1-15-0049.

## References

• [1] Gerard Biau, Erwan Scornet, and Johannes Welbl. Neural random forests.
• [2] Leo Breiman. Random forests. Machine Learning, 45:5–32, 2001.
• [3] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. CoRR, abs/1603.02754, 2016.
• [4] Pierre Geurts, Damien Ernst, and Louis Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, 2006.
• [5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
• [6] Chris Hettinger, Tanner Christensen, Ben Ehlert, Jeffrey Humpherys, Tyler Jarvis, and Sean Wade. Forward thinking: Building and training neural networks one layer at a time. 2017. Preprint.
• [7] Torsten Hothorn, Kurt Hornik, and Achim Zeileis. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3):651–674, 2006.
• [8] Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulò. Deep neural decision forests. June 2016.
• [9] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
• [10] Yann LeCun, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. Efficient backprop. In Neural Networks: Tricks of the Trade, This Book is an Outgrowth of a 1996 NIPS Workshop, pages 9–50, London, UK, 1998. Springer-Verlag.
• [11] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324, 1998.
• [12] Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits. Accessed: 2017-05-19.
• [13] Fei Tony Liu, Kai Ming Ting, Yang Yu, and Zhi-Hua Zhou. Spectrum of variable-random trees. J. Artif. Int. Res., 32(1):355–384, May 2008.
• [14] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
• [15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
• [16] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA, October 2013. Association for Computational Linguistics.
• [17] Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323–332, 2012.
• [18] Alexander Statnikov, Lily Wang, and Constantin F. Aliferis. A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinformatics, 9(1):319, 2008.
• [19] Christian Wolf. Random forests v. deep learning. 2016.
• [20] David H. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992.
• [21] Zhi-Hua Zhou. Ensemble Methods: Foundations and Algorithms. Chapman and Hall/CRC, 1st edition, 2012.
• [22] Zhi-Hua Zhou and Ji Feng. Deep forest: Towards an alternative to deep neural networks. CoRR, abs/1702.08835, 2017.