
# Forward Thinking: Building Deep Random Forests

Kevin Miller, Chris Hettinger, Jeffrey Humpherys, Tyler Jarvis, and David Kartchner
Department of Mathematics
Brigham Young University
Provo, Utah 84602
millerk5@byu.edu, hettinger@math.byu.edu, jeffh@math.byu.edu,
jarvis@math.byu.edu, david.kartchner@math.byu.edu

## 2 Mathematical description of forward thinking

The main idea of forward thinking is that neurons can be generalized to any type of learner and then, once trained, the input data are mapped forward through the layer to create a new learning problem. The process is then repeated, transforming the data through multiple layers, one at a time, and rendering a new data set that is expected to be better behaved and on which a final output layer can achieve good performance.

### The input layer

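The data are given as the set $\{x_i^{(0)}\}_{i=1}^N$ of input values $x_i^{(0)}$ from a set $X^{(0)}$ and their corresponding outputs $\{y_i\}_{i=1}^N$ in a set $\mathcal{Y}$. In many learning problems, $X^{(0)} \subset \mathbb{R}^d$, which means that there are $d$ real-valued features. If the inputs are images, we can stack them as large vectors where each pixel is a component. In some deep learning problems, each input is a stack of images. For example, color images can be represented as three separate monochromatic images, or three separate channels of the image. For binary classification problems, the output space can be taken to be a two-element set such as $\mathcal{Y} = \{-1, 1\}$. For multi-class problems with $k$ classes we often set $\mathcal{Y} = \{1, 2, \dots, k\}$.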
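A minimal sketch of how image inputs can be stacked into feature vectors, assuming images are given as nested Python lists of pixel intensities. The helper names `flatten_image` and `flatten_channels`, and the choice to concatenate whole channels one after another (rather than interleaving pixels), are illustrative assumptions, not taken from the authors' code.

```python
from typing import List

Image = List[List[float]]  # rows of pixel intensities (assumed layout)


def flatten_image(image: Image) -> List[float]:
    """Stack a monochrome image row by row into one vector; each pixel is a component."""
    return [pixel for row in image for pixel in row]


def flatten_channels(channels: List[Image]) -> List[float]:
    """Stack a multi-channel image (e.g. R, G, B) by concatenating its flattened channels.

    Channel-by-channel ordering is an assumption made for this sketch.
    """
    return [pixel for channel in channels for pixel in flatten_image(channel)]


if __name__ == "__main__":
    gray = [[0.0, 1.0], [0.5, 0.25]]
    print(flatten_image(gray))                      # [0.0, 1.0, 0.5, 0.25]
    print(len(flatten_channels([gray, gray, gray])))  # 12
```

Any fixed ordering works, as long as the same ordering is used for training and test images.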

### The first hidden layer

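Let $C^{(1)} = \{C_1^{(1)}, C_2^{(1)}, \dots, C_{m_1}^{(1)}\}$ be a set of learning functions, $C_j^{(1)} : X^{(0)} \to X_j^{(1)}$, for some codomain $X_j^{(1)}$ with parameters $\theta_j^{(1)}$. This layer of learning functions (or learners) can be regression, classification, or kernel functions and can be thought of as defining new features. Let $X^{(1)} = X_1^{(1)} \times X_2^{(1)} \times \cdots \times X_{m_1}^{(1)}$ and transform the inputs $x_i^{(0)} \in X^{(0)}$ to $X^{(1)}$ according to the map

$$x_i^{(1)} = \left(C_1^{(1)}(x_i^{(0)}), C_2^{(1)}(x_i^{(0)}), \dots, C_{m_1}^{(1)}(x_i^{(0)})\right) \in X^{(1)}, \quad i = 1, \dots, N.$$

This gives a new data set $\{(x_i^{(1)}, y_i)\}_{i=1}^N$. In many learning problems, each $X_j^{(1)}$ is a bounded interval such as $[-1, 1]$, in which case the new domain $X^{(1)}$ is a hypercube. It is also common for $X_j^{(1)} = [0, \infty)$, in which case $X^{(1)}$ is the $m_1$-dimensional orthant. The goal is to choose $C^{(1)}$ to make the new data set “more separable,” or better behaved, than the previous data set. As we repeat this process iteratively, the data should become increasingly better behaved so that in the final layer, a single learner can finish the job.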

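Let $C^{(\ell)} = \{C_1^{(\ell)}, C_2^{(\ell)}, \dots, C_{m_\ell}^{(\ell)}\}$ be a set (layer) of learning functions $C_j^{(\ell)} : X^{(\ell-1)} \to X_j^{(\ell)}$. This layer is again trained on the data $\{(x_i^{(\ell-1)}, y_i)\}_{i=1}^N$. This would usually be done in the same manner as for the previous layer, but it need not be; for example, if the new layer consists of different kinds of learners, then the training method for the new layer might also need to differ. As with the first layer, the inputs are transformed to a new domain $X^{(\ell)} = X_1^{(\ell)} \times \cdots \times X_{m_\ell}^{(\ell)}$ according to the map

$$x_i^{(\ell)} = \left(C_1^{(\ell)}(x_i^{(\ell-1)}), C_2^{(\ell)}(x_i^{(\ell-1)}), \dots, C_{m_\ell}^{(\ell)}(x_i^{(\ell-1)})\right), \quad i = 1, \dots, N.$$

This gives a new data set $\{(x_i^{(\ell)}, y_i)\}_{i=1}^N$, and the process is repeated.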
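The layer map above can be sketched in a few lines. This is a minimal illustration, assuming each trained learner $C_j^{(\ell)}$ is a plain Python callable returning one real number; in FTDRF each learner is a decision tree whose output is a vector of class probabilities, which would be concatenated rather than stored as a single entry. The name `map_forward` is hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A trained learner C_j^{(l)}; in this sketch it maps a feature vector to one real number.
Learner = Callable[[Sequence[float]], float]


def map_forward(layer: List[Learner], data: List[Sequence[float]]) -> List[Vector]:
    """Compute x_i^{(l)} = (C_1(x_i^{(l-1)}), ..., C_m(x_i^{(l-1)})) for every sample i."""
    return [[learner(x) for learner in layer] for x in data]


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 0.5]]  # two samples x_i^{(l-1)} with two features each
    layer = [sum, max, min]          # three stand-in "learners" C_1, C_2, C_3
    print(map_forward(layer, data))  # [[3.0, 2.0, 1.0], [3.5, 3.0, 0.5]]
```

The new data set has one feature per learner, so the width $m_\ell$ of the layer, not the width of the original input, determines the dimension of $X^{(\ell)}$.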

#### Final layer

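After passing the data through the last hidden layer $C^{(n)}$, we train the final layer, which consists of a single learning function $F : X^{(n)} \to \mathcal{Y}$, on the data set $\{(x_i^{(n)}, y_i)\}_{i=1}^N$ to determine the outputs $\hat{y}_i = F(x_i^{(n)})$, where $\hat{y}_i$ is expected to be close to $y_i$ for each $i$.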
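The whole procedure (train a layer, freeze it, map the data forward, repeat, then fit $F$) can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions: a "layer trainer" is any hypothetical function that takes the current data and labels and returns a list of trained callables, and the toy trainers `mean_threshold_layer` and `majority_vote` are stand-ins for real learners such as decision trees, not the authors' method.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Learner = Callable[[Sequence[float]], float]
Predictor = Callable[[Sequence[float]], int]
LayerTrainer = Callable[[List[Vector], List[int]], List[Learner]]
FinalTrainer = Callable[[List[Vector], List[int]], Predictor]


def forward_thinking_fit(
    X: List[Vector], y: List[int], layer_trainers: List[LayerTrainer], final_trainer: FinalTrainer
) -> Tuple[List[List[Learner]], Predictor]:
    """Train hidden layers one at a time, mapping the data forward after each, then fit F."""
    layers: List[List[Learner]] = []
    data = [list(x) for x in X]
    for train_layer in layer_trainers:
        layer = train_layer(data, y)                  # fit C^{(l)} on {(x_i^{(l-1)}, y_i)}
        data = [[c(x) for c in layer] for x in data]  # x_i^{(l)}; layer l is frozen from now on
        layers.append(layer)
    final = final_trainer(data, y)                    # fit F on {(x_i^{(n)}, y_i)}
    return layers, final


def forward_thinking_predict(layers: List[List[Learner]], final: Predictor, x: Sequence[float]) -> int:
    """Push one new input through the frozen hidden layers, then apply F."""
    for layer in layers:
        x = [c(x) for c in layer]
    return final(x)


def mean_threshold_layer(data: List[Vector], labels: List[int]) -> List[Learner]:
    """Toy layer trainer (hypothetical): one stump per feature, split at the column mean."""
    layer: List[Learner] = []
    for j in range(len(data[0])):
        t = sum(x[j] for x in data) / len(data)
        layer.append(lambda x, j=j, t=t: 1.0 if x[j] > t else 0.0)  # bind j, t now
    return layer


def majority_vote(data: List[Vector], labels: List[int]) -> Predictor:
    """Toy final trainer (hypothetical): predict 1 when at least half the inputs are 1."""
    return lambda x: int(sum(x) >= len(x) / 2)


if __name__ == "__main__":
    X, y = [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1]
    layers, F = forward_thinking_fit(X, y, [mean_threshold_layer] * 2, majority_vote)
    print([forward_thinking_predict(layers, F, x) for x in X])  # [0, 0, 1, 1]
```

Because earlier layers are frozen, each training step only ever sees one layer's inputs and outputs; no gradients or updates flow back through previous layers.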

## 3 Forward thinking deep random forest architecture

### 3.2 gcForest comparison

Unlike Zhou and Feng’s architecture for gcForest, our deep architecture of decision trees requires only the previous layer’s output. In ZhouF17, each layer passes both the class probabilities predicted by the random forests (not by the individual decision trees) and the original data to each subsequent layer. Our model, on the other hand, passes only the output of the previous layer’s individual decision trees to the next layer, which reduces the space (memory) complexity of network training and testing. Moreover, FTDRF seems to need fewer trees in each layer: the FTDRF described in Section 5 obtained results comparable to ZhouF17 on MNIST while using fewer decision trees per layer than ZhouF17, which uses 4 random forests in each layer. Another distinction is that our final routine uses information gain (entropy) to calculate node splits, whereas gcForest uses Gini impurity. We also ran some tests using Gini impurity to determine node splits, but found that entropy usually performed better.
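To make the entropy-versus-Gini distinction concrete, the sketch below computes both impurity measures for a node and the impurity decrease of a candidate split; with entropy, that decrease is the information gain. This is a generic illustration, not code from the FTDRF repository. (Libraries such as scikit-learn expose the same choice through the `criterion="entropy"` and `criterion="gini"` options of their tree classifiers.)

```python
import math
from collections import Counter
from typing import Callable, Sequence

Impurity = Callable[[Sequence[int]], float]


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the class distribution at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of the class distribution at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(parent: Sequence[int], left: Sequence[int], right: Sequence[int], impurity: Impurity) -> float:
    """Impurity decrease of a split; with `entropy` this is the information gain."""
    n = len(parent)
    children = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - children


if __name__ == "__main__":
    parent, left, right = [0, 0, 1, 1, 1, 1], [0, 0, 1], [1, 1, 1]
    print(split_gain(parent, left, right, entropy))  # information gain of this split
    print(split_gain(parent, left, right, gini))     # Gini decrease of the same split
```

A tree builder evaluates `split_gain` for each candidate split and keeps the largest; the two criteria usually rank splits similarly but can disagree on close calls.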

### 3.3 Half-half random forest layers

As is standard in random forests, a node split in a given decision tree is determined from a random subset of the features of the input data passed to the layer. In a given layer, the collection of decision trees representing the layer contains both random decision trees and extra random trees, to introduce more variety into the layer. This is similar to the layers of ZhouF17, where half of the random forests in a given layer are completely random forests Liu_2008, which are closely related to extra random forests. An extra random forest increases tree randomization by choosing a random splitting value for each feature in the random feature subset to determine the node split. In our scheme, we randomly assign trees to be of this type based on a Bernoulli draw with probability $1/2$.
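A minimal sketch of the two random ingredients described above, assuming labels are integers and each sample is a list of floats. `assign_tree_types` performs the per-tree Bernoulli draw; `extra_random_split` follows the usual extra-trees recipe: draw a random feature subset of size `n_sub` (the subset size is an assumed parameter), draw one uniformly random threshold per feature between that feature's minimum and maximum at the node, and keep the candidate with the largest information gain. An ordinary random tree would instead search over all thresholds of each sampled feature. Both function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the labels at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def assign_tree_types(n_trees: int, p_extra: float, rng: random.Random) -> List[str]:
    """One Bernoulli(p_extra) draw per tree: 'extra' (extra random) or 'random' (ordinary)."""
    return ["extra" if rng.random() < p_extra else "random" for _ in range(n_trees)]


def extra_random_split(
    X: List[Sequence[float]], y: List[int], n_sub: int, rng: random.Random
) -> Tuple[int, float, float]:
    """Return (feature index, threshold, information gain) of the best random candidate."""
    if n_sub < 1:
        raise ValueError("n_sub must be at least 1")
    n = len(y)
    best: Optional[Tuple[int, float, float]] = None
    for j in rng.sample(range(len(X[0])), n_sub):       # random feature subset
        column = [x[j] for x in X]
        t = rng.uniform(min(column), max(column))       # ONE random threshold per feature
        left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
        right = [yi for xi, yi in zip(X, y) if xi[j] > t]
        gain = entropy(y) - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if best is None or gain > best[2]:
            best = (j, t, gain)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    print(assign_tree_types(8, 0.5, rng))  # a roughly half-half mix of tree types
    X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
    print(extra_random_split(X, [0, 0, 1, 1], 2, rng))
```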

## 4 Preprocessing of image data

The decision tree structure of FTDRF requires sufficient training data to avoid overfitting in the first few layers. State-of-the-art algorithms for image classification rely on preprocessing and transformation techniques, such as convolutions. Accordingly, we experimented with two such techniques for FTDRF: a single-pixel “wiggle” and multi-grained scanning (MGS).

### 4.1 Single-pixel wiggle

For the data set used here, we augmented the training data with a single-pixel “wiggle” technique. That is, for each training image in the MNIST training set, we include copies of the image shifted by one pixel in each of the four diagonal directions (up-left, up-right, down-left, and down-right); see Figure 2. This data augmentation yields the results shown in Table 1. A further way to augment the feature representation of the images, a routine called Multi-Grained Scanning ZhouF17, is presented in Section 4.2.
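A minimal sketch of the single-pixel wiggle, assuming images are nested lists of intensities and that pixels vacated by a shift are filled with 0 (the MNIST background value). The helper names and the fill value are assumptions for illustration, not details taken from the authors' code.

```python
from typing import List, Tuple

Image = List[List[float]]

# (row shift, column shift): up-left, up-right, down-left, down-right
DIAGONALS: List[Tuple[int, int]] = [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def shift(image: Image, dr: int, dc: int, fill: float = 0.0) -> Image:
    """Move the content dr rows down and dc columns right; vacated pixels get `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            sr, sc = r - dr, c - dc  # source pixel that lands at (r, c)
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = image[sr][sc]
    return out


def wiggle(image: Image) -> List[Image]:
    """The four diagonal one-pixel shifts of an image."""
    return [shift(image, dr, dc) for dr, dc in DIAGONALS]


def augment(images: List[Image], labels: List[int]) -> Tuple[List[Image], List[int]]:
    """Keep each original image and add its four wiggled copies with the same label."""
    X: List[Image] = []
    Y: List[int] = []
    for img, label in zip(images, labels):
        for version in [img] + wiggle(img):
            X.append(version)
            Y.append(label)
    return X, Y


if __name__ == "__main__":
    img = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    print(shift(img, 1, 1))  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    X, Y = augment([img], [7])
    print(len(X), Y)         # 5 [7, 7, 7, 7, 7]
```

Each training image therefore contributes five samples, so the augmented training set is five times the size of the original.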

### 4.2 Multi-grained scanning (MGS)

In ZhouF17, a scheme similar to convolution is proposed, termed Multi-Grained Scanning (MGS), which we implemented for the FTDRF architecture. We use exactly the same process as Zhou and Feng do in their MGS scheme ZhouF17, so that we can compare the results of our architecture in the subsequent network structure, FTDRF. We view this MGS process as a preprocessing transformation akin to the convolutions of convolutional neural networks (CNNs), with the benefits and strengths that such transformations provide.

In MGS, windows of several fixed sizes are extracted from the training set images (for the MNIST data set, the window sizes are 7, 9, and 14). For a given window size, the corresponding windows contained in all training images are used as a training set to construct a random forest and an extra random forest whose outputs are class probabilities. Unlike our routine for building the FTDRF layers, which outputs the class probabilities determined by each individual decision tree in the layer, this scheme outputs the class probabilities determined by the whole random forest. Hence, for a given window size, the output of a random forest for each image window is a vector of class probabilities. For every sample fed through these random forests, the outputs for all of its image windows are concatenated to produce a feature vector of the classification probabilities of each window (see Figure 3). The outputs of the random forests for all of the specified window sizes are then concatenated together. This feature vector is the new representation of each sample fed through the MGS process. With this transformation of the training data (and subsequently the testing data), we train the FTDRF layers as described in Section 3.
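The MGS transformation can be sketched as below, assuming square $k \times k$ windows slid with stride 1 and forests that have already been trained on the windows of the training images (each represented here as a hypothetical callable that maps a flattened window to a class-probability vector). The stride, the concatenation order, and the names `windows`, `mgs_transform`, and `mgs_length` are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Sequence

Image = List[List[float]]
# A trained forest: flattened window -> class-probability vector (hypothetical stand-in).
ProbaFn = Callable[[Sequence[float]], List[float]]


def windows(image: Image, k: int, stride: int = 1) -> List[List[float]]:
    """All k x k windows of the image, each flattened row by row."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(0, h - k + 1, stride):
        for c in range(0, w - k + 1, stride):
            out.append([image[r + i][c + j] for i in range(k) for j in range(k)])
    return out


def mgs_transform(image: Image, forests_by_size: Dict[int, List[ProbaFn]]) -> List[float]:
    """Concatenate every forest's class probabilities over every window, for every size."""
    features: List[float] = []
    for k, forests in forests_by_size.items():
        for forest in forests:
            for win in windows(image, k):
                features.extend(forest(win))
    return features


def mgs_length(side: int, sizes: List[int], forests_per_size: int, n_classes: int, stride: int = 1) -> int:
    """Length of the MGS feature vector for a square side x side image."""
    total = 0
    for k in sizes:
        per_axis = (side - k) // stride + 1
        total += per_axis ** 2 * forests_per_size * n_classes
    return total


if __name__ == "__main__":
    # MNIST-like setting: 28 x 28 images, window sizes 7, 9, 14,
    # one random forest and one extra random forest per size, 10 classes.
    print(mgs_length(28, [7, 9, 14], 2, 10))  # 22180
```

Under these assumptions each 784-pixel MNIST image becomes a 22,180-dimensional vector of class probabilities, which is the input to the first FTDRF layer.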

## 5 FTDRF results on MNIST

We present results for an FTDRF on the MNIST handwritten digit recognition data set, where each sample is a $28 \times 28$ black-and-white image of an isolated digit, written by different people. The data set is split into a training set of 60,000 (see note below) samples and a testing set of 10,000 samples.

### 5.1 Results with single-pixel wiggle

For each training image, we created four more images via the single-pixel wiggle technique to augment the size of the training data. Each layer of the FTDRF contained a mix of random decision trees and extra random trees. Layers were added until the relative gain fell below a fixed threshold. Node splits were determined by calculating the information gain (entropy). We report our results alongside those of Zhou and Feng ZhouF17 for comparison, as their architecture is the most relevant to ours. We note, however, that Zhou and Feng do not augment the data in this test. The results are shown in Table 1.
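The layer-growing rule can be sketched as a simple loop. This sketch assumes the relative gain is measured on a validation score such as accuracy (Section 5.2 refers to a relative validation error threshold) and uses a hypothetical callback `add_layer_and_score` that trains one more layer and returns the new validation score; the threshold value in the usage line is illustrative only, not the value used in the experiments.

```python
from typing import Callable, List


def grow_layers(add_layer_and_score: Callable[[], float], tol: float, max_layers: int = 50) -> List[float]:
    """Add layers until the relative validation gain drops below `tol`.

    Returns the validation score recorded after each added layer. The last layer
    added is the one whose gain fell below `tol`; a caller may choose to discard it.
    """
    scores = [add_layer_and_score()]
    while len(scores) < max_layers:
        score = add_layer_and_score()
        previous = max(scores[-1], 1e-12)  # guard against division by zero
        gain = (score - scores[-1]) / previous
        scores.append(score)
        if gain < tol:
            break
    return scores


if __name__ == "__main__":
    # Stand-in validation accuracies for successive layers (made up for illustration).
    fake_scores = iter([0.90, 0.95, 0.96, 0.961, 0.99])
    print(grow_layers(lambda: next(fake_scores), tol=0.005))  # [0.9, 0.95, 0.96, 0.961]
```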

### 5.2 Results with MGS

Table 2 presents the results of our architecture compared to Zhou and Feng’s gcForest ZhouF17, including the MGS preprocessing routine. Note that in this test we do not augment the data set with the single-pixel wiggle as we did previously. Window sizes of 7, 9, and 14 were used for the MGS step, creating a total of 6 random forests (3 random forests and 3 extra random forests) to transform the data for FTDRF training. The transformed training data were then passed to the FTDRF step, whose layers again consisted of random decision trees and extra random trees; in this step, however, only a small number of layers were necessary to achieve the desired relative validation error threshold. The results are shown in Table 2.

## 6 Related work

### Reproducibility

All Python code used to produce our results is available in our GitHub repository at https://github.com/tkchris93/ForwardThinking.

#### Acknowledgments

This work was supported in part by the National Science Foundation, Grant Number 1323785, and the Defense Threat Reduction Agency, Grant Number HDRTA1-15-0049.

