# Crime prediction through urban metrics and statistical learning

###### Abstract

Understanding the causes of crime is a longstanding issue on researchers’ agenda. While it is a hard task to extract causality from data, several linear models have been proposed to predict crime through the existing correlations between crime and urban metrics. However, because of non-Gaussian distributions and multicollinearity in urban indicators, it is common to find controversial conclusions about the influence of some urban indicators on crime. Machine learning ensemble-based algorithms are well suited to handling such problems. Here, we use a random forest regressor to predict crime and quantify the influence of urban indicators on homicides. Our approach can have up to of accuracy on crime prediction, and the importance of urban indicators is ranked and clustered in groups of equal influence, which are robust under slight changes in the data sample analyzed. Our results determine the rank of importance of urban indicators to predict crime, unveiling that unemployment and illiteracy are the most important variables for describing homicides in Brazilian cities. We further believe that our approach helps in producing more robust conclusions regarding the effects of urban indicators on crime, with potential applications for guiding public policies for crime control.


Journal: Physica A

## 1 Introduction

Social phenomena increasingly attract the attention of physicists, driven by the successful application of methods from statistical physics to modeling and describing several social systems Pentland (2014); Castellano et al. (2009); D’Orsogna and Perc (2015). In the particular case of crime, this interest traces back to the work of Quetelet, who coined the term “social physics” in the 19th century Quetlet (1869). On the one hand, traditional physics methods have proved useful in understanding phenomena outside conventional physics Galam (2012); Conte et al. (2012). On the other hand, several problems from physics have recently been addressed through the lens of machine learning methods, including topics related to phases of matter Carrasquilla and Melko (2017), the quantum many-body problem Carleo and Troyer (2017), phase transitions van Nieuwenburg et al. (2017), and phases of strongly correlated fermions Ch’ng et al. (2017), among others. As physicists have added these new tools to their toolbox Zdeborová (2017), social physics problems can naturally also be addressed using such ideas. In particular, in this work, we are interested in understanding the relationships between crime and urban metrics by using statistical learning.

Crime and violence are ubiquitous in society. Throughout history, organized societies have tried to prevent crime following several approaches Gordon et al. (2009). In this context, understanding the features associated with crime is essential for achieving effective policies against these illegal activities. Studies have linked crime with several factors, including psychological traits Kamaluddin et al. (2015); Gottfredson and Hirschi (1990), environmental conditions Gamble and Hess (2012); Hsiang et al. (2013), spatial patterns Short et al. (2008); Alves et al. (2015); D’Orsogna and Perc (2015), and social and economic indicators Becker (1968); Ehrlich (1973); Wilson and Kelling (1982); Glaeser et al. (1996). However, it is easy to find controversial explanations for the causes of crimes Gordon (2010). Methodological problems in data aggregation and selection Levitt (2001); Spelman (2008), errors related to data reporting Maltz and Targonski (2002), and incorrect statistical hypotheses Gordon (2010) are just a few of the issues that can lead to misleading conclusions.

A large fraction of the literature on statistical analysis in criminology tries to relate the number of occurrences of a particular crime (e.g., robbery) to explicative variables such as unemployment Raphael and Winter-Ebmer (2001) and income Kelly (2000). In general, these analyses are carried out using ordinary least-squares linear regressions Alves et al. (2015). These standard linear models usually assume weak exogeneity of the predictors (error-free variables), linearity, constant variance (homoscedasticity), normally distributed residuals, and lack of multicollinearity. However, when modeling crime, several of these assumptions are often not satisfied, and conclusions about the factors affecting crime are then likely to be misleading.

Recently, researchers have made impressive progress in the analysis of cities; one of the main findings is that the relationship between urban metrics and population size is not linear but is well described by a power-law function Bettencourt et al. (2007, 2010); Alves et al. (2013a, b, 2014); Hanley et al. (2016); Leitao et al. (2016). Crime indicators scale as a superlinear function of the population size of cities Bettencourt et al. (2010); Alves et al. (2013b, 2015). Other indicators (commonly used as predictors in linear regression models for crime forecasting) also exhibit power-law behavior with population size. These metrics are categorized as sublinear (e.g., family income Alves et al. (2013b)), linear (e.g., sanitation Alves et al. (2013b)), or superlinear (e.g., GDP Bettencourt et al. (2007); Alves et al. (2013b, 2015)), depending on the power-law exponent characterizing their allometric relationship with population size Bettencourt et al. (2007). In addition, the relationships of crime and of other urban metrics with population size display some degree of heteroscedasticity Bettencourt et al. (2007), and most of these urban indicators also follow heavy-tailed distributions Marsili and Zhang (1998); Alves et al. (2014). Thus, it is not surprising to find controversial results about the importance of variables for crime prediction when so many assumptions of linear regressions are not satisfied.
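As a minimal sketch of how such an exponent is typically estimated, the snippet below fits $Y = Y_0 N^{\beta}$ by ordinary least squares on log-transformed data and labels the regime. The synthetic cities, the function names, and the ±0.05 tolerance band used to call an exponent "linear" are illustrative assumptions; in practice one would compare $\beta$ with 1 through its confidence interval.

```python
import numpy as np


def fit_scaling_exponent(population, indicator):
    """Least-squares fit of log Y = log Y0 + beta * log N; returns (Y0, beta)."""
    log_n = np.log(np.asarray(population, dtype=float))
    log_y = np.log(np.asarray(indicator, dtype=float))
    beta, log_y0 = np.polyfit(log_n, log_y, deg=1)  # polyfit returns the slope first
    return np.exp(log_y0), beta


def scaling_regime(beta, tol=0.05):
    """Label an allometry; the +/- tol band around 1 is an arbitrary choice."""
    if beta < 1.0 - tol:
        return "sublinear"
    if beta > 1.0 + tol:
        return "superlinear"
    return "linear"


# Synthetic cities (hypothetical numbers, not the Brazilian census data).
rng = np.random.default_rng(0)
population = 10 ** rng.uniform(3, 7, size=2000)
homicides = 1e-4 * population ** 1.15 * rng.lognormal(0.0, 0.3, size=2000)

y0, beta = fit_scaling_exponent(population, homicides)
print(f"Y0 = {y0:.2e}, beta = {beta:.3f} -> {scaling_regime(beta)}")
```

With real data, cities with zero counts (common for homicides in small towns) must be excluded or handled separately before taking logarithms.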

A possible approach to overcome some of these issues is to transform the data so that they satisfy the assumptions of linear regressions. For instance, Bettencourt et al. Bettencourt et al. (2010) (see also Alves et al. (2013b, 2015)) employed scale-adjusted metrics to linearize the data and provide a fair comparison between cities with different population sizes. By considering these variables and applying corrections for heteroscedasticity Davidson and MacKinnon (1993), it is possible to describe of the variance of the number of homicides as a function of urban metrics Alves et al. (2013b). Also, using the same approach, researchers have shown that simple linear models account for – of the observed variance in the data and correctly reproduce the average of the scale-adjusted metric Alves et al. (2015). However, the data still have collinearities, which can lead to misinterpretation of the coefficients of the linear models Alves et al. (2013b, 2015).
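Scale-adjusted metrics are the residuals of the log-log scaling fit, $\xi_i = \ln[Y_i / (Y_0 N_i^{\beta})]$. The sketch below, on hypothetical synthetic indicators with independent noise, illustrates how removing the shared population trend turns a strong correlation between log-indicators into a near-zero correlation between their scale-adjusted counterparts. It shows the idea only and is not the cited authors' code.

```python
import numpy as np


def scale_adjusted_metrics(population, indicator):
    """SAMIs: residuals of the log-log scaling fit, i.e. ln(Y / (Y0 * N**beta))."""
    log_n = np.log(np.asarray(population, dtype=float))
    log_y = np.log(np.asarray(indicator, dtype=float))
    beta, log_y0 = np.polyfit(log_n, log_y, deg=1)
    return log_y - (log_y0 + beta * log_n)


# Two hypothetical indicators that share the population trend but have independent noise.
rng = np.random.default_rng(1)
population = 10 ** rng.uniform(3, 7, size=2000)
homicides = 1e-4 * population ** 1.15 * rng.lognormal(0.0, 0.3, size=2000)
income = 5e2 * population ** 0.9 * rng.lognormal(0.0, 0.3, size=2000)

raw_corr = np.corrcoef(np.log(homicides), np.log(income))[0, 1]
sami_corr = np.corrcoef(scale_adjusted_metrics(population, homicides),
                        scale_adjusted_metrics(population, income))[0, 1]
print(f"log-scale correlation: {raw_corr:.2f}; SAMI correlation: {sami_corr:.2f}")
```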

A better approach to crime prediction is the use of statistical learning methods (e.g., Kang and Kang (2017)). Regression models based on machine learning can handle all the above-mentioned issues and are more suitable for the analysis of large, complex datasets Breiman (2003). For instance, decision trees are known to require little data preparation when performing regression Breiman (2001); Hastie et al. (2013); James et al. (2014); Death and Fabricius (2000). Tree-based approaches are also considered non-parametric methods because they make no assumptions about the distribution of the data or the functional form of the relationships. Among other advantages, these learning approaches capture non-linear relationships well, usually display good accuracy when predicting data, and are easy to interpret Breiman (2001); Hastie et al. (2013); James et al. (2014); Death and Fabricius (2000).

Here, we use the random forest algorithm to predict crime and to quantify the importance of urban indicators for crime prediction. We use urban indicators of all Brazilian cities to train the model and study the conditions necessary to prevent underfitting and overfitting. After training the model, we show that the algorithm predicts the number of homicides in cities with an accuracy of up to of the variance explained. Because of the high accuracy and easy interpretation of this ensemble tree model, we can identify the features that are important for homicide prediction. We further show that, unlike in linear models, the importance of the features is stable under slight changes in the dataset, and that these results can be used as a guide for crime modeling and by policymakers.

## 2 Methods and Results

### 2.1 Data

For our analysis, we choose the number of homicides at the city level as the crime indicator to be predicted. Homicide is the ultimate expression of violence against a person, and thus a reliable crime indicator because it is almost always reported. In Brazil, reporting this particular crime to the Public Health System is compulsory, and these data are aggregated at the city level and made freely available by the Department of Informatics of the Brazilian Public Health System (DATASUS) Brazil’s Public healthcare System (SUS) (2017). As possible predictor variables of crime, we select 10 urban indicators (also at the city level) available from the Brazilian National Census of 2000. They are: child labor (fraction of the population aged 10 to 15 years who are working or looking for work), elderly population (citizens aged 60 years or older), female population, gross domestic product (GDP), illiteracy (citizens aged 15 years or older who are unable to read and write at least a simple note in the language they know), family income (average household income of family residents, in Brazilian currency), male population, population, sanitation (number of houses that have piped water and sewerage), and unemployment (citizens aged 16 years or older who are without work and looking for work). We also consider the number of traffic accidents and suicides as possible crime-predicting variables (these data are also aggregated at the city level and refer to the census year). Together with the number of homicides in 2000, which we include to test for autocorrelation in the crime indicator, our dataset thus contains 13 indicators from the year 2000 that are used as predictors of the number of homicides 10 years later, in 2010.
We choose a time interval of 10 years because the characteristic time for changes in social indicators has been estimated to be of the order of decades Bettencourt et al. (2010); Alves et al. (2015).

### 2.2 Problems with usual linear models

Many statistical models try to relate crime to possible explicative variables through least-squares linear regressions Gordon (2010). Thus, it is usually assumed that crime is a linear function of the explicative variables, i.e., crime rate $= f(\text{explicative variables})$, where $f$ is a linear function Gordon (2010). In the context of our dataset, a naive approach to model homicides in Brazilian cities is to consider the following linear regression

$$
y = \beta_0 + \beta_1 h + \sum_{i=1}^{12} \beta_{i+1} x_i + \epsilon, \tag{1}
$$

where the dependent variable $y$ is the number of homicides in the year 2010, $h$ is the number of homicides in 2000, $x_i$ is the $i$-th ($i = 1, \dots, 12$) urban indicator in 2000, $\beta_j$ ($j = 0, \dots, 13$) are the linear coefficients, and $\epsilon$ is a normally distributed random noise (with zero mean and unit variance) that accounts for unobserved determinants of homicides.

The linear regression of Eq. 1 requires the residuals to follow a normal distribution. However, this condition is not satisfied, as shown by the quantile-quantile plot in Fig. 1A and confirmed by the Kolmogorov-Smirnov normality test (p-value ). Researchers usually address this problem by applying some transformation to the data. This may solve the normality problem, but collinearities may still exist among urban indicators, and ignoring them can lead to controversial results, even when linear models display high predictive accuracy.
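The sketch below fits Eq. 1 by least squares with NumPy and applies a Kolmogorov-Smirnov test to the standardized residuals with SciPy. The data are synthetic and hypothetical (twelve indicators scaling with a heavy-tailed population and Poisson-distributed homicide counts), chosen only to mimic the features discussed above; because the mean and variance are estimated from the residuals themselves, the KS p-value is approximate.

```python
import numpy as np
from scipy import stats


def fit_eq1(h, X, y):
    """Least-squares fit of y = b0 + b1*h + sum_i b_{i+1} x_i + eps (Eq. 1)."""
    design = np.column_stack([np.ones(len(y)), h, X])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    return coef, y - design @ coef


def ks_normality_pvalue(residuals):
    """KS test of standardized residuals against N(0, 1).

    Mean and standard deviation are estimated from the same data, so the
    p-value is only approximate (a Lilliefors-type test would be stricter).
    """
    z = (residuals - residuals.mean()) / residuals.std(ddof=1)
    return stats.kstest(z, "norm").pvalue


# Hypothetical data: 3000 "cities", 12 indicators scaling with a heavy-tailed population.
rng = np.random.default_rng(2)
n = 3000
population = np.exp(rng.normal(9.5, 1.2, size=n))
exponents = rng.uniform(0.8, 1.2, size=12)
X = 1e-2 * population[:, None] ** exponents * rng.lognormal(0.0, 0.3, size=(n, 12))
h = rng.poisson(1e-4 * population ** 1.15).astype(float)  # homicides in "2000"
y = rng.poisson(1e-4 * population ** 1.15 * rng.lognormal(0.0, 0.5, size=n)).astype(float)

coef, residuals = fit_eq1(h, X, y)
print(f"{coef.size} coefficients; KS p-value = {ks_normality_pvalue(residuals):.1e}")
```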

According to the urban scaling hypothesis Bettencourt et al. (2007, 2010); Alves et al. (2013b, 2015); Leitao et al. (2016); Hanley et al. (2016), the urban indicators in our dataset depend on the population size. Thus, each indicator $Y$ can be written in terms of the population size $N$ as $Y = Y_0 N^{\beta}$, where $\beta$ is the urban scaling exponent, a condition that introduces correlations among all indicators. In regression models, this property is called multicollinearity, and we can check for it in our data by evaluating the Pearson correlation coefficient between all pairs of urban indicators, as shown in Fig. 1B. From this correlation matrix, we verify that there are significant correlations between practically all pairs of variables, violating another condition assumed by most regression models.
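A small numerical illustration of this mechanism, on hypothetical data: twelve synthetic indicators that each follow $\log Y_k = \log Y_{0,k} + \beta_k \log N + \text{noise}$ end up strongly correlated with one another even though their noise terms are independent. As an extra diagnostic not used in the text, the sketch also computes variance inflation factors from the inverse of the correlation matrix, using the standard identity $\mathrm{VIF}_j = (R^{-1})_{jj}$.

```python
import numpy as np

# Hypothetical indicators obeying log Y_k = log Y0_k + beta_k * log N + noise.
rng = np.random.default_rng(3)
n_cities, n_indicators = 5000, 12
log_n = rng.normal(9.5, 1.2, size=n_cities)
betas = rng.uniform(0.8, 1.2, size=n_indicators)
log_y0 = rng.uniform(-5.0, 0.0, size=n_indicators)
log_y = log_y0 + log_n[:, None] * betas + rng.normal(0.0, 0.4, size=(n_cities, n_indicators))

corr = np.corrcoef(log_y, rowvar=False)  # 12 x 12 Pearson correlation matrix
off_diagonal = corr[~np.eye(n_indicators, dtype=bool)]

# Variance inflation factors: VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of corr^{-1}.
vif = np.diag(np.linalg.inv(corr))
print(f"min pairwise correlation = {off_diagonal.min():.2f}, max VIF = {vif.max():.1f}")
```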

If we ignore the assumptions required by linear regression models and perform the least-squares fit, we may still obtain good predictions. In fact, the linear model of Eq. 1 explains of the variance in our dataset. This is an appealing result, since the coefficients of a linear model are easy to interpret. However, the feature importance quantified by regression models can be very sensitive to changes in the dataset, such as those introduced by undersampling or bootstrapping methods Efron and Tibshirani (1994). For instance, by bootstrapping our dataset and applying the linear model of Eq. 1, we find that (depending on the sample used) city income can be positively correlated, negatively correlated, or even uncorrelated with homicides. These facts could explain some of the inconsistencies reported in the literature about crime. For instance, Entorf and Spengler Entorf and Spengler (2000) found that higher income and urbanization are related to higher crime rates, whereas Fajnzylber et al. Fajnzylber et al. (2002) claimed that average income is not correlated with violent crime, but that higher urbanization is associated with higher robbery rates and not with homicides. The same disagreements occur for unemployment, punishment, and deterrence, among other indicators Gordon et al. (2009); Alves et al. (2013b, 2015). We could further explore other inconsistencies of this linear regression approach. Instead, we now focus on an alternative approach that overcomes these limitations and yields a ranking of features that remains meaningful under slight changes in the dataset.
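The bootstrap experiment can be sketched as follows, under assumed toy data: the target depends on population only, and a hypothetical "income" column is nearly collinear with population. Refitting ordinary least squares on resampled rows shows how the sign of the income coefficient can change from sample to sample; the setup is illustrative and does not reproduce the paper's dataset or the full Eq. 1.

```python
import numpy as np


def bootstrap_ols(X, y, n_boot=500, seed=0):
    """Refit OLS (with intercept) on bootstrap resamples of the rows."""
    rng = np.random.default_rng(seed)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coefs = np.empty((n_boot, design.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cities with replacement
        coefs[b] = np.linalg.lstsq(design[idx], y[idx], rcond=None)[0]
    return coefs


# Toy data: the target depends on population only, while a hypothetical
# "income" column is almost collinear with population.
rng = np.random.default_rng(4)
n = 300
population = rng.lognormal(9.5, 1.0, size=n)
income = population * rng.normal(1.0, 0.02, size=n)
y = 1e-4 * population + rng.normal(0.0, 2.0, size=n)

coefs = bootstrap_ols(np.column_stack([population, income]), y)
income_coef = coefs[:, 2]
print(f"income coefficient > 0 in {np.mean(income_coef > 0):.0%} of bootstrap fits")
```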

### 2.3 Random forest algorithm

Random forest is an ensemble learning method for classification or regression that fits several decision trees on various sub-samples of the dataset and aggregates their individual predictions into a final output, which reduces overfitting Breiman (1996, 2001); Hastie et al. (2013); James et al. (2014); Death and Fabricius (2000). In random forest regression, several independent decision trees are constructed from bootstrap samples of the dataset, and the final output is the average of the predictions of the individual trees (estimators); majority voting is the aggregation rule used in classification. In addition, each split considers only a random subset of the features (“feature bagging”, in the spirit of the random subspace method Ho (1998)), so that the metrics that best describe the data splits are selected more often and, consequently, carry more weight in the aggregated prediction. Unlike usual linear models, the random forest is invariant under monotonic transformations of the feature values, such as scaling. It is also robust to the inclusion of irrelevant features and produces very accurate predictions Hastie et al. (2013). These properties make the random forest algorithm especially suitable for crime forecasting, given the multicollinearity and non-linearities present in urban data.
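The aggregation rule for regression can be checked directly in scikit-learn (a minimal sketch on synthetic data, not the paper's pipeline): the forest prediction equals the mean of the predictions of its fitted trees, exposed as `estimators_`. The value `max_features=0.5` is an illustrative assumption that enables per-split random feature subsets; the regressor's default considers all features at each split.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic, heavy-tailed stand-in for urban indicators (not the paper's data).
rng = np.random.default_rng(5)
X = rng.lognormal(0.0, 1.0, size=(500, 5))
y = 3.0 * np.log(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(0.0, 0.5, size=500)

# bootstrap=True: each tree is fitted on its own bootstrap sample of the rows.
# max_features=0.5: each split considers a random half of the features (illustrative).
forest = RandomForestRegressor(n_estimators=50, max_features=0.5,
                               bootstrap=True, random_state=0).fit(X, y)

per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
averaged = per_tree.mean(axis=0)
print(np.allclose(averaged, forest.predict(X)))  # regression output = mean over trees
```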

A common question that arises when using machine learning algorithms concerns underfitting and overfitting. These behaviors appear when seeking the best trade-off between bias and variance errors James et al. (2014). Bias is related to erroneous assumptions made by the algorithm that cause the model to miss relevant information about the features describing the data. In other words, the model is too simple to describe the behavior of the data, a situation called underfitting. On the other hand, if we increase the complexity of the model by adding a large number of parameters, the model can become quite sensitive to the noise in the training set, increasing the variance of the outputs, a situation called overfitting. Random forest has two main parameters controlling the trade-off between bias and variance errors: the number of trees and the maximum depth of each tree. If a forest has only a few trees, each only a few layers of nodes deep, we are likely underfitting the data. However, if the trees split the data into a large number of nodes to make their decisions, there is a chance that we are making choices based on the noise in the data rather than on the true underlying behavior. Adding more trees, by contrast, does not by itself cause the forest to overfit Breiman (2001).

To determine the set of parameters that avoids underfitting and overfitting, we use stratified k-fold cross-validation Kohavi et al. (1995); Hastie et al. (2013) with to estimate the validation curves over a range of values of the number of trees (Fig. 2A) and of the maximum depth (Fig. 2B). If the number of trees and the maximum depth are both slightly smaller than 10, we find that the training and validation scores are small. As we increase the number of trees and the maximum depth, both training and cross-validation scores increase and reach constant plateaus, which indicates that there is no overfitting up to trees or up to a maximum depth equal to .
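A validation curve of this kind can be sketched with scikit-learn's `validation_curve`, here for the maximum depth on synthetic, hypothetical data. Two assumptions differ from the text: k = 5, and a shuffled `KFold` splitter instead of a stratified one, because scikit-learn's `StratifiedKFold` requires class labels and rejects a continuous target. Setting `param_name="n_estimators"` gives the curve for the number of trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, validation_curve

# Synthetic stand-in data (hypothetical); replace with the urban indicators.
rng = np.random.default_rng(6)
X = rng.lognormal(0.0, 1.0, size=(400, 6))
y = np.log(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.3, size=400)

# Plain shuffled K-fold: StratifiedKFold expects class labels, not a continuous target.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
depths = [1, 2, 4, 8, 16]
train_scores, valid_scores = validation_curve(
    RandomForestRegressor(n_estimators=50, random_state=0),
    X, y,
    param_name="max_depth",
    param_range=depths,
    cv=cv,
    scoring="r2",
)
for depth, tr, va in zip(depths, train_scores.mean(axis=1), valid_scores.mean(axis=1)):
    print(f"max_depth={depth:>2}: train R2 = {tr:.3f}, validation R2 = {va:.3f}")
```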

Another question concerns how much data we need to train our model. This question is relevant because adding unnecessary data may introduce more noise into the model. To answer it, we again apply stratified k-fold cross-validation with to estimate the learning curves for a range of random fractions of the dataset (Fig. 2C). We observe that the more data used to train the model, the better the cross-validated scores. However, we already obtain good cross-validated scores with of the data.
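Similarly, a learning curve can be sketched with scikit-learn's `learning_curve`, which refits the model on growing random subsets of each training fold. The data, the number of folds, and the model settings below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, learning_curve

rng = np.random.default_rng(7)
X = rng.lognormal(0.0, 1.0, size=(400, 6))  # hypothetical stand-in data
y = np.log(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.3, size=400)

sizes, train_scores, valid_scores = learning_curve(
    RandomForestRegressor(n_estimators=50, max_depth=10, random_state=0),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # fractions of each training fold
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
    shuffle=True,                          # draw a random subset at each size
    random_state=0,
)
for size, va in zip(sizes, valid_scores.mean(axis=1)):
    print(f"{int(size):>3} training cities: cross-validated R2 = {va:.3f}")
```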

### 2.4 Predicting crime with the random forest regressor

We have to tune the model by searching for the combination of parameters that maximizes the performance of the random forest algorithm. The previous analysis of the validation and learning curves helps to build a grid of parameters in which to search for the combination that improves our accuracy. To find the best combination of number of trees and maximum depth, we use a grid-search algorithm with stratified k-fold cross-validation (with ) as implemented in the Python library scikit-learn Pedregosa et al. (2011). This procedure exhaustively searches a specified grid of parameter values for the combination that optimizes the score. For our data, we find that the best accuracy (on average) is achieved for 200 trees and a maximum depth equal to 100.
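A minimal grid search along these lines, assuming scikit-learn's `GridSearchCV`, a shuffled `KFold` splitter, and synthetic data. The grid values are illustrative only and are not the grid used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(8)
X = rng.lognormal(0.0, 1.0, size=(300, 6))  # hypothetical stand-in data
y = np.log(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.3, size=300)

# Illustrative grid; in practice it is built from the validation and learning curves.
param_grid = {"n_estimators": [50, 200], "max_depth": [4, 16, 100]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
)
search.fit(X, y)
print(search.best_params_, f"mean CV R2 = {search.best_score_:.3f}")
```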

With the model properly trained, we can now use the random forest regressor to make crime predictions. Randomly splitting the data into for training and for testing, we obtain up to of accuracy in our predictions, with an average adjusted $R^2$ equal to . Figure 3A shows the empirical data versus the random forest predictions for one realization (one data split) of the algorithm. Because the dataset is split randomly, different runs can return different outputs and different scores. Figure 3B depicts the probability distribution (computed via kernel density estimation) of the adjusted $R^2$ for 100 different splits, where we observe that the values are concentrated around the peak at .
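The repeated-split evaluation can be sketched as follows. The adjusted coefficient of determination is $R^2_{\mathrm{adj}} = 1 - (1 - R^2)(n - 1)/(n - p - 1)$, with $n$ samples and $p$ predictors; computing it on the test set, the 80/20 split, the 20 repetitions, and the synthetic data are all assumptions of this sketch. The density estimate uses SciPy's `gaussian_kde`.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


rng = np.random.default_rng(9)
X = rng.lognormal(0.0, 1.0, size=(400, 6))  # hypothetical stand-in data
y = np.log(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.3, size=400)

scores = []
for seed in range(20):  # the paper uses 100 splits; 20 keeps this sketch fast
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
    r2 = r2_score(y_te, model.predict(X_te))
    scores.append(adjusted_r2(r2, n_samples=len(y_te), n_features=X.shape[1]))
scores = np.array(scores)

# Kernel density estimate of the score distribution and the location of its peak.
kde = gaussian_kde(scores)
grid = np.linspace(scores.min(), scores.max(), 200)
peak = grid[np.argmax(kde(grid))]
print(f"adjusted R2: mean = {scores.mean():.3f}, KDE peak = {peak:.3f}")
```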

### 2.5 Features importance

The ranking of feature (urban indicator) importance is another important question for a better understanding of crime. As previously discussed, usual linear regression models have found different answers to this question Gordon et al. (2009); Entorf and Spengler (2000); Fajnzylber et al. (2002). Here, we use the random forest algorithm to identify the most important urban indicators for predicting crime and test whether this ranking of features is robust under slight changes in the dataset.

The importance of a feature in a tree can be computed by assessing its contribution to the decision process. In each node of a single tree, the variable that yields the largest improvement in the squared-error risk is used to split the region associated with that node into two subregions Hastie et al. (2013). Thus, Breiman et al. Breiman et al. (1984) proposed that the relative importance of a feature $x_\ell$ in a tree can be calculated as the sum of the improvements over all internal nodes of the tree where $x_\ell$ is used for splitting. The generalization of this measure to the random forest is the average importance of the feature over all trees in the model, as implemented in the Python library scikit-learn Pedregosa et al. (2011).
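In the notation of Hastie et al., the importance of feature $x_\ell$ in a tree $T$ with $J-1$ internal nodes is $\mathcal{I}^2_\ell(T) = \sum_{t=1}^{J-1} \hat{\imath}^2_t \, \mathbb{1}(v(t) = \ell)$, where $\hat{\imath}^2_t$ is the improvement at node $t$ and $v(t)$ is its splitting variable. The sketch below recomputes this quantity from the internal arrays of fitted scikit-learn trees (`tree_.impurity`, `tree_.weighted_n_node_samples`, `tree_.children_left`, `tree_.children_right`, `tree_.feature`) and checks it against `feature_importances_`. As in scikit-learn, each tree's importances are normalized to sum to one before averaging; the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def tree_importance(tree):
    """Breiman et al.-style importance of a fitted scikit-learn regression tree:
    sum of the weighted squared-error decreases at the nodes split on each feature."""
    t = tree.tree_
    importance = np.zeros(tree.n_features_in_)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf: no split, no improvement
            continue
        improvement = (t.weighted_n_node_samples[node] * t.impurity[node]
                       - t.weighted_n_node_samples[left] * t.impurity[left]
                       - t.weighted_n_node_samples[right] * t.impurity[right])
        importance[t.feature[node]] += improvement
    return importance / importance.sum()  # normalize within the tree


def forest_importance(forest):
    """Average the per-tree importances over all trees, then renormalize."""
    mean = np.mean([tree_importance(tree) for tree in forest.estimators_], axis=0)
    return mean / mean.sum()


rng = np.random.default_rng(10)
X = rng.normal(size=(500, 4))  # hypothetical features
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.5, size=500)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

print(np.round(forest_importance(forest), 3))
print(np.allclose(forest_importance(forest), forest.feature_importances_))
```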

We use this metric to calculate the importance of the features describing the number of homicides in our data. Because of the slight modifications in the dataset caused by the splits into training and test samples, the feature importance varies for each realization of the algorithm. To verify whether these changes affect the importance rank of the features, we calculate the importance of the features for 100 different samples and draw box-plots ranked by the median importance across these samples (Fig. 4A). We check whether the differences in importance are statistically significant by computing the p-values of Student's t-test (which tests whether two samples have identical average values) with Bonferroni corrections (which correct the p-values for multiple comparisons Rupert Jr (2012)). The resulting p-values are shown in Fig. 4B, ranked by the median in correspondence with the box-plot. From this figure, we note that unemployment is the most important feature to describe crime, followed by illiteracy and male population. The next urban indicators in order of importance form groups of indistinguishable importance (squares along the diagonal of the matrix in Fig. 4B); i.e., the fourth most important feature is actually a group that includes female population, population, and sanitation. This group is followed by three other groups: child labor and homicides (at the fifth position), traffic accidents and elderly population (at the sixth position), and suicides and income (at the seventh position). The least important feature turns out to be the GDP of cities. It is worth remarking that, despite fluctuations in the importance values caused by the different samples used to train the model, the ranking of features remains the same. This result demonstrates the robustness of the random forest algorithm for detecting the importance of features.
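The pairwise significance test can be sketched as below, assuming a matrix of importance values with one row per train/test split and one column per feature. Each pair of features is compared with Student's t-test (SciPy's `ttest_ind` with its default `equal_var=True`), and the Bonferroni correction multiplies each p-value by the number of comparisons, capped at 1. The importance samples here are hypothetical draws, not the values behind Fig. 4.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_ind


def bonferroni_pairwise_pvalues(samples):
    """samples: (n_runs, n_features) importances from repeated train/test splits.
    Returns a symmetric matrix of Bonferroni-corrected Student's t-test p-values."""
    n_features = samples.shape[1]
    pairs = list(combinations(range(n_features), 2))
    pvalues = np.ones((n_features, n_features))
    for i, j in pairs:
        p = ttest_ind(samples[:, i], samples[:, j]).pvalue  # equal_var=True: Student's test
        pvalues[i, j] = pvalues[j, i] = min(1.0, p * len(pairs))
    return pvalues


# Hypothetical importance samples for 4 features over 100 runs (not the paper's values).
rng = np.random.default_rng(11)
samples = rng.normal([0.40, 0.25, 0.24, 0.11], 0.02, size=(100, 4))

order = np.argsort(-np.median(samples, axis=0))  # rank features by median importance
pvalues = bonferroni_pairwise_pvalues(samples)[np.ix_(order, order)]
print("ranking:", order)
print(np.round(pvalues, 3))
```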

## 3 Discussion and conclusions

The accurate predictions obtained through statistical learning suggest that crime is strongly related to urban indicators. The easy interpretation and good accuracy of the random forest algorithm show that this model is an excellent solution for predicting crime and identifying the importance of features, even under small perturbations of the training dataset. We cannot assert from the ranking in Fig. 4A which features contribute positively or negatively to the number of homicides. We could try to decompose the contribution of each feature in each node of the trees and average over all nodes and trees to identify the direction of these contributions, as described by Hastie et al. Hastie et al. (2013). However, this decomposition is problematic because features can contribute differently depending on the thresholds imposed by the constraints of the sample chosen to train the model. Thus, the signs of the contributions of a particular feature can vary for different thresholds, even when the average rank of importance remains the same Breiman et al. (1984).

Because unemployment has a very large variability at the local level Levitt (2001), Blumstein Blumstein (2002) argues that different data aggregations can lead to different conclusions on whether this indicator affects crime. There is also evidence supporting the idea that crime is affected by unemployment only when this indicator exceeds a given threshold Alves et al. (2013b). This is somewhat similar to what the random forest does when it partitions the feature space by thresholds and assigns the number of homicides according to the different values of the urban indicators. The algorithm cannot indicate whether unemployment contributes positively to crime; however, it indicates that this indicator is the most important for describing crime among all the features in our dataset. In particular, a recent work has shown that the rise in school shootings is related to the unemployment rate across different geographic aggregation levels (national, regional, and city) Pah et al. (2017), which agrees with our findings.

The second most important feature for describing crime is illiteracy, which has been associated with violence in other works Davis et al. (1999). Previous works on scale-adjusted metrics have also shown that the levels of illiteracy are correlated with the number of homicides Alves et al. (2013b, 2015). A report by the Canadian Association of Chiefs of Police further shows that people with low literacy skills are less likely to be involved in group activities than those with higher literacy skills Literacy and policing project of the Canadian association of chiefs of police (2008). Consequently, people with low literacy often feel isolated and vulnerable, making them more likely to become involved in violence and crime Literacy and policing project of the Canadian association of chiefs of police (2008). The male population is the third most important feature for describing homicides and has also been linked to high levels of violence Hesketh and Xing (2006); Alves et al. (2013b, 2015). As discussed by Hesketh and Xing Hesketh and Xing (2006), a surplus of male population increases marginalization in society and is linked to antisocial behavior and violence. Finally, it is worth noting that unemployment, illiteracy, and male population together are responsible for explaining of the variance in our dataset.

We believe that applying machine learning to identify urban indicators that correlate with crime can help settle the discussion about whether an indicator is important for describing a particular crime type. The results of our analysis can further be used as a guide for building other crime models and may help policymakers in the search for better strategies for reducing crime. Indeed, our results indicate that unemployment and illiteracy levels play an important role in describing the number of homicides in Brazil.

## Acknowledgments

L.G.A.A. acknowledges FAPESP (Grant No. 2016/16987-7) for financial support. H.V.R. acknowledges CNPq (Grant No. 440650/2014-3) for financial support. F.A.R. acknowledges CNPq (Grant No. 307748/2016-2) and FAPESP (Grant No. 2016/25682-5 and Grant No. 13/07375-0) for financial support.

## References

- Pentland (2014) A. Pentland, Social Physics: How Good Ideas Spread – The Lessons from a New Science, EBL-Schweitzer, Scribe Publications Pty Limited, 2014.
- Castellano et al. (2009) C. Castellano, S. Fortunato, V. Loreto, Statistical physics of social dynamics, Reviews of modern physics 81 (2009) 591.
- D’Orsogna and Perc (2015) M. R. D’Orsogna, M. Perc, Statistical physics of crime: A review, Physics Of Life Reviews 12 (2015) 1–21.
- Quetlet (1869) A. Quetlet, Physique sociale, Bachelier, Paris, 1869.
- Galam (2012) S. Galam, Sociophysics: a physicist’s modeling of psycho-political phenomena, Springer Science & Business Media, 2012.
- Conte et al. (2012) R. Conte, N. Gilbert, G. Bonelli, C. Cioffi-Revilla, G. Deffuant, J. Kertesz, V. Loreto, S. Moat, J.-P. Nadal, A. Sanchez, et al., Manifesto of computational social science, European Physical Journal Special Topics 214 (2012) 325–346.
- Carrasquilla and Melko (2017) J. Carrasquilla, R. G. Melko, Machine learning phases of matter, Nature Physics (2017).
- Carleo and Troyer (2017) G. Carleo, M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355 (2017) 602–606.
- van Nieuwenburg et al. (2017) E. P. van Nieuwenburg, Y.-H. Liu, S. D. Huber, Learning phase transitions by confusion, Nature Physics 13 (2017) 435–439.
- Ch’ng et al. (2017) K. Ch’ng, J. Carrasquilla, R. G. Melko, E. Khatami, Machine learning phases of strongly correlated fermions, Physical Review X 7 (2017) 031038.
- Zdeborová (2017) L. Zdeborová, Machine learning: New tool in the box, Nature Physics 13 (2017) 420–421.
- Gordon et al. (2009) M. B. Gordon, J. R. Iglesias, V. Semeshenko, J.-P. Nadal, Crime and punishment: the economic burden of impunity, The European Physical Journal B 68 (2009) 133–144.
- Kamaluddin et al. (2015) M. R. Kamaluddin, N. Shariff, A. Othman, K. H. Ismail, G. A. M. Saat, Linking psychological traits with criminal behaviour: A review, ASEAN Journal of Psychiatry 16 (2015) 13–25.
- Gottfredson and Hirschi (1990) M. R. Gottfredson, T. Hirschi, A General Theory Of Crime, Stanford University Press, 1990.
- Gamble and Hess (2012) J. L. Gamble, J. J. Hess, Temperature and violent crime in Dallas, Texas: relationships and implications of climate change, Western Journal Of Emergency Medicine 13 (2012) 239.
- Hsiang et al. (2013) S. M. Hsiang, M. Burke, E. Miguel, Quantifying the influence of climate on human conflict, Science 341 (2013) 1235367.
- Short et al. (2008) M. B. Short, M. R. D’Orsogna, V. B. Pasour, G. E. Tita, P. J. Brantingham, A. L. Bertozzi, L. B. Chayes, A statistical model of criminal behavior, Mathematical Models and Methods in Applied Sciences 18 (2008) 1249–1267.
- Alves et al. (2015) L. G. A. Alves, E. K. Lenzi, R. S. Mendes, H. V. Ribeiro, Spatial correlations, clustering and percolation-like transitions in homicide crimes, EPL 111 (2015) 18002.
- Becker (1968) G. S. Becker, Crime and punishment: An economic approach, in: The Economic Dimensions Of Crime, Springer, 1968, pp. 13–68.
- Ehrlich (1973) I. Ehrlich, The deterrent effect of capital punishment: A question of life and death, 1973.
- Wilson and Kelling (1982) J. Q. Wilson, G. L. Kelling, Broken windows, Atlantic Monthly 249 (1982) 29–38.
- Glaeser et al. (1996) E. L. Glaeser, B. Sacerdote, J. A. Scheinkman, Crime and social interactions, The Quarterly Journal of Economics 111 (1996) 507–548.
- Gordon (2010) M. B. Gordon, A random walk in the literature on criminality: A partial and critical view on some statistical analyses and modelling approaches, European Journal of Applied Mathematics 21 (2010) 283–306.
- Levitt (2001) S. D. Levitt, Alternative strategies for identifying the link between unemployment and crime, Journal Of Quantitative Criminology 17 (2001) 377–390.
- Spelman (2008) W. Spelman, Specifying the relationship between crime and prisons, Journal of Quantitative Criminology 24 (2008) 149–178.
- Maltz and Targonski (2002) M. D. Maltz, J. Targonski, A note on the use of county-level UCR data, Journal of Quantitative Criminology 18 (2002) 297–318.
- Raphael and Winter-Ebmer (2001) S. Raphael, R. Winter-Ebmer, Identifying the effect of unemployment on crime, The Journal of Law and Economics 44 (2001) 259–283.
- Kelly (2000) M. Kelly, Inequality and crime, The Review of Economics and Statistics 82 (2000) 530–539.
- Alves et al. (2015) L. G. A. Alves, R. S. Mendes, E. K. Lenzi, H. V. Ribeiro, Scale-adjusted metrics for predicting the evolution of urban indicators and quantifying the performance of cities, PLoS ONE 10 (2015) e0134862.
- Bettencourt et al. (2007) L. M. Bettencourt, J. Lobo, D. Helbing, C. Kühnert, G. B. West, Growth, innovation, scaling, and the pace of life in cities, Proceedings Of The National Academy Of Sciences 104 (2007) 7301–7306.
- Bettencourt et al. (2010) L. M. Bettencourt, J. Lobo, D. Strumsky, G. B. West, Urban scaling and its deviations: Revealing the structure of wealth, innovation and crime across cities, PLoS ONE 5 (2010) e13541.
- Alves et al. (2013a) L. G. A. Alves, H. V. Ribeiro, R. S. Mendes, Scaling laws in the dynamics of crime growth rate, Physica A 392 (2013a) 2672–2679.
- Alves et al. (2013b) L. G. A. Alves, H. V. Ribeiro, E. K. Lenzi, R. S. Mendes, Distance to the scaling law: a useful approach for unveiling relationships between crime and urban metrics, PLoS ONE 8 (2013b) e69580.
- Alves et al. (2014) L. G. A. Alves, H. V. Ribeiro, E. K. Lenzi, R. S. Mendes, Empirical analysis on the connection between power-law distributions and allometries for urban indicators, Physica A 409 (2014) 175–182.
- Hanley et al. (2016) Q. S. Hanley, D. Lewis, H. V. Ribeiro, Rural to urban population density scaling of crime and property transactions in English and Welsh parliamentary constituencies, PLoS ONE 11 (2016) e0149546.
- Leitao et al. (2016) J. C. Leitao, J. M. Miotto, M. Gerlach, E. G. Altmann, Is this scaling nonlinear?, Royal Society Open Science 3 (2016) 150649.
- Marsili and Zhang (1998) M. Marsili, Y.-C. Zhang, Interacting individuals leading to Zipf’s law, Physical Review Letters 80 (1998) 2741.
- Davidson and MacKinnon (1993) R. Davidson, J. G. MacKinnon, Estimation and inference in econometrics, Oxford University Press, 1993.
- Kang and Kang (2017) H.-W. Kang, H.-B. Kang, Prediction of crime occurrence from multi-modal data using deep learning, PLoS ONE 12 (2017) e0176244.
- Breiman (2003) L. Breiman, Statistical modeling: The two cultures, Quality Control and Applied Statistics 48 (2003) 81–82.
- Breiman (2001) L. Breiman, Random forests, Machine Learning 45 (2001) 5–32.
- Hastie et al. (2013) T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning: Data mining, inference, and prediction, Springer Series in Statistics, Springer New York, 2013.
- James et al. (2014) G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning: with Applications in R, Springer Texts in Statistics, Springer New York, 2014.
- Death and Fabricius (2000) G. De’ath, K. E. Fabricius, Classification and regression trees: a powerful yet simple technique for ecological data analysis, Ecology 81 (2000) 3178–3192.
- Brazil’s Public Healthcare System (SUS) (2017) Brazil’s Public Healthcare System (SUS), Department of Data Processing (DATASUS), 2017. Accessed: 2017-06-01.
- Efron and Tibshirani (1994) B. Efron, R. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 1994.
- Entorf and Spengler (2000) H. Entorf, H. Spengler, Socioeconomic and demographic factors of crime in Germany: Evidence from panel data of the German states, International Review of Law and Economics 20 (2000) 75–106.
- Fajnzylber et al. (2002) P. Fajnzylber, D. Lederman, N. Loayza, What causes violent crime?, European Economic Review 46 (2002) 1323–1357.
- Breiman (1996) L. Breiman, Bagging predictors, Machine Learning 24 (1996) 123–140.
- Ho (1998) T. K. Ho, The random subspace method for constructing decision forests, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 832–844.
- Kohavi et al. (1995) R. Kohavi, et al., A study of cross-validation and bootstrap for accuracy estimation and model selection, in: IJCAI, volume 14, Stanford, CA, 1995, pp. 1137–1145.
- Pedregosa et al. (2011) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
- Breiman et al. (1984) L. Breiman, J. Friedman, C. J. Stone, R. A. Olshen, Classification and regression trees, CRC press, 1984.
- Rupert Jr (2012) R. G. Miller Jr, Simultaneous statistical inference, Springer Science & Business Media, 2012.
- Blumstein (2002) A. Blumstein, Crime modeling, Operations Research 50 (2002) 16–24.
- Pah et al. (2017) A. Pah, J. Hagan, A. Jennings, A. Jain, K. Albrecht, A. Hockenberry, L. Amaral, Economic insecurity and the rise in gun violence at us schools, Nature Human Behaviour 1 (2017) 0040.
- Davis et al. (1999) T. C. Davis, R. S. Byrd, C. L. Arnold, P. Auinger, J. A. Bocchini, Low literacy and violence among adolescents in a summer sports program, Journal of Adolescent Health 24 (1999) 403–411.
- Literacy and policing project of the Canadian Association of Chiefs of Police (2008) Literacy and policing project of the Canadian Association of Chiefs of Police, Literacy awareness resource manual for police, 2008. Accessed: 2017-06-01.
- Hesketh and Xing (2006) T. Hesketh, Z. W. Xing, Abnormal sex ratios in human populations: causes and consequences, Proceedings of the National Academy of Sciences 103 (2006) 13271–13275.