A probabilistic atlas for cell identification
Abstract
We propose a general framework for a collaborative machine learning system to assist bioscience researchers with the task of labeling specific cell identities from microscopic still or video imaging. The distinguishing features of this approach versus prior approaches include: (1) use of a statistical model of cell features that is iteratively improved, (2) generation of probabilistic guesses at cell ID rather than single bestguesses for each cell, (3) tracking of joint probabilities of features within and across cells, and (4) ability to exploit multimodal features, such as cell position, morphology, reporter intensities, and activity. We provide an example implementation of such a system applicable to labeling fluorescently tagged C. elegans neurons. As a proof of concept, we use a generative springmass model to simulate sequences of cell imaging datasets with variable cell positions and fluorescence intensities. Training on synthetic data, we find that atlases that track intercell positional correlations give higher labeling accuracies than those that treat cell positions independently. Tracking an additional feature type, fluorescence intensity, boosts accuracy relative to a positiononly atlas, suggesting that multiple cell features could be leveraged to improve automated label predictions.
I Introduction
The development of effective fluorescent reporters, increase in computing power, and proliferation of volumetric microscopy techniques are now enabling acquisition of large tissue volumes or entire organisms, within which many individual cells can be discerned (Ahrens et al. 2013, Kato et al. 2015, Prevedel et al. 2014, Skocek et al. 2018). With the explosion in data collection capabilities and throughput, the time demand for the experimentalist is shifting from data acquisition to postacquisition data analysis.
Common postacquisition analyses are to (1) find and delineate these cells in space and, for video recordings, in time, often collectively referred to as regionofinterest (ROI) detection, segmentation, and tracking, (2) extract static or timeseries data of fluorescence intensity arising from calcium dynamics or other cellular processes, and (3) determine the identity or type of the delineated cells, in order to fuse cellularresolution data across trials and animals (“intertrial registration”) and pave the way for statistically powerful analysis. This last step is the focus of this paper.
In eutelic organisms or certain experimental contexts in noneutelic organisms, cell identity labels may be unique within a recording (“cell ID”); in other contexts, the labels may be nonunique (“cell type”). Here, we focus on the former problem, typified by volumetric recordings of fluorescently tagged neurons of the brain of the nematode C. elegans hermaphrodite, which is composed of precisely 302 neurons. Manual identification based on comparison to previously labeled images or a traditional reference atlas is extremely laborious, and there currently does not exist a widely used method for automated cell identification in this context. We propose a robust and practical approach for automated cell ID.
Ii Design Features
We outline what we believe to be necessary design requirements as follows:
Statistical model. A canonical reference atlas of anatomy is a timeworn pedagogical device, but well appreciated to neglect inherent biological variability. A more informative atlas – derived from experimental data across multiple trials – should maintain a record of the biological variability of anatomical or physiological features (Mazziotta et al. 2001, Essen and Dierker 2007, Toga et al. 2006). Specifically, the atlas should maintain a probability distribution for each atlas feature or, to capture relationships between these features, a joint probability distribution over the set of atlas features.
Incrementally trained with partially labeled data. We would like the quality of our atlas to improve as each new training dataset is contributed to the system, as measured by a declining error rate and increasing confidence levels of labeling. Furthermore, training sets are likely to lack complete labeling, possibly severely so, because manual identification of even a subset of neurons is extremely laborious and unreliable.
Probabilistic output. Earlier approaches to produce algorithms for cell identification, such as those based on bipartite graph matching, were prone to cascading errors where the mislabeling of one cell could lead to the mislabeling of many other cells, prompting the exploration of probabilistic approaches (Long et al. 2009, Kainmueller et al. 2014, Wu et al. 2018). It is also probably unreasonable to expect a perfect automatic labeling for any large dataset due to the noise and variability in all imaging of biological systems, and lack of perfect knowledge in realworld experimental scenarios. Therefore, a system which can report label guesses with marginal probabilities, which reflect confidence of labeling, is of more practical use (Figure 1).
Acceptance of datasets with missing or erroneous data. Variability in development, reporter expression patterns, feature extraction algorithms, and imaging contexts may result in incoming datasets having varying numbers of cells to be identified. To be generally applicable, identification should be possible on ROI sets that correspond to both subsets and supersets of atlas cells.
Flexibility in exploiting available cell properties. Cells have many observable features which may covary within and between cells, including position, size, shape, fluorescence intensity, and time series (Table 1). We want the atlas to leverage this covariance structure to ID cells with increased confidence.
Community driven. Each experimental context may require its own atlas depending on the characteristics of the data; however, there are likely to be several groups around the world working in similar contexts who need a cell ID system and who can, in return, contribute training datasets for the improvement of the atlas. The precise parameters of each experiment are likely to change, and the use of new datasets must be robust to these changes.
We recommend that this system be implemented as a cloud service for accessibility and to pool the efforts of researchers.
Data Type  Data Format  Dimension  
Single Frame 
Position  3  
Size  3  
Orientation  3  
Fluorescence intensity  ,…  
Expression pattern  fold(gene)  
Morphology  voxel set  
Video 
Fluorescence time series  
Spacetime trajectory 
Iii System Description
We now describe a system that satisfies the aforementioned design requirements.
A sequence of datasets, possibly contributed by different users, is processed in the cloud one at a time by the system (Figure 2). We assume that a dataset arrives postROI detection (and post time series extraction for a video) as a set of unlabeled feature records, one for each cell. For ease of reference, visualization, and proof checking by end users, datasets may be accompanied by the original raw image stack data, although the atlas does not make use of this raw data.
To enable effective atlas learning, datasets must first be registered to match the coordinate system of the reference atlas without the benefit of labels to match particular cells from dataset to particular cells in the atlas. This registration operation may involve global transformations of the dataset, such as scaling, translation, and rotation, as well more localized transformations such as volumetric warping to correct for tissue deformation. Registration of other feature types, such as fluorescence intensity, may involve other approaches such as histogram equalization or color space transformation.
After registration, labels must be assigned to each cell in the dataset. Label assignment across all cells in a dataset is a combinatorial optimization problem, in which an objective function representing likelihood is maximized. Estimated marginal probabilities of individual cell labelings are reported to the user (Figure 1), along with accompanying visualization data for ease of validation.
A curation step, either manual or automated, follows: if the quality the dataset and success of labeling is acceptable, then the labeled dataset, possibly in part or in whole, will be used for a training update to the atlas. In the early stages of atlas training, manual curation will likely be required to produce a useful system for early users. In the future, partial or weightedimportance acceptance of datasets could replace this binary acceptance procedure.
Iv Mathematical Setting
We follow a sequential Bayesian updating approach. We define a probabilistic atlas as a set of hyperparameters that describe a joint probability distribution of cell parameters , along with the selection of probability distribution function family :
(1) 
with being the number of atlas cells.
Each dataset comes into the system as an unrolled, concatenated feature vector () where contains features for ROI (i.e. unlabeled cell) . At present we only consider the case of . For each , global registration (spatial and/or in other feature spaces) is performed between dataset and atlas features, which produces a transformed dataset :
(2) 
where parameter vector is the result of optimizing over an objective function such as leastsquares error to nearest neighbor. Without such global registration, direct application of probability models with location parameters is likely to fail. An example algorithm for spatial registration is coherent point drift (CPD) (Myronenko and Song 2010). We exclude these transformation parameters from our atlas hyperparameters as they are likely to be dependent on the nature of particular datasets, though in the future a supervised learning approach could be applied to the registration task.
After spatial registration, a putative labeling , defined as a mapping from atlas cells to ROIs with indicating that atlas cell is matched to ROI , is determined by maximizing likelihood over possible label permutations. This is a combinatorial search problem, and our strategies to solve it are discussed later in section VI.
For a transformed, labeled dataset that passes curation, denoted by , a sequential Bayesian update of the hyperparameters is performed:
(3) 
where the form of is determined by solving the Bayes’ theorem equation
(4) 
under the appropriate choice of conjugate prior form , which is in turn based on the choice of distribution family (Murphy 2012). The initial values of hyperparameters may be drawn from other data sources or based on uninformative or reference priors.
We consider three types of probabilistic atlas of increasing expressivity: univariate (UV), cell multivariate (CMV), and full multivariate (FMV). For ease of illustration, we use the normal distribution family, although it is not a requirement of the atlas.
Univariate model (UV). In this case, we assume that the probability distributions of each cell parameter are independent and identically distributed (iid). Each is tracked with a normal distribution, with unknown mean and unknown variance . Estimates of the true and can be obtained from sampled data using the normalinversegamma (NIG) conjugate prior with hyperparameters where and are priors for the mean and variance (up to a scaling factor); and and are scalars representing the strength of belief of the respective priors.
Cell multivariate model (CMV). In addition to tracking variability of individual features, we may wish to capture relationships of different feature types within each cell, such as correlation between the , , and coordinates of the cell. We still ignore intercell correlation and assume that the cell models are iid. In this case, the probability distribution for each cell is , where is a vector of means for each parameter, and is the covariance matrix of all features of a single cell, both unknown.
Estimates of and can be obtained from sample data using the normalinverseWishart (NIW) conjugate prior with hyperparameters where and are priors for the mean and covariance (up to a scaling factor); and and are scalars representing the strength of belief of the respective priors.
Full multivariate model (FMV). In the full multivariate case, we drop the assumption of iid cells in order to capture intercell, interfeature correlations and therefore use a single multivariate model spanning all parameters. Using a normal distribution, we can employ the same NIW conjugate prior across the full feature set.
In all three cases, we update the atlas hyper parameters with new labeled datasets according to Equation 4. Our atlases then atlas models yield estimates of a multivariate normal distribution, , but with different learned covariance matrix structure: is diagonal, is block diagonal with each block modeling intracell feature correlations, and has no a priori block diagonal structure.
Labeling. A putative labeling is scored using a cost function defined by the negative log likelihood as a function of atlas parameters:
(5) 
For our models, is given by the multivariate StudentT distribution (Murphy 2012). Any convenient cost function that is a monotonic function of the negative log likelihood is also suitable, such as
(6) 
where are the maximum a posteriori parameters from the atlas posterior (4).
Probabilistic output. The model output would ideally be the marginal probabilities of ROI having cell label evaluated over all possible labelings:
(7) 
where , and is the marginal distribution, i.e. integrated over . The factorial scaling, however, makes this full evaluation intractable for more than a small number of cells. We presume, without proof, that for a wellfitting model, the terms in the summation for are dominated by a comparatively small number of lowcost (high probability) labelings such that a partial summation
(8) 
where is analogous to the normalization above, is a reasonable estimate to (7). Note that in the =1 limiting case, with a single bestguess labeling , we recover the usual nonprobabilistic labeling . In our probabilistic scheme, the cost global minimum (GM) has the most statistical weight in (7) and (8), and it is therefore essential to have an algorithm capable of finding it. However, in addition, we seek to sample other lowcost labelings such that the partial summation (8) can be used to approximate the fully marginalized probabilities (7).
V Proofofconcept atlas: springmass model
We illustrate the construction and use of a probabilistic atlas for synthetic data. We generated a sequence of frames by simulating the dynamics of a springmass system, with each frame representing a new incoming dataset. This model is intended to capture the spatial variability of cells across experiments. The physical behavior of the toy model loosely models the phenomenon of nonoverlapping cells jostling within a confined volume with correlated motion due to deformations of the surrounding tissue. The cell feature set is limited to , , and positions. In this model, the groundtruth labeling () is known and the system’s behavior is tunable, providing a convenient testbed. Note that in this proofofconcept atlas, we assume each training frame is accurately labeled, and contains the same number of ROIs as atlas cells.
The generative toy model comprises point masses in 2D or 3D space, representing cell centroids, connected by damped springs. To test different aspects of the atlas labeling scheme, we generated toy systems with 10, 50, 100, and 302 cells with varying positions and connectivity. The cells were embedded in different environments via spring connections to flexible “matrix” points and fixed “anchor” points.
The system is 2D and sufficiently small such that all labelings can be enumerated, thereby allowing comparison between the approximate (8) and exact (7) marginal probability outputs. The cells’ coordinates were drawn from uniform distributions that allow for significant overlap to test both atlas quality and probabilistic outputs.
The and system (Figure 3A,B and Supplementary Movies 12) are 2D and exhibit largeamplitude correlated shifts in cells’ positions. The initial coordinates were generated by a branched growth scheme to give a lumpy but compact distribution of points in space.
The system (Figure 3C and Supplementary Movie 3) is based partially on the empirical neuroanatomy of C. elegans. A set of 134 head neuron 3D positions derived from (White et al. 1986) were copied and translated on the axis (corresponding to the worm headtail axis) to give a system of 302 cells with high aspect ratio. This system also incorporates regularly spaced anchor points as well as randomly positioned matrix particles which are bound to both the cells and the anchors, mimicking the influence of surrounding tissue.
The springmass systems were simulated via Langevin dynamics (Schlick 2010, see Methods). The particles’ , , and coordinates were snapshotted at fixed time intervals to reduce correlations between timeadjacent frames, and an initial segment of each trajectory was discarded to allow for equilibration of the dynamical behavior. Independent simulations, with different random seeds, were used for atlas training and labeling validation.
Vi Labeling algorithm
To perform atlas guided cell labeling we used the cost function (6), derived from the atlas (UV, CMV, FMV) and an algorithm to search the combinatorial space of labelings. Depending on the atlas model, different schemes are available to solve the combinatorial labeling problem. For the UV and CMV atlas models, the probability factorizes and the cost function becomes a sum of single cell assignment costs,
(9)  
In these cases, the task to find equates to the wellknown linear assignment problem (LAP). Deterministic LAP solvers, namely the Hungarian algorithm (HU) (Kuhn 1955), can reach the optimal solution in polynomial time. This straightforward scheme, however, does not guarantee that the result is the correct labeling , only that it is the optimum, given the cost function. Figure 4 illustrates how (for a CMV atlas) can contain assignment errors that ‘cascade’ throughout the structure.
These issues may be addressable with a full multivariate model (FMV) that captures correlations of features between cells. Unfortunately, the cost function for the FMV model does not factorize; thus the assignment problem is nonlinear. To approach this combinatorial optimization problem (GM search), we use a population cost minimization (PM) scheme that uses greedy and stochastic minimization steps as well as genetic algorithmlike mating operations. Nonbipartite (‘illegal’) assignments – where one ROI may hold two or more atlas cell labels, i.e. , and other ROIs are unlabeled – can be generated as intermediates in this procedure, but they are ‘legalized’ by resolving missing and duplicate labels.
The PM scheme is carried out in cycles, where one cycle takes a population of labelings , performs intermediate steps, and returns a new population . Figure 5 illustrates one cycle (panel A) and its constituent labeling moves (panel B). Each cycle starts (step I) by creating a set of 32 new labelings, , either by mating pairs or copying an individual from (with mating probability 50%). The labelings in then go through steps IIIV in series before the results are pooled (step V) and culled (step VI). The different intermediate steps/moves are summarized as follows:

Mating: A child labeling is created where each element is drawn with a 50:50 chance from either or . This introduces randomization and can potentially combine correct labels from disparate parent labelings, but can also create illegal assignments.

Pairwise swaps (greedy): For each atlas index (order shuffled) of labeling , the pairwise swap where that gives the lowest cost (including the null swap) is selected. This step tests all pairwise swaps, requiring cost function calls, assuming no heuristics are applied to winnow attempted swaps.

Stochastic flips: One label is flipped, , where and are both randomly drawn and the flip is only accepted only if cost decreases (1000 attempts per labeling).

Legalization: For a labeling that is ‘illegal’ due to mating or stochastic flips, for each (shuffled order) if , is chosen from the set of unused ROI indices such that gives the lowest cost. Legalization often increases cost and therefore does not directly drive minimization, but it converts illegal labelings back to legal ones so that the population of legal assignments can advance.

Pooling: Pooling combines with the labelings returned from steps IIV into and removes any duplicates.

Culling: Culling is carried out on using a modified cost function: , where =4 and is the number of unique in . The added term does not influence legal assignment costs () but adds a growing penalty for reused in (’illegality’) in order to prevent exceedingly nonphysical, lowcost labelings from misdirecting the population. With the pool ranked by , is built from the set union of: i) the 24 best labelings (illegal or legal) and ii) the 12 best legal labelings.

Iterating: (Optional) The next cycle is carried out with .
The PM scheme was seeded with and the naive Bayes labeling (where minimizes the linear cost sum Equation 9 irrespective of legality) from the corresponding CMV model. Only legal labelings are considered valid output, e.g. for use in Equation 7 or Equation 8.
Results. Here, we compare the GM labelings , found using the different models (UV/CMV/FMV) and labeling schemes (HU/PM). For the UV/CMV models, is guaranteed to be , whereas the PM search using the FMV model is not guaranteed to find the GM. In the latter case, the best found labelings are considered putative global minima.
To quantify the accuracy of a labeling relative to we use two metrics
(10)  
which are the number of correct labels and a test for an exact match to , respectively. When averaged for a set of validation frames, is the mean labeling accuracy and is the GT hit fraction.
In Figure 6 we plot , (the least accurate labeling in the entire validation set), and as a function of how many frames were used for atlas training. For the systems, the FMV model has equal or superior performance to the UV and CMV models. This shows that the correlations between cells’ positions, learned by FMV model, carry additional, useful information for cell labeling. The gain in accuracy is most evident for , reflecting the fact that the UV and CMV models are susceptible, on occasion, to large assignment errors that cascade through a structure, whereas the FMV model is not. Figure 4C shows one such case where the CMVHU labeling has 81/100. Using the FMV model and PM algorithm, however, the population reaches lower cost labelings and eventually finds after 11 cycles. Despite the high mean accuracy, , for the system with a welltrained atlas, the fact that suggests that perfect labeling accuracy remains a challenging goal for an atlas using only position data.
For , the FMV model requires a considerable amount of training, >4000 frames, before it outperforms the CMV and UV models on all three accuracy metrics. The high training cost for the FMV model is partly attributable to the need to learn a covariance matrix for 302 cells in 3D, whereas the 2D system has a covariance matrix. Moreover, unlike the 50 and 100 systems where the cells form a compact cluster, the cells in the system form many smaller clusters that are weakly coupled. In this latter case, it is to be expected that learning the highdimensional covariance structure requires more training data. If the covariance for real data is similarly highdimensional, this suggests that a FMV atlas will require a large amount of training data, spanning multiple experiments.
For the FMV results reported above, the PM scheme always found a legal labeling with , which suggests that the combinatorial search is tractable for . As Figure 6D shows, this typically required 20 or fewer cycles; however, in a handful of cases (<1%) with very inaccurate seed labelings, upwards of 200 cycles were required. Each PM cycle for the =302 system required around three minutes on a single cpu core (Intel i97940x).
Vii Generation of probabilistic output
We turn to the task of probabilistic output, as in Figure 1, computed using labelings found during the PM global minimum search or, for the system, computed using all labelings.
In the toy system with 10 point masses, we constructed a matrix of marginal probabilities by fully evaluating labelings and comparing them to marginal probabilities estimated from only the 10 lowest cost labelings (Figure 7). While increasing the number of labelings evaluated does provide a more complete picture of the model output (Figure 7B), a majority of the have values less than and do not significantly improve our output (Figure 7C). In this illustrative example, sampling as few as 8 lowcost labelings yields a max error of 0.01%, which increases our confidence in our reporting of truncated marginal probabilities (Figure 7D).
Figure 8 shows one case for a frame of the system where a probabilistic labeling report was computed using Equation 8. In this case, both the CMVHU and FMVPM labelings matched , but a group of neighboring cells (70, 85, 90) have shifted positions relative to their atlas means such that the labeling looks questionable. Without knowing , the table in Figure 8C captures the uncertainty in the labeling.
Viii Multifeature atlas
Lastly, we tested what labeling accuracy might be achievable using an atlas with two feature types, position and cell fluorescence intensity. If cells’ fluorescence intensities have a consistent dependence on cell identity across animals, are spatially uncorrelated, and not too noisy, they could help distinguish neighboring cells and improve labeling performance (Figure 9D). For this test, we reused the =50 system simulations and added a fluorescence intensity to each cell. Intensity was drawn from a uniform distribution across cells with Gaussian noise added to each frame, as shown in Figure 9E.
In the welltrained limit, all three accuracy metrics , , increase with the addition of the intensity feature. For the UV and CMV models, the considerable increase in shows that large misassignment errors are prevented. The GT hit fractions also increase by .
The FMV model is inaccurate when only 1040 training frames are used. We attribute this to the model initially learning spurious correlations between cell positions and fluorescence intensities. With sufficient training, however, these correlations fade and the FMV model outperforms the UV and CMV models. With 640 or more training frames, the FMV model has near perfect labeling accuracy: 247/250 validation frames are perfectly labeled and the other 3/250 have a single erroneous pairwise flip.
This test suggests that an extra feature uncorrelated to other features could substantially increase atlas labeling performance for real data. Multifeature datasets could also be used to produce labeled datasets to train a positiononly atlas for other experimental contexts lacking intensity data. We suspect that the addition of additional fluorescence intensity channels could further increase prediction accuracy.
Ix Discussion and challenges
We have proposed a machine learning framework to train a probabilistic cell atlas for use in computing probabilistic cell labelings for video or still imaging data. We choose a standard Bayesian parametric probability model, although nonparametric and nonBayesion approaches could be considered as long as they generate a probabilistic labeling. We used synthetic models of pointmasses connected by springs to simulate the biological variability of cells across organisms and experiments, and the collective movements caused by deformations of the surrounding tissue. Using this dataset, we compared different probabilistic atlas models (UV, CMV, FMV) and labeling schemes (HU, PM) for their labeling accuracy.
We found that a welltrained FMV atlas that models the covariance between cell positions outperforms more simplistic models (UV/CMV) that do not track intercell covariance. Standard linear assignment solvers could not be used with the FMV model, due to its nonadditive cost function; therefore we developed an algorithm (PM) to carry out the combinatorial labeling search for the FMV atlas.
Using an atlas trained only with cell positions, we found that the single bestguess labeling was not reliably accurate. However, an atlas that also tracked fluorescence intensity yielded an increase in labeling accuracy, suggesting that utilization of multiple cell features may yield significantly higher performance for real data.
The limited predictive accuracy of a single labeling permutation suggests the need for a probabilistic output that reports label guesses and their uncertainties. We illustrated a means to compute such a probabilistic labeling output. The accuracy of this probabilistic output depends on the extent to which the lowcost labelings (including ) are identified and how well the atlas has been trained. For the synthetic case, probabilistic outputs considering the 8 lowest cost labelings were within <0.01% error of outputs evaluated on all labelings. This suggests but does not prove that a reasonably accurate report for the larger systems should not require an intractable amount of combinatorial sampling. For the larger system, the probabilistic assignments as in Figure 8 might improve from a more comprehensive search for other lowcost labelings in addition to those collected during the PM search. Ensemble sampling methods such as WangLandau sampling (Wang and Landau 2001) or simulated tempering (Marinari and Parisi 1992) could potentially be applied.
In the current scheme, our model is trained on perfectly labeled frames containing the same number of ROIs as atlas labels. We anticipate this not to be the case with data taken from real experiments, which can be expected to have both missing and falsepositive ROIs, as well as partial and erroneous labels. Partially supervised learning is an active area of machine learning research.
Furthermore, we assumed that our training data was preregistered into a common coordinate system. In practice, depending on experimental context, datasets will need to be to preprocessed in a number of ways, each of which could influence labeling outcomes. Achieving high quality registration is an open challenge.
There is a large computational cost of the combinatorial search over labelings and evaluation of multivariate probability models. This cost could be reduced by building a more restricted model tailored to a particular system that leverages locality in physical or feature space. This may allow the use of partial multivariate models that require fewer computations to train and evaluate, as well as efficient means of combinatorial search of labelings. Additionally, approaches based on relaxing the combinatorial search over discrete labelings into a continuous optimization problem (Linderman et al. 2017, Mena et al. 2018) may yield computationally efficient probabilistic labeling algorithms.
The selection of feature vectors and probability distribution family are crucial design decisions of the atlas curator; therefore, we advocate storage of datasets along with the parameterized distribution so that more sophisticated models can be tried later or formal model selection can be performed.
The general probabilistic scheme described here can also be applied to cell typing, rather than cell ID, although the choice of feature vectors, probability distribution families, and labeling algorithms is likely to be different.
Methods
Each spring mass system consisted of points with cartesian coordinates connected by springs. Given initial coordinates , spring connections were generated to give each point a minimum connectivity, to its nearest neighbors. The system’s potential energy is a sum over connected pairs (springs, ):
(11) 
where and is a spring’s relaxed distance taken from (here and index the pointmasses within a simulation, and are not to be confused with the previous labeling notation). The exponential term mimics short range intercell repulsion and prevents unphysical overlap between connected points. The potential energy parameters for the different toy systems are given in Table II.
Springmass system dynamics were integrated using the Langevin equation
(12) 
and a RungeKutta 4th order integrator with a timestep of =0.1, where is a stationary gaussian process, all masses are set to unity, is the reduced temperature, and =0.2 provides damping. The system coordinates, snapshotted every 200 integration steps, were used as input for atlas training and validation.
50  1.6  10  20  2  7 

100  1.6  10  20  2  7 
302  1.7  0.05  20  0.2  8 
All code was implemented in Python and run on a Ubuntu 18.04LTS laptop (Intel i77560U) or Windows 10 Home desktop (Intel i97940x) and will be available on the focolab.org code repository. The molecular visualization package PyMOL (Schrödinger, LLC 2015) was used for visualization of springmass simulations.
Acknowledgments
We thank Sean Escola and Raymond L. Dunn for valuable discussion. This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R35GM124735 and UCSF grant PBBRNFR7028580. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
References
 Ahrens et al. [2013] M. B. Ahrens, M. B. Orger, D. N. Robson, J. M. Li, and P. J. Keller. Wholebrain functional imaging at cellular resolution using lightsheet microscopy. Nature Methods, 10:413–420, 2013.
 Essen and Dierker [2007] D. C. V. Essen and D. L. Dierker. Surfacebased and probabilistic atlases of primate cerebral cortex. Neuron, 56(2):209 – 225, 2007.
 Kainmueller et al. [2014] D. Kainmueller, F. Jug, C. Rother, and G. Myers. Active graph matching for automatic joint segmentation and annotation of C. elegans, Medical Image Computing and ComputerAssisted Intervention – MICCAI 2014, pages 81–88, Cham, 2014. Springer International Publishing.
 Kato et al. [2015] S. Kato, H. S. Kaplan, T. Schrödel, S. Skora, T. H. Lindsay, E. Yemini, S. Lockery, and M. Zimmer. Global brain dynamics embed the motor command sequence of Caenorhabditis elegans. Cell, 163(3):656–669, 2015.
 Kuhn [1955] H. Kuhn. The Hungarian method for the assignment problem. Naval Res. Logist. Quart., 2:83–98, 1955.
 Linderman et al. [2017] S. Linderman, G. Mena, H. Cooper, L. Paninski, and J. Cunningham. Reparameterizing the birkhoff polytope for variational permutation inference. arXiv:1710.09508, 2017.
 Long et al. [2009] F. Long, H. Peng, X. Liu, S. K. Kim, and E. Myers. A 3D digital atlas of C. elegans and its application to singlecell analyses. Nature Methods, 6:667–672, 2009.
 Marinari and Parisi [1992] E. Marinari and G. Parisi. Simulated tempering: A new monte carlo scheme. Europhysics Letters (EPL), 19(6):451–458, 1992.
 Mazziotta et al. [2001] J. Mazziotta, A. Toga, A. Evans, P. Fox, others, K. Zilles, R. Woods, T. Paus, G. Simpson, B. Pike, C. Holmes, L. Collins, P. Thompson, D. MacDonald, M. Iacoboni, T. Schormann, K. Amunts, N. PalomeroGallagher, S. Geyer, L. Parsons, K. Narr, N. Kabani, G. Le Goualher, D. Boomsma, T. Cannon, R. Kawashima, and B. Mazoyer. A probabilistic atlas and reference system for the human brain: International consortium for brain mapping (icbm). Philos Trans R Soc Lond B Biol Sci, 356(1412):1293–1322, 2001.
 Mena et al. [2018] G. Mena, D. Belanger, S. Linderman, and J. Snoek. Learning Latent Permutations with GumbelSinkhorn Networks. arXiv:1802.08665, 2018.
 Murphy [2012] K. P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
 Myronenko and Song [2010] A. Myronenko and X. Song. Point Set Registration: Coherent Point Drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12):2262–2275, 2010.
 Prevedel et al. [2014] R. Prevedel, Y.G. Yoon, M. Hoffmann, N. Pak, G. Wetzstein, S. Kato, T. Schrödel, R. Raskar, M. Zimmer, E. S. Boyden, and A. Vaziri. Simultaneous wholeanimal 3d imaging of neuronal activity using lightfield microscopy. Nature Methods, 11:727 –730, 2014.
 Schlick [2010] T. Schlick. Molecular modeling and simulation: an interdisciplinary guide: an interdisciplinary guide, volume 21. Springer Science & Business Media, 2010.
 Schrödinger, LLC [2015] Schrödinger, LLC. The PyMOL molecular graphics system, version 1.8. 2015.
 Skocek et al. [2018] O. Skocek, T. Nöbauer, L. Weilguny, F. Martínez Traub, C. N. Xia, M. I. Molodtsov, A. Grama, M. Yamagata, D. Aharoni, D. D. Cox, P. Golshani, and A. Vaziri. Highspeed volumetric imaging of neuronal activity in freely moving rodents. Nature Methods, 15(6):429–432, 2018.
 Toga et al. [2006] A. W. Toga, P. M. Thompson, S. Mori, K. Amunts, and K. Zilles. Towards multimodal atlases of the human brain. Nat Rev Neurosci, 7(12):952–966, 2006.
 Wang and Landau [2001] F. Wang and D. P. Landau. Efficient, multiplerange random walk algorithm to calculate the density of states. Phys. Rev. Lett, 86:2050–2053, 2001.
 White et al. [1986] J. G. White, E. Southgate, J. N. Thomson, and S. Brenner. The Structure of the Nervous System of the Nematode Caenorhabditis elegans. Philosophical Transactions of the Royal Society B: Biological Sciences, 314(1165):1–340, 1986.
 Wu et al. [2018] S. Wu, Y. Toyoshima, M. S. Jang, M. Kanamori, T. Teramoto, Y. Iwasaki, T. Ishihara, Y. Iino, and R. Yoshida. An ensemble learning approach to autoannotation for wholebrain c. elegans imaging. bioRxiv, 2018. doi: 10.1101/180430.