Robust Marine Buoy Placement for Ship Detection Using Dropout K-Means
Abstract
Marine buoys aid in the battle against Illegal, Unreported and Unregulated (IUU) fishing by detecting fishing vessels in their vicinity. Marine buoys, however, may be disrupted by natural causes and buoy vandalism. In this paper, we formulate marine buoy placement as a clustering problem, and propose dropout k-means and dropout k-median to improve placement robustness to buoy disruption.
We simulated the passage of ships in the Gabonese waters near West Africa using historical Automatic Identification System (AIS) data, then compared the ship detection probability of dropout k-means to classic k-means and of dropout k-median to classic k-median. With 5 buoys, the buoy arrangements computed by classic k-means, dropout k-means, classic k-median and dropout k-median have ship detection probabilities of 38%, 45%, 48% and 52%, respectively.
Yuting Ng, João M. Pereira, Denis Garagic, Vahid Tarokh
Keywords: k-means, k-median, clustering, dropout, marine buoy placement
1 Introduction
Illegal, Unreported and Unregulated (IUU) fishing not only endangers marine ecosystems, but is also a global threat to economic and food security, with annual damages estimated at $10–23.5 billion and 11–26 million tons [1]. Developing countries that depend on fishing for food and export, such as those in West Africa, are most at risk [17]. In the battle against IUU fishing, a network of marine buoys can improve the monitoring of fishing activity via ship detection [5, 9, 19].
In May 2013, two marine buoys captured images of a fishing vessel that was fishing illegally. The vessel was caught despite its efforts to conceal its location by not sending Automatic Identification System (AIS) location reports for a two-week period from May 20 to June 1, 2013 [4]. The catch demonstrated the effectiveness of using marine buoys to detect fishing vessels, and there are plans to fit more buoys with cameras [4]. Marine buoys, however, may be disrupted [24].
At a given time, only about 70% of the marine buoys partnered with the National Oceanic and Atmospheric Administration's (NOAA) National Data Buoy Center (NDBC) are reporting. For example, on July 10, 2019, out of 1416 buoys deployed, only 1001 buoys reported back to NDBC in an 8-hour period, from 10:30 to 18:30 PDT, as shown in Figure 1 [18]. Communications with buoys are disrupted by natural causes, such as harsh weather, corrosion, fish bites and marine growth, or by acts of vandalism [4]. Fishing vessels engaging in IUU fishing are the most common perpetrators, despite regulations against buoy vandalism [7].
In order to place buoys, we use a k-means-type clustering algorithm to cluster ship positions, and place the buoys at the cluster centers obtained by the algorithm. However, buoy disruption changes the spatial configuration of the buoy network. While the ship detection radius of each buoy does not change, the distance from each ship to the closest remaining buoy may increase, so the probability that a ship is detected by the buoy network decreases. We therefore propose a more robust buoy placement by incorporating weights proportional to the probability of buoy dropout into the k-means and k-median clustering algorithms. Accordingly, we name these algorithms dropout k-means and dropout k-median.
Clustering algorithms, including k-means [22, 15] and k-median [6, 2], are widely used in sensor placement [12]. Examples include relay node placement in wireless sensor networks [23, 11] and temperature sensor placement in microprocessors [16].
Prior work proposed a stochastic dropout k-means algorithm that applied dropout, a technique commonly used to regularize neural networks [21], to classic k-means clustering [26]. In that algorithm, each iteration starts with a random dropout of some cluster centers, followed by an iteration of classic k-means on the remaining centers. This approach has a convergence issue, especially when the number of clusters is small.
In contrast, in this paper, we modify the classic k-means and classic k-median objectives to consider all possible dropout outcomes, with weights given by the probability of occurrence of each outcome. While the number of different dropout outcomes is $2^k$, where $k$ is the number of clusters, we can effectively group outcomes and perform each cluster center update in polynomial time. We then position the marine buoys at the cluster centers obtained by the dropout k-means and dropout k-median algorithms.
Our contributions are:

- Formulate buoy placement as a clustering problem, where buoys, as cluster centers, may be disrupted.

- Modify k-means and k-median with a dropout probability that models buoy disruption.

- Derive closed-form updates that consider all possible dropout outcomes in each iteration.

- Define deterministic dropout k-means and dropout k-median algorithms that run in time polynomial, rather than exponential, in the number of clusters.
The rest of the paper is structured as follows: Section 2 casts buoy placement as a clustering problem with the objective of minimizing ship-to-buoy distances weighted by the probability of buoy dropout. In Section 3, we describe dropout k-means and dropout k-median, and derive closed-form expressions for updating the centers and the clusters. Section 4 compares the performance of dropout k-means and dropout k-median with classic k-means, classic k-median and the stochastic dropout k-means implemented in prior work. Finally, Section 5 concludes the paper.
2 Problem Statement
We formulate buoy placement as a clustering problem where the goal is to place $k$ buoys, acting as cluster centers $c_1, \dots, c_k$, in order to maximize the probability of detecting ships, whose recorded positions $x_1, \dots, x_n$ act as data points. The probability of detection is computed using the law of total probability:
(1) $P_D = \sum_{s \in S} P(s) \sum_{D \in \mathcal{P}(C) \setminus \{\emptyset\}} P(D)\, \mathbb{1}\{\exists i \in I_s, \exists j \in D : \|x_i - c_j\|_2 \le r\}$
where $S$ is the set of ships, $I_s$ is the set of indices of positions belonging to ship $s$, and $P(s)$ is the probability of observing ship $s$; $C = \{1, \dots, k\}$ is the set of cluster indices, $\mathcal{P}(C)$ is the power set of $C$, that is, the set of all subsets of $C$, with size $2^k$, representing all dropout outcomes; $D$ is the set of remaining buoy indices in a particular dropout outcome, and $P(D)$ is the probability of the dropout outcome. The indicator function $\mathbb{1}\{\cdot\}$ represents ship detection. We assume detection of a ship to depend on the proximity of any of the ship's locations to any of the buoys remaining after dropout, where $r$ is the radius of detection. To simplify notation, we assume the probability of observing each ship to be the same, that is, $P(s) = 1/|S|$.
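For small $k$, the detection probability in equation (1) can be evaluated exactly by brute-force enumeration of the non-empty dropout outcomes. The sketch below is our own illustration, not the authors' code; function and variable names are hypothetical, and each ship is assumed to be given as a list of 2-D positions:

```python
import itertools
import math

def detection_probability(ships, centers, p, r):
    """Exact P_D from equation (1): enumerate every non-empty set of
    surviving buoys, weight it by its conditional probability, and count
    a ship as detected if any of its positions lies within radius r of
    any surviving buoy. Runs in O(|S| 2^k) time, so usable only for small k."""
    k = len(centers)
    norm = 1.0 - p ** k  # condition on at least one buoy surviving
    total = 0.0
    for ship in ships:  # each ship is a list of (x, y) positions
        for m in range(1, k + 1):
            for survivors in itertools.combinations(range(k), m):
                prob = p ** (k - m) * (1 - p) ** m / norm
                detected = any(math.dist(pos, centers[j]) <= r
                               for pos in ship for j in survivors)
                total += prob * detected
    return total / len(ships)  # uniform ship weights P(s) = 1/|S|
```

The exponential cost of this direct enumeration is what motivates the grouped, polynomial-time evaluation derived in Section 3.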
The objective function given in equation (1) is piecewise constant and cannot be optimized via gradient-based methods. Alternatively, we consider a convex relaxation of equation (1), which we minimize using a k-means-type algorithm.
The classic k-means algorithm seeks to minimize the sum of squared distances from data points to their closest cluster centers, that is:
(2) $\min_{c_1, \dots, c_k} \sum_{i=1}^{n} \min_{j \in C} \|x_i - c_j\|_2^2$
where $\|\cdot\|_2$ is the Euclidean norm, also known as the $\ell_2$ norm.
On the other hand, the classic k-median algorithm seeks to minimize the sum of $\ell_2$ distances, that is:
(3) $\min_{c_1, \dots, c_k} \sum_{i=1}^{n} \min_{j \in C} \|x_i - c_j\|_2$
Since buoys may drop out as a result of disruption, changing the spatial configuration of the buoy network, we modify the classic k-means objective to include all possible dropout outcomes, given that at least one buoy remains.
Thus, in dropout k-means, we add weights corresponding to the probability of dropout. Specifically, we minimize the objective function:
(4) $\min_{c_1, \dots, c_k} \sum_{D \in \mathcal{P}(C) \setminus \{\emptyset\}} P(D) \sum_{i=1}^{n} \min_{j \in D} \|x_i - c_j\|_2^2$
where the probability of dropout outcome $D$ is given as:
(5) $P(D) = \dfrac{p^{\,k - |D|} (1 - p)^{|D|}}{1 - p^k}$
where the dropout of each cluster center is assumed to be independent and identically distributed (i.i.d.) Bernoulli($p$). We do not consider the dropout outcome where no buoys remain, since, in that scenario, no ship can be detected, regardless of buoy placement. Thus, $\sum_{D \in \mathcal{P}(C) \setminus \{\emptyset\}} P(D) = 1$.
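As a quick sanity check on equation (5), the outcome probabilities should sum to one over all non-empty survivor sets, since the model conditions on at least one buoy remaining. A minimal verification sketch (the specific values of `p` and `k` are arbitrary choices of ours):

```python
from itertools import combinations

def outcome_probability(survivors, k, p):
    """P(D) from equation (5): each of the k buoys drops out independently
    with probability p, conditioned on at least one buoy surviving."""
    d = len(survivors)
    return p ** (k - d) * (1 - p) ** d / (1 - p ** k)

k, p = 5, 0.3
total = sum(outcome_probability(D, k, p)
            for m in range(1, k + 1)
            for D in combinations(range(k), m))
# total == 1: the 2^k - 1 non-empty outcomes exhaust the conditioned space
```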
Similarly, in dropout k-median, we add weights corresponding to the probability of dropout, and minimize the objective function:
(6) $\min_{c_1, \dots, c_k} \sum_{D \in \mathcal{P}(C) \setminus \{\emptyset\}} P(D) \sum_{i=1}^{n} \min_{j \in D} \|x_i - c_j\|_2$
In dropout k-means and dropout k-median, the terms $\min_{j \in D} \|x_i - c_j\|_2^2$ and $\min_{j \in D} \|x_i - c_j\|_2$ are considered as convex relaxations of the probability of no detection $\mathbb{1}\{\min_{j \in D} \|x_i - c_j\|_2 > r\}$.
3 Method
In this section, we describe dropout k-means and dropout k-median, and derive efficient implementations of their closed-form updates. Each update considers all possible dropout combinations. By first sorting the cluster centers by their proximity to the data points, each iteration scales proportionally to $nk \log k$, instead of $2^k n$, where $n$ is the number of points and $k$ is the number of clusters.
3.1 Dropout k-means
We first interchange the order of the summations in the dropout k-means objective shown in equation (4):
(7) $\sum_{i=1}^{n} \sum_{D \in \mathcal{P}(C) \setminus \{\emptyset\}} P(D) \min_{j \in D} \|x_i - c_j\|_2^2$
Fixing a data point $x_i$, we can calculate equation (7) by grouping dropout outcomes. We denote the group of dropout outcomes in which the closest remaining cluster center to data point $x_i$ is cluster center $c_j$ as $D_{i,j}$. For each data point $x_i$, the groups of dropout outcomes are thus $D_{i,1}, \dots, D_{i,k}$. We now show the calculation of the probability $P(D_{i,j})$ of a group of dropout outcomes.
Let $\sigma_i$ be a permutation such that the cluster centers are sorted in increasing distance to $x_i$, that is, $\|x_i - c_{\sigma_i(1)}\|_2 \le \|x_i - c_{\sigma_i(2)}\|_2 \le \dots \le \|x_i - c_{\sigma_i(k)}\|_2$. Then, with probability $(1 - p)/(1 - p^k)$, $c_{\sigma_i(1)}$ is not dropped out, and remains the closest cluster center to $x_i$. For $c_{\sigma_i(2)}$ to be the closest cluster center to $x_i$, $c_{\sigma_i(1)}$ has to drop out and $c_{\sigma_i(2)}$ has to stay, which happens with probability $p(1 - p)/(1 - p^k)$. Using this reasoning, the probability that $c_{\sigma_i(m)}$ is the closest cluster center to data point $x_i$ is $p^{m-1}(1 - p)/(1 - p^k)$. On the other hand, the probability that $c_j$ is the closest cluster center to data point $x_i$ is $p^{\sigma_i^{-1}(j) - 1}(1 - p)/(1 - p^k)$, where $\sigma_i^{-1}$ is the inverse of $\sigma_i$, that is, $\sigma_i^{-1}(j) = m$ if $\sigma_i(m) = j$. The probability of the group of dropout outcomes is thus:
(8) $P(D_{i,j}) = \dfrac{p^{\,\sigma_i^{-1}(j) - 1} (1 - p)}{1 - p^k}$
The dropout k-means objective from equation (7) is thus rewritten as:
(9) $\min_{c_1, \dots, c_k} \sum_{i=1}^{n} \sum_{j=1}^{k} P(D_{i,j}) \|x_i - c_j\|_2^2$
We treat the permutation $\sigma_i$ as a cluster assignment. For a fixed cluster assignment, a center $c_j$ is the minimizer of the weighted sum of squared distances between itself and the points in that cluster, with weights given by the probabilities of each point being in that cluster. By setting the derivative with respect to $c_j$ to zero, we obtain that the center is a weighted average of the points in that cluster:
(10) $c_j = \dfrac{\sum_{i=1}^{n} P(D_{i,j})\, x_i}{\sum_{i=1}^{n} P(D_{i,j})}$
In both classic k-means and dropout k-means, the objective decreases monotonically. The algorithms are thus iterated to convergence, that is, until the cluster assignments no longer change. The per-iteration complexity scales as $nk \log k$, due to sorting the cluster centers, and the procedure is shown in Algorithm 1.
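One iteration of the update above can be sketched in NumPy: sort the centers per point (the weights in equation (8) depend only on the rank), then take the weighted means of equation (10). This is our own sketch of the iteration, not the authors' code, and the names are ours:

```python
import numpy as np

def dropout_kmeans_step(X, C, p):
    """One dropout k-means iteration.
    X: (n, 2) data points, C: (k, 2) cluster centers, p: dropout probability.
    Returns the updated centers and the per-point center ordering (assignment)."""
    n, k = len(X), len(C)
    D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)  # (n, k) distances
    order = np.argsort(D, axis=1)  # sigma_i: centers sorted by distance to x_i
    # weight of the m-th closest center: p^(m-1) (1-p) / (1 - p^k), as in eq. (8)
    geo = p ** np.arange(k) * (1 - p) / (1 - p ** k)
    W = np.empty((n, k))
    np.put_along_axis(W, order, np.broadcast_to(geo, (n, k)), axis=1)  # w_{ij}
    # eq. (10): each center is a weighted mean of all points
    C_new = (W.T @ X) / W.sum(axis=0)[:, None]
    return C_new, order
```

Iterating until `order` stops changing matches the termination condition described above.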
3.2 Dropout k-median
The dropout k-means objective is to minimize a weighted sum of squared distances, and can be considered a convex relaxation of minimizing the probability of missed detection, where cluster centers are associated with the mean. Motivated by this convex relaxation approach, we also consider the algorithm minimizing a weighted sum of $\ell_2$ distances, where cluster centers are associated with the median. We denote this algorithm dropout k-median.
The minimizer of a weighted sum of distances is the geometric median instead of the mean. Unfortunately, there is no closed-form solution for the geometric median. However, there is an iterative algorithm that converges to it, known as the Weiszfeld algorithm [25]. We extend the Weiszfeld algorithm with the weights given by equation (8) to obtain the cluster center update:
(11) $c_j \leftarrow \dfrac{\sum_{i=1}^{n} P(D_{i,j})\, x_i / \|x_i - c_j\|_2}{\sum_{i=1}^{n} P(D_{i,j}) / \|x_i - c_j\|_2}$
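A single weighted Weiszfeld step can be sketched as below. This is our illustration, not the authors' code; the small `eps` guard against division by zero when a center lands exactly on a data point is an implementation detail we add, not part of equation (11):

```python
import numpy as np

def weiszfeld_step(X, c, w, eps=1e-9):
    """One weighted Weiszfeld iteration toward the weighted geometric median
    of the points X with per-point weights w, as in equation (11):
    c <- (sum_i w_i x_i / ||x_i - c||) / (sum_i w_i / ||x_i - c||)."""
    d = np.linalg.norm(X - c, axis=1) + eps  # distances to the current center
    coef = w / d
    return (coef[:, None] * X).sum(axis=0) / coef.sum()
```

Repeated application converges to the weighted geometric median; with uniform weights on the corners of a square, for example, it converges to the center of the square.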
4 Evaluation
We compare the ship detection probability of dropout k-means and dropout k-median with classic k-means, classic k-median and the stochastic dropout k-means implemented in prior work [26], in the Gabonese Exclusive Economic Zone (EEZ) near West Africa. To simulate the passage of ships, we use AIS data. While AIS data may have inconsistencies from gaps in reporting, it is widely used in the literature for visualizing fishing activity [13], improving collision avoidance systems [27, 10], anomaly detection [14] and trade route identification [20].
4.1 AIS Dataset
We downloaded AIS tracks from Global Fishing Watch's GitHub repository [8]. Of the 1258 tracks in the repository, 55 tracks, with unequal contributions to a total of 313,390 location reports, passed through the Gabonese EEZ. These 313,390 ship coordinates were then used for clustering.
4.2 Evaluation Metrics
We use ship detection probability ($P_D$), root mean square distance (RMSD), rate of convergence and runtime as metrics.
The $P_D$ is computed as in equation (1); the RMSD is computed as:
(12) $\mathrm{RMSD} = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} \sum_{D \in \mathcal{P}(C) \setminus \{\emptyset\}} P(D) \min_{j \in D} \|x_i - c_j\|_2^2}$
Note that $P_D$ and RMSD may be calculated in polynomial time by grouping dropout outcomes, using the technique introduced in Section 3.
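Concretely, the grouped evaluation of $P_D$ sorts the buoys by their distance to each ship's track and sums the rank weights of equation (8) over the buoys within range, collapsing the $2^k$ outcomes into at most $k$ terms. A sketch under the paper's assumptions (names are ours, not from the paper):

```python
import numpy as np

def detection_probability_grouped(ship_tracks, centers, p, r):
    """P_D in O(|S| k log k) per evaluation: for each ship, sort the buoys by
    distance to the ship's track; the m-th closest buoy is the closest survivor
    with probability p^(m-1)(1-p)/(1-p^k), and the ship is detected iff the
    closest surviving buoy lies within radius r."""
    C = np.asarray(centers)
    k = len(C)
    geo = p ** np.arange(k) * (1 - p) / (1 - p ** k)
    probs = []
    for track in ship_tracks:  # (T, 2) positions of one ship
        d = np.linalg.norm(np.asarray(track)[:, None, :] - C[None, :, :],
                           axis=2).min(axis=0)  # distance of each buoy to track
        probs.append(geo[np.sort(d) <= r].sum())  # detected ranks form a prefix
    return float(np.mean(probs))
```

This grouped computation agrees with direct enumeration of the dropout outcomes in equation (1), since detection occurs exactly when the closest surviving buoy is within range.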
The rate of convergence is measured as the number of iterations taken for the algorithm to converge; runtime is measured as the total time taken. With regard to convergence, classic k-means and dropout k-means have natural termination conditions, namely when the cluster assignments no longer change. For classic k-median and dropout k-median, since the geometric median is found via an iterative algorithm, the algorithms might not have fully converged when the cluster assignments stop changing. However, unchanged assignments indicate that the solution is close to convergence, so we likewise take assignments not changing as the termination condition. The stochastic dropout k-means implemented in prior work [26] has no natural termination condition. In addition, it has a convergence issue, especially when the number of clusters is small. For example, in the two-cluster scenario, stochastic dropout k-means oscillates between one- and two-cluster configurations. For stochastic dropout k-means, we thus set a relaxed termination condition: all cluster centers must move by less than a small fraction of $r$, where $r$ is the sensor detection radius.
4.3 Results
We evaluated the performance of deploying 5 buoys. We assume each buoy is disrupted with probability 0.3, since this is approximately the fraction of buoys that missed reporting back to NDBC. We assume the 55 ships occur with the same probability and repeated the experiment 30 times. At the beginning of each experiment, we randomly generated cluster centers with k-means++ initialization [3] and used the same initial set of cluster centers for all algorithms. The radius of detection was set at 10 km, consistent with current sensor detection radii found in the literature [9, 19]. The buoy arrangements computed by classic k-means, dropout k-means, stochastic dropout k-means [26], classic k-median and dropout k-median have ship detection probabilities of 38%, 45%, 45%, 48% and 52%, respectively. The results, with mean and standard deviation over the 30 trials, are summarized in Table 1.
Table 1: Performance comparison (mean ± standard deviation over 30 trials).

| algorithm | number of iterations | total runtime (s) | RMSD (km) | $P_D$ |
|---|---|---|---|---|
| classic k-means | 31 ± 12 | 0.5 ± 0.1 | 154 ± 9 | 0.38 ± 0.03 |
| dropout k-means | 19 ± 2 | 1.0 ± 0.1 | 140 ± 0 | 0.45 ± 0.00 |
| stochastic dropout k-means [26] | 286 ± 51 | 1.1 ± 0.4 | 144 ± 2 | 0.45 ± 0.04 |
| classic k-median | 54 ± 18 | 1.2 ± 0.4 | 151 ± 7 | 0.48 ± 0.05 |
| dropout k-median | 79 ± 22 | 4.2 ± 0.9 | 141 ± 1 | 0.52 ± 0.03 |
Both stochastic dropout k-means and dropout k-means improved RMSD and $P_D$ over classic k-means. This is expected, as dropout models buoy disruptions. In addition, stochastic dropout k-means should converge to the same result as dropout k-means in expectation, as observed. Stochastic dropout k-means, however, has a convergence issue, frequently hitting the limit of 300 iterations despite having a more relaxed convergence condition. In addition, its RMSD and $P_D$ have larger variances than those of dropout k-means.
Dropout k-median also showed improved RMSD and $P_D$ over classic k-median. In addition, classic k-median and dropout k-median showed improved RMSD and $P_D$ over classic k-means and dropout k-means. This is because the $\ell_2$ distance is a tighter upper bound on the probability of missed detection than the squared distance. In this experiment, minimizing a tighter upper bound produced a stronger algorithm. Lastly, with our efficient implementation, the runtimes of the dropout algorithms were comparable to those of the classic algorithms.
5 Conclusion
Dropout k-means and dropout k-median clustering give more robust buoy placements, where dropout aptly models buoy disruption. We proposed an efficient implementation of dropout k-means and extended the algorithm to dropout k-median, where the $\ell_2$ distance is a tighter upper bound on the probability of missed detection than the squared distance. We simulated the placement of marine buoys at the cluster centers computed from ship AIS data in the Gabonese waters near West Africa. For 5 buoys, the ship detection probabilities of classic k-means, dropout k-means, classic k-median and dropout k-median are 38%, 45%, 48% and 52%, respectively.
Footnotes
This work was supported in part by DARPA Grant No. HR00111990016.
References
[1] (2009-02) Estimating the worldwide extent of illegal fishing. PLOS ONE 4 (2), pp. 1–8.
[2] (1998) Approximation schemes for Euclidean k-medians and related problems. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC '98, New York, NY, USA, pp. 106–113.
[3] (2007) k-means++: the advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07, Philadelphia, PA, USA, pp. 1027–1035.
[4] (2014-09) Counter-vandalism at NDBC. In 2014 Oceans - St. John's, pp. 1–7.
[5] (2010-11) Concurrent use of satellite imaging and passive acoustics for maritime domain awareness. In 2010 International WaterSide Security Conference, pp. 1–8.
[6] (1999) A constant-factor approximation algorithm for the k-median problem (extended abstract). In Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, STOC '99, New York, NY, USA, pp. 1–10.
[7] (2011) Ocean data buoy vandalism: incidence, impact and responses. Technical report.
[8] Available: https://github.com/GlobalFishingWatch/trainingdata, accessed Jul 29, 2019.
[9] (2012) Detection and tracking of ships in open sea with rapidly moving buoy-mounted camera system. Ocean Engineering 54, pp. 1–12.
[10] (2018) Online prediction of ship behavior with automatic identification system sensor data using bidirectional long short-term memory recurrent neural network. Sensors 18 (12).
[11] (2017) Hybrid clustering scheme for relaying in multi-cell LTE high user density networks. IEEE Access 5, pp. 4431–4438.
[12] (2010) Data clustering: 50 years beyond k-means. Pattern Recognition Letters 31 (8), pp. 651–666.
[13] (2018) Tracking the global footprint of fisheries. Science 359 (6378), pp. 904–908.
[14] (2008-06) Anomaly detection for sea surveillance. In 2008 11th International Conference on Information Fusion, pp. 1–8.
[15] (1982-03) Least squares quantization in PCM. IEEE Transactions on Information Theory 28 (2), pp. 129–137.
[16] (2006-07) Systematic temperature sensor allocation and placement for microprocessors. In 2006 43rd ACM/IEEE Design Automation Conference, pp. 542–547.
[17] Available: https://www.fisheries.noaa.gov/insight/understanding-illegal-unreported-and-unregulated-fishing, accessed Aug 3, 2019.
[18] Available: https://www.ndbc.noaa.gov/, accessed Jul 10, 2019.
[19] (2017-08) Video processing from electro-optical sensors for object detection and tracking in a maritime environment: a survey. IEEE Transactions on Intelligent Transportation Systems 18 (8), pp. 1993–2016.
[20] (2017-12) Knowledge extraction from maritime spatiotemporal data: an evaluation of clustering algorithms on big data. In 2017 IEEE International Conference on Big Data (Big Data), pp. 1682–1687.
[21] (2014) Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, pp. 1929–1958.
[22] (1956) Sur la division des corps matériels en parties [On the division of material bodies into parts]. Bull. Acad. Polon. Sci., pp. 801–804.
[23] (2014-04) Aerial node placement in wireless sensor networks using fuzzy k-means clustering. In 8th International Conference on e-Commerce in Developing Countries: With Focus on e-Trust, pp. 1–7.
[24] (2009-10) Buoy vandalism experienced by NOAA National Data Buoy Center. In OCEANS 2009, pp. 1–8.
[25] (1937) Sur le point pour lequel la somme des distances de n points donnés est minimum [On the point for which the sum of the distances to n given points is minimum]. Tohoku Mathematical Journal, First Series 43, pp. 355–386.
[26] (2016) Hierarchical feature learning with dropout k-means for hyperspectral image classification. Neurocomputing 187, pp. 75–82.
[27] (2015) A method for detecting possible near miss ship collisions from AIS data. Ocean Engineering 107, pp. 60–69.