Patterns of Ship-borne Species Spread: A Clustering Approach for Risk Assessment and Management of Non-indigenous Species Spread
JX and TW have contributed equally; TW is the corresponding author.
Authors JX, TW and NC are with Dept. of Computer Science and Engineering; EG and DL are with Dept. of Biology of University of Notre Dame (ND), Notre Dame, IN, USA. Authors TW, EG, NC and DL are also with ND Environmental Change Initiative (ECI); Authors JX, TW, NC and DL are with ND interdisciplinary Center of Network Science and Applications (iCeNSA). Authors KS, RK and JD were previously with ND before joining Dept. of Computer Science and Engineering, University of Minnesota, the Institute of Environmental Sustainability, Loyola University Chicago and Odum School of Ecology, University of Georgia, respectively.
This work is based on research supported by the ND Office of Research via funding under ND ECI for TW, EG, NC and DL.
The spread of non-indigenous species (NIS) through the global shipping network (GSN) imposes enormous ecological and economic costs throughout the world. Previous attempts at quantifying NIS invasions have mostly taken “bottom-up” approaches that eventually require multiple simplifying assumptions because the available data are insufficient and/or uncertain. By modeling implicit species exchanges via a graph abstraction that we refer to as the Species Flow Network (SFN), we present a different approach that exploits the power of network science methods in extracting knowledge from largely incomplete data. Here, coarse-grained species flow dynamics are studied via a graph clustering approach that decomposes the SFN into clusters of ports and inter-cluster connections. With this decomposition of ports in place, NIS flow among clusters can be reduced very efficiently by enforcing NIS management on a few chosen inter-cluster connections. Furthermore, efficient NIS management strategies for species exchanges within a cluster (often difficult due to the higher rate of travel and pathways) are derived in conjunction with the ecological and environmental aspects that govern species establishment. The benefits of the presented approach include robustness to data uncertainties, implicit incorporation of “stepping-stone” spread of invasive species, and decoupling of species spread and establishment risk estimation. Our analysis of a multi-year (1997–2006) GSN dataset using the presented approach shows the existence of a few large clusters of ports with higher intra-cluster species flow that are fairly stable over time. Furthermore, detailed investigations were carried out on vessel types, ports, and inter-cluster connections. Finally, our observations are discussed in the context of known NIS invasions, and future research directions are presented.
non-indigenous species, species flow network, …
Commercial shipping provides enormous economic benefits worldwide and is responsible for approximately 90% of global trade. However, shipping also imparts large economic and environmental costs by spreading invasive species, or those non-indigenous species (NIS) that damage ecological systems. Shipping can translocate NIS to new areas either through ballast water or hull-fouling, and is responsible for 69% of known aquatic NIS (molnar_assessing_2008). Although only a small portion of transported NIS establish and become invasive, their environmental and economic damages are often large and grow over time (halpern_global_2008; keller_bioeconomics_2009). For instance, we recently estimated that ship-borne aquatic invasive species cost the Great Lakes regional economy $100–800 million annually (rothlisberger_ship-borne_2012). This high cost of ship-borne invasions has motivated several efforts to better understand NIS spread and invasion risk through the global shipping network (GSN) (drake_global_2004; kaluza_complex_2010; Kel11; kolzsch_indications_2011; See13). These studies used ship traffic data to create a network, where nodes (i.e., ports) are connected by edges that represent the intensity of shipping traffic. Such networks have been shown to have small-world (Wat98) characteristics, wherein each port is linked to any other port by a small number of “hops” (kaluza_complex_2010; Kel11; kolzsch_indications_2011), and to be very robust with many redundant links (kaluza_complex_2010).
While such initial network analyses are enlightening, they are ultimately inadequate because ship traffic alone cannot sufficiently capture NIS invasion risk. Rather, invasion risk is likely affected by a complex interplay of ship traffic, ballast uptake/discharge dynamics, survival during transport, propagule pressure, environmental variables, biotic interactions and several other variables that are not yet well characterized (Won13). Incorporating these complexities is a challenging task, since the majority of the above relationships and their parameterizations are poorly known. The few studies that have attempted to calculate more realistic measures of invasion risk have relied on probabilistic models that make several simplifying assumptions. For example, (Kel11) combined ship traffic and environmental similarity to estimate relative invasion risk, assuming that simple Euclidean distance between ports’ mean annual temperature and salinity was proportional to risk. This linear relationship between risk and changes in temperature and salinity is not likely for most species, particularly invasive species, which tend to exhibit broad environmental tolerances. Most recently, (See13) calculated between-port invasion risk as the product of three probabilities: the probability a species was non-native (based on geographic distance), the probability a species survived transport (based on trip duration), and the probability a species establishes (based on Euclidean environmental similarity). The benefit of these probabilistic approaches is that they provide quantitative estimates. Their drawbacks include unjustifiable simplifying assumptions (e.g., establishment proportional to Euclidean distance, linear relationships between propagule pressure and invasion risk), high uncertainty, and an inability to incorporate “stepping-stone” invasion probabilities.
The graph analysis methods popularized by network science are well suited to our goals: they provide some of the most elegant tools for descriptive analysis of complex, relational data, and they are able to reveal large-scale patterns at a higher level of abstraction, which is not easily affected by small uncertainties in the data.
Specifically, we 1) create a network that represents the general species flow tendency among ports, 2) identify clusters, or groups of ports, in which intense species flow tightly connects the ports in the same cluster, while connections between different clusters are loose, 3) identify ports and ship types that serve as important “inter-cluster connectors”, 4) develop flexible methods to qualitatively assess invasion risk within a cluster based on realistic biogeographic and environmental relationships, and 5) highlight the management implications of our results. We focus here on the spread of species via ballast water, but the method could be easily applied to hull-fouling spread with a few adjustments.
This paper is organized as follows: Section II presents the materials and methods describing the formulation of species flow networks using limited available data, graph clustering approach for understanding the large-scale dynamics of GSN, and an intuitive method that extends graph clustering notions for detailed risk analysis using ecoregion and environmental conditions; Section LABEL:sec:results_discuss presents the results and provides a detailed discussion; and finally, Section LABEL:sec:conc contains the concluding remarks.
II Materials and Methods
Our main goal is to understand the large-scale (or coarse-grained) patterns of the GSN in order to obtain better insight into ship-borne NIS invasions. The presented approach is developed to exploit the power of network analysis methods in extracting knowledge from largely incomplete data with minimal simplifications and assumptions. We proceed as follows: (i) a network that represents the general species flow tendency among ports is built; then, utilizing a graph clustering method (Ros08) that operates on the basis of flow dynamics, (ii) a map (Gui05; Tuf06) of the species flow network, i.e., a cogent representation that extracts the main structure of flow while retaining information about relationships among modules (of the main structure), is built; finally, using this map, which summarizes the species flow dynamics in terms of clusters (or groups) of ports and highlights inter-cluster (i.e., between clusters) and intra-cluster (i.e., within cluster) relationships, (iii) the impact of GSN dynamics on NIS invasions is studied in conjunction with the ecological and environmental aspects that govern species establishment. We now describe this approach in detail.
II-A Datasets and Other Information Sources
II-A1 LMIU database
Global and domestic vessel movements for four (4) periods (1997–1998, 1999–2000, 2002–2003 and 2005–2006), comprising individual voyages by vessels of various types that move between ports and regions, are acquired from Lloyd’s Maritime Intelligence Unit (LMIU). For each period, the LMIU database contains travel information for vessels, such as portID, sail_date and arrival_date, along with vessel metadata, such as vessel_type and DWT (i.e., dead weight tonnage).
II-A2 NBIC database
Since vessel movement data (including LMIU) do not provide explicit ballast water exchange amounts (or even whether a vessel discharged ballast water at all), these quantities must be estimated from auxiliary data that can sufficiently relate ballast discharge to the vessel information given in the LMIU database. Therefore, we utilize the approach suggested in (See13), where ballast water discharge amounts are calculated using a linear regression model on a per-vessel_type basis. For this, the National Ballast Information Clearinghouse (NBIC) database, which contains the date and discharge_volume of all ships visiting U.S. ports from Jan. 2004 to present, is used (see Section II-B1 for details).
II-A3 Ecoregion and environmental data
Ecoregions are defined by species composition and shared evolutionary history (Spa07), and are thereby capable of providing a more realistic outline of native and invasive ranges. Therefore, we define non-indigenous status based on the ecoregion concept rather than, for example, geographic distance as used in (See13). Here, the ecoregion delineations given by Marine Ecoregions of the World (Spa07) and the Freshwater Ecoregions of the World (Abe08) are used. Annual averages of port temperature and salinity given in the Global Ports Database (GPD) (Kel11) are used for the assessment of NIS establishment risk based on environmental similarity; missing values in GPD are supplemented by estimates from the World Ocean Atlas 2009 (WOA09_sal; WOA09_temp) when necessary.
II-B Network Modeling for Species Flow Analysis
At the heart of a network analysis lies a graph abstraction of the (often complex) system that is under investigation. This graph must be capable of adequately capturing the system behavior via sets of nodes and edges that model flow/connectivity characteristics. Previous work [@cite] on the analysis of GSN impact on NIS invasions has employed undirected weighted graphs, where nodes are given by the ports (visited by the GSN) and edges (and their strength or weight) are derived from traffic intensity between ports. While such modeling is perhaps adequate for general network analysis, the task at hand, viz., an analysis based on flow dynamics, requires a directed network that can adequately represent the directional and asymmetric flow between nodes. Therefore, a directed weighted graph that we refer to as the Species Flow Network (SFN) is derived to better represent species flow characteristics among ports. Here, species flow is derived based only on ballast water exchange, and the contribution from hull-fouling is not considered; the resulting flow dynamics therefore represent species flow with respect to ballast exchange only. Investigation of bio-fouling is relegated to a future publication (see Section LABEL:sec:conc).
II-B1 Species Flow Network (SFN)
Consider a directed graph $G = (V, E)$, where $V$ and $E$ denote the sets of nodes and edges of $G$, respectively. Let the nodes in $V$ correspond to ports visited by the GSN, and let the weight of the directed edge $(i, j) \in E$, denoted $P_{ij}$, represent the total probability of species introduction corresponding to all vessels traveling from port $i$ to port $j$ (without intermediate stopovers), for all $i, j \in V$.
Estimation of species flow
Species flow between two ports is estimated as in (See13). To summarize, consider a vessel $v$ traveling from port $i$ to port $j$ (without intermediate stopovers) in time $\Delta t_{ij}^{(v)}$, during which the species in ballast water die at a mortality rate of $\mu$ (which is set to a constant average value for all routes and vessel types in the experiments). In addition, let $D_{ij}^{(v)}$, $\rho_{ij}^{(v)}$ and $\lambda$ denote the amount of ballast water discharged by vessel $v$ at $j$, the efficacy of ballast water management for $v$ on the route $i \to j$, and the characteristic constant of discharge, respectively. Then, the probability of vessel $v$ introducing species from $i$ to $j$ (without intermediate stopovers) is given by:
$$p_{ij}^{(v)} = \rho_{ij}^{(v)} \left(1 - e^{-\lambda D_{ij}^{(v)}}\right) e^{-\mu \Delta t_{ij}^{(v)}}, \tag{1}$$
then, the total probability of species introduction for all vessels traveling from $i$ to $j$ is given by:
$$P_{ij} = 1 - \prod_{r_{ij}} \left(1 - p_{ij}^{(v)}\right), \tag{2}$$
where the product is taken over all routes $r_{ij}$ in the database s.t. a vessel $v$ travels from port $i$ to $j$.
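The aggregation of per-voyage probabilities into directed edge weights can be sketched as follows. This is a minimal illustration, assuming the per-voyage probability has the form $\rho (1 - e^{-\lambda D}) e^{-\mu \Delta t}$ used in (See13) and that voyages contribute independently, so the edge weight is one minus the product of the per-voyage "no introduction" probabilities. The function names, the record layout, the port labels, and every numeric value below (mortality rate, discharge constant, voyage data) are hypothetical and are not taken from the LMIU or NBIC data.

```python
import math
from collections import defaultdict


def voyage_intro_probability(discharge, duration_days, mortality_rate,
                             discharge_const, management_factor=1.0):
    """Per-voyage introduction probability (assumed form):
    rho * (1 - exp(-lambda * D)) * exp(-mu * dt)."""
    return (management_factor
            * (1.0 - math.exp(-discharge_const * discharge))
            * math.exp(-mortality_rate * duration_days))


def build_sfn(voyages, mortality_rate, discharge_const):
    """Aggregate voyages into directed edge weights 1 - prod(1 - p)."""
    # Running product of (1 - p) for each directed port pair (src, dst).
    no_intro = defaultdict(lambda: 1.0)
    for src, dst, discharge, duration_days in voyages:
        p = voyage_intro_probability(discharge, duration_days,
                                     mortality_rate, discharge_const)
        no_intro[(src, dst)] *= (1.0 - p)
    return {edge: 1.0 - q for edge, q in no_intro.items()}


# Hypothetical voyages: (source port, destination port,
#                        discharge in m^3, duration in days).
voyages = [
    ("A", "B", 10_000.0, 5.0),
    ("A", "B", 20_000.0, 7.0),
    ("B", "A", 5_000.0, 5.0),
]

# Illustrative parameter values only (assumptions, not the paper's settings).
sfn = build_sfn(voyages, mortality_rate=0.02, discharge_const=3.0e-6)
for (src, dst), weight in sorted(sfn.items()):
    print(f"{src} -> {dst}: {weight:.4f}")
```

Because the weights are keyed by ordered port pairs, the resulting structure is directed and generally asymmetric, which is the property the SFN relies on.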
Estimation of ballast discharge
Information on ballast discharge is incomplete to such a degree that estimating the exact quantities exchanged on each and every ship route is almost impossible, for several reasons: (i) ballast discharges in ports are not recorded globally, and are known to vary significantly by port and ship type; (ii) vessels may have intermediate stopovers, thus exchanging and mixing ballast water with the water already in their ballast tanks; and (iii) data are largely unavailable for offshore discharges. Therefore, in order to mitigate the above difficulties, ballast discharge is estimated based on linear regression models on DWT per vessel_type as in (See13). Specifically, linear regression models on DWT for vessels of each type (Bulk Dry, General Cargo, Ro-Ro Cargo, Chemical, Liquefied Gas Tankers, Oil Tankers, Passenger Vessels, Refrigerated Cargo, Container Ships and Unknown/Other) are derived using only the non-zero discharge events recorded in the NBIC database.
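A per-vessel-type regression of this kind can be sketched as below. This is only an illustration under stated assumptions: it fits an ordinary least-squares line of discharge volume on DWT separately for each vessel type, using only non-zero discharge records. The record layout, the helper names, and the sample numbers are hypothetical, and clamping negative predictions to zero is a choice made for this sketch rather than something stated in the text.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_discharge_models(records):
    """Fit one DWT -> discharge line per vessel type."""
    grouped = defaultdict(lambda: ([], []))
    for vessel_type, dwt, discharge in records:
        if discharge > 0:  # keep only non-zero discharge events
            grouped[vessel_type][0].append(dwt)
            grouped[vessel_type][1].append(discharge)
    # A line needs at least two distinct DWT values to be identifiable.
    return {vt: fit_line(xs, ys)
            for vt, (xs, ys) in grouped.items() if len(set(xs)) > 1}


def estimate_discharge(models, vessel_type, dwt):
    """Predict discharge for a vessel; negative predictions are clamped to 0."""
    slope, intercept = models[vessel_type]
    return max(0.0, slope * dwt + intercept)


# Hypothetical records: (vessel_type, DWT, discharge volume).
records = [
    ("Bulk Dry", 30_000, 9_000), ("Bulk Dry", 50_000, 15_000),
    ("Bulk Dry", 70_000, 21_000), ("Bulk Dry", 40_000, 0),
    ("Oil Tankers", 60_000, 12_000), ("Oil Tankers", 100_000, 20_000),
]

models = fit_discharge_models(records)
print(estimate_discharge(models, "Bulk Dry", 60_000))
```

The fitted models can then be applied to the DWT of each vessel in a movement dataset to obtain the discharge estimate used in the species flow calculation.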
Furthermore, the relationship between the amount of ballast discharged and the likelihood of species introduction is not well defined. Therefore, for the estimation of (1), the characteristic constant of discharge is chosen s.t. a given ballast discharge volume corresponds to a given probability of introducing species when the vessel travels with zero mortalities and has no ballast management strategies in place.
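The calibration of the discharge constant can be illustrated with a short sketch. It assumes that, with zero mortality and no ballast management, the introduction probability reduces to $1 - e^{-\lambda D}$, so fixing a reference volume and a reference probability determines $\lambda = -\ln(1 - p_{\mathrm{ref}}) / D_{\mathrm{ref}}$. The function name and the reference values used here are hypothetical placeholders, not the values used in the experiments.

```python
import math


def calibrate_discharge_constant(reference_volume, reference_probability):
    """Solve 1 - exp(-lambda * D_ref) = p_ref for lambda."""
    if reference_volume <= 0.0:
        raise ValueError("reference_volume must be positive")
    if not 0.0 < reference_probability < 1.0:
        raise ValueError("reference_probability must lie strictly in (0, 1)")
    return -math.log(1.0 - reference_probability) / reference_volume


# Hypothetical reference point: a discharge of 100,000 volume units is
# assumed to introduce species with probability 0.5.
lam = calibrate_discharge_constant(reference_volume=100_000.0,
                                   reference_probability=0.5)

# Round trip: plugging lambda back in should recover the reference probability.
recovered = 1.0 - math.exp(-lam * 100_000.0)
print(lam, recovered)
```

A larger reference probability for the same volume gives a larger constant, i.e., the introduction probability saturates at smaller discharge volumes.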
Table LABEL:tb:net_char summarizes the characteristics for SFNs generated for the four (4) LMIU datasets.
|SFN characteristic||1997–1998||1999–2000||2002–2003||2005–2006|
|Number of nodes||3971||4045||4264||4250|
|Number of edges||150479||150150||143560||145199|
|Average path length||2.987||2.998||3.018||3.041|
|Average in/out-degree||37.9||37.1||33.7||34.2|
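As a quick consistency check on the table, note that in a directed graph every edge contributes exactly one outgoing and one incoming endpoint, so the average in-degree and the average out-degree both equal the number of edges divided by the number of nodes. The sketch below recomputes that ratio from the node and edge counts in the table; the variable names are illustrative, and the values are listed in the order in which the columns appear.

```python
# (number of nodes, number of edges) for the four SFNs, in table column order.
sfn_sizes = [(3971, 150479), (4045, 150150), (4264, 143560), (4250, 145199)]

# Average in/out-degree values as reported in the table.
reported_avg_degree = [37.9, 37.1, 33.7, 34.2]

# In a directed graph, average in-degree = average out-degree = |E| / |V|.
computed_avg_degree = [edges / nodes for nodes, edges in sfn_sizes]

for reported, computed in zip(reported_avg_degree, computed_avg_degree):
    print(f"reported {reported:.1f}, recomputed {computed:.2f}")
```

The recomputed ratios agree with the reported averages to one decimal place, which supports reading the degree row as edges per node.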