Big Data Meet Cyber-Physical Systems: A Panoramic Survey

Big Data Meet Cyber-Physical Systems: A Panoramic Survey

Rachad Atat, Lingjia Liu, Senior Member, IEEE, Jinsong Wu, Senior Member, IEEE, Guangyu Li, Chunxuan Ye, and Yang Yi, Senior Member, IEEE A part of this work was presented in the Ph.D. thesis of R. Atat under the guidance of Dr. L. Liu [1]. This work is supported in part by the U.S. National Sciecne Foundation (NSF) under grants NSF/ECCS-1802710, NSF/ECCS-1811497, and NSF/CNS-1811720, Chile CONICYT under grant Fondecyt Regular 181809, and China Hunan Provincial Nature Science Foundation under grant 2018JJ2535. Corresponding authors are L. Liu (ljliu@ieee.org) and J. Wu (wujs@ieee.org). R. Atat is with the Electrical and Computer Engineering Department, Texas A&M University at Qatar, Doha, Qatar; L. Liu and Y. Yi are with the Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA; J. Wu is with the Department of Electrical Engineering, Universidad de Chile, Santiago, Chile; G. Li is with School of Computer Science and Engineering, Nanjing University of Science & Technology, 210094 Nanjing, P. R. China; and Chunxuan Ye is with InterDigital Communications, LLC., San Diego, CA 92121, USA.
Abstract

The world is witnessing an unprecedented growth of cyber-physical systems (CPS), which are foreseen to revolutionize our world via creating new services and applications in a variety of sectors such as environmental monitoring, mobile-health systems, intelligent transportation systems and so on. The information and communication technology (ICT) sector is experiencing a significant growth in data traffic, driven by the widespread usage of smartphones, tablets and video streaming, along with the significant growth of sensors deployments that are anticipated in the near future. It is expected to outstandingly increase the growth rate of raw sensed data. In this paper, we present the CPS taxonomy via providing a broad overview of data collection, storage, access, processing and analysis. Compared with other survey papers, this is the first panoramic survey on big data for CPS, where our objective is to provide a panoramic summary of different CPS aspects. Furthermore, CPS require cybersecurity to protect them against malicious attacks and unauthorized intrusion, which become a challenge with the enormous amount of data that is continuously being generated in the network. Thus, we also provide an overview of the different security solutions proposed for CPS big data storage, access and analytics. We also discuss big data meeting green challenges in the contexts of CPS.

cyber-physical systems (CPS), Internet of Things (IoT), context-awareness, social computing, cloud computing, big data, clustering, data mining, data analytics, machine learning, real-time analytics, space-time analytics, cybersecurity, green, energy, sustainability

I Introduction

The growing number of “things”, such as embedded devices, sensors, radio-frequency identification (RFID), and actuators have revolutionized the world through their integrated communication and tight interactions to create pervasive and global cyber-physical systems (CPS). Indeed, it is expected that over 50 billion sensors will be connected to the Internet, with an average of 6.58 devices per person by 2020 [2]. This has allowed the rapid development of myriad of applications in health-care, public safety, environmental management, vehicular networks, industrial automation, and so on. A CPS mainly consists of physical components and a cyber twin interconnected together, where a cyber twin is a simulation model representative of the physical things such as a computer program [3]. Internet of Things (IoT), on the other hand, allows different CPS to be connected together for information transfer. This means that IoT acts as a connection bridge to network different cyber-physical things. The global expansion of interconnected CPS is facilitated by standardization efforts. For instance, the standardization activities of IoT are being led by the industry (AllSeen Alliance, Open Interconnect Consortium, Industrial Interconnect Consortium) and IEEE P2413 project on standards specifications of IoT architectural framework [4]. CPS have resulted in a tsunami of new information, also known as big data, which can help boost revenues of many businesses, by identifying customer needs and providing them with superior services. However, this enormous amount of data is so large in size and complex in real-time that it exceeds the processing capacities of conventional systems. That is why, cloud computing techniques along with machine learning tools, data mining, artificial intelligence, and fog computing can help the sensed data to be easily stored, processed, and analyzed to uncover hidden patterns, unknown correlations and other useful information [5]. That is why big data are referred to as “the 21 century new oil”. The characteristics of big data was well summarized in the Introduction section of [6]. The relevances of big data era and CPS actually are also highly relevant to global sustainability development goals recently discussed in [7].

Before any processing or analysis, data need to be acquired. The technological advancements in sensors have led to smarter, more efficient and low-cost sensors; the fact that facilitated their wide deployment. Two main sources to sense the data from: i) context-aware computing and communications[8], ii) social computing. With context-aware computing and communications[8], data are sensed from physical sensors, virtual sensors which retrieve data using web services technology, logical sensors which combines both virtual and physical sensors such as gathering weather information, global sensors which collect data from middleware infrastructure, and remote sensors for earth sciences applications [9, 10]. As for social computing, participatory sensing and mobile crowd-sensing have led to shaping the structure of social networks, in which users collect and share sensed data using their own smartphones rather than relying on sensors [11, 12, 13].

Cloud computing facilitates big data storage, processing and management in CPS, by breaking them down into workflows, which are then distributed over multiple dedicated servers. This allows CPS to provide pervasive sensing services beyond the capacities of individual things, in addition to lower latency and power consumption and larger scalability.

Once the data have been collected, making sense of them becomes one the most important aspects of CPS. However, it is important first to eliminate redundant information and reduce data complexity so useful information extraction can be efficiently performed. In this survey, we will discuss about different tools to assist with the data mining process, mainly, feature selection, dimensionality reduction, knowledge discovery in databases, information visualization, computer vision, classification/clustering techniques, and real-time analysis.

After the data are transformed into manageable sizes, data mining tools (HDFS [14], MapReduce [15], R [16], S), real-time big data analytic tools (Storm [16, 17], Splunk [18]), and cloud-based big data analytic tools (GFS [19], BigTable [20], MapReduce) can be used to extract useful information and make sense of data, which would revolutionize the field of smart cities, environmental monitoring and others.

The ubiquitous cyber-physical world is susceptible to security threats to a large degree. These security vulnerabilities are made easier with the inability to effectively handle the large amount of data that is constantly flowing through the network; that, in addition to the lack of qualified security experts. Sensitive data stored in the cloud can be accessed or altered by unauthorized users. Cyber-security attacks on the computations, such as false data injection, can affect the integrity and accuracy of extracted results. For all these reasons, research efforts have been shifting towards proposing robust security solutions for big data CPS. In this survey, we provide an overview of these proposed solutions.

In recent years, there have been growing interests in green information and communications technologies. Addressing green issues for CPS allows for a more sustainable and energy efficient systems. Green solutions are proposed for many aspects of CPS, mainly for i) data collection/storage, such as minimizing the number of relay transmissions, removing redundant transmission links, and the use of data compression techniques; ii) CPS computing such as dynamic voltage and frequency scaling and traffic engineering techniques; iii) CPS processing such as designing energy-efficient orchestrators, checkpointing aided parallel execution (CAPE), reducing the amount of exchanged data between clouds, the use of cloudlets which are closer to users than distant clouds, among other solutions.

We summarize the contributions of our paper as follows:

  • First, to the best of our knowledge, this is the first panoramic survey on big data for cyber-physical systems. Unlike other previous relevant literature surveys [21][22][9, 23, 24][25][26][5], this paper provides a broader viewpoint on CPS from different aspects, mainly data collection and storage, processing, analytics, cybersecurity, and green solutions, rather than focusing on specific aspects of CPS.

  • Second, we provide a broader summary of security solutions proposed for big data CPS in data collection, storage, and access, as well as in data processing and analytics; unlike previous survey papers such as [15, 27, 28] that only focused on security of a particular component or element of big data.

  • Third, we provide a summary in big data, storage, and communications for CPS.

  • Fourth, we provide a broader view of big data meeting green challenges for CPS, unlike [29, 30, 31, 6, 32].

Acronym Description
BLE Bluetooth low energy
CAPE Checkpointing aided parallel execution
CPS Cyber-physical systems
D2D Device-to-device
DVFS Dynamic voltage and frequency scaling
GFS Google file system
GIS Geographical information systems
HDFS Hadoop distributed file system
IoT Internet of Things
KDD Knowledge discovery in databases
M2M Machine-to-machine
MBAN Medical body area network
MCS mobile crowd-sensing
MEMS Micro-electromechanical systems
MIMO Multiple input multiple output
MTC Machine-type communications
NoSQL Not only structured query language
RFID Radio-frequency identification
RS Remote sensing
RSS received signal strength
SDN Software defined network
VM Virtual machine
QoC Quality of context
TABLE I: List of acronyms and their descriptions.

The sections of this paper are organized as follows. Section II compares this paper with other related surveys in literature. Section III describes the different sources, generations, and collections of big data for CPS, mainly context-aware computing, communications, and social computing, while Section IX presents different CPS-related big data applications. Then, in Section IV, we discuss about big data caching and routing for better CPS data management. In Section V, we present the different big data processing tools from cloud data processing and multi-cloud processing to big data clustering techniques, NoSQL and fog computing that help facilitate the analysis of large volume of data. In Section VI, an overview of big data analytics techniques and tools is provided. Section VII provides a summary of the different security solutions proposed for CPS, mainly in storage, access and analysis. Section VIII addresses the relevances of big data and green challenges for CPS. Finally, SectionX identifies CPS big data challenges and open issues. Conclusion remarks are presented in Section XI. A list of acronyms used in this paper is provided in Table I.

Ii Relevant Works

Several survey papers have focused on specific aspects of big data, IoT, and CPS. However, to the best of our knowledge, none of them have investigated the interconnections of these concepts together in an extensive survey like this one.

In [21], the authors provided an overview on IoT from definitions to technologies to applications and standardizations, while presenting the different solutions proposed in literature for deterring security threats and preserving data integrity in RFID systems. In [9], the authors provided a brief overview on IoT, with emphasis on sensor networks and their relationships to IoT. Then, context-aware computing definitions, categories and characteristics from an IoT perspective were thoroughly presented. In [23], context data were surveyed for mobile ubiquitous environments, with the main focus being on providing efficient context data distribution for real network systems via highlighting context distribution architecture’s layers, network deployments and taxonomy. In [24], different mobile crowd sensing incentives and types were reviewed to motivate normal users to participate and contribute to different sensing applications. In [25], caching big data techniques to optimize computational time and reduce storage overhead as well as improve system performance, scalability and efficiency were presented. In [5], different cloud computing schedulers using Hadoop-MapReduce with their pros and cons were presented for purpose of improving scheduling in cloud environment. In [22], big data analytics from challenges to solutions to open research issues were explored for efficient information extraction and decisions making.

In [15], different security and privacy protocols for MapReduce were surveyed for integrity, correctness and confidentiality of cloud data computations. As for fog computing, which extends cloud computing to the edge of the networks, [27] reviewed the different security and privacy challenges, solutions and open issues. For specific CPS applications, [28] provided a thorough survey on cybersecurity issues for smart grids, with emphasis on security requirements, network vulnerabilities, and secure communication protocols and architectures. Green challenges facing big data cloud computing and processing were surveyed in [29, 30, 6, 32], with different green IoT applications in [31].

CPS has also been surveyed in literature. For instance, in [33], CPS features, energy control, transmission, resource allocation and software designs were presented. The work [34] discussed the integration of cloud computing with CPS by categorizing them in three different areas: remote brain, big data manipulation, and virtualization. The paper [35] surveyed the general concepts, challenges and applications for CPS.

Most of the previous surveys have not discussed big data in the contexts of CPS, which is the main focus of this paper. Furthermore, the green and security challenges and solutions are explored specifically for big data in CPS. Therefore, this survey paper provides a big umbrella addressing promising research frontiers and insights in many challenges and open issues facing big data meeting CPS. Table II provides a summary comparison of the previously mentioned related surveys in the field.

Survey Scope
[21] IoT definitions, technologies and applications
[9] IoT context-aware definitions, categories and characteristics
[23] Context data distribution for mobile ubiquitous environments
[33]
CPS features, energy, transmission, resource allocation
and software designs
[34] Cloud computing in CPS
[35] CPS general concepts, challenges and applications
[24] Mobile crowd-sensing incentives
[25] Caching big data
[5] Cloud computing schedulers
[22] Big data analytics challenges and solutions
[15] Security and privacy protocols for MapReduce
[27] Fog computing security challenges, solutions and open issues
[28] Cybersecurity for smart grids
[29] Energy efficient cloud data centers and computing
[30] Green big data processing for smart grids
[31] Green IoT applications
[6, 32] Big data meet green challenges
TABLE II: A summary of different related surveys.

Iii Sources, Generations, and Collections of Big Data for CPS

Fig. 1: An illustration of the different CPS big data sources and types.

Several factors have driven the expansive uses of sensors, mainly for micro-electromechanical systems (MEMS) such as accessible design boards (Raspberry pi, Onion, Arduino) [36], and the new efficient hardware architectures and components, which made sensors more robust to hardware wearing from harsh environments. Moreover, many of these sensors incorporate accelerometers that are 1000x more powerful in terms of sensitivity than those used in a Nintendo Wii [37]. In this section, we discuss the different sensed big data sources and types by grouping them into social computing and context-aware computing. An illustration is provided in Fig 1.

Iii-a Context Aware Computing and Communications

The general definition of context-aware communications and networking (CACN) was provided in [8]. Context-aware computing could be considered as CACN operating in the higher networking layers. The data that sensors collect for its specific application purposes are considered raw data, which have been directly collected from the environment without further processing. With raw data alone, it becomes challenging to analyze and interpret them, and let alone the big data generated by the large scale deployment of sensors. For data which provide relevant information that is meaningful and easily interpretable, sensors need to engage in context-aware computing; that is, sensors need to store processed meaningful information, also known as “context information”, that is easily understandable [38, 8]. An example to highlight the difference between raw sensor data and context information would be the blood sugar readings collected by bio-medical sensors on the bodies of patients, which are considered raw data. When these readings are processed and represented as a patient’s average glucose level in the blood, they are referred to as context information. The quality of context (QoC) metric is used to assess the quality, validity, precision and up-to-date context information [23]. Context-aware computing from acquisition, processing, storing and reasoning, can be performed by the applications themselves, or by using libraries and toolkits, or even by using a middleware platform [9].

As for sources of context information, they can be retrieved directly from physical sensors, from virtual sensors (they collect data from different sources by using web services technology and represent it as sensor data), from logical sensors ( a combination of physical and virtual sensors), from a middleware infrastructure (global sensor networks), from context servers (databases, web services), or even manually provided such as retrieving users’ preferences. The context information can be further classified into primary context and secondary context, which provides information on how the data were obtained. For example, reading RFID tags directly from different production parts in industrial plants is considered the primary context, while obtaining the same information from the plant’s database is referred to as the secondary context [9, 21].

Iii-B Remote sensing

Collecting sensor data of objects from a distance is referred to as remote sensing (RS) [39]. RS is an integral part of earth sciences. For instance, space-borne and airborne sensors collect multi-spatial and multi-temporal RS big data from the global atmosphere for purposes of earth observation and climate monitoring [40]. Other remote sensing applications include Google Earth that provides pictures of the earth’s surface, weather reporting, traffic monitoring, hydrology and oceanography [41].

In [42], the authors proposed a big data analytical architecture for real-time RS data processing using earth observatory system. The real-time processing includes filtration, load balancing and parallel processing of the useful RS data. The RS datasets are normally geographically distributed across several data centers, leading to difficulties in loading, scheduling and transmission of data. Moreover, the high dimensionality of the RS data makes their storage and data access rather complicated [40]. That is why, in [43], Wang et al. proposed a wavelet transform to represent RS big data by decomposing the datasets into multiscale detail coefficients, which are estimated using expectation-maximization likelihood. In [44], the authors evaluated the quality of RS data using statistical inference via using the prior knowledge of the dataset to get an unbiased estimator for the quality.

Iii-C Social Computing

With the explosive increase in smartphones usage, mobile data have experienced an unprecedented growth, carrying enormous amount of information on user applications, network performance data, service characteristics, geographic information, subscriber’s profile, and so on [45]. This has led to shaping the notion of “mobile big data”, which, unlike traditional big data in computer networks, have their own unique characteristics. One of these characteristics is the ability to partition mobile data in time and space domains, such as in minutes, hours, days, location, and so on. Furthermore, due to the features of smartphones’ usage, the same traffic, on one hand, can be highly likely requested by a group of subscribers in certain time and location; and on the other hand, subscribers in close proximity may exhibit similar behavior and mobility patterns, all of which can help optimize network performance [46].

Social computing allows the integration of these social behaviors and contexts into web technologies to assist with predicting social dynamics, which can render the operation, planning and maintenance of social wireless networks easier than ever [47, 48]. For instance, due to high social correlations and relationships among subscribers, a user social network can be formed, in which the habit, interests, mobility, and sharing patterns can be used to construct social community structures and analyze communication behaviors. One such an example of user social application is the popular Pokemon Go game, where users in close proximity share real-time maps to hunt for Pokemon characters [46]. Another example where social computing can be beneficial is in emergency situations, such as the spread of infectious diseases, where taking the appropriate policies by analyzing human interactions and predicting the emergency’s evolution can help protect the public health [47]. This article [49] introduced Cybermatics as a broader vision of the IoT (called hyper IoT) to address science and technology issues in the heterogeneous cyber–physical–social–thinking (CPST) hyperspace. Next, we list two different social computing tools for data collection, mainly, participatory sensing and crowd-sensing.

Iii-C1 Participatory sensing

Participatory sensing or community sensing allows users to collect and share information either within social groups (social sensing) or with everyone (public sensing) using their own smart devices [11, 12]. This means that sensors can be substituted by users for purpose of data collection, which can significantly reduce the monetary costs of deploying physical sensors. However, with participatory sensing comes several challenges such as the quality and trustworthiness of collected data, the willingness of participants to engage in the sensing tasks and protecting participants’ personal information.

In [11], the authors proposed a participant coordination architecture that selects the most efficient participants without exposing participants’ personal information to the application server. To protect participants’ privacy, in [50], Chang et al. proposed a secure scheme called PURE which allows participants to reach the global model to estimate via peer reviewing the local regression models. This enables participants to only report intermediate results back to the server without the need of sharing local private data with the server. In [51], Messaoud et al. proposed a mobile sensing scheme that reduces the sensing time required by participants, and increases the fairness of sensing tasks assignment to ensure participants’ commitment to sensing while maintaining same data quality as in non-fair schemes. In an attempt to maximize the overall data quality, in [52], Wang et al. proposed a multi-task allocation framework (MTPS) which pays participants a compensation from a shared budget for each sensing task, with additional compensation if a participant is assigned more than one task. This greedy framework allows the allocation of multiple tasks to participants. In [53], participatory sensing for environmental data collection was used, where the urban resolution metric was used to measure the quality of urban sensing. In [54], participatory sensing was applied to vehicular networks, where location, speed, and fuel consumption of vehicles can be communicated with the server through phones aboard via a WiFi interface to reduce data transfer delay time.

Iii-C2 Mobile crowd-sensing

Mobile crowd-sensing (MCS) can be considered as an extension to participatory sensing. In addition to collecting data from mobile devices (mobile sensing), MCS uses social sensing via integrating and fusing the contributed data from mobile devices with that of the mobile social network services in order to provide solutions to more complex queries [13]. In vehicular networks, with participatory sensing, we can collect warning messages from vehicles to determine the traffic status. However, if in addition, a driver needs to know whether the route is safe to drive on based on authorities’ recommendations, residents’ preferences, and so on, then MCS can be useful (see Fig. 1).

In [55], Xiang et al. used MCS to construct accurate outdoor received signal strength (RSS) maps using error-prone smartphones. In [56], Wang et al. proposed an energy-efficient cost-effective data uploading in MCS via providing incentives to participants to use the appropriate timing and network to upload the data. Data was offloaded to Bluetooth/WiFi gateways via using predictions on users’ calls and mobility. To maintain relatively good performance of MCS applications, a sufficient number of participants need to make contributions to sensing. In [24], the authors discussed about the different incentives for MCS from entertainment, service and monetary incentives, in which participants can be recruited in multiple sensing tasks.

Iv Big Data Storage and Communications

Caching CPS big data can lead to a reduction in the amount of traffic exchanged, which contributes to a better data management, lower latency and energy consumption. In this section, we discuss some of the solutions proposed for CPS big data caching and routing.

Iv-a Big Data Caching and Storage for CPS

The high amount of traffic driven by the popular use of mobile video and online social media applications along with the scarcity of backhaul resources have pushed researchers and mobile operators to find solutions. Content caching in CPS is of high interest, especially that a big proportion of the traffic load originates from fetching data from different sources such as databases, cache servers and network gateways [57]. Rather than caching data from the cloud, performing the caching at the edge of mobile wireless networks, such as base stations and user equipments, offers the advantage of better data management [58, 59]. For instance, in [57], Zeydan et al. proposed a big data proactive caching architecture that predicts popular content from users’ behavior and network characteristics to perform caching at the base stations. This has the advantage of backhaul offloading to the edge, so that data get closer to users, thereby enhancing users’ quality of experience and reducing latency. The proposed architecture is validated using dataset from a Turkish mobile operator, where it was shown that proactive caching can yield 100% user satisfaction by offloading 98% of the backhaul to the edge. The proactive caching at the edge is envisioned to solve big data management in future 5G networks, especially that base stations densification and acquiring new spectrum do not seem quite effective in terms of cost, scalability and flexibility [25, 57]. A new approach to reducing network traffic in telehealth systems was suggested in [60], where a filter is used inside the sensors which associates a scale to each record, to determine the data fields that need to be sent to the server.

Different works have been proposed to deal with challenges facing Hadoop for handling big data storage and processing [61, 62, 63]. For instance, in [62], the authors proposed a novel cache to store the intermediate data, which helps eliminate redundancy in storage and processing of the big data set, in addition to speeding up the performance of the system by fetching the data from the cache rather than running mapper functions. Dache, a data-aware cache framework for big-data applications, was suggested in [63] to accelerate the execution of MapReduce tasks. The transmitters of the tasks send their intermediate data to a cache manager, which is then used to fetch potential processing results.

Cache memory can help speed up CPS communications via increasing the execution speed and decreasing the time spent on memory access to fetch the required data. For instance, in medical CPS, a timely response from medical sensors to servers’ requests is required, where caching can be very useful. However, as mentioned in [64], cache misses are highly likely in CPS, especially when a task is interrupted by a higher priority task, triggering a cache interference cost. A potential solution to this problem is cache partitioning where different tasks with shared resources are isolated and assigned reserved partitions of a cache memory. This can be very useful for real-time CPS applications such as mobile-health, environmental monitoring, traffic surveillance, aerospace and so on, where reliability and predictability are of high importance. Moreover, with the large-scale network of CPS, temporal interferences from a large number of devices contending on shared resources, such as processor cores, buses, I/O devices, can significantly reduce the runtime of CPS and lead to unpredictable systems. Several algorithms and component-based approaches have been proposed to help reduce the temporal interferences of CPS [65, 66, 67].

Iv-B Big Data Communications for CPS

To accommodate the big data environment for CPS, building a resilient network infrastructure for the data-intensive applications and services has become essential [68]. The data collections, the data chunks distribution to data centers and the data delivery to intended users, necessitate a fast and reliable networking to bridge these stages together. For instance, machine-type communications (MTC), a class of technologies related to IoT and defined by the 3rd Generation Partnership Project (3GPP), are becoming more popular as we move towards a smart city. In an attempt to make homes smarter, Apple introduced the HomeKit, which uses Siri to control different things inside the home remotely using the iPhone, iPad and even AppleTV [69]. Samsung, on the other hand, introduced SmartThings that allow the automation of many home tasks using WiFi, Zigbee, Bluetooth low energy (BLE), and Z-Wave [70]. These new technologies allow all the devices to be interconnected and to be communicated together. Many of the MTC communications are envisioned to run over current cellular networks [71], providing MTC devices with ubiquitous coverage, global connectivity, reliability and security. However, this creates a set of challenges for cellular network providers [72], such as the inability to handle MTC traffic on networks optimized for human communications, excessive congestions due to signaling overhead, packet scheduling problems due to MTC traffic requiring a number of radio resources below the minimum allocated to a cellular device;, and the large interferences generated from MTC devices [73]. Different relevant works, such as [74, 75, 76, 77], suggested offloading the MTC traffic onto device-to-device (D2D) communications links. As a matter of fact, Google OnHub router can support direct D2D communications, as well as WiFi, BTL and 802.15.4 using Brillo as a stripped OS version of Android. The D2D technology provides an ideal solution to support the massive communications of MTC devices, especially that these devices are anticipated to be located close to each other [78]. The paper[79] discussed the integration of wireless sensor networks (WSNs) and mobile cloud computing (MCC) and proposed a sensory data processing framework to transmit desirable sensory data to the mobile users in a fast, reliable, and secure manner.

In [80], the authors proposed two different sustainable routing designs called SustainMe. The first model uses a dedicated backup protection in which network components are turned off after data is fully delivered to help save energy. The second model uses shared backup protection to achieve a trade-off between energy efficiency and capacity usage efficiency. The latter is shown to consume less capacity than the the first model, but at the expense of an increase in energy expenditure. In [81], the authors proposed a software defined network (SDN) routing for Hadoop to accelerate the speed of big data delivery through speeding up the MapReduce data shuffling. This is useful for time-sensitive big data applications. As a matter of fact, SDN can significantly improve big data applications, especially which separates the control and data planes, allowing the control plane to be logically centralized so that decisions and network optimization are performed efficiently via having a global view of the network. All these SDN’s features can make the big data acquisition, processing, transmission and storage much easier and faster. On the other hand, big data can also assist in efficiently designing and optimizing SDN functions through traffic engineering, cross-layer design, security and inter- and intra-data center communications [82].

In [68, 83], taking advantage of social and geographical community structures, routing, navigation services, and data delivery performance can all be improved. In [6], different big data routing algorithms for energy efficiency were reviewed, such as selecting routing paths for data centers that achieve the lowest energy costs, parallelizing data transfers by using multiple cores in data routing, and balancing big data traffic through a distributed adaptive routing algorithm that minimizes packet delivery.

In massive high-density environment, data collections and transmissions can be challenging tasks especially with the different data flow information, quantities and characteristics. In [84], cyclists’ data collections and transmissions were studied. Autonomous data collection using automated video analysis can be performed in order to overcome the need for costly manual data collection. For instance, a computer vision system can group users’ features based on spatial proximity data, speed, trajectory and others. The obtained information can be input to an algorithm that obtains valuable information such as traffic stream, number of cyclists, lane density and so on.

In high-density environment, it is important for data to be efficiently collected in a short period of time to ensure up-to-date information. For this purpose, concurrent data collection trees can be valuable where multiple data collection streams can be initiated by different users on same set of devices [85]. This can be extremely helpful for delay-sensitive applications. In [86], the authors proposed concurrent massive data transmission for remote real-time health monitoring system. An input/output(I/O) Completion Port (IOCP) main server dynamically binds Internet protocol (IP) addresses to massive sensors before they can send their data. The main server assigns one of the IOCP servers with the least number of concurrent connections to micro sensors using Oracle Real Application Cluster (RAC). Finally, data storage is separated from Oracle server instances using Network Attached Storage (NAS) technology to allow for greater I/O performance for massive concurrent connections. The paper [87] firstly analyzed the objective of massive access control for machine-type communications (MTC) or machine-to-machine communications (M2M), another terms of CPS used in 3GPP standardizations for wireless cellular networks, and proposed distribution reshaping to enable data arrival distribution with highly coordinated manner to notably reduce multiple access congestions.

Real-time data fusions are important for heterogeneous IoT/CPS data streams and unreliable networks with increasing data size. To support locally available computational values to help real-time analytics, the work [88] proposed the fusion of three different data models with relational, semantical, and big data based data and metadata. The paper [89] proposed two-layer architecture for IoT data analysis, where the first layer is with the service oriented gateway based generic interface to obtain data from multiple interfaces and IoT systems and store them scalably and make relevant real-time analysis to extract high-level activities, while the second layer works for the probabilistic fusion of these high-level activities.

Iv-C Summary and Insights

To summarize, different strategies can be employed to enhance CPS performance via speeding up data collection, processing and distribution. At the CPS traffic level, data collection and processing can be accelerated through caching at the edge, filtering the data to make it more manageable in size, caching execution tasks, and using cloudlets whenever possible. At the CPS devices level, the use of on-fly D2D, SDN, backup protection, and social/geographical community structures along with navigation services can help accelerate data delivery and reduce latency. The use of these technologies and solutions can significantly speed up data handling while at the same time extending the communication range of CPS traffic.

V Big Data Processing for CPS

Cloud computing along with data clustering facilitates the parallel processing and execution of tasks and queries. Mapping and scheduling workflows in a multi-cloud environment speeds up the processing and allows for a better big data management. In this section, we discuss cloud computing, big data clustering, NoSQL and fog computing for big data workflows processing.

V-a Cloud Data Processing

With the large volume of data in the order of exabyte, it becomes almost impractical to process the data on individual machines, no matter how powerful they are. Parallel processing of the data chunks on dedicated servers, such as MapReduce tool proposed by Google, offers advantages over conventional processing methods; however it is still not very effective to handle a large amount of data, mainly due to scalability, latency, availability, and inefficient programming techniques, including but not limited to database management systems [90, 91]. One attractive solution to dedicated servers is the processing on cloud centers, which offers users the ability to rent computing and storage resources in a pay-as-you-go manner [92]. In addition, even though users will be sharing a common hardware, the shared resources appear exclusive to them through machine virtualization via hiding the platform details [93]. However, this approach can create problems in the pay-as-you-go environment due to untruthfulness, unfairness and inefficiency of resources and workload transactions [94].

The main difference between parallelizing tools, such as MapReduce and cloud computing, is that MapReduce uses mappers and reducers to produce intermediate results and final results, respectively. However, public clouds offers users with virtual machines (VMs) with a highly elastic resource allocation [92]. The function of Map is in charge of processing input key-value pairs and generating intermediate key-value pairs; while Reduce function is used to further compress the value set into a smaller set based on the intermediate values with the same keys [95]. To maximize the query rate of remotely located data in an attempt to maximize system performance, the authors in [96] designed a dynamic resource allocation algorithm that takes into account the computations of query streams across the nodes and the limiting number of resources available.

Due to the volume and velocity characteristics of big data, streaming data processing and storage might require different compression techniques to ensure efficiency and scalability. Yang et al. proposed a novel low data accuracy loss compression technique for cloud data processing and storage. A similarity check was performed on partitioned data chunks and a compression is conducted over the data chunks rather than the basic data units [97]. Another similarity check-based compression technique was proposed in [98] using weighted fast compression distance.

Fig. 2: CPS cycle to automation.

Integrating cloud computing in IoT can take the processing of sensing data streams to the next level to provide ubiquitous sensing services beyond the capacities of individual things. When combined with artificial intelligence, machine learning, and neuromorphic computing techniques, it is envisioned that new applications will be developed with automated decision making, which would revolutionize the field of smart cities, industrial plants, environmental monitoring and others (see Fig. 2). Cloud computing will enable IoT applications to have a reduced latency, power consumption and enhanced scalability. Examples of such applications include but not limited to healthcare, where patients’ information can be accessed using a cloudlets-based infrastructure [99]. Cloudlets are clouds that are closer to users to help overcome the high latency and power consumption of distant clouds [100]. Other cloud application examples include vehicles traffic control system [101], genome analysis [102], earth surface analysis [103] and many others.

V-B Multi-Cloud Data Processing

In many IoT scientific applications, the data collection and generation, computation, processing and analysis are broken down into workflows, consisting of interdepending computing entities. Due to the data-intensive nature of IoT applications, the large-scale workflows need to be distributed across multiple cloud centers [104]. To allow the support of multiple applications and to overcome the limitations of current frameworks that are dedicated to a unique type of applications, the authors in [105] proposed a distributed application management framework in multi-cloud environment by using a domain specific language (DSL) to describe applications in a hierarchical manner. However, the inter-cloud communications constitute a big deal of the financial costs of processing workflows due to their large volume. In [104], to optimize system performance, Wu et al. proposed a budget-constrained workflow mapping in multi-cloud environment. An efficient pay-on-demand pricing strategy for streaming big data processing in multi-cloud environment was proposed in [106], to offer a low price for data load processing, while maximizing the revenues of cloud service providers. As for building trust across multiple cloud centers that collaborate together on data storage and processing, Li et al. in [107] proposed a trust-aware monitoring architecture between users and cloud centers with hierarchical feedback mechanism to enhance the robustness and reliability of the quality of service of cloud providers, which helps provide them with a rating based on their trust reputation. This allows users to select services from different cloud providers based on feedback and past service records.

In [108], Wang et al. optimized virtual machine (VM) placement in national cloud data centers to help minimize the energy consumption of data-intensive services, such as planet analysis. In these cloud centers, VMs are assigned to physical servers, which helps provide users with a high quality of service but at the expense of greater energy consumption. The trade-off between energy consumption and quality of service is taken into consideration in the optimization problem. The cost of data access and storage limitations of national cloud centers was considered in [109], where the authors used Lagrangian relaxation based heuristics algorithm to obtain the optimal data centers placement that can reduce data access costs.

The distribution of users’ tasks on geographically distributed cloud centers was addressed in [110], where the authors proposed a big data management solution to maximize system throughput such that fairness of limited resources usage by users is guaranteed and the operational cost of service providers is reduced. To allocate resources of multi-cloud centers to users, a multi-round combinational double auction based mechanism was proposed in [111], where auctions on different VMs were performed by both users and data centers in multiple rounds to maintain a high quality of service level.

V-C Big Data Clustering

Data clustering refers to partitioning a set of objects comprising of attributes into different groups of similar objects and features [112]. Data clustering becomes very useful in big data applications, where there is a high need to process and analyze large volume of data. Estimating the number of clusters becomes important as clustering facilitates the distribution of the data storage, tasks execution, parallel computing, and queries requests [113]. Fig. 3 shows two groups of clustering methods researched in depth in literature: i) hierarchical clustering, and ii) centroid-based clustering. In hierarchical clustering, nearby objects have higher probability of being grouped together than far away objects. On the other hand, in centroid-based clustering, objects closer to the cluster center are grouped together. The paper [114] discussed a flexible multiple clustering analytic and service framework, and a novel tensor-based multiple clusterings (TMC) approach.

One of the most popular centroid-based clustering is k-means due to its computational efficiency and low-complexity implementation. However, as the number of clusters increases, k-means clustering suffers from the empty clustering problem and the increase in number of iterations for convergence. This means that traditional k-means is not suitable for big data applications. Different works have suggested enhanced versions of k-means clustering for purposes of improving clustering quality, execution time, and accuracy. For instance, in [115], the authors used an enhanced version of the k-means clustering, where the initial centroids of the cluster are not selected randomly but based on averaging the data points. This achieves higher accuracy than conventional k-means. Another enhanced version of k-means clustering was suggested in [116], to help eliminate the empty clustering problem of traditional k-means. The clustering approach was based on a combination of Fireworks and Cuckoo-search algorithms with representative points being selected as the centroids. In [117], the authors also used a centroid calculation heuristics to help enhance the clustering performance as the number of clusters increases.

Fig. 3: An illustration comparing centroid-based clustering and hierarchical clustering (adapted from [118] and [119]).

In [112], for instance, the authors used hierarchical clustering in their proposed algorithm, where the number of clusters was estimated visually based on a reordered distance matrix. The data samples were clustered using single linkage (SL), which cuts large edges of the clustering tree in a minimum spanning tree (MST). The algorithm was shown to be superior in terms of clustering speed and accuracy when compared to other clustering algorithms like k-means. Another hierarchical clustering was suggested in [120], where users themselves define the number of clusters based on different similarity measures such as homogeneity and the relative population of each cluster. This helps improve user satisfaction of the clustering algorithm. A hierarchical k-means clustering algorithm was proposed in [121] to find high quality initial centroids. The centroids were obtained based on a hierarchical structure of k-means that consists of several levels, where the first level is the original dataset. Subsequent levels are compromised of smaller size datasets that consist of similar patterns as the original dataset. Fuzzy clustering, another clustering technique, is similar to k-means; however an object can be associated with more than a single cluster depending on its degrees of membership that are usually calculated based on Euclidean distances between the object and the data center [122].

Another big data clustering is clustering features (CF), where a CF-tree includes a summary of data patterns in a cluster. It uses a threshold preset by users to set the size of micro clusters in a CF-tree. However, this method can have time and scalability complexities, especially with the large number of data that needs to be scanned and assessed to construct the tree. Clustering based on summary statistics allows the threshold to be different for different micro clusters by dynamically adjusted it based on regions’ densities [123]. Graph clustering attempts to cluster data points based on network’s structure and nodes’ connections, where similarities between nodes allow them to be connected to build a community. An example would be building a Tweet graph for Twitter online social application where Tweets (nodes) with similarities (similar URL count, similar hashtag count, or similar username count) are connected together with edges [124]. Finally, incremental mining helps deal with the increasing network size and data growth challenges by incrementally updating the clusters without the need for reconstructing them from scratch.

In the context of CPS applications, [125] attempted to perform structuralized clustering for sensor-based CPS for purposes of reducing the energy consumption of the sensor network. The sensor network is basically structured into small equal units, each consisting of a set of clusters, whose number is carefully selected based on the cluster head workload and its distance to the base station to achieve energy efficiency. In [126], the authors tried to solve the cascaded subnetworks’ failures problem in CPS by obtaining an upper bound on the small clusters’ size to mitigate networks’ failure, especially when they are tightly coupled. In [127], the authors proposed a secure clustering method for vehicular CPS, where a trust metric is calculated for each vehicle based on their transmission characteristics in order to create secure clusters. Another secure clustering for vehicular CPS was proposed in [128], where the clustering problem is formulated as a coalition game taking into consideration the relative velocity, position and bandwidth of vehicles. Furthermore, incentives and penalty mechanisms are suggested to prevent selfish nodes from degrading the communication quality performance. A density-based stream data clustering for real-time monitoring CPS applications was suggested in [129], where the authors used FlockStream algorithm to group similar data streams. Each data point is associated with an agent, and similar agents within a visibility range of each other in the virtual space, form a flock allowing for real-time stream clustering. A summary of the different big data clustering techniques for CPS is provided in Table III.

Relevant Research Works Big Data Clustering Techniques
Hierarchical clustering
[120], [130], [131]
[121], [132]
[122], [133], [134]
[123]
[135], [136]
Clustering based on similarity measures
Hierarchical k-means solutions
Fuzzy clustering
Clustering based on summary statistics
Incremental mining
Centroid-based clustering
[112], [137]
[138]
[124], [139]
Minimum Spanning Trees
Consensus clustering statistics
Graph clustering/Spectral clustering
TABLE III: Summary of big data clustering techniques for CPS

V-D NoSQL

Conventional relational database management systems are not suitable for heterogeneous big data processing, as they consist of strict data model of pre-defined data structures and constraints with a fixed schema [140]. NoSQL (Not Only Structured Query Language) relaxes many of the relational databases’ properties such as ACID transactional properties to allow for greater querying flexibility, operational scalability and simplicity, higher availability and faster read/write operations of unstructured big data through replicating and partitioning the data across several nodes [141, 142].

NoSQL databases can store data in three different forms: key-value stores, document databases, and column-oriented databases [143]. In the document databases form, the data is stored in a complex structure form such as XML documents. Column-oriented databases store columns of data in data tables, allowing greater ease of adding and deleting columns compared to row-oriented databases.

Furthermore, two main classes for NoSQL systems: operational NoSQL systems (Cassandra, MongoDB, Oracle NoSQL), and analytical NoSQL systems which are based on MapReduce, Hadoop, and Spark. Operational NoSQL systems include online transaction processing (OLTP) systems, while analytical NoSQL systems include decision support systems (DSSs). The main difference between OLTP and DSSs is that the latter involves processing over larger tables and hence, involves complex queries processing (scanning, joining, and aggregating), while OLTP performs read/write operations for a smaller number of entries in the database [142]. For instance, in [144], the authors used NoSQL for big data workflows execution to improve the scalability, parallelism and execution compared to traditional MapReduce framework.

Advantages Disadvantages
MapReduce parallel processing dedicated resources to users inefficient at handling large data
Single cloud parallel processing
i) scalability and availability in handling
big data; ii) elastic resource allocation
i) shared resources; ii) users need to
pay to rent VMs; iii) untruthfulness
and unfairness in using resources
Multi-cloud parallel processing
i) Better quality of service; ii) overcomes
storage limitations; iii) maximizes system
throughput; iv) achieves fairness of limited
resources
i) requires VMs placement
optimization; ii) might be costly to
implement; iii) increases energy
consumption, iv) complicates data
access and management; v) requires
robust security solutions
TABLE IV: Comparisons among different parallel processing techniques

V-E Fog Data Computing

Cloud computing and services can be extended to the edge of network via the fog computing paradigm. Though both cloud and fog provide data computations, storage and application services to end-users, the distinguished features of fog from cloud include its proximity to end-users, the dense geographical distribution and its mobility support [145]. These features facilitate fog computing for latency-sensitive applications. Consider an example [146] of smart traffic lights and connected vehicles, where an ambulance flashing lights sensed by a video camera could automatically trigger street lights to open lanes for the ambulance to pass through the traffic. In smart grid scenario [145], the fog devices at the edge collect the data generated by grid sensors, real-time process the data, and issue control commands to the actuators. A fog-based intelligent decision support system was proposed in [147] for driver safety and traffic violation monitoring based on the IoT. A smart city speeding traffic surveillance scheme using fog computing paradigm was provided in [148], and its effectiveness was validated by intensive experiments conducted using real-world traffic surveillance video streams. Fog computing could make possible the early predictions of biomarkers to enable automated decisions making in a connected health scenario [149]. A fog computing system for e-Health applications was implemented in [150], where fog nodes were installed in home to achieve the smallest processing time. Disaster decision support systems, deployed with fog nodes, could process acquired real data and trigger alarms in case of an emergency [151]. An augmented brain computer interaction game based on fog computing and linked data was developed in [152]. Other exemplary scenarios [146] for fog computing could be wireless sensor and actuator networks, decentralized smart building control, software defined networks, and so on.

Beyond the latency-sensitive applications, fog computing could also offload the core network traffic and keep the sensitive data inside the network [2]. In [153], fog computing at smart gateways was shown to enhance the health monitoring system through the usage of advanced techniques such as embedded data mining, distributed storage and notification service at the edge of network. In [154], the authors proposed a high level programming model, called mobile fog, for future Internet applications that are geospatially distributed, large-scale and latency-sensitive. The placement and migration method for infrastructure providers was proposed in [155] to incorporate cloud and fog resources. Specifically, the network intensive operators were placed on distributed fog devices while computationally intensive operators were placed in the cloud. Some design goals and challenges in fog computing platform were described in [156], which also suggested several important components of a fog computing platform, i.e., authentication and authorization, offloading management, location services, system monitor, resource management, virtual machine scheduling.

The security and privacy issues were examined in the context of smart grids [28] and machine-to-machine communications [157], and so on. These security solutions for cloud computing may not be directly applicable to fog computing, as fog devices are placed at the edge of networks. One security issue of fog computing is the authentication at different levels of gateways. The compromise of gateways serving as fog devices could lead to the Man-in-the-Middle attack [158]. Since a certain amount of fog networks are connected through wireless, typical wireless attacks (e.g., jamming attacks, sniffer attacks, and so on) could be possible threats for fog computing [27]. Furthermore, the leakage of private information, such as data, location or usage could be the main concerns of end users.

V-F Summary and Insights

From Table IV, we can observe that users’ workloads can be more efficiently processed in a multi-cloud environment, especially that most CPS applications deal with a large volume of data. However, such a solution requires additional optimization techniques to optimize VMs placement for energy, security, fairness, and costs. We have discussed relevant issues in fog computing relevant issues, while there would be also more relevant big data processing issues in cloud computing and edge computing, such as relevant architectures and applications, distributed data analytical frameworks, cyber defense and cyber intelligence as well as convergence and complexity issues.

Vi Big Data Analytics

Big data analytics constitutes one of the most important arenas in big data systems, as it allows to uncover hidden patterns, unknown correlations and other useful information, which in turn assist in boosting the revenues for many businesses. In this section, we present an overview of relevant big data analytics techniques and tools. Readers may find some introductions of big data analytics in [159].

Vi-a Data Mining

One of the interesting features of CPS is the automated decision making. This means that CPS objects are supposed to be smart in sensing, identifying events and interacting with others [160]. The massive data collected by CPS needs to be converted into useful knowledge to uncover hidden patterns to find solutions, enhance system performance and quality of services. The process of extracting this useful information is referred to as data mining. One solution to facilitate the data mining process is to reduce data complexity via allowing objects to capture only the interesting data rather than all of it. Before data mining can be applied to the data, some processing steps need to be completed such as key features selection, preprocessing and transformation of data. Dimensionality reduction is one potential method to reduce the number of features of the data [161]. For instance, in [162], Chen et al. used neural network with k-means clustering via principal component analysis (PCA) to reduce the complexity and the number of dimensions of gene expression data to extract disease-related information from gene expression profiles. Knowledge discovery in databases (KDD) is also used in different CPS scenarios to find hidden patterns and unknown correlations in data so that useful information can be converted into knowledge [163]. One such use of KDD is in smart infrastructures systems, where these systems need to answer queries and make recommendations about the system operation to the facility manager [164].

Tsai et al. [5] broke down the core operations of data mining into three main operations: data scanning, rules construction and rules update. Data scanning is selecting the needed data by the operator. Rules construction includes creating candidate rules by using selection, construction and perturbation. Finally, candidate rules are checked by the operator, then evaluated to determine which ones will be kept for the next iteration. The process of scanning, construction and update operations is repeated until the termination criteria is met. This data mining framework works for deterministic mining algorithms such as k-means, and the metaheuristic algorithms such as simulated annealing and genetic algorithm.

Clustering, classification and frequent pattern are different mining techniques that can be used to make CPS smarter. Clustering methods have already been discussed in Section V-C. Tsai et al. [5] discussed about two different purposes for clustering: i) clustering for infrastructure of IoT, and ii) clustering for services of IoT. Clustering for infrastructure of IoT helps enhance system performance in terms of identification, sensing and actuation, such as in [165], where nodes can exchange information between each other to identify whether they can be grouped together depending on the needs of the IoT applications. As for services of IoT, clustering can help provide higher quality services such as in smart homes [166]. On the other hand, classification does not require prior knowledge to complete the partitioning of objects into clusters, also known as unsupervised learning. Classification tools include decision trees, k-nearest neighbor, naive Bayesian classification, adaboost and support vector machines. Classification can also be done to improve infrastructure as well as services of IoT. Finally, frequent pattern mining is about uncovering interesting patterns such as which items will be purchased together with previously purchased items, or suggest items for customers to purchase based on customer’s characteristics, behavior, purchase history, and so on. Fig. 4 illustrates the CPS big data mining process for useful information extraction.

Fig. 4: CPS big data mining process.

The paper [167] proposed NextCell algorithm to improve the location prediction via utilizing the social interplay mined in cellular call records.

Vi-B Real-time Analytics

Real-time analysis is another approach to produce useful information from massive raw data. Real-time streams data are first converted to a structured form data before being analyzed by big data analysis tools such as Hadoop. Many application domains such as healthcare, transportation systems, environmental monitoring, and smart cities will require real-time decision making and control [12]. For example, Twitter data can be real-time analyzed to enhance the prediction process and to provide useful recommendations to users [168]; terrorist incidents data can be real-time analyzed to predict future incidents [169]; big data stream in healthcare can be analyzed to help medical staff make decisions in real-time, which can help save patients’ lives and improve the healthcare services provided, while reducing medical costs [170]. Near real-time big data analysis architecture for vehicular networks was proposed in [171], which consists of a centralized data storage for data processing and a distributed data storage for streaming processed data in real-time analysis.

In [172], a real-time hybrid-stream big data analytics model was proposed for big data video analysis. The paper [173] considered the online network analysis as a stream analysis issue and proposed to utilize Spark Streaming to monitor and analyze the high-speed Internet traffic data in real-time. The work [174] proposed mobile edge computing nodes deployed on a transit bus with descriptive analytics to explore meaningful patterns of real-time data streams for transit. The paper [175] discussed the term and related concepts of Real Time Analytics (RTA) for industry big data analytical solutions. This paper [176] provided a framework to efficiently leverage big data technology and allow deep analysis of large and complex datasets for real-time big data warehousing.

Arranging the data in a representative form can provide information visualization, which makes the information extraction and understanding of complex large-scale systems much easier [177]. Geographical information systems (GIS) is one important tool of visualization [178], as it can help real-time analysis of many applications such as in healthcare, urban and regional planning, transportation systems, emergency situations, public safety, and so on. In [178], the authors proposed a large-scale system data visualization architecture called X-SimViz, which allows users for real-time dynamic data analytics and visualization. Computer vision is another approach to detecting security anomalies. Visualization can also be useful tool in predicting real-time cyber attacks. For instance, in [130], the authors used computer vision to transform the network traffic data into images using a multivariate correlation analysis approach based on a dissimilarity measure called Earth Mover’s Distance to help detect denial-of-service attacks. A computer vision deep learning algorithm for human activity recognition was proposed in [179]. The model is capable of recognizing twelve types of human activities with high accuracy and without the need of prior knowledge, which is useful for security monitoring applications.

Vi-C Cloud-Based Big Data Analytics

Cloud-based analysis in CPS constitutes a scalable and reliable architecture to perform analytics operations on big data stream, such as extracting, aggregating and analyzing data of different granularities [180]. A massive amount of data are usually stored in spreadsheets or other applications, and a cloud-based analytics service, using statistical analysis and machine learning, helps reduce the big data to a manageable size so information can be extracted, hypothesis can be tested, and conclusions can be drawn from non-numerical data such as photos. Data can be imported from the cloud and users are able to run cloud data analytics algorithms on big datasets, after which data can be stored back to the cloud [181]. For instance, in [182], the authors used cloud computing using MapReduce algorithm to conduct analysis on crime rates in the city of Austin using different attributes like crime type and location to help build a design that prevents future crimes for public safety.

Even though cloud computing is an attractive analytics tool for big data applications, it comes with several challenges, mainly concerning security, privacy and data ownership, which will be discussed further in Section VII. In [99], the authors extended the use of clouds to mobile cloud computing to help overcome the challenge of resources limitations such as memory, battery life and CPU power. A mobile cloud computing architecture was suggested for healthcare applications with discussion on various big data analytic tools available. In [183], the authors suggested using a hybrid cloud computing consisting of public and private clouds to accelerate the analysis of massive data workloads on MapReduce framework without requiring significant modifications to the framework. In a private cloud, cloud services delivered over the physical infrastructure are exclusively dedicated to the tenant. The hybrid cloud uses a set of virtual machines running on the private cloud, which take advantage of data locality, and another set of virtual machines run on a public cloud to run the analysis at a faster rate.

To optimize the utilization of cloud computing resources, predicting the expected workload and the amount of resources needed becomes important to reduce waste. In [184], the authors developed a system that predicts the resources requirements of a MapReduce application to optimize bandwidth allocation to the application, while, in [185], the authors used linear networks along with linear regression to predict the future need of new resources and VMs. When the system fails short in predicting the right amount of resources needed, it becomes incapable of accommodating a high workload demand, leading to anomalies. Anomalies detection is an essential part of big data analytics, as it helps improve the quality of service via checking whether the measurements of the workload observed and the baseline workloads diverge by a specific margin, where the baseline workloads provide a measure on how the demand changes during a period of time based on historical records [186].

Vi-D Spatial-Temporal Analytics

Massively data obtained from widely deployed spatio-temporal sensors have caused grand challenges on data storage, process scalability, and retrieval efficiency. The paper [187] proposed distributed composite spatio-temporal index approach VegaIndexer for efficiently processing the large amount of spatio-temporal sensor data. The paper [188] investigated the big data issues in Internet of Vehicles (IOV) applications and proposed to use clouding based big data space-time analytics to enhance the analysis efficiency. The paper [189] proposed STAnD to determine anomaly patterns for potential malicious events within these spatial-temporal data sets. Massively data obtained from widely deployed spatio-temporal sensors have caused grand challenges on data storage, process scalability, and retrieval efficiency. The spatial distributed CPS nodes also can be used to analyze location information. The paper [190] proposed an efficient indoor positioning based on a new empirical propagation model using fingerprinting sensors, called regional propagation model (RPM), which is based on the cluster based propagation model theory, and then the paper [191] used particle swarm optimization (PSO) to estimate the location information via Kalman filter to update the initial estimated location.

Vi-E Big Data Analytical Tools

In this section, some typical tools are briefly introduced for aforementioned three methods of big data analytics, namely data mining, real-time big data analytics and cloud-based big data analytics.

Vi-E1 Tools for Data Mining

Hadoop [22] is an open source managed by the Apache Software Foundation. There are two main components for Hadoop, namely HDFS [14] and MapReduce [15]. HDFS is developed from an inspiration of GFS [19], and it is a scalable and distributed storage system, which is an appropriate solution for data-intensive applications, such as Gigabyte and Terabyte scale. Rather than just being a storage layer of Hadoop, HDFS is also beneficial to throughput improvement of the system and it supplies efficient fault detection and automatic recovery. MapReduce is a framework which is used to analyze massive data sets in a distributed fashion by means of numerous machines [192]. There are two functions in the mathematical model of MapReduce including Map and Reduce, both of which are available to be programmed. is also an open-source software environment for data mining developed by AT&T Bell Labs [16]. Actually, is a realization of the S language used to explore data, implement statistical analysis and draw plots. Compared with , is more popular and supported by a large number of database manufacturers, such as Teradata and Oracle.

Vi-E2 Tools for Real-Time Big Data Analytics

Storm [16, 17] is a distributed real-time computing system for big data analysis. Compared with Hadoop, Storm is easier to operate and more scalable to provide competitive and efficient services. Storm makes use of distinct topologies for different storm tasks in terms of storm clusters, which are composed of master nodes and worker nodes. The master nodes and worker nodes play two kinds of roles in the fields of big data analysis, namely nimbus and supervisor, respectively. The functions of these two roles are in agreement with jobtracker and tasktracker of the MapReduce framework. Nimbus takes charge of code distribution across the storm cluster, the schedule and assignment of worker nodes tasks, and the whole system surveillance. The supervisor compiles tasks given by nimbus. Splunk [18] is also a real-time platform designed for big data analytics. Based on the web interface, Splunk is available to search, monitor and analyze machine-generated big data, and the results are exhibited in different varieties including graphs, reports, alerts and so on. Compared with other real-time analytical tools, Splunk provides various smart services for commercial operations, system problem diagnosis, and so on.

Fig. 5: An illustration of CPS taxonomy.

Vi-E3 Tools for Cloud-Based Big Data Analytics

As the most popular tool for cloud-based big data analytics, Google’s cloud computing platform [193] consists of GFS [26] (big data storage), BigTable [20] (big data management) and MapReduce (cloud computing), which was discussed in the previous section. GFS is a distributed file system and it is enchanced to meet the requirements of big data storage and usage demands of Google Inc. In order to deal with the commodity component failure problem, GFS facilitates continuous surveillance, errors detection and component faults tolerance. GFS adopts clustered approach that divides data chunks into 64-KB blocks and stores a 32-bit checksum for each block. BigTable supplies highly adaptable, reliable, applicable and dynamic control and management in the field of big data placement, representation, indexing and clustering for enormous and distributed commodity servers, and it constitutes of a row, column, record tablet and time stamp.

Vi-F Summary and Insights

To better extract information from big data, it is of utmost importance to enhance cloud’s analysis performance. A combination of different techniques discussed in this section can be used to optimize cloud computing resources. If VMs and cloud’s resources and requirements can be predicted beforehand, then workloads can be efficiently processed and analyzed by taking advantage of cloud analytical tools previously discussed. Furthermore, using a hybrid cloud can further speed up the analysis of workloads, leading to reduced latency and efficient data mining.

Vii Big Data Cybersecurity and Privacy

In the realm of cyber physical systems, the tight interaction among physical objects which collect and transmit a large volume of data place security threats under the spotlight of attention. With this enormous amount of data that are constantly flowing through the network, it becomes essential to protect the system from cyber attacks[194]. In this section, we provide an overview of the different security solutions proposed for big data storage, access and analytics (see Table V for a summary).

Vii-a Security in Big Data Storage and Access

While data storage in the cloud offers several advantages in terms of data storage, availability, scalability and processing, it increases the chance of malicious attacks, that in addition to potential privacy invasion by cloud operators who can have accesses to sensitive data. All this puts a question mark on whether cloud data storage is feasible, especially for governmental agencies and financial industries. Several works have attempted to solve the security challenges of cloud storage. For instance, Gai et al. proposed a method that splits files into encrypted parts and store them in distributed cloud servers without users’ data being directly reached by cloud service operators [195]. In [196], the authors optimized the data placement on cloud servers that minimizes the retrieval time of data files while guaranteeing their security based on the distance between nodes that store the data chunks, such that the malicious attacker cannot guess the locations of all the data chunks. In [197], the authors suggested that data should be encrypted and decrypted before being sent to clouds. The paper [198] addressed iOS devices as case studies and stressed the possible applications for pairing mode in iOS devices, which supports a trusted relationship between an iOS device and a personal computer, for covert data exfiltration.

When data need to be transferred from one cloud to another, data privacy and integrity become important. In [199], Ni et al. proposed a secure data transfer scheme where users encrypt the data blocks before uploading to the cloud. When transferring from one cloud to another, a security protocol was described using secret keys and a signature checking with polynomial-based authentication was performed without retrieving data from the source cloud. While the mentioned works considered encryption as a way to protect the data from privacy violations, encryption introduces a new challenge: cloud data deduplication, especially when data is shared among many users. Even though deduplication can save up to 95% in terms of needed storage for backup applications [200], and 68% for standard file systems [201], it wastes resources, and consumes energy, and makes the data management very complicated. In [202], Yan et al. attempted to solve the deduplication problems via proposing a scheme, where the users upload the encrypted data to the cloud along with a token for data duplication check, which is then used by the cloud service providers to check whether the data has already been stored. A scheme to verify data ownership was presented to ensure secure data management.

Vii-B Security in Big Data Analytics

Enabling security and privacy aspects of big data analytics has attracted a great attention from the scientific community mainly due to different reasons. First, the data are more likely stored, processed and analyzed in several cloud centers leading to security issues due to the the random locations of data. Second, big data analytics treats sensitive data in similar way to other data without taking security measures such as encryption or blind processing into consideration [203]. Third, big data computations need to be protected from malicious attacks in order to preserve the integrity of the extracted results. In the realm of CPS, an enormous amount of data make the surveillance of security-related information for anomaly detection a challenging task for analysts. In healthcare, for instance, the security issues of information extraction from massive amount of data and accurate analytics are of high importance. Sensitive data recorded in databases need to be protected via monitoring which applications and users get accesses to the data [204]. In order to guarantee a strong secure big data analytics, the following tasks can be performed [205]:

  • Surveillance and monitoring of real-time data streams,

  • Implementation of advanced security controls such as additional authentication and blocking suspicious transactions,

  • Anomaly detection in behavior, usage, access and network traffic,

  • Defending the system against malicious attacks in real-time,

  • Adoption of visualization techniques that give a full overview of network problems and progress in real-time.

In [206], the authors tackled the big data analytics in mobile cellular networks based on random matrix theory, where big data is represented in matrix form of size , where is the number of data samples of a random vector , and is the number of independent realizations of . For cellular networks, big data manifests as big signaling data consisting of

  1. a large number of control messages to ensure reliability, security and efficiency of communications,

  2. big traffic data which require traffic monitoring and analysis to balance network load and optimize system performance

  3. big location data generated by GPS sensors, Bluetooth, WiFi and so on, to assist in different areas such as transportation systems, public safety, crime hot spots analysis and so on,

  4. big radio waveforms data emanating from 5G massive MIMO systems to estimate users’ moving speed for purposes of finding correlation among transmitted signals as well as assist in channel estimation,

  5. big heterogeneous data such as data rate, packet drop, mobility and so on that can be analyzed to ensure cybersecurity.

The work [207] proposed a secure high-order clustering algorithm via fast search density peaks on hybrid cloud for industrial Internet of Things.

Machine learning, among other tools, offers a promising solution to automate many of the above mentioned security-related tasks, especially with the continuous growth of the flowing data in terms of scale and complexity. Through the process of training datasets, machine learning makes possible the detection of future security anomalies via detecting unusual activities in the network traffic. To achieve a higher accuracy, a large volume of training datasets are needed, but this would be at the cost of added overhead and storage constraints. The process of training can be supervised, unsupervised or semi-supervised, depending on whether the outcome of a particular dataset is already known. Particularly, the system starts via classifying similar datasets into clusters to determine their anomaly. A human analyst can then explore and identify any unusual data. The outcome found by the analyst can then be fed back to the training system in order to make it more “supervised”[208]. This has the potential in enabling the training system adapt to new forms of threats without human intervention, so actions can be immediately taken before actual damages occur.

Different approaches for anomaly detection exist in literature such as discretizing the continuous domain into different dimensions such as in the surveillance system in [209], where the author partitioned the surveillance area into a square grid where the positions and velocities of the moving objects falling in each cell are modeled by a Poisson point process. Another approach is the multivariate Gaussian analysis in which data are flagged as abnormal when they lie a number of standard deviations away from the mean. For instance, in [210], the authors used multivariate Gaussian analysis to detect Internet attacks and intrusions via analyzing the statistical properties of the IP traffic captured. In clustering methods such as k-means clustering, data points can be grouped into clusters based on their distance to the center of the cluster. Then, if some data point lies outside of the group cluster, it is considered as an anomaly. The authors in [211] used kernel k-means clustering with local-neighborhood information to detect a change in an image by optimally computing the kernel weights of the image features such as intensity and texture features. As for the artificial neural network approach, one implementation of such a model is the autoencoder, also known as replicator neural network, which flags anomalies based on calculations of the difference between the test data and the reconstructed one. This means that if the error between test and reconstructed data exceeds a specified threshold, then it is considered far away from a healthy system distribution [212]. An example of such an approach is given in [213], where the authors used the autoencoder as a high accuracy and low-latency model to detect anomalies in the energy consumption and operation of smart meters.

Privacy preserving data analytics and mining can be quite challenging tasks since analyzing encrypted data is an inefficient, costly and non-straightforward solution. Homomorphic encryption is one of the solutions proposed to enable analytical operations to be performed on ciphertexts using multiple mathematical operations [214, 215]. For instance, in [216], the authors used Efficient Privacy-preserving Outsourced calculation with Multiple keys (EPOM) homomorphic encryption to encrypt data before sending to cloud. The cloud then uses ID3 algorithm to perform data mining on encrypted data. The algorithm uses a hierarchical tree decision to determine which attributes from a set of samples provide the best prediction or information gain. Another approach for processing mining on encrypted data in cloud using homomorphic encryption was suggested in [217]. A cloud service provider (CSP) collects and stores encrypted data, while a server, referred to as Evaluator, collaborates with CSP to perform mining over encrypted data. A miner submits encrypted mining queries to CSP which in turn computes inner product between vectors to determine the frequency of the mining itemset without CSP and Evaluator having access to the sensitive data. However, homomorphic encryption can be computationally expensive and impractical for big datasets [218].

One potential approach to protect private data during the analytic processes is the k-anonymization proposed in [219]. First, users who access the data need to be authenticated and authorized based on the level of shared results’ privacy. Then, a list of hashed and primary identifiers is generated to act as a data filter for the information that can be accessible by the authorized user. K-anonymization is then applied on the personally identifiable columns in the dataset in order to generalize or suppress values in the output dataset. The result is an k-anonymized list on which analysis and mining can be performed on the authorized accessible data.

Although k-anonymization appears to be a promising adaptable approach to privacy preserving data analytics independent of the underlying processes, its feasibility in the big data contexts is possibly not further evaluated for data types and computational time. Furthermore, k-anonymization can be problematic if different datasets contain same sensitive values [220]. One approach that attempts to solve this problem is the cosine similarity computation protocol suggested in [218, 221]. The proposed approach allows for larger datasets scalability for both binary and numerical data types in a time-efficient manner. The idea is to allow data to be shared without disclosing the sensitive information to unauthorized users. This can be done via computing the scalar product between different vectors of numerical values, such as calculating the cosine of the angle between them. Having the result closer to 1 indicates that vectors are more similar to each other.

On the other hand, many of the big data applications have hierarchical structures in nature, and thus require hierarchical privacy preserving solutions. For instance, in [222], hierarchical cloud and community access control can be implemented to strengthen privacy preserving in smart homes and smart meters. The home controller, which protects household personal data, is connected to a cloud platform through a community network, which provides privacy preserving solutions to homes through data separation, aggregation and fusion. The cloud combines the access control schemes for homes and community in more complex and stronger privacy protection process.

Vii-C Summary and Insights

From Table V, it is clear that a single security solution is not sufficient to ensure a robust system against attackers. It is quite necessary to incorporate different strategies to face security flaws that stem from poor systems designs. For instance, it makes sense to combine advanced security controls with cryptography to guarantee that only legitimate users have accesses to data. However, such a solution might not work well for delay-sensitive CPS applications such as e-health systems, especially in the cases that the small sensors have limited computational capabilities. For such applications, it is more efficient to employ visualization detection along with machine learning-based anomaly detection techniques to protect users’ sensitive data.

Literature Security Solutions Advantages Disadvantages
Key
Encryption
and Digital
Signatures
[26][195][196] Secure data placement on cloud
i) ensures users’ data protection on
clouds; ii) does not require complex
signatures and keys; iii) provides data
privacy at users’ side
i) data might need to be encrypted/decrypted
before transferring to clouds;
ii) can have a long data retrieval time;
iii) not suitable for making real-time network
decisions; iv) not very robust as attackers can
guess data’s locations; v) service providers can
have access to users’ data; vi) complicated data
management and access
[195][197][199][202][223][215] Cryptographic Solutions
i) ensures information protection, data
privacy and integrity; ii) works in
conjunction with other security
solutions for more robust security
i) costly in terms of computational overhead;
ii) can have a large power consumption;
iii) increases latency;
iv) inefficient for many CPS devices with limited
computational capabilities; v) works only for
fixed malicious behaviors
Data Access
Permission
Restrictions
[50][204][224][225] Advanced security controls
i) selective access control;
ii) can be used in conjunction with
cryptography for a more robust security;
iii) good for real-time data streams;
iv) provides additional layer of protection
i) difficult to access for legitimate users;
ii) needs to be combined with other solutions;
iii) not sufficient by itself to protect users’ data;
iv) might increase latency from accessing the data
Visualization
Detection of
Anomalies
[130][208][210][211] [212][213][224][226][227][228][229][230] Anomaly Detection
i) does not require costly solutions
such as signatures and keys;
ii) works for a variety of attacks with
varying malicious behaviors;
iii) can detect abnormal traffic patterns
with no associated signature
i) requires a large dataset for training;
ii) costly in terms of overhead and storage of
training data; iii) might require human analysts’
intervention to make the detection more
textbackslash supervise
[179][231][130][209][211][232] Visualization Techniques
i) ideal for monitoring real-time traffic;
ii) provides global overview of network
problems
i) not accurate if traffic patterns are misclassified;
ii) requires complex statistical analysis of traffic;
iii) might need to be supervised by humans;
iv) might require prior knowledge for a better
intrusion detection
TABLE V: Summary of security solutions proposed for CPS

Viii Big Data Meet Green Challenges for CPS

There are two view directions for big Data meeting green challenges for CPS. Firstly, we address greening big data systems for CPS, including CPS big data collection/storage, computing and processing. Secondly, we discuss big data with CPS toward green applications. Table VI shows different sustainable applications, along with challenges and solutions for a greener CPS. Table VII provides a summary of these solutions proposed in different research papers, while Table VIII groups them into three different categories: green data management, green architectures and green software solutions.

Sustainable CPS
Applications Challenges Solutions
Environmental Monitoring
- Air/Water pollution
- Oil spills pollution
- Noise pollution
- Limited communications
between subnetworks
- Storage limitations
- Provide relay incentives to extend communication  [24, 56, 128],
[145, 233]
-Minimize the number of relay transmissions [233]
- Data reduction techniques(aggregation, compression, network
coding) [57], [58, 59], [60], [61, 62, 63],[184], [234],[235],[236],
[237], [238]
Power Management
- Smart Grids
- Green buildings
- Green infrastructure
- Renewable energy sources
- Redundant transmission
links
- Deduplication
- Redundant information in
data collection
- Remove redundant transmission links using energy-efficient
topology control algorithm and clustering [125],[126],[127],[128],[129],
[239], [240], [241],
- Employ data duplication checks [202]
- Employ compressive sensing- based data collection [242],[243]
Green Transportation
- Fuel management
- Electric vehicles
- Sustainable transport
- Transport planning
- Increased energy expenditure
in cloud data centers
Increased power consumption
of networked devices
- Large volume of exchanged data
between cloud centers
- Efficient utilization of resources [99],[184],[185]
- Integrate energy harvesting from renewable sources [244],[245]
- Sleep scheduling for unused data centers [30],[245]
- Employ energy-efficient hardware(DVFS) [29], [246], [247], [248]
Green Energy
- Energy harvesting
- Energy conservations
- Cloud data centers distant from
users
-VM placement in cloud centers
- Increased energy consumption
in case of hardware/software
failures
- Design energy-efficient cloud data centers [30]
- Activity scheduling (switch devices to sleep mode) [245]
- Power control and adjustment of networked devices [247, 248]
- Design architectures for power saving[30],[246]
- Energy-efficient routing and traffic engineering techniques[56],[80],
[249], [250],[251],[252],[253],[254], [255],[256]
- Employ advanced communication technologies (MIMO)
- Employ cloudlets[99], [100]
- Energy-efficient VM techniques (VM migration, VM placement,
VMallocation) [108],[257]
- Reduce unnecessary executions using backups [258],[259],[260],[261]
TABLE VI: Sustainable CPS: applications, challenges and solutions.

Viii-a Greening big data for CPS

Viii-A1 Green Data Collections and Communications

With a massive number of interacting objects, gathering sensed data poses a challenge in terms of energy consumption, mainly due to the limited communication range between subnetworks, necessitating that objects act as relays for surrounding objects in order to extend their communications. This affects the lifetime of objects since each object needs to relay large volume of data generated by its neighbors [233]. In this context, different solutions have been proposed for different big data applications. In [233], Takaishi et al. proposed energy efficient solutions for data collection in densely distributed sensor networks. In an attempt to reduce the number of relay transmissions needed, sensor nodes transmit their data to a data collector node, the sink node, when they become close in proximity to it. Therefore, it becomes important to figure out the trajectory that the sink node needs to follow the nodes’ information such as location and residual energy, as well as the cluster formation in order to reduce the energy consumption of data collection. Data compression technology is another solution to help deal with challenges of data storage, collection, transmission, processing, and analysis. For instance, in [262], the authors proposed a highly efficient lossy data compression based on smart meters’ load features, states and events with a small reconstruction error. ZIP-IO compression technique was proposed in [263] using FPGA as a potential implementation framework. Video compression is another important tool in big data for surveillance applications. The authors in [264] proposed a background-based coding optimization algorithm that uses the residual gradient and the block edge differences to improve picture quality while achieving a high level of compression.

To achieve higher power savings for machine-to-machine (M2M) or machine type communications (MTC), this paper [265] proposed improved approaches for M2M devices, radio access networks and core networks with simplified activities under optimized signaling flow without introducing negative impacts on legacy human-to-human (H2H) terminals. In wireless cellular networks, when mobile terminals can directly communicate each other without the aids of central stations or base stations, the communications in those scenarios are called device-to-device (D2D) communications [266], which is another term of CPS in those scenarios. The paper [267] studied energy-efficient power control for D2D communications underlaying cellular networks with multiple D2D pairs and co-channel interference caused by resource sharing. The work [268] proposed an architecture using D2D multicast for energy efficient content delivery in cellular networks. Removing redundant transmission links can also reduce energy consumption while increasing network capacity. Topology control algorithms such as local minimum spanning tree (LMST) [239] and Local Tree-based Reliable Topology (LTRT) [240] have low computational complexities and can help obtain the best logical topology which can be beneficial for energy efficient data collections. In addition to removing redundant transmission links, removing redundant data can help save energy, such as in [242], where a compressive-sensing-based collection framework was proposed for reducing data redundancy and saving energy. To implement this solution, an online learning module predicts the amount of data (principal data) that needs to be collected for compressive sensing. This means that the principal data are supposed to represent the whole big data using the compressive sensing technique. Then, each node can locally tweak the collection strategy dynamically depending on neighbors status, residual energy, and link quality. Nodes’ clustering can also contribute to energy saving via reducing the number of data collection and transmissions, such as fan-shaped clustering proposed in [241] for large-scale networks with energy efficient selection of a cluster head and relay node. Other clustering algorithms have existed in literature, such as the well-known Low-Energy Adaptive Clustering Hierarchy (LEACH) [269], which can be useful for big data wireless sensor networks. The work [270] adopted an energy-efficient architecture for Industrial Internet of Things (IIoT) with a sense entities domain, RESTful service hosted networks, a cloud server, and user applications to balance the traffic load and support a longer lifetime of the whole system. The paper [271] mainly studied two important issues: 1) performing cyber-physical systems (CPS) communications over cellular networks with ubiquitous coverage, global connectivity, reliability and security, 2) offloading a proportion of CPS traffic to small cells and freeing more network resources to other users.

Viii-A2 Greening big data computing

While cloud centers are becoming an important aspect of big data computing to process data chunks in parallel, they contribute to a high energy expenditure, leading to increased costs and maintenance [272]. Therefore, research efforts are shifting towards creating sustainable big data computing techniques. Shojafar et al. proposed a job scheduler between servers called MMGreen to reduce energy consumption of computing in cloud data centers [246]. The MMGreen architecture is composed of physical servers hosting VMs connected to a front-end component that manages the incoming workload. Scheduler jobs that use dynamic voltage and frequency scaling (DVFS) technique for energy efficient servers include static scheduler whose power consumption is independent of clock rates and usage, and sequential schedulers which attempt to minimize reconfiguration costs via performing offline resource provisioning via predicting future workload information [29]. The paper [273] proposed 1) thermal-aware and power-aware hybrid energy consumption model jointly including the information of computing, cooling, and migration energy consumption, 2) a tensor-based task allocation and frequency assignment model to reflect the relationship among different tasks, nodes, time slots, and frequencies, 3) a thermal-aware and DVFS-enabled big data task scheduling algorithm for the energy consumption reduction of data centers.

In [246], the authors described different techniques to reduce energy consumption such as DVFS to reduce VMs’ frequencies and real-time adjustment of VMs frequencies processing and switching while maintaining quality of service to users. In [247, 248], M2M power savings were optimized via reducing the execution frequency of some activities without negatively impacting the human-to-human communications.

Besides targeting the energy efficiency of cloud servers, network devices also need energy efficient solutions, since they also contribute to the total energy expenditure of the cloud data center. Traffic engineering is one solution for this problem, which takes advantage of the traffic prediction to turn off network devices such as switches during idle periods in order to reduce power consumption [249, 250, 251, 252]. Traffic engineering techniques, such as the software defined networking (SDN)-based traffic engineering [253], allow the network devices to dynamically adapt to current workload. One problem with traffic engineering techniques is that the predicted traffic pattern might not be accurate due to the variability of big data applications running in the data center. This makes the network configuration suffer from frequent oscillations, since the network configuration needs to be updated frequently leading to performance degradation [254]. To take into consideration this time-varying aggregate traffic load, one proposed solution is to include flow deadlines to measure the speed at which requests’ responses are delivered to users. This allows the design of energy-efficient scheduling and routing for data center networks [254, 255, 256]. Another solution to enhance the inaccurate traffic engineering techniques is to take into consideration the unique features of data centers such as regularity of the topology, VM assignment, application characteristics. Such a framework was proposed in [257], where the energy-saving problem was solved in two steps: i) a VM assignment algorithm that integrates application characteristics and network topology to better understand traffic patterns for energy efficient routing, and ii) an algorithm that minimizes the number of switches and balances traffic flow among them. Experimental results showed a 50% energy savings using the proposed framework.

Viii-A3 Green Processing

An energy-efficient orchestrator for smart grid applications was proposed in [30]. The green orchestrator coordinates sustainability between smart grids and big data enterprises from green infrastructure (data centers) to running green frameworks such as Hadoop MapReduce. The orchestrator’s main components are: i) a green lesser to establish a per-job service-level agreements (SLA) that takes into account the available power, the power consumption statistics of jobs, the network and server states; ii) a pre-execution analyzer that executes jobs based on their power consumption statistics; iii) a network and server states predictors; iv) a network traffic analyzer which helps eliminate redundant traffic using traffic engineering techniques; v) a VMizer that intelligently places VMs such that some nodes are put to sleep; (vi) a pizer that schedules and places processes to a subset of clusters such that system resources are efficiently utilized; and (vii) a post-execution analyzer to analyze the energy profile of completed jobs.

Another green big data processing architecture is checkpointing aided parallel execution (CAPE), which uses checkpoints that save the sates of processes to avoid restarting unnecessary executions from beginning in case of hardware or software failures [258, 259, 260]. CAPE also allows threads of a shared-memory program to be executed in parallel on a distributed memory architecture rather than a shared memory architecture. The CAPE architecture can lead to energy saving since if the execution period of processors is short, they can go to idle mode rather than staying active for the whole execution of the program. This makes it beneficial in processing big data in CPS applications [260, 261].

Another approach to green big data processing is the efficient utilization of network resources by reducing the volume of communications that need to be exchanged in cloud data center networks. For instance, Asad et al. proposed spate coding for the purpose of reducing the amount of exchanged data, but without compromising the rate of information exchange [234]. Spate coding incorporates both index coding and network coding, and uses side information originating from several processes sharing a physical node to encode packets. This coding technique was shown to reduce the volume of communications by 62%, along with other advantages such as improving the utilization of system resources from disk utilization, queue size, and the number of bits transmitted during shuffle phase of Hadoop (200%) [234]. Other approaches to reduce the burden on data centers from the information exchange include traffic flow prediction to reduce network transfers [184, 235], and redundancy elimination scheme to remove redundant information data exchange, among others [236, 237, 238].

Viii-B CPS-based Big Data toward Green Applications

CPS-based big data applications can contribute to greening different sectors, environment, economic, and social/technical issues [6, 32, 31]. As for the environment, efforts are made to reduce air/water pollution as well as the impact of climate changes. For instance, sensors can be deployed to monitor air and water qualities. Using MapReduce or Spark programming frameworks, the concentration of pollutants can be monitored and studied [274]; the air quality not covered via monitoring stations can be estimated [275], and the causalities of air pollutants can be identified using urban big data dynamics [276]. Pollution can also originate from oil spills. Predicting such catastrophes very early can help save beaches, coastlines and waters [6, 32]. Marine oil spills can be detected using a large archive of remote sensing big data [10]; and a real-time warning can be generated from a quantitative data analysis using supervised oil system [277]. Concerning water pollution, underwater sensors can be deployed to monitor water environment, such as water level, water flow, temperature, and pressure. The sensor network can be connected to a cloud platform via a wireless transceiver for analysis and visualization [278]. Noise pollution in cities is another contributing factor to environmental pollution, which can have negative impact on health, especially with the increasing number of circulating vehicles [6, 32]. Noise pollution levels can be predicted via collecting four data sources: complaint data, social media, points of interests, and road network data [279]. These data can be gathered via deploying static municipal sensors, together with participatory sensing with smart phones in order to provide more accurate noise maps [280].

As for green economics, CPS applications can be targeted for optimizing the energy use. One example is FirstFuel, which monitors temperature and lightnings inside buildings by checking the running status of the equipments, such as fans, heating and cooling units [6]. Power management can be realized using sensors monitoring the whole building, as well as using smart meters for electricity consumption measurements. A solution in energy efficient smart meters has been proposed in [281] using coalition game that maximizes the pay-off values of smart meters. An approach to predict daily electricity consumption inside buildings using data analysis was proposed in [282]. The authors used canonical variate analysis to group electricity consumption profiles into clusters in order to identify abnormal energy usage. Energy efficiency can also be employed in transportation systems to reduce fuel wastage. One example is [283], where the authors used electric vehicles (EV) battery model to estimate the driver’s behaviors and driving range to improve energy efficiency.

Social media and participatory sensing can also be used toward greener environments. For instance, in [284], the authors used social media to optimize smart grid management. In [285], participatory sensing was used to implement a navigation service called GreenGPS to allow drivers to obtain customized routes that are the most fuel-efficient.

Literature Green Solutions
Data Collection/Storage
[233]
[239], [240], [241]
[262], [263], [264], [242]
Minimizing number of relay transmissions
Removing redundant transmission links
Data compression techniques
Computing
[29], [246], [247], [248]
[56], [80], [249], [250], [251], [252], [253], [254], [255], [256]
Dynamic voltage and frequency scaling
Traffic engineering
Processing
[30]
[258], [259], [260], [261]
[57], [58, 59], [60], [61, 62, 63],[184], [234],[235], [236], [237], [238]
[99], [100]
[108], [257]
Energy-efficient orchestrators
Checkpointing aided parallel execution (CAPE)
Reducing the amount of exchanged data
Cloudlets
Virtual machine placement for energy efficiency
TABLE VII: Summary of different green solutions proposed for CPS
Green Data Management Solutions Green Architectural Solutions Green Software Solutions
Minimizing number of relay transmissions
Removing redundant transmission links
Data compression techniques
Traffic engineering
Reducing the amount of exchanged data
Dynamic voltage and frequency scaling
Traffic Engineering
Energy-efficient orchestrators
Cloudlets
Virtual machine placement
Traffic Engineering
Reducing the amount of exchanged data
Data compression techniques
Checkpointing aided parallel execution (CAPE)
TABLE VIII: Different Green Solutions Approaches

Ix CPS Big Data Applications

We now present some of the main CPS big data applications in different fields, such as energy utilization, city management, and disaster events applications, along with a public safety case study model. Fig. 6 shows different CPS applications and their big data generation. For instance, the intelligent transportation system would generate big data consisting of driver’s behavior, passenger information, vehicles’ locations, traffic signals management, accidents’ reporting, automated fare calculations, and so on. Each one of the CPS applications generate a large amount of data that need to be stored, processed and analyzed in order to improve services and applications’ performance.

Fig. 6: Big data meet CPS applications (adapted from [286, 287, 288, 289, 290, 291]).
Fig. 7: A public safety model for CPS.

Ix-a Smart Grids

Smart grids constitute an important aspect of sustainable energy utilization and are becoming more popular, especially with the advances in sensing and signal processing technologies. Automated smart decisions based on millions of data and control points play an important role in managing the energy usage patterns, understanding users’ behaviors, reducing the need to build power plants, and addressing supply fluctuations by using renewable resources [244]. The large number of embedded power generator sensors and their communications with different home sensors and appliances are expected to generate a large amount of data. These advanced sensing and control technologies used in smart grids are often limited to a small region such as a city; however they are envisioned to be deployed on a much larger scale such as the whole country. This will introduce several challenges, among which information management, processing and analysis are the main ones [292]. These big data tasks get even more complicated with the increasing number of transactions that need to be processed for millions of customers. For instance, one smart grid utility is expected to handle two million customers with 22 gigabytes of data per day [292]. That is why big data tools from cloud computing [244], mining and analytics [293, 294], performance optimization [295] and others have been dedicated for smart grids applications.

Moreover, for reliable power grid, smart grids highly depend on cyber infrastructure. This poses several challenges such as exposing the physical operations of smart grid systems to cyber security attacks [296]. Furthermore, the collection of users’ energy usage information such as the types of appliances they use, the eating/sleeping patterns, and so on, can be very beneficial in optimizing smart grids’ performance; however users’ privacy can also be affected. In [297], Yassine et al. proposed a game theoretic mechanism to balance between users’ private information and the beneficial uses of data. In [298], a big data architecture for smart grids based on random matrix theory was proposed to conduct high dimensional analysis, identify data correlations, and manage data and energy flows among utilities. The proposed architecture not only allows large-scale big data analysis but also can be used as an anomaly detection tool to detect security flaws in smart grids.

For security threats that occur in short time, a security situational awareness technique was proposed in [299], which uses fuzzy cluster analysis based on game theory and reinforcement learning to enhance security in smart grids. A big data computing architecture for smart grids was proposed in [300] with four main elements: 1) data resource where smart grids data is generated by different devices, companies and systems with complex correlated relationship among them, 2) data storage where only meaningful information is stored and processed with in stream mode or batch mode, 3) data analysis using demand-side management or load forecasting for purposes of categorizing the total demand response in a specific region, and 4) data transmission which bridges the previous elements together. Using this architecture with energy scheduling scheme based on game theory and a generic algorithm-based optimization to obtain the optimal deployment of energy storage devices for each customer, the results show a significant reduction in total costs of consumers over long term. In [301], the combined temporal encoding, delayed feedback networks (DFNs) reservoir computing (RC) implementation, and a multi-layer perceptron (MLP) were used to execute effective attack detection for smart grid networks.

Ix-B Military Applications

Big data can also be exploited to improve military experiences, services, and training. Real-time authentication of command and control messages in cyber-physical infrastructures is of high importance for military services to ensure security. In [302], the authors developed a novel broadcast authentication scheme using special digital signatures for faster signature generation and verification, and packet loss tolerance. This can be useful to efficiently and rapidly secure military communications. In [303], the authors used Markov decision process to propose an approach to identify and reduce attacks’ cost in military operations in order to protect important information through obtaining attack policies. Military satellite communications require being resilient to ensure missions’ success. This can be achieved using matrix-based protection assessment approach based on traditional risk analysis, where an attack can be assessed in terms of both ease of attack and impact of attack [304]. Mitigating the following five core threats allows the satellite communications to be free from weak vulnerabilities that can be easily exploited by attackers: waveform, RF access to enemy, foreign presence, physical access, and traffic concentration [304].

Ix-C City Management

Big data can facilitate daily activities by using smart infrastructures and services. With the increasing number of sensors deployed in in urban environment either indoor or outdoor, from smart phones, smart cards, on-board vehicle sensors and so on, the city is faced with a large amount of information that need to be exploited to detect different urban dynamic patterns [305]. For instance, traffic patterns can be analyzed and routes can be computed to allow people to reach their destinations faster. In [306], the authors proposed to deploy road sensors to obtain information on the overall traffic, such as speed and location of individual vehicles. This information is then processed using graph algorithms by taking advantage of big data tools such as Giraph, Spark and Hadoop. This helps provide real-time intelligent decisions for smart efficient transportation. Since wireless sensor networks (WSN) are the main infrastructures for smart cities to monitor and gather information from the environment, several works have focused on extending network’s lifetime. For instance in [307], the authors proposed using software-defined networking (SDN) controller to reduce WSN traffic and improve decisions making. In [243], compressive sensing and consensus filter were used to reconstruct signals from fewer sensor nodes, leading to power savings. Sleep scheduling with renewable energy resources such as solar harvesting were also suggested in [7828559.Rachad_Harvest] to prolong WSN lifetime. CPS based big data may also support localization applications[190, 191].

An IoT-based general architecture for smart cities was presented in [305], which consists of four different layers: 1) technologies layer consisting of self-configured and remotely-controlled sensors and actuators, 2) middleware layer where data from different sensors are collected to provide context information, 3) management layer where different data analytic tools are used to extract information, test hypothesis and draw conclusions, and 4) services layer consisting of services provided by smart cities based on the previous layers such as environmental monitoring, energy efficiency in buildings, intelligent transportation systems an so on.

Safety systems, such as deploying surveillance systems, are other city management big data applications. A computer vision deep learning algorithm for human activity recognition was proposed in [179]. The model is capable of recognizing twelve types of human activities with high accuracy and without the need of prior knowledge, which makes it useful for security monitoring applications. Crowd detection and surveillance is another safety system big data application. A target individual needs to be easily inferred from visual information. In [231], the authors proposed such a framework that can easily detect target location and update the motion information to improve the detection.

Ix-D Medical Applications

CPS health systems are foreseen to shape the future of tele-medicine in different areas such as cardiology, surgery, patients’ health monitoring, which will significantly enhance the healthcare system by providing timely, efficient and effective medical decisions for a myriad of health applications such as diabetes management, blood pressure and heart rhythm monitoring, elderly support and so on [308]. With 774 million connected health-related devices [309] by 2020, a large volume of data from small-scale networks, such as e-health systems or mobile-health systems, needs to be stored, processed and analyzed to enable timely intervention and better management of patients’ health.

With e-health systems becoming widely deployed in hospitals and health centers, research has been focused on efficiently deploying medical body area networks (MBANs) to reduce interference on medical bands from other devices [310]. In MBANs, biomedical sensors are placed in the vicinity of patient’s body or even inside her to sense health-related vital signals using short-range wireless technologies. The collected data are then multi-hopped to remote stations, so that medical staff can efficiently monitor patients’ physiological conditions and disease progression [310]. For instance, in [311], elderly patients’ health tracking application was proposed, where a mixed positioning algorithm allows for 24-hour monitoring of patients’ activities and transmits an alarm to medical staff through SMS, e-mail or telephone in case of an abnormal event or emergency. However, transmitting this health information in a timely and energy-efficient manner is of utmost importance for e-health systems. In [312], a micro Subscription Management System (SMS) middleware for e-health systems was presented. The SMS platform allows sensor nodes to exchange information to provide event-driven services with dynamic memory and variable payload such as GPS coordinates, Home Context and so on. The designed architecture achieves lower memory overhead, lower software components load time and lower event propagation time than other similar proposals, which are all critical requirements for energy efficiency, reliability and scalability of e-health systems. The paper [313] proposed locality sensitive hashing to learn sensor patterns for monitoring health conditions of dispersed users.

Ix-E Disaster Events Applications

Network resilience and survivability are the utmost requirements for public safety networks. In case of a disaster or emergency event, the people who are first on scene are referred to as first responders, and they include law enforcement, firefighters, medical personnel and others [314]. Some of the major public safety requirements relate to the necessity of first responders to exchange information (voice and/or data) in a timely manner [314]. The big data can be used to support disaster events, such as analyzing big data from high resolution maps, floor plans and on-field video transmissions to transmit warning messages to authorities [315]. The remote sensing big data can be analyzed using a scalable hybrid parallelism approach to reduce the analytics execution time [316]. The large amount of data collected from previous earthquakes can be used to predict the future service availability areas, which can improve preparation and response to such events [317]. A disaster domain-specific search engine can be constructed using big data to make the understanding and preparation of disaster attacks easier and faster for authorities [318].

Ix-F A Public Safety Case Study Model

Fig. 7 depicts a public safety (PS) model, where the base station is unavailable due to network failure, indoor or underground communications or cell-edge locations. This fact necessitates that users form a device-to-device (D2D) network autonomously without assistance of any network infrastructure. In this model, each D2D cluster sends its PS-related data findings such as maps, videos, pictures and so on to a command center for data analysis. This can be done through satellites or by raising in the sky a balloon with attached 4G eNodeB (AeNB) to restore temporarily connectivity (Project ABSOLUTE [319]). LTE is suggested as a technology enabler for PS communications [320]. LTE provides high rate and very low latency IP connectivity. LTE can complement existing Private Mobile Radio (PMR) networks like TETRA/TETRAPOL/Project25. In terms of end-to-end latency requirement, messages are expected to be delivered within 5-10 milliseconds (ms) with a reliability of 99.99% of packets successfully delivered within a time window [321, 322]. Moreover, the downlink peak data rates are expected to be over 50 Mbps with an uplink data rate of over 25 Mbps [323]. The command center should be able to analyze different data types and then transmit timely, useful and accurate dynamic messages to each D2D cluster head. The command center can transmit information such as assessment of the level of assistance needed, emergency notification messages, emergency service providers coordination, and so on. Each D2D cluster head in turn multicasts this information to its members for decision making. This poses a challenge: some users might be far away from the multicast transmitter or their link quality suffers, which would decrease the coverage probability of the receiver and might require several re-transmissions to send the message successfully. Obviously, there is a trade-off between throughput (or end-to-end delay) and reliability due to the random transmission errors caused by the unpredictable behavior of the wireless channel. It is evident that we need to minimize delay, while maintaining some good degree of reliability. In what follows, we briefly derive the end-to-end delay and reliability for public safety networks.

Let be the indicator variable defined as:

The set of nodes that have received successfully in the round of multicasting can be defined as:

(1)

In order to minimize the total delay, , for distributing the data to all nodes, ( is the total number of nodes), we need to select node to multicast in the round of multicasting such that:

(2)

where is the transmission rate between nodes and . Define the set of unvisited nodes as and the set of the visited nodes as Initially, we have , , and . In the round of multicasting by node , gets updated by adding to it the time of last node receiving the packet successfully as:

(3)

where , and is the minimum access time of node . The multicasting rounds continue until either i) all unvisited nodes become visited, i.e., , or ii) When exceeds a threshold , in that case the packet is dropped.

Reliability can be defined as the ability of the network to perform its required functions for a certain period of time without service interruption. A link breakage can occur due to the following reasons:

  • The receiver is outside of the transmission range of the transmitter

  • The packet is lost due to channel errors

  • The packet is lost due to total delay exceeding threshold .

Assuming users are not moving, as in [324], each D2D receiver is randomly and independently located around its corresponding transmitter with isotropic direction and Rayleigh distributed distance with probability density function (PDF):

(4)

The probability that a user is outside the transmission range of the transmitter can be calculated as .

Considering log-normal shadow fading, the signal fades: (1) deterministically with a path-loss exponent , and (2) stochastically represented by a random variable with zero mean and a variance . Therefore, the probability that a packet is not lost due to channel errors is given in [325]:

(5)

where is the minimum threshold required to deliver a packet between nodes.

Finally, we combine what we have discussed to get the link reliability as:

(6)

where .

X Big Data Challenges and Open Issues for CPS

While ongoing research is focusing on CPS enterprise development and applications, effective solutions to combat security flaws have not received the enough attention, which places question marks on the foreseen integration of CPS in critical infrastructures. This matter is made worse with CPS devices having i) computational challenges in ensuring data confidentiality and privacy protection; ii) the semi- or fully-autonomous security management [326]; and iii) the high computational costs of employing cryptography [327]. Low-complexity and lightweight ciphers such as PRESENT [328] have been developed; however research efforts should go beyond cryptography, especially that such solutions can be costly in terms of i) latency, ii) power consumption, and iii) key management complexity [329, 330]. Furthermore, different CPS applications might have different security perimeters with a multitude of interactions among devices. This complicates further the access control decisions, the trustworthiness of entities, and their authentication management. Therefore, there is a high need to define secure interoperation protocols and strategies for dynamic CPS, where devices from different applications interact together to complete specific tasks [331].

Privacy preserving is an important aspect of big data analytics and mining. Homomorphic encryption techniques, as mentioned in Section VII-B, have been suggested to perform analysis on encrypted data in cloud. However, they turned out to be impractical and inefficient in terms of computational time and overhead for large amount of datasets. Although anonymization techniques constitute a more efficient solution for privacy protection big data analytics, further research efforts are needed to determine their applicability for different big data types, and their performance evaluation and endurance in different mining techniques as outlined in Section VI-A. Moreover, future solutions should be based on the hierarchy property of many CPS big data applications [222]. This means that future research should be directed towards proposing integrated privacy preserving protocols in the modules of the CPS hierarchical structure that can perform analytical processes on encrypted variety of big data types in a time- and cost-efficient manner.

Correctness of CPS is another research area under the spotlight of attention. Due to dynamic nature of physical environment, CPS need to constantly be adapted to new situations while operating and functioning properly with little or no human supervision [332]. The use of models can allow the early detection of failures by simulating different components of complex designs to verify the integration of the whole system [333, 334]. CPS components verification to ensure they are working properly or that they meet execution time requirements is another approach to ensure correctness [335, 336, 337]. Future research shall focus on creating robust real-time anomaly detection and correctness techniques that have low overhead and implementation costs.

With CPS being employed on a large scale, data access, routing, transmissions, and processing might consume a significant amount of time, leading to real obstacles in the way of realizing ultra-reliable and low latency communications (URLLC), a key feature for emerging 5G technologies. In URLLC, stringent requirements are placed on latency, throughput, and availability. Future research should be directed towards faster data access from the clouds with fewer data re-transmissions in order to free up resources and reduce latency. Obviously, there is a trade-off between throughput (or end-to-end delay) and reliability due to transmission errors that might be caused by links failure, security policies, caching misses, scarcity of resources and so on. For instance, cross-layer designs protocols between different CPS layers (sensing, middleware, transmission, management and services layers) can be used in a similar way to the wireless communication protocols to help optimize different parameters from latency to reliability and security. An example would be the transmission layer communicating with the sensing layer about integrating data filtering aimed at prioritizing the sensed data in the case of too many re-transmissions.

For faster data collection, research efforts need to shift towards devices that do not need to preconfigure to a network with dynamic on-the-fly D2D connectivity, and without the need of controllers or infrastructure deployment. Furthermore, to extend the communication range among devices, incentivized devices with tokens will replace dedicated relays [338]. In terms of CPS computing, research directions should shift towards faster data processing via moving data processing closer to the sources and speeding-up big data handling. For a faster and efficient CPS analysis, research activities need to be further conducted on information fusion techniques, especially that information originates from heterogeneous devices with varying capabilities. Through perfecting the fusion algorithms, related information can be aggregated for analysis leading to higher quality information [339].

As mentioned in this survey, big data analytics is one of the most important aspects of CPS, as it helps to pave the way for new opportunities and services, in addition to enhancing or even optimizing systems’ performance. However, as discussed in Section III, the sensed data can originate from a variety of sources with unstructured formats. Merging these different data types into single homogeneous structures is vital to ease and speed up the analysis process so meaningful conclusions can be drawn and automated decisions can be made [340]. How to homogenize different data sources remains an open research issue for CPS. Moreover, different CPS applications might need to collaborate together to achieve a specific mission. For instance, the mobile-health application might need to collaborate with the transportation system to get an ambulance as fast as possible in the case of patient’s biomedical sensors revealing an emergency situation. In such scenarios, data analytics becomes a challenging task as it requires to be able to put the different analysis fragments from different CPS applications together to provide broader conclusions and decisions making.

Xi Conclusion

The emerging CPS technologies mutually benefit the technological advancements in big data processing and analysis. When combined with artificial intelligence, machine learning, and neuromorphic computing techniques, CPS will bring about new applications, services and opportunities, all envisioned to be automated with little or without human intervention. This will help revolutionize the “smart planet” concept, where smarter water management, smarter healthcare, smarter transportation, smarter energy, and smarter food will create a radical shift in our lives. In this survey paper, we have provided a broad overview of CPS big data collection, storage, access, caching, routing, processing, and analysis to support understanding and discovering the challenges facing CPS, the existing proposed solutions, and the open issues that are yet to be addressed. Then, we have discussed the security vulnerabilities of CPS and the different security solutions, as well as how big data meet green challenges for CPS systems to address sustainability and environmental concerns. Finally, some challenges and open issues for CPS have been addressed.

References

  • [1] R. Atat, “Enabling cyber-physical communication in 5G cellular networks: Challenges, solutions and applications,” PhD Thesis, University of Kansas, USA, 2017.
  • [2] CISCO, “Fog computing and the internet of things: extend the cloud to where the things are,” white paper, CISCO, Tech. Rep., 2015.
  • [3] J. Lee, B. Bagheri, and H.-A. Kao, “A cyber-physical systems architecture for industry 4.0-based manufacturing systems,” Manufacturing Letters, vol. 3, pp. 18–23, 2015.
  • [4] T. Muhonen, “Standardization of industrial internet and iot (iot–internet of things)–perspective on condition-based maintenance,” University of Oulu, Finland, 2015.
  • [5] C. W. Tsai, C. F. Lai, M. C. Chiang, and L. T. Yang, “Data mining for internet of things: A survey,” IEEE Communications Surveys Tutorials, vol. 16, no. 1, pp. 77–97, First 2014.
  • [6] J. Wu, S. Guo, J. Li, and D. Zeng, “Big data meet green challenges: Greening big data,” IEEE Systems Journal, vol. 10, no. 3, pp. 873–887, Sept 2016.
  • [7] J. Wu, S. Guo, H. Huang, W. Liu, and Y. Xiang, “Information and communications technologies for sustainable development goals: State-of-the-art, needs and perspectives,” IEEE Communications Surveys & Tutorials, vol. 20, no. 3, pp. 2389–2406, 2018.
  • [8] J. Wu, I. Bisio, C. Gniady, E. Hossain, M. Valla, and H. Li, “Context-aware networking and communications: Part 1 [guest editorial],” IEEE Communications Magazine, vol. 52, no. 6, pp. 14–15, June 2014.
  • [9] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, “Context aware computing for the internet of things: A survey,” IEEE Communications Surveys and Tutorials, vol. 16, no. 1, pp. 414–454, First 2014.
  • [10] M. Chi, A. Plaza, J. A. Benediktsson, Z. Sun, J. Shen, and Y. Zhu, “Big data for remote sensing: Challenges and opportunities,” Proceedings of the IEEE, vol. 104, no. 11, pp. 2207–2219, Nov 2016.
  • [11] B. Zhang, Z. Zhang, Z. Ren, J. Ma, and W. Wang, “Energy-efficient software-defined data collection by participatory sensing,” IEEE Sensors Journal, vol. 16, no. 20, pp. 7315–7324, Oct 2016.
  • [12] Y. Sun, H. Song, A. J. Jara, and R. Bie, “Internet of things and big data analytics for smart and connected communities,” IEEE Access, vol. 4, pp. 766–773, 2016.
  • [13] B. Guo, C. Chen, D. Zhang, Z. Yu, and A. Chin, “Mobile crowd sensing and computing: when participatory sensing meets participatory social media,” IEEE Communications Magazine, vol. 54, no. 2, pp. 131–137, February 2016.
  • [14] M. Chen, S. Mao, and Y. Liu, “Big data: A survey,” Mobile Networks and Applications, vol. 19, no. 2, pp. 171–209, Apr 2014.
  • [15] P. Derbekoa, S. Dolevb, E. Gudesb, and S. Sharmab, “Security and privacy aspects in mapreduce on clouds: A survey,” Computer Science Review, vol. 20, no. 1, pp. 1–28, May 2016.
  • [16] W. Fan and A. Bifet, “Mining big data: Current status, and forecast to the future,” SIGKDD Explor. Newsl., vol. 14, no. 2, pp. 1–5, Apr. 2013. [Online]. Available: http://doi.acm.org/10.1145/2481244.2481246
  • [17] H. Hu, Y. Wen, T. S. Chua, and X. Li, “Toward scalable systems for big data analytics: A technology tutorial,” IEEE Access, vol. 2, pp. 652–687, 2014.
  • [18] P. Zadrozny and R. Kodali, Big data analytics using Splunk: Deriving operational intelligence from social media, machine data, existing data warehouses, and other real-time streaming sources.   Apress, 2013.
  • [19] M. Wang, B. Li, Y. Zhao, and G. Pu, “Formalizing google file system,” in Proc. 2014 IEEE 20th Pacific Rim International Symposium on Dependable Computing, Nov 2014, pp. 190–191.
  • [20] P. Derbekoa, S. Dolevb, E. Gudesb, and S. Sharmab, “Security and privacy aspects in mapreduce on clouds,” Comput. Sci. Rev., vol. 20, no. 1, pp. 1–28, May 2016. [Online]. Available: http://dx.doi.org/10.1016/j.cosrev.2016.05.001
  • [21] L. Atzori, A. Iera, and G. Morabito, “The internet of things: A survey,” Comput. Netw., vol. 54, no. 15, pp. 2787–2805, Oct. 2010. [Online]. Available: http://dx.doi.org/10.1016/j.comnet.2010.05.010
  • [22] D. Acharjya and K. Ahmed, “A survey on big data analytics: Challenges, open research issues and tools,” International Journal of Advanced Computer Science and Applications, vol. 7, no. 2, pp. 511–518, Feb 2016.
  • [23] P. Bellavista, A. Corradi, M. Fanelli, and L. Foschini, “A survey of context data distribution for mobile ubiquitous systems,” ACM Computing Surveys, vol. 44, no. 4, pp. 24:1–24:45, Sep. 2012. [Online]. Available: http://doi.acm.org/10.1145/2333112.2333119
  • [24] X. Zhang, Z. Yang, W. Sun, Y. Liu, S. Tang, K. Xing, and X. Mao, “Incentives for mobile crowd sensing: A survey,” IEEE Communications Surveys Tutorials, vol. 18, no. 1, pp. 54–67, Firstquarter 2016.
  • [25] S. Tamboli and S. S. Patel, “A survey on innovative approach for improvement in efficiency of caching technique for big data application,” in Pervasive Computing (ICPC), 2015 International Conference on, Jan 2015, pp. 1–6.
  • [26] B. Rao and L. Reddy, “Survey on improved scheduling in Hadoop MapReduce in cloud environments,” International Journal of Computer Applications, vol. 34, no. 9, pp. 29–33, Nov 2011.
  • [27] S. Yi, Z. Qin, and Q. Li, “Security and privacy issues of fog computing: a survey,” in Wireless Algorithms, Systems, and Applications.   Springer International Publishing, 2015, pp. 685–695.
  • [28] W. Wang and Z. Lu, “Survey cyber security in the smart grid: Survey and challenges,” Comput. Netw., vol. 57, no. 5, pp. 1344–1371, Apr. 2013. [Online]. Available: http://dx.doi.org/10.1016/j.comnet.2012.12.017
  • [29] A. Beloglazov, R. Buyya, Y. C. Lee, A. Zomaya, and Others, “A taxonomy and survey of energy-efficient data centers and cloud computing systems,” Advances in Computers, vol. 82, no. 2, pp. 47–111, 2011.
  • [30] Z. Asad and M. A. R. Chaudhry, “A two-way street: Green big data processing for a greener smart grid,” IEEE Systems Journal, vol. PP, no. 99, pp. 1–11, 2016.
  • [31] C. Estevez and J. Wu, “Recent advances in green internet of things,” in Proc. 2015 7th IEEE Latin-American Conference on Communications (LATINCOM), Nov 2015, pp. 1–5.
  • [32] J. Wu, S. Guo, J. Li, and D. Zeng, “Big data meet green challenges: Big data toward green applications,” IEEE Systems Journal, vol. 10, no. 3, pp. 888–900, Sept 2016.
  • [33] J. Shi, J. Wan, H. Yan, and H. Suo, “A survey of cyber-physical systems,” in Proc. 2011 International Conference on Wireless Communications and Signal Processing (WCSP), Nov 2011, pp. 1–6.
  • [34] R. Chaari, F. Ellouze, A. Koubaa, B. Qureshi, N. Pereira, H. Youssef, and E. Tovar, “Cyber-physical systems clouds: A survey,” Computer Networks, vol. 108, pp. 260 – 278, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1389128616302699
  • [35] V. Gunes, S. Peter, T. Givargis, and F. Vahid, “A survey on concepts, applications, and challenges in cyber-physical systems,” KSII Transactions on Internet and Information Systems (TIIS), vol. 8, no. 12, pp. 4242–4268, 2014.
  • [36] A. P. Nugroho, T. Okayasu, M. Horimoto, D. Arita, T. Hoshi, H. Kurosaki, K.-i. Yasuba, E. Inoue, Y. Hirai, M. Mitsuoka et al., “Development of a field environmental monitoring node with over the air update function,” Agricultural Information Research, vol. 25, no. 3, pp. 86–95, 2016.
  • [37] D. N. Chorafas, Business, Marketing, and Management Principles for IT and Engineering.   Boca Raton, FL, USA: Auerbach Publications, June 22 2011.
  • [38] L. Sanchez, M. Bauer, J. Lanza, R. Olsen, and M. Girod-Genet, “A generic context management framework for personal networking environments,” in Proc. the 2006 Third Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services.   Los Alamitos, CA, USA: IEEE Computer Society, 2006, pp. 1–8.
  • [39] J. A. ALLAN, “A review of: Remote sensing: Principles and interpretation by FLOYD F. SABINS, JR. (San Francisco: W. H. Freeman, 1978.) [pp. 1-426.],” International Journal of Remote Sensing, vol. 1, no. 3, pp. 307–308, 1980.
  • [40] Y. Ma, H. Wu, L. Wang, B. Huang, R. Ranjan, A. Zomaya, and W. Jie, “Remote sensing big data computing,” Future Gener. Comput. Syst., vol. 51, no. C, pp. 47–60, Oct. 2015. [Online]. Available: http://dx.doi.org/10.1016/j.future.2014.10.029
  • [41] L. Zhang, L. Zhang, and B. Du, “Deep learning for remote sensing data: A technical tutorial on the state of the art,” IEEE Geoscience and Remote Sensing Magazine, vol. 4, no. 2, pp. 22–40, June 2016.
  • [42] M. M. U. Rathore, A. Paul, A. Ahmad, B. W. Chen, B. Huang, and W. Ji, “Real-time big data analytical architecture for remote sensing application,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 10, pp. 4610–4621, Oct 2015.
  • [43] L. Wang, H. Zhong, R. Ranjan, A. Zomaya, and P. Liu, “Estimating the statistical characteristics of remote sensing big data in the wavelet transform domain,” IEEE Transactions on Emerging Topics in Computing, vol. 2, no. 3, pp. 324–337, Sept 2014.
  • [44] H. Xie, X. Tong, W. Meng, D. Liang, Z. Wang, and W. Shi, “A multilevel stratified spatial sampling approach for the quality assessment of remote-sensing-derived products,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 10, pp. 4699–4713, Oct 2015.
  • [45] M. A. Alsheikh, D. Niyato, S. Lin, H. p. Tan, and Z. Han, “Mobile big data analytics using deep learning and apache spark,” IEEE Network, vol. 30, no. 3, pp. 22–29, May 2016.
  • [46] X. Zhang, Z. Yi, Z. Yan, G. Min, W. Wang, A. Elmokashfi, S. Maharjan, and Y. Zhang, “Social computing for mobile big data,” Computer, vol. 49, no. 9, pp. 86–90, Sept 2016.
  • [47] M. Tang, H. Zhu, and X. Mao, “A lightweight social computing approach to emergency management policy selection,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 8, pp. 1075–1087, Aug 2016.
  • [48] M. Parameswaran and A. B. Whinston, “Social computing: An overview,” Communications of the Association for Information Systems, vol. 19, 2007. [Online]. Available: http://aisel.aisnet.org/cais/vol19/iss1/37/
  • [49] H. Ning, H. Liu, J. Ma, L. T. Yang, and R. Huang, “Cybermatics: Cyber–physical–social–thinking hyperspace based science and technology,” Future generation computer systems, vol. 56, pp. 504–522, March 2016.
  • [50] S. Chang, H. Zhu, W. Zhang, L. Lu, and Y. Zhu, “Pure: Blind regression modeling for low quality data with participatory sensing,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 4, pp. 1199–1211, April 2016.
  • [51] R. B. Messaoud and Y. Ghamri-Doudane, “Fair qoi and energy-aware task allocation in participatory sensing,” in 2016 IEEE Wireless Communications and Networking Conference, April 2016, pp. 1–6.
  • [52] J. Wang, Y. Wang, D. Zhang, L. Wang, H. Xiong, S. Helal, Y. He, and F. Wang, “Fine-grained multi-task allocation for participatory sensing with a shared budget,” IEEE Internet of Things Journal, vol. PP, no. 99, pp. 1–1, 2016.
  • [53] L. Liu, W. Wei, D. Zhao, and H. Ma, “Urban resolution: New metric for measuring the quality of urban sensing,” IEEE Transactions on Mobile Computing, vol. 14, no. 12, pp. 2560–2575, Dec 2015.
  • [54] X. Sun, S. Hu, L. Su, T. F. Abdelzaher, P. Hui, W. Zheng, H. Liu, and J. A. Stankovic, “Participatory sensing meets opportunistic sharing: Automatic phone-to-phone communication in vehicles,” IEEE Transactions on Mobile Computing, vol. 15, no. 10, pp. 2550–2563, Oct 2016.
  • [55] C. Xiang, P. Yang, C. Tian, L. Zhang, H. Lin, F. Xiao, M. Zhang, and Y. Liu, “Carm: Crowd-sensing accurate outdoor rss maps with error-prone smartphone measurements,” IEEE Transactions on Mobile Computing, vol. 15, no. 11, pp. 2669–2681, Nov 2016.
  • [56] L. Wang, D. Zhang, Z. Yan, H. Xiong, and B. Xie, “effsense: A novel mobile crowd-sensing framework for energy-efficient and cost-effective data uploading,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 12, pp. 1549–1563, Dec 2015.
  • [57] E. Zeydan, E. Bastug, M. Bennis, M. A. Kader, I. A. Karatepe, A. S. Er, and M. Debbah, “Big data caching for networking: moving from cloud to edge,” IEEE Communications Magazine, vol. 54, no. 9, pp. 36–42, September 2016.
  • [58] M. Ji, G. Caire, and A. F. Molisch, “Wireless device-to-device caching networks: Basic principles and system performance,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 1, pp. 176–189, Jan 2016.
  • [59] K. Poularakis, G. Iosifidis, A. Argyriou, I. Koutsopoulos, and L. Tassiulas, “Caching and operator cooperation policies for layered video content delivery,” in IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, April 2016, pp. 1–9.
  • [60] H. Zhao, K. Gai, J. Li, and X. He, “Novel differential schema for high performance big data telehealth systems using pre-cache,” in High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conferen on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on, Aug 2015, pp. 1412–1417.
  • [61] L. Lundberg, H. Grahn, D. Ilie, and C. Melander, “Cache support in a high performance fault-tolerant distributed storage system for cloud and big data,” in Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International, May 2015, pp. 537–546.
  • [62] S. G. Kanbargi and S. K. S, “Cache utilization for enhancing analyzation of big-data increasing the performance of hadoop,” in 2015 International Conference on Trends in Automation, Communications and Computing Technology (I-TACT-15), vol. 01, Dec 2015, pp. 1–7.
  • [63] Y. Zhao, J. Wu, and C. Liu, “Dache: A data aware caching for big-data applications using the mapreduce framework,” Tsinghua Science and Technology, vol. 19, no. 1, pp. 39–50, Feb 2014.
  • [64] S. C. Suh, U. J. Tanik, J. N. Carbone, and A. Eroglu, Applied cyber-physical systems.   Springer, 2014.
  • [65] M. Wahler, M. Oriol, and A. Monot, “Real-time multi-core components for cyber-physical systems,” in Proc. the 18th International ACM SIGSOFT Symposium on Component-Based Software Engineering, ser. CBSE ’15.   New York, NY, USA: ACM, 2015, pp. 37–42. [Online]. Available: http://doi.acm.org/10.1145/2737166.2737176
  • [66] B. Akesson and K. Goossens, “Architectures and modeling of predictable memory controllers for improved system integration,” in Proc. 2011 Design, Automation Test in Europe, March 2011, pp. 1–6.
  • [67] A. Herkersdorf and M. Paulitsch, “Multicore enablement for embedded and cyber physical systems (dagstuhl seminar 13052),” Dagstuhl Reports, vol. 3, no. 1, 2013.
  • [68] J. Huang, S. Wang, X. Cheng, and J. Bi, “Big data routing in d2d communications with cognitive radio capability,” IEEE Wireless Communications, vol. 23, no. 4, pp. 45–51, August 2016.
  • [69] G. Mone, “Intelligent living,” Commun. ACM, vol. 57, no. 12, pp. 15–16, Nov. 2014. [Online]. Available: http://doi.acm.org/10.1145/2676393
  • [70] S. Tibken, “Samsung, smartthings and the open door to the smart home,” cnet CES, 2015.
  • [71] 3rd Generation Partnership Project, “3GPP specification: 37.868; RAN improvements for machine-type communications,” Tech. Rep., August 2011.
  • [72] K. Zheng, S. Ou, J. Alonso-Zarate, M. Dohler, F. Liu, and H. Zhu, “Challenges of massive access in highly dense LTE-advanced networks with machine-to-machine communications,” IEEE Transactions on Wireless Communications, vol. 21, no. 3, pp. 12–18, June 2014.
  • [73] A. Laya, L. Alonso, and J. Alonso-Zarate, “Is the random access channel of LTE and LTE-A suitable for M2M communications? a survey of alternatives,” IEEE Communications Surveys Tutorials, vol. 16, no. 1, pp. 4–16, First 2014.
  • [74] M. Condoluci, L. Militano, A. Orsino, J. Alonso-Zarate, and G. Araniti, “LTE-direct vs. WiFi-direct for machine-type communications over LTE-A systems,” in Proc. 2015 IEEE 26th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), Aug 2015, pp. 2298–2302.
  • [75] G. Rigazzi, N. K. Pratas, P. Popovski, and R. Fantacci, “Aggregation and trunking of M2M traffic via D2D connections,” in Proc. 2015 IEEE International Conference on Communications (ICC), June 2015, pp. 2973–2978.
  • [76] G. Rigazzi, F. Chiti, R. Fantacci, and C. Carlini, “Multi-hop D2D networking and resource management scheme for M2M communications over LTE-A systems,” in Proc. 2014 International Wireless Communications and Mobile Computing Conference (IWCMC), Aug 2014, pp. 973–978.
  • [77] S. Wu, R. Atat, N. Mastronarde, and L. Liu, “Improving the coverage and spectral efficiency of millimeter-wave cellular networks using device-to-device relays,” IEEE Transactions on Communications, vol. 66, no. 5, pp. 2251–2265, May 2018.
  • [78] N. K. Pratas and P. Popovski, “Zero-outage cellular downlink with fixed-rate D2D underlay,” IEEE Transactions on Wireless Communications, vol. 14, no. 7, pp. 3533–3543, July 2015.
  • [79] C. Zhu, H. Wang, X. Liu, L. Shu, L. T. Yang, and V. C. M. Leung, “A novel sensory data processing framework to integrate sensor networks with mobile cloud,” IEEE Systems Journal, vol. 10, no. 3, pp. 1125–1136, Sept. 2016.
  • [80] B. Luo, W. Liu, and A. Al-Anbuky, “Sustainme if you can: Sustainable transmission networking design for big data,” in 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), April 2016, pp. 265–270.
  • [81] L. W. Cheng and S. Y. Wang, “Application-aware SDN routing for big data networking,” in 2015 IEEE Global Communications Conference (GLOBECOM), Dec 2015, pp. 1–6.
  • [82] L. Cui, F. R. Yu, and Q. Yan, “When big data meets software-defined networking: Sdn for big data and big data for sdn,” IEEE Network, vol. 30, no. 1, pp. 58–65, January 2016.
  • [83] M. Bakillah, S. H. L. Liang, A. Mobasheri, and A. Zipf, “Towards an efficient routing web processing service through capturing real-time road conditions from big data,” in Proc. Computer Science and Electronic Engineering Conference (CEEC), 2013 5th, Sept 2013, pp. 152–155.
  • [84] M. H. Zaki and T. Sayed, “Automated cyclist data collection under high density conditions,” IET Intelligent Transport Systems, vol. 10, no. 5, pp. 361–369, 2016.
  • [85] C. T. Cheng, N. Ganganath, and K. Y. Fok, “Concurrent data collection trees for iot applications,” IEEE Transactions on Industrial Informatics, vol. 13, no. 2, pp. 793–799, April 2017.
  • [86] S. Chen, X. Zhu, S. Zhang, and J. Wang, “A framework for massive data transmission in a remote real-time health monitoring system,” in Proc. 18th International Conference on Automation and Computing (ICAC), Sept 2012, pp. 1–5.
  • [87] H. Chao, Y. Chen, J. Wu, and H. Zhang, “Distribution reshaping for massive access control in cellular networks,” in 2016 84-th IEEE Vehicular Technology Conference (VTC-Fall), September 2016, pp. 1–5.
  • [88] S. Jabbar, K. R. Malik, M. Ahmad, O. Aldabbas, M. Asif, S. Khalid, K. Han, and S. H. Ahmed, “A methodology of real-time data fusion for localized big data analytics,” IEEE Access, vol. 6, pp. 24 510–24 520, April 2018.
  • [89] A. Akbar, G. Kousiouris, H. Pervaiz, J. Sancho, P. Ta-Shma, F. Carrez, and K. Moessner, “Real-time probabilistic data fusion for large-scale iot applications,” IEEE Access, vol. 6, pp. 10 015–10 027, February 2018.
  • [90] T. Kraska, “Finding the needle in the big data systems haystack,” IEEE Internet Computing, vol. 17, no. 1, pp. 84–86, Jan 2013.
  • [91] D. J. DeWitt and M. Stonebraker. Mapreduce: A major step backwards. Accessed on July 10, 2017. [Online]. Available: http://databasecolumn.vertica.com/database-innovation/mapreduce-a-major-step-backwards/
  • [92] D. Wang and J. Liu, “Optimizing big data processing performance in the public cloud: opportunities and approaches,” IEEE Network, vol. 29, no. 5, pp. 31–35, September 2015.
  • [93] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “A view of cloud computing,” Commun. ACM, vol. 53, no. 4, pp. 50–58, Apr. 2010. [Online]. Available: http://doi.acm.org/10.1145/1721654.1721672
  • [94] S. Tang, B. S. Lee, and B. He, “Towards economic fairness for big data processing in pay-as-you-go cloud computing,” in Proc. 2014 IEEE 6th International Conference on Cloud Computing Technology and Science (CloudCom), Dec 2014, pp. 638–643.
  • [95] S. Blanas, J. M. Patel, V. Ercegovac, J. Rao, E. J. Shekita, and Y. Tian, “A comparison of join algorithms for log processing in mapreduce,” in Proc. the 2010 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’10.   New York, NY, USA: ACM, 2010, pp. 975–986. [Online]. Available: http://doi.acm.org/10.1145/1807167.1807273
  • [96] A. Destounis, G. S. Paschos, and I. Koutsopoulos, “Streaming big data meets backpressure in distributed network computation,” in Proc. IEEE INFOCOM 2016 - The 35th IEEE International Conference on Computer Communications, April 2016, pp. 1–9.
  • [97] C. Yang and J. Chen, “A scalable data chunk similarity based compression approach for efficient big sensing data processing on cloud,” IEEE Transactions on Knowledge and Data Engineering, vol. PP, no. 99, pp. 1–1, 2016.
  • [98] J. M. Lillo-Castellano, I. Mora-Jim茅nez, R. Santiago-Mozos, F. Chavarria-Asso, A. Cano-Gonzalez, A. Garcia-Alberola, and J. L. Rojo-Alvarez, “Symmetrical compression distance for arrhythmia discrimination in cloud-based big-data services,” IEEE Journal of Biomedical and Health Informatics, vol. 19, no. 4, pp. 1253–1263, July 2015.
  • [99] L. A. Tawalbeh, R. Mehmood, E. Benkhlifa, and H. Song, “Mobile cloud computing model and big data analysis for healthcare applications,” IEEE Access, vol. 4, pp. 6171–6180, 2016.
  • [100] M. Whaiduzzaman, A. Gani, and A. Naveed, “Pefc: Performance enhancement framework for cloudlet in mobile cloud computing,” in Proc. 2014 IEEE International Symposium on Robotics and Manufacturing Automation (ROMA), Dec 2014, pp. 224–229.
  • [101] D. Zhang, Y. Shou, and J. Xu, “The modeling of big traffic data processing based on cloud computing,” in Proc. 2016 12th World Congress on Intelligent Control and Automation (WCICA), June 2016, pp. 2394–2399.
  • [102] H. Yeo and C. H. Crawford, “Big data: Cloud computing in genomics applications,” in Big Data (Big Data), 2015 IEEE International Conference on, Oct 2015, pp. 2904–2906.
  • [103] I. Zinno, L. Mossucca, S. Elefante, C. D. Luca, V. Casola, O. Terzo, F. Casu, and R. Lanari, “Cloud computing for earth surface deformation analysis via spaceborne radar imaging: A case study,” IEEE Transactions on Cloud Computing, vol. 4, no. 1, pp. 104–118, Jan 2016.
  • [104] C. Q. Wu and H. Cao, “Optimizing the performance of big data workflows in multi-cloud environments under budget constraint,” in Proc. 2016 IEEE International Conference on Services Computing (SCC), June 2016, pp. 138–145.
  • [105] L. M. Pham, A. Tchana, D. Donsez, V. Zurczak, P. Y. Gibello, and N. de Palma, “An adaptable framework to deploy complex applications onto multi-cloud platforms,” in Proc. 2015 IEEE International Conference on Computing Communication Technologies - Research, Innovation, and Vision for the Future (RIVF), Jan 2015, pp. 169–174.
  • [106] H. Li, M. Dong, K. Ota, and M. Guo, “Pricing and repurchasing for big data processing in multi-clouds,” IEEE Transactions on Emerging Topics in Computing, vol. 4, no. 2, pp. 266–277, April 2016.
  • [107] X. Li, H. Ma, W. Yao, and X. Gui, “Data-driven and feedback-enhanced trust computing pattern for large-scale multi-cloud collaborative services,” IEEE Transactions on Services Computing, 2015.
  • [108] S. Wang, A. Zhou, C. H. Hsu, X. Xiao, and F. Yang, “Provision of data-intensive services through energy- and qos-aware virtual machine placement in national cloud data centers,” IEEE Transactions on Emerging Topics in Computing, vol. 4, no. 2, pp. 290–300, April 2016.
  • [109] J. Zhang, J. Chen, J. Luo, and A. Song, “Efficient location-aware data placement for data-intensive applications in geo-distributed scientific data centers,” Tsinghua Science and Technology, vol. 21, no. 5, pp. 471–481, Oct 2016.
  • [110] Q. Xia, Z. Xu, W. Liang, and A. Y. Zomaya, “Collaboration- and fairness-aware big data management in distributed clouds,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 7, pp. 1941–1953, July 2016.
  • [111] Y. Zhao, Z. Huang, W. Liu, J. Peng, and Q. Zhang, “A combinatorial double auction based resource allocation mechanism with multiple rounds for geo-distributed data centers,” in Proc. 2016 IEEE International Conference on Communications (ICC), May 2016, pp. 1–6.
  • [112] D. Kumar, J. C. Bezdek, M. Palaniswami, S. Rajasegarar, C. Leckie, and T. C. Havens, “A hybrid approach to clustering in big data,” IEEE Transactions on Cybernetics, vol. 46, no. 10, pp. 2372–2385, Oct 2016.
  • [113] C. K. K. Reddy, K. E. B. Chandrudu, P. R. Anisha, and G. V. S. Raju, “High performance computing cluster system and its future aspects in processing big data,” in Proc. 2015 International Conference on Computational Intelligence and Communication Networks (CICN), Dec 2015, pp. 881–885.
  • [114] Y. Zhao, L. T. Yang, and R. Zhang, “A tensor-based multiple clustering approach with its applications in automation systems,” IEEE Transactions on Industrial Informatics, vol. 14, no. 1, pp. 283–291, 2018.
  • [115] R. Shettar and B. V. Purohit, “A mapreduce framework to implement enhanced k-means algorithm,” in Proc. 2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), Oct 2015, pp. 361–363.
  • [116] J. Karimov and M. Ozbayoglu, “High quality clustering of big data and solving empty-clustering problem with an evolutionary hybrid algorithm,” in Proc. 2015 IEEE International Conference on Big Data (Big Data), Oct 2015, pp. 1473–1478.
  • [117] J. Karimov, M. Ozbayoglu, and E. Dogdu, “k-means performance improvements with centroid calculation heuristics both for serial and parallel environments,” in Proc. 2015 IEEE International Congress on Big Data, June 2015, pp. 444–451.
  • [118] Data Visualization Tips. Accessed on December 17, 2016. [Online]. Available: http://www.sthda.com/english/wiki/hierarchical-clustering-essentials-unsupervised-machine-learning
  • [119] I. K. Kabul. (2016, May 26) Understanding data mining clustering methods. Accessed on December 17, 2016. [Online]. Available: http://blogs.sas.com/content/subconsciousmusings/2016/05/26/data-mining-clustering/
  • [120] Y. Zhao, C. H. Chi, C. Ding, R. Wong, W. Zhao, and C. Wang, “Hierarchical clustering using homogeneity as similarity measure for big data analytics,” in Proc. 2015 IEEE International Conference on Services Computing (SCC), June 2015, pp. 348–354.
  • [121] T. S. Xu, H. D. Chiang, G. Y. Liu, and C. W. Tan, “Hierarchical k-means method for clustering large-scale advanced metering infrastructure data,” IEEE Transactions on Power Delivery, vol. PP, no. 99, pp. 1–1, 2015.
  • [122] A. AlShami, W. Guo, and G. Pogrebna, “Fuzzy partition technique for clustering big urban dataset,” in Proc. 2016 SAI Computing Conference (SAI), July 2016, pp. 212–216.
  • [123] J. Fu, Y. Liu, Z. Zhang, and F. Xiong, “Big data clustering based on summary statistics,” in Proc. 2015 First International Conference on Computational Intelligence Theory, Systems and Applications (CCITSA), Dec 2015, pp. 87–91.
  • [124] S. Dutta, S. Ghatak, M. Roy, S. Ghosh, and A. K. Das, “A graph based clustering technique for tweet summarization,” in Proc. 2015 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), Sept 2015, pp. 1–6.
  • [125] J. Cao and H. Li, “Energy-efficient structuralized clustering for sensor-based cyber physical systems,” in Proc. 2009 Symposia and Workshops on Ubiquitous, Autonomic and Trusted Computing, July 2009, pp. 234–239.
  • [126] Z. Huang, C. Wang, A. Nayak, and I. Stojmenovic, “Small cluster in cyber physical systems: Network topology, interdependence and cascading failures,” IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 8, pp. 2340–2351, Aug 2015.
  • [127] R. S. Bali and N. Kumar, “Secure clustering for efficient data dissemination in vehicular cyber-physical systems,” Future Generation Computer Systems, vol. 56, pp. 476 – 492, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167739X15002836
  • [128] Y. Huo, W. Dong, J. Qian, and T. Jing, “Coalition game-based secure and effective clustering communication in vehicular cyber-physical system (vcps),” Sensors, vol. 17, no. 3, p. 475, 2017.
  • [129] G. Spezzano and A. Vinci, “Pattern detection in cyber-physical systems,” Procedia Computer Science, vol. 52, pp. 1016 – 1021, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1877050915008960
  • [130] Z. Tan, A. Jamdagni, X. He, P. Nanda, R. P. Liu, and J. Hu, “Detection of denial-of-service attacks based on computer vision techniques,” IEEE Transactions on Computers, vol. 64, no. 9, pp. 2519–2533, Sept 2015.
  • [131] J. S. Lee and I. Y. Ko, “Service recommendation for user groups in internet of things environments using member organization-based group similarity measures,” in Proc. 2016 IEEE International Conference on Web Services (ICWS), June 2016, pp. 276–283.
  • [132] S. Chen, X. Yang, and Y. Tian, “Discriminative hierarchical k-means tree for large-scale image classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 9, pp. 2200–2205, Sept 2015.
  • [133] H. R. Arkian, R. E. Atani, and S. Kamali, “Fcvca: A fuzzy clustering-based vehicular cloud architecture,” in Proc. 2014 7th International Workshop on Communication Technologies for Vehicles (Nets4Cars-Fall), Oct 2014, pp. 24–28.
  • [134] R. Poorvadevi and S. Rajalakshmi, “A cluster based signature evaluation mechanism for protecting the user data in cloud environment through fuzzy ordering approach,” in Proc. 2015 International Conference on Computing and Communications Technologies (ICCCT), Feb 2015, pp. 392–397.
  • [135] J. C. C. Tseng, J. Y. Gu, P. F. Wang, C. Y. Chen, C. F. Li, and V. S. Tseng, “A scalable complex event analytical system with incremental episode mining over data streams,” in Proc. 2016 IEEE Congress on Evolutionary Computation (CEC), July 2016, pp. 648–655.
  • [136] L. Yang, K. Wang, C. Xu, C. Zhu, and Y. Sun, “An incremental learning classification algorithm based on forgetting factor for ehealth networks,” in Proc. 2016 IEEE International Conference on Communications (ICC), May 2016, pp. 1–6.
  • [137] D. Fu, T. Zhou, ZeyuZheng, Y. Fu, and Y. Tong, “A modified-distance-based minimum spanning tree method for analyzing hierarchical structure of power generation system,” in 2016 12th World Congress on Intelligent Control and Automation (WCICA), June 2016, pp. 422–425.
  • [138] S. Li, G. Oikonomou, T. Tryfonas, T. M. Chen, and L. D. Xu, “A distributed consensus algorithm for decision making in service-oriented internet of things,” IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1461–1468, May 2014.
  • [139] E. Al-sharoa and S. Aviyente, “Evolutionary spectral graph clustering through subspace distance measure,” in Proc. 2016 IEEE Statistical Signal Processing Workshop (SSP), June 2016, pp. 1–5.
  • [140] Y. S. Kang, I. H. Park, J. Rhee, and Y. H. Lee, “Mongodb-based repository design for iot-generated rfid/sensor big data,” IEEE Sensors Journal, vol. 16, no. 2, pp. 485–497, Jan 2016.
  • [141] R. Cattell, “Scalable sql and nosql data stores,” SIGMOD Rec., vol. 39, no. 4, pp. 12–27, May 2011. [Online]. Available: http://doi.acm.org/10.1145/1978915.1978919
  • [142] J. M. Patel, “Operational nosql systems: What’s new and what’s next?” Computer, vol. 49, no. 4, pp. 23–30, Apr 2016.
  • [143] C. Strauch, U.-L. S. Sites, and W. Kriha, “NoSQL databases,” URL: http://www. christof-strauch. de/nosqldbs. pdf (写邪褌邪 芯斜褉邪褖械薪懈褟 07.11. 2012), 2011.
  • [144] A. Mohan, M. Ebrahimi, S. Lu, and A. Kotov, “A nosql data model for scalable big data workflow execution,” in Proc. 2016 IEEE International Congress on Big Data (BigData Congress), June 2016, pp. 52–59.
  • [145] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog computing and its role in the internet of things,” in Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, ser. MCC ’12.   New York, NY, USA: ACM, 2012, pp. 13–16. [Online]. Available: http://doi.acm.org/10.1145/2342509.2342513
  • [146] I. Stojmenovic and S. Wen, “The fog computing paradigm: Scenarios and security issues,” in Proc. the 2014 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. P. M. Ganzha, L. Maciaszek, Ed., vol. 2.   IEEE, 2014, pp. 1–8. [Online]. Available: http://dx.doi.org/10.15439/2014F503
  • [147] . S. Roy, R. Bose, and D. Sarddar, “A fog-based dss model for driving rule violation monitoring framework on the internet of things,” Journal of Advanced Science and Technology, vol. 82, p. 23鈥?2, 2015.
  • [148] N. Chen, Y. Chen, Y. You, H. Ling, P. Liang, and R. Zimmermann, “Dynamic urban surveillance video stream processing using fog computing,” in Proc. 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), April 2016, pp. 105–112.
  • [149] Y. Cao, S. Chen, P. Hou, and D. Brown, “Fast: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation,” in Proc. 2015 IEEE International Conference on Networking, Architecture and Storage (NAS), Aug 2015, pp. 2–11.
  • [150] R. Craciunescu, A. Mihovska, M. Mihaylov, S. Kyriazakos, R. Prasad, and S. Halunga, “Implementation of fog computing for reliable e-health applications,” in Proc. 2015 49th Asilomar Conference on Signals, Systems and Computers, Nov 2015, pp. 459–463.
  • [151] R. Brzoza-Woch, M. Konieczny, B. Kwolek, P. Nawrocki, T. Szyd艂o, and K. Zieli艅ski, “Holistic approach to urgent computing for flood decision support,” Procedia Computer Science, vol. 51, pp. 2387 – 2396, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1877050915012223
  • [152] J. K. Zao, T. T. Gan, C. K. You, S. J. R. M茅ndez, C. E. Chung, Y. T. Wang, T. Mullen, and T. P. Jung, “Augmented brain computer interaction based on fog computing and linked data,” in Proc. 2014 International Conference on Intelligent Environments, June 2014, pp. 374–377.
  • [153] T. N. Gia, M. Jiang, A. M. Rahmani, T. Westerlund, P. Liljeberg, and H. Tenhunen, “Fog computing in healthcare internet of things: A case study on ecg feature extraction,” in 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, Oct 2015, pp. 356–363.
  • [154] K. Hong, D. Lillethun, U. Ramachandran, B. Ottenwälder, and B. Koldehofe, “Mobile fog: A programming model for large-scale applications on the internet of things,” in Proceedings of the Second ACM SIGCOMM Workshop on Mobile Cloud Computing, ser. MCC ’13.   New York, NY, USA: ACM, 2013, pp. 15–20. [Online]. Available: http://doi.acm.org/10.1145/2491266.2491270
  • [155] B. Ottenwälder, B. Koldehofe, K. Rothermel, and U. Ramachandran, “Migcep: Operator migration for mobility driven distributed complex event processing,” in Proceedings of the 7th ACM International Conference on Distributed Event-based Systems, ser. DEBS ’13.   New York, NY, USA: ACM, 2013, pp. 183–194. [Online]. Available: http://doi.acm.org/10.1145/2488222.2488265
  • [156] S. Yi, Z. Hao, Z. Qin, and Q. Li, “Fog computing: Platform and applications,” in 2015 Third IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb), Nov 2015, pp. 73–78.
  • [157] R. Lu, X. Li, X. Liang, X. Shen, and X. Lin, “Grs: The green, reliability, and security of emerging machine to machine communications,” IEEE Communications Magazine, vol. 49, no. 4, pp. 28–35, April 2011.
  • [158] L. Zhang, W. Jia, S. Wen, and D. Yao, “A man-in-the-middle attack on 3g-wlan interworking,” in 2010 International Conference on Communications and Mobile Computing, vol. 1, April 2010, pp. 121–125.
  • [159] H. J. Watson, “Tutorial: Big data analytics: Concepts, technologies, and applications,” Communication of the Association for Information Systems, vol. 34, Article 65, pp. 124––168, 2014.
  • [160] F. Qu, F. Y. Wang, and L. Yang, “Intelligent transportation spaces: vehicles, traffic, communications, and beyond,” IEEE Communications Magazine, vol. 48, no. 11, pp. 136–142, November 2010.
  • [161] L. Xu, J. Li, Y. Shu, and J. Peng, “Sar image denoising via clustering-based principal component analysis,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 11, pp. 6858–6869, Nov 2014.
  • [162] T. C. Chen, S. Sanga, T. Y. Chou, V. Cristini, and M. E. Edgerton, “Neural network with k-means clustering via pca for gene expression profile analysis,” in Proc. 2009 WRI World Congress on Computer Science and Information Engineering, vol. 3, March 2009, pp. 670–673.
  • [163] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “Advances in knowledge discovery and data mining,” U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds.   Menlo Park, CA, USA: American Association for Artificial Intelligence, 1996, ch. From Data Mining to Knowledge Discovery: An Overview, pp. 1–34. [Online]. Available: http://dl.acm.org/citation.cfm?id=257938.257942
  • [164] M. Behl and R. Mangharam, “Interactive analytics for smart cities infrastructures,” in Proc. 2016 1st International Workshop on Science of Smart City Operations and Platforms Engineering (SCOPE) in partnership with Global City Teams Challenge (GCTC) (SCOPE - GCTC), April 2016, pp. 1–6.
  • [165] V. Kardeby, “Automatic sensor clustering: connectivity for the internet of things,” in Licentiate thesis, Mid Sweden University, Department of Information Technology and Media, 2011.
  • [166] P. Rashidi, D. J. Cook, L. B. Holder, and M. Schmitter-Edgecombe, “Discovering activities to recognize and track in a smart environment,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 527–539, April 2011.
  • [167] D. Zhang, D. Zhang, H. Xiong, L. T. Yang, and V. Gauthier, “Nextcell: predicting location using social interplay from cell phone traces,” IEEE Transactions on Computers, vol. 64, no. 2, pp. 452–463, 2015.
  • [168] A. Sotsenko, M. Jansen, M. Milrad, and J. Rana, “Using a rich context model for real-time big data analytics in twitter,” in Proc. 2016 IEEE 4th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW), Aug 2016, pp. 228–233.
  • [169] I. Toure and A. Gangopadhyay, “Real time big data analytics for predicting terrorist incidents,” in Proc. 2016 IEEE Symposium on Technologies for Homeland Security (HST), May 2016, pp. 1–6.
  • [170] V.-D. Ta, C.-M. Liu, and G. W. Nkabinde, “Big data stream computing in healthcare real-time analytics,” in Proc. 2016 IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), July 2016, pp. 37–42.
  • [171] A. Daniel, A. Paul, and A. Ahmad, “Near real-time big data analysis on vehicular networks,” in Proc. 2015 International Conference on Soft-Computing and Networks Security (ICSNS), Feb 2015, pp. 1–7.
  • [172] K. Wang, J. Mi, C. Xu, L. Shu, and D.-J. Deng, “Real-time big data analytics for multimedia transmission and storage,” in Proc. 2016 IEEE/CIC International Conference on Communications in China (ICCC), July 2016, pp. 1–6.
  • [173] B. Zhou, J. Li, S. Guo, J. Wu, Y. Hu, and L. Zhu, “Online internet traffic measurement and monitoring using spark streaming,” in Proc. IEEE Global Communications Conference (GLOBECOM), Dec. 2017, pp. 1–6.
  • [174] H. Cao, M. Wachowicz, and S. Cha, “Developing an edge computing platform for real-time descriptive analytics,” in Proc. 2017 IEEE International Conference on Big Data (Big Data), Dec. 2017, pp. 4546–4554.
  • [175] S. Trinks and C. Felden, “Real time analytics — state of the art: Potentials and limitations in the smart factory,” in Proc. 2017 IEEE International Conference on Big Data (Big Data), Dec. 2017, pp. 4843–4845.
  • [176] A. R. Ali, “Real-time big data warehousing and analysis framework,” in Proc. 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA), March 2018, pp. 43–49.
  • [177] S. Liu, J. Yin, X. Wang, W. Cui, K. Cao, and J. Pei, “Online visual analytics of text streams,” IEEE Transactions on Visualization and Computer Graphics, vol. 22, no. 11, pp. 2451–2466, Nov 2016.
  • [178] P. Chopade, J. Zhan, K. Roy, and K. Flurchick, “Real-time large-scale big data networks analytics and visualization architecture,” in Proc. Emerging Technologies for a Smarter World (CEWIT), 2015 12th International Conference Expo on, Oct 2015, pp. 1–6.
  • [179] L. Mo, F. Li, Y. Zhu, and A. Huang, “Human physical activity recognition based on computer vision with deep learning model,” in Proc. 2016 IEEE International Instrumentation and Measurement Technology Conference Proceedings, May 2016, pp. 1–6.
  • [180] S. Singh and Y. Liu, “A cloud service architecture for analyzing big monitoring data,” Tsinghua Science and Technology, vol. 21, no. 1, pp. 55–70, Feb 2016.
  • [181] Microsoft Download Center, “Project daytona: Iterative mapreduce on windows azure,” 2016. [Online]. Available: https://www.microsoft.com/en-us/download/details.aspx?id=52431
  • [182] Y. Yetis, R. G. Sara, B. A. Erol, H. Kaplan, A. Akuzum, and M. Jamshidi, “Application of big data analytics via cloud computing,” in Proc. 2016 World Automation Congress (WAC), July 2016, pp. 1–5.
  • [183] F. J. Clemente-Castellè´¸, B. Nicolae, K. Katrinis, M. M. Rafique, R. Mayo, J. C. Fernè°©ndez, and D. Loreti, “Enabling big data analytics in the hybrid cloud using iterative mapreduce,” in Proc. 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC), Dec 2015, pp. 290–299.
  • [184] M. V. Neves, C. A. F. D. Rose, K. Katrinis, and H. Franke, “Pythia: Faster big data in motion through predictive software-defined network optimization at runtime,” in Proc. Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, May 2014, pp. 82–90.
  • [185] S. Islam, J. Keung, K. Lee, and A. Liu, “Empirical prediction models for adaptive resource provisioning in the cloud,” Future Gener. Comput. Syst., vol. 28, no. 1, pp. 155–162, Jan. 2012. [Online]. Available: http://dx.doi.org/10.1016/j.future.2011.05.027
  • [186] R. Buyya, K. Ramamohanarao, C. Leckie, R. N. Calheiros, A. V. Dastjerdi, and S. Versteeg, “Big data analytics-enhanced cloud computing: Challenges, architectural elements, and future directions,” in Proc. 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS), Dec 2015, pp. 75–84.
  • [187] Y. Zhong, J. Fang, and X. Zhao, “VegaIndexer: A distributed composite index scheme for big spatio-temporal sensor data on cloud,” in Proc. 2013 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 2013, pp. 1713–1716.
  • [188] D. Zheng, K. Ben, and H. Yuan, “Research of big data space-time analytics for clouding based contexts-aware IOV applications,” in Proc. 2014 Second International Conference on Advanced Cloud and Big Data, Nov. 2014, pp. 150–156.
  • [189] M. Sinda and Q. Liao, “Spatial-temporal anomaly detection using security visual analytics via entropy graph and eigen matrix,” in Proc. 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech), Nov. 2017, pp. 511–518.
  • [190] G. Ding, Z. Tan, J. Wu, and J. Zhang, “Efficient indoor fingerprinting localization technique using regional propagation model,” IEICE Transactions on Communications, vol. E97-B, no. 8, pp. 1728–1741, Aug. 2014.
  • [191] G. Ding, Z. Tan, J. Wu, J. Zeng, and L. Zhang, “Indoor fingerprinting localization and tracking system using particle swarm optimization and kalman filter,” IEICE Transactions on Communications, vol. E98-B, no. 3, pp. 502–514, Mar. 2015.
  • [192] K. Singh and R. Kaur, “Hadoop: Addressing challenges of big data,” in Proc. 2014 IEEE International Advance Computing Conference (IACC), Feb 2014, pp. 686–689.
  • [193] I. A. T. Hashem, I. Yaqoob, N. B. Anuar, S. Mokhtar, A. Gani, and S. Ullah Khan, “The rise of Big Data on cloud computing,” Inf. Syst., vol. 47, no. C, pp. 98–115, Jan. 2015. [Online]. Available: http://dx.doi.org/10.1016/j.is.2014.07.006
  • [194] R. Atat, L. Liu, H. Chen, J. Wu, H. Li, and Y. Yi, “Enabling cyber-physical communication in 5G cellular networks: challenges, spatial spectrum sensing, and cyber-security,” IET Cyber-Physical Systems: Theory Applications, vol. 2, no. 1, pp. 49–54, Apr. 2017.
  • [195] K. Gai, M. Qiu, and H. Zhao, “Security-aware efficient mass distributed storage approach for cloud systems in big data,” in Proc. 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), April 2016, pp. 140–145.
  • [196] S. Kang, B. Veeravalli, and K. M. M. Aung, “A security-aware data placement mechanism for big data cloud storage systems,” in Proc. 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), April 2016, pp. 327–332.
  • [197] K. Sekar and M. Padmavathamma, “Comparative study of encryption algorithm over big data in cloud systems,” in Proc. 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), March 2016, pp. 1571–1574.
  • [198] C. J. D’Orazio, K.-K. R. Choo, and L. T. Yang, “Data exfiltration from internet of things devices: iOS devices as case studies,” IEEE Internet of Things Journal, vol. 4, no. 2, pp. 524–535, April 2017.
  • [199] J. Ni, X. Lin, K. Zhang, Y. Yu, and X. S. Shen, “Secure outsourced data transfer with integrity verification in cloud storage,” in 2016 IEEE/CIC International Conference on Communications in China (ICCC), July 2016, pp. 1–6.
  • [200] opendedup. (2016) Accessed on July 1, 2017. [Online]. Available: http://opendedup.org/
  • [201] D. T. Meyer and W. J. Bolosky, “A study of practical deduplication,” in the 9th USENIX conference on File and stroage technologies, Feb. 2011, pp. 14:1–14:20.
  • [202] Z. Yan, W. Ding, X. Yu, H. Zhu, and R. H. Deng, “Deduplication on encrypted big data in cloud,” IEEE Transactions on Big Data, vol. 2, no. 2, pp. 138–150, June 2016.
  • [203] Y. Gahi, M. Guennoun, and H. T. Mouftah, “Big data analytics: Security and privacy challenges,” in 2016 IEEE Symposium on Computers and Communication (ISCC), June 2016, pp. 952–957.
  • [204] S. Rao, S. N. Suma, and M. Sunitha, “Security solutions for big data analytics in healthcare,” in Advances in Computing and Communication Engineering (ICACCE), 2015 Second International Conference on, May 2015, pp. 510–514.
  • [205] T. Mahmood and U. Afzal, “Security analytics: Big data analytics for cybersecurity: A review of trends, techniques and tools,” in Information Assurance (NCIA), 2013 2nd National Conference on, Dec 2013, pp. 129–134.
  • [206] Y. He, F. R. Yu, N. Zhao, H. Yin, H. Yao, and R. C. Qiu, “Big data analytics in mobile cellular networks,” IEEE Access, vol. 4, pp. 1985–1996, 2016.
  • [207] Y. Zhao, L. T. Yang, and J. Sun, “A secure high-order CFS algorithm on clouds for industrial internet of things,” IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3766–3774, 2018.
  • [208] D. He, S. Chan, Y. Zhang, C. Wu, and B. Wang, “How effective are the prevailing attack-defense models for cybersecurity anyway?” IEEE Intelligent Systems, vol. 29, no. 5, pp. 14–21, Sept 2014.
  • [209] B. Ristic, “Detecting anomalies from a multitarget tracking output,” IEEE Transactions on Aerospace and Electronic Systems, vol. 50, no. 1, pp. 798–803, January 2014.
  • [210] E. Rocha, P. Salvador, and A. Nogueira, “Detection of illicit network activities based on multivariate gaussian fitting of multi-scale traffic characteristics,” in 2011 IEEE International Conference on Communications (ICC), June 2011, pp. 1–6.
  • [211] L. Jia, M. Li, P. Zhang, Y. Wu, and H. Zhu, “Sar image change detection based on multiple kernel k-means clustering with local-neighborhood information,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 6, pp. 856–860, June 2016.
  • [212] J. Murphree, “Machine learning anomaly detection in large systems,” in 2016 IEEE AUTOTESTCON, Sept 2016, pp. 1–9.
  • [213] Y. Yuan and K. Jia, “A distributed anomaly detection method of operation energy consumption using smart meter data,” in 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP), Sept 2015, pp. 310–313.
  • [214] T. Plantard, W. Susilo, and Z. Zhang, “Fully homomorphic encryption using hidden ideal lattice,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 12, pp. 2127–2137, Dec 2013.
  • [215] H. Kumarage, I. Khalil, A. Alabdulatif, Z. Tari, and X. Yi, “Secure data analytics for cloud-integrated internet of things applications,” IEEE Cloud Computing, vol. 3, no. 2, pp. 46–56, Mar 2016.
  • [216] Y. Li, Z. L. Jiang, X. Wang, and S. M. Yiu, “Privacy-preserving id3 data mining over encrypted data in outsourced environments with multiple keys,” in Proc. 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), vol. 1, July 2017, pp. 548–555.
  • [217] S. Qiu, B. Wang, M. Li, J. Liu, and Y. Shi, “Toward practical privacy-preserving frequent itemset mining on encrypted cloud data,” IEEE Transactions on Cloud Computing, 2017.
  • [218] R. Lu, H. Zhu, X. Liu, J. K. Liu, and J. Shao, “Toward efficient and privacy-preserving computing in big data era,” IEEE Network, vol. 28, no. 4, pp. 46–50, July 2014.
  • [219] A. Chakravorty, T. Wlodarczyk, and C. Rong, “Privacy preserving data analytics for smart homes,” in Proc. 2013 IEEE Security and Privacy Workshops, May 2013, pp. 23–27.
  • [220] T. M. Truta and B. Vinay, “Privacy protection: p-sensitive k-anonymity property,” in Proc. 22nd International Conference on Data Engineering Workshops (ICDEW’06), 2006, pp. 94–94.
  • [221] Z. Gheid and Y. Challal, “An efficient and privacy-preserving similarity evaluation for big data analytics,” in Proc. 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC), Dec 2015, pp. 281–289.
  • [222] Y. T. Lee, W. H. Hsiao, Y. S. Lin, and S. C. T. Chou, “Privacy-preserving data analytics in cloud-based smart home with community hierarchy,” IEEE Transactions on Consumer Electronics, vol. 63, no. 2, pp. 200–207, May 2017.
  • [223] S. Chen, M. Ma, and Z. Luo, “An authentication framework for multi-domain machine-to-machine communication in cyber-physical systems,” in Proc. 2015 IEEE Globecom Workshops (GC Wkshps), Dec 2015, pp. 1–6.
  • [224] N. Tamani and Y. Ghamri-Doudane, “Towards a user privacy preservation system for iot environments: a habit-based approach,” in Proc. 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), July 2016, pp. 2425–2432.
  • [225] V. Sivaraman, H. H. Gharakheili, A. Vishwanath, R. Boreli, and O. Mehani, “Network-level security and privacy control for smart-home iot devices,” in Proc. 2015 IEEE 11th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), Oct 2015, pp. 163–167.
  • [226] C. Liu, S. Ghosal, Z. Jiang, and S. Sarkar, “An unsupervised spatiotemporal graphical modeling approach to anomaly detection in distributed cps,” in Proc. 2016 ACM/IEEE 7th International Conference on Cyber-Physical Systems (ICCPS), April 2016, pp. 1–10.
  • [227] P. Y. Chen, S. Yang, and J. A. McCann, “Distributed real-time anomaly detection in networked industrial sensing systems,” IEEE Transactions on Industrial Electronics, vol. 62, no. 6, pp. 3832–3842, June 2015.
  • [228] Y. Kwon, H. K. Kim, Y. H. Lim, and J. I. Lim, “A behavior-based intrusion detection technique for smart grid infrastructure,” in Proc. 2015 IEEE Eindhoven PowerTech, June 2015, pp. 1–6.
  • [229] H. H. Pajouh, R. Javidan, R. Khayami, D. Ali, and K. K. R. Choo, “A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in iot backbone networks,” IEEE Transactions on Emerging Topics in Computing, vol. PP, no. 99, pp. 1–1, 2016.
  • [230] H. Sedjelmaci, S. M. Senouci, and M. Al-Bahri, “A lightweight anomaly detection technique for low-resource iot devices: A game-theoretic methodology,” in Proc. 2016 IEEE International Conference on Communications (ICC), May 2016, pp. 1–6.
  • [231] M. S. Zitouni, J. Dias, M. Al-Mualla, and H. Bhaskar, “Hierarchical crowd detection and representation for big data analytics in visual surveillance,” in Proc. 2015 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Oct 2015, pp. 1827–1832.
  • [232] P. Sarigiannidis, E. Karapistoli, and A. A. Economides, “Visiot: A threat visualisation tool for iot systems security,” in Proc. 2015 IEEE International Conference on Communication Workshop (ICCW), June 2015, pp. 2633–2638.
  • [233] D. Takaishi, H. Nishiyama, N. Kato, and R. Miura, “Toward energy efficient big data gathering in densely distributed sensor networks,” IEEE Transactions on Emerging Topics in Computing, vol. 2, no. 3, pp. 388–397, Sept 2014.
  • [234] Z. Asad, M. A. R. Chaudhry, and D. Malone, “Greener data exchange in the cloud: A coding-based optimization for big data processing,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 5, pp. 1360–1377, May 2016.
  • [235] A. Das, C. Lumezanu, Y. Zhang, V. Singh, G. Jiang, and C. Yu, “Transparent and flexible network management for big data processing in the cloud,” in Proc. the 5th USENIX Workshop on Hot Topics in Cloud Computing.   Berkeley, CA: USENIX, 2013. [Online]. Available: https://www.usenix.org/conference/hotcloud13/workshop-program/presentations/Das
  • [236] D. Perino, M. Varvello, and K. P. N. Puttaswamy, “Icn-re: Redundancy elimination for information-centric networking,” in Proc. the Second Edition of the ICN Workshop on Information-centric Networking, ser. ICN ’12.   New York, NY, USA: ACM, 2012, pp. 91–96. [Online]. Available: http://doi.acm.org/10.1145/2342488.2342508
  • [237] C. Yan, Y. Song, J. Wang, and W. Guo, “Eliminating the redundancy in mapreduce-based entity resolution,” in Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on, May 2015, pp. 1233–1236.
  • [238] K. Lee, D. Kim, and I. Shin, “Reboost: Improving throughput in wireless networks using redundancy elimination,” IEEE Communications Letters, vol. PP, no. 99, pp. 1–1, 2016.
  • [239] N. Li, J. C. Hou, and L. Sha, “Design and analysis of an mst-based topology control algorithm,” IEEE Transactions on Wireless Communications, vol. 4, no. 3, pp. 1195–1206, May 2005.
  • [240] K. Miyao, H. Nakayama, N. Ansari, and N. Kato, “Ltrt: An efficient and reliable topology control algorithm for ad-hoc networks,” IEEE Transactions on Wireless Communications, vol. 8, no. 12, pp. 6050–6058, December 2009.
  • [241] H. Lin, L. Wang, and R. Kong, “Energy efficient clustering protocol for large-scale sensor networks,” IEEE Sensors Journal, vol. 15, no. 12, pp. 7150–7160, Dec 2015.
  • [242] L. Kong, D. Zhang, Z. He, Q. Xiang, J. Wan, and M. Tao, “Embracing big data with compressive sensing: a green approach in industrial wireless networks,” IEEE Communications Magazine, vol. 54, no. 10, pp. 53–59, October 2016.
  • [243] Z. Zhao, J. Feng, and B. Peng, “A green distributed signal reconstruction algorithm in wireless sensor networks,” IEEE Access, vol. 4, pp. 5908–5917, 2016.
  • [244] Y. Simmhan, S. Aman, A. Kumbhare, R. Liu, S. Stevens, Q. Zhou, and V. Prasanna, “Cloud-based software platform for big data analytics in smart grids,” Computing in Science Engineering, vol. 15, no. 4, pp. 38–47, July 2013.
  • [245] K. Li, L. Shu, M. Mukherjee, D. Wang, and L. Hu, “Prolonging network lifetime with sleep scheduling for solar harvesting industrial wsns,” in Proc. 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Dec 2016, pp. 1532–1533.
  • [246] M. Shojafar, C. Canali, R. Lancellotti, and J. Abawajy, “Adaptive computing-plus-communication optimization framework for multimedia processing in cloud systems,” IEEE Transactions on Cloud Computing, vol. PP, no. 99, pp. 1–1, 2016.
  • [247] H. Chao and J. Wu, “15 - optimizing power saving in cellular networks for machine-to-machine (m2m) communications,” in Machine-to-machine (M2M) Communications, C. Antè´¸n-Haro and M. Dohler, Eds.   Oxford: Woodhead Publishing, 2015, pp. 269 – 290. [Online]. Available: http://www.sciencedirect.com/science/article/pii/B9781782421023000150
  • [248] H. Chao, Y. Chen, and J. Wu, “Power saving for machine to machine communications in cellular networks,” in Proc. 2011 IEEE GLOBECOM Workshops (GC Wkshps), Dec 2011, pp. 389–393.
  • [249] P. Mahadevan, P. Sharma, S. Banerjee, and P. Ranganathan, “A power benchmarking framework for network devices,” in Proc. the 8th International IFIP-TC 6 Networking Conference, ser. NETWORKING ’09.   Berlin, Heidelberg: Springer-Verlag, 2009, pp. 795–808.
  • [250] X. Wang, Y. Yao, X. Wang, K. Lu, and Q. Cao, “Carpo: Correlation-aware power optimization in data center networks,” in Proc. 2012 Proceedings IEEE INFOCOM, March 2012, pp. 1125–1133.
  • [251] B. Heller, S. Seetharaman, P. Mahadevan, Y. Yiakoumis, P. Sharma, S. Banerjee, and N. McKeown, “Elastictree: Saving energy in data center networks,” in Proc. the 7th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’10.   Berkeley, CA, USA: USENIX Association, 2010, pp. 17–17. [Online]. Available: http://dl.acm.org/citation.cfm?id=1855711.1855728
  • [252] Y. Zhang and N. Ansari, “Hero: Hierarchical energy optimization for data center networks,” IEEE Systems Journal, vol. 9, no. 2, pp. 406–415, June 2015.
  • [253] Y. Han, S. s. Seo, J. Li, J. Hyun, J. H. Yoo, and J. W. K. Hong, “Software defined networking-based traffic engineering for data center networks,” in Proc. 2014 16th Asia-Pacific Network Operations and Management Symposium (APNOMS), Sept 2014, pp. 1–6.
  • [254] L. Wang, F. Zhang, K. Zheng, A. V. Vasilakos, S. Ren, and Z. Liu, “Energy-efficient flow scheduling and routing with hard deadlines in data center networks,” in Proc. 2014 IEEE 34th International Conference on Distributed Computing Systems (ICDCS), June 2014, pp. 248–257.
  • [255] H. Zhang, K. Chen, W. Bai, D. Han, C. Tian, H. Wang, H. Guan, and M. Zhang, “Guaranteeing deadlines for inter-data center transfers,” IEEE/ACM Transactions on Networking, vol. PP, no. 99, pp. 1–17, 2016.
  • [256] J. Zhang, K. Li, D. Guo, H. Qi, W. Li, and Y. Jin, “Mdfs: Deadline-driven flow scheduling scheme in multi-resource environments,” IEEE Transactions on Multi-Scale Computing Systems, vol. 1, no. 4, pp. 207–219, Oct 2015.
  • [257] L. Wang, F. Zhang, J. A. Aroca, A. V. Vasilakos, K. Zheng, C. Hou, D. Li, and Z. Liu, “Greendcn: A general framework for achieving energy efficiency in data center networks,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 1, pp. 4–15, January 2014.
  • [258] E. Renault, “Parallel execution of for loops using checkpointing techniques,” in Proc. the 2005 International Conference on Parallel Processing Workshops, ser. ICPPW ’05.   Washington, DC, USA: IEEE Computer Society, 2005, pp. 313–319. [Online]. Available: http://dx.doi.org/10.1109/ICPPW.2005.65
  • [259] L. Mereuta and E. Renault, “Checkpointing aided parallel execution model and analysis,” in Proc. the Third International Conference on High Performance Computing and Communications, ser. HPCC’07.   Berlin, Heidelberg: Springer-Verlag, 2007, pp. 707–717. [Online]. Available: http://dl.acm.org/citation.cfm?id=2401945.2402023
  • [260] . Renault and S. Boumerdassi, “Towards an energy-efficient tool for processing the big data,” in Proc. 2014 International Conference on Future Internet of Things and Cloud (FiCloud), Aug 2014, pp. 448–452.
  • [261] A. Katal, M. Wazid, and R. H. Goudar, “Big data: Issues, challenges, tools and good practices,” in Contemporary Computing (IC3), 2013 Sixth International Conference on, Aug 2013, pp. 404–409.
  • [262] X. Tong, C. Kang, and Q. Xia, “Smart metering load data compression based on load feature identification,” IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2414–2422, Sept 2016.
  • [263] S. W. Jun, K. E. Fleming, M. Adler, and J. Emer, “Zip-io: Architecture for application-specific compression of big data,” in Field-Programmable Technology (FPT), 2012 International Conference on, Dec 2012, pp. 343–351.
  • [264] L. Tian, H. Wang, Q. Tang, and Y. Zhou, “Surveillance source compression with background modeling for video big data,” in 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom), Oct 2016, pp. 105–110.
  • [265] H. Chao, Y. Chen, and J. Wu, “Power saving for machine to machine communications in cellular networks,” in 2011 IEEE GLOBECOM Workshops (GC Wkshps, December 2011, pp. 389–393.
  • [266] H. Chen, L. Liu, T. Novlan, J. D. Matyjas, B. L. Ng, and J. Zhang, “Spatial spectrum sensing-based device-to-device cellular networks,” IEEE Transactions on Wireless Communications, vol. 15, no. 11, pp. 7299–7313, Nov 2016.
  • [267] K. Yang, S. Martin, C. Xing, J. Wu, and R. Fan, “Energy-efficient power control for device-to-device communications.” IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 3208–3220, December 2016.
  • [268] Y. Xu, S. Jiang, and J. Wu, “Towards energy efficient device-to-device content dissemination in cellular networks,” IEEE Access, vol. 6, pp. 25 816–25 828, 2018.
  • [269] A. and H. Y. Kong, “Energy efficient cooperative leach protocol for wireless sensor networks,” Journal of Communications and Networks, vol. 12, no. 4, pp. 358–365, Aug 2010.
  • [270] K. Wang, Y. Wang, Y. Sun, S. Guo, and J. Wu, “Green industrial internet of things architecture: An energy-efficient perspective,” IEEE Communications Magazine, vol. 54, no. 12, pp. 48–54, December 2016.
  • [271] R. Atat, L. Liu, J. Wu, J. Ashdown, and Y. Yi, “Green massive traffic offloading for cyber-physical systems over heterogeneous cellular networks,” Mobile Networks and Applications, pp. 1–9, 2018.
  • [272] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel, “The cost of a cloud: Research problems in data center networks,” SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 68–73, Dec. 2008. [Online]. Available: http://doi.acm.org/10.1145/1496091.1496103
  • [273] H. Liu, B. Liu, L. T. Yang, M. Lin, Y. Deng, K. Bilal, and S. U. Khan, “Thermal-aware and DVFS-enabled big data task scheduling for data centers,” IEEE Transactions on Big Data, vol. 4, no. 2, pp. 177–190, 2018.
  • [274] H. Ayyalasomayajula, E. Gabriel, P. Lindner, and D. Price, “Air quality simulations using big data programming models,” in 2016 IEEE Second International Conference on Big Data Computing Service and Applications (BigDataService), March 2016, pp. 182–184.
  • [275] J. Y. Zhu, C. Sun, and V. O. K. Li, “Granger-causality-based air quality estimation with spatio-temporal (s-t) heterogeneous big data,” in 2015 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), April 2015, pp. 612–617.
  • [276] J. Y. Zhu, Y. Zheng, X. Yi, and V. O. K. Li, “A gaussian bayesian model to identify spatio-temporal causalities for air pollution based on urban big data,” in 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), April 2016, pp. 3–8.
  • [277] M. Fingas and C. Brown, “Review of oil spill remote sensing,” Marine Pollution Bulletin, vol. 83, no. 1, pp. 9 – 23, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0025326X14002021
  • [278] G. Suciu, V. Suciu, C. Dobre, and C. Chilipirea, “Tele-monitoring system for water and underwater environments using cloud and big data systems,” in Proc. 2015 20th International Conference on Control Systems and Computer Science, May 2015, pp. 809–813.
  • [279] Y. Zheng, T. Liu, Y. Wang, Y. Zhu, Y. Liu, and E. Chang, “Diagnosing new york city’s noises with ubiquitous data,” in Proc. the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, ser. UbiComp ’14.   New York, NY, USA: ACM, 2014, pp. 715–725. [Online]. Available: http://doi.acm.org/10.1145/2632048.2632102
  • [280] S. Hachem, V. Mallet, R. Ventura, A. Pathak, V. Issarny, P. G. Raverdy, and R. Bhatia, “Monitoring noise pollution using the urban civics middleware,” in Proc. 2015 IEEE First International Conference on Big Data Computing Service and Applications (BigDataService), March 2015, pp. 52–61.
  • [281] S. Bera, S. Misra, and M. S. Obaidat, “Energy-efficient smart metering for green smart grid communication,” in Proc. 2014 IEEE Global Communications Conference, Dec 2014, pp. 2466–2471.
  • [282] X. Li, C. P. Bowers, and T. Schnier, “Classification of energy consumption in buildings with outlier detection,” IEEE Transactions on Industrial Electronics, vol. 57, no. 11, pp. 3639–3644, Nov 2010.
  • [283] C. H. Lee and C. H. Wu, “Collecting and mining big data for electric vehicle systems using battery modeling data,” in Proc. 2015 12th International Conference on Information Technology - New Generations (ITNG), April 2015, pp. 626–631.
  • [284] Y. B. Qin, J. Housell, and I. Rodero, “Cloud-based data analytics framework for autonomic smart grid management,” in Proc. the 2014 International Conference on Cloud and Autonomic Computing, ser. ICCAC ’14.   Washington, DC, USA: IEEE Computer Society, 2014, pp. 97–100. [Online]. Available: http://dx.doi.org/10.1109/ICCAC.2014.39
  • [285] F. Saremi, O. Fatemieh, H. Ahmadi, H. Wang, T. Abdelzaher, R. Ganti, H. Liu, S. Hu, S. Li, and L. Su, “Experiences with greengps–fuel-efficient navigation using participatory sensing,” IEEE Transactions on Mobile Computing, vol. 15, no. 3, pp. 672–689, March 2016.
  • [286] E. B. Hamida, H. Noura, and W. Znaidi, “Security of cooperative intelligent transport systems: Standards, threats analysis and cryptographic countermeasures,” Electronics, vol. 4, no. 3, pp. 380–423, 2015.
  • [287] Environmental monitoring. GE3S. Accessed on Accessed on December 1, 2017. [Online]. Available: http://www.ge3s.org/service/environmental-monitoring/
  • [288] Defense industry - remote management application. Defense Industry Remote Management Application. Accessed on December 1, 2017. [Online]. Available: https://www.wti.com/t-defense-remote-management-application.aspx
  • [289] Smart facilities. Facilities Booking System. Accessed on December 1, 2017. [Online]. Available: http://www.facilitiesbooking.com/smart-facilities
  • [290] Naonworks. Accessed on January 10, 2018. [Online]. Available: http://www.naonworks.com/old/inc_html/sub2_3.html
  • [291] Mp08 comunicacions industrials - francesc cazorla. Accessed on June 1, 2018. [Online]. Available: https://sites.google.com/site/fcazorlay/mp08
  • [292] J. Baek, Q. H. Vu, J. K. Liu, X. Huang, and Y. Xiang, “A secure cloud computing based framework for big data information management of smart grid,” IEEE Transactions on Cloud Computing, vol. 3, no. 2, pp. 233–244, April 2015.
  • [293] A. Ukil and R. Zivanovic, “Automated analysis of power systems disturbance records: Smart grid big data perspective,” in Proc. 2014 IEEE Innovative Smart Grid Technologies - Asia (ISGT ASIA), May 2014, pp. 126–131.
  • [294] J. Yang, J. Zhao, F. Wen, W. Kong, and Z. Dong, “Mining the big data of residential appliances in the smart grid environment,” in Proc. 2016 IEEE Power and Energy Society General Meeting (PESGM), July 2016, pp. 1–5.
  • [295] L. Liu and Z. Han, “Multi-block admm for big data optimization in smart grid,” in Proc. 2015 International Conference on Computing, Networking and Communications (ICNC), Feb 2015, pp. 556–561.
  • [296] S. Cui, Z. Han, S. Kar, T. T. Kim, H. V. Poor, and A. Tajer, “Coordinated data-injection attack and detection in the smart grid: A detailed look at enriching detection solutions,” IEEE Signal Processing Magazine, vol. 29, no. 5, pp. 106–115, Sept 2012.
  • [297] A. Yassine, A. A. N. Shirehjini, and S. Shirmohammadi, “Smart meters big data: Game theoretic model for fair data sharing in deregulated smart grids,” IEEE Access, vol. 3, pp. 2743–2754, 2015.
  • [298] X. He, Q. Ai, R. C. Qiu, W. Huang, L. Piao, and H. Liu, “A big data architecture design for smart grids based on random matrix theory,” IEEE Transactions on Smart Grid, vol. 8, no. 2, pp. 674–686, March 2017.
  • [299] J. Wu, K. Ota, M. Dong, J. Li, and H. Wang, “Big data analysis based security situational awareness for smart grid,” IEEE Transactions on Big Data, vol. PP, no. 99, pp. 1–1, 2017.
  • [300] K. Wang, Y. Wang, X. Hu, Y. Sun, D. J. Deng, A. Vinel, and Y. Zhang, “Wireless big data computing in smart grid,” IEEE Wireless Communications, vol. 24, no. 2, pp. 58–64, April 2017.
  • [301] K. Hamedani, L. Liu, A. Rachad, J. Wu, and Y. Yi, “Reservoir computing meets smart grids: Attack detection using delayed feedback networks,” IEEE Transactions on Industrial Informatics, 2017.
  • [302] A. A. Yavuz, “An efficient real-time broadcast authentication scheme for command and control messages,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 10, pp. 1733–1742, Oct 2014.
  • [303] V. Nguyen, M. Guirguis, and G. Atia, “A unifying approach for the identification of application-driven stealthy attacks on mobile cps,” in Proc. 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Sept 2014, pp. 1093–1101.
  • [304] G. M. Lehto, G. Edlund, T. Smigla, and F. Afinidad, “Protection evaluation framework for tactical satcom architectures,” in Proc. MILCOM 2013 - 2013 IEEE Military Communications Conference, Nov 2013, pp. 1008–1013.
  • [305] M. V. Moreno, F. Terroso-Saenz, A. Gonzalez-Vidal, M. Valdez-Vela, A. F. Skarmeta, M. A. Zamora, and V. Chang, “Applicability of big data techniques to smart cities deployments,” IEEE Transactions on Industrial Informatics, vol. 13, no. 2, pp. 800–809, April 2017.
  • [306] M. M. Rathore, A. Ahmad, A. Paul, and G. Jeon, “Efficient graph-oriented smart transportation using internet of things generated big data,” in Proc. 2015 11th International Conference on Signal-Image Technology Internet-Based Systems (SITIS), Nov 2015, pp. 512–519.
  • [307] B. T. de Oliveira and C. B. Margi, “Distributed control plane architecture for software-defined wireless sensor networks,” in Proc. 2016 IEEE International Symposium on Consumer Electronics (ISCE), Sept 2016, pp. 85–86.
  • [308] A. Pantelopoulos and N. G. Bourbakis, “A survey on wearable sensor-based systems for health monitoring and prognosis,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 40, no. 1, pp. 1–12, Jan 2010.
  • [309] E. Kartsakli, A. S. Lalos, A. Antonopoulos, S. Tennina, M. D. Renzo, L. Alonso, and C. Verikoukis, “A survey on M2M systems for mhealth: A wireless communications perspective,” Sensors, vol. 14, no. 10, pp. 18 009–18 052, 2014.
  • [310] M. R. Yuce, “Implementation of wireless body area networks for healthcare systems,” Sensors and Actuators A: Physical, vol. 162, no. 1, pp. 116 – 129, 2010. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0924424710002657
  • [311] H. Yan, H. Huo, Y. Xu, and M. Gidlund, “Wireless sensor network based e-health system - implementation and experimental results,” IEEE Transactions on Consumer Electronics, vol. 56, no. 4, pp. 2288–2295, November 2010.
  • [312] J. F. Martinez, M. S. Familiar, I. Corredor, A. B. Garcia, S. Bravo, and L. Lopez, “Composition and deployment of e-health services over wireless sensor networks,” Mathematical and Computer Modelling, vol. 53, no. 3-4, pp. 485–503, 2011, telecommunications Software Engineering: Emerging Methods, Models and Tools. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0895717710001548
  • [313] P. Jiang, J. Winkley, C. Zhao, R. Munnoch, G. Min, and L. T. Yang, “An intelligent information forwarder for healthcare big data systems with distributed wearable sensors,” IEEE Systems Journal, vol. 10, no. 3, pp. 1147–1159, Sept 2016.
  • [314] G. Baldini, S. Karanasios, D. Allen, and F. Vergari, “Survey of wireless communication technologies for public safety,” IEEE Communications Surveys and Tutorials, vol. 16, no. 2, pp. 619–641, Feb. 2014.
  • [315] D. Cui, “Risk early warning index system in the field of public safety in big data era,” in Proc. 2015 Sixth International Conference on Intelligent Systems Design and Engineering Applications (ISDEA), Aug 2015, pp. 704–707.
  • [316] U. M. Bhangale, K. R. Kurte, S. S. Durbha, R. L. King, and N. H. Younan, “Big data processing using hpc for remote sensing disaster data,” in Proc. 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 2016, pp. 5894–5897.
  • [317] L. Zhong, K. Takano, Y. Ji, and S. Yamada, “Big data based service area estimation for mobile communications during natural disasters,” in Proc. 2016 30th International Conference on Advanced Information Networking and Applications Workshops (WAINA), March 2016, pp. 687–692.
  • [318] P. Tin, T. T. Zin, T. Toriu, and H. Hama, “An integrated framework for disaster event analysis in big data environments,” in Proc. 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Oct 2013, pp. 255–258.
  • [319] FP7 Project ABSOLUTE. Accessed on July 10, 2017. [Online]. Available: http://cordis.europa.eu/project/rcn/106035_en.html
  • [320] R. Ferrus, O. Sallent, G. Baldini, and L. Goratti, “LTE: the technology driver for future public safety communications,” IEEE Communications Magazine, vol. 51, no. 10, pp. 154–161, October 2013.
  • [321] K. Flynn, “ETSI Summit on Future Mobile and Standards for 5G, 3GPP, nov. 2013.”
  • [322] A. Lucent, “Government technology how-to guide: for LTE in public safety,” Government Technology Magazine, 2010.
  • [323] D. Witkowski, “Effectively testing 700 MHz public safety LTE broadband and P25 narrowband networks,” White paper by Anritsu Company, Accessed on July 10, 2017. [Online]. Available http://www. anritsu. com/en-US/Products-Solutions/Solution/Effectively-testing-700MHz-public-safety. aspx [Accessed April 2013], 2013.
  • [324] Y. Xiao, K.-C. Chen, C. Yuen, and L. DaSilva, “Spectrum sharing for device-to-device communications in cellular networks: A game theoretic approach,” in Proc. 2014 IEEE International Symposium on Dynamic Spectrum Access Networks (DYSPAN), April 2014, pp. 60–71.
  • [325] A. Bagayoko, B. Paillassa, and R. Dhaou, “Practical link reliability for ad hoc routing protocol,” in Proc. 2011 IEEE Vehicular Technology Conference (VTC Fall), Sept 2011, pp. 1–5.
  • [326] M. Wang and Z. Yan, “Security in D2D communications: A review,” in Proc. 2015 IEEE Trustcom/BigDataSE/ISPA, vol. 1, Aug 2015, pp. 1199–1204.
  • [327] Y. Gong, Y. Fang, and Y. Guo, “Privacy-preserving collaborative learning for mobile health monitoring,” in Proc. 2015 IEEE Global Communications Conference (GLOBECOM), Dec 2015.
  • [328] J. Pospiil and M. Novotn媒, “Evaluating cryptanalytical strength of lightweight cipher present on reconfigurable hardware,” in Proc. 2012 15th Euromicro Conference on Digital System Design (DSD), Sept 2012, pp. 560–567.
  • [329] Z. Li and T. J. Oechtering, “Privacy-aware distributed bayesian detection,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 7, pp. 1345–1357, Oct 2015.
  • [330] F. Canelo, B. M. C. Silva, J. J. P. C. Rodrigues, and Z. Zhu, “Performance evaluation of an enhanced cryptography solution for m-health applications in cooperative environments,” in Proc. 2013 IEEE Global Communications Conference (GLOBECOM), Dec 2013, pp. 1711–1716.
  • [331] I. Ray and I. Ray, “Access control challenges for cyber-physical systems.”
  • [332] K. Ly, W. Sun, and Y. Jin, “Emerging challenges in cyber-physical systems: A balance of performance, correctness, and security,” in Proc. 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), April 2016, pp. 498–502.
  • [333] D. Cancila, H. Zaatiti, and R. Passerone, “Cyber-physical system and contract-based design: A three dimensional view,” in Proc. the WESE’15: Workshop on Embedded and Cyber-Physical Systems Education, ser. WESE’15.   New York, NY, USA: ACM, 2015, pp. 4:1–4:4. [Online]. Available: http://doi.acm.org/10.1145/2832920.2832924
  • [334] A. G. T茅llez and M. M. Plè°©, “Multithreaded translation of ptolemy ii designs on multicore platforms,” in Complex, Intelligent and Software Intensive Systems, 2008. CISIS 2008. International Conference on, March 2008, pp. 607–612.
  • [335] H. Chen and S. Mitra, “Synthesis and verification of motor-transmission shift controller for electric vehicles,” in Proc. 2014 ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS), April 2014, pp. 25–35.
  • [336] J. Espinosa, C. Hernandez, J. Abella, D. de Andres, and J. C. Ruiz, “Analysis and rtl correlation of instruction set simulators for automotive microcontroller robustness verification,” in 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), June 2015, pp. 1–6.
  • [337] M. M. Bersani and M. Garcia-Valls, “The cost of formal verification in adaptive cps. an example of a virtualized server node,” in Proc. 2016 IEEE 17th International Symposium on High Assurance Systems Engineering (HASE), Jan 2016, pp. 39–46.
  • [338] N. Mastronarde, V. Patel, J. Xu, L. Liu, and M. van der Schaar, “To relay or not to relay: Learning device-to-device relaying strategies in cellular networks,” IEEE Transactions on Mobile Computing, vol. 15, no. 6, pp. 1569–1585, June 2016.
  • [339] I. R. Goodman, R. P. Mahler, and H. T. Nguyen, Mathematics of Data Fusion.   Norwell, MA, USA: Kluwer Academic Publishers, 1997.
  • [340] A. Ali, J. Qadir, R. u. Rasool, A. Sathiaseelan, A. Zwitter, and J. Crowcroft, “Big data for development: applications and techniques,” Big Data Analytics, vol. 1, no. 2, 2016.

Rachad Atat received the B.E. degree (with distinction) in computer engineering from the Lebanese American University, Beirut, Lebanon, in 2010, the M.Sc. degree in electrical engineering from the King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia, in 2012, and the Ph.D. degree (with Hons.) in electrical engineering from the University of Kansas (KU), Lawrence, KS, USA, in 2017. He is currently a postdoctoral research associate at Texas A&M University at Qatar, working on dynamic metering allocation with integrated cybersecurity measures in smart grids. His current research interests include smart grids, cybersecurity, and Internet of Things. Dr. Atat was a recipient of the 2016 IEEE Global Communications Conference Best Paper Award, the NSF Travel Grant Award in 2016, the KU Engineering Fellowship Award, and the KAUST Discovery Scholarship Award.

Lingjia Liu (SM’15) received the B.S. degree in Electronic Engineering from Shanghai Jiao Tong University, Shanghai, China, and the Ph.D. degree in Electrical and Computer Engineering from Texas A&M University, College Station, TX, USA. He spent more than three years working in the Standards & Mobility Innovation Laboratory, Samsung Research America (SRA) where he received the Global Samsung Best Paper Award twice (in 2008 and 2010, respectively). He was leading Samsung’s efforts on multiuser MIMO, coordinated multipoint (CoMP), and heterogeneous networks (HetNets) in 3GPP LTE/LTE-Advanced standards.

Lingjia Liu is an Associate Professor in the Bradley Department of Electrical and Computer Engineering (ECE) of Virginia Tech (VT). He is also serving as the Associate Director for Affiliate Relations in Wireless@Virginia Tech. Prior to that he was an Associate Professor in the Electrical Engineering and Computer Sciences Department, University of Kansas (KU). His research interests include emerging technologies for Beyond 5G cellular networks including machine learning for wireless networks, massive MIMO, massive MTC communications, and mmWave communications. Lingjia Liu was a recipient of the Air Force Summer Faculty Fellow from 2013 to 2017, the Miller Scholar at KU in 2014, the Miller Professional Development Award for Distinguished Research at KU in 2015, the 2016 IEEE GLOBECOM Best Paper Award, the 2018 IEEE ISQED Best Paper Award, the 2018 IEEE TAOS Best Paper Award, and the 2018 IEEE TCGCC Best Conference Paper Award.

Jinsong Wu (SM’11) received his Ph.D. in Electrical and Computer Engineering from Queen’s University, Kingston, Canada. He was the leading Editor and a co-author of the comprehensive book, entitled “Green Communications: Theoretical Fundamentals, Algorithms, and Applications”, published by CRC Press in September 2012. His first-authored paper won 2017 IEEE System Journal Best Paper Award. He received 3 best paper awards for IEEE conference papers. He received IEEE Green Communications and Computing Technical Committee 2017 Excellent Services Award for Excellent Technical Leadership and Services in the Green Communications and Computing Community. He also received IEEE Outstanding Leadership Award in 2013 IEEE International Conference on Green Computing and Communications (Greencom) and 2016 IEEE International Conference on Big Data Intelligence and Computing (DataCom), IEEE Computer Society. He is elected Vice Chair, Technical Activities, IEEE Environmental Engineering Initiative, a pan-IEEE effort under IEEE Technical Activities Board (TAB). He was the founder (2011) and founding Chair (2011-2017) of IEEE Technical Committee on Green Communications and Computing (TCGCC). He is also the primary co-founder (2015) and founding Vice-Chair of IEEE Technical Committee on Big Data (TCBD). He was the proposer (2012), founder (2014) and Series Editor (2014 - 2017) of IEEE Series on Green Communication and Computing Networks in the IEEE Communications Magazine. He is Area Editor (2016 - present) of IEEE Transactions on Green Communications and Networking. He was the original proposer (2012) and Series Editor (2014-2016) of the IEEE Journal of Selected Areas on Communications (JSAC) Series on Green Communications and Networking. He is Editor of IEEE Communications Surveys and Tutorials, Associate Editor of IEEE Systems Journal, Associate Editor of IEEE Access. He was the one of the earliest key proposers and a long term promotor from Green Track to a full green symposium in flagship conferences of IEEE Communications Society. He opened and established Big Data Track with general and wide topic coverage in the flagship conferences of IEEE Communications Society, starting at IEEE Globecom 2016. He has been conference chairs and main organizers in a number of IEEE conferences and workshops.

Guangyu Li received the B.S. degree from China University of Mining and Technology and M.S. degree from Tongji University, China, in 2008 and 2011, respectively, and the Ph.D. degree in computer science from University of Paris-Sud, Paris, France, in 2015. He is currently working as an assistant professor with the Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing, China. His current research interests include routing protocol design for vehicular networks, big data transmission, electric vehicles charging/discharging strategy design in smart grid and traffic control.

Chunxuan Ye received his Ph.D. from University of Maryland, College Park in 2005. Between 2005 and 2012, he worked for InterDigital Communications, Inc. on multiple advanced research projects in the wireless communications field. He worked for Intel on LTE modem chipset development between 2012 and 2015. He has been working for InterDigital Communications, Inc. as a 3GPP RAN1 standards delegate for new radios (5G) since 2015. Chunxuan Ye has more than 30 book chapters, conference and journal papers. He also has more than 60 U.S. and international patent applications with 25 U.S. patents granted. His specialties include wireless communications, wireless standards and technologies, information theory, software and firmware development.

Yang Yi (M’09-SM’16) is an assistant professor in the Bradley Department of Electrical Engineering and Computer engineering at the Virginia Tech. She is an IEEE senior member. She received the B.S. and M.S. degrees in electronic engineering at Shanghai Jiao Tong University, and the Ph.D. degree in electrical and computer engineering at Texas A&M University. Her research interests include very large scale integrated (VLSI) circuits and systems, computer-aided design (CAD), neuromorphic architecture for brain-inspired computing systems, and low-power circuits design with advanced nano-technologies for high-speed wireless systems..

Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
313655
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description