Machine Learning Based Network Coverage Guidance System
With the advent of 4G, data consumption has grown enormously and the availability of mobile networks has become paramount. With bursts of network traffic driven by user consumption, data availability issues and network anomalies have also increased substantially. In this paper, we introduce a novel approach to identify regions with poor network connectivity, providing feedback both to service providers, so that they can improve coverage, and to customers, so that they can choose a network judiciously. In addition, the solution uses Machine Learning clustering algorithms, deployed as a mobile application, to let customers navigate to a nearby location with stronger signal strength. It also provides a dynamic visual representation of varying network strength and range across nearby geographical areas.
The growing demand for mobile network connectivity, associated with increased smartphone ownership, greater indoor mobile usage, and higher data rates, is driving the evolution of mobile networks. Because the number of cellular subscribers increases every year and competition between network operators is intensifying, operators are increasingly interested in deploying network infrastructure so as to achieve maximum coverage. Network infrastructure is not limited to the electronic components of the network; it also includes passive elements such as the physical sites and towers required to operate the network. While cellular subscribers do not directly perceive the composition or configuration of the infrastructure, its throughput and latency determine the user experience, and so the network infrastructure and its deployment have been among the key challenges faced by Mobile Network Operators (MNOs). Since network coverage is of utmost importance to the MNOs, one would expect users to have a seamless experience with strong, uniform network connectivity. However, this is not the case. Network coverage in tall structures and large buildings is inconsistent, and signal strength varies significantly. With the growing dependency on network connectivity and the need for high network speeds, the lack of consistent and strong signal strength is a growing concern.
In this paper, a unique approach is presented to resolve both the problem faced by the MNOs (identifying areas with weak signal strength) and the problem faced by end-users (experiencing weak signal strength). With the help of a mobile application, network strength densities across a region are identified, and 360-degree feedback on network conditions is provided to both MNOs and end customers. The solution enables the end-user to navigate to a location with a stronger mobile network and also provides a dynamic visual representation of varying network strength and range across nearby geographical areas. This representation is provided for each of the MNOs that operate in a region. It enables the user to switch judiciously to an MNO with more uniform signal connectivity, and it gives the MNOs a comparative view of the signal strengths of their competitors.
II Related Works
Parallels can be drawn between this paper and the work done by Opensignal in deploying heat maps based on network strength. Opensignal builds its heat maps through a meticulous process of data collection and averaging: it collects billions of individual measurements every day from over 100 million devices worldwide. In addition to this extensive data collection, it depends on partner applications to gather the data. An averaging metric is then applied to this vast database to produce heat maps for a given geographical location and a particular MNO. While this may be accurate owing to the sheer quantity of data collected, using a clustering model to attribute new data to pre-defined clusters, and periodically updating the clusters themselves, is likely to reduce complexity and cost. Using such algorithms to fill in the blanks across a vast geography may be more efficient than trying to measure every location, which essentially amounts to a brute-force method. Another option is to apply such algorithms to existing large databases, such as Opensignal’s, which may further improve accuracy while reducing complexity and cost.
III Problem Statement
III-A Poor Network Connectivity
The mobile network user faces poor internet connectivity multiple times a day in a particular area for various reasons, most of which are external factors such as physical obstruction, multiple reflections due to water bodies or tank chambers, and the weather at that instant. More often than not, however, network strength is not constant over an area; it is not even constant within a single building. There are points in a building where network strength is evidently higher. If there were a tangible way to identify such locations and direct the user towards them, many day-to-day network-related issues could be eased, for instance by directing the user from inside a lift to a window.
III-B Telecommunication Companies
Acquiring sites for the deployment of network infrastructure has become very difficult due to network densification to address demands in indoor environments. Due to closely spaced buildings, there is very little space for indoor base stations to be installed. Furthermore, multiple mobile operators have to compete for the same few sites. The comparative data of signal strengths of all Mobile Network Operators in a particular area could help the telecommunication companies to judiciously deploy network infrastructure or participate in infrastructure sharing.
IV High Level Solution
This white paper proposes a solution to the above-mentioned problems using an interactive mobile application front end that caters to the user and to the network problems of the user’s geographical area, while simultaneously building a database in the back end that could provide greater insight into larger signal network problems as a whole. The solution is a mix of data engineering, vector quantization algorithms, heat maps, and geographical area identification, used to partition User Endpoints (UEs) based on geographical location, Long-Term Evolution Received Signal Strength Indicator (LTE RSSI), and Mobile Network Operator (MNO). The vector quantization approach generates clusters, and each cluster is associated with an integer tag that normalizes and denotes the network strength of that cluster. The accuracy of the clustering model increases as more UEs consume the RSSI mapping service and generate data to build the model. Currently, a window of 10 seconds is used to generate the data. Once the model is built, UEs that fall within an area of lower network connectivity are directed to the nearest location belonging to a region with a stronger network, computed by optimization algorithms using the RSSI of the UEs. The UE is then directed to this stronger network area using traditional navigation modules. From a user’s perspective, this framework could serve as an advantageous alternative to existing methods, which predominantly rely on trial and error to compare network signal strengths. For Network Service Providers, the collected data could prove useful because it provides insight into the continuous gradient of signal strength across an area. This could help in better frequency planning and cell site deployment upgrades.
It can also provide strategic insight into the exact location where a signal tower can be installed, in such a way that it is beneficial to the maximum number of weak network clusters, thereby optimizing cost.
V Detailed Solution
The proposed framework can be divided into three phases:
Data Collection and Storage
Cloud Clustering Model
Heat map, Nearest Strong Network Area Navigation
V-A Data Collection and Storage Phase
The user’s precise geographical location is determined by collecting the latitude and the longitude of the device. This data is collected using the FusedLocationProviderApi. Fused Location Provider gives accurate locations, and optimizes the battery usage. It combines signals from GPS, Wi-Fi, cell networks, as well as an accelerometer, gyroscope, magnetometer, and other sensors to provide accurate results. It can demarcate various spots within a building or a household.
The effective network strength of the device is measured by the LTE RSSI. The carrier RSSI measures the average total received power in the measurement bandwidth over N resource blocks, in decibels. The total received power of the carrier RSSI includes the power from co-channel serving and non-serving cells, adjacent channel interference, thermal noise, etc. It is measured over 12 subcarriers, including the Reference Signal (RS) from the serving cell and traffic in the serving cell. The RSSI data is collected using the getAllCellInfo() method of the TelephonyManager class. It requests all available cell information from all radios on the device, including the camped/registered, serving, and neighboring cells. The response can include one or more CellInfoGsm, CellInfoCdma, CellInfoTdscdma, CellInfoLte, and CellInfoWcdma objects, in any combination. Refer to the table below.
| Field | Description |
|---|---|
| MCI | 28-bit cell identity |
| MPCI | Physical cell ID |
Network Service Provider
The Network Service Provider, more formally known as the Mobile Network Operator, is a provider of wireless communications services that owns or controls all the elements necessary to provide services to an end-user, including radio spectrum allocation, wireless network infrastructure, backhaul infrastructure, billing, customer care, provisioning computer systems, and marketing and repair organizations. This information helps segment users by the MNO they use, which in turn supports the clustering models discussed later. The getSimOperatorName method of the TelephonyManager class is used to collect this information.
Every data set collected from a user needs to be distinguishable from another user’s. This avoids multiple data entries from the same device within a short period when they provide no new information. It also helps track the data associated with a device while it is moving. The Internet Protocol (IP) address provides this information, as it is unique to a user on a network at a given point in time. The modules used to collect it are InetAddress and NetworkInterface.
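The de-duplication idea described above can be illustrated with a minimal Python sketch. The `Sample` record, the `is_redundant` and `ingest` helpers, and the movement and RSSI thresholds are all hypothetical and chosen only for illustration; the source specifies only that the IP address identifies the device and that data is generated in 10-second windows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    ip: str            # device identifier (the paper uses the IP address)
    timestamp: float   # seconds since some reference time
    lat: float
    lon: float
    rssi_dbm: float
    mno: str

def is_redundant(new, last, window_s=10.0, min_move_deg=1e-5, min_rssi_delta=1.0):
    """True if `new` comes from the same device within the window and adds nothing.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if last is None or new.ip != last.ip:
        return False
    if new.timestamp - last.timestamp >= window_s:
        return False
    moved = (abs(new.lat - last.lat) > min_move_deg
             or abs(new.lon - last.lon) > min_move_deg)
    changed = abs(new.rssi_dbm - last.rssi_dbm) >= min_rssi_delta
    return not (moved or changed)

def ingest(samples):
    """Keep only informative samples, tracking the last kept sample per device."""
    last_by_ip = {}
    kept = []
    for s in sorted(samples, key=lambda s: s.timestamp):
        if not is_redundant(s, last_by_ip.get(s.ip)):
            kept.append(s)
            last_by_ip[s.ip] = s
    return kept
```

A sample from the same IP that arrives within the window at the same position and with nearly the same RSSI is dropped; a sample showing movement or a clear RSSI change is kept, which preserves tracking of a moving device.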
V-B Cloud Clustering Model
K-Means Clustering
The K-means algorithm is an iterative vector quantization algorithm that partitions the data set into K predefined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. It makes the intra-cluster data points as similar as possible while keeping the clusters as different (far apart) as possible. Data points are assigned to clusters such that the sum of the squared distances between the data points and their cluster’s centroid (the arithmetic mean of all the data points that belong to that cluster) is at a minimum. The less variation there is within a cluster, the more homogeneous (similar) its data points are. K-means is popular because of its ease and simplicity of implementation, scalability, speed of convergence, and adaptability to sparse data.
The way K means algorithm works in this project is as follows:
Specify number of clusters K (5)
Initialize centroids by shuffling data set and randomly selecting K data points for the centroids without replacement, with the dimensions being Latitude, Longitude, and RSSI strength.
Iterate until there is no change to the centroids, i.e., until the assignment of data points to clusters does not change.
Compute sum of the squared distance between data points and all centroids.
Assign each data point to the closest cluster (centroid) and attribute a normalized network strength integer tag (0-5)
Compute the centroids for the clusters by averaging all data points that belong to each cluster.
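The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, points are assumed to be `(latitude, longitude, rssi)` tuples, and the rule for turning clusters into strength tags (ranking centroids by RSSI) is an assumption. Any feature scaling between coordinates and RSSI, which the source does not describe, is left to the caller.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k=5, seed=0, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Pick K data points at random, without replacement, as initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:   # assignments stable: converged
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:            # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

def strength_tags(centroids):
    """Assumed tagging rule: rank clusters by centroid RSSI, weakest gets tag 0."""
    order = sorted(range(len(centroids)), key=lambda j: centroids[j][2])
    return {cluster: tag for tag, cluster in enumerate(order)}
```

For example, `centroids, labels = kmeans(points, k=5)` followed by `tags = strength_tags(centroids)` gives each point a tag via `tags[labels[i]]`.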
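The approach K-means follows to solve the problem is called Expectation-Maximization. The E-step assigns the data points to the closest cluster. The M-step computes the centroid of each cluster. For $m$ data points, the objective function is:
$$J = \sum_{i=1}^{m} \sum_{k=1}^{K} w_{ik} \, \lVert x^{i} - \mu_{k} \rVert^{2}$$
where $w_{ik} = 1$ for data point $x^{i}$ if it belongs to cluster $k$; otherwise, $w_{ik} = 0$. $\mu_{k}$ is the centroid of $x^{i}$’s cluster.
It is a minimization problem of two parts. First, $J$ is minimized w.r.t. $w_{ik}$, treating $\mu_{k}$ as a constant. Then $J$ is minimized w.r.t. $\mu_{k}$, treating $w_{ik}$ as a constant. Therefore, the E-step is:
$$w_{ik} = \begin{cases} 1 & \text{if } k = \arg\min_{j} \lVert x^{i} - \mu_{j} \rVert^{2} \\ 0 & \text{otherwise} \end{cases}$$
In other words, assign the data point to the closest cluster, judged by its squared distance from the cluster’s centroid. The M-step is:
$$\frac{\partial J}{\partial \mu_{k}} = -2 \sum_{i=1}^{m} w_{ik} \, (x^{i} - \mu_{k}) = 0 \;\Rightarrow\; \mu_{k} = \frac{\sum_{i=1}^{m} w_{ik} \, x^{i}}{\sum_{i=1}^{m} w_{ik}}$$
which translates to recomputing the centroid of each cluster to reflect the new assignments.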
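As a small numerical check of the M-step, the sketch below evaluates the objective J (the sum of squared distances from each point to the centroid of its assigned cluster) and shows that the cluster mean gives a lower value than a shifted centroid. The helper names and the sample points are hypothetical and serve only as illustration.

```python
def objective(points, labels, centroids):
    """J: sum of squared distances from each point to its cluster's centroid."""
    return sum(
        sum((x - m) ** 2 for x, m in zip(p, centroids[l]))
        for p, l in zip(points, labels)
    )

def mean(points):
    """Component-wise arithmetic mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

# One cluster of two (lat, lon, rssi) points, made-up values.
cluster = [(0.0, 0.0, -100.0), (2.0, 0.0, -90.0)]
labels = [0, 0]

mu = mean(cluster)                    # (1.0, 0.0, -95.0)
j_mean = objective(cluster, labels, [mu])
j_shifted = objective(cluster, labels, [(1.0, 0.0, -94.0)])
```

Here `j_mean` is 52.0 and `j_shifted` is 54.0, consistent with the mean being the minimizer of J for fixed assignments.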
Deployment as Cloud Service
Once the clustering model is functional and accurately assigns new user data to one of the clusters, it is deployed as a cloud service that runs in the back end. The service also updates the database with the clustered mapping of users across areas; the updated database is then used by the navigation modules discussed later. The output of the model is the cluster prediction, denoted by an integer tag, along with the geographical location.
The service is hosted on a Google Virtual Machine (VM) instance. A virtual machine is software that acts as an interface between a computer program that has been compiled into instructions understood by the virtual machine and the microprocessor (or “hardware platform”) that actually performs the program’s instructions. A terminal multiplexer (tmux) session is started on the VM instance over Secure Shell (SSH). tmux lets several terminal sessions share a single connection and keeps them running after the client disconnects. The SSH protocol provides secure remote login and other secure network services over an insecure network.
A cron job schedules the service to run as a task every 15 minutes. In each run, all the data, including the latest data collected at 10-second intervals, is clustered, and the output is written to the database. This ensures that changes in the heat map and general tendencies of variation are always captured. The 15-minute interval is chosen because it is too short for large changes in network properties to occur, yet long enough to capture smaller but meaningful changes in network details or heat maps, which are then updated via the clustering model.
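One way a new sample could be attributed to an existing cluster between scheduled runs is a nearest-centroid lookup, sketched below. This is an assumption about how the prediction step might work, not a description of the deployed service; the function names, the centroid values, and the tag mapping are hypothetical.

```python
def nearest_cluster(sample, centroids):
    """Index of the centroid closest (in squared Euclidean distance) to `sample`."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((s - c) ** 2 for s, c in zip(sample, centroids[j])),
    )

def tag_new_sample(sample, centroids, tags):
    """Look up the strength tag of the cluster that `sample` falls into."""
    return tags[nearest_cluster(sample, centroids)]

# Hypothetical output of the most recent clustering run.
centroids = [(0.0, 0.0, -110.0), (1.0, 1.0, -60.0)]
tags = {0: 0, 1: 1}   # cluster index -> normalized strength tag

new_tag = tag_new_sample((0.9, 1.0, -65.0), centroids, tags)
```

With these made-up values the new sample lands in the stronger cluster, so `new_tag` is 1. The centroids and tags themselves would be refreshed by the scheduled job.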
V-C Heat map, Nearest Strong Network Area Navigation
Heat map and Nearest Strong Network Detection
The updated database, which contains the geographical location associated with a network strength in the form of normalized tags is presented to the user as a heat map of network strengths. This helps in providing a dynamic visual representation of varying network strength and range across nearby geographical areas, centered around the user. Fig.2 is the heat map of Bangalore generated using sample values of signal strength for illustration purposes.
When the user clicks the “Find Nearest Strong Network” button, the database is analyzed to find the three most suitable nearby locations with a stronger network within a 100 m radius of the user. The user can then choose the location that is most convenient to reach.
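The search described above could be sketched as follows, assuming the database yields `(latitude, longitude, tag)` rows. The haversine formula is a standard great-circle distance; ranking the candidates purely by distance is an assumption, since the source does not say how the three locations are ordered, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearest_stronger(user_lat, user_lon, user_tag, tagged_points,
                     radius_m=100.0, limit=3):
    """Up to `limit` points with a higher tag than the user's, within `radius_m`.

    Returns (distance_m, lat, lon, tag) tuples, nearest first.
    """
    candidates = []
    for lat, lon, tag in tagged_points:
        if tag <= user_tag:
            continue                      # not a stronger network
        d = haversine_m(user_lat, user_lon, lat, lon)
        if d <= radius_m:
            candidates.append((d, lat, lon, tag))
    candidates.sort(key=lambda c: c[0])   # assumed ranking: by distance only
    return candidates[:limit]
```

A variant could sort by tag first and distance second if the strongest candidate should be preferred over the closest.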
Given the user’s real-time location and the location of the nearest suitable stronger network (one with a higher integer tag than that of the user’s location), a walkable route can be traced from one point to the other. This is done using predefined Application Programming Interfaces (APIs), such as the Directions API and Routes API provided by Google. In Fig. 3, the red marker represents the user’s current location and the green marker represents the closest location at which the signal strength is the strongest. The red marker is updated whenever the user’s current location changes.
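As an illustration of the routing step, the sketch below builds a walking-directions request URL for the Google Directions API web service, using its `origin`, `destination`, `mode`, and `key` query parameters. The helper name and the placeholder key are hypothetical, and the source does not state whether the application calls this web service or an SDK; the code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

DIRECTIONS_ENDPOINT = "https://maps.googleapis.com/maps/api/directions/json"

def walking_directions_url(origin, destination, api_key):
    """Build a Directions API URL for a walking route between two (lat, lon) pairs."""
    params = {
        "origin": f"{origin[0]},{origin[1]}",
        "destination": f"{destination[0]},{destination[1]}",
        "mode": "walking",
        "key": api_key,
    }
    return f"{DIRECTIONS_ENDPOINT}?{urlencode(params)}"

# Hypothetical user location, target location, and placeholder key.
url = walking_directions_url((12.9716, 77.5946), (12.9718, 77.5946), "YOUR_API_KEY")
```

The returned URL could then be fetched by any HTTP client and the route in the JSON response drawn on the map.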
A novel approach to navigating to a mobile network coverage area with stronger signal strength, using Machine Learning clustering algorithms deployed as a mobile application, has been discussed and tested. This paper explains the technicalities involved in collecting relevant data using a mobile application, storing it in a real-time database (Firebase), and using a cloud service to cluster the data with the K-means algorithm. The mobile application provides a dynamic visual representation of varying network strength and range across nearby geographical areas, centered around the user, and also guides the user to a location with the strongest signal strength. There is considerable scope for future work. Analyses beyond the heat map and navigation map can be performed on the collected data, for example using DBSCALE to determine the density of the population that resides in low signal strength areas. Future work also includes implementing the same idea with DENCLUE as the clustering model instead of K-means. This could prove useful for MNOs looking to improve their connectivity and quality.
We would like to thank Mr. Karthik Natarajan for mentoring us. Without his valuable knowledge and expertise, this work would not have been possible. We would also like to thank Prof. M Rajasekar (PES University) for providing us with this opportunity and for his continuous support. We extend our thanks to PES University for providing a platform that helped us team up and pursue this project.
- (1994) RFC1692: transport multiplexing protocol (tmux). RFC Editor.
- (2008) An introduction to android. Google I/O.
- (2000) Textual conventions for internet network addresses. Network Working Group, pp. 1–16.
- (1999) Take command: cron: job scheduler. Linux Journal 1999 (65es), pp. 15–es.
- (2020) INDIA mobile network experience report. Online at: https://www.opensignal.com/reports/2020/04/india/mobile-network-experience.
- (2011) Density-based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1 (3), pp. 231–240.
- (2017) Definitive guide to firebase. Springer.
- (2010) Application of k means clustering algorithm for prediction of students academic performance. arXiv preprint arXiv:1002.2425.
- (2019) Navigation technology to meet new people using fused location api. In 2019 3rd International Conference on Computing and Communications Technologies (ICCCT), pp. 63–67.
- (2009, February 10) Using a virtual machine instance as the basic unit of user execution in a server environment. Google Patents. Note: US Patent 7,490,330.
- (2010) DBSCALE: an efficient density-based clustering algorithm for data mining in large databases. In 2010 Second Pacific-Asia Conference on Circuits, Communications and System, Vol. 1, pp. 98–101.
- (2011) Estimating o–d travel time matrix by google maps api: implementation, advantages, and implications. Annals of GIS 17 (4), pp. 199–209.
- (2006) The secure shell (ssh) protocol architecture. RFC 4251, January.