# By Hook or by Crook: Exposing the Diverse Abuse Tactics of Technical Support Scammers

Bharat Srinivasan, School of Computer Science, Georgia Institute of Technology, bharat.srini@gatech.edu

Monjur Alam, School of Computer Science, Georgia Institute of Technology, malam31@gatech.edu

Athanasios Kountouras, School of Computer Science, Georgia Institute of Technology, kountouras@gatech.edu

Nick Nikiforakis, Department of Computer Science, Stony Brook University, nick@cs.stonybrook.edu

Mustaque Ahamad, School of Computer Science, Georgia Institute of Technology, mustaq@cc.gatech.edu

Najmeh Miramirkhani, Department of Computer Science, Stony Brook University, n.miramirkhani@stonybrook.edu

Manos Antonakakis, School of Electrical and Computer Engineering, Georgia Institute of Technology, manos@gatech.edu

###### Abstract

Technical Support Scams (TSS), which combine online abuse with social engineering over the phone channel, have persisted despite several law enforcement actions. The tactics used by these scammers have evolved over time, and they have targeted an ever-increasing number of technology brands. Although recent research has provided important insights into TSS, these scams have now evolved to exploit ubiquitously used online services such as search and sponsored advertisements served in response to search queries. We use a data-driven approach to understand search-and-ad abuse by TSS and to gain visibility into the online infrastructure that facilitates it. By carefully formulating tech support queries with multiple search engines, we collect data about both the support infrastructure and the websites to which TSS victims are directed when they search online for tech support resources. We augment this with a DNS-based amplification technique to further enhance visibility into this abuse infrastructure. By analyzing the collected data, we provide new insights into search-and-ad abuse by TSS and reinforce some of the findings of earlier research. Further, we demonstrate that tech support scammers are able to (1) get major as well as custom search engines to return links to websites controlled by them, and (2) get ad networks to serve malicious advertisements that lead to scam pages. Our study period of approximately eight months uncovered over 9,000 TSS domains, of both passive and aggressive types, with minimal overlap between the sets reached via organic search results and via sponsored ads. We also found over 2,400 support domains which aid the TSS domains in manipulating organic search results. Moreover, to our surprise, we found very little overlap with domains that are reached via abuse of domain parking and URL-shortening services, which was investigated previously. Thus, investigation of search-and-ad abuse provides new insights into TSS tactics and helps detect previously unknown abuse infrastructure that facilitates these scams.

## I Introduction

The Technical Support Scam (TSS), in which scammers dupe their victims into sending hundreds of dollars for fake technical support services, is now almost a decade old. It started with scammers making cold calls to victims claiming to be a legitimate technology vendor but has now evolved into the use of sophisticated online abuse tactics to get customers to call phone numbers that are under the control of the scammers.

In their pioneering research on TSS [54], Miramirkhani et al. explored both the web infrastructure used by tech support scammers and the tactics they used when a victim called a phone number advertised on a TSS website. They focused on TSS websites reached via malicious advertisements that are served by abusing domain parking and ad-based URL shortening services. Although their work provided important insights into how these services are abused by TSS, it has recently become clear that tech support scammers are diversifying their methods of reaching victims and the ways in which they convince these victims to call them on their advertised phone numbers.

Specifically, recent reports by the FTC and by search engine vendors suggest that scammers are turning to search engine results and the ads shown on search-results pages as novel ways of reaching victim users [11, 31, 5]. These new channels not only allow them to reach a wider audience but also allow them to diversify the ways in which they attempt to convince users to call them. As shown in Figure 1, several actions have been taken to stop TSS, but these scams continue to adapt and evade both law enforcement and technical safeguards.

In this paper, we perform the first systematic study of these novel search-and-ad abuse channels. We develop a model for generating tech-support related queries and use the resulting 2,600 queries as daily searches in popular and less popular search engines. By crawling the organic search results and ads shown in response to our queries (note that we follow a methodology that allows us to visit the websites of ads without participating in click-fraud), we discover thousands of domains and phone numbers associated with technical support scams. In addition to the traditional aggressive variety of technical support scams (where the pages attempt to scare users into calling them, part of Figure 3), we observe a large number of passive technical support scam pages which appear to be professional, yet nevertheless are operated by technical support scammers (Figures 2 and 3 show examples of such scams). Using network-amplification techniques, we show how we can discover many more scam pages present on the same network infrastructure, and witness the co-location of aggressive with passive scam pages. This indicates that a fraction of these aggressive/passive scams are, in fact, controlled and operated by the same scammers. We also discover that the lifetime of passive scam pages is significantly longer than that of aggressive scam pages and find that our collected scams have little-to-no overlap with the scams identified by Miramirkhani et al.’s system during the same period of time. This indicates that our system reveals a large part of the TSS ecosystem that remained, up until now, unexplored.

Our main contributions are the following:

• We design the first search-engine-based system for discovering technical support scams, and utilize it for eight months to uncover more than 9,000 TSS-related domains and 3,365 phone numbers operated by technical support scammers, present both in organic search results and in ads located on search-results pages. We analyze the resulting data and provide details of the abused infrastructure, the SEO techniques that allow scammers to rank well on search engines, and the long-lived support domains which allow TSS domains to remain hidden from search engines.

• We find that scammers are complementing their aggressive TSS pages with passive ones, which both cater to different audiences and, due to their non-apparent malice, have a significantly longer lifetime. We show that well-known network amplification techniques allow detection systems to not only discover more TSS domains but to also trace both aggressive and passive TSS back to the same actors.

• We compare our results with the ones from the recent TSS study of Miramirkhani et al. [54] and show that the vast majority of our discovered abusive infrastructure is not detected by prior work, allowing defenders to effectively double their coverage of TSS abuse infrastructure by incorporating our techniques into their existing TSS-discovering systems.

## II Methodology

We utilize a data-driven methodology to explore TSS tactics and infrastructure that is used to support search-and-ad abuse. To do this, we search and crawl the web to collect a variety of data about TSS websites, and use network-level information to further amplify such data. Our system, which is shown in Figure 4, implements TSS data collection and analysis functions, and consists of the following six modules:

1. The Seed Generator module generates phrases that are likely to be used in search queries to find tech support resources. It uses a known corpus of TSS webpages obtained from Malwarebytes [24] and a probabilistic language modeling technique to generate phrases that serve as input to search queries.

2. Using search phrases, the Search Engine Crawler (SEC) module mines search engines including popular ones such as Google, Bing, and Yahoo! for technical support related content appearing via search results (SRs) and sponsored advertisements (ADs). We also mine a few obscure ones such as goentry.com and search.1and1.com that we discovered are used by tech support scammers. The SR and AD URIs are candidates for active crawling.

3. The Active Crawler Module (ACM) then tracks and records the URI redirection events, HTML content, and DNS information associated with the URIs/domains appearing in the ADs and SRs crawled by the SEC module.

4. The Categorization Module, which includes a well-trained Technical Support Content Classifier (TSSC), identifies TSS SRs and ADs using the retrieved content.

5. The Network Amplification Module (NAM) uses DNS data to amplify signals obtained from the labeled TSS domains, such as the host IP, to expand the set of domains serving TSS, using an amplification algorithm.

6. Lastly, using the information gathered about TSS domains, the Clustering Module groups together domains sharing similar attributes at the network and application level.

### II-A Search Phrase Seed Generator

Selecting appropriate queries to feed the search engine crawler module is critical for obtaining suitable quality, coverage and representativeness for TSS web content. To do this, we must generate phrases that are highly likely to be associated with the content shown or advertised in TSS webpages. Deriving relevant search queries from a context specific corpus has been used effectively in the past for measuring search-redirection attacks [52]. We use an approach based on joint probability of words in phrases in a given text corpus [53].

We start with a corpus of 500 known technical support scam websites from the Malwarebytes technical support scam (TSS) domain blacklist (DBL) [24], whose content was available. After sanitizing the content in the corpus for stop words, we found 869 unigrams (single words). We further reduce this set by only considering words that appear in more than 10 websites, which leaves us with 74 unique words. Using the raw counts of unigrams, we compute the raw bigram probabilities of eligible phrases with the chain rule of probability. We then use the Markov assumption to approximate $n$-gram probabilities [25]. Once we have the probabilities of all phrases up to $n$-grams, we apply a probability threshold and pick, for each value of $n$, the phrases whose probability of occurrence exceeds the threshold. In effect, we develop a language model pertinent to technical support scam websites.

Table I shows the total number of phrases found for different values of $n$, along with some examples of the phrases found. We restricted the value of $n$ to 7, as larger values of $n$ did not yield any phrases that would be logical as search engine inputs to find online technical support scams. As the table shows, the model yields many popular phrases used in online technical support scams. In total, we were able to identify 2,600 English phrases that serve as search queries to the SEC module.
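The phrase-generation step can be illustrated with a minimal Python sketch. It assumes a first-order Markov (bigram) approximation and a single probability threshold; the function names, the toy corpus, and the threshold value are hypothetical and are not taken from the system described here.

```python
from collections import Counter

def train_counts(docs):
    """Count unigrams and bigrams over tokenized, stop-word-free documents."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def phrase_probability(phrase, unigrams, bigrams):
    """P(w1..wn) ~ P(w1) * prod P(w_i | w_{i-1}) under the Markov assumption."""
    total = sum(unigrams.values())
    prob = unigrams[phrase[0]] / total
    for prev, cur in zip(phrase, phrase[1:]):
        if unigrams[prev] == 0:
            return 0.0
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

def generate_phrases(docs, n, threshold):
    """Return n-word phrases whose estimated probability exceeds the threshold."""
    unigrams, bigrams = train_counts(docs)
    # Grow candidate phrases one word at a time, following observed bigrams only.
    phrases = [(w,) for w in unigrams]
    for _ in range(n - 1):
        phrases = [p + (w,) for p in phrases for w in unigrams
                   if bigrams[(p[-1], w)] > 0]
    scored = {p: phrase_probability(p, unigrams, bigrams) for p in phrases}
    return {" ".join(p): s for p, s in scored.items() if s > threshold}

# Toy corpus (hypothetical): each inner list is one sanitized webpage.
corpus = [
    ["microsoft", "tech", "support"],
    ["microsoft", "tech", "support", "number"],
    ["call", "tech", "support"],
]
queries = generate_phrases(corpus, n=3, threshold=0.15)
```

On this toy corpus only the most frequent three-word phrase survives the threshold; a real corpus would need a separate threshold for each value of $n$, as described above.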

### II-B Search Engine Crawler (SEC) Module

The SEC module uses a variety of search engines and the search phrases generated from the TSS corpus to capture two types of listing: traditional search results, sometimes also referred to as organic search results, and search advertisements, sometimes also referred to as paid/sponsored advertisements.

Both Google [14] and Bing [9] provide APIs that can be used to get SRs. However, some of the search engines we considered did not have well-documented APIs, and vanilla crawlers are either blocked or not shown content such as ADs. In such cases, we automate the process using PhantomJS [26], a headless WebKit browser that is scriptable with a JavaScript API. It allows us to capture a search page with both SR and AD listings as it would be shown to a real user visiting the search engine from a real browser.

Once we have the raw page p from the search engine in response to a query q, we use straightforward CSS selectors to separate the SRs from ADs. An SR object typically consists of basic components such as the SR title, the SR URI, and a short snippet of the SR content. An AD object typically consists of similar components, i.e., the AD title, the advertiser’s URI/domain name, and a short descriptive text. The advertiser also provides the URI the user should be directed to when the AD is clicked. In addition, an AD may also include an AD extension component which allows actions to be performed after the AD is rendered (e.g., call extensions that allow the advertiser to embed a phone number as a clickable call button). The main difference between the contents displayed in SRs and ADs is that the content shown in the former is what is seen by the search engine crawler whereas the content in the latter is provided directly by the advertiser. The SR/AD along with its components is logged into a database as a JSON object. The URI components of the ADs and SRs are then inserted into the ADC (AD crawling) and SRC (SR crawling) queues respectively, which then coordinate with the ACM to gather more information about them, as discussed next.

### II-C Active Crawler Module (ACM)

The ACM uses the ADC and SRC URI queues to gather more information relevant to an AD/SR. ACM has three submodules that keep track of the following information for each URI seen in the AD/SR: (i) URI tracking, (ii) HTML and Screenshot Capture, and (iii) DNS information. We now discuss each of the submodules corresponding to these.

URI Tracker: The purpose of the URI tracker is to follow and log the redirection events starting from the URI component seen in the AD/SR discussed in the previous module. Barring user clicks, our goal is to capture the sequence of events that a real-world user on a real browser would experience when directed to technical support scams from SR/AD results, and automate this process. Our system uses a combination of python modules PhantomJS [26], Selenium [27] and BeautifulSoup [6] to script a light-weight headless browser. Finally, to ensure wide coverage, we configure our crawlers with different combinations of Referer headers and User-Agents (we discuss the exact settings in Section III). Next, we discuss briefly how automating URI tracking (and other related events) can pose ethical challenges in the case of ADs and how we handle them.

Mimicking AD Clicks: When a user clicks on an AD, the click triggers a sequence of events in which the publisher, AD network and advertiser are involved, before the user lands on the intended webpage associated with the AD. This can be attributed to the way the monetization model behind ADs works [44]. For example, the domain name shown in an AD could be gosearch770.xyz while the source URI associated with it is hXXp://54080586.r.msn.com/?ld=d3S-92sO4zd0&u=www.gosearch770.xyz%2findex.php. Clicking on the AD may result in the flow of money from the advertiser to the AD network and publisher depending on the charging model such as Cost-per-click (CPC) or Pay-per-click (PPC). Clearly, the intent of our automated crawlers is not to interfere with this monetization model by introducing extraneous clicks. One alternative to actually clicking on the ADs and a way to bypass the AD network is to visit the advertiser’s domain name directly, while maintaining the Referer to be the search engine displaying the AD. In theory, any further redirections from the advertiser’s domain should still be captured.

We chose the strategy that follows the advertiser’s domain while ensuring that the same path (URIs and domain names) that leads to the technical support scam webpage is followed as if we had clicked on the AD. To validate if this was a viable option while maintaining accuracy of the data collection process, we conducted a controlled experiment in which we compared a small number of recorded URI resolution paths generated by real clicks to paths recorded while visiting the advertiser’s domain name directly. We did this for the same set of technical support ADs while keeping the same browser and IP settings. For a set of 50 fake technical support ADs from different search engines identified manually and at random, these paths were found to be identical. This gives us confidence that accurate URI tracking information can be collected for fake technical support ADs without affecting the originating AD networks. For SRs, we just simulate a click on the SR and follow the SR URI component of the SR object. Thus, the outcome of this submodule is the URI redirection path which includes the fully qualified domains (FQDNs) encountered and the method of redirection for both ADs and SRs.
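As an illustration of the direct-visit strategy, the following Python sketch extracts the advertiser's target from an AD click URI and describes a request that keeps the search engine as the Referer. It assumes the target is carried in a `u` query parameter, as in the example URI above; other AD networks may encode it differently, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def advertiser_target(ad_click_uri, param="u"):
    """Return the advertiser URL embedded in an AD click URI, or None.

    Assumption: the target sits in the query parameter named by `param`.
    """
    query = parse_qs(urlsplit(ad_click_uri).query)
    values = query.get(param)
    if not values:
        return None
    target = values[0]  # parse_qs has already percent-decoded the value
    if "://" not in target:
        target = "http://" + target  # assume plain HTTP when no scheme is given
    return target

def direct_visit_request(ad_click_uri, search_engine_url, user_agent):
    """Describe a request that bypasses the AD network but keeps the Referer."""
    target = advertiser_target(ad_click_uri)
    if target is None:
        return None
    return {
        "url": target,
        "headers": {"Referer": search_engine_url, "User-Agent": user_agent},
    }
```

The returned dictionary can be handed to whichever HTTP client or headless browser performs the crawl; no request is sent to the AD network's redirector, so no click is registered.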

HTML Crawler: The HTML crawler works in conjunction with the URI Tracker. This crawler captures both the raw HTML and visual screenshots of the webpages shown after following the ADs and SRs. For each domain $d$ and webpage $p$ in the path from an AD/SR to the final landing webpage, the crawler stores the full HTML source and an image of the webpage as it would have appeared in a browser in a database. It uses a combination of the domain name and timestamp as identifiers for this data, so that it can be easily referenced when needed. The content generated by this module is used in various other modules/submodules in order to decide the threat level of the AD/SR and whether it is a fake technical support AD/SR (Section II-D); to extract the toll-free number used (if any); and to cluster campaigns of technical support scams (Section II-F).

Active DNS Crawler: For each domain, $d$, in the path from an AD/SR to the final landing domain, the active DNS crawler logs the IP address, $ip$, associated with the domain to form a $(d, ip, t)$ triplet, based on the DNS resolution process at the time of crawling, $t$. This information is valuable for unearthing new technical support scam domains (Section II-E) and in studying the network infrastructure associated with cross-channel technical support scams (Section IV).

### II-D Categorization Module

Although we input technical support phrases to search engines with the aim of finding fake technical support websites, it is possible and even likely that some SRs and ADs lead to websites that are genuine technical support or sometimes even completely unrelated to technical support (i.e. non-technical support). The purpose of this module is to identify the TSS search listings while, at the same time, categorizing the remaining search listings for further analysis.

TSS Landing Page Categories: To categorize all search engine listings obtained during the period of data collection, we first divide the URIs collected from both ADs and SRs into two high-level categories: TSS and Non-TSS (i.e., those URIs that lead to technical support scam pages and those that lead to benign or unrelated pages). Within each category, we have subcategories: TSS URIs are further separated into those leading to aggressive TSS websites and those leading to passive TSS websites, as mentioned previously. Prior work [54] only focused on aggressive TSS pages, but we found that search-and-ad abuse also makes use of passive landing pages, which have a different modus operandi and are worth exploring.

Categorization Method: To identify the actual TSS websites, we utilize the following multi-step process:

1. We remove high reputation domains using the Alexa top websites list [4]. It is possible that a high reputation domain is compromised or abused and fake-technical support content is dropped into the website directory but we leave the detection of such unlikely instances for future work. This allows us to avoid certain types of false positives, such as flagging Best Buy’s Geek Squad [12] as a technical support scam.

2. We only retain ADs and SRs whose URL paths lead to final landing pages containing toll-free phone numbers. It is possible for fake-technical support scammers to use phone numbers that are non-toll free but we argue why toll-free numbers make a balanced investment for scammers in Section III-E.

3. Finally, we classify ADs/SRs as technical support or not using the TSS Webpage Classifier, discussed next. This helps weed out non-technical support websites such as blogs, complaint forums etc. which may contain some content (even toll-free numbers) related to technical support (which is perhaps why they appeared in the SRs/ADs in the first place).
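The toll-free filter in step 2 can be sketched as follows. The sketch assumes North American numbering and the toll-free prefixes 800, 888, 877, 866, 855, 844 and 833; the regular expression and function name are illustrative and are not the extraction logic used by the system.

```python
import re

# North American toll-free area codes (assumption: US/Canada numbers only).
TOLL_FREE_PREFIXES = ("800", "888", "877", "866", "855", "844", "833")

# Optional country code 1, then a 3-3-4 digit pattern with common separators.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]*)?\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})(?!\d)"
)

def toll_free_numbers(text):
    """Return normalized toll-free numbers found in page text."""
    found = []
    for area, exchange, line in PHONE_RE.findall(text):
        if area in TOLL_FREE_PREFIXES:
            found.append(f"{area}-{exchange}-{line}")
    return found

def has_toll_free_number(text):
    """Step 2 of the filter: keep a landing page only if this returns True."""
    return bool(toll_free_numbers(text))
```

A page with only geographic numbers is dropped at this stage, which is why the choice of toll-free numbers as a filter is argued separately in Section III-E.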

TSS Webpage Classifier: We determine an AD/SR as technical support or not based on the webpage content shown in the final landing domain corresponding to an AD/SR. We leverage the observation that a lot of fake technical support websites use highly similar content, language and words to present themselves. This can be represented as a feature vector where features are the words and values are the frequency counts of those words. Thus, for a collection of labeled technical support and non-technical support webpages, we can extract the bag of words after sanitization (such as removing stop words), and create a matrix of feature vectors where the rows are the final landing domains and the columns are the features. We can then train a classifier on these features which can be used to automatically label future webpages.

To that effect, we built a model using the Naive Bayes classification algorithm with 10-fold cross validation on a set comprising 500 technical support and 500 non-technical support webpages identified from the first few weeks of AD/SR data. The performance of the classifier is captured in the ROC curve shown in Figure 5. We see that a threshold of 0.6 yields an acceptable true positive rate (sensitivity/recall) of 98.9% and a false positive rate (1-specificity) of 1.5%. Moreover, the area under the curve (AUC), which is a measure of the overall accuracy of the trained model, is 99.33%, which gives us confidence that the technical vs. non-technical support webpage classification works well. The outcome at this stage, after running the classifier over new and incoming AD/SR data, is a set of final landing technical support webpages originating from an AD/SR. These webpages can be used to trace back and label the associated AD/SR and its corresponding URLs/domain names as relevant to fake technical support scams.
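To make the bag-of-words classification concrete, here is a minimal multinomial Naive Bayes sketch in plain Python. It is an illustration under stated assumptions, not the trained model: the class and label names are hypothetical, Laplace smoothing is an added assumption, cross validation is omitted, and only the 0.6 decision threshold comes from the text above.

```python
import math
from collections import Counter

class NaiveBayesText:
    """Multinomial Naive Bayes over word-frequency features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumption)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set().union(*(wc.keys() for wc in self.word_counts.values()))
        return self

    def _log_joint(self, tokens, c):
        total_docs = sum(self.doc_counts.values())
        logp = math.log(self.doc_counts[c] / total_docs)
        denom = sum(self.word_counts[c].values()) + self.alpha * len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are ignored
                logp += math.log((self.word_counts[c][w] + self.alpha) / denom)
        return logp

    def predict_proba(self, tokens, positive):
        """Posterior probability of the `positive` class for one document."""
        scores = {c: self._log_joint(tokens, c) for c in self.classes}
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        return exp[positive] / sum(exp.values())

def is_tss(model, tokens, threshold=0.6):
    """Label a page as technical support when the posterior reaches the threshold."""
    return model.predict_proba(tokens, "tss") >= threshold

# Toy training data (hypothetical), already tokenized and stop-word-free.
train_docs = [
    ["call", "support", "virus", "now"],
    ["microsoft", "support", "call", "toll", "free"],
    ["recipe", "chicken", "dinner"],
    ["football", "score", "news"],
]
train_labels = ["tss", "tss", "other", "other"]
model = NaiveBayesText().fit(train_docs, train_labels)
```

Sweeping the threshold over held-out pages and recording the true and false positive rates at each value is what produces an ROC curve such as the one in Figure 5.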

To further separate TSS URIs into those leading to passive and aggressive websites, we use the presence of features extracted from the HTML of the landing TSS website. Aggressive TSS webpages exhibit behavior that contributes to a false sense of urgency and panic through a combination of audio messages describing the problem and continuous pop-up messages/dialogue loops. On the other hand, passive TSS websites adopt the approach of seeming genuine. This is accomplished by using simple textual content, certifications, seals, and other brand-based images. They often present themselves as official tech support representatives of large companies and, because of their non-apparent malice (one would have to call these numbers to realize that they belong to scammers), pose new challenges for the detection of TSS [5]. Because of these differences between aggressive and passive TSS webpages, we look for JavaScript associated with pop-up dialogues, such as window.alert(), window.confirm(), and window.prompt(), as well as HTML audio tags to identify the aggressive ones.
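A minimal sketch of this aggressive/passive split is shown below. It assumes the page has already been classified as TSS and checks only the static HTML for the three dialogue calls and audio tags named above; dynamically injected scripts would need a rendered DOM, and the function name is hypothetical.

```python
import re

# Pop-up dialogue calls named in the text: window.alert/confirm/prompt.
DIALOG_CALL_RE = re.compile(r"\bwindow\.(?:alert|confirm|prompt)\s*\(")
# Opening HTML audio tag, case-insensitive.
AUDIO_TAG_RE = re.compile(r"<audio\b", re.IGNORECASE)

def tss_subtype(html):
    """Return 'aggressive' if the TSS page uses pop-up dialogues or audio,
    otherwise 'passive'."""
    if DIALOG_CALL_RE.search(html) or AUDIO_TAG_RE.search(html):
        return "aggressive"
    return "passive"
```

Matching only the `window.`-qualified calls keeps the heuristic close to the description above, at the cost of missing pages that call `alert()` without the qualifier.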

Although in this work we are primarily interested in URIs leading to TSS websites, we also bucket AD/SR URIs that lead to non-TSS websites. We do this to give the reader a sense of all search listings appearing next to technical support related search queries. The subcategories include legitimate technical support websites, blogs/forums containing spam or mentioning technical support related matters, complaint websites containing technical support related posts, news websites with technical support content, and an “uncategorized” bucket containing everything else.

### II-E Network Amplification Module

Using search listings to identify active TSS websites works well for creating an initial level of intelligence around these scams. However, it may be possible to expand this intelligence to uncover more domains supporting TSS that may have been missed by our crawler (possibly because the domains were not actively participating in AD or SEO activity at the time). These domains may be dormant, perhaps waiting to be circulated at a later stage. However, the give-away for these additional TSS domains could be the sharing of network-level infrastructure with already identified TSS domains. Once we have a set of labeled final-landing domains, $D$, related to fake technical support websites originating from ADs/SRs, we leverage the properties of the Domain Name System (DNS) to find more fake technical support websites via an amplification process which works as follows.

A DNS request results in a domain name, $d$, being resolved to an IP address, $ip$, at a particular time, $t$, forming a $(d, ip, t)$ tuple. For each domain, $d \in D$, we compute two sets: (i) $RHIP(d)$, which is the set of all IPs that have mapped to domain $d$ as recorded by the DNS Crawler (Section II-C) within time window $T$, and (ii) $RHDN(ip)$, which is the set of domains that have historically been linked with the $ip$ or its subnet in the set $RHIP(d)$ within time window $T$, where $T$ is a unit of time (typically one week). Next, we compute $D_r(d)$, which represents all the domains related to $d$ at the network level, as discovered by the RHIP-RHDN expansion. Now, for each domain $d' \in D_r(d)$, we check if the webpage associated with it is a TSS webpage using the classifier module (Section II-D). If it is, we add $d'$ to an amplification set, $A(d)$, associated with $d$. The cardinality of the eventual amplification set, $|A(d)|$, gives us the amplification factor of $d$. Finally, we define the expanded set of TSS domains, $E = \bigcup_{d \in D} A(d)$, as the union of all amplification sets. Combining the initial set of domains, $D$, with the expanded set, $E$, gives us the final set of fake technical support domains, $D_f = D \cup E$.
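The amplification procedure can be sketched in Python as follows. The sketch assumes the historic DNS data has already been reduced to two lookup tables for the chosen time window; the function and parameter names are hypothetical, subnet-level matching is omitted, and excluding already-labeled seed domains from the amplification sets is a choice made here so that the amplification factor counts only newly found domains.

```python
def amplify(seed_domains, rhip, rhdn, is_tss_page):
    """Expand a set of labeled TSS domains using shared hosting infrastructure.

    seed_domains: iterable of labeled final-landing TSS domains
    rhip: dict mapping a domain to the set of IPs it resolved to in the window
    rhdn: dict mapping an IP to the set of domains historically seen on it
    is_tss_page: callable(domain) -> bool, a stand-in for the webpage classifier
    """
    seeds = set(seed_domains)
    amplification_sets = {}
    for d in seeds:
        related = set()
        for ip in rhip.get(d, ()):
            related |= rhdn.get(ip, set())
        related -= seeds  # assumption: only count domains not already labeled
        amplification_sets[d] = {c for c in related if is_tss_page(c)}
    expanded = set().union(*amplification_sets.values())
    return amplification_sets, expanded, seeds | expanded

# Toy data (hypothetical), using reserved documentation names and addresses.
rhip = {"seed.example": {"192.0.2.10"}}
rhdn = {"192.0.2.10": {"seed.example", "scam2.example", "blog.example"}}
known_tss = {"scam2.example"}
sets, expanded, final = amplify(["seed.example"], rhip, rhdn, known_tss.__contains__)
amplification_factor = len(sets["seed.example"])
```

In this toy run the co-hosted blog is discarded by the classifier stand-in, so only one new domain is added and the amplification factor of the seed is 1.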

The data pertaining to historic DNS resolutions comes from the ActiveDNS Project [3], while the webpages associated with the new domains are obtained by the active HTML crawler module (Section II-C) and, when required, the Internet archive [39]. The final technical support domain set is processed further for analysis.

### II-F Clustering Module

The purpose of the clustering module is to identify different TSS campaigns. For example, one campaign may offer technical support for Microsoft whereas another one may target Apple users. We identify the campaigns by finding clusters of related domain names associated with abuse in a given time period or epoch $t$. Once we have the final set of TSS domains, a two-step hierarchical clustering process is used. In the first level, referred to as Network CLustering (NCL), we cluster together domain names based on their network infrastructure properties. In the second level, referred to as Application CLustering (ACL), we further separate the network-level clusters based on the application-level web content associated with the domains in them. This process allows us to produce high-quality clusters that can then be labeled with campaign tags.

In order to execute these two different clustering steps, we employ the most common statistical features from the areas of DNS [41] and HTML [61, 64] modeling to build our feature vector. This feature vector embeds network information about not just the final landing domain $d$, but also all the domains supporting $d$, based on the redirection path to $d$. The vector also captures the agility of the domains: if $d$ resolved to multiple different IPs over time, this information would be present. We use Singular Value Decomposition (SVD) [67] to reduce the dimensionality of the sparse feature matrix, and the network clustering module then uses the X-Means clustering algorithm [58] to cluster domains having similar network-level properties. To further refine the clusters, we use features extracted from the full HTML source of the web pages associated with the domains in the final TSS domain set. We compute a TF-IDF vector over the bag of words for each cluster [61]. Since the matrix is expected to be quite sparse, the application clustering submodule performs dimensionality reduction using SVD, as in NCL. Once we have the reduced application-based feature vectors representing the corresponding domains, this module too uses the X-Means clustering algorithm to cluster domains hosting similar content.

Campaign Labels: This submodule is used to label clusters with keywords that are representative of a campaign’s theme. Let $C$ be a cluster produced after NCL and ACL, and let $D_C$ be the set of domains in the cluster. For each domain $d \in D_C$, we create a set $U(d, T_d)$ that consists of all the parts of the domain name $d$ except the effective top level domain (eTLD) and all parts of the corresponding webpage title $T_d$, e.g. U(‘abc.exampledomain.com’, ‘title’) = {abc, exampledomain, title}. Next, we compute the set of words $W(U(d, T_d))$ using the Viterbi algorithm [45]. Therefore, W(U(‘abc.exampledomain.com’, ‘title’)) = {example, domain, title}, since ‘abc’ is not a valid English word, and W(U(‘virusinfection0x225.site’, ‘System Shutdown Call 877-563-1632’)) = {virus, infection, system, shutdown, call}. Using $W$, we increment the frequency counters for the words ‘example’, ‘domain’ and ‘title’ in a cluster-specific dictionary. In this manner, after iterating over all domains in the cluster, we get a keyword-to-frequency mapping from which we pick the most frequent word(s) to attribute to the cluster. Identifying campaigns this way allows us to study properties related to the campaign more readily.
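The labeling procedure can be sketched as follows. The word-splitting step here is a simplified dynamic program in the spirit of Viterbi segmentation: it maximizes the number of characters covered by dictionary words instead of using word probabilities. The vocabulary, the explicit eTLD argument (a real system would consult a public suffix list), and all function names are assumptions for illustration.

```python
import re
from collections import Counter

def segment(text, vocabulary):
    """Split text into dictionary words, skipping characters no word covers."""
    max_len = max(map(len, vocabulary), default=0)
    # best[i] holds (covered_characters, words) for the prefix text[:i].
    best = [(0, [])]
    for i in range(1, len(text) + 1):
        cand = best[i - 1]  # option: leave character i-1 uncovered
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if piece in vocabulary:
                score = best[j][0] + len(piece)
                if score > cand[0]:
                    cand = (score, best[j][1] + [piece])
        best.append(cand)
    return best[-1][1]

def name_parts(domain, etld, title):
    """U(d, T_d): domain labels without the eTLD, plus lowercase title tokens."""
    domain = domain.lower()
    stem = domain[: -len(etld)].rstrip(".") if domain.endswith(etld) else domain
    parts = [p for p in stem.split(".") if p]
    return parts + re.findall(r"[a-z0-9]+", title.lower())

def label_cluster(pages, vocabulary, top_k=1):
    """pages: iterable of (domain, etld, title). Returns the top_k keywords."""
    counts = Counter()
    for domain, etld, title in pages:
        words = set()
        for part in name_parts(domain, etld, title):
            words.update(segment(part, vocabulary))
        counts.update(words)  # each word counts once per domain
    return [w for w, _ in counts.most_common(top_k)]

# Toy vocabulary and cluster (hypothetical).
vocab = {"virus", "infection", "system", "shutdown", "call",
         "example", "domain", "title"}
cluster = [
    ("virusinfection0x225.site", "site", "System Shutdown Call 877-563-1632"),
    ("virusalert.example", "example", "Help desk"),
]
label = label_cluster(cluster, vocab)
```

With a probabilistic word list, the coverage score would be replaced by summed log-probabilities, which is closer to the Viterbi formulation cited above.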

## III Results

We built and deployed the system described in Section II to collect and analyze SR and AD domains for TSS. Although the system continues to be in operation, the results discussed in this section are based on data that was collected over a total period of 8 months in two distinct time windows, first from April 1 to August 31, 2016, and again from January 1 to March 31, 2017, to study the long-running nature of TSS.

Infrastructure Setup: We deploy two distinct nodes on a university network where the SEC and ACM modules for data collection run. One is a desktop class machine with 16GB RAM, a 3.1 GHz quad-core Intel Core i5 processor that runs Mac OS X 10.11. This node simultaneously runs the same data collection code on a virtual machine with Windows Vista guest OS. The other node is a server class machine with 32GB RAM, 8 Intel Xeon quad core processors that runs the Debian 3.2.68 OS. We set the User Agent (UA) to be a version of Chrome, Internet Explorer and the Firefox browser respectively, covering the most commonly used browsers. The Referer field is set based on the search engine to which the process thread is attached. We clear the cookie field every time we query a search engine or make a request to an AD/SR URI. The IP addresses of the nodes are static and assigned from the university subnet. Previous studies [54] have shown that it is more effective to perform such threat data collection from university networks rather than from a public cloud infrastructure. We made similar observations from an experiment we conducted and chose the university network for our work. To make sure that none of the search engine operators throttle our crawlers, we rate limit the number of queries sent each day to a particular search engine.

We crawled 5 search engines for both ADs and SRs, which include Google.com, Bing.com, Yahoo.com, Goentry.com and search.1and1.com. The first three are popular search engines used daily by users while goentry was chosen because it has been linked with browser hijacking and serving unwanted ADs [28, 13]. The last search engine was added to the list after we encountered regular references/links to it among goentry ADs. Each day, the SEC module automatically sends 2,600 different queries, as discussed in Section II-A for technical support-related terms (e.g. microsoft tech support) to the various search engines. It stores the AD and SR URIs returned. We consider the top 100 SR URIs (unless there are fewer) while recording all the AD URIs displayed for each query.

### III-A Dataset Summary

In total we collected 14,346 distinct AD URIs and 109,657 distinct SR URIs. Table II presents the breakdown of all the search listings into the different categories. The AD URIs mapped to 4,954 unique Fully Qualified Domain Names (FQDNs), while the SR URIs mapped to 20,463 unique FQDNs. Among the AD URIs, 10,299 (71.79%) were observed as leading to TSS websites. This is a significant portion and shows that ADs related to technical support queries are dominated by those that lead to real scams. It also means that technical support scammers are actively bidding in the AD ecosystem to flood the AD networks with rogue technical support ADs, especially in response to technical support queries. Such prevalence of TSS ADs is the reason why Bing announced a blanket ban of online tech support ADs on its platform [8] in mid-May 2016. The TSS AD URIs mapped to 2,132 FQDNs. Among the TSS AD URIs and corresponding FQDNs, we found both aggressive and passive websites. More than two thirds of the URIs led to aggressive websites. The ratio between aggressive and passive websites was closer to 4:3 when considering just the TSS AD FQDNs. Past research has only investigated aggressive TSS websites, but our results show that passive websites are also a serious problem.

We did observe legitimate technical support service AD URIs and FQDNs. These comprised about 13.19% of all AD URIs and 29.10% of all AD FQDNs. There were no ADs that pointed to blogs/forums, complaint websites or news sites. About 15% of the AD URIs remained uncategorized. However, it is worth mentioning that on manual inspection, one set of the URIs/domains seen in the uncategorized bucket led to other shady (and perhaps temporary) search portals such as govtsearches.com, finecomb.com, us.when.com and many more. These search portals show more ADs and SRs in response to the original search query. This pattern of creating on-the-go search portals and linking them to each other via ADs to form a nexus is intriguing and worthy of exploration in itself. We leave this for future work.

Among the SR URIs, 59,500 (54.26%) were observed leading to TSS websites. These URIs mapped to 3,583 (17.51%) FQDNs. Among the TSS SR URIs, we again found those leading to both aggressive and passive TSS varieties. The sheer number of such URIs is surprising as, unlike ADs, it is harder to manipulate popular search engine algorithms to make rogue websites appear in search results. However, as we discuss later, we observe that by using black hat SEO techniques, TSS actors are able to trick the search engine ranking algorithms. As with ADs, most TSS SR URIs were aggressive: almost 76% of TSS SR URIs led to aggressive TSS websites while the remainder led to passive TSS websites, again pointing to the prevalence of the common tactic of scare and sell [30]. Although TSS SR URIs were frequently seen interspersed in search results, the SR URIs also included non-TSS ones. Among these we observed 3.39% legitimate technical support service URIs, 9.13% blog/forum URIs, 9.12% URIs linked to complaint websites and 11.05% URIs pointing to news articles (mostly on technical support scams). The remaining 13.05% of URIs were uncategorized.

We also report aggregate statistics for FQDNs after combining AD and SR data. In total, we found 5,134 TSS FQDNs, with URIs corresponding to 3,166 FQDNs leading to aggressive websites and 1,968 leading to passive websites. Together these comprise about 22.1% of the 23,195 FQDNs retrieved from the entire dataset. One interesting observation is that the majority of the FQDNs seen in ADs were not seen in the SRs and vice versa: the overlap between the TSS AD FQDNs and the TSS SR FQDNs was small, consisting of only 581 FQDNs. This suggests that the resources deployed for TSS ADs are different from those appearing in TSS SRs.
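The combined FQDN count follows from inclusion-exclusion over the AD and SR sets. The short check below reproduces the arithmetic from the figures reported in this section; the function name `union_size` is only illustrative.

```python
def union_size(size_a, size_b, size_overlap):
    """Size of the union of two sets given their sizes and the size of their overlap."""
    return size_a + size_b - size_overlap


tss_ad_fqdns = 2132   # TSS FQDNs reached via ADs
tss_sr_fqdns = 3583   # TSS FQDNs reached via SRs
overlap = 581         # TSS FQDNs seen in both ADs and SRs

combined = union_size(tss_ad_fqdns, tss_sr_fqdns, overlap)
share_of_all_fqdns = combined / 23195  # 23,195 FQDNs in the entire dataset

print(combined)                                # 5134
print(round(share_of_all_fqdns * 100, 1))      # 22.1
```

The result also matches the split by type: 3,166 aggressive plus 1,968 passive FQDNs.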

Support and Final-landing TSS domains: The purpose of support domains is to conduct black hat SEO and redirect victims to TSS domains, not to host TSS content directly. We found that 38.3% of the TSS search listing URIs did not redirect to a domain different from the one in the initial URI, while the remaining 61.7% did. We found an additional 2,435 support domains. Moreover, one might expect the use of popular URL shortening services such as bit.ly or goo.gl for redirections and obfuscation, but this was rarely the case, which we found surprising.

When a TSS URI appearing in the search listings is clicked, it leads to the webpage that lures the victim into the technical support scam. This webpage could be hosted on the same domain as the domain of the URI, or on a different domain. We refer to this final domain name associated with the technical support scam webpage as the final landing TSS domain. Furthermore, it is possible that the path from the initial SR/AD URI to the final landing webpage consists of other intermediate domains, which are mainly used for the purpose of redirecting the victim’s browser. This is discussed in Section II-C. Figure 5(a) plots the number of final-landing TSS domains discovered by our system over time across the various search engines. A bi-weekly trend shows that, across all search engines, we are able to consistently find hundreds of final-landing TSS domains and webpages. Bing, Google, Goentry, Yahoo and search.1and1.com all act as origination points to technical support scam webpages. This suggests that these specialized scammers are casting a wide net. Starting mid-May 2016, we see a sudden dip in the number of TSS domains found on Bing. We suspect that this is most likely correlated with Bing’s blanket ban on technical support advertisements [8, 7]. However, as we can see, activity, contributed mainly by SR-based TSS, picked up again during July 2016, continuing an upward trend from January to March 2017.

Goentry, which was a major source of technical support ADs leading to final landing TSS domains during our initial period of data collection, saw a significant dip during the second time window. We suspect this may be due to our data collection infrastructure being detected or to law enforcement actions against technical support scammers in India [18, 17], which is where the website is registered. In total we were able to discover 1,626 unique AD originated final landing TSS FQDNs, and 2,682 unique SR originated final landing TSS FQDNs. Together, we were able to account for 3,996 unique final landing TSS FQDNs that mapped to 3,878 unique final landing TSS TLD+1 domain names.

### III-B Search Phrases Popularity and SR Rankings

Since we use search queries to retrieve SRs and ADs, one question is the popularity of the search phrases used in these queries, which can serve as an indicator of how frequently they are used to find tech support related websites. We use the popularity level derived from Google’s keyword planner tool [20] that is offered as part of its AdWords program. The popularity of a search phrase is measured in terms of the average number of global monthly searches for the phrase during the time period of data collection. Figure 5(b) shows the distribution of technical support search phrases based on their popularity. We can see that out of the 2,600 phrases associated with TSS, about one third (32.7%) were of very low popularity, e.g. ‘kaspersky phone support’ with fewer than 100 average global monthly searches, one third (33.5%) were of low popularity, e.g. ‘norton antivirus technical support’ with 101-1,000 hits per month on average, while 25.1% of the phrases had medium levels of popularity, e.g. ‘hp tech support phone number’ with 1,001-10,000 average hits. At the higher end, 7% of the technical support phrases had moderately high levels of popularity, e.g. ‘dell tech support’ and ‘microsoft support number’ with 10,001-100,000 hits per month on average, and 1.7% of the technical support search phrases were highly searched for, e.g. ‘lenovo support’ with greater than 100,000 hits per month globally. As we can see, we have a fairly even distribution of technical support search terms with varying levels of popularity ranging from low to high (in relative terms).
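The five popularity bands above can be expressed as a simple bucketing function. This is a sketch for illustration: the name `popularity_band` is hypothetical, and placing a phrase with exactly 100 average monthly searches in the "very low" band is an assumption, since the text leaves that boundary unspecified.

```python
def popularity_band(avg_monthly_searches):
    """Map average global monthly searches to the popularity bands used in the text."""
    if avg_monthly_searches <= 100:      # assumption: exactly 100 counts as very low
        return "very low"
    if avg_monthly_searches <= 1_000:
        return "low"
    if avg_monthly_searches <= 10_000:
        return "medium"
    if avg_monthly_searches <= 100_000:
        return "moderately high"
    return "high"
```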

One may expect that less popular search terms are prone to manipulation in the context of both ADs and SRs, while more popular ones are harder to manipulate due to competition, making it more difficult for the technical support scammers to promote their websites via bidding (in the case of ADs) or SEO (in the case of SRs). To validate this, we measure the number of total TSS URIs found per search phrase (referred to as the pollution level), as a function of the popularity of the phrase. Since the popularity levels of phrases are gathered from Google, we only consider the TSS URIs (both AD and SR) seen on Google to make a fair assessment. Figure 5(c) depicts a box plot that captures the pollution levels for all search phrases grouped by popularity level, except the ones with very low popularity. By comparing the median number of TSS URIs (depicted by the red lines) across popularity bands, we see that as the popularity level of a search term increases, the pollution level (i.e. the absolute number of TSS URIs) decreases. We can make several additional observations: (i) there is definite pollution irrespective of the popularity level: in other words, more than a single TSS URI appeared in almost all of the technical support search queries we considered, as can be seen from the floor of the first quartile in every band; (ii) while many (50%) low popularity search terms (e.g. those with 101-1,000 hits per month) yielded 28 or more TSS URIs, there were outliers even among the high popularity search terms that accounted for the same number of TSS URIs or even more; and lastly, (iii) the number of TSS URIs discovered per query varied more widely for low popularity terms than for higher popularity terms.
Overall, these results indicate that TSS scammers are intent on (i) pushing their target websites among high-impact results, in spite of the challenges in doing so, while (ii) simultaneously picking low-hanging fruit by widely spreading their websites among the search listings associated with less popular technical support search queries.
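The per-band comparison behind the box plot amounts to grouping the pollution level of each phrase by its popularity band and taking the median. The sketch below assumes the input is a list of `(band, tss_uri_count)` pairs, one per search phrase; that layout, the function name `median_pollution`, and the toy numbers are illustrative rather than taken from the study's data.

```python
from collections import defaultdict
from statistics import median


def median_pollution(records):
    """records: iterable of (popularity_band, number_of_TSS_URIs) pairs, one per phrase."""
    by_band = defaultdict(list)
    for band, tss_uri_count in records:
        by_band[band].append(tss_uri_count)
    return {band: median(counts) for band, counts in by_band.items()}


# Toy data only; not measurements from the paper.
toy_records = [("low", 30), ("low", 28), ("low", 10), ("high", 5), ("high", 3)]
medians = median_pollution(toy_records)
```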

To effectively target victims, it is not enough merely to make TSS URIs appear among the search results. It is also important to make them appear high in the search rankings. To measure this, we show the distribution of TSS SR URIs based on their ranking/position among the search results for different search engines. We use four brackets to classify the TSS SR URIs based on their actual position: 1-25 position (high rank), 26-50 position, 51-75 position and 76-100 position (low rank). If the same URI appears in multiple search positions, for example on different days, we pick and associate the higher of the positions with the URI. We do this to reflect the worst-case impact of a TSS SR URI. Thus, each unique URI is eventually counted only once. Figure 5(d) summarizes our findings. We see that all 5 search engines return TSS URIs that are crowding out legitimate technical support websites by appearing high in the rankings. This makes it hard to trust a high ranking URI as legitimate. Bing had the highest percentage (30.4%) of its TSS URIs appearing in the top 25 search results, followed by Yahoo (27.6%), Goentry (24.8%), search.1and1 (22.6%) and Google (19.6%). Note that here we are not comparing the absolute number of TSS URIs between the search engines. TSS URIs are seen distributed across all position bands, again pointing to the pervasive nature of the TSS pollution problem.
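The counting rule above (keep only the best position ever observed for each URI, then assign it to one of four 25-wide brackets) can be sketched as follows. The function names and the `(uri, position)` input layout are assumptions made for illustration.

```python
def best_positions(observations):
    """observations: iterable of (uri, position); keeps the highest rank (smallest number) per URI."""
    best = {}
    for uri, position in observations:
        if uri not in best or position < best[uri]:
            best[uri] = position
    return best


def bracket(position):
    """Map a rank in 1..100 to its bracket label: 1-25, 26-50, 51-75 or 76-100."""
    if not 1 <= position <= 100:
        raise ValueError("only the top 100 search results are considered")
    start = ((position - 1) // 25) * 25 + 1
    return f"{start}-{start + 24}"
```

For example, a URI seen at position 60 on one day and position 12 on another is counted once, in the 1-25 bracket.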

### III-C Network Amplification Efficacy

The network-level amplification approach did pose a number of challenges. The first challenge is that technical support websites are sometimes hosted on public cloud infrastructure. Thus, the rhip-rhdn set for such domains can yield an overwhelming number of domains for the TSS webpage classifier to process. We avoid this by excluding rhip-rhdn sets whose size exceeds a reasonable operator-specified threshold. The other challenge is that the webpages associated with the rhip-rhdn domains are sometimes not retrievable. This could be because the webpage is parked, taken down or expired. Further, even the Internet archive may not have snapshots of the webpage associated with the domain in the desired time window. In such cases, we are forced to exclude the domain from further consideration even when there is evidence of it being linked to technical support scams, e.g. based on the domain name itself.

Using these heuristics, and dropping any domains having amplification factor , we are conservatively left with only 2,623 domains that contributed to the rhip-rhdn expansion set. Figure 7 plots the cumulative distribution of the amplification factor of these domains. As we can see, around 60% domains had while the remaining 40% domains had , with the maximum value equal to 275. Note that the amplification sets of different domains could overlap. Also worth noting is that a low amplification value does not necessarily mean that there are no other TSS domains on the subnet, as some of the DNS records associated with domains on the network may not have been previously recorded/seen by the deployed sensors. With ISP scale DNS records, the amplification values can potentially be much greater. In all, the total number of unique FQDNs hosting TSS content is 9,221, with 3,996 TSS FQDNs coming from the final landing websites in search listings and 5,225 additional TSS FQDNs discovered as a result of network-level amplification. These 9,221 FQDNs mapped to 8,104 TLD+1 domains. Thus, even though amplification is non-uniform, it helps in discovering domains that may not be visible through search listings alone.

The network amplification process allowed us to identify 840 passive-type TSS domains co-located with one or more aggressive TSS domains. This indicates that some of the passive scams are operated by the same scammers who operate the aggressive ones. This is likely part of a diversification strategy where, depending upon the method of retrieving users, scammers can show different types of pages: e.g. aggressive ones for those involved in “malvertising” redirections and passive ones for those that are already in the market for technical support services.

### III-D Domain Infrastructure Analysis

In this section, we analyze all the domain names associated with technical support scams discovered by our system. This includes the final landing domains that actually host TSS content as well as support domains, whose purpose is to participate in black hat SEO or serve as the redirection infrastructure.

Most abused TLDs: First, we analyze the final landing TSS domain names. Table III shows the most abused TLDs in this category. The .com TLD appeared in 25.56% of final landing TSS domain names, making it the most abused TLD. Next, 16.21% of domain names had .xyz as the TLD, making it the second most abused TLD. .info, .online and .us each accounted for more than 6% of domain names, completing the top five in this category. Other popular gTLDs included .website, .site, .tech and .support, while the ccTLDs included .in, .tk, .co and .tf. Among the support domains, the top three most popular TLDs were .xyz, .win and .space. Although .xyz was once again very popular, as in the case of the final landing TSS domains, both .win and .space were exclusive to this category. We also compared the TLDs associated with the final landing TSS domain names with those discovered by ROBOVIC, the system developed by Miramirkhani et al. [54], for an overlapping data collection period from January to March of 2017. We found that 4 out of the top 10 TLDs associated with TSS domains served by abusing domain-parking and ad-based URL shortening services were different from those discovered in our dataset. The TLDs that were rarely visible in our dataset included .club, .pw, .trade and .top. Thus, there are differences in domain name registration preferences between these two tactics.

Domain Lifetimes: Next, we look at the lifespan of final landing and support domains. The lifetime of a final landing TSS domain is derived by computing the difference between the earliest and most recent dates on which the domain was seen hosting TSS content. This computation is based on data from our crawler and the Internet archive. The lifetime of a support domain is derived from the earliest and the most recent dates on which the domain was seen redirecting to a final-landing TSS domain. Figure 8 plots the lifetimes of these two categories of domains, with the final landing domains split up into the passive and aggressive types. Final landing TSS domains of the aggressive type had a median lifetime of 9 days, with close to 40% of domains having a lifetime between 10 and 100 days, and the remaining 10% of domains having a lifetime greater than 100 days. In comparison, final landing TSS domains of the passive type had a much longer median lifetime of 100 days. Some of the domains in this category had a lifetime of over 300 days. Clearly, passive TSS domains outlast those of the aggressive type. The reason for this could be attributed to the nature of these domains, with the aggressive domains being clear candidates for reporting/take-down and the passive ones getting the benefit of the doubt (as they tend to appear legitimate and conduct the fraud mainly via the phone channel). Irrespective of the reason, it suggests that passive TSS websites have the potential to do harm for long time periods. In comparison, support domains had a median lifetime of 60 days, with 33% of domains having a lifetime greater than 100 days. Generally, this is a longer lifetime relative to final landing TSS domains of the aggressive type. It indicates that the domains that are used for the sole purpose of black hat SEO or redirection are relatively stable and reusable (due to their long-lived nature), which helps them redirect to final landing TSS domains when desired and yet remain unnoticed.
As we discuss later, in addition to blacklisting the final landing domains, take down/blacklisting of these support domains would lead to a more effective defense in breaking parts of the TSS abuse infrastructure.
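The lifetime definition above (the gap between the earliest and the most recent sighting of a domain) is straightforward to compute. The sketch below assumes a mapping from each domain to the list of dates on which it was observed; the function names, the `.example` domains and the dates are illustrative only.

```python
from datetime import date
from statistics import median


def lifetime_days(sighting_dates):
    """Days between the earliest and most recent sighting; a single sighting gives 0."""
    return (max(sighting_dates) - min(sighting_dates)).days


def median_lifetime(sightings_by_domain):
    """Median lifetime, in days, across all domains in the mapping."""
    return median(lifetime_days(dates) for dates in sightings_by_domain.values())


# Toy data only; not observations from the study.
toy_sightings = {
    "a.example": [date(2016, 5, 1), date(2016, 5, 10)],
    "b.example": [date(2016, 6, 1)],
    "c.example": [date(2016, 1, 1), date(2016, 4, 10)],
}
```

Note that this measure is a lower bound on the true lifetime, since a domain may have been active before its first sighting or after its last.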

Overlap with Blacklists: Using domains and phone numbers from a large number of public blacklists (PBLs) [24, 1, 15, 38, 23, 33, 32, 19, 29, 16, 2, 22], we verify if and when a TSS resource appeared in any of the PBLs. We collected data from these lists from January 2014 until April 2017, encompassing the AD/SR data collection period, which allows us to make fair comparisons.

We start with 800notes.com, which is a crowdsourced directory of unknown callers. It consists of complaints by users who post about telephony scams, not just technical support scams. We extracted phone numbers and domain names appearing in the complaints. We find that only 14.2% of final landing TSS FQDNs were reported in the complaints.

Next, we look at a more exclusive set of TSS blacklists released by Malwarebytes. These public blacklists are specific to TSS and are regularly updated. According to their website, they use both crowdsourced and internal investigations to generate the lists. Over time, 4,949 unique FQDNs and 1,705 phone numbers appeared on these lists. We found that 18.1% of the final landing TSS FQDNs identified by our system were also listed in these lists. As for phone numbers, 20.3% of those in our TSS dataset were also seen in the lists.

Next, we queried the domains against the Google Safe Browsing list using their API. We found that 9.6% of final landing TSS FQDNs and 5.2% of second-level domains (TLD+1) from those identified by our system as fake technical support were also listed in Google’s system, and all were labeled as “Social Engineering.” Since Google does not have a public list of abusive phone numbers, we leave the corresponding field blank. Lastly, we checked PBLs that typically include botnet C&C domains, malware sites and other unsafe domains serving malicious content. These include all the lists mentioned before except 800notes, the Malwarebytes TSS list, and the Google Safe Browsing list. We find that together these cover just 5.3% of FQDNs and 3.4% of TLD+1 domains from our list.

Next, we check the domains against VirusTotal, which comprises feeds from multiple AV engines. We found that 22.6% of final landing TSS FQDNs and 10.8% of TLD+1s were listed on it. While this list gave the greatest coverage in TSS domain name blacklisting, we still found significant scope for improvement in terms of coverage. Moreover, it is still lower (in relative terms) than the findings of Miramirkhani et al. [54], where close to 64% of their TSS domain set was listed on VirusTotal. We suspect that this is in part due to some of the passive TSS domains, which largely go undetected. Overall, these results are not very surprising since these are traditional blacklists whose intelligence is targeted towards other types of abusive domains, such as botnet domains. A similar outcome in terms of the efficacy of these lists has been reported in a study of SMS-spam domain abuse [64].

Cumulatively, these lists cover only 26.8% of the FQDNs that were found to be involved in TSS by our system. Moreover, out of the 26.8% blacklisted FQDNs, 8.2% were already present in one of the lists when our system detected them, while the remaining 18.6% were detected by our system 26 days in advance, on average. Moreover, when we cross-listed the support domains against these lists, we found that only 1% of them were present in any of these lists. This reinforces the point made in Section III-D regarding blacklisting support domains for effective defense against TSS. Table IV summarizes these findings. This analysis suggests that while exclusive TSS blacklists are a good idea alongside traditional PBLs, there is much scope for improvement by detecting these domains using an automated system such as ours.
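The three figures reported above (overall coverage, the share already listed at detection time, and the average lead time for the rest) can be derived from two date mappings. This is a sketch under assumptions: the input layout, the name `blacklist_stats`, and the toy dates are hypothetical, and a domain listed on the same day it was detected is treated here as already listed.

```python
from datetime import date


def blacklist_stats(detected, blacklisted):
    """detected: {fqdn: date our system flagged it}; blacklisted: {fqdn: earliest date on any list}."""
    listed = [d for d in detected if d in blacklisted]
    already = [d for d in listed if blacklisted[d] <= detected[d]]
    lead_days = [(blacklisted[d] - detected[d]).days for d in listed if blacklisted[d] > detected[d]]
    total = len(detected)
    return {
        "coverage": len(listed) / total,
        "already_listed": len(already) / total,
        "detected_first": len(lead_days) / total,
        "mean_lead_days": sum(lead_days) / len(lead_days) if lead_days else None,
    }


# Toy data only; not observations from the study.
toy_detected = {
    "a.example": date(2016, 6, 10),
    "b.example": date(2016, 6, 10),
    "c.example": date(2016, 6, 10),
    "d.example": date(2016, 6, 10),
}
toy_blacklisted = {
    "a.example": date(2016, 6, 1),    # listed before detection
    "b.example": date(2016, 6, 20),   # listed 10 days after detection
    "c.example": date(2016, 6, 30),   # listed 20 days after detection
}
stats = blacklist_stats(toy_detected, toy_blacklisted)
```

In the paper's figures the same decomposition holds: 8.2% already listed plus 18.6% detected first gives the 26.8% overall coverage.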

### III-E Phone Number Analysis

Once a victim calls the phone number listed on the fake TSS website, a call center operator uses voice-based interactions to social engineer the victim to pay up for the fake/unwanted technical support services. Based on data from tollfreenumbers.com, we conduct an analysis of the toll-free numbers listed on the TSS webpages. We look at two attributes associated with each toll-free number, while it was being abused: (i) the age of the toll-free number, and (ii) the toll-free number provider. We conduct this analysis for 3,365 unique toll-free numbers found on technical support scam webpages.

Registration Date/Age and Providers: The age of a toll-free number gives us an idea of when the number was purchased and registered by the technical support scammer who abuses it. We estimate this by fetching data that tells us the year in which the number last changed ownership and whether it is in active use. With both these factors combined, we can estimate the earliest possible time (not the exact time) when the organization or individual responsible for the account to which the toll-free number is linked could have begun the abuse. Figure 9 plots the relative percentage of active toll-free numbers based on the year in which they last changed ownership. Close to 16% of toll-free numbers were registered in 2014, 24.4% in 2015, 28.7% in 2016 and 13.5% in early 2017, totaling 82.6% of all toll-free numbers in the period between January 2014 and March 2017. We were unable to find any information for 4.9%, and the remaining 12.5% were registered prior to 2014. The relatively recent timing of these registrations and their volume suggest that search-and-ad abuse TSS scams are a more recent phenomenon and are on the rise.

Even though the biggest provider of TSS-related toll-free phone numbers is WilTel Communications, contrary to the findings of Miramirkhani et al. [54], who find that four providers account for more than 90% of the phone numbers, we observe a much larger pool of providers, each responsible for a smaller fraction of numbers. The top four providers account for less than 40% of the identified TSS phone numbers. This disparity is likely because our system identifies scams and scammers that ROBOVIC misses, and these scammers clearly have their own preferences for obtaining toll-free phone numbers (we discuss this more in Section V).

Presence in Complaints: Among the phone numbers found on TSS websites identified by our system, 16.8% of toll-free numbers were present in the complaint reports. Also, 26.1% were cumulatively present across all blacklists that included phone numbers.

### III-F Campaigns

The Clustering module (Section II-F) produces clusters consisting of final landing domains that share similar network and application features. Table V lists some of the major campaigns attributed by our system and the resources associated with them. First, although TSS are notoriously synonymous with Microsoft and its products, we found that many other brands are also targets of TSS campaigns. These brands include Apple, Amazon, Google and Facebook among others. Microsoft, however, remains the most abused brand, with 4 out of the top 5 TSS campaigns targeting Microsoft and its products. Second, we observed that TSS campaigns tend to advertise services targeted at particular brands and their lines of products. For example, certain TSS campaigns advertise services only for Gmail accounts or Norton Antivirus or the Firefox browser or the Windows OS. The outcome from the victim’s perspective can vary depending on the product: examples include credential theft, genuine product key phishing, browser compromise and remote hijacking of the OS in the aforementioned cases respectively. This behavior is likely because the call center agents are trained to specialize in technical aspects associated with a particular type of product/service, which could be a device (e.g. Kindle), software (e.g. browser) or OS (e.g. Windows Vista), rather than generic technical support. Such a brand-based view can be used to alert companies about campaigns targeting them so that they in turn can take appropriate action in stemming the campaign or alert their users about it.

The identified campaigns allow us to study the relationship between domains and the phone numbers advertised by them. We find that the churn rate in phone numbers is comparable to the churn rate of domain names for certain campaigns, while there also exist campaigns where the churn rate in phone numbers is very low compared to that of the domain names. Evidence of both cases is present in Table V. The first, third and fourth campaigns listed in the table represent an N-N relationship between domain names and phone numbers, i.e. each final-landing domain associated with the campaigns is likely to advertise a different phone number. However, due to the routing mechanism of the toll-free numbers, the calls to them may end up in the same call center. In contrast, the second to last campaign depicts an N-2 relationship between domain names and phone numbers, with 42 final-landing domains sharing just 2 toll-free numbers. By analyzing the clusters produced, we find that the presence of support domains is not ubiquitous. Only certain campaigns tend to make use of support domains. Furthermore, these campaigns are associated with SEO behavior and tend to be of the aggressive type.
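One simple way to distinguish the N-N and N-2 patterns described above is the ratio of distinct phone numbers to distinct domains within a campaign. The sketch below is only an illustration: the function name, the `(domain, phone)` pair layout and the placeholder domains and numbers are all hypothetical, and the ratio is not a metric the study itself reports.

```python
def phone_to_domain_ratio(pairs):
    """pairs: iterable of (final_landing_domain, phone_number) observed in one campaign."""
    pairs = list(pairs)
    domains = {domain for domain, _ in pairs}
    phones = {phone for _, phone in pairs}
    return len(phones) / len(domains)


# Placeholder campaigns: 42 domains sharing 2 numbers versus one number per domain.
shared_numbers = [(f"d{i}.example", f"phone-{i % 2}") for i in range(42)]
one_per_domain = [(f"d{i}.example", f"phone-{i}") for i in range(42)]
```

A ratio near 1 corresponds to the N-N case, while a ratio near 0 indicates many domains funnelling callers to a handful of numbers.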

In total, 368 clusters were produced after both network and application level hierarchical clustering. Next, we present case studies of two campaigns to highlight TSS tactics.

## IV Case Studies

To gain deeper insights into TSS abuse infrastructure, we discuss two specific case studies. The first one illustrates the use of support domains for black hat SEO and the second one demonstrates the use of browser hijacking to serve TSS ads.

### IV-A Black Hat SEO TSS Campaign

In this case study, we analyze the largest TSS campaign from Table V to highlight the technique used to promote the TSS websites and the infrastructure used to grow and sustain the campaign over time. The campaign primarily targeted Bing.com users. It consisted of 452 support domains and 662 final landing domains, which mapped to 216 IPs over time and advertised 521 unique phone numbers. The campaign was first detected on 04/16/2016 and was active as recently as 03/30/2017. This is based on the first and last dates on which the domains belonging to this campaign were identified by our system and added to the TSS dataset. Video evidence of this exclusive campaign, with its unique and (to the best of our knowledge) previously unreported characteristics, can be found via the URL: https://vimeo.com/229219769. To view the video, one would require the access password: ArXiv2017.

A search for “microsoft tech support”, for instance, would yield a TSS support domain such as zkhubm.win among the SRs. Clicking on the SR would redirect the user’s browser to a final landing TSS domain. The domain then uses aggressive scareware tactics to convince the victim about an error in their Windows machine. Then the victim is coerced into contacting the TSS call center. The social engineering and monetization would then take place over the phone channel, thus completing a typical TSS. Table VI lists some of the support domains and the final landing domains to which they redirect.

SEO Technique: The support domains use black hat SEO techniques sometimes referred to as spamdexing to manipulate the SRs. Specifically, the support domains seen on the search page act as doorway pages to final landing TSS domains. However, they use cloaking techniques such as text stuffing and link stuffing, consisting of technical support related keywords and links, to hide their real intent from search engine crawlers and get promoted up the SR rankings. Figure 9(a) shows what a crawler/user would see when visiting a support domain if the Referer header does not indicate that the originating click happened on a search results page.
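The Referer-dependent behavior described above suggests a simple probe: request the same URL with and without a search-engine Referer and compare what comes back. The sketch below is hypothetical and deliberately network-free: `fetch` stands for any caller-supplied function that takes a URL and a header dictionary and returns a comparable result (for example the final URL after redirects), and the default Referer string is only an example.

```python
def looks_cloaked(url, fetch, search_referer="https://www.bing.com/search?q=microsoft+tech+support"):
    """Return True if the response differs depending on whether a search Referer is sent."""
    without_referer = fetch(url, {})
    with_referer = fetch(url, {"Referer": search_referer})
    return without_referer != with_referer


def fake_doorway_fetch(url, headers):
    """Stand-in for a real HTTP client, mimicking a doorway page for demonstration."""
    if "search" in headers.get("Referer", ""):
        return "http://final-landing.example/"  # redirect shown to search visitors
    return url                                   # keyword-stuffed page shown otherwise


def fake_plain_fetch(url, headers):
    """Stand-in for a site that ignores the Referer header."""
    return url
```

A difference alone is not proof of abuse, since legitimate sites may also vary content by Referer, so a probe like this would only serve as one signal among several.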

IP Infrastructure Insights: Figure 9(b) shows the spread of the IPv4 address space and how domains in this campaign map to it. We plot the fraction of support and final landing TSS domains as a function of the IP address space and make the following observations: (i) the IP space used by support domains is quite different and decoupled from where final landing TSS domains are located, and (ii) while the address space for fake technical support domains is fragmented, the entire set of support domains is concentrated in a single subnet, 185.38.184.0/24. IP to AS mapping for the subnet points to AS# 13213 under the name UK2NET-AS, GB. The ASN has its country code listed as ME, Montenegro. The IP-Geo location data also points to an ISP in Budva, Montenegro. In contrast, IPs associated with final landing TSS domains pointed to different ASes, namely AS# 31815 (Media Temple, Inc), AS# 13335 (Cloudflare) and AS# 26496 (GO-DADDY-COM-LLC), based on IP to AS mapping data. They were geographically located in the US based on IP-Geo data. The fragmentation in the hosting infrastructure for the final landing TSS domains gives the technical support scammers a reliable way to spread their assets. The decoupling of the infrastructure between support domains and final landing TSS domains indicates that the technical support scammers are using the support domains as a “service” to offload the work of SEO. These support domains could well serve other types of scams and command a price for their specialization at a later time or in parallel. Finally, from a defense perspective, focusing takedown efforts on these intermediate domains will likely have a larger effect than the takedown of individual final TSS domains.
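Checking whether resolved addresses fall inside the support-domain subnet is a one-liner with Python's standard `ipaddress` module. The sketch below uses the 185.38.184.0/24 subnet named in the text; the helper names and the sample addresses are illustrative and are not IPs reported by the study.

```python
import ipaddress

SUPPORT_SUBNET = ipaddress.ip_network("185.38.184.0/24")  # subnet named in the text


def in_support_subnet(ip):
    """True if the IPv4 address string lies inside the support-domain subnet."""
    return ipaddress.ip_address(ip) in SUPPORT_SUBNET


def fraction_in_subnet(ips):
    """Fraction of the given addresses that fall inside the support-domain subnet."""
    ips = list(ips)
    return sum(in_support_subnet(ip) for ip in ips) / len(ips)
```

Applied to the resolved IPs of a campaign, a fraction close to 1 for support domains and close to 0 for final landing domains would reflect the decoupling described above.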

Although this campaign is largely of the aggressive type, none of the domains we found appear in the data we received from Miramirkhani et al. [54] (Section V). We believe this is because it is purely a search-based campaign which does not rely on malvertising. Thus, even for aggressive TSS webpages, our system is able to find new abuse infrastructure that cannot be found by previously explored techniques.

### IV-B Hijacking the Browser to Serve TSS ADs: Goentry.com

Goentry.com has been linked with browser hijacking where malicious software changes the browser’s settings without user permission to inject unwanted advertisements into the user’s browser [28, 13]. We noticed Goentry.com serving TSS ADs during the initial stages of this research and decided to probe it further. We use this case study to provide insights into evolving tactics being used by TSS actors.

Website Content: The homepage of goentry.com is a simple page with a Goentry logo, a search bar and the tagline “Goentry protects you from Government/NSA Spying on your Searches.” The output for a search query contains ADs at the top and on the right side of the results page, followed by related search terms and the search results. The website’s root directory reveals content related to Goentry’s SEO and website design services, including their contact information and toll-free number.

<div class="row_content">
<div class="title">
</div>
<div class="description">Browse Now For Com Online</div>
<div class="link">
<div class="link-left" style="float:left;">gosearch770.xyz</div>
</div>
</div>

Based on the page source, we observed the common use of Universal Event Tracking (UET) [37] tags as trackers, which allow the scammer to measure analytics such as the number of people who visited a specific page or section of the website and the amount of time they spent there. For example, the AD for gosearch770.xyz seen on goentry.com, whose markup is shown in the snippet above, is tracked with UET tag id 54080586. The advertised site acts as a doorway page that redirects to fake technical support websites such as error-error-error-2.xyz, critical-warning-message-2.xyz, portforyou.xyz and many others, while monitoring the site analytics.

Server-side Scripts: Our initial suspicion was that the search service was either using readily available APIs such as Google’s Custom Search Engine to power its searches or running customized scripts. Using the source of the page, we found references to a server-side PHP script. Due to configuration errors on the scammers’ side, we were able to obtain parts of that script, which revealed that the search-results page reacts to the presence of certain keywords by adding tech-support ads to the returned page. For example, when a user searches for the word “ice”, the returned page includes ads about tech support and the removal of a specific strain of ransomware called “the ICE Cyber Crime Center virus” [36].
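To make the keyword-triggered behavior concrete, the following minimal sketch mimics the case-insensitive pattern `/ice/i` that appears in the recovered script excerpt. It assumes the pattern is applied to the raw search query as an unanchored regular-expression match; the excerpt does not show the matching call itself, so this is an assumption, and the helper `triggers_ad` and the sample queries are hypothetical.

```python
import re

# Pattern from the recovered excerpt ('/ice/i' in PHP notation);
# re.IGNORECASE plays the role of the trailing "i" flag.
TRIGGER = re.compile(r"ice", re.IGNORECASE)

def triggers_ad(query):
    """True if the query matches the unanchored, case-insensitive pattern."""
    return TRIGGER.search(query) is not None

# Hypothetical sample queries, for illustration only.
queries = ["ice", "ICE virus removal", "microsoft office help", "printer setup"]
matches = {q: triggers_ad(q) for q in queries}
```

Under this assumption, an unanchored pattern would also fire on queries that merely contain the substring, such as “office”, so the ads could reach users who were not searching for the ICE ransomware at all.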

Domain Registration and IP: The website’s registration records show that it was registered on 01/22/2014 to an organization called Macrofix Technical Services Private Limited, which is associated with the website macrofix.com. This website advertises technical support services and is known to be a scam [21].

## V Discussion and Limitations

Comparison with Past TSS Studies: Miramirkhani et al. [54] recently analyzed technical support scams and proposed a system for their discovery (ROBOVIC). In this section, we compare our results with the findings of that study and show that, while there is some overlap in our findings, our TSS-discovery system finds scammers that were entirely missed by ROBOVIC.

For the purpose of a direct comparison, we were able to obtain data from Miramirkhani et al. for the period Jan-Mar 2017, which overlaps with the second time window of data collection in our work. Specifically, we received a list of 2,768 FQDNs discovered by their tool (2,441 second-level domains), 882 toll-free phone numbers and 1,994 IP addresses. Upon intersecting these sets with our own data, we found 0/2,768 FQDNs and 0/2,441 second-level domains that were common. Moreover, in terms of server and telephony infrastructure, we discovered that the two datasets had 92/1,994 common IP addresses of servers hosting TSS and 5/882 common toll-free phone numbers. We also discovered frequent use of “noindex” [10] meta tags in the HTML source of webpages associated with domains in Miramirkhani et al.’s dataset; these tags were noticeably absent from webpages in our dataset.
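The intersection step can be sketched as plain set operations. The helper names `naive_sld` and `overlap` are hypothetical, the sample hostnames are placeholders rather than entries from either dataset, and the second-level-domain extraction is a deliberate simplification that keeps only the last two labels (a real comparison would need a public-suffix list to handle suffixes such as co.uk).

```python
def naive_sld(fqdn):
    """Last two labels of a hostname; ignores multi-label public suffixes."""
    return ".".join(fqdn.lower().rstrip(".").split(".")[-2:])

def overlap(ours, theirs):
    """Return the common items and a 'k/n' string relative to `theirs`."""
    ours, theirs = set(ours), set(theirs)
    common = ours & theirs
    return common, f"{len(common)}/{len(theirs)}"

# Hypothetical placeholder values, not taken from either dataset.
our_fqdns = ["help.example-fix.com", "www.support-sample.net"]
their_fqdns = ["alert.example-warning.xyz", "cdn.example-fix.com"]

common_fqdns, fqdn_ratio = overlap(our_fqdns, their_fqdns)
common_slds, sld_ratio = overlap(
    map(naive_sld, our_fqdns), map(naive_sld, their_fqdns)
)
```

Comparing at both granularities matters because two datasets can share no FQDNs while still sharing a registered domain, which is why the counts above are reported for FQDNs and second-level domains separately.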

Given this near-zero intersection of the two datasets, we argue that our approach is discovering TSS infrastructure that ROBOVIC is unable to find. In addition to discovering aggressive tech support pages that ROBOVIC missed, a core contribution of our work is its focus on “passive” TSS, which manifest mostly as organic search results. These pages are unlikely to be circulated over malvertising channels: a benign-looking tech support page is unlikely to capture the attention of users who were never searching for technical support in the first place.

Since public blacklists are still unable to capture the vast majority of TSS (Section III-D), our work complements the work of Miramirkhani et al. Specifically, by taking advantage of our system, blacklist curators would double the number of TSS domains, IP addresses, and phone numbers that could be added to their blacklists.

Limitations: Like all real-world systems, our work is not without its limitations. Our choice of using PhantomJS for crawling search results and ads can, in principle, be detected by scammers who can use this knowledge to evade our monitors. We argue that replacing PhantomJS with a real browser is a relatively straightforward task which merely requires more hardware resources. Similarly, our choice of keeping our crawler stateless could lead to evasions which would again be avoided if one used a real, Selenium-driven, browser. Finally, while we provide clustering information for the discovered TSS, in the absence of ground truth, we are unable to guarantee the accuracy of this clustering or to attribute these clusters back to specific threat actors.

## VI Related Work

As mentioned throughout this paper, Miramirkhani et al. [54] performed the first analysis of technical support scams (TSS) by focusing on scams delivered via malvertising channels and interacting with scammers to identify their modus operandi. In recent work, Sahin et al. [60] investigated the effectiveness of chatbots in conversing with phone scammers (thereby limiting the time that scammers have available for real users).

The use of both the Internet and telephony channels to conduct cross-channel attacks is enabled by technologies that led to the convergence of these channels. Researchers have identified the evolving role of telephony and how phone numbers play a central role in a wide range of scams, including Nigerian scams, phishing, phone-verified accounts, and SMS spam [64, 48, 47, 66, 56, 49, 43].

In addition to telephony-specific work, researchers have analyzed a range of underground ecosystems, detailing their infrastructure and identifying the parties involved as well as potential pressure points [55, 57, 52, 63]. Since TSS is a type of underground ecosystem, we borrowed ideas found in prior work, such as the appropriate setting of the User Agent and Referrer crawler parameters used by Leontiadis et al. during their analysis of drug scams [52] to make requests appear as if they originated from a real user clicking on a search result. However, the search-redirection-based drug scams they discovered rely on compromising high-reputation websites, while the TSS scams discovered by our system rely on black hat SEO and malicious advertisement tactics.
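As a minimal sketch of this idea, the snippet below builds (but does not send) an HTTP request whose headers resemble those of a browser arriving from a search-results page. It uses Python's standard `urllib.request` rather than the crawler used in this work; the function name, the target URL, the search URL, and the User-Agent string are all illustrative assumptions. Note that the HTTP header itself is spelled `Referer`.

```python
from urllib.request import Request

def build_search_click_request(url, search_url, user_agent):
    """Build (but do not send) a request that resembles a search-result click."""
    return Request(url, headers={"User-Agent": user_agent, "Referer": search_url})

# Illustrative placeholder values only.
req = build_search_click_request(
    "http://example.com/",
    "https://www.bing.com/search?q=printer+support",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
)
```

Pages that cloak their content based on how the visitor arrived may serve different responses depending on these two headers, which is why a crawler would set them explicitly.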

Finally, there have been numerous studies that cluster abuse/spam infrastructure and campaigns based on URLs [65], IP infrastructure [41, 42] and content [40]. Similar hierarchical clustering techniques have also been shown to be effective in multiple contexts [64, 62, 51, 50]. In terms of countermeasures, prior work has shown the ineffectiveness of traditional blacklists in protecting services such as instant messaging (IM) [59] and social media [46, 65]. Unfortunately, until blacklist curators adopt systems such as our own, blacklists will also be ineffective against technical support scams.

## VII Conclusions

In this paper, we analyzed Technical Support Scams (TSS) by focusing on two new sources of scams: organic search results and ads shown next to these results. Using carefully constructed search queries and network amplification techniques, we developed a system that was able to find thousands of active TSS. We identify the presence of long-lived support domains which shield the final scam domains from search engines and shed light on the SEO tactics of scammers. In addition to aggressive scams, our system allowed us to discover thousands of passive TSS pages which appear professional, and yet display phone numbers which lead to scammers. We showed that our system discovers thousands of TSS-related domains, IP addresses, and phone numbers that are missed by prior work, and would therefore offer a marked increase of protection when incorporated into systems generating blacklists of malicious infrastructure.

## References

• [1] “800notes - Directory of UNKNOWN Callers,” http://800notes.com/.
• [2] “abuse.ch - the swiss security blog.” https://www.abuse.ch/.
• [3] “Active DNS Project,” https://www.activednsproject.org/.
• [4] “Alexa Topsites,” http://www.alexa.com/topsites.
• [5] “Bad Ads Trend Alert: Shining a Light on Tech Support Advertising Scams,” http://bit.ly/2y2rbnq.
• [6] “BeautifulSoup,” https://pypi.python.org/pypi/beautifulsoup4.
• [7] “Bing Ads bans ads from third-party tech support services,” http://searchengineland.com/bing-bans-third-party-tech-support-ads-249356.
• [8] “Bing brings in blanket ban on online tech support ads,” https://goo.gl/6bgPFF.
• [9] “Bing Search API,” http://datamarket.azure.com/dataset/bing/search.
• [10] “Block search indexing with ‘noindex’,” https://support.google.com/webmasters/answer/93710?hl=en.
• [11] “FTC - Tech Support Scams,” http://bit.ly/1XIF9RV.
• [12] “Geek Squad Services - Best Buy,” https://goo.gl/s7lWlq.
• [13] “Goentry.com - how to remove?” http://www.2-remove-virus.com/nl/goentry-com-hoe-te-verwijderen/.
• [14] “Google Custom Search,” https://goo.gl/GyU7zP.
• [15] “Google Safe Browsing,” https://goo.gl/d1spJ.
• [16] “hphosts.” http://www.hosts-file.net/.
• [17] “Indian police arrest alleged ringleader of IRS scam,” http://money.cnn.com/2017/04/09/news/tax-scam-india-arrest-ringleader/.
• [18] “India’s Call-Center Talents Put to a Criminal Use: Swindling Americans,” http://nyti.ms/2xpFv8C.
• [19] “I.T. Mate Product Support.” http://support.it-mate.co.uk/.
• [20] “Keyword Planner,” https://adwords.google.com/KeywordPlanner.
• [21] “Macrofix Wiki,” http://tech-support-scam.wikia.com/wiki/Macrofix.
• [22] “Malc0de database.” http://malc0de.com/database/.
• [23] “Malware Domain List,” https://www.malwaredomainlist.com/.
• [24] “Malwarebytes Lab,” https://blog.malwarebytes.com/tech-support-scams/.
• [25] “N-Grams,” http://stanford.io/29zsjAy.
• [26] “PhantomJS,” http://phantomjs.org/.
• [27] “Python language bindings for Selenium WebDriver,” https://pypi.python.org/pypi/selenium.
• [28] “Remove Goentry.com (FREE GUIDE),” https://www.zemana.com/en-US/removal-guide/remove-goentry.com.
• [29] “sagadc summary.” http://dns-bh.sagadc.org/.
• [30] “Scare and sell: Here’s how an Indian call centre cheated foreign computer owners,” http://bit.ly/2oj2Rpz.
• [31] “Searching For ‘Facebook Customer Service’ Can Lead To A Scam,” http://n.pr/2kex6vU.
• [32] “SPAMHaus Blocklist.” https://www.spamhaus.org/lookup/.
• [33] “Suspicious domains - sans internet storm center.” https://isc.sans.edu/suspicious_domains.html.
• [34] “Tech support scams persist with increasingly crafty techniques.” https://goo.gl/cHHPDI.
• [35] “Tech support scams remain at the top of the list of bad actors that search engines have to keep fighting.” http://selnd.com/24jskRr.
• [36] “The ICE Cyber Crime Center Virus Removal Guide,” https://malwaretips.com/blogs/ice-cyber-crime-center-removal/.
• [37] “Universal Event Tracking,” https://goo.gl/apw3WE.
• [38] “VirusTotal,” https://www.virustotal.com/.
• [39] “Wayback Machine,” https://archive.org/web/.
• [40] D. S. Anderson, C. Fleizach, S. Savage, and G. M. Voelker, “Spamscatter: Characterizing internet scam hosting infrastructure.”
• [41] M. Antonakakis, R. Perdisci, D. Dagon, W. Lee, and N. Feamster, “Building a dynamic reputation system for DNS,” USENIX Security 2010.
• [42] L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi, “EXPOSURE: finding malicious domains using passive DNS analysis,” NDSS 2011.
• [43] N. Boggs, W. Wang, S. Mathur, B. Coskun, and C. Pincock, “Discovery of emergent malicious campaigns in cellular networks,” ACSAC 2013.
• [44] V. Dave, S. Guha, and Y. Zhang, “Viceroi: catching click-spam in search ad networks,” CCS 2013.
• [45] G. D. Forney, Jr., “The Viterbi algorithm,” Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, March 1973.
• [46] C. Grier, K. Thomas, V. Paxson, and C. M. Zhang, “@spam: the underground on 140 characters or less,” ACM CCS 2010.
• [47] P. Gupta, B. Srinivasan, V. Balasubramaniyan, and M. Ahamad, “Phoneypot: Data-driven understanding of telephony threats,” in 22nd Annual Network and Distributed System Security Symposium, NDSS 2015, San Diego, California, USA, February 8-11, 2015.   The Internet Society, 2015. [Online]. Available: http://bit.ly/2wM1jff
• [48] J. Isacenkova, O. Thonnard, A. Costin, A. Francillon, and D. Balzarotti, “Inside the SCAM jungle: A closer look at 419 scam email operations,” EURASIP J. Information Security 2014.
• [49] N. Jiang, Y. Jin, A. Skudlark, and Z.-L. Zhang, “Greystar: Fast and accurate detection of sms spam numbers in large cellular networks using grey phone space,” USENIX Security 2013.
• [50] S. Kapoor, S. Sharma, and B. Srinivasan, “Attribute-based identification schemes for objects in internet of things,” Patent US 8 495 072, 07 23, 2013. [Online]. Available: https://www.google.com/patents/US8495072
• [51] ——, “Clustering devices in an internet of things (iot),” Patent US 8 671 099, 03 11, 2014. [Online]. Available: https://www.google.com/patents/US8671099
• [52] N. Leontiadis, T. Moore, and N. Christin, “Measuring and analyzing search-redirection attacks in the illicit online prescription drug trade,” USENIX Security 2011.
• [53] C. D. Manning and H. Schütze, Foundations of statistical natural language processing.   MIT Press, 1999, vol. 999.
• [54] N. Miramirkhani, O. Starov, and N. Nikiforakis, “Dial one for scam: A large-scale analysis of technical support scams,” NDSS 2017.
• [55] M. Motoyama, K. Levchenko, C. Kanich, D. McCoy, G. M. Voelker, and S. Savage, “Re: Captchas-understanding captcha-solving services in an economic context.” in USENIX Security Symposium, 2010.
• [56] I. Murynets and R. P. Jover, “Crime scene investigation: SMS spam data analysis,” IMC 2012.
• [57] Y. Park, J. Jones, D. McCoy, E. Shi, and M. Jakobsson, “Scambaiter: Understanding targeted nigerian scams on craigslist,” NDSS 2014.
• [58] D. Pelleg, A. W. Moore et al., “X-means: Extending k-means with efficient estimation of the number of clusters.” ICML 2000.
• [59] I. Polakis, T. Petsas, E. P. Markatos, and S. Antonatos, “A systematic characterization of IM threats using honeypots,” NDSS 2010.
• [60] M. Sahin, M. Relieu, and A. Francillon, “Using chatbots against voice spam: Analyzing Lenny’s effectiveness,” SOUPS 2017.
• [61] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval.   New York, NY, USA: McGraw-Hill, Inc., 1986.
• [62] S. Sharma, S. Kapoor, B. R. Srinivasan, and M. S. Narula, “Hicho: Attributes based classification of ubiquitous devices,” in Mobile and Ubiquitous Systems: Computing, Networking, and Services - 8th International ICST Conference, MobiQuitous 2011, Copenhagen, Denmark, December 6-9, 2011, Revised Selected Papers, ser. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, A. Puiatti and T. Gu, Eds., vol. 104.   Springer, 2011, pp. 113–125. [Online]. Available: https://doi.org/10.1007/978-3-642-30973-1_10
• [63] K. Soska and N. Christin, “Measuring the longitudinal evolution of the online anonymous marketplace ecosystem,” USENIX Security 2015.
• [64] B. Srinivasan, P. Gupta, M. Antonakakis, and M. Ahamad, “Understanding cross-channel abuse with sms-spam support infrastructure attribution,” in Computer Security - ESORICS 2016 - 21st European Symposium on Research in Computer Security, Heraklion, Greece, September 26-30, 2016, Proceedings, Part I, ser. Lecture Notes in Computer Science, I. G. Askoxylakis, S. Ioannidis, S. K. Katsikas, and C. A. Meadows, Eds., vol. 9878.   Springer, 2016, pp. 3–26. [Online]. Available: https://doi.org/10.1007/978-3-319-45744-4_1
• [65] K. Thomas, C. Grier, J. Ma, V. Paxson, and D. Song, “Design and evaluation of a real-time URL spam filtering service,” IEEE Symposium on Security and Privacy 2011.
• [66] K. Thomas, D. Iatskiv, E. Bursztein, T. Pietraszek, C. Grier, and D. McCoy, “Dialing back abuse on phone verified accounts,” ACM CCS 2014.
• [67] M. E. Wall, A. Rechtsteiner, and L. M. Rocha, “Singular value decomposition and principal component analysis,” in A practical approach to microarray data analysis.   Springer, 2003, pp. 91–109.
(int) 34 => array(
Goodie => array(
id => '56',
uid => '1',
example_query => 'ice',
description => '',
'pattern' => '/ice/i',
code => $html = <<<EOT
<div class="search_results">
<div class="custom-ad">
<div class="ad_title"> Ads related to <span class="bold">ICE Virus</span> </div>
<div class="row">
<div class="favicon"><img src= height="16" width="16" alt="Favicon" /></div>
<div class="row_content">
<div class="title"> <a href="https://goentry.com/r.php?u=http://www.pcvirusremove.com/ice-cyber-crime-center-virus-removal/"> <b>ICE</b> Cyber <b>Virus Removal</b> - Call 1-877-635-8168 for Support.</a> </div>
<div class="description"> Expert ICE Virus Removal Help 24x7 </div>
<div class="link">
<div class="link-left" style="float:left;"> www.pcvirusremove.com </div>
</div>
</div>
</div>
<div class="row">
<div class="favicon"><img src= height="16" width="16" alt="Favicon" /></div>
<div class="row_content">
<div class="title"> <a href="https://goentry.com/r.php?u=http://www.howtofixvirus.com/ice-cyber-crime-center-virus-removal/.html"> <b>Remove ICE Virus</b> - Call US Toll Free Now!</a> </div>
<div class="description"> 1-877-623-2121 ICE Virus Removal Help. </div>
<div class="link">
<div class="link-left" style="float:left;"> www.howtofixvirus.com </div>
</div>
</div>
</div>
<div class="row">
<div class="favicon"><img src= height="16" width="16" alt="Favicon" /></div>
<div class="row_content">
<div class="title"> <a href="https://goentry.com/r.php?u=http://www.ifixvirus.com/ice-cyber-crime-virus-removal/"> <b>ICE Virus Fix</b> - Quick Virus Removal in 2 Minutes</a> </div>
<div class="description"> Call 1-800-421-4589 for Quick fix. </div>
<div class="link">
<div class="link-left" style="float:left;"> ifixvirus.com </div>
</div>
</div>
</div>
</div>
</div>
<script type="text/javascript"> //<![CDATA[$(document).ready(function(){
\$(".sponsored_ad").hide();
});