I am a research scientist at Georgia Institute of Technology performing research in the areas of network and computer security. In 2018, I received my PhD from Georgia Institute of Technology under the advisement of Manos Antonakakis, and before that, I graduated from from both Wake Forest University (M.S., Computer Science, ’11) and Duke University (B.S., Computer Science, ’06).
My research interests are focused at the intersection of security, networking, and data science. Much of my previous work has focused on performing large scale, empirical studies using real world networking datasets. These efforts have led to data-driven solutions to existing and emerging threats as well as several active projects including the Active DNS and YourThings projects.
Performed research into characterizing the malicious threats seen in cellular data networks.
Performed research into exploiting graph structure for identity resolution and de-anonymization.
Performed research in the area of cyber security and helped to setup and deploy a network security testbed.
Development of virtual terminal web application and payment gateway for the company's transaction processing engine.
Developed business intelligence and data warehousing solutions for the Distributed Asset Industry.
The Domain Name System is a critical piece of infrastructure that has expanded into use cases beyond its original intent. DNS TXT records are intentionally very permissive in what information can be stored there, and as a result are often used in broad and undocumented ways to support Internet security and networked applications. In this paper, we identified and categorized the patterns in TXT record use from a representative collection of resource record sets. We obtained the records from a data set containing 1.4 billion TXT records collected over a 2 year period and used pattern matching to identify record use cases present across multiple domains. We found that 92% of these records generally fall into 3 categories; protocol enhancement, domain verification, and resource location. While some of these records are required to remain public, we discovered many examples that unnecessarily reveal domain information or present other security threats (e.g., amplification attacks) in conflict with best practices in security.
In-browser cryptojacking is a form of resource abuse that leverages end-users’ machines to mine cryptocurrency without obtaining the users’ consent. In this paper, we design, implement, and evaluate Outguard, an automated cryptojacking detection system. We construct a large ground-truth dataset, extract several features using an instrumented web browser, and ultimately select seven distinctive features that are used to build an SVM classification model. Outguard achieves a 97.9% TPR and 1.1% FPR and is reasonably tolerant to adversarial evasions. We utilized Outguard in the wild by deploying it across the Alexa Top 1M websites and found 6,302 cryptojacking sites, of which 3,600 are new detections that were absent from the training data. These cryptojacking sites paint a broad picture of the cryptojacking ecosystem, with particular emphasis on the prevalence of cryptojacking websites and the shared infrastructure that provides clues to the operators behind the cryptojacking phenomenon.
The Mirai botnet, composed primarily of embedded and IoT devices, took the Internet by storm in late 2016 when it overwhelmed several high-profile targets with massive distributed denial-of-service (DDoS) attacks. In this paper, we provide a seven-month retrospective analysis of Mirai’s growth to a peak of 600k infections and a history of its DDoS victims. By combining a variety of measurement perspectives, we analyze how the botnet emerged, what classes of devices were affected, and how Mirai variants evolved and competed for vulnerable hosts. Our measurements serve as a lens into the fragile ecosystem of IoT devices. We argue that Mirai may represent a sea change in the evolutionary development of botnets—the simplicity through which devices were infected and its precipitous growth, demonstrate that novice malicious techniques can compromise enough low-end devices to threaten even some of the best-defended targets. To address this risk, we recommend technical and nontechnical interventions, as well as propose future research directions.
Domain squatting is a common adversarial practice where attack- ers register domain names that are purposefully similar to popular domains. In this work, we study a specific type of domain squatting called “combosquatting,” in which attackers register domains that combine a popular trademark with one or more phrases (e.g., bet- terfacebook[.]com, youtube-live[.]com). We perform the first large- scale, empirical study of combosquatting by analyzing more than 468 billion DNS records—collected from passive and active DNS data sources over almost six years. We find that almost 60% of abusive combosquatting domains live for more than 1,000 days, and even worse, we observe increased activity associated with combosquat- ting year over year. Moreover, we show that combosquatting is used to perform a spectrum of different types of abuse including phishing, social engineering, affiliate abuse, trademark abuse, and even advanced persistent threats. Our results suggest that com- bosquatting is a real problem that requires increased scrutiny by the security community.
Both the operational and academic security communities have used dynamic analysis sandboxes to execute malware samples for roughly a decade. Network information derived from dynamic analysis is frequently used for threat detection, network policy, and incident response. Despite these common and important use cases, the efficacy of the network detection signal derived from such analysis has yet to be studied in depth. This paper seeks to address this gap by analyzing the network communications of 26.8 million samples that were collected over a period of five years.
Using several malware and network datasets, our large scale study makes three core contributions. (1) We show that dynamic analysis traces should be carefully curated and provide a rigorous methodology that analysts can use to remove potential noise from such traces. (2) We show that Internet miscreants are increasingly using potentially unwanted programs (PUPs) that rely on a surprisingly stable DNS and IP infrastructure. This indicates that the security community is in need of better protections against such threats, and network policies may provide a solid foundation for such protections. (3) Finally, we see that, for the vast majority of malware samples, network traffic provides the earliest indicator of infection—several weeks and often months before the malware sample is discovered. Therefore, network defenders should rely on automated malware analysis to extract indicators of compromise and not to build early detection systems.
Most modern cyber crime leverages the Domain Name Sys- tem (DNS) to attain high levels of network agility and make detection of Internet abuse challenging. The majority of malware, which represent a key component of illicit Internet operations, are programmed to locate the IP address of their command-and-control (C&C) server through DNS lookups. To make the malicious infrastructure both agile and resilient, malware authors often use sophisticated communication methods that utilize DNS (i.e., domain generation algorithms) for their campaigns. In general, Internet miscreants make extensive use of short-lived disposable domains to promote a large variety of threats and support their criminal network operations.
To effectively combat Internet abuse, the security community needs ac- cess to freely available and open datasets. Such datasets will enable the development of new algorithms that can enable the early detection, track- ing, and overall lifetime of modern Internet threats. To that end, we have created a system, Thales, that actively queries and collects records for massive amounts of domain names from various seeds. These seeds are collected from multiple public sources and, therefore, free of privacy con- cerns. The results of this effort will be opened and made freely available to the research community. With three case studies we demonstrate the detection merit that the collected active DNS datasets contain. We show that (i) more than 75% of the domain names in PBL appear in our datasets several weeks (and some cases months) in advance, (ii) existing DNS research can implemented using only active DNS, and (iii) malicious campaigns can be identified with the signal provided by active DNS.
Domains are frequently used as trust anchors, but any individual that re-registers an expired domain implicitly inherits the residual trust associated with the domain's prior use. We find that adversaries can, and do, use malicious re-registration to exploit domain ownership changes—undermining the security of both users and systems. In fact, we find that many seemingly disparate security problems share a root cause in residual domain trust abuse. With this study we shed light on the seemingly unnoticed problem of residual domain trust by measuring the scope and growth of this abuse over the past six years. During this time, we identified 27,758 domains from public blacklists and 238,279 domains resolved by malware that expired and then were maliciously re-registered. To help address this problem, we propose a technical remedy and discuss several policy remedies. For the former, we develop Alembic, a lightweight algorithm that uses only passive observations from the Domain Name System (DNS) to flag potential domain ownership changes. Using this algorithm, we identify several instances of residual trust abuse, including an expired APT domain that was available for immediate re-registration.
Garbled circuits offer a powerful primitive for computation on a user’s personal data while keeping that data private. Despite recent improvements, constructing and evaluating circuits of any useful size remains expensive on the limited hardware resources of a smartphone, the primary computational device available to most users around the world. In this work, we develop a new technique for securely outsourcing the generation of garbled circuits to a Cloud provider. By outsourcing the circuit generation, we are able to eliminate the most costly operations from the mobile device, including oblivious transfers. After proving the security of our techniques in the malicious model, we experimentally demonstrate that our new protocol, built on this role reversal, decreases execution time by 98% and reduces network costs by as much as 92% compared to previous outsourcing protocols. In so doing, we demonstrate that the use of garbled circuits on mobile devices can be made nearly as practical as it is becoming for server-class machines.
Much of the attention surrounding mobile malware has focused on the in-depth analysis of malicious applications. While bringing the community valuable information about the methods used and data targeted by malware writers, such work has not yet been able to quantify the prevalence with which mobile devices are actually infected. In this paper, we present the first such attempt through a study of the hosting infrastructure used by mobile applications. Using DNS traffic collected over the course of three months from a major US cellular provider as well as a major US non-cellular Internet service provider, we identify the DNS domains looked up by mobile applications, and analyze information related to the Internet hosts pointed to by these domains. We make several important observations. The mobile malware found by the research community thus far appears in a minuscule number of devices in the network: 3,492 out of over 380 million (less than 0.0009%) observed during the course of our analysis. This result lends credence to the argument that, while not perfect, mobile application markets are currently providing adequate security for the majority of mobile device users. Second, we find that users of iOS devices are virtually identically as likely to communicate with known low reputation domains as the owners of other mobile platforms, calling into question the conventional wisdom of one platform demonstrably providing greater security than another. Finally, we observe two malware campaigns from the upper levels of the DNS hierarchy and analyze the lifetimes and network properties of these threats. We also note that one of these campaigns ceases to operate long before the malware associated with it is discovered suggesting that network-based countermeasures may be useful in the identification and mitigation of future threats.
An important component of network resource management and security enforcement is recognizing the applications active on a network. Unfortunately payload encryption and the use of non-standard ports render traditional application identification methods marginally useful. Newer in-the-dark application discovery methods can contend with these conditions, but still rely on packet level information that may not be readily available to administrators.
This paper describes the initial findings and future directions of a technique that uses network motifs (e.g. overrepresented interaction subgraphs) to identify network activity. Modeling the flow-level network interactions as a graph, the proposed approach identifies sets of frequently occurring subgraphs useful to infer the applications. Initial results show this approach can achieve an average accuracy of 85% in mapping motifs to applications. We argue that performance can be improved by incorporating features into motifs that provide information about vertices and edges while preserving the ability for system administrators to gather such feature information from flow-level traces. Specific issues that arise in the collection of computer network interaction data and in dealing with the scale of such data are also highlighted.