TDLens: Toward an Empirical Evaluation of Provenance Graph-Based Approach to Cyber Threat Detection


Rui Mei, Hanbing Yan, Qinqin Wang, Zhihui Han, Zhuohang Lyu

1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China

2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China

3 National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT/CC), Beijing 100029, China

*The corresponding author, email: yhb@cert.org.cn

Abstract: To combat increasingly sophisticated cyber attacks, the security community has proposed and deployed a large body of threat detection approaches to discover malicious behaviors on host systems and attack payloads in network traffic. Several studies have begun to focus on threat detection methods based on provenance data of host-level event tracing. On the other side, with the significant development of big data and artificial intelligence technologies, large-scale graph computing has been widely used. To this end, a body of research tries to bridge the gap between threat detection based on host log provenance data and graph algorithms, and proposes threat detection algorithms based on the system provenance graph. These approaches usually generate the system provenance graph via tagging and tracking of system events, and then leverage the characteristics of the graph to conduct threat detection and attack investigation.

Keywords: cyber threat detection; causality dependency graph; data provenance

Modern information systems in enterprises and other organizations have become an important and irreplaceable driving force for doing business. However, these systems are constantly being targeted by not only Advanced Persistent Threats (APTs) but also disgruntled insider employees [1–4]. More enterprises have begun to deploy security systems such as the security operation center (SOC), threat detection software (TDS), and security information and event management (SIEM) to respond to cyber threats. These systems usually continuously monitor diverse activities in hosts or the enterprise network to identify potentially suspicious activities and send alerts to cyber analysts.

Figure 1. Reduced version of forward tracing graph for a remote access trojan (RAT) delivered by exploitation of MS Office vulnerability.

Among the proposed threat detection approaches, threat detection algorithms based on the system provenance graph are considered to be an ideal method, with strong behavioral representation capabilities and relatively high efficiency. Recently, many works in the security community have focused on the application of the system provenance graph. Figure 1 shows an example of exploiting a vulnerability in MS Office (CVE-2017-11882) to spread a trojan for subsequent command and control (C2). In the figure, the nodes marked in red are the critical system entities of the intrusion. In other words, an ideal threat detection tool can identify the red nodes and their paths as abnormal activities with a relatively low response time. Although scholars have proposed a variety of novel graph-based threat detection approaches, few studies have made empirical evaluations of these methods. In this paper, we conduct an in-depth analysis and empirical assessment of several representative threat detection approaches. We hope that this effort can point out the direction for the improvement of graph-based threat detection approaches.

In summary,this paper makes the following contributions:

·We propose and implement a provenance graph-based analysis framework for the evaluation of threat detection algorithms. The framework has robust underlying critical modules, including an operating system log collector, a system provenance graph generation module, and a preprocessing module. In addition, our framework has a flexible interface for loading diverse threat detection algorithms and supports comparison of the experiment results between multiple algorithms.

·We select and implement state-of-the-art threat detection approaches as evaluation objects. To this end, we collect a benign dataset in a real-world IT environment, and simulate multiple attack scenarios in an isolated environment to collect a malicious dataset. These datasets ensure the effectiveness and interpretability of the evaluation.

·We adopt crosswise comparison and longitudinal assessment to gain an in-depth understanding of and insight into five threat detection approaches. Our evaluation results provide a solid foundation for the improvement direction of the threat detection approach.

1.1 Adversarial Model

In this paper, the threat detection approaches we evaluate are based on the following adversarial model: (1) Threat actors can be external attackers, who usually launch APT attacks, or insider employees with limited authorized access. (2) The adversaries deliver the attack payload through vulnerability exploitation, social engineering, etc. (3) The attack behavior will always cause changes that are inconsistent with normal behaviors, even if it is a small fluctuation. (4) The adversary does not know the details of the detection model.

1.2 Definitions

We list basic definitions used in the rest of this paper.

System Entity refers to performers or targets of system operations, which are also called the system subject and the system object respectively. Our evaluation considers three types of system entities: process, file, and network socket.

Table 1. The system entities and their relations we consider.

System Event refers to the operation of the subject to the object. An event can be denoted by a quad

System Provenance Graph refers to a directed graph that denotes the control dependency (e.g., a parent process starts a child process) and data dependency (e.g., a process sends data to a network socket) between subjects and objects. Formally, it is defined as G=

Causality Dependency. Given two events e1=

In this section, we discuss graph-based threat detection approaches. The threat model and basic definitions described in §1.1 and §1.2 constitute the foundation for crosswise comparison and longitudinal evaluation of several state-of-the-art detection approaches in this paper. Then, we outline the main types and characteristics of threat detection methods based on the system provenance graph. Finally, we describe in detail five representative threat detection approaches and use them as the evaluation objects for further analysis.

2.1 Graph-Based Threat Detection

Due to the natural attributes of the system provenance graph, a graph-based threat detection algorithm can usually leverage two types of information, namely, the attribute information of the vertices and edges of the graph, and the structure information of the graph. A previous study [5] has shown that an ideal TDS needs to consider three indicators, namely fast response, high efficiency, and high accuracy. In order to find a balance between these three indicators, many pieces of research have been proposed. These research works are mainly divided into three categories, which are described as follows:

(1) Outlier Detection, aka Anomaly Detection, attempts to identify the abnormal event (i.e., the relationship between anomalous system entities). Thus, this kind of approach models regular behaviors by collecting historical data or data from parallel systems. The normal model can be trained not only by manual extraction of features but also by automatic embedding via machine learning. Therefore, anomaly detection algorithms have recently been widely used for threat detection.

(2) Graph Matching. The conventional graph matching method first defines several suspicious local structure features or vertex attributes of graphs, then checks whether there are suspicious local structures or vertices with similar attributes in the system provenance graph to identify cyber threats. With the help of artificial intelligence algorithms, many recent works leverage machine learning models to automatically learn abnormal features from labeled data. In general, graph matching is a computationally intensive method, which requires strong computing power.

(3) Tag Propagation. The tag propagation-based threat detection mechanism marks the entities of interest and spreads the tags along the system control flow and data flow to discover potential threats. This approach mainly has two phases, namely tag initialization and tag propagation. Since this method uses an incremental approach to gradually explore the execution path, it can be applied to streaming data.
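The two phases of tag propagation can be illustrated with a minimal Python sketch. It is not taken from any of the evaluated systems: the function name `propagate_tags`, the entity names, and the `untrusted` tag are hypothetical, and the sketch assumes the graph is given as a plain list of directed edges already oriented in the direction of control or data flow.

```python
from collections import deque

def propagate_tags(edges, initial_tags):
    """Spread tags along directed edges until no entity gains a new tag.

    edges: iterable of (source, target) pairs (control flow or data flow).
    initial_tags: dict mapping an entity to its tags (tag initialization).
    Returns a dict mapping every tagged entity to the set of tags it carries.
    """
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    tags = {node: set(labels) for node, labels in initial_tags.items()}
    queue = deque(tags)  # start from the initially tagged entities
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            current = tags.get(nxt, set())
            new = tags[node] - current
            if new:  # re-queue only when something changed, so cycles terminate
                tags[nxt] = current | new
                queue.append(nxt)
    return tags

# Hypothetical flow: an untrusted socket feeds a process that drops and runs a file.
edges = [
    ("socket:10.0.0.5:443", "proc:office"),
    ("proc:office", "file:payload.exe"),
    ("file:payload.exe", "proc:payload"),
    ("proc:explorer", "file:notes.txt"),  # unrelated activity, never tagged
]
tags = propagate_tags(edges, {"socket:10.0.0.5:443": {"untrusted"}})
```

Because an entity is revisited only when it receives a tag it did not already hold, the same loop can be fed new edges incrementally, which is the property that makes this family of methods a fit for streaming data.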

Table 2. Comparison of provenance graph-based approaches.

2.2 Approaches to Be Evaluated

For the abnormal events reported by a threat detection method, the cyber analyst needs to further verify whether they are true alarms. Therefore, for threat detection methods, not only are precision and efficiency required, but the interpretability of alarms is even more important. The provenance graph-based approach is well suited to threat detection due to its interpretability and its simplification of implementation and deployment. The representative and state-of-the-art works are ProvDetector, NoDoze, PrioTracker, Poirot, and M&L [6–10]. This paper takes these five approaches as evaluation objects for crosswise comparison and longitudinal evaluation. Table 2 summarizes the characteristics of the three types of approaches [5]. Among them, A-1, A-2, and A-3 are anomaly detection approaches. A-4 adopts the graph matching method for threat detection, while A-5 is a tag propagation approach.

As an empirical evaluation, we will not review the details of these approaches in this paper. We outline their main steps as follows:

A-1 ProvDetector [6].

1) Calculate the regularity score of each edge in the system provenance graph based on its rareness and stability, which is defined as:

In Equation (1), H(e) is the set of hosts that event e happens on, while H is the set of all the hosts in the enterprise. IN and OUT denote the stability extent of the u and v nodes during n time windows.

2) Convert the directed graph (DG) into a directed acyclic graph (DAG), and find the top k longest paths (that is, the top k paths with the largest anomaly scores).

3) Introduce a natural language processing (NLP) technique: generate k sentences from the k paths and embed them with the doc2vec model [11].

4) Use the Local Outlier Factor (LOF) algorithm [12] to detect abnormal paths.

A-2 NoDoze [7].

1) Calculate the anomaly score of a given dependency path in the system provenance graph based on its rareness and stability. The anomaly score can be calculated as follows:

where IN and OUT are the same as in Equation (1), while M means the happening probability of this specific event.

2) Normalize the anomaly score of each path based on training a bias parameter.

3) Sort all paths by anomaly score and select the top k paths based on the difference in anomaly score between two adjacent paths.

4) Merge the top k paths into a subgraph for anomaly detection.
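Steps 3) and 4) above can be sketched as follows. This is only an illustration under stated assumptions: the cut-off rule (stop at the first drop between adjacent scores that reaches `gap_threshold`), the threshold value, and the function names are hypothetical and are not taken from the NoDoze implementation; the scores are made up for the example.

```python
def select_top_paths(scored_paths, gap_threshold):
    """Keep the highest-scoring paths up to the first large drop in score.

    scored_paths: list of (path, anomaly_score), where path is a tuple of nodes.
    gap_threshold: assumed cut-off on the score difference between two
                   adjacent paths in the ranking (hypothetical parameter).
    """
    ranked = sorted(scored_paths, key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    selected = [ranked[0]]
    for prev, cur in zip(ranked, ranked[1:]):
        if prev[1] - cur[1] >= gap_threshold:
            break  # the remaining paths are markedly less anomalous
        selected.append(cur)
    return selected

def merge_paths(paths):
    """Merge node sequences into one subgraph, represented as a set of edges."""
    edges = set()
    for path in paths:
        edges.update(zip(path, path[1:]))
    return edges

# Made-up scores for four dependency paths.
scored = [
    (("a", "b", "c"), 0.95),
    (("x", "y"), 0.40),
    (("a", "b", "d"), 0.90),
    (("x", "z"), 0.35),
]
top = select_top_paths(scored, gap_threshold=0.3)
subgraph = merge_paths(path for path, _ in top)
```

Here the two paths scoring 0.95 and 0.90 are kept and merged, while the ranking is cut at the drop from 0.90 to 0.40. The merged subgraph shares the edge `("a", "b")`, which is how path-level alarms end up more compact than event-level ones.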

A-3 PrioTracker [8].

1) Calculate the rareness score of each edge, which is defined as:

where M is the same as in Equation (3). If e has not been observed by the reference model, rs(e)=1; otherwise, rs(e)=M(e).

2) Calculate the fanout score of each edge, which is defined as:

where σ is a hyperparameter set to 0.3 empirically, and fanout(e) is the out-degree of the sink node of e. If e reaches a read-only file in backward tracking, fs(e)=0. If e reaches a write-only file in forward tracking, fs(e)=σ. Otherwise, fs(e)=1/fanout(e).

3) Calculate the priority score and leverage the Hill Climbing algorithm [13] to optimize the feature weights.

In Equation (6), α and β need to be determined by the Hill Climbing algorithm.

4) Maintain a queue that lists the abnormal events by priority, so that the cyber analyst can examine them before a given deadline.

5) Continuously update the reference model to improve the accuracy of anomaly scores.
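The scoring and queueing steps above can be sketched in Python. The rareness and fanout rules follow the text; everything else is an assumption made for illustration: the priority is taken to be a linear combination `alpha * rs + beta * fs`, the weights are fixed by hand rather than found by Hill Climbing, a fanout of zero is treated as one to avoid division by zero, and all event names and probabilities are hypothetical.

```python
import heapq

SIGMA = 0.3  # value the text reports for the hyperparameter sigma

def rareness_score(event, reference_model):
    """rs(e): 1 if the reference model never observed e, otherwise M(e)."""
    return reference_model.get(event, 1.0)

def fanout_score(fanout, read_only_backward=False, write_only_forward=False):
    """fs(e) following the three cases described in the text."""
    if read_only_backward:
        return 0.0
    if write_only_forward:
        return SIGMA
    return 1.0 / max(fanout, 1)  # assumption: guard against a zero out-degree

def priority(rs, fs, alpha, beta):
    # Assumed linear form; in the text alpha and beta come from Hill Climbing.
    return alpha * rs + beta * fs

def rank_events(events, reference_model, alpha=0.5, beta=0.5):
    """Return event names ordered from highest to lowest priority."""
    heap = []
    for index, (name, fanout, ro_back, wo_fwd) in enumerate(events):
        score = priority(
            rareness_score(name, reference_model),
            fanout_score(fanout, ro_back, wo_fwd),
            alpha,
            beta,
        )
        # heapq is a min-heap, so push the negated score; index breaks ties.
        heapq.heappush(heap, (-score, index, name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Hypothetical reference model and events: (name, fanout, read-only, write-only).
reference_model = {"svchost reads config": 0.2, "backup reads archive": 0.1}
events = [
    ("svchost reads config", 4, False, False),
    ("office spawns payload", 2, False, False),  # unseen, so rs = 1
    ("backup reads archive", 1, True, False),    # read-only file, so fs = 0
]
ordered = rank_events(events, reference_model)
```

With these inputs the unseen event is served first, which reflects the purpose of the queue: under a deadline, the analyst sees the highest-priority events before the rest.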

A-4 Poirot [9].

1) Extract Indicators of Compromise (IoCs) related to known attacks and their relationships from cyber threat intelligence, and build causal dependency graphs, namely query graphs, denoted as Gq.

2) Find all candidate matching sets i:j, where i and j represent nodes in Gq and the system provenance graph Gp, respectively. Then start from the matching with the highest matching probability (i.e., the seed node), and expand the search range to find other node matchings.

3) Leverage node alignment and graph alignment algorithms, and extract the subgraphs from Gp with higher similarity between Gq and Gp.

A-5 M&L [10].

1) Collect the logs of the service application for pretraining, and obtain the event handling loop.

2) Initialize a tag for each system event in the event handling loop.

3) Track the propagation of the tag in the causal dependency graph. When the propagation of the tag exceeds the current event handling loop, it will no longer be tracked.

4) According to the tag propagation path in 3), prune irrelevant causal dependencies on the system provenance graph.

5) Adopt naïve forward tracking and backward tracking to analyze suspicious events.

In this section, we conduct several experiments to evaluate the correctness, effectiveness, and efficiency of the aforementioned threat detection approaches based on the system provenance graph, and to compare them crosswise as well. First, we go over the specifics of the experiment setup and attack scenarios utilized for evaluation. Next, we present metrics for quantitatively comparing different dimensions of these techniques. Finally, we attempt to explain what caused the experiment results we observed in order to have a better understanding of the characteristics of diverse threat detection approaches.

In particular, to evaluate the characteristics of each approach, we search for answers to the following research questions:

RQ1: How accurate are these approaches to threat detection? (§3.3)

RQ2: How much can these approaches reduce the causality dependency graph of a true alert without sacrificing the vital information needed for investigation? (§3.4)

RQ3: What is the runtime performance of these approaches? (§3.5)

RQ4: How effective are these approaches at saving storage space of log data? (§3.6)

3.1 Experiment Setup

As an initial step, we designed and implemented an analysis framework for empirical evaluation of the aforementioned threat detection approaches. Figure 2 shows the evaluation framework we implemented. It mainly includes the following four parts: tagging and tracking of system events (i.e., log collection and storage), generation of the system provenance graph, implementation and deployment of diverse graph-based threat detection methods for obtaining detection outputs, and comparison with assessment metrics. First, modern operating systems usually have a built-in system event tagging and tracking mechanism, e.g., Event Tracing for Windows (ETW) [14] on the Windows platform and the Linux audit framework [15] on the Linux platform. We implemented an event tracking and log collection module on each of the two mainstream platforms, Windows and Linux. For Windows, we developed a log collection agent leveraging SilkETW [16], an open source framework proposed by FireEye. For Linux, we developed a corresponding log collector based on auditd, which is natively supported in the Linux kernel. All collected log data is stored in an Elasticsearch (ES) [17] server. Next, for the captured log data of all system events, we adopted the foundational work BackTracker [18] to generate the whole-system provenance graph. To effectively store the graph data structure and facilitate subsequent graph computing, we further implemented graph-based forward tracing, backward tracing, and bidirectional tracing algorithms based on the open source framework NetworkX [19]. Then, we implemented the aforementioned graph-based threat detection algorithms, carried out threat detection on consistent datasets, and output the threat detection results. Finally, we calculated several metrics to evaluate the different threat detection approaches and to answer the four research questions mentioned above.
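The three tracing operations named above can be illustrated with a short, self-contained Python sketch. It is not the authors' implementation: it uses plain dictionaries instead of NetworkX so that it runs without dependencies, it ignores event timestamps (which a full causality analysis would have to respect), and all entity names are hypothetical. Edges are assumed to be oriented in the direction of information flow.

```python
from collections import deque

def build_adjacency(edges):
    """Return (forward, backward) adjacency maps for directed (src, dst) edges."""
    forward, backward = {}, {}
    for src, dst in edges:
        forward.setdefault(src, set()).add(dst)
        backward.setdefault(dst, set()).add(src)
    return forward, backward

def reachable(adjacency, start):
    """Breadth-first search: every node reachable from start, start included."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def forward_trace(edges, start):
    """Entities that start may have influenced (its consequences)."""
    return reachable(build_adjacency(edges)[0], start)

def backward_trace(edges, start):
    """Entities that may have influenced start (its origins)."""
    return reachable(build_adjacency(edges)[1], start)

def bidirectional_trace(edges, start):
    return forward_trace(edges, start) | backward_trace(edges, start)

# Hypothetical provenance edges for a document-borne dropper.
edges = [
    ("file:invoice.rtf", "proc:office"),
    ("proc:office", "file:dropper.exe"),
    ("file:dropper.exe", "proc:dropper"),
    ("proc:dropper", "socket:c2"),
    ("proc:browser", "file:cache.dat"),  # unrelated activity
]
consequences = forward_trace(edges, "proc:office")
origins = backward_trace(edges, "proc:dropper")
context = bidirectional_trace(edges, "proc:office")
```

With a NetworkX `DiGraph`, the same node sets (minus the start node) are returned by `networkx.descendants` and `networkx.ancestors`.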

Table 3. Overview of attack scenarios.

Figure 2. The overview of our analysis framework.

Table 4. Event collection in real-world endpoints.

To conduct crosswise comparison and longitudinal assessment of diverse methods on consistent datasets, we built a series of datasets to carry out subsequent experiments. First of all, we captured OS log data, with permission given by volunteers, in a real-world IT enterprise. We deployed log collection agents on 5 heavily used endpoints and transmitted log data to the ES server. To obtain enough representative system provenance data with diverse types of uses, the 5 endpoints we selected were used for different purposes in daily work, for example, by an executive manager who sends and receives emails and edits documents daily, by financial personnel who frequently operate Excel, and by an IT systems administrator who heavily accesses remote servers. After gathering data for over one week, we finally constructed our benign dataset for further analysis.

Besides the benign dataset captured in the real-world environment mentioned above, we also constructed a cyber threat dataset. To prepare the malicious behavior dataset, we selected 6 typical cyber threat scenarios, including the following attack features: vulnerability exploitation, data exfiltration, backdoor malware, phishing email, lateral movement, etc. Each scenario could have one or more attack features. Based upon our full knowledge of the cyber attack process, we manually collected the ground truth of these attack cases, so we could perform analysis with explicit interpretability. Considering that such attack cases may have unpredictable consequences on real-world IT systems, we simulated these attack scenarios in a virtual environment. We obtained malware samples and analysis reports from VirusTotal [20] and utilized them in the aforementioned attack scenarios, together with the open-source penetration testing framework Metasploit [21], to perform vulnerability exploitation and C2. While simulating such attack cases, we adopted the same method and deployed the same agent as for the collection of the benign dataset to capture the corresponding system provenance data, and analyzed it based on the ground truth to obtain correlated threat alerts that could identify these attacks. Table 3 details the 6 types of attack scenarios and the true threat events. Note that the true alert is denoted by a triad

Table 5. Event data of attack cases.

We deployed our analysis framework on a machine equipped with an AMD Opteron™ 6136 CPU (10M Cache, 16 Cores, 2.4GHz) and 64GB of physical memory running Kali Linux OS. All modules in our analysis pipeline, including provenance data preprocessing, causality graph generation, threat model training, and metrics assessment, were deployed and run entirely on this machine.

3.2 Dataset

Since the threat detection approaches we evaluated are all based on the system provenance graph, we need to further generate the whole-system provenance graph after obtaining the benign and malicious system event log datasets detailed in §3.1. To better quantify the evaluation metrics, we generate provenance graphs for the benign dataset and the malicious dataset respectively. Table 4 summarizes the scale of the benign dataset. The Raw Logs column indicates the number of system events and the size of the raw data. We collect about 40GB of raw log data with 46 million events in total. After generating whole-system provenance graphs, the usage of storage space is reduced dramatically. Moreover, we adopt log reduction algorithms [22–24] to filter out irrelevant events and generate correlation graphs, further reducing the scale of the provenance graph without loss of semantics. Finally, we obtain the correlation graphs of the 5 endpoints, which only account for 51.84MB with about 90 thousand vertices and 255 thousand edges in total.

Table 6. Comparison of detection accuracy.

On the other hand, we obtain raw log data from the replay process of the 6 attack scenarios mentioned above and leverage the same algorithm to generate whole-system provenance graphs and the corresponding correlation graphs. Table 5 shows the details of the number of events and the scale of the correlation graphs. It is worth mentioning that the number of events generated in the 6 attack scenarios accounts for about 8.3‰ of all benign data captured in the real-world environment, which is in line with our general expectations for the percentage of real-world cyber attacks.

3.3 Detection Accuracy

We blend the aforementioned benign dataset with the malicious dataset of each attack scenario to build 6 sets of data. Then we feed each dataset to each threat detection approach for threat analysis. It is worth mentioning that approaches A-4 and A-5 are not suitable for all attack scenarios. Since A-4 leverages the graph matching method, it needs to construct a query subgraph based on known threat intelligence. Among the 6 attack scenarios, only S-3 can obtain consistent threat intelligence, because there are enough incidents exploiting CVE-2017-11882. In addition, approach A-5 is mainly applied to service processes, e.g., Apache HTTP Server, FTP Service, etc. The characteristic of these services is that there is an event-handling loop during their execution. Therefore, using tag propagation to limit the causal dependency of system events to within an event-handling loop can mitigate the dependency explosion problem. To this end, S-4 and S-5, which involve an FTP service and an RDP service respectively, are suitable for the application of A-5. Table 6 shows the detection accuracy of each method we evaluate. The indicator Critical Unit (CU) represents the key malicious events that need to be detected based on our comprehensive understanding of the ground truth, which are listed in detail in Table 3. All Unit (AU) denotes the number of alarms output by each threat detection method we evaluate. Due to the different detection granularity adopted by different detection approaches, an AU is an abnormal system event in A-3 and A-5, a suspicious path in A-1 and A-2, and an anomaly subgraph in A-4. True Unit (TU) indicates the number of true alarms, that is, the CUs detected by each approach.

For each attack scenario, we marked the best approach in bold font in Table 6. Among them, A-2 gains the best results in three of the 6 scenarios, and also obtains relatively good results in the other three scenarios. This is mainly because A-2 not only assigns an anomaly score to each event by constructing a reference model, but also refers to external threat intelligence, e.g., malicious IPs, malicious file extensions, etc., which undoubtedly improves the accuracy of the anomaly score of the event. We can also see that A-3 and A-4 do not achieve good results in all attack scenarios. A-3 uses single-event (edge in graph) granularity as the alarm unit, which greatly weakens its ability to detect modern multi-stage attacks. As for A-4, since full knowledge of modern sophisticated attacks cannot be obtained in advance, the effect of detection methods based solely on threat intelligence is not ideal. As a particular approach, although A-5 is only applicable to S-4 and S-5, it can effectively output all CUs due to its alleviation of the dependency explosion problem. However, the shortcoming of A-5 is also obvious: it outputs too many AUs, i.e., false alarms, and is thus difficult to use in practical applications.

3.4 Time Saved Conducting Investigation

Since current cyber threat detection and investigation is still a manual or semi-automated labor-intensive task, an important indicator for evaluating cyber threat detection approaches is the time saved in the attack investigation. This depends not only on the number of alarms output by the threat detection method but also on whether the alarms are presented in a way that facilitates threat analysis. Since the methods evaluated in this paper are all based on the system provenance graph, all alarms are fed back to security analysts via graphs. This visualization greatly improves the efficiency of the analyst's correlation analysis. To facilitate the quantitative evaluation of diverse methods, we focus on general indicators such as alarm precision and recall rate to assess these approaches.
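As a worked illustration of these indicators, the sketch below derives precision, recall, and F-measure from the CU, AU, and TU counts introduced in §3.3. The mapping used here (precision as TU over AU, recall as TU over CU, F-measure as their harmonic mean) is the standard definition and is an assumption about how the reported figures were computed; the counts in the example are made up and do not come from the paper's tables.

```python
def detection_metrics(critical_units, all_units, true_units):
    """Return (precision, recall, f_measure) from CU, AU, and TU counts.

    Assumed mapping: precision = TU / AU, recall = TU / CU,
    F-measure = harmonic mean of precision and recall.
    """
    precision = true_units / all_units if all_units else 0.0
    recall = true_units / critical_units if critical_units else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Made-up counts: 4 critical units, 10 alarms raised, 3 of them true.
precision, recall, f_measure = detection_metrics(
    critical_units=4, all_units=10, true_units=3
)
```

For these made-up counts the precision is 0.3 and the recall is 0.75. A method that raises thousands of alarms to catch every critical unit keeps recall at 1.0 while precision, and therefore the F-measure, collapses toward zero.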

Figure 3. Comparison of response time.

Table 7 shows the performance of the methods we evaluate in the 6 attack scenarios. As discussed in §3.3, method A-2 achieves the best recall rates of all 5 approaches. Moreover, A-2 is also the best among all evaluated methods in terms of precision rate. This is mainly because A-2 uses the causal path as the alarm unit, so it can capture more CUs with fewer alarms. Although A-1 adopts a method similar to A-2, both its precision rate and its recall rate are slightly weaker than those of A-2. This may be due to the introduction of the Local Outlier Factor (LOF) anomaly detection algorithm to extract the abnormal path, which is sensitive to the homogeneity of captured system events. Note that A-5 uses the single event as the alarm unit, so a large number of alarms appear during the detection process. We sort the number of the same alarms in ascending order and extract the top 5000 distinct alarms for analysis. Although A-5 achieves a 100% recall rate in the two applicable attack scenarios, the low F-measure makes this method impossible to use in practice.

3.5 Runtime Performance

To answer RQ3, we measure the runtime overhead of the approaches we evaluate for the 6 attack scenarios where applicable. Figure 3 shows the comparison of the response time of different methods. We see that the four approaches other than A-5 can accomplish the steps including feature representation, graph embedding, graph query, and anomaly detection within 10 minutes. This is because although A-5 is an instrumentation-free method, a lot of pre-training work is still required to capture the event handling loop of the server application, and this part of the work takes more than 10 minutes to complete. In fact, we spent 6 hours determining the event handling loop of a specific version of the service program to obtain a better detection effect. In addition, we can see that the response time of A-1 is much higher than that of A-2, A-3, and A-4. This is mainly because A-1 needs to convert the system provenance graph from a directed graph (DG) to a directed acyclic graph (DAG) in order to adopt the shortest path algorithm between two vertices with lower time complexity in graph computing. This process consumes more time. A-4 has a similar response time to A-2 and A-3, but it spends time on generating query subgraphs based on threat intelligence, rather than on the anomaly detection stage of A-2 or A-3.

Table 7. Comparison of recall and precision rates, and F-measure.

Table 8. Comparison of space overhead.

3.6 Space Overhead

To evaluate the storage space overhead of these approaches in the threat detection process, we record the size of the intermediate results generated by the methods at different stages in the experiment. Table 8 lists the space overhead of different methods. The values in the table are the averages of the results of the same method in all 6 attack scenarios. Since not every method has an intermediate output at all stages, we can see a big difference in the space usage of these methods. Method A-1 is the approach with the largest space utilization rate. The main reason is that in the process of converting the DG to a DAG, a large number of nodes and corresponding edges need to be split, which greatly expands the scale of the graph. A-2 and A-3 have similar space overhead. A-4 and A-5 have very small space usage because they are not based on anomaly detection algorithms, so no additional reference models or feature representations are needed.

In this section, we discuss the reasons for choosing these 5 provenance graph-based threat detection approaches for evaluation, and under which conditions they are worth applying. In addition, we discuss the threats to validity to identify the limitations of this paper.

4.1 Approaches Selection

As mentioned in §2.1, there are three main types of threat detection methods based on the provenance graph, namely outlier detection (aka anomaly detection), graph matching, and tag propagation. We choose A-1, A-2, and A-3 as typical anomaly detection-based methods for two main reasons: (1) These three methods are a series of continuous research work. Concretely, A-3 uses the single event as the alarm unit, and A-2 further uses the causal path as the alarm unit, while the latest method A-1 optimizes the causal path embedding method of A-2. Through the evaluation of the three methods, we can discover the impact of subtle technical details on the threat detection results. (2) Other similar methods use almost the same technical architecture as these three methods, so including them would not add much to the crosswise comparison. In addition, A-4 is a representative threat detection method based on graph matching. It generates a set of query subgraphs based on threat intelligence to identify whether there are known threats in the system provenance graph. A-5 is a tag propagation method. By setting tags for system events in the same event handling loop, the problem of dependency explosion can be effectively alleviated. Choosing A-4 and A-5 gives us insight into the detection results and impact factors across different types of approaches.

4.2 Approaches Applicability

From our empirical evaluation, we can see that approaches A-1, A-2, and A-3 can be applied to threat detection in a variety of attack scenarios. However, the evaluation results show that A-3 is unable to detect modern multi-stage cyber attacks due to the use of event-granularity alarms. In addition, compared with A-1, approach A-2 has outstanding performance in both precision rate and recall rate. A-1 is more suitable for threat detection of homogeneous system events, e.g., specific stealthy malware. As a graph matching method based on threat intelligence, A-4 relies more on the completeness of threat knowledge, so its ability to detect unknown threats is weak. A-5 is especially suitable for threat detection of server-side applications which receive users' requests and respond to them.

4.3 Threats to Validity

We discuss three threats that affect the validity of our evaluation.

First, only three types of events are retained in our evaluation, as mentioned in Table 1. Without loss of generality, the framework still works for other event types. Since the methods we evaluate use the same dataset and provenance graph generation algorithm, this will not affect the evaluation results.

Second, our dataset is not very large. This would not affect the comparison of different threat detection approaches. Instead, it makes it easier to find which factors cause the differences between them.

Third, the virtual environment that we used to collect the malicious dataset might be slightly different from the real-world environment. To eliminate this deviation, we (1) choose malware that is not sensitive to the environment, and (2) ignore system events that are used for virtual machine control and management. In other words, this preprocessing could prevent the threat detection algorithm from overfitting when training the model due to environmental differences.

DARPA launched the Transparent Computing (TC) Project in 2015 to make new efforts toward threat analysis [25]. The TC project attempts to find a high-fidelity and visualized method to abstract the interaction between components in an opaque system, to implement more effective security policies on distributed systems or multi-component systems. The high complexity of modern information systems obscures the connections between security-related events. Therefore, it is difficult to find advanced attack patterns such as APT with existing methods. The TC project develops the ability to disseminate security-related information, trace the full knowledge of the provenance of events, and ensure that the interaction of the components is consistent with the security policy.

With the funding of the TC project, besides the 5 representative approaches evaluated in this paper, the security community has proposed a large body of research work for cyber threat detection [5,26–29], including our previous work CTScopy [30]. SWIFT [31] proposed a real-time threat detection technique to improve the throughput of causal tracing. This method designed a memory database to reduce the disk operation time as much as possible. Pagoda [32] introduced a hybrid approach that takes into account the anomaly degree of both a single provenance path and the whole provenance graph. Its subsequent work PGaussian [33] presented a method leveraging a Gaussian distribution scheme to detect variants, together with a real-time memory database to improve performance.

Cyber threat investigation and attack reconstruction are another application of system provenance graphs. Given the true alarms, attack reconstruction can help security analysts fully characterize the entire process of cyber attacks [34–36]. The latest development in this aspect is ATLAS [37], a framework for building end-to-end security incidents from existing audit logs. It uses a combination of causality analysis, NLP, and machine learning techniques to model and identify different attack patterns through sequence-based analysis. ATLAS can recover the key attack steps that constitute an attack story with high accuracy and efficiency.

Leveraging the system provenance graph for large-scale malware analysis is a new effort. Our previous work RansomLens [38] proposed a novel approach to analyze the inter-family and intra-family behaviors of ransomware based on the provenance graph of running instances of ransomware. It maps the behavioral characteristics of the ransomware to the indicators of the corresponding provenance graph, e.g., identifying the key encryption process of the ransomware through the closeness centrality (CC) indicator of the graph. This graph computing-based approach can effectively conduct large-scale malware analysis.

In this paper, we implement an analysis framework for empirically evaluating provenance graph-based threat detection approaches. To evaluate diverse kinds of threat detection methods, we collect benign data from a real-world environment, and a malicious dataset via simulating attack scenarios in an isolated virtual environment. We perform crosswise comparison and longitudinal assessment of five state-of-the-art approaches, and the evaluation results are of great significance for understanding provenance graph-based threat detection approaches.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. This work is supported by the National Natural Science Foundation of China (No. U1736218) and the National Key R&D Program of China (No. 2018YFB0804704). The analysis infrastructure and dataset of this work are partially supported by CNCERT/CC.
