Two-phase PFAC algorithm for multiple patterns matching on CUDA GPUs

Wei Shen Lai, Chao Chin Wu, Lien Fu Lai, Min Chi Sie

Research output: Contribution to journalArticle

Abstract

The rapid advancement of high speed networks has resulted in a significantly increasing number of network packets per second nowadays, implying network intrusion detection systems (NIDSs) need to accelerate the inspection of packet content to protect the computer systems from attacks. On average, the pattern matching process in a NIDS consumes approximately 70% of the overall processing time. The conventional Aho–Corasick (AC) algorithm, adopting a finite state machine to identify attack patterns in NIDSs, is too slow to meet the requirement of high speed networks. In view of this, several studies have used the features of a graphics processing unit (GPU) to improve the core searching process of the AC algorithm. For instance, parallel failureless Aho-Corasick (PFAC) algorithm improves the process of pattern matching effectively by removing backward branches in the original finite state machine created using the AC algorithm. In this way, boundary detection can be avoided totally if we allocate an individual thread to each byte of an input stream to identify any pattern starting at the thread’s starting position. However, through analysis, we found that this algorithm experiences a serious load imbalance problem. Therefore, this paper proposes a two-phase PFAC algorithm to address the problem. A threshold is predefined to divide execution into two phases, and the failureless finite state machine is also decoupled into two parts accordingly. In the first phase, every thread identifies patterns by running the tiny part of the decoupled failureless finite state machine that are stored in fast shared memory. In the second phase, all the threads requiring further searching in a same block are regrouped into a few warps for less branch divergence. According to experimental results, the proposed algorithm shows a performance improvement of 50% compared to the PFAC algorithm.

Original languageEnglish
Article number270
JournalElectronics (Switzerland)
Volume8
Issue number3
DOIs
Publication statusPublished - 2019 Mar

Fingerprint

Pattern matching
Finite automata
Intrusion detection
HIgh speed networks
Graphics processing unit
Packet networks
Computer systems
Inspection
Data storage equipment
Processing

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Signal Processing
  • Hardware and Architecture
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

Cite this

@article{753d1fa308b94cd5bd49eed119984eda,
title = "Two-phase PFAC algorithm for multiple patterns matching on CUDA GPUs",
abstract = "The rapid advancement of high speed networks has resulted in a significantly increasing number of network packets per second nowadays, implying network intrusion detection systems (NIDSs) need to accelerate the inspection of packet content to protect the computer systems from attacks. On average, the pattern matching process in a NIDS consumes approximately 70{\%} of the overall processing time. The conventional Aho–Corasick (AC) algorithm, adopting a finite state machine to identify attack patterns in NIDSs, is too slow to meet the requirement of high speed networks. In view of this, several studies have used the features of a graphics processing unit (GPU) to improve the core searching process of the AC algorithm. For instance, parallel failureless Aho-Corasick (PFAC) algorithm improves the process of pattern matching effectively by removing backward branches in the original finite state machine created using the AC algorithm. In this way, boundary detection can be avoided totally if we allocate an individual thread to each byte of an input stream to identify any pattern starting at the thread’s starting position. However, through analysis, we found that this algorithm experiences a serious load imbalance problem. Therefore, this paper proposes a two-phase PFAC algorithm to address the problem. A threshold is predefined to divide execution into two phases, and the failureless finite state machine is also decoupled into two parts accordingly. In the first phase, every thread identifies patterns by running the tiny part of the decoupled failureless finite state machine that are stored in fast shared memory. In the second phase, all the threads requiring further searching in a same block are regrouped into a few warps for less branch divergence. According to experimental results, the proposed algorithm shows a performance improvement of 50{\%} compared to the PFAC algorithm.",
author = "Lai, {Wei Shen} and Wu, {Chao Chin} and Lai, {Lien Fu} and Sie, {Min Chi}",
year = "2019",
month = "3",
doi = "10.3390/electronics8030270",
language = "English",
volume = "8",
journal = "Electronics (Switzerland)",
issn = "2079-9292",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "3",

}

Two-phase PFAC algorithm for multiple patterns matching on CUDA GPUs. / Lai, Wei Shen; Wu, Chao Chin; Lai, Lien Fu; Sie, Min Chi.

In: Electronics (Switzerland), Vol. 8, No. 3, 270, 03.2019.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Two-phase PFAC algorithm for multiple patterns matching on CUDA GPUs

AU - Lai, Wei Shen

AU - Wu, Chao Chin

AU - Lai, Lien Fu

AU - Sie, Min Chi

PY - 2019/3

Y1 - 2019/3

N2 - The rapid advancement of high speed networks has resulted in a significantly increasing number of network packets per second nowadays, implying network intrusion detection systems (NIDSs) need to accelerate the inspection of packet content to protect the computer systems from attacks. On average, the pattern matching process in a NIDS consumes approximately 70% of the overall processing time. The conventional Aho–Corasick (AC) algorithm, adopting a finite state machine to identify attack patterns in NIDSs, is too slow to meet the requirement of high speed networks. In view of this, several studies have used the features of a graphics processing unit (GPU) to improve the core searching process of the AC algorithm. For instance, parallel failureless Aho-Corasick (PFAC) algorithm improves the process of pattern matching effectively by removing backward branches in the original finite state machine created using the AC algorithm. In this way, boundary detection can be avoided totally if we allocate an individual thread to each byte of an input stream to identify any pattern starting at the thread’s starting position. However, through analysis, we found that this algorithm experiences a serious load imbalance problem. Therefore, this paper proposes a two-phase PFAC algorithm to address the problem. A threshold is predefined to divide execution into two phases, and the failureless finite state machine is also decoupled into two parts accordingly. In the first phase, every thread identifies patterns by running the tiny part of the decoupled failureless finite state machine that are stored in fast shared memory. In the second phase, all the threads requiring further searching in a same block are regrouped into a few warps for less branch divergence. According to experimental results, the proposed algorithm shows a performance improvement of 50% compared to the PFAC algorithm.

AB - The rapid advancement of high speed networks has resulted in a significantly increasing number of network packets per second nowadays, implying network intrusion detection systems (NIDSs) need to accelerate the inspection of packet content to protect the computer systems from attacks. On average, the pattern matching process in a NIDS consumes approximately 70% of the overall processing time. The conventional Aho–Corasick (AC) algorithm, adopting a finite state machine to identify attack patterns in NIDSs, is too slow to meet the requirement of high speed networks. In view of this, several studies have used the features of a graphics processing unit (GPU) to improve the core searching process of the AC algorithm. For instance, parallel failureless Aho-Corasick (PFAC) algorithm improves the process of pattern matching effectively by removing backward branches in the original finite state machine created using the AC algorithm. In this way, boundary detection can be avoided totally if we allocate an individual thread to each byte of an input stream to identify any pattern starting at the thread’s starting position. However, through analysis, we found that this algorithm experiences a serious load imbalance problem. Therefore, this paper proposes a two-phase PFAC algorithm to address the problem. A threshold is predefined to divide execution into two phases, and the failureless finite state machine is also decoupled into two parts accordingly. In the first phase, every thread identifies patterns by running the tiny part of the decoupled failureless finite state machine that are stored in fast shared memory. In the second phase, all the threads requiring further searching in a same block are regrouped into a few warps for less branch divergence. According to experimental results, the proposed algorithm shows a performance improvement of 50% compared to the PFAC algorithm.

UR - http://www.scopus.com/inward/record.url?scp=85063473810&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063473810&partnerID=8YFLogxK

U2 - 10.3390/electronics8030270

DO - 10.3390/electronics8030270

M3 - Article

AN - SCOPUS:85063473810

VL - 8

JO - Electronics (Switzerland)

JF - Electronics (Switzerland)

SN - 2079-9292

IS - 3

M1 - 270

ER -