TY - GEN
T1 - BEAM - An Algorithm for Detecting Phishing Link
AU - Cleon Liew, Sea Ran
AU - Law, N. F.
N1 - Publisher Copyright:
© 2022 Asia-Pacific of Signal and Information Processing Association (APSIPA).
PY - 2022/11
Y1 - 2022/11
N2 - This paper aims to develop an attention-based phishing detector by performing sub-word tokenization and fine-tuning the Bidirectional Encoder Representation from Transformers (BERT) model. It is called BERT embedding attention model (BEAM). Our proposed BEAM method contains five building blocks: a data pre-processing block to extract components according to the URL structure, a tokenization block to tokenize the individual URL components into a number of sub-words, an embedding block to produce a numerical sequence representation, an encoding block to give a context feature vector and a classification block for phishing URL detection. The subword tokenization allows us to characterize the relationship among connecting subwords, while the attention mechanism in the BERT allows the proposed model to focus selectively on important parts contributing to phishing behavior. We have compared our proposed BEAM method with other existing state-of-the-art phishing detection methods such as CNN, Bi-LSTM, and machine learning models (random forest and XGBoost). Experimental results confirm that our proposed BEAM method effectively detects phishing links and outperforms other existing methods.
AB - This paper aims to develop an attention-based phishing detector by performing sub-word tokenization and fine-tuning the Bidirectional Encoder Representation from Transformers (BERT) model. It is called BERT embedding attention model (BEAM). Our proposed BEAM method contains five building blocks: a data pre-processing block to extract components according to the URL structure, a tokenization block to tokenize the individual URL components into a number of sub-words, an embedding block to produce a numerical sequence representation, an encoding block to give a context feature vector and a classification block for phishing URL detection. The subword tokenization allows us to characterize the relationship among connecting subwords, while the attention mechanism in the BERT allows the proposed model to focus selectively on important parts contributing to phishing behavior. We have compared our proposed BEAM method with other existing state-of-the-art phishing detection methods such as CNN, Bi-LSTM, and machine learning models (random forest and XGBoost). Experimental results confirm that our proposed BEAM method effectively detects phishing links and outperforms other existing methods.
UR - http://www.scopus.com/inward/record.url?scp=85146290644&partnerID=8YFLogxK
U2 - 10.23919/APSIPAASC55919.2022.9979860
DO - 10.23919/APSIPAASC55919.2022.9979860
M3 - Conference article published in proceeding or book
AN - SCOPUS:85146290644
T3 - Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
SP - 598
EP - 604
BT - Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Y2 - 7 November 2022 through 10 November 2022
ER -