DarkBERT AI Can Fight Cyber Crimes: Trained On The Dark Web

In an unprecedented move, a group of South Korean researchers has developed DarkBERT, a language model trained exclusively on text from the dark web. The goal behind this artificial intelligence tool is to outperform existing language models on dark web content and to help threat researchers, cybersecurity professionals, and law enforcement experts combat cyber threats.

What Is DarkBERT?

DarkBERT is a transformer-based encoder model built on the RoBERTa architecture. It was trained specifically on millions of dark web pages, including data from hacker forums, scamming websites, and other criminal internet sources.

The term dark web refers to a part of the internet that cannot be reached with standard web browsers. It is known for its anonymous markets and websites, many of which are associated with criminal activity, including the trafficking of stolen data, firearms, and narcotics.


The researchers used the Tor network to access the dark web and gather data for DarkBERT's training. To produce a refined dark web dataset, they filtered the raw data using steps such as category balancing, deduplication, and pre-processing. The resulting corpus was then fed to RoBERTa for nearly 15 days of training to produce DarkBERT.
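The article names deduplication and category balancing but does not say how they were carried out, so the following is only a minimal sketch of what such steps can look like. The page records, the category labels, and the `max_per_category` cap are all hypothetical and chosen purely for illustration. This is not the researchers' actual pipeline.

```python
import hashlib
import random
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def deduplicate(pages):
    """Keep the first page for each distinct normalized text (exact-match dedup)."""
    seen = set()
    unique = []
    for page in pages:
        digest = hashlib.sha256(normalize(page["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


def balance_categories(pages, max_per_category, seed=0):
    """Cap every category at max_per_category pages by random downsampling."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for page in pages:
        by_category[page["category"]].append(page)
    balanced = []
    for category in sorted(by_category):
        group = by_category[category]
        if len(group) > max_per_category:
            group = rng.sample(group, max_per_category)
        balanced.extend(group)
    return balanced


# Hypothetical sample records; real crawled pages would be far larger.
pages = [
    {"text": "Fresh  dumps for sale", "category": "marketplace"},
    {"text": "fresh dumps for sale", "category": "marketplace"},
    {"text": "New stock available", "category": "marketplace"},
    {"text": "Escrow accepted here", "category": "marketplace"},
    {"text": "How to configure a relay", "category": "forum"},
]

unique = deduplicate(pages)
balanced = balance_categories(unique, max_per_category=2)
```

Here the second record is dropped as a duplicate of the first, and the marketplace category is then capped at two pages so that it does not dominate the much smaller forum category.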

Cybersecurity Applications of DarkBERT

DarkBERT has a strong grasp of the language cybercriminals commonly use and is good at spotting specific potential threats. It can analyze dark web content and flag cybersecurity threats such as data breaches and ransomware, which makes it a valuable tool in the fight against cyber threats.

The researchers compared it with two well-known NLP models, BERT and RoBERTa, evaluating their performance across three critical cybersecurity-related use cases.

Check Dark Web Forums For Potentially Hazardous Topics

Monitoring dark web forums, which are used to exchange unlawful information, is essential. Finding potentially harmful posts is challenging, and reviewing them manually is time-consuming, so security specialists stand to benefit from automating the process.

Locate Websites That Store Sensitive Information

Ransomware and hacker groups are very active these days. They use the dark web to set up leak sites that publish sensitive and confidential information stolen from firms that refuse to pay ransom demands.

Other fraudsters also post leaked data on the dark web from time to time. This data includes passwords and bank information, which they intend to sell.

Identify Threat-Related Keywords On The Dark Web

DarkBERT uses the fill-mask function, a language model feature of the BERT family, to surface words directly associated with illegal activity, such as drug sales on the dark web.

When the word 'MDMA' was masked on a drug sales page, DarkBERT proposed drug-related terms, whereas the other models proposed generic words unrelated to drugs, such as various professions.

Hence, the tool's ability to find keywords associated with illegal activity may help in addressing emerging cyber risks.
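To give a feel for why the training data matters in a fill-mask task, here is a toy sketch. It is not a transformer and not DarkBERT: it is a simple count-based stand-in that guesses a hidden word from its immediate left and right neighbours. Both tiny corpora, the `<mask>` placeholder, and the function names are invented for illustration.

```python
from collections import Counter, defaultdict

MASK = "<mask>"  # placeholder marking the hidden word (an assumption of this sketch)


def build_context_counts(corpus):
    """Count which word appears between each (left, right) neighbour pair."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i in range(1, len(tokens) - 1):
            counts[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    return counts


def fill_mask(counts, masked_sentence, top_k=3):
    """Return the words most often seen between the mask's two neighbours."""
    tokens = masked_sentence.lower().split()
    i = tokens.index(MASK)
    if i == 0 or i == len(tokens) - 1:
        raise ValueError("this toy model needs a word on both sides of the mask")
    context = (tokens[i - 1], tokens[i + 1])
    candidates = counts.get(context, Counter())
    return [word for word, _ in candidates.most_common(top_k)]


# Invented example sentences standing in for domain-specific and general text.
domain_corpus = [
    "we ship pure mdma pills worldwide",
    "pure xtc pills in stock",
    "pure mdma pills with discreet shipping",
]
general_corpus = [
    "these pure vitamin pills are popular",
    "pure vitamin pills support health",
    "try our pure herbal pills daily",
]

domain_counts = build_context_counts(domain_corpus)
general_counts = build_context_counts(general_corpus)

masked = "selling pure <mask> pills today"
domain_guess = fill_mask(domain_counts, masked)
general_guess = fill_mask(general_counts, masked)
```

The same masked sentence yields drug-related guesses from the domain counts and generic ones from the general counts. That is the effect the article describes, shown on a much simpler model.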


