Study confirms that ChatGPT is good at hacking

The results of new academic research warn of just how far the AI model can go in the fields of programming and computer security.

Since the first reviews of ChatGPT, it has been noted that the language model can help programmers review and correct code, although not without some mistakes. This led some to warn of the technology's potential for misuse by hackers and cybercriminals, who could use it to develop malware or find weak points in systems.

These ideas have now moved from suspicion to well-founded concern. A study published by an American university states that OpenAI’s GPT-4 artificial intelligence model is capable of exploiting vulnerabilities in real systems simply by reading security reports that describe a flaw in a given service or platform.

To run the experiment, the group of four UIUC researchers collected “a dataset of 15 one-day vulnerabilities that include ones categorized as critical severity in the CVE description.”

CVE stands for Common Vulnerabilities and Exposures. It is a United States-funded registry that publicly catalogs vulnerabilities discovered in the code of software and services, with the aim of alerting users and development teams so that they can take action.
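To illustrate what a registry record looks like in practice, here is a minimal sketch that retrieves a single CVE entry from the public NVD (National Vulnerability Database) API. It assumes network access and Python's third-party requests library, and uses Log4Shell (CVE-2021-44228) purely as a well-known example, not one of the 15 vulnerabilities from the study.

import requests

# Public NVD REST endpoint for CVE records (API version 2.0).
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def fetch_cve(cve_id):
    """Fetch a single CVE record as JSON from the public NVD API."""
    resp = requests.get(NVD_API, params={"cveId": cve_id}, timeout=30)
    resp.raise_for_status()
    return resp.json()

record = fetch_cve("CVE-2021-44228")  # Log4Shell, used here only as a familiar example
cve = record["vulnerabilities"][0]["cve"]
# Print the identifier and the start of the English-language description.
description = next(d["value"] for d in cve["descriptions"] if d["lang"] == "en")
print(cve["id"], "-", description[:120])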

Thus, when provided with the CVE description, “GPT-4 is able to exploit 87% of these vulnerabilities compared to 0% for all other models we tested (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit),” the study indicates.

The one-day vulnerabilities the study refers to are flaws that have been publicly disclosed but not yet patched. The door is therefore open for someone with the necessary knowledge to exploit them for as long as they remain present on the affected platform.

Cutting off the AI’s access to the CVE list would prevent the model from performing these tasks. However, the list is publicly accessible, and experts do not consider it appropriate to restrict access to it.

Outperforms most competing models

As the researchers report, OpenAI’s GPT-4 surpasses all other LLMs (large language models) at generating malicious code. The only two notable models left out of the study are Anthropic’s similarly powerful Claude 3 and Google’s Gemini 1.5 Pro. The four academics did not have access to them, so they were unable to measure how accurate those two AIs are at this task.

Speaking to the media, a professor at the university behind the study confirmed that GPT-4 can, in fact, carry out on its own the steps needed to exploit vulnerabilities that open-source vulnerability scanners fail to find.

We are therefore talking not only about an effective way to locate and exploit vulnerabilities, but also about a cheap one. According to the study, a successful attack with GPT-4 costs about $8.80 per exploit, making it around 2.8 times cheaper than paying a human cybersecurity professional to carry out the same task in half an hour.
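As a back-of-the-envelope check, the 2.8x figure is consistent with valuing the professional's half hour at about $25, i.e. roughly $50 per hour. That hourly rate is an assumption here; the article only states the duration and the final ratio.

# Cost comparison implied by the study's figures.
gpt4_cost_per_exploit = 8.80   # dollars per successful exploit, as reported in the study
human_hourly_rate = 50.0       # dollars/hour; assumed, not stated in the article
task_duration_hours = 0.5      # "half an hour", per the article

human_cost = human_hourly_rate * task_duration_hours   # $25.00
ratio = human_cost / gpt4_cost_per_exploit             # ~2.84, i.e. "around 2.8 times cheaper"
print(f"human: ${human_cost:.2f}, GPT-4: ${gpt4_cost_per_exploit:.2f}, ratio: {ratio:.1f}x")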

“Our findings raise questions about the widespread deployment of highly capable LLM agents,” the study concludes.