ChatGPT 4 can exploit 87% of one-day vulnerabilities: Is it really that impressive?


After reading about the recent cybersecurity research by Richard Fang, Rohan Bindu, Akul Gupta and Daniel Kang, I had questions. While initially impressed that ChatGPT 4 can exploit the vast majority of one-day vulnerabilities, I started thinking about what the results really mean in the grand scheme of cybersecurity. Most importantly, I wondered how a human cybersecurity professional’s results for the same tasks would compare.


To get some answers, I talked with Shanchieh Yang, Director of Research at the Rochester Institute of Technology’s Global Cybersecurity Institute. He had actually pondered the same questions I did after reading the research.


What are your thoughts on the research study?


Yang: I think that the 87% may be an overstatement, and it would be very helpful if the authors shared more details about their experiments and code so the community could examine them. I look at large language models (LLMs) as a co-pilot for hacking, because you have to give them human instruction, provide some options and offer feedback along the way. In my opinion, an LLM is better suited as an educational training tool than as something you ask to hack autonomously. I also wondered whether the study meant fully autonomous exploitation, with no human intervention at all.
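
To make the "co-pilot" point concrete, here is a minimal Python sketch of the human-in-the-loop pattern Yang describes: the model proposes one step at a time and waits for operator feedback before anything proceeds. The model name "gpt-4" and the prompt wording are illustrative assumptions, not the actual setup from the study.

```python
# Minimal sketch of the "co-pilot" pattern: the LLM suggests next steps,
# but a human reviews and responds to each one. Model name and prompts
# are illustrative assumptions, not the study's agent configuration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [
    {"role": "system",
     "content": "You are a security-training assistant. Suggest one next "
                "reconnaissance or analysis step at a time for an authorized "
                "lab exercise, then wait for the operator's feedback."},
    {"role": "user",
     "content": "Target: a lab VM running a known one-day vulnerable service. "
                "What should I check first?"},
]

while True:
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    suggestion = reply.choices[0].message.content
    print(f"\nLLM suggestion:\n{suggestion}")

    # Human-in-the-loop gate: nothing advances without operator input.
    feedback = input("\nYour feedback (or 'quit'): ")
    if feedback.strip().lower() == "quit":
        break
    messages.append({"role": "assistant", "content": suggestion})
    messages.append({"role": "user", "content": feedback})
```

The point of the loop is that the human, not the model, decides what actually gets executed, which is the distinction Yang draws between a training co-pilot and autonomous exploitation.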


Compared to even six months ago, LLMs are pretty powerful at providing guidance on how a human can exploit a vulnerability, such as recommending tools, giving commands and even laying out a step-by-step process. They are reasonably accurate but not necessarily 100% of the …
