Master Thesis

We offer several master's thesis projects to students following one of the master's programs at the University of Namur. These projects can cover one or more topics related to the research done in the SNAIL team or explore new directions. It is also possible to propose your own project related to our ongoing research. If you think you have a great idea, do not hesitate to contact us, but make sure you have clearly identified the research aspect and novelty of your proposal.

The project can be conducted at the computer science faculty, in collaboration with the members of the team, at another Belgian organization (industry, research center, university, …) with which we have an ongoing collaboration, or abroad at another university in our network.

If you study at a different university and you would like to do a research internship in the context of one of our projects, you should ask your own university supervisor to contact us. We have limited places available but are always interested in new research opportunities.

Master Thesis Projects

Current and Past Projects

MuLLSA: Mutation with LLM and Static Analysis

Hunting for bugs and vulnerabilities is one of the most important tasks in computer science, especially in the context of web applications. There are many techniques to detect and prevent these issues, one of the most widely used being mutation testing. However, creating mutants manually is a time-consuming and error-prone process. To address this, we combine static analysis and an LLM to automatically generate mutants. In this study, we compare the performance of an LLM in producing mutants based on three different sources of vulnerability information: the static analysis tools KAVe and WAP, and the LLM itself. Our results show significant variability between tools. Mutants produced using traditional static analysers vary heavily depending on the type of vulnerability, and tend to perform better when tools are combined. With the LLM, the quality of mutants is more consistent across different vulnerabilities, and the overall code coverage is significantly higher than with traditional approaches. On the other hand, LLM-generated mutants have a higher success rate in passing initial verification, but often contain syntactic or semantic errors in the code. These findings suggest that LLMs are a promising addition to automated vulnerability testing workflows, especially when used in conjunction with static analysis tools. However, further refinement is needed to reduce the generation of incorrect or invalid code and to better align with real-world exploitability.
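
As a rough illustration of the kind of pipeline described above, the sketch below shows how a static-analysis finding could be turned into an LLM prompt that requests a mutant. This is not the thesis's actual implementation: the `Finding` fields, `build_mutation_prompt`, and the `call_llm` callable are hypothetical stand-ins, and the real KAVe/WAP report formats are not shown.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """A simplified static-analysis report entry (fields are illustrative)."""
    file: str
    line: int
    vulnerability: str   # e.g. "SQL injection", "XSS"
    snippet: str         # the flagged source fragment

def build_mutation_prompt(finding: Finding) -> str:
    """Turn a static-analysis finding into an LLM prompt asking for a mutant
    that re-introduces or strengthens the reported vulnerability."""
    return (
        f"The following code at {finding.file}:{finding.line} was flagged "
        f"for {finding.vulnerability}:\n\n{finding.snippet}\n\n"
        "Produce a single mutated version of this snippet that keeps it "
        "syntactically valid but makes the vulnerability exploitable. "
        "Return only the mutated code."
    )

def generate_mutant(finding: Finding, call_llm) -> str:
    """`call_llm` is any callable that sends a prompt to an LLM and returns text;
    the resulting mutant still has to pass compilation and initial verification."""
    return call_llm(build_mutation_prompt(finding))

if __name__ == "__main__":
    echo_llm = lambda prompt: "<?php /* mutant placeholder */ ?>"  # stand-in for a real model
    f = Finding("login.php", 42, "SQL injection",
                '$q = "SELECT * FROM users WHERE id=" . $_GET["id"];')
    print(generate_mutant(f, echo_llm))
```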

Testing for digital arts: An explorative case study on TouchDesigner

As creative coding for interactive art becomes increasingly popular in the digital art world, the need to test works to ensure that they match the artist's expectations is essential. The problem is that the programs supporting these works can behave unexpectedly or cause problems. Some studies have shown that it is possible to test these projects manually, but the use of automated tests has been little studied. This research aims to determine to what extent it is possible to implement automated functional and performance tests in interactive installation projects built with hybrid development tools. To answer this question, we performed a case study on the Wall of fame project in TouchDesigner, combining automated test experiments with an interview. The observations showed that it was possible to carry out functional and performance tests, with limitations on the reliability of the data. Several difficulties were identified: TouchDesigner dependencies, operator limitations, user interface interactions, the lack of a native test environment, performance test limitations, and maintenance difficulties. Solutions to address these issues were also proposed.
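
To give a feel for what automated checks of an interactive installation could look like, here is a minimal sketch. It does not use TouchDesigner's actual Python API: `read_output` and `measure_fps` are hypothetical hooks standing in for queries against the installation's operators and render loop, and the thresholds are arbitrary.

```python
import statistics

# Hypothetical hooks into the installation; in TouchDesigner they would be
# implemented with the application's own Python API, which is not shown here.
def read_output(scene_state: dict) -> dict:
    """Stand-in for querying the installation's visual output state."""
    return {"panel_visible": scene_state.get("visitor_detected", False)}

def measure_fps(duration_s: float = 2.0) -> float:
    """Stand-in for sampling the render loop's frame rate."""
    samples = [60.0 for _ in range(int(duration_s * 10))]  # placeholder data
    return statistics.mean(samples)

def test_visitor_triggers_panel():
    """Functional check: a detected visitor should make the panel visible."""
    state = {"visitor_detected": True}
    assert read_output(state)["panel_visible"] is True

def test_framerate_stays_interactive():
    """Performance check: the installation should stay above 30 FPS."""
    assert measure_fps() >= 30.0

if __name__ == "__main__":
    test_visitor_triggers_panel()
    test_framerate_stays_interactive()
    print("all checks passed")
```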

Improving automated unit test generation for machine learning libraries using structured input data

The field of automated test case generation has grown considerably in recent years, aiming to reduce software testing costs and find bugs. However, techniques for automatically generating test cases for machine learning libraries still produce low-quality tests, and papers on the subject tend to target Java, whereas the machine learning community mostly works in Python. Some papers have attempted to explain the causes of these poor-quality tests and to make automated test generation possible in Python, but they are still fairly recent, and no study has yet attempted to improve these test cases in Python. In this thesis, we introduce two improvements to Pynguin, an automated test case generation tool for Python, to generate better test cases for machine learning libraries using structured input data and to better handle crashes from C-extension modules. Based on a set of seven modules, we show that our approach covers lines of code unreachable with the traditional approach and generates error-revealing test cases. We expect our approach to serve as a starting point for integrating testers' knowledge of program input data more easily into automated test case generation tools and for building tools that find more crash-inducing bugs.
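
The idea of structured input data can be sketched as follows. This is not Pynguin's real extension mechanism: `structured_inputs` and `probe` are illustrative names, and the example merely assumes NumPy and scikit-learn are installed.

```python
import random
import numpy as np

def structured_inputs(n: int = 10):
    """Yield ML-flavoured inputs (feature matrices and label vectors) instead of
    the unstructured primitives a generic test generator would produce."""
    for _ in range(n):
        rows = random.randint(1, 50)
        cols = random.randint(1, 10)
        X = np.random.rand(rows, cols)
        y = np.random.randint(0, 2, size=rows)
        yield X, y

def probe(function_under_test):
    """Call the target with structured inputs and record crashing cases,
    mirroring how error-revealing test cases can be harvested."""
    failures = []
    for X, y in structured_inputs():
        try:
            function_under_test(X, y)
        except Exception as exc:  # a crash here becomes a candidate test case
            failures.append((X.shape, type(exc).__name__))
    return failures

if __name__ == "__main__":
    from sklearn.linear_model import LogisticRegression
    failures = probe(lambda X, y: LogisticRegression(max_iter=50).fit(X, y))
    print(failures)
```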

Training machine learning models for vulnerability prediction and injection using datasets of vulnerability-inducing commits

Multiple techniques exist to find vulnerabilities in code, such as static analysis and machine learning. Although machine learning techniques are promising, they need to learn from a large quantity of examples. Since no such large quantity of vulnerable code exists, vulnerability injection techniques have been developed to create it. Both vulnerability prediction and injection techniques based on machine learning usually use the same kind of data: pairs of vulnerable code, taken just before the fix, and its fixed version. However, using the fixed version is not realistic, as the vulnerability was introduced in a different version of the code that may differ greatly from the fixed version. We therefore propose using pairs consisting of the code that introduced the vulnerability and its previous version. This is more realistic, but it is only relevant if machine learning techniques can properly learn from it and if the learned patterns differ significantly from those obtained with the usual method. To verify this, we trained vulnerability prediction models on both kinds of data and compared their performance. Our analysis showed that a model trained on pairs of vulnerable code and their fixed versions is unable to predict vulnerabilities from the vulnerability-introducing versions, and vice versa, even though both models properly learn from their own data and detect vulnerabilities in similar data. We therefore conclude that using vulnerability-introducing code for machine learning training is more relevant than using the fixed versions.
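
A toy sketch of the comparison is shown below: train a simple classifier on one kind of code pair and apply it to the other. The code snippets, labels, and `train` helper are illustrative only and bear no relation to the thesis's actual dataset or models; the example assumes scikit-learn is installed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy examples standing in for real commit-mined code pairs.
# Label 1 = vulnerable / vulnerability-introducing, 0 = its safe counterpart.
fix_based = [  # usual setting: vulnerable version vs. its fixed version
    ('query("SELECT * FROM t WHERE id=" + uid)', 1),
    ('query("SELECT * FROM t WHERE id=?", uid)', 0),
]
introduction_based = [  # proposed setting: introducing commit vs. its parent
    ('render(user_comment)', 0),
    ('render(unescape(user_comment))', 1),
]

def train(pairs):
    """Fit a character n-gram classifier on labelled code snippets."""
    texts, labels = zip(*pairs)
    model = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
                          LogisticRegression())
    return model.fit(texts, labels)

# Train on fix-based pairs, test on introduction-based pairs (and vice versa)
# to see whether the learned patterns transfer between the two settings.
model = train(fix_based)
for code, label in introduction_based:
    print(label, model.predict([code])[0])
```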

Leveraging Large Language Models to Automatically Infer RESTful API Specifications

Application Programming Interfaces, known as APIs, are increasingly popular in modern web applications. With APIs, users around the world are able to access a plethora of data contained in numerous server databases. To understand how an API works, formal documentation is required. This documentation is also needed by API testing tools aimed at improving the reliability of APIs. However, as writing API documentation can be time-consuming, API developers often overlook the process, resulting in unavailable, incomplete, or informal documentation. Recent Large Language Model technologies such as ChatGPT, trained on billions of resources across the web, have shown exceptional capabilities at automating tasks. Such capabilities could therefore be used to generate API documentation. This Master's Thesis proposes the first approach leveraging Large Language Models to automatically infer RESTful API specifications. Preliminary strategies are explored, leading to the implementation of a tool called MutGPT. MutGPT discovers API features by generating and modifying valid API requests with the help of Large Language Models. Experimental results demonstrate that MutGPT is capable of sufficiently inferring the specification of the tested APIs, with an average route discovery rate of 82.49% and an average parameter discovery rate of 75.10%. Additionally, MutGPT discovered two undocumented and valid routes of a tested API, which was confirmed by the relevant developers. Overall, this Master's Thesis makes two new contributions:
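
The request-mutation loop behind a tool like MutGPT could be sketched as follows. This is not MutGPT's actual implementation: `propose_requests`, `infer_spec`, and the `call_llm` callable are hypothetical, and the example assumes the `requests` library and a locally running API at an arbitrary address.

```python
import requests

def propose_requests(base_url: str, known: list, call_llm) -> list:
    """Ask an LLM (here: any callable returning candidate paths) for new
    routes or parameter combinations, seeded with what already worked."""
    prompt = f"Known working endpoints on {base_url}: {known}. Suggest new candidate paths."
    return call_llm(prompt)

def infer_spec(base_url: str, call_llm, rounds: int = 3) -> dict:
    """Iteratively send candidate requests and keep those the API accepts,
    building up a rough picture of its routes and parameters."""
    discovered: dict = {}
    known: list = ["/"]
    for _ in range(rounds):
        for path in propose_requests(base_url, known, call_llm):
            try:
                resp = requests.get(base_url + path, timeout=5)
            except requests.RequestException:
                continue
            if resp.status_code != 404:  # the route exists in some form
                discovered[path] = resp.status_code
                known.append(path)
    return discovered

if __name__ == "__main__":
    fake_llm = lambda prompt: ["/users", "/users?limit=1", "/items"]  # stand-in for a real model
    print(infer_spec("http://localhost:8000", fake_llm, rounds=1))
```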