Specifying and Fuzzing Machine-Learning Models

Machine Learning (ML) models are increasingly prevalent in safety-critical systems, from self-driving cars to aviation, where failures can result in catastrophic outcomes. While researchers have addressed specific properties like robustness and fairness, specifying and checking general functional-correctness expectations of ML models remains challenging. This thesis introduces novel tools and approaches inspired by software testing concepts to specify and fuzz ML artifacts for their functional correctness. Software testing identifies bugs by running programs with given inputs and faces two main challenges: generating test inputs and finding test oracles. Fuzzing is a widely adopted method for generating test inputs, while specifications address the oracle problem. These techniques and concepts have proven effective and crucial for validating software reliability. We tailor these methods to assess ML model reliability. One of the biggest recent advancements in machine learning has been in solving sequential decision-making problems where agents learn action policies. In this thesis, we devise techniques to test the reliability of action policies. Beyond checking whether policies lead to undesirable outcomes, we address the question: how can we identify undesirable yet avoidable outcomes? We present novel test oracles based on metamorphic relations and develop the \(\pi\)-fuzz framework to identify bugs in action policies. Next, we formalize metamorphic relations as \(k\)-safety properties, or hyperproperties, describing relationships between multiple input-output pairs. We show that hyperproperties can specify functional correctness across various ML domains. To express these, we create NOMOS, a declarative, domain-agnostic specification language with an automated testing framework. We demonstrate its effectiveness in finding bugs across various domains including image classification, sentiment analysis, and speech recognition. We also extend NOMOS to accommodate code translation models.
Overall, this thesis contributes to the field by providing a specification language and novel automated testing frameworks to validate the reliability and safety of ML models, which are now prevalent in our daily lives.
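
To make the idea of a metamorphic relation as a 2-safety property concrete, the following is a minimal Python sketch. It is not taken from the thesis and does not use NOMOS syntax or the \(\pi\)-fuzz framework; the toy model, the monotonicity property, and all names (`toy_model`, `violates_monotonicity`, `fuzz`) are hypothetical and chosen only for illustration. The property relates two executions of the same model: if a loan is approved for some income, it should still be approved when only the income increases. Random inputs play the role of the fuzzer, and the relation plays the role of the test oracle.

```python
import random


def toy_model(income: float, debt: float) -> bool:
    """Hypothetical stand-in for a trained model: approve a loan or not."""
    score = income - 2.0 * debt
    # Deliberately planted bug: incomes in [50, 60) are penalized.
    if 50.0 <= income < 60.0:
        score -= 30.0
    return score > 20.0


def violates_monotonicity(model, income: float, debt: float, raise_by: float) -> bool:
    """2-safety oracle over two executions of the model.

    A violation occurs when the model approves (income, debt) but rejects
    (income + raise_by, debt), with raise_by >= 0.
    """
    return model(income, debt) and not model(income + raise_by, debt)


def fuzz(model, trials: int = 10_000, seed: int = 0):
    """Generate random input pairs and collect those that violate the relation."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        income = rng.uniform(0.0, 100.0)
        debt = rng.uniform(0.0, 20.0)
        raise_by = rng.uniform(0.0, 20.0)
        if violates_monotonicity(model, income, debt, raise_by):
            failures.append((income, debt, raise_by))
    return failures


if __name__ == "__main__":
    failures = fuzz(toy_model)
    print(f"violations found: {len(failures)}")
    if failures:
        print("example counterexample (income, debt, raise_by):", failures[0])
```

No ground-truth label is needed for any single input: the oracle only compares the model's outputs on two related inputs, which is what makes such relations usable where an exact expected output is unavailable.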

Metadata
Author:Hasan Ferit Eniser
URN:urn:nbn:de:hbz:386-kluedo-91428
DOI:https://doi.org/10.26204/KLUEDO/9142
Advisor:Maria Christakis
Document Type:Doctoral Thesis
Cumulative document:Yes
Language of publication:English
Date of Publication (online):2025/08/21
Year of first Publication:2025
Publishing Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Granting Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Acceptance Date of the Thesis:2025/06/13
Date of the Publication (Server):2025/08/22
Page Number:XII, 113
Faculties / Organisational entities:Kaiserslautern - Fachbereich Informatik
DDC-Classification:0 General works, computer science, information science / 004 Computer science
Licence:Creative Commons 4.0 - Attribution (CC BY 4.0)