Decidable Fragments and Solver Techniques for String and Sequence Constraints
- Automated reasoning about complex data structures is fundamental to reliable software verification. Strings, in particular, present a unique challenge in Satisfiability Modulo Theories (SMT) due to the intricate interaction between algebraic structure and arithmetic constraints on length, as well as rich regular-language properties arising from real- world regular-expression libraries. These interactions underpin many verification tasks, including symbolic execution, safety checking of string-heavy code, and analysis of parameterised systems, but often push existing SMT-based approaches to their limits.
This thesis addresses these challenges from a computational perspective, fundamentally organised around the idea of constraint propagation through string functions and regular languages. The first part introduces a proof system for Regular Constraint Propagation
(RCP) and identifies a large decidable class of string constraints, the orderable fragment, for which RCP is complete. This fragment strictly subsumes several major known decidable classes, including the chain-free and straight-line fragments. We also prove a limitation theorem: even when augmented with Nielsen-style transformations and the Cut rule, RCP cannot refute all unsatisfiable constraints over word equations and regular languages.
Building on these foundations, the thesis presents OSTRICH2, a modular string solver that integrates RCP with equational splitting, algebraic-datatype reasoning, cost- enriched solving, and preprocessing. OSTRICH2 provides native support for SMT-LIB Unicode strings, ECMAScript-style regular expressions, and string-to-string transductions. A detailed system description is presented, and experiments show that this portfolio architecture is competitive, particularly on unsatisfiable SMT-LIB instances.
We then demonstrate how modern string solvers enable new forms of verification. We introduce HornStr, the first CHC-based solver over the SMT-LIB string theory, which encodes Regular Model Checking (RMC) as constrained Horn clauses. HornStr synthesizes inductive invariants using automata learning techniques combined with string solvers. This yields effective verification for a range of parameterised systems and produces the incremental benchmark suite for the QF S division in the SMT-LIB standard.
Two complementary contributions broaden the scope of the work. We investigate sequences over infinite alphabets, establishing the first systematic decidability and com- plexity results for combining concatenation with expressive constraints like Parametric Automata (PA) and parametric transducers. Additionally, we conduct a study in al- gorithmic learning of monadic decomposition, which yields a polynomial-time algorithm for learning finite unions of integer hypercubes. This result provides a geometric technique for analysing complex Presburger length constraints, which enables their translation into the monadic fragment and thus makes them suitable for established string solver techniques.