Python library for identifying (learning) minimal DFAs from labeled examples by reduction to SAT.
Table of Contents
If you just need to use dfa
, you can just run:
$ pip install dfa
For developers, note that this project uses the poetry python package/dependency management tool. Please familarize yourself with it and then run:
$ poetry install
dfa_identify
is centered around the find_dfa
and find_dfas
function. Both take in
sequences of accepting and rejecting "words", where are word is a
sequence of arbitrary python objects.
-
find_dfas
returns all minimally sized (noDFA
s exist of size smaller) consistent with the given labeled data. -
find_dfa
returns an arbitrary (first) minimally sizedDFA
.
The returned DFA
object is from the dfa library.
from dfa_identify import find_dfa
accepting = ['a', 'abaa', 'bb']
rejecting = ['abb', 'b']
my_dfa = find_dfa(accepting=accepting, rejecting=rejecting)
assert all(my_dfa.label(x) for x in accepting)
assert all(not my_dfa.label(x) for x in rejecting)
Because words are sequences of arbitrary python objects, the
identification problem, with a
↦ 0 and b
↦ 1, is given below:
accepting = [[0], [0, 'z', 0, 0], ['z', 'z']]
rejecting = [[0, 'z', 'z'], ['z']]
my_dfa = find_dfa(accepting=accepting, rejecting=rejecting)
There are also active variants of find_dfa
and find_dfas
called
find_dfa_active
and find_dfas_active
resp.
An example from the unit tests:
import dfa
from dfa_identify import find_dfa_active
def oracle(word):
return sum(word) % 2 == 0
lang = find_dfa_active(alphabet=[0, 1],
oracle=oracle,
n_queries=20)
assert lang == dfa.DFA(
inputs=[0,1],
label=lambda s: s,
transition=lambda s, c: s ^ bool(c),
start=True
)
# Can also send in positive and negative examples:
lang = find_dfa_active(alphabet=[0, 1],
positive=[(0,), (0,0)],
negative=[(1,), (1,0)],
oracle=oracle,
n_queries=20)
The is also support for learning decomposed DFAs following, Learning Deterministic Finite Automata Decompositions from Examples and Demonstrations. FMCAD`22. These are tuples of DFAs whose labels are combined to determine the acceptence / rejection of string, e.g., via conjunction or disjunction.
Similar to learning a monolithic dfa, this functionality can
be accessed using find_decomposed_dfas
. For example:
from dfa_identify import find_decomposed_dfas
accepting = ['y', 'yy', 'gy', 'bgy', 'bbgy', 'bggy']
rejecting = ['', 'r', 'ry', 'by', 'yr', 'gr', 'rr', 'rry', 'rygy']
# --------------------------------------
# 1. Learn a disjunctive decomposition.
# --------------------------------------
gen_dfas = find_decomposed_dfas(accepting=accepting,
rejecting=rejecting,
n_dfas=2,
order_by_stutter=True,
decompose_via="disjunction")
dfas = next(gen_dfas)
monolithic = dfas[0] | dfas[1] # Build DFA that is the union of languages.
assert all(monolithic.label(w) for w in accepting)
assert not any(monolithic.label(w) for w in rejecting)
# Each dfa must reject a rejecting string.
assert all(all(~d.label(w) for d in dfas) for w in rejecting)
# At least one dfa must accept an accepting string.
assert all(any(d.label(w) for d in dfas) for w in accepting)
# --------------------------------------
# 2. Learn a conjunctive decomposition.
# --------------------------------------
gen_dfas = find_decomposed_dfas(accepting=accepting,
rejecting=rejecting,
n_dfas=2,
order_by_stutter=True,
decompose_via="conjunction")
dfas = next(gen_dfas)
monolithic = dfas[0] & dfas[1] # Build DFA that is the union of languages.
assert all(monolithic.label(w) for w in accepting)
assert not any(monolithic.label(w) for w in rejecting)
There are two forms of "minimality" supported by dfa-identify
.
- By default, dfa-identify returns DFAs that have the minimum number of states required to seperate the accepting and rejecting set.
- If the
order_by_stutter
flag is set toTrue
, then thefind_dfas
(lazily) orders the DFAs so that the number of self loops (stuttering transitions) appearing the DFAs decreases.find_dfa
thus returns a DFA with the most number of self loops given the minimal number of states.
This library currently uses the encodings outlined in Heule, Marijn JH, and Sicco Verwer. "Exact DFA identification using SAT solvers." International Colloquium on Grammatical Inference. Springer, Berlin, Heidelberg, 2010. and Ulyantsev, Vladimir, Ilya Zakirzyanov, and Anatoly Shalyto. "Symmetry Breaking Predicates for SAT-based DFA Identification.".
The key difference is in the use of the symmetry breaking clauses. Two kinds are exposed.
- clique (Heule 2010): Partially breaks symmetries by analyzing conflict graph.
- bfs (Ulyantsev 2016): Breaks all symmetries so that each model corresponds to a unique DFA.
There are many other python libraries that perform DFA and other automata inference.
- DFA-Inductor-py - State of the art passive inference via reduction to SAT (as of 2019).
- z3gi: Uses SMT backed passive learning algorithm.
- lstar: Active learning algorithm based L* derivative.
The primary goal of this library is to loosely track the state of the art in passive SAT based inference while providing a simple implementation and API.