Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix architecture issues #9

Open
bygu4 opened this issue Oct 13, 2024 · 38 comments
Open

Fix architecture issues #9

bygu4 opened this issue Oct 13, 2024 · 38 comments
Assignees

Comments

@bygu4
Copy link
Collaborator

bygu4 commented Oct 13, 2024

Some modules of pyformlang have a strange design that makes this project hard to maintain and develop, and we should think about what we can do to make the architecture cleaner and more concise.

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

This should be in EpsilonNFA:

@classmethod
def from_networkx(cls, graph):
"""
Import a networkx graph into an finite state automaton. \
The imported graph requires to have the good format, i.e. to come \
from the function to_networkx

"""
enfa = finite_automaton.EpsilonNFA()
for s_from in graph:

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

We have to get rid of this abstract method, otherwise we are bound to have cyclic imports:

def to_deterministic(self):
""" Turns the automaton into a deterministic one"""
raise NotImplementedError

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

This shouldn't return None:

def to_state(given: Any) -> State:
""" Transforms the input into a state
Parameters
----------
given : any
What we want to transform
"""
if given is None:
return None
if isinstance(given, State):
return given
return State(given)

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

Probably should be for Any type, because we cast these parameters to State and Symbol anyway:

def add_transition(self, s_from: State, symb_by: Symbol,
s_to: State) -> int:
""" Adds a transition to the nfa

"""
s_from = to_state(s_from)
symb_by = to_symbol(symb_by)
s_to = to_state(s_to)
temp = self._transition_function.add_transition(s_from, symb_by, s_to)

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

Same as previous:

def remove_transition(self, s_from: State, symb_by: Symbol,
s_to: State) -> int:
""" Remove a transition of the nfa

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

Maybe it's a good idea to rename TransitionFunction to DeterministicTransitionFunction, and also create an interface TransitionFunction from which we can inherit both transition function classes

class TransitionFunction:
""" A transition function in a finite automaton.
This is a deterministic transition function.

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

I think it would be better to make an overload for this method, as it's output type is different depending on the number of input parameters:

def __call__(self, s_from: State, symb_by: Symbol = None) -> List[State]:
""" Calls the transition function as a real function

Also probably should return Set rather than List

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

Iterable[Any] as we cast it to Symbols:

def accepts(self, word: Iterable[Symbol]) -> bool:
""" Checks whether the epsilon nfa accepts a given word

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

Add same method for TransitionFunction (but it always returns True):

def is_deterministic(self):
""" Whether the transition function is deterministic

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

I think we can try moving this to NFA and making it a classmethod (like NondeterministicFiniteAutomaton.remove_epsilon_transitions(enfa))

def remove_epsilon_transitions(self) -> "NondeterministicFiniteAutomaton":
""" Removes the epsilon transitions from the automaton
Returns
----------
dfa : :class:`~pyformlang.finite_automaton.\
NondeterministicFiniteAutomaton`
A non-deterministic finite automaton equivalent to the current \
nfa, with no epsilon transition
"""
from pyformlang.finite_automaton import NondeterministicFiniteAutomaton
nfa = NondeterministicFiniteAutomaton()
for state in self._start_state:

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

Same here with DeterministicFiniteAutomaton.from_epsilon_nfa(enfa)

def to_deterministic(self) -> "DeterministicFiniteAutomaton":
""" Transforms the epsilon-nfa into a dfa
Returns
----------
dfa : :class:`~pyformlang.finite_automaton\
.DeterministicFiniteAutomaton`
A dfa equivalent to the current nfa

But it will require some refactoring to make sure we don't use protected enfa members

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

Same story:

def to_deterministic(self) -> "DeterministicFiniteAutomaton":
""" Transforms the nfa into a dfa
Returns
----------
dfa : :class:`~pyformlang.deterministic_finite_automaton\
.DeterministicFiniteAutomaton`
A dfa equivalent to the current nfa

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

Cyclic import here between Regex and EpsilonNFA

def to_regex(self) -> "Regex":
""" Transforms the EpsilonNFA to a regular expression
Returns
----------
regex : :class:`~pyformlang.regular_expression.Regex`
A regular expression equivalent to the current Epsilon NFA
Examples
--------
>>> enfa = EpsilonNFA()
>>> enfa.add_transitions([(0, "abc", 1), (0, "d", 1), \
(0, "epsilon", 2)])
>>> enfa.add_start_state(0)
>>> enfa.add_final_state(1)
>>> regex = enfa.to_regex()
>>> regex.accepts(["abc"])
True
"""
from pyformlang.regular_expression import Regex
enfas = [self.copy() for _ in self._final_states]

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

I think it would be better to make an overload for this method, as it's output type is different depending on the number of input parameters:

def __call__(self, s_from: State, symb_by: Symbol = None) -> List[State]:
""" Calls the transition function as a real function

Also probably should return Set rather than List

Same in NondeterministicTransitionFunction and FiniteAutomaton

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 13, 2024

We should also rework DoublyLinkedList because Node can access List

def __init__(self,
list_in,
next_node=None,
previous_node=None,
value=None):
self.next_node = next_node
self.previous_node = previous_node
self.value = value
self.list_in = list_in
def delete(self):
"""Delete the current node"""
self.list_in.delete(self)

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 14, 2024

This is strange, probably should be in PythonRegex:
https://github.com/Aunsiels/pyformlang/blob/4f36e28492a529d67129dcecaf9d07ca69fb0cf6/pyformlang/regular_expression/regex.py#L546-L573
No, actually we can just remove it

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 14, 2024

I think we could rework Regexable to be inherited by FiniteAutomaton, and it should contain not a method turning it to Regex straightaway, but rather a bunch of methods that would allow to turn it to Regex in the Regex class itself. Or we could just get rid of Regexable

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 14, 2024

https://github.com/Aunsiels/pyformlang/blob/4f36e28492a529d67129dcecaf9d07ca69fb0cf6/pyformlang/finite_automaton/deterministic_finite_automaton.py#L448-L453
self._start_state could be empty here. Property should return Optional[State].
And _start_state should be renamed to _start_states really

@bygu4

This comment was marked as outdated.

@bygu4

This comment was marked as resolved.

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 14, 2024

def _get_production(self, current_symbol, count=0):
next_symbols = []
next_productions = []
for son in self.sons:
next_symbol = "A" + str(count)
count += 1
# pylint: disable=protected-access
new_prods, count = son._get_production(next_symbol, count)
next_symbols.append(next_symbol)
next_productions += new_prods
new_prods = self.head.get_cfg_rules(current_symbol, next_symbols)
next_productions += new_prods
return next_productions, count
def __repr__(self):
return self.head.get_str_repr([str(son) for son in self.sons])

self.head may be None here, maybe we can set it as Empty Symbol in the constructor

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 14, 2024

Probably that's better. It's pretty unclear what Regexable should mean, and it's used only for finite automata anyway
architecture2 drawio

@gsvgit
Copy link
Member

gsvgit commented Oct 15, 2024

@bygu4

  • What the reason to add from_epsilon_nfa to DFA? If we support the chain of transformations, let it be an explicit chain always: EpsilonNFA -> NFA -> DFA.
  • Why to_epsilon_nfa, not from_regexp for EpsilonNFA? Just for uniformity.
  • I'm not sure that I realize hierarchy on Regex*. Why not just functions like from_string or from_python_regex?

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 15, 2024

@gsvgit

  • Yes, I think that's reasonable.
  • We could turn the dependency one way or another, maybe it would be easier to leave to_epsilon_nfa for Regex since it already stores ENFA as a member. But ultimately, we need to think what would make architecture more straightforward. Also we should see what are the relations between regular_expression and cfg.
  • Should we rework regular_expression too? PythonRegex already inherits Regex, doesn't seem like any problems there.

@gsvgit
Copy link
Member

gsvgit commented Oct 16, 2024

  • We could turn the dependency one way or another, maybe it would be easier to leave to_epsilon_nfa for Regex since it already stores ENFA as a member. But ultimately, we need to think what would make architecture more straightforward. Also we should see what are the relations between regular_expression and cfg.

The idea to store ENFA in regex looks strange. Conversion of regex to ENFA is just yet another conversion, similar to ENFA to NFA conversion and so on.

Yes, analysis of relations between CFG, Regexps, RSM, and automata required before final architecture proposal.

  • Should we rework regular_expression too? PythonRegex already inherits Regex, doesn't seem like any problems there.

Maybe not now, but why not. The final goal is to make whole pyformlang better.

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 16, 2024

  • What the reason to add from_epsilon_nfa to DFA? If we support the chain of transformations, let it be an explicit chain always: EpsilonNFA -> NFA -> DFA.

I thought more about it, and maybe it's better to leave from_epsilon_enfa to avoid unnecessary copying. It's already implemented in a suiting way, and that's probably what the maintainer would want.

@bygu4

This comment was marked as resolved.

@bygu4

This comment was marked as resolved.

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 19, 2024

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 20, 2024

This is meant to return StackSymbol, but pda.Epsilon is not it. Probably should rework pda object design

def get_stack_symbol_from(self, stack_symbol):
"""Get a stack symbol"""
if isinstance(stack_symbol, cfg.Epsilon):
return pda.Epsilon()
if self._inverse_stack_symbol[stack_symbol] is None:
value = str(stack_symbol.value)
if isinstance(stack_symbol, cfg.Terminal):
value = "#TERM#" + value
temp = pda.StackSymbol(value)
self._inverse_stack_symbol[stack_symbol] = temp
return temp
return self._inverse_stack_symbol[stack_symbol]

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 20, 2024

I would suggest building context free grammar from PDA in the CFG class

def to_cfg(self) -> "cfg.CFG":
""" Turns the language L generated by this PDA when accepting \
on empty \
stack into a CFG that accepts the same language L
Returns
----------
new_cfg : :class:`~pyformlang.cfg.CFG`
The equivalent CFG
"""
self._cfg_variable_converter = \
CFGVariableConverter(self._states, self._stack_alphabet)
start = cfg.Variable("#StartCFG#")
productions = self._initialize_production_from_start_in_to_cfg(start)
states = self._states
for transition in self._transition_function:
for state in states:
self._cfg_variable_converter.set_valid(
transition[INPUT][STATE],
transition[INPUT][STACK_FROM],
state)
for transition in self._transition_function:
for state in states:
self._process_transition_and_state_to_cfg(productions,
state,
transition)
return cfg.CFG(start_symbol=start, productions=productions)

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 20, 2024

This class is used both for converting PDA to CFG and for intersecting CFG with a deterministic FA. To prevent ambiguity, let's use it for converting PDA only. Also, for simplicity, I would suggest removing CFG/DFA intersection, leaving an option of turning CFG to PDA:

def intersection(self, other: Any) -> "CFG":

With that we could get rid of importing Regex. I think regex module depending on cfg is the right way to go

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 20, 2024

I think it would be better to define intersection of PDA with DFA explicitly:

def intersection(self, other: Any) -> "PDA":

And then we can use Regex -> ENFA -> DFA transition chain

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 21, 2024

This class is used both for converting PDA to CFG and for intersecting CFG with a deterministic FA. To prevent ambiguity, let's use it for converting PDA only. Also, for simplicity, I would suggest removing CFG/DFA intersection, leaving an option of turning CFG to PDA:

On other hand, that would mean converting CFG to PDA and back again, so maybe we can just use less strict annotations for the variable converter

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 25, 2024

_delta should probably be Dict[(State, Symbol), Set[(State, Symbol)]]:

if head in self._delta:
self._delta[head].append((s_to, output_symbols))
else:
self._delta[head] = [(s_to, output_symbols)]

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 25, 2024

If it returns IndexedGrammar maybe we should define intersection in the IndexedGrammar class

def intersection(self, indexed_grammar):

@bygu4
Copy link
Collaborator Author

bygu4 commented Oct 25, 2024

We should define this for FST only, and then transition chain Regex -> ENFA -> FST can be used:

def intersection(self, other: Any) -> "IndexedGrammar":

@bygu4
Copy link
Collaborator Author

bygu4 commented Nov 1, 2024

@property
def boxes(self) -> dict:
""" The set of boxes """
return self._nonterminal_to_box

set(self._nonterminal_to_box.values()) if we want to return a set of Boxes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants