Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrading to version 3.3.0 causes a KeyError on initialising connection to neo4j DB #378

Closed
jonadaly opened this issue Nov 13, 2018 · 78 comments

Comments

@jonadaly
Copy link
Contributor

We have a neo4j instance, populated with the help of neomodel v3.2.9. The v3.3.0 of neomodel breaks when attempting to initialise the connection to the database:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/neomodel/util.py", line 150, in _object_resolution
    resolved_object = self._NODE_CLASS_REGISTRY[frozenset(a_result_attribute[1].labels)].inflate(
KeyError: frozenset({'Person'})

This appears to be as a result of the self._NODE_CLASS_REGISTRY changes included in v3.3.0, which appear to be breaking. Is this a known issue? Given that this is a minor version bump, we would not expect breaking changes to be included.

@aanastasiou
Copy link
Collaborator

aanastasiou commented Nov 15, 2018

Hello @jdalydt

Thanks for letting us know about this. No, it is not a known issue so far and I am sorry if it has caused a headache to you guys.

Can you please confirm if import neomodel comes before any other neomodel imports you might be doing on the file that raises the error?

This is related to the way neomodel creates a mapping between labels and native classes to be able to properly instantiate a class given a generic Node as retrieved from the database. To make sure that every call references the same neomodel "instance", import model has to be placed first.

All the best

P.S.: Can I please ask @Winawer, @mstefaniak10 and @manoadamro to confirm if they also receive a similar error and let us know if the fix that is suggested in this post fixed it?

@jonadaly
Copy link
Contributor Author

Adding import neomodel at the top of the file producing the error has no effect.

The full stack trace is:

File "./lib/python3.6/site-packages/neomodel/match.py", line 564, in get_or_none
    return self.get(**kwargs)
  File "./lib/python3.6/site-packages/neomodel/match.py", line 548, in get
    result = self._get(limit=2, **kwargs)
  File "./lib/python3.6/site-packages/neomodel/match.py", line 539, in _get
    return self.query_cls(self).build_ast()._execute()
  File "./lib/python3.6/site-packages/neomodel/match.py", line 445, in _execute
    results, _ = db.cypher_query(query, self._query_params, resolve_objects=True)
  File "./lib/python3.6/site-packages/neomodel/util.py", line 32, in wrapper
    return func(self, *args, **kwargs)
  File "./lib/python3.6/site-packages/neomodel/util.py", line 202, in cypher_query
    results = self._object_resolution(results)
  File "./lib/python3.6/site-packages/neomodel/util.py", line 162, in _object_resolution
    raise ModelDefinitionMismatch(a_result_attribute[1], self._NODE_CLASS_REGISTRY)
neomodel.exceptions.ModelDefinitionMismatch: Node with labels Person does not resolve to any of the known objects

@manoadamro
Copy link
Contributor

@aanastasiou I can also confirm that this fix had no effect.

@aanastasiou
Copy link
Collaborator

@manoadamro & @jdalydt thank you very much for such a quick response.

At this point, I do not have enough information to give you something more useful. The module passes its tests and it is also used actively in a project I am working on which has not produced any such errors. So, there is not much to work with at my end I am afraid.

Can I please ask you to create a small test case demonstrating how this fails?

In the meantime, the exception is thrown because neomodel has retrieved a Node with a subset of labels that does not match the subset of labels inferred from the (model) class hierarchy (if that is of any help).

Would like to hear more about this so we can fix it however.

All the best

@manoadamro
Copy link
Contributor

@aanastasiou That's not going to be easy.
The problem is, the database was populated using v3.2.9 and is being accessed using v3.3.0.
I'm not sure that can be done with a unit test

@aanastasiou
Copy link
Collaborator

@manoadamro I understand. When I said "...small test case..." I meant something like this. That repo outlines how we got to this change really.

I have not tried accessing the database across versions as you describe, but there should not be a problem as long as the labels that are inferred from the class hierarchy match the labels of the nodes in the database.

This is why, I cannot really tell you, what might be going wrong. I will give it a try to setup "cross-version access" myself, but in the meantime, if you can isolate the problem in a small test script that would be helpful too.

@manoadamro
Copy link
Contributor

I have started a repo, will let you know how it goes 👍

@aanastasiou
Copy link
Collaborator

Thanks for letting me know, BTW, have you run neomodel_install_labels on your database? (here is an example). Does running :schema on the Neo4J Browser returns indices and constraints matching your class hierarchy as expected? If not, then could you give that a try first please?

@manoadamro
Copy link
Contributor

yeah we have this: neomodel_install_labels the_repo.app.py --db bolt://neo4j:neo4j@localhost:7687 as a part of our start up script

@aanastasiou
Copy link
Collaborator

aanastasiou commented Nov 16, 2018

@manoadamro Quick question, related to my earlier comment: Do you have anywhere in your code, something like:

from mymodels import myentity

And within that file not having imported neomodel?

If yes, can you please try this:

import neomodel
from mymodels import myentity

And let me know if that improved the situation?

@manoadamro
Copy link
Contributor

@aanastasiou currently trying to get my head around another error at the moment.

the write_transaction decorator is failing every second time with:
neo4j.exceptions.SecurityError: Failed to establish secure connection to 'EOF occurred in violation of protocol (_ssl.c:749)'

when write_transaction is removed, it works every single time.

🤷‍♂️

@aanastasiou
Copy link
Collaborator

@manoadamro Thanks for letting me know. Wish I could be more helpful but without knowing a bit more about your setup, it is difficult to comment on the exception snippets :/

@manoadamro
Copy link
Contributor

@aanastasiou I have managed to fix one occurrence of the error with your suggested fix (not all though)

this causes unused imports and forces bad code style :(

@aanastasiou
Copy link
Collaborator

@manoadamro That is great, thank you for letting me know. I am not advocating bad code style with my suggestion but since that worked, I now have something to work with on my end and a better idea of how to go about fixing the problem.

If you come up with any specific fixes in the meantime, please let me know or go ahead with a PR.

All the best

@aanastasiou
Copy link
Collaborator

@manoadamro

Background:

  1. neomodel 3.3.0 solves a specific issue with it that allows it to instantiate classes properly in a hierarchy. Prior to this, if a class had an association with an abstract class, then upon instantiation it would instantiate a node to that abstract class, even if subsequent specialisations of that class pointed to specialised versions of the abstract class. You can read more about this here.

  2. This was solved by adding a mapping (a dictionary) in neomodel.util.Database() (tuple(node labels)=>node class) that is populated by the NodeMeta class upon a given StructuredNode definition.

The issue:

  1. The consequence of this is that if a class is not defined, it is unknown to the mapping and consequently it will raise ModelDefinitionMismatch

  2. Now, the question is, how do you get into this condition? So far, I have not been able to reproduce the problem EXACTLY as you mention. But here is the closest I got to:

    • Suppose that we have a project with modules neomodel, A, B, C, D where A,B,C,D are purely various specialisations of StructuredNode (i.e. an application's models).
      • A depends on B
      • C depends on A
      • D depends on A
  3. From the point of view of Node inheritance, it is impossible for a node class to be imported and not end up in the neomodel mapping.

    • The reason why, here, is trivial. Any import, imports the whole module and depending on its form will choose to import a particular symbol to the calling namespace. So doing from D import AClassDefinedInD, imports D (which will import A which imports B) and subsequently AClassDefinedInD to the local namespace.
    • Therefore upon doing this, the mapping is populated with the full lineage of the class and it can instantiate it properly.
  4. BUT, Nodes can maintain relationships with other nodes. In this case, from the point of view of referential integrity we need to know:

    • The data types of the relationship's end points, the direction of the relationship and the multiplicity (one-to-one, one-to-many, etc)
    • The properties of the relationship.
  5. And here is where the issue is because neomodel (currently) does not follow the relationships of a class to check if it might attempt to load a class that has not yet been defined. Therefore, if AClassDefinedInD contains some some_aclass_attribute = neomodel.Relationship("C.ModelWhatever", "REL_TO", model=AClassToWhatever), the C module will not have been imported when importing D which will cause a runtime error if the property is traversed.

    • Notice here, if you did not have to traverse the property, no error will be raised because neomodel is not going to try to instantiate the object on the other side of the relationship which is the one whose definition is not yet in the mapping.
    • This was partly why it was a bit difficult to reproduce the error with trivial attempts.

Temporary Fix:

Now, is the solution for neomodel to actually try and traverse a relationship and automate the loading? (maybe that's too slow (?)). Shall we make the type parameter of the Relationship an actual data type rather than a string? (Maybe that's a bit too constraining (?)).

I have not thought about this yet, maybe the fix ends up being a combination of the two.

As a temporary solution however, I would suggest that you examine your Relationship definitions within a particular module and make sure that you import the classes that might be referenced within that module. Yes, this means that you might end up importing classes that are not used within the module, but this is a temporary fix so that you continue using 3.3.0 which is better at handling inheritance (and also includes a number of other very useful contributions).

Can I please ask you to confirm if this fixes the issue for you?

@manoadamro
Copy link
Contributor

@aanastasiou
Thanks, I really appreciate your efforts here, I will try what you suggested here and get back to you.
👍

@robinedwards
Copy link
Collaborator

Hey @aanastasiou thanks for looking into this. From my recollection the full module path gets stored in the relationship definition. we just need to check if that exists in sys.modules prior to loading and if its missing import it.

@aanastasiou
Copy link
Collaborator

Hi @robinedwards, I have not looked much into resolving it yet but thanks for pointing this out, it really helps.

All the best

@aanastasiou
Copy link
Collaborator

@manoadamro Quick note as I am going through this to automate the loading of a potential class:

The relationship's end-point model can well be a class.

Which then does not lead to any unused imports. We can make this clearer in the documentation.

@theterg
Copy link

theterg commented Nov 23, 2018

I am also running into this issue in a different context which may be of use. I'm attempting to use neomodel from a django environment via django-neomodel. In this environment I have even less control over the ordering of import statements or the threading context in which neomodel is imported. I'm not very well versed on django internals, but it looks as though the global db object is re-instantiated many times during the lifecycle of a django process.

During django startup I see the db object being recreated and my models successfully added to _NODE_CLASS_REGISTRY. However during a webrequest to a View object, I see the db object have been re-instantiated immediately before servicing the request, and my model never gets added back to the _NODE_CLASS_REGISTRY.

It's worth mentioning that neomodel works perfectly fine when invoked from a normal django shell python manage.py shell. I may try to dig a little deepr, but thought i'd give you a heads-up. Maybe a global db object is difficult to manage in this context?

EDIT: I can confirm that reverting to 3.2.9 solves the problem.

@aanastasiou
Copy link
Collaborator

Hey, thanks a lot for contributing your experience, this is very useful. Please see below:

I'm not very well versed on django internals, but it looks as though the global db object is re-instantiated many times during the lifecycle of a django process.

That should not be a problem because neomodel.util.Database inherits from threading.local. So, anything that happens within the same "script" will still use the same neomodel.db object.

during a webrequest to a View object, I see the db object have been re-instantiated immediately before servicing the request, and my model never gets added back to the _NODE_CLASS_REGISTRY.

This is a problem. The short term answer to that is to make sure that all the models that the view is likely to use have been imported to the view (please see earlier discussion in this thread). We are working on a better solution for that.

Maybe a global db object is difficult to manage in this context?

That might be the case but I do not feel that we have enough data to decide categorically on that yet. For example, it is not necessary for the whole model to be loaded to the registry for the resolution to work. Only the part that will be required is necessary. And this does not require a global context.

I can confirm that reverting to 3.2.9 solves the problem.

I have not worked very much with neomodel AND Django, but there was a reason for the 3.3.0 change and while challenging, I think that it is worth working towards a better inheritance "compliance".

All the best

@theterg
Copy link

theterg commented Nov 23, 2018

Thank you for the quick reply and apologies if I came across as critical of your efforts. It is true that I have a small fear and skepticism towards a global db instance being hidden inside of a module at import time. But it's far outweighed by the simplicity and ease of use of the neomodel interface. It was incredibly easy to start using, and the model definitions so far seem super concise and lightweight. I'm grateful for this!

@aanastasiou
Copy link
Collaborator

@theterg Hi, no need for apologies, don't worry about it.

One thing we have not done about this issue is to create a script that replicates the problematic behaviour. At the moment, I am working along the lines of this and this is why I said that I do not have experience in using neomodel with Django.

Can I please ask you the following:

  1. Can you please see previous recommendation regarding importing all of the classes that potentially might be needed by your View and let us know if that solves the problem? Please note that it is not so much the Node classes themselves but also the classes of Node endpoints in Relationships. The ones that are usually passed as strings. (or in other words, cls_name here). So, if you have any relationships that currently define that model as string, please make sure that you import the class they refer to. Once you do that, the registry will be updated with the labels of the class and that should be enough to stop the runtime exception from being raised. We are currently looking into doing this automatically upon loading a relationship.

  2. If 1 does not work for you, can you please create a minimal Django project that demonstrates the failure when neomodel is used in conjunction with Django? This should give us more information for the particular Django use case.

All the best

@aanastasiou
Copy link
Collaborator

aanastasiou commented Nov 26, 2018

Update:
The script that is mentioned in this stack overflow post, works in my development environment without raising the exception. (at least the GET part)

Neo4j-community edition 3.4.9, neomodel 3.3.0, Flask 1.0.2, Python 3.5.2. The only modification that I did was to add:

if __name__ == "__main__":
    create_app().run()  

...at the end of the script and run it with python main.py.

Update 2: No it doesn't :/

The script fails when the application really tries to access the database and THAT is how the problem reveals itself because the neomodel.db during the Flask() instantiation is totally different than the neomodel.db that is created to serve the GET request.

I am not very familiar with the internals of Flask but this could be resolved by somehow preserving the neomodel.db that is created in create_app() (from the example above).

Please let me know if any of you think otherwise but I think that this should be mentioned explicitly in the documentation. Essentially, prior to 3.3.0 neomodel.db did not have state other than whether it was connected to the database or not. But to resolve the objects properly, the "state" (as in, what is in _NODE_CLASS_REGISTRY) needs to be preserved between calls. I think that this is normal and acceptable considering the kind of functionality that is delivered but it should be mentioned explicitly in the documentation.

@aanastasiou
Copy link
Collaborator

aanastasiou commented Jan 8, 2019

@robertlagrant Then it sounds that a new issue should be created. Can you please do this and remove the previous comment to keep this thread focused on 378? (EDIT: It would be interesting to see a description of the "cluster problem" because I have used neomodel in a clustered environment without problems but that was around the time neo4j 3.4.7 or 3.4.8 was around and I was only doing parallel reads. So, as you can see, the devil is in the details which I would be very interested in finding out more about.)

In brief, the situation (about issue 378) is described further below but also, if you scroll back in this discussion you will find additional details about specific issues that I might omit in the following general description:

  1. All the calls from anywhere within an application that uses neomodel are routed via one Database object that is created during module instantiation.

  2. Part of the responsibilities of this object is transaction handling.

    1. A CRUD operation occurs within a transaction context. Transactions can be created / joined automatically or manually.
    2. Prior to executing a given CRUD query, a handle to a transaction is obtained. If a transaction is ongoing it is re-used, otherwise a new one is created.
    3. As per Neo4J's modes of operation, for the DBMS to be able to enforce constraints to a bunch of operations, they have to occur within the same "transaction".
    4. To enforce this mode of operation, the Database is "tied" to the process ID.
  3. Neomodel 3.3.0 introduced state to the Database object via a node-class registry that is used to resolve a neo4j Node object to a local datamodel object.

  4. Within the context of a single-process (and potentially even multi-threaded) application, state is preserved and everything works as expected.

  5. But, with WSGI firing a new process to serve requests, the neomodel.Database object that is instantiated during application creation is not the same object that is attempted to be accessed during the serving of a request.

    • Potential solutions:
      1. Structure the code in such a way that for every request, neomodel has to reload (i.e. re-import) all required data model classes for a particular request.
      2. Save and restore the Database attribute that is responsible for the object resolution
        • This has to be done in a web-framework specific way.
      3. In the long-term we are trying to provide two things:
        1. A way to ensure that neomodel has imported all the required classes BEFORE attempting a CRUD operation. This is already happening in an automatic way, all we would be doing would be to provide a way to specify which classes must be loaded along with module instantiation.
        2. Web framework specific ways of providing database access via neomodel. For example, in the case of Flask, it is possible to provide application wide access to a specific object via an extension.
      4. But in order to come up with one definitive solution for the next release, we need to be clear on how these two issues interract. For example, it might be possible to simply provide a Flask extension that takes care of internals without major changes to neomodel.

This is where we are at the moment. If you would like to help, I would appreciate it if you provided web-framework specific views on how to deal with preserving state, especially for Django. At the moment I am trying to see how to do this with Flask and @mjmare has looked into Pyramid (but I expect that it has a similar mechanism).

There is a bit more work that has taken place in the background, you can see gists and repos further up but overall this is where we are at the moment.

All the best

@aanastasiou
Copy link
Collaborator

@jonadaly @manoadamro @theterg @mzgajner @mostafa @mjmare @MardanovTimur @phoebebright @robertlagrant @robinedwards

Here is a minimal Gist to show:

  1. The way neomodel.db state must be preserved between application creation and serving of the request

  2. The basic skeleton of what will later become something like a flask-neomodel-manager extension for flask. I expect that something similar would have to be created separately for Django, Pyramid and other frameworks if in the meantime we have not come up with a better alternative.

All the best

@jberends
Copy link

jberends commented Jan 22, 2019

@aanastasiou AFAIK in Django you have a DatabaseWrapper object that is thread specific that manages the connection to the database. Each new thread (in development each request is handled in a new thread) or process spawned by some threading/process manager will recreate that DatabaseWrapper with a ConnectionManager inside.

The schema definition is the Django Model definition itself (as created by the user and ensured consistency with the schema in the database by schema migrations once during server startup) and a ModelManager will use the DatabaseWrapper to (eventually lazy just-in-time) create a (new) connection to the database and fire queries using a Cursor. I.o.w. Schema and Connection are disconnected from each other and not dependent. Once the main process is instantiated is assumes a consistent database against the modelschema.

One may choose to re-use the connection, but that is most of the time not the case (never in development, optional in production) to reduce the number of simultaneous connections 'open' to the database. Better to open, do transaction, close. Is neomodel assuming a single open connection?

I am curious to see how neomodel manages this as we experience the same problems with 3.3.0 as anyone else. Will make some effort to do testing.

@aanastasiou
Copy link
Collaborator

aanastasiou commented Jan 22, 2019

@jberends

AFAIK in Django you have a DatabaseWrapper object that is thread specific that manages the connection to the database....

My question was specifically about how Django's middleware uses neomodel. However, from what I see here, this is not very far off from the way neomodel is set up as well. The only problem is maintaining the state of the neomodel.db object (more about this further below).

Is neomodel assuming a single open connection?

I am assuming you mean one connection that needs to remain connected to the server. If that is the case, then no.

I am curious to see how neomodel manages this as we experience the same problems with 3.3.0 as anyone else. Will make some effort to do testing.

Please see this gist which is also linked in my last post. This is a very crude outline of how the problem is dealt with in Flask and it shows how to "save" the state between calls.

I have not looked any further details into this but I am fairly confident that applications can be isolated from each other so that it is possible to run different applications on the same server, all using neomodel but over different connections and schemata.

As mentioned above, I think that this is going to have to be solved on a "per-framework" basis. Django has its own middleware infrastructure so does Pyramid and other frameworks.

I think that it would help if there was a repository that demonstrated how Django models integrate with neomodel (which provides its own models. Do you inherit from both? Do you build Django wrappers around neomodel classes to "force" it to talk to the database via neomodel?). We could then use that one to discuss ways forward. If you have experience with Django middleware and would like to create one for neomodel, even better.

All the best

@robertlagrant
Copy link
Contributor

@aanastasiou two stupid questions:

  1. we use waitress, which doesn't make new processes, so why would we be affected?
  2. with WSGI servers that do create a new process, don't they either a) have to reinitialize everything from scratch, including the imports, or b) already have the memory initialised? (or c) you don't understand, Rob, and here's what you're missing)

@aanastasiou
Copy link
Collaborator

@robertlagrant Thanks for your comment. There are no stupid questions. I do not have a problem with accepting any misunderstanding on my part and I would be glad to get any deeper insight by anyone who would like to offer it.
(Rob?)

  1. I have not used waitress. Do you think you can provide a waitress based counterexample of this situation where the neomodel.db object's state is preserved but this still leads to problems?

  2. a) Yes they do, you might want to see this and this on how this fails. I appreciate that this thread is getting too long by now but it does contain quite a few details on the problem.

@jberends
Copy link

jberends commented Jan 24, 2019

@aanastasiou
I have created a scaffold project for django and neomodel which is able to re-create the conditions between 3.2.9 and 3.3.0.

The repo can be found here: https://github.com/jberends/django_neo_3.3.0
Guide to install can be found here: https://github.com/jberends/django_neo_3.3.0/blob/master/README.md

Indeed it seems that the _NODE_CLASS_REGISTRY is not populated inside the request stream.
I will add some notes in the readme to indicate where the neomodel specific configuration is located.

I use Pycharm and I included a 'run' command that may be used to run/debug the server once loaded.

What I found out is:

  • with neomodel 3.3.0 you can safely create and save a graph node (http://localhost:8000/persons/create/)
  • with neomodel 3.3.0 you cannot retrieve graph nodes, even made with 3.3.0 as the registry fails.
  • found that in neomodel/core.py the db = Database() object does not really seem to be a singleton as indicated in the Database.__init__(). Each thread does import neomodel and hence re-create (and re-__init__) a Database object. While the NodeMeta is only called once in Django during initialisation of the classes. That db object is properly populated with a _NODE_CLASS_REGISTRY, but thread-local Database objects not (re-set to {} in the __init__ of the Database object in each thread). Do you concur with this analysis? Isn't then a possible solution a more robust way of creating a singleton of the Database() object?

update 1445 - Found a possible solution

I am sure it is the design of the 'singleton' of the Database object. The object definition of the Database class is subclassed from threading.local(py2, py3). Which states:

class threading.local¶

A class that represents thread-local data. Thread-local data are data whose values are thread specific. To manage thread-local data, just create an instance of local (or a subclass) and store attributes on it:
[...]
The instance’s values will be different for separate threads.

Once I subclass from object, it works. And while the Database object only contains 'glue' information and does not hold data, it could be a solution in a multi threaded environment (also multi process as each process itself re-inits the django world on its own and have their own process-local Database object with correct parameters.)

A patch would look like:

--- neomodel/util.py
+++ neomodel/util.py
@@ -42,7 +42,7 @@
     db.cypher_query("MATCH (a) DETACH DELETE a")
 
 
-class Database(local):
+class Database(object):
     def __init__(self):
         self._active_transaction = None
         self.url = None

@aanastasiou
Copy link
Collaborator

aanastasiou commented Jan 24, 2019

@jberends Thank you very much for this, the respository is very useful anyway. Regarding the rest of it, it is as you describe it. The Database derives from local and the problem emerges once there is instantiation from different processes.

The bit that is going to be complicated then is this one.

In other words, multi-threading application transactions are pushed under the same "unit of work" but across processes a new connection is created.

This change would mean that even multi-thread calls would create a new connection. Does anyone see any problems with that (?)

Edit: @jberends Just to be clear about something: When people have been talking about integration with Django so far, I assumed that it was about getting Django models to work together with neomodel. That is, having a Django User stored via neomodel, The repository you created seems to be using neomodel separately, to simply access a Graph database.

@jberends
Copy link

jberends commented Jan 24, 2019

@jberends Thank you very much for this, the respository is very useful anyway. Regarding the rest of it, it is as you describe it. The Database derives from local and the problem emerges once there is instantiation from different processes.

Even 'threads' re-instantiate the Database object, not only processes.

The bit that is going to be complicated then is this one.
In other words, multi-threading application transactions are pushed under the same "unit of work" but across processes a new connection is created.
This change would mean that even multi-thread calls would create a new connection. Does anyone see any problems with that (?)

This might be tricky indeed. If I understand correctly:

  • with a Database instance that is shared across threads;
  • only a single transaction can be active (as I understand from the Database.begin()/end()/rollback() methods.
  • all threads share the single interface to the database (reuse the connection).

If this is the case, one would think to separate the concept of storing the 'Model-metadata from the database' and a Connection to the database (which normally can even open and is closed each transaction (eg for postgres) to limit amount of connections to the database). If I see correctly the driver.session is the connection object that is used to execute cypher.

So maybe the Database object can be subclassed from threading.local hower the _NODE_CLASS_REGISTRY should live in a proper-proper singleton.

Conclusion: Factor out the NODE_CLASS_REGISTRY (in short: NCR) into its own separate singleton object, and do not change the rest of the Database object. The Database object can then access the NCR itself in 'read' mode. Other functions that populate the NCR should then populate that singleton, through its interface.

Edit: @jberends Just to be clear about something: When people have been talking about integration with Django so far, I assumed that it was about getting Django models to work together with neomodel. That is, having a Django User stored via neomodel, The repository you created seems to be using neomodel separately, to simply access a Graph database.

Sure, I get that. However neomodel does already provide a nice way to define the model and deal with NodeSet results. A more integrated solution with more 'django-feel' would help traction of adoption of neo4j in django-land IMHO, but that is more for django-neomodel to pick up on.

Update 17:00

I made a small POC implementation on https://github.com/KE-works/neomodel/tree/feature_class_registry_singleton and that works, except 2 tests that alter the registry directly. I did not have time to fix those. This is an ugly POC using globals to provide a dict. A class implementation may be wiser.

Also credits to my collegue @oikonomou with which I had an extensive discussion on the topic. 🤝

@aanastasiou
Copy link
Collaborator

aanastasiou commented Jan 24, 2019

@jberends

If this is the case, one would think to separate the concept of storing the Model-metadata from the
database' and a Connection to the database (which normally can even open and is closed each
transaction (eg for postgres) to limit amount of connections to the database). If I see correctly the
driver.session is the connection object that is used to execute cypher.

Well, a bit more work has taken place around this and a "batched" operations feature (but if possible via multiple units of work and batches of units of work). I have left it as a long term TODO because I was trying to solve it via Database(). The current form is convenient because all Node/Rel calls have to go through the Database independently of where they "live". When you create a StructuredNode, you don't have to specify that its calls should go via a specific Database. If you start creating multiple Database objects, you are going to have to have a way of specifying that you want a particular StructuredNode to route its calls through a particular Database object. This is one thing. The other thing is exposing information of one application from another if there is one massive registry devoted to all instances. It needs a little bit more thinking and this is why, personally, I have left it as a longer term TODO. So, thank you for this note, as my thinking was not going through meta-classes at all.

By the way, if in the meantime someone comes up with a meta-class based solution faster, please submit a pull request.

Factor out the NODE_CLASS_REGISTRY...

Alright, I will see what I can do, but at the moment, I cannot say that I will have this ready by XYZ deadline. If anyone else would like to go towards that direction, please go ahead.

Sure, I get that. However neomodel...

Thanks, I appreciate getting into the trouble of creating the repository and because of this I thought I'd clarify that the way that case fails is clear (to me) and it would not need further demonstration.

@mostafa
Copy link
Contributor

mostafa commented Jan 28, 2019

I've tested the @oikonomou changes, but to no avail! The issue persists!

@aanastasiou
Copy link
Collaborator

@mostafa Thank you for testing this out. Less exclamation marks please. We are all pitching in to this as much as we can.

Can you please confirm that your test-case is not any different than the one described here?

@aanastasiou
Copy link
Collaborator

@robertlagrant A possible way forward if waitress simply does multiple threads (?). Please see the pull request (work not completed as far as I can tell) and discussion there.

@jberends
Copy link

I've tested the @oikonomou changes, but to no avail! The issue persists!

What is your setup?
Do you use django? Or something else?
Which python version are you using?

We've been testing it in Python 3.6 with latest django 2.1 and it works, also on more evolved cases. Furthermore other testcases (in py27) passes.

@aanastasiou
Copy link
Collaborator

aanastasiou commented Jan 28, 2019

@jonadaly @manoadamro @theterg @mzgajner @mostafa @mjmare @MardanovTimur @phoebebright @robertlagrant @robinedwards @jberends @oikonomou

Can I please ask you guys to give it a go and test this branch?

It incorporates a number of pull requests the most important of which for this issue is the one by @oikonomou that so far seems to have solved this issue:

  • It works with this Flask test-case (main_naive.py);
  • It works with this very simple Django test case
    • @mostafa, there is a tiny little bit of configuration you need to do so that the Django app gets to connect successfully to your server; and
  • It also seems to work in a very brief multithreaded test I run over a tiny little cluster to test explicit transactions.

@mjmare, @robertlagrant talked about Pylons / Pyramid before so it would be nice to hear from you if it works there too, although, I do not see why it should not.

Looking forward to hearing from you

@jberends
Copy link

@aanastasiou I can confirm that the branch you provided works without flaws.

@aanastasiou
Copy link
Collaborator

aanastasiou commented Feb 1, 2019

@jberends Thanks for taking the time to perform any additional tests, just bear with me as I am going through a number of other contributions at the moment prior to making the next release please.

@jonadaly
Copy link
Contributor Author

jonadaly commented Feb 5, 2019

@aanastasiou The changes appears to work:

  • I populated a flask+neo4j app with some data using neomodel 3.2.9
  • Using neomodel 3.2.9, I was able to POST additional entities successfully
  • Using neomodel 3.3.0, I was unable to POST additional entities as described in this issue
  • Using the version in the branch specified above, I was able to post entities successfully

Subsequent switching between the versions has the behaviour you would expect (3.2.9 and the above branch work; 3.3.0 results in the node registry problems).

@aanastasiou
Copy link
Collaborator

@jonadaly Thank you for taking the time to do this.

@robertlagrant
Copy link
Contributor

We've had it running in dev for a couple of weeks and it seems stable to us.

@jonadaly
Copy link
Contributor Author

@aanastasiou Given that this seems to be working, when can we expect to see a release with this fix included?

(Appreciate the hard work from all parties investigating and fixing - it's just this has been an issue for a while now so would be good to get it resolved ASAP!)

@aanastasiou
Copy link
Collaborator

Hello everyone, the release was made last Friday. We are now on 3.3.1. I am sorry I did not make a special announcement.

I would also like to thank everyone who took the time to look into any aspect of this issue and give us feedback or suggestions.

All the best

@jberends
Copy link

jberends commented Feb 14, 2019

@aanastasiou

Did you write any changelog or other releasenotes? Next time I can help with that if you like.

@aanastasiou
Copy link
Collaborator

@jberends Yes, I did:

Version 3.3.1 2019-02-08
 * Fixed a number of warnings due to deprecations both within neomodel and pytest and overall improvements in
   code style (#381) - Abhishek Modi
 * Added support for spatial data through neomodel.contrib.spatial_datatypes (#384) - Athanasios Anastasiou
 * Added the ability to filter "left-hand statements" (in NodeSet expressions) too (#395) - Grzegorz Grzywacz
 * Refactor the Node Class Registry to make util.Database a true Singleton (#403) - Giorgos Oikonomou
     * Many thanks to Giorgos Oikonomou, Jon Daly, Adam Romano, Andrew Tergis, Mato Žgajner, Mostafa Moradian, mjmare,
       Phoebe Bright, Robert Grant, jberends and anyone else who helped flag, track and correct this bug. For more
       information, please see: https://github.com/neo4j-contrib/neomodel/issues/378
     * Added the ClassAlreadyDefined exception to prevent against accidental redefinitions of data model classes (#409)
       - Athanasios Anastasiou
 * Added the ability to lazily `.nodes.get()` and `.nodes.all()` (#406) - Matan Hart
 * Fixed a bug in the assumed direction of relationships in _build_merge_query (#408) - MrAnde7son

Will make sure that these changes are reflected in the repository too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests