Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for supporting ancestor keys in fixtures #3

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

john2x
Copy link
Contributor

@john2x john2x commented Jan 31, 2015

If ancestor keys are desired, then the whole ancestor
path will need to be defined explicitly in the fixture data.

Custom keys are specified by the '__key__' special attribute with
its value being an array of kind and id pairs. The array will
need to have an even number of elements (the same specs
as defined here https://cloud.google.com/appengine/docs/python/ndb/entities)

e.g.

[{ ...
    "__key__": ["ParentKind", "parent_id", "SomeKind", "some_id"]
 }]

P.S.
on line # 59 in loader.py, I moved the post_processor(obj) call to outside of the for loop (I believe this was a bug? As calling the post_processor in the loop would trigger it for each attribute. Let me know if that was intended instead.)

EDIT: After reviewing, it seems __key__ (instead of _key) would be a better attribute name?

@rbanffy
Copy link
Owner

rbanffy commented Feb 2, 2015

Thanks a lot, John!

I was imagining in turning the JSON loader inside out so that ancestors would be .put() before their children, allowing ancestor/children to be defined using the same syntax as non-ancestor children.

    {
        "__kind__": "Person",
        "born": "1968-03-03T00:00:00",
        "first_name": "John",
        "last_name": "Doe",

        ...

        "__children__": [
            {
                "__kind__": "Person",
                "born": "1970-04-27T00:00:00",

                ...

                "__children__": [
                    {
                        "__kind__": "Person",
                        "born": "1980-05-25T00:00:00",
                        "first_name": "Bob",

                        ...

                        "userid": 3
                    }
                ]
            }
        ]
    },

Would you mind splitting the postprocessor fix and the README update as separate pull requests so we can discuss ancestors a little more before we go this way or the other? My children key seems to me (obviously) more natural (the JSON itself is a hierarchy) but is a lot more work than your approach and I'd like to mature both ideas before we commit to The One True Way of doing ancestors.

@john2x
Copy link
Contributor Author

john2x commented Feb 2, 2015

Hi!

I've created the two separate pull requests for the README and post_processor changes.

Hmm one potential issue with the nested way of defining ancestors/children is that two entities can't share the same parent/child, as each will have its own 'copy' of its parent/child.

Also, I just realized while testing this that the ancestor entities don't actually need to be saved first before any of its descendants. Setting an entity's parent to a key with no value is perfectly valid, although querying its parent later on would fail.

EDIT: My proposal would be to something like:

[{"__kind__": "Person",
  "born": "1968-03-03T00:00:00",
  "first_name": "John",
  "last_name": "Doe",
  "children": [["Person", 2], ["Person", 3]]},
 {"__kind__": "Person",
  "__key__": ["Person", 2],
  "born": "1980-03-03T00:00:00",
  "first_name": "Junior",
  "last_name": "Doe",
  "children": []},
 {"__kind__": "Person",
  "__key__": ["Person", 3],
  "born": "1985-03-03T00:00:00",
  "first_name": "Jane",
  "last_name": "Doe",
  "children": []}]

@rbanffy
Copy link
Owner

rbanffy commented Feb 2, 2015

Not sure. The original idea I had was to use the "__children__" key with a list of dicts as its value. This way, an ancestor could have any number of direct descendants.

Once you save an entity to the datastore, you cannot change its ancestor. If we want to be able not to specify a key, we need it .put() (and a key generated) before we .put() the children.

@john2x
Copy link
Contributor Author

john2x commented Feb 3, 2015

The original idea I had was to use the "children" key with a list of dicts as its value. This way, an ancestor could have any number of direct descendants.

Ah right, that handles the multiple descendants scenario. Hmm but how would entities that share the same KeyProperty references be defined?

If we want to be able not to specify a key, we need it .put() (and a key generated) before we .put() the children.

Ah yes, that would be one of the requirements with my proposal, that keys need to be explicitly specified if you want to use it as a parent/child somewhere (which for me isn't so bad, since keys are easy enough to define).

Also, it helps keep your fixtures clean by allowing to keep entities of different kinds in separate fixture files. e.g.:

person.json
----------------
[{"__kind__": "Person",
  "born": "1968-03-03T00:00:00",
  "first_name": "John",
  "last_name": "Doe",
  "pets": [["Pet", "spot"]]}]

pet.json
----------
[{"__kind__": "Pet",
  "__key__": ["Pet", "spot"],
  "name": "Spot",
  "species": "dog"}]

Downsides though is that it would only work with NDB and would introduce backward incompatible changes.

Thanks!

@rbanffy
Copy link
Owner

rbanffy commented Feb 4, 2015

I have no big problem with being backward-incompatible. This tool is very new and not many people are using it.

I'm not sure of what you meant by "how would entities that share the same KeyProperty references be defined?". Using the same syntax we use for non-ancestor references, we simply have a list of objects (that could be of multiple kinds) under the __children__ key.

There are reasons to keep different kinds on different files, but having explicit identifiers referring data on different files will introduce inter-file dependencies - it would make no sense to import the children if you do not import their parents before.

@john2x
Copy link
Contributor Author

john2x commented Feb 5, 2015

I'm not sure of what you meant by "how would entities that share the same KeyProperty references be defined?"

I mean something like this:

[
 {"__kind__": "Person",
  "first_name": "John",
  "last_name": "Doe",
  "__children__pets__": [{
    "__kind__": "Pet",
    "name": "Spot"
  }]},
 {"__kind__": "Person",
  "first_name": "Jane",
  "last_name": "Doe",
  "__children__pets__": [{  // would create a separate Pet for "Spot"?
    "__kind__": "Pet",
    "name": "Spot"
  }]},
]

@rbanffy
Copy link
Owner

rbanffy commented Feb 5, 2015

You are right. In this case you'd have to model Spot as the parent Pet and John and Jane as the __children__ Person's, but it seems there is a space for both methods.

In any case, let's, for now, include a __key__ JSON key that lets you manually define the key. We could then have a __parent__ that would take kind name and key and make an ndb.Key out of them. To keep it simple, I'd go with your idea of 2-item lists instead of a dict.

This way we'd have::

[
    {"__kind__": "Person",
    "__key__": "jdoe",
    "first_name": "John",
    "last_name": "Doe"
    },
   {"__kind__": "Person",
    "__parent__": ["Person", "jdoe"],
    "first_name": "Jane",
    "last_name": "Doe",
    },
]

equivalent to:

john = Person(id = 'jdoe', first_name = 'John', last_name = 'Doe').put()
jane = Person(parent = ndb.Key('Person', 'jdoe'), first_name = 'Jane', 
    'last_name' = 'Doe').put()

Maybe it'd be better to rename __key__ to __id__ to be less confusing to someone familiar with ndb.

@john2x
Copy link
Contributor Author

john2x commented Feb 5, 2015

Yes, having __key__, __id__ and __parent__ would bring it closer to parity with the model constructor.

it seems there is a space for both methods.

I agree. So for scenarios where a nested definition of KeyProperty references is a better fit, the __children__some_property__ form will do. And for other scenarios where it doesn't fit (as with my use-case), specifying keys to some_property will also work.

(Would the __children__ form take precedence if/when both methods are used in a fixture (e.g. by mistake)?)

@rbanffy
Copy link
Owner

rbanffy commented Feb 6, 2015

Since __children__ defines entities whose parent is the current entity and __parent__ points to a specific entity that may or may not be in the same imported file (or exist) through a key generated from a parent kind and an id, it would not make sense to declare a __parent__ key on an object that's under a __children__ key. Manually declaring __key__ should be harmless.

We should probably throw an error.

If ancestor keys are desired, then the whole ancestor
path will need to be defined explicitly in the fixture data.

Custom keys are specified by the `'_key'` special attribute with
its value being an array of kind and id pairs. The array will
need to have an even number of elements (the same specs
as defined here https://cloud.google.com/appengine/docs/python/ndb/entities)

e.g.

[{ ...
  "_key": ["ParentKind", "parent_id", "SomeKind", "some_id"]
}]
@john2x
Copy link
Contributor Author

john2x commented Feb 8, 2015

Hello! I pushed some updates (rebased from recent master).

So with these changes, __key__, __parent__ and __id__ are now valid ways to define an entity's key.

Also added support for __children__, but it's not as elegant as I wanted it to be, as I had to re-create the object with a new key and delete the old one (https://github.com/rbanffy/appengine-fixture-loader/pull/3/files#diff-afdeeb01fcabc03e6ad8a633b948d6a5R74). Maybe there's a better way?

Somewhat off-topic, but one thing I'm confused about the __children__key_property__ form. I just realized that it's not very intuitive.

e.g. for this definition

{
   "__kind__": "Person",
   "name": "John"
   "__children__owner__": [
     {
       "__kind__": "Dog",
       "name": "Fido"
     }
   ]
}

At a glance, one would think that "John" has an "owner" KeyProperty with value of "Fido"'s key. But in fact it is the reverse, it is "Fido" that has an "owner" KeyProperty with "John" as the value.

@rbanffy
Copy link
Owner

rbanffy commented Feb 8, 2015

We can avoid the delete-and-recreate-with-key issue by parsing the whole JSON into an in-memory list and creating all objects from the top-down (rather than bottom-up as json.load seems to do) so that keys are always available by the time you create the children. I've been reluctant to do this because simply invoking json.load is very concise and I have a somewhat misguided preference for loading things as they are needed rather than doing so upfront. This is a rather foolish take on the problem because these JSON files are not meant to become too large (even though I am using them to store initial application state).

I'll give that top-down approach a spin. From there, it should be easy to hook up a simpler version of your code without the create-delete-recreate part.

@rbanffy
Copy link
Owner

rbanffy commented Feb 9, 2015

develop branch is almost there. I'm ashamed to have added partial support to __children__ without proper tests but your tests should cover it nicely. Support for __key__ is also still missing, but should be easy to add. When it's implemented, your tests should pass.

@rbanffy
Copy link
Owner

rbanffy commented Feb 12, 2015

Just made some additions. __id__ is the name of the attribute. __parent__ is not done yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants