Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different meanings set on values when reading null #1649

Closed
xamoom-raphael opened this issue Mar 23, 2016 · 15 comments
Closed

Different meanings set on values when reading null #1649

xamoom-raphael opened this issue Mar 23, 2016 · 15 comments
Assignees
Labels
api: datastore Issues related to the Datastore API.

Comments

@xamoom-raphael
Copy link

Hey,
we have a list in our datastore with ids and sometimes nulls in it. Like here:

["null","null","a953c730fc2e4647b3a5b4e88fa4f0ce","59ce754ddf03401eb09abd50c977cf67","null"]    

When we try to fetch them, we get the "Different meanings set on values within a list_value" Exception.

Is this a bug, that the method does not handles null?

_get_meaning Method:

def _get_meaning(value_pb, is_list=False):
    meaning = None
    if is_list:
        # An empty list will have no values, hence no shared meaning
        # set among them.
        if len(value_pb.list_value) == 0:
            return None

        # We check among all the meanings, some of which may be None,
        # the rest which may be enum/int values.
        all_meanings = set(_get_meaning(sub_value_pb)
                           for sub_value_pb in value_pb.list_value)
        meaning = all_meanings.pop()
        # The value we popped off should have been unique. If not
        # then we can't handle a list with values that have more
        # than one meaning.
        if all_meanings:
            raise ValueError('Different meanings set on values '
                             'within a list_value')
    elif value_pb.meaning:  # Simple field (int32)
        meaning = value_pb.meaning

    return meaning
@dhermes dhermes added the api: datastore Issues related to the Datastore API. label Mar 23, 2016
@dhermes
Copy link
Contributor

dhermes commented Mar 23, 2016

Thanks for the report. This is at the moment a "feature not a bug", though we are certainly open to ideas / discussion.

@pcostell WDYT? Also an aside, how does one put a null in a list before v1beta3?

@xamoom-raphael
Copy link
Author

We managed to get null into lists by saving them from app-engine using python ndb.
On app-engine it is no problem to save or load these entities.

@dhermes
Copy link
Contributor

dhermes commented Mar 23, 2016

Also, for context, _get_meaning and associated code were added in #1241.

Also original design discussion (go to comment, issue is long). I'm trying to track down the discussion of the decision to use a single type for an entire list, but can't find it.

@xamoom-raphael Can you provide a tiny ndb snippet? I'll store my own data with the remote API and then check the raw output to see what would need to be done to accommodate.

@xamoom-raphael
Copy link
Author

This is a screenshot from our entity in datastore.
edit_entity_-_xamoom-tricia

And here is a shortened example of what we are doing on app-engine.
The model LocalizedContentInformation has a repeated property containing ContentBlock models. Some of these nested models have a file_id but some don't. This results in lists containing nulls for file_id. Which works fine on app-engine, but causes this error if we want to access this data using gcloud python client on compute-engine.

class ContentBlock(ndb.Model):
    content_block_type = ndb.IntegerProperty(required=True)​
    public = ndb.BooleanProperty(required=True)
    title = ndb.StringProperty()
    text = ndb.TextProperty()
    file_id = ndb.StringProperty()
​
class LocalizedContentInformation(ndb.Model):
    content_blocks = ndb.StructuredProperty(ContentBlock,repeated=True)

@dhermes
Copy link
Contributor

dhermes commented Mar 23, 2016

Thanks. I'll try to dig into it today (I am running a training so I may not get to it until the evening).

@pcostell
Copy link
Contributor

This was possible in the the private GAE proto for Datastore which stores properties as a repeated proto field. To write a list you just write a property with the same name multiple times and if you don't put any value it is considered null.

It seems like we should allow heterogeneous lists to be read since we can store them in Datastore.

@dhermes
Copy link
Contributor

dhermes commented Mar 24, 2016

OK I reproduced the example.

@pcostell I / we made the decision not to allow heterogeneous meanings since it didn't seem possible within GAE. (We may also require everything in a list to be indexed / not-indexed for the same reason.) It's unclear to me if we should allow null as a special case or if we should just make it a free-for-all.

If we did try to do this, it'd be hard to get right:

  • Retrieve a list with 4 entries: [1, 2, 3, 4] which has meanings [15, None, None, 7]
  • The user doesn't even realize there are meanings
  • The user changes the list to [1, 2, 4, 7, 11] but the _meanings on the Entity is unchanged (we consider them to be there just for compat, so not user facing)
  • How do [1, 2, 4, 7, 11] and [15, None, None, 7] play nice when it's time to make a put() back to the server?

@pcostell
Copy link
Contributor

It seems like meanings and indexing information should follow the same
rules as if the user used protos directly. In this case they would be
linked to the value, so changing the individual value would clear it, or
adding a new value would have no meaning / default indexing info.

On Wed, Mar 23, 2016, 6:05 PM Danny Hermes notifications@github.com wrote:

OK I reproduced https://gist.github.com/dhermes/054fac0d3a4147ac96d5
the example.

@pcostell https://github.com/pcostell I / we made the decision not to
allow heterogeneous meanings since it didn't seem possible within GAE. (We
may also require everything in a list to be indexed / not-indexed for the
same reason.) It's unclear to me if we should allow null as a special case
or if we should just make it a free-for-all.

If we did try to do this, it'd be hard to get right:

  • Retrieve a list with 4 entries: [1, 2, 3, 4] which has meanings [15,
    None, None, 7]
  • The user doesn't even realize there are meanings
  • The user changes the list to [1, 2, 4, 7, 11] but the _meanings on
    the Entity is unchanged (we consider them to be there just for compat,
    so not user facing)
  • How do [1, 2, 4, 7, 11] and [15, None, None, 7] play nice when it's
    time to make a put() back to the server?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#1649 (comment)

@xamoom-raphael
Copy link
Author

Hey guys, can you give me an update on the issue.
Will there be a fix/change in the next version of gcloud-python?

@jgeewax
Copy link
Contributor

jgeewax commented Apr 7, 2016

@dhermes : Update?

@dhermes
Copy link
Contributor

dhermes commented Apr 7, 2016

No update right now. This needs a discussion though before we commit to any changes. At a glance, the suggestion from @pcostell of following the same rules as the protos may make it hard to present a higher-level interface to users.

@pcostell
Copy link
Contributor

@dhermes What can we do to make movement here?

@jgeewax
Copy link
Contributor

jgeewax commented Apr 27, 2016

@xamoom-raphael : Just a heads up, the conversation here has gotten a bit theoretical when I think you just want to be able to read your data. Sorry about that.

Hang tight while we try to figure something out... If you have any ideas, chime in, but we have to be careful that whatever we do to deal with this situation doesn't give a "false sense of security" or overly complicate things under the hood.


I'm joining the bandwagon of "not sure how we could do this in a way that makes sense across the library".

Some thoughts...

  • @pcostell : Is there a list of meanings somewhere? The whole magic number thing should probably be documented, right?
  • AFAIK, our goal was to make a library that would work equally well across run-time environments (GCE, EC2, GAE, etc). I don't think this is the same thing as "able to do everything NDB can do exactly as it does it".
  • If we find a list of values, each with different meanings, we can always not error out, but this would mean that re-saving that entity will lose the meanings, which to me seems like the worse version as it mucks with data unintentionally.

I can think of a couple different options:

  1. Keep track of meanings somewhere, and then put them back when we save.
    This seems like a lot of extra complexity for something that's described in our docs as "The meaning field should only be populated for backwards compatibility." We already support this field for the common cases, but taking on nested entities with heterogeneous meanings under the hood seems... tricky and of debatable value.
  2. Error on save rather than load.
    Instead of throwing an error when loading a protobuf, throw an error when you try to save an entity that was loaded from a proto containing values with heterogeneous meanings. (In other words, we keep a flag saying whether we run into this condition, and saving requires an explicit clear_meanings=True when saving the entity or errors out.)
  3. Add a flag to ignore meanings
    Allow users to specify a flag saying that we should "ignore meanings" generally in Datastore, with a big huge warning saying "this clears out meanings from data you stored in App Engine" and links to a doc about the very specific use case you might have for flipping this flag.

@pcostell
Copy link
Contributor

pcostell commented Apr 27, 2016

The meaning field is documented as existing only to provide backwards compatibility (with App Engine). So we explicitly don't want any new libraries from explicitly populating it, although we do want to maintain existing set fields for compatibility.

@dhermes
Copy link
Contributor

dhermes commented Apr 27, 2016

@pcostell @jgeewax I think I have an idea how to do this. Starting now. (Sorry I meant to give this a shot last week.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the Datastore API.
Projects
None yet
Development

No branches or pull requests

4 participants