Skip to content

Conversation

@tkaemming
Copy link
Contributor

@tkaemming tkaemming commented Mar 7, 2017

This is basically a combination of several different independent changes:

  • Abstracts out the pattern of {index, key} parsing for commands. I'd like to find an even more generic way to split out this pattern for handling different structures like {index, key, data}, but haven't figured out a clean way to do this yet. (Stay tuned!)
  • Adds helper to make HGETALL responses return a mapping-style table rather than an list-style table.
  • Adds IMPORT and EXPORT commands for extracting and loading group data from the database, as well as MinHashIndex methods for calling them. This allows for moving data across scopes, as well as provides a method for performing basic introspection of the data contained in the database.

#nochanges

@getsentry-bot
Copy link
Contributor

getsentry-bot commented Mar 7, 2017

1 Warning
⚠️ Changes to build requirements

Generated by 🚫 danger

@tkaemming tkaemming force-pushed the similarity-import-export branch from a71340b to f41895e Compare March 8, 2017 01:53
@tkaemming tkaemming requested a review from LewisJEllis March 8, 2017 18:59
@tkaemming
Copy link
Contributor Author

tkaemming commented Mar 8, 2017

I'm planning on doing a few things but this is mostly done:

  • Adding comments to IMPORT and EXPORT explaining the input and output types as well as behavior.
  • Trying to further abstract collect_index_key_pairs into something more declarative and extensible, sort of like a variadic implementation of build_argument_parser.

@tkaemming
Copy link
Contributor Author

(This is done pending any review feedback.)

Copy link
Contributor

@JTCunning JTCunning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python bits look good. Deferring to @LewisJEllis for the actual approval.

setup.py Outdated
'cqlsh',
# /cassandra
'datadog',
'msgpack-python',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd pin this for <0.5.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@LewisJEllis LewisJEllis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, comments are just minor tweaks/curiosity.

local entries = build_variadic_argument_parser({
{'index', identity},
{'key', identity},
{'data', function (value)
Copy link
Contributor

@LewisJEllis LewisJEllis Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just pass cmsgpack.unpack here? shouldn't be like in JS where if you do that you mess up on this binding (but maybe i'm wrong)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, not sure why I didn't think of this earlier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

end
local results = {}
for i = 1, #arguments, #fields do
local value, _ = parser(table.slice(arguments, i, i + #fields - 1), i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this use of slice in a loop makes me wish it was an in-place slice of some sort instead of copying every time...but these operations are already pretty heavy on IO and this doesn't affect the complexity overall

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I might look at making a slice iterator real quick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to kick this one down the road for a bit because I don't think it's going to make a significant performance improvement here, but I'd like to move a lot of the functional stuff to iterators in the future for improved memory utilization so I will probably tackle it then.

local response = redis.call(
'HGETALL',
source_bucket_frequency_key
local response = redis_hgetall_response_to_table(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like how this helper func cleans up that repeated loop pattern and lets variable assignment happen in the for loop w/pairs, good stuff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that weird sequence-as-mapping response was driving me nuts.

-- The destination bucket frequency key may have not
-- existed previously, so we need to make sure we set
-- the expiration on it in case it is new.
if touched then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we eliminate the touched flag and just do if next(buckets) == nil then here? PIL on next func

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat trick, will change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--[[
Exports data that is located at the provided ``index`` and ``key`` pairs.
Generally, this data should be treated as opaque method for extracting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of wonder why this doesn't just return a single big msgpack blob? Would be theoretically more portable/hands-off for anything passing it around, but your PR comment mentions introspection use cases that I'm guessing motivate doing it this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way each input pair is associated with a single result. This is consistent with the responses from other commands, and also makes it so that if you do EXPORT A B and then need to IMPORT A and IMPORT B into different scopes that is possible without having to unpack the return value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense. I'll trust that that hypothetical use is some realistic possibility.

{'key', identity},
{'data', function (value)
return cmsgpack.unpack(value)
end}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consistency nit: some spots have trailing comma in array-like-tables, some spots don't...i think this is the outlier in not having it, not sure if you've made a conscious decision one way or the other

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably just forgot to do that here. I've carried over from Python my habit of adding a trailing comma to sequence data structures. I'll fix this for consistency, but is there any sort of general Lua style guide that would apply here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to follow this (but not strictly) and it roughly says "trailing comma is okay but discouraged"; I think I agree that the trailing comma is less common but I'm totally fine with it myself. If we want to decide to follow it we can, but I think your Lua style is generally fine and the consistent trailing comma wfm.

)
end
)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writin some lisp over here except someone macro'd end to be another paren token

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😁

]

for idx, key in items:
arguments.extend([idx, key])
Copy link
Contributor

@LewisJEllis LewisJEllis Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could just do arguments.extend(itertools.chain(*items)) instead of the for loop, not sure which you prefer; the for loop is certainly more self-explanatory but i know you like itertools

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is entirely for readability since I didn't actually write documentation or type annotations for any of this.

arguments,
)

def import_(self, scope, items, timestamp=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol reserved keywords

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, right? 😭

@tkaemming tkaemming merged commit 8086224 into master Mar 14, 2017
@tkaemming tkaemming deleted the similarity-import-export branch March 14, 2017 18:38
@github-actions github-actions bot locked and limited conversation to collaborators Dec 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants