-
Notifications
You must be signed in to change notification settings - Fork 255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor command line and add manually set fields and expansions #549
Conversation
For example,
So you can run:
To only get Maybe this will be common enough that it would warrant a
Thoughts? Right now i'm inclined to leave it as is - to be aligned with the API. |
Gonna finish this so it works in a bit, and also gonna also try to use |
It looks like it removes 100 lines of code, which seems like a good direction to head in. |
For the record I'm still opposed to customising fields, apart from the --no-context-annotations option. If people need that level of control I think other packages might be more suitable. Since there's also no downsides (apart from the context annotations affecting the pagesize) I'm not in favour of customisation for the sake of customisation. I would suggest that if we do go down this path, I'd prefer that we actively discourage using it - if people are frequently dropping bits and pieces, it means that downstream tools consuming Twitter V2 JSON suddenly have a lot more variation to contend with. Putting the grumbling aside the kwargs refactor and standardised decorators are great! Suggestion:
|
Ah i thought i replied to this earlier... but yeah - i also think we should try to minimize people being able to shoot themselves in the foot and make things simple. But i'm also constantly finding the need to be able to make arbitrary API call parameters and having the command line have the same capability as the API really helps. On the bright side - If we document a way of using it, people won't mess up - nobody seems to read With that in mind, and another related idea, i've a suggestion: a
default is still 100 per page, full expansions, specifying
(get everything, default)
(everything except context annotations, sets max results 500.)
eg output for What do you think? |
That could work, and we could avoid the arbitrary scale and extra parameter
by using boolean flags and more concrete names to align with the intended
use cases:
1. No flag set implies your data fidelity Max unless fields are customised
2. --no-context-annotations is your medium
3. --minimal-fields with a doc note that this also sets
no-context-annotations is your minimum
And depending on how complicated the code paths look, maybe make these
flags exclusive with manual field settings so we don't have to think about
the semantics of --minimal-fields and whatever the user provided?
…On Thu, 30 Sep 2021, 23:44 Igor Brigadir, ***@***.***> wrote:
Ah i thought i replied to this earlier... but yeah - i also think we
should try to minimize people being able to shoot themselves in the foot
and make things simple. But i'm also constantly finding the need to be able
to make arbitrary API call parameters and having the command line have the
same capability as the API really helps.
On the bright side - If we document a way of using it, people won't mess
up - nobody seems to read --help strings anyway (lol), and if someone
does and uses these "undocumented" features, they are also likely to know
what they're doing.
With that in mind, and another related idea, i've a suggestion: a
--data-coverage or --data-fidelity parameter for users:
twarc2 search "foobar" results.jsonl
default is still 100 per page, full expansions, specifying --data-coverage
will give you a preset of expansions and fields:
twarc2 search --data-coverage max "foobar" results.jsonl
(get everything, default)
twarc2 search --data-coverage high "foobar" results.jsonl
(everything except context annotations, sets max results 500.)
twarc2 search --data-coverage low "foobar" results.jsonl
eg output for low:
https://gist.github.com/igorbrigadir/e3c1b11f7c12cbb4656ddfd0be97b1ea (i
wanted IDs only, but it's mandatory to get text and user names it seems)
This is the minimal representation of tweets that still has all the
references to everything - may be useful for large crawls that are aiming
to analyze networks for example.
What do you think?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#549 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACADAUK2IOCDMMEP7BQ5AP3UERSVBANCNFSM5E6NSVGA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
I added some more things and validation, here's how it behaves now:
setting any I think that covers it well enough! |
I fell behind a bit with this but will fix it up to include the latest changes in main |
I think that's all of them, but i haven't tested all the commands yet. |
I might have some time later today to run some testing - is there anything
in particular you'd want me to look at?
…On Fri, 15 Oct 2021 at 04:48, Igor Brigadir ***@***.***> wrote:
I think that's all of them, but i haven't tested all the commands yet.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#549 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACADAUIZFOCAYJCVRVNDWCDUG4QZDANCNFSM5E6NSVGA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
I think I did break a few things after all, so I'll have another go at fixing it up, but yeah - the |
Almost there i think! I still have a feeling something somewhere is wrong though. |
I guess this would be where having a test of the CLI would be useful eh? Why don't we merge this work, since it's a lot, and put it out there and see what breaks? |
Yeah why not! Since the tests pass, maybe we can merge, and I'm going to attempt to add proper click tests in another PR? https://click.palletsprojects.com/en/8.0.x/testing/ or wait a bit more for me to add these to this PR? I'm fine with either way. |
Yeah, even some basic click tests would be an awesome thing to add. But I don't think they are needed for this PR. |
I was testing with |
@edsu - I think you need --archive set as well - I don't think the 500 tweets/page is available on the standard access endpoint. I did just do some very basic checking myself and it seems to be mostly working - just a couple of minor things I'll push up in a minute. |
Had another pass though and found a few more things missing or not working. I think i tried all the commands now with |
This was just released as part of v2.8.0! Thanks for all the work @igorbrigadir and spot checking @SamHames. |
Fix #493
will also Fix #550
This also refactors the
command2.py
click command line options.Needs a good bit of testing to make sure all the old commands still work.