Add ability to manually specify expansions and fields #493

Closed
igorbrigadir opened this issue Jun 22, 2021 · 9 comments · Fixed by #549

Comments

@igorbrigadir
Contributor

Normally, twarc aims to grab everything. But this seems to cause problems in the API when the requests are too big, e.g. #449

It would be good to have a manual override for the expansions and fields. The extra command line parameters, named to align with the API (https://developer.twitter.com/en/docs/twitter-api/data-dictionary/using-fields-and-expansions), should be:

--expansions "author_id,geo.place_id" where the valid ones are: https://github.com/DocNow/twarc/blob/main/twarc/expansions.py#L16

Same for:

--user-fields

--tweet-fields

--media-fields

--poll-fields

--place-fields

Ideally it should also complain with an error, or automatically set things for you, if you specify --poll-fields but fail to specify attachments.poll_ids in --expansions. It would be nice to parse these and validate them for the user, but if that's too complicated and cumbersome, just a check and a warning should be enough.
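
A minimal sketch of the "check and a warning" idea, not twarc's actual code. The function names and the `ENABLING_EXPANSIONS` mapping are hypothetical; the mapping reflects my reading of the Twitter API docs on which expansions make each object type appear in a response.

```python
# Hypothetical mapping: a *-fields option only has an effect when at least
# one of these expansions is requested (assumption based on the API docs).
ENABLING_EXPANSIONS = {
    "poll_fields": {"attachments.poll_ids"},
    "media_fields": {"attachments.media_keys"},
    "place_fields": {"geo.place_id"},
    "user_fields": {
        "author_id",
        "in_reply_to_user_id",
        "entities.mentions.username",
        "referenced_tweets.id.author_id",
    },
}


def parse_csv(value):
    """Split a comma-separated option value into names, dropping blanks."""
    if not value:
        return []
    return [v.strip() for v in value.split(",") if v.strip()]


def check_fields(expansions=None, **field_options):
    """Return warning messages for field options whose expansions are missing."""
    requested = set(parse_csv(expansions))
    problems = []
    for option, needed in ENABLING_EXPANSIONS.items():
        # Only complain when the option was given and no enabling expansion is present.
        if parse_csv(field_options.get(option)) and not (requested & needed):
            flag = "--" + option.replace("_", "-")
            problems.append(
                f"{flag} given but --expansions has none of: "
                + ", ".join(sorted(needed))
            )
    return problems


if __name__ == "__main__":
    # --poll-fields without attachments.poll_ids should produce one warning.
    for message in check_fields(
        expansions="author_id,geo.place_id", poll_fields="options"
    ):
        print("warning:", message)
```

This only warns and never rewrites what the user asked for, which keeps it close to the "just a check and a warning" option.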

@edsu
Member

edsu commented Jun 22, 2021

I thought this might be coming :-) I'd really like to shield the user as much as possible from this complexity. I also don't want us to bend over backwards because of Twitter's Fail Whale. That being said it would be nice for twarc to be able to work...

@igorbrigadir
Contributor Author

Yeah - i'm kinda leaning towards making these available, but not encouraged - maybe not inferring and helping with the settings after all. Just directly reading the settings.

@edsu
Member

edsu commented Jun 23, 2021

If Twitter doesn't fix their API then we won't have much choice.

@SamHames
Contributor

I'm a hard disagree on this one right now. We're really only a few months into early access, I think it's a little premature to be working around Twitter's API instability. Especially since that has impacts on downstream plugins.

Also I live 15000km from most of the internet, so 503's aren't exactly rare ;)

@igorbrigadir
Contributor Author

Yeah it's a good bit of work implementing it alright - i would still do it just to support the API "as is" but maybe as a lower priority - maybe for these 503s the changes here might actually be a better solution: https://github.com/DocNow/twarc/compare/503_search_all_workaround

@SamHames
Contributor

How are we feeling about this now?

Based on a bit more hands-on work with the API, the only thing I'd really want to turn off is the context annotations so that I can collect data faster. Maybe instead of full customisability, an off-by-default --exclude-context-annotations flag to support 500 results/page would cover most of this?
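
A rough sketch of what such a flag could do to the request parameters. This is not twarc's implementation: `search_params` and the field list passed to it are hypothetical, and the 100 vs 500 page sizes are assumed from the numbers discussed in this thread.

```python
# Assumed page-size limits: larger pages are only allowed when
# context_annotations is not among the requested tweet fields.
PAGE_SIZE_WITH_ANNOTATIONS = 100
PAGE_SIZE_WITHOUT_ANNOTATIONS = 500


def search_params(tweet_fields, exclude_context_annotations=False):
    """Build query parameters, dropping context_annotations when asked to."""
    fields = [
        f
        for f in tweet_fields
        if not (exclude_context_annotations and f == "context_annotations")
    ]
    # Pick the biggest page size the remaining fields allow.
    if "context_annotations" in fields:
        max_results = PAGE_SIZE_WITH_ANNOTATIONS
    else:
        max_results = PAGE_SIZE_WITHOUT_ANNOTATIONS
    return {"tweet.fields": ",".join(fields), "max_results": max_results}


if __name__ == "__main__":
    fields = ["id", "text", "created_at", "context_annotations"]
    print(search_params(fields))
    print(search_params(fields, exclude_context_annotations=True))
```

The flag stays off by default here, so existing behaviour (grab everything) is unchanged unless the user opts in.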

@igorbrigadir
Contributor Author

igorbrigadir commented Sep 28, 2021

I think since the expansions code can deal with missing expansions easily, it shouldn't be a problem.

I've been meaning to put these in for the same reason, trading off context annotations for bigger pages - but without anything complicated or clever, it'll assume you know what you're specifying. I'll make the PR later!

@edsu
Member

edsu commented Sep 28, 2021

Yes, being able to turn off context-annotations came up in some work I was doing recently. It would be nice to be able to selectively turn them off for the 5X turbo boost.

@igorbrigadir
Contributor Author

I may have stumbled on a way to refactor all the command line options too, harking back to #514. See the #549 PR for what i mean.

@edsu edsu closed this as completed in #549 Oct 23, 2021