Refactor main file/commands and expand dataclass usage #61
This is great! I think that the ideas here are really good and they put us in a good place to run the code and manage the data.
I think all of the issues I've pointed out are minor and should be easily fixable.
Also @JamesKohlsRepo, please do leave your own review! I think learning how to review code is a really important skill to have.
Travis is the one to keep happy, not me! But my tuppence is: LGTM
Hullo, @skyfenton. Here's a tiny feature request. In wiki_query(), please delete the surrounding pair of SPACE characters.
I thought we needed to add a title .strip() call, based on what the log was telling me, but it turns out it's just an output artifact that should be removed. BTW, here's a pair of handy f-string syntax tips that might be of interest:
The first is great for quick debugging -- it gives the identifier followed by its value. Also,
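The behavior described for the first tip matches the `=` specifier in f-strings (Python 3.8+), which prints the expression text followed by the repr of its value. A minimal sketch, using a hypothetical `title` value, which also shows how stray surrounding spaces become easy to spot:

```python
title = " The Matrix "  # hypothetical value with surrounding spaces

# The `=` specifier echoes the expression, then the repr of its value.
debug_line = f"{title=}"
print(debug_line)  # title=' The Matrix '

# Any expression works, not just a bare identifier.
stripped_line = f"{title.strip()=}"
print(stripped_line)  # title.strip()='The Matrix'
```

Because the value is shown via its repr, the quotes make leading and trailing whitespace visible in log output.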
Ok, I finally found a substantive issue. This PR 61 introduces a new Consider
Thanks for all your work updating this PR! I think the only thing left is testing, but I guess we've already set the (unfortunate) precedent that we don't write tests, so this is fine as is.
Fixed in a recent change. Maybe we should set up mypy or some other type checker (or at least find out about errors, even if we don't want them to block merging)?
Good catch! Renamed dataclasses.py to schemas.py instead, which I think is fitting, even more so if we eventually use pydantic. Also removed spaces from outputs with titles and wrapped them in double quotes.
I added a couple of tests for the flatten_values and wiki_query functions using the new dataclasses, if you want to take a look. I discovered that the order in which we get the genres of a movie is non-deterministic, so I sort the list before checking for equality. Maybe a good goal to set for future testing is some percentage of coverage? Is writing tooling/tests for read csv/write csv/process worth it?
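One way to make such an assertion order-independent is sketched below. The `EnrichedMovie` class, its fields, and the genre values are hypothetical stand-ins for illustration; they are not the project's actual `EnrichedMovieData` or test code.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichedMovie:  # hypothetical stand-in, not the project's EnrichedMovieData
    title: str
    genres: list[str] = field(default_factory=list)


def genres_match(movie: EnrichedMovie, expected: list[str]) -> bool:
    """Compare genres while ignoring the order they were returned in."""
    return sorted(movie.genres) == sorted(expected)


# The same genres in a different order still compare equal.
movie = EnrichedMovie("Example Film", ["drama", "action"])
assert genres_match(movie, ["action", "drama"])
assert not genres_match(movie, ["action"])
```

Sorting keeps duplicates significant; comparing sets would also ignore order but would hide a repeated genre.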
By the way, I'm gonna try to polish the CLI a little more, so I'll publish the PR once I'm done. Originally this draft was just to get @JamesKohlsRepo's attention so he can merge in the dataclasses we were working on.
Sounds good! Nice work.
…ttps://github.com/noisebridge/MediaBridge into 58-connect-wiki_to_netflixmongo-insertionmain-file
Hmm, this is slightly sad. I'm accustomed to "trust the author!" settings. Oh, well, I'll just reapprove.
re-approving...
I wonder if there is a way to require review/approval, but not re-approval?
Yup. Here are two items. For background, my personal philosophy is that Authors are smart and good, they mean well, and the Author is always right and can always choose to ignore a Reviewer's remark. My role as Reviewer is simply to offer a different perspective on the code that perhaps was never considered. Once a PR has a 👍 thumbs up on it, I wish for the Author to be able to quickly, e.g., fix a typo and merge. I usually click the GitHub repo Settings to allow this on Pull Requests:
Then everyone who clicks the giant green Merge button gets a Squash. To answer your original question, it's in the branch protection rules. Make sure that "Dismiss stale pull request approvals when new commits are pushed" is disabled, which ensures that previous approvals won't be invalidated when new commits are added to the branch.
Okay yup, I found the rule, thanks!
Adds CLI commands for processing data into a csv and triggering a load of the csv into MongoDB.
Also creates dataclasses for input and enriched data (MovieData and EnrichedMovieData) and refactors code in wiki_to_netflix so that we can normalize the netflix data and write data into csvs based on these classes. Additionally, now when we write a list of MovieData objects to a csv file, the first line of the csv will consist of a header naming each column so that the entire file is mostly self-documenting (though, without explicit types). For example, movie_titles.csv writes EnrichedMovieData (matches), so the resulting csv file looks like: