Easier debugging of SPARQL queries #53

Merged
audiodude merged 2 commits into main from new-cli-args
Dec 8, 2024

@audiodude audiodude commented Dec 6, 2024

Add a --num_rows argument so that the number of rows can be set on the command line without editing the code. If --full is specified, it overrides --num_rows and all rows are processed. The wiki_to_netflix module still contains a default number of rows for when it is run directly.
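
The precedence between the two flags can be sketched as follows. This is a minimal illustration, not the code from this PR: the `-n` short flag is taken from the example below, while `DEFAULT_NUM_ROWS`, `build_parser`, `resolve_num_rows`, and the use of `None` to mean "no limit" are assumptions made for the sketch.

```python
import argparse

# Hypothetical default; the real value lives in wiki_to_netflix.
DEFAULT_NUM_ROWS = 100


def build_parser():
    """Build a parser with the two row-count options described above."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-n", "--num_rows", type=int, default=DEFAULT_NUM_ROWS,
        help="number of rows to process",
    )
    parser.add_argument(
        "--full", action="store_true",
        help="process every row; overrides --num_rows",
    )
    return parser


def resolve_num_rows(args):
    """Return the row limit, or None (assumed to mean 'all rows') if --full is set."""
    return None if args.full else args.num_rows
```

With this sketch, passing `-n 1000 --full` resolves to no limit, and passing neither flag falls back to the default.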

Add a --missing_out_path argument, which will collect all the missing movies and put them in their own CSV. If this argument isn't specified, no missing movie CSV is written. This path is relative to the output directory, so if you specify --missing_out_path=my_missing.csv you will find your file at out/my_missing.csv.
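
Roughly, the path handling works like the sketch below. It is an illustration under assumptions rather than the merged code: `write_missing` and `OUTPUT_DIR` are hypothetical names, and the rows are assumed to already be lists of CSV fields.

```python
import csv
import os

# Assumed name for the output directory; the description above calls it out/.
OUTPUT_DIR = "out"


def write_missing(rows, missing_out_path, output_dir=OUTPUT_DIR):
    """Write missing-movie rows to output_dir/missing_out_path.

    Returns the path written, or None when no path was given,
    in which case no file is created.
    """
    if not missing_out_path:
        return None
    os.makedirs(output_dir, exist_ok=True)
    # The argument is interpreted relative to the output directory.
    full_path = os.path.join(output_dir, missing_out_path)
    with open(full_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return full_path
```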

Here is an example of how to use these new arguments:

$ pipenv run dev -m missing.old.csv -n 1000
[WARNING] Could not find movie id 1 ('Dinosaur Planet', '2003')
....
missing: 486 (48.60%)
found: 514 (51.40%)
total: 1000

$ pipenv run dev -m missing.new.csv -n 1000
[WARNING] Could not find movie id 1 ('Dinosaur Planet', '2003')
....
missing: 524 (52.40%)
found: 476 (47.60%)
total: 1000

$ wc -l out/missing.old.csv
     486 out/missing.old.csv
$ wc -l out/missing.new.csv
     524 out/missing.new.csv
$ diff out/missing.old.csv out/missing.new.csv
5d4
< 7,null,8 Man,1992,null,null
7d5
< 10,null,Fighter,2001,null,null
16a15
> 28,null,Lilo and Stitch,2002,null,null
27a27
> 44,null,Spitfire Grill,1996,null,null
31a32
> 54,null,We're Not Married,1952,null,null
38d38
...

Note that in this example, the QUERY variable in wiki_to_netflix was manually edited between runs. A future enhancement might be to allow the user to specify a file or Python module path that contains the query to use.
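
One possible shape for the file-based variant of that enhancement is sketched here. Nothing in it exists in this PR: `load_query`, its `query_path` parameter, and the placeholder `DEFAULT_QUERY` are all hypothetical, and the placeholder is not the project's actual query.

```python
from pathlib import Path

# Placeholder only; stands in for the QUERY variable in wiki_to_netflix.
DEFAULT_QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"


def load_query(query_path=None, default=DEFAULT_QUERY):
    """Return the SPARQL text from query_path, or the default if no path is given."""
    if query_path is None:
        return default
    return Path(query_path).read_text(encoding="utf-8")
```

Two runs could then point at two query files instead of editing the module between them.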


@skyfenton skyfenton left a comment

LGTM. I have a simple change in my branch that uses the definitions file for paths in wiki_to_netflix (and pathlib objects instead of os.path functions), but this still works for now.

@skyfenton

Just one question: how do we make use of the missing outputs CSV?

@audiodude

> Just one question: how do we make use of the missing outputs CSV?

You can see it in my example above. You can use wc -l <file> to see how many misses are in each file. You can use diff to see which movies are in one file and not the other. You can also presumably import the CSVs into a spreadsheet and compare them that way.
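
The same comparison can be done in Python rather than with diff. This is a rough sketch, not part of the PR: `read_rows` and `compare_missing` are hypothetical helpers, and rows are compared as whole lines, so no assumption is made about what each column means.

```python
import csv


def read_rows(path):
    """Read a missing-movies CSV into a set of row tuples."""
    with open(path, newline="", encoding="utf-8") as f:
        return {tuple(row) for row in csv.reader(f)}


def compare_missing(old_path, new_path):
    """Return (rows only in old, rows only in new), each sorted.

    Rows only in the old file are movies the new query now finds;
    rows only in the new file are movies the new query no longer finds.
    """
    old_rows = read_rows(old_path)
    new_rows = read_rows(new_path)
    return sorted(old_rows - new_rows), sorted(new_rows - old_rows)
```

Calling `compare_missing("out/missing.old.csv", "out/missing.new.csv")` would give the two lists that diff marks with `<` and `>` respectively.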


@skyfenton skyfenton left a comment

Sorry for the delay, looks good! If it's useful to you then it's useful to someone.

@audiodude audiodude merged commit d79c3ab into main Dec 8, 2024
4 checks passed
@audiodude audiodude deleted the new-cli-args branch December 8, 2024 23:13