Easier debugging of SPARQL queries #53

audiodude · 2024-12-06T18:26:12Z

Add a --num_rows argument so the row count can be set on the command line without editing the code. If --full is specified, it overrides --num_rows and all rows are processed. The wiki_to_netflix module still contains a default row count for when it is run directly.

Add a --missing_out_path argument, which will collect all the missing movies and put them in their own CSV. If this argument isn't specified, no missing movie CSV is written. This path is relative to the output directory, so if you specify --missing_out_path=my_missing.csv you will find your file at out/my_missing.csv.
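As a rough illustration of the behavior described above (not the code in this PR), the argument handling could look something like the sketch below. The long flag names, the -n/-m short forms, and the out/ directory come from this description; `DEFAULT_NUM_ROWS`, `OUTPUT_DIR`, `build_parser`, and `resolve_options` are hypothetical names, and the default value of 100 is made up.

```python
import argparse
from pathlib import Path

# Assumptions for illustration only: the real default and output
# directory live in the project and may differ.
OUTPUT_DIR = Path("out")
DEFAULT_NUM_ROWS = 100


def build_parser():
    parser = argparse.ArgumentParser(description="Match Netflix rows against Wikidata")
    parser.add_argument("-n", "--num_rows", type=int, default=None,
                        help="number of rows to process")
    parser.add_argument("--full", action="store_true",
                        help="process all rows (overrides --num_rows)")
    parser.add_argument("-m", "--missing_out_path", default=None,
                        help="CSV for missing movies, relative to the output directory")
    return parser


def resolve_options(argv):
    """Return (num_rows, missing_path); num_rows of None means 'all rows'."""
    args = build_parser().parse_args(argv)
    if args.full:
        num_rows = None  # --full wins over --num_rows
    elif args.num_rows is not None:
        num_rows = args.num_rows
    else:
        num_rows = DEFAULT_NUM_ROWS
    # No path given means no missing-movie CSV is written.
    missing_path = OUTPUT_DIR / args.missing_out_path if args.missing_out_path else None
    return num_rows, missing_path
```

With this sketch, `resolve_options(["-m", "my_missing.csv", "-n", "1000"])` resolves the CSV path to out/my_missing.csv, matching the description above.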

Here is an example of how to use these new arguments:

$ pipenv run dev -m missing.old.csv -n 1000
[WARNING] Could not find movie id 1 ('Dinosaur Planet', '2003')
....
missing: 486 (48.60%)
found: 514 (51.40%)
total: 1000

$ pipenv run dev -m missing.new.csv -n 1000
[WARNING] Could not find movie id 1 ('Dinosaur Planet', '2003')
....
missing: 524 (52.40%)
found: 476 (47.60%)
total: 1000

$ wc -l out/missing.old.csv
     486 out/missing.old.csv
$ wc -l out/missing.new.csv
     524 out/missing.new.csv
$ diff out/missing.old.csv out/missing.new.csv
5d4
< 7,null,8 Man,1992,null,null
7d5
< 10,null,Fighter,2001,null,null
16a15
> 28,null,Lilo and Stitch,2002,null,null
27a27
> 44,null,Spitfire Grill,1996,null,null
31a32
> 54,null,We're Not Married,1952,null,null
38d38
...

Note that in this example, the QUERY variable in wiki_to_netflix was manually edited between runs. A future enhancement might be to allow the user to specify a file or Python module path that contains the query to use.
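One possible shape for that enhancement, sketched under assumptions: the query is read from a plain-text file when a path is given, and falls back to a built-in query otherwise. `load_query` and `DEFAULT_QUERY` are hypothetical names, and the placeholder query below is not the project's actual QUERY.

```python
from pathlib import Path
from typing import Optional

# Placeholder only; the real query lives in wiki_to_netflix as QUERY.
DEFAULT_QUERY = "SELECT ?item WHERE { ?item ?p ?o } LIMIT 1"


def load_query(query_path: Optional[str] = None) -> str:
    """Return the SPARQL query text from query_path, or the built-in default."""
    if query_path is None:
        return DEFAULT_QUERY
    text = Path(query_path).read_text(encoding="utf-8").strip()
    if not text:
        # Fail early rather than send an empty query to the endpoint.
        raise ValueError(f"query file is empty: {query_path}")
    return text
```

Keeping each query variant in its own file would let two runs be compared without editing the module in between.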

skyfenton

LGTM. I have a simple change in my branch to use the definitions file for paths in wiki_to_netflix (which also uses pathlib objects instead of os.path functions), but this still works for now.

skyfenton · 2024-12-07T05:46:44Z

Just one question, how do we make use of the missing outputs csv?

audiodude · 2024-12-07T14:18:04Z

> Just one question, how do we make use of the missing outputs csv?

You can see it in my example above. You can use wc -l <file> to see how many misses are in each file. You can use diff to see which movies are in one file and not the other. You can also presumably import the CSVs into a spreadsheet and compare them that way.
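If a script is preferable to diff, the same comparison can be sketched in Python. This assumes the first CSV column is the movie id and that the files have no header row (the wc -l counts in the example match the missing counts exactly); `read_missing` and `compare_missing` are hypothetical helper names, not part of this PR.

```python
import csv


def read_missing(path):
    """Map movie id (first column, assumed) to the full CSV row."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[0]: row for row in csv.reader(f) if row}


def compare_missing(old_path, new_path):
    """Return (fixed, regressed) rows between two missing-movie CSVs.

    fixed: missing in the old run but found in the new one.
    regressed: found in the old run but missing in the new one.
    """
    old = read_missing(old_path)
    new = read_missing(new_path)
    fixed = [row for movie_id, row in old.items() if movie_id not in new]
    regressed = [row for movie_id, row in new.items() if movie_id not in old]
    return fixed, regressed
```

Called as `compare_missing("out/missing.old.csv", "out/missing.new.csv")`, the first list corresponds to the `<` lines in the diff output above and the second to the `>` lines.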

skyfenton

Sorry for the delay, looks good! If it's useful to you then it's useful to someone.

audiodude added 2 commits December 6, 2024 10:16

Add --num_rows and --missing_out_path options for better debugging of SPARQL queries

5b39aa6

Add docs for new params

82e08fd

audiodude requested review from skyfenton, cocomittens, JamesKohlsRepo and smaysenhalder December 6, 2024 18:27

skyfenton reviewed Dec 7, 2024

skyfenton approved these changes Dec 8, 2024

audiodude merged commit d79c3ab into main Dec 8, 2024
4 checks passed

audiodude deleted the new-cli-args branch December 8, 2024 23:13
