Merge branch 'main' into merge_netflix_streamlining_function_calls
audiodude committed Oct 24, 2024
2 parents 9991f42 + d5d271b commit 162cf04
Showing 19 changed files with 80 additions and 2 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
data
out
.env
.pytest_cache
__pycache__
2 changes: 2 additions & 0 deletions Pipfile
@@ -7,6 +7,8 @@ name = "pypi"
requests = "==2.26.0"
python-dotenv = "==1.0.1"
tqdm = "==4.66.5"
pytest = "==8.3.3"
pytest-cov = "==5.0.0"

[dev-packages]

15 changes: 13 additions & 2 deletions README.md
@@ -1,3 +1,14 @@
# Noisebridge Python Project
# What is MediaBridge?

https://www.noisebridge.net/wiki/Python_Project_Meetup
MediaBridge is a project being developed at the [Noisebridge](https://github.com/noisebridge) hackerspace in San Francisco, CA, USA. See also the [Noisebridge homepage](https://www.noisebridge.net/wiki/Noisebridge) and the [wiki entry for this project](https://www.noisebridge.net/wiki/Python_Project_Meetup).

MediaBridge is in a _very_ early stage of development. Its intended functionality is to provide recommendations that _bridge_ media types. For example, you might say you're interested in the film _Saw_, and MediaBridge might recommend the video game _Silent Hill_ or a Stephen King book. For now, we are working on simply returning recommendations for movies, based on the [Netflix Prize dataset](https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data).

Currently, we are only accepting contributions from members of the project who meet in person at Noisebridge.

## Testing

To run unit tests:

1. Ensure `pipenv` is installed
2. Run `pipenv run pytest`
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -140,13 +140,19 @@ def process_data(test=False):
    missing_count = 0
    processed_data = []

<<<<<<< HEAD:src/data_processing/wiki_to_netflix.py
    netflix_data = read_netflix_txt(os.path.join(DATA_DIR, 'movie_titles.txt'), test)
    num_rows = len(netflix_data)
=======
    netflix_data = read_netflix_txt(os.path.join(data_dir, 'movie_titles.txt'), test)
>>>>>>> main:mediabridge/data_processing/wiki_to_netflix.py

    netflix_csv = os.path.join(OUT_DIR, 'movie_titles.csv')

    wiki_movie_ids_list, wiki_genres_list, wiki_directors_list = wiki_query(netflix_data, user_agent)

    num_rows = len(wiki_movie_ids_list)

    for index, row in enumerate(netflix_data):
        netflix_id, year, title = row
        if wiki_movie_ids_list[index] is None:
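
The hunk above still contains Git conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`), which Python rejects with a `SyntaxError` when the module is imported. Which side should win is not something this diff settles, so the following is only a minimal sketch of a check that could flag leftover markers before a merge is committed. The helper name `find_conflict_markers` and the shortened sample text are illustrative assumptions, not part of the repository.

```python
import re

# A line counts as a marker if it starts with exactly seven '<', '=' or '>'
# characters followed by whitespace or the end of the line.
CONFLICT_MARKER = re.compile(r"^(<{7}|={7}|>{7})(\s|$)")


def find_conflict_markers(source: str) -> list[int]:
    """Return the 1-based numbers of lines that look like Git conflict markers.

    Hypothetical helper for illustration; not part of the MediaBridge code.
    """
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]


# Shortened copy of the conflicted region shown in the hunk above.
sample = """\
<<<<<<< HEAD:src/data_processing/wiki_to_netflix.py
netflix_data = read_netflix_txt(os.path.join(DATA_DIR, 'movie_titles.txt'), test)
=======
netflix_data = read_netflix_txt(os.path.join(data_dir, 'movie_titles.txt'), test)
>>>>>>> main:mediabridge/data_processing/wiki_to_netflix.py
"""

print(find_conflict_markers(sample))  # [1, 3, 5]
```

A check like this could run in a pre-commit hook or as a test; an empty result means no markers were found.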
6 changes: 6 additions & 0 deletions mediabridge/data_processing/wiki_to_netflix_test.py
@@ -0,0 +1,6 @@
from wiki_to_netflix import format_sparql_query, wiki_query, process_data
from wiki_to_netflix_test_data import EXPECTED_SPARQL_QUERY

def test_format_sparql_query():
    QUERY = format_sparql_query("The Room", 2003)
    assert QUERY == EXPECTED_SPARQL_QUERY
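
The test above compares the generated query to the expected string exactly, so any change in indentation or line breaks makes it fail. If that strictness is not wanted, one possible alternative is to compare whitespace-normalized strings. This is a sketch only: `normalize_whitespace` is a hypothetical helper, and the two query strings are shortened stand-ins rather than the real output of `format_sparql_query`.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace to one space and trim both ends."""
    return " ".join(text.split())


# Shortened stand-ins for an expected query and a generated query that
# differ only in indentation and leading/trailing newlines.
expected = '''
SELECT * WHERE {
    ?item wdt:P31/wdt:P279* wd:Q11424 .
    FILTER (YEAR(?releaseDate) = 2003) .
}
'''
actual = (
    "SELECT * WHERE {\n"
    "?item wdt:P31/wdt:P279* wd:Q11424 .\n"
    "FILTER (YEAR(?releaseDate) = 2003) .\n"
    "}"
)

# The raw strings differ, but they agree once whitespace is normalized.
assert actual != expected
assert normalize_whitespace(actual) == normalize_whitespace(expected)
```

The trade-off is that this also collapses whitespace inside quoted literals such as the search title, so it would not catch a difference that exists only there.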
45 changes: 45 additions & 0 deletions mediabridge/data_processing/wiki_to_netflix_test_data.py
@@ -0,0 +1,45 @@
EXPECTED_SPARQL_QUERY ='''
SELECT * WHERE {
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:api "EntitySearch" ;
wikibase:endpoint "www.wikidata.org" ;
mwapi:search "The Room" ;
mwapi:language "en" .
?item wikibase:apiOutputItem mwapi:item .
}
?item wdt:P31/wdt:P279* wd:Q11424 .
{
# Get US release date
?item p:P577 ?releaseDateStatement .
?releaseDateStatement ps:P577 ?releaseDate .
?releaseDateStatement pq:P291 wd:Q30 .
}
UNION
{
# Get unspecified release date
?item p:P577 ?releaseDateStatement .
?releaseDateStatement ps:P577 ?releaseDate .
FILTER NOT EXISTS { ?releaseDateStatement pq:P291 ?country }
}
FILTER (YEAR(?releaseDate) = 2003) .
?item rdfs:label ?itemLabel .
FILTER (lang(?itemLabel) = "en") .
OPTIONAL {
?item wdt:P136 ?genre .
?genre rdfs:label ?genreLabel .
FILTER (lang(?genreLabel) = "en") .
}
OPTIONAL {?item wdt:P57 ?director.
?director rdfs:label ?directorLabel.
FILTER (lang(?directorLabel) = "en")}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
'''
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 4 additions & 0 deletions mediabridge/main.py
@@ -0,0 +1,4 @@
from mediabridge.data_processing import wiki_to_netflix

q = wiki_to_netflix.format_sparql_query('The Room', 2003)
print(q)
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions pytest.ini
@@ -0,0 +1,2 @@
[pytest]
python_files = *_test.py
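
The `python_files` setting is a glob pattern that decides which files pytest collects as test modules. The sketch below uses the standard library's `fnmatch` to approximate that matching against a few file names; pytest's own matching may differ in edge cases, and `test_main.py` is a made-up name added for contrast.

```python
from fnmatch import fnmatch

# The pattern configured in pytest.ini above.
PATTERN = "*_test.py"

candidates = [
    "wiki_to_netflix_test.py",       # ends in _test.py
    "wiki_to_netflix_test_data.py",  # ends in _data.py, so it is skipped
    "wiki_to_netflix.py",            # ordinary module
    "test_main.py",                  # hypothetical file using the test_ prefix
]

matched = [name for name in candidates if fnmatch(name, PATTERN)]
print(matched)  # ['wiki_to_netflix_test.py']
```

Because the setting replaces pytest's defaults rather than extending them, files that use only the `test_` prefix would no longer be collected under this configuration.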
