Update medium and large fixtures to parse XML special characters #283

Closed
willcravitz opened this issue Nov 24, 2023 · 9 comments · Fixed by #287
Labels: bug (Something isn't working), x/game-management

Comments

@willcravitz
Contributor

willcravitz commented Nov 24, 2023

In issue #258 we noted that the game descriptions contain several character references, such as &#10;, that we need to parse correctly. It turns out that it's hard to parse these characters on the frontend, and it's better to parse them when we make the call to the BGG API. This means updating the fixture-generation script so that descriptions are parsed before they get stored.

@willcravitz
Contributor Author

willcravitz commented Nov 25, 2023

Here is what the updated script looks like. I am adding it to the wiki page here to make it easier to find for future reference.

```py
import requests
import xml.etree.ElementTree as ET
import xml.sax.saxutils
import json


def bgg_get_game_details(bgg_ids):
    BGG_BASE_URL = "https://www.boardgamegeek.com/xmlapi2/"
    ids_string = ",".join(map(str, bgg_ids))
    details_url = f"{BGG_BASE_URL}thing?id={ids_string}&stats=1"
    details_response = requests.get(details_url)
    # Unescape &amp;, &lt; and &gt; in the raw response so that sequences such
    # as &amp;#10; become &#10;, which the XML parser then decodes.
    details_text = xml.sax.saxutils.unescape(details_response.text)
    try:
        details_root = ET.fromstring(details_text)
    except ET.ParseError:
        # Skip the batch if it cannot be parsed after unescaping special characters
        return []

    games_data = []
    for item in details_root.findall("item"):
        # Extract game details as before, but for each item
        name_element = item.find(".//name[@type='primary']")
        if name_element is None:
            continue  # Skip games without a primary name

        # Retrieve the complexity value from the API response, convert it to a float,
        # and round it to two decimal places. If the value is not found, default to None.
        weight_element = item.find(".//averageweight")
        if weight_element is not None:
            complexity_value = weight_element.get("value")
            rounded_complexity = (
                round(float(complexity_value), 2) if complexity_value else None
            )
        else:
            rounded_complexity = None

        # Construct a dictionary with the game's details, parsing various elements from the API response
        game_data = {
            "BGG_id": item.get("id"),
            # Use the primary name that was checked above, not the first <name> found
            "name": name_element.get("value"),
            "image": item.find(".//image").text
            if item.find(".//image") is not None
            else "/static/images/no_picture_available.png",
            "description": item.find(".//description").text,
            "year_published": int(item.find(".//yearpublished").get("value"))
            if item.find(".//yearpublished") is not None
            else None,
            "min_players": item.find(".//minplayers").get("value"),
            "max_players": item.find(".//maxplayers").get("value"),
            "expected_playtime": item.find(".//playingtime").get("value"),
            "min_playtime": item.find(".//minplaytime").get("value"),
            "max_playtime": item.find(".//maxplaytime").get("value"),
            "suggested_age": item.find(".//minage").get("value"),
            "complexity": rounded_complexity,  # The rounded complexity rating of the game
            # Missing fields: category, mechanics
        }

        games_data.append(game_data)

    return games_data


def create_fixture_file(num_entries, filename="fixture_file.json", batch_size=10):
    fixtures = []
    fetched_games = 0
    bgg_id = 1

    # Request consecutive BGG ids in batches until enough games are collected
    while fetched_games < num_entries:
        batch_ids = list(range(bgg_id, bgg_id + batch_size))
        games_data = bgg_get_game_details(batch_ids)

        for game_data in games_data:
            if fetched_games >= num_entries:
                break

            fixture_entry = {
                "model": "games.game",
                "pk": fetched_games + 1,
                "fields": {
                    "name": game_data.get("name", ""),
                    "description": game_data.get("description", ""),
                    "year_published": game_data.get("year_published", None),
                    "image": game_data.get("image", ""),
                    "rules": "",
                    "min_players": game_data.get("min_players", ""),
                    "max_players": game_data.get("max_players", ""),
                    "suggested_age": game_data.get("suggested_age", ""),
                    "expected_playtime": game_data.get("expected_playtime", ""),
                    "min_playtime": game_data.get("min_playtime", ""),
                    "max_playtime": game_data.get("max_playtime", ""),
                    "complexity": game_data.get("complexity", None),
                    "BGG_id": game_data.get("BGG_id", ""),
                    "categories": [],
                    "mechanics": [],
                },
            }

            fixtures.append(fixture_entry)
            fetched_games += 1
            print(f"Added game {fetched_games} of {num_entries}")

        bgg_id += batch_size

    with open(filename, "w", encoding="utf-8") as f:
        json.dump(fixtures, f, ensure_ascii=False, indent=4)


if __name__ == "__main__":
    # This will create a fixture with 50 games
    fixture_size = 50
    create_fixture_file(fixture_size, f"games-fixture-{fixture_size}.json")
```
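
The script above drops a whole batch whenever unescaping the raw response leaves it unparseable. One possible alternative, sketched here as an assumption rather than anything taken from the repository, is to parse the untouched XML first and decode only the description text with `html.unescape`. The names `extract_descriptions` and `SAMPLE` are hypothetical, and the sample assumes descriptions arrive double-escaped (for example `&amp;#10;`), which matches what this issue describes but has not been checked against the live API.

```python
import html
import xml.etree.ElementTree as ET


def extract_descriptions(xml_text):
    """Map each item's id to its description with character references decoded."""
    # Parse the response as-is so a stray "&" cannot break the whole batch.
    root = ET.fromstring(xml_text)
    descriptions = {}
    for item in root.findall("item"):
        raw = item.findtext("description", default="")
        # The XML parser has already turned "&amp;#10;" into "&#10;";
        # html.unescape decodes that remaining layer into a real newline.
        descriptions[item.get("id")] = html.unescape(raw)
    return descriptions


# Hypothetical response fragment, shaped like the items the script iterates over.
SAMPLE = (
    '<items><item type="boardgame" id="1">'
    "<description>Line one&amp;#10;Line two &amp;amp; more</description>"
    "</item></items>"
)

print(extract_descriptions(SAMPLE))
```

Decoding after parsing keeps the XML well-formed, so the `try`/`except` fallback would only be needed for responses that are malformed to begin with.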

@willcravitz
Contributor Author

Opened up a new branch game-management/update-fixtures to update the medium and large fixtures.

@willcravitz
Contributor Author

I generated new 5, 50, and 500 game fixtures using the updated script. The descriptions do not have special characters like &#10; anymore and line breaks show up as expected.
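
A check like the one described here can also be automated. The following is a minimal sketch, assuming the fixture layout produced by the script in this thread (a list of entries with `pk` and `fields.description`); `find_leftover_entities`, `check_fixture_file`, and `ENTITY_RE` are hypothetical names, and the pattern is a heuristic that could flag legitimate text that happens to look like an entity.

```python
import json
import re

# Matches leftover character references such as "&#10;", "&#x0A;" or "&amp;".
ENTITY_RE = re.compile(r"&(#\d+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def find_leftover_entities(fixtures):
    """Return (pk, entity) pairs for descriptions that still contain entities."""
    leftovers = []
    for entry in fixtures:
        # A missing or null description counts as empty text.
        description = entry["fields"].get("description") or ""
        for match in ENTITY_RE.finditer(description):
            leftovers.append((entry["pk"], match.group(0)))
    return leftovers


def check_fixture_file(path):
    """Load a JSON fixture file and report leftover entities in it."""
    with open(path, encoding="utf-8") as f:
        return find_leftover_entities(json.load(f))


# Small in-memory example instead of a real fixture file.
example = [
    {"pk": 1, "fields": {"description": "Line one&#10;Line two"}},
    {"pk": 2, "fields": {"description": "Line one\nLine two"}},
]
print(find_leftover_entities(example))
```

An empty result for a generated file would suggest the descriptions were decoded before being stored.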

@willcravitz
Contributor Author

If this script works then it should also be used to update game autofilling from issue #97.

@majorsylvie
Contributor

majorsylvie commented Nov 27, 2023

Here is what the updated script looks like. I am adding it to the wiki page here to make it easier to find for future reference.

import requests
import xml.etree.ElementTree as ET
import xml.sax.saxutils
import json
...

I've edited your comment to add py after your opening triple backtick (```), which tells GitHub's Markdown renderer that this is a block of Python code.

This adds syntax highlighting and makes any code comment you write significantly easier to read.

This also works for many other languages, and usually works by adding the language name or file extension to the top.

So any code block, instead of being:

```
some code here :)
```

would be

```language_file_extension (py, js, c, cpp)
some code here :)
```

e.g.:

```py
def hello_world():
  print("hello world <3")
```

Markdown has many great formatting tools and you deserve to know what that entails!

@willcravitz
Contributor Author

Thank you @majorsylvie! This is a great tip and I'll be sure to use it in the future.

@frowenz
Contributor

frowenz commented Nov 27, 2023

Repeating a question from the pull request associated with this issue:

One lingering question is: where does the script to generate this code belong?

Currently, we have a bunch of scripts in wiki pages: 1, 2, 3. Given that some of this code:

  1. is being reused and improved upon, and
  2. contains functions that are useful in other places (e.g., BGG autofilling, BGG Autofilling Working on Create Pages #196, would benefit from the same intelligent parsing),

I think it might be worth putting these scripts or parts of these scripts into a utilities folder, say utils. What do you think? @willcravitz
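
As an illustration of what such a shared module could hold, here is a minimal sketch of one helper. The module location and the name `get_value` are hypothetical, not something that exists in the codebase. It wraps the repeated `item.find(...).get("value")` pattern from the fixture script so that a missing element yields a default instead of raising `AttributeError`.

```python
import xml.etree.ElementTree as ET


def get_value(item, tag, default=None, cast=None):
    """Return the "value" attribute of the first <tag> under item.

    Falls back to `default` when the element or attribute is missing or
    empty; `cast` (for example int or float) converts the value if given.
    """
    element = item.find(f".//{tag}")
    if element is None:
        return default
    value = element.get("value")
    if value is None or value == "":
        return default
    return cast(value) if cast else value


# Hypothetical item with only some of the fields present.
item = ET.fromstring(
    '<item id="1"><yearpublished value="1995"/><minplayers value="3"/></item>'
)
print(get_value(item, "yearpublished", cast=int))
print(get_value(item, "minage", default=""))
```

Both the fixture generator and the autofill code could then import the same helper rather than each keeping its own copy of the parsing logic.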

@github-project-automation github-project-automation bot moved this from In Progress to Done in CMSC 22000 Scrum Board Nov 29, 2023
@willcravitz
Contributor Author

willcravitz commented Nov 29, 2023

Closing comments: This issue resulted in replacing our existing 5-, 50-, and 500-game fixtures with new fixtures generated by parsing the XML received from the BGG API, so that the descriptions no longer contain special characters. The manual 5-game fixture was kept but renamed. The modified script was added to the wiki but, as @frowenz commented, we plan to add it to an appropriate location in the codebase in the future.

Developer contributions:
@frowenz wrote the initial script to generate fixtures and made suggestions about where to put the code in the future.
@willcravitz modified the script to parse special XML characters and used this to generate the new fixtures.

@majorsylvie
Contributor

Issue Score: Excellent

Comments:
Thank you @frowenz for bringing in relevant discussion from the related PR.
Thank you @willcravitz for your consistent and meaningful updates!

Happy to teach y'all the markdown code block formatting :)

Thank you both for your work!
