Broken images on some plugin pages #156
Which returns this text:

Meanwhile on https://github.com/simonw/datasette-atom
Possibly relevant:
Those talk about timeouts though, which I think isn't the same thing as this signature problem.
I'm going to force re-build the README for datasette-atom. The relevant code is in datasette.io/build_directory.py, lines 178 to 185 in 55e8fb1.
That did fix it: https://datasette.io/plugins/datasette-atom

So I think the fix may be to change the logic for fetching README files so it fetches if the repo has been updated OR if the README was last fetched more than X days ago. Not sure what value to use for X.
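A minimal sketch of that condition, assuming each repos row keeps GitHub's `updated_at` plus a hypothetical `readme_fetched_at` timestamp (neither the helper name nor the 7-day cutoff is settled):

```python
from datetime import datetime, timedelta, timezone

README_MAX_AGE = timedelta(days=7)  # a guess at X

def should_fetch_readme(repo_updated_at, readme_fetched_at):
    # Re-fetch if the repo changed since we last grabbed the README,
    # or if the stored README is older than the cutoff.
    now = datetime.now(timezone.utc)
    return (
        repo_updated_at > readme_fetched_at
        or now - readme_fetched_at > README_MAX_AGE
    )
```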
Found some relevant documentation:

https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/about-anonymized-urls
https://github.blog/2014-01-28-proxying-user-images/

https://github.com/atmos/camo was archived in April 2021.
Another clue:

```
curl -i 'https://camo.githubusercontent.com/42a02d342f39e3dc05195df216b9f89fa84de19d94547b4c68a02ec12f0eaf2d/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f72656c656173652f73696d6f6e772f6461746173657474652d61746f6d3f696e636c7564655f70726572656c6561736573266c6162656c3d6368616e67656c6f67'
```
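For what it's worth, the last path component of that camo URL is just the destination hex-encoded, so the proxied image URL can be recovered directly (a quick check, not part of the fix):

```python
camo_url = (
    "https://camo.githubusercontent.com/"
    "42a02d342f39e3dc05195df216b9f89fa84de19d94547b4c68a02ec12f0eaf2d/"
    "68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f"
    "72656c656173652f73696d6f6e772f6461746173657474652d61746f6d3f696e"
    "636c7564655f70726572656c6561736573266c6162656c3d6368616e67656c6f67"
)
signature_hex, encoded_hex = camo_url.split("/")[-2:]
# Hex-decoding the second component yields the original image URL:
print(bytes.fromhex(encoded_hex).decode("utf-8"))
# https://img.shields.io/github/v/release/simonw/datasette-atom?include_prereleases&label=changelog
```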
I think GitHub upgraded from that Node camo to a Go implementation. The signature check in that code looks like this:

```go
sURL, ok := encoding.DecodeURL(p.config.HMACKey, sigHash, encodedURL)
if !ok {
	http.Error(w, "Bad Signature", http.StatusForbidden)
	return
}
```
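For context, camo-style URLs are signed with an HMAC-SHA1 digest of the destination URL using a shared secret key, as described in the camo project. A rough sketch (the key and helper name here are illustrative, not GitHub's actual code):

```python
import hashlib
import hmac

def camo_url(key: bytes, dest_url: str) -> str:
    # Hex HMAC-SHA1 of the destination URL, then the hex-encoded URL itself.
    digest = hmac.new(key, dest_url.encode("utf-8"), hashlib.sha1).hexdigest()
    return "https://camo.githubusercontent.com/{}/{}".format(
        digest, dest_url.encode("utf-8").hex()
    )

# If GitHub rotated the secret, every previously generated digest would stop
# matching, which is exactly the "Bad Signature" failure mode above.
print(camo_url(b"not-the-real-key", "https://img.shields.io/badge/example-blue"))
```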
Having seen that code, my hunch is that GitHub changed their signing key.

In the DB at the moment I store both the raw markdown and the rendered HTML: https://datasette.io/content/repos/209091256

That shows that using GitHub's markdown API doesn't produce those camo image URLs.

In my code I'm calling out to `fetch_readme()`:

```python
def fetch_readme(token, full_name, html=False):
    headers = make_headers(token)
    if html:
        headers["accept"] = "application/vnd.github.VERSION.html"
    url = "https://api.github.com/repos/{}/readme".format(full_name)
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        return None
    if html:
        return rewrite_readme_html(response.text)
    else:
        return base64.b64decode(response.json()["content"]).decode("utf-8")
```

Switching to rendering through the Markdown API would fix this by removing the camo-proxied image URLs entirely.
datasette.io/build_directory.py, lines 187 to 197 in 428d631

I could flip `readme_html` off there and render the HTML myself.

Probably easiest to cut the `readme_html` fetch entirely and render the stored raw Markdown through the Markdown API instead.
Pieces I need to fix this:

```
curl 'https://api.github.com/repos/simonw/datasette/readme' \
  -H 'Accept: application/vnd.github.raw'
```

To get back the raw Markdown. Then to render that Markdown as HTML:

```python
import httpx

# Assuming the same kind of auth headers used elsewhere:
headers = {"Authorization": "token YOUR_GITHUB_TOKEN"}

body = """
# Example

This is example Markdown
"""
response = httpx.post(
    "https://api.github.com/markdown",
    json={
        # mode=gfm would expand #13 issue links, provided you pass
        # context=simonw/datasette too
        "mode": "markdown",
        "text": body,
    },
    headers=headers,
)
if response.status_code == 200:
    markdown_as_html = response.text
```
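Putting the two pieces together, a rough sketch of what the replacement fetch could look like (the helper name and auth header are assumptions, not the actual build script code):

```python
import httpx

def fetch_readme_html_via_markdown_api(token, full_name):
    # Hypothetical helper combining the two API calls above: fetch the raw
    # Markdown, then render it via the Markdown API so the resulting HTML
    # never contains camo URLs.
    headers = {"Authorization": "token {}".format(token)}
    readme = httpx.get(
        "https://api.github.com/repos/{}/readme".format(full_name),
        headers={**headers, "Accept": "application/vnd.github.raw"},
    )
    if readme.status_code != 200:
        return None
    rendered = httpx.post(
        "https://api.github.com/markdown",
        json={"mode": "markdown", "text": readme.text},
        headers=headers,
    )
    if rendered.status_code != 200:
        return None
    return rendered.text
```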
Looking again at this code:

```python
if repos_to_fetch_readme_for:
    print("Fetching README for {}".format(repos_to_fetch_readme_for))
    github_to_sqlite_repos.callback(
        db_filename,
        usernames=[],
        auth="auth.json",
        repo=repos_to_fetch_readme_for,
        load=None,
        readme=True,
        readme_html=True,
    )
```

Is it JUST fetching the READMEs, or is it also populating the database with other important information? If it's pulling other key information too then I should leave that in there but rename the variable.
Here's the implementation of that github-to-sqlite `repos` command:

```python
def repos(db_path, usernames, auth, repo, load, readme, readme_html):
    "Save repos owned by the specified (or authenticated) username or organization"
    db = sqlite_utils.Database(db_path)
    token = load_token(auth)
    if load:
        for loaded_repo in json.load(open(load)):
            utils.save_repo(db, loaded_repo)
    else:
        if repo:
            # Just these repos
            for full_name in repo:
                repo_id = utils.save_repo(db, utils.fetch_repo(full_name, token))
                _repo_readme(db, token, repo_id, full_name, readme, readme_html)
        else:
            if not usernames:
                usernames = [None]
            for username in usernames:
                for repo in utils.fetch_all_repos(username, token):
                    repo_id = utils.save_repo(db, repo)
                    _repo_readme(
                        db, token, repo_id, repo["full_name"], readme, readme_html
                    )
    utils.ensure_db_shape(db)


def _repo_readme(db, token, repo_id, full_name, readme, readme_html):
    if readme:
        readme = utils.fetch_readme(token, full_name)
        db["repos"].update(repo_id, {"readme": readme}, alter=True)
    if readme_html:
        readme_html = utils.fetch_readme(token, full_name, html=True)
        db["repos"].update(repo_id, {"readme_html": readme_html}, alter=True)
```

I'm going to assume there are other reasons to call it and rename the variable to something more accurate.
I'm also going to drop that `readme_html=True` option from the build script.
Last question: how best to upgrade the existing records? I'm worried about rate limits, so I don't want to force-update READMEs for everything. I think I'll add code which, every time the build script runs, picks the three oldest repos that haven't been re-fetched yet and updates just those.

Then in a few days time everything should be fixed.
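A sketch of that incremental upgrade, again assuming a hypothetical `readme_fetched_at` column to order by:

```python
import sqlite_utils

db = sqlite_utils.Database("content.db")
# Pick the three repos whose READMEs were fetched longest ago, so each
# build chips away at the backlog without hitting rate limits.
repos_to_fetch_readme_for = [
    row["full_name"]
    for row in db.query(
        "select full_name from repos order by readme_fetched_at limit 3"
    )
]
```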
Those build logs included:

But... while these pages have non-broken images, they still seem to be proxied through camo:
https://datasette.io/plugins/datasette-atom