Fix Spanish dialect selection #553

mmcauliffe · 2024-07-19T15:46:37Z

[x] Updated Unreleased in CHANGELOG.md to reflect the changes in code or data.

mmcauliffe · 2024-07-19T16:24:01Z

One open question I have is whether to add country selectors for Latin America (e.g. Colombia, Chile, etc.) to the scrape's config, but I don't know how prevalent these are on Wiktionary or if they're wanted in the wider Latin America dialect file.

kylebgorman

LGTM, and I appreciate the new test, which is much needed.

I think the right way to test this, though, is to run the big scrape on Spanish and incorporate the changes into the PR. For that you'd install (from this PR), navigate to data/scrape, issue ./scrape --restriction spa && ./postprocess, wait (about 12-24 hours), then stage and commit the changed files. Is this feasible on your end?

kylebgorman · 2024-07-20T19:59:34Z

> One open question I have is whether to add country selectors for Latin America (e.g. Colombia, Chile, etc.) to the scrape's config, but I don't know how prevalent these are on Wiktionary or if they're wanted in the wider Latin America dialect file.

We don't have any discoverability for dialect strings. Ideally there'd be some way to get a list of them in descending frequency order, and then you could manually cluster them and write the result back into languages.json. Nice feature request though.
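
A minimal sketch of the frequency listing described above, assuming the dialect strings have already been collected into a plain list. The function name `rank_dialect_labels` and the sample labels are hypothetical; this is not part of WikiPron's API, and it does not touch languages.json.

```python
from collections import Counter


def rank_dialect_labels(labels):
    """Return (label, count) pairs, most frequent first, ties broken alphabetically."""
    # Trim surrounding whitespace and skip empty strings before counting.
    normalized = (label.strip() for label in labels if label.strip())
    counts = Counter(normalized)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical labels, standing in for strings gathered from pronunciation entries.
sample = [
    "Latin America",
    "Spain",
    "Latin America",
    "Colombia",
    " Spain ",
    "Latin America",
    "",
]

for label, count in rank_dialect_labels(sample):
    print(f"{count}\t{label}")
```

The ranked output would then be reviewed by hand to decide which labels to cluster under a single dialect entry.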

mmcauliffe · 2024-07-22T17:10:27Z

Sure, I can run that in the next couple of days. I did do some work on getting a single parse working for multiple dialects, but it's a bit hacky, so I'll rerun the scrape using just this current branch.

kylebgorman · 2024-07-22T17:19:12Z

> Sure, I can run that in the next couple of days. I did do some work on getting a single parse working for multiple dialects, but it's a bit hacky, so I'll rerun the scrape using just this current branch.

Excited to see this. It will be a huge improvement, one we've wanted forever.

kylebgorman

LGTM, big thanks to you for this.

mmcauliffe added 2 commits July 19, 2024 08:46

Fix Spanish dialect selection

b322b13

Fix linter errors

d3f31d1

kylebgorman reviewed Jul 20, 2024

mmcauliffe added 2 commits July 27, 2024 12:26

Update big scrape files for spanish

4e07c1d

Update changelog

cd1f601

kylebgorman approved these changes Jul 27, 2024

kylebgorman merged commit 130a0d7 into CUNY-CL:master Jul 27, 2024
10 checks passed

kylebgorman added a commit to kylebgorman/wikipron that referenced this pull request Jul 27, 2024

Updates to version 133 to release CUNY-CL#553.

87457b0

kylebgorman added a commit that referenced this pull request Jul 27, 2024

Updates to version 133 to release #553. (#554)

5c5096c
