[BUG] Multiple CV services (or endpoints) return 403 (access denied) for Iranian IPs #4747

Open
farooqkz opened this issue Jan 13, 2025 · 2 comments

@farooqkz
Contributor

Describe the bug
Mozilla Common Voice (CV) uses multiple Google services, and these are blocked for Iranian IPs because of sanctions.

  • When trying to download the dataset, CV says the download is successful. But from an Iranian IP, Google refuses the download and returns 403.
  • It's impossible to authenticate a user from an Iranian IP. The Mozilla authentication server returns "Access denied" for Iranian IPs.
  • When trying to get clips for validation, CV says there are no more clips to validate. But there actually are. Iranian IPs simply never get a 200 from the Google endpoints. This really confuses users, and the Persian community has already lost many significant contributions from long-time contributors who thought there were no clips left to validate.
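
To illustrate the third point, here is a minimal sketch of how a client could tell "access denied" apart from "nothing left to validate". It is not taken from the Common Voice codebase; the names `ClipFetchResult` and `classify_clip_fetch` are hypothetical, and it assumes the client can see the HTTP status of the clip request.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ClipFetchResult:
    # One of "ok", "empty", "blocked", "error" (hypothetical state names).
    state: str
    message: str


def classify_clip_fetch(status: int, clips: Optional[Sequence[str]]) -> ClipFetchResult:
    """Map an HTTP status and an optional clip list to a user-facing state."""
    if status == 403:
        # The request was refused, so the clip list is unknown, not empty.
        return ClipFetchResult(
            "blocked",
            "Clips could not be loaded: access was denied for your location.",
        )
    if status == 200 and not clips:
        # Only a successful response with no clips means the queue is empty.
        return ClipFetchResult("empty", "There are no more clips to validate.")
    if status == 200:
        return ClipFetchResult("ok", f"{len(clips)} clip(s) available.")
    return ClipFetchResult("error", f"Unexpected HTTP status {status}.")
```

With something like this, a 403 would surface as a distinct message instead of the misleading "no more clips" state.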

To Reproduce
Get an Iranian IP then:

  • Try to get clips for validation (just visit a page like https://commonvoice.mozilla.org/en/listen). It will say there are no more clips for validation.
  • Try to sign in. You'll be blocked by Mozilla's authentication service, which shows an "Access denied" page.
  • Try downloading a dataset. The Google endpoint returns an XML response saying "Access denied" and that "this service is not available in your location".

Expected behavior
Having an Iranian IP shouldn't block users from any of the above.

Screenshots
N/A

Desktop or Mobile (please complete the following information):
N/A

Additional Hardware (were you using headphones, an external speaker or an external microphone?):
N/A

Additional context
I would suggest moving off the Google services to privacy-friendly EU alternatives. The smaller the footprint of big privacy-invading tech companies like Google, the better users can support the platform. CV promotes open culture, which is closely related to FOSS. In the Persian CV community, the vast majority of contributors are already open culture advocates.

As for storage, perhaps it would make sense to use a CDN, at least for datasets. CDNs are usually located in multiple regions. Perhaps even more than one CDN network could be used. There is no reason to lock open data so that it can only be fetched from Google.

@farooqkz farooqkz added the Bug label Jan 13, 2025
@jessicarose
Collaborator

Thanks so much for getting in touch. We agree that Google's access issues are proving a barrier to a number of our language communities globally. Common Voice made the move onto GCP hosting due to a Mozilla-wide infrastructure shift, so in the short term we don't have much control over hosting for the platform and datasets. We currently have work to investigate alternative access routes ticketed for our team and in the backlog, and we're working to add more engineering capacity to make sure that important access pathways like this are addressed more quickly in the future.

In the short term, please do let me apologize for the additional burden this puts on our dataset users and our language communities in the range of global regions not currently able to access GCP hosted services.

@jessicarose jessicarose self-assigned this Jan 15, 2025
@HarikalarKutusu
Contributor

It is actually pretty easy to solve in the short term. Just use FTP mirrors, which can be reached from everywhere, and for download statistics merge the two numbers. They do not even need to be high-capacity servers, as they will be used only by a smaller portion of the global population.
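
A rough sketch of that idea, under stated assumptions: the helper names `first_reachable` and `merge_download_counts` are hypothetical, the source list is supplied by the caller, and `probe` is any caller-provided function that returns an HTTP-like status code for a source (an FTP mirror would need its own probe). None of this reflects how Common Voice currently serves datasets.

```python
from typing import Callable, Dict, Iterable, Optional


def first_reachable(urls: Iterable[str], probe: Callable[[str], int]) -> Optional[str]:
    """Return the first source whose probe succeeds, or None if all fail."""
    for url in urls:
        try:
            status = probe(url)
        except OSError:
            # Network-level failure (timeout, refused connection): try the next one.
            continue
        if status == 200:
            return url
        # Any other status (for example 403) also falls through to the next source.
    return None


def merge_download_counts(*sources: Dict[str, int]) -> Dict[str, int]:
    """Sum per-dataset download counts reported by several sources."""
    total: Dict[str, int] = {}
    for counts in sources:
        for dataset, n in counts.items():
            total[dataset] = total.get(dataset, 0) + n
    return total
```

The primary host stays first in the list, so users who can reach it are unaffected; the mirrors only receive traffic from regions where the primary returns an error.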
