Describe the bug
Mozilla CV uses multiple Google services, and these are blocked for Iranian IPs because of sanctions.
When trying to download the dataset, CV says the download is successful. But if you attempt the download from an Iranian IP, Google won't let you download it and returns a 403.
It's impossible to authenticate a user from an Iranian IP. The Mozilla authentication server returns "Access denied" for Iranian IPs.
When trying to get clips for validation, CV says there are no more clips to validate, but there actually are. Iranian IPs simply can't get a 200 response from the Google endpoints. This really confuses users, and the Persian community has already lost many significant contributions from long-time contributors who thought there were no clips left to validate.
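The confusion described above comes from treating a blocked request the same way as an empty result. A minimal sketch of keeping those cases apart, assuming the client can see the HTTP status of the clip fetch; `ClipFetchResult` and `classify_clip_fetch` are hypothetical names for illustration, not part of the Common Voice codebase:

```python
from enum import Enum
from typing import Optional, Sequence


class ClipFetchResult(Enum):
    CLIPS_AVAILABLE = "clips_available"
    NO_CLIPS_LEFT = "no_clips_left"
    BLOCKED = "blocked"  # 401/403: access denied, for example by region
    FAILED = "failed"    # any other non-200 outcome


def classify_clip_fetch(status: int, clips: Optional[Sequence[str]]) -> ClipFetchResult:
    """Map an HTTP status and the decoded clip list to a distinct UI state."""
    if status in (401, 403):
        return ClipFetchResult.BLOCKED
    if status != 200 or clips is None:
        return ClipFetchResult.FAILED
    # Only a successful, empty response means there is nothing left to validate.
    return ClipFetchResult.CLIPS_AVAILABLE if clips else ClipFetchResult.NO_CLIPS_LEFT
```

With a split like this, the interface could show an "access blocked" message for `BLOCKED` instead of telling contributors that no clips remain.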
To Reproduce
Get an Iranian IP then:
Try to sign in. You'll be blocked by Mozilla's authentication service, which shows an "Access denied" page.
Try downloading a dataset. The Google endpoint returns an XML response saying "Access denied" and that "this service is not available in your location".
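For anyone trying to confirm the download failure programmatically, here is a rough sketch of recognizing that kind of response. The sample body is an illustration built from the two phrases quoted in this report, not a captured response, and its element names are assumptions; `is_region_block` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Illustrative body only: the element names are assumed, the phrases come from this report.
SAMPLE_BODY = (
    "<Error><Code>AccessDenied</Code>"
    "<Message>Access denied.</Message>"
    "<Details>This service is not available in your location.</Details></Error>"
)


def is_region_block(status: int, body: str) -> bool:
    """Return True for a 403 whose XML body carries both phrases from the report."""
    if status != 403:
        return False
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # not XML, so not the response described here
    text = " ".join(part.strip() for part in root.itertext()).lower()
    return "access denied" in text and "not available in your location" in text
```

Matching on the text rather than on specific tags keeps the check independent of the exact XML layout, which this report does not show.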
Expected behavior
Having an Iranian IP shouldn't block users from any of the above.
Screenshots
N/A
Desktop or Mobile (please complete the following information):
N/A
Additional Hardware (were you using headphones, an external speaker or an external microphone?):
N/A
Additional context
I would suggest moving off Google services to privacy-friendly EU alternatives. The smaller the footprint of big privacy-invading tech companies like Google, the better users can support the platform. CV promotes open culture, which is closely related to FOSS. In the Persian CV community, the vast majority of contributors are already open culture advocates.
As for storage, perhaps it would make sense to use a CDN, at least for the datasets. CDNs are usually located in multiple regions. Perhaps even several CDN networks could be used. There is no reason to lock open data so that it can only be fetched from Google.
Thanks so much for getting in touch, we agree that Google's access issues are proving a barrier to a number of our language communities globally. Common Voice made the move onto GCP hosting due to a Mozilla-wide infrastructure shift, so short term we don't have an incredible amount of control over hosting for the platform and datasets. We've currently got work to investigate alternative access routes ticketed for our team and in the backlog and we're working to add more engineering capacity to make sure that important access pathways like this are addressed more quickly in the future.
In the short term, please do let me apologize for the additional burden this puts on our dataset users and our language communities in the range of global regions not currently able to access GCP hosted services.
It is actually pretty easy to solve in the short term. Just use FTP mirrors, which can be reached from everywhere, and for download statistics merge the two numbers. They do not even need to be high-capacity servers, as they will be used only by a smaller portion of the global population.
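A rough sketch of what that could look like on the client side, assuming a list of candidate URLs tried in order. The `fetch` callable is injected so the example runs without network access; both function names and any mirror hosts are hypothetical:

```python
from typing import Callable, Iterable, Optional, Tuple

# fetch(url) -> (status, body); passed in so the sketch does not depend on a real network client.
Fetch = Callable[[str], Tuple[int, bytes]]


def download_with_fallback(
    urls: Iterable[str], fetch: Fetch
) -> Tuple[Optional[str], Optional[bytes]]:
    """Try each URL in order and return the first one that answers with 200."""
    for url in urls:
        try:
            status, body = fetch(url)
        except OSError:
            continue  # unreachable host: move on to the next mirror
        if status == 200:
            return url, body
    return None, None  # every source was blocked or unreachable


def merged_download_count(primary: int, mirror: int) -> int:
    """Combine the download counters of the primary host and the mirror."""
    return primary + mirror
```

Returning the URL that succeeded lets the caller record which source served the file, which is what the merged statistics would need.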