[BUG] Multiple CV services (or endpoints) return 403 (access denied) for Iranian IPs #4747

farooqkz · 2025-01-13T13:58:32Z

Describe the bug
Mozilla Common Voice (CV) uses multiple services from Google, and those services are blocked for Iranian IPs because of sanctions.

When you try to download a dataset, CV reports that the download was successful. From an Iranian IP, however, Google refuses the download and returns a 403.
Users on Iranian IPs cannot authenticate at all. The Mozilla authentication server returns "Access denied" for Iranian IPs.
When you try to get clips for validation, CV says there are no more clips to validate, even though there are. Iranian IPs simply can't get a 200 response from the Google endpoints. This really confuses users, and the Persian community has already lost many significant contributions from long-time contributors because they thought there were no clips left to validate.

To Reproduce
Using an Iranian IP:

Try to get clips for validation (e.g. just visit https://commonvoice.mozilla.org/en/listen). It will say there are no more clips to validate.
Try to sign in. You'll be blocked by Mozilla's authentication service, which shows an "Access denied" page.
Try downloading a dataset. The Google endpoint returns an XML document saying "Access denied" and that "this service is not available in your location".
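A minimal Python sketch for checking the dataset step from a terminal instead of a browser. It assumes the error body follows Google Cloud Storage's usual XML error layout (an `<Error>` element with `<Code>` and `<Message>` children); the function names are hypothetical, and since this issue does not include the actual dataset download URL, you have to pass it in yourself.

```python
import sys
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def parse_xml_error(body: bytes):
    """Return (code, message) from an XML <Error> body, or None otherwise.

    Assumes a GCS-style layout: <Error><Code>..</Code><Message>..</Message></Error>.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not XML at all, e.g. a plain-text error page
    if root.tag != "Error":
        return None  # well-formed markup, but not an XML error document
    return root.findtext("Code", default=""), root.findtext("Message", default="")


def check_url(url: str) -> str:
    """Fetch url and summarise the HTTP status plus any XML error details."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return f"{resp.status} {url}"
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses land here; the error body is still readable.
        details = parse_xml_error(err.read())
        suffix = f" {details[0]}: {details[1]}" if details else ""
        return f"{err.code} {url}{suffix}"


if __name__ == "__main__":
    # Pass the URLs to test (e.g. the dataset download link) as arguments.
    for arg in sys.argv[1:]:
        print(check_url(arg))
```

Run it as `python check_access.py "<dataset download URL>"`. From an affected IP you would expect a `403` line followed by whatever code and message the XML body carries; connection-level failures (`URLError`) are not handled in this sketch.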

Expected behavior
Having an Iranian IP shouldn't block users from any of the above.

Screenshots
N/A

Desktop or Mobile (please complete the following information):
N/A

Additional Hardware (were you using headphones, an external speaker or an external microphone?):
N/A

Additional context
I would suggest moving off Google services to privacy-friendly EU alternatives. The smaller the footprint of big privacy-invasive tech companies like Google, the better users can support the platform. CV promotes open culture, which is closely related to FOSS, and in the Persian CV community the vast majority of contributors are already open culture advocates.

As for storage, it might make sense to use a CDN, at least for the datasets. CDNs usually have locations in multiple regions, and more than one CDN network could even be used. There is no reason to lock open data so that it can only be fetched from Google.

jessicarose · 2025-01-15T14:42:59Z

Thanks so much for getting in touch. We agree that Google's access issues are proving a barrier to a number of our language communities globally. Common Voice moved onto GCP hosting as part of a Mozilla-wide infrastructure shift, so in the short term we don't have much control over hosting for the platform and datasets. Work to investigate alternative access routes is ticketed and in our team's backlog, and we're working to add more engineering capacity so that important access pathways like this are addressed more quickly in the future.

In the meantime, please let me apologize for the additional burden this places on our dataset users and on our language communities in the many regions that cannot currently access GCP-hosted services.

HarikalarKutusu · 2025-01-15T22:08:17Z

It is actually pretty easy to solve in the short term: just use FTP mirrors that can be reached from everywhere, and merge the two numbers for download statistics. They don't even need to be high-capacity servers, since they will be used by only a small portion of the global population.

farooqkz added the Bug label Jan 13, 2025

jessicarose self-assigned this Jan 15, 2025

jessicarose added Investigate and removed Bug labels Jan 15, 2025
