Description
Currently, it's possible to set all kinds of time zones, such as:
In [7]: to_datetime(['2020-01-01']).tz_localize('+01:00')
Out[7]: DatetimeIndex(['2020-01-01 00:00:00+01:00'], dtype='datetime64[ns, UTC+01:00]', freq=None)
In [10]: to_datetime(['2020-01-01']).tz_localize('CET')
Out[10]: DatetimeIndex(['2020-01-01 00:00:00+01:00'], dtype='datetime64[ns, CET]', freq=None)
In [9]: to_datetime(['2020-01-01']).tz_localize('Cuba')
Out[9]: DatetimeIndex(['2020-01-01 00:00:00-05:00'], dtype='datetime64[ns, Cuba]', freq=None)
The list at https://en.wikipedia.org/wiki/List_of_tz_database_time_zones recommends using an 'Area/Location' time zone identifier instead, e.g. 'Africa/Lagos' instead of the first, 'Europe/Paris' instead of the second, and 'America/Havana' instead of the third.
Passing tz-identifiers that aren't in the 'Area/Location' format opens people up to common misconceptions and traps about time zones. For example, although Greenwich is in London, London does not observe GMT all year round (it only does so for about half the year).
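To illustrate the London/GMT trap, here is a minimal sketch using the standard-library `zoneinfo` module rather than pandas itself. It assumes Python 3.9+ and that a tz database is available (from the system or the `tzdata` package); the variable names are only for illustration.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")
gmt = ZoneInfo("GMT")

# Noon on a winter day and on a summer day in 2020.
winter = datetime(2020, 1, 1, 12, 0)
summer = datetime(2020, 7, 1, 12, 0)

# UTC offsets for London: zero in winter, one hour in summer (BST).
london_winter_offset = winter.replace(tzinfo=london).utcoffset()
london_summer_offset = summer.replace(tzinfo=london).utcoffset()

# 'GMT' stays at a zero offset all year, so it diverges from London in summer.
gmt_summer_offset = summer.replace(tzinfo=gmt).utcoffset()

print(london_winter_offset, london_summer_offset, gmt_summer_offset)
```

So data localized with 'GMT' in the belief that it means "London time" would be off by an hour during the summer.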
In https://en.wikipedia.org/wiki/List_of_tz_database_time_zones, every single tz-identifier which isn't in the 'Area/Location' format links to one which is, with the suggestion to use that one instead.
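As a rough check that the three replacements mentioned earlier agree with the original identifiers for the date used in the examples, the sketch below compares UTC offsets on 2020-01-01. It uses the standard-library `zoneinfo` and `datetime` modules (Python 3.9+, tz database assumed available), and the `pairs` list is just an illustrative mapping, not something pandas provides.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

naive = datetime(2020, 1, 1)

# (identifier from the examples, suggested 'Area/Location' identifier)
pairs = [
    (timezone(timedelta(hours=1)), ZoneInfo("Africa/Lagos")),  # '+01:00'
    (ZoneInfo("CET"), ZoneInfo("Europe/Paris")),
    (ZoneInfo("Cuba"), ZoneInfo("America/Havana")),
]

# True where both identifiers give the same UTC offset on this date.
same_offset = [
    naive.replace(tzinfo=original).utcoffset()
    == naive.replace(tzinfo=suggested).utcoffset()
    for original, suggested in pairs
]

print(same_offset)
```

Agreement on one date does not mean the zones agree on every date. For instance, a fixed '+01:00' offset never changes, whereas an 'Area/Location' zone follows whatever rules apply to that place.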
Would it be safe to restrict time zone identifiers to the 'Area/Location' format?
cc @mroeschke @jbrockmendel @pganssle @rebecca-palmer (sorry for the pings, would really value your input here if possible!)
This would go hand-in-hand with #50887. What we'd get to in the end would be:
Current behaviour (pandas 2.0.1):
In [11]: to_datetime(['2020-01-01 00:00+01:00'])
Out[11]: DatetimeIndex(['2020-01-01 00:00:00+01:00'], dtype='datetime64[ns, UTC+01:00]', freq=None)
In [12]: to_datetime(['2020-01-01 00:00+01:00']).tz_convert('+02:00')
Out[12]: DatetimeIndex(['2020-01-01 01:00:00+02:00'], dtype='datetime64[ns, UTC+02:00]', freq=None)
New behaviour (pandas 3.x):
In [11]: to_datetime(['2020-01-01 00:00+01:00'])
Out[11]: DatetimeIndex(['2019-12-31 23:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
In [12]: to_datetime(['2020-01-01 00:00+01:00']).tz_convert('+02:00')
UnknownTimeZoneError: 'Please use Area/Location time-zone-identifier, see https://en.wikipedia.org/wiki/List_of_tz_database_time_zones'
In [13]: to_datetime(['2020-01-01 00:00+01:00']).tz_convert('Europe/Athens')
Out[13]: DatetimeIndex(['2020-01-01 01:00:00+02:00'], dtype='datetime64[ns, Europe/Athens]', freq=None)
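For discussion, here is one way such a check could look. This is only a sketch: `is_area_location` and `AREAS` are hypothetical names, not existing pandas API, and the check relies on the standard-library `zoneinfo.available_timezones()` (Python 3.9+, tz database assumed available).

```python
from zoneinfo import available_timezones

# Top-level areas used by the tz database for 'Area/Location' identifiers.
AREAS = {
    "Africa", "America", "Antarctica", "Arctic", "Asia",
    "Atlantic", "Australia", "Europe", "Indian", "Pacific",
}

# Computed once, since building this set scans the tz database.
_KNOWN_ZONES = available_timezones()


def is_area_location(name: str) -> bool:
    """Return True if `name` looks like a known 'Area/Location' identifier.

    Hypothetical helper for illustration; it does not distinguish canonical
    names from deprecated links that also use this format.
    """
    area, sep, location = name.partition("/")
    return bool(sep) and bool(location) and area in AREAS and name in _KNOWN_ZONES


for candidate in ["Europe/Athens", "America/Havana", "CET", "Cuba", "+02:00"]:
    print(candidate, is_area_location(candidate))
```

A real implementation would also need to decide how to treat 'UTC' itself, which appears as the resulting dtype in the proposed behaviour above but is not in 'Area/Location' format.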