Non-ascii tag creates empty slug, exception django.urls.exceptions.NoReverseMatch #3721

candlerb · 2019-12-01T08:57:18Z

Environment

Python version: 3.5.2
NetBox version: 2.6.7

Steps to Reproduce

edit a device
add a tag consisting soley of non-ASCII characters, e.g. 台灣
save
go to Organization > Tags (/extras/tags/) in the GUI

Expected Behavior

Tag to be functional

Observed Behavior

I get an exception:

<class 'django.urls.exceptions.NoReverseMatch'>

Reverse for 'tag' with arguments '('',)' not found. 1 pattern(s) tried: ['extras\\/tags\\/(?P<slug>[-a-zA-Z0-9_]+)\\/$']

If I query the database, I find that a tag has been created with an empty slug (hence cannot match the regexp pattern which requires at least one character)

netbox=# select * from extras_tag;
 id | name | slug | color  | comments |  created   |         last_updated
----+------+------+--------+----------+------------+-------------------------------
  1 | temp | temp | 9e9e9e |          | 2019-10-12 | 2019-10-12 16:56:36.745793+00
  2 | 台灣 |      | 9e9e9e |          | 2019-11-18 | 2019-11-18 08:07:13.025115+00
(2 rows)

I also find I get this exception at the home page (/), but not at /dcim/devices/.

Cleanup

netbox=# delete from extras_taggeditem where tag_id in (select id from extras_tag where slug='');
DELETE 1
netbox=# delete from extras_tag where slug='';
DELETE 1

The exception persists for 15 minutes or until you invalidate the cacheops cache:

cd /opt/netbox/netbox
python3 manage.py nbshell
>>> import cacheops
>>> cacheops.invalidate_all()

Links

Reported previously as #2853, #3079, #3717 (insufficient detail to reproduce).

Reported on the google group here, here, here, here.

The text was updated successfully, but these errors were encountered:

DanSheps · 2019-12-02T14:57:48Z

This looks to be a simple missing dependancy of TagBase for "unidecode", as otherwise slugify() in TagBase just returns the "tag" without any cleanup.

Will ponder how to best deal with this.

kobayashi · 2019-12-05T05:35:13Z

+1 just adding unidecode as a required package. That one is simplest.
An another way is override slugify described in taggit docs though, that will lead lots of work or make it complicated just for setting slug from tag name like some comments here

candlerb · 2019-12-05T08:52:50Z

Firstly, I think there should be a validation constraint on the tags model and/or table to forbid empty slugs. That would at least prevent getting the database into a broken state.

Next, I note that this problem goes away if tags had to be created before using them (separate feature request #3703). In that case there would be an "Add Tag" page, with a field for the slug, and empty slugs could be rejected before saving them - just like "Add Device Role" currently.

Users would then be forced to create a non-empty slug which is meaningful to them.

(I've also checked what happens when you do a CSV-import of device roles: the "slug" column is mandatory. So there's no problem here either: Netbox does not auto-assign slugs from names)

Now, I have to say unidecode is quite impressive, although it's nearly 2MB of dependency:

>>> unidecode.unidecode("台灣")
'Tai Wan '

It could be exposed as an AJAX completion feature. However, once you've decided to create tags in a form, you'd probably be better off using a Javascript unidecode library in the browser instead.

jeremystretch · 2019-12-06T18:46:02Z

After digging into this, I see that there are two separate issues:

Tags can be created on-demand with a blank slug. This is not permitted by the model's field definition, but the manner in which tags are being created escapes this validation.
Unicode characters are effectively being stripped out by Tag's slugify() function assuming the unidecode package is not installed.

As an immediate workaround, I can confirm that installing unidecode will enable valid slugification:

$ pip install unidecode
... 
Installing collected packages: unidecode
Successfully installed unidecode-1.1.1
$ ./manage.py nbshell
i### NetBox interactive shell (jstretch-workstation)
### Python 3.6.8 | Django 2.2.6 | NetBox 2.6.8-dev
### lsmodels() will show available models. Use help(<model>) for more info.
>>> t=Tag()
>>> t.slugify('台灣')
'tai-wan'

(This works because django-taggit attempts to import unidecode and utilizes a no-op wrapper function if that fails.)

I'll work on preventing Tags from being created with blank slugs. But as far as the slugification piece goes, I can't speak to whether the behavior provided by unidecode is valuable.

candlerb · 2019-12-06T19:34:54Z

Clearly some slug is required, so a default of _1 would work (I believe there is existing logic which cycles _2, _3 etc to ensure uniqueness of slugs) Users could edit them afterwards to make them meaningful, if desired.

DanSheps · 2019-12-06T22:02:30Z

I think using the unidecode as a slugifier would be the better way to go, IMO, rather then going with a standard incremental index (which we already have on the pk and if we can semi-properly decode unicode, why not?)

I will do some more testing with it, but I think it will be a "good enough" effort on our part to close it out.

It will not handle Japanese, however that is just the nature of the language (uses the same characters as Chinese)

### Python 3.6.8 | Django 2.2.4 | NetBox 2.6.7
### lsmodels() will show available models. Use help(<model>) for more info.
>>>
>>> t=Tag()
>>> t.slugify("元気")
'yuan-qi'
>>> t.slugify("人")
'ren'
>>> t.slugify("一")
'yi'
>>> t.slugify("四")
'si'
>>> t.slugify("市")
'shi'
>>> t.slugify("左")
'zuo'
>>> t.slugify("右")
'you'
>>> t.slugify("川")
'chuan'

Again though, I think this is a good enough approach for something like this.

candlerb · 2019-12-06T22:34:24Z

And to be clear, is #3073 likely to be rejected? (Since if #3073 is accepted, this requirement goes away)

DanSheps · 2019-12-07T02:02:40Z

Did you mean a different issue? That one is already closed.

DanSheps · 2019-12-07T02:05:23Z

I think you meant #3703

I can't speculate myself, however #3703 would not necessarily negate the need for proper Unicode handling in the Tags so this requirement would likely still be present.

candlerb · 2019-12-07T07:50:50Z

Sorry, yes #3703. It would mean that tags would be handled in exactly the same way as device roles, platforms, device types, manufacturers etc: the user would be required to enter a non-empty slug (of their choice) at time of creation of the tag.

jeremystretch · 2019-12-09T20:06:37Z

To be clear, #3703 is a feature request separate from the issue being addressed here. This needs be resolved regardless of the response to #3703.

It seems we have two options:

Require the installation of unidecode as discussed above, which will return a phonetic (?) representation of the Unicode string.
Allow Unicode characters in the tag slug. Django's slugify function has an option to permit this:

    if allow_unicode:
        value = unicodedata.normalize('NFKC', value)
    else:
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')

An example:

>>> slugify('What does 台灣 mean?', allow_unicode=True)
'what-does-台灣-mean'

I don't have a strong opinion either way.

candlerb · 2019-12-09T20:23:20Z

To be clear, #3703 is a feature request separate from the issue being addressed here

Yes...

This needs be resolved regardless of the response to #3703.

...I disagree, because if #3703 is implemented, then it solves this issue at the same time.

Consider for example Device Roles. You can already create a Device Role whose full name consists entirely of high unicode code points. But there is no need to apply unidecode on them to synthesise a slug; you simply cannot save a Device Role with an empty slug. This forces the user to assign a non-empty, ASCII-compatible slug themselves.

#3703 would make tags handled the same. Tags would be created explicitly before use; tags could not spring into existence any other way.

jeremystretch · 2019-12-09T20:44:17Z

This forces the user to assign a non-empty, ASCII-compatible slug themselves.

Right, this is a problem. Slugs should be generated automatically regardless. (This is not limited to tags.)

candlerb · 2019-12-09T21:16:33Z

Sure, but that's a separate enhancement which ought to have a new ticket. Here you go: #3741

Regarding unicode slugs: they need special encoding for use in URLs. It can be done - but since the main point of slugs is to be embedded in URLs, that's something users will need to be aware of.

DanSheps · 2019-12-11T14:07:24Z

Another option would be to use urllib.quote() however the slug this would generate would not be user friendly.

DanSheps · 2020-01-07T16:31:26Z

>>> t=Tag()
>>> t.slugify("
>>> t.slugify("つす")
'tsusu'
>>> t.slugify("一つす")
'yi-tsusu'

It looks like hiragana works, but intermixing with kanji you get partial chinese and partial japanese. Any idea's @kobayashi, have you dealt with unicode in your work at all?

DanSheps · 2020-01-07T16:58:22Z

From PyPI:

On the other hand transliteration (i.e., conveying, in Roman letters, the pronunciation expressed by the text in some other writing system) of languages like Chinese, Japanese or Korean is a very complex issue and this library does not even attempt to address it. It draws the line at context-free character-by-character mapping. So a good rule of thumb is that the further the script you are transliterating is from Latin alphabet, the worse the transliteration will be.

So it looks like unidecode will only transliterate the standard Chinese meaning. I would like to hear from people with experience in other languages, and see if there is a reasonable approximation for most other languages.

kobayashi · 2020-01-08T04:21:26Z

Slugify with allow_unicode=True is the better way If we want to keep the meaning of tags. unidecode handle only Chinese / partial Japanese? meaning.

Here is the sample of Japanese. Modern browsers can be resolve the escapaed URL characters.

>>> Tag.objects.get(pk=1)
<Tag: こんにちは>
>>> Tag.objects.get(pk=1).get_absolute_url()
'/extras/tags/%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF/'
>>>

If there is no objections, I would like to implement it.

DanSheps added status: accepted This issue has been accepted for implementation type: bug A confirmed report of unexpected behavior in the application labels Dec 2, 2019

jeremystretch closed this as completed in 606f3da Jan 17, 2020

This was referenced Jan 21, 2020

Release v2.7.2 #3980

Merged

Non-ascii tag on site breaks tag listing and home page #4028

Closed

jeremystretch mentioned this issue Jan 28, 2020

Auto-creation of slugs for non-ASCII users #3741

Closed

lock bot locked as resolved and limited conversation to collaborators Apr 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-ascii tag creates empty slug, exception django.urls.exceptions.NoReverseMatch #3721

Non-ascii tag creates empty slug, exception django.urls.exceptions.NoReverseMatch #3721

candlerb commented Dec 1, 2019 •

edited

Loading

DanSheps commented Dec 2, 2019 •

edited

Loading

kobayashi commented Dec 5, 2019

candlerb commented Dec 5, 2019 •

edited

Loading

jeremystretch commented Dec 6, 2019

candlerb commented Dec 6, 2019

DanSheps commented Dec 6, 2019 •

edited

Loading

candlerb commented Dec 6, 2019

DanSheps commented Dec 7, 2019

DanSheps commented Dec 7, 2019

candlerb commented Dec 7, 2019 •

edited

Loading

jeremystretch commented Dec 9, 2019 •

edited

Loading

candlerb commented Dec 9, 2019

jeremystretch commented Dec 9, 2019

candlerb commented Dec 9, 2019

DanSheps commented Dec 11, 2019

DanSheps commented Jan 7, 2020

DanSheps commented Jan 7, 2020

kobayashi commented Jan 8, 2020

Non-ascii tag creates empty slug, exception django.urls.exceptions.NoReverseMatch #3721

Non-ascii tag creates empty slug, exception django.urls.exceptions.NoReverseMatch #3721

Comments

candlerb commented Dec 1, 2019 • edited Loading

Environment

Steps to Reproduce

Expected Behavior

Observed Behavior

Cleanup

Links

DanSheps commented Dec 2, 2019 • edited Loading

kobayashi commented Dec 5, 2019

candlerb commented Dec 5, 2019 • edited Loading

jeremystretch commented Dec 6, 2019

candlerb commented Dec 6, 2019

DanSheps commented Dec 6, 2019 • edited Loading

candlerb commented Dec 6, 2019

DanSheps commented Dec 7, 2019

DanSheps commented Dec 7, 2019

candlerb commented Dec 7, 2019 • edited Loading

jeremystretch commented Dec 9, 2019 • edited Loading

candlerb commented Dec 9, 2019

jeremystretch commented Dec 9, 2019

candlerb commented Dec 9, 2019

DanSheps commented Dec 11, 2019

DanSheps commented Jan 7, 2020

DanSheps commented Jan 7, 2020

kobayashi commented Jan 8, 2020

candlerb commented Dec 1, 2019 •

edited

Loading

DanSheps commented Dec 2, 2019 •

edited

Loading

candlerb commented Dec 5, 2019 •

edited

Loading

DanSheps commented Dec 6, 2019 •

edited

Loading

candlerb commented Dec 7, 2019 •

edited

Loading

jeremystretch commented Dec 9, 2019 •

edited

Loading