Lexicons

Avocado defines an abstract class named Lexicon. It is a common practice when normalizing a data model to break out repeated finite sets of terms within a column into their own table. This is quite obvious for entities such as books and authors, but less so for commonly used or enumerable terms.

id | name | birth_month
---+------+------------
 1   Sue    May
 2   Joe    Jun
 3   Bo     Jan
 4   Jane   Apr
...

The above shows a table with three columns id, name and birth_month. There are some inherent issues with birth_month:

- Months have a calendar order that differs from alphabetical order, which makes it difficult to order the rows by birth_month, since strings are ordered lexicographically by default
- As the table grows (think millions of rows), the few bytes of disk space each repeated string takes up start to have a significant impact
- The cost of querying for the distinct months within the population becomes increasingly expensive as the table grows
- As the table grows, the cost of table scans increases, since queries act on strings rather than integers (e.g. a foreign key)

Although the above example is somewhat contrived, the reasons behind this type of normalization are apparent.
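
The ordering issue is easy to reproduce in plain Python. The following is a minimal sketch using the four sample rows from the table above; the `CALENDAR_ORDER` mapping is a hypothetical stand-in for the explicit order a lexicon would store, and is not part of Avocado.

```python
# Sample rows from the table above: (id, name, birth_month)
rows = [(1, "Sue", "May"), (2, "Joe", "Jun"), (3, "Bo", "Jan"), (4, "Jane", "Apr")]

# Sorting on the raw string orders the months alphabetically (Apr, Jan, Jun, May).
by_string = [name for _, name, _ in sorted(rows, key=lambda r: r[2])]

# Hypothetical explicit sort index per month, covering only the sample data.
CALENDAR_ORDER = {"Jan": 1, "Apr": 4, "May": 5, "Jun": 6}

# Sorting on the explicit index gives calendar order (Jan, Apr, May, Jun).
by_calendar = [name for _, name, _ in sorted(rows, key=lambda r: CALENDAR_ORDER[r[2]])]

print(by_string)
print(by_calendar)
```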

To implement a lexicon, subclass Lexicon and define the value and label fields.

from django.db import models
from avocado.lexicon.models import Lexicon

class Month(Lexicon):
    label = models.CharField(max_length=20)
    value = models.CharField(max_length=20)

A few of the advantages include:

- Define an arbitrary order of the items in the lexicon
- Define an integer code, which is useful for downstream clients that prefer working with an enumerable set of values, such as SAS or R
- Define a verbose, more readable label for each item
  - For example, map Jan to January
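
To make these advantages concrete, here is a rough sketch in plain Python rather than Django. The `MonthItem` dataclass and the sample data are hypothetical stand-ins for lexicon rows, not Avocado's model; the `order`, `code`, `label` and `value` names follow the fields described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthItem:
    """Hypothetical stand-in for one row of a Month lexicon."""
    id: int
    value: str   # raw term as it appeared in the original column
    label: str   # verbose, more readable form
    code: int    # integer code for clients such as SAS or R
    order: int   # explicit sort index


# Lexicon rows keyed by primary key.
months = {
    1: MonthItem(1, "Jan", "January", 1, 0),
    2: MonthItem(2, "Apr", "April", 4, 1),
    3: MonthItem(3, "May", "May", 5, 2),
    4: MonthItem(4, "Jun", "June", 6, 3),
}

# People now reference a month by integer key instead of repeating a string.
people = [(1, "Sue", 3), (2, "Joe", 4), (3, "Bo", 1), (4, "Jane", 2)]

# Order by the lexicon's explicit order, then report the label and code.
ordered = sorted(people, key=lambda p: months[p[2]].order)
report = [(name, months[mid].label, months[mid].code) for _, name, mid in ordered]
print(report)
```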

In addition, Avocado treats Lexicon subclasses specially since it is such a common practice to use them. They are used in the following ways:

- Performing an init will create a DataField instance for the primary key of the Lexicon
- The order field will be used whenever appropriate for ordering the lexicon items
- The label field will be used when accessing f.labels() and for free-text searches using f.search()
- The code field will be used when accessing f.codes()

Manager Methods

`reorder`

The Lexicon class also comes with an extra method on its manager called reorder, which reorders the items in the lexicon and updates the order value of each item with its new sort index. This is generally only necessary if items are added to the set and the ordering needs to be updated. The method takes the same arguments as list.sort(), but key can also be a string corresponding to a built-in key function.

>>> SomeLexicon.objects.reorder(key='coerce_float')

Performance Note: The entire lexicon is loaded into memory, sorted, and each item is saved. This should rarely ever be an issue, assuming the lexicon is not millions of items in size.
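
The load, sort and save steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not Avocado's implementation: `Item` and the standalone `reorder` function are hypothetical, the `key` and `reverse` arguments mirror Python 3's `list.sort()`, and the save step is reduced to an in-memory assignment.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a lexicon item."""
    value: str
    order: int = 0


def reorder(items, key=None, reverse=False):
    """Sort all items in memory and store each one's new index in `order`."""
    ordered = sorted(items, key=key, reverse=reverse)
    for index, item in enumerate(ordered):
        item.order = index  # a real manager would save the item here
    return ordered


items = [Item("b"), Item("c"), Item("a")]
result = reorder(items, key=lambda item: item.value)
print([(item.value, item.order) for item in result])
```

Because every item is touched once, the cost grows with the size of the lexicon, which matches the performance note above.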

Built-in Key Functions

coerce_float
- This relies on the value field of each object and attempts to coerce it to a float (in case numbers are represented as strings). It falls back to the original value if a TypeError or ValueError is raised.
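
A key function with the behavior described above might look like the following. This is a minimal sketch, not the library's source: it operates on raw values rather than on lexicon objects, and the sample list is made up for illustration.

```python
def coerce_float(value):
    """Return value as a float if possible, otherwise return it unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


values = ["10", "9", "2.5"]

# Plain string sorting compares character by character.
as_strings = sorted(values)

# Sorting with the key function compares the numeric values instead.
as_numbers = sorted(values, key=coerce_float)

print(as_strings)
print(as_numbers)
```

Note that in Python 3 a list whose keys end up as a mix of floats and strings cannot be sorted, since comparing the two types raises a TypeError.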
