
mxguardian/elasticsearch-analysis-email

 
 


Elasticsearch Email Tokenizer

This plugin adds an email address tokenizer (type "email") to Elasticsearch.


Compatibility

Elasticsearch version    Plugin version
2.4.6                    2.4.6
2.4.5                    2.4.5
2.4.4                    2.4.4
2.4.3                    2.4.3
2.4.1                    2.4.1
2.4.0                    2.4.0
2.3.5                    2.3.5
2.3.4                    2.3.4
2.3.3                    2.3.3
2.3.0                    2.3.0
2.2.2                    2.2.2
2.2.1                    2.2.1
2.2.0                    2.2.0
2.1.1                    2.1.1
2.0.0                    2.0.0
1.6.x, 1.7.x             1.0.0

Installation

bin/plugin install https://github.com/jlinn/elasticsearch-analysis-email/releases/download/v2.4.6/elasticsearch-analysis-email-2.4.6.zip

Usage

Options:

  • part: Defaults to null, in which case every part of the email address is tokenized. Set it to whole, localpart, or domain to tokenize only that part.
  • tokenize_domain: Defaults to true. If true, the domain is further tokenized by a reverse path hierarchy tokenizer with "." as the delimiter.
  • split_on_plus: Defaults to true. If true, the localpart is split at the first "+", and both the text preceding the "+" and the whole localpart are emitted as tokens.
  • split_localpart: Defaults to null. Expects an array of strings; if provided, the localpart is split on each of the given strings.
  • allow_malformed: Defaults to false. If true, malformed email addresses are not rejected but are indexed without tokenization.
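The localpart options can be approximated with a short Java sketch (Java 9 or later). It is not the plugin's code: the class LocalpartTokensSketch and its localpartTokens helper are hypothetical, and the sketch assumes that the whole localpart is always kept as a token and that split_localpart splits on any of the given strings at once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LocalpartTokensSketch {

    // Hypothetical helper approximating the documented split_on_plus and
    // split_localpart options; it is NOT the plugin's own implementation.
    static List<String> localpartTokens(String localpart, boolean splitOnPlus, List<String> splitLocalpart) {
        // LinkedHashSet keeps first-seen order and drops duplicate tokens.
        Set<String> tokens = new LinkedHashSet<>();
        // Assumption: the whole localpart is always emitted as a token.
        tokens.add(localpart);

        if (splitOnPlus) {
            int plus = localpart.indexOf('+');
            if (plus > 0) {
                // Text preceding the first '+', e.g. "foo" from "foo+bar".
                tokens.add(localpart.substring(0, plus));
            }
        }

        if (splitLocalpart != null && !splitLocalpart.isEmpty()) {
            // Split on any of the given strings; each is quoted so "." is literal, not a regex wildcard.
            String regex = splitLocalpart.stream()
                    .map(Pattern::quote)
                    .collect(Collectors.joining("|"));
            for (String piece : localpart.split(regex)) {
                if (!piece.isEmpty()) {
                    tokens.add(piece);
                }
            }
        }
        return new ArrayList<>(tokens);
    }

    public static void main(String[] args) {
        System.out.println(localpartTokens("foo+bar", true, null));
        System.out.println(localpartTokens("john.smith", false, List.of(".")));
    }
}
```

Run as-is, it prints [foo+bar, foo] and then [john.smith, john, smith]. The LinkedHashSet keeps a piece that equals an existing token (for example, a localpart containing none of the split_localpart strings) from being emitted twice.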

Example:

Index settings:

{
	"settings": {
		"analysis": {
			"tokenizer": {
				"email_domain": {
					"type": "email",
					"part": "domain"
				}
			},
			"analyzer": {
				"email_domain": {
					"tokenizer": "email_domain"
				}
			}
		}
	}
}
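These settings are supplied when the index is created. As a sketch, assuming Java 11 or later (java.net.http.HttpClient) and a node on localhost:9200 as in the analysis request below, the same index could be created from Java. The class CreateEmailIndex and its createIndexRequest method are hypothetical names; the request itself is a plain PUT to the index URL with the settings as the body.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateEmailIndex {

    // Same settings as the example: an "email" tokenizer limited to the domain part,
    // wrapped in a custom analyzer named "email_domain".
    static final String SETTINGS =
            "{\"settings\": {\"analysis\": {"
          + "\"tokenizer\": {\"email_domain\": {\"type\": \"email\", \"part\": \"domain\"}},"
          + "\"analyzer\": {\"email_domain\": {\"tokenizer\": \"email_domain\"}}"
          + "}}}";

    // Builds (but does not send) the request that creates the index with these settings.
    static HttpRequest createIndexRequest(String baseUrl, String indexName) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + indexName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(SETTINGS))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumes a running node with the plugin installed.
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                createIndexRequest("http://localhost:9200", "index_name"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

createIndexRequest only builds the request, so it can be inspected without a running node; main sends it and prints the status code and response body.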

Perform an analysis request:

curl 'http://localhost:9200/index_name/_analyze?analyzer=email_domain&pretty' -d 'foo+bar@email.com'

{
  "tokens" : [ {
    "token" : "email.com",
    "start_offset" : 8,
    "end_offset" : 17,
    "type" : "domain",
    "position" : 1
  }, {
    "token" : "com",
    "start_offset" : 14,
    "end_offset" : 17,
    "type" : "domain",
    "position" : 2
  } ]
}
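The offsets in this response follow from where the domain sits in the input: "email.com" spans characters 8 to 17 of "foo+bar@email.com", and "com" starts at 14. The Java sketch below reproduces those tokens and offsets, assuming tokenize_domain behaves like a reverse path hierarchy on "." (full domain first, then each shorter suffix). It is illustrative only: DomainTokensSketch and domainTokens are hypothetical names, and the sketch does not model the token type or position fields.

```java
import java.util.ArrayList;
import java.util.List;

public class DomainTokensSketch {

    // A token plus its character offsets (start inclusive, end exclusive), as in the _analyze output.
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + " [" + start + ", " + end + ")";
        }
    }

    // Hypothetical helper: approximates tokenize_domain as a reverse path hierarchy
    // on "." (full domain first, then each shorter suffix). Not the plugin's own code.
    static List<Token> domainTokens(String email) {
        int at = email.lastIndexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("no '@' in: " + email);
        }
        int end = email.length();
        int start = at + 1; // the domain begins right after '@'
        List<Token> tokens = new ArrayList<>();
        while (true) {
            tokens.add(new Token(email.substring(start, end), start, end));
            int dot = email.indexOf('.', start);
            if (dot < 0 || dot + 1 >= end) {
                break; // no further suffix to emit
            }
            start = dot + 1; // drop the leftmost label
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : domainTokens("foo+bar@email.com")) {
            System.out.println(t);
        }
    }
}
```

Running main prints "email.com [8, 17)" and "com [14, 17)", matching the start_offset and end_offset values above. For a longer domain such as mail.example.co.uk, the same loop yields mail.example.co.uk, example.co.uk, co.uk, and uk.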

About

An email address tokenizer plugin for Elasticsearch


Languages

  • Java 100.0%