Creates a fingerprint for the given entity name. It supports special characters, emojis (because we all know that emojis in company names are coming), and entity types in non-Latin scripts.
iex(1)> EntityFingerprint.create("ФИЛИАЛ КОМПАНИИ С ОГРАНИЧЕННОЙ")
{:ok,
[
fingerprint: "filial kompanii ogranichennoy s",
original: "ФИЛИАЛ КОМПАНИИ С ОГРАНИЧЕННОЙ",
script: "cyrillic"
]}
iex(2)> EntityFingerprint.create("ООО КУРЬЕР-РЕГИОН СТОЛИЦА")
{:ok,
[
fingerprint: "kurerregion ooo stolitsa",
original: "ООО КУРЬЕР-РЕГИОН СТОЛИЦА",
script: "cyrillic"
]}
iex(3)> EntityFingerprint.create("Google Limited Liability Company")
{:ok,
[
fingerprint: "google llc",
original: "Google Limited Liability Company",
script: "latin"
]}
iex(4)> EntityFingerprint.create("현대해상화재보험")
{:ok,
[
fingerprint: "hyeondaehaesanghwajaeboheom",
original: "현대해상화재보험",
script: "hangul"
]}
iex(5)> EntityFingerprint.create(" 💩 Limited Liability Company")
{:ok,
[
fingerprint: "llc poop",
original: " 💩 Limited Liability Company",
script: "common"
]}
iex(6)> EntityFingerprint.create("佐贤鸣智(上海)企业管理咨询有限公司")
{:ok,
[
fingerprint: "guanlizixun shanghai zuoxianmingzhi",
original: "佐贤鸣智(上海)企业管理咨询有限公司",
script: "han"
]}
iex(7)> EntityFingerprint.create("Siemens Aktiengesellschaft")
{:ok,
[
fingerprint: "ag siemens",
original: "Siemens Aktiengesellschaft",
script: "latin"
]}
iex(8)> EntityFingerprint.create("New York, New York")
{:ok,
[fingerprint: "new york", original: "New York, New York", script: "latin"]}
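The outputs above suggest a normalization pipeline: lowercase the name, drop punctuation, shorten known legal forms, then deduplicate and sort the tokens. The following is a minimal sketch of that idea, not the library's actual implementation. The module name `FingerprintSketch` and its two-entry legal-form table are hypothetical, and transliteration of non-Latin scripts is left out entirely.

```elixir
defmodule FingerprintSketch do
  # Hypothetical, deliberately tiny table of legal forms and their short codes.
  # This is an assumption for illustration, not the library's data.
  @legal_forms [
    {"limited liability company", "llc"},
    {"aktiengesellschaft", "ag"}
  ]

  # Returns a normalized, order-independent key for a Latin-script name.
  def create(name) when is_binary(name) do
    name
    |> String.downcase()
    # Keep only letters, digits and whitespace.
    |> String.replace(~r/[^\p{L}\p{N}\s]/u, "")
    |> replace_legal_forms()
    |> String.split()
    # Dropping repeated tokens and sorting makes word order irrelevant.
    |> Enum.uniq()
    |> Enum.sort()
    |> Enum.join(" ")
  end

  # Naive substring replacement; a real implementation would match whole tokens.
  defp replace_legal_forms(text) do
    Enum.reduce(@legal_forms, text, fn {long, short}, acc ->
      String.replace(acc, long, short)
    end)
  end
end
```

With this sketch, `FingerprintSketch.create("New York, New York")` returns `"new york"` and `FingerprintSketch.create("Siemens Aktiengesellschaft")` returns `"ag siemens"`, mirroring the Latin-script examples above.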
This library was heavily inspired by the Python tool alephdata/fingerprints.
- A Google Spreadsheet created by OCCRP.
- The ISO 20275: Entity Legal Forms Code List
- Wikipedia also maintains an index of types of business entity.
- Clustering in Depth, part of the OpenRefine documentation discussing how to create collisions in data clustering.
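
To illustrate how fingerprint collisions can drive clustering, here is a rough sketch that groups names sharing the same fingerprint. The module `ClusterSketch` and its `cluster/2` function are hypothetical helpers, not part of the library. The only thing assumed about the fingerprint function is the `{:ok, [fingerprint: ..., ...]}` return shape shown in the examples above.

```elixir
defmodule ClusterSketch do
  # `fingerprint_fun` is any 1-arity function returning
  # {:ok, keyword_list} where the keyword list has a :fingerprint key.
  def cluster(names, fingerprint_fun) do
    names
    |> Enum.group_by(fn name ->
      {:ok, result} = fingerprint_fun.(name)
      Keyword.fetch!(result, :fingerprint)
    end)
    # Keep only real collisions: fingerprints shared by two or more names.
    |> Enum.filter(fn {_fingerprint, group} -> length(group) > 1 end)
    |> Map.new()
  end
end
```

Assuming the library is available as a dependency, this could be called as `ClusterSketch.cluster(names, &EntityFingerprint.create/1)`. The result maps each shared fingerprint to the original names that collided on it.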