Alistair King edited this page Jan 29, 2020 · 4 revisions

The high-level goal of libIPMeta is to provide a highly efficient mechanism for associating metadata with IP addresses. It was originally developed in the context of the UCSD Network Telescope project for doing IP geolocation and IP-ASN lookups for every packet captured.

To do this, the library is built around two basic kinds of module (also called "classes" or "plugins"): datastores and providers.

A datastore is a data structure that maps IP addresses (and prefixes) to some metadata. The library supports multiple datastores because different applications have different performance requirements. For example, the patricia trie is relatively memory efficient, but its lookups are relatively slow compared with the "big array" data structure. Abstracting datastores behind a common interface allows new data structures to be added so that existing applications can use them without any code changes.
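The trade-off above can be illustrated with a minimal Python sketch. Everything here is hypothetical and is not libIPMeta's C API: the `Datastore` interface, the `TrieDatastore` (an uncompressed binary trie rather than a true patricia trie), and the `ArrayDatastore` (a toy "big array" with one slot per /16, where a real one would need far more slots). Both are IPv4-only and are driven through the same two methods.

```python
import ipaddress
from abc import ABC, abstractmethod


class Datastore(ABC):
    """Hypothetical common interface; not libIPMeta's actual API."""

    @abstractmethod
    def add_prefix(self, prefix, record): ...

    @abstractmethod
    def lookup(self, addr): ...


class TrieDatastore(Datastore):
    """Uncompressed binary trie: compact, but walks up to 32 nodes per lookup."""

    def __init__(self):
        self.root = [None, None, None]  # [child for bit 0, child for bit 1, record]

    def add_prefix(self, prefix, record):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = record

    def lookup(self, addr):
        bits = int(ipaddress.ip_address(addr))
        node, best = self.root, self.root[2]
        for i in range(32):
            node = node[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]  # remember the longest match seen so far
        return best


class ArrayDatastore(Datastore):
    """Toy "big array": one slot per /16, so a lookup is a single index."""

    def __init__(self, granularity=16):
        self.granularity = granularity
        self.shift = 32 - granularity
        self.slots = [None] * (1 << granularity)  # each slot: (prefixlen, record)

    def add_prefix(self, prefix, record):
        net = ipaddress.ip_network(prefix)
        if net.prefixlen > self.granularity:
            raise ValueError("prefix is longer than the array granularity")
        first = int(net.network_address) >> self.shift
        count = 1 << (self.granularity - net.prefixlen)
        for i in range(first, first + count):
            current = self.slots[i]
            # Only overwrite a slot with an equally or more specific prefix.
            if current is None or current[0] <= net.prefixlen:
                self.slots[i] = (net.prefixlen, record)

    def lookup(self, addr):
        slot = self.slots[int(ipaddress.ip_address(addr)) >> self.shift]
        return None if slot is None else slot[1]


# The calling code is identical whichever datastore is chosen.
stores = [TrieDatastore(), ArrayDatastore()]
for store in stores:
    store.add_prefix("10.0.0.0/8", {"label": "coarse"})
    store.add_prefix("10.1.0.0/16", {"label": "specific"})
results = [store.lookup("10.1.2.3") for store in stores]
```

The array trades memory (every covered slot is filled up front) for a constant-time lookup, while the trie only allocates nodes along the prefixes it stores.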

A provider represents the code needed to load a database that maps IP addresses (normally expressed as IP ranges or prefixes) to some metadata. The three main providers are netacq-edge, maxmind, and pfx2as. Typically a provider loads its data from a file (or files) and uses the datastore API to associate IPs/prefixes with metadata.
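As a rough illustration of what a provider does, the sketch below parses a small prefix table and hands each row to a datastore callback. The function name `load_prefix_table`, the callback signature, and the tab-separated layout (network, prefix length, origin ASN) are assumptions made for this example; the parsers in libIPMeta's real providers may differ.

```python
import io


def load_prefix_table(lines, add_prefix):
    """Hypothetical provider: feed each (prefix, record) row to a datastore.

    Assumes tab-separated rows of network, prefix length, and origin ASN.
    """
    loaded = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        network, length, asn = line.split("\t")
        # The ASN stays a string so nothing is assumed about its syntax.
        add_prefix(f"{network}/{length}", {"asn": asn})
        loaded += 1
    return loaded


# A plain dict stands in for the datastore API in this sketch.
sample = io.StringIO("192.0.2.0\t24\t64496\n\n198.51.100.0\t24\t64497\n")
table = {}
count = load_prefix_table(sample, table.__setitem__)
```

Because the provider only talks to the callback, the same loader works with any datastore that accepts a prefix and a record.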

A user creates a top-level ipmeta instance by specifying the datastore and provider(s) to use. Each datastore and provider type takes a set of configuration arguments; in the case of providers, these specify which instance of the database to load. Once this top-level instance has been created, the user can simply call the lookup methods to query for a given IP address or prefix, and the library takes care of querying the appropriate data structure.
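The workflow described here can be sketched as a small facade. `ToyIpMeta` and `LinearDatastore` are hypothetical names invented for this example, and provider configuration is reduced to an in-memory list of rows; this is not the signature of the real library.

```python
import ipaddress


class LinearDatastore:
    """Stand-in datastore: linear scan with longest-prefix match per provider."""

    def __init__(self):
        self.entries = []  # (network, provider name, record)

    def add_prefix(self, prefix, provider, record):
        self.entries.append((ipaddress.ip_network(prefix), provider, record))

    def lookup(self, addr):
        ip = ipaddress.ip_address(addr)
        best = {}  # provider name -> (prefixlen, record)
        for net, provider, record in self.entries:
            if ip in net and net.prefixlen >= best.get(provider, (-1, None))[0]:
                best[provider] = (net.prefixlen, record)
        return {provider: rec for provider, (_, rec) in best.items()}


class ToyIpMeta:
    """Hypothetical top-level instance: one datastore plus its providers."""

    def __init__(self, datastore, providers):
        self.datastore = datastore
        # providers: {name: rows}; the rows stand in for "which database to load".
        for name, rows in providers.items():
            for prefix, record in rows:
                datastore.add_prefix(prefix, name, record)

    def lookup(self, addr):
        # The caller never touches the underlying data structure directly.
        return self.datastore.lookup(addr)


meta = ToyIpMeta(
    LinearDatastore(),
    {"asn": [("192.0.0.0/16", {"asn": "64500"}), ("192.0.2.0/24", {"asn": "64496"})]},
)
answer = meta.lookup("192.0.2.9")
```

Swapping `LinearDatastore` for a faster structure would not change the two lines of calling code at the bottom.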

Because our current providers all load data from flat files, it is the user's responsibility to select the appropriate version (time) of the database that they want to load. In the past we have written wrapper scripts that are able to select the best file(s) for a given timestamp, and the Python bindings have some code that helps with this. Eventually it would be nice to have a metadata service that is able to provide this high-level configuration automatically.
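A wrapper of the kind mentioned above might look like the following sketch. It assumes that the "best" file is the most recent snapshot taken at or before the requested time; that rule, the function name `select_snapshot`, and the file names are all illustrative assumptions rather than the behaviour of the existing scripts or Python bindings.

```python
import bisect
from datetime import datetime, timezone


def select_snapshot(snapshots, when):
    """Return the path of the latest snapshot taken at or before `when`.

    `snapshots` is a list of (datetime, path) pairs sorted by datetime.
    Returns None if every snapshot is newer than `when`.
    """
    times = [taken for taken, _ in snapshots]
    index = bisect.bisect_right(times, when)
    return snapshots[index - 1][1] if index else None


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical file names, for illustration only.
snapshots = [
    (utc(2020, 1, 1), "db-20200101.gz"),
    (utc(2020, 2, 1), "db-20200201.gz"),
    (utc(2020, 3, 1), "db-20200301.gz"),
]
chosen = select_snapshot(snapshots, utc(2020, 2, 15))
```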

Note: The main difference between v1 and v2 is that in v1 each provider had its own datastore, whereas in v2 there is one shared datastore. This change was made so that a single lookup call can return data for multiple providers, drastically improving performance for users who need to look up metadata from multiple provider databases.
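The difference between the two layouts can be shown with a toy model. It uses plain dicts keyed by /24 instead of real longest-prefix matching, and the provider names and records are invented; the only point is to count how many datastore searches each layout needs for one address.

```python
import ipaddress


def key24(addr):
    """Toy lookup key: the /24 containing addr (real stores do longest-prefix match)."""
    return str(ipaddress.ip_network(f"{addr}/24", strict=False))


# v1-style layout: one store per provider, so one search per provider.
per_provider = {
    "geo": {"192.0.2.0/24": {"country": "ZZ"}},
    "asn": {"192.0.2.0/24": {"asn": "64496"}},
}


def lookup_v1(addr):
    searches, found = 0, {}
    for name, store in per_provider.items():
        searches += 1
        record = store.get(key24(addr))
        if record is not None:
            found[name] = record
    return found, searches


# v2-style layout: one shared store whose entries hold a record per provider.
shared = {
    "192.0.2.0/24": {"geo": {"country": "ZZ"}, "asn": {"asn": "64496"}},
}


def lookup_v2(addr):
    # A single search returns the records of every provider at once.
    return dict(shared.get(key24(addr), {})), 1


v1_result, v1_searches = lookup_v1("192.0.2.77")
v2_result, v2_searches = lookup_v2("192.0.2.77")
```

Both layouts return the same records, but the per-provider layout performs one search for each provider while the shared layout performs one search in total.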
