Backend ‐ Plugin Metadata
This page details how we fetch metadata for plugins from various data sources. After aggregating all the metadata we need for plugins in the backend codebase, we display this information on the plugin detail page, e.g. https://www.napari-hub.org/plugins/{plugin}.
See below for more details on the types of metadata, the specific fields we pull, and how often the fetch runs.
Check out the napari hub tech diagram for high-level architecture of our system: https://lucid.app/lucidchart/d32995a2-42d6-4ccd-84fc-9c5a097304de/view
pypi.py contains the logic to get metadata through the PyPI API for a given plugin and version. For more information about PyPI metadata, check out https://pypi.org/.
# Below are fields returned by PyPI for each plugin and version
name
summary
description
description_content_type
authors
license
python_version
operating_system
release_date
version
first_released
development_status
requirements
project_site
documentation
support
report_issues
twitter
code_repository
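As a rough illustration of the PyPI step, the sketch below builds the PyPI JSON API URL for a plugin version and maps a few keys of the response onto the field names listed above. The function names and the exact key-to-field mapping are assumptions made for this example; they are not taken from pypi.py.

```python
from typing import Any

# Public PyPI JSON API endpoint for a specific release.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/{version}/json"


def pypi_json_url(name: str, version: str) -> str:
    """Return the JSON API URL for one plugin version."""
    return PYPI_JSON_URL.format(name=name, version=version)


def extract_pypi_fields(payload: dict[str, Any]) -> dict[str, Any]:
    """Map part of a PyPI JSON response onto hub field names.

    The mapping is an assumption for illustration only.
    """
    info = payload.get("info") or {}
    return {
        "name": info.get("name"),
        "summary": info.get("summary"),
        "description": info.get("description"),
        "description_content_type": info.get("description_content_type"),
        "license": info.get("license"),
        "python_version": info.get("requires_python"),
        # requires_dist can be null in the API response.
        "requirements": info.get("requires_dist") or [],
        "version": info.get("version"),
    }
```

The payload would typically come from an HTTP GET on the URL returned by pypi_json_url; the parsing is kept separate so it can be exercised without network access.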
github.py contains the logic to get metadata through the GitHub API.
The following fields are included in the GitHub metadata:
- citation
- license
- authors, visibility, conda, and category are fetched from .napari-hub/config.yml.
- description is fetched from .napari-hub/DESCRIPTION.md.
- project_urls will be deprecated soon.
We only look in .napari-hub if we do not already have the metadata from PyPI; we additionally check .napari/config.yml and .napari/DESCRIPTION.md. The .napari directory is also going to be deprecated.
Since the release of npe2 (napari's new plugin engine), a plugin manifest file distributed with each plugin provides rich metadata about the functionality of the plugin, including what types of contributions it provides (e.g. widget, file reader, file writer, theme). The npe2 library also provides utilities for generating manifest files for plugins implementing the original plugin engine. By ingesting this metadata on the napari hub, we can support a richer filtering and browsing experience for users trying to find the right plugin for their application.
Because discovering plugin manifests requires fetching and inspecting Python package distributions, the discovery process must happen independently of the data fetch workflow. A separate lambda (the plugins lambda) is executed to discover manifests. This lambda is invoked by the data fetching process when a new plugin version is released. It discovers the manifest file for the plugin using npe2's own fetching mechanism, and then writes the manifest to the Dynamo plugin-metadata table as a record of type=DISTRIBUTION.
The npe2 manifest specification is being regularly (and sometimes frequently) updated. Refer to the docs on napari.org for a full listing of the manifest and contributions specifications.
A subset of this metadata is used for the napari hub:
- display_name is simply the manifest field of the same name.
- plugin_types is retrieved from the different types of contributions declared by the plugin. Currently we look for readers, writers, widgets, themes and sample_data contributions.
- reader_file_extensions are a set of all filename_patterns declared in readers contributions.
- writer_file_extensions are a set of all filename_extensions declared in writers contributions.
- writer_save_layers are a set of all layer_types declared in writers contributions.
- npe2 is True when the manifest's npe1_shim field is False, and vice versa.
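The derivation above can be sketched as a single function over a manifest that has already been loaded into a plain dictionary. This is a minimal sketch, not the hub's implementation: the function name is hypothetical, the values emitted for plugin_types simply reuse the contribution keys (an assumption), and sets are returned as sorted lists for readability.

```python
from typing import Any

# Contribution kinds the hub currently looks for, per the list above.
TRACKED_CONTRIBUTIONS = ("readers", "writers", "widgets", "themes", "sample_data")


def summarize_manifest(manifest: dict[str, Any]) -> dict[str, Any]:
    """Derive the hub-facing subset of fields from a manifest dict."""
    contributions = manifest.get("contributions") or {}
    readers = contributions.get("readers") or []
    writers = contributions.get("writers") or []
    return {
        "display_name": manifest.get("display_name", ""),
        # A kind counts only if at least one contribution of that kind exists.
        "plugin_types": [k for k in TRACKED_CONTRIBUTIONS if contributions.get(k)],
        "reader_file_extensions": sorted(
            {p for r in readers for p in r.get("filename_patterns", [])}
        ),
        "writer_file_extensions": sorted(
            {e for w in writers for e in w.get("filename_extensions", [])}
        ),
        "writer_save_layers": sorted(
            {t for w in writers for t in w.get("layer_types", [])}
        ),
        # npe2 is the negation of the manifest's npe1_shim flag.
        "npe2": not manifest.get("npe1_shim", False),
    }
```

Working on a plain dictionary keeps the sketch independent of any particular npe2 version, which matters because the manifest specification changes regularly.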
Architecture spec for the plugin using Dynamo:
A CloudWatch event rule triggers the update to the data stored in DynamoDB. The schedule is set to once every 5 minutes for production, once every hour for staging, and once every day for dev environments. For more information, check out https://us-west-2.console.aws.amazon.com/cloudwatch/home#rules.
For data workflow, the rule publishes the following JSON message to the SQS queue:
{"type": "plugin"}
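For illustration, a consumer of that queue could decode the message as follows. This is a hedged sketch: it assumes the standard shape of an SQS-triggered Lambda event (a Records list whose entries carry the message in body), and the function name is hypothetical rather than taken from the hub's codebase.

```python
import json
from typing import Any


def message_types(event: dict[str, Any]) -> list[str]:
    """Return the "type" value of every SQS message in a Lambda event."""
    types = []
    for record in event.get("Records", []):
        # Each SQS record carries the published JSON message as a string.
        message = json.loads(record["body"])
        types.append(message.get("type", ""))
    return types
```

A handler could then start the plugin data workflow whenever "plugin" appears in the returned list.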
- The dynamo tables are named with the environment name as the prefix.
The aggregated plugin data is stored in the plugin table. The table also has global secondary indices for the latest and excluded plugins.
The data from PyPI, GitHub, and the manifest are all stored for the various versions of the plugins.
Blocked plugins are listed here. The data in this table is filled manually.
The generation of the records for plugins is a two-part process. The first step is to fetch all the required plugin metadata and write it to the plugin metadata table. The second step is to create/update the aggregate record for all the plugins that have had updates to their metadata.
Fetch the latest plugins from Dynamo: Get all the plugins currently marked as latest in the plugin table. This is the set of plugin versions our system currently considers latest.
Fetching plugin list from PyPI: We make requests to PyPI to fetch the latest versions of plugins that are classified with the napari framework classifier. This gives us the list of all the latest plugins on PyPI.
Identify newly added plugins: By filtering out the plugins already marked as latest in our system, we identify new plugins that have been added. For the newly added plugins, fetch metadata from various sources. Also, write a record to plugin-metadata of type=PYPI with is_latest=True.
Fetching PyPI metadata: Get metadata such as the release information, code repository, etc. from PyPI.
Fetching GitHub Metadata: If a valid GitHub code repository link exists for the plugin, fetch information from the README and the config files.
Fetching Manifest Metadata: Invoke the plugin lambda to capture the data from its manifest as specified above.
Identify stale plugins: Any plugin that is missing from the latest plugins list fetched from PyPI, or whose version does not match that list, is stale. For those plugins, we update their PYPI record by removing the is_latest field.
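The new/stale comparison in the steps above amounts to two dictionary differences. The sketch below assumes both inputs are name-to-version mappings, one from the plugin table and one from PyPI; the function name and that data shape are illustrative assumptions, not the hub's actual code.

```python
def diff_plugins(
    dynamo_latest: dict[str, str], pypi_latest: dict[str, str]
) -> tuple[dict[str, str], dict[str, str]]:
    """Split plugins into newly added and stale sets.

    new:   on PyPI, but not marked latest in Dynamo at that version.
    stale: marked latest in Dynamo, but missing from PyPI or at a
           different version there.
    """
    new = {
        name: version
        for name, version in pypi_latest.items()
        if dynamo_latest.get(name) != version
    }
    stale = {
        name: version
        for name, version in dynamo_latest.items()
        if pypi_latest.get(name) != version
    }
    return new, stale
```

Note that a plugin with a fresh release shows up in both results: its new version needs metadata fetched, and its previous version needs is_latest removed.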
The metadata is written to the plugin metadata table. The different types of records are as follows:
- The PYPI record is used to identify if a specific version of the plugin is the latest version.
- The METADATA record contains the metadata aggregated from PyPI and GitHub if a valid code_repository url exists.
- The DISTRIBUTION record contains the data from the manifest files.
- The updates to plugin metadata are tracked using the dynamo streams.
- For any specific plugin version that has had any of its metadata records updated, we recompute the plugin aggregation.
- In addition to the rich metadata, the aggregation also identifies the visibility of a plugin and if it is the latest version.
- The aggregation is written to the plugin table.
Returns the result of a query on the latest_plugins index of the plugin table for the plugin name. It filters to ensure the plugin visibility is either public or hidden. If no result is found, the API returns a 404 HTTP status response.
Returns the result of a query on the plugin table for the plugin name and version. It filters to ensure the plugin visibility is either public or hidden. If no result is found, the API returns a 404 HTTP status response.
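Both lookups share the same shape: query, filter on visibility, and fall back to 404. The sketch below models that with an in-memory list standing in for the Dynamo query result. The row keys (name, version, is_latest, visibility) and the function name are assumptions for illustration only.

```python
from typing import Any, Optional

# Only these visibilities are served by the lookups described above.
ALLOWED_VISIBILITY = {"public", "hidden"}


def lookup_plugin(
    rows: list[dict[str, Any]], name: str, version: Optional[str] = None
) -> tuple[int, dict[str, Any]]:
    """Return (HTTP status, plugin row) for a name and optional version.

    Without a version, only rows flagged as latest are considered.
    """
    for row in rows:
        if row.get("name") != name:
            continue
        if version is None and not row.get("is_latest"):
            continue
        if version is not None and row.get("version") != version:
            continue
        if row.get("visibility") in ALLOWED_VISIBILITY:
            return 200, row
    return 404, {}
```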
Running the regular data-workflow for the plugin will backfill any missing data.
If any issue occurs during data-workflow execution, look through Lambda’s execution logs to identify the problem.