Home
Criamos edited this page on Aug 30, 2024 · 10 revisions
Welcome to the oeh-search-backend wiki!
This wiki should help you familiarize yourself with all things metadata. If you're looking for specific topics, check out the Table of Contents on the right-hand side.
If you're a metadata provider or want to make your content more easily accessible to web crawlers, please check out our Providing Metadata chapter. This part of our wiki will help you get to know the following topics:
- Preferred methods of providing metadata
- Valuespace identifiers for `learningResourceType`, `audience` and the educational subject's `about` metadata
- Dublin Core LRMI attributes and explanations, as well as an LRMI example
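For orientation, one common way to expose LRMI metadata is a schema.org JSON-LD block inside a page's `<script type="application/ld+json">` tag. The sketch below builds such a block with Python's standard library only. It assumes schema.org property names (`learningResourceType`, `audience` with `educationalRole`, `about`); the helper names, the page URL and every vocabulary URI are hypothetical placeholders, not the actual OpenEduHub valuespace identifiers, which are documented in the Providing Metadata chapter.

```python
import json


def build_lrmi_jsonld(name, url, resource_type_uri, audience_role_uri, subject_uri):
    """Build a minimal schema.org/LRMI JSON-LD dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",
        "@type": "LearningResource",
        "name": name,
        "url": url,
        # Placeholder valuespace URIs: replace with the identifiers from the wiki.
        "learningResourceType": {"@type": "DefinedTerm", "@id": resource_type_uri},
        "audience": {"@type": "EducationalAudience", "educationalRole": audience_role_uri},
        "about": {"@type": "DefinedTerm", "@id": subject_uri},
    }


def to_script_tag(data):
    """Serialize the metadata into an embeddable JSON-LD <script> element."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # "\/" is a valid JSON escape for "/", so this keeps a "</script>" inside a
    # string value from closing the HTML tag early without changing the data.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    doc = build_lrmi_jsonld(
        name="Introduction to Fractions",
        url="https://example.org/lessons/fractions",
        resource_type_uri="https://example.org/vocabs/learningResourceType/video",
        audience_role_uri="https://example.org/vocabs/audience/learner",
        subject_uri="https://example.org/vocabs/subject/mathematics",
    )
    print(to_script_tag(doc))
```

Crawlers that parse JSON-LD can read these properties without scraping the visible page text, which is why a structured block like this tends to be easier to process than metadata scattered across the HTML.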
If you want to build your own crawlers to gather metadata from web sources, the following chapters might help you to get started:
- Required (minimum amount of) Metadata for a successful crawl that can be stored within the oeh / edu-sharing backend
- How-To set up your .env file
- Spiders (crawlers):
  - A Quick-Start Guide to building spiders with Python and the Scrapy framework
  - How to build a crawler using the `sample_spider_alternative` as your blueprint (includes a `BaseItem` overview diagram)
  - Required information (with explanations for `name`, `friendlyName`, `url` and `version`) and a `sample_spider`
  - Integrated spiders in this project for different APIs and data models (e.g. `OAI`, `LRMI`, `RSS`)
  - Spiders provided by Scrapy and how to use import arguments
- In case your crawler works with an API: "How-To use Insomnia" is a quick introduction to the UI/UX flow of the popular open-source REST client Insomnia, which might save you some time while troubleshooting your crawler. This part of our wiki includes:
  - a short overview of using your browser to interact with an API
  - a quick introduction to Insomnia, an open-source REST client, with short examples on how to
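The spider chapters listed above cover four pieces of required information per spider (`name`, `friendlyName`, `url`, `version`). A dependency-free sketch of that shape follows. In the project, a spider would subclass `scrapy.Spider` (or `CrawlSpider`) together with the project's own base classes; here a plain class stands in for it, and `SampleSpiderInfo`, `validate_spider_info`, the example values and the `max_pages` argument are all hypothetical. The constructor mirrors a documented Scrapy behavior: `scrapy crawl <spider> -a key=value` passes each argument as a string keyword argument, and the default `Spider.__init__` stores it as an instance attribute.

```python
REQUIRED_INFO = ("name", "friendlyName", "url", "version")


class SampleSpiderInfo:
    """Plain stand-in for a Scrapy spider class (assumption, not project code)."""

    name = "sample_spider"          # spider name used with "scrapy crawl" (placeholder)
    friendlyName = "Sample Source"  # human-readable source name (placeholder)
    url = "https://example.org"     # URL of the crawled source (placeholder)
    version = "0.1.0"               # crawler version (placeholder value)

    def __init__(self, **kwargs):
        # Like scrapy.Spider.__init__: every "-a key=value" argument arrives as
        # a string keyword argument and becomes an instance attribute.
        self.__dict__.update(kwargs)


def validate_spider_info(spider_cls):
    """Raise ValueError if any required class attribute is missing or empty."""
    missing = [attr for attr in REQUIRED_INFO if not getattr(spider_cls, attr, None)]
    if missing:
        raise ValueError(f"{spider_cls.__name__} is missing: {', '.join(missing)}")


if __name__ == "__main__":
    validate_spider_info(SampleSpiderInfo)
    # Equivalent in spirit to: scrapy crawl sample_spider -a max_pages=5
    spider = SampleSpiderInfo(max_pages="5")
    print(spider.name, int(spider.max_pages))  # arguments arrive as strings
```

Checking the required attributes at class level catches a forgotten `friendlyName` or `version` before a crawl starts; converting arguments such as `max_pages` explicitly is necessary because Scrapy does not cast `-a` values for you.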
These short guides are mainly intended for project maintainers (or developers) of oeh-search-etl:
- The Poetry toolkit makes it easier to keep projects up to date and to produce deterministic builds
  - see: Poetry Cheat Sheet
- If you need to update the edu-sharing API client (which is heavily used in `converter/es_connector.py`), here's a quick reference. It contains:
  - useful bookmarks to quickly get started
  - the commands that were used to generate said API client