
Suggestion: Enhance XRONOS with Model-Based Pleiades Matching Strategy #350

Open
MartinHinz opened this issue Dec 1, 2024 · 1 comment
Labels: enhancement (New feature or request), lod (Linked Open Data)

Comments


MartinHinz commented Dec 1, 2024

Description:

To enable the integration of the Pleiades dataset as Linked Open Data (LOD), I propose a model-based solution within XRONOS. By implementing a matching method directly in either the Site model or the PleiadesItem model, we can efficiently link XRONOS site names to Pleiades data.


Proposed Enhancement:

1. Introduce a PleiadesItem Model

Add a model named PleiadesItem to store Pleiades site names and their corresponding IDs. This will provide the necessary structure for matching with the Site model.

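Example model:

```ruby
# app/models/pleiades_item.rb
class PleiadesItem < ApplicationRecord
  # One row per (normalized name, Pleiades ID) pair; a place may have several names.
  validates :name, presence: true, uniqueness: { scope: :pleiades_id }
  validates :pleiades_id, presence: true
end
```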

2. Periodic Synchronization of Pleiades Data

Implement a rake task that populates and updates the PleiadesItem model from the name_index.json published by the pleiades-geojson project. Running it periodically keeps the data current.

Example task:

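```ruby
namespace :pleiades do
  desc "Sync Pleiades data with XRONOS"
  task sync: :environment do
    require 'open-uri'
    require 'json'

    url = 'https://raw.githubusercontent.com/ryanfb/pleiades-geojson/gh-pages/name_index.json'
    pleiades_data = JSON.parse(URI.open(url).read)

    # One transaction: if any record fails to save, the whole sync is rolled back.
    PleiadesItem.transaction do
      pleiades_data.each do |name, id|
        normalized = name.to_s.strip.downcase
        next if normalized.empty? # skip entries without a usable name

        PleiadesItem.find_or_create_by!(name: normalized, pleiades_id: id)
      end
    end
  end
end
```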
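The rake task assumes each entry of name_index.json can be destructured into a name and an ID. As a hedged, Rails-independent sketch of the normalization and deduplication step, the snippet below turns such entries into unique `{ name:, pleiades_id: }` rows that could then be written in bulk (for example with `insert_all`, available since Rails 6, given a unique index on name and pleiades_id). The sample data, its placeholder IDs, and the helper name `pleiades_rows` are hypothetical; the real file's layout should be checked before relying on this.

```ruby
require 'json'

# Hypothetical sample in the shape the rake task expects: [name, id] pairs
# (a JSON object of name => id would iterate the same way). IDs are placeholders.
SAMPLE = '[[" Roma ", "1001"], ["roma", "1001"], ["Athenai", "1002"], ["  ", "1003"]]'

# Normalize names as the rake task does (strip + downcase), skip blank names,
# and drop duplicate name/ID pairs.
def pleiades_rows(entries)
  entries.each_with_object([]) do |(name, id), rows|
    normalized = name.to_s.strip.downcase
    next if normalized.empty?

    rows << { name: normalized, pleiades_id: id }
  end.uniq
end

rows = pleiades_rows(JSON.parse(SAMPLE))
rows.each { |row| puts "#{row[:name]} -> #{row[:pleiades_id]}" }
# prints:
# roma -> 1001
# athenai -> 1002
```

Deduplicating in Ruby first keeps the bulk insert small, while the unique index remains the actual safeguard against duplicates.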

3. Match Logic in Model

a) Method in the Site Model

Add a method in the Site model to find a corresponding PleiadesItem for a site by comparing names:

Example:

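```ruby
class Site < ApplicationRecord
  # First PleiadesItem whose name contains this site's name, case-insensitively
  # (ILIKE is PostgreSQL-specific); % and _ in the name are escaped as literals.
  def match_to_pleiades
    return nil if name.blank?
    PleiadesItem.where("name ILIKE ?", "%#{PleiadesItem.sanitize_sql_like(name.strip)}%").first
  end
end
```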

Usage:

```ruby
site = Site.find(1)
match = site.match_to_pleiades
if match
  puts "Site '#{site.name}' matches PleiadesItem '#{match.name}' with ID #{match.pleiades_id}"
else
  puts "No match found for site '#{site.name}'"
end
```
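To make the behaviour of the substring match concrete without a database, here is a hedged pure-Ruby approximation of what `match_to_pleiades` does: a case-insensitive test of whether a Pleiades name contains the site name, returning the first hit. `Item` is a hypothetical stand-in for `PleiadesItem` and the rows are made up. It also illustrates the accuracy caveat listed under Potential Issues: a short site name such as "Ur" matches any Pleiades name that merely contains those letters.

```ruby
# Hypothetical stand-in for a PleiadesItem row.
Item = Struct.new(:name, :pleiades_id)

ITEMS = [
  Item.new('uruk', '1001'),
  Item.new('ur', '1002'),
  Item.new('susa', '1003')
].freeze

# Approximates `name ILIKE '%site_name%'` followed by `.first`
# (ActiveRecord's `first` orders by primary key, mirrored here by list order).
def match_to_pleiades(site_name, items)
  needle = site_name.to_s.strip.downcase
  return nil if needle.empty?

  items.find { |item| item.name.downcase.include?(needle) }
end

p match_to_pleiades('Ur', ITEMS)&.pleiades_id    # "1001": 'uruk' wins over the exact 'ur'
p match_to_pleiades('Susa ', ITEMS)&.pleiades_id # "1003"
p match_to_pleiades('Nineveh', ITEMS)            # nil
```

Preferring an exact match (after normalization) before falling back to the substring search would be one way to avoid this particular false positive.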

b) Method in the PleiadesItem Model

Alternatively, add a method in the PleiadesItem model to find all Site entries matching a specific Pleiades name:

Example:

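```ruby
class PleiadesItem < ApplicationRecord
  # All Sites whose name contains this Pleiades name, case-insensitively
  # (ILIKE is PostgreSQL-specific); % and _ in the name are escaped as literals.
  def match_sites
    return Site.none if name.blank?
    Site.where("name ILIKE ?", "%#{Site.sanitize_sql_like(name.strip)}%")
  end
end
```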

Usage:

```ruby
pleiades_item = PleiadesItem.find(1)
matching_sites = pleiades_item.match_sites
matching_sites.each do |site|
  puts "PleiadesItem '#{pleiades_item.name}' matches Site '#{site.name}'"
end
```
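The reverse lookup can be sketched the same way. This hypothetical in-memory version of `match_sites` returns every site whose name contains a given Pleiades name; applying it across all items yields a candidate map from Pleiades ID to site names that could be reviewed before anything is stored. Site names, Pleiades names, and IDs are illustrative placeholders only.

```ruby
# Hypothetical in-memory data standing in for Site and PleiadesItem records.
SITES = ['Uruk', 'Uruk-Warka', 'Tell Brak', 'Susa'].freeze
PLEIADES = { 'uruk' => '1001', 'susa' => '1003', 'nineveh' => '1004' }.freeze

# Approximates `Site.where("name ILIKE ?", "%pleiades_name%")`.
def match_sites(pleiades_name, sites)
  needle = pleiades_name.strip.downcase
  sites.select { |site| site.downcase.include?(needle) }
end

# Build pleiades_id => [matching site names], dropping items without hits.
candidates = PLEIADES.each_with_object({}) do |(name, id), acc|
  hits = match_sites(name, SITES)
  acc[id] = hits unless hits.empty?
end

p candidates
```

Collecting candidates first, rather than writing links directly, leaves room for a manual or fuzzy-score review step.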

4. Further Tasks

  • Add fuzzy matching using gems such as amatch or fuzzy_match to improve match accuracy.
  • Store match results by adding a pleiades_id field to a link table, similar to the Wikidata approach.
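As a rough illustration of what fuzzy matching adds, the sketch below ranks candidate names by Levenshtein edit distance and keeps those within a threshold. It is written in plain Ruby rather than with amatch or fuzzy_match, whose APIs it does not attempt to reproduce; the `max_distance` default of 2 and the sample names are arbitrary assumptions that would need tuning against real XRONOS data.

```ruby
# Dynamic-programming Levenshtein distance (insert/delete/substitute each cost 1),
# keeping only the previous row of the table.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Return [name, distance] pairs within max_distance, closest first.
def fuzzy_candidates(query, names, max_distance: 2)
  q = query.strip.downcase
  names.map { |n| [n, levenshtein(q, n.downcase)] }
       .select { |_, d| d <= max_distance }
       .sort_by { |n, d| [d, n] }
end

names = %w[babylon borsippa kish nippur]
p fuzzy_candidates('Babylonia', names) # 'babylon' at distance 2
p fuzzy_candidates('Nipur', names)     # 'nippur' at distance 1
```

An absolute distance threshold favours long names; normalizing the distance by name length is a common alternative when names vary a lot in length.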

Benefits:

  • Embeds the matching logic directly into the relevant models, keeping the code clean and cohesive.
  • Supports efficient and reusable methods for site-to-Pleiades matching.

Potential Issues:

  • Matching accuracy depends on data quality and name standardization.
  • Requires maintenance of the PleiadesItem model and periodic synchronization.
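Since match quality hinges on name standardization, one option is a shared normalization step applied to both Site names and Pleiades names before comparison. The helper below is a hypothetical sketch: it strips diacritics via Unicode NFD decomposition, turns punctuation into spaces, and trims whitespace. Which further rules suit XRONOS data (for example transliteration variants or prefixes such as "Tell") is left open.

```ruby
# Hypothetical normalizer applied to both sides before matching.
def normalize_place_name(name)
  # NFD splits accented letters into base letter + combining mark;
  # \p{Mn} then removes those marks (e.g. "ö" -> "o").
  stripped = name.to_s.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
  # Lowercase, turn each run of non-letters/non-digits into one space, trim.
  stripped.downcase.gsub(/[^\p{L}\p{N}]+/, ' ').strip
end

p normalize_place_name('Çatalhöyük')        # "catalhoyuk"
p normalize_place_name('  Tell el-Amarna ') # "tell el amarna"
```

If adopted, the same function would need to run in the sync task and on Site names so that stored and queried values stay comparable.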

I look forward to feedback on this proposal!


MartinHinz commented Dec 1, 2024

Related issues:

MartinHinz added the enhancement (New feature or request) and lod (Linked Open Data) labels on Dec 1, 2024.
joeroe added this to the Linked Open Data milestone on Dec 12, 2024.