Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Observer relationship #137

Open
jomey opened this issue Sep 19, 2024 · 10 comments
Open

Observer relationship #137

jomey opened this issue Sep 19, 2024 · 10 comments
Assignees
Labels
DB Any requests related to changes/updates of the DB table structure or definitions enhancement New feature or request

Comments

@jomey
Copy link
Member

jomey commented Sep 19, 2024

Question

Should we move the Observer relationship away from point and layers and add to the Site (#127)?

Think we could reduce data redundancy and increase clarity by moving the observer information away from an individual point or layer measurement and onto the site. Right now we have two link tables with lots of entries.

Moving this information was motivated by looking at the header of each pit sheet
image

Where each location is visit by a team and all measurements/obs are taken by them.

For auxiliary observations like GPR/SMP/SSA/Depth transects we could still ink each measurement back to the site or create a new entry in the site table if it does not belong to one in particular.

Thoughts @micah-prime @micahjohnson150 @meganmason ?

@jomey jomey added the enhancement New feature or request label Sep 19, 2024
@jomey jomey mentioned this issue Sep 19, 2024
@jomey
Copy link
Member Author

jomey commented Sep 19, 2024

erDiagram
  Site }|--|{ SiteObserver : "has many"
  SiteObserver }|--|{ Observer : "has many"
  Site {
    string PlotID
  }
  SiteObserver {
    int site_id
    int observer_id
  }
  Observer {
    string name
  }
  Site ||--|{ PointData : contains
  Site ||--|{ LayerData: contains
  PointData {
    int site_id
  }
  LayerData {
    int site_id
  }
Loading

@micah-prime
Copy link
Contributor

I like the idea, but I don't think it will work. Partly, I got one of the relationships wrong in the restructure - Point should not have a relationship with Sites. Sites (as you alluded to in one of the comments) are basically just a Pits.

So since LayerData will be the only data table that needs a Site, I think the observer relationship should be left directly to PointData and ImageData, with the slight simplification that a Site can have an Observer link now so that each Layer does not need it.

@jomey
Copy link
Member Author

jomey commented Sep 20, 2024

My main motivation to bring this up is that we will have a lot of redundant point_observer entries. For instance, one GPR entry has currently a campaign name (currently in the site_name column in points) e.g. 'Grand Mesa' and one observer. If we use the point_observer idea, then we create a lot of entries that are duplicate for one survey.

Guess we could extend the point_observer beyond a link table and make it a new relationship like so:

erDiagram
  PointData ||--|| CampaignObservation : "has one"
  CampaignObservation ||--|| Observer : "has one"
  CampaignObservation ||--|| Campaign : "belongs to"
  ImageData ||--|| CampaignObservation : "has one"
  Site }|--|{ SiteObserver : "has many"
  Site ||--|| Campaign : "belongs to"
  SiteObserver }|--|{ Observer : "has many"
  LayerData ||--|{ Site : "has one"

  PointData {
    int campaign_observation_id
  }
  ImageData {
    int campaign_observation_id
  }
  CampaignObservation {
    int campaign_id
    int observer_id
    String type
  }

  LayerData {
    int site_id
  }
  Site {
    string PlotID
  }
  SiteObserver {
    int site_id
    int observer_id
  }
  Observer {
    string name
  }
Loading

Then we could use the Single Table Inheritance feature to use one table for image and layer data, removing data redundancy. This would be the classic and perfect use for that SQLAlchemy feature.

The points and images are currently only single entities (person or company) in the current data. We can easily extend that to many in the future if the need arises.

@jomey
Copy link
Member Author

jomey commented Sep 20, 2024

Continuing my train of thought since I am in the groove.
Another argument I can see for the CampaignObserveration table is that we can then remove the instrument and date from the Point and Image too.

Extending my diagram from above and zooming in on the table:

erDiagram
  PointData ||--|| CampaignObservation : "has one"
  ImageData ||--|| CampaignObservation : "has one"
  CampaignObservation ||--|| Campaign : "belongs to"
  CampaignObservation ||--|| Instrument : "has one"
  CampaignObservation ||--|| MeasurementType : "has one"
  CampaignObservation ||--|| Observer : "has one"

  PointData {
    int campaign_observation_id
  }
  ImageData {
    int campaign_observation_id
  }
  CampaignObservation {
    int campaign_id
    int observer_id
    int instrument_id
    int measurement_type_id
    Date date
    String comment
    String type
  }
Loading

@micah-prime
Copy link
Contributor

My main motivation to bring this up is that we will have a lot of redundant point_observer entries. For instance, one GPR entry has currently a campaign name (currently in the site_name column in points) e.g. 'Grand Mesa' and one observer. If we use the point_observer idea, then we create a lot of entries that are duplicate for one survey.

Guess we could extend the point_observer beyond a link table and make it a new relationship like so:

erDiagram
  PointData ||--|| CampaignObservation : "has one"
  CampaignObservation ||--|| Observer : "has one"
  CampaignObservation ||--|| Campaign : "belongs to"
  ImageData ||--|| CampaignObservation : "has one"
  Site }|--|{ SiteObserver : "has many"
  Site ||--|| Campaign : "belongs to"
  SiteObserver }|--|{ Observer : "has many"
  LayerData ||--|{ Site : "has one"

  PointData {
    int campaign_observation_id
  }
  ImageData {
    int campaign_observation_id
  }
  CampaignObservation {
    int campaign_id
    int observer_id
    String type
  }

  LayerData {
    int site_id
  }
  Site {
    string PlotID
  }
  SiteObserver {
    int site_id
    int observer_id
  }
  Observer {
    string name
  }
Loading

Then we could use the Single Table Inheritance feature to use one table for image and layer data, removing data redundancy. This would be the classic and perfect use for that SQLAlchemy feature.

The points and images are currently only single entities (person or company) in the current data. We can easily extend that to many in the future if the need arises.

I think this example almost works, except ImageData and PointData cannot be stored in the same table because of their difference in geometry type and value type. In the PointData, the data is stored under a float value and in the ImageData the data is stored under a Raster column type that geoalchemy handles under the hood.

That said, Neither of them requires the many-to-many relationship with Observer that LayerData does, so we should be able to directly link a foreign key to the Observer table. There has to be 3 separate tables for the unique data requirements, but we can get rid of the redundant PointObservers by just skipping directly to the foreign key.

@jomey
Copy link
Member Author

jomey commented Sep 26, 2024

I feel we should have a call about this. My last post and ER diagram should be the reference one.

I want to keep the three separate tables (ImageData, ImageData, and LayerData). As you said, each will have their own unique data requirements. Only the associated metadata can be stored in a shared fashion.

@micah-prime
Copy link
Contributor

Ah okay, I didn't catch that detail. Fairly certain I'm up to speed now. I like the Single Inheritance idea - I think that would be quite clean!

I'm happy to have a call this week, but maybe I should code this up before we schedule something since I think we're on the same page now.

What do you think about locking in this design (assuming we don't hit snags) so we can start thinking about the data upload procedure as well?

@jomey
Copy link
Member Author

jomey commented Sep 30, 2024

I'm happy to have a call this week, but maybe I should code this up before we schedule something since I think we're on the same page now.

I also have a good chunk of time to work on the restructure this week. Will sit together with @aaarendt this Thursday too. Do you want to divide this effort?

What do you think about locking in this design (assuming we don't hit snags) so we can start thinking about the data upload procedure as well?

100% agreed. Think this should be the final major re-structure and updating the the data upload is next. Then we probably run into some small changes when looking at the data.

@micah-prime
Copy link
Contributor

I also have a good chunk of time to work on the restructure this week. Will sit together with @aaarendt this Thursday too. Do you want to divide this effort?

That sounds great! How about I work on the SiteConditions logic a bit and you guys can tackle the CampaignObservation?

Also, I want to push a small testing change (I'll do that today) to hopefully clean things up a little before we restructure more.

100% agreed. Think this should be the final major re-structure and updating the the data upload is next. Then we probably run into some small changes when looking at the data.

Sweet!

@micah-prime
Copy link
Contributor

Here is the test cleanup I mentioned
#142

@jomey jomey added the DB Any requests related to changes/updates of the DB table structure or definitions label Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DB Any requests related to changes/updates of the DB table structure or definitions enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants