Initial contribution of semantic metadata support #6288
Seems like a solid foundation, and rather friendly for the user, I like it! Some initial remarks and questions:
I think you meant "semantics" not "synonyms"?
Maybe a hierarchy under Location_Indoor_Floor is necessary here to distinguish floors from one another? Ground Floor, First Floor, (Second floor?, more?), Attic, Basement/Cellar...
Open question related to NLP: in my equivalent here https://github.com/ghys/habot/blob/master/src/main/resources/tagattributes.properties I included plurals, because it was way easier than e.g. using OpenNLP to perform lemmatization beforehand. How would you handle that, include the lemmatization by the framework (or providing plurals as synonyms) or expect the caller to do it and provide the lemma? Implementations might or might not get the lemma e.g. from a cloud-based NLP service, or have access to a lemmatizer.
In case of Things, would that be defined at the channel level?
I think it's a double edged sword, on one hand, in many cases it provides the same semantic meaning as the tag, thus would make the tag redundant (and prevent the need to alter existing configurations). On the other hand, I suspect users frequently assume what's inside brackets has no semantic meaning at all and is just the default icon to display in sitemaps for the item, so they may choose only based on what icon they fancy better. |
How would you handle i18n and l10n of those synonyms? |
MetadataKey key = new MetadataKey(NAMESPACE, item.getName());
Map<String, Object> configuration = new HashMap<>();
Class<? extends Tag> type = tagService.getSemanticType(item);
if (type != null) {
This doesn't process the hierarchy unless a tag is directly applied on the item, so it doesn't handle situations like this:
Group gGF "Ground Floor" <groundfloor> ["Floor"]
Group GF_Corridor "Corridor" <corridor> (gGF) ["Corridor"]
Group:Switch:OR(ON, OFF) Lights "All Lights [(%d)]" <lightbulb> ["Lightbulb"]
Switch Light_GF_Corridor_Ceiling "Downstairs Corridor" (GF_Corridor, Lights)
In this case I would expect something like this:
"link": "http://localhost:8080/rest/items/Light_GF_Corridor_Ceiling",
"metadata": {
"semantics": {
"value": "Equipment_Lightbulb", // not actually sure how it would come up with it - prone to conflicts
"config": {
"isPartOf": "Lights",
"hasLocation": "GF_Corridor"
}
}
},
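To illustrate the hierarchy processing asked for here, the following is a minimal sketch of how a nearest location could be resolved by walking up the group memberships. It does not use the real ESH `Item`/`GroupItem` API: `SketchItem`, `HierarchySketch`, `findLocation` and the choice of `Floor` and `Corridor` as location tags are all assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical, simplified stand-in for an item; NOT the real ESH Item API. */
final class SketchItem {
    final String name;
    final List<String> groups; // names of the groups this item is a direct member of
    final Set<String> tags;

    SketchItem(String name, List<String> groups, Set<String> tags) {
        this.name = name;
        this.groups = groups;
        this.tags = tags;
    }
}

final class HierarchySketch {
    // Assumption: these tags denote locations; the real list would come from the tag library.
    static final Set<String> LOCATION_TAGS = Set.of("Floor", "Corridor");

    /** Breadth-first walk up the group hierarchy; returns the nearest ancestor group tagged as a location. */
    static Optional<String> findLocation(String itemName, Map<String, SketchItem> registry) {
        Deque<String> queue = new ArrayDeque<>(registry.get(itemName).groups);
        Set<String> visited = new HashSet<>();
        while (!queue.isEmpty()) {
            String groupName = queue.poll();
            if (!visited.add(groupName)) {
                continue; // guard against cyclic group memberships
            }
            SketchItem group = registry.get(groupName);
            if (group == null) {
                continue; // unknown group name, nothing to inspect
            }
            for (String tag : group.tags) {
                if (LOCATION_TAGS.contains(tag)) {
                    return Optional.of(group.name);
                }
            }
            queue.addAll(group.groups); // keep climbing
        }
        return Optional.empty();
    }

    /** Registry mirroring the four items quoted in the comment above. */
    static Map<String, SketchItem> exampleRegistry() {
        Map<String, SketchItem> r = new HashMap<>();
        r.put("gGF", new SketchItem("gGF", List.of(), Set.of("Floor")));
        r.put("GF_Corridor", new SketchItem("GF_Corridor", List.of("gGF"), Set.of("Corridor")));
        r.put("Lights", new SketchItem("Lights", List.of(), Set.of("Lightbulb")));
        r.put("Light_GF_Corridor_Ceiling",
                new SketchItem("Light_GF_Corridor_Ceiling", List.of("GF_Corridor", "Lights"), Set.of()));
        return r;
    }
}
```

With this registry, `findLocation("Light_GF_Corridor_Ceiling", ...)` resolves to `GF_Corridor` even though the item itself carries no tag, which matches the `hasLocation` value expected above. Deriving `isPartOf` and the item's own type from the `Lights` group is deliberately left out, since that is exactly the part flagged as prone to conflicts.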
You will notice that I purposefully didn't add ANY tags to the functional group items (Lights, Heatings, etc.) in my example above. I think this can easily get us into trouble. It would actually mean "This item is a lightbulb", while you actually intend to express "All items in this group are lightbulbs". I think it is the safer way to require all items to be individually tagged accordingly.
MAYBE we can make an exception for Point (i.e. real functionality) types, because they will never(?) be assigned to a group item. So if they are found on a group item, one could declare that this means that it should be applied to all members.
I think it is the safer way to require all items to be individually tagged accordingly.
I'm still not 100% happy with this from a user convenience perspective :) ...but I recall a prior conversation about assuming groups as an entity by itself vs. groups as a set of similar items so I understand.
Not sure about introducing too many exceptions, it would just add to the confusion.
Sure, sorry, I've corrected it above.
Yes, on the wiki, we actually have that already. Will add it to the csv, I missed it there.
Ah, I didn't know yet that Deck is the plural of Terrace 😛 .
No, my plan was to simply link to a ThingUID here as Things also have a defined location and a user-provided label (but no synonyms 😟), so that could be useful without requiring the creation of group items that 1:1 correspond to a Thing. |
Yes, that's how it's used (and to be fair, that's also what it originates from - the "category" was once called "icon"). But for exactly this reason, I am also reluctant to include this information into the semantics. The only exception might be the category on a Thing itself (which should reflect an equipment tag).
I thought about simple property files that could be maintained. We could have a file like your tagattributes.properties generated with the english texts and then have that localised in separate files. Question is whether those should be packaged into clients like HABot at build time or if we should serve them via HTTP to request at runtime. Any preferences? |
And if you want to be thorough, Bathroom and Toilet are missing as well.
:) People usually don't have multiple terraces or first floors (unless they're lucky!), but they might have multiple smoke detectors and might ask for all of them in a sentence. Synonyms ("downstairs"), plural forms, acronyms, abbreviations... everything that might be reasonably encountered and extracted from natural language in raw form that would be referring to some entity class in the model.
Yes, even if there could sometimes be plurals (like "guest rooms", "plant sensors" or even brand names and the like - "mi floras", "roombas"...) for user-supplied synonyms it's fair to assume it's the responsibility of the user to provide all forms. Lemmatization is not easy...!
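The "provide all forms" approach described here can be sketched as a plain lookup table from raw surface forms (singular, plural, synonym, abbreviation) to a tag, so that no lemmatizer is needed. `SurfaceFormIndex`, `register` and `resolve` are hypothetical names, and `Equipment_SmokeDetector` is only an illustrative tag, not necessarily one from the tag library.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class SurfaceFormIndex {
    // normalized surface form -> tag it refers to
    private final Map<String, String> formToTag = new HashMap<>();

    /** Registers every surface form (singular, plural, synonym...) of a tag, case-insensitively. */
    void register(String tag, String... forms) {
        for (String form : forms) {
            formToTag.put(normalize(form), tag);
        }
    }

    /** Looks up a raw, non-lemmatized term exactly as it was extracted from a sentence. */
    Optional<String> resolve(String rawTerm) {
        return Optional.ofNullable(formToTag.get(normalize(rawTerm)));
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

A caller would register, for instance, both "smoke detector" and "smoke detectors" for the same tag, and "downstairs" for a floor tag. A term that was never registered simply yields an empty result, which is the trade-off of this approach compared to lemmatizing first.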
I think having it inside the framework would be a good thing so clients would simply supply their extracted natural language entities and expect a set of matching items. I don't know if there's actually a need to provide them in a HTTP API. The entity extraction from natural language is done on the server, not the client, so a regular service would do, or even this could be part of the upcoming querying facilities. In HABot the key method to reimplement is |
Hey @ghys, sorry for the late reply, but I had to do quite some further coding on this. Here's some update:
I've added a couple of further tags, but after reading this, I refrained from adding "Toilet"...
I have added a "synonym" column for the entity classes (tags) in the csv and I am now also generating property files from this, which can then be easily translated into other languages.
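As a rough idea of what generating property files from the csv could look like, here is a sketch that turns csv lines into a `java.util.Properties` object. The column layout (`type,tag,label,synonyms` with `|` between synonyms), the key format and the class name `TagPropertiesSketch` are assumptions for this example; the actual csv and generator script in the PR may differ.

```java
import java.util.List;
import java.util.Properties;

final class TagPropertiesSketch {
    /**
     * Assumed (hypothetical) csv layout: type,tag,label,synonyms
     * where synonyms are separated by '|'. The first line is treated as a header.
     * Each row becomes "type_tag=label,synonym1,synonym2".
     */
    static Properties fromCsv(List<String> lines) {
        Properties props = new Properties();
        if (lines.isEmpty()) {
            return props;
        }
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) {
                continue;
            }
            String[] cols = line.split(",", -1); // -1 keeps a trailing empty synonyms column
            String key = cols[0].trim() + "_" + cols[1].trim();
            StringBuilder value = new StringBuilder(cols[2].trim());
            if (cols.length > 3 && !cols[3].isBlank()) {
                for (String synonym : cols[3].split("\\|")) {
                    value.append(',').append(synonym.trim());
                }
            }
            props.setProperty(key, value.toString());
        }
        return props;
    }
}
```

The resulting `Properties` could be written out with `Properties.store` as the English base file and then copied per language for translation.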
Yes, and I guess for those cases, the user could also use group items with functions to add such tags and synonyms; but let us better investigate this as a later step, there might be a few corner cases to consider...
Ok, in Java code, you have the
I have started with querying for the items within a given location. Please have a look if this is about what you need and we can add similar methods for querying equipment/points. I'd think we should keep point and equipment queries separate in the framework - if HABot currently simply takes the union of the results, that should be fine, but I wouldn't want to create any "query for object" in the framework. Let me know if this is something you could start with - I'd then declare the PR to be good enough for an initial merge (after review by some committer of course). My further time to work on this is unfortunately extremely limited, so I hope it is not totally unusable ;-) |
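For readers following along, a query for the items within a given location can be pictured as a recursive descent through the group memberships. This is only a sketch under simplifying assumptions: groups are represented as a hypothetical map from group name to direct member names, and `LocationQuerySketch` and `itemsInLocation` are invented names, not the methods added in this PR.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LocationQuerySketch {
    /**
     * @param location name of the location group to start from
     * @param members  hypothetical map from a group name to the names of its direct members
     * @return all non-group items found at or below the given location group
     */
    static Set<String> itemsInLocation(String location, Map<String, List<String>> members) {
        Set<String> result = new LinkedHashSet<>();
        collect(location, members, result, new LinkedHashSet<>());
        return result;
    }

    private static void collect(String group, Map<String, List<String>> members,
            Set<String> result, Set<String> seen) {
        if (!seen.add(group)) {
            return; // avoid endless recursion on cyclic group memberships
        }
        for (String member : members.getOrDefault(group, List.of())) {
            if (members.containsKey(member)) {
                collect(member, members, result, seen); // nested group, descend further
            } else {
                result.add(member); // plain item
            }
        }
    }
}
```

Separate queries for equipment and points, as suggested above, would follow the same pattern with an additional filter on the semantic type of each collected item.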
Related to eclipse-archived/smarthome#6288. Signed-off-by: Yannick Schaus <habpanel@schaus.net>
It's still a room most houses have... :) "Restroom" then maybe?
Yes of course, "object" is purely an HABot concept now (should be considered in a grammatic sense). Maybe I'll try to alter the training data so it recognizes the proper type "Point", "Equipment" etc. instead - but it makes little difference and could reduce accuracy.
It seems to be usable, I have an early implementation of a SemanticsItemResolver at https://github.com/ghys/habot/pull/24.
Sure, now that the general direction is set, if there is a need for adjustments or extensions I will file follow-up PRs. |
@kaikreuzer just so I understand: with the current model, a single item cannot be both an Equipment and a Point, right?
The way of modelling this to have both would then be a Group representing the equipment ("GarageDoor") with one member representing its "OpenState" point? |
Wow, you are amazing 🥇
Correct and far from user-friendly, I agree.
Yes but I think users would also want to query specific classes of equipments, like "show me all the doors" - this wouldn't work in that case because the information that this item represents a door is lost. They might also choose to omit the Point tag, but then "show me all the contact sensors" isn't possible anymore.
Could work if the querying supports it and it's in line with the Brick schema (or if it's okay to derive from it a little bit).
(isPointOf/pointOf could be omitted, they are implicit, just an example). |
This implements eclipse-archived#6288 (comment): Allows items to carry multiple semantic types e.g. `["Equipment_Door_GarageDoor", "Point_OpenState"]` and also to represent multiple properties e.g. `["Property_Water", "Property_Pressure"]` Relations are only expressed in metadata if they are relations to other items - "implied" relations between tags within the item itself don't appear in metadata. Signed-off-by: Yannick Schaus <eclipse@schaus.net>
@kaikreuzer I have given it some thought and I think rather than enforcing a 1 Brick entity <=> 1 ESH item principle, items should be allowed to have multiple "traits" i.e. carry multiple types. Here are some arguments:
is to be interpreted as:
in a single item.
I have done the implementations in kaikreuzer#17 to play with the idea, but am happy to discuss it further. |
"classes of" are exactly the words we define in the ontology. We define that there is such a concept as a "door" and we have some semantics attached to it. But this is the strict list defined in ESH - and (at least currently) not meant to be customized/extended by the user.
I have also given it some thought and I fear that it will actually break the overall concept pretty much. The "semantics" metadata is supposed to cleanly identify what the item is about - note that the main point of Brick are their TagSets with which they try to express the full semantics in a single (multi segment) string, not requiring consumers (i.e. UIs, algorithms, etc.) to have intrinsic knowledge on what tags might be valid in which case and how to get some meaning from seeing multiple of them.
We'll definitely need some validation, which should inform users if they configured something nonsensical - when loading the files or maybe even already when editing them in VS Code. So in short: Imho there should be something that prevents the user from applying "any" tag to any item.
But if those setups evolve, you might get into all kinds of trouble: If you add a lock to the door, which you so far only used with a contact, you will be in the situation that a group item for the "door" is the only correct model and you'd have 2 point items assigned to it. If one of your items now already implicitly also is the equipment, you could not evolve your model in a correct way.
I wouldn't call it opaque but rather "fuzzy" as you cannot get a clear statement from the semantics metadata on what that item represents as explained above. And note that items (with the exception of the group items) are ALL supposed to be point types as they should represent functionality. Your "door" example actually shows this: As long as you only have contacts on them, the result looks all fine. But now imagine that you have contacts, locks, intrusion sensors etc. for your doors: You would not want to see this all as a flat list, but grouped by the individual doors.
This is actually rather a "pro" argument for one of my initial questions above: "I wonder whether "Property" should be split into "Property" and "Resource""... All in all, I would prefer not to merge kaikreuzer#17 now, but rather to stay with the constraints in our model and keep it clean at the same time. What I could imagine is that we find some middle way (more or less the virtual equipment idea from above): Already now, the tags, groups etc. are the "internal" information that is then aggregated to become the semantics metadata. So I am fine with allowing an equipment type tag to be added to an item with a point type tag, which would make the metadata provider implicitly create a "virtual equipment id", which cannot be retrieved through the REST API (as it is neither an item nor a thing), but which could be used internally to answer queries for points that are part of a specific equipment type - isn't that the main feature we would need?
Ah, right. My view was the former, not the latter i.e. letting users annotate their items without getting in the way with validation etc. (which is imho more in line with the general philosophy of tagging), and have the metadata reflect these and augment them with additional info like relations, but not necessarily try to derive any kind of ultimate meaning - that would be up to the user, and sometimes it might even not make sense, but that'd be the user's responsibility, not the framework's. But now I see your point.
To be fair, in the door/contact sensor case above, it is kind of fuzzy :) - whether the OpenState is a point of the contact sensor or the door itself is debatable.
Of course, I think it's easily understood that in this model, the items are self-contained if they are not broken into a group hierarchy - if you apply the "Door" tag separately to several items, it's pretty clear that we're referring to different doors. If you have multiple equipments or points on the same door, then a group is naturally the only way to go.
Interesting idea! but I don't know how to express this cleanly either. NB - A little off-topic, on a practical level, it would be great if this could be part of an openHAB distro build before Oct 21 ;) |
Yes, absolutely! My take would be that the functional assignment should always have the priority, so it would be a point of the door. If the same sensor has a "lowBat" channel/item, this would in contrast be a point of the contact sensor.
Yeah, that's why I am trying to keep extended requirements out of this PR for now and rather see, what we still manage to push after an initial version is out. |
It might actually be a point of both depending on the context... imho it should appear in both "all my sensors" and "all my doors" queries as pictured above. For the "battery" point there is no debate indeed.
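To make the "point of both" idea concrete, here is a small sketch of a 1:n `isPointOf` relation in which one point may be linked to several equipments. It is not what the PR implements (the model under review keeps `isPointOf` as a 1:1 relation); `PointOfSketch`, `link`, `pointsOf` and the item names in the usage note are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class PointOfSketch {
    // point name -> every equipment it is considered a point of (1:n instead of 1:1)
    private final Map<String, Set<String>> isPointOf = new LinkedHashMap<>();

    /** Declares that the given point belongs to the given equipment; may be called repeatedly per point. */
    void link(String point, String equipment) {
        isPointOf.computeIfAbsent(point, k -> new LinkedHashSet<>()).add(equipment);
    }

    /** All points linked to the given equipment, in the order they were first linked. */
    Set<String> pointsOf(String equipment) {
        Set<String> result = new LinkedHashSet<>();
        isPointOf.forEach((point, equipments) -> {
            if (equipments.contains(equipment)) {
                result.add(point);
            }
        });
        return result;
    }
}
```

Linking a `FrontDoor_OpenState` point to both a `FrontDoor` and a `FrontDoor_ContactSensor` equipment would make it show up in a doors query and in a sensors query, while a `FrontDoor_LowBat` point linked only to the sensor would appear in the latter alone.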
Okay - I'd still like the first version to be as useable as possible (close to the current version), to me it implies:
@ghys I suggest to get this merged asap and you provide small follow up PRs containing your most urgent changes & enhancements. @maggu2810 please have a look here since this stuff exceptionally has a deadline and Kai & me are already involved. Thanks :-) |
My main comment is about the generated resource bundles and classes, which imho should not be included in the repo but be created during build time (just like the DSL models). Can be discussed/changed later.
Apart from that: LGTM.
@htreu I approve your changes (thanks for them), but they broke the build, so please also fix the build.properties.
RELATIONS_PROPERTY.put(Arrays.asList(Point.class), "relatesTo");
}
private final Map<List<Class<? extends Tag>>, String> RELATIONS_PARENT = new HashMap<>();
Where's the benefit to have this really very static information in fields? If it is about making sure to have clean maps, shouldn't you maybe also clear the maps upon deactivation?
When changing it to fields, you should not use capitalised names anymore.
I have to admit, I did this in the light of our recommendation to not have loggers defined static. I just realised that this is a recommendation from SLF4J where they say "instance variables are IOC-friendly whereas static variables are not" w/o any further explanation.
I guess static would be equally okay now.
shouldn't you maybe also clear the maps upon deactivation
I don't think so, as the service instance will be disposed and garbage collected anyhow (that's at least my understanding). Initialising in `activate` was mainly done due to the lack of a constructor.
When changing it to fields, you should not use capitalised names anymore.
true, will change that.
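For comparison, the static alternative discussed in this thread could look roughly like the sketch below: the map is built once in a static initializer and wrapped as unmodifiable, so there is nothing to clear on deactivation and the upper-case constant name stays appropriate. The nested `Tag` and `Point` interfaces are hypothetical stand-ins for the generated tag classes, and `RelationsSketch` is an invented class name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RelationsSketch {
    // Hypothetical marker types standing in for the generated tag classes.
    interface Tag {}
    interface Point extends Tag {}

    // Static and immutable: built once at class load, shared by all instances.
    static final Map<List<Class<? extends Tag>>, String> RELATIONS_PROPERTY;

    static {
        Map<List<Class<? extends Tag>>, String> m = new HashMap<>();
        m.put(List.of(Point.class), "relatesTo");
        RELATIONS_PROPERTY = Collections.unmodifiableMap(m);
    }

    private RelationsSketch() {
        // constants holder, not meant to be instantiated
    }
}
```

If the maps remain instance fields instead, lower-camel-case names such as `relationsProperty` would follow the usual Java naming convention, as noted in the review.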
As I said, I would want to avoid this for now as it completely breaks the model. The "isPointOf" is a 1-1 relation and we should stick to it. Your example should thus not have an equipment tag, if the item is already a member of an equipment group item.
In our ESH Wiki, we had called the point "measurement" instead of "sensor" (which is the term used in Brick). Shall I change it back to "measurement" and we accept to use a different vocabulary than brick (@gtfierro feel free to join the discussion, if you have any strong arguments against it)?
That'll probably be VERY hard... Do we really strictly have to avoid them or could a synonym be considered to be not-unique?
Would be cool if you could do that as a follow-up PR!
Good suggestions. I'm fine with most of them and would suggest I think about it and do a follow-up PR once this here is merged. |
Sure, the model is already fine as it is and I won't question it anymore, but it's still important to challenge it with some anticipated real-world scenarios, as we should probably expect some confusion at first about how to tag items (notably Equipments & Points) and there will be a need for coherent explanations and examples.
The problem is that the result of
Will do the PR with the synonyms and translations, but maybe it's better to wait for the ontology to be stabilized first then. |
Yes and I'd hope we can involve @gtfierro in that discussion as well as he analysed many real-world cases when designing Brick - so he might have worthwhile input on that.
Yeah, this is probably the best solution.
Agreed, I'll work on the PR for the ontology amendments tonight. |
I've just pushed a (small) change - point sensor is now point measurement and point command is point control (i.e. I reverted to our own vocabulary on these). I contemplated the ontology additions for an hour and eventually came to the conclusion that they should not be introduced so quickly. Most of them do not seem to deliver real semantic value for our use cases that would differ from any tag that is already defined. The additions essentially boil down to a "common vocabulary", which would make it easier for people to define labels & synonyms for their items (because they would simply tag them accordingly). But if this is the major benefit, I still prefer to start small - people can still name their items "dishwasher" or "tv" and they will be found. It is a bit more cumbersome for them to define, but it gives them the flexibility to call it by whatever name they want.
Brightness should be the very default when talking about "control" & "light" with switches or dimmers. The information that if the user wants to "adjust brightness" you need to "control light" should imho be part of HABot and not required to be stored in the ontology.
I have changed the |
I'd like to ask to update the wiki page https://github.com/eclipse/smarthome/wiki/Semantic-Tag-Library whenever there are changes in the tag names so we could align properly. :) |
@afuechsel I'd plan to generate a documentation page out of the csv - we should not keep using the wiki on the long run. |
Signed-off-by: Kai Kreuzer <kai@openhab.org>
Sorry for getting to this so late @kaikreuzer. I've been reading through this discussion to try and get a feel for the issue at hand, so please let me know if I'm misunderstanding it. In Brick, we used the word As to your request for real-world scenarios, I would start with annotating some common equipment and simple building structures. What does a Thermostat look like under the scheme, and what kinds of relationships/annotations would you want to capture? For a thermostat, this might look like
Then you could go on to model simple structures and expand them outwards to capture more of the system, such as a thermostat, the room it's in, the RTU it controls, the rooms that are conditioned by the RTU, the electric meter upstream of the RTU, which floor the room is on, etc... Depending on the scope of the ontology, you might find it helpful to go through some of the buildings that we created Brick models for. You can look at the viewer.brickschema.org site to get a sense for how Brick models are laid out. I could give you a big CSV of all the points/equipment in those buildings, if you'd like.
Hey @gtfierro, thanks for your feedback!
I'd think
And it is defined as a "Device or instrument designed to detect and measure a variable." - to me, this sounds much more like an Equipment (i.e. the real hardware and not some function/point) and thus we also have a "Sensor" as an Equipment in our ontology.
The points look straight forward, but how would the relations exactly be modelled?
If so, our problem might be that we do not use the "feeds" relation at all. I do not really fully get its purpose and I am not sure that it can solve our problem. Just to summarise the main question again: Would it simply mean that the Point is a "point of" both those equipments (i.e. the sensor device and the window)? We so far have it as a 1:1 relation, not a 1:n - so maybe we need to change this? Btw, I see that the BrickFrame.ttl defines many relations and also in a very generic (i.e. little constrained) way - it hardly reflects the diagram in the leaflet: Should this be considered to be outdated (it does not even allow Points being "point of" an Equipment)? Is there a more recent version of this relations diagram anywhere?
BREAKING: Requires an openHAB distribution including openhab/openhab-core#415 to work! Related to eclipse-archived/smarthome#6288. Requires compatible tags. Extract name samples from tags, item labels & synonyms (comma-separated strings in the "synonyms" metadata namespace) Other misc changes: * Fix location dropdown counters in card deck * Fix invisible send button in chat text field * Better title & subtitle in generated chart cards * Expand matched groups automatically * Fail most skills if no entities found Signed-off-by: Yannick Schaus <habpanel@schaus.net>
The Brick relationships diagram is a little out of date, but is still mostly true. It has elided the implicit "reverse" relationships for simplicity. Equipment can We haven't kept any of the relationships at a strict 1:1, specifically for these cases where you could make the case that a Point is associated with multiple other entities: equipment, locations or other points. What is a For the case of the thermostat, we'd do "Rooftop Unit being controlled" as
and then model the relationship from the thermostat to the rooms by going through the RTU to the HVAC Zone
Thanks @gtfierro, makes all sense.
What relations would those have? And another question: Is the "feeds" relation defined in detail somewhere? Its meaning appears to be clear for physical stuff like air and water, but is it only applicable for those? Or is its meaning more generally something like "performs its function in" or "has an effect in"? What e.g. about a switch actuator that is located in a different room than the light that it switches? Would that "feed" the room with light? |
Assuming that they're associated with the thermostat, then we would do
The last triple places the thermostat in the room so that you can query the model for points of equipment in a room. It's important to capture this relationship because the thermostat isn't always in one of the rooms in the HVAC zone. I don't know if
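The kind of query mentioned here ("points of equipment in a room") can be sketched over a plain list of triples. This is only an illustration: `TripleSketch` and `pointsInRoom` are invented names, the entity names in the usage note are made up, and `isLocatedIn` is used as an assumed name for the location relation; `isPointOf` is the relation name used elsewhere in this discussion.

```java
import java.util.ArrayList;
import java.util.List;

final class TripleSketch {
    /** Minimal subject-predicate-object statement. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Returns the points of every equipment that is located in the given room, in model order. */
    static List<String> pointsInRoom(String room, List<Triple> model) {
        List<String> points = new ArrayList<>();
        for (Triple location : model) {
            // Assumption: "isLocatedIn" is the predicate that places an equipment in a room.
            if (!location.predicate.equals("isLocatedIn") || !location.object.equals(room)) {
                continue;
            }
            for (Triple candidate : model) {
                if (candidate.predicate.equals("isPointOf") && candidate.object.equals(location.subject)) {
                    points.add(candidate.subject);
                }
            }
        }
        return points;
    }
}
```

Given triples stating that `temp_sensor` and `temp_setpoint` are points of `thermostat`, and that `thermostat` is located in `room_1`, the query for `room_1` returns both points. Without the location triple the thermostat's points would not be reachable from the room at all, which is the reason given above for capturing it explicitly.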
This pull request has been mentioned on openHAB Community. There might be relevant details there: https://community.openhab.org/t/semantics-and-metadata-for-alexa-skill/66436/1 |
This is the first code for defining a semantic model that is driven and maintained through tags on items. It is the outcome of the discussion at #1093 and explicitly an implementation of what has been sketched out in #1093 (comment).
While working on it, I made a few small edits and amendments to the discussed version. To give an overview, let me start with a visualisation of the chosen model:
This is very much in line with the Brick Schema with the main difference that "Property" (in Brick "MeasurementProperty") has been introduced with a "relatesTo" relation from "Point" - similar to what I have suggested in BuildSysUniformMetadata/GroundTruth#2. Having Properties that can be referred to, makes the number of different Point tags much smaller. Note also that I tried to avoid the expression "TagSet" so far as my feeling is that this would rather irritate people here as "Tags" in the Brick sense (single segments of a full id with underscores) are not used on their own, but always as a "fully qualified" tag only.
The tag library is pretty much in line with what we defined in the wiki, with a few adaptions:
The tags are maintained in a csv file from which all classes can be easily generated through a small script - this should simplify the maintenance significantly and also allow us to do adaptions to whatever we might want to have generated from it.
The main feature of this PR is a new metadata provider, which adds a "semantics" namespace as suggested in #1093 (comment). I decided against adding the synonyms here, but think that synonyms should be independent of semantics and can be simply provided by a separate metadata namespace "synonyms". Here's an example item file that a user could define:
Note that the `semantics` data is not specified by the user here, but it is constructed by the framework on the basis of the item information - i.e. its containment relations and its tags. A single `["Light"]` tag is enough to deduce that the item is a "Command Point for Light". The resulting metadata looks e.g. like
for `GF_Bedroom_Light`.
This is meant as a first validation of the approach, specifically for @ghys.
There are clearly still many open issues to address:
`isPointOf` currently only points to items, but not to Things.
Signed-off-by: Kai Kreuzer <kai@openhab.org>