Implementation proposal: representing changing beneficial ownership over time #475

kd-ods · 2023-03-03T17:25:19Z

kd-ods
Mar 3, 2023
Maintainer

See Feature development in BODS in the Handbook.

Comments on this ticket can be used to question, refine and develop this implementation proposal. Interactions and work on this ticket represent a collaborative process. Proposals may be paused, withdrawn, or developed into draft implementation plans. Update the 'Proposal status' as thinking progresses on this thread. Highlight changes or updates to the proposal within thread comments, with a clear 'Updated proposal' heading.

Implementation proposal for: representing changing beneficial ownership over time

Feature ticket link #392

Implementation proposal status: BEING IMPLEMENTED (Not: paused | withdrawn | active)

Initial proposal

Overview

We propose that a series of statements about the same person, entity or ownership-or-control relationship is linked over time by a dedicated identifier. To support this, we propose updating our conceptual and data model. In particular:

We introduce the concept of a record, in recognition that publishing systems are likely to have an updatable record for each entity and individual (and ownership-or-control relationship) stored in their system. In the data model, the identifier that would link a series of statements about the same object over time would therefore be called a record ID (recordID).
The replacesStatements property would be removed.
A relationship statement* is linked to a subject entity and an interested party not by their statement IDs but by their record IDs.
When an entity, person or relationship is no longer disclosed as part of an ownership or control network, its record is closed. This is recorded in a Record Status field (recordStatus).

* See terminology change proposal

For the full details of the proposal for representing changing beneficial ownership over time see this doc.

Please add comments, thoughts and reactions to this thread, rather than as comments on the doc. Thank you!

cosmin-marginean · 2023-03-07T09:45:19Z

cosmin-marginean
Mar 7, 2023

Firstly, thank you for putting this detailed proposal together! As I've argued in the past, and as was pointed out in the document again, there are limitations and complex questions around replacesStatements, so it's good to see we can move away from that.

The fact that the temporal aspect of this data can now be captured more organically is probably the most attractive outcome of this proposal. I can see a series of problems and use cases that could now be solved with this added support.

I'm still reflecting on the ability to produce the latest (or "most recent") version of the ownership and control network. If I'm not mistaken, to achieve that, one would simply discard records that are "not closed"?

I'm also a bit unsure whether there's enough clarity about closing records on the publisher's side and the potential complexity there. For example, in Declaration 4, described in the proposal, R2 R3 E2 are all marked as closed. So firstly, there's a need for the publisher to make sure all 3 are marked consistently. But in theory, I believe that closing R2 and R3 would suffice, because E2 would then implicitly be eliminated from the chain when reading only "not closed" relationships.

There is also a subtlety which I think we might need to capture here. What if E2 is still present in the register with other relationships - I'm assuming that in this case a publisher would be advised not to close the E2 record, even if R2/R3 are gone, correct?

I think this raises the question if "closing" should be exclusive to relationship records. It's not 100% clear what is the meaning of closing an entity record. Closing an entity/person record seems to be an implication of relationships being closed, if I understood this correctly. It might be a lot easier for publishers to only worry about closing relationships records, since reaching entities/people that are part of those relationships won't be possible anyway (in the "latest" snapshot). It would also remove the complexity around "is E2 still needed for other things? OK, then don't close it, even if R2 R3 are closed", etc.

StephenAbbott · 2023-03-08T09:57:11Z

StephenAbbott
Mar 8, 2023
Maintainer

In terms of the nomenclature we're proposing, good to note that our proposed concept of a 'record' aligns with the European Union's understanding of the same term as set out in the 2021 regulations relating to the Beneficial Ownership Registers Interconnection System (BORIS):

The set of information in the national registers concerning a corporate or other legal entity, or a trust or a similar type of legal arrangement is referred to as ‘BO record’. The ‘BO record’ includes data on the profile of the entity or arrangement concerned, on the person of the beneficial owner or owners of that entity or arrangement, as well as on the beneficial interest(s) held by those owners.

kd-ods · 2023-03-10T18:24:22Z

kd-ods
Mar 10, 2023
Maintainer Author

@cosmin-marginean - thanks so much for sharing those reflections.

I'm still reflecting on the ability to produce the latest (or "most recent") version of the ownership and control network. If I'm not mistaken, to achieve that, one would simply discard records that are "not closed"?

Other way around, I think. Discard all statements which are part of closed series. Then, of the remainder, retain only the latest statement in each record series.
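For illustration, the two steps just described could be sketched as follows. This is only a rough sketch under stated assumptions: the field names `statementId`, `recordId`, `recordStatus` and `statementDate` follow the spelling used in the proposal, the sample data is invented, and `statementDate` values are assumed to share one ISO 8601 format so that string comparison orders them correctly.

```python
from collections import defaultdict

def latest_network(statements):
    """Return the latest statement of every record series that is not closed."""
    series = defaultdict(list)
    for s in statements:
        series[s["recordId"]].append(s)

    latest = []
    for items in series.values():
        # Step 1: discard the whole series if any statement in it closes the record.
        if any(s["recordStatus"] == "closed" for s in items):
            continue
        # Step 2: of the remainder, keep only the most recent statement.
        latest.append(max(items, key=lambda s: s["statementDate"]))
    return latest

# Invented sample data: P1 is updated, E2 is closed.
statements = [
    {"statementId": "s1", "recordId": "P1", "recordStatus": "new", "statementDate": "2023-01-10"},
    {"statementId": "s2", "recordId": "E2", "recordStatus": "new", "statementDate": "2023-01-10"},
    {"statementId": "s3", "recordId": "P1", "recordStatus": "updated", "statementDate": "2023-02-01"},
    {"statementId": "s4", "recordId": "E2", "recordStatus": "closed", "statementDate": "2023-02-01"},
]

print([s["statementId"] for s in latest_network(statements)])  # ['s3']
```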

So firstly, there's a need for the publisher to make sure all 3 are marked consistently. But in theory, I believe that closing R2 and R3 would suffice, because E2 would then implicitly be eliminated from the chain when reading only "not closed" relationships.

Or, theoretically, closing E2 (disclosing that E2 is no longer an intermediary in this chain) should close R2 and R3. It is - yes - something that the publisher would need to handle.

What if E2 is still present in the register with other relationships - I'm assuming that in this case a publisher would be advised not to close the E2 record, even if R2/R3 are gone, correct?

Well, I think this is where the BODS concept of a record differs from a publisher's concept. If a publisher has a single database record for an entity, but different declarants make statements about that entity, then there should be a unique recordID per declarant. So, the E2 BODS record would be closed in this case, but not the DB record which E2 pointed to. I'd diagram that out if I had time!

It might be a lot easier for publishers to only worry about closing relationships records

I need to think a bit more about the implications of centring the relationship records in this way. Interesting idea.

cosmin-marginean · 2023-03-10T18:50:58Z

cosmin-marginean
Mar 10, 2023

Other way around, I think. Discard all statements which are part of closed series. Then, of the remainder, retain only the latest statement in each record series.

Yes, of course, I wrote two half sentences separately and it came out wrong - it makes sense now.

Thanks for looking into this! I will ponder and re-read as well.

kd-ods · 2023-03-16T15:43:37Z

kd-ods
Mar 16, 2023
Maintainer Author

Here's a link to a few slides I created to show how the data model would change in line with this proposal.

kd-ods · 2023-05-23T17:14:58Z

kd-ods
May 23, 2023
Maintainer Author

Noting here an unintended feature of the French BO data that @Blueskies00 was looking at, most probably related to change over time.

The data contained a series of records for the same person, containing exactly the same information. It looks like there may have been updates to fields redacted for publication (e.g. person's full address) which kicked off the publication of the equivalent of a new statement.

We may want to include advice in the 0.4 documentation about best practice in this situation. Specifically: where fields are not exposed to publication via a given BODS stream/channel, an update to that field should not trigger the publication of a statement to that stream/channel.
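One way a publisher might apply that advice is to compare only the fields exposed on a given channel before issuing a new statement. The sketch below is hypothetical: `PUBLIC_FIELDS`, the record layout and the function names are invented for illustration and are not part of BODS.

```python
import hashlib
import json

# Hypothetical set of fields exposed on one publication channel.
PUBLIC_FIELDS = {"name", "nationality"}

def public_fingerprint(record, public_fields=PUBLIC_FIELDS):
    """Hash only the fields that this channel publishes."""
    visible = {k: record[k] for k in sorted(public_fields) if k in record}
    payload = json.dumps(visible, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def should_publish(previous, current):
    """True only if a published field differs from the last published version."""
    return public_fingerprint(previous) != public_fingerprint(current)

old = {"name": "A. Person", "nationality": "FR", "address": "1 Old Street"}
moved = {**old, "address": "2 New Street"}   # only a redacted field changed
renamed = {**old, "name": "A. B. Person"}    # a published field changed

print(should_publish(old, moved))    # False
print(should_publish(old, renamed))  # True
```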

jpmckinney · 2023-08-22T00:01:14Z

jpmckinney
Aug 22, 2023

I'm quite hesitant about a status field, without some real evidence that publishers have the capacity to update it correctly.

I wrote a longer comment about records and metadata more generally in #477 (comment)

tiredpixel · 2023-09-20T12:29:55Z

tiredpixel
Sep 20, 2023

I've just had a read-through of the implementation proposal for changing the statement data structure for a future version of BODS. It seems mostly okay to me, and I'm supportive of the overall idea. I'm very supportive of the idea of getting rid of replacesStatements, since this causes issues with performance, and indeed we've had to add some metadata internally to stop constantly traversing the graph client-side when querying the new Elasticsearch cluster utilising the BODS 0.2 documents. However, I do think some things need clarifying, and I also foresee some potential issues with the data model proposed (perhaps related to some misunderstanding on my part?).

We introduce the concept of a record, in recognition that publishing systems are likely to
have an updatable record for each entity and individual (and ownership-or-control
relationship) stored in their system. In the data model, the identifier that would link a
series of statements about the same object over time would therefore be called a record
ID (recordID).

This is minor, but I wonder about using such a term which is already doing a lot of heavy lifting both in data storage and informal discussion. I can foresee discussions similar to: 'the record was updated'—'wait, do you mean the record record, or the statement record', etc.

In Declaration 1 the entity declares that a single person is their beneficial owner, indirectly via
another entity. The declaring entity’s details are published in a statement with the recordId E1.
The beneficial owner’s details are published in a statement with the recordID P1. The
intermediary’s details are published in a statement with recordId E2. Details of the beneficial
ownership and legal ownership interests are published in relationships statements with
recordIds R1, R2, R3. In all six statements, the recordStatus is ‘new’.

I think the nomenclature here could lead to a little confusion. Does a record only have recordStatus new the first time it's inserted, or even in subsequent updates until such time as it's updated or closed? My understanding is that it's the latter, but this means that many records will continue to be marked as new even if they've been there for a long time. This is alright, but might lead to some confusion. Additionally, it's a little unclear to me whether recordStatus new is only until the record is connected to other entities; that is, whether it's a transitional status. Again, my understanding is that it's not.

In a second, later declaration, the name of the beneficial owner is updated. A new statement
(with a new, unique statementID) with the recordId P1 is published, but the recordStatus in this
statement is ‘updated’.

What if the name of the beneficial owner is updated a second time? The recordStatus would already be updated, and get changed to… updated? How would the versions of these records be distinguished? How do they relate to series? Would having an incrementing version number not be more resilient?

Additionally, in this declaration, the nature of the original beneficial owner’s interests are
declared to have changed. The person now has a direct stake in the declaring entity. Hence a
new statement about the relationship between the beneficial owner and the declaring entity is
made; this is an update to record R1. The records about the old intermediary (E2) and its
relationships in the ownership and control network (R2 and R3) are no longer active; statements
are published with recordStatus ‘closed’ to close these series of statements.

So does this mean the record recordStatus would be closed, or that that only happens by publishing an additional statement, linked to the existing record, marking as closed? In the latter case, does that not mean that we'd be storing statements which haven't come from any source—a kind of meta-statement? What if an entity stops appearing in one source, but continues to appear in another source? At what stage would the recordStatus be closed? And how would that interact with the statement published?

A statement series is an ordered series of statements about a single person, entity or
relationship. All statements in a given series share a Record ID. The first statement in a series
has a Record Status (recordStatus) of 'new'. The statementDate property is used to order
statements within a series.

What if multiple statements are made about an entity on the same date? I suppose this is unlikely when considering records coming from UK Companies House, for example, but what about for countries which have highly automated, digital corporate registers like Estonia? And what if an entity being updated causes a different source to update their records and make a new statement, resulting in the same date?

When a statement series is closed, the final statement’s recordStatus is set to 'closed'.

Versus the record recordStatus being set to closed? It's not clear to me whether all the data in records still has to appear in containing statements, or whether the record itself contains some information.

b. If two (or more) different declarants provide information about the same object, two (or
more) records should be created and two (or more) statement series should be opened.

But doesn't that mean that entities from different sources would have multiple records? In that case, what about sources which make statements about entities in another country (such as already happens)? And how would this be compatible with the idea that a single record refers to an entity? Making multiple records and statement series about the same entity would still necessitate some kind of multi-record reconciliation or merging process, would it not? And this would greatly complicate using the data model.

Generating point-in-time snapshots is relatively simple. i.e. discard all statements with a
statement date after X; discard all statements with a recordID belonging to a closed
series; of remaining statements, discard all but the latest in their series.

It's probably worth including some concept of a latest boolean within the statements (such as we're currently doing for recently-introduced statement metadata); otherwise, finding the latest statement would require finding the latest version each time (much easier with a series or version number than the existing replacesStatements traversal, but still not very performant).

Leaves open an option for slimming streamed updates. E.g. a statement which is an
update to a record need only contain updated fields; and a confirmed statement can be
all but empty.

It's not clear to me how this would be accomplished without introducing the notion of partial vs full statements, and also introducing traversal within the statement series to resolve partial updates into a full model. This would be much more performance-intensive, but perhaps more importantly, it would be important to be clear when importing statements from a source about whether they are updates or full statements themselves. For example, if an entity had two addresses, and then there was an updated statement with a third address, does that mean that the previous two addresses are replaced, or does it mean that there are now three addresses? And does it mean that company number, etc., is deleted, because it hasn't been mentioned, but if treated as an update, which fields are updated and what is the replacement or merging strategy? This would require further modelling, since it could become complex.
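To make the ambiguity concrete, here is a small sketch of two possible readings of the same partial update. Neither reading is defined by the proposal; the functions `apply_replace` and `apply_append` and the flat record layout are invented purely for illustration.

```python
def apply_replace(full, partial):
    """Reading 1: every field present in the update replaces the stored value."""
    return {**full, **partial}

def apply_append(full, partial):
    """Reading 2: list fields are extended, all other fields are replaced."""
    merged = dict(full)
    for key, value in partial.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            # Add only the values that are not already stored.
            merged[key] = merged[key] + [v for v in value if v not in merged[key]]
        else:
            merged[key] = value
    return merged

full = {"name": "Example Ltd", "companyNumber": "123", "addresses": ["A", "B"]}
partial = {"addresses": ["C"]}  # slimmed update that mentions only a third address

print(apply_replace(full, partial)["addresses"])  # ['C']
print(apply_append(full, partial)["addresses"])   # ['A', 'B', 'C']
```

Both readings keep `companyNumber` because it is not mentioned in the update; a third reading, in which unmentioned fields are treated as deleted, would give yet another result.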

Offers clarity around the scope of BODS, in that it does not model or represent the
transfer of ownership or control. (The flexibility and underspecified usage of
replacesStatements meant that we have tied ourselves up with questions like: can two
ownership-or-control statements replace a single ownership-or-control statement? What
does that mean - does it show that, eg, this block of shares was split in 2?)

Will the new records be exported at all? I suppose they will have to be, if relationships will be between them. So there will be statements contained within records coming from sources (and potentially including the statement series closed meta-statement…), but also statements outside of records relating records for ownership-and-control?

kd-ods · 2023-09-21T17:14:07Z

kd-ods
Sep 21, 2023
Maintainer Author

@tiredpixel - Thank you so much for taking time to interrogate these proposals: it's extremely useful to re-examine things from others' point of view. I've tried to answer as best I can below.

Does a record only have recordStatus new the first time it's inserted, or even in subsequent updates until such time as it's updated or closed? My understanding is that it's the latter, but this means that many records will continue to be marked as new even if they've been there for a long time.

The data standard is designed for data exchange so I think it helps to consider record storage and management as something done by data-handling and storage systems (e.g. a company register) that may or may not involve BODS-style structured data. So - for example - a system might not even maintain a status field, but generate a recordStatus value for a published BODS statement based on other database fields or logs.

In any case, I think a preferable way of saying "many records will continue to be marked as new even if they've been there for a long time" is that: statements are immutable and stick around and the first time a record with a given recordId value appears in a published BODS statement, the recordStatus will be 'new'. That statement will always have that status.
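As a rough sketch of how a publishing system without a stored status field might generate the value at publication time: the inputs used here (a set of previously published record IDs and an `is_active` flag) are assumptions about what such a system could derive from its own fields or logs, not something the proposal specifies.

```python
def derive_record_status(record_id, already_published, is_active):
    """Derive a recordStatus for the next statement about a record.

    already_published: set of recordId values that have appeared in earlier statements.
    is_active: whether the source system still treats the record as live.
    """
    if not is_active:
        return "closed"
    if record_id not in already_published:
        return "new"
    return "updated"

published = set()
history = []
# Four successive publications about the same hypothetical record P1.
for record_id, is_active in [("P1", True), ("P1", True), ("P1", True), ("P1", False)]:
    history.append(derive_record_status(record_id, published, is_active))
    published.add(record_id)

print(history)  # ['new', 'updated', 'updated', 'closed']
```

Each value is attached to the statement published at that moment, so the first statement keeps its 'new' status even after later statements in the series carry 'updated'.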

Additionally, it's a little unclear to me whether recordStatus new is only until the record is connected to other entities; that is, whether it's a transitional status. Again, my understanding is that it's not.

No - it's not a transitional status.

What if the name of the beneficial owner is updated a second time? The recordStatus would already be updated, and get changed to… updated? How would the versions of these records be distinguished? How do they relate to series? Would having an incrementing version number not be more resilient?

Yes, the status would stay as 'updated'. statementDate would be used for ordering. (We did consider having an incrementing instanceNumber field, but we decided that it was surplus to requirements and would be challenging for publishers to maintain. Still, we can reconsider in the future if necessary.)

So does this mean the record recordStatus would be closed, or that that only happens by publishing an additional statement, linked to the existing record, marking as closed? In the latter case, does that not mean that we'd be storing statements which haven't come from any source—a kind of meta-statement? What if an entity stops appearing in one source, but continues to appear in another source? At what stage would the recordStatus be closed? And how would that interact with the statement published?

So, I think this is where the particular use case of the OO register comes in. TBH, the Register is not currently designed to demonstrate how best to handle incoming BODS streams, update related records and then publish reconciled BO information. I'd argue that that should be the next iteration of development! I'd say that, if the Register was designed as a demonstration system, then it might publish a recordStatus 'closed' statement for an entity once all records about that entity in incoming streams are closed.

What if multiple statements are made about an entity on the same date? I suppose this is unlikely when considering records coming from UK Companies House, for example, but what about for countries which have highly automated, digital corporate registers like Estonia?

In BODS 0.4, the statementDate field will accept timestamps, date-time info.
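A consumer ordering a series that mixes plain dates and date-times might do something like the sketch below. It assumes ISO 8601 strings, treats a date-only value as midnight, and treats values without an offset as UTC; those choices are assumptions made for the example rather than rules from BODS.

```python
from datetime import datetime, timezone

def sort_key(statement):
    """Parse statementDate as a date or a date-time, normalised to an aware datetime."""
    parsed = datetime.fromisoformat(statement["statementDate"])
    if parsed.tzinfo is None:
        # Assumption: values without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

# Invented series: three statements published on the same day.
series = [
    {"statementId": "s3", "statementDate": "2023-12-12T15:30:00+00:00"},
    {"statementId": "s1", "statementDate": "2023-12-12"},
    {"statementId": "s2", "statementDate": "2023-12-12T09:00:00+00:00"},
]

ordered = sorted(series, key=sort_key)
print([s["statementId"] for s in ordered])  # ['s1', 's2', 's3']
```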

And what if an entity being updated causes a different source to update their records and make a new statement, resulting in the same date?

That's fine. You'd have two statements from different sources on the same date and about the same entity. The statement model in BODS exists because of considerations like that.

I'd love for the next iteration of the Register to demonstrate how competing statements about the same real world thing can be handled instructively. So, for example, imagine Source A updated the name of a company to MERIDIAN INC on 12th Dec but Source B updated it to MERIDIAN INC on the 14th Dec. Then if you search for the company in the Register you'd see that there was an information conflict for a couple of days.

Versus the record recordStatus being set to closed? It's not clear to me whether all the data in records still has to appear in containing statements, or whether the record itself contains some information.

Record information is always wrapped in a statement. (That is how we're 'repackaging' things, compared with previous versions of BODS: see the spreadsheet linked to from #477.) So, a statement's recordStatus is shorthand for 'the status of the record which holds entity/person/relationship information given in a statement'.

But doesn't that mean that entities from different sources would have multiple records? In that case, what about sources which make statements about entities in another country (such as already happens)? And how would this be compatible with the idea that a single record refers to an entity? Making multiple records and statement series about the same entity would still necessitate some kind of multi-record reconciliation or merging process, would it not? And this would greatly complicate using the data model.

Again, I think we're talking about the particular use case of the OO register here. Ideally, yes, the Register would be handling statements from different sources about the same objects (entities, people, relationships). (And each of those sources would have their own separate records for those objects.) The question for the Register (in the future!) is: how are those information streams handled? The answer depends on what the purpose of the Register is (in the future).

It's probably worth including some concept of a latest boolean within the statements (such as we're currently doing for recently-introduced statement metadata); otherwise, finding the latest statement would require finding the latest version each time

That might work for internal information-handling, but not for publishing: it would break immutability. A statement would only have latest true until a more recent statement about the same object is made.

It's not clear to me how this would be accomplished without introducing the notion of partial vs full statements, and also introducing traversal within the statement series to resolve partial updates into a full model. [...] This would require further modelling, since it could become complex.

Yes - I agree. Much more thought would need to be given to any slimmed-down publishing format.

Will the new records be exported at all?

'Record' is really being used as a concept in BODS 0.4 to refer to a record that exists in a publisher's database or system. Data held in those records is published as point-in-time snapshots via BODS Statement objects. We'll be doing a thorough update of the docs to explain the conceptual model and the data model and how they relate to one another.

tiredpixel · 2023-09-22T10:08:01Z

tiredpixel
Sep 22, 2023

@kd-ods, thank you for taking the time to answer my questions so comprehensively. The proposal makes a lot more sense to me, now.

I can see that one thing which was causing confusion was my OO Register-centric thinking, whereas of course the BODS is designed to be far more agnostic than that. Given that OO Register 2 mostly stores things directly in BODS format, this led me to think that perhaps some things should be included in the new standard. But I understand your point about latest breaking immutability, and see that in fact this is an OO Register 2 implementation detail. (I myself likely would have designed OO Register data implementation differently.)

It's great that statementDate will accept timestamps.

I'd love for the next iteration of the Register to demonstrate how competing statements about the same real world thing can be handled instructively. So, for example, imagine Source A updated the name of a company to MERIDIAN INC on 12th Dec but Source B updated it to MERIDIAN INC on the 14th Dec. Then if you search for the company in the Register you'd see that there was an information conflict for a couple of days.

This is very interesting. I, too, would be curious to see how these divergent or forked views of reality could be modelled and utilised effectively. Perhaps it would be instructive for OO Register to introduce the concept of an observer, at least in thinking about the implementation. (Out of scope for here, I know, but interesting to think about.)

So, to try to ensure I now understand the proposal, please permit me to make some basic statements, from an alternative point of view. I'd be very grateful if you could confirm or correct my refined view of things.

BODS Statements are just the data structure by which something is 'stated'—that is, a description of reality at a specific point in time (statementDate).
The fact that OO Register 2 takes data from sources and transforms them into BODS Statements and stores those within separate 'streams' (i.e. indexes in Elasticsearch) is an implementation detail, not related to the BODS per se.
Whether and how those records are deduplicated or merged is again an OO Register 2 implementation detail; the BODS has nothing to say about that.
The output from such systems, as BODS Statements, is the focus of things here. Currently, those Statements do not easily or reliably denote which real-world entities they are concerning (whether corporate entity, human, etc.) (other than in identifiers, but these are optional).
This proposal is to make BODS Statements refer to a BODS Record, which will have a persistent recordID to unite BODS Statements. e.g. UK Companies House data might update the name of a company registered in Scotland, in which case, the GB/PSC source scope and SCXXX company number might be extracted to identify the BODS Record somehow.
recordStatus will start off as new, and change to updated for each subsequent update. Eventually, if it is determined that all (or some authoritative) source considers that no more data will appear for that entity (or possibly that the entity is dissolved or deceased or for some other reason removed), the recordStatus will change to closed.
Whether or not the recordStatus is stored or calculated based on the data coming from different sources is again an implementation detail; all that matters for the BODS is whether BODS Statements are continuing to be published for that entity.

kd-ods · 2023-09-22T13:45:09Z

kd-ods
Sep 22, 2023
Maintainer Author

* BODS Statements are just the data structure by which something is 'stated'—that is, a description of reality at a specific point in time (`statementDate`).

Yes

* The fact that OO Register 2 takes data from sources and transforms them into BODS Statements and stores those within separate 'streams' (i.e. indexes in Elasticsearch) is an implementation detail, not related to the BODS per se.

Yes

* Whether and how those records are deduplicated or merged is again an OO Register 2 implementation detail; the BODS has nothing to say about that.

Yes

* The output from such systems, as BODS Statements, is the focus of things here. Currently, those Statements do not easily or reliably denote which real-world entities they are concerning (whether corporate entity, human, etc.) (other than in `identifiers`, but these are optional).

Well, Statements (or, more precisely the recordDetails object inside a statement) should reliably refer to a real world entity or person (or relationship). In the case of persons and entities via the identifiers array.

* This proposal is to make BODS Statements refer to a BODS Record, which will have a persistent `recordID` to unite BODS Statements. e.g. UK Companies House data might update the name of a company registered in Scotland, in which case, the GB/PSC source scope and SCXXX company number might be extracted to identify the BODS Record somehow.

This isn't quite accurate. "BODS Statements refer to a BODS Record". No, BODS statements refer to a record held in a publisher's system. So if we think about the mapping that we currently have of UK Companies House PSC data to BODS 0.2, we need to identify within that PSC data source which id can reliably be mapped to the new recordId field in BODS 0.4. (The self or etag values might be candidates.) So ideally, when we map source BO data to BODS 0.4 there will be some kind of id which maps to recordId and if not, we would need to understand exactly how data about an entity or person looks when it is updated over time, so that we might construct a recordId value.

* `recordStatus` will start off as `new`, and change to `updated` for each subsequent update. Eventually, if it is determined that all (or some authoritative) source considers that no more data will appear for that entity (or possibly that the entity is dissolved or deceased or for some other reason removed), the `recordStatus` will change to `closed`.

Yes, perhaps. Again, this is an implementation detail for the Register. It's probably helpful to consider that the Register works: (1) To ingest several streams of BODS data. (2) To manage, process and display BO data. (3) To provide an export of processed and merged BO data in BODS format. (There is also a prior stage (0) Mapping source data to BODS.) So when you say "the recordStatus will change to closed", I think you're referring to records at stage (2). But actually those records don't have to have a recordStatus necessarily (BODS is a format for exchanging data, not storing or managing it). At stage (3), though, when a BODS statement is published about a record, the recordStatus will be closed.

* Whether or not the `recordStatus` is stored or calculated [for stage (3)] based on the data coming from different sources [at stage (1)] is again an implementation detail; all that matters for the BODS is whether BODS Statements are continuing to be published for that entity [stage (3)].

Exactly.

tiredpixel · 2023-09-25T10:46:34Z

tiredpixel
Sep 25, 2023

@kd-ods, thank you for the confirmations and corrections to my understanding. This discussion has been very useful in helping me understand the proposal and BODS more comprehensively.

jpmckinney · 2023-09-25T21:51:43Z

jpmckinney
Sep 25, 2023

Records

I think it's important to always keep in mind what facts BODS intends to represent. I believe it intends to represent declarations (made by declarants) that contain statements relevant to the domain of beneficial ownership (about control, etc.).

The record proposal expands the scope of BODS, to cover the representation of these facts within source systems. However, other than as an attempt to clarify changes over time, there are no expressed use cases or user needs for this new scope.

As such, I would strongly encourage keeping BODS to its existing scope, and exploring alternative solutions to changes over time.

Changes over time

Within government-controlled systems and processes, my understanding is that the current BO information for a given declarant (company) is whatever is in their most recent declaration. No complex algorithm or special logic is required to disentangle what information is true at a given point in time – you just pull up the most recent declaration prior to the given point in time.

Note: If there are multiple, conflicting declarations about the same entities from different sources, no amount of standardization will help users – users will need to decide for themselves which sources they trust. This problem of changes over time can therefore be narrowed to changes within a given system.

Are there any real, experienced issues with the above solution? (Other than statementDate not (yet) being a required field, but I don't see any hope of a publisher correctly setting statuses without also having dates available.) I know we can dream up scenarios, but we're trying to solve for the real world.

Identifiers

#392 (and thus the record proposal) mixes in another issue that isn't directly related to changes over time.

Small changes require large changes

Any change to a statement requires a new statementID to be created.

This means that even if something small has changed, a lot of related statements may have to be amended. For example, a name correction for a person statement will require every ownership-or-control statement with that person as an interestedParty to be changed to include the new statementID.

The fundamental problem is that BODS doesn't have people or organizations, it just has statements. In BODS-land, there is no RDF node "Bob" about which Alice states "his eyes are green" (ID 1), and Vaughn states "his eyes are brown" (ID 2), and Noah states "his eyes are blue" (ID 3). There are only their free-floating statements. So, when Alice wants to correct with "his eyes are brown" (ID 4), then every "1" needs to be changed to a "4" – because BODS has no way to identify "Bob".

The record proposal is saying: Alice has a compartment in her head about Bob, let's call it "Record B". And similarly for Vaughn and Noah. When Alice makes statement ID 4, nothing needs to be updated, because everything is referring to Record B, which is constant.
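The difference between the two linking styles can be sketched roughly as follows. The data shapes, the `record-b` identifier and both helper functions are hypothetical and much simpler than real BODS statements; the point is only to count how many relationship references have to change when Alice issues statement 4.

```python
# Statement-ID linking: each relationship points at one specific statement about Bob.
person_statements = {"1": {"name": "Bob", "eyes": "green"}}
relationships_by_statement = [
    {"subject": "acme", "interestedParty": "1"},
    {"subject": "widgetcraft", "interestedParty": "1"},
]


def correct_by_statement(new_id, old_id, details):
    """Publish a corrected statement; return how many relationships had to be rewritten."""
    person_statements[new_id] = details
    touched = 0
    for rel in relationships_by_statement:
        if rel["interestedParty"] == old_id:
            rel["interestedParty"] = new_id  # every reference must follow the new ID
            touched += 1
    return touched


# Record-ID linking: relationships point at a stable record ("Record B");
# statements accumulate under that record.
record_statements = {"record-b": [{"statementId": "1", "eyes": "green"}]}
relationships_by_record = [
    {"subject": "acme", "interestedParty": "record-b"},
    {"subject": "widgetcraft", "interestedParty": "record-b"},
]


def correct_by_record(record_id, statement_id, details):
    """Publish a corrected statement under an existing record; relationships are untouched."""
    record_statements[record_id].append({"statementId": statement_id, **details})
    return 0  # no relationship reference changes


rewritten = correct_by_statement("4", "1", {"name": "Bob", "eyes": "brown"})
untouched = correct_by_record("record-b", "4", {"eyes": "brown"})
```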

I think a much simpler, clearer and natural solution is this: Add Person and Organization classes to BODS. Forget about "records". The result is the same – but much less confusing.

The only difference is perhaps that you might expect "Bob, the person" to be the same node for Alice, Vaughn and Noah – whereas you are okay accepting that "Bob, the mental compartment" is distinct for each of Alice, Vaughn and Noah.

But... we're not trying to solve the holy grail of the semantic web here. On the web and all over RDF, different sources make statements about the same thing without standardizing identifiers or URLs, all the time. So, I wouldn't worry about allowing the UK to have a Person with one ID and France to have the same person with another ID - that's totally normal and not an issue with this new proposal.

8 replies

kd-ods Oct 23, 2023
Maintainer Author

As you suggest, the simplest is to just change the original data. This breaks immutability – but, honestly, unless there are very strong use cases for preserving immutability, then it is so much simpler (for both users and publishers) to allow clerical errors as an exception to immutability.

Yes - we will need to treat certain types of correction and post-hoc redaction as special cases. So we will be outlining in future versions of the standard the circumstances under which statements need not be immutable.

I'm curious about the Person and Organization classes idea. Where would those sit within BODS? Would they be new entities, or within Statement somehow?

The record proposal expands the scope of BODS, to cover the representation of these facts within source systems.

Well, I would say that the proposal clarifies that BODS covers the representation of these facts within source systems. Implicitly, that was always the case.

If there are multiple, conflicting declarations about the same entities from different sources, no amount of standardization will help users – users will need to decide for themselves which sources they trust. This problem of changes over time can therefore be narrowed to changes within a given system. [...] Are there any real, experienced issues with the above solution?

Yes - changes within a given system are what we need to handle. But there can be multiple records about the same entity in a system. This is because of the graph / network nature of beneficial ownership information (and its overlap with legal ownership and company registration information). For example, in the UK PSC register there might be a record for Acme Corp as an organisation which is declaring its beneficial ownership. It might also be disclosed as a Relevant Legal Entity for widgetCraft Ltd, whereupon another record is created. The source (on which obligation for info disclosure rests) is different in each case. There are two distinct threads of declared information about the same entity: hence two records.

[...] the only data representation of a person or organization in BODS would be an identifier (assigned by the publishing system).

If I've understood you correctly @jpmckinney, you're suggesting what I've called a recordId. Though you'd call it a classItemId or similar?

jpmckinney Oct 23, 2023

Well, I would say that the proposal clarifies that BODS covers the representation of these facts within source systems. Implicitly, that was always the case.

This is splitting hairs, but I don't consider "implicit representation" as representation at all. In my usage of these words, only information that is explicitly contained in data can be said to be "represented in" or "covered by" the data standard. That is the sense in which the scope has increased. (There are definitely more fields now to disclose more facts - hence the scope has increased.)

For example, in the UK PSC register there might be a record for Acme Corp as an organisation which is declaring its beneficial ownership. It might also be disclosed as a Relevant Legal Entity for widgetCraft Ltd, whereupon another record is created.

The source (on which obligation for info disclosure rests) is different in each case. There are two distinct threads of declared information about the same entity: hence two records.

This is a useful example. If these declarations were made on paper, the UK PSC register would have two documents: Acme Corp's document and widgetCraft Ltd's document (which references Acme Corp). What does widgetCraft Ltd's document say about Acme Corp, other than its name (excluding details about their relationship, which are not "about" Acme Corp, but about their relationship)?

If we try to imagine their database, one reasonable set of tables might be:

declaration table, with an organization_id for the declarant, which links to the organization table, (plus other columns about the declaration).
organization table, with information supplied by the organization when registering an account for electronic submission of their declarations. I don't know if there is any such information that is both public and invariant (e.g. looking at the company declaration sheet of the sample form). If it's all variable between declarations, then there might not be much on this table, and so the "organization" is really just an ID – as I described in an earlier comment.
legalownership table, containing data from the legal ownership sheet, with a declaration_id and some columns about the interested party (plus other columns). Knowing most countries (and OO's advocacy), many countries don't collect identifiers, etc. and so the information might be too limited to disambiguate/reconcile the interested party. Let's say it's just a name and jurisdiction.
(More tables for other facts)

In this scenario, are you saying BODS would ask the publisher to invent a "record ID" to represent each name-jurisdiction pair on widgetCraft Ltd's declaration?
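To make the imagined tables concrete, here is a rough sketch using Python dataclasses. The table and column names follow the description above where it gives them (`organization_id`, `declaration_id`); everything else, including the sample rows and the `name_jurisdiction_pairs` helper, is a hypothetical illustration and not a description of the actual UK PSC database.

```python
from dataclasses import dataclass


@dataclass
class Organization:
    id: int  # possibly the only stable column, if everything else varies per declaration


@dataclass
class Declaration:
    id: int
    organization_id: int  # the declarant


@dataclass
class LegalOwnership:
    declaration_id: int
    interested_party_name: str
    interested_party_jurisdiction: str


def name_jurisdiction_pairs(rows, declaration_id):
    """Distinct (name, jurisdiction) pairs disclosed on one declaration."""
    return {
        (r.interested_party_name, r.interested_party_jurisdiction)
        for r in rows
        if r.declaration_id == declaration_id
    }


# Hypothetical rows: declaration 2 stands in for widgetCraft Ltd's declaration.
rows = [
    LegalOwnership(2, "Acme Corp", "GB"),
    LegalOwnership(2, "Acme Corp", "GB"),
    LegalOwnership(2, "Other Ltd", "FR"),
    LegalOwnership(1, "Someone Else", "GB"),
]
pairs = name_jurisdiction_pairs(rows, 2)
```

Each element of `pairs` is the kind of name-jurisdiction pair the question above is asking about.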

jpmckinney Dec 6, 2023

Ping

kd-ods Dec 7, 2023
Maintainer Author

Hi @jpmckinney -

I dealt with the more general issue in my wrap up comment:

There was a question about forcing publishers to create different recordId values for - eg - the same entity appearing in different sets of declarations. We will support flexibility, so that where this is not practical or meaningful for a given publisher, there will be a range of options for mapping a suitable existing field to the recordId.

I realise that your question relates to a single declaration, but the handling will be the same: a recordID will need to be derived from a suitable field in the publisher's system.
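One way a publisher might derive a record identifier from an existing field is sketched below. The thread does not specify a derivation method, so this is purely an assumption: the `derive_record_id` helper, the choice of SHA-256, the 32-character truncation and the sample keys are all hypothetical.

```python
import hashlib


def derive_record_id(source_system: str, source_key: str) -> str:
    """Deterministically derive an identifier from a field the publisher already holds.

    The same (source_system, source_key) pair always yields the same value,
    so successive statements about the same record can share it.
    """
    digest = hashlib.sha256(f"{source_system}:{source_key}".encode("utf-8")).hexdigest()
    return digest[:32]  # truncation length is an arbitrary choice for this sketch


first = derive_record_id("example-register", "company-0001")
again = derive_record_id("example-register", "company-0001")
other = derive_record_id("example-register", "company-0002")
```

Because the derivation is deterministic, `first` and `again` are equal while `other` differs, so nothing beyond the existing source field needs to be stored.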

jpmckinney May 21, 2024

My ping had been about my entire comment at the start of this thread, most of which was not engaged with by BODS maintainers, though it did resonate with tiredpixel. Basically, I described alternative directions to the record proposal after describing some of the potential challenges with the record proposal, but there was no real consideration of these alternatives.

kd-ods · 2023-12-06T09:38:04Z

kd-ods
Dec 6, 2023
Maintainer Author

Thanks for everyone’s input and scrutiny on this proposal.

Response to issues raised above

We will be using the ‘record’ concept explicitly to support the handling of changing beneficial ownership over time, and the representation of the lifecycle of data within publishing systems.

There was a question about forcing publishers to create different recordId values for - eg - the same entity appearing in different sets of declarations. We will support flexibility, so that where this is not practical or meaningful for a given publisher, there will be a range of options for mapping a suitable existing field to the recordId.

Systems for handling beneficial ownership data will have various ways of managing the lifecycle of their records. For those with ‘low-resolution’ detail of beneficial ownership over time, it may not be practical to populate a recordStatus field. That field will not therefore be required. (This also leaves open the option of an implementation focusing on the lifecycle of relationship records only, as per Cosmin’s question.)

Accepted proposal

Bringing it all together, we will be implementing this proposal, in line with the initial proposal summary, with the following nuances:

recordId will be required
recordStatus will not be required
recordType will be required
recordDetails will be required
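As an illustrative sketch only, the required and optional fields listed above could be checked like this. The `missing_required` helper and the sample values (such as `"entity"` for recordType) are hypothetical; this is not the official BODS schema or validator.

```python
# Field names come from the list above; the check itself is a hypothetical helper.
REQUIRED_FIELDS = ("recordId", "recordType", "recordDetails")
OPTIONAL_FIELDS = ("recordStatus",)


def missing_required(statement: dict) -> list:
    """Return the required field names that are absent from `statement`."""
    return [field for field in REQUIRED_FIELDS if field not in statement]


complete = {"recordId": "r1", "recordType": "entity", "recordDetails": {}}
partial = {"recordId": "r1", "recordStatus": "closed"}

complete_gaps = missing_required(complete)
partial_gaps = missing_required(partial)
```

A statement that omits recordStatus still passes, while one that omits recordType or recordDetails does not.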

0 replies
