Skip to content
This repository has been archived by the owner on May 21, 2024. It is now read-only.

Wait for secondaries before installing #1533

Merged
merged 15 commits into from
Feb 14, 2020

Conversation

lbonn
Copy link
Contributor

@lbonn lbonn commented Jan 23, 2020

This is quite big and will need more tests.

-- Don't modify this! Create a new migration instead--see docs/schema-migrations.adoc
SAVEPOINT MIGRATION;

ALTER TABLE ecu_serials ADD COLUMN sec_type TEXT NOT NULL DEFAULT "";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following comment is not exactly related to this PR and is not blocker for its merging, just simply general comment to the overall data model of aktualizr.

I think, it makes sense to have a notion of ECU in the data model (not ecu_serial) and corresponding table in the DB (ECU table), so manifest, public_key, etc are attributes of ECU not ecu_serials entity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've also wondered if it should just be another table.

As we've added fields to ecu_serials it has effectively become what you describe and would probably more aptly named ecus.

@@ -141,20 +147,53 @@ class SecondaryWaiter {
std::unordered_set<std::string> secondaries_to_wait_for_;
};

static Secondaries createIPSecondaries(const IPSecondariesConfig& config) {
static Secondaries createIPSecondaries(const IPSecondariesConfig& config, bool provision, Aktualizr& aktualizr) {
Copy link
Collaborator

@mike-sul mike-sul Jan 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dependency of the IP Secondary factory on Aktualizr is questionable. Actually, it's me who implicitly introduced this dependency by putting createIPSecondaries() implementation in this file :) , it should be under libaktualizr-posix folder, IMHO. Anyway, this is not exactly comment to this PR, just thing that we might need to consider at some point.


sec_waiter.wait();
} else {
auto secondaries_info = aktualizr.GetRegisteredSecondaries();
Copy link
Collaborator

@mike-sul mike-sul Jan 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the secondaries that have been added to the aktualizr instance be 'not registered', if not then perhaps renaming of GetRegisteredSecondaries() to just GetSecondaries() would make sense. Or, as an option, storage_ can be passed to createIPSecondaries instead of aktualizr so we can call storage_->loadSecondaryInfo(&info); here and replace dependency on Aktualizr with dependency on Storage which looks better since it reduces cross-dependencies, IMHO. Or even more better, just pass exactly what is required to do the job here, which is std::vector<SecondaryInfo>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetSecondaries() would probably work.

My first version was also passing the storage around, the problem I had is that call originates from src/aktualizr_primary/main.cc directly where the underlying storage object is not accessible.

Passing std::vector<SecondaryInfo> might be fine or maybe too specific? Also, because of the previous point, the aktualizr method would still be needed in the current state.

throw std::runtime_error(err_ss.str());
}

// note: order in configuration and storage should be the same
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps, IP and port could be stored in the DB too, so an order that secondaries are listed in the conf won't matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the same vein as the previous comment, it is made harder by the fact we've abstracted the ip secondary concept from SotaUptaneClient and the storage.

One version I've explored was to add another free form string column to the ecu_serials table and SecondaryInfo that might have to be re-parsed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to avoid tables with secondary type-specific columns, so the free-form string might be the best option. Tedious but might be the most future-proof.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that adding secondary specific columns into a generic table (ecu_serials here) is a bad idea. I see a few options here:

  1. extend the existing ecu_serials table with an opaque column (e.g. sec_data, sec_info, etc) intended for serialized secondary specific data required for its instantiation;

  2. don't alter the existing ecu_serials table. Introduce another table ecu with the generic fields + the column containing serialized secondary specific data;

  3. Introduce another table ip_secondary that contains the ip secondary specific info. This table is managed solely by the IP secondary factory object/method/function. In this case we can avoid dependency on Aktualizr object at all, what is needed just a path to the DB if we would like to store this table in the same DB file.

  4. Introduce two new tables ECU containing generic columns + the ip_secondary specific table.

}

for (auto sec_it = targeted_secondaries.begin(); sec_it != targeted_secondaries.end();) {
if (sec_it->second->getManifest() != Json::Value()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sec_it->second->getManifest() sets up a TCP connection with Secondary, a timeout of the setting up TCP connection really depends on multiple factors and configuration of TCP stack at a kernel level. The same is true for write/read to/from socket, so it won't be 10 minutes for sure and actually will significantly differs in a case of a local host and a non-local one (http://willbryant.net/overriding_the_default_linux_kernel_20_second_tcp_socket_connect_timeout)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's true, I will rework something.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, why not just do certain amount of retries within getManifest() (I mean Asn1Rpc call) or keep retrying until certain time if we need to wait at least some period of time.

@lbonn lbonn force-pushed the feat/OTA-4279/campaign-secondary-wait branch from a7fdec1 to 5b6b169 Compare January 24, 2020 16:15
@codecov-io
Copy link

codecov-io commented Jan 24, 2020

Codecov Report

Merging #1533 into master will decrease coverage by 0.08%.
The diff coverage is 84.4%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #1533      +/-   ##
=========================================
- Coverage   82.19%   82.1%   -0.09%     
=========================================
  Files         186     187       +1     
  Lines       11494   11744     +250     
=========================================
+ Hits         9447    9642     +195     
- Misses       2047    2102      +55
Impacted Files Coverage Δ
src/libaktualizr/primary/aktualizr.h 100% <ø> (ø) ⬆️
src/libaktualizr/primary/initializer.h 100% <ø> (ø) ⬆️
src/libaktualizr/storage/sqlstorage.h 50% <ø> (ø) ⬆️
src/libaktualizr/uptane/manifest.h 100% <ø> (ø) ⬆️
...c/virtual_secondary/partialverificationsecondary.h 63.63% <0%> (-14.15%) ⬇️
src/aktualizr_secondary/aktualizr_secondary.h 60% <0%> (-40%) ⬇️
src/libaktualizr/uptane/isotpsecondary.h 0% <0%> (ø)
src/aktualizr_primary/main.cc 83.03% <100%> (-0.16%) ⬇️
src/libaktualizr-posix/ipuptanesecondary.h 100% <100%> (ø) ⬆️
src/libaktualizr/storage/invstorage.h 97.95% <100%> (+0.39%) ⬆️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ce42c5d...ba40f33. Read the comment docs.

@lbonn lbonn force-pushed the feat/OTA-4279/campaign-secondary-wait branch from 5b6b169 to 3f22919 Compare January 24, 2020 16:57
@lbonn
Copy link
Contributor Author

lbonn commented Jan 24, 2020

Looks like I've overlooked something, e2e62c6 breaks Uptane.UptaneSecondaryMisconfigured

@pattivacek
Copy link
Collaborator

Looks like I've overlooked something, e2e62c6 breaks Uptane.UptaneSecondaryMisconfigured

Not necessarily unexpected, right? That test might need some reworking to still be meaningful.

if (secmanifest == Json::Value()) {
// could not get the secondary manifest, a cached value will be provided
std::string cached;
if (storage->loadCachedEcuManifest(ecu_serial, &cached)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This 'manifest caching' functionality looks like IP Secondary specific functionality, so I would 'hide' it behind the secondary interface and implemented it in IpUptaneSecondary::getManifest().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like that also, but there should then be a way to access the storage from there. To evaluate

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe dedicated table in sql.db or even DB that is initialized in IP secondary factory (preferable) or ctor.

// note: this fail after a time out but will be retried at the next install
// phase if the targets have not been changed. This is done to avoid being
// stuck in an unrecoverable state here
if (!waitSecondariesReachable(updates)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that this functionality belongs here, IMHO, this is secondary specific functionality and should not be in a generic code. Why we wait just here before the sending metatada but not wait before sending actual images, secondary can be available at the first step but becomes not available before the second (installation) step ?
This rhetorical question is another argument for having this wait or retry policy withing the IP secondary implementation, specifically, within IpUptaneSecondary implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. We're likely to need a generic way to wait for Secondaries regardless of protocol. Obviously for something like virtual Secondaries it can just return true. And if we can improve this later, that's fine too.

@@ -14,6 +14,7 @@ class SecondaryInterface {
using Ptr = std::shared_ptr<SecondaryInterface>;

public:
virtual std::string Type() const = 0;
Copy link
Collaborator

@mike-sul mike-sul Jan 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, this Type() as well as the logic of passing around of the registered secondaries can be eliminated if we persist required information about IP secondaries within the IP secondary builder/factory, e.g. in createIPSecondaries under if (provision) { branch. Therefore, we won't need aktualizr instance here and in its else we won't need to secondaries_info.erase(, we will just load the info from the DB.

Actually, I think, even provision parameter is not required here. In createIPSecondaries, we just check if the info about the secondary is persisted in the DB if so, just `Uptane::IpUptaneSecondary::connectAndCheck() if not connectAndCreate() + persist the info into the DB. What we need here, is just an instance of the object that can operate over the DB, BTW, I think, it's not necessary has to be the same instance that the sotauptaneclient owns as we can have IP secondary specific table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the provision around was indeed needlessly complicated, the last version is a bit more simple.

I'll continue the investigations about the Type() method and dedicated storage.

@mike-sul
Copy link
Collaborator

Regardless of the solution/PR design, I think, it makes sense to add test(s) into ipsecondary_test.py testing the given use-case.

@lbonn lbonn force-pushed the feat/OTA-4279/campaign-secondary-wait branch 4 times, most recently from b18a231 to d631a21 Compare February 3, 2020 17:12
@lbonn
Copy link
Contributor Author

lbonn commented Feb 3, 2020

Here is another version with two tables ecus and secondary_ecus, it uses some more complicated sql.

@lbonn lbonn force-pushed the feat/OTA-4279/campaign-secondary-wait branch 2 times, most recently from 2e631a4 to 0fbd57f Compare February 5, 2020 14:48
@lbonn
Copy link
Contributor Author

lbonn commented Feb 5, 2020

The version currently pushed only caches the manifest in memory and it is indeed simpler. But I am gonna revert to put it in the storage somehow because it doesn't fit the bill: if the secondary never gets up between the primary start and the attempted install, the backend marks the install as failed before it can start as the manifest is incomplete. And we clearly want to support this case.

tests/ipsecondary_test.py Outdated Show resolved Hide resolved
SAVEPOINT MIGRATION;

ALTER TABLE ecu_serials RENAME TO ecus;
CREATE TABLE secondary_ecus(serial TEXT PRIMARY KEY, sec_type TEXT NOT NULL, public_key_type TEXT NOT NULL, public_key TEXT NOT NULL, FOREIGN KEY(serial) REFERENCES ecus ON DELETE CASCADE ON UPDATE CASCADE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, these two tables should not know anything about each other. libaktualizr should not know details of any secondary implementation. In ideal world the IP secondary should have its own dedicated persistent layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the real world is all about compromises.

@lbonn lbonn force-pushed the feat/OTA-4279/campaign-secondary-wait branch from 0d0ece5 to 71cdefc Compare February 12, 2020 14:50
Not used anywhere now

Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
It does not seem to have a meaningful purpose here and is needlessly
limiting.

Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
The extra check in `addNewSecondary()` can be added anywhere

Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
@lbonn lbonn force-pushed the feat/OTA-4279/campaign-secondary-wait branch from 71cdefc to f9ea804 Compare February 13, 2020 16:10
It duplicates information and can be easily replaced by an helper
function.

Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
@lbonn lbonn force-pushed the feat/OTA-4279/campaign-secondary-wait branch 2 times, most recently from 98ecfdc to 505166c Compare February 14, 2020 09:38
pattivacek
pattivacek previously approved these changes Feb 14, 2020
Copy link
Collaborator

@pattivacek pattivacek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you still document somewhere (maybe in the ticket?) what might still require some followup?

src/libaktualizr/primary/sotauptaneclient.cc Outdated Show resolved Hide resolved
'''Test Aktualizr with multiple ip secondaries'''

with aktualizr, secondary, secondary2:
aktualizr.wait_for_completion()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all happy-path, with everything more or less starting simultaneously, right? Can we extended this for the cases when one or more Secondaries don't come up immediately?

Secondary type, public key and an "extra" field.

Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
We have to store some additional data to be able to pick it back up.

Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
And fix some other test after changes

Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
If we did not store a secondary public key (migrated from old storage
for example):
- store it later when we get it
- do not compare against the actual one, but do it on subsequent tries

Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
@lbonn lbonn force-pushed the feat/OTA-4279/campaign-secondary-wait branch 2 times, most recently from e389c18 to 10638d8 Compare February 14, 2020 14:53
Solves some headaches

Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
@lbonn lbonn force-pushed the feat/OTA-4279/campaign-secondary-wait branch from 10638d8 to 83d0aa9 Compare February 14, 2020 15:09
@lbonn lbonn force-pushed the feat/OTA-4279/campaign-secondary-wait branch from 83d0aa9 to c927768 Compare February 14, 2020 15:30
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
@lbonn lbonn force-pushed the feat/OTA-4279/campaign-secondary-wait branch from c927768 to ba40f33 Compare February 14, 2020 15:50
@lbonn lbonn merged commit 12bd804 into master Feb 14, 2020
@lbonn lbonn deleted the feat/OTA-4279/campaign-secondary-wait branch February 14, 2020 16:28
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants