This repository has been archived by the owner on Jun 21, 2020. It is now read-only.

[Ppal Node]: Retries for connecting to sgx.enigma.co #98

Closed
lacabra opened this issue Mar 22, 2019 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@lacabra
Contributor

lacabra commented Mar 22, 2019

Is your feature request related to a problem? Please describe.
I got the following error while launching the Principal Node:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Io(Custom { kind: TimedOut, error: StringError("timed out") }), url: Some("https://sgx.enigma.co/api") }

I don't know whether this would also be an issue in Core. Cc: @elichai

https://sgx.enigma.co/api was indeed online at the time, and a subsequent manual request succeeded.

Describe the solution you'd like
I would suggest building in at least 3 retries for this connection; as it stands, a single failed attempt crashes the Principal Node.
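
A minimal sketch of what such a retry could look like, assuming a generic helper rather than anything that exists in enigma-core today. The name `retry` and its parameters are hypothetical; the closure stands in for whatever call currently fails (for example the request to the attestation service).

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: calls `op` up to `max_attempts` times, sleeping
/// `delay` between failed attempts. Returns the first `Ok`, or the last
/// `Err` if every attempt fails.
pub fn retry<T, E, F>(max_attempts: u32, delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                // Out of attempts: hand the last error back to the caller
                // instead of panicking.
                if attempt >= max_attempts {
                    return Err(err);
                }
                attempt += 1;
                sleep(delay);
            }
        }
    }
}
```

With a helper like this, the caller would wrap the failing request in a closure and handle the returned `Err` once all attempts are exhausted, rather than calling `unwrap()` on the first result.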

Describe alternatives you've considered
N/A

Additional context

Backtrace of the error above:

principal_1   | stack backtrace:
principal_1   |    0: failure::backtrace::internal::InternalBacktrace::new::h012a694690d84db0 (0x55cb161236fe)
principal_1   |              at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.5/src/backtrace/internal.rs:44
principal_1   |    1: failure::backtrace::Backtrace::new::h1c372c7f484dc91c (0x55cb161231cd)
principal_1   |              at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.5/src/backtrace/mod.rs:111
principal_1   |    2: <failure::error::error_impl::ErrorImpl as core::convert::From<F>>::from::h7e6fd421c5f4b853 (0x55cb15852b2f)
principal_1   |              at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.5/src/error/error_impl.rs:19
principal_1   |    3: <failure::error::Error as core::convert::From<F>>::from::h637432e372f33cfc (0x55cb15840122)
principal_1   |              at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.5/src/error/mod.rs:36
principal_1   |    4: enigma_tools_u::attestation_service::service::AttestationService::send_request::h043a328328fe84e7 (0x55cb15819ef2)
principal_1   |              at /root/enigma-core/enigma-tools-u/src/attestation_service/service.rs:118
principal_1   |    5: enigma_tools_u::attestation_service::service::AttestationService::get_report::{{closure}}::h1da9c57383505c95 (0x55cb15831782)
principal_1   |              at /root/enigma-core/enigma-tools-u/src/attestation_service/service.rs:97
principal_1   |    6: enigma_tools_u::attestation_service::service::AttestationService::get_report::h5a07212ab6e43f4c (0x55cb1581ba3c)
principal_1   |              at /root/enigma-core/enigma-tools-u/src/attestation_service/service.rs:94
principal_1   |    7: enigma_principal_app::boot_network::principal_manager::ReportManager::get_registration_params::{{closure}}::h1f1c1f4ac90cd80e (0x55cb1539b529)
principal_1   |              at src/main.rs:1
principal_1   |    8: enigma_principal_app::boot_network::principal_manager::ReportManager::get_registration_params::h63d34d26d7da11c4 (0x55cb153d20d9)
principal_1   |              at src/boot_network/principal_manager.rs:77
principal_1   |    9: <enigma_principal_app::boot_network::principal_manager::PrincipalManager as enigma_principal_app::boot_network::principal_manager::Sampler>::register::h53dc61f57c55e04b (0x55cb15392dbd)
principal_1   |              at src/boot_network/principal_manager.rs:165
principal_1   |   10: <enigma_principal_app::boot_network::principal_manager::PrincipalManager as enigma_principal_app::boot_network::principal_manager::Sampler>::verify_identity_or_register::{{closure}}::hb6fc10e55956cb1c (0x55cb1539d6ed)
principal_1   |              at src/boot_network/principal_manager.rs:187
principal_1   |   11: <enigma_principal_app::boot_network::principal_manager::PrincipalManager as enigma_principal_app::boot_network::principal_manager::Sampler>::verify_identity_or_register::h9985b4b06fb97672 (0x55cb1539c208)
principal_1   |              at src/boot_network/principal_manager.rs:178
principal_1   |   12: <enigma_principal_app::boot_network::principal_manager::PrincipalManager as enigma_principal_app::boot_network::principal_manager::Sampler>::run::{{closure}}::h26edff5a3bbb3269 (0x55cb1539e37a)
principal_1   |              at src/boot_network/principal_manager.rs:207
principal_1   |   13: <enigma_principal_app::boot_network::principal_manager::PrincipalManager as enigma_principal_app::boot_network::principal_manager::Sampler>::run::h3517570f4bfe8a65 (0x55cb1539dc5a)
principal_1   |              at src/boot_network/principal_manager.rs:204
principal_1   |   14: enigma_principal_app::cli::app::start::{{closure}}::hcb790015458b97f5 (0x55cb153bce64)
principal_1   |              at src/cli/app.rs:80
principal_1   |   15: enigma_principal_app::cli::app::start::h34b1500ab6b3c7e1 (0x55cb152a0633)
principal_1   |              at src/cli/app.rs:17
principal_1   |   16: enigma_principal_app::main::hd42b6335d59d24d6 (0x55cb1530c644)
principal_1   |              at src/main.rs:73
principal_1   |   17: std::rt::lang_start::{{closure}}::h6422b3b451175020 (0x55cb153d2c4f)
principal_1   |              at libstd/rt.rs:74
principal_1   |   18: std::rt::lang_start_internal::{{closure}}::hdc2a896aeffb5179 (0x55cb16205ec2)
principal_1   |              at libstd/rt.rs:59
principal_1   |       std::panicking::try::do_call::h5a4eb2ce70a501f5
principal_1   |              at libstd/panicking.rs:310
principal_1   |   19: __rust_maybe_catch_panic (0x55cb16228789)
principal_1   |              at libpanic_unwind/lib.rs:102
principal_1   |   20: std::panicking::try::h97436c380f30f437 (0x55cb162083d5)
principal_1   |              at libstd/panicking.rs:289
principal_1   |       std::panic::catch_unwind::h9c28ef6e0c478c5d
principal_1   |              at libstd/panic.rs:392
principal_1   |       std::rt::lang_start_internal::h6abd6befa9748e41
principal_1   |              at libstd/rt.rs:58
principal_1   |   21: std::rt::lang_start::h9c265978b259098c (0x55cb153d2c27)
principal_1   |              at libstd/rt.rs:74
principal_1   |   22: main (0x55cb1530c829)
principal_1   |   23: __libc_start_main (0x7f50c1ba4b96)
principal_1   |   24: _start (0x55cb152a0399)
principal_1   |   25: <unknown> (0x0)', libcore/result.rs:1009:5
principal_1   | stack backtrace:
principal_1   |    0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
principal_1   |              at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
principal_1   |    1: std::sys_common::backtrace::print
principal_1   |              at libstd/sys_common/backtrace.rs:71
principal_1   |              at libstd/sys_common/backtrace.rs:59
principal_1   |    2: std::panicking::default_hook::{{closure}}
principal_1   |              at libstd/panicking.rs:211
principal_1   |    3: std::panicking::default_hook
principal_1   |              at libstd/panicking.rs:227
principal_1   |    4: std::panicking::rust_panic_with_hook
principal_1   |              at libstd/panicking.rs:476
principal_1   |    5: std::panicking::continue_panic_fmt
principal_1   |              at libstd/panicking.rs:390
principal_1   |    6: rust_begin_unwind
principal_1   |              at libstd/panicking.rs:325
principal_1   |    7: core::panicking::panic_fmt
principal_1   |              at libcore/panicking.rs:77
principal_1   |    8: core::result::unwrap_failed
principal_1   |              at libcore/macros.rs:26
principal_1   |    9: <core::result::Result<T, E>>::unwrap
principal_1   |              at libcore/result.rs:808
principal_1   |   10: enigma_principal_app::cli::app::start::{{closure}}
principal_1   |              at src/cli/app.rs:80
principal_1   |   11: enigma_principal_app::cli::app::start
principal_1   |              at src/cli/app.rs:17
principal_1   |   12: enigma_principal_app::main
principal_1   |              at src/main.rs:73
principal_1   |   13: std::rt::lang_start::{{closure}}
principal_1   |              at libstd/rt.rs:74
principal_1   |   14: std::panicking::try::do_call
principal_1   |              at libstd/rt.rs:59
principal_1   |              at libstd/panicking.rs:310
principal_1   |   15: __rust_maybe_catch_panic
principal_1   |              at libpanic_unwind/lib.rs:102
principal_1   |   16: std::rt::lang_start_internal
principal_1   |              at libstd/panicking.rs:289
principal_1   |              at libstd/panic.rs:392
principal_1   |              at libstd/rt.rs:58
principal_1   |   17: std::rt::lang_start
principal_1   |              at libstd/rt.rs:74
principal_1   |   18: main
principal_1   |   19: __libc_start_main
principal_1   |   20: _start
@lacabra lacabra added the enhancement New feature or request label Mar 22, 2019
@lacabra
Contributor Author

lacabra commented Apr 2, 2019

@elichai: I have seen this error thrown by core as well. On retry the request succeeds, but there are no retries built in.

@fredfortier
Contributor

@lacabra The principal node isn't explicitly retrying these requests. It calls equote::get_report. You are correct that get_report does not implement any retry mechanism. Are you suggesting that the ppal node should retry, or that the retry logic should go into equote::get_report? The latter seems more appropriate to me unless we have reasons to do otherwise.

@lacabra
Contributor Author

lacabra commented Apr 22, 2019

@fredfortier As I mentioned above, both core and the KeyMgmt suffer from the same issue, so I would suggest implementing the retry in their common code, presumably in equote::get_report as you suggest.

@AvishaiW
Contributor

@lacabra are you still experiencing this issue? I'm trying to put the KM issues in order. Thanks!

@AvishaiW AvishaiW assigned AvishaiW and unassigned fredfortier Jul 31, 2019
@lacabra
Contributor Author

lacabra commented Aug 2, 2019

I have not experienced this particular issue recently, but I don't think it has been addressed either, and in my opinion it is worth improving: adding retries for both core and the km when getting the report from our IAS proxy. The goal is to avoid either piece of code crashing and bringing down the node if the first attempt fails for any reason.

@lacabra
Contributor Author

lacabra commented Oct 3, 2019

@AvishaiW (Cc: @moriaab): I have experienced this issue twice just now, once with core and once with the km on different runs. Then it went away. It appears that the Remote Attestation proxy was momentarily unreachable. Can you implement a retry (ideally with an exponential backoff, like we have in other parts of contract and p2p for requests that go across the network)?

As far as I can tell it is hard to reproduce because it depends on sgx.enigma.co being temporarily down.
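
To make the exponential backoff request concrete, here is a rough sketch under stated assumptions. It is not the backoff code used in contract or p2p, and the names `backoff_delay` and `retry_with_backoff` are hypothetical. The delay doubles after each failed attempt and is capped at `max`; the closure stands in for the call that fetches the report.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay before retry number `retry_index` (0-based):
/// `base * 2^retry_index`, capped at `max`.
pub fn backoff_delay(base: Duration, max: Duration, retry_index: u32) -> Duration {
    // Saturate instead of overflowing for very large retry indices.
    let factor = 2u32.checked_pow(retry_index).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Hypothetical helper: calls `op` up to `max_attempts` times, sleeping with
/// exponential backoff between failures. `op` always runs at least once.
/// Returns the first `Ok`, or the last `Err` if every attempt fails.
pub fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut retry_index = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                // Give up once the attempt budget is spent.
                if retry_index + 1 >= max_attempts {
                    return Err(err);
                }
                sleep(backoff_delay(base, max, retry_index));
                retry_index += 1;
            }
        }
    }
}
```

Putting something like this in the code shared by core and the km would let a momentary outage of the proxy be absorbed by a few delayed retries, while a longer outage still surfaces as an error the caller can handle.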

OnlyDragon0403 added a commit to OnlyDragon0403/enigma-core that referenced this issue Jul 18, 2022

3 participants