Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure in multi-threaded application with GoogleCloud.jl and HTTP.jl #240

Closed
nlw0 opened this issue Jul 4, 2022 · 8 comments
Closed

Failure in multi-threaded application with GoogleCloud.jl and HTTP.jl #240

nlw0 opened this issue Jul 4, 2022 · 8 comments

Comments

@nlw0
Copy link
Contributor

nlw0 commented Jul 4, 2022

I'm working with GoogleCloud.jl, which depends on MbedTLS.jl. Sometimes I've been getting errors like this:

┌ Error: Exception in task
│   exception =
│    MbedTLS error code -17280: RSA - The PKCS#1 verification failed
│    Stacktrace:
│      [1] mbed_err(ret::Int32)

┌ Error: Exception in task
│   exception =
│    MbedTLS error code -17162: RSA - The private key operation failed : BIGNUM - The input arguments are negative or result in illegal output
│    Stacktrace:
│      [1] mbed_err(ret::Int32)
│        @ MbedTLS ~/.julia/packages/MbedTLS/4YY6E/src/error.jl:17
│      [2] macro expansion
│        @ ~/.julia/packages/MbedTLS/4YY6E/src/error.jl:4 [inlined]
│      [3] sign!
│        @ ~/.julia/packages/MbedTLS/4YY6E/src/pk.jl:90 [inlined]
│      [4] sign(ctx::MbedTLS.PKContext, hash_alg::MbedTLS.MDKind, hash::Vector{UInt8}, rng::Random.MersenneTwister)
│        @ MbedTLS ~/.julia/packages/MbedTLS/4YY6E/src/pk.jl:100
│      [5] SHA256withRSA
│        @ ~/src/vdm-service/GoogleCloud.jl/src/session.jl:26 [inlined]
│      [6] JWS(credentials::GoogleCloud.credentials.JSONCredentials, claimset::GoogleCloud.session.JWTClaimSet, header::GoogleCloud.session.JWTHeader)
│        @ GoogleCloud.session ~/src/vdm-service/GoogleCloud.jl/src/session.jl:140
│      [7] JWS
│        @ ~/src/vdm-service/GoogleCloud.jl/src/session.jl:139 [inlined]
│      [8] token(credentials::GoogleCloud.credentials.JSONCredentials, scopes::Vector{String})
│        @ GoogleCloud.session ~/src/vdm-service/GoogleCloud.jl/src/session.jl:148
│      [9] authorize(session::GoogleCloud.session.GoogleSession{GoogleCloud.credentials.JSONCredentials}; cache::Bool)
│        @ GoogleCloud.session ~/src/vdm-service/GoogleCloud.jl/src/session.jl:181
│     [10] authorize
│        @ ~/src/vdm-service/GoogleCloud.jl/src/session.jl:177 [inlined]
│     [11] execute(session::GoogleCloud.session.GoogleSession{GoogleCloud.credentials.JSONCredentials}, resource::GoogleCloud.api.APIResource, method::GoogleCloud.api.APIMethod, path_args::String; data::Vector{UInt8}, gzip::Bool, content_type::String, debug::Bool, raw::Bool, max_backoff::Dates.Second, max_attempts::Int64, params::Base.Pairs{Symbol, String, Tuple{Symbol}, NamedTuple{(:fields,), Tuple{String}}})
│        @ GoogleCloud.api ~/src/vdm-service/GoogleCloud.jl/src/api/api.jl:241
│     [12] (::GoogleCloud.api.APIRoot)(resource_name::Symbol, method_name::Symbol, args::String; kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:session, :fields, :debug), Tuple{GoogleCloud.session.GoogleSession{GoogleCloud.credentials.JSONCredentials}, String, Bool}}})
│        @ GoogleCloud.api ~/src/vdm-service/GoogleCloud.jl/src/api/api.jl:218
│     [13] connect!(store::GoogleCloud.collection.KeyStore{String, CuboidService.PointCloud}; location::String, empty::Bool, debug::Bool)
│        @ GoogleCloud.collection ~/src/vdm-service/GoogleCloud.jl/src/collection.jl:79
│     [14] GoogleCloud.collection.KeyStore{String, CuboidService.PointCloud}(bucket_name::String; session::GoogleCloud.session.GoogleSession{GoogleCloud.credentials.JSONCredentials}, location::String, empty::Bool, gzip::Bool, key_format::Symbol, val_format::Symbol, debug::Bool)
│        @ GoogleCloud.collection ~/src/vdm-service/GoogleCloud.jl/src/collection.jl:73

I'm thinking it might be due to thread conflicts. Is MbedTLS thread-safe? Any recommendations how I might debug this? Or perhaps this might need to be fixed within GoogleCloud.jl?

@quinnj
Copy link
Member

quinnj commented Jul 4, 2022

Hmmmm.....looking through the involved code, it doesn't seem like there's anything obviously non-multithread-safe. Though I also can't find documentation that the call to mbedtls_pk_sign in the mbedtls C library is threadsafe. So possibly? It's possible that putting a lock in JSONCredentials around the private_key field might help; probably worth a shot.

HTTP.jl itself has had threadsafety issues in the 0.9 releases; the 1.0 release just came out that should resolve those, but I don't think GoogleCloud has updated for compatibility there yet (hopefully not too hard). But it doesn't look like this particular code path deals w/ HTTP.jl at all.

@nlw0
Copy link
Contributor Author

nlw0 commented Jul 5, 2022

Thanks for taking a look, @quinnj . I'm looking at GoogleCloud.jl/src/session.jl:26, and considering I have a shared global GoogleSession object, I'm wondering if this can cause a problem. One attempt at using locks hasn't solved it, but maybe I need to change the approach. I'm not familiar with how TLS works, but maybe there's a random generator state that cannot be shared across threads? I'm not sure what parts of my code should go into a mutex in that case, or if I should have separate GoogleSession object. I'm not sure that'd be even even possible, so it's confusing.

And that might not be the whole story, because sometimes I get a segfault or a double-free error, the kind of thing we would really call non-thread-safe. But that's more difficult to reproduce.

@nlw0
Copy link
Contributor Author

nlw0 commented Jul 5, 2022

GoogleCloud.jl methods can take an optional session argument, I believe using that with new sessions every thread has solved my problem. Thanks for the help!

@nlw0 nlw0 closed this as completed Jul 5, 2022
@nlw0 nlw0 reopened this Jul 8, 2022
@nlw0
Copy link
Contributor Author

nlw0 commented Jul 8, 2022

I thought the problem was gone after I started using separate session objects, but I've actually got the issue again. I'll try to come up with a minimal example...

@nlw0
Copy link
Contributor Author

nlw0 commented Jul 8, 2022

I have created a short script that reproduces the issue, here's how it goes:

using GoogleCloud
using MD5, Base64

goocredentials = JSONCredentials("mycreds-123443211234.json")
mybucket = "mybucket-1234"
mypaths = ["..." for n in 1:23]  ## 54 different files

mydata = map(mypaths) do path
    # begin
    # @async begin
    Threads.@spawn begin
        goosession = GoogleSession(goocredentials, ["devstorage.full_control"])
        data = storage(:Object, :get, mybucket, path, session=goosession)
    end
end

allbytes = reduce(vcat, fetch.(mydata))
@show base64encode(md5(allbytes))

Running with julia -t1 it works, and using either begin or @async begin it works. With Threads.@spawn it reliably breaks with julia -t2.

I'm sorry I can't share my data, I can try to set up something public if it helps.

It's not always the same error. One of the errors I get is what I'm seeing most often in my real code, nested task error: MbedTLS error code -17280: RSA - The PKCS#1 verification failed. Other than that, I get things that look like memory allocation errors such as free(): corrupted unsorted chunks, signal (11): Segmentation fault and malloc(): smallbin double linked list corrupted

This looks to me like something in MbedTLS not being thread-safe, even with separate GoogleSession objects.

@nlw0
Copy link
Contributor Author

nlw0 commented Jul 8, 2022

I don't know why I didn't see this before, but there really seem to be known issues with MbedTLS in multiple threads:
Mbed-TLS/mbedtls#3391
Mbed-TLS/mbedtls#3263

I created a PR on GoogleCloud.jl, but should anything be done in this library? At least a warning in the documentation, perhaps?

@nlw0
Copy link
Contributor Author

nlw0 commented Jul 8, 2022

@quinnj
Copy link
Member

quinnj commented Jul 31, 2022

Should have been fixed via #245

@quinnj quinnj closed this as completed Jul 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants