Inconsistent test failures (setfield: expected Ptr{None}, got Symbol) #3956

Closed
timholy opened this issue Aug 6, 2013 · 27 comments
Labels
parallelism Parallel or distributed computation

Comments

@timholy
Member

timholy commented Aug 6, 2013

Anyone else seeing this? (35902fc)

make testall
    JULIA test/all
        From worker 3:       * keywordargs
        From worker 4:       * numbers
        From worker 5:       * strings
        From worker 2:       * core
        From worker 3:       * unicode
        From worker 2:       * collections
        From worker 3:       * hashing
        From worker 5:       * remote
        From worker 3:       * iostring
        From worker 5:       * arrayops
        From worker 3:       * linalg
        From worker 2:       * blas
        From worker 2:       * fft
        From worker 2:       * dsp
        From worker 5:       * sparse
        From worker 2:       * bitarray
        From worker 5:       * random
        From worker 5:       * math
        From worker 4:       * functional
        From worker 4:       * bigint
        From worker 4:       * sorting
        From worker 5:       * statistics
        From worker 5:       * spawn
        From worker 5:         [stdio passthrough ok]
        From worker 5:       * parallel
        From worker 4:       * priorityqueue
        From worker 4:       * arpack
        From worker 2:       * file
        From worker 4:       * suitesparse
        From worker 2:       * version
        From worker 2:       * resolve
        From worker 4:       * pollfd
        From worker 4:       * mpfr
        From worker 2:       * broadcast
        From worker 4:       * complex
        From worker 3:       * socket
        From worker 4:       * floatapprox
        From worker 4:       * readdlm
Worker 3 terminated.
ERROR: type: setfield: expected Ptr{None}, got Symbol
 in deserialize at serialize.jl:476
 in deserialize at serialize.jl:439
 in handle_deserialize at serialize.jl:300
 in anonymous at task.jl:842

ERROR: ProcessExitedException()
 in yield at multi.jl:1493
 in wait at task.jl:105
 in wait_full at multi.jl:548
 in remotecall_fetch at multi.jl:648
 in remotecall_fetch at multi.jl:653
 in anonymous at multi.jl:1335
at /home/tim/src/julia/test/runtests.jl:20

make[1]: *** [all] Error 1
make: *** [testall] Error 2

But it works fine if I say env JULIA_CPU_CORES=1 make testall.

@StefanKarpinski
Member

Yikes, that's familiar. I was getting that all week last week but then it went away. Do you have any environment variables set?

@timholy
Member Author

timholy commented Aug 6, 2013

Hmm. Now I can't get it to do it at all. What's weird is that I don't think I updated or changed anything.

> Do you have any environment variables set?

Many :-). But nothing that looks suspicious. The only thing in my juliarc.jl is a single push!(LOAD_PATH,...) command. None of the files in that directory have the same name as anything in base/.

@pao
Member

pao commented Aug 6, 2013

> Now I can't get it to do it at all.

That's a funny definition of "consistent." :) #3888 was Stefan's run-in with this one. Title updated.

@timholy
Member Author

timholy commented Aug 6, 2013

It's consistently-inconsistent :). It's odd because it failed for me 6 times in a row before I submitted this issue. Now it has passed for me in the last 9 trials.

@vtjnash
Member

vtjnash commented Aug 7, 2013

I wonder if this is some sort of race condition in the I/O system. @loladiro did we see this before your no-copy updates?

@StefanKarpinski
Member

And I'm seeing this again now. It's really weird – it's not random but seems to come and go – for long stretches I'll see it and then I won't. It feels more like a memory issue than a race condition.

@kmsquire
Member

Just ran into this. I find I can trigger it consistently (for now) as long as I run at least one other test:

~/src/julia/test$ julia runtests.jl socket 
     * socket
    SUCCESS
~/src/julia/test$ julia runtests.jl socket complex
    From worker 2:       * socket
    From worker 3:       * complex
Worker 2 terminated.
ERROR: type: setfield: expected Ptr{None}, got Symbol
 in deserialize at serialize.jl:476
 in deserialize at serialize.jl:439
 in handle_deserialize at serialize.jl:300
 in anonymous at task.jl:835

ERROR: ProcessExitedException()
 in yield at multi.jl:1512
 in wait at task.jl:105
 in wait_full at multi.jl:545
 in remotecall_fetch at multi.jl:645
 in remotecall_fetch at multi.jl:650
 in anonymous at multi.jl:1353
at /home/kmsquire/Source/julia/test/runtests.jl:20

@timholy
Member Author

timholy commented Aug 14, 2013

I got it this morning, with make testall. make testall1 passed.

@kmsquire
Member

Minimal test case:

ids = addprocs(1)
@everywhere using Base.Test
remotecall_fetch(ids[1], Core.include, "socket.jl")

@Keno
Member

Keno commented Aug 14, 2013

I have a vague suspicion that this has to do with the getaddrinfo test. Would you mind putting, say, a println before and after it to verify?

@kmsquire
Member

So it turns out that, on some systems, looking up the IP address of "foo.bar" doesn't fail, and returns an actual address. (Go figure!) This causes the last statement of socket.jl to succeed and return a TcpSocket, which is what causes this error.

This also explains why it's intermittent.

Adding nothing to the end of test/socket.jl fixes the error.
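
The effect described above can be sketched without any networking. This is an illustration in current Julia (1.x), not the code from the thread: `include_string` stands in for `include`, and an `IOBuffer` stands in for the `TcpSocket`. Both substitutions are assumptions made to keep the example self-contained.

```julia
# Stand-in for a test file whose last expression is a resource handle.
# IOBuffer is used here in place of the TcpSocket discussed in the thread.
leaky_src = """
handle = IOBuffer()
handle
"""

# Same file with a trailing `nothing`, as in the fix described above.
quiet_src = """
handle = IOBuffer()
nothing
"""

# Like `include`, `include_string` returns the value of the last expression.
leaky_result = include_string(Main, leaky_src)
quiet_result = include_string(Main, quiet_src)

println(typeof(leaky_result))       # the handle escapes as the file's value
println(quiet_result === nothing)   # the trailing `nothing` is returned instead
```

When the file is included through a remote call, that final value is what gets sent back to the caller, so a trailing `nothing` keeps the handle from ever reaching the serializer.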

@kmsquire
Member

But there might be a better fix, e.g., in testing the return value from pmap?

@kmsquire
Member

I don't have time to look into this more right now, so I'll leave it to someone else to commit the proper fix.

@Keno
Member

Keno commented Aug 14, 2013

So the actual problem is in serializing TcpSockets (I guess that makes sense)? Also, is foo.bar actually valid, or are we misinterpreting libuv return codes in a way that makes it look valid?

@kmsquire
Member

It's not actually valid, but some DNS servers return an IP address anyway, perhaps to redirect traffic or for some other nefarious (or possibly benign but annoying) purpose. For example, you can set your DNS server to 208.67.222.222 or 208.67.220.220 to see what I mean.

@pao
Member

pao commented Aug 14, 2013

Perhaps we should be using a name guaranteed to be invalid, such as foo.invalid (the .invalid TLD is reserved for this purpose.)

@pao
Member

pao commented Aug 14, 2013

...of course I don't know if that's sufficient defense against NXDOMAIN redirects.

@Keno
Member

Keno commented Aug 14, 2013

Good point @pao

@kmsquire
Member

That seems to work with the servers I just posted!

@StefanKarpinski
Member

Damn. I guess that $40 I spent to reserve julia.invalid was a waste.

kmsquire added a commit that referenced this issue Aug 15, 2013
* Hopefully address NXDOMAIN redirect issue causing #3956
* Still need to address TcpSocket serialization
@JeffBezanson
Member

Let's scrap this and set up a business selling .invalid TLDs.

@kmsquire
Member

So it turns out that even with "foo.invalid", I still sometimes get a valid connection opening because of an NXDOMAIN redirect...

@pao
Member

pao commented Aug 19, 2013

Several arcseconds of a circle of hell should be reserved for DNS operators who do this.

@staticfloat
Member

I'm a little confused as to the purpose of this test. Is it to ensure that DNS name resolution on nonexistent domains fails? I'm not sure that's a good test, as we're completely at the mercy of the DNS providers, as we've seen.

If it's to ensure that we're picking up errors from getaddrinfo properly, we could just go the whole nine yards and pass in a completely invalid name:

julia> getaddrinfo(".invalid")
ERROR: getaddrinfo callback: system error (EAI_SYSTEM)
 in getaddrinfo at socket.jl:405

I think the chances of a DNS server responding to that kind of query are somewhat lower. (I haven't found a server that does so, but if you can find one, I will tip my hat to you and sign your infernal petition.)
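
For reference, a check along these lines might look as follows in current Julia (1.x), where `getaddrinfo` lives in the `Sockets` standard library and reports lookup failures as `Sockets.DNSError`. This is a sketch under those assumptions, not the 2013 API shown above, and the helper name `lookup_fails` is hypothetical.

```julia
using Sockets

# Hypothetical helper: returns true when name resolution raises a DNS error,
# false when an address comes back. Any other exception is rethrown.
function lookup_fails(host::AbstractString)
    try
        getaddrinfo(host)
        return false
    catch err
        err isa Sockets.DNSError || rethrow()
        return true
    end
end

# A name with an empty leading label is malformed, so it is unlikely that
# a resolver could redirect it the way "foo.bar" or "foo.invalid" was.
println(lookup_fails(".invalid"))
```

Returning a Bool rather than the lookup result also means the last value of such a test is never a socket or address object.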

@Keno
Member

Keno commented Aug 19, 2013

It's to test the error case in getaddrinfo, so that should do.

@StefanKarpinski
Member

I'm still with @pao on the hell thing though.

@staticfloat
Member

At last, we are no longer beholden to those most infernal of DNS providers in our tests. The people let loose a collective sigh of relief, and there was much rejoicing.

IanButterworth pushed a commit that referenced this issue Jul 22, 2024
Stdlib: Pkg
URL: https://github.com/JuliaLang/Pkg.jl.git
Stdlib branch: master
Julia branch: master
Old commit: d801e4545
New commit: 6b4394914
Julia version: 1.12.0-DEV
Pkg version: 1.12.0
Bump invoked by: @IanButterworth
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
JuliaLang/Pkg.jl@d801e45...6b43949

```
$ git log --oneline d801e4545..6b4394914
6b4394914 Use more internal Pkg.add api to bypass auto-registry-install (#3941)
6002a29de Pkg.test: document that coverage can be a string (#3957)
77f0225b8 don't use `get_extension` to bridge REPLExt to REPLMode (#3959)
e6880bc9d add clarifying comment about source_path being the package root (#3956)
b1b4df8d8 Fix codeblock language and prompt in Pkg.status() docstring (#3955)
```

Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
8 participants