Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update for v0.7 support #13

Merged
merged 4 commits into from
May 11, 2018
Merged

Update for v0.7 support #13

merged 4 commits into from
May 11, 2018

Conversation

ScottPJones
Copy link
Member

@ScottPJones ScottPJones commented May 9, 2018

Please take a look - there are 5 broken tests in v0.7, it seems all because the v0.7 compiler is smarter, and in your test cases, the strings already just have a single copy (it seems it's better about making a single copy of a string, when there are more than one copy of the literal in a method)

@ScottPJones ScottPJones requested a review from oxinabox May 9, 2018 20:45
@codecov-io
Copy link

codecov-io commented May 9, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@335af70). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##             master    #13   +/-   ##
=======================================
  Coverage          ?   100%           
=======================================
  Files             ?      1           
  Lines             ?     20           
  Branches          ?      0           
=======================================
  Hits              ?     20           
  Misses            ?      0           
  Partials          ?      0
Impacted Files Coverage Δ
src/InternedStrings.jl 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 335af70...481ba7d. Read the comment docs.

@oxinabox
Copy link
Member

My thoughts on breaking string literal automatic interning is probabably to wrap them in a function that writes them to a tempfile and reads them back. Almost Impossible for that to optimise away. See also the use function in some other tests.

I see you've got rid of the core functionality function. Kinda makes sense. That was a legacy from when their was more type related stuff

@ScottPJones
Copy link
Member Author

I also like to split things up, keep the top level module file pretty simple and include the rest, but when it's just a single file, and that short, it seemed like more hassle than it was worth.
I hope you didn't mind the reorg!

@ScottPJones
Copy link
Member Author

If you think it's OK, please merge (I have a rule about not merging my own changes, even when I could, except when I'm the only person on a project 😄)

@oxinabox
Copy link
Member

oxinabox commented May 11, 2018

I didn't get time to properly review yesterday so I'll give it a second of today and then I merge

Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some basic stuff, but more to come now I get what is happening in 0.7.
See following comment

@@ -27,9 +27,9 @@ matrix:

## uncomment the following lines to override the default test script
#script:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops something I missed when I originally renamed it, good catch

@@ -82,7 +86,7 @@ There is an issue though:
How much are these tokens costing you in memory use?

Originally you had say a 100MB (10⁸ bytes) text file (multiply this out as required).
Which as a String took-up (10⁸ bytes + 1 pointer (4 bytes) + 1 length marker (4 bytes) + null terminating character (total 10⁸ + 9 bytes).
Which as a String took-up (10⁸ bytes + 1 pointer (4 or 8 bytes) + 1 length marker (4 or 8 bytes) + null terminating character (total 10⁸ + 9 (or 17) bytes).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean (or 16)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a first pass and fixing the calculations.
The actual memory usage is a lot more complex.
The String type (in v0.6 and later) takes up (on 64-bit platforms) div(sizeof(str) + 1 + sizeof(Int) + 15, 16) * 16 (at a minimum, I'm not sure whether it can allocate 48 bytes, for example, I'll have to check, but it's always a multiple of 16 [I think that's a multiple of 8 on 32-bit systems])

@@ -27,9 +27,9 @@ matrix:

## uncomment the following lines to override the default test script
#script:
# - julia -e 'Pkg.clone(pwd()); Pkg.build("StringInterning"); Pkg.test("StringInterning"; coverage=true)'
# - julia -e 'Pkg.clone(pwd()); Pkg.build("InternedStrings"); Pkg.test("InternedStrings"; coverage=true)'
after_success:
# push coverage results to Coveralls
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use just CodeCov and not Coveralls?
I don't see the point in having both, and I'm a bit more used to CodeCov

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, I've just copied that for that last 3 years from other packages!
I'll remove Coveralls.



macro i_str(s)
true_string_expr = esc(Meta.parse(string('"', unescape_string(s), '"')))
Copy link
Member

@oxinabox oxinabox May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment is likely wanted here explaining that is is making interpolation happen
Probably something I missed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will add.


export @i_str, intern

include("corefunctionality.jl")


Base.@deprecate_binding(InternedString, String, true)
#InternedString(s)=intern(String(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commented out-line should be deleted, (that is my bad from before)

@oxinabox
Copy link
Member

oxinabox commented May 11, 2018

Also you've broken the ability to do julia test/corefunctionality.jl,
because the test files are nolonger valid in isolation

I kinda see why that is needful, since there is shared code now in runtests.jl
It will probably do.
It isn't like these tests take ages so you want to run them in isolation.
The alternative would be to define a test/helpers.jl and include that in each file.

Not the main point though that I wanted to add, see Next comment

@oxinabox
Copy link
Member

So I believe most of what is going wrong in 0.7'st tests is
the result of JuliaLang/julia#22193

because it means for equal strings they will be === and have same object_id
So using object_id to compare them won't get one anywhere.

What wants to be done is to use pointer to compare them

julia> k = "AB"
"AB"

julia> m = string(Char(65))*"B"
"AB"

julia> pointer(k)
Ptr{UInt8} @0x00007fc26cb40df8

julia> pointer(m)
Ptr{UInt8} @0x00007fc26cb43658

julia> objectid(k)
0x4bc65251d1065455

julia> objectid(m)
0x4bc65251d1065455

julia> k===m
true

so rather than object_id_eq, lets have addr_eq(a, b) = pointer(a)==pointer(b).

And that will probably do right in 0.7.

If problems persist with literals being actually automatically the same reference,
replace the literals with v*"" which will break the automatic reference detection.


julia> bar() = pointer("a") == pointer("a")
bar (generic function with 1 method)

julia> bar()
true

julia> foo() = pointer("a"*"") == pointer("a")
foo (generic function with 1 method)

julia> foo()
false

@oxinabox oxinabox mentioned this pull request May 11, 2018
Copy link

@quinnj quinnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The move from corefunctionality.jl to InternedStrings.jl is annoying because it's hard to review what actually changed in the core methods, but this LGTM.

How serious are the newly introduced @test_brokens?

@oxinabox
Copy link
Member

Ideally it would have been done in a separate PR but no big deal.
AFAICT there are no changes beyond the compat stuff at the top of the file..

The @test_brokens are problems with the tests, not with the code.
See #13 (comment)
They are all fixable and will be removed before this is merged (and I suspect the commit just made did so)


@test objectid(intern("Gold")) == target_id
@test pointer(intern(SubString("Gold", 1))) == target_addr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a different test.
A useful test, but a different one, because interning SubStrings works differently
However it doesn't exactly matter because the test was redundant anyway.
This whole test suite is basically checking the example from the readme works still.

And it is now out of sync with that, but I'll replace it anyway

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't realize that interning SubStrings was handled differently, I was just trying to break the connection between the string literals.

@ScottPJones
Copy link
Member Author

I am surprised that str*"" isn't optimized away at some point.

@oxinabox
Copy link
Member

One day it will be, but not today.
I guess the uniqueifying happens before constant propagation

@oxinabox oxinabox merged commit 34e9c24 into master May 11, 2018
@ScottPJones ScottPJones deleted the spj/v07update branch May 11, 2018 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants