Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize base/string.jl, base/utf*, test/strings.jl, test/unicode.jl #11925

Merged
merged 1 commit into from
Jul 10, 2015

Conversation

ScottPJones
Copy link
Contributor

The monolithic string.jl has been split up into several files,
and the test files in strings.jl and unicode.jl have been made
to correspond with the files of the same names in base.
This will prevent a lot of manual merging that was previously necessary.

This is a preface to trying to get the string coverage to 100%

The only changes are splitting the files up, placing them into folders, and adding a top file that
includes the files in the folder.

@tkelman
Copy link
Contributor

tkelman commented Jun 29, 2015

The fact that things are in subfolders now is a little bit nicer than #11908, but I still don't think anyone is going to like the number of small files here. Base generally isn't organized as a single type per file. How about combining the various special string types together, into SpecialStrings.jl or something like that.

This will prevent a lot of manual merging that was previously necessary.

Given how git works, I really don't see how this is the case. Yes you get silly little conflicts at the end of a file sometimes, those are the easy ones and really not an issue. Rebasing PR's uses up a bit of CI time, but it also tests things in a closer merge state than when they were originally opened. Adding more files seems to me that it would be more likely someone would modify the ends of multiple files, causing this to happen more often, not less.

@JeffBezanson
Copy link
Member

This is better than the previous version.

Unrelated to this PR, but honestly I'd rather remove RepString, RevString, and RopeString.

I don't think we need utf/types.jl. Since we have utf16.jl and utf32.jl, we might as well put the corresponding type definitions in those files. However I wasn't paying attention when that file was introduced so I might be missing something.

Also looks like escape.jl and strio.jl could be combined.

@ScottPJones
Copy link
Contributor Author

@tkelman About #11908, I was in the process of moving things to folders when it was closed on me (I'd already moved the test files, see last commit in #11908). That hasn't been my experience with git.
I've been having to constantly rebase and do manual merging, on a daily basis, and not just for things at the end of a file. Having more files that are logically separate means that it is more likely that two people working on strings will not conflict (one working on printing, the other on searching, for example).

@JeffBezanson "better than the previous version" this is exactly what I was just about to push to the "previous version" when it got closed, and since I'm still fairly new with git and github, recreating it in a new branch ended up wasting a lot of my time and getting me frustrated. Frustration has passed, just want to move on now.
utf/types.jl is because the type definitions are needed early on, before the stuff in string and iobuffer,
and they are needed before utf/checkstring.jl, which is needed before utf/utf16.jl and utf/utf32.jl
The utf changes was pure renaming in base (in test, it was splitting out so the test files match the source files).

You're correct about escape.jl and strio.jl. I didn't see that at first because they were in different parts of the original string.jl. I'm rerunning the unit tests with that change right now, will push shortly.

@ScottPJones
Copy link
Contributor Author

@JeffBezanson OT about Rep/Rev/Rope, I'd already been bugging @tkelman about getting rid of Rope (it's not used anywhere, not in julia, not in any registered package, same thing is true of Rep).
He wanted me to move it into a package, but I haven't created any packages before, I can't create a package in JuliaLang, and I don't want to maintain code that I don't even think is needed.
RevStrings is used in a few places, once in REPLCompletions, once in shell, 5 times in strings/search. It isn't used in any registered packages either.
I'd be happy to simply delete Rep/Rope (there are no tests for them either, it would improve coverage stats also!). What do you think?

@tkelman
Copy link
Contributor

tkelman commented Jun 29, 2015

That hasn't been my experience with git.
I've been having to constantly rebase and do manual merging, on a daily basis, and not just for things at the end of a file. Having more files that are logically separate means that it is more likely that two people working on strings will not conflict (one working on printing, the other on searching, for example).

This is because many of your proposed changes have been too big and contentious. The likelihood of needing to rebase to fix a conflict in an open PR is proportional to how many lines of code that PR touches (and it's an increasing function of the amount of time the PR is open), and has almost nothing to do with the number of files. This reorganization will not help reduce the number of conflicts in any way.

@ScottPJones
Copy link
Contributor Author

@tkelman I've had that problem with rebasing even with small changes, it hasn't just been the children of #11004. The problem usually has been just with the way test/strings.jl was just a dumping place for any string related tests. This reorganization will help that quite a bit, if you look, the files in test exactly correspond to the files in base now, and in fact, the file base/string.jl is identical to test/string.jl.
(Previously splitting those up into even smaller logical units actually showed me that a number of things don't have any unit tests, something that has become hidden again by reducing the number of separate files in the strings folder!)

@tkelman
Copy link
Contributor

tkelman commented Jun 29, 2015

So what's the general nature of the conflicts you've needed to resolve then?

I'm not opposed to reorganization, but we're going to need to find somewhere between master and a dozen small files. I know this sounds arbitrary, but can we find a way to do this while only adding around 5-6 new files to base?

@tkelman
Copy link
Contributor

tkelman commented Jun 29, 2015

To clarify, by 5-6 new files I mean net additional files under base. Reorganizing the tests so the structure matches that in base is also good, I'm mostly focusing on converging to an agreeable structure of base here then the test changes can fall out of that.

@quinnj
Copy link
Member

quinnj commented Jun 29, 2015

Scott, there are 55 other collaborators for JuliaLang that more or less have the ability to review and accept pull requests. When a PR is submitted that receives immediate feedback that it's not the right direction, and your response is a lengthy paragraph continuing to argue, it becomes a waste of everyone's time (particularly when dealing with more stylistic decisions where it's hard to be "objectively better"). You made no indication you were considering the suggested feedback, hence why the PR was closed. Note that you can continue to work on a branch, even if a PR is closed and it can be reopened later.

@ScottPJones
Copy link
Contributor Author

Just stupid stuff, that perforce would have handled automatically, I'm not sure why git doesn't.
Sometimes it was because of mega changes outside of my control (Unions, Tuples, etc.),
other times it was just conflicts in test.
Since I do complete testing after any change (even one liners from rebasing), the turnaround is quite high for me.

@ScottPJones
Copy link
Contributor Author

@quinnj One paragraph explaining why I'd structured things as I did, after only 4 comments from contributors, 1 that was just a comment about MATLAB, without any suggestions as what to do, and 1 was positive about the direction (just saying that the number was too much, which I was addressing when you closed it). Giving somebody busy at least a day or two to respond would have been courteous.

@tkelman
Copy link
Contributor

tkelman commented Jun 29, 2015

@ScottPJones much of the continuing friction is due to your responses to feedback. A simple "I have another change coming where I'm going to modify A,B,C, hold on" would have sufficed as opposed to #11908 (comment). Given the feedback you had received up to that point, I don't think you were going to convince anyone to like the level of granularity you had gone to.

@ScottPJones
Copy link
Contributor Author

@tkelman I do think I've been very responsive to any constructive suggestions. I push back to make sure that the suggestion really is OK, esp. if it goes directly against my own experience.

@ScottPJones
Copy link
Contributor Author

Also, up til the point of my last commit (which, because of some screwup on my part due to my newness with git, and needing to rebase because of fixes in master that had prevented me from running my tests, had lost half of the change, where I moved the base files into the strings folder and merged a few, the PR was clearly marked as WIP, which means I'm still responding to the review suggestions)

@ScottPJones
Copy link
Contributor Author

@tkelman In the comment you mentioned, I tried to explain my reasoning behind the way I was organizing things. I even implied that I was going to change things further (but people had complained about doing more than one thing in a PR, so I said that it wasn't in that particular PR).
I think people need to be given at least time to respond to things from reviewers, and have the reviewers respond to that, before somebody pulls the plug on anything. I've seen other complaints recently about things being merged or closed in less than a day, which is a big problem with a global community.

Is there anything else that needs to be done to this? (@ninjin is also working on coverage for different parts of strings, so it would be good to get this in sooner rather than later, to avoid extra work from either of us)

@tkelman
Copy link
Contributor

tkelman commented Jun 29, 2015

Yes, this needs to not add a dozen new small files to base. 5 or 6 would probably be okay.

@StefanKarpinski
Copy link
Member

Yes, this needs to not add a dozen new small files to base. 5 or 6 would probably be okay.

+1

@ScottPJones
Copy link
Contributor Author

What change are you looking at? This adds just 6 files in the folder strings, basic, search, parse, strutil, strio, and cstring. Except for cstring, they are all fairly large, hundreds of lines. cstring could probably go into basic, even though logically it didn't seem to belong.
There are completely separate types, SubStrings, RevStrings, RepStrings, and RopeStrings, which I've already commented on, two should simply be removed from base completely. I was hoping that @JeffBezanson would respond about that.
Any other changes were just renaming files so that all of the utf* files are in the utf directory, so there's actually a net loss of files in Base, even with the files that I want to delete (RepStrings & RopeStrings)

@StefanKarpinski
Copy link
Member

I think that if you combine the RopeString, RevString and RepString stuff into a single file named stringtypes.jl or something like that (on both the implementation and testing sides), then this would be ok. The logical separation is looking pretty reasonable at this point.

@ScottPJones
Copy link
Contributor Author

What do you think about cstring (which is the only small one in the strings folder)?
On further thought, it really doesn't belong in the strings folder, maybe it should be added to either
ascii.jl or utf/utf8.jl, as it is a concrete type, and the rest of the strings folder is about AbstractString.
I'll merge the Rope/Rep/RevString (although I still want to hear from Jeff about killing Rope & Rep, if that could happen, it would just be SubString and RevString, which aren't small).

@ScottPJones
Copy link
Contributor Author

Merging RopeString, RepString on the testing side is easy, there is no testing currently!
(another reason I wish them oblivion 😀)

@tkelman
Copy link
Contributor

tkelman commented Jun 29, 2015

I'd also prefer unicode as a folder name and top-level base file, versus the context-free utf.

@StefanKarpinski
Copy link
Member

If we get rid of one of those types, we'll probably get rid of all of them at the same time.

I'd also prefer unicode as a folder name and top-level base file, versus the context-free utf.

+1

@ScottPJones
Copy link
Contributor Author

OK, will rename from utf to unicode as well.

@ScottPJones
Copy link
Contributor Author

Is there any way to inform git that I'm merging a file into another? I think part of the difficulty of reviewing this refactoring is that it doesn't seem to handle merging or splitting nicely, but maybe I'm not using git correctly. Thanks!

@StefanKarpinski
Copy link
Member

No, the git model is that history is a series of tree snapshots – any notion of code motion is purely heuristic and only used for presentation or efficient storage.

@ScottPJones
Copy link
Contributor Author

Ugh... I do miss perforce sometimes!

@JeffBezanson
Copy link
Member

JeffBezanson commented Jun 29, 2015 via email

@ScottPJones
Copy link
Contributor Author

Really, just kill them? 😀

@ScottPJones
Copy link
Contributor Author

OK, I have things all set to push with RopeString added back (now in stringtypes.jl (base & test)), but I'm waiting to see if the Appveyor and Travis-CI runs that are already in progress finish.
@ViralBShah I did check that absolutely no registered package uses RopeString, but I suppose somebody might have used it in private code.

@ScottPJones
Copy link
Contributor Author

OK, all tests passed, I'm pushing the version with RopeString added back (which passed locally [there is only a single line of testing for RopeString]). There are no substantive changes in this PR, it's just reorganization to make things more consistent and make life easier for those of us trying to increase testing coverage for strings.

@stevengj
Copy link
Member

stevengj commented Jul 7, 2015

I agree that the R*String types should not be removed in this PR, just to keep it focused. Deleting those should be a separate PR. I wouldn't be opposed to merging that separate PR before 0.4, however, since those string types are used in literally zero registered packages at this point.

@ScottPJones
Copy link
Contributor Author

It no longer removes anything, this is now strictly a cleaner restructuring of the files to help us split up the string coverage work.

@quinnj
Copy link
Member

quinnj commented Jul 7, 2015

A few comments (this is shaping up nicely):

  • let's move strings/parse.jl to just base/parse.jl; that functionality is more about parsing than strings
  • rename strio.jl to just io.jl, strutil.jl to util.jl; they're already in the new strings directory
  • it's probably ok where it is, but the cmp and isequals code feels like basic.jl rather than util.jl to me
  • This may have been discussed elsewhere, but I would prefer base/stringtypes.jl to be strings/types.jl

@StefanKarpinski
Copy link
Member

+1 to everything @quinnj said

@ViralBShah
Copy link
Member

Also will need a squash.

@ScottPJones
Copy link
Contributor Author

Into a single commit? I'd been told to wait to do that until a PR was ready to be merged, if you think it's ready now, I'll squash however you want.

@ViralBShah
Copy link
Member

It appears from the discussion that this is getting closer, so perhaps it would be a good time to do so.

@ScottPJones
Copy link
Contributor Author

OK, will do.

@ScottPJones
Copy link
Contributor Author

OK, all tests passed, and I think I've done everything people have asked for. Thanks for reviewing!

@stevengj
Copy link
Member

stevengj commented Jul 7, 2015

LGTM.

@quinnj
Copy link
Member

quinnj commented Jul 10, 2015

Looks like this needs a rebase, then LGTM

The monolithic string.jl has been split up into several files,
and the test files in strings.jl and unicode.jl have been made
to correspond with the files of the same names in base.
This will prevent a lot of manual merging that was previously necessary.
Merge Sub/Rev/Rep/RopeStrings into strings/types.jl, for base and test
@ScottPJones
Copy link
Contributor Author

All rebased, squashed to a single commit, passing all tests, 🎉 Thanks again to all reviewers, it's greatly appreciated!

pointer{T<:ByteString}(x::SubString{T}, i::Integer) = pointer(x.string.data) + x.offset + (i-1)
pointer(x::Union{UTF16String,UTF32String}, i::Integer) = pointer(x)+(i-1)*sizeof(eltype(x.data))
pointer{T<:Union{UTF16String,UTF32String}}(x::SubString{T}) = pointer(x.string.data) + x.offset*sizeof(eltype(x.data))
pointer{T<:Union{UTF16String,UTF32String}}(x::SubString{T}, i::Integer) = pointer(x.string.data) + (x.offset + (i-1))*sizeof(eltype(x.data))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pointer methods should be in base/pointer.jl. It seems a little obscure to put some of these methods in utf32.jl. Doesn't need to hold up the PR here, but a change we should probably make.

stevengj added a commit that referenced this pull request Jul 10, 2015
Reorganize base/string.jl, base/utf*, test/strings.jl, test/unicode.jl
@stevengj stevengj merged commit ffb4ec4 into JuliaLang:master Jul 10, 2015
@tkelman
Copy link
Contributor

tkelman commented Jul 10, 2015

The merge commit here froze during bootstrap https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.6734/job/41e1ujwp78e22yaw

the PR build passed, at e3b1828 - not that many commits between the two

@ScottPJones ScottPJones deleted the spj/string branch July 10, 2015 23:45
@kshyatt
Copy link
Contributor

kshyatt commented Jul 13, 2015

On my latest Coveralls build the string coverage is looking really great. Awesome work, @ScottPJones.

@malmaud
Copy link
Contributor

malmaud commented Jul 15, 2015

This PR clobbered #11898, possibly other PRs that touched the string files.

@tkelman
Copy link
Contributor

tkelman commented Jul 15, 2015

would've been ideal if there were a script or some other mapping to actually look at the diff piece by piece and really verify that this was strictly organizational at the time of merging

@yuyichao
Copy link
Contributor

A Google search leads to this and similar ones which suggests that git diff -M/-C should work.

Edit: although I just give it a try and seems that it only work on file level (so splitting files won't work).

This was referenced Jul 15, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test This change adds or pertains to unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants