Skip to content

Conversation

@JackStouffer
Copy link
Contributor

@JackStouffer JackStouffer commented May 7, 2017

These functions are old D1 string functions from before std.regex's existence. With std.regex and std.algorithm, these functions are no longer needed, and their use should be discouraged, as the std.algorithm versions will be faster (better search algorithms, better handling of auto-decoding).

Plus, this removes a domain specific language that's in use only in one minor part of D.

@JackStouffer
Copy link
Contributor Author

looks like there are some uses of munch in std.xml. One sec

std/string.d Outdated
* See if character c is in the intersection of the patterns.
*/

deprecated("The std.string pattern functions are obsolete and will be removed May 2018. Use std.regex instead")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to use the deprecated comment, s.t. this can be grepped for easily, e.g.

 // Explicitly undocumented. It will be removed in May 2018. @@@DEPRECATED_2018-05@@@

CC @jmdavis

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be the wrong comment. Ideally, a comment like @@@DEPRECATED_2018-05@@@ is there so that it indicates when the function will be removed from the documentation, and I can grep for it. But lines with "Explicitly undocumented" are only for functions which are actually undocumented.

The normal procedure is to deprecate the function and mark it as deprecated in the documentation. Then after it's been deprecated for a year, the documentation is removed, and a comment like like the one you listed is put there instead. Then, after another year, the function is fully removed.

@JackStouffer JackStouffer force-pushed the pattern branch 2 times, most recently from 4d63b71 to e22aa21 Compare May 8, 2017 02:51
@JackStouffer
Copy link
Contributor Author

Ok, took a bit longer than expected, but I fixed std.xml

@DmitryOlshansky
Copy link
Member

DmitryOlshansky commented May 8, 2017 via email

@JackStouffer
Copy link
Contributor Author

Ok, I don't want to turn this into a discussion about proper benchmarking code so I deleted that comment.

Let's instead focus on the merit of the deprecations.

@wilzbach
Copy link
Contributor

wilzbach commented May 8, 2017

Let's instead focus on the merit of the deprecations.

Ok, this definitely requires the approval of @andralex.
I know too little about the history of std.string to be able to judge this, but they definitely look old as e.g. removechars would be named removeChars or removePattern nowadays.
However especially from the PoV that we want to get rid of auto-decoding this seams to me a step in the right direction!

@andralex
Copy link
Member

andralex commented May 9, 2017

On the plus side, we improve consistency of the stdlib API and we reduce its code size (albeit by a small amount). On the minus side, we deprecate existing uses without a simple migration path. Consider someone who's not the author of the project but a user. They'd need to start doing nontrivial code surgery on a foreign project to use regexes instead of patterns etc.

For this kind of evolution I want us to define a path for marginalizing/obsoleting the code without removing it. Mark it in a specific way in the documentation, later make it only accessible if you click on "See obsolete APIs", etc. That way new code will not use it but old code will continue to work. Some libraries do this with good results. cc @MartinNowak @CyberShadow @WalterBright

@andralex andralex removed the @andralex label May 9, 2017
@JackStouffer
Copy link
Contributor Author

Mark it in a specific way in the documentation, later make it only accessible if you click on "See obsolete APIs", etc.

Isn't this why undeaD exists?

@CyberShadow
Copy link
Member

+1 for moving these to undeaD.

FWIW an anecdotal data point: I do a lot of text processing in D and I've never used these functions.

@jmdavis
Copy link
Member

jmdavis commented May 9, 2017

As to the history, a lot of the std.string functions were in D1 and had names which did not match Phobos' current naming schemes, and several years ago, when I fixed the names in std.string to follow Phobos' current naming scheme, I did not fix the pattern functions, because the idea was that we were going to replace them with functions with the proper capitalization that used regexes instead of the patterns (and if we were going to do that, it made no sense to make folks change their code to use the new capitalization and then make them change their code to use std.regex). However, no one has ever done the work to create the new versions that use std.regex, so they've just sat there.

I'm all for moving them to undead, but if we do that, we'll need to go through the normal deprecation process for them in Phobos, not simply remove them. And if we remove them, it would be nice to replace them with versions that use std.regex, but someone would have to do that work, and I don't know how much use these functions get (I've never used them).

@JackStouffer
Copy link
Contributor Author

I think you'd be hard pressed to find something these functions can do that can't already be done with std.regex + std.algorithm

@jmdavis
Copy link
Member

jmdavis commented May 9, 2017

I think you'd be hard pressed to find something these functions can do that can't already be done with std.regex + std.algorithm

Then there isn't much point in replacing them in std.string.

@JackStouffer
Copy link
Contributor Author

Then there isn't much point in replacing them in std.string.

Exactly?

Maybe we're talking past each other, but I'm not sure if you're for or against having these functions go through the deprecated -> undocumented -> removed -> undeaD cycle.

@andralex
Copy link
Member

andralex commented May 9, 2017

The way I see evolution is marginalize -> undocument -> deprecate -> move to undead.

@jmdavis
Copy link
Member

jmdavis commented May 10, 2017

Maybe we're talking past each other, but I'm not sure if you're for or against having these functions go through the deprecated -> undocumented -> removed -> undeaD cycle.

I'm in favor of this except that it should go in undead immediately so that folks can just switch to the undead versions if they want to keep using them without continuing to see deprecation messages.

The way I see evolution is marginalize -> undocument -> deprecate -> move to undead.

The normal process has been to deprecate and then undocument and then remove. Then the folks who see the deprecation message can look at the documentation unless they're slow to update (and then they can look at the older docs, though that's a bit of a pain right now, since we don't really have versioned docs on the site), and I don't see much point in undocumenting without deprecating. Presumably, folks won't use functions in new code if they're deprecated, and undocumenting without deprecating won't lead to anyone updating their code for a while longer, whereas folks who know about the functions but don't look at the docs will continue to write code using them that they'll then have to change later once the functions are actually deprecated, meaning that more code would have to be changed than if we simply deprecated them now. And if we think that these functions merit being in undead, then putting them in undead when we deprecate them makes it easy for folks to update their code and continue on their merry way if they don't want to actually rework it to use std.regex.

We'd been doing deprecate -> undocument -> remove for years now, and as far as I can tell, it's worked just fine.

@JackStouffer
Copy link
Contributor Author

I'll make a PR for undeaD

@JackStouffer
Copy link
Contributor Author

dlang/undeaD#21

Copy link
Contributor

@wilzbach wilzbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially with undead an easy migration path is provided and if people really want use these old functions, which I hardly can imagine, they can simply switch to undead.
If it's necessary / demanded by the community, we could always create a simple tool that parses a codebase for these symbols and replaces their occurrences with their respective ones from undead, which would mean there's a zero cost migration path and we can keep Phobos nice & clean and avoid becoming the next C++.

@JackStouffer
Copy link
Contributor Author

Changed the deprecation message so it's more likely that people will see the undeaD link.

@dlang-bot dlang-bot merged commit 7031ed1 into dlang:master May 12, 2017
@JackStouffer JackStouffer deleted the pattern branch May 12, 2017 18:50
@wilzbach
Copy link
Contributor

wilzbach commented Jun 8, 2017

@JackStouffer it would be very nice to have a short changelog entry about this, s.t. people have a handy guide, e.g.

munch -> countUntil
inPattern -> !str.matchFirst(regex(p)).empty
inPattern -> patterns.any!(p => !str.matchFirst(regex(p)).empty)
countchars -> str.matchAll(regex(p)).walkLength
removechars -> str.replaceAll(regex(p), "")
squeeze -> str.group.map!a[0]

@andralex
Copy link
Member

andralex commented Jun 8, 2017

Also, I confess I'm unhappy about this kind of deprecation whereby clunky but correct and perfectly working functions get eliminated. I much more prefer marginalization - i.e. first step is to work on the docs to make the functions more difficult to find and less appealing (e.g. by requiring an extra click to access, mentioning things like "This function is obsolete, prefer using xxxx in new code" etc. Can we do this going forward? It has most of the advantages of deprecation with fewer drawbacks. cc @MartinNowak

@jmdavis
Copy link
Member

jmdavis commented Jun 8, 2017

I have mixed feelings about that approach. In general, I think that marking something as going to be deprecated or scheduled for deprecation just means that folks use it in new code for longer, and more code will then have to be changed later, especially since many folks aren't going to see the note in the documentation. And when we're replacing one symbol with another, it tends to be detrimental to leave the old one around (even if we do have to actually give folks time to switch from one to the other), and delaying the deprecation, delays removing the old symbol.

However, in a case like this, where we're basically saying that we're just getting rid of existing functionality, I see no problem with trying to marginalize the functionality first. We really don't want something that we're looking to get rid of sitting around forever, but delaying it a bit usually doesn't matter much. And in the cases where we're replacing one symbol with another, I do try and make sure that we have a release where we have both the old and the new symbols with the old marked as "scheduled" for deprecation rather than actually deprecated so that it's actually possible to build code with both the latest release and master without getting deprecation messages. So, in cases like this, we could take the same approach, or even drag it out to more like 6 months if we really wanted to.

But I'm not at all convinced that we'll actually reduce the pain of deprecation that way, since some folks will continue to use the symbols in new code without realizing that they're going away, and for the most part, old code won't be changed until the compiler complains. However, if that's where we want to go, I'm fine with taking that approach when we're not replacing the symbol but simply removing it - especially if it's a symbol that's been around for a long time (as is the case here). But I'd prefer that we not go that route when we're actually replacing a symbol, especially when switching the code from one symbol to the other is easy.

@quickfur
Copy link
Member

FYI, this PR has broken Jenkins CI on all PRs, because dyaml uses one of the deprecated functions.

@wilzbach
Copy link
Contributor

FYI, this PR has broken Jenkins CI on all PRs, because dyaml uses one of the deprecated functions.

Huh?
D-YAML, EMSI and Vibe.d are broken due to a regression introduced in dlang/dmd#6816

The UI requires a little getting used to ;-)

@quickfur
Copy link
Member

Oh nevermind, I misread the script output. Missed the line immediately following that complained about the dtor not being @safe.

@JohanEngelen
Copy link
Contributor

@wilzbach wrote:

it would be very nice to have a short changelog entry about this, s.t. people have a handy guide,

In my opinion, it should be mandatory to have an extensive piece of documentation about how to migrate from the to-be-deprecated function to other functionality. That would also be useful for judging whether the deprecation is OK for all use cases. That documentation should be added to each of the deprecated functions, not in the release notes as it currently is.

@andralex I agree with your marginalization approach. That would perhaps also naturally lead to more documentation on the migration path.

@wilzbach
Copy link
Contributor

That documentation should be added to each of the deprecated functions, not in the release notes as it currently is.

I agree, I will try to ensure the next deprecations include a short link to a detailed explanation if the deprecation message is to short.
( I also put this on my list to add this to the contribution guidelines). Thanks for the reminder!

@jmdavis
Copy link
Member

jmdavis commented Oct 12, 2017

@wilzbach I would point out that in the vast majority of cases, when something is deprecated, the deprecation message tells you what to use, and this is a non-issue. Additionally, the documentation for a deprecated symbol normally tells you what to go use and at least some of the time explains why the symbol in question is being deprecated.

I think that we really only have a problem here, because we're essentially just ripping out functionality for which we don't have a direct replacement, requiring the programmer to figure out how to do what they want to do differently. The closest equivalent to that was when we ripped out std.stream. We do provide functionality in Phobos that can be used to do the same thing, but because there is no direct replacement, it makes it so that the deprecation message certainly can't be informative enough, and telling folks what to use instead isn't necessarily straightforward either. So, we probably should have done a better job of explaining possible replacements, but in general, I don't think that we've had a problem with deprecations needing additional documentation. This case is not the norm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants