Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I/O module: express channel mark/revert/commit via context manager #19611

Open
jhh67 opened this issue Apr 5, 2022 · 5 comments
Open

I/O module: express channel mark/revert/commit via context manager #19611

jhh67 opened this issue Apr 5, 2022 · 5 comments

Comments

@jhh67
Copy link
Contributor

jhh67 commented Apr 5, 2022

Currently, lightweight transactions on channels are implemented via channel.mark, channel.revert, and channel.commit. Could lightweight transactions be expressed via a context manager?

@mppf
Copy link
Member

mppf commented May 17, 2022

In theory I think this might be possible but the context managers don't have the possibility for two different exit conditions, so I'm not sure how you would indicate revert vs commit.

Conceivably, that could be communicated by a variable (say that the context manager gives a ref to) or by throwing (and having the context manager think that throwing means it should revert).

However, it seems to me that revert and commit methods are clearer than either of those two strategies.

@mppf
Copy link
Member

mppf commented Apr 27, 2023

In theory I think this might be possible but the context managers don't have the possibility for two different exit conditions, so I'm not sure how you would indicate revert vs commit.

I think the main drawback of the method is that one might forget to call commit or revert in some code path. We could use a “return an object representing it being marked” and allow calling the same method on it to indicate that it is done.

But, come to think of it, I think the same approach can work with a context manager You can call a method (commit or revert) on something involved in the context managing, and then if that was done, it wouldn’t do anything when exiting the scope. I think a next step here would be to sketch out what such code would look like when using the context manager.

@lydia-duncan
Copy link
Member

Summary for ad-hoc sub-team meeting next week:

1. What is the API being presented? How is it intended to be used?

proc fileReader.mark() throws
proc fileWriter.mark() throws
proc fileReader.revert()
proc fileWriter.revert()
proc fileReader.commit()
proc fileWriter.commit()

These methods enable attempting minor changes to the fileReader/fileWriter and either undoing those changes or saving them once the experimentation is done. A location in the file is marked with mark, the operation is attempted, and then either commit (if the operation was successful) or revert (if the operation was unsuccessful) is called by the user. revert will leave the current offset in the fileReader/fileWriter unchanged, while commit will set the fileWriter to the offset that was saved.

I made a short mock up of a record that performs the offset recording behavior to see what using them in a context manager setting could look like. In this version, the manage statement determines whether to commit or revert the changes based on if an error was thrown from the body of the manage statement.

use List;

record ch {
  var x: list(int);
  var filePos = 0;

  proc enterThis() ref: int {
    writeln("currentPos is: ", filePos);
    return mark();
  }

  proc mark() ref{
    x.pushBack(filePos);
    return x.last();
  }

  proc leaveThis(in err: owned Error?) {
    if err then revert(); // We could do something here to rethrow the error so it's obvious that the changes were reverted
    else commit();
  }

  proc revert() {
    // Drop the buffer
    x.popBack();
  }

  proc commit() {
    // Flush the buffer
    filePos = x.popBack();
  }
}

proc main() {
  var c = new ch();
  writeln(c.filePos);
  manage c as offset {
    writeln(offset);
    c.filePos = 3;
    writeln(c.filePos);
    c.filePos = 7;
    writeln(c.filePos);
    c.filePos = 2;
    writeln(c.filePos);
    //throw new Error ("oops!"); // When this is uncommented, `revert` will be called instead of commit upon exiting the manage statement
  }
  writeln(c.filePos);
}

It seems to behave as one would expect. I think if we provide this support, we'd probably expect the user to not directly call mark, commit or revert themselves and let it be handled by the context manager, with an intentionally thrown error being the way to distinguish between how to end the manage statement. That'll lead to less scenarios where the mark stack gets cleared early or unintentionally due to the body of the manage statement. Though one could make the argument that throwing an error to deal with it could be heavier weight than what the user would normally do (or the user could just rely on errors thrown from the attempt transactions themselves rather than catching it and performing the revert as a response?)

How is it being used in Arkouda and CHAMPS?

These functions are not used in Arkouda or CHAMPS

2. What's the history of the feature, if it already exists?

These methods were added 12 years ago and have mostly seen minor refactoring since then. The main change is that mark now returns the offset that was saved.

3. What's the precedent in other languages, if they support it?

a. Python does not seem to support this
b. C/C++ has putback and unget methods on istreams, but that isn't as broad of support as what we provide today
c. Rust has rewind which only goes back to the beginning of the stream and doesn't seem to undo any buffer modifications.
d. Swift does not seem to support this
e. Julia has mark, unmark, ismarked and reset which allows you to go back to a specifically noted point, but does not seem to undo transactions.
f. Go has reset for Reader and for Writer which discards changes and goes to a different location.

4. Are there known Github issues with the feature?

There are no currently open issues with these functions. The issue open about mark has been resolved by returning the offset (mentioned here)

5. Are there features it is related to? What impact would changing or adding this feature have on those other features?

6. How do you propose to solve the problem?

A. Leave things as is

  • Pros:
    • So far as we know, things work the way they are now so we don't necessarily need to change anything
  • Cons:
    • Missing a potential opportunity to rely on this newer feature more and get more experience with it.

B. Use manage statements, either as outlined above or in another way I'm not thinking of

  • Pros:
    • Puts more weight on a newer feature
    • Arguably a reasonable and natural use for manage statements
    • Allows us to expose less code to the user
    • Makes it more obvious in the code that the transactions are grouped together and when a natural checkpoint occurs.
  • Cons:
    • This is not used super frequently, so we might not notice if we made a mistake in the transition for a while

@lydia-duncan
Copy link
Member

Discussion summary:

In addition to what was present above, I also included the following code to have a real-life example of what such an adjustment would look like (at Jeremiah's suggestion):

// Adjusted version of https://github.com/chapel-lang/chapel/blob/809fed0f3e5bf1b36145728171513e9ace51f806/modules/standard/IO.chpl#L6830
proc fileReader.readThrough(separator: string, ref s: string, maxSize=-1,
                            stripSeparator=false): bool throws {
  on this._home {
    try this.lock(); defer { this.unlock(); }

    const (searchErr, found, bytesOffset) =
      _findSeparator(separator, 4*maxSize, this._channel_internal);
    if searchErr != 0 && searchErr != EEOF then
      try this._ch_ioerror(searchErr, "in readThrough(string)");

    const bytesToRead = if found then
      bytesOffset + separator.numBytes else bytesOffset;

    // Note: this is the statements that were changed.  In the meeting, Ben suggested that
    // the contents of the manage statement (and the `if maxSize >=0` else branch) could
    // be transformed into a helper function, which would reduce the code duplication.
    // Michael noted in the meeting that this code was already relying on throwing, so is
    // not negatively impacted by the switch to using manage statements in this way
    if maxSize >= 0 {
      manage this as offset {
        const err = readStringBytesData(s, this._channel_internal, bytesToRead,
                                        -1);
        if err {
          try this._ch_ioerror(err, "in readThrough(string)");
        } else {
          if maxSize >= 0 && s.numCodepoints > maxSize {
            try this._ch_ioerror(EFORMAT:errorCode, "in readThrough(string)");
          }
        }
      }
    } else {
      const err = readStringBytesData(s, this._channel_internal, bytesToRead,
                                      -1);
      if err {
        try this._ch_ioerror(err, "in readThrough(string)");
      } else {
        if maxSize >= 0 && s.numCodepoints > maxSize {
          try this._ch_ioerror(EFORMAT:errorCode, "in readThrough(string)");
        }
      }
    }

    if found && stripSeparator then
      s = s[0..<s.numCodepoints-separator.numCodepoints];
  }
  return s.size > 0;
}

We also looked at its helper function,

private proc _findSeparator(separator: ?t, maxBytes=-1, ch_internal): (errorCode, bool, int)
, which would not be as straight-forward an adjustment due to the fact that it tries to return immediately after reverting (though it still seemed possible). Because this function does not already throw, it would be more of a burden to make it do so.

We decided to keep the current strategy of mark/commit/revert but leave notes for enabling context manager support in the future - it seemed there were scenarios where both were useful, though we would strongly discourage intermingling the two strategies. We would also like more design on the context manager strategy before stabilizing it.

We also discussed whether we should considering renaming these functions. We decided that the current names were good.

Discussion notes:

  • Michael pointed out that the Julia strategy is equivalent to ours in execution, since we don't explicitly clear out the things that were written to the buffer, we just assume they will be overwritten by any new action and don't write anything beyond the current point (so there isn't a risk of writing beyond what was most recently recorded there).
  • Michael: in terms of the names, people had strong ideas about what reset meant in the context of timers so that might apply to the name here
    • Lydia: as someone that had strong ideas for stopwatch, that was mostly based on the physical concept of stopwatches, so wouldn't read into it in this context
    • Lydia: "commit" feels like a stronger statement than "unmark", which is probably why I thought Julia was doing something different than we are.

@dlongnecke-cray
Copy link
Contributor

The context manager path seems super cool. If you ran into difficulties or complications I'd be curious to know if there are opportunities to improve the language feature.

I think it's perfectly reasonable to only support the lower level calls for now and then introduce the context manager in the future. Re: not wanting to mingle the two paths, producing the manager could set a bit on the channel which we could use to emit runtime warnings, e.g., manual calls to mark/commit/revert when managing this channel are ill advised.

jeremiah-corrado added a commit that referenced this issue Aug 29, 2023
Improves the docs for the `mark`/`commit`/`revert` methods on
fileReader/fileWriter:

- Minor wording/clarity improvements
- The meaning of `revert` and `commit` were swapped. This is corrected
- Some method links are shortened with `~` to make docs more readable
- Clarifies that a fileReader/fileWriter must be manually locked before
calling `mark` even if it has `locking=true`

Per the discussion
[here](#19611), also adds a
comment about our desire to support mark/commit/revert via a context
manager in the future.

- [x] inspected built docs
- [x] paratest

[ reviewed by @lydia-duncan ] - Thanks!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants