I/O module: express channel mark/revert/commit via context manager #19611

jhh67 · 2022-04-05T22:46:45Z

Currently, lightweight transactions on channels are implemented via channel.mark, channel.revert, and channel.commit. Could lightweight transactions be expressed via a context manager?

The text was updated successfully, but these errors were encountered:

mppf · 2022-05-17T20:11:16Z

In theory I think this might be possible but the context managers don't have the possibility for two different exit conditions, so I'm not sure how you would indicate revert vs commit.

Conceivably, that could be communicated by a variable (say that the context manager gives a ref to) or by throwing (and having the context manager think that throwing means it should revert).

However, it seems to me that revert and commit methods are clearer than either of those two strategies.

mppf · 2023-04-27T14:04:44Z

In theory I think this might be possible but the context managers don't have the possibility for two different exit conditions, so I'm not sure how you would indicate revert vs commit.

I think the main drawback of the method is that one might forget to call commit or revert in some code path. We could use a “return an object representing it being marked” and allow calling the same method on it to indicate that it is done.

But, come to think of it, I think the same approach can work with a context manager You can call a method (commit or revert) on something involved in the context managing, and then if that was done, it wouldn’t do anything when exiting the scope. I think a next step here would be to sketch out what such code would look like when using the context manager.

lydia-duncan · 2023-06-07T17:38:26Z

Summary for ad-hoc sub-team meeting next week:

1. What is the API being presented? How is it intended to be used?

proc fileReader.mark() throws
proc fileWriter.mark() throws
proc fileReader.revert()
proc fileWriter.revert()
proc fileReader.commit()
proc fileWriter.commit()

These methods enable attempting minor changes to the fileReader/fileWriter and either undoing those changes or saving them once the experimentation is done. A location in the file is marked with mark, the operation is attempted, and then either commit (if the operation was successful) or revert (if the operation was unsuccessful) is called by the user. revert will leave the current offset in the fileReader/fileWriter unchanged, while commit will set the fileWriter to the offset that was saved.

I made a short mock up of a record that performs the offset recording behavior to see what using them in a context manager setting could look like. In this version, the manage statement determines whether to commit or revert the changes based on if an error was thrown from the body of the manage statement.

use List;

record ch {
  var x: list(int);
  var filePos = 0;

  proc enterThis() ref: int {
    writeln("currentPos is: ", filePos);
    return mark();
  }

  proc mark() ref{
    x.pushBack(filePos);
    return x.last();
  }

  proc leaveThis(in err: owned Error?) {
    if err then revert(); // We could do something here to rethrow the error so it's obvious that the changes were reverted
    else commit();
  }

  proc revert() {
    // Drop the buffer
    x.popBack();
  }

  proc commit() {
    // Flush the buffer
    filePos = x.popBack();
  }
}

proc main() {
  var c = new ch();
  writeln(c.filePos);
  manage c as offset {
    writeln(offset);
    c.filePos = 3;
    writeln(c.filePos);
    c.filePos = 7;
    writeln(c.filePos);
    c.filePos = 2;
    writeln(c.filePos);
    //throw new Error ("oops!"); // When this is uncommented, `revert` will be called instead of commit upon exiting the manage statement
  }
  writeln(c.filePos);
}

It seems to behave as one would expect. I think if we provide this support, we'd probably expect the user to not directly call mark, commit or revert themselves and let it be handled by the context manager, with an intentionally thrown error being the way to distinguish between how to end the manage statement. That'll lead to less scenarios where the mark stack gets cleared early or unintentionally due to the body of the manage statement. Though one could make the argument that throwing an error to deal with it could be heavier weight than what the user would normally do (or the user could just rely on errors thrown from the attempt transactions themselves rather than catching it and performing the revert as a response?)

How is it being used in Arkouda and CHAMPS?

These functions are not used in Arkouda or CHAMPS

2. What's the history of the feature, if it already exists?

These methods were added 12 years ago and have mostly seen minor refactoring since then. The main change is that mark now returns the offset that was saved.

3. What's the precedent in other languages, if they support it?

a. Python does not seem to support this
b. C/C++ has putback and unget methods on istreams, but that isn't as broad of support as what we provide today
c. Rust has rewind which only goes back to the beginning of the stream and doesn't seem to undo any buffer modifications.
d. Swift does not seem to support this
e. Julia has mark, unmark, ismarked and reset which allows you to go back to a specifically noted point, but does not seem to undo transactions.
f. Go has reset for Reader and for Writer which discards changes and goes to a different location.

4. Are there known Github issues with the feature?

There are no currently open issues with these functions. The issue open about mark has been resolved by returning the offset (mentioned here)

5. Are there features it is related to? What impact would changing or adding this feature have on those other features?

QIO_CH_ALWAYS_UNBUFFERED doesn't play well with this feature (How to handle QIO_CH_ALWAYS_UNBUFFERED in the IO module? #20432), so it's something to be aware of for users
These functions are used frequently in tandem with advance, if renamings happen as a result of that discussion it could impact their names
It can be helpful when dealing with I/O module: where should I/O read calls leave the fileReader on failure? #21345 or as part of implementation changes made to support it
It can be used in tandem with the ioLiteral/ioNewline replacements for speculative reads

6. How do you propose to solve the problem?

A. Leave things as is

Pros:
- So far as we know, things work the way they are now so we don't necessarily need to change anything
Cons:
- Missing a potential opportunity to rely on this newer feature more and get more experience with it.

B. Use manage statements, either as outlined above or in another way I'm not thinking of

Pros:
- Puts more weight on a newer feature
- Arguably a reasonable and natural use for manage statements
- Allows us to expose less code to the user
- Makes it more obvious in the code that the transactions are grouped together and when a natural checkpoint occurs.
Cons:
- This is not used super frequently, so we might not notice if we made a mistake in the transition for a while

lydia-duncan · 2023-06-14T16:25:25Z

Discussion summary:

In addition to what was present above, I also included the following code to have a real-life example of what such an adjustment would look like (at Jeremiah's suggestion):

// Adjusted version of https://github.com/chapel-lang/chapel/blob/809fed0f3e5bf1b36145728171513e9ace51f806/modules/standard/IO.chpl#L6830
proc fileReader.readThrough(separator: string, ref s: string, maxSize=-1,
                            stripSeparator=false): bool throws {
  on this._home {
    try this.lock(); defer { this.unlock(); }

    const (searchErr, found, bytesOffset) =
      _findSeparator(separator, 4*maxSize, this._channel_internal);
    if searchErr != 0 && searchErr != EEOF then
      try this._ch_ioerror(searchErr, "in readThrough(string)");

    const bytesToRead = if found then
      bytesOffset + separator.numBytes else bytesOffset;

    // Note: this is the statements that were changed.  In the meeting, Ben suggested that
    // the contents of the manage statement (and the `if maxSize >=0` else branch) could
    // be transformed into a helper function, which would reduce the code duplication.
    // Michael noted in the meeting that this code was already relying on throwing, so is
    // not negatively impacted by the switch to using manage statements in this way
    if maxSize >= 0 {
      manage this as offset {
        const err = readStringBytesData(s, this._channel_internal, bytesToRead,
                                        -1);
        if err {
          try this._ch_ioerror(err, "in readThrough(string)");
        } else {
          if maxSize >= 0 && s.numCodepoints > maxSize {
            try this._ch_ioerror(EFORMAT:errorCode, "in readThrough(string)");
          }
        }
      }
    } else {
      const err = readStringBytesData(s, this._channel_internal, bytesToRead,
                                      -1);
      if err {
        try this._ch_ioerror(err, "in readThrough(string)");
      } else {
        if maxSize >= 0 && s.numCodepoints > maxSize {
          try this._ch_ioerror(EFORMAT:errorCode, "in readThrough(string)");
        }
      }
    }

    if found && stripSeparator then
      s = s[0..<s.numCodepoints-separator.numCodepoints];
  }
  return s.size > 0;
}

We also looked at its helper function,

chapel/modules/standard/IO.chpl

Line 7050 in 809fed0

 private proc _findSeparator(separator: ?t, maxBytes=-1, ch_internal): (errorCode, bool, int) 

, which would not be as straight-forward an adjustment due to the fact that it tries to return immediately after reverting (though it still seemed possible). Because this function does not already throw, it would be more of a burden to make it do so.

We decided to keep the current strategy of mark/commit/revert but leave notes for enabling context manager support in the future - it seemed there were scenarios where both were useful, though we would strongly discourage intermingling the two strategies. We would also like more design on the context manager strategy before stabilizing it.

We also discussed whether we should considering renaming these functions. We decided that the current names were good.

Discussion notes:

Michael pointed out that the Julia strategy is equivalent to ours in execution, since we don't explicitly clear out the things that were written to the buffer, we just assume they will be overwritten by any new action and don't write anything beyond the current point (so there isn't a risk of writing beyond what was most recently recorded there).
Michael: in terms of the names, people had strong ideas about what reset meant in the context of timers so that might apply to the name here
- Lydia: as someone that had strong ideas for stopwatch, that was mostly based on the physical concept of stopwatches, so wouldn't read into it in this context
- Lydia: "commit" feels like a stronger statement than "unmark", which is probably why I thought Julia was doing something different than we are.

dlongnecke-cray · 2023-06-14T18:29:23Z

The context manager path seems super cool. If you ran into difficulties or complications I'd be curious to know if there are opportunities to improve the language feature.

I think it's perfectly reasonable to only support the lower level calls for now and then introduce the context manager in the future. Re: not wanting to mingle the two paths, producing the manager could set a bit on the channel which we could use to emit runtime warnings, e.g., manual calls to mark/commit/revert when managing this channel are ill advised.

@lydia-duncan

Improves the docs for the `mark`/`commit`/`revert` methods on fileReader/fileWriter: - Minor wording/clarity improvements - The meaning of `revert` and `commit` were swapped. This is corrected - Some method links are shortened with `~` to make docs more readable - Clarifies that a fileReader/fileWriter must be manually locked before calling `mark` even if it has `locking=true` Per the discussion [here](#19611), also adds a comment about our desire to support mark/commit/revert via a context manager in the future. - [x] inspected built docs - [x] paratest [ reviewed by @lydia-duncan ] - Thanks!

jhh67 added type: Design area: Libraries / Modules type: Stabilization labels Apr 5, 2022

lydia-duncan mentioned this issue Jun 8, 2023

I/O module: replace ioLiteral and ioNewline #19487

Closed

lydia-duncan removed the type: Stabilization label Jun 14, 2023

jeremiah-corrado mentioned this issue Jul 6, 2023

Feature Request: more convenient mark-commit interface for fileReader.read #22676

Open

jeremiah-corrado mentioned this issue Aug 29, 2023

Improve docs for mark/commit/revert #23131

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

I/O module: express channel mark/revert/commit via context manager #19611

I/O module: express channel mark/revert/commit via context manager #19611

jhh67 commented Apr 5, 2022

mppf commented May 17, 2022 •

edited

Loading

mppf commented Apr 27, 2023

lydia-duncan commented Jun 7, 2023

lydia-duncan commented Jun 14, 2023

dlongnecke-cray commented Jun 14, 2023

I/O module: express channel mark/revert/commit via context manager #19611

I/O module: express channel mark/revert/commit via context manager #19611

Comments

jhh67 commented Apr 5, 2022

mppf commented May 17, 2022 • edited Loading

mppf commented Apr 27, 2023

lydia-duncan commented Jun 7, 2023

1. What is the API being presented? How is it intended to be used?

How is it being used in Arkouda and CHAMPS?

2. What's the history of the feature, if it already exists?

3. What's the precedent in other languages, if they support it?

4. Are there known Github issues with the feature?

5. Are there features it is related to? What impact would changing or adding this feature have on those other features?

6. How do you propose to solve the problem?

lydia-duncan commented Jun 14, 2023

dlongnecke-cray commented Jun 14, 2023

mppf commented May 17, 2022 •

edited

Loading