
#171 in place edit #179

Open · Rahel-A wants to merge 9 commits into main
Conversation

@Rahel-A Rahel-A commented Mar 11, 2018

These features were added:

  • The argument "-i[backup-file-suffix]". If a suffix is provided, the original file is first copied to this backup file. No warning is given if the backup file matches the original file (something to consider?)
  • The input file must be given with "-f[input-file]".

Issues encountered:

  • Reading the input file lazily keeps it open, so the file is blocked when we later try to write to it.
  • consumeExtra in commonSeparators consumes the expression first and skips the input file, so "-f" is used to identify the file.
TODO:

  • Write test cases
  • Consider an alternative write method?
    • Write stdout to temporary file, and then overwrite the original
    • Would that still not have problems with lazy read?

Rahel Ahmed added 3 commits March 11, 2018 14:04
TODO: Output to files
Currently blocked by lazy read
@gelisam gelisam (Owner) commented Mar 11, 2018

"-f" is used to identify file.

Please keep the original API, as discussed. It would be fine if the file could be specified either using -f or using a positional argument, but I don't want to remove the positional API unless there is a good reason to.

@gelisam gelisam (Owner) commented Mar 11, 2018

  • Consider an alternative write method?
    • Write stdout to temporary file, and then overwrite the original
    • Would that still not have problems with lazy read?

In your original email you said you wanted to "write to a temporary file, and in-case of success, overwrite the input file". This does sound like the right approach to me. Which "lazy read" problems are you anticipating?
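The approach described here can be sketched roughly as follows. This is a minimal sketch, not Hawk's actual code; inPlaceEdit and transform are hypothetical names:

```haskell
-- Minimal sketch of "write to a temporary file, then overwrite on success".
-- Reading the input lazily keeps its handle open; writing the transformed
-- output forces the input (provided the transform consumes all of it --
-- the lazy-read subtlety discussed later in this thread).
import qualified Data.ByteString.Lazy.Char8 as B
import System.Directory (copyFile, renameFile)
import System.FilePath ((<.>))

inPlaceEdit :: Maybe String -> (B.ByteString -> B.ByteString) -> FilePath -> IO ()
inPlaceEdit backupSuffix transform file = do
    s <- B.readFile file            -- lazy: the handle stays open for now
    let tmp = file <.> "tmp"
    B.writeFile tmp (transform s)   -- writing consumes s, closing the input
    case backupSuffix of
        Nothing     -> return ()
        Just suffix -> copyFile file (file ++ suffix)
    renameFile tmp file             -- only reached if the write succeeded
```

Because renameFile only runs after the temporary file has been fully written, a failure in the transform leaves the original file untouched.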

@gelisam gelisam (Owner) commented Mar 11, 2018

No warning is given if backup file matches original file (something to consider?)

The original file? This would only happen if the suffix is the empty string; do you still back up the file if that's the suffix that was given? Or did you mean that there is no warning if the original file plus the suffix gives an existing file? What is the behaviour of sed and gawk in that case?

@@ -35,8 +35,7 @@ data InputSource

data OutputSink
= UseStdout
-- OutputFile FilePath -- we might want to implement --in-place
-- in the future
| OutputFile FilePath FilePath
gelisam (Owner):

Please add a comment explaining which FilePath is which, or even better, create a record whose names clarify their purpose.

Rahel-A (Author):

OK, I will clarify in the code: the first FilePath is the output file, and the second FilePath is the full backup file name; if it is null ("") then no backup is created.

gelisam (Owner):

How about using Maybe FilePath for the backup file, then?

@@ -20,18 +21,28 @@ import System.Console.Hawk.Runtime.Base

data SomeRows = forall a. Rows a => SomeRows a

-- Separate the lazy and strict versions so we don't get blocked
gelisam (Owner):

I would prefer to use the "write to a temporary and copy over original on success" approach and to remain lazy.

hFlush stdout
case out of
UseStdout -> do B.putStr s; hFlush stdout
OutputFile f "" -> do B.writeFile f s; hFlush stdout
gelisam (Owner):

Why flush stdout if we write to a file, not stdout?

Rahel-A (Author):

I suppose we are not using file handles, so there's no need to flush?

gelisam (Owner):

Indeed.

OutputFile f backup -> do
copyFile f backup
B.writeFile f s;
hFlush stdout
gelisam (Owner):

Could we move the backup (and eventually the "copy over original on success") logic out of the runtime? I would prefer to keep the runtime as small as possible.

gelisam (Owner):

On second thought, since OutputSpec is defined in the runtime, I guess it does make more sense for that logic to be in the runtime.

@@ -68,6 +75,8 @@ instance Option HawkOption where
longName RecordDelimiter = "record-delimiter"
longName OutputFieldDelimiter = "output-field-delim"
longName OutputRecordDelimiter = "output-record-delim"
longName InPlaceEdit = "in-place-edit"
gelisam (Owner):

I would prefer in-place

@@ -78,6 +87,8 @@ instance Option HawkOption where
helpMsg RecordDelimiter = ["default '\\n'"]
helpMsg OutputFieldDelimiter = ["default <field-delim>"]
helpMsg OutputRecordDelimiter = ["default <record-delim>"]
helpMsg InPlaceEdit = ["edit file in place <backup-suffix>"]
gelisam (Owner):

How about:

["overwrite <file>, optionally creating"
,"a backup copy at <file><suffix>"]

@@ -18,7 +18,7 @@ import System.Console.Hawk.Context.Dir


-- | (record separator, field separator)
type CommonSeparators = (Separator, Separator)
type CommonSeparators = (Separator, Separator, String)
gelisam (Owner):

The input file is not a separator; please rename this to something like "CommonOptions". And while we're modifying this, let's make it a record instead of a tuple, to clarify the purpose of each field.

gelisam (Owner):

Also, if it's a filename, let's use the type synonym FilePath instead of String

else fail "in-place edit requires input file"
else if null backupSuffix
then return UseStdout
else if backupSuffix == "<none>"
gelisam (Owner):

Why not ""?

Rahel-A (Author):

As consumeNullable just returns a string, we do not know whether the argument "-i" was provided or not. So a default value is used to check whether "-i" was provided with no backup suffix.

gelisam (Owner):

Yes, but why not use "" for this default value? The value "<none>" looks like a legal suffix whereas "" does not, so it would make more sense to me if "" were used as the default.

Even better: I'll see if I can refactor this weird nullable API to return a Maybe instead of forcing a default.
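Such a Maybe-style refactor could look something like the following sketch. NullableArg and this list-based consumeNullable signature are illustrative stand-ins, not Hawk's actual option-parsing API:

```haskell
-- Hypothetical sketch: represent "-i" explicitly instead of encoding
-- "option absent" as "" and "given with no suffix" as "<none>".
data NullableArg
    = NotGiven              -- "-i" was not passed at all
    | GivenWithoutArg       -- "-i" was passed with no suffix
    | GivenWithArg String   -- e.g. "-i.bak" gives ".bak"
    deriving (Eq, Show)

consumeNullable :: [String] -> NullableArg
consumeNullable args =
    case [drop 2 a | a <- args, take 2 a == "-i"] of
        []       -> NotGiven
        ("" : _) -> GivenWithoutArg
        (s  : _) -> GivenWithArg s
```

With three cases made explicit, no magic default string can be confused with a legal suffix.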

gelisam (Owner):

Ah, I finally get it: you're not using "" because "" represents the case in which the --in-place option was not given, while "<none>" represents the case in which --in-place was given with no suffix.

| OutputFile
{ outputFile :: String
, backupFile :: Maybe String
}
gelisam (Owner):

Please don't mix record syntax and sum types; this causes outputFile and backupFile to become partial functions:

> outputFile UseStdout
Exception: No match in record selector outputFile

gelisam (Owner):

I would create a separate record type, like this:

data FileSink = FileSink
  { outputFile :: FilePath
  , backupFile :: Maybe FilePath
  }

data OutputSink
  = UseStdout
  | OutputFile FileSink

r' = fromSeparator $ record options
f' = fromSeparator $ field options
file = inputFile options
file' = (fromJust file)
gelisam (Owner):

Partial functions like fromJust are needlessly error-prone, as the compiler won't be able to catch your mistake if you use file' outside of the else branch of your if isNothing file check. Please pattern-match on the Maybe instead:

case inputFile options of
  Nothing -> ...
  Just file -> ... OutputFile file $ Just $ file ++ backupSuffix

Could you instead replace your "if isNothing file then ... else ..." with a "case file of" pattern-match on file instead of using isNothing?

@gelisam gelisam (Owner) left a comment

Thanks for the improvements! There are still a few remaining changes I would like to see:

  • don't mix record syntax and sum types (e.g. by defining a separate record type, or by removing the record selectors and relying on comments to distinguish the two arguments)
  • accept the file name as a positional argument (either removing the new --file argument or keeping both ways of specifying the file name)
  • don't use fromJust
  • write to a temporary file, and in case of success, overwrite the input file
  • revert to lazy bytestrings
  • when manipulating filenames, use FilePath instead of String

If the 40 hours you were planning to spend have elapsed and you no longer have time to make those changes, just tell me so and I'll make the improvements myself. Thanks!

@Rahel-A Rahel-A (Author) commented Mar 18, 2018

don't mix record syntax and sum types (e.g. by defining a separate record type, or by removing the record selectors and relying on comments to distinguish the two arguments)

fixed

don't use fromJust

fixed

when manipulating filenames, use FilePath instead of String

fixed

accept the file name as a positional argument (either removing the new --file argument or keeping both ways of specifying the file name)

I'm not sure how to use the parser to only consume the last part of the string for input file

write to a temporary file, and in case of success, overwrite the input file
revert to lazy bytestrings

Will update you on those two issues later today.

@Rahel-A Rahel-A (Author) commented Mar 18, 2018

OK, I believe I have gotten around the lazy read issue I was having. The input file is overwritten once a temporary file is completely written.

edit: This was my test case:

$ cp test.txt.~ test.txt && hawk -i"~" -ad "sum . L.map toInt" -ftest.txt
$ cat test.txt.~
5
6
7
$ cat test.txt
18

@gelisam gelisam (Owner) commented Mar 18, 2018

accept the file name as a positional argument (either removing the new --file argument or keeping both ways of specifying the file name)

I'm not sure how to use the parser to only consume the last part of the string for input file

Hmm, you're right, this is harder than it seems! I thought it would be as simple as using consumeExtra instead of consumeLast Option.InputFile, but I now realize that this would consume the expression instead of the filename.

The solution would be to consume the expression before the common options. Unfortunately, doing that is not obvious either because we only want to parse the expression (and complain if it's missing) when the main command is Option.Eval, Option.Apply, or Option.Map, not when the main command is Option.Help or Option.Version.

What I would do is to remove the commonOptions call and replace it with a higher-order function withExpr which could be called like this:

help    = return Help
version = return Version
eval    = withExpr $ \c e -> Eval  e <$>                 outputSpec c
apply   = withExpr $ \c e -> Apply e <$> inputSpec c <*> outputSpec c
map'    = withExpr $ \c e -> Map   e <$> inputSpec c <*> outputSpec c

This higher-order function would parse the expression before the common options so that when consuming those common options, the consumeExtra consumes the filename, not the expression.
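As a standalone illustration of the ordering trick, here is a toy version in which the arguments are threaded explicitly as a list; the names mirror the discussion, but the types are simplified stand-ins, not Hawk's actual parser types:

```haskell
-- Toy stand-in for the real parser.  withExpr consumes the expression
-- before anything else, so a later consumeExtra sees the filename
-- rather than the expression.
type Args = [String]

consumeExtra :: Args -> (Maybe String, Args)
consumeExtra []       = (Nothing, [])
consumeExtra (x : xs) = (Just x, xs)

withExpr :: (String -> Args -> (a, Args)) -> Args -> (Maybe a, Args)
withExpr k args =
    case consumeExtra args of
        (Nothing, rest)   -> (Nothing, rest)   -- real code would complain here
        (Just expr, rest) -> let (a, rest') = k expr rest
                             in (Just a, rest')
```

The continuation k receives the expression plus the remaining arguments, so by the time it consumes its own positional argument, only the filename is left.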

@@ -78,6 +87,9 @@ instance Option HawkOption where
helpMsg RecordDelimiter = ["default '\\n'"]
helpMsg OutputFieldDelimiter = ["default <field-delim>"]
helpMsg OutputRecordDelimiter = ["default <record-delim>"]
helpMsg InPlaceEdit = ["overwrite <file>, optionally creating"
,"a backup copy at <file><suffix>"]
gelisam (Owner):

Now that you are adding a . between the file and the suffix, please update the comment to say <file>.<suffix>.

OutputFile (FileSink f Nothing) -> B.writeFile (f <.> tmp) s
OutputFile (FileSink f (Just backup)) -> do
copyFile f backup
B.writeFile (f <.> tmp) s
gelisam (Owner):

I believe there is a function somewhere, probably in filepath or posix, for creating a temporary file in a location which is appropriate for the OS, such as /tmp. It would be better to create a file there, and to delete it afterwards using bracket, than to create a .tmp file in the current directory.
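One possible shape, using base's System.IO.openTempFile together with bracketOnError so the temporary file is deleted on failure and renamed over the target on success. Note this sketch creates the temporary file next to the target rather than in /tmp, so that renameFile stays on a single filesystem; getTemporaryDirectory would be the /tmp-style alternative. writeViaTemp is a hypothetical name:

```haskell
-- Sketch: write output via a temporary file, cleaning up on failure.
import Control.Exception (bracketOnError)
import qualified Data.ByteString.Lazy.Char8 as B
import System.Directory (removeFile, renameFile)
import System.FilePath (takeDirectory)
import System.IO (hClose, openTempFile)

writeViaTemp :: FilePath -> B.ByteString -> IO ()
writeViaTemp target s =
    bracketOnError
        (openTempFile (takeDirectory target) "hawk.tmp")
        (\(tmpPath, h) -> hClose h >> removeFile tmpPath)   -- on failure
        (\(tmpPath, h) -> do
            B.hPut h s
            hClose h
            renameFile tmpPath target)                      -- on success
```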

overwrite :: OutputSpec -> IO ()
overwrite (OutputSpec (OutputFile (FileSink f _)) _) = do
fe <- doesFileExist tmpFile
when fe $ renameFile tmpFile f
gelisam (Owner):

Under which circumstances would the temporary file not exist? If that happens, wouldn't we prefer to let the user know by failing loudly rather than silently doing nothing?

gelisam (Owner):

Now that I am taking a closer look, I see that it could happen if the UncertainT computation ends with an error. In that case, I agree that we don't want to try to perform an overwrite, but I'd rather use UncertainT's monadic short-circuiting to skip over the overwrite than to check whether we've reached the point in the logic where the temporary file has been created. After all, what if the error occurs after the temporary file has been created but before all the contents are written to it? Let's move the overwrite call inside the UncertainT computation.

gelisam (Owner):

Or even better, move the overwrite logic inside outputRows

Rahel-A (Author):

Or even better, move the overwrite logic inside outputRows

I can't move this logic into outputRows because of lazy input reading. The input file is blocked until we force evaluation, which only happens with strict reading.

gelisam (Owner):

I don't understand why laziness would cause any issues here. Lazy reading means that the input file won't be closed until its bytestring has been completely consumed. In outputRows, the B.writeFile (f <.> tmp) s should completely consume the s, so the file should be available immediately after that.

gelisam (Owner):

Oh, but this s is the output bytestring, not the input bytestring! If the user expression only looks at part of the input, e.g. L.take 10, consuming the entire output bytestring won't cause the entire input bytestring to be consumed, so the input file won't be closed.

But why is the input file now considered closed after the myRunUncertainIO block? That seems fishy. I would prefer to make sure the file is closed by manually calling hClose on it. Perhaps we could modify getInputString so it would also return an IO action which ensures the input is closed? We could then pass this action to outputRows, which would call the action immediately after the B.writeFile.

This in-place feature has more subtleties than I expected!
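The getInputString modification suggested above could look roughly like this sketch (hypothetical signature, not the actual Hawk code):

```haskell
-- Hypothetical sketch: return the lazy contents together with an IO
-- action that force-closes the input handle.  outputRows could run that
-- action right after its B.writeFile, so the input file is closed even
-- when the user expression consumed only part of the bytestring.  Any
-- part of the bytestring not yet forced at that point is lost, which is
-- fine once the output has been written.
import qualified Data.ByteString.Lazy.Char8 as B
import System.IO (IOMode (ReadMode), hClose, openFile)

getInputString :: FilePath -> IO (B.ByteString, IO ())
getInputString path = do
    h <- openFile path ReadMode
    s <- B.hGetContents h   -- lazy: h stays open until s is fully forced
    return (s, hClose h)    -- hClose is safe on the semi-closed handle
```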

@@ -86,6 +90,8 @@ outputRows (OutputSpec _ spec) x = ignoringBrokenPipe $ do
join "\n" = B.unlines
join sep = B.intercalate sep

copyFile i o = B.readFile i >>= \f -> B.writeFile o f
gelisam (Owner):

f is a strange choice of variable name, since it is the contents of the file, not a filename. I'd use bs, for ByteString.

@gelisam gelisam (Owner) left a comment

So, taking into account your latest changes and my latest comments about those changes, here's the remaining list of things to fix:

  • accept the file name as a positional argument
  • adjust the <file><suffix> documentation to match the new behaviour
  • create the temporary file in /tmp or whatever's the most appropriate location for the OS
  • move the overwrite logic inside outputRows

@gelisam gelisam changed the base branch from master to main December 29, 2020 17:36