
#171 in place edit #179

Open · Rahel-A wants to merge 9 commits into main
Conversation

@Rahel-A Rahel-A commented Mar 11, 2018

These features were added:

  • The argument "-i[backup-file-suffix]". If a suffix is provided, the original file is first copied to this backup file. No warning is given if the backup file matches the original file (something to consider?)
  • The input file must be given with "-f[input-file]".

Issues encountered:

  • Reading the input file lazily keeps it open, so the file is blocked when we later try to write to it.
  • consumeExtra in commonSeparators consumes the expression first and skips the input file, so "-f" is used to identify the file.
TODO:

  • Write test cases
  • Consider an alternative write method?
    • Write stdout to temporary file, and then overwrite the original
    • Would that still not have problems with lazy read?

Rahel Ahmed added 3 commits March 11, 2018 14:04
TODO: Output to files
Currently blocked by lazy read
@gelisam gelisam (Owner) commented Mar 11, 2018

"-f" is used to identify file.

Please keep the original API, as discussed. It would be fine if the file could be specified either using -f or using a positional argument, but I don't want to remove the positional API unless there is a good reason to.

@gelisam gelisam (Owner) commented Mar 11, 2018

  • Consider an alternative write method?
    • Write stdout to temporary file, and then overwrite the original
    • Would that still not have problems with lazy read?

In your original email you said you wanted to "write to a temporary file, and in-case of success, overwrite the input file". This does sound like the right approach to me. Which "lazy read" problems are you anticipating?
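The approach described here can be sketched roughly as follows. This is a minimal sketch, not Hawk's actual code; inPlaceEdit and transform are hypothetical names:

```haskell
-- Minimal sketch of "write to a temporary file, then overwrite on success".
-- Reading the input lazily keeps its handle open; writing the transformed
-- output forces the input (provided the transform consumes all of it --
-- the lazy-read subtlety discussed later in this thread).
import qualified Data.ByteString.Lazy.Char8 as B
import System.Directory (copyFile, renameFile)
import System.FilePath ((<.>))

inPlaceEdit :: Maybe String -> (B.ByteString -> B.ByteString) -> FilePath -> IO ()
inPlaceEdit backupSuffix transform file = do
    s <- B.readFile file            -- lazy: the handle stays open for now
    let tmp = file <.> "tmp"
    B.writeFile tmp (transform s)   -- writing consumes s, closing the input
    case backupSuffix of
        Nothing     -> return ()
        Just suffix -> copyFile file (file ++ suffix)
    renameFile tmp file             -- only reached if the write succeeded
```

Because renameFile only runs after the temporary file has been fully written, a failure in the transform leaves the original file untouched.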

@gelisam gelisam (Owner) commented Mar 11, 2018

No warning is given if backup file matches original file (something to consider?)

The original file? This would only happen if the suffix is the empty string; do you still back up the file if that's the suffix that was given? Or did you mean that there is no warning if the original file plus the suffix gives an existing file? What is the behaviour of sed and gawk in that case?

@@ -35,8 +35,7 @@ data InputSource

data OutputSink
= UseStdout
-- OutputFile FilePath -- we might want to implement --in-place
-- in the future
| OutputFile FilePath FilePath
gelisam (Owner):

Please add a comment explaining which FilePath is which, or even better, create a record whose names clarify their purpose.

Rahel-A (Author):

OK, I will clarify in the code: the first FilePath is the output file, and the second FilePath is the full backup file name; if it is null ("") then no backup is created.

gelisam (Owner):

How about using Maybe FilePath for the backup file, then?

@@ -20,18 +21,28 @@ import System.Console.Hawk.Runtime.Base

data SomeRows = forall a. Rows a => SomeRows a

-- Separate the lazy and strict versions so we don't get blocked
gelisam (Owner):

I would prefer to use the "write to a temporary and copy over original on success" approach and to remain lazy.

hFlush stdout
case out of
UseStdout -> do B.putStr s; hFlush stdout
OutputFile f "" -> do B.writeFile f s; hFlush stdout
gelisam (Owner):

Why flush stdout if we write to a file, not stdout?

Rahel-A (Author):

I suppose we are not using file handles, so there's no need to flush?

gelisam (Owner):

Indeed.

OutputFile f backup -> do
copyFile f backup
B.writeFile f s;
hFlush stdout
gelisam (Owner):

Could we move the backup (and eventually the "copy over original on success") logic out of the runtime? I would prefer to keep the runtime as small as possible.

gelisam (Owner):

On second thought, since OutputSpec is defined in the runtime, I guess it does make more sense for that logic to be in the runtime.

@@ -68,6 +75,8 @@ instance Option HawkOption where
longName RecordDelimiter = "record-delimiter"
longName OutputFieldDelimiter = "output-field-delim"
longName OutputRecordDelimiter = "output-record-delim"
longName InPlaceEdit = "in-place-edit"
gelisam (Owner):

I would prefer in-place

@@ -78,6 +87,8 @@ instance Option HawkOption where
helpMsg RecordDelimiter = ["default '\\n'"]
helpMsg OutputFieldDelimiter = ["default <field-delim>"]
helpMsg OutputRecordDelimiter = ["default <record-delim>"]
helpMsg InPlaceEdit = ["edit file in place <backup-suffix>"]
gelisam (Owner):

How about:

["overwrite <file>, optionally creating"
,"a backup copy at <file><suffix>"]

@@ -18,7 +18,7 @@ import System.Console.Hawk.Context.Dir


-- | (record separator, field separator)
type CommonSeparators = (Separator, Separator)
type CommonSeparators = (Separator, Separator, String)
gelisam (Owner):

The input file is not a separator; please rename this to something like "CommonOptions". And while we're modifying this, let's make it a record instead of a tuple, to clarify the purpose of each field.

gelisam (Owner):

Also, if it's a filename, let's use the type synonym FilePath instead of String

else fail "in-place edit requires input file"
else if null backupSuffix
then return UseStdout
else if backupSuffix == "<none>"
gelisam (Owner):

Why not ""?

Rahel-A (Author):

As consumeNullable just returns a string, we do not know whether the argument "-i" was provided or not. So a default value is used to check whether "-i" was provided with no backup suffix.

gelisam (Owner):

Yes, but why not use "" for this default value? The value "<none>" looks like a legal suffix whereas "" does not, so it would make more sense to me if "" were used as the default.

Even better: I'll see if I can refactor this weird nullable API to return a Maybe instead of forcing a default.
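Such a Maybe-style refactor could look something like the following sketch. NullableArg and this list-based consumeNullable signature are illustrative stand-ins, not Hawk's actual option-parsing API:

```haskell
-- Hypothetical sketch: represent "-i" explicitly instead of encoding
-- "option absent" as "" and "given with no suffix" as "<none>".
data NullableArg
    = NotGiven              -- "-i" was not passed at all
    | GivenWithoutArg       -- "-i" was passed with no suffix
    | GivenWithArg String   -- e.g. "-i.bak" gives ".bak"
    deriving (Eq, Show)

consumeNullable :: [String] -> NullableArg
consumeNullable args =
    case [drop 2 a | a <- args, take 2 a == "-i"] of
        []       -> NotGiven
        ("" : _) -> GivenWithoutArg
        (s  : _) -> GivenWithArg s
```

With three cases made explicit, no magic default string can be confused with a legal suffix.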

gelisam (Owner):

Ah, I finally get it: you're not using "" because "" represents the case in which the --in-place option was not given, while "<none>" represents the case in which --in-place was given with no suffix.

| OutputFile
{ outputFile :: String
, backupFile :: Maybe String
}
gelisam (Owner):

Please don't mix record syntax and sum types; this causes outputFile and backupFile to become partial functions:

> outputFile UseStdout
Exception: No match in record selector outputFile

gelisam (Owner):

I would create a separate record type, like this:

data FileSink = FileSink
  { outputFile :: FilePath
  , backupFile :: Maybe FilePath
  }

data OutputSink
  = UseStdout
  | OutputFile FileSink

r' = fromSeparator $ record options
f' = fromSeparator $ field options
file = inputFile options
file' = (fromJust file)
gelisam (Owner):

Partial functions like fromJust are needlessly error-prone, as the compiler won't be able to catch your mistake if you use file' outside of the else branch of your if isNothing file check. Please pattern-match on the Maybe instead:

case inputFile options of
  Nothing -> ...
  Just file -> ... OutputFile file $ Just $ file ++ backupSuffix

Could you instead replace your "if isNothing file then ... else ..." with a "case file of" pattern-match on file instead of using isNothing?

@gelisam gelisam (Owner) left a comment

Thanks for the improvements! There are still a few remaining changes I would like to see:

  • don't mix record syntax and sum types (e.g. by defining a separate record type, or by removing the record selectors and relying on comments to distinguish the two arguments)
  • accept the file name as a positional argument (either removing the new --file argument or keeping both ways of specifying the file name)
  • don't use fromJust
  • write to a temporary file, and in case of success, overwrite the input file
  • revert to lazy bytestrings
  • when manipulating filenames, use FilePath instead of String

If the 40 hours you were planning to spend have elapsed and you no longer have time to make those changes, just tell me so and I'll make the improvements myself. Thanks!

@Rahel-A Rahel-A (Author) commented Mar 18, 2018

don't mix record syntax and sum types (e.g. by defining a separate record type, or by removing the record selectors and relying on comments to distinguish the two arguments)

fixed

don't use fromJust

fixed

when manipulating filenames, use FilePath instead of String

fixed

accept the file name as a positional argument (either removing the new --file argument or keeping both ways of specifying the file name)

I'm not sure how to use the parser to only consume the last part of the string for input file

write to a temporary file, and in case of success, overwrite the input file
revert to lazy bytestrings

Will update you on those two issues later today.

@Rahel-A Rahel-A (Author) commented Mar 18, 2018

OK, I believe I have gotten around the lazy read issue I was having. The input file is overwritten once a temporary file is completely written.

edit: This was my test case:

$ cp test.txt.~ test.txt && hawk -i"~" -ad "sum . L.map toInt" -ftest.txt
$ cat test.txt.~
5
6
7
$ cat test.txt
18

@gelisam gelisam (Owner) commented Mar 18, 2018

accept the file name as a positional argument (either removing the new --file argument or keeping both ways of specifying the file name)

I'm not sure how to use the parser to only consume the last part of the string for input file

Hmm, you're right, this is harder than it seems! I thought it would be as simple as using consumeExtra instead of consumeLast Option.InputFile, but I now realize that this would consume the expression instead of the filename.

The solution would be to consume the expression before the common options. Unfortunately, doing that is not obvious either because we only want to parse the expression (and complain if it's missing) when the main command is Option.Eval, Option.Apply, or Option.Map, not when the main command is Option.Help or Option.Version.

What I would do is to remove the commonOptions call and replace it with a higher-order function withExpr which could be called like this:

help    = return Help
version = return Version
eval    = withExpr $ \c e -> Eval  e <$>                 outputSpec c
apply   = withExpr $ \c e -> Apply e <$> inputSpec c <*> outputSpec c
map'    = withExpr $ \c e -> Map   e <$> inputSpec c <*> outputSpec c

This higher-order function would parse the expression before the common options so that when consuming those common options, the consumeExtra consumes the filename, not the expression.
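As a standalone illustration of the ordering trick, here is a toy version in which the arguments are threaded explicitly as a list; the names mirror the discussion, but the types are simplified stand-ins, not Hawk's actual parser types:

```haskell
-- Toy stand-in for the real parser.  withExpr consumes the expression
-- before anything else, so a later consumeExtra sees the filename
-- rather than the expression.
type Args = [String]

consumeExtra :: Args -> (Maybe String, Args)
consumeExtra []       = (Nothing, [])
consumeExtra (x : xs) = (Just x, xs)

withExpr :: (String -> Args -> (a, Args)) -> Args -> (Maybe a, Args)
withExpr k args =
    case consumeExtra args of
        (Nothing, rest)   -> (Nothing, rest)   -- real code would complain here
        (Just expr, rest) -> let (a, rest') = k expr rest
                             in (Just a, rest')
```

The continuation k receives the expression plus the remaining arguments, so by the time it consumes its own positional argument, only the filename is left.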

@@ -78,6 +87,9 @@ instance Option HawkOption where
helpMsg RecordDelimiter = ["default '\\n'"]
helpMsg OutputFieldDelimiter = ["default <field-delim>"]
helpMsg OutputRecordDelimiter = ["default <record-delim>"]
helpMsg InPlaceEdit = ["overwrite <file>, optionally creating"
,"a backup copy at <file><suffix>"]
gelisam (Owner):

Now that you are adding a . between the file and the suffix, please update the comment to say <file>.<suffix>.

OutputFile (FileSink f Nothing) -> B.writeFile (f <.> tmp) s
OutputFile (FileSink f (Just backup)) -> do
copyFile f backup
B.writeFile (f <.> tmp) s
gelisam (Owner):

I believe there is a function somewhere, probably in filepath or posix, for creating a temporary file in a location which is appropriate for the OS, such as /tmp. It would be better to create a file there, and to delete it afterwards using bracket, than to create a .tmp file in the current directory.
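One possible shape, using base's System.IO.openTempFile together with bracketOnError so the temporary file is deleted on failure and renamed over the target on success. Note this sketch creates the temporary file next to the target rather than in /tmp, so that renameFile stays on a single filesystem; getTemporaryDirectory would be the /tmp-style alternative. writeViaTemp is a hypothetical name:

```haskell
-- Sketch: write output via a temporary file, cleaning up on failure.
import Control.Exception (bracketOnError)
import qualified Data.ByteString.Lazy.Char8 as B
import System.Directory (removeFile, renameFile)
import System.FilePath (takeDirectory)
import System.IO (hClose, openTempFile)

writeViaTemp :: FilePath -> B.ByteString -> IO ()
writeViaTemp target s =
    bracketOnError
        (openTempFile (takeDirectory target) "hawk.tmp")
        (\(tmpPath, h) -> hClose h >> removeFile tmpPath)   -- on failure
        (\(tmpPath, h) -> do
            B.hPut h s
            hClose h
            renameFile tmpPath target)                      -- on success
```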

overwrite :: OutputSpec -> IO ()
overwrite (OutputSpec (OutputFile (FileSink f _)) _) = do
fe <- doesFileExist tmpFile
when fe $ renameFile tmpFile f
gelisam (Owner):

Under which circumstances would the temporary file not exist? If that happens, wouldn't we prefer to let the user know by failing loudly rather than silently doing nothing?

gelisam (Owner):

Now that I am taking a closer look, I see that it could happen if the UncertainT computation ends with an error. In that case, I agree that we don't want to try to perform an overwrite, but I'd rather use UncertainT's monadic short-circuiting to skip over the overwrite than to check whether we've reached the point in the logic where the temporary file has been created. After all, what if the error occurs after the temporary file has been created but before all the contents are written to it? Let's move the overwrite call inside the UncertainT computation.

gelisam (Owner):

Or even better, move the overwrite logic inside outputRows

Rahel-A (Author):

Or even better, move the overwrite logic inside outputRows

I can't move this logic into outputRows because of lazy input reading. The input file is blocked until we force evaluation, which only happens with strict reading.

gelisam (Owner):

I don't understand why laziness would cause any issues here. Lazy reading means that the input file won't be closed until its bytestring has been completely consumed. In outputRows, the B.writeFile (f <.> tmp) s should completely consume the s, so the file should be available immediately after that.

gelisam (Owner):

Oh, but this s is the output bytestring, not the input bytestring! If the user expression only looks at part of the input, e.g. L.take 10, consuming the entire output bytestring won't cause the entire input bytestring to be consumed, so the input file won't be closed.

But why is the input file now considered closed after the myRunUncertainIO block? That seems fishy. I would prefer to make sure the file is closed by manually calling hClose on it. Perhaps we could modify getInputString so it would also return an IO action which ensures the input is closed? We could then pass this action to outputRows, which would call the action immediately after the B.writeFile.

This in-place feature has more subtleties than I expected!
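The getInputString modification suggested above could look roughly like this sketch (hypothetical signature, not the actual Hawk code):

```haskell
-- Hypothetical sketch: return the lazy contents together with an IO
-- action that force-closes the input handle.  outputRows could run that
-- action right after its B.writeFile, so the input file is closed even
-- when the user expression consumed only part of the bytestring.  Any
-- part of the bytestring not yet forced at that point is lost, which is
-- fine once the output has been written.
import qualified Data.ByteString.Lazy.Char8 as B
import System.IO (IOMode (ReadMode), hClose, openFile)

getInputString :: FilePath -> IO (B.ByteString, IO ())
getInputString path = do
    h <- openFile path ReadMode
    s <- B.hGetContents h   -- lazy: h stays open until s is fully forced
    return (s, hClose h)    -- hClose is safe on the semi-closed handle
```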

@@ -86,6 +90,8 @@ outputRows (OutputSpec _ spec) x = ignoringBrokenPipe $ do
join "\n" = B.unlines
join sep = B.intercalate sep

copyFile i o = B.readFile i >>= \f -> B.writeFile o f
gelisam (Owner):

f is a strange choice of variable name, since it is the contents of the file, not a filename. I'd use bs, for ByteString.

@gelisam gelisam (Owner) left a comment

So, taking into account your latest changes and my latest comments about those changes, here's the remaining list of things to fix:

  • accept the file name as a positional argument
  • adjust the <file><suffix> documentation to match the new behaviour
  • create the temporary file in /tmp or whatever's the most appropriate location for the OS
  • move the overwrite logic inside outputRows

@gelisam gelisam changed the base branch from master to main December 29, 2020 17:36