-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#171 in place edit #179
base: main
Are you sure you want to change the base?
#171 in place edit #179
Conversation
TODO: Output to files
Currently blocked by lazy read
Please keep the original API, as discussed. It would be fine if the file could be specified either using |
In your original email you said you wanted to "write to a temporary file, and in-case of success, overwrite the input file". This does sound like the right approach to me. Which "lazy read" problems are you anticipating? |
The original file? This would only happen if the suffix is the empty string, do you still backup the file if that's the suffix which was given? Or did you mean that there is no warning if the original file plus the suffix gives an existing file? What is the behaviour of sed and gawk in that case? |
@@ -35,8 +35,7 @@ data InputSource | |||
|
|||
data OutputSink | |||
= UseStdout | |||
-- OutputFile FilePath -- we might want to implement --in-place | |||
-- in the future | |||
| OutputFile FilePath FilePath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment explaining which FilePath
is which, or even better, create a record whose names clarify their purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will clarify in the code: first filepath is the output file, and second filepath is the backup file name in full, if it's null ("") then no backup is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using Maybe FilePath
for the backup file then
@@ -20,18 +21,28 @@ import System.Console.Hawk.Runtime.Base | |||
|
|||
data SomeRows = forall a. Rows a => SomeRows a | |||
|
|||
-- Separate the lazy and strict versions so we don't get blocked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to use the "write to a temporary and copy over original on success" approach and to remain lazy.
hFlush stdout | ||
case out of | ||
UseStdout -> do B.putStr s; hFlush stdout | ||
OutputFile f "" -> do B.writeFile f s; hFlush stdout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why flush stdout if we write to a file, not stdout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we are not using file handles, so there's no need to flush?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
OutputFile f backup -> do | ||
copyFile f backup | ||
B.writeFile f s; | ||
hFlush stdout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move the backup (and eventually the "copy over original on success") logic out of the runtime? I would prefer to keep the runtime as small as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, since OutputSpec
is defined in the runtime, I guess it does make more sense for that logic to be in the runtime.
@@ -68,6 +75,8 @@ instance Option HawkOption where | |||
longName RecordDelimiter = "record-delimiter" | |||
longName OutputFieldDelimiter = "output-field-delim" | |||
longName OutputRecordDelimiter = "output-record-delim" | |||
longName InPlaceEdit = "in-place-edit" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer in-place
@@ -78,6 +87,8 @@ instance Option HawkOption where | |||
helpMsg RecordDelimiter = ["default '\\n'"] | |||
helpMsg OutputFieldDelimiter = ["default <field-delim>"] | |||
helpMsg OutputRecordDelimiter = ["default <record-delim>"] | |||
helpMsg InPlaceEdit = ["edit file in place <backup-suffix>"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
["overwrite <file>, optionally creating"
,"a backup copy at <file><suffix>"]
@@ -18,7 +18,7 @@ import System.Console.Hawk.Context.Dir | |||
|
|||
|
|||
-- | (record separator, field separator) | |||
type CommonSeparators = (Separator, Separator) | |||
type CommonSeparators = (Separator, Separator, String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input file is not a separator; please rename this to something like "CommonOptions". And while we're modifying this, let's make it a record instead of a tuple, to clarify the purpose of each field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if it's a filename, let's use the type synonym FilePath instead of String
else fail "in-place edit requires input file" | ||
else if null backupSuffix | ||
then return UseStdout | ||
else if backupSuffix == "<none>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not ""
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the consumeNullable just returns a string, we do not know the whether the argument "-i" was provided or not. So a default value is used to check whether "-i" was provided with no backup file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but why use "" for this default value? The value " looks like a legal suffix whereas "" does not, so it would make more sense to me if "" was used as the default.
Even better: I'll see if I can refactor this weird nullable API to return a Maybe instead of forcing a default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I finally get it: you're not using ""
because ""
represents the case in which the --in-place
option was not given, while "<none>"
represents the case in which the --in-place
was given with no suffix
| OutputFile | ||
{ outputFile :: String | ||
, backupFile :: Maybe String | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't mix record syntax and sum types, this causes outputFile
and backupFile
to be become partial functions:
> outputFile UseStdout
Exception: No match in record selector outputFile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would create a separate record type, like this:
data FileSink = FileSink
{ outputFile :: FilePath
, backupFile :: Maybe FilePath
}
data OutputSink
= UseStdout
| OutputFile FileSink
r' = fromSeparator $ record options | ||
f' = fromSeparator $ field options | ||
file = inputFile options | ||
file' = (fromJust file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial functions like fromJust
are needlessly error-prone, as the compiler won't be able to catch your mistake if you use file'
outside of the else
branch of your if isNothing file
check. Please pattern-match on the Maybe
instead:
case inputFile options of
Nothing -> ...
Just file -> ... OutputFile file $ Just $ file ++ backupSuffix
could you instead replace your if isNothing file then ... else ...
with a case file of {pattern-match on
fileinstead of using
isNothing`?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvements! There are still a few remaining changes I would like to see:
- don't mix record syntax and sum types (e.g. by defining a separate record type, or by removing the record selectors and relying on comments to distinguish the two arguments)
- accept the file name as a positional argument (either removing the new
--file
argument or keeping both ways of specifying the file name) - don't use
fromJust
- write to a temporary file, and in-case of success, overwrite the input file
- revert to lazy bytestrings
- when manipulating filenames, use FilePath instead of String
If the 40 hours you were planning to spend have elapsed and you no longer have time to make those changes, just tell me so and I'll make the improvements myself. Thanks!
fixed
fixed
fixed
I'm not sure how to use the parser to only consume the last part of the string for input file
Will update you on those two issues later today. |
OK, I believe I have gotten around the lazy read issue I was having. The input file is overwritten once a temporary file is completely written. edit** This was my test case:
|
Hmm, you're right, this is harder than it seems! I thought it would be as simple as using The solution would be to consume the expression before the common options. Unfortunately, doing that is not obvious either because we only want to parse the expression (and complain if it's missing) when the main command is What I would do is to remove the
This higher-order function would parse the expression before the common options so that when consuming those common options, the |
@@ -78,6 +87,9 @@ instance Option HawkOption where | |||
helpMsg RecordDelimiter = ["default '\\n'"] | |||
helpMsg OutputFieldDelimiter = ["default <field-delim>"] | |||
helpMsg OutputRecordDelimiter = ["default <record-delim>"] | |||
helpMsg InPlaceEdit = ["overwrite <file>, optionally creating" | |||
,"a backup copy at <file><suffix>"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you are adding a .
between the file and the suffix, please update the comment to say <file>.<suffix>
.
OutputFile (FileSink f Nothing) -> B.writeFile (f <.> tmp) s | ||
OutputFile (FileSink f (Just backup)) -> do | ||
copyFile f backup | ||
B.writeFile (f <.> tmp) s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there is a function somewhere, probably in filepath
or posix
, for creating a temporary file in a location which is appropriate for the OS, such as /tmp
. It would be better to create a file there, and to delete it afterwards using bracket
, than to create a .tmp
file in the current directory.
overwrite :: OutputSpec -> IO () | ||
overwrite (OutputSpec (OutputFile (FileSink f _)) _) = do | ||
fe <- doesFileExist tmpFile | ||
when fe $ renameFile tmpFile f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under which circumstances would the temporary file not exist? If that happens, wouldn't we prefer to let the user know by failing loudly rather than silently doing nothing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I am taking a closer look, I see that it could happen if the UncertainT
computation ends with an error. In that case, I agree that we don't want to try to perform an overwrite, but I'd rather use UncertainT
's monadic short-circuiting to skip over the overwrite
than to check whether we've reached the point in the logic where the temporary file has been created. After all, what if the error occurs after the temporary file has been created but before all the contents is written to it? Let's move the overwrite
call inside the UncertainT
computation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even better, move the overwrite
logic inside outputRows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even better, move the overwrite logic inside outputRows
I can't move this logic into outputRows because of lazy input reading. The input file is blocked until we force evaluation which only happens with strict reading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why laziness would cause any issues here. Lazy reading means that the input file won't be closed until its bytestring has been completely consumed. In outputRows
, the B.writeFile (f <.> tmp) s
should completely consume the s
, so the file should be available immediately after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, but this s
is the output bytestring, not the input bytestring! If the user expression only looks at part of the input, e.g. L.take 10
, consuming the entire output bytestring won't cause the entire input bytestring to be consumed, so the input file won't be closed.
But why is the input file now considered closed after the myRunUncertainIO
block? That seems fishy. I would prefer to make sure the file is closed by manually calling hClose
on it. Perhaps we could modify getInputString
so it would also return an IO action which ensures the input is closed? We could then pass this action to outputRows
, which would call the action immediately after the B.writeFile
.
This in-place feature has more subtleties than I expected!
@@ -86,6 +90,8 @@ outputRows (OutputSpec _ spec) x = ignoringBrokenPipe $ do | |||
join "\n" = B.unlines | |||
join sep = B.intercalate sep | |||
|
|||
copyFile i o = B.readFile i >>= \f -> B.writeFile o f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f
is a strange choice of variable name, since it is the contents of the file, not a filename. I'd use bs
, for ByteString
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, taking into account your latest changes and my latest comments about those changes, here's the remaining list of things to fix:
- accept the file name as a positional argument
- adjust the
<file><suffix>
documentation to match the new behaviour - create the temporary file in
/tmp
or whatever's the most appropriate location for the OS - move the
overwrite
logic insideoutputRows
These features were added:
The argument "-i[back-up-file-suffix]"
If a suffix is provided, the original file will be copied to this backup file. No warning is given if backup file matches original file (something to consider?)
Input file must be given with "-f[input-file]".
Issues encountered:
Lazily reading the input file causes us to have file blocked when trying to write to it.
ConsumeExtra in commonSeparators will consume expressions first and skip the input file, so "-f" is used to identify file.
TODO: