Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#171 in place edit #179

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions haskell-awk.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,7 @@ Library
build-depends: base >=4.6.0.1
, bytestring
, containers
, filepath
, stringsearch >=0.3.6.4
Default-Language: Haskell98

Expand Down
9 changes: 7 additions & 2 deletions runtime/System/Console/Hawk/Args/Spec.hs
Original file line number Diff line number Diff line change
Expand Up @@ -33,10 +33,15 @@ data InputSource
| InputFile FilePath
deriving (Show, Eq)

data FileSink = FileSink
{ outputFile :: FilePath
, backupFile :: Maybe FilePath
}
deriving (Show, Eq)

data OutputSink
= UseStdout
-- OutputFile FilePath -- we might want to implement --in-place
-- in the future
| OutputFile FileSink
deriving (Show, Eq)

data InputFormat
Expand Down
14 changes: 10 additions & 4 deletions runtime/System/Console/Hawk/Runtime.hs
Original file line number Diff line number Diff line change
Expand Up @@ -12,12 +12,12 @@ import Data.ByteString.Lazy.Char8 as B
import Data.ByteString.Lazy.Search as Search
import GHC.IO.Exception
import System.IO
import System.FilePath ((<.>))

import System.Console.Hawk.Args.Spec
import System.Console.Hawk.Representable
import System.Console.Hawk.Runtime.Base


data SomeRows = forall a. Rows a => SomeRows a

processTable :: HawkRuntime -> ([[B.ByteString]] -> SomeRows) -> HawkIO ()
Expand Down Expand Up @@ -74,10 +74,14 @@ splitAtSeparator (Delimiter d) = Search.split d


outputRows :: Rows a => OutputSpec -> a -> IO ()
outputRows (OutputSpec _ spec) x = ignoringBrokenPipe $ do
outputRows (OutputSpec out spec) x = ignoringBrokenPipe $ do
let s = join' (toRows x)
B.putStr s
hFlush stdout
case out of
UseStdout -> B.putStr s >> hFlush stdout
OutputFile (FileSink f Nothing) -> B.writeFile (f <.> tmp) s
OutputFile (FileSink f (Just backup)) -> do
copyFile f backup
B.writeFile (f <.> tmp) s
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is a function somewhere, probably in filepath or posix, for creating a temporary file in a location which is appropriate for the OS, such as /tmp. It would be better to create a file there, and to delete it afterwards using bracket, than to create a .tmp file in the current directory.

where
join' = join (B.fromStrict $ recordDelimiter spec)
toRows = repr (B.fromStrict $ fieldDelimiter spec)
Expand All @@ -86,6 +90,8 @@ outputRows (OutputSpec _ spec) x = ignoringBrokenPipe $ do
join "\n" = B.unlines
join sep = B.intercalate sep

copyFile i o = B.readFile i >>= \f -> B.writeFile o f
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f is a strange choice of variable name, since it is the contents of the file, not a filename. I'd use bs, for ByteString.


-- Don't fret if stdout is closed early, that is the way of shell pipelines.
ignoringBrokenPipe :: IO () -> IO ()
ignoringBrokenPipe = handleJust isBrokenPipe $ \_ ->
Expand Down
6 changes: 5 additions & 1 deletion runtime/System/Console/Hawk/Runtime/Base.hs
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,6 @@ import Data.Typeable

import System.Console.Hawk.Args.Spec


data HawkRuntime = HawkRuntime
{ inputSpec :: InputSpec
, outputSpec :: OutputSpec
Expand All @@ -21,3 +20,8 @@ data HawkRuntime = HawkRuntime
-- reexport IO under a unique name
newtype HawkIO a = HawkIO { runHawkIO :: IO a }
deriving Typeable

-- For redirecting stdout to a temporary file
tmp :: FilePath
tmp = "tmp"

15 changes: 13 additions & 2 deletions src/System/Console/Hawk.hs
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,12 @@ module System.Console.Hawk
) where


import Control.Monad (when)
import Control.Monad.Trans
import Data.List
import Language.Haskell.Interpreter
import System.Directory (doesFileExist, renameFile)
import System.FilePath ((<.>))

import Control.Monad.Trans.Uncertain
import Data.HaskellExpr.Eval
Expand Down Expand Up @@ -50,8 +53,16 @@ processSpec :: HawkSpec -> IO ()
processSpec Help = help
processSpec Version = putStrLn versionString
processSpec (Eval e o) = myRunUncertainIO e $ processEvalSpec (contextSpec e) o (userExpr e)
processSpec (Apply e i o) = myRunUncertainIO e $ processApplySpec (contextSpec e) i o (userExpr e)
processSpec (Map e i o) = myRunUncertainIO e $ processMapSpec (contextSpec e) i o (userExpr e)
processSpec (Apply e i o) = myRunUncertainIO e (processApplySpec (contextSpec e) i o (userExpr e)) >> overwrite o
processSpec (Map e i o) = myRunUncertainIO e (processMapSpec (contextSpec e) i o (userExpr e)) >> overwrite o

-- | Overwrites the input file with the temporary output file if it exists
overwrite :: OutputSpec -> IO ()
overwrite (OutputSpec (OutputFile (FileSink f _)) _) = do
fe <- doesFileExist tmpFile
when fe $ renameFile tmpFile f
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under which circumstances would the temporary file not exist? If that happens, wouldn't we prefer to let the user know by failing loudly rather than silently doing nothing?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I am taking a closer look, I see that it could happen if the UncertainT computation ends with an error. In that case, I agree that we don't want to try to perform an overwrite, but I'd rather use UncertainT's monadic short-circuiting to skip over the overwrite than to check whether we've reached the point in the logic where the temporary file has been created. After all, what if the error occurs after the temporary file has been created but before all the contents is written to it? Let's move the overwrite call inside the UncertainT computation.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even better, move the overwrite logic inside outputRows

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even better, move the overwrite logic inside outputRows

I can't move this logic into outputRows because of lazy input reading. The input file is blocked until we force evaluation which only happens with strict reading.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why laziness would cause any issues here. Lazy reading means that the input file won't be closed until its bytestring has been completely consumed. In outputRows, the B.writeFile (f <.> tmp) s should completely consume the s, so the file should be available immediately after that.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but this s is the output bytestring, not the input bytestring! If the user expression only looks at part of the input, e.g. L.take 10, consuming the entire output bytestring won't cause the entire input bytestring to be consumed, so the input file won't be closed.

But why is the input file now considered closed after the myRunUncertainIO block? That seems fishy. I would prefer to make sure the file is closed by manually calling hClose on it. Perhaps we could modify getInputString so it would also return an IO action which ensures the input is closed? We could then pass this action to outputRows, which would call the action immediately after the B.writeFile.

This in-place feature has more subtleties than I expected!

where tmpFile = f <.> tmp
overwrite _ = return ()

-- | A version of `runUncertainIO` which detects poor error messages and improves them.
myRunUncertainIO :: ExprSpec -> UncertainT IO () -> IO ()
Expand Down
14 changes: 14 additions & 0 deletions src/System/Console/Hawk/Args/Option.hs
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@ data HawkOption
| RecordDelimiter
| OutputFieldDelimiter
| OutputRecordDelimiter
| InPlaceEdit
| InputFile
| Version
| Help
| ContextDirectory
Expand All @@ -26,6 +28,9 @@ options :: [HawkOption]
options = enumFrom minBound


suffix :: OptionType
suffix = nullable (Setting "suffix")

delimiter :: OptionType
delimiter = nullable (Setting "delim")

Expand Down Expand Up @@ -58,6 +63,8 @@ instance Option HawkOption where
shortName RecordDelimiter = 'D'
shortName OutputFieldDelimiter = 'o'
shortName OutputRecordDelimiter = 'O'
shortName InPlaceEdit = 'i'
shortName InputFile = 'f'
shortName Version = 'v'
shortName Help = 'h'
shortName ContextDirectory = 'c'
Expand All @@ -68,6 +75,8 @@ instance Option HawkOption where
longName RecordDelimiter = "record-delimiter"
longName OutputFieldDelimiter = "output-field-delim"
longName OutputRecordDelimiter = "output-record-delim"
longName InPlaceEdit = "in-place"
longName InputFile = "file"
longName Version = "version"
longName Help = "help"
longName ContextDirectory = "context-directory"
Expand All @@ -78,6 +87,9 @@ instance Option HawkOption where
helpMsg RecordDelimiter = ["default '\\n'"]
helpMsg OutputFieldDelimiter = ["default <field-delim>"]
helpMsg OutputRecordDelimiter = ["default <record-delim>"]
helpMsg InPlaceEdit = ["overwrite <file>, optionally creating"
,"a backup copy at <file><suffix>"]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you are adding a . between the file and the suffix, please update the comment to say <file>.<suffix>.

helpMsg InputFile = ["input file for in place editing"]
helpMsg Version = ["print version and exit"]
helpMsg Help = ["this help"]
helpMsg ContextDirectory = ["<ctx-dir> directory, default is"
Expand All @@ -89,6 +101,8 @@ instance Option HawkOption where
optionType RecordDelimiter = delimiter
optionType OutputFieldDelimiter = delimiter
optionType OutputRecordDelimiter = delimiter
optionType InPlaceEdit = suffix
optionType InputFile = suffix
optionType Version = flag
optionType Help = flag
optionType ContextDirectory = filePath
82 changes: 53 additions & 29 deletions src/System/Console/Hawk/Args/Parse.hs
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ module System.Console.Hawk.Args.Parse (parseArgs) where
import Control.Applicative
import Data.Char (isSpace)
import "mtl" Control.Monad.Trans
import System.FilePath ((<.>))

import Control.Monad.Trans.OptionParser
import Control.Monad.Trans.Uncertain
Expand All @@ -17,12 +18,19 @@ import System.Console.Hawk.Context.Dir
-- >>> let testP parser = runUncertainIO . runOptionParserT options parser


-- | (record separator, field separator)
type CommonSeparators = (Separator, Separator)
-- | Record separator, field separator, and input file
-- is common to both output and input option parsing
data CommonOptions = CommonOptions
{ record :: Separator
, field :: Separator
, inputFile :: Maybe FilePath
}

-- | Extract '-D' and '-d'. We perform this step separately because those two
-- delimiters are used by both the input and output specs.
-- | Extract '-D', '-d', and input file.
-- We perform this step separately because those three
-- options are used by both the input and output specs.
--
-- TODO: UPDATE TEST CASES
-- >>> let test = testP commonSeparators
--
-- >>> test []
Expand All @@ -33,17 +41,17 @@ type CommonSeparators = (Separator, Separator)
--
-- >>> test ["-D|", "-d,"]
-- (Delimiter "|",Delimiter ",")
commonSeparators :: (Functor m, Monad m)
=> OptionParserT HawkOption m CommonSeparators
commonSeparators = do
commonOptions :: (Functor m, Monad m)
=> OptionParserT HawkOption m CommonOptions
commonOptions = do
r <- lastSep Option.RecordDelimiter defaultRecordSeparator
f <- lastSep Option.FieldDelimiter defaultFieldSeparator
return (r, f)
file <- consumeLast Option.InputFile "" $ consumeNullable "" consumeString
return $ CommonOptions r f $ if null file then Nothing else Just file
where
lastSep opt def = consumeLast opt def consumeSep
consumeSep = fmap Delimiter . Option.consumeDelimiter


-- | The input delimiters have already been parsed, but we still need to
-- interpret them and to determine the input source.
--
Expand Down Expand Up @@ -72,19 +80,18 @@ commonSeparators = do
-- InputFile "/etc/passwd"
-- Records (Delimiter "\n") (Fields (Delimiter ":"))
inputSpec :: (Functor m, Monad m)
=> CommonSeparators -> OptionParserT HawkOption m InputSpec
inputSpec (rSep, fSep) = InputSpec <$> source <*> format
=> CommonOptions -> OptionParserT HawkOption m InputSpec
inputSpec opts = InputSpec <$> source <*> format
where
source = do
r <- consumeExtra consumeString
return $ case r of
Nothing -> UseStdin
Just f -> InputFile f
source = return $ maybe UseStdin InputFile file
format = return streamFormat
streamFormat | rSep == Delimiter "" = RawStream
| otherwise = Records rSep recordFormat
| otherwise = Records rSep recordFormat
recordFormat | fSep == Delimiter "" = RawRecord
| otherwise = Fields fSep
| otherwise = Fields fSep
rSep = record opts
fSep = field opts
file = inputFile opts

-- | The output delimiters take priority over the input delimiters, regardless
-- of the order in which they appear.
Expand All @@ -109,16 +116,33 @@ inputSpec (rSep, fSep) = InputSpec <$> source <*> format
-- >>> test ["-o\t", "-d,", "-O|"]
-- UseStdout
-- ("|","\t")
--
-- TODO: write test cases for in-place edit
outputSpec :: (Functor m, Monad m)
=> CommonSeparators -> OptionParserT HawkOption m OutputSpec
outputSpec (r, f) = OutputSpec <$> sink <*> format
=> CommonOptions -> OptionParserT HawkOption m OutputSpec
outputSpec opts = OutputSpec <$> sink <*> format
where
sink = return UseStdout
format = OutputFormat <$> record <*> field
record = consumeLast Option.OutputRecordDelimiter r' Option.consumeDelimiter
field = consumeLast Option.OutputFieldDelimiter f' Option.consumeDelimiter
r' = fromSeparator r
f' = fromSeparator f
sink = do
-- TODO: refactor with upcoming consumeNullable changes
backupSuffix <- consumeLast Option.InPlaceEdit "" $
consumeNullable "<none>" consumeString
case file of
Nothing -> if null backupSuffix
then return UseStdout
else fail "in-place edit requires input file"
Just file' -> if null backupSuffix
then return UseStdout
else if backupSuffix == "<none>"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not ""?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the consumeNullable just returns a string, we do not know the whether the argument "-i" was provided or not. So a default value is used to check whether "-i" was provided with no backup file.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but why use "" for this default value? The value " looks like a legal suffix whereas "" does not, so it would make more sense to me if "" was used as the default.

Even better: I'll see if I can refactor this weird nullable API to return a Maybe instead of forcing a default.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I finally get it: you're not using "" because "" represents the case in which the --in-place option was not given, while "<none>" represents the case in which the --in-place was given with no suffix

then return $ OutputFile $ FileSink file' Nothing
else return $ OutputFile $ FileSink file'
(Just $ file' <.> backupSuffix)

format = OutputFormat <$> record' <*> field'
record' = consumeLast Option.OutputRecordDelimiter r' Option.consumeDelimiter
field' = consumeLast Option.OutputFieldDelimiter f' Option.consumeDelimiter
r' = fromSeparator $ record opts
f' = fromSeparator $ field opts
file = inputFile opts


-- | The information we need in order to evaluate a user expression:
Expand Down Expand Up @@ -209,15 +233,15 @@ parseArgs args = runOptionParserT options parser args
parser = do
lift $ return () -- silence a warning
cmd <- consumeExclusive assoc eval
c <- commonSeparators
c <- commonOptions
cmd c
assoc = [ (Option.Help, help)
, (Option.Version, version)
, (Option.Apply, apply)
, (Option.Map, map')
]
help, version, eval, apply, map' :: (Functor m,MonadIO m) => CommonSeparators

help, version, eval, apply, map' :: (Functor m,MonadIO m) => CommonOptions
-> OptionParserT HawkOption m HawkSpec
help _ = return Help
version _ = return Version
Expand Down