Add Approaches for RNA Transcription #1159

Draft: wants to merge 2 commits into main
18 changes: 18 additions & 0 deletions exercises/practice/rna-transcription/.approaches/config.json
@@ -0,0 +1,18 @@
{
  "introduction": {
    "authors": [
      "MatthijsBlom"
    ]
  },
  "approaches": [
    {
      "uuid": "209cd027-6f98-47ac-a77f-8a083e0cd100",
      "slug": "validate-first",
      "title": "Validate first, then transcribe",
      "blurb": "First, find out whether there are invalid characters in the input. Then, if there aren't, transcribe the strand in one go.",
      "authors": [
        "MatthijsBlom"
      ]
    }
  ]
}
204 changes: 204 additions & 0 deletions exercises/practice/rna-transcription/.approaches/introduction.md
@@ -0,0 +1,204 @@
# Introduction

This problem requires both

- validating that every input character denotes a valid DNA nucleobase, and
- producing the RNA nucleobases corresponding to these DNA nucleobases.

The first approach listed below performs these tasks separately.
The others combine them in a single pass, in progressively more succinct ways.


## Approach: validate first, then transcribe

```haskell
toRNA :: String -> Either Char String
toRNA dna =
  case find (`notElem` "GCTA") dna of
    Nothing -> Right (map transcribe dna)
    Just c -> Left c
  where
    transcribe = \case
      'G' -> 'C'
      'C' -> 'G'
      'T' -> 'A'
      'A' -> 'U'
```

First search for the first invalid nucleobase.
If you find one, return it.
If all are valid, transcribe the entire strand in one go using `map`.
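The snippet above relies on the `LambdaCase` extension and on `find` from `Data.List`, neither of which it shows. A self-contained sketch, assuming you supply your own module header (for example `module DNA (toRNA) where`, placed after the pragma and before the import), might look like this:

```haskell
{-# LANGUAGE LambdaCase #-}

import Data.List (find)

-- Validate first: look for the first character that is not one of "GCTA".
toRNA :: String -> Either Char String
toRNA dna =
  case find (`notElem` "GCTA") dna of
    Nothing -> Right (map transcribe dna) -- all valid: transcribe in one go
    Just c -> Left c                       -- report the first invalid character
  where
    -- Partial on purpose: it is only ever applied after validation succeeded.
    transcribe = \case
      'G' -> 'C'
      'C' -> 'G'
      'T' -> 'A'
      'A' -> 'U'
```

With this definition, `toRNA "GCTA"` evaluates to `Right "CGAU"` and `toRNA "GCXA"` to `Left 'X'`.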

This approach walks the input twice: once to validate it and, if it is valid, once more to transcribe it.
Other approaches solve this problem in one pass.

This solution deals with nucleobases twice: first when validating, and again when transcribing.
Ideally, nucleobases are dealt with in only one place in the code.

[Read more about this approach][validate-first].


## Approach: a single pass using only elementary operations

```haskell
toRNA :: String -> Either Char String
toRNA [] = Right []
toRNA (n : dna) = case transcribe n of
  Nothing -> Left n
  Just n' -> case toRNA dna of
    Left c -> Left c
    Right rna -> Right (n' : rna)

transcribe :: Char -> Maybe Char
transcribe = \case
  'G' -> Just 'C'
  'C' -> Just 'G'
  'T' -> Just 'A'
  'A' -> Just 'U'
  _ -> Nothing
```

This solution combines validation and transcription in a single list traversal.
It is _elementary_ in the sense that it employs no abstractions: it uses only constructors (`[]`, `(:)`, `Nothing`, `Just`, `Left`, `Right`) and pattern matching, and no predefined functions at all.

Some of the code patterns used in this solution are very common, and were therefore abstracted into standard library functions.
The approaches listed below show how much these functions can help to concisely express this approach's logic.
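To make this concrete, here is a sketch (not one of the approaches described on this page) in which the elementary solution's two hand-written `case` expressions are replaced by the standard functions `maybe` and `fmap`; the behavior stays the same:

```haskell
{-# LANGUAGE LambdaCase #-}

transcribe :: Char -> Maybe Char
transcribe = \case
  'G' -> Just 'C'
  'C' -> Just 'G'
  'T' -> Just 'A'
  'A' -> Just 'U'
  _ -> Nothing

-- The elementary solution with its hand-written case expressions replaced:
--   * `maybe d k m` stands for `case m of Nothing -> d; Just x -> k x`
--   * `fmap g e` (for Either) stands for
--     `case e of Left c -> Left c; Right x -> Right (g x)`
toRNA :: String -> Either Char String
toRNA [] = Right []
toRNA (n : dna) =
  maybe (Left n) (\n' -> fmap (n' :) (toRNA dna)) (transcribe n)
```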

[Read more about this approach][elementary].


## Approach: use `do`-notation

```haskell
toRNA :: String -> Either Char String
toRNA [] = pure []
toRNA (n : dna) = do
  n' <- transcribe n
  rna <- toRNA dna
  pure (n' : rna)

transcribe :: Char -> Either Char Char
transcribe = \case
  'G' -> Right 'C'
  'C' -> Right 'G'
  'T' -> Right 'A'
  'A' -> Right 'U'
  c -> Left c
```

The [elementary solution][elementary] displays a common pattern that can equivalently be expressed using the monadic `>>=` operator, or its `do`-notation [syntactic sugar][wikipedia-syntactic-sugar].
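Desugared by hand, the `do` block becomes a chain of `>>=` applications. The following sketch shows that rewriting; it should behave identically to the `do` version:

```haskell
{-# LANGUAGE LambdaCase #-}

transcribe :: Char -> Either Char Char
transcribe = \case
  'G' -> Right 'C'
  'C' -> Right 'G'
  'T' -> Right 'A'
  'A' -> Right 'U'
  c -> Left c

-- The do block, desugared into >>=.
-- For Either, `Left c >>= _` is `Left c`, so the first invalid character
-- stops the recursion: `toRNA dna` is never evaluated after it.
toRNA :: String -> Either Char String
toRNA [] = pure []
toRNA (n : dna) =
  transcribe n >>= \n' ->
    toRNA dna >>= \rna ->
      pure (n' : rna)
```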

[Read more about this approach][do-notation].


## Approach: use `Functor`/`Applicative` combinators

```haskell
toRNA :: String -> Either Char String
toRNA [] = pure []
toRNA (n : dna) = (:) <$> transcribe n <*> toRNA dna

transcribe :: Char -> Either Char Char
transcribe = \case
  'G' -> Right 'C'
  'C' -> Right 'G'
  'T' -> Right 'A'
  'A' -> Right 'U'
  c -> Left c
```

The [elementary solution][elementary] displays a number of common patterns.
As the [`do`-notation solution][do-notation] demonstrates, these can be expressed with the `>>=` operator.
However, the full power of `Monad` is not required.
The same logic can also be expressed using common functorial combinators such as `fmap`/`<$>` and `<*>`.
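The shape `f <$> x <*> y` is common enough that `Control.Applicative` provides `liftA2` for it. A sketch of the same solution written with `liftA2`:

```haskell
{-# LANGUAGE LambdaCase #-}

import Control.Applicative (liftA2)

transcribe :: Char -> Either Char Char
transcribe = \case
  'G' -> Right 'C'
  'C' -> Right 'G'
  'T' -> Right 'A'
  'A' -> Right 'U'
  c -> Left c

-- `liftA2 (:) x y` is equivalent to `(:) <$> x <*> y`.
toRNA :: String -> Either Char String
toRNA [] = pure []
toRNA (n : dna) = liftA2 (:) (transcribe n) (toRNA dna)
```

For `Either`, `<*>` (and hence `liftA2`) keeps the first `Left` it encounters, so `toRNA "GXYA"` yields `Left 'X'` rather than `Left 'Y'`.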

[Read more about this approach][functorial-combinators].


## Approach: use `traverse`

```haskell
toRNA :: String -> Either Char String
toRNA = traverse $ \case
  'G' -> Right 'C'
  'C' -> Right 'G'
  'T' -> Right 'A'
  'A' -> Right 'U'
  n -> Left n
```

As it turns out, the [solution that uses functorial combinators][functorial-combinators] closely resembles the definition of `traverse` for lists.
In fact, through a series of rewritings it can be shown to be equivalent.
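To see the resemblance, turn `transcribe` in the functorial-combinator solution into a parameter. The resulting function, called `traverseList` here (a hypothetical name, chosen to avoid clashing with the Prelude's `traverse`), is a hand-written traversal for lists. This is a sketch: `base` defines `traverse` for lists differently (in terms of `foldr`), though to the same effect.

```haskell
{-# LANGUAGE LambdaCase #-}

transcribe :: Char -> Either Char Char
transcribe = \case
  'G' -> Right 'C'
  'C' -> Right 'G'
  'T' -> Right 'A'
  'A' -> Right 'U'
  c -> Left c

-- The functorial-combinator solution with `transcribe` abstracted into `g`.
traverseList :: Applicative f => (a -> f b) -> [a] -> f [b]
traverseList _ [] = pure []
traverseList g (x : xs) = (:) <$> g x <*> traverseList g xs

toRNA :: String -> Either Char String
toRNA = traverseList transcribe
```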

[Read more about this approach][traverse].


## General guidance

### Language extensions

For various reasons, some of GHC's features are locked behind switches known as _language extensions_.
You can enable these by putting so-called _language pragmas_ at the top of your file:

```haskell
-- This 👇 is a language pragma
{-# LANGUAGE LambdaCase #-}

module DNA (toRNA) where

{-
The rest of your code here
-}
```


#### `LambdaCase`

Consider the following possible definition of `map`.

```haskell
map f xs = case xs of
  [] -> []
  x : xs' -> f x : map f xs'
```

Here, a parameter `xs` is introduced only to be immediately pattern matched against, after which it is never used again.

Coming up with good names for such throwaway variables can be tedious and hard.
The `LambdaCase` extension lets us avoid naming them by providing an extra bit of [syntactic sugar][wikipedia-syntactic-sugar]:

```haskell
f = \case { }
-- is syntactic sugar for / an abbreviation of
f = \x -> case x of { }
```

The above definition of `map` can equivalently be written as

```haskell
map f = \case
  [] -> []
  x : xs -> f x : map f xs
```
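Trying this version of `map` in a file of your own needs two extra lines: the pragma, and an import that hides the Prelude's `map`, since otherwise the recursive call `map f xs` is ambiguous. A minimal sketch:

```haskell
{-# LANGUAGE LambdaCase #-}

-- Hide the Prelude's map so that our own definition can use the name.
import Prelude hiding (map)

map :: (a -> b) -> [a] -> [b]
map f = \case
  [] -> []
  x : xs -> f x : map f xs
```

Here `map (+ 1) [1, 2, 3]` evaluates to `[2, 3, 4]`.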


[do-notation]:
https://exercism.org/tracks/haskell/exercises/rna-transcription/approaches/do-notation
"Approach: use do-notation"
[elementary]:
https://exercism.org/tracks/haskell/exercises/rna-transcription/approaches/elementary
"Approach: a single pass using only elementary operations"
[functorial-combinators]:
https://exercism.org/tracks/haskell/exercises/rna-transcription/approaches/functorial-combinators
"Approach: use Functor/Applicative combinators"
[traverse]:
https://exercism.org/tracks/haskell/exercises/rna-transcription/approaches/traverse
"Approach: use traverse"
[validate-first]:
https://exercism.org/tracks/haskell/exercises/rna-transcription/approaches/validate-first
"Approach: validate first"


[wikipedia-syntactic-sugar]:
https://en.wikipedia.org/wiki/Syntactic_sugar
"Wikipedia: Syntactic sugar"
@@ -0,0 +1,54 @@
# Validate first, then transcribe

```haskell
toRNA :: String -> Either Char String
toRNA dna =
  case find (`notElem` "GCTA") dna of
    Nothing -> Right (map transcribe dna)
    Just c -> Left c
  where
    transcribe = \case
      'G' -> 'C'
      'C' -> 'G'
      'T' -> 'A'
      'A' -> 'U'
```

One approach to solving this problem is to

- first check that all input characters are valid,
- return one of the invalid characters if there are any, and otherwise to
- convert all the DNA nucleotides into RNA nucleotides.

Some submitted solutions retrieve the invalid character (if present) in two steps:

- first check that there are _some_ invalid characters, for example using `any`, and
- then find the first one, for example using `filter` and `head`.

The solution highlighted here combines these steps into one.
As used here, `find` returns `Nothing` if there are no invalid characters, and if there are then it returns `Just` the first one.
Pattern matching on `find`'s result then determines how to proceed.
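The two variants can be compared side by side. In this sketch, `isInvalid`, `firstInvalidTwoStep` and `firstInvalid` are hypothetical names chosen for illustration:

```haskell
import Data.List (find)

-- Hypothetical helper: is this character anything other than G, C, T or A?
isInvalid :: Char -> Bool
isInvalid = (`notElem` "GCTA")

-- Two-step variant: first check whether an invalid character exists,
-- then walk the input again to fetch the first one.
-- `head` cannot fail here, because `any` has just confirmed that
-- `filter isInvalid dna` is non-empty.
firstInvalidTwoStep :: String -> Maybe Char
firstInvalidTwoStep dna
  | any isInvalid dna = Just (head (filter isInvalid dna))
  | otherwise = Nothing

-- One-step variant: `find` answers both questions at once.
firstInvalid :: String -> Maybe Char
firstInvalid = find isInvalid
```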

To transcribe DNA nucleobases into RNA nucleobases, the solution uses a locally defined function `transcribe`.
It is a [partial function][wiki-partial-functions]: when given any character other than `'G'`, `'C'`, `'T'`, or `'A'` it will crash.

Partial functions display behavior (e.g. crashing) that is not documented in their types.
This tends to make reasoning about code that uses them more difficult.
For this reason, partial functions are generally to be avoided.

Partiality is less objectionable in local functions than in global ones, because in local contexts it is easier to make sure that functions are never applied to problematic arguments.
Indeed, in the solution highlighted above it is clear that `transcribe` will never be applied to a problematic character: if `dna` contained any such character, `find` would have returned `Just _` rather than `Nothing`, and `transcribe` would not have been called.

Still, it would be nice if it weren't necessary to check that `transcribe` is never applied to invalid characters.
`transcribe` is forced by its `Char -> Char` type to either be partial or else to return bogus values for some inputs &ndash; which would be similarly undesirable.
But another type, such as `Char -> Maybe Char`, would allow `transcribe` to be total.
The other approaches use such a variant.
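As a sketch of what such a total `transcribe` could look like, and of how the validate-first structure could use it so that the nucleobases are listed in only one place, consider the following. This variant is hypothetical and still walks the input twice:

```haskell
{-# LANGUAGE LambdaCase #-}

import Data.List (find)
import Data.Maybe (isNothing, mapMaybe)

-- Total: every Char gets an answer; invalid ones get Nothing.
transcribe :: Char -> Maybe Char
transcribe = \case
  'G' -> Just 'C'
  'C' -> Just 'G'
  'T' -> Just 'A'
  'A' -> Just 'U'
  _ -> Nothing

-- Validate first, but let `transcribe` decide what counts as valid.
toRNA :: String -> Either Char String
toRNA dna =
  case find (isNothing . transcribe) dna of
    Just c -> Left c
    Nothing -> Right (mapMaybe transcribe dna)
```

`mapMaybe` would silently drop any `Nothing`s, but none can occur here because `find` has already confirmed that every character transcribes successfully.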

This approach walks the input twice (or thrice).
It is possible to solve this problem by walking the input only once.
The other approaches illustrate how.


[wiki-partial-functions]:
https://wiki.haskell.org/Partial_functions
"Haskell Wiki: Partial functions"
@@ -0,0 +1,8 @@
toRNA :: String -> Either Char String
toRNA dna =
  case find (`notElem` "GCTA") dna of
    Nothing -> Right (map transcribe dna)
    Just c -> Left c
  where
    transcribe = \case
      'G' -> 'C'