Comments-Upen #31

Open
wants to merge 8 commits into
base: master
Original file line number Diff line number Diff line change
@@ -51,6 +51,8 @@ _You should see `advanced_shell.zip` as part of the output to the screen._

**4.** Finally, to **decompress the folder**:

## Comment-Upen: As we are already in the terminal, it might be easier to type `unzip advanced_shell.zip` rather than going back to the GUI.

* On a Mac, double-click `advanced_shell.zip`. This will automatically inflate the folder.
* On Windows, press and hold (or right-click) the folder, select Extract All..., and then follow the instructions.
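
Following the reviewer's suggestion, the extraction can also stay in the terminal. This is a minimal sketch, assuming `advanced_shell.zip` sits in the current directory and that `unzip` is installed; the `status` variable is only illustrative.

```shell
# Extract the lesson archive without leaving the terminal.
# Assumption: advanced_shell.zip is in the current directory.
zipfile="advanced_shell.zip"

if [ -f "$zipfile" ]; then
    # -o overwrites existing files without prompting
    unzip -o "$zipfile" && status="extracted"
else
    echo "$zipfile not found in $(pwd)" >&2
    status="missing"
fi

echo "status=$status"
```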

@@ -15,6 +15,8 @@ In this lesson, we will:

## Getting Started

## Comment-Upen: Even though we mention that grep was introduced in the previous workshops, I think participants will find it useful to get a brief introduction to grep before we go in depth, and then to be introduced to our toy file `catch.txt`.

Before we get started, let's take a brief look at the `catch.txt` file in a `less` buffer in order to get an idea of what the file looks like:

```
@@ -23,6 +25,9 @@ less catch.txt

In here, you can see that we have a variety of case differences and misspellings. These differences are not exhaustive, but they will be helpful in exploring how regular expressions are implemented in `grep`.


## Comment-Upen: Before introducing the cautions and extended regular expressions (which we say we won't use much), I think a beginner-level participant would be more interested in just trying `grep` on `catch.txt` with simple examples. We could then explain the difference between no quotation marks, single quotation marks, and double quotation marks using the dummy errors we produce further down. Maybe we can demonstrate a few simple flags, such as `-c` for counting, `-n` for printing line numbers, and `-v` for printing non-matching lines. We can use double quotation marks in all the examples, ask participants what will happen without quotation marks or with single quotation marks, and have them try it to practice `grep` with different flags. That would introduce the importance of quotation marks and the cases where they are useful. Just a thought!
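
The simple flags mentioned in this comment could be demonstrated along these lines. This is only a sketch: the four-line toy file is a hypothetical stand-in, since the real contents of `catch.txt` are not shown here.

```shell
# Toy stand-in for catch.txt (hypothetical contents, for illustration only).
demo=$(mktemp)
printf '%s\n' 'CATCH' 'catch' 'COTCH' 'Catch' > "$demo"

count=$(grep -c "CATCH" "$demo")      # -c: count matching lines
numbered=$(grep -n "catch" "$demo")   # -n: prefix each match with its line number
inverted=$(grep -v "CATCH" "$demo")   # -v: print lines that do NOT match
anycase=$(grep -ci "catch" "$demo")   # -i: ignore case (combined here with -c)

echo "count=$count"
echo "numbered=$numbered"
echo "anycase=$anycase"
printf 'inverted:\n%s\n' "$inverted"

rm -f "$demo"
```

With this toy file, `count` is 1, `numbered` is `2:catch`, and `anycase` is 3, because `COTCH` never matches.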

## A bit more depth on grep

There are two principles that we should discuss more, the `-E` option and the use of quotation marks.
@@ -31,6 +36,8 @@ There are two principles that we should discuss more, the `-E` option and the us

There is a `-E` option for `grep` that enables what are called "extended regular expressions". We won't use too many of these types of regular expressions, and we will point them out when we need them. If you want to make it a habit to always use the `-E` option when using regular expressions in `grep`, it is a bit safer.

## Comment-Upen: I would explain what we mean by "safe".
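
One possible reading of "safe" (an interpretation, not something the lesson states) is that characters such as `+` are ordinary characters in basic regular expressions but operators with `-E`, so always using `-E` avoids silently searching for the literal character. A small sketch with a hypothetical toy file:

```shell
# Toy file (hypothetical) with one line containing a literal plus sign.
demo=$(mktemp)
printf '%s\n' 'CATCH' 'CAATCH' 'CA+TCH' > "$demo"

# Basic regular expressions: an unescaped + is an ordinary character.
basic=$(grep "CA+TCH" "$demo")

# Extended regular expressions: + means "one or more of the previous item".
extended=$(grep -E "CA+TCH" "$demo")

printf 'basic:\n%s\n' "$basic"
printf 'extended:\n%s\n' "$extended"

rm -f "$demo"
```

Without `-E`, only the line `CA+TCH` is returned; with `-E`, the lines `CATCH` and `CAATCH` are returned instead.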

### Quotations

When using `grep`, it is usually not required to put your search term in quotes. However, if you would like to use `grep` to do certain types of searches, it is better or *safer* to wrap your search term in quotations, preferably double quotations. Let's briefly discuss the differences:
@@ -86,7 +93,7 @@ Will return:
```
C${at}CH
```

## Comment-Upen: Maybe this take-home message can go to the bottom of the page, as bullet point 1.
### grep Depth Take-Home

In conclusion, while these are mostly edge cases, we believe that it is generally a good habit to wrap the expressions that you use for `grep` in double quotations and to use the `-E` option. This practice will not matter in the overwhelming majority of cases, but it is sometimes difficult to remember these edge cases, so it is safer to build them into a habit. Of course, your preferences may vary.
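
The quoting difference can be checked with a short sketch. The variable `at` and the two-line toy file are assumptions made for this demo, and `-F` (fixed-string matching) is used here so that only the shell quoting differs between the two commands.

```shell
at="AT"   # hypothetical shell variable, defined only for this demo
demo=$(mktemp)
printf '%s\n' 'CATCH' 'C${at}CH' > "$demo"

# -F treats the pattern as a fixed string, so only the shell quoting differs.
double=$(grep -F "C${at}CH" "$demo")   # shell expands ${at}; grep searches for CATCH
single=$(grep -F 'C${at}CH' "$demo")   # no expansion; grep searches for C${at}CH

echo "double quotes: $double"
echo "single quotes: $single"

rm -f "$demo"
```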
@@ -465,6 +472,8 @@ C${at}CH
COTCH
```

## Comment-Upen: Having a multi-FASTA or multi-FASTQ file in our demo data would give participants some basic real-world examples of `grep`: counting the number of sequences with `grep -c "^>" my.fasta`, finding the start codon "ATG" or the stop codon "TAA", extracting the CDS between ATG and TAA, using `-A 1` and `-B 1` to get the header and sequence from a small part of the sequence, or maybe using primer pairs to locate a PCR amplicon region. I mean a few of these examples, not too many. I think this would align well with the bioinformatics examples in the other lessons in this workshop. Just a thought.
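
A few of these suggestions could look like the sketch below. The three-record FASTA file is invented for illustration and assumes each sequence sits on a single line, which real FASTA files do not guarantee.

```shell
# Hypothetical multi-FASTA file with one sequence line per record.
fasta=$(mktemp)
printf '%s\n' '>seq1' 'ATGAAATAA' '>seq2' 'CCCGGGTTT' '>seq3' 'ATGCCCTAA' > "$fasta"

n_seqs=$(grep -c "^>" "$fasta")      # count header lines, one per sequence
n_atg=$(grep -c "^ATG" "$fasta")     # count sequences that start with ATG
hit=$(grep -B 1 "CCCGGG" "$fasta")   # -B 1: also print the line before the match (the header)

echo "sequences: $n_seqs"
echo "starting with ATG: $n_atg"
printf 'record containing CCCGGG:\n%s\n' "$hit"

rm -f "$fasta"
```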

***

## Exercises
8 changes: 6 additions & 2 deletions Finding_and_summarizing_colossal_files/lessons/03_sed.md
@@ -159,6 +159,7 @@ Lastly, you can use `N~n` in the address to indicate that you want to apply the
```
sed '1~2 s/an/replacement/g' ecosystems.txt
```
## Comment-Upen: The tilde in the code above didn't work on my computer; it says it is not a valid command. I am using a Mac with the latest Apple M3 chip, and I suppose many of my participants will have a similar configuration?
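
The `first~step` address is a GNU `sed` extension, and macOS ships a BSD `sed` that does not support it. One portable alternative, sketched here with `awk` on a hypothetical four-line file rather than `ecosystems.txt`, is to test the line number directly:

```shell
# Toy file (hypothetical) standing in for ecosystems.txt.
demo=$(mktemp)
printf '%s\n' 'swan' 'crane' 'toucan' 'pelican' > "$demo"

# Odd-numbered lines (NR % 2 == 1) get the substitution; every line is printed.
result=$(awk 'NR % 2 == 1 { gsub(/an/, "replacement") } { print }' "$demo")

printf '%s\n' "$result"

rm -f "$demo"
```

Only lines 1 and 3 are changed, which mirrors what `1~2` selects in GNU `sed`. Installing GNU `sed` on the Mac (for example as `gsed` through Homebrew's `gnu-sed` package) would be another option.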

## Bioinformatics Example

@@ -178,6 +179,7 @@ cat my_fastq.fq.gz | sed -n '1~4p' > quality_scores.txt
```
The first half of the pipe prints the file, and the `sed` command grabs every fourth line, starting with the first. Try it with the `Mov10_oe_1.subset.fq` file in the advanced_shell directory!

## Comment-Upen: There is no `my_fastq.fq.gz` in our training material folder. Also, just `my_fastq.gz` or `my_fq.gz` would be fine as a file name. Again, the tilde won't work on my machine.
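
A portable way to pick every fourth line is again `awk`. The sketch below uses an invented two-read FASTQ file; note that lines 1, 5, 9, ... (what `1~4` selects) are the read headers, while the quality strings are lines 4, 8, 12, ....

```shell
# Hypothetical two-read FASTQ file: header, sequence, "+", quality.
fq=$(mktemp)
printf '%s\n' '@read1' 'ACGT' '+' 'IIII' '@read2' 'TTGA' '+' 'FFFF' > "$fq"

headers=$(awk 'NR % 4 == 1' "$fq")     # lines 1, 5, 9, ... (the same lines as 1~4)
qualities=$(awk 'NR % 4 == 0' "$fq")   # lines 4, 8, 12, ... (the quality strings)

printf 'headers:\n%s\n' "$headers"
printf 'qualities:\n%s\n' "$qualities"

rm -f "$fq"
```

For a gzip-compressed file, the same `awk` command can read from `gzip -dc file.fq.gz` through a pipe instead of taking a file name.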

## Deletion

@@ -262,24 +264,26 @@ You can also ***c***hange entire lines in `sed` using the `c` command. We could
sed '1 c header' ecosystems.txt
```

## Comment-Upen: The above command doesn't work on my laptop. Instead it prints: `sed: 1: "1 c header": command c expects \ followed by text`
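
The one-line `c header` form is a GNU `sed` convenience. The POSIX form, `c\` followed by a newline and then the text, is what the BSD error message asks for, and GNU `sed` accepts it as well. A sketch on a hypothetical three-line file (not `ecosystems.txt`):

```shell
# Toy file (hypothetical) for demonstrating the c command.
demo=$(mktemp)
printf '%s\n' 'jaguar' 'otter' 'heron' > "$demo"

# POSIX form: "c\" followed by a newline, then the replacement text.
first=$(sed '1 c\
header' "$demo")

# The same form works with a pattern address.
by_pattern=$(sed '/otter/ c\
header' "$demo")

printf 'line 1 changed:\n%s\n' "$first"
printf 'pattern line changed:\n%s\n' "$by_pattern"

rm -f "$demo"
```
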
This can also be utilized in conjunction with the `A,B` interval syntax, but we should be aware that it will replace ALL lines in that interval with a SINGLE line.

```
sed '1,3 c header' ecosystems.txt
```

## Comment-Upen: Same as above, doesn't work on my Mac.
You can also replace every *n*-th line starting at the *N*-th line using the `N~n` address syntax:

```
sed '1~3 c header' ecosystems.txt
```

## Comment-Upen: The `~` in the above command is reported as invalid on my Mac.
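
Because this command combines two GNU-specific features (the `N~n` address and the one-line `c` form), a portable equivalent is easier to write in `awk`. A sketch on a hypothetical five-line file:

```shell
# Toy file (hypothetical) with five lines.
demo=$(mktemp)
printf '%s\n' 'one' 'two' 'three' 'four' 'five' > "$demo"

# Replace lines 1, 4, 7, ... with "header"; print all other lines unchanged.
result=$(awk 'NR % 3 == 1 { print "header"; next } { print }' "$demo")

printf '%s\n' "$result"

rm -f "$demo"
```
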
Lastly, you can also replace lines that match a pattern:

```
sed '/jaguar/ c header' ecosystems.txt
```

## Comment-Upen: Error on the above command: `sed: 1: "/jaguar/ c header": command c expects \ followed by text`

## Multiple expressions

6 changes: 6 additions & 0 deletions Finding_and_summarizing_colossal_files/lessons/AWK_module.md
@@ -166,6 +166,7 @@ Were seals ever observed in any of the other parks, note that `||` is or in awk

</details>

## Comment-Upen: Neither of the options above prints anything on my laptop.

****

@@ -266,6 +267,7 @@ To simply extract the Yosemite data (column 3). We use the second part:
```bash
awk -F "," '$2 ~ "coyote"'
```
## Comment-Upen: Maybe add the file name `animal_observations_edited.txt` at the end of the above command. If someone enters it as is, the terminal will just hang.

to separate the comma-separated fields of column 3 and ask which lines have the string coyote in field 2. We want to print the entire comma-separated list (i.e., column 3) to test our code, which is the default behavior of `awk` in this case.
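
Run on its own with no file argument, `awk` waits for input on standard input, which is why the terminal appears to hang. The sketch below shows the command reading from a file and from a pipe; the two comma-separated records are hypothetical stand-ins for the contents of column 3.

```shell
# Hypothetical comma-separated records standing in for column 3.
records=$(mktemp)
printf '%s\n' 'bear,coyote,deer' 'seal,otter,coyote' > "$records"

# With a file argument, awk reads the file and exits.
from_file=$(awk -F "," '$2 ~ "coyote"' "$records")

# As the second half of a pipe, awk reads standard input instead.
from_pipe=$(cat "$records" | awk -F "," '$2 ~ "coyote"')

echo "from file: $from_file"
echo "from pipe: $from_pipe"

rm -f "$records"
```

Only the first record is printed in both cases, because `coyote` has to appear in field 2.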

@@ -377,6 +379,8 @@ samtools view -S -b ${sam}.sam > ${sam}.bam
done
```

## Comment-Upen: We are not running this workshop on a cluster, right? Running the above chunk with samtools might be a problem.
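
If the script is kept as a runnable example, one option is to check for `samtools` before calling it. This is a sketch only; `have_tool` is a hypothetical helper name, not part of the lesson.

```shell
# Hypothetical helper: succeed only if the named program is on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool samtools; then
    echo "samtools found: $(command -v samtools)"
else
    echo "samtools not found; skip the samtools step or install it first" >&2
fi
```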

This actually combines a number of basic and intermediate shell topics such as [positional parameters](https://hbctraining.github.io/Training-modules/Accelerate_with_automation/lessons/positional_params.html), [for loops](https://hbctraining.github.io/Training-modules/Accelerate_with_automation/lessons/loops_and_scripts.html), and `awk`!

* We start with a for loop that counts from 1 to 10
@@ -391,6 +395,8 @@ With our new `awk` expertise let's take a look at that `awk` command alone!
```bash
awk -v awkvar="${i}" 'NR==awkvar' samples.txt
```
## Comment-Upen: There is no `samples.txt` in the workshop material folder?


We have not encountered `-v` yet. The correct syntax is `-v var=val`, which assigns the value `val` to the variable `var` before execution of the program begins. So what we are doing is creating our own variable within our `awk` program, calling it `awkvar`, and assigning it the value of `${i}`, which will be a number between 1 and 10 (see the for loop above). `${i}`, and thus `awkvar`, will be different for each loop iteration.
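
To try the `-v` mechanism without the workshop's `samples.txt`, a small stand-in file is enough. In this sketch the three sample names are hypothetical, and the loop runs from 1 to 3 rather than 1 to 10.

```shell
# Hypothetical stand-in for samples.txt: one sample name per line.
samples=$(mktemp)
printf '%s\n' 'sampleA' 'sampleB' 'sampleC' > "$samples"

for i in 1 2 3; do
    # -v copies the shell value of ${i} into the awk variable awkvar,
    # so NR==awkvar selects line number ${i} of the file.
    sam=$(awk -v awkvar="${i}" 'NR==awkvar' "$samples")
    echo "line ${i}: ${sam}"
done

rm -f "$samples"
```

Each pass through the loop prints a different line of the file, because `awkvar` takes the current value of `${i}`.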
