Martin Asser Hansen edited this page Oct 2, 2015 · 5 revisions

Biopiece: uniq_vals

Description

uniq_vals selects records from the stream by checking the value of a given key. If several records share the same value for that key, only one of them is output (note that this does not locate records where the value of the specified key occurs only once). If the -i switch is used, the non-unique records are output instead.

Usage

... | uniq_vals [options]

Options

[-?          | --help]               #  Print full usage description.
[-k <string> | --key=<string>]       #  Key for which the value is checked for uniqueness.
[-i          | --invert]             #  Display non-unique records.
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

Consider the following two column table in the file test.tab:

Human   H1
Human   H2
Human   H3
Dog     D1
Dog     D2
Mouse   M1

To locate all unique values of the first column, we read the table with read_tab and pipe it to uniq_vals:

read_tab -i test.tab | uniq_vals -k V0

V0: Human
V1: H1
---
V0: Dog
V1: D1
---
V0: Mouse
V1: M1
---

The result is three records, one for each distinct value of V0.
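For readers more familiar with standard Unix tools, the selection above can be approximated on the raw table with awk. This is only a sketch of the observed behaviour, not how uniq_vals is implemented: it works on the tab-separated file rather than on a Biopieces record stream, and the function name first_per_key and the temporary file are made up for illustration.

```shell
# Recreate the sample table (tab-separated) in a temporary file.
tab=$(mktemp)
printf 'Human\tH1\nHuman\tH2\nHuman\tH3\nDog\tD1\nDog\tD2\nMouse\tM1\n' > "$tab"

# Hypothetical helper: print only the first line seen for each value
# of column 1. seen[$1]++ is 0 (false) the first time a key appears.
first_per_key() {
    awk '!seen[$1]++' "$1"
}

first_per_key "$tab"
```

As in the uniq_vals example, the first line for each of Human, Dog and Mouse is kept.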

If we instead want the non-unique records we use the -i switch with uniq_vals:

read_tab -i test.tab | uniq_vals -k V0 -i

V0: Human
V1: H2
---
V0: Human
V1: H3
---
V0: Dog
V1: D2
---

... and the result shows the records whose V0 value duplicates that of an earlier record.
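The inverted selection has a similar awk analogue on the raw table. Again, this is a sketch that mirrors the output shown above, under the assumption that the input is the tab-separated file; the function name repeats_per_key is hypothetical.

```shell
# Recreate the sample table (tab-separated) in a temporary file.
tab=$(mktemp)
printf 'Human\tH1\nHuman\tH2\nHuman\tH3\nDog\tD1\nDog\tD2\nMouse\tM1\n' > "$tab"

# Hypothetical helper: print every line whose column 1 value has
# already been seen, i.e. all but the first line for each key.
repeats_per_key() {
    awk 'seen[$1]++' "$1"
}

repeats_per_key "$tab"
```

Mouse does not appear because its key occurs only once, so there is no repeat to report.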

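So, how do we get the non-duplicated record with the Mouse? That is in fact not a job for uniq_vals. Instead, we count the occurrences with count_vals and select the records with a count of 1 using grab: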

read_tab -i test.tab | count_vals -k V0 | grab -e 'V0_COUNT=1'

V0: Mouse
V1: M1
V0_COUNT: 1
---
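The same idea, count first and then filter on the count, can be sketched with a two-pass awk command on the raw table. This is an illustrative equivalent only, assuming the tab-separated file as input; the function name singletons is hypothetical.

```shell
# Recreate the sample table (tab-separated) in a temporary file.
tab=$(mktemp)
printf 'Human\tH1\nHuman\tH2\nHuman\tH3\nDog\tD1\nDog\tD2\nMouse\tM1\n' > "$tab"

# Hypothetical helper: read the file twice. The first pass (NR==FNR)
# counts each column 1 value; the second pass prints the lines whose
# key was counted exactly once.
singletons() {
    awk 'NR==FNR { count[$1]++; next } count[$1] == 1' "$1" "$1"
}

singletons "$tab"
```

Two passes are needed because a line cannot be known to be a singleton until the whole input has been read.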

However, if we combine count_vals and uniq_vals, we can obtain one record for each distinct count, showing how many times the values in the first column were duplicated:

read_tab -i test.tab | count_vals -k V0 | uniq_vals -k V0_COUNT

V0: Human
V1: H1
V0_COUNT: 3
---
V0: Dog
V1: D1
V0_COUNT: 2
---
V0: Mouse
V1: M1
V0_COUNT: 1
---
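A rough awk counterpart of this combined pipeline appends the count as a third column and then keeps the first line for each distinct count. It is a sketch under the same assumptions as before (raw tab-separated input, hypothetical function name first_per_count), not a description of the Biopieces internals.

```shell
# Recreate the sample table (tab-separated) in a temporary file.
tab=$(mktemp)
printf 'Human\tH1\nHuman\tH2\nHuman\tH3\nDog\tD1\nDog\tD2\nMouse\tM1\n' > "$tab"

# Hypothetical helper: the first pass counts each column 1 value; the
# second pass prints the first line seen for each distinct count, with
# that count appended as an extra tab-separated column.
first_per_count() {
    awk 'NR==FNR { count[$1]++; next }
         !seen[count[$1]]++ { print $0 "\t" count[$1] }' "$1" "$1"
}

first_per_count "$tab"
```

Note that the selection is made on the count, not on the key: if two different keys had the same count, only the first of them would be printed.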

See also

read_tab

count_vals

grab

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

uniq_vals is part of the Biopieces framework.

http://www.biopieces.org
