Commit d4ae149

authored Jul 13, 2020
Chueatwork add file format tags (awslabs#608)
* Fix description formatting
* Update beataml.yaml
* Add spp
* Add spp, life sciences, aws-pds tags
* Add spp
* Update aws-igenomes.yaml
* Add spp and tags
* Add tags
* Add file extensions
1 parent 4b3d220 commit d4ae149

8 files changed, +39 -2 lines changed

‎datasets/allen-brain-observatory.yaml

+1
@@ -14,6 +14,7 @@ Tags:
 - life sciences
 - signal processing
 - electrophysiology
+- Mus musculus
 License: http://www.alleninstitute.org/legal/terms-use/
 Resources:
 - Description: Project data files in a public bucket

‎datasets/allen-cell-imaging-collections.yaml

+4
@@ -16,11 +16,15 @@ Contact: jacksonb@alleninstitute.org
 ManagedBy: "[Allen Institute for Cell Science](https://www.allencell.org)"
 UpdateFrequency: Biweekly
 Tags:
+- aws-pds
+- life sciences
 - biology
 - cell imaging
+- cell biology
 - microscopy
 - image processing
 - machine learning
+- Homo sapiens
 License: https://www.allencell.org/terms-of-use.html
 Resources:
 - Description: Data files in a public bucket

‎datasets/allen-mouse-brain-atlas.yaml

+1
@@ -16,6 +16,7 @@ Tags:
 - life sciences
 - machine learning
 - transcriptomics
+- Mus musculus
 License: http://www.alleninstitute.org/legal/terms-use/
 Resources:
 - Description: Project data files in a public bucket

‎datasets/aws-igenomes.yaml

+7
@@ -5,10 +5,17 @@ Contact: https://github.com/ewels/AWS-iGenomes/issues
 UpdateFrequency: New data are added when available.
 Tags:
 - aws-pds
+- agriculture
 - biology
 - genetic
 - genomic
 - life sciences
+- reference index
+- Caenorhabditis elegans
+- Danio rerio
+- Homo sapiens
+- Mus musculus
+- Rattus norvegicus
 License: Multiple - please see [data origins](https://github.com/ewels/AWS-iGenomes#data-origin).
 Resources:
 - Description: AWS-iGenomes S3 Bucket

‎datasets/broad-references.yaml

+2
@@ -12,6 +12,8 @@ Tags:
 - genetic
 - genomic
 - life sciences
+- reference index
+- Homo sapiens
 License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
 Resources:
 - Description: "This dataset includes two human genome references assembled by the Genome Reference Consortium: Hg19 and Hg38."

‎datasets/ccle.yaml

+4
@@ -13,8 +13,12 @@ UpdateFrequency: |
 Tags:
 - aws-pds
 - cancer
+- genetic
 - genomic
 - life sciences
+- transcriptomics
+- whole genome sequencing
+- Homo sapiens
 License: "NIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies"
 Resources:
 - Description: RNA-Seq Alligned Reads, WXS Alligned Reads, WGS Alligned Reads

‎datasets/target.yaml

+4-2
@@ -1,9 +1,11 @@
 Name: Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
-Description: |
+Description: >
 Therapeutically Applicable Research to Generate Effective Treatments (TARGET) is the collaborative
 effort of a large, diverse consortium of extramural and NCI investigators. The goal of the effort
 is to accelerate molecular discoveries that drive the initiation and progression of hard-to-treat
-childhood cancers and facilitate rapid translation of those findings into the clinic. TARGET projects provide comprehensive molecular characterization to determine the genetic changes
+childhood cancers and facilitate rapid translation of those findings into the clinic.
+
+TARGET projects provide comprehensive molecular characterization to determine the genetic changes
 that drive the initiation and progression of childhood cancers.The dataset contains open Clinical
 Supplement, Biospecimen Supplement, RNA-Seq Gene Expression Quantification, miRNA-Seq Isoform
 Expression Quantification, miRNA-Seq miRNA Expression Quantification data from Genomic Data

‎tags.yaml

+16
@@ -10,6 +10,7 @@
 - autism spectrum disorder
 - automatic speech recognition
 - autonomous vehicles
+- bam
 - biodiversity
 - bioinformatics
 - biology
@@ -34,13 +35,16 @@
 - conservation
 - coronavirus
 - COVID-19
+- cram
+- csv
 - culture
 - cyber security
 - Danio rerio
 - deep learning
 - demographics
 - denoising
 - dialog
+- dicom
 - digital preservation
 - disaster response
 - distributional semantics
@@ -57,6 +61,9 @@
 - energy
 - environmental
 - events
+- fasta
+- fastq
+- fast5
 - financial markets
 - fluorescence imaging
 - food security
@@ -69,6 +76,8 @@
 - geospatial
 - governance
 - government spending
+- h5
+- hdf5
 - health
 - high-throughput imaging
 - history
@@ -81,11 +90,13 @@
 - infrastructure
 - internet
 - intrusion detection
+- json
 - labeled
 - land
 - lidar
 - life sciences
 - loftee
+- long read sequencing
 - lightsheet microscopy
 - machine learning
 - machine translation
@@ -112,10 +123,12 @@
 - neuroimaging
 - neurophysiology
 - neuroscience
+- non-human primate
 - oceans
 - open source software
 - organelle
 - osm
+- pbi
 - pediatric
 - pharmaceutical
 - politics
@@ -133,6 +146,7 @@
 - satellite imagery
 - seismology
 - sentiment
+- short read sequencing
 - signal processing
 - simulations
 - single-cell transcriptomics
@@ -159,9 +173,11 @@
 - urban
 - us
 - us-dc
+- vcf
 - vep
 - virus
 - water
 - weather
 - whole genome sequencing
 - word embeddings
+xml
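
Note on checking tags: this commit adds tags both to individual dataset files and to tags.yaml. Assuming tags.yaml acts as the allowed vocabulary for each dataset's Tags list (an assumption; the commit itself does not state this), a check along the following lines could catch a dataset tag that is missing from the list. The function name unknown_tags and both sample lists are hypothetical and only reuse entries visible in this diff; the sample of tags.yaml is truncated, so the result says nothing about the real repository.

```python
from typing import Iterable, List


def unknown_tags(dataset_tags: Iterable[str], allowed_tags: Iterable[str]) -> List[str]:
    """Return the dataset tags that are absent from the allowed list.

    Comparison is exact and case-sensitive, and the input order is preserved.
    """
    allowed = set(allowed_tags)
    return [tag for tag in dataset_tags if tag not in allowed]


# Hypothetical, truncated sample of tags.yaml entries that appear in this diff.
ALLOWED_SAMPLE = [
    "bam",
    "Danio rerio",
    "life sciences",
    "machine learning",
    "vcf",
    "whole genome sequencing",
]

# Tags listed for datasets/ccle.yaml after this commit.
CCLE_TAGS = [
    "aws-pds",
    "cancer",
    "genetic",
    "genomic",
    "life sciences",
    "transcriptomics",
    "whole genome sequencing",
    "Homo sapiens",
]

if __name__ == "__main__":
    # With the truncated sample above, most tags are reported as unknown;
    # a real check would load the full tags.yaml instead.
    for tag in unknown_tags(CCLE_TAGS, ALLOWED_SAMPLE):
        print(f"not in allowed list: {tag}")
```

In practice the two lists would be read from the YAML files with a YAML parser rather than hard-coded.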
