BUG: Issue converting raven.txt file to simple-seq #261

sfcooke96 · 2024-01-22T21:49:17Z

I'm running the following on a Mac with crowsetta version 5.0.1.

I tried using the following script (suggested here: yardencsGitHub/tweetynet#223) to convert my raven.txt files to simple-seq for use with vak and tweetynet.

import crowsetta
import numpy as np

example = crowsetta.data.get('raven')
raven = crowsetta.formats.bbox.Raven.from_file(example.annot_path, annot_col='Species')
annot = raven.to_annot()
onsets_s = []
offsets_s = []
labels = []
for bbox in annot.bboxes:
    onsets_s.append(bbox.onset)
    offsets_s.append(bbox.offset)
    labels.append(bbox.label)
onsets_s = np.array(onsets_s)
offsets_s = np.array(offsets_s)
labels = np.array(labels)
simpleseq = crowsetta.formats.seq.SimpleSeq(
    onsets_s=onsets_s,
    offsets_s=offsets_s, 
    labels=labels,
    annot_path='/dummy/path'
)
simpleseq.to_csv('example-data.csv')

After running this I got:

AttributeError: 'SimpleSeq' object has no attribute 'to_csv'

I adjusted the script slightly (raven = ..., simpleseq.to_file ...) to the following:

import crowsetta
import numpy as np


example = crowsetta.data.get('raven')
raven = crowsetta.formats.bbox.raven.Raven.from_file(example.annot_path, annot_col='Species')
annot = raven.to_annot()
onsets_s = []
offsets_s = []
labels = []
for bbox in annot.bboxes:
    onsets_s.append(bbox.onset)
    offsets_s.append(bbox.offset)
    labels.append(bbox.label)
onsets_s = np.array(onsets_s)
offsets_s = np.array(offsets_s)
labels = np.array(labels)
simpleseq = crowsetta.formats.seq.SimpleSeq(
    onsets_s=onsets_s,
    offsets_s=offsets_s, 
    labels=labels,
    annot_path='/Users/training_data'
)

simpleseq.to_file("data.csv")

I have 10 .txt files in my directory (more than 15 rows per file) to convert to simple-seq format, but the resulting output is only the following (this is the complete file):

onset_s,offset_s,label
154.387792767,154.911598217,EATO
167.526598245,168.17302044,EATO
183.609636834,184.097751553,EATO
250.527480604,251.160710509,EATO
277.88724277,278.480895806,EATO
295.52970757,296.110168316,EATO

I tried adjusting this line of the code above:

raven = crowsetta.formats.bbox.raven.Raven.from_file(example.annot_path, annot_col='Species')

by changing annot_col to 'Annotation', the header of the annotation column in my .txt files, and received the following output:

(tweetynet) Stephens-MacBook-Pro:HAV_TN_Training stephencooke$ python test.py 
Traceback (most recent call last):
  File "/Users/stephencooke/Library/CloudStorage/OneDrive-UniversityofArizona/Tweetynet/HAV_TN_Training/test.py", line 6, in <module>
    raven = crowsetta.formats.bbox.raven.Raven.from_file(example.annot_path, annot_col='Annotations')
  File "/Users/stephencooke/miniconda3/envs/tweetynet/lib/python3.9/site-packages/crowsetta/formats/bbox/raven.py", line 107, in from_file
    df = RavenSchema.validate(df)
  File "/Users/stephencooke/miniconda3/envs/tweetynet/lib/python3.9/site-packages/pandera/api/pandas/model.py", line 306, in validate
    cls.to_schema().validate(
  File "/Users/stephencooke/miniconda3/envs/tweetynet/lib/python3.9/site-packages/pandera/api/pandas/container.py", line 375, in validate
    return self._validate(
  File "/Users/stephencooke/miniconda3/envs/tweetynet/lib/python3.9/site-packages/pandera/api/pandas/container.py", line 404, in _validate
    return self.get_backend(check_obj).validate(
  File "/Users/stephencooke/miniconda3/envs/tweetynet/lib/python3.9/site-packages/pandera/backends/pandas/container.py", line 97, in validate
    error_handler = self.run_checks_and_handle_errors(
  File "/Users/stephencooke/miniconda3/envs/tweetynet/lib/python3.9/site-packages/pandera/backends/pandas/container.py", line 172, in run_checks_and_handle_errors
    error_handler.collect_error(
  File "/Users/stephencooke/miniconda3/envs/tweetynet/lib/python3.9/site-packages/pandera/error_handlers.py", line 38, in collect_error
    raise schema_error from original_exc
pandera.errors.SchemaError: column 'annotation' not in dataframe
   Selection           View  Channel  begin_time_s  end_time_s  low_freq_hz  high_freq_hz Species
0          1  Spectrogram 1        1    154.387793  154.911598       2878.2        4049.0    EATO
1          2  Spectrogram 1        1    167.526598  168.173020       2731.9        3902.7    EATO
2          3  Spectrogram 1        1    183.609637  184.097752       2878.2        3975.8    EATO
3          4  Spectrogram 1        1    250.527481  251.160711       2756.2        3951.4    EATO
4          5  Spectrogram 1        1    277.887243  278.480896       2707.5        3975.8    EATO

I've attached example data, the Python script, and the output file here: troubleshooting.zip

Another question while we're here: will training the model on simple-seq annotations restrict the predicted annotations to onset - offset borders without including high and low frequency bounds? I'm interested because I was hoping to estimate frequency ranges with the output data. Apologies if I'm misunderstanding how prediction output will be formatted.

Thanks for your help!


NickleDave · 2024-01-22T23:59:13Z

Hi @sfcooke96!

Thank you for providing a detailed bug report and the zip with a couple samples to test with. 🙏

I think I might have confused you with my snippet on the other issue.

When you use your data, you'll want to specify the path to those files as the first argument to crowsetta.formats.bbox.Raven.from_file, like so:

crowsetta.formats.bbox.Raven.from_file(
    'troubleshooting/data1.txt'
)

I was able to do this and load the file without issue.
You don't need to specify annot_col, since your annotation column has the default name for Raven (the example data we have is from a dataset that uses a different name for its annotations column). It seems like we handle extra columns gracefully (I guess I programmed the class better than I thought 😏).
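
If it is unclear whether a given file needs annot_col, one option is to look at its header row first. The snippet below is a minimal sketch using only the standard library. It assumes Raven selection tables are tab-separated and that the default annotation header is 'Annotation' (the header mentioned earlier in this thread). The helper name raven_column_names and the sample rows are made up for illustration.

```python
import csv
import io


def raven_column_names(text_file):
    """Return the header names from a tab-separated Raven selection table.

    Hypothetical helper for illustration; not part of crowsetta.
    """
    reader = csv.reader(text_file, delimiter="\t")
    return next(reader)


# Tiny in-memory stand-in for a Raven .txt file (values are illustrative only).
sample = io.StringIO(
    "Selection\tView\tChannel\tBegin Time (s)\tEnd Time (s)\t"
    "Low Freq (Hz)\tHigh Freq (Hz)\tAnnotation\n"
    "1\tSpectrogram 1\t1\t1.0\t2.0\t2000.0\t4000.0\tEATO\n"
)

columns = raven_column_names(sample)

# Assumption: 'Annotation' is the default column name, so annot_col is only
# needed when that header is missing (e.g. a file that uses 'Species').
needs_annot_col = "Annotation" not in columns
print(columns, needs_annot_col)
```

For a real file, the same helper can be called on open('troubleshooting/data1.txt', newline='').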

You'll also need to loop over all your files and save each of them with a separate name, so you don't overwrite the previous one you saved.
Please try this short script and see if you get separate files, each with the appropriate number of rows.

import pathlib

import crowsetta
import numpy as np

# this is where we get our files from
src_dir = pathlib.Path('./troubleshooting')
# next line: sorted because 
# https://www.vice.com/en/article/zmjwda/a-code-glitch-may-have-caused-errors-in-more-than-100-published-studies
src_txt_files = sorted(src_dir.glob('*.txt'))

# this is where we save the files (so we don't overwrite the originals)
dst_dir = pathlib.Path('./annots-simple-seq')
dst_dir.mkdir(exist_ok=True)

# to save ourselves from a typo
assert dst_dir != src_dir

for txt_file in src_txt_files:
    print(
        f"Converting Raven file to simple-seq format: {txt_file}"
    )
    annot = crowsetta.formats.bbox.Raven.from_file(
        txt_file
    ).to_annot()

    onsets_s = []
    offsets_s = []
    labels = []
    for bbox in annot.bboxes:
        onsets_s.append(bbox.onset)
        offsets_s.append(bbox.offset)
        labels.append(bbox.label)
    onsets_s = np.array(onsets_s)
    offsets_s = np.array(offsets_s)
    labels = np.array(labels)
    simpleseq = crowsetta.formats.seq.SimpleSeq(
        onsets_s=onsets_s,
        offsets_s=offsets_s, 
        labels=labels,
        annot_path='/dummy/path/doesnt/matter/here'
    )
    dst_txt_file = dst_dir / txt_file.name
    print(
        f"Saving converted simple-seq file: {dst_txt_file}"
    )
    simpleseq.to_file(dst_txt_file)
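
To check that each converted file has the expected number of rows, something like the following could be run afterwards. This is only a sketch under some assumptions: the Raven files are tab-separated with one row per annotation, the converted files are comma-separated with a single header line (as in the output shown earlier in this thread), and converted files keep the source file name, as in the script above. The helpers count_data_rows and compare_row_counts are hypothetical names, and the demo files are made up.

```python
import csv
import pathlib
import tempfile


def count_data_rows(path, delimiter):
    """Count non-empty rows after the header line."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        next(reader, None)  # skip the header row
        return sum(1 for row in reader if row)


def compare_row_counts(src_dir, dst_dir):
    """Map each source file name to (source rows, converted rows or None)."""
    counts = {}
    for src in sorted(pathlib.Path(src_dir).glob("*.txt")):
        dst = pathlib.Path(dst_dir) / src.name
        n_dst = count_data_rows(dst, ",") if dst.exists() else None
        counts[src.name] = (count_data_rows(src, "\t"), n_dst)
    return counts


# Self-contained demo with throwaway files (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    demo_src = pathlib.Path(tmp) / "raven"
    demo_dst = pathlib.Path(tmp) / "simple-seq"
    demo_src.mkdir()
    demo_dst.mkdir()
    (demo_src / "a.txt").write_text("Selection\tAnnotation\n1\tEATO\n2\tEATO\n")
    (demo_dst / "a.txt").write_text(
        "onset_s,offset_s,label\n1.0,2.0,EATO\n3.0,4.0,EATO\n"
    )
    demo_counts = compare_row_counts(demo_src, demo_dst)

print(demo_counts)
```

With the directories from the script above, the call would be compare_row_counts('./troubleshooting', './annots-simple-seq'). A value of None in the second position means no converted file was found for that source file.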

Just let me know if you have any questions about what this is doing!
Happy to share the ~five things I've managed to learn about Python and just keep recycling 😜

> Another question while we're here

Re: the TweetyNet model, please see my reply on the issue on the TweetyNet repo: yardencsGitHub/tweetynet#223 (comment)

sfcooke96 · 2024-01-24T00:27:32Z

@NickleDave, thank you - this solution seems to have worked! On to prepping, training, and predicting.

Thanks a lot for your active support here! 🙏

NickleDave · 2024-01-31T23:24:43Z

Of course, glad to hear it's working @sfcooke96!
I will go ahead and close this issue.

NickleDave closed this as completed Jan 31, 2024

NickleDave mentioned this issue Jan 31, 2024

Training PREP error: AttributeError: 'Annotation' object has no attribute 'seq' yardencsGitHub/tweetynet#223

Closed

NickleDave mentioned this issue Feb 22, 2024

Error running vak prep vocalpy/vak#740

Closed
