Error while training #16

Open
kow10120 opened this issue Feb 26, 2019 · 25 comments

@kow10120

Hello all,

I am trying to train a model on a Windows computer. When I input the following:

train(path_prefix = "F:/IERCMLWIC/TrainingImages/TRAININGIMAGES",
      data_info = "F:/IERCMLWIC/L1/data_info.csv",
      model_dir = "F:/IERCMLWIC",
      python_loc = "C:/Users/kvanatta/Anaconda3/",
      num_classes = 24,
      log_dir_train = "IERCMLWIC"
      )

I get the following output:

train(path_prefix = "F:/IERCMLWIC/TrainingImages/TRAININGIMAGES",
      data_info = "F:/IERCMLWIC/L1/data_info.csv",
      model_dir = "F:/IERCMLWIC",
      python_loc = "C:/Users/kvanatta/Anaconda3/",
      num_classes = 24,
      log_dir_train = "IERCMLWIC"
      )
Error in UseMethod("train") : 
  no applicable method for 'train' applied to an object of class "character"

Does anyone have experience with this?

@Nova-Scotia

Did you try adding a "/" to the end of your path_prefix?

@Nova-Scotia

And maybe to your model_dir as well, not sure. I know classify has some code that deals with missing slashes; I'm not sure whether train does without doing some digging.

@mikeyEcology
Owner

This does not look like an error message from MLWIC. You might have another package loaded that has a function called train. A way to be sure you're using the correct function is to be more specific when you use it, so try using MLWIC::train() instead.

@kow10120
Author

Thank you both for the help. I've tried both of your suggestions. Mikey, following your suggestion to use MLWIC::train(), I am now getting different output:

MLWIC::train(path_prefix = "F:/IERCMLWIC/TRAININGIMAGES",  
         data_info = "F:/IERCMLWIC/L1/data_info.csv", 
         model_dir = "F:/IERCMLWIC", 
         python_loc = "C:/Users/kvanatta/Anaconda3/",  
         num_classes = 24, 
         log_dir_train = "IERCMLWIC" 
         )
C:\Users\kvanatta\ANACON~1\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
 from ._conv import register_converters as _register_converters
Traceback (most recent call last):
 File "train.py", line 339, in <module>
   main()
 File "train.py", line 319, in main
   args.num_samples = sum(1 for line in open(args.data_info))
FileNotFoundError: [Errno 2] No such file or directory: 'data_info_train.csv'
[1] "training of model took 2.23065400123596 secs. The trained model is in IERCMLWIC. Specify this directory as the log_dir when you use classify(). "

It appears that tensorflow was masking the train() function. I have tried every combination of trailing slashes on path_prefix and model_dir as suggested by Nova-Scotia, but the output shown above does not change.

@mikeyEcology
Owner

mikeyEcology commented Feb 27, 2019

It seems like for some reason, there are issues when you try to use the package when your files are on an external drive. (I'm assuming that your F drive is external?)

One potential solution is to move and rename your data_info.csv file manually. So rename this file data_info_train.csv and make sure that it is in the folder F:/IERCMLWIC/L1/.

You also might want to try setting retrain = FALSE, but this probably wouldn't fix the error that you're getting.
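
The rename-and-move workaround above can also be scripted. This is only a minimal sketch, assuming (based on the FileNotFoundError in the traceback) that train.py opens a file literally named data_info_train.csv from the L1 folder. The function name stage_data_info and its arguments are hypothetical and are not part of MLWIC.

```python
import shutil
from pathlib import Path


def stage_data_info(data_info, l1_dir, target_name="data_info_train.csv"):
    """Copy the label CSV into the L1 folder under the name train.py looks for.

    Hypothetical helper: the target name is taken from the error message in
    this thread, not from the MLWIC documentation.
    """
    source = Path(data_info)
    if not source.is_file():
        raise FileNotFoundError(f"label file not found: {source}")
    destination = Path(l1_dir) / target_name
    # Skip the copy when the file is already in place under the right name.
    if source.resolve() != destination.resolve():
        shutil.copyfile(source, destination)
    return destination
```

For the paths in this thread, the call would look like stage_data_info("F:/IERCMLWIC/L1/data_info.csv", "F:/IERCMLWIC/L1"). Copying rather than renaming leaves the original file untouched.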

@Nova-Scotia

@kow10120 , did you get this to work? The fix @mikeyEcology suggested (rename the .csv to data_info_train.csv) worked for me, after I got the same errors that you noted.

@kow10120
Author

Thank you for the advice, and sorry I have not had much free time to devote to this recently. I did not get it to work, but I may need to spend more time carefully combing through the large Excel file I'm working with and the actual photos to ensure there are no discrepancies. The joys of large data sets.
Thank you for the help, and I will update when I am ready to proceed again.

@mikeyEcology
Owner

Hopefully you're not going through the file names manually? There are ways to do this in Unix that can save you a lot of time. In Unix, I would go to the directory where I have the files and type find $PWD -type f > listOfFiles.txt, which would create a file in my directory called listOfFiles.txt with the whole list. Presumably Windows has a similar function.
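
For Windows users, a platform-independent way to get the same list is a few lines of Python. This is a rough sketch of the idea behind the find command above, not an exact equivalent. The function names list_files and write_file_list are made up for illustration.

```python
from pathlib import Path


def list_files(root):
    """Return the absolute path of every regular file under root, sorted."""
    root_path = Path(root).resolve()
    return sorted(str(path) for path in root_path.rglob("*") if path.is_file())


def write_file_list(root, output="listOfFiles.txt"):
    """Write one file path per line to output and return how many were found."""
    paths = list_files(root)
    text = "\n".join(paths) + ("\n" if paths else "")
    Path(output).write_text(text, encoding="utf-8")
    return len(paths)
```

Running write_file_list("F:/IERCMLWIC/TRAININGIMAGES") would write the list to listOfFiles.txt in the current working directory. The result can then be compared against the file names in the label CSV instead of checking them by hand.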

@tundraboyce

Had this same error and just wanted to say that changing the name to data_info_train.csv worked. I then ran into another error in train that I'm hoping might be obvious. I'm guessing this has to do with the num_classes argument? I have 23 classes that I'm trying to train with; have I missed a step somewhere to specify this?

So far I have only done so in the line python_loc = "C:\Users\User\Anaconda3\", num_classes = 23, log_dir_train = "traindir"

Assign requires shapes of both tensors to match. lhs shape= [23] rhs shape= [28]
[[node save/Assign_1 (defined at train.py:198) ]]

@mikeyEcology
Owner

@tundraboyce did you specify retrain=FALSE? Can you please post all of the code that you put in the train function and all of the output?

@mikeyEcology
Owner

This was previously not explained in the readme. I just updated it so that this is more clear:

G) If your num_classes is not equal to the number in the built in model (num_classes != 28), you will need to specify retrain=FALSE.

@tundraboyce

That sorted it, thanks again! Appreciate the responses.

Now I just have:

InvalidArgumentError (see above for traceback): targets[14] is out of range
[[node Tower_1/in_top_k/InTopKV2 (defined at train.py:127) ]]

@mikeyEcology
Owner

Hey @Nova-Scotia since you're the Windows expert on MLWIC, I'm wondering if you can try something for me when you get a chance. I updated train and classify so that they should properly move the data_info file on a Windows computer if you set os=Windows in the function call. I don't have a way to test it, though, because I'm only running Linux.

@Nova-Scotia

Hi @mikeyEcology , sure, I can do that - might take me a couple days to get to it (busy week!) but I'll keep you posted. Let me know if another user gets to it first!

Erica

@mikeyEcology
Owner

Thank you @Nova-Scotia !

@Nova-Scotia

Hi again. Did a quick check of classify, haven't tried train yet. It wasn't working so I dug into the code and realized maybe there's an easy fix?

In classify, the code tells R to make a new file named "data_info_train.csv":

if (os == "Windows") {
        data_file <- read.table(data_info, header = FALSE, sep = ",")
        output.file <- file("data_info_train.csv", "wb")
        write.table(data_file, file = output.file, append = TRUE, 
            quote = FALSE, row.names = FALSE, col.names = FALSE, 
            sep = ",")
        close(output.file)
        rm(output.file)

but then later in the code it calls for "data_info.csv":

 eval_py <- paste0(python_loc, "python eval.py --architecture ", 
        architecture, " --depth ", depth, " --log_dir ", log_dir, 
        " --path_prefix ", path_prefix, " --batch_size 128 --data_info data_info.csv", 
        " --delimiter ", delimiter, " --save_predictions ", save_predictions, 
        " --top_n ", top_n, " --num_classes=", num_classes, "\n")

Maybe just a typo when copy-pasting from train code?

@tundraboyce

I actually managed to get train to work this morning and I have another computer chugging away on it now. The issue you described was pretty spot on.

Train was looking for a data_info_train.csv regardless of what was called in the code. I was trying to call a different file (e.g., data_info_pilot.csv), but the code would only work with, and only look for, "data_info_train" in the L1 folder. Also, my out-of-range error came from num_classes = 23: my 23 classes plus class 0 make 24, duh. Brain freeze.

I'll let you know how well classify works with this model on my images.
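
The off-by-one above is easy to check before training. This is a minimal sketch. It assumes the label file has no header row and keeps the integer class ID in its second column, which is what the int(tokens[1]) line in data_loader.py suggests. The function names are hypothetical.

```python
import csv


def class_ids(data_info, delimiter=","):
    """Collect the distinct integer class IDs from the second column."""
    ids = set()
    with open(data_info, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if len(row) >= 2:
                ids.add(int(row[1]))
    return ids


def required_num_classes(ids):
    """Class IDs are used as zero-based indices, so the largest ID must be
    strictly smaller than num_classes."""
    if not ids:
        raise ValueError("no class IDs found")
    return max(ids) + 1
```

If the IDs run from 0 to 23, required_num_classes returns 24. Passing a smaller num_classes is what produces the "targets[...] is out of range" error.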

@Nova-Scotia

Nova-Scotia commented Jun 12, 2019

Just an update - the classify command does work as expected if you change "data_info_train.csv" to "data_info.csv" in the source code.

@mikeyEcology
Owner

Thank you @Nova-Scotia and @tundraboyce for testing this. I corrected the error that you suggested with classify. That's what I get for trying to copy and paste.

@pirocha

pirocha commented Aug 31, 2019

Hi!
I'm trying to train a model with my species, but I'm getting a different error from the ones mentioned in this thread.

The input I'm using is:

MLWIC::train(
    path_prefix = "C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa/", 
    data_info = "C:/Users/User/Documents/CameraTrap/L1/data_info_train.csv",
    model_dir = "C:/Users/User/Documents/CameraTrap", 
    python_loc = "C:/Users/User/Anaconda3/", 
    os = "Windows", 
    num_classes = 51, 
    delimiter = ",", 
    architecture = "resnet", 
    depth = "152", 
    batch_size = "64",
    log_dir_train = "angola_output", 
    retrain = FALSE, 
    print_cmd = FALSE )

and I get the following output:
C:\Users\User\ANACON~1\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
 from ._conv import register_converters as _register_converters
Namespace(LR_steps=[19, 30, 44, 53], LR_values=[0.01, 0.005, 0.001, 0.0005, 0.0001], WD_steps=[30], WD_values=[0.0005, 0.0], architecture='resnet', batch_size=64, chunked_batch_size=32, crop_size=[224, 224], data_info='data_info_train.csv', delimiter=',', depth=152, load_size=[256, 256], log_debug_info=False, log_device_placement=False, log_dir='angola_output', num_batches=3095, num_channels=3, num_classes=51, num_epochs=55, num_gpus=2, num_samples=198073, num_threads=20, path_prefix='C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa/', retrain_from=None, run_name='Run-31-08-2019_19-19-32', shuffle=True, snapshot_prefix='snapshot', top_n=2, transfer_mode=[0])
Saving everything in angola_output
Traceback (most recent call last):
 File "train.py", line 339, in <module>
   main()
 File "train.py", line 335, in main
   train(args)
 File "train.py", line 99, in train
   images, labels = data_loader.read_inputs(True, args)
 File "C:\Users\User\Documents\CameraTrap\L1\data_loader.py", line 23, in read_inputs
   filepaths, labels = _read_label_file(args.data_info, args.delimiter)
 File "C:\Users\User\Documents\CameraTrap\L1\data_loader.py", line 19, in _read_label_file
   labels.append(int(tokens[1]))
ValueError: invalid literal for int() with base 10: 'NA\n'
[1] "training of model took 3.71519708633423 secs. The trained model is in angola_output. Specify this directory as the log_dir when you use classify(). "

Can somebody help me with this?
Thank you!

@mikeyEcology
Owner

Hi @pirocha,
I'm not sure exactly what the problem is, but to rule some things out, can you try running this

MLWIC::train(
    path_prefix = "C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa", 
    data_info = "C:/Users/User/Documents/CameraTrap/L1/data_info_train.csv",
    model_dir = "C:/Users/User/Documents/CameraTrap", 
    python_loc = "C:/Users/User/Anaconda3/", 
    os = "Windows", 
    num_classes = 51, 
    delimiter = ",", 
    architecture = "resnet", 
    depth = "18", 
    #depth = "152", 
    #batch_size = "64",
    batch_size = "128",
    log_dir_train = "angola_output", 
    retrain = FALSE, 
    print_cmd = FALSE )

@pirocha

pirocha commented Sep 1, 2019

Hi @mikeyEcology ,
I tried to change the depth and batch_size as you suggested but I still get the same output.
I know nothing about programming, but is it possible that the Python script expects an integer somewhere that my data has a float? For instance, in data_info_train.csv?

@mikeyEcology
Owner

Did you also try resetting your path prefix to:
path_prefix = "C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa"

@pirocha

pirocha commented Sep 1, 2019 via email

@mikeyEcology
Owner

Ok. No worries. Yes, NAs in the input file will cause errors.
Glad you got it running.
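
Rows like this can be found before training starts. This is a small sketch. It assumes the same layout the traceback implies, with the class ID in the second delimited field and parsed with int(), and find_bad_labels is a hypothetical helper rather than part of MLWIC.

```python
def find_bad_labels(data_info, delimiter=","):
    """Return (line_number, line) pairs whose second field is not an integer.

    Mirrors the int(tokens[1]) call shown in the traceback, so it flags
    NA values, floats such as 1.0, and lines with no second field.
    """
    bad = []
    with open(data_info) as handle:
        for line_number, line in enumerate(handle, start=1):
            stripped = line.rstrip("\r\n")
            tokens = stripped.split(delimiter)
            try:
                int(tokens[1])
            except (IndexError, ValueError):
                bad.append((line_number, stripped))
    return bad
```

An empty list means every label parses as an integer. Otherwise the line numbers point at the rows to fix or drop in the CSV.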
