Java error after the first epoch finishes (training on five GPUs; after the Java error, watch -n 2 --color gpustat --c shows four GPUs still running) #34

Open
Markkk111 opened this issue Apr 13, 2023 · 10 comments

Comments

@Markkk111

Hello, and thank you very much for taking the time to answer this question!

1. The error output is as follows:

Evaling epoch 0
caption_diff_vitb16: 100%|██████████████████████████████████████████████████████████████████████| 16/16 [02:17<00:00, 8.60s/it]
caption_diff_vitb16: 100%|██████████████████████████████████████████████████████████████████████| 16/16 [02:17<00:00, 8.59s/it]
caption_diff_vitb16: 100%|██████████████████████████████████████████████████████████████████████| 16/16 [02:18<00:00, 8.68s/it]
caption_diff_vitb16: 100%|██████████████████████████████████████████████████████████████████████| 16/16 [02:19<00:00, 8.71s/it]
caption_diff_vitb16: 100%|██████████████████████████████████████████████████████████████████████| 16/16 [02:20<00:00, 8.75s/it]
loading annotations into memory...
0:00:00.266195
creating index...
index created!
Loading and preparing results...
DONE (t=0.02s)
creating index...
index created!
tokenization...
PTBTokenizer tokenized 307085 tokens at 1518683.41 tokens per second.
PTBTokenizer tokenized 58441 tokens at 526455.68 tokens per second.
setting up scorers...
computing Bleu score...
{'testlen': 48641, 'reflen': 47900, 'guess': [48641, 43641, 38641, 33641], 'correct': [22410, 5483, 267, 19]}
ratio: 1.0154697286012315
Bleu_1: 0.461
Bleu_2: 0.241
Bleu_3: 0.074
Bleu_4: 0.022
computing METEOR score...
METEOR: 0.104
computing Rouge score...
ROUGE_L: 0.355
computing CIDEr score...
CIDEr: 0.104
computing SPICE score...
Invalid maximum heap size: -Xmx8G
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
subprocess.CalledProcessError: Command '['/home/anaconda3/envs/DDCap/bin/python', '-u', 'train.py', '--local_rank=4', '--out_dir', '/home/tjut_caixiasong/ddcap/results_diff', '--tag', 'caption_diff_vitb16']' died with <Signals.SIGSEGV: 11>.

2. The Java version is as follows:
~/ddcap$ java -version
java version "1.8.0_361"
Java(TM) SE Runtime Environment (build 1.8.0_361-b09)
Java HotSpot(TM) Server VM (build 25.361-b09, mixed mode)

3. Since I do not have root privileges, I edited my .bashrc as follows:
#java profile
export JAVA_HOME=/home/username/java/jdk1.8.0_361
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$SRILM/bin/i686-m64:$SRILM/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME
#export LD_LIBRARY_PATH=/home/username/anaconda3/envs/diffusion/lib
export PATH="$PATH:/tmp/bin"
#export LD_LIBRARY_PATH=/home/app/anaconda3/lib
export PYTORCH_NVFUSER_DISABLE=fallback
export LESS="-R"
export JAVA_HOME=/home/username/java/jdk1.8.0_361
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

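4. After editing, I ran source so that the changes took effect.

I have also run pkill and restarted several times, but the error persists. Thank you!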

@buxiangzhiren
Owner

It is probably the wrong Java version, which is what causes the insufficient-memory error.

@Markkk111
Author

Thank you for your answer. May I ask which Java version you are using?

@buxiangzhiren
Owner

Try the command "$ sudo apt-get update", then "$ sudo apt-get install openjdk-8-jdk".

@Markkk111
Author

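Hello, I do not have sudo privileges; the Java I installed earlier was set up without root. After running the commands above under the administrator account, my Java version is still openjdk version "1.8.0_361". Next I will try installing the latest version of Java under my non-root account. Thank you for your patience and your reply.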

@buxiangzhiren
Owner

Another possibility is that you installed a 32-bit build. Please confirm that the one you installed is 64-bit.
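
For reference, the java -version output posted earlier in this thread already hints at this: 64-bit HotSpot and OpenJDK builds print "64-Bit" in the VM line, 32-bit builds omit it, and a 32-bit JVM cannot accept a heap as large as -Xmx8G. A minimal Python sketch of that check follows. The function names are hypothetical and are not part of this repository.

```python
import subprocess


def is_64bit_jvm(version_output: str) -> bool:
    """Return True if `java -version` output names a 64-bit VM."""
    # 64-bit HotSpot/OpenJDK builds print e.g. "OpenJDK 64-Bit Server VM".
    return "64-Bit" in version_output


def java_version_output(java_bin: str = "java") -> str:
    """Run `java -version` and return its text (hypothetical helper)."""
    # `java -version` writes to stderr, so collect both streams.
    result = subprocess.run(
        [java_bin, "-version"], capture_output=True, text=True, check=False
    )
    return result.stderr + result.stdout
```

Calling is_64bit_jvm(java_version_output()) on the machine that runs train.py shows whether the java found first on PATH is a 64-bit build.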

@Markkk111
Author

Thank you! The first epoch now finishes normally! Thank you very much for your reply! (*Java is now the latest version and 64-bit)
$ java -version
openjdk version "1.8.0_362"
OpenJDK Runtime Environment (build 1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09)
OpenJDK 64-Bit Server VM (build 25.362-b09, mixed mode)

@buxiangzhiren
Owner

You're welcome. Glad the problem is solved.

@Markkk111
Author

Hello, my results are close to those in the paper. For Table 5 in the paper, which model was the Continuous Diffusion implementation based on?

@buxiangzhiren
Owner

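At the time we used the DDPM code and simply used a trained, fixed token embedding layer to project tokens into the latent space, where the diffusion was performed. Starting from pure noise, we obtain a vector and compute its similarity with the weights of that fixed token embedding layer; the entry with the smallest similarity is the final token.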
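
As a rough illustration of the final decoding step described in this reply, the sketch below maps a denoised vector to the closest row of a fixed embedding table. It assumes that "smallest similarity" means smallest Euclidean distance; the actual metric is not stated in this thread. The function name and the toy embedding table are hypothetical, and this is not the authors' code.

```python
import math


def nearest_token(vector, embedding_weight):
    """Return the index of the embedding row closest to `vector`.

    Assumption: closeness is Euclidean distance; the thread does not
    say which similarity measure was actually used.
    """
    best_index, best_dist = -1, math.inf
    for index, row in enumerate(embedding_weight):
        dist = math.dist(vector, row)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


# Hypothetical 3-token vocabulary with 2-dimensional embeddings.
toy_embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
```

With this toy table, nearest_token([0.9, 0.1], toy_embedding) returns 1, because the second row is the closest to the input vector.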

@Markkk111
Author

Got it, thank you for your reply. I am extremely grateful. I wish you success in your studies and a happy life!
