Can't run classifier. #26

Open
LuoPangpang opened this issue Jul 5, 2019 · 8 comments
Comments

@LuoPangpang

Dear Author,

I got the message as shown below when trying to run the final step run_isown.pl:

perl /workplace/Software/ISOWN/bin/run_isown.pl 181023001/ 181023001/181023001.isown.txt "-trainingSet /workplace/Software/ISOWN/training_data/UCEC_100_TrainSet.arff -sanityCheck false -classifier nbc"


Reformat files in '181023001' to emaf ...

WARNING: 18 variants with unknown annotation were removed
Total number of variants after filtering 3770

Running prediction using file '181023001/181023001.isown.txt.emaf' ...

...
Your working directory is 181023001
...
This file was chosen for classifier training: /workplace/Software/ISOWN/training_data/UCEC_100_TrainSet.arff
...
Total number of samples in your set is 1
...
Number of loaded nonsilent coding variants in test set is 808
...


Naive Bayes Classifier:
Option: supervised discretization (SD) is true
10-fold cross-validation


F1-measure: 98.12%.
Recall: 97.817%.
Precision: 98.425%.
False positive rate: 1.565%.
AUC: 99.77%.


Can't run classifier.
java.io.IOException: nominal value not declared in header, read Token[null], line 19
at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)
at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)
at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)
at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)
at weka.core.Instances.<init>(Instances.java:126)
at main.Prediction.runClassifier(Prediction.java:233)
at main.runISOWN.main(runISOWN.java:90)

...
Total number of predicted somatic mutations 0
Final results are saved here: 181023001/181023001.isown.txt
...

Done

Interestingly, I got no errors running either database_annotation.pl or run_isown.pl with the two VCF files provided in the test_data/ directory ...

I googled the "nominal value not declared in header" message, and some people said it has something to do with Weka, so I checked:

java -jar /workplace/Software/ISOWN/bin/weka.jar

Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: java.awt.HeadlessException:
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:204)
at java.awt.Window.<init>(Window.java:536)
at java.awt.Frame.<init>(Frame.java:420)
at javax.swing.JFrame.<init>(JFrame.java:233)
at weka.gui.LogWindow.<init>(LogWindow.java:252)
at weka.gui.GUIChooser.(GUIChooser.java:215)

So did I miss anything? By the way, the weka.jar was already in the bin/ directory when I installed ISOWN, so I did not do any replacement of weka.jar since check_dependencies.pl said everything was installed.

Thank you very much!

Pang
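
Weka raises "nominal value not declared in header" when a data row in an ARFF file contains a value for a nominal attribute that is not listed in that attribute's `{...}` declaration. The thread does not show the ARFF file that ISOWN builds for the test set, so the following is only a minimal sketch for checking any ARFF file by hand. It assumes the dense (non-sparse) ARFF layout and no commas inside quoted values; the function names are hypothetical and not part of ISOWN or Weka.

```python
import re


def parse_nominal_attributes(header_lines):
    """Return {column_index: set_of_declared_values} for nominal attributes."""
    nominal = {}
    index = 0
    for line in header_lines:
        stripped = line.strip()
        if not stripped.lower().startswith("@attribute"):
            continue
        # A nominal attribute ends with a brace-enclosed list of values.
        match = re.search(r"\{(.*)\}\s*$", stripped)
        if match:
            values = {v.strip().strip("'\"") for v in match.group(1).split(",")}
            nominal[index] = values
        index += 1
    return nominal


def find_undeclared_values(arff_text):
    """Return (line_number, column_index, value) for each undeclared nominal value."""
    lines = arff_text.splitlines()
    data_start = next(
        i for i, line in enumerate(lines) if line.strip().lower() == "@data"
    )
    nominal = parse_nominal_attributes(lines[:data_start])
    problems = []
    # Line numbers are 1-based, counted from the top of the file.
    for line_no, line in enumerate(lines[data_start + 1:], start=data_start + 2):
        if not line.strip() or line.lstrip().startswith("%"):
            continue  # skip blank lines and ARFF comments
        fields = [f.strip().strip("'\"") for f in line.split(",")]
        for col, allowed in nominal.items():
            value = fields[col] if col < len(fields) else ""
            # "?" is the ARFF marker for a missing value and is always allowed.
            if value != "?" and value not in allowed:
                problems.append((line_no, col, value))
    return problems
```

Calling `find_undeclared_values(open(path).read())` on the training set and on any intermediate test ARFF would list the rows that Weka is likely to reject, including rows where a nominal field is simply empty.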

@amit21AIT

amit21AIT commented Jul 6, 2019

Can you send the first few lines of your annotated VCF file, excluding headers? I was getting an error after the database annotation process ("ArrayIndexOutOfBound 3"), and I just want to find where the error is.
Thanks
Amit

@LuoPangpang
Author

LuoPangpang commented Jul 8, 2019

Hi Amit,

This is the annotated VCF output that I produced without errors using "test_data/3d2edf87-6ec5-4c9f-9212-e8a751cc33e8.dkfz-snvCalling_1-0-132-1.20160126.vcf" as input. Hope it helps.

Pang

##fileformat=VCFv4.1
##fileDate=20160129
##pancancerversion=1.0
##reference=<ID=hs37d5,Source=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz>;
##center="DKFZ"
##workflowName=DKFZ_SNV_workflow
##workflowVersion=1.0.0
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=GERMLINE,Number=0,Type=Flag,Description="Indicates if record is a germline mutation">
##INFO=<ID=UNCLEAR,Number=0,Type=Flag,Description="Indicates if the somatic status of a mutation is unclear">
##INFO=<ID=VT,Number=1,Type=String,Description="Variant type, can be SNP, INS or DEL">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency in primary data, for each ALT allele, in the same order as listed">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="RMS Mapping Quality">
##INFO=<ID=1000G,Number=0,Type=Flag,Description="Indicates membership in 1000Genomes">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth at this position in the sample">
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases">
##FILTER=<ID=RE,Description="variant in UCSC_27Sept2013_RepeatMasker.bed.gz region and/or SimpleTandemRepeats_chr.bed.gz region, downloaded from UCSC genome browser and/or variant in segmental duplication region, annotated by annovar">
##FILTER=<ID=BL,Description="variant in DAC-Blacklist from ENCODE or in DUKE_EXCLUDED list, both downloaded from UCSC genome browser">
##FILTER=<ID=DP,Description="<= 5 reads total at position in tumor">
##FILTER=<ID=SB,Description="Strand bias of reads with mutant allele = zero reads on one strand">
##FILTER=<ID=TAC,Description="less than 6 reads in Tumor at position">
##FILTER=<ID=dbSNP,Description="variant in dbSNP135">
##FILTER=<ID=DB,Description="variant in 1000Genomes, ALL.wgs.phase1_integrated_calls.20101123.snps_chr.vcf.gz or dbSNP">
##FILTER=<ID=HSDEPTH,Description="variant in HiSeqDepthTop10Pct_chr.bed.gz region, downloaded from UCSC genome browser">
##FILTER=<ID=MAP,Description="variant overlaps a region from wgEncodeCrgMapabilityAlign100mer.bedGraph.gz:::--breakPointMode --aEndOffset=1 with a value below 0.5, punishment increases with a decreasing mapability">
##FILTER=<ID=SBAF,Description="Strand bias of reads with mutant allele = zero reads on one strand and variant allele frequency below 0.1">
##FILTER=<ID=FRQ,Description="variant allele frequency below 0.05">
##FILTER=<ID=TAR,Description="Only one alternative read in Tumor at position">
##FILTER=<ID=UNCLEAR,Description="Classification is unclear">
##FILTER=<ID=DPHIGH,Description="Too many reads mapped in control at this region">
##FILTER=<ID=DPLOWC,Description="Only 5 or less reads in control">
##FILTER=<ID=1PS,Description="Only two alternative reads, one on each strand">
##FILTER=<ID=ALTC,Description="Alternative reads in control">
##FILTER=<ID=ALTCFR,Description="Alternative reads in control and tumor allele frequency below 0.3">
##FILTER=<ID=FRC,Description="Variant allele frequency below 0.3 in germline call">
##FILTER=<ID=YALT,Description="Variant on Y chromosome with low allele frequency">
##FILTER=<ID=VAF,Description="Variant allele frequency in tumor < 5 times allele frequency in control">
##FILTER=<ID=BI,Description="Bias towards a PCR strand or sequencing strand">
##SAMPLE=<ID=CONTROL,SampleName=control_NA,Individual=NA,Description="Control">
##SAMPLE=<ID=TUMOR,SampleName=tumor_NA,Individual=NA,Description="Tumor">
##TARGET_FILE:SureSelectHumanAllExonV4=file:///oicr/data/genomes/homo_sapiens_mc/Agilent/SureSelectHumanAllExonV4/S03723314_Regions.merged.sorted.bed.gz
##VCF_FILE:dbSNP151=file:///workplace/Software/ISOWN/bin/../external_databases/dbsnp_151.hg19.All.modified.vcf.gz
##VCF_FILE:COSMIC_77=file:///workplace/Software/ISOWN/bin/../external_databases/Cosmic-All-Muts.vcf.gz
##VCF_FILE:ExAC.r0.3_20150421=file:///workplace/Software/ISOWN/bin/../external_databases/ExAC.r0.3.1.database.vcf.gz
##VCF_FILE:2015_12_31_MA=file:///workplace/Software/ISOWN/bin/../external_databases/2015_12_31_MA_r3.vcf.gz
#TAB_DELIMITED_HEADER=sample_name chr pos reference alternative genotype totalReadDepth %readDepthAlt in.dbSNP.or.not in.dbSNP.COMMON.or.not in.COSMIC.or.not MA_functional_impact MA_score is.SOMATIC in.ExAC ExAC_NCC FLANKING_STR POLYPHEN VARIANT_CLASS LENGTH CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR

@Guofengyu

Hi, Pang

Could you send several actual variants from your annotated VCF? ANNOVAR can also annotate PolyPhen, Mutation Assessor, etc. I want to convert an ANNOVAR-annotated VCF into the ISOWN annotated VCF format and then test it. Thanks.

@Guofengyu

Hi, Pang

I have completed the test by following the process, and I didn't encounter your problem. However, I had another problem, "Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface weka.core.Instance, but class was expected", until I replaced the weka.jar with the original file. I realized from your issue that the weka.jar was already in the bin/ directory. Thank you all the same.

Guo.
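
Two notes on the weka.jar checks in this thread. Running `java -jar weka.jar` starts the Weka GUI, so a HeadlessException on a machine without an X11 display does not by itself mean the jar is broken. The IncompatibleClassChangeError is consistent with a newer Weka release being swapped in, since `weka.core.Instance` is an interface in later Weka versions. A minimal sketch for inspecting a jar without launching the GUI follows. It assumes the jar stores its version in a `weka/core/version.txt` resource, which is an assumption about the jar layout and may not hold for every build; the function name is hypothetical.

```python
import zipfile


def describe_weka_jar(jar_path):
    """Return (version_text_or_None, has_instance_class) without starting the GUI."""
    with zipfile.ZipFile(jar_path) as jar:
        names = set(jar.namelist())
        version = None
        # Assumption: the version string is stored in this resource inside the jar.
        if "weka/core/version.txt" in names:
            version = jar.read("weka/core/version.txt").decode("utf-8").strip()
        # Confirms the jar at least contains the class named in the error message.
        has_instance = "weka/core/Instance.class" in names
    return version, has_instance
```

Comparing the result for the jar in bin/ with the one shipped in a fresh ISOWN checkout would show whether the file was replaced.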

@yueyangtime

yueyangtime commented Dec 30, 2019

Hi, Pang
Have you solved the problem? I just got the same error: "Can't run classifier."

thanks
yueyang

@zjiang-lji

@Guofengyu, I got this error from the first command, reformatting to emaf, of the classifying step:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
        at com.Processing.processVcf(Processing.java:119)
        at com.runReformating.main(runReformating.java:39)

The beginning of my input VCF is the annotated version of the sample VCF. It looks like this:

##fileformat=VCFv4.1																																				
##fileDate=20160129																																				
##pancancerversion=1.0																																				
##reference=<ID=hs37d5,Source=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz>;																																				
##center="DKFZ"																																				
##workflowName=DKFZ_SNV_workflow																																				
##workflowVersion=1.0.0																																				
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">																																				
##INFO=<ID=GERMLINE,Number=0,Type=Flag,Description="Indicates if record is a germline mutation">																																				
##INFO=<ID=UNCLEAR,Number=0,Type=Flag,Description="Indicates if the somatic status of a mutation is unclear">																																				
##INFO=<ID=VT,Number=1,Type=String,Description="Variant type, can be SNP, INS or DEL">																																				
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency in primary data, for each ALT allele, in the same order as listed">																																				
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">																																				
##INFO=<ID=MQ,Number=1,Type=Integer,Description="RMS Mapping Quality">																																				
##INFO=<ID=1000G,Number=0,Type=Flag,Description="Indicates membership in 1000Genomes">																																				
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">																																				
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth at this position in the sample">																																				
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases">																																				
##FILTER=<ID=RE,Description="variant in UCSC_27Sept2013_RepeatMasker.bed.gz region and/or SimpleTandemRepeats_chr.bed.gz region, downloaded from UCSC genome browser and/or variant in segmental duplication region, annotated by annovar">																																				
##FILTER=<ID=BL,Description="variant in DAC-Blacklist from ENCODE or in DUKE_EXCLUDED list, both downloaded from UCSC genome browser">																																				
##FILTER=<ID=DP,Description="<= 5 reads total at position in tumor">																																				
##FILTER=<ID=SB,Description="Strand bias of reads with mutant allele = zero reads on one strand">																																				
##FILTER=<ID=TAC,Description="less than 6 reads in Tumor at position">																																				
##FILTER=<ID=dbSNP,Description="variant in dbSNP135">																																				
##FILTER=<ID=DB,Description="variant in 1000Genomes, ALL.wgs.phase1_integrated_calls.20101123.snps_chr.vcf.gz or dbSNP">																																				
##FILTER=<ID=HSDEPTH,Description="variant in HiSeqDepthTop10Pct_chr.bed.gz region, downloaded from UCSC genome browser">																																				
##FILTER=<ID=MAP,Description="variant overlaps a region from wgEncodeCrgMapabilityAlign100mer.bedGraph.gz:::--breakPointMode --aEndOffset=1 with a value below 0.5, punishment increases with a decreasing mapability">																																				
##FILTER=<ID=SBAF,Description="Strand bias of reads with mutant allele = zero reads on one strand and variant allele frequency below 0.1">																																				
##FILTER=<ID=FRQ,Description="variant allele frequency below 0.05">																																				
##FILTER=<ID=TAR,Description="Only one alternative read in Tumor at position">																																				
##FILTER=<ID=UNCLEAR,Description="Classification is unclear">																																				
##FILTER=<ID=DPHIGH,Description="Too many reads mapped in control at this region">																																				
##FILTER=<ID=DPLOWC,Description="Only 5 or less reads in control">																																				
##FILTER=<ID=1PS,Description="Only two alternative reads, one on each strand">																																				
##FILTER=<ID=ALTC,Description="Alternative reads in control">																																				
##FILTER=<ID=ALTCFR,Description="Alternative reads in control and tumor allele frequency below 0.3">																																				
##FILTER=<ID=FRC,Description="Variant allele frequency below 0.3 in germline call">																																				
##FILTER=<ID=YALT,Description="Variant on Y chromosome with low allele frequency">																																				
##FILTER=<ID=VAF,Description="Variant allele frequency in tumor < 5 times allele frequency in control">																																				
##FILTER=<ID=BI,Description="Bias towards a PCR strand or sequencing strand">																																				
##SAMPLE=<ID=CONTROL,SampleName=control_NA,Individual=NA,Description="Control">																																				
##SAMPLE=<ID=TUMOR,SampleName=tumor_NA,Individual=NA,Description="Tumor">																																				
##TARGET_FILE:SureSelectHumanAllExonV4=file:///oicr/data/genomes/homo_sapiens_mc/Agilent/SureSelectHumanAllExonV4/S03723314_Regions.merged.sorted.bed.gz																																				
##VCF_FILE:dbSNP152_All_20180423=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/dbSNP152_All_20180423.vcf.gz.modified.vcf.gz																																				
##VCF_FILE:COSMIC_94=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/COSMIC_v94.vcf.gz																																				
##VCF_FILE:ExAC.r0.3.1=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/ExAC.r0.3.1.sites.vep.vcf.gz																																				
##VCF_FILE:2021_07_23_MA=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/2021_07_23_MA.vcf.gz																																				
#TAB_DELIMITED_HEADER=sample_name       chr     pos     reference       alternative     genotype        totalReadDepth  %readDepthAlt   in.dbSNP.or.not in.dbSNP.COMMON.or.not      in.COSMIC.or.not        MA_functional_impact    MA_score        is.SOMATIC      in.ExAC ExAC_NCC        FLANKING_STR    POLYPHEN   VARIANT_CLASS    LENGTH  CHROM   POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  TUMOR																																				
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	TUMOR																											
TUMOR	chr1	876499	A	G	GENOTYPE_BB	48	100	IN.dbSNP	not.in.dbSNP.COMMON	IN.COSMIC_CNT=0	.	0	0	1	NCC=	V1=[.;.;.;.;.];X=[chr1;876499;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0	NO_POLYPHEN_DATA	VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT	1	chr1	876499	rs4372192_876499	A	G	.	PASS	GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,876429,876641]];ANNOVAR=intronic,SAMD11;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=0;0;.;.;.};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=GAT;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;876499;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0}	AD:GT:DP:DP4	0,48:1/1:48:0,0,32,16							
TUMOR	chr1	877715	C	G	GENOTYPE_BB	34	100	IN.dbSNP	not.in.dbSNP.COMMON	IN.COSMIC_CNT=0	.	0	0	1	NCC=	V1=[.;.;.;.;.];X=[chr1;877715;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0	NO_POLYPHEN_DATA	VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT	1	chr1	877715	rs6605066_877715	C	G	.	PASS	GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]];ANNOVAR=intronic,SAMD11;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=0;0;.;.;.};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=CCG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;877715;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0}	AD:GT:DP:DP4	0,34:1/1:34:0,0,13,21							
TUMOR	chr1	877831	T	C	GENOTYPE_BB	33	100	IN.dbSNP	not.in.dbSNP.COMMON	IN.COSMIC_CNT=0	.	0	0	1	NCC=	V1=[.;.;.;.;.];X=[chr1;877831;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0	transcript:uc001abw.1,uc001abx.1;hdiv_prediction:benign,benign;hdiv_class:neutral,neutral;hvar_prediction:benign,benign;hvar_class:neutral,neutral	VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT	1	chr1	877831	rs6672356_877831	T	C	.	PASS	GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]];ANNOVAR=exonic,SAMD11;ANNOVAR_EXONIC=nonsynonymous	SNV,SAMD11:NM_152486:exon10:c.T1027C:p.W343R,;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=1;1;VARIANT_MATCHED;.;[chr1|877831|.|T|C|.|.|RefGenome	variant=W>R;Gene=SAMD11;Uniprot=SAM11_HUMAN;Info=;Uniprot	variant=W343R;Func.	Impact=neutral;FI	score=-2.1]};POLYPHEN=[polyphenWHESS_20150403=1,transcript:uc001abw.1,uc001abx.1;hdiv_prediction:benign,benign;hdiv_class:neutral,neutral;hvar_prediction:benign,benign;hvar_class:neutral,neutral];SEQUENCE_CONTEXT=CTG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;877831;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0}	AD:GT:DP:DP4	0,33:1/1:33:0,0,15,18		
TUMOR	chr1	880238	A	G	GENOTYPE_BB	73	100	IN.dbSNP	not.in.dbSNP.COMMON	IN.COSMIC_CNT=0	.	0	0	1	NCC=	V1=[.;.;.;.;.];X=[chr1;880238;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0	NO_POLYPHEN_DATA	VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT	1	chr1	880238	rs3748592_880238	A	G	.	PASS	GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,880160,880280]];ANNOVAR=intronic,NOC2L;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=0;0;.;.;.};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=TAG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;880238;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0}	AD:GT:DP:DP4	0,73:1/1:73:0,0,36,37							
TUMOR	chr1	880466	T	C	GENOTYPE_AB	65	35.38	IN.dbSNP	not.in.dbSNP.COMMON	IN.COSMIC_CNT=0	.	0	0	1	NCC=	V1=[.;.;.;.;.];X=[chr1;880466;25.5987;35.3846;45.1706];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0	NO_POLYPHEN_DATA	VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT	1	chr1	880466	rs138652036_880466	T	C	.	PASS	GERMLINE;SNP;AF=0.51,0.35;MQ=60;DB;[SureSelectHumanAllExonV4=1,1,[chr1,880449,880637]];ANNOVAR=exonic,NOC2L;ANNOVAR_EXONIC=nonsynonymous	SNV,NOC2L:NM_015658:exon18:c.A2114G:p.E705G,;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=1;1;VARIANT_MATCHED;.;[chr1|880466|.|T|C|.|.|RefGenome	variant=E>G;Gene=NOC2L;Uniprot=NOC2L_HUMAN;Info=;Uniprot	variant=E705G;Func.	Impact=neutral;FI	score=0.77]};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=CTC;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;880466;25.5987;35.3846;45.1706];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0}	AD:GT:DP:DP4	42,23:0/1:65:19,23,10,13		
TUMOR	chr1	881627	G	A	GENOTYPE_BB	44	100	IN.dbSNP	not.in.dbSNP.COMMON	IN.COSMIC_CNT=0	.	0	0	1	NCC=	V1=[.;.;.;.;.];X=[chr1;881627;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0	NO_POLYPHEN_DATA	VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT	1	chr1	881627	rs2272757_881627	G	A	.	PASS	GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,881618,881803]];ANNOVAR=exonic,NOC2L;ANNOVAR_EXONIC=synonymous	SNV,NOC2L:NM_015658:exon16:c.C1843T:p.L615L,;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=1;1;VARIANT_MATCHED;.;[chr1|881627|.|G|A|.|.|RefGenome	variant=L>L;Gene=NOC2L;Uniprot=NOC2L_HUMAN;Info=synonymous	in	Uniprot;Uniprot	variant=L615L;Func.	Impact=;FI	score=]};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=AGG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;881627;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0}	AD:GT:DP:DP4	0,44:1/1:44:0,0,30,14
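
In the rows pasted above, some INFO values contain tab characters where a space would be expected (for example between "nonsynonymous" and "SNV", or between "RefGenome" and "variant"). This may only be an artifact of pasting, but if the real file contains these tabs, the affected rows split into more fields than the 30 names in the #TAB_DELIMITED_HEADER line. Whether that is what triggers the ArrayIndexOutOfBoundsException is not confirmed in this thread. A minimal sketch for finding such rows, with a hypothetical function name:

```python
from collections import Counter


def rows_with_odd_field_count(lines):
    """Return (typical_count, [(line_number, field_count), ...]) for outlier rows."""
    counts = []
    for line_no, line in enumerate(lines, start=1):
        # Skip header/meta lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        # Remove only the line ending so empty trailing fields are still counted.
        counts.append((line_no, len(line.rstrip("\r\n").split("\t"))))
    if not counts:
        return None, []
    # Treat the most common field count as the expected one.
    typical = Counter(n for _, n in counts).most_common(1)[0][0]
    return typical, [(ln, n) for ln, n in counts if n != typical]
```

Running it as `rows_with_odd_field_count(open(path))` on the annotated VCF lists the rows whose column count differs from the majority; these are the first candidates to inspect.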

@zjiang-lji

Can someone help me with this "Can't run classifier...nominal value not declared in header" error?

...
Your working directory is /oasis/tscc/scratch/z8jiang/ISOWN/run_isown_trial6
...
This file was chosen for classifier training: /oasis/tscc/scratch/z8jiang/ISOWN/training_data/COAD_100_TrainSet.arff
...
Total number of samples in your set is 2
...
Number of loaded nonsilent coding variants in test set is 6330
...
*************
Naive Bayes Classifier: 
Option: supervised discretization (SD) is true
10-fold cross-validation
*************
F1-measure: 96.163%.
Recall: 95.235%.
Precision: 97.11%.
False positive rate: 2.834%.
AUC: 99.39%.
*************
Can't run classifier.
java.io.IOException: nominal value not declared in header, read Token[null], line 59
	at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)
	at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)
	at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)
	at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)
	at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)
	at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)
	at weka.core.Instances.<init>(Instances.java:126)
	at main.Prediction.runClassifier(Prediction.java:233)
	at main.runISOWN.main(runISOWN.java:90)

...
Total number of predicted somatic mutations 0
Final results are saved here: test.txt
...

The first few lines of my .emaf file look like the following:

Variant	chr	pos	reference	alternative	sample_name	type	subtype	gene_name	amino_acid_change	MA_functional_impact	MA_score	isFlanking	is_in_COSMIC	CNT	is_in_dbSNP	is_in_dbSNP_common	readDepthAlt	totalReadDepth	SEQUENCING_CONTEXT	POLYPHEN_hdiv	POLYPHEN_hvar	is_in_ExAct	isSOMATIC
chr1,942665C>A	chr1	942665	C	A	SP10_filtered	exonic	nonsynonymous	SAMD11	L554M	.	.	NA	T	0	T	F	20.00	5	GCT	.	.	T	false
chr1,942668C>G	chr1	942668	C	G	SP10_filtered	exonic	nonsynonymous	SAMD11	Q555E	.	.	NA	T	0	T	F	20.00	5	GCA	.	.	T	false
chr1,942681CC>GG	chr1	942681	CC	GG	SP10_filtered	exonic	nonframeshift substitution	SAMD11	.	.	.	NA	T	0	T	F	20.00	5		.	.	T	false
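
In the excerpt above, the third row (chr1,942681CC>GG, a two-base substitution) has an empty SEQUENCING_CONTEXT field, while the other rows carry a three-base context. An empty field is one plausible source of a "nominal value not declared in header, read Token[null]" message, but the thread does not confirm this. A minimal sketch for listing empty fields in a tab-delimited .emaf file, with a hypothetical function name:

```python
def find_empty_fields(emaf_lines):
    """Return (line_number, column_name) for every empty field in the data rows."""
    header = emaf_lines[0].rstrip("\r\n").split("\t")
    problems = []
    for line_no, line in enumerate(emaf_lines[1:], start=2):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\r\n").split("\t")
        for i, name in enumerate(header):
            # A row shorter than the header counts as having empty fields.
            value = fields[i] if i < len(fields) else ""
            if value.strip() == "":
                problems.append((line_no, name))
    return problems
```

If the rows it reports are all multi-base substitutions or indels, filtering those variants out before running run_isown.pl would be a quick way to test whether they are the cause.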

@HengqiLiu

nominal value not declared in header

Hi, zjiang-lji

Have you solved the problem? I just got the same error: "Can't run classifier."

Thanks,
Hengqi Liu
