Topic: vcf2bed Problem

j-andrews7

vcf2bed Problem
« on: January 19, 2016, 10:16:32 AM »
Hi guys,

I'm having trouble with the vcf2bed utility. It works with certain files but not with others. Here are the meta-info lines and the first few records from the file I'm trying to convert; perhaps I missed something:
Code:
##fileformat=VCFv4.2
##INFO=<ID=OTHER,Number=.,Type=String,Description="Other Information From Original File">
##INFO=<ID=SAMPLE,Number=.,Type=String,Description="Sample id">
##INFO=<ID=CDS,Number=.,Type=String,Description="Coding Variants or not">
##INFO=<ID=VA,Number=.,Type=String,Description="Coding Variant Annotation">
##INFO=<ID=HUB,Number=.,Type=String,Description="Network Hubs, PPI (protein protein interaction network), REG (regulatory network), PHOS (phosphorylation network)...">
##INFO=<ID=GNEG,Number=.,Type=String,Description="Gene Under Negative Selection">
##INFO=<ID=GERP,Number=.,Type=String,Description="Gerp Score">
##INFO=<ID=NCENC,Number=.,Type=String,Description="NonCoding ENCODE Annotation">
##INFO=<ID=HOT,Number=.,Type=String,Description="Highly Occupied Target Region">
##INFO=<ID=MOTIFBR,Number=.,Type=String,Description="Motif Breaking">
##INFO=<ID=MOTIFG,Number=.,Type=String,Description="Motif Gain">
##INFO=<ID=SEN,Number=.,Type=String,Description="In Sensitive Region">
##INFO=<ID=USEN,Number=.,Type=String,Description="In Ultra-Sensitive Region">
##INFO=<ID=UCONS,Number=.,Type=String,Description="In Ultra-Conserved Region">
##INFO=<ID=GENE,Number=.,Type=String,Description="Target Gene (For coding - directly affected genes ; For non-coding - promoter or distal regulatory module)">
##INFO=<ID=CANG,Number=.,Type=String,Description="Prior Gene Information, e.g.[cancer][TF_regulating_known_cancer_gene][up_regulated][actionable]...">
##INFO=<ID=CDSS,Number=.,Type=String,Description="Coding Score">
##INFO=<ID=NCDS,Number=.,Type=String,Description="NonCoding Score">
##INFO=<ID=RECUR,Number=.,Type=String,Description="Recurrent elements / variants">
##INFO=<ID=DBRECUR,Number=.,Type=String,Description="Recurrence database">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 569896 . A G 221.999 PASS SAMPLE=merged_noncoding_multitype_variants;GERP=.;CDS=No;NCENC=DHS(MCV-106|chr1:569820-569970),Pseudogene(ENSG00000198744.5[RP5-857K21.11]);NCDS=0.18521432
chr1 724205 . C A 38.2637 PASS SAMPLE=merged_noncoding_multitype_variants;GERP=.;CDS=No;NCDS=0
chr1 724511 . G T 40.2635 PASS SAMPLE=merged_noncoding_multitype_variants;GERP=.;CDS=No;NCDS=0
chr1 724535 . T A 16.0802 PASS SAMPLE=merged_noncoding_multitype_variants;GERP=.;CDS=No;NCDS=0

And here's the error I'm getting. I'm about 90% sure it's the compiler throwing a fit over an error similar to this one: http://stackoverflow.com/questions/8428978/what-might-be-causing-this-buffer-overflow. Sorry the backtrace isn't more informative; I'm on a cluster, so I assume that info is hidden from most users.
Code:
*** buffer overflow detected ***: convert2bed terminated
======= Backtrace: =========
[0x8063b0d]
[0x80882f4]
[0x808828d]
[0x805292a]
[0x804b694]
[0x805449f]
[0x8087bce]
======= Memory map: ========
08048000-0810c000 r-xp 00000000 f71:d615a 144115381752053961             /scratch/jandrews/bin/convert2bed
0810c000-0810f000 rw-p 000c3000 f71:d615a 144115381752053961             /scratch/jandrews/bin/convert2bed
0810f000-08112000 rw-p 00000000 00:00 0
09c49000-09c6b000 rw-p 00000000 00:00 0                                  [heap]
55555000-55556000 r-xp 00000000 00:00 0                                  [vdso]
55556000-55557000 ---p 00000000 00:00 0
55557000-55757000 rw-p 00000000 00:00 0
55757000-55758000 ---p 00000000 00:00 0
55758000-55958000 rw-p 00000000 00:00 0
55958000-55959000 ---p 00000000 00:00 0
55959000-55b59000 rw-p 00000000 00:00 0
55b59000-55b5a000 ---p 00000000 00:00 0
55b5a000-55d5b000 rw-p 00000000 00:00 0
55e00000-55e41000 rw-p 00000000 00:00 0
55e41000-55f00000 ---p 00000000 00:00 0
55f00000-56301000 rw-p 00000000 00:00 0
ffc07000-ffc8a000 rw-p 00000000 00:00 0                                  [stack]
/scratch/jandrews/bin/vcf2bed: line 164: 30728 Aborted                 (core dumped) ${cmd} ${options} - 0<&0

« Last Edit: January 19, 2016, 04:12:05 PM by j-andrews7 »

AlexReynolds

Re: vcf2bed Problem
« Reply #1 on: January 19, 2016, 04:49:08 PM »
Assuming you are using BEDOPS 2.4.14, can you please submit a sample VCF file you are having problems with? You could share it via Dropbox, for instance, or some other public file sharing mechanism. Thanks!

j-andrews7

Re: vcf2bed Problem
« Reply #2 on: January 20, 2016, 07:07:06 AM »
Direct download here: http://puu.sh/mCRMA/1059236925.gz

I also just realized that the contig lines are missing from the header (which isn't kosher VCF format, and other tools like bedtools require them). However, I don't think that should be an issue for a straight conversion.
« Last Edit: January 20, 2016, 09:00:09 AM by j-andrews7 »

AlexReynolds

Re: vcf2bed Problem
« Reply #3 on: January 21, 2016, 05:01:57 PM »
The vcf2bed tool runs into buffer overflows with input that has very long strings in single fields.

I made some inadequate assumptions about string lengths, so when a field holds more data than the memory set aside to store it, the result is a segmentation fault, a crash, or the equivalent.
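For anyone hitting the same crash on an older release, one way to see whether a file contains unusually long fields is to scan it before conversion. The following is a minimal Python sketch, not part of BEDOPS: the names `field_lengths`, `longest_fields`, and `open_vcf` are made up for illustration, and it assumes tab-delimited records.

```python
import gzip
import heapq


def field_lengths(lines):
    """Yield (length, line_number, column_number) for every field of every record."""
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # skip meta-information lines and the column header
        for col, field in enumerate(line.rstrip("\r\n").split("\t"), start=1):
            yield len(field), line_no, col


def longest_fields(lines, top=3):
    """Return the `top` longest fields, longest first, without loading the whole file."""
    return heapq.nlargest(top, field_lengths(lines))


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

Calling `longest_fields(open_vcf(path))` on a problem file lists the record line and column holding the longest strings, which is usually the INFO column in heavily annotated files.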

I have made some adjustments that relax these assumptions considerably, which I hope will address conversion of most VCF inputs going forward. I will likely push out a v2.4.15 release in the next day or two, once I have finished testing.

In the meantime, let me know if you have any questions and thanks for the bug report.
« Last Edit: January 21, 2016, 05:03:41 PM by AlexReynolds »

AlexReynolds

Re: vcf2bed Problem
« Reply #4 on: January 22, 2016, 01:19:45 AM »
Please see:

https://github.com/bedops/bedops/releases/tag/v2.4.15

for updated installer packages and source code.

j-andrews7

Re: vcf2bed Problem
« Reply #5 on: January 22, 2016, 09:22:27 AM »
Great, thanks. Have you considered creating the reverse programs (bed2vcf, etc.)? I know it's not terribly difficult to do with awk, etc., but it'd be pretty convenient, especially for longer workflows. Thanks again!

AlexReynolds

Re: vcf2bed Problem
« Reply #6 on: January 22, 2016, 01:33:47 PM »
It's an interesting idea. Conversion of some formats discards information. Default conversion of VCF to BED discards the header, for example. So I think it might be difficult to (reliably) go back to the original, in some cases.
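To make the trade-off concrete, here is a rough Python sketch of the reverse conversion for the simplest case. It is not a BEDOPS tool. It assumes the default vcf2bed column order is chromosome, start (POS - 1), stop, ID, QUAL, REF, ALT, FILTER, INFO, which should be checked against the BEDOPS documentation, and the function names are hypothetical. Because the original header is gone, it can only emit a bare-bones one.

```python
def bed_line_to_vcf(line):
    """Map one vcf2bed-style BED line back to a VCF record (assumed column order)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-delimited columns")
    chrom, start, _stop, vid, qual, ref, alt, flt = fields[:8]
    info = fields[8] if len(fields) > 8 else "."
    pos = str(int(start) + 1)  # BED start is 0-based; VCF POS is 1-based
    return "\t".join([chrom, pos, vid, ref, alt, qual, flt, info])


def bed_to_vcf(bed_lines):
    """Yield VCF lines; the original meta-information cannot be recovered from BED."""
    yield "##fileformat=VCFv4.2"
    yield "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
    for line in bed_lines:
        if line.strip():  # ignore blank lines
            yield bed_line_to_vcf(line)
```

The INFO, FILTER, and contig definitions from the original file are not reconstructed, so downstream tools that need them would still require the header to be supplied separately.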

j-andrews7

Re: vcf2bed Problem
« Reply #7 on: January 22, 2016, 02:53:10 PM »
Yeah, that's true. The only VCF header lines that are really needed are the fileformat line, the contig lines (strictly speaking only recommended by the spec, but many tools expect them), and the #CHROM POS ID REF, etc. line. And I think the FILTER lines, if that column holds anything other than "." or "PASS". INFO/FORMAT lines are technically optional, though some tools require them for certain operations. Other formats would probably be more of a hassle, though.
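The header lines mentioned here can be checked with a few lines of Python. This is only a sketch with a hypothetical function name; it tests for the presence of each kind of line and does not validate their contents.

```python
def check_vcf_header(lines):
    """Report which of the header lines discussed above are present."""
    found = {"fileformat": False, "contig": False, "column_header": False}
    for line in lines:
        if line.startswith("##fileformat="):
            found["fileformat"] = True
        elif line.startswith("##contig="):
            found["contig"] = True
        elif line.startswith("#CHROM"):
            found["column_header"] = True
        elif not line.startswith("#"):
            break  # first data record reached; the header is over
    return found
```

On the file from the first post, this would report the fileformat and column header lines as present and the contig lines as missing.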