Recent Posts

all / Re: smoothing
« Last post by AlexReynolds on February 08, 2016, 03:30:39 AM »
The script glues together various Unix command-line tools with BEDOPS tools like bam2bed, sort-bed, bedmap and starch.

What you could do is use the script, specifying the options you want, which depend on your experiment:

1. <bam-file> - a BAM file containing reads
2. <out-file> - the filename used for the smoothed output (in BEDOPS Starch format, a compressed form of BED)
3. <window-size> - the size of a genomic interval over which signal (counts of reads) is measured
4. <step-size> - the number of bases that each window is shifted or "stepped" over
5. <chromosome-file> - the filename for a BED file that lists each chromosome in your genome build, with a start field of 0 and a stop field equal to the chromosome size
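To make the window and step options concrete, here is a minimal sketch of how overlapping windows could be generated from a chromosome file with awk. This is not the script's actual code: the file names, the toy chromosome sizes and the window/step values are all made up for illustration.

```shell
set -euo pipefail
workdir=$(mktemp -d)

# Toy chromosome file (hypothetical sizes): name, start of 0, stop = chromosome size.
printf 'chrA\t0\t100\nchrB\t0\t45\n' > "$workdir/chromosomes.bed"

window_size=50   # bases covered by each window
step_size=25     # bases between consecutive window starts

# Emit one BED interval per window, clipping the last windows at the chromosome end.
awk -v w="$window_size" -v s="$step_size" 'BEGIN { OFS = "\t" } {
    for (start = $2; start < $3; start += s) {
        stop = start + w
        if (stop > $3) stop = $3
        print $1, start, stop
    }
}' "$workdir/chromosomes.bed" > "$workdir/windows.bed"

cat "$workdir/windows.bed"
```

Reads could then be counted over each window with something along the lines of bedmap --echo --count --delim '\t' windows.bed reads.bed, assuming reads.bed is a sorted BED file of reads.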

Some modifications may be needed, depending on your inputs and outputs.

For example, if you have a BED file of reads rather than a BAM file, you can edit the script to remove the bam2bed conversion step and modify the first awk block to read your BED file directly. Just make sure the BED file is sorted with sort-bed.

If you want to make a bigWig file from the output, you can remove the last line of the second awk block, which calls the starch binary, and write the data to a bedGraph file instead. bedGraph is a four-column form of BED, where the fourth column is the score value; you can use cut or awk to write the columns you need. Then use UCSC bedGraphToBigWig to convert the bedGraph file to bigWig.
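As a rough sketch of that last step, the following shows one way to cut a smoothed BED file down to bedGraph and to build the two-column chromosome sizes file that bedGraphToBigWig takes. The five-column input layout (chrom, start, stop, id, score) and all file names are assumptions for illustration, so adjust the column numbers to whatever your edited script actually writes.

```shell
set -euo pipefail
workdir=$(mktemp -d)

# Toy smoothed output (hypothetical layout): chrom, start, stop, id, score.
printf 'chrA\t0\t25\tbin1\t3\nchrA\t25\t50\tbin2\t7\nchrB\t0\t25\tbin3\t0\n' \
    > "$workdir/smoothed.bed"

# bedGraph: keep the three coordinate columns, with the score as the fourth column.
awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $5 }' \
    "$workdir/smoothed.bed" > "$workdir/smoothed.bedgraph"

# Toy chromosome file (name, 0, size), reduced to name and size.
printf 'chrA\t0\t100\nchrB\t0\t45\n' > "$workdir/chromosomes.bed"
cut -f1,3 "$workdir/chromosomes.bed" > "$workdir/chrom.sizes"

# With the UCSC tool installed, the conversion would then look like:
#   bedGraphToBigWig "$workdir/smoothed.bedgraph" "$workdir/chrom.sizes" "$workdir/smoothed.bw"
```

Note that bedGraphToBigWig expects sorted, non-overlapping intervals, so if your windows overlap you would need to write out non-overlapping bins (for example, step-sized ones) rather than the full windows.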
all / smoothing
« Last post by Udi on February 06, 2016, 11:17:35 PM »

I have a basic question (noob..)

How can I use the smooth option? I couldn't figure it out from here.

I have a bed file of reads and I would like to get a bigwig file with smoothed peaks.

Conversion tools to BED format / Re: vcf2bed Problem
« Last post by j-andrews7 on January 22, 2016, 02:53:10 PM »
Yeah, that's true. The only VCF headers that are required are the fileformat, contigs, and the #CHROM POS ID REF, etc. line. And I think the FILTER line if it's anything other than "." or "PASS" for that column. INFO/FORMAT lines are technically optional, though some tools require them for certain operations. Other formats would probably be more of a hassle though.
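For what it's worth, a quick header sanity check along those lines could look something like the sketch below. The check_vcf_header function and the toy file are made up for illustration; it only looks for the fileformat line, the #CHROM line and any contig lines, and is nowhere near a full validator.

```shell
set -euo pipefail
workdir=$(mktemp -d)

# Toy VCF: a fileformat line, the column header line, and one record.
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t724205\t.\tC\tA\t38.2637\tPASS\tNCDS=0\n' \
    > "$workdir/toy.vcf"

# check_vcf_header is a hypothetical helper, not part of BEDOPS or any VCF toolkit.
check_vcf_header() {
    local vcf=$1
    # The first line should declare the format version.
    head -n 1 "$vcf" | grep -q '^##fileformat=' || { echo "missing ##fileformat"; return 1; }
    # The column header line should be present.
    grep -q '^#CHROM' "$vcf" || { echo "missing #CHROM line"; return 1; }
    # Contig lines are only reported, not treated as fatal.
    grep -q '^##contig=' "$vcf" || echo "note: no ##contig lines"
    echo "ok"
}

header_report=$(check_vcf_header "$workdir/toy.vcf")
printf '%s\n' "$header_report"
```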
Conversion tools to BED format / Re: vcf2bed Problem
« Last post by AlexReynolds on January 22, 2016, 01:33:47 PM »
It's an interesting idea. Conversion of some formats discards information. Default conversion of VCF to BED discards the header, for example. So I think it might be difficult to (reliably) go back to the original, in some cases.
Conversion tools to BED format / Re: vcf2bed Problem
« Last post by j-andrews7 on January 22, 2016, 09:22:27 AM »
Great, thanks. Have you considered creating the reverse programs (bed2vcf, etc)? I know it's not terribly difficult to do with awk, etc, but it'd be pretty convenient, especially for longer workflows. Thanks again!
Conversion tools to BED format / Re: vcf2bed Problem
« Last post by AlexReynolds on January 22, 2016, 01:19:45 AM »
Please see:

for updated installer packages and source code.
Conversion tools to BED format / Re: vcf2bed Problem
« Last post by AlexReynolds on January 21, 2016, 05:01:57 PM »
The vcf2bed tool runs into buffer overflows with inputs that have very long strings in single fields.

I made some inadequate assumptions about string lengths, and so where there is more data in a field than available memory to store it in, there is a segmentation fault or crash or the equivalent thereof.

I have made some adjustments that relax these assumptions considerably, which I hope will address conversion of most VCF inputs going forwards. I will likely push out a v2.4.15 release in the next day or two after I have finished testing.

In the meantime, let me know if you have any questions and thanks for the bug report.
Conversion tools to BED format / Re: vcf2bed Problem
« Last post by j-andrews7 on January 20, 2016, 07:07:06 AM »
Direct download here:

And I just realized that the contigs aren't in the header (which isn't kosher VCF format and which other tools like bedtools require). However, I don't think that should be an issue with a straight conversion.
Conversion tools to BED format / Re: vcf2bed Problem
« Last post by AlexReynolds on January 19, 2016, 04:49:08 PM »
Assuming you are using BEDOPS 2.4.14, can you please submit a sample VCF file you are having problems with? You could share it via Dropbox, for instance, or some other public file sharing mechanism. Thanks!
Conversion tools to BED format / vcf2bed Problem
« Last post by j-andrews7 on January 19, 2016, 10:16:32 AM »
Hi guys,

I'm having trouble with the vcf2bed utility. It works with certain files but not with others. Here are the meta-info lines and the first few records in the file I'm trying to convert; perhaps I missed something:
Code:
##INFO=<ID=OTHER,Number=.,Type=String,Description="Other Information From Original File">
##INFO=<ID=SAMPLE,Number=.,Type=String,Description="Sample id">
##INFO=<ID=CDS,Number=.,Type=String,Description="Coding Variants or not">
##INFO=<ID=VA,Number=.,Type=String,Description="Coding Variant Annotation">
##INFO=<ID=HUB,Number=.,Type=String,Description="Network Hubs, PPI (protein protein interaction network), REG (regulatory network), PHOS (phosphorylation network)...">
##INFO=<ID=GNEG,Number=.,Type=String,Description="Gene Under Negative Selection">
##INFO=<ID=GERP,Number=.,Type=String,Description="Gerp Score">
##INFO=<ID=NCENC,Number=.,Type=String,Description="NonCoding ENCODE Annotation">
##INFO=<ID=HOT,Number=.,Type=String,Description="Highly Occupied Target Region">
##INFO=<ID=MOTIFBR,Number=.,Type=String,Description="Motif Breaking">
##INFO=<ID=MOTIFG,Number=.,Type=String,Description="Motif Gain">
##INFO=<ID=SEN,Number=.,Type=String,Description="In Sensitive Region">
##INFO=<ID=USEN,Number=.,Type=String,Description="In Ultra-Sensitive Region">
##INFO=<ID=UCONS,Number=.,Type=String,Description="In Ultra-Conserved Region">
##INFO=<ID=GENE,Number=.,Type=String,Description="Target Gene (For coding - directly affected genes ; For non-coding - promoter or distal regulatory module)">
##INFO=<ID=CANG,Number=.,Type=String,Description="Prior Gene Information, e.g.[cancer][TF_regulating_known_cancer_gene][up_regulated][actionable]...">
##INFO=<ID=CDSS,Number=.,Type=String,Description="Coding Score">
##INFO=<ID=NCDS,Number=.,Type=String,Description="NonCoding Score">
##INFO=<ID=RECUR,Number=.,Type=String,Description="Recurrent elements / variants">
##INFO=<ID=DBRECUR,Number=.,Type=String,Description="Recurrence database">
chr1 569896 . A G 221.999 PASS SAMPLE=merged_noncoding_multitype_variants;GERP=.;CDS=No;NCENC=DHS(MCV-106|chr1:569820-569970),Pseudogene(ENSG00000198744.5[RP5-857K21.11]);NCDS=0.18521432
chr1 724205 . C A 38.2637 PASS SAMPLE=merged_noncoding_multitype_variants;GERP=.;CDS=No;NCDS=0
chr1 724511 . G T 40.2635 PASS SAMPLE=merged_noncoding_multitype_variants;GERP=.;CDS=No;NCDS=0
chr1 724535 . T A 16.0802 PASS SAMPLE=merged_noncoding_multitype_variants;GERP=.;CDS=No;NCDS=0

And here's the error I'm getting. I'm like 90% sure it's the compilers throwing a fit due to an error similar to this:

Sorry the traceback isn't more informative; I'm on a cluster, so I assume that info is hidden from most users.
Code:
*** buffer overflow detected ***: convert2bed terminated
======= Backtrace: =========
======= Memory map: ========
08048000-0810c000 r-xp 00000000 f71:d615a 144115381752053961             /scratch/jandrews/bin/convert2bed
0810c000-0810f000 rw-p 000c3000 f71:d615a 144115381752053961             /scratch/jandrews/bin/convert2bed
0810f000-08112000 rw-p 00000000 00:00 0
09c49000-09c6b000 rw-p 00000000 00:00 0                                  [heap]
55555000-55556000 r-xp 00000000 00:00 0                                  [vdso]
55556000-55557000 ---p 00000000 00:00 0
55557000-55757000 rw-p 00000000 00:00 0
55757000-55758000 ---p 00000000 00:00 0
55758000-55958000 rw-p 00000000 00:00 0
55958000-55959000 ---p 00000000 00:00 0
55959000-55b59000 rw-p 00000000 00:00 0
55b59000-55b5a000 ---p 00000000 00:00 0
55b5a000-55d5b000 rw-p 00000000 00:00 0
55e00000-55e41000 rw-p 00000000 00:00 0
55e41000-55f00000 ---p 00000000 00:00 0
55f00000-56301000 rw-p 00000000 00:00 0
ffc07000-ffc8a000 rw-p 00000000 00:00 0                                  [stack]
/scratch/jandrews/bin/vcf2bed: line 164: 30728 Aborted                 (core dumped) ${cmd} ${options} - 0<&0
