Recent Posts

Pages: 1 [2] 3 4 ... 10
11
all / Re: Bedops for extracting fasta sequences from bed file
« Last post by AlexReynolds on November 15, 2016, 09:38:36 PM »
BEDOPS works with BED files and leaves work on FASTA files to other toolkits.

If you're looking for an alternative to bedtools, you might investigate samtools faidx.

Some examples here:

https://www.biostars.org/p/183629/#183633
https://www.biostars.org/p/98725/#98733
12
all / Bedops for extracting fasta sequences from bed file
« Last post by chudar on November 15, 2016, 01:16:05 PM »
Hi,

I am new to bedops and I have bed file like following
Code: [Select]
bed_file
scaffold123    56    120    transcript_A
scaffold123    175  234    transcript_A

I would like to extract the fasta sequence from the from the respective scaffold in genome.fasta with transcript header. Using bedtools this can be acheived by
Code: [Select]
bedtools getfasta -name fi genome.fasta -bed bed_file Is it possible to achieve the same output using bedops. Kindly guide me.
13
bedops / Re: Counting forward strand reads and reverse strand between ranges
« Last post by AlexReynolds on October 24, 2016, 05:06:21 PM »
Bedmap can't process $line as an actual line of BED data. You need to specify a filename, or pipe in the line from an upstream command:

$ line="chrN\t123\t456"
$ echo -e $line | bedmap --echo --count --delim '\t' - someMapElements.bed > answer.bed

Or:

$ line="chrN\t123\t456"
$ echo -e $line | bedmap --echo --count --delim '\t' someReferenceElements.bed - > answer.bed

I'm not sure what your script is trying to do. Perhaps try the scripts I wrote and see if those work, and maybe pick apart the results at each step so you see how it works. Or try working with a simpler script that does work in smaller pieces.
14
bedops / Re: Counting forward strand reads and reverse strand between ranges
« Last post by dino on October 24, 2016, 07:43:09 AM »
Hi Alex,

Thank you very much. I tried your approach and it worked perfectly fine when I direclty apply commands on console.  I prepared a bash script which takes the transcript.bed as input and reads every line and finds uses awk and bedmap to count the "+" and "-".  Below is my script

Code: [Select]
#!/bin/bash
filename="$1"
while read line;
do
scaffold=$(echo $line | awk '{print $1}')
echo "$line"
plus=$(awk '$4=="+"' Sorted_"$scaffold".bed | bedmap --echo --count --delim '\t' "$line" - )
minus=$(awk '$4=="-"' Sorted_"$scaffold".bed | bedmap --echo --count --delim '\t' "$line"  -)
done < "$filename"
echo "$plus" > forward_strand.txt
echo "$minus" > reverse_strand.txt
cat forward_strand.txt reverse_strand.txt

But the above script throws me following error saying that

Code: [Select]
scaffold11296 36372 36414 7 comp5_4179_0_1 14956 36371 -
May use bedmap --help for more help.

Error: Unable to find file: scaffold11296 36372 36414 7 comp5_4179_0_1 14956 36371 -
May use bedmap --help for more help.

Error: Unable to find file: scaffold11296 36372 36414 7 comp_4179_0_1 14956 36371 -
scaffold11296 36471 36526 3 dd_Smed_v6_4179_0_1 14956 36371 -
May use bedmap --help for more help.

Error: Unable to find file: scaffold11296 36471 36526 3 comp_4179_0_1 14956 36371 -
May use bedmap --help for more help.

Error: Unable to find file: scaffold11296 36471 36526 3 comp_4179_0_1 14956 36371 -

I am new to awk and bash scripting. I think bedmap never considers the $line as input.
I also tried your script you posted on october 19th with slight modifcation like follow:
Code: [Select]
#!/bin/bash

referenceFn="$1"
mapFn=$(cat Sorted_scaffold11296.bed)
answerFn=$(cat answer.bed)

while read line
do
    echo -e ${line} \
        | bedops --element-of 1 ${mapFn} - \
        | awk -v line=${line} ' \
            BEGIN { plus = 0; minus = 0; } \
            { if ($4=="+") { plus++; } else { minus++; } } \
            END { print line"\t"minus"\t"plus; }' - \
        > ${answerFn}
done < "$referenceFn"

It gave me the following error
Code: [Select]
alex_rey.sh: line 11: ${answerFn}: ambiguous redirect
May use bedops --help for more help.

Error: Bad Input
Cannot find scaffold11296
alex_rey.sh: line 11: ${answerFn}: ambiguous redirect
May use bedops --help for more help.

Error: Bad Input
Cannot find scaffold11296
Kindly guide me. At present I am a beginner to awk and other linux commands, kindly pardon me if I have committed any mistakes


15
bedops / Re: Counting forward strand reads and reverse strand between ranges
« Last post by AlexReynolds on October 20, 2016, 11:14:31 AM »
Another way to do this, using bedmap and awk:

$ awk '$4=="+"' Sorted_genome.bed | bedmap --count file1.bed - > count.for.txt
$ awk '$4=="-"' Sorted_genome.bed | bedmap --count file1.bed - > count.rev.txt
$ paste file1.bed count.for.txt count.rev.txt > count.both.bed
16
bedops / Re: Counting forward strand reads and reverse strand between ranges
« Last post by AlexReynolds on October 19, 2016, 10:05:12 AM »
Here's a bash script that would run bedops --element-of 1 on each line of file1.bed against all of Sorted_genome.bed.

For each line of file1.bed, any overlapping elements from the Sorted_genome.bed get sent to awk, which sums strand counts and prints out a result at the end into a file called answer.bed.

It is untested, but hopefully it gets you started:

#!/bin/sh

referenceFn=file1.bed
mapFn=Sorted_genome.bed
answerFn=answer.bed

while read line
do
    echo -e ${line} \
        | bedops --element-of 1 ${mapFn} - \
        | awk -vline=${line} ' \
            BEGIN { plus = 0; minus = 0; } \
            { if ($4=="+") { plus++; } else { minus++; } } \
            END { print line"\t"minus"\t"plus; }' - \
        > ${answerFn}
done < ${referenceFn}


Good luck!
17
bedops / Counting forward strand reads and reverse strand between ranges
« Last post by dino on October 19, 2016, 07:18:01 AM »
hi,

I have two bed file like follows:
file1.bed
Code: [Select]
scaffold11296 36365 36414 7
scaffold11296 36471 36526 3

First column is "Scaffold_ID",second column is "Start", third column is "End" and fourth column total number of reads within that range.

Sorted_genome.bed
Code: [Select]
scaffold11296 36302 36334 -
        scaffold11296 36303 36334 +
        scaffold11296 36339 36370 +
        scaffold11296 36340 36369 +
        scaffold11296 36365 36395 -
        scaffold11296 36366 36394 -
        scaffold11296 36367 36395 -
        scaffold11296 36368 36395 -
        scaffold11296 36394 36414 -
        scaffold11296 36471 36502 +
        scaffold11296 36483 36516 +
        scaffold11296 36495 36526 +
        scaffold11296 40892 40909 +
        scaffold11296 40892 40909 +
        scaffold11296 40892 40909 +
        scaffold11296 40892 40909 +

Now I would like to find how many of the 7 reads within
Code: [Select]
scaffold11296 36365 36414 has "+" and "-" reads. Similarly for 3 reads within
Code: [Select]
scaffold11296 36471 36526 how many has "+" reads and "-" reads. I tried "bedtools annotate" but it didn’t yield what I want. All I want want like following:
Code: [Select]
scaffold11296 36365 36414 7      5     2
scaffold11296 36471 36526 3      0     3
where "5" is number of "-" reads and 2 is number of "+" reads within
Code: [Select]
scaffold11296 36365 36414. Similarly "0" is the number of "-" reads and "3" is the number of "+" reads within
Code: [Select]
scaffold11296 36471 36526 3.

Kindly guide me.


18
all / Re: bed files in genome browser
« Last post by incub0x on April 07, 2016, 10:33:53 AM »
Thank you very much for your help. This is useful for me.

 :) ;)
19
all / Re: bed files in genome browser
« Last post by sjn on April 07, 2016, 10:19:10 AM »
Hi Alex,
I think your question would be better directed to the folks who maintain the UCSC Genome Browser.  Here is a good place to start: https://genome.ucsc.edu/contacts.html

BEDOPS doesn't have any internal knowledge about genes or other features but instead relies upon BED inputs that contain that information.  In your case, if the second file you describe had chromosome, start-of-sequence, end-of-sequence information and a Gene name in the fourth column, you could use BEDOPS to map the gene information onto your first file that contains the 4th column information that is important (# of times the sequence is found).  That's just one example.

Shane
20
all / bed files in genome browser
« Last post by incub0x on April 07, 2016, 10:05:59 AM »
Hello:

I am using my bed files resulting in the tables of https://genome.ucsc.edu/cgi-bin/hgTables , with the next characteristics:


}group
~Genes and Gene Predictions

}region
~defined regions

#####  In this part, I uploaded a bed format file, in the first column of the bed format file is the chromosome, in the second column the beginning of the sequence, in the third column is the ending of the sequence and in the last column is the number of times that the sequence is in my file, for that reason, the fourth column is  very important to me. #####

}Output format
~selected fields from primary and related tables
~~~>get output.

~name
~chrom
~geneSymbol
~refseq
~description
~kgXref
~refSeqSummary
~~~>get output.


#### At the ending of the process the      genome.ucsc.edu      give me the chromosomes and the genes that correspond to the coordinates of the bed format file, but it doesn´t appear the fourth column next to the resulting genes. Some coordinates are not characterized, for that reason some coordinates are erased and I do not know what genes correspond to the specific number of the fourth column.


I would like to know if there is a way of getting the resulting genes with the numbers of the fourth column of my bed format file.

I would appreciate if someone can help me. ####


Thank you.

Alex
Pages: 1 [2] 3 4 ... 10