Applications of BEDOPS > bedops
Counting forward strand reads and reverse strand between ranges
(1/1)
dino:
hi,
I have two bed file like follows:
file1.bed
--- Code: ---scaffold11296 36365 36414 7
scaffold11296 36471 36526 3
--- End code ---
First column is "Scaffold_ID",second column is "Start", third column is "End" and fourth column total number of reads within that range.
Sorted_genome.bed
--- Code: --- scaffold11296 36302 36334 -
scaffold11296 36303 36334 +
scaffold11296 36339 36370 +
scaffold11296 36340 36369 +
scaffold11296 36365 36395 -
scaffold11296 36366 36394 -
scaffold11296 36367 36395 -
scaffold11296 36368 36395 -
scaffold11296 36394 36414 -
scaffold11296 36471 36502 +
scaffold11296 36483 36516 +
scaffold11296 36495 36526 +
scaffold11296 40892 40909 +
scaffold11296 40892 40909 +
scaffold11296 40892 40909 +
scaffold11296 40892 40909 +
--- End code ---
Now I would like to find how many of the 7 reads within
--- Code: ---scaffold11296 36365 36414
--- End code ---
has "+" and "-" reads. Similarly for 3 reads within
--- Code: ---scaffold11296 36471 36526
--- End code ---
how many has "+" reads and "-" reads. I tried "bedtools annotate" but it didn’t yield what I want. All I want want like following:
--- Code: ---scaffold11296 36365 36414 7 5 2
scaffold11296 36471 36526 3 0 3
--- End code ---
where "5" is number of "-" reads and 2 is number of "+" reads within
--- Code: ---scaffold11296 36365 36414
--- End code ---
. Similarly "0" is the number of "-" reads and "3" is the number of "+" reads within
--- Code: ---scaffold11296 36471 36526 3
--- End code ---
.
Kindly guide me.
AlexReynolds:
Here's a bash script that would run bedops --element-of 1 on each line of file1.bed against all of Sorted_genome.bed.
For each line of file1.bed, any overlapping elements from the Sorted_genome.bed get sent to awk, which sums strand counts and prints out a result at the end into a file called answer.bed.
It is untested, but hopefully it gets you started:
#!/bin/sh
referenceFn=file1.bed
mapFn=Sorted_genome.bed
answerFn=answer.bed
while read line
do
echo -e ${line} \
| bedops --element-of 1 ${mapFn} - \
| awk -vline=${line} ' \
BEGIN { plus = 0; minus = 0; } \
{ if ($4=="+") { plus++; } else { minus++; } } \
END { print line"\t"minus"\t"plus; }' - \
> ${answerFn}
done < ${referenceFn}
Good luck!
AlexReynolds:
Another way to do this, using bedmap and awk:
$ awk '$4=="+"' Sorted_genome.bed | bedmap --count file1.bed - > count.for.txt
$ awk '$4=="-"' Sorted_genome.bed | bedmap --count file1.bed - > count.rev.txt
$ paste file1.bed count.for.txt count.rev.txt > count.both.bed
dino:
Hi Alex,
Thank you very much. I tried your approach and it worked perfectly fine when I direclty apply commands on console. I prepared a bash script which takes the transcript.bed as input and reads every line and finds uses awk and bedmap to count the "+" and "-". Below is my script
--- Code: ---#!/bin/bash
filename="$1"
while read line;
do
scaffold=$(echo $line | awk '{print $1}')
echo "$line"
plus=$(awk '$4=="+"' Sorted_"$scaffold".bed | bedmap --echo --count --delim '\t' "$line" - )
minus=$(awk '$4=="-"' Sorted_"$scaffold".bed | bedmap --echo --count --delim '\t' "$line" -)
done < "$filename"
echo "$plus" > forward_strand.txt
echo "$minus" > reverse_strand.txt
cat forward_strand.txt reverse_strand.txt
--- End code ---
But the above script throws me following error saying that
--- Code: ---scaffold11296 36372 36414 7 comp5_4179_0_1 14956 36371 -
May use bedmap --help for more help.
Error: Unable to find file: scaffold11296 36372 36414 7 comp5_4179_0_1 14956 36371 -
May use bedmap --help for more help.
Error: Unable to find file: scaffold11296 36372 36414 7 comp_4179_0_1 14956 36371 -
scaffold11296 36471 36526 3 dd_Smed_v6_4179_0_1 14956 36371 -
May use bedmap --help for more help.
Error: Unable to find file: scaffold11296 36471 36526 3 comp_4179_0_1 14956 36371 -
May use bedmap --help for more help.
Error: Unable to find file: scaffold11296 36471 36526 3 comp_4179_0_1 14956 36371 -
--- End code ---
I am new to awk and bash scripting. I think bedmap never considers the $line as input.
I also tried your script you posted on october 19th with slight modifcation like follow:
--- Code: ---#!/bin/bash
referenceFn="$1"
mapFn=$(cat Sorted_scaffold11296.bed)
answerFn=$(cat answer.bed)
while read line
do
echo -e ${line} \
| bedops --element-of 1 ${mapFn} - \
| awk -v line=${line} ' \
BEGIN { plus = 0; minus = 0; } \
{ if ($4=="+") { plus++; } else { minus++; } } \
END { print line"\t"minus"\t"plus; }' - \
> ${answerFn}
done < "$referenceFn"
--- End code ---
It gave me the following error
--- Code: ---alex_rey.sh: line 11: ${answerFn}: ambiguous redirect
May use bedops --help for more help.
Error: Bad Input
Cannot find scaffold11296
alex_rey.sh: line 11: ${answerFn}: ambiguous redirect
May use bedops --help for more help.
Error: Bad Input
Cannot find scaffold11296
--- End code ---
Kindly guide me. At present I am a beginner to awk and other linux commands, kindly pardon me if I have committed any mistakes
AlexReynolds:
Bedmap can't process $line as an actual line of BED data. You need to specify a filename, or pipe in the line from an upstream command:
$ line="chrN\t123\t456"
$ echo -e $line | bedmap --echo --count --delim '\t' - someMapElements.bed > answer.bed
Or:
$ line="chrN\t123\t456"
$ echo -e $line | bedmap --echo --count --delim '\t' someReferenceElements.bed - > answer.bed
I'm not sure what your script is trying to do. Perhaps try the scripts I wrote and see if those work, and maybe pick apart the results at each step so you see how it works. Or try working with a simpler script that does work in smaller pieces.
Navigation
[0] Message Index
Go to full version