Author Topic: Counting forward strand reads and reverse strand between ranges  (Read 2956 times)

dino

  • Newbie
  • *
  • Posts: 2
Counting forward strand reads and reverse strand between ranges
« on: October 19, 2016, 07:18:01 AM »
hi,

I have two bed file like follows:
file1.bed
Code: [Select]
scaffold11296 36365 36414 7
scaffold11296 36471 36526 3

First column is "Scaffold_ID",second column is "Start", third column is "End" and fourth column total number of reads within that range.

Sorted_genome.bed
Code: [Select]
scaffold11296 36302 36334 -
        scaffold11296 36303 36334 +
        scaffold11296 36339 36370 +
        scaffold11296 36340 36369 +
        scaffold11296 36365 36395 -
        scaffold11296 36366 36394 -
        scaffold11296 36367 36395 -
        scaffold11296 36368 36395 -
        scaffold11296 36394 36414 -
        scaffold11296 36471 36502 +
        scaffold11296 36483 36516 +
        scaffold11296 36495 36526 +
        scaffold11296 40892 40909 +
        scaffold11296 40892 40909 +
        scaffold11296 40892 40909 +
        scaffold11296 40892 40909 +

Now I would like to find how many of the 7 reads within
Code: [Select]
scaffold11296 36365 36414 has "+" and "-" reads. Similarly for 3 reads within
Code: [Select]
scaffold11296 36471 36526 how many has "+" reads and "-" reads. I tried "bedtools annotate" but it didnít yield what I want. All I want want like following:
Code: [Select]
scaffold11296 36365 36414 7      5     2
scaffold11296 36471 36526 3      0     3
where "5" is number of "-" reads and 2 is number of "+" reads within
Code: [Select]
scaffold11296 36365 36414. Similarly "0" is the number of "-" reads and "3" is the number of "+" reads within
Code: [Select]
scaffold11296 36471 36526 3.

Kindly guide me.



AlexReynolds

  • Administrator
  • Jr. Member
  • *****
  • Posts: 72
Re: Counting forward strand reads and reverse strand between ranges
« Reply #1 on: October 19, 2016, 10:05:12 AM »
Here's a bash script that would run bedops --element-of 1 on each line of file1.bed against all of Sorted_genome.bed.

For each line of file1.bed, any overlapping elements from the Sorted_genome.bed get sent to awk, which sums strand counts and prints out a result at the end into a file called answer.bed.

It is untested, but hopefully it gets you started:

#!/bin/sh

referenceFn=file1.bed
mapFn=Sorted_genome.bed
answerFn=answer.bed

while read line
do
    echo -e ${line} \
        | bedops --element-of 1 ${mapFn} - \
        | awk -vline=${line} ' \
            BEGIN { plus = 0; minus = 0; } \
            { if ($4=="+") { plus++; } else { minus++; } } \
            END { print line"\t"minus"\t"plus; }' - \
        > ${answerFn}
done < ${referenceFn}


Good luck!
« Last Edit: October 19, 2016, 02:55:31 PM by AlexReynolds »

AlexReynolds

  • Administrator
  • Jr. Member
  • *****
  • Posts: 72
Re: Counting forward strand reads and reverse strand between ranges
« Reply #2 on: October 20, 2016, 11:14:31 AM »
Another way to do this, using bedmap and awk:

$ awk '$4=="+"' Sorted_genome.bed | bedmap --count file1.bed - > count.for.txt
$ awk '$4=="-"' Sorted_genome.bed | bedmap --count file1.bed - > count.rev.txt
$ paste file1.bed count.for.txt count.rev.txt > count.both.bed

dino

  • Newbie
  • *
  • Posts: 2
Re: Counting forward strand reads and reverse strand between ranges
« Reply #3 on: October 24, 2016, 07:43:09 AM »
Hi Alex,

Thank you very much. I tried your approach and it worked perfectly fine when I direclty apply commands on console.  I prepared a bash script which takes the transcript.bed as input and reads every line and finds uses awk and bedmap to count the "+" and "-".  Below is my script

Code: [Select]
#!/bin/bash
filename="$1"
while read line;
do
scaffold=$(echo $line | awk '{print $1}')
echo "$line"
plus=$(awk '$4=="+"' Sorted_"$scaffold".bed | bedmap --echo --count --delim '\t' "$line" - )
minus=$(awk '$4=="-"' Sorted_"$scaffold".bed | bedmap --echo --count --delim '\t' "$line"  -)
done < "$filename"
echo "$plus" > forward_strand.txt
echo "$minus" > reverse_strand.txt
cat forward_strand.txt reverse_strand.txt

But the above script throws me following error saying that

Code: [Select]
scaffold11296 36372 36414 7 comp5_4179_0_1 14956 36371 -
May use bedmap --help for more help.

Error: Unable to find file: scaffold11296 36372 36414 7 comp5_4179_0_1 14956 36371 -
May use bedmap --help for more help.

Error: Unable to find file: scaffold11296 36372 36414 7 comp_4179_0_1 14956 36371 -
scaffold11296 36471 36526 3 dd_Smed_v6_4179_0_1 14956 36371 -
May use bedmap --help for more help.

Error: Unable to find file: scaffold11296 36471 36526 3 comp_4179_0_1 14956 36371 -
May use bedmap --help for more help.

Error: Unable to find file: scaffold11296 36471 36526 3 comp_4179_0_1 14956 36371 -

I am new to awk and bash scripting. I think bedmap never considers the $line as input.
I also tried your script you posted on october 19th with slight modifcation like follow:
Code: [Select]
#!/bin/bash

referenceFn="$1"
mapFn=$(cat Sorted_scaffold11296.bed)
answerFn=$(cat answer.bed)

while read line
do
    echo -e ${line} \
        | bedops --element-of 1 ${mapFn} - \
        | awk -v line=${line} ' \
            BEGIN { plus = 0; minus = 0; } \
            { if ($4=="+") { plus++; } else { minus++; } } \
            END { print line"\t"minus"\t"plus; }' - \
        > ${answerFn}
done < "$referenceFn"

It gave me the following error
Code: [Select]
alex_rey.sh: line 11: ${answerFn}: ambiguous redirect
May use bedops --help for more help.

Error: Bad Input
Cannot find scaffold11296
alex_rey.sh: line 11: ${answerFn}: ambiguous redirect
May use bedops --help for more help.

Error: Bad Input
Cannot find scaffold11296
Kindly guide me. At present I am a beginner to awk and other linux commands, kindly pardon me if I have committed any mistakes



AlexReynolds

  • Administrator
  • Jr. Member
  • *****
  • Posts: 72
Re: Counting forward strand reads and reverse strand between ranges
« Reply #4 on: October 24, 2016, 05:06:21 PM »
Bedmap can't process $line as an actual line of BED data. You need to specify a filename, or pipe in the line from an upstream command:

$ line="chrN\t123\t456"
$ echo -e $line | bedmap --echo --count --delim '\t' - someMapElements.bed > answer.bed

Or:

$ line="chrN\t123\t456"
$ echo -e $line | bedmap --echo --count --delim '\t' someReferenceElements.bed - > answer.bed

I'm not sure what your script is trying to do. Perhaps try the scripts I wrote and see if those work, and maybe pick apart the results at each step so you see how it works. Or try working with a simpler script that does work in smaller pieces.
« Last Edit: October 24, 2016, 05:08:18 PM by AlexReynolds »