Topic: Strandedness

sjn

Strandedness
« on: August 31, 2011, 03:25:15 PM »
The general strategy for dealing with strandedness, when needed, is to partition the input into two files, one per strand.

While general strandedness features could be built into the tools, doing so would add a fair amount of complexity in some cases, and it may require more options that interact with other options in specific ways (read: feature bloat and increased usage complexity).

We lean toward simpler approaches.  In this case, partitioning data is easy, and if the data will be used multiple times in the future in a strand-specific way, then it may make sense to store things that way indefinitely.

For the case where field 6 stores the strand information:

awk '$6 == "+"' input.bed > input.pos.bed
awk '$6 == "-"' input.bed > input.neg.bed
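For large inputs, the same partition can be done in a single pass. The following is only a sketch under a few assumptions: the sample records and their id-N names are made up for illustration, the file is tab-delimited, and the tally of records with neither "+" nor "-" in field 6 is an extra that the two commands above do not provide.

```shell
# Hypothetical sample input; real data would already exist as input.bed.
printf 'chr1\t100\t200\tid-1\t0\t+\nchr1\t150\t250\tid-2\t0\t-\nchr2\t10\t20\tid-3\t0\t.\nchr2\t30\t40\tid-4\t0\t+\n' > input.bed

# One pass over the input: route each record by the strand in field 6.
# -F'\t' keeps field numbering stable even if a name field contains spaces.
awk -F'\t' '
  $6 == "+" { print > "input.pos.bed"; next }
  $6 == "-" { print > "input.neg.bed"; next }
  { skipped++ }   # strand is "." or missing: written to neither file
  END { printf "records with no +/- strand: %d\n", skipped }
' input.bed
```

One difference from the two-command form: awk only creates an output file when it first prints to it, so if the input has no records on one strand, that file will not exist afterward, whereas the shell redirect above always creates it (possibly empty).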

If input.bed is already sorted properly, input.pos.bed and input.neg.bed will be too, since awk emits the matching lines in their original order.  Apart from creating the 2 files, there is no real processing disadvantage to operating on the two files in place of the original.  If desired, BED results obtained separately using these 2 files can be combined with bedops --everything.
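As a rough illustration of that last step, the sketch below recombines two per-strand result files. The names result.pos.bed, result.neg.bed and result.bed, and the records in them, are hypothetical. The sort -m branch is only a stand-in so the example runs on a machine without BEDOPS installed; it is not part of the recommendation above.

```shell
# Hypothetical per-strand results; each file is already in sorted order.
printf 'chr1\t100\t200\tid-1\t0\t+\nchr2\t30\t40\tid-4\t0\t+\n' > result.pos.bed
printf 'chr1\t150\t250\tid-2\t0\t-\n' > result.neg.bed

if command -v bedops >/dev/null 2>&1; then
  # Union of all elements from both sorted inputs, emitted in sorted order.
  bedops --everything result.pos.bed result.neg.bed > result.bed
else
  # Stand-in (an assumption, not a BEDOPS feature): merge the two sorted
  # files on chromosome, then start, then end coordinate.
  LC_ALL=C sort -m -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n \
    result.pos.bed result.neg.bed > result.bed
fi
```

Because field 6 travels with each record, the combined file still carries the strand of every element, so it can be partitioned again later if needed.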

« Last Edit: March 25, 2012, 01:33:32 PM by sjn »

AlexReynolds

Re: Strandedness
« Reply #1 on: November 01, 2011, 06:52:45 PM »
Note that this recommendation doesn't affect use of starch and unstarch, which only care about the data in the first three columns (which, in turn, do not contain any strand information).