Topic: fragment length extension

cohendm

fragment length extension
« on: May 12, 2015, 11:17:44 AM »
New to the BEDOPS toolset, and I have a very basic question. The documentation for bedmap indicates that it will count the 5' ends of tags that map to regions of interest, and that you can extend the boundaries of those regions, if desired, to smooth the data. I would like to perform an analysis that counts tags that intersect another BED file, based not solely on the 5' end of each tag but on the fragment length (i.e., count the tag if any portion of it intersects). Is there an easy way to do this? I want to preserve strand information in this fragment-length analysis (that is, I want to extend the 5' end in only one direction by an arbitrary length). Thanks!

sjn

Re: fragment length extension
« Reply #1 on: May 12, 2015, 11:32:12 AM »
Hi.  Thanks for joining the forum.

To use BEDOPS in a strand-sensitive way, you'll have to break your analysis up a bit.  If your strand information is in the 6th column, for example:

awk '$6 == "+"' myfile.bed \
  | bedmap --echo --count .... \
 > result.forward-strand.bed

Do the same for the opposite strand. If you like, you can then glue the two results back together into a final result with bedops -u.
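If you'd rather not repeat the filter by hand for every file, the split can be wrapped in a small shell function. This is a minimal sketch, assuming tab-delimited BED6 input with strand in column 6; the function name split_by_strand and the .plus.bed/.minus.bed suffixes are hypothetical, not BEDOPS features. Elements whose strand is neither + nor - (e.g. .) are dropped.

```shell
#!/bin/sh
# split_by_strand SRC PREFIX
# Hypothetical helper (not part of BEDOPS): writes PREFIX.plus.bed and
# PREFIX.minus.bed from a tab-delimited BED6 file with strand in column 6.
# Elements whose strand is neither "+" nor "-" are dropped.
split_by_strand() {
  src=$1
  prefix=$2
  # Create both outputs up front so a strand with no elements still
  # yields an (empty) file for later bedmap/bedops steps.
  : > "$prefix.plus.bed"
  : > "$prefix.minus.bed"
  awk -F'\t' -v p="$prefix" '
    $6 == "+" { print > (p ".plus.bed") }
    $6 == "-" { print > (p ".minus.bed") }
  ' "$src"
}
```

For example, split_by_strand b.bed b would produce b.plus.bed and b.minus.bed. Filtering keeps the input order, so if b.bed was sorted with sort-bed first, both outputs stay sorted.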

By default, bedmap maps any element of your second file that overlaps a region in the first file by 1 bp or more, anywhere along the element (not just at its 5' end). That sounds like what you want already, but you can add the --bp-ovr 1 flag to make it explicit.

To extend a region, use bedops --range, since you want asymmetric padding. The --range option of bedmap is different: it pads symmetrically, by the same distance on both sides.
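To make the coordinate arithmetic concrete: as I read the BEDOPS documentation, bedops --range L:R adds L to every start and R to every end, so --range -10:0 moves starts 10 bp to the left and leaves ends alone, while a single value S grows both sides by S. The awk function below (pad_bed is a hypothetical name) only mimics that arithmetic for illustration; in a real pipeline, use bedops --range itself, which also keeps its output sorted.

```shell
#!/bin/sh
# pad_bed L R FILE
# Hypothetical illustration (not a BEDOPS command): add L to each start and
# R to each end of a tab-delimited BED file. Starts are clamped at 0; that
# clamp is this sketch's own choice. Output is NOT re-sorted.
pad_bed() {
  awk -F'\t' -v OFS='\t' -v l="$1" -v r="$2" '
    {
      s = $2 + l
      if (s < 0) s = 0
      $2 = s
      $3 = $3 + r
      print
    }
  ' "$3"
}
```

Under this reading, pad_bed -10 0 corresponds to --range -10:0 (upstream padding on the + strand), pad_bed 0 10 to --range 0:10 (upstream padding on the - strand), and a symmetric --range 10 to pad_bed -10 10.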

I'll assume that you have two files, both with strand information in the 6th column, and that you want to add 10 bp of upstream padding:
a.bed
b.bed

awk '$6 == "+"' a.bed | bedops -u --range -10:0 - > a.plus.bed
awk '$6 == "-"' a.bed | bedops -u --range 0:10 - > a.minus.bed
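The padding above is upstream, as stated. If what you're after is the usual fragment-length extension (from each tag's 5' end toward its 3' end), the direction flips: for + tags the 5' end is the start coordinate, for - tags it is the end coordinate. Here is a minimal awk sketch of that alternative, assuming BED6 tags with strand in column 6; extend_to_fragment and the fragment length argument are hypothetical, and this is not a BEDOPS feature.

```shell
#!/bin/sh
# extend_to_fragment FRAG FILE
# Hypothetical helper (not part of BEDOPS): replace each tag in a
# tab-delimited BED6 file with a FRAG-bp interval that starts at the tag's
# 5' end and runs in the 3' direction. Strand is read from column 6.
extend_to_fragment() {
  awk -F'\t' -v OFS='\t' -v frag="$1" '
    # Plus strand: the 5-prime end is the start coordinate; grow to the right.
    $6 == "+" { $3 = $2 + frag }
    # Minus strand: the 5-prime end is the end coordinate; grow to the left,
    # clamping at 0 so the start never goes negative.
    $6 == "-" { $2 = $3 - frag; if ($2 < 0) $2 = 0 }
    # Unstranded elements (e.g. ".") pass through unchanged.
    { print }
  ' "$2"
}
```

Because minus-strand starts move left, the output may no longer be sorted, so re-sort it before handing it to bedmap, e.g. extend_to_fragment 200 tags.bed | sort-bed - > tags.extended.bed (tags.bed and 200 are placeholders). Plus-strand ends are not clamped to chromosome length in this sketch.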

You would also need to split b.bed by strand (into b.plus.bed and b.minus.bed, e.g. with the same awk filters, without the padding) if you want strand-sensitive results. So your original 2 files become 4.

bedmap --echo --count a.plus.bed b.plus.bed > plus.final.bed
bedmap --echo --count a.minus.bed b.minus.bed > minus.final.bed

If you want these results in one file at the end:
bedops -u plus.final.bed minus.final.bed > final.answer.bed

Hope that helps. The most important thing in BEDOPS is that your files are sorted properly with the sort-bed program, so make sure to do that for a.bed and b.bed before anything else. The outputs you generate from there on will already be in sorted order, so you don't have to sort them again.
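If you're ever unsure whether a file is already in the expected order, a quick check can help. This is a minimal sketch under my understanding of sort-bed's order (chromosome name compared byte-wise as a string, then start, then end, numerically); is_sorted_bed is a hypothetical helper, it only inspects the first three columns, and when in doubt simply re-running sort-bed is the safe option.

```shell
#!/bin/sh
# is_sorted_bed FILE
# Hypothetical helper (not part of BEDOPS): succeed if FILE is ordered by
# chromosome name (byte-wise string order), then start, then end (numeric);
# otherwise report the first offending line on stderr and fail.
is_sorted_bed() {
  LC_ALL=C awk -F'\t' '
    NR > 1 {
      c = $1 ""   # concatenate "" to force a string comparison, even for "2" vs "10"
      if (c < pc || (c == pc && ($2 + 0 < ps || ($2 + 0 == ps && $3 + 0 < pe)))) {
        printf "out of order at line %d\n", NR > "/dev/stderr"
        bad = 1
        exit
      }
    }
    { pc = $1 ""; ps = $2 + 0; pe = $3 + 0 }
    END { exit bad }
  ' "$1"
}
```

Forcing string comparison on the chromosome column matters: awk would otherwise compare purely numeric names like 2 and 10 as numbers, which is not lexicographic order.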
Shane
« Last Edit: May 12, 2015, 11:35:21 AM by sjn »

cohendm

Re: fragment length extension
« Reply #2 on: May 12, 2015, 12:14:57 PM »
Thanks for the fast and detailed reply, Shane. This is a huge help! Will give it a go per your examples.