Author Topic: Bedops Intersect not consistent with Bedtools  (Read 7432 times)

alisterd17

  • Newbie
  • *
  • Posts: 2
Bedops Intersect not consistent with Bedtools
« on: May 20, 2015, 01:31:39 PM »
Hi everyone,

I was looking to convert current calls to bedtools intersect in my code to bedops intersect or similar. When I used small size bed files I was able to get identical output between the two functions, knowing where the overlaps were and what the outputs should be beforehand. However when I tried larger files ~ 200000+ lines I received varied output, specifically bedops didn't detect many regions of overlap between the two bed files. only 6 chromosomes were shown to have regions of overlap whilst all of them did in actuality, something that the bedtools output showed.

I made my calls to both programs as follows:

./bedtools intersect -a exome.bait.bed -b cardio.bed
./bedops --intersect cardio.bed exome.bait.bed

I am using bedtools v 2.2.2, and bedops v 2.4.5. I'm not at liberty to attach the files themselves, but if anyone else has had this probelm before or if they have any ideas or suggestions of how to fix it please let me know.

Thanks,
Alister

sjn

  • Administrator
  • Jr. Member
  • *****
  • Posts: 72
Re: Bedops Intersect not consistent with Bedtools
« Reply #1 on: May 20, 2015, 02:05:26 PM »
The most likely culprit is that you need to sort the files for use with bedops.
sort-bed your-file.bed > your-file.sorted.bed

Then, take the intersection.  The output from bedops will also then be sorted and you can use results immediately for downstream processing.

alisterd17

  • Newbie
  • *
  • Posts: 2
Re: Bedops Intersect not consistent with Bedtools
« Reply #2 on: May 21, 2015, 06:36:33 AM »
Thanks, that appears to have solved most of the issues.

There are still a few entries that appear in the bedtools output that don't in the bedops. It has to do with having extra columns in the bed files. One of my input files for the intersect call has 3 additional columns giving information about the region such as accession number and strand orientation.

When I call bedtools with the file having added columns as my -a I get this additional information in my output bedfile. However when I call bedops regardless of file order I don't have these columns in my output file, and only get the first 3 basic columns. My downstream analysis requires this information for certain tasks. Is there any way to keep this information in my output using bedops?

sjn

  • Administrator
  • Jr. Member
  • *****
  • Posts: 72
Re: Bedops Intersect not consistent with Bedtools
« Reply #3 on: May 21, 2015, 06:58:27 AM »
Yes - there is a distinction between a genomic intersection and a genomic element-of (subset) in bedops.

bedops --intersect is a literal coordinate intersection, while bedops --element-of is a subset feature.  With the latter, you will receive the entire element if it meets your overlap criterion and all the extra columns are there.

Consider 2 files:
cat a.bed
chr1  10  100  id-1  2.34  SNV312

cat b.bed
chr1  5   77 file2  4.56  +

If I intersect those with bedops -i a.bed b.bed, I receive:
chr1 10 77

it isn't clear what extra columns should go in this intersected region.  Should it be from a.bed or b.bed?  What if a.bed has multiple elements in the intersection, or what if you use 100 input files instead of just these 2?  Philosophically and generally, the intersected results make up a new BED file with regions that are different from elements found in either a.bed or b.bed as shown above.

If you instead use bedops -e 1 a.bed b.bed, you get:
chr1  10  100  id-1  2.34  SNV312

because the element in a.bed overlaps something in b.bed by 1 bp or more (you can change the overlap criterion from 1 to a different value or use a percentage, like 50%).

If you are looking for the literal coordinate intersection (--intersect/-i) and you want extra columns from 1 (or more) of your input files, then you couple bedops with bedmap.  You first get the literal coordinate intersection with bedops, and then you map on information from the file(s) of interest.  For example,
bedops -i a.bed b.bed | bedmap --echo --echo-map-id --delim "\t" - a.bed

will pull out the information in the 4th column of a.bed and map it onto the intersected regions.  Other columns can be mapped too with --echo-map-score and --echo-map, which maps information from all columns.  --echo-map is the most general, and you sometimes need a small awk or cut statement at the end to pull out particular columns of interest.  We can help with that if needed.

Hope that helps.
« Last Edit: May 22, 2015, 05:20:15 AM by sjn »