Topic: Problem with multiple comments in a bed file

pachkov

Problem with multiple comments in a bed file
« on: August 18, 2014, 06:15:05 AM »
Hi again!

I have a rather unusual bed file that confuses sort-bed.
This is what it looks like:

#Deleted in new
10      3100141 3100242 HISEQ:130:C2EWUACXX:8:2202:9761:39782   1       -
#Deleted in new
10      3100151 3100252 HISEQ:130:C2EWUACXX:8:2106:1494:66677   1       +
#Deleted in new


sort-bed says the following:

Non-numeric start coordinate.  See line 3 in tmp.bed.
(remember that chromosome names should not contain spaces.)

I think this is the wrong behaviour. All lines starting with "#" should be skipped.
Is that right?

Best,

Mikhail

AlexReynolds

Re: Problem with multiple comments in a bed file
« Reply #1 on: August 18, 2014, 04:36:31 PM »
Thanks for the report, which we will investigate.

In the meantime, you can strip the comment lines and sort your file like this:

$ grep -v '^#' unsorted.bed | sort-bed - > sorted.bed
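
If this workaround is needed in more than one place, it can be wrapped in a small shell function. The sketch below is a hypothetical helper (the name strip_bed_comments is made up and is not part of BEDOPS). The one detail it adds over the one-liner is that grep exits with status 1 when it selects no lines, so an input made up entirely of comments would otherwise look like a failure to scripts that check exit codes.

```shell
# Hypothetical wrapper around the grep -v '^#' workaround (not a BEDOPS tool):
# drops every line whose first character is '#', wherever it appears, and
# writes the remaining lines to stdout. Reads the named files, or stdin.
strip_bed_comments() {
    # grep exits with status 1 when it selects no lines (e.g. an input that
    # is all comments); treat that case as success. Any other grep failure
    # still makes the function return non-zero.
    grep -v '^#' "$@" || [ $? -eq 1 ]
}

# Usage, same pipeline as above:
#   strip_bed_comments unsorted.bed | sort-bed - > sorted.bed
```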

sjn

Re: Problem with multiple comments in a bed file
« Reply #2 on: August 18, 2014, 05:09:56 PM »
sort-bed only removes header lines that start with a '#' (or other supported header lines; see bedmap --help or the docs for that list), and only when they sit at the top of the file. In your file the comment lines are not all at the top, so the later ones are not stripped. sort-bed then splits "#Deleted in new" into columns, tries to read 'in' as a numeric start coordinate, and dies a miserable death.
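
To see which comment lines will trip sort-bed before running it, a short awk script can list the '#' lines that appear after the first data line, i.e. the ones that are not part of a leading header block as described above. This is a hypothetical diagnostic (find_interior_comments is an illustrative name, not a BEDOPS command), and it only checks the '#' prefix; it knows nothing about the other header forms sort-bed accepts.

```shell
# Hypothetical diagnostic (not part of BEDOPS): for one input file, print
# "LINE: TEXT" for every '#' line that comes after the first data line.
find_interior_comments() {
    awk '
        FNR == 1 { seen_data = 0 }   # reset at the start of each input file
        /^#/ { if (seen_data) printf "%d: %s\n", FNR, $0; next }
        { seen_data = 1 }            # first non-comment line ends the header
    ' "$@"
}
```

On the file from the first post this would print lines 3 and 5; line 3 is the line named in sort-bed's error message.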

Alex's suggestion will work well for your input type.

Shane

pachkov

Re: Problem with multiple comments in a bed file
« Reply #3 on: August 19, 2014, 12:21:23 AM »
Thank you both!

I'm doing exactly what Alex suggested, but it would be nice if sort-bed itself ignored all comment lines.
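
For comparison, the behaviour requested here (ignore every comment line wherever it appears, then sort) can be approximated with standard Unix tools. This is a sketch under stated assumptions, not a replacement for sort-bed: it assumes whitespace-separated columns and that sort-bed orders rows by chromosome name byte-wise, then by start and stop numerically. It does no validation of coordinates, and strip_and_sort_bed is a hypothetical name.

```shell
# Sketch only (not a BEDOPS tool): drop all '#' lines, then sort by
# chromosome (byte-wise, via LC_ALL=C), start (numeric), stop (numeric).
strip_and_sort_bed() {
    grep -v '^#' "$@" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n
}

# Usage:
#   strip_and_sort_bed unsorted.bed > sorted.bed
```

With LC_ALL=C the chromosome column is compared byte by byte, so "10" sorts before "2".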

Best,

Mikhail