Applications of BEDOPS > bedops
Tips for working with --complement and left and right edges of chromosome
(1/1)
AlexReynolds:
There is one complication to the usage case for -c (or --complement) with a 0-based coordinate system (and only with this coordinate system), where it is impossible to get back any result that produces base 0 on the output, for any chromosome.
One might think of this situation as wanting to find the region between the "left" edge of a chromosome and the first input element. Consider the following three elements:
chr1 1 4
chr1 100 110
chr2 50 75
Running bedops -c on this input will produce the following result:
chr1 4 100
The optional -L argument may be given with -c, so that bedops -c -L with the elements above produces, instead:
chr1 0 1
chr1 4 100
chr2 0 50
This handles the "left" edge case. What about the "right" edge — the extent of the chromosome?
Hard-coded bounds for various genome assemblies are not part of BEDOPS, but one can easily specify any maximum coordinate bound as an input region, in order to retrieve coordinates as far out to the right edge (i.e., chromosome length, or "maximum bound") as desired. It often makes sense to download the chromInfo table for the genome of interest from the UCSC table browser (http://genome.ucsc.edu/), do some small processing and use that as input to bedops -c. For example, say you're working in 0-based coordinates and you want complemented regions to both ends of a chromosome where applicable:
awk '(NR > 1) { print $1, $2, $2+1; }' chromInfo | sort-bed - | bedops -c -L myfile.bed - > answer.bed
will give you what you're after though the downloaded chromInfo file may have 'extra' chromosome information (such as chrUn_gl000212). If you're not interested in those extras, but only the chromosomes in your myfile.bed file, then:
bedexract --listchr myfile.bed > tmp
grep -w -F -f tmp chromInfo | awk '(NR > 1) { print $1, $2, $2+1; }' | sort-bed - | bedops -c -L myfile.bed - > answer.bed
rm tmp
will clean things up for you. We grab the unique chromosomes in your myfile.bed file using bedextract. We then grep out rows in chromInfo that have the same chromosome names as in your file, mark the end of the chromosomes, sort, and then apply the complement operation.
Navigation
[0] Message Index
Go to full version