• bedops
New --partition operator
This operator efficiently splits overlapping inputs and reports disjoint segments that partition the shared genomic space.
To demonstrate, say you have a few input BED files (sorted with BEDOPS sort-bed) or equivalent Starch archives. Together they have coordinate segments on chrN that look like:
------------------------------------
---------------
------------------
-------------------------------------
----
The output from bedops --partition on these inputs would be:
---
----
-----------
-------
----
----
---
-------
• starch
Improved error checking for interleaved records.
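To make the bedops --partition behavior described above concrete, here is a minimal Python sketch of the idea. It is not the BEDOPS implementation: the function name partition and the sample coordinates are hypothetical, and it assumes all intervals lie on one chromosome and use BED-style half-open [start, end) coordinates.

```python
def partition(intervals):
    """Split half-open [start, end) intervals on one chromosome into
    disjoint segments that cover exactly the same bases."""
    # Every start or end coordinate is a potential cut point.
    cuts = sorted({c for start, end in intervals for c in (start, end)})
    segments = []
    for left, right in zip(cuts, cuts[1:]):
        # Keep a piece only if at least one input interval covers it,
        # so gaps between inputs are not reported.
        if any(start <= left and right <= end for start, end in intervals):
            segments.append((left, right))
    return segments


# Hypothetical coordinates, chosen only for illustration.
example = [(0, 10), (5, 15), (12, 20), (30, 35)]
print(partition(example))
```

Each reported segment lies entirely inside or entirely outside every input interval, which is what makes the output a partition of the shared space. The sketch checks coverage by scanning all inputs for each piece, which is simple but not efficient for large data.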
• Conversion scripts
All scripts now use BEDOPS sort-bed behind the scenes to produce sorted BED output, ready for consumption by BEDOPS utilities such as bedextract, bedmap, bedops and closest-features. In other words, it is no longer necessary to pipe converted output to sort-bed before piping it to other BEDOPS utilities.
New psl2bed conversion script, converting PSL-formatted UCSC BLAT output to BED.
New wig2bed conversion script written in Python.
New *2starch convenience scripts offered for all *2bed scripts, which convert data and output Starch v2 archives.
• Improved Mac OS X support
New installer package makes installation of BEDOPS binaries and scripts much easier for OS X 10.6 - 10.8 hosts. (http://bedops.googlecode.com/files/bedops_macosx_intel_fat-v2.1.0.mpkg.zip)
Installer resolves fatal library errors seen by some end users of older OS X BEDOPS releases.
This release also includes major BEDOPS v2 features, such as:
• Support for BEDOPS Starch archives with main toolkit
The bedextract, bedmap, bedops and closest-features tools now all accept Starch-formatted files as inputs, as well as UCSC BED files, as before. (In other words, it is no longer necessary to extract Starch data to intermediate files before applying set or statistical operations.)
• Very efficient single-chromosome operations
New --chrom operator applies set, statistical or ID operations to a specified chromosome with bedmap, bedops and closest-features, without needing to stream through the entire BED file. (This is highly useful for parallelization tasks on very large BED data.)
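As a rough illustration of the parallelization use case mentioned above, the following Python sketch builds one bedops command per chromosome. It only constructs and prints the commands; it does not run them. The helper name per_chromosome_commands, the file names and the chromosome list are hypothetical, and --merge is used merely as an example operation.

```python
import shlex


def per_chromosome_commands(chromosomes, inputs, operation="--merge"):
    """Build one bedops command line per chromosome using --chrom.

    The commands are returned as argument lists and are not executed here.
    """
    return [
        ["bedops", "--chrom", chrom, operation, *inputs]
        for chrom in chromosomes
    ]


# Hypothetical inputs and chromosome names, for illustration only.
commands = per_chromosome_commands(["chr1", "chr2", "chr3"], ["a.bed", "b.starch"])
for cmd in commands:
    print(shlex.join(cmd))
```

Each command could then be handed to a job scheduler or a process pool, with the per-chromosome outputs concatenated afterwards.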
• bedmap
New --echo-map-id-uniq operator lists unique ID values from mapped elements.
New --max-element and --min-element operators return the highest or lowest scoring overlapping map element.
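The meaning of these bedmap operators can be sketched in a few lines of Python. This is an illustration of the semantics only, not bedmap's code: the Bed tuple and function names are hypothetical, overlap is taken to mean at least one shared base in half-open coordinates, IDs are returned in sorted order as an assumption, and ties between equal scores are resolved by whichever element comes first, which may differ from bedmap.

```python
from collections import namedtuple

# Hypothetical record type: a five-column BED element.
Bed = namedtuple("Bed", "chrom start end id score")


def overlapping(ref, map_elements):
    """Map elements sharing at least one base with the reference element."""
    return [
        m for m in map_elements
        if m.chrom == ref.chrom and m.start < ref.end and ref.start < m.end
    ]


def echo_map_id_uniq(ref, map_elements):
    """Unique IDs of the overlapping map elements (sorted here by assumption)."""
    return sorted({m.id for m in overlapping(ref, map_elements)})


def max_element(ref, map_elements):
    """Highest-scoring overlapping map element, or None if nothing overlaps."""
    hits = overlapping(ref, map_elements)
    return max(hits, key=lambda m: m.score) if hits else None


def min_element(ref, map_elements):
    """Lowest-scoring overlapping map element, or None if nothing overlaps."""
    hits = overlapping(ref, map_elements)
    return min(hits, key=lambda m: m.score) if hits else None


ref = Bed("chr1", 100, 200, "ref", 0.0)
maps = [
    Bed("chr1", 50, 120, "a", 3.0),
    Bed("chr1", 150, 180, "b", 7.5),
    Bed("chr1", 190, 260, "a", 1.0),
    Bed("chr1", 200, 300, "c", 9.0),   # starts where ref ends: no overlap
    Bed("chr2", 100, 200, "d", 99.0),  # different chromosome
]
print(echo_map_id_uniq(ref, maps))
print(max_element(ref, maps))
print(min_element(ref, maps))
```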
• sort-bed
New --max-mem option limits sorting to a specified amount of memory, useful for sorting BED inputs larger than system memory.
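One common way to sort data that does not fit in memory is an external merge sort: sort bounded chunks, write each to disk, then merge the sorted chunks. The Python sketch below shows that general technique under stated assumptions. These notes do not say how sort-bed implements --max-mem, so this should not be read as its method. The names bed_key and external_sort are hypothetical, and the max_lines_in_memory parameter stands in for a real memory budget.

```python
import heapq
import itertools
import os
import tempfile


def bed_key(line):
    """Sort key: chromosome (lexicographic), then numeric start and end."""
    chrom, start, end = line.split("\t")[:3]
    return (chrom, int(start), int(end))


def external_sort(lines, max_lines_in_memory):
    """Yield tab-delimited BED lines in sorted order.

    At most max_lines_in_memory lines are held at once while chunking;
    the merge phase reads one line per chunk file at a time.
    """
    chunk_paths = []
    iterator = iter(lines)
    try:
        while True:
            chunk = list(itertools.islice(iterator, max_lines_in_memory))
            if not chunk:
                break
            chunk.sort(key=bed_key)
            with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as tmp:
                tmp.writelines(line.rstrip("\n") + "\n" for line in chunk)
                chunk_paths.append(tmp.name)
        files = [open(path) for path in chunk_paths]
        try:
            # heapq.merge lazily merges the already-sorted chunk files.
            for line in heapq.merge(*files, key=bed_key):
                yield line.rstrip("\n")
        finally:
            for handle in files:
                handle.close()
    finally:
        for path in chunk_paths:
            os.remove(path)


# Hypothetical records, for illustration only.
unsorted = ["chr2\t5\t10", "chr1\t100\t200", "chr1\t20\t30", "chr10\t1\t2", "chr1\t20\t25"]
for record in external_sort(unsorted, max_lines_in_memory=2):
    print(record)
```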
• starch, unstarch and starchcat
BEDOPS Starch v2 archives contain useful, precomputed metadata that can improve the efficiency of scripts.
For instance, calling unstarch --elements on a Starch v2 archive shows the total number of records in the entire file or for any individual chromosome, while unstarch --bases and unstarch --bases-uniq give the number of total and unique bases covered by elements in the whole archive or over elements of the specified chromosome. These latter two options are analogous to those already available in bedmap.
As an example, using the --elements operator on a Starch v2 archive made from DNaseI-seq or RNAseq tag data would return the total number of reads over the entire BED file. Using --elements chr3 would return the total number of tags in chromosome chr3.
Values are precomputed and stored in the archive's metadata, allowing practically instantaneous retrieval. The --elements option, for example, is much faster than extracting the data and piping it to wc -l.
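To clarify what the three figures mean, the following Python sketch computes the element count, total bases and unique bases per chromosome from plain interval records. It illustrates the quantities only: a Starch v2 archive reads them from precomputed metadata rather than computing them on demand. The function name archive_stats and the sample records are hypothetical, and coordinates are assumed to be half-open.

```python
from collections import defaultdict


def archive_stats(records):
    """Per-chromosome element count, total bases and unique bases.

    records: iterable of (chrom, start, end) with half-open coordinates.
    """
    stats = defaultdict(lambda: {"elements": 0, "bases": 0, "bases_uniq": 0})
    covered_to = {}  # furthest end coordinate seen so far on each chromosome
    for chrom, start, end in sorted(records):
        entry = stats[chrom]
        entry["elements"] += 1
        entry["bases"] += end - start  # overlapping bases are counted again
        # Only bases beyond what is already covered are new.
        new_start = max(start, covered_to.get(chrom, 0))
        if end > new_start:
            entry["bases_uniq"] += end - new_start
        covered_to[chrom] = max(end, covered_to.get(chrom, 0))
    return dict(stats)


# Hypothetical records, for illustration only.
records = [
    ("chr1", 0, 10), ("chr1", 5, 15),
    ("chr3", 100, 150), ("chr3", 100, 150), ("chr3", 200, 210),
]
stats = archive_stats(records)
print(stats["chr3"])
print(sum(s["elements"] for s in stats.values()))
```

Here the per-chromosome entry corresponds to a query such as --elements chr3, and summing over chromosomes corresponds to the whole-archive value.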
New checksum data help validate the integrity of the archive and its metadata.
Other metadata enhancements to Starch-format archival and extraction, including: --note, --list-chromosomes, --archive-timestamp, --archive-type and --archive-version.
Creating Starch archives with the starch utility is now 20-35% faster.
New documentation with technical overview of the Starch format specification.
http://code.google.com/p/bedops/wiki/starchSpecification
• Conversion scripts
New gtf2bed conversion script, converting GTF (v2.2) to BED.
• Overall improvements in 64-bit type handling and error checking
Consistency across the codebase helps ensure that all BEDOPS applications can scale to arbitrarily large genomes.