Hi IrsanKooi,
Thanks for joining our forum.
I'd recommend using awk for something like this. If the problem was a little more involved, I'd go with a unix 'join' command (combined with awk).
Here is a preferred solution without BEDOPS:
awk '{ \
if ( $4 in arr ) { arr[$4] = arr[$4]";"$1"\t"$2"\t"$3; ++cnts[$4]; } \
else { arr[$4] = $1"\t"$2"\t"$3; cnts[$4]=1; } \
} END { \
for ( i in cnts ) \
print i"\t"cnts[i]"|"arr[i]; \
}' bed-file
The output may not be sorted in a way that you like. You can pipe results to 'sort -k1,1' to fix that. This kind of problem doesn't require any sophisticated windowing or set-like logic, which are the types of things that BEDOPS is best used for. You can do some acrobatics and use BEDOPS in answering this kind of question, yet such an approach still requires a number of standard unix tools (like cut, awk, uniq). I recommend using the above approach, understanding its mechanics and why BEDOPS is probably not the best tool for this job. Because you are just matching on a single field, and, specifically, because you are not using BED chromosome or coordinate information at all, BEDOPS is of limited utility without forcing a solution. Yet, simple tools provided in linux and macosx are ideal for this sort of thing (the awk code shown above).
Here is a somewhat awkward solution that includes bedmap and sort-bed calls.
awk 'BEGIN {OFS="\t"} ; { print $4, 0, 1, $1"-"$2"-"$3; }' bed-file \
| sort-bed - \
| tee all-entries.bed \
| cut -f1-3 \
| uniq \
> uniq-entries.bed
bedmap --exact --echo --count --echo-map-id --delim "\t" uniq-entries.bed all-entries.bed \
| cut -f1,4- \
> answer.txt
I've taken some liberties with delimiters between the output fields you want. These can be changed with the 'tr' command (or with '--delim' option in the bedmap call).
We try to work well with standard unix tools rather than replicate what they offer, either directly or with a couple lines of code. I mentioned unix join above. This is a powerful little utility that is already on your system that, with a small amount of awk, can do everything a 'groupby' gives you in other tool suites.