Thanks for the quick responses. I've answered my second question by tweaking my original command and doing a bit of postprocessing. I now have several files, each containing samples for a specific cell type. For example:
chr1 1757937 1803240 CB021314 ['GNB1', 'NADK', 'TMEM52', 'CALML6']
chr1 2572802 2630150 CB011514;CB012214 ['TTC34', 'MMEL1']
chr1 3565069 3596796 CB021314 ['WRAP73', 'TP73', 'RP5-1092A11.5', 'WDR8', 'MEGF6', 'TPRG1L']
I know that I'm cheating by having a non-numeric 5th column, but I don't plan on doing any operations with it; I should have specified in my original example that these are actual gene symbols. In any case, I can drop them and re-annotate after processing if needed. I may have to do that regardless; we'll see.
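If I do end up dropping the gene column, this is roughly how I'd split those lines. It's a minimal Python sketch that assumes whitespace-separated fields, ';'-joined sample IDs in column 4, and a Python-style list literal in column 5. `parse_region` and `to_bed4` are just illustrative names, not part of any tool.

```python
import ast
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    samples: list[str]
    genes: list[str]


def parse_region(line: str) -> Region:
    """Parse one line of the example format (assumed layout, see above)."""
    # maxsplit=4 keeps the gene list, which contains spaces, as one field.
    chrom, start, end, samples, genes = line.strip().split(maxsplit=4)
    return Region(
        chrom=chrom,
        start=int(start),
        end=int(end),
        samples=samples.split(";"),          # "CB011514;CB012214" -> two IDs
        genes=list(ast.literal_eval(genes)),  # parses the list literal without eval()
    )


def to_bed4(region: Region) -> str:
    """Drop the gene column and emit a tab-separated BED4-style line."""
    return "\t".join(
        [region.chrom, str(region.start), str(region.end), ";".join(region.samples)]
    )


if __name__ == "__main__":
    example = "chr1 1757937 1803240 CB021314 ['GNB1', 'NADK', 'TMEM52', 'CALML6']"
    region = parse_region(example)
    # Keep the genes aside, keyed by coordinates, so they can be re-attached later.
    genes_by_coords = {(region.chrom, region.start, region.end): region.genes}
    print(to_bed4(region))
```

Keying the gene lists by (chrom, start, end) is one simple way to re-annotate afterwards, as long as the processing step doesn't change the coordinates.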
Anyway, Shane, could you elaborate on your last sentence:
"That should return the list of regions (including all columns found in your original input files) that have no other overlapping elements that meet your overlap criterion."
I think what you've given me is what I want. To be sure: I want to take files (like my example above) and compare each one against the other such files to identify elements that are "unique", meaning elements that either do not overlap any element in the other files or overlap them by no more than a specified maximum fraction (e.g., 0.25). The code you posted returns the elements from all of the specified files that do not meet the minimum overlap criterion (60% in your example), correct?
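To pin down what I mean by "unique", here is a small brute-force Python sketch of the filter. The assumptions are mine, not from your reply: coordinates are BED-style half-open, the overlap fraction is measured against the length of the element being tested, and each overlapping element is judged on its own (overlaps are not summed). `overlap_bp` and `unique_elements` are hypothetical helper names.

```python
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention, assumed)
    end: int    # exclusive


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def unique_elements(
    query: Iterable[Interval],
    others: Iterable[Interval],
    max_overlap_fraction: float = 0.0,
) -> list[Interval]:
    """Keep query intervals whose overlap with every single interval in
    `others` is at most `max_overlap_fraction` of the query's own length.

    max_overlap_fraction=0.0 means "no overlap at all".
    Brute force, O(len(query) * len(others)).
    """
    others = list(others)  # pooled elements from all the *other* files
    kept = []
    for q in query:
        length = q.end - q.start
        # Largest overlap with any one element; each is judged individually.
        worst = max((overlap_bp(q, o) for o in others), default=0)
        if worst <= max_overlap_fraction * length:
            kept.append(q)
    return kept


if __name__ == "__main__":
    # Query regions from my example; the "other file" regions are made up.
    a = [
        Interval("chr1", 1757937, 1803240),
        Interval("chr1", 2572802, 2630150),
        Interval("chr1", 3565069, 3596796),
    ]
    b = [Interval("chr1", 1800000, 1900000), Interval("chr1", 2580000, 2630150)]
    print(unique_elements(a, b))        # no overlap allowed
    print(unique_elements(a, b, 0.25))  # up to 25% of each region may overlap
```

With 0.0 only the third region survives; at 0.25 the first one is kept too, since only about 7% of it overlaps. The nested loop is too slow for genome-wide files, so this is only meant to make the semantics explicit, not to replace the intersect approach.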
Thanks again for the help.