MAME tries to name rom files after their name and location on the original PCB. This can result in different filenames for identical files in different sets. If the sets don't share a parent/clone relationship, this doesn't matter. But what happens when it occurs within a parent/clone relationship?
By default, clrmamepro acts like this: different names - different roms.
So if you have something like:
pacman\rom1.bin crc 0x12345678
pucmanjp\rom1jp.bin crc 0x12345678
within a parent/clone relationship and you fully merge the sets, you will end up with 2 files in the merged set which are byte-wise identical.
A waste of disk space? Well... since MAME loads by crc/sha1, you don't actually need the files twice. MAME doesn't care about naming anyway; clrmame does. clrmame is strict. In the days of terabyte HDs it's questionable whether you really have to care about a few bytes.
MAME itself added something to define that the files are identical. These are the so-called merge tags, and they tell clrmamepro which 'alternative' name is also allowed:
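The strict "different names - different roms" rule can be illustrated with a small Python sketch. This is only an illustration of the idea, not clrmamepro's actual code; the function name merge_strict and the dictionary layout are made up for this example, and the crc is the placeholder value from the example above.

```python
def merge_strict(sets):
    """Fully merge sets by file name only: a different name counts as a different rom."""
    merged = {}
    for roms in sets.values():
        for name, crc in roms.items():
            # The crc is never compared, so identical content under two names is kept twice.
            merged[name] = crc
    return merged


# Placeholder data taken from the example above.
sets = {
    "pacman": {"rom1.bin": 0x12345678},
    "pucmanjp": {"rom1jp.bin": 0x12345678},
}

merged = merge_strict(sets)
for name, crc in merged.items():
    print(f"{name} crc 0x{crc:08x}")
```

Both rom1.bin and rom1jp.bin end up in the merged set even though their crc is the same.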
rom name="mds-te_2b_a.bin" merge="mds-te.2b"
When clrmamepro uses this information, it is no longer that strict and detects where duplicate roms can be avoided. By default, usage of these tags is disabled. The reason for this is: I personally like the strict way better, since each rom listed in the datfile can then be found under its correct name in the sets, and in the past merge tags were buggy in MAME's xml output.
You can enable the usage of the rom merge tags in profiler->options->Parse rom 'merge' tags.
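As a rough sketch of what the option changes, the following Python snippet merges a clone into its parent with and without honouring merge tags. It is a simplified model under stated assumptions, not clrmamepro's implementation: the function merge_with_tags, the use_merge_tags flag and the crc value are hypothetical; only the two rom names come from the merge-tag example above.

```python
def merge_with_tags(parent, clone, use_merge_tags):
    """Merge clone roms into the parent set.

    parent: {name: crc}
    clone:  {name: (crc, merge)} where merge is the parent rom name or None
    """
    merged = dict(parent)
    for name, (crc, merge) in clone.items():
        # With merge tags enabled, a clone rom whose tag points to a parent rom
        # with the same crc is treated as already present under the parent's name.
        if use_merge_tags and merge is not None and parent.get(merge) == crc:
            continue
        merged[name] = crc
    return merged


# Names from the merge-tag example; the crc is a placeholder.
parent = {"mds-te.2b": 0x12345678}
clone = {"mds-te_2b_a.bin": (0x12345678, "mds-te.2b")}

strict = merge_with_tags(parent, clone, use_merge_tags=False)
relaxed = merge_with_tags(parent, clone, use_merge_tags=True)
print(len(strict), len(relaxed))
```

With the flag off, both names are kept; with it on, only the parent's name remains in the merged set.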
By the way, disks (chds) also have merge tags... and surprisingly this option is enabled by default (otherwise you'd need some of the beatmania chds twice or even more... this *IS* a waste).