Rebuilder V0.07

Little tool to rebuild MAME (https://www.mamedev.org/) machine sets.

(c) 2022 - 2023 Roman Scherzer

This software is freeware.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For (c) and licence information of the used 3rd party tools, please refer to the bottom of this document.

The story so far

Another rebuilder?

Well, actually you might know that I started clrmame(pro) back in 1997. Back then the MAME world was different, and so was the C++ world. clrmamepro grew over decades while MAME added things like crc32, md5 and sha1 hashes, merge modes, CHDs, devices, XML output and so on. And then there were other projects which also wanted to use clrmamepro for their purposes, so lots of requests were added.

The downside of this: maintaining the code is pretty tough. I'm pretty bored looking at that old code base, so it is time for something new. It feels a bit like 1997 again: code something for me, for fun. It's fun again to see that 'modern' C++ and 3rd party tools make life way easier. Fewer options, faster, no MS Windows specific coding, completely new ideas on how to keep data in memory or how to handle different merge modes... so it ended (for now) with a commandline utility which does the rebuild job. Maybe some people will enjoy it even if it is -or especially because it is- commandline based.

Time will tell if this little project turns into a new clrmame in total (with a scanner, a UI, a profiler, but definitely no merger). For now it's more or less an experiment.

So, if you're not bored yet, read on.

How the rebuilder works

The rebuilder analyzes each file (or file within an archive) in the input folder and tries to match each file's hash/size against the loaded database. Each found match (there can be multiple, think e.g. of a file which is shared by 10 machines) will be added to the output folder under the correct file name and machine name. Optionally, rebuilt files (and empty folders) are removed from the input folder. If your input folder holds an archive with another archive or a CHD inside, this inner archive is unpacked to your temporary folder. Such temporary files are removed at the end of a rebuilding process. By default the system's temporary folder is used. You can change the temporary folder by modifying the "Rebuilder/TempFolder" entry in settings.xml. This can be very useful if your system's temporary folder is on a slower disk; selecting a custom temporary folder on a faster one can speed up rebuilding.

By default, the rebuilder matches files by crc32/sha1 and size. There is an option (-s, --sha1) which lets you choose between no sha1 checks, input, output and both. Enabling a sha1 mode is of course slower since it needs additional time for decompressing the archived data and for the actual hash calculation, but it is more accurate. If the datfile specifies different sha1 values for one and the same crc32 and you have disabled the input sha1 check, the sha1 value is taken into account anyway.

This rebuilder is also able to rebuild CHD files (CHD format version 3 onwards). For such files, the sha1 information from the CHD header is used for a match.

If an archive/folder already exists in the destination, the matched file is only added if it does not exist there yet. If there is an existing file which does not match the right hash, this existing file is moved to the backup folder and the found match replaces it in the rebuilder destination. The backup folder is generated inside the output folder and is called 'backup'.

This rebuilder can also identify source archives which already match an output machine completely. If the destination does not exist yet, such archives are copied directly.

The rebuilder runs through various phases. First it loads the datfile, checks source/destination paths for existence and builds merge mode views. Then it runs through the input folder. Be patient, this can take some time, especially if you scan lots of files. After that the output is checked. If no output exists yet or it only contains a few files, this should be very fast. An optimize phase is next and -if possible- the rebuilder then copies archives which can be copied directly. Finally, the actual rebuild is done, followed by a cleanup step at the end.

The commandline options

The program allows the following commandline parameters:

-x, --xml: Here you specify the xml file which holds your database. This option is mandatory. See below for the supported types of XML files.

-i, --input: The rebuilder input folder. This folder is checked for matching data. This option is mandatory and the folder has to exist.

-o, --output: The rebuilder output folder. The matched data is copied/moved to this folder. This option is mandatory. If the folder does not exist, it will be created. Note: When using -r, --recursive, the rebuilder output path can't be a subfolder of the rebuilder input path. A 'backup' folder, which holds replaced files, is also generated in this folder, but only when a file actually gets replaced. If it exists but is empty, it is removed at the end.

-m, --mode: Your preferred output merge mode. See below for what the three modes are. This is an optional setting; the default value is split. You can use split, full or standalone.

-p, --pattern: With this option you can specify an output pattern, basically path information which is inserted between the output folder and the machine name. See below for examples.

-c, --compress: This defines your preferred output compression method. This is an optional setting. Default is zip, which keeps your files in zip archives. You can use zip, rezip, 7z or none; the latter keeps your machines decompressed. rezip always recompresses the destination files, so a direct copy of archives is not performed.

-f, --filter: Here you can specify a regular expression on the machine name. Only matching entries from the loaded xml will then be taken into account during rebuild. This is optional; by default no filter is applied.

-d, --delete: With this option, rebuilt input files are removed. Be warned, they are gone! If the last file from an input archive or folder is removed, this archive/folder is also removed. Deletion is optional and disabled by default.

-s, --sha1: This turns additional sha1 matching of input and/or output files on or off. Enabling it is slower but more accurate. Default is input. You can use none to fall back to simple crc32/size checks, input to do sha1 checks on the input file only, output to do sha1 checks on a possibly existing output file only, and both which combines input and output. -s none is the fastest mode, but keep in mind that, depending on the files you're scanning, you may run into crc32 matches where in fact the sha1 would not match.

-r, --recursive: Turn this on if you want to run through your input folder and all of its subfolders.

-l, --loglevel: Specify the detail level of the output. By default this is set to info. You can use err, warn, info or trace, where the latter additionally lists source file and rebuild information. info shows you a little progress bar here and there and gives you some updates when reading folders. If you redirect your output, progress bars and updating file counts are not visible.

-u, --uselinks: Default is none; hard or sym are possible other values. Turn this on if you want to generate a filesystem hard or symbolic link instead of doing a file copy operation. This takes place when copying archives 1:1, copying CHDs or copying single unpacked files from a source to the target. Keep in mind that there are general and type-specific limitations to the use of links, such as volume restrictions and access rights.
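
As an illustration of how these options combine (the paths are just placeholders, reusing the ones from the examples further below), a call could look like this:

rebuild.exe -x e:\MAME\244.xml -i f:\roms -o f:\mame\roms -r -m full -s both -l trace -u hard

This would rebuild f:\roms and all its subfolders into fully merged sets, check sha1 on both input and output files, print detailed trace output and use hard links where a plain copy would otherwise be done.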

Supported XML file types (-x, --xml)

Currently three types of xml files are supported:

You can create an xml file by running MAME from the commandline interpreter and redirecting its output, e.g. mame.exe -listxml >245.xml.

When loading an input file you might see some warnings. For a standard MAME -listxml you will e.g. see sample-specific warnings. These are mainly about sample relationships from machines to a sample parent machine which is not available in the XML. Such sample-only sets are generated automatically so that the assignment is correct again. Similar warnings exist for the use of samples which aren't available in the sample parent set. This is also fixed internally.

XML, Input, Output

The three things you need to specify are

Examples:

Load a MAME software list xml and rebuild from C:\Users\FooBar\Downloads to f:\softwarelists\a2600_cass: rebuild.exe -x e:\MAME\hash\a2600_cass.xml -i C:\Users\FooBar\Downloads -o f:\softwarelists\a2600_cass

Load a MAME -listxml xml and rebuild from f:\roms and all its subfolders to f:\mame\roms: rebuild.exe -x e:\MAME\244.xml -r -i f:\roms -o f:\mame\roms

Load a MAME -listxml xml and rebuild from f:\roms and all its subfolders to f:\mame\roms and remove all rebuilt files: rebuild.exe -x e:\MAME\244.xml -r -i f:\roms -o f:\mame\roms -d

Rebuilding is more or less a copy operation of files, and when it comes to CHDs we are even talking about huge files. If your input and output folders are on the same ssd/hd, you will create quite a lot of I/O traffic. Ever tried copying (not moving) a multi-GB file on one and the same hd? It usually crawls. So be aware of this before you try to rebuild a complete MAME collection from one folder to another.

Modifying the output (-p, --pattern, -c, --compress)

Generally you can define how the output should be stored: either compressed (as zip or 7z archives) or decompressed as files and folders.

Create decompressed output: -c none

Create zip archives: -c zip

Create 7z archives: -c 7z

Always recompress zip archives: -c rezip

There is another option to modify the output by defining some patterns which can be used to add additional folders to the output: -p sub1/sub2/sub3

This will add three levels of subfolders to the given rebuilder output root. Assuming you specified e:/temp as output folder, your machine sets will then be placed in e:/temp/sub1/sub2/sub3. While folder separator characters (/) are allowed, . or .. cannot be used here.
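
For example, to place everything below e:/temp/sub1/sub2/sub3 and store the sets as 7z archives, a call could look like this (paths are placeholders again):

rebuild.exe -x e:\MAME\244.xml -i f:\roms -o e:/temp -p sub1/sub2/sub3 -c 7z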

Way more interesting are predefined patterns which can be used with the -p command. You can use:

If you're loading a software list collection datfile, you automatically have a pattern of #SOFTLIST# active internally as top level.

Examples: If you want to split up your collection by manufacturer and by year: -p #MANUFACTURER#/#YEAR#. If you want to split it up by something which is nearly identical to clrmamepro's system default paths: -p #BIOSSPLIT#
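
A full commandline for the manufacturer/year split above could look like this (paths are placeholders, reusing the ones from the earlier examples):

rebuild.exe -x e:\MAME\244.xml -i f:\roms -o f:\mame\roms -p #MANUFACTURER#/#YEAR#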

Destination machine folders and/or archives get the current timestamp. Files inside a set folder or inside an archive keep their original timestamp.

If you're not happy with the prefix names (e.g. #device), you can alter them in settings.xml (see below).

Merge Modes (-m, --mode)

A merge mode defines how your stored machines are bundled. Some machines share a parent/clone relationship which is specified in the underlying datfile. Depending on the chosen mode, such machines can be merged together.

We differ between:

Limit your output (-f, --filter)

With the -f, --filter option you can filter the loaded XML down to a subset of machines. You define a regular expression which is matched against the machine name. So for example, if you only want to rebuild "pacman", you can simply add: -f pacman

If you want to rebuild all machines which start with 'pac', you can write: -f pac.*

For filtering only pacman and outrun, you'd write: -f pacman|outrun
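
A complete call for the pacman/outrun filter could look like this; note that depending on your shell you may have to quote the expression so that the | character is not treated as a pipe:

rebuild.exe -x e:\MAME\244.xml -i f:\roms -o f:\mame\roms -f "pacman|outrun"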

Settings

settings.xml is created/loaded on startup and allows you to alter some settings. Currently you can only change the default pre-strings for the -p command's #TYPE# and #BIOSSPLIT# values. So e.g. you can change #default to 'StandardSets' or similar. Only valid path characters are allowed; illegal values won't be accepted and the defaults are used again. You can also specify a temporary folder here which is used for temporary decompression purposes (e.g. when an archive is inside an archive) or when data needs to get recompressed.
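
A minimal sketch of what the "Rebuilder/TempFolder" entry mentioned earlier could look like in settings.xml. This is only an illustration; the generated file contains more entries and its exact layout may differ, so best edit the file the program creates for you instead of writing it from scratch:

<Rebuilder>
  <!-- d:\fasttemp is just a made-up example path on a faster disk -->
  <TempFolder>d:\fasttemp</TempFolder>
</Rebuilder>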

History

2023-11-28 V0.07 released

2023-05-04 V0.06 released

2023-04-14 V0.05 released

2023-03-12 V0.04 released

2022-10-05 V0.03 released

2022-08-16 V0.02 released

2022-07-13 V0.01 released

Since there is no scanner yet, can I scan my rebuilt sets with clrmamepro?

Yes, but be aware of the following:

What are the benefits over clrmamepro?

Besides the -in my opinion- way better code, the following is worth mentioning:

On the other hand I of course understand if users start to moan "but it does not have feature x and y", "it does not support datfile type z" and so on. Yes, this is the case but currently I don't want to implement requests which might be used by 1% of the users. Time will tell what comes next.

Future Plans / Source Availability

At this stage of the project it is closed source, mainly due to the use of the full-version licence of ZipArchive.

There are definitely plans to go open source in the future. Currently there are discussions about some licences (e.g. the free version of ZipArchive is currently GPL).

Things I'm interested in:

Things I'm not interested in:

Bug Reporting / Donation

If you found something spooky or have problems, feel free to use the clrmamepro forum: https://www.emulab.it/forum/index.php?board=6.0

If you're totally happy with it, feel free to donate ;-) https://mamedev.emulab.it/clrmamepro/#donate

Third party licence information

Zip Handling: ZipArchive

ZipArchive Library 4.6.9 Copyright (c) Tadeusz Dracz

Currently the 'full version' licence is used, which makes the product closed source for now.

7z/Rar Handling: Bit7z

Bit7z v4.0.4 Copyright (c) 2014 - 2023 Riccardo Ostani

https://github.com/rikyoz/bit7z/blob/master/LICENSE

MPLv2 License

You can obtain a copy of the MPLv2 License here https://mozilla.org/MPL/2.0/

7z.dll is used which is part of the 7-Zip program. 7-Zip is licensed under the GNU LGPL license. You can find 7-zip including source code at https://www.7-zip.org

CLI Parser: CLI11

CLI11 2.3.1 Copyright (c) 2017-2023 University of Cincinnati, developed by Henry Schreiner under NSF AWARD 1414736. All rights reserved.

https://github.com/CLIUtils/CLI11/blob/main/LICENSE

Redistribution and use in source and binary forms of CLI11, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

XML Parser: PugiXml

pugixml 1.14 Copyright (c) 2006-2023 Arseny Kapoulkine

https://github.com/zeux/pugixml/blob/master/LICENSE.md

MIT License (MIT) (see below)

Logging: SpdLog

SpdLog 1.12.0 Copyright (c) 2016 Gabi Melman.

https://github.com/gabime/spdlog/blob/v1.x/LICENSE

MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

SHA1 calculation: CSHA1

CSHA1 2.1

100% free public domain implementation of the SHA-1 algorithm by Dominik Reichl dominik.reichl@t-online.de