EMULAB Forum

Author Topic: How to check Progetto Snaps?  (Read 5830 times)

Traxx

How to check Progetto Snaps?
« on: 23 October 2016, 04:33 »

Hello,
I've downloaded the Progetto Snaps from 'Full Set 0.170' up to 'Upd 0.178' (for MAME 0.178) and extracted them into separate folders (9 folders). I'm using the DAT file provided by Progetto to rebuild the snaps (without compression). Right now there are more than 100 million(!) matches, and processing takes very long (for just some PNGs). It seems to get slower and slower as the match count rises.

Where do these matches come from? The DAT file doesn't contain anywhere near that many files. I guess something is not right here.

How can I properly rebuild/check the snaps/titles?

Arch Linux x86-64
Wine/CMPro 4.031a (32-Bit)
Progetto - http://www.progettosnaps.net/snapshots/



Edit: These are the final statistics, and it looks like CMP did it properly nevertheless (36825 files). The scanner doesn't show any 'snap' folder entries either, so I guess it's complete. But the question remains: why does it take so long, and where is the high match count coming from? I've never seen that before.

Code:
Source-Files:           37449

- now even counting files in archives -

Analyzed Files:         37449
Created Files:          36873

Matched Files:          164413030
Skipped Files:          21886
« Last Edit: 23 October 2016, 05:03 by Traxx »


Roman

Re: How to check Progetto Snaps?
« Reply #1 on: 23 October 2016, 07:10 »

Generally, Progetto files can be scanned exactly the same way as ROMs for other emulators/collections.
All you need to know is how to store the files properly, and here the general rule applies:
rompath\setname\file1..filen for uncompressed sets, rompath\setname.zip (or .rar/.7z) for compressed ones.
So for example you have
F:\Progetto\progetto-SNAPS - Bosses\Bosses\3in1semi.png
where F:\Progetto\progetto-SNAPS - Bosses is the rompath, Bosses is the setname (taken from the corresponding datfile), and the png file is one file in the set.
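
The layout rule above can be sketched in Python. The helper name `split_rom_path` is hypothetical and is not part of CMPro; it only illustrates how a full path to an uncompressed file breaks down into rompath, setname and file.

```python
from pathlib import PureWindowsPath


def split_rom_path(full_path):
    """Split a path to an uncompressed file into (rompath, setname, filename).

    Assumes the rompath\\setname\\file layout described above.
    """
    p = PureWindowsPath(full_path)
    return str(p.parent.parent), p.parent.name, p.name


rompath, setname, filename = split_rom_path(
    r"F:\Progetto\progetto-SNAPS - Bosses\Bosses\3in1semi.png"
)
print(rompath)   # F:\Progetto\progetto-SNAPS - Bosses
print(setname)   # Bosses
print(filename)  # 3in1semi.png
```

`PureWindowsPath` is used so the example behaves the same on Linux, where backslashes are not treated as separators by default.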

Progetto dats are organized so that you usually have only 1 set (or a few for software lists) with thousands of files in it.


Rebuilding: well, it can take long if you're using compressed sets, since single files are then added to an existing archive over and over again, and that takes long.
The match count can easily grow that high. The Progetto snaps contain thousands of placeholder files that are fully identical, and each of them matches thousands of entries in the dat, so you can reach such numbers rather quickly. A match is counted whenever one source file matches one database entry; in this case each file matches a lot of entries, and you most likely have a lot of identical source files as well.


If you want to speed up processing (which is hard with Progetto), use decompressed files and no additional checks such as sha1 checks; in the scanner, don't use fix missing, deep checks, etc.

Traxx

Re: How to check Progetto Snaps?
« Reply #2 on: 23 October 2016, 19:52 »

So a 'Matched File' entry is just one comparison between a dat CRC and a file CRC (successful or not)? To me, "match" sounds more like a successful CRC match.

The DAT file is 3.4 MB and has 132 'machines' and 59,593 'crc' entries. There are 37,449 files to check/rebuild. That gives a theoretical range of 37,449 to 2,231,698,257 comparisons. I had 164,413,030 "Matched Files" (comparisons), which is ~7.4% of the worst-case count. On average, every file needed about 4,390 tries until it was "created".
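
A quick Python check of the figures above. It only reproduces the arithmetic under the "every match is one comparison" reading used in this post, and says nothing about how CMPro counts internally.

```python
source_files = 37449      # files to check/rebuild
dat_crcs = 59593          # CRC entries in the DAT
matched = 164413030       # "Matched Files" from the rebuild statistics

best_case = source_files              # each file hits on its first comparison
worst_case = source_files * dat_crcs  # each file compared with every DAT entry
share = matched / worst_case          # fraction of the worst case
per_file = matched / source_files     # average per source file

print(f"{best_case:,} to {worst_case:,} comparisons")
print(f"{share:.1%} of worst case, about {per_file:,.0f} per source file")
```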

If that is correct, it doesn't sound too bad, but it still took a long time, something like an hour. I already processed all files on tmpfs (ramdisk) with sha1 disabled and 'uncompressed files'.

I used the 32-bit version (mostly out of laziness, as that is what the Arch User Repository provides). Are there noticeable speed improvements in the 64-bit version (multithreading, hardware hash calculations...)?
« Last Edit: 23 October 2016, 19:57 by Traxx »

Roman

Re: How to check Progetto Snaps?
« Reply #3 on: 23 October 2016, 20:00 »

No, it has nothing to do with tries and matches.

Progetto uses one and the same placeholder file (a dummy screenshot showing the MAME logo with text like "this set has no title snapshot"), i.e. one specific CRC is listed multiple times (if not hundreds of times) in the dat. Let's say 100 times as an example.
So if CMPro finds one (1) of these in the rebuilder source, the match counter is increased by 100 (since it's in the database 100 times), and it gets rebuilt 100 times in the destination (assuming the destination was empty before). Now, you have all 100 of these files in the source (since you've downloaded the sets), so you get a match count of 100 * 100 = 10000 already, and so on. CMPro of course sees that the exact file already exists in the destination and does not recreate it. That's where the "created" file count comes from. Match counts are usually far higher, since identical files are shared across sets.

Take the file with the crc32 4218c199, for example: it's listed 8866 times in the dat! So if you have all of them as source files, you end up with around 78 million matches (8866 * 8866), since each source file matches each dat entry with that checksum.
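
The counting rule described here can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not CMPro's actual code; the function name `count_matches` is made up for the example.

```python
from collections import Counter


def count_matches(source_crcs, dat_crcs):
    """Sum, per CRC, (copies in the source) * (entries in the dat)."""
    src = Counter(source_crcs)
    dat = Counter(dat_crcs)
    # A Counter returns 0 for CRCs the dat does not list, so those add nothing.
    return sum(copies * dat[crc] for crc, copies in src.items())


# The placeholder CRC from the example: 8866 dat entries, 8866 source copies.
placeholder = "4218c199"
matches = count_matches([placeholder] * 8866, [placeholder] * 8866)
print(matches)  # 78605956
```

With 100 dat entries and 100 source copies the same function gives 10000, the figure from the simplified example above.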


Speed is only really gained by not compressing Progetto sets (especially NOT 7z or even solid archives) and by turning off time-intensive operations. I doubt a 32/64 bit change will bring you a speed boost.
But anyway, scanning/rebuilding such sets, with tens of thousands of files per set, simply takes longer than common arcade MAME sets.
« Last Edit: 23 October 2016, 20:04 by Roman »

Traxx

Re: How to check Progetto Snaps?
« Reply #4 on: 23 October 2016, 21:44 »

Thanks for the clarification. I just had the feeling it gets slower and slower the more matches it finds. If I reckon correctly, the first 10,000 files were created in the first minute, whereas the last 20,000 needed maybe an hour. Edit: Never mind... as you already said, all dupes are created at the first encounter. That's probably why it is so fast at the beginning.
I'll do more testing on the titles set with the 64-bit version (if it runs on Wine). Edit: It works. It felt much faster (about 20 minutes), but the titles dat is smaller than the snaps dat, so it may not be comparable.

Quote from Roman: "I doubt a 32/64 bit change will bring you a speed boost."
I had rather hoped you had put some processor-specific intrinsics in there, like _mm_crc32_u64 (SSE4.2 hardware-accelerated crc32 calculation, multi-threaded).
With the 32-bit version, CPU usage sits constantly at only 1/4 (on 4 cores), which is typical of a single-core limitation. I found some entries about parallelization/multithreading in the CMP history (somewhere around 2008), but it doesn't seem to have any effect here, or it has already been dropped (or it is a Linux/Wine issue).
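
As a side note, the SSE4.2 crc32 instruction computes CRC-32C (Castagnoli polynomial), which differs from the standard CRC-32 used in zip archives and dat files, so it could not be dropped in directly. To illustrate what multi-threaded hashing could look like, here is a minimal Python sketch that uses the standard `zlib.crc32`. The function names `file_crc32` and `hash_tree` are hypothetical, and nothing here reflects how CMPro is implemented.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def file_crc32(path, chunk_size=1 << 20):
    """Return the zip-style CRC-32 of a file as 8 lowercase hex digits."""
    crc = 0
    with open(path, "rb") as fh:
        # Read in chunks so large files are never loaded into memory at once.
        while chunk := fh.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return f"{crc:08x}"


def hash_tree(root, workers=4):
    """Hash every regular file under root, spread over a thread pool."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(files, pool.map(file_crc32, files)))
```

Whether extra threads help at all depends on where the time actually goes; if disk access or archive handling dominates, hashing in parallel changes little.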
« Last Edit: 23 October 2016, 23:35 by Traxx »

Roman

Re: How to check Progetto Snaps?
« Reply #5 on: 24 October 2016, 07:43 »

The real bottleneck these days is (hard) disk access... (or 7z solid archives ;-))
For example, if you scan a MAME collection for the first time, it takes a while. If you have enough disk cache and run another scan, it flies, since everything is in the cache and the hard disk is not touched at all.