EMULAB Forum

Topic: Slow even with checksums disabled (Read 5631 times)

coccola

Slow even with checksums disabled
« on: 09 April 2016, 03:10 »

Hi!

I used a dat from http://www.progettosnaps.net/ with thousands of small PNG files.

I used clrmamepro to rebuild the files I had according to the dat, with the rebuilder set to keep the files uncompressed in the destination folder.

After rebuilding all the files I had, I needed to check if there were still missing files according to the dat.

To do that, I used clrmamepro's scanner with Check Checksums disabled, expecting it to finish in a few seconds. Unfortunately it took ages, just as it would with Check Checksums enabled.

I think there's something wrong with it: it should be ultra fast when there's no need to calculate file checksums. Could you do something about it?


Roman

  • Global Moderator
Re: Slow even with checksums disabled
« Reply #1 on: 09 April 2016, 11:00 »

Unpacked sets are slow simply because hashes are calculated... for example for name checks, unneeded checks, etc.
In the case of packed sets, the hash can be taken directly from the archive.
So your assumption is wrong, since the checksum check is only one of many places where a file hash is calculated.

The problem with packed sets, however, is that the fixing part takes longer (or even ages when using solid 7z archives).
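A minimal Python sketch of the difference described above (an illustration only, not clrmamepro's code; both function names are hypothetical): a loose file's CRC32 can only be obtained by opening it and reading every byte, whereas a zip archive already stores each member's CRC32 in its central directory, which Python's standard zipfile module exposes as ZipInfo.CRC without decompressing anything.

```python
import zipfile
import zlib
from pathlib import Path


def crc32_of_loose_file(path: Path, chunk_size: int = 1 << 16) -> int:
    """Unpacked set: the file has to be opened and read completely."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # running CRC over each chunk
    return crc & 0xFFFFFFFF


def crc32s_from_zip(zip_path: Path) -> dict[str, int]:
    """Packed set: CRC32 values come from the zip's central directory,
    so no member is decompressed."""
    with zipfile.ZipFile(zip_path) as zf:
        return {info.filename: info.CRC for info in zf.infolist()}
```

For a folder of loose PNGs the first function means one open and one full read per file; for a zipped set the second function only reads the directory of a single archive.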
« Last Edit: 09 April 2016, 11:01 by Roman »

coccola

Re: Slow even with checksums disabled
« Reply #2 on: 09 April 2016, 16:28 »

In my case I don't need to check the contents of the files (checksums) because I'm pretty sure the files in the folder are correct; I've just rebuilt them. I only want to know which files are missing, and I want that check to be faster.

If clrmamepro always checks the checksums, no matter what the setting on the scanner is, why is there a Checksum checkbox?

Roman

  • Global Moderator
Re: Slow even with checksums disabled
« Reply #3 on: 09 April 2016, 16:35 »

The unneeded and name checks need to calculate a hash to determine whether the name is correct or whether the file is totally unneeded.

The checksum checkbox stands for testing whether checksums in general (rom/chd, sha1/md5/crc32 depending on further settings) are correct or not, i.e. whether or not to show a warning about a bad checksum. If you toggle checkboxes after a scan, you can hide/show the corresponding results.

So if you need to go for speed (by the way, an SSD or a good HD cache and a recent CPU should be OK, too), you can turn off the unneeded/name checks and keep only missing enabled.


So... I've just checked a Progetto Snaps archive (34,800 PNGs) on a common HD (not an SSD). A full scan with everything enabled (name, unneeded, checksum, case, etc.) took 2 min 47 seconds without disk cache; another run with disk cache took only 38 seconds.
With only missing enabled (no checksum, etc.) it took only 7 seconds. I guess that's something users can live with.

« Last Edit: 09 April 2016, 17:06 by Roman »

coccola

Re: Slow even with checksums disabled
« Reply #4 on: 09 April 2016, 17:30 »

7 seconds is OK for me; that's about what I was expecting. But that was without cache, right?

Roman

  • Global Moderator
Re: Slow even with checksums disabled
« Reply #5 on: 09 April 2016, 18:03 »

No, that was with cache, too. You can get it down to 4 seconds if you minimize the progress window... you see, printing text to a window does not come for free. Actually, nothing comes for free. HD access is the bottleneck.

But anyway, I meant that 2 minutes is also fine for 34,800 files and multiple hash calculations. What do you expect when opening/closing that many files?
If you want something faster, write yourself a command-line script that runs a deep dir command and matches the lines against a (preprocessed) mame -listxml output... or use archives, then your HD access is limited to one file.
Progetto PNG files (or other huge 1-set-billion-rom sets) are not really the main target for cmpro.
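The "preprocessed -listxml output" step might look like the minimal Python sketch below, which turns a saved `mame -listxml` dump into a plain list of expected snap names. It assumes that snaps are named `<machine name>.png`, that machines appear as `<machine>` elements (`<game>` in older MAME versions), and that device entries (`isdevice="yes"`) have no snaps; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def listxml_to_snap_list(xml_path, out_path) -> int:
    """Write one expected snap name ('<machine>.png') per line; return the count."""
    names = []
    # iterparse streams the file, which matters because -listxml output is large.
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag in ("machine", "game"):  # newer MAME uses <machine>, older <game>
            name = elem.get("name")
            # Assumption: device entries have no snaps, so they are skipped.
            if name and elem.get("isdevice") != "yes":
                names.append(name + ".png")
            elem.clear()  # drop the element's children and attributes to save memory
    Path(out_path).write_text("\n".join(sorted(names)) + "\n", encoding="utf-8")
    return len(names)
```

It only has to run once per MAME version, e.g. on the file produced by `mame -listxml > mame.xml`; the resulting text file can then be matched against a directory listing.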
« Last Edit: 09 April 2016, 18:05 by Roman »

coccola

Re: Slow even with checksums disabled
« Reply #6 on: 09 April 2016, 18:57 »

I still think the checksums shouldn't be calculated when the option is disabled. It would be much faster that way, and I'm pretty sure other people would benefit from it. Sometimes we don't need a deep scan... ;)

The next time I update my set (near the end of the month, probably) I'll try keeping only 'check missing' enabled, and see how long it takes.

Thank you for your time and patience!  :)

Roman

  • Global Moderator
Re: Slow even with checksums disabled
« Reply #7 on: 09 April 2016, 19:02 »

You might think so, but your assumption is wrong.
Name and unneeded checks need a hash calculation, and so does the checksum check. If you disable the checksum check and keep the other ones enabled, you don't really save time. If you disable them all, you will gain some time. Of course, the checksum check does not recalculate a hash if the name check has already calculated it.
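The hash sharing described above can be sketched as a small per-scan cache (a hypothetical structure for illustration, not clrmamepro's internals): whichever check asks for a file's CRC32 first pays for reading the file, and later checks reuse the stored value. This is why disabling only the checksum check saves nothing while the name check still needs the hash.

```python
import zlib
from pathlib import Path


class HashCache:
    """Hypothetical per-scan cache: each file's CRC32 is computed at most once."""

    def __init__(self) -> None:
        self._crcs: dict[Path, int] = {}
        self.disk_reads = 0  # how many files were actually read

    def crc32(self, path: Path) -> int:
        if path not in self._crcs:
            self.disk_reads += 1
            self._crcs[path] = zlib.crc32(path.read_bytes()) & 0xFFFFFFFF
        return self._crcs[path]


def name_check(path: Path, cache: HashCache, crc_to_name: dict[int, str]):
    """Return the correct name if the data is known under another name, else None."""
    correct = crc_to_name.get(cache.crc32(path))
    return correct if correct is not None and correct != path.name else None


def checksum_check(path: Path, cache: HashCache, name_to_crc: dict[str, int]) -> bool:
    """False only when a file with an expected name carries the wrong data."""
    expected = name_to_crc.get(path.name)
    return expected is None or cache.crc32(path) == expected  # reuses the cached hash
```

Running both checks on the same file reads it from disk once; the second call is a dictionary lookup.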

coccola

  • Member
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 56
  • Operating System:
  • Windows 7/Server 2008 R2 Windows 7/Server 2008 R2
  • Browser:
  • Firefox 45.0 Firefox 45.0
    • View Profile
Re: Slow even with checksums disabled
« Reply #8 on: 10 April 2016, 03:52 »

For you, the 'name' and 'unneeded' checks require the checksums to do their jobs properly, to identify each file for sure.

For me, each checking option is kind of independent. For example, it's possible to check which files are unneeded without calculating checksums, just search which file names are not on the dat (while checksums are disabled).

It's not a matter of right or wrong, it's a matter of way of thinking. It also has to do with the concepts of the software, what the developer has in mind.

I wonder if we could get a message when the 'name' and/or 'unneeded' checks are enabled but 'checksums' is disabled. A popup saying that checksum checking is required for these would be very nice.

Roman

  • Global Moderator
Re: Slow even with checksums disabled
« Reply #9 on: 10 April 2016, 05:42 »

This leads nowhere... we're going in circles. Each option *IS* independent. Each check option does what its name says. The implementation of these actions, however, might require something they have in common. This won't change.

"it's possible to check which files are unneeded without calculating checksums": oh yeah, great, a simple name match against the database... "puckmam.png" becomes unneeded just because it is not named "puckman.png", even though the data is correct. Sure, that's a fast way of checking this... and users won't complain that all their valid files get deleted... erm, wait a second... no, they do complain.

And keep one thing in mind: HD access is the real bottleneck here. If the disk cache is not used (as it would be on a 2nd scan), you don't get much difference with or without any hash calculation. It simply takes time when Windows accesses 34,800 files. We're not talking about reading a central directory here but real file access.
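To make the "puckmam.png" example above concrete, here is a minimal Python sketch comparing a name-only unneeded check with one that identifies files by CRC32. The data structures (a dict of file name to contents, and a dict of expected name to CRC32 standing in for the dat) and the function names are hypothetical simplifications.

```python
import zlib


def classify_by_name_only(files: dict[str, bytes], dat: dict[str, int]) -> dict[str, str]:
    """Name match only: anything not named exactly as in the dat is 'unneeded'."""
    return {name: "ok" if name in dat else "unneeded" for name in files}


def classify(files: dict[str, bytes], dat: dict[str, int]) -> dict[str, str]:
    """Identify each file by its CRC32 before deciding what it is."""
    crc_to_name = {crc: name for name, crc in dat.items()}
    result = {}
    for name, data in files.items():
        crc = zlib.crc32(data) & 0xFFFFFFFF
        if dat.get(name) == crc:
            result[name] = "ok"
        elif crc in crc_to_name:  # correct data under a wrong name
            result[name] = f"rename to {crc_to_name[crc]}"
        elif name in dat:  # right name, wrong data
            result[name] = "bad checksum"
        else:
            result[name] = "unneeded"
    return result
```

With the name-only version, a correctly dumped but misnamed snap is reported as unneeded (and would be removed); the hash-based version recognises it as a file that only needs renaming.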

Again, for your purposes (just a quick missing check), write and run yourself a simple script that matches the names from a directory tree walk against a plain list.
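A minimal sketch of such a script in Python, assuming the plain list holds one expected file name per line. It only compares names collected by os.walk, so no file is opened and no hash is calculated; as a consequence it cannot detect misnamed or bad files. Names are compared case-insensitively because Windows file names are case-insensitive. The function name is hypothetical.

```python
import os
from pathlib import Path


def find_missing(root, list_path) -> list[str]:
    """Return names from the plain list that no file under `root` carries."""
    lines = Path(list_path).read_text(encoding="utf-8").splitlines()
    expected = {line.strip() for line in lines if line.strip()}
    present = set()
    for _dirpath, _dirnames, filenames in os.walk(root):  # deep walk, names only
        present.update(name.casefold() for name in filenames)
    # Case-insensitive comparison, since Windows treats "Galaga.png" and "galaga.png" alike.
    return sorted(name for name in expected if name.casefold() not in present)
```

Usage (both paths are placeholders): `print("\n".join(find_missing("snap", "snaps.txt")))`.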
« Last Edit: 10 April 2016, 07:20 by Roman »