EMULAB Forum


Topic: Suggestion: Better Uncompressed Set Support

NvrBst

Suggestion: Better Uncompressed Set Support
« on: 09 September 2010, 23:18 »

I find uncompressed sets pretty tedious to work with in cmpro.  May I make the following suggestions?

1. Optimize the CRC32 calculation.
   Rationale: If I have a set consisting of a single 4GB file and just want to verify it with a "New Scan" (SHA/MD5 turned off), cmpro takes about 40 seconds to finish the scan.  If I calculate the CRC with the following public program: http://www.codeproject.com/KB/recipes/crc32.aspx  (using dynamic assembly), generating the crc32 of the same 4GB file takes 15 seconds.  cmpro seems to take 2x-3x as long.
   Possible Bug: Even if I uncheck the "Checksums" checkbox in the scan window, the "New Scan" button still seems to check checksums.  Is this intended?

2. If possible, use a "Move" operation instead of "Copy" when rebuilding uncompressed sets.
   Rationale: When rebuilding the set mentioned above, it takes ~40 seconds to crc32 the file and then about 60 seconds to complete the copy operation.  This is extremely long.  If cmpro can confirm that a) the set is uncompressed, b) this rom isn't used anywhere else in the set, and c) "Removed matched source files" is checked, then it should be safe to issue a move operation instead of a copy; otherwise, do what it normally does.


With just the above two suggestions, rebuilding uncompressed sets could be 4x-5x faster, and a "New Scan" of uncompressed sets should be at least 2x faster.  Also, if the behaviour described under "Possible Bug" is intended, could users still be allowed to "New Scan" a folder with the "Checksums" checkbox unchecked and have it check for incorrect sizes/names/etc.?
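
For illustration, the move-versus-copy check from suggestion 2 can be sketched in a few lines of Python. This is only a sketch of the idea, not cmpro's code: the function name and the three flags are hypothetical stand-ins for information the rebuilder would already have.

```python
import os
import shutil


def place_rom(src, dst, *, uncompressed, used_elsewhere, remove_matched_sources):
    """Put src at dst, moving instead of copying when that is safe.

    The three flags are hypothetical stand-ins for state the rebuilder
    would already know; they are not real cmpro setting names.
    """
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    can_move = uncompressed and not used_elsewhere and remove_matched_sources
    if can_move:
        # On the same volume this is a rename, so no file data is rewritten.
        shutil.move(src, dst)
        return "moved"
    # Otherwise keep the normal behaviour: copy and leave the source alone.
    shutil.copy2(src, dst)
    return "copied"
```

The speedup only applies when source and destination are on the same volume; across volumes `shutil.move` falls back to copying the data and then deleting the source.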

3. Port the "No SetFolder for decompressed sets" from the rebuilder window to the scan window.
   Rationale: If cmpro supports building this file structure, it should support scanning it as well.

However, with "1" and "2" both implemented, the lack of "3" may be less tedious to work around manually by doing a full rebuild whenever you want to clean the set.  Still, being able to scan this structure would be the ideal solution.

Thanks.  Any other improvements you can think of for uncompressed sets would be welcome too, as some of us do use them ;)


Roman

Re: Suggestion: Better Uncompressed Set Support
« Reply #1 on: 10 September 2010, 05:01 »

The crc32 function from zlib is used at various points, for example for the unneeded and name checks and of course during the checksum check. I will see if the result can be cached.

I will check if a move operation can be used during rebuild. I remember there are already some, but IIRC they are used for archives.

No, the scanner won't get the rebuilder's "no set folder" option. The scanner strictly follows the standard storage layout rompath/setname/filename. Use a different dat if you want to have all files in one set.
« Last Edit: 10 September 2010, 18:12 by Roman »

NvrBst

Re: Suggestion: Better Uncompressed Set Support
« Reply #2 on: 10 September 2010, 19:25 »

Sounds good.  If possible, though, I'd still suggest benchmarking the zlib crc32 function you're using against a faster assembly version.  If what you say is true and the 40-second "New Scan" is calculating crc32 multiple times, then maybe it is okay *with caching*, but if you find out your zlib version is even 25% slower than a custom-made crc32, that is a 25% faster cmpro ;).
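
For reference, at the figures quoted in the first post, 4GB in 40 seconds is roughly 100 MB/s and 4GB in 15 seconds is roughly 270 MB/s. A benchmark along the lines suggested here could time the CRC routine on an in-memory buffer, so that disk speed is excluded from the comparison. The Python sketch below is only an illustration of that idea; `crc32_throughput` is a made-up helper, and any alternative implementation would need the same `(data, running_crc)` call shape as `zlib.crc32` to be plugged in.

```python
import os
import time
import zlib


def crc32_throughput(crc_func, size_mb=64, chunk_size=1 << 20):
    """Return MB/s for crc_func(chunk, running_crc) over an in-memory buffer."""
    chunk = os.urandom(chunk_size)
    n_chunks = max(1, size_mb * (1 << 20) // chunk_size)
    crc = 0
    start = time.perf_counter()
    for _ in range(n_chunks):
        crc = crc_func(chunk, crc)
    elapsed = time.perf_counter() - start
    # Report on the bytes actually processed, not the requested size.
    return (n_chunks * chunk_size) / (1 << 20) / elapsed


if __name__ == "__main__":
    print(f"zlib.crc32: {crc32_throughput(zlib.crc32):.0f} MB/s")
```

If the measured rate is far above what the disk can deliver, the CRC routine is not the bottleneck and repeated reads are the more likely cause of the slow scan.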

zlib's crc32 has only one advantage, its stability, but crc32 is simple and *easy to make stable*; using a custom fast assembly version *or at least giving people an option to use one in, say, the settings window* could bring massive speedups to cmpro, especially for uncompressed sets.


Also, you mentioned that other options like 'name checks' need to calculate crc32?  (I assume it calculates the crc32 and checks that the name matches the dat entry with that crc32.)  As an additional suggestion, the other checks *theoretically* shouldn't need crc32: if the current "set\filename" is not in the dat, it fails the name check; cmpro should be able to search the dat by "set" and then "name" as easily as it searches by "crc".  The unneeded-file check (with the checksums box unchecked) would simply use "filesize" and "set\name" (items that are always instantly available) to determine whether a file passes.  I agree it won't be as accurate, and cmpro may move valid items that were, say, only renamed to the backup folder, but someone who manually unchecks "Checksums" in the scan window is already assuming that no hashes are calculated or used during the scan.
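
To make the idea concrete, here is a minimal Python sketch of a scan that uses only "set\name" and file size and never reads file contents. It assumes the standard rompath/setname/filename layout; the `dat` mapping and the function name are hypothetical stand-ins for a parsed dat file, not anything cmpro exposes.

```python
import os


def scan_without_hashes(rompath, dat):
    """Classify files by set name, file name and size; no file data is read.

    dat is a hypothetical mapping {(set_name, file_name): size_in_bytes}
    standing in for the entries of a parsed dat file.
    """
    report = {"ok": [], "wrong_size": [], "unneeded": [], "missing": []}
    seen = set()
    for set_name in sorted(os.listdir(rompath)):
        set_dir = os.path.join(rompath, set_name)
        if not os.path.isdir(set_dir):
            continue
        for file_name in sorted(os.listdir(set_dir)):
            key = (set_name, file_name)
            if key not in dat:
                # Could be a valid rom that was only renamed; a hash is
                # needed to tell, which this scan deliberately skips.
                report["unneeded"].append(key)
                continue
            seen.add(key)
            size = os.path.getsize(os.path.join(set_dir, file_name))
            report["ok" if size == dat[key] else "wrong_size"].append(key)
    report["missing"] = sorted(set(dat) - seen)
    return report
```

The "unneeded" bucket is where the accuracy trade-off shows up: a correct rom under the wrong name lands there together with genuinely unknown files.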


Being able to do a new scan and still check/fix all the scan options (without calculating a single hash) would be ideal for keeping uncompressed sets up-to-date, as crc32 isn't readily available the way it is in compressed sets.
« Last Edit: 10 September 2010, 19:35 by NvrBst »

Roman

Re: Suggestion: Better Uncompressed Set Support
« Reply #3 on: 10 September 2010, 21:15 »

If you have a completely wrongly named file, you need a hash calculation to find its real name.
Regarding a faster crc32... well, I will see what I find after my holiday.
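
The point about wrongly named files can be illustrated with a small Python sketch: once the name cannot be trusted, the only way back to the dat entry is through the file's content. The tuple layout of `dat_entries` and both function names are assumptions made for this example, not cmpro's data model.

```python
import zlib


def build_crc_index(dat_entries):
    """Map (size, crc32) -> list of (set_name, rom_name).

    dat_entries is a hypothetical iterable of
    (set_name, rom_name, size, crc32) tuples from a parsed dat.
    """
    index = {}
    for set_name, rom_name, size, crc in dat_entries:
        index.setdefault((size, crc), []).append((set_name, rom_name))
    return index


def identify(path, index, chunk_size=1 << 20):
    """Return the dat entries whose size and CRC32 match the file's content."""
    crc = size = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return index.get((size, crc), [])
```

An empty result means the file matches nothing in the dat; a non-empty one gives the name or names it should have had.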