Topic: Work In Progress (Read 134661 times)

Roman · « **on:** 04 July 2011, 20:13 »

Well, yes...pretty much no news recently....but I'm alive and currently playing around with this:

http://mamedev.emulab.it/clrmamepro/wip_july.png

Don't ask me when it's done....not much free time these days and several things (e.g. packer support for this) to do...so don't expect anything anytime soon....just wanted to say PEEEEEP....(the names are actually coming from the MESS snes hash files......just took them for some example lines...)

(Update)
ah..nice...it already works (e.g. rebuilder) with decompressed files, which I actually did not expect...The shown file and folder were created via a rebuild

http://mamedev.emulab.it/clrmamepro/wip_july2.png

(again, just some dummy test files..ignore naming, sizes, datestamp and checksum)

Roman · « **Reply #1 on:** 06 July 2011, 21:11 »

So...maybe some more WIP....
so what does this unicode stuff mean at all?

Well, generally, since we now have operating systems which support Unicode (if you don't have an updated OS, well, tough luck), filenames can be in Unicode or, to put it simply, can contain local-language special characters which are not part of the plain 7-bit ASCII charset....

And if you can store files that way, you may want to list them in the datfiles correctly spelled.

So the steps to make this possible in cmpro are:

1) make a unicode compile of cmpro
ok...after doing some annoying _T() macro and TCHAR padding and resolving some LPCTSTR pointer conversions (if you're familiar with Visual C++ you know what I mean) it finally compiled

2) fixing the common char issues
well, just adding macros and changing pointer types doesn't necessarily mean it works after compilation...so some post work needed to be done to fix ugly little char / char* issues

3) reading unicode text files
again some more fiddling, since just having a unicode compile doesn't mean you can read and display unicode characters correctly, but finally I managed to do that as you can see in the screenshots.
So I was positively surprised when I ran a simple rebuild on a dummy utf8 datfile and it created the folder and file in unicode characters.

So...what happened then (aka TODAY)?

Well, I've changed all writing and reading of text files to be utf8 now. Well, it reads anything (plain ascii, utf8) but writes utf8 with a BOM (ByteOrderMark...some bytes at the beginning telling you something about the used encoding). So all cmpro ini files, miss/have list, etc will now be utf8 with a BOM.
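A minimal sketch of the "read anything, write utf8 with a BOM" behaviour described above. It is written in Python rather than cmpro's own C++, and the function and file names are made up for illustration. Python's `utf-8-sig` codec strips a leading BOM on read if one is present and always emits one on write.

```python
import os
import tempfile

BOM = b"\xef\xbb\xbf"  # the three bytes of a UTF-8 byte order mark


def read_text_any(path):
    """Read plain ASCII or UTF-8 text, with or without a leading BOM."""
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return f.read()


def write_text_with_bom(path, text):
    """Write UTF-8 text and always prepend a BOM."""
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


def demo():
    """Round-trip a non-ASCII line and report whether the file starts with a BOM."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "have_list.txt")  # hypothetical file name
        write_text_with_bom(path, "ゼルダの伝説\n")
        with open(path, "rb") as f:
            raw = f.read()
        return raw.startswith(BOM), read_text_any(path)
```

Plain 7-bit ASCII files go through the same reader unchanged, since ASCII is a subset of UTF-8.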

In case of XML files, you don't necessarily need an UTF8 BOM now, you can also specify the encoding="utf-8" attribute.
Had to remove the old utf8 xml handling...it's obsolete and it was actually wrong..
(Note to myself...hmm..maybe when writing XML files, don't write a BOM but add the encoding attribute....)

So...what's next?
Next step would be to hook up the latest zip library which should be able to handle utf8 encoding.
Internal zip, 7z and rar reader routines need to be rechecked, too, for character conversion.
The compressor settings for OEM2ANSI conversion should then become obsolete...
And then some cleanup and testing...
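For background on why an OEM2ANSI setting existed at all: older zip tools typically store names in an OEM code page, while Windows GUI programs use an ANSI code page, so the same byte can mean different characters. The sketch below assumes cp437 (OEM) and cp1252 (ANSI) as examples; the actual code pages depend on the system locale.

```python
name = "Pokémon"

oem_bytes = name.encode("cp437")    # é is stored as byte 0x82
ansi_bytes = name.encode("cp1252")  # é is stored as byte 0xE9
utf8_bytes = name.encode("utf-8")   # é is stored as bytes 0xC3 0xA9

# Reading OEM bytes as if they were ANSI garbles the name,
# which is what an OEM-to-ANSI conversion step has to repair.
garbled = oem_bytes.decode("cp1252")

# With UTF-8 on both sides there is only one interpretation.
roundtrip = utf8_bytes.decode("utf-8")
```

With UTF-8 everywhere, no locale-dependent conversion step is needed.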

But again...time is very limited within the next week(s), so...just be patient....

Roman · « **Reply #2 on:** 11 July 2011, 20:48 »

Just some little updates:

- don't write BOM in case of writing XML files (using encoding attribute instead)
- write xml special characters as-is (hey...we're now in a unicode environment, so don't write &#xxxx;...but of course read and parse it correctly)
- hooked up 7z unicode support
- hooked up rar unicode support
- acquired latest full version of ziparchive lib (Thanks a million Tadeusz!)
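The first two points of the list above can be sketched with Python's standard `xml.etree.ElementTree`. This is only an illustration, not cmpro's code; the `rom` element and the name are example data. Numeric character references are resolved on reading, and the output is raw UTF-8 with an encoding declaration and no BOM.

```python
import io
import xml.etree.ElementTree as ET


def write_dat_xml(root):
    """Serialize to UTF-8 bytes with an encoding declaration and no BOM."""
    buf = io.BytesIO()
    ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
    return buf.getvalue()


# Reading: numeric character references are resolved by the parser.
parsed = ET.fromstring('<rom name="&#12476;&#12523;&#12480;.sfc"/>')

# Writing: the same name goes out as raw UTF-8 bytes, not as &#xxxx; references.
data = write_dat_xml(parsed)
```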

So actually you can fully use it now for decompressed, 7z and rar files...wooho...
http://mamedev.emulab.it/clrmamepro/wip_july3.png (Rebuilt archives/folder)

Next steps:

- hook up latest version of ziparchive
- update internal zip to unicode which is used for in-place renames and no-compress copies
- cleanup zip setting screen, i.e. remove oem conversion, buffer and flush options...and most likely the compression level (internally use highest)...such options became pretty obsolete over the years.
- maybe check additional new features of latest ziparchive lib (>4GB zip support etc...) but that's most likely something for a future update since it most likely means that I have to replace my internal zip routines (no-recompress/inplace rename) completely...

Again, don't know when the next steps are done (especially since I don't have any free time next week)....but full unicode support is on its way...and as you can see it's already working for decompressed, 7z'ed and rar'ed sets...

Tadaa...

Cassiel · « **Reply #3 on:** 14 July 2011, 20:28 »

Outstanding mate.... just outstanding!

Roman · « **Reply #4 on:** 14 July 2011, 21:29 »

ehehe, thanks,

Having a break of a week now due to family business...

Today I at least managed to do some work on the internal/own zip routines so they can read and handle utf8 now....plus some work on the internal stuff for no-recompression copy and inplace-rename...basically it's always a conversion of the filenames from a utf8 buffer to something you can display...or vice versa...This needs some more work at the end of next week...

cough..ok...zip copy without recompression works....next: in-place rename....cough

and then I try to hook up the new ziparchive lib.... and then we're close to something useful....

And now....breeeeeeaaaak...

Roman · « **Reply #5 on:** 24 July 2011, 21:36 »

back...
at least converted the remaining internal zip routines (repair/in place rename) to utf8...next step (besides some testing) hook up new zip lib...

Roman · « **Reply #6 on:** 25 July 2011, 21:13 »

no big news today...did some testing of the internal zip routines, some fixing here and there, removed the usage of any oem2ansi conversion, zip flush, zip buffer settings and added the latest ziparchive lib....compiles and links fine. However it seems that I need to pass some special parameters so it works fine with utf8....ok...reading docs now....

Roman · « **Reply #7 on:** 26 July 2011, 20:50 »

ok...some progress...(is somebody actually reading this diary??)

So....yesterday I did hook up the latest zipclass lib but today I had several issues with utf8 names in created zips...after some research I found out that I somehow added a wrong version (doooh). A clean remove and reinstall of the latest lib solved all issues (Thanks again Tadeusz!).

All issues solved? well...actually I found out another thing...there are several ways for 'standard' zipfiles to handle utf8 encoded names....one method (which I prefer) is to simply store names in utf8 and hope that applications handle it correctly. Winzip (15.x), 7z (9.x) and Winrar (4.x) do...so I stick to this method...The other method is to store the filename differently and keep information about encoding in the zip structure extra fields....actually this also makes the zipfile larger...
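To illustrate the first method with something runnable: Python's standard `zipfile` module stores non-ASCII names as UTF-8 and marks them with bit 11 (0x800) of the entry's general purpose flags, without adding an extra field. Whether the ziparchive lib used in cmpro sets that flag bit is not stated here, so treat this as a sketch of the general idea only. The second method presumably corresponds to something like the Info-ZIP Unicode Path extra field (header ID 0x7075).

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: entry name is UTF-8 encoded


def make_zip(names):
    """Build an in-memory zip with one small dummy entry per name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"dummy")
    return buf.getvalue()


def list_entries(data):
    """Return (name, has_utf8_flag, extra_field_length) for every entry."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [
            (info.filename, bool(info.flag_bits & UTF8_FLAG), len(info.extra))
            for info in zf.infolist()
        ]


entries = list_entries(make_zip(["plain.bin", "ゼルダ.sfc"]))
```

Pure ASCII names are written without the flag, so such zips stay byte-compatible with older tools.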

Now that it seems that we have full utf8 support for all 3 archive types, I also cleaned up the settings->compressor screen:

* oem2ansi conversion is gone (yes...live with it...use utf8!)
* zip compression level is gone (internally 'best' is used which corresponds to '9')...I don't see any reason why this should be selectable by the user. You can now start a discussion about torrentzip or why not using bzip2 as compression method (which newer zip programs can handle)....but that's something for the bin or the future...
* zip flush option...removed...It's a relic from years ago when people had slow hds and faulty chipsets....
* zip buffer option...removed...In our days I don't think increasing this will give you a speed boost....I may try some internal testing on different values...

So....that's it for today...Next steps will be testing, testing, testing...again, I got no deadline for a release in my mind yet...

Some other remarks: Well, I'm working with Windows7 (ultimate) and it does not have any problems showing asian characters...I know from XP that you first need to install some asian-related charsets/libs so you can view such characters correctly. Windows does that for you...in your system regional settings you find a checkbox to enable asian-character-support.
Datauthors may wonder how to work with utf8 dats...well...get a good texteditor. While I'm a big fan of Textpad, I have to say that for such tasks, Notepad++ (Yes, plus plus, not the notepad.exe from your standard Windows) is great to use since it offers you utf-8 saving/loading options...

Ok...that's it for now....I only wonder what happens if MAME.exe's -listxml prints out utf8 names on stdout and I redirect it and read it in....yay...something to test

Cassiel · « **Reply #8 on:** 26 July 2011, 21:09 »

Quote from: Roman on 26 July 2011, 20:50

ok...some progress...(is somebody actually reading this diary??)

I am!

(as is the TOSEC project as a whole)

Quote from: Roman on 26 July 2011, 20:50

You can now start a discussion about torrentzip [...]

Had to smile at that.......

f205v · « **Reply #9 on:** 27 July 2011, 11:40 »

I am too!

Simone · « **Reply #10 on:** 27 July 2011, 12:48 »

hei Roman, keep it up

oxyandy · « **Reply #11 on:** 27 July 2011, 15:09 »

For sure Roman, read all the posts.
Sounds like great progress,
will miss the low level compression setting,
for times when I want to rebuild something quickly though.

Roman · « **Reply #12 on:** 27 July 2011, 15:59 »

then disable rebuilder's recompress files option

Roman · « **Reply #13 on:** 27 July 2011, 21:36 »

so, the next entry in the diary...well...nothing really special today...

I've tested some standard zip/rar/7z if they work fine (ehehe...not that I get multibytes for ascii7 chars now....) and checked under which circumstances Winzip creates utf8 encoded files with extra-field usage...actually I wasn't able to produce one...so again...I will stick to the non-extra-field-usage method to use utf8.

I mostly used today's little time to align scanner's 'allow not separated bios sets' and rebuilder's 'split bios sets' option...they are both named identically now (split bios sets). Also I started to remove the scanner advanced '* SysDefPath' options from the UI......they will be internally enabled if sysdefpaths are set up...which makes more sense in my opinion....checking other options as well..maybe some become obsolete to be set by the user....time will tell...

oxyandy · « **Reply #14 on:** 28 July 2011, 10:12 »

Quote

when I want to rebuild something quickly though.

Doh, I really wasn't thinking when I wrote that, of course I could just untick "Compress Files" too.
Really don't need that setting for compression, eh..
Damn, 3am posts.

Just thought..
Is something like this possible ?
To keep parent/clone relationships ?

Plus, is it possible to have a utility to clean zips of any dupe crc files ?
After all, a merged set only needs a single matching crc.

Roman · « **Reply #15 on:** 28 July 2011, 11:56 »

Enable Profiler -> Options -> parse Rom Merge Tags if you want to get rid of the dupes.
Actually I think you can force to split-merged sets if you want to avoid the removal of parent/clone relationships in case of identical names for non-identical files...but I have to check that....

Roman · « **Reply #16 on:** 28 July 2011, 20:01 »

ok...back to the main topic...utf8...

one thing I wondered about was....what happens if MAME's -listxml output prints out asian characters....and cmpro's profile points to the mame binary (ok, internally it calls MAME and redirects its output..).

...So I tested it....

1st step...picked a set (for simplicity I just used the first in the xml, "005" from segag80r.c) and changed a rom name to something in chinese...saved it as utf8 without BOM (with notepad++), recompiled MAME....a -listxml output lists the asian characters..looks good...

2nd step...since cmpro will parse xml datfiles as utf8 only when they got a BOM or when the encoding is specified in the XML tag, I changed info.c to add an encoding="utf-8" attribute, recompiled MAME, -listxml shows it...fine

3rd step...let cmpro import the data directly.......the stdout redirector did a good job....it works
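The 3rd step (call the binary, redirect its stdout, parse the result) can be sketched like this. A small Python child process stands in for `mame -listxml`, and the XML it prints is a heavily simplified, made-up fragment. The important part is capturing the raw bytes and letting the XML parser decode them according to the declared encoding.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Stand-in for "mame -listxml": prints a UTF-8 encoded XML document on stdout.
# The \u escapes keep the command line itself pure ASCII.
CHILD_CODE = r'''
import sys
xml = '<?xml version="1.0" encoding="utf-8"?><mame><game name="005"><rom name="\u4e2d\u6587.bin"/></game></mame>'
sys.stdout.buffer.write(xml.encode("utf-8"))
'''


def import_listxml(cmd):
    """Run cmd, capture its raw stdout bytes and parse them as XML."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return ET.fromstring(result.stdout)


root = import_listxml([sys.executable, "-c", CHILD_CODE])
rom_names = [rom.get("name") for rom in root.iter("rom")]
```

Reading the pipe as bytes avoids any dependency on the console code page of the calling process.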

So...actually if anyone ever decides to update MAME to list set names, rom names, descriptions etc in the original language...fine....DO SO!

http://mamedev.emulab.it/clrmamepro/mame_utf8.png

Roman · « **Reply #17 on:** 03 August 2011, 17:29 »

well, nothing really new regarding utf8....only made a clean compile with a clean new solution setup in my Visual Studio....actually I somehow screwed up my old one and it had major problems with precompiled headers....hehehe
so..new setup...working fine...

so...I guess next week could be a good start for some testing...if you're interested, let me know your email address...can't guarantee that everyone gets a testversion .... and as I said...earliest is sometime next week...

By the way, if you got some utf8 dats, please send them in....best would be if the crc32/sha1/sizes match MAME roms....and the names / description / manufacturer tags etc could be something chinese/japanese/etc....

On a sidenote .... if you don't want to see a winrar window popping up when adding/deleting files, add a -ibck in the rar commandline params...

Roman · « **Reply #18 on:** 04 August 2011, 19:59 »

For those who are testing:

Keep in mind that dats (if they use non-ascii chars) need to be saved as utf8 with or without a BOM (ByteOrderMark). If you don't use a BOM, be sure that your xml datfile holds an encoding="utf-8" attribute within the xml tag at the beginning.
XML dats are preferred of course, however old style dats should work as well (when saved with BOM).
Again, I can recommend notepad++ for easy saving and editing dats...
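The rule above (BOM, or an encoding="utf-8" attribute in the xml tag) can be written down as a small check. This is only a sketch of the rule as described, not cmpro's actual code; the function name and the regular expression are assumptions.

```python
import re

BOM = b"\xef\xbb\xbf"

# Matches an XML declaration that names utf-8 (case-insensitive).
DECL_RE = re.compile(rb'^<\?xml[^>]*encoding\s*=\s*["\']utf-?8["\']', re.IGNORECASE)


def treat_as_utf8(data):
    """True if the dat starts with a UTF-8 BOM or declares encoding="utf-8"."""
    if data.startswith(BOM):
        return True
    return DECL_RE.match(data.lstrip()) is not None


old_style_with_bom = BOM + b"clrmamepro (\n\tname test\n)\n"
xml_with_decl = b'<?xml version="1.0" encoding="UTF-8"?><datafile/>'
xml_without_decl = b'<?xml version="1.0"?><datafile/>'
```

An old style dat has no xml tag to carry the attribute, which is why it needs the BOM.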

...and don't expect feedback before next week

DopefishJustin · « **Reply #19 on:** 09 August 2011, 19:33 »

You can try e.g. fm77av.xml from MESS as an example of Japanese names in UTF-8.

Requiring an encoding declaration for UTF-8 is bogus though because the XML standard mandates UTF-8 as the default for XML documents with no encoding specified:

Quote

In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8.

http://www.w3.org/TR/2006/REC-xml11-20060816/#charencoding
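As a quick illustration of that default, here is how one common parser behaves (Python's `xml.etree.ElementTree`, which uses expat); the element and attribute names are made up. UTF-8 bytes without a BOM or declaration parse fine, while Latin-1 bytes without a declaration are rejected.

```python
import xml.etree.ElementTree as ET

# No BOM and no XML declaration, but the bytes are UTF-8: accepted.
utf8_data = '<software name="日本語"/>'.encode("utf-8")
elem = ET.fromstring(utf8_data)

# Same situation with Latin-1 bytes: not valid UTF-8, so parsing fails.
latin1_data = '<software name="é"/>'.encode("latin-1")
try:
    ET.fromstring(latin1_data)
    latin1_rejected = False
except ET.ParseError:
    latin1_rejected = True
```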
