EMULAB Forum


Author Topic: XML dats and UTF support  (Read 5327 times)

Diabol

XML dats and UTF support
« on: 19 January 2011, 07:26 »

Hi!

I use clrmamepro to create some XML DATs (I need UTF support). I see that clrmamepro uses numeric HTML character references to represent Unicode characters, and at least one character seems to be handled wrongly. I'm talking about ł. If I use it while creating a DAT (I put it in the "Author" field in the Dir2Dat section), everything looks good: I load the DAT, click on "Show Info", and I see the ł displayed correctly. However, if you open the DAT in a text editor, you'll see that "ł" is represented by the string &#179; instead of &#322;. Everywhere else this string is displayed as ³. This is a problem when you use DATs with other tools. Is this simply a mistake, or is there something else I should know? I'd be very grateful if someone could explain that.

Thank you,

Diaboł


Roman

Re: XML dats and UTF support
« Reply #1 on: 19 January 2011, 07:57 »

Be sure you're using an xml dat with

<?xml version="1.0" encoding="UTF-8"?>

at the beginning...then UTF8 is supported. If you still have issues with that one char, send me the dat.

Diabol

Re: XML dats and UTF support
« Reply #2 on: 19 January 2011, 09:03 »

I generate my DATs using clrmamepro only, and it doesn't insert the <?xml version="1.0" encoding="UTF-8"?> line at the beginning. I'll send you the DAT right now.

Roman

Re: XML dats and UTF support
« Reply #3 on: 19 January 2011, 09:56 »

Yes, and that's intended. By default there is no UTF-8 encoding.
What character are you actually trying to show? Are we talking about the superscript 3 character, as in cubic meters?

Diabol

Re: XML dats and UTF support
« Reply #4 on: 19 January 2011, 11:18 »

Oops... that's bad news. I was talking about the "ł" character, but I really need them all (all the Latin ones). I find these characters in filenames and I need to preserve them all. Is that possible with clrmamepro?

Roman

Re: XML dats and UTF support
« Reply #5 on: 19 January 2011, 11:34 »

What's bad news? That you have to manually add the encoding attribute to the generated XML dat?

When not using XML dats you run into issues with non-standard characters anyway, since they differ from code page to code page and you have to toggle the OEM/ANSI conversion for archives. In other words: don't use non-standard chars ;)

With UTF-8 encoded dats (and you need to specify the encoding) you should at least be able to have XML datfiles that can be read anywhere in the world without worrying about different characters.

When I open your datfile I see a superscript 3 character, not the 'sword' / 'cross' character from your posts. So before we can test anything, we first have to define what you want to see ;)

Diabol

Re: XML dats and UTF support
« Reply #6 on: 19 January 2011, 12:31 »

Quote from: Roman
"With UTF-8 encoded dats (and you need to specify the encoding) you should at least be able to have XML datfiles that can be read anywhere in the world without worrying about different characters.

When I open your datfile I see a superscript 3 character, not the 'sword' / 'cross' character from your posts. So before we can test anything, we first have to define what you want to see ;)"

That's what I'm talking about. There should be no superscript 3 character; you should see &#322;. If I generate a DAT and insert <?xml version="1.0" encoding="UTF-8"?> manually after the DAT is created, the superscript 3 won't change to &#322; (ł). When I make a DAT I put "Diaboł" in the "Author" field, so when the DAT is done it should contain "Diabo&#322;", not "Diabo&#179;".

"Diabo&#322;" --> "Diaboł"
"Diabo&#179;" --> "Diabo³"
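
For anyone who wants to check the two mappings above outside clrmamepro, here is a minimal Python sketch. It only shows how decimal numeric character references relate to code points; the helper names `to_char_refs` and `from_char_refs` are made up for this example and have nothing to do with clrmamepro's own code.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # The "xmlcharrefreplace" error handler emits &#NNN; using the Unicode code point.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_char_refs(text: str) -> str:
    """Resolve numeric character references back into characters."""
    return html.unescape(text)


print(to_char_refs("Diaboł"))         # Diabo&#322;
print(from_char_refs("Diabo&#322;"))  # Diaboł
print(from_char_refs("Diabo&#179;"))  # Diabo³
```

The reference number is the Unicode code point, so ł (U+0142) has to be written as &#322;, while &#179; always means ³ (U+00B3) to any standards-following XML reader.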


Roman

Re: XML dats and UTF support
« Reply #7 on: 19 January 2011, 14:06 »

Well, mainly cmpro is not a dat generator ;) There are other tools which do a better job...however I will have a look at it...

Diabol

Re: XML dats and UTF support
« Reply #8 on: 19 January 2011, 14:16 »

Please do! I have used this tool for a long time and I really like it. UTF support is essential for me to preserve the original file names inside (zip) archives and some other info included in a set (zip) name. I'll give you all the info you may need.

I'm not sure if this is important, but if I load a DAT created by Diaboł  :P into clrmamepro and click on Show Info, the name is displayed correctly :)

Roman

Re: XML dats and UTF support
« Reply #9 on: 19 January 2011, 14:40 »

It's not shown correctly ;)

It simply uses your code page, so for YOU it looks correct. When I load it (DE code page), it shows the superscript 3.

UTF-8 characters in filenames inside a zip are a totally different story... all I'll say about that: good luck ;)
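
The code page effect can be reproduced with a short Python sketch. It rests on an assumption the thread does not confirm: that the DAT author's Windows uses the Central European code page (cp1250) and the "DE" machine uses the Western European one (cp1252). The helper names `byte_for` and `char_for` are invented for the illustration.

```python
def byte_for(char: str, codepage: str) -> int:
    """Return the single byte value a character has in a legacy code page."""
    encoded = char.encode(codepage)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {codepage}")
    return encoded[0]


def char_for(value: int, codepage: str) -> str:
    """Return the character a byte value stands for in a legacy code page."""
    return bytes([value]).decode(codepage)


value = byte_for("ł", "cp1250")    # assumed code page of the DAT author
print(value)                       # 179
print(char_for(value, "cp1250"))   # ł
print(char_for(value, "cp1252"))   # ³  (assumed code page of the "DE" machine)
print(ord("ł"))                    # 322, the Unicode code point
```

Under that assumption the same byte, 179, is ł on one machine and ³ on the other, which would explain both the &#179; in the file and the different displays.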

Roman

Re: XML dats and UTF support
« Reply #10 on: 20 January 2011, 08:33 »

OK, I had a quick look at the source.
First of all, UTF-8 support is used when the encoding attribute lists "utf-8" (lowercase). This enables UTF-8 conversion for all texts, but special characters should be written as multibyte chars. Encoding via &#x123; will only work correctly for byte-size values, not word-size values.

Guess one day I should rewrite cmpro with full Unicode support...
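
As a rough illustration of what such a file could look like on disk, here is a Python sketch that writes a lowercase "utf-8" declaration and stores ł as raw multibyte UTF-8 instead of a numeric reference. The datafile/header/author layout and the `build_dat_bytes` helper are assumptions made for the example, not something taken from the cmpro source.

```python
import xml.etree.ElementTree as ET

# Lowercase "utf-8", as described above.
DECLARATION = '<?xml version="1.0" encoding="utf-8"?>\n'


def build_dat_bytes(author: str) -> bytes:
    """Build a tiny XML document whose text is stored as raw UTF-8 bytes."""
    root = ET.Element("datafile")           # element names are illustrative only
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "author").text = author
    # encoding="unicode" returns a str and leaves non-ASCII characters unescaped.
    body = ET.tostring(root, encoding="unicode")
    return (DECLARATION + body).encode("utf-8")


data = build_dat_bytes("Diaboł")
print(data)
# Round trip: a conforming XML parser reads the author name back unchanged.
print(ET.fromstring(data).find("header/author").text)
```

In the resulting bytes ł appears as the two-byte sequence C5 82 and no &#...; reference is written at all.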

Diabol

Re: XML dats and UTF support
« Reply #11 on: 22 January 2011, 13:11 »

Hmm... so it looks like full UTF support is not a priority now if I understand correctly. Thanks for your answer.

Roman

Re: XML dats and UTF support
« Reply #12 on: 22 January 2011, 20:27 »

There is actually nothing with priority in cmpro. I have very limited time, and when I find some, something gets added. I can't tell you what's next: there's a huge list of all kinds of requests and something gets picked. Currently it's more the is-mechanical changes in MAME or CHD rebuild. Nothing is forgotten, but something may come later or even later...