patch for escaped xml characters in filelist bug

patch for escaped xml characters in filelist bug

Yaohan Chen
A few posts reported problems with remote files whose names contain characters
such as &. These names were listed in XML-escaped form, such as &amp;, and the
get file command sent to the remote apparently used the escaped form as well,
so the remote could not find the file. The following patch fixes these
problems. In short, just add

  xmlCtxtUseOptions(ctxt, XML_PARSE_NOENT);

before the xmlParseDocument(ctxt) line in both the filelist_xml_open and
filelist_bzxml_open functions in src/xml_flist.c.


Yaohan Chen


--- microdc2-0.15.6.original/microdc2-0.15.6/src/xml_flist.c 2006-12-24 11:54:36.000000000 -0500
+++ microdc2-0.15.6/src/xml_flist.c 2007-08-31 12:13:31.000000000 -0400
@@ -469,6 +469,7 @@
         sax.endElement = end_element_callback;
 
         xmlParserCtxtPtr ctxt = xmlCreateIOParserCtxt(&sax, &state_ctxt, read_plain_xml, NULL, &io_ctxt, XML_CHAR_ENCODING_UTF8);
+        xmlCtxtUseOptions(ctxt, XML_PARSE_NOENT);
         xmlParseDocument(ctxt);
         xmlFreeParserCtxt(ctxt);
 
@@ -497,6 +498,7 @@
         sax.endElement      = end_element_callback;
 
         xmlParserCtxtPtr ctxt = xmlCreateIOParserCtxt(&sax, &state_ctxt, read_bzip2_xml, NULL, &io_ctxt, XML_CHAR_ENCODING_UTF8);
+        xmlCtxtUseOptions(ctxt, XML_PARSE_NOENT);
         xmlParseDocument(ctxt);
         xmlFreeParserCtxt(ctxt);
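
For illustration, the effect of entity substitution on a listed file name can
be sketched without libxml2. The helper xml_unescape below is hypothetical (it
is not part of libxml2 or microdc2) and handles only the five predefined XML
entities. With XML_PARSE_NOENT set, the parser is asked to do the substitution
itself, so the patch needs no such helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not libxml2 or microdc2 code: decode the five
 * predefined XML entities from src into dst (capacity dst_size bytes).
 * Returns 0 on success, -1 if dst is too small. Anything that is not a
 * predefined entity is copied through unchanged. */
int xml_unescape(const char *src, char *dst, size_t dst_size)
{
    static const struct { const char *name; char ch; } entities[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' },
    };
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        char ch = *src;          /* default: copy the byte as-is */
        size_t consumed = 1;
        size_t i;

        for (i = 0; i < sizeof entities / sizeof entities[0]; i++) {
            size_t len = strlen(entities[i].name);
            if (strncmp(src, entities[i].name, len) == 0) {
                ch = entities[i].ch;
                consumed = len;
                break;
            }
        }
        if (out + 1 >= dst_size)
            return -1;           /* keep room for the terminating NUL */
        dst[out++] = ch;
        src += consumed;
    }
    if (out >= dst_size)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

A name listed as "Tom &amp; Jerry.avi" would come out as "Tom & Jerry.avi",
which is the form the remote expects in the get file command.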


_______________________________________________
microdc-devel mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/microdc-devel

Re: patch for escaped xml characters in filelist bug

Steffen Schulz
On 070831 at 18:36, Yaohan Chen wrote:

> A few posts reported problems with remote files containing characters such as
> &. They were listed in XML escaped form, such as &amp;, and the get file
> command sent to the remote appeared to also use the escaped form, and the
> remote would not be able to find the file. The following patch fixes these
> problems. In short, just add
>
>   xmlCtxtUseOptions(ctxt, XML_PARSE_NOENT);
>
> before the xmlParseDocument(ctxt) lines of both filelist_xml_open and
> filelist_bzxml_open functions in src/xml_flist.c.

I applied your patch, but others still can't download files with German
umlauts or other characters outside (it seems) the ASCII set. This
may be due to UTF-8, which is what all my filenames are encoded in.
Other programs work fine, so my locale settings should be fine.

I described the problem in earlier mails and it sure looks like what
your patch fixes. Still, I was unable to narrow it down. Maybe you have
some ideas?

regards,
/steffen
--
       #
 (o_  #                                                +49/1781384223
 //\-x                                        gpg --recv-key A04D7875
 V_/_    Use the source, Tux!             mailto: [hidden email]



Re: patch for escaped xml characters in filelist bug

Yaohan Chen
Steffen Schulz,

My patch only deals with listing and requesting remote files, so it shouldn't
affect the problem of other people getting your files. I'm not sure how to
fix your problem either. But I tried downloading a file with a Japanese name
in UTF-8 using LinuxDC++, and it works fine. Do you know which clients have
problems downloading your files with umlauts?


Yaohan Chen


On Monday 03 September 2007 08:48:11 am Steffen Schulz wrote:
> I applied your patch, but others still can't download files with german
> umlauts or other characters outside of (it seems) the ascii set. This
> may be due to utf8, which is what all my filenames are encoded with.
> Other programs work fine, so my locale settings should be fine.
>
> I described the problem in earlier mails and it sure looks like what
> your patch fixes. Still, I was unable to narrow it down. Maybe you have
> some ideas?





Re: patch for escaped xml characters in filelist bug

Yaohan Chen
Someone tried downloading the Japanese-named file from me with BCDC++ on
Windows, and it worked too. Maybe your problem is due to other people's
clients or their encoding configuration.


On Tuesday 04 September 2007 10:01:44 pm Yaohan Chen wrote:

> Steffen Schulz,
>
> My patch only deals with listing and requesting remote files, so it
> shouldn't affect the problem with other people getting your files. I'm not
> sure how to fix your problem either. But I tried downloading a file with
> Japanese name in UTF-8 using LinuxDC++, and it works fine. Do you know
> which clients have problem downloading your files with umlauts?
>
>
> Yaohan Chen
>
> On Monday 03 September 2007 08:48:11 am Steffen Schulz wrote:
> > I applied your patch, but others still can't download files with german
> > umlauts or other characters outside of (it seems) the ascii set. This
> > may be due to utf8, which is what all my filenames are encoded with.
> > Other programs work fine, so my locale settings should be fine.
> >
> > I described the problem in earlier mails and it sure looks like what
> > your patch fixes. Still, I was unable to narrow it down. Maybe you have
> > some ideas?





Re: patch for escaped xml characters in filelist bug

Steffen Schulz
In reply to this post by Yaohan Chen
Chen,


On 070905 at 04:02, Yaohan Chen wrote:
> My patch only deals with listing and requesting remote files, so it shouldn't
> affect the problem with other people getting your files. I'm not sure how to
> fix your problem either. But I tried downloading a file with Japanese name in
> UTF-8 using LinuxDC++, and it works fine. Do you know which clients have
> problem downloading your files with umlauts?


Sorry for the late answer.

I'm using dc daily, with many users and different (mostly Windows)
clients. Valknut works perfectly with them, but they all have a problem
downloading 'special' files from microdc2.

My locales are set to UTF-8, as are the filenames I create. Microdc2
has the fs charset set to UTF-8.

Now I create a file with an umlaut in its name and share it. This is
what I get in microdc2's filelist and also in the filelists other
clients download from me:

00000000  3c 46 69 6c 65 20 4e 61  6d 65 3d 22 74 c3 a4 73  |<File Name="t..s|
00000010  74 22 20 53 69 7a 65 3d  22 34 22 3e 3c 2f 46 69  |t" Size="4"></Fi|
00000020  6c 65 3e 0a                                       |le>.|

74 c3 a4 73 should be the correct utf8 representation.
A local 'ls|hexdump' confirms this.
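
The byte pair c3 a4 follows directly from the UTF-8 encoding rules for
U+00E4 (ä). A minimal sketch of those rules, where utf8_encode is a
hypothetical name and not something microdc2 provides:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes to out and returns the number written, or 0 if
 * cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For 0xE4 the second branch applies: 0xC0 | (0xE4 >> 6) is 0xC3 and
0x80 | (0xE4 & 0x3F) is 0xA4, matching the hexdump.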

However, when I try to download this file with a different microdc2 or
valknut, microdc2 produces a "File Not Available" error. I had no luck
identifying the bug, but here is a starting point:

main.c, line 713:

msgq_get(uc->get_mq, MSGQ_INT, &id, MSGQ_INT, &type, MSGQ_STR, &remote_file, MSGQ_END);
local_file = resolve_upload_file(uc->info, type, remote_file, &flag, &size);

warn(_("\nXXX %s XXX %s XXX\n\n"), local_file, remote_file);
int i;
for (i=0;i<strlen(remote_file); i++)
    warn(_("%x "), remote_file[i]);


Produces this output when I request my täst-file:

XXX (null) XXX /downloads/täst XXX

2f 64 6f 77 6e 6c 6f 61 64 73 2f 74 ffffffe4 73 74


0xe4 is the Latin-9 representation of ä, not UTF-8. As this works with other
clients, there must be a wrong charset conversion applied to remote_file
somewhere.
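
A note on reading the dump: the ffffffe4 is most likely still the single byte
0xe4. Assuming plain char is signed on this platform (as the output
suggests), the value is sign-extended to a negative int before %x formats it.
A sketch of a dump that avoids this by casting through unsigned char first;
hex_bytes is a hypothetical name, not a microdc2 function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the bytes of s into dst as space-separated
 * two-digit hex values. Returns 0 on success, -1 if dst is too small. */
int hex_bytes(const char *s, char *dst, size_t dst_size)
{
    size_t len, i, out = 0;

    assert(s != NULL && dst != NULL);
    if (dst_size == 0)
        return -1;
    dst[0] = '\0';
    len = strlen(s);
    for (i = 0; i < len; i++) {
        /* The cast through unsigned char keeps 0xe4 from being
         * sign-extended to 0xffffffe4 where char is signed. */
        int n = snprintf(dst + out, dst_size - out, "%s%02x",
                         i ? " " : "", (unsigned)(unsigned char)s[i]);
        if (n < 0 || (size_t)n >= dst_size - out)
            return -1;
        out += (size_t)n;
    }
    return 0;
}
```

Applied to the tail of the requested name, this yields 74 e4 73 74, which
makes the single Latin-9 byte easier to spot.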


/steffen
--
No matter where you are... Everyone is always connected.
                                - Serial Experiments Lain



Re: patch for escaped xml characters in filelist bug

Steffen Schulz
On 070911 at 13:10, Steffen Schulz wrote:
> XXX (null) XXX /downloads/täst XXX
>
> 2f 64 6f 77 6e 6c 6f 61 64 73 2f 74 ffffffe4 73 74

I forgot: This works when I set 'remote encoding' to UTF-8 in the
filelist browser window of valknut. But such modifications are usually
not needed. This probably means that the dc client converts the name
to Latin-9 before requesting the file.

On the other hand, valknut can download files with e.g. Chinese
characters in their names. So there may be some general conversion rule
that doesn't break names outside Latin-9, no?
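
One way a serving client could tolerate both kinds of request is sketched
below. This is purely an assumption about a possible workaround, not
microdc2's or valknut's actual behaviour: accept the requested name if it is
already valid UTF-8, otherwise treat each byte as Latin-1 and convert it. The
names is_valid_utf8 and request_name_to_utf8 are hypothetical. Latin-9
differs from Latin-1 in eight code points (0xA4, 0xA6, 0xA8, 0xB4, 0xB8,
0xBC, 0xBD, 0xBE), which this sketch does not remap; 0xe4 is ä in both.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if s[0..len) is well-formed UTF-8. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t need, k;

        if (c < 0x80) { i++; continue; }
        else if (c >= 0xC2 && c <= 0xDF) { need = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { need = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { need = 3; cp = c & 0x07; }
        else return 0;                     /* stray continuation or bad lead */

        if (len - i <= need)
            return 0;                      /* sequence is truncated */
        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (need == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
            return 0;
        if (need == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;
        i += need + 1;
    }
    return 1;
}

/* Hypothetical helper: copy src to dst as UTF-8. Valid UTF-8 is copied
 * unchanged; anything else is assumed to be Latin-1 and converted.
 * Returns 0 on success, -1 if dst is too small. */
int request_name_to_utf8(const char *src, char *dst, size_t dst_size)
{
    const unsigned char *in = (const unsigned char *)src;
    size_t len, i, out = 0;

    assert(src != NULL && dst != NULL);
    len = strlen(src);
    if (is_valid_utf8(in, len)) {
        if (len + 1 > dst_size)
            return -1;
        memcpy(dst, src, len + 1);
        return 0;
    }
    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            if (out + 1 >= dst_size)
                return -1;
            dst[out++] = (char)in[i];
        } else {                           /* U+0080..U+00FF: two bytes */
            if (out + 2 >= dst_size)
                return -1;
            dst[out++] = (char)(0xC0 | (in[i] >> 6));
            dst[out++] = (char)(0x80 | (in[i] & 0x3F));
        }
    }
    if (out >= dst_size)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

The check is only a heuristic: a Latin-1 name can happen to be valid UTF-8,
in which case it is passed through untouched. It would, however, explain why
names that cannot be represented in Latin-9 at all, such as Chinese ones,
arrive intact.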

/steffen
--
If you aren't remembered, then you never existed.
                                - Serial Experiments Lain

