What filename characters does Mac OS X support?


Ben Escoto
I'm trying to make rdiff-backup a little friendlier on Mac OS X now.
Does anyone know offhand what characters are allowable in a Mac OS X
filename?

I'm asking because I need to figure out exactly when we don't need to
quote characters.  As long as the source directory is non-empty (and
has a filename that has a letter in it), then we can tell whether that
directory is case-sensitive.  But it's harder to tell whether or not a
source (and thus read-only) directory supports characters like a colon
or a backslash.
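
A read-only case-sensitivity check of the kind described above could look roughly like the following Python sketch. It is not rdiff-backup's actual code. The function names are hypothetical, and it assumes that looking up an existing name with the case of its ASCII letters swapped is enough to tell a case-insensitive directory from a case-sensitive one.

```python
import os
from typing import Optional


def swap_ascii_case(name: str) -> str:
    """Swap the case of ASCII letters only; leave everything else untouched."""
    return "".join(c.swapcase() if c.isascii() and c.isalpha() else c for c in name)


def is_case_sensitive(dirpath: str) -> Optional[bool]:
    """Guess, without writing anything, whether lookups in dirpath are case-sensitive.

    Returns None if no entry name contains an ASCII letter to test with.
    """
    for name in os.listdir(dirpath):
        swapped = swap_ascii_case(name)
        if swapped == name:
            continue  # e.g. "123-456": nothing to swap, try the next entry
        original = os.lstat(os.path.join(dirpath, name))
        try:
            other = os.lstat(os.path.join(dirpath, swapped))
        except FileNotFoundError:
            return True  # the swapped spelling does not resolve at all
        # Same (device, inode): both spellings reach one file, so case was ignored.
        # Different: two distinct files exist, so the directory must be case-sensitive.
        return (original.st_dev, original.st_ino) != (other.st_dev, other.st_ino)
    return None
```

Comparing device and inode numbers, rather than just checking that the swapped name exists, avoids a false "case-insensitive" answer when a case-sensitive directory happens to contain both "SomeFile" and "sOMEfILE".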

So if Mac OS X supports everything except case sensitivity (and '/'
and NULL), then case is the only thing we need to check for.  But if
other support is missing, things could become more complicated.


--
Ben Escoto

_______________________________________________
rdiff-backup-users mailing list at [hidden email]
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki


Re: What filename characters does Mac OS X support?

Kevin Horton
On 20 Oct 2005, at 21:26, Ben Escoto wrote:

> I'm trying to make rdiff-backup a little friendlier on Mac OS X now.
> Does anyone know offhand what characters are allowable in a Mac OS X
> filename?
>
> I'm asking because I need to figure out exactly when we don't need to
> quote characters.  As long as the source directory is non-empty (and
> has a filename that has a letter in it), then we can tell whether that
> directory is case-sensitive.  But it's harder to tell whether or not a
> source (and thus read-only) directory supports characters like a colon
> or a backslash.
>
> So if Mac OS X supports everything except case sensitivity (and '/'
> and NULL), then case is the only thing we need to check for.  But if
> other support is missing, things could become more complicated.

At the GUI level, it seems that OS X allows any character except a  
colon and NULL.  "/" characters are legal, but the GUI translates  
them to ":" behind the scenes at the unix level.  So a file named  
"crazy/name.txt" at the GUI level is actually named "crazy:name.txt"  
at the unix level.

This strange system of translating "/" to ":" is due to the fact that
OS 9 and earlier used ":" as the path separator, the equivalent of
"/" in unix or "\" in DOS.  "/" was legal in filenames in OS 9.
When OS X came along, they needed a way to handle users who had
legacy files with a "/" in them, so they came up with this scheme to
translate them to ":".
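
The GUI-to-Unix name translation can be illustrated with a pair of helpers. This is only a sketch of the mapping described above, assuming a plain one-for-one swap of "/" and ":"; the helper names are made up for illustration.

```python
def finder_to_posix(display_name: str) -> str:
    """Name seen at the Unix level for a name typed in the GUI ('/' becomes ':')."""
    return display_name.replace("/", ":")


def posix_to_finder(posix_name: str) -> str:
    """Name shown in the GUI for a Unix-level name (':' becomes '/')."""
    return posix_name.replace(":", "/")


print(finder_to_posix("crazy/name.txt"))  # crazy:name.txt
print(posix_to_finder("crazy:name.txt"))  # crazy/name.txt
```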

At the unix level where rdiff-backup works, every character appears  
to be legal except "/" and NULL.
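
The Unix-level rule can be written as a small check on a single path component, treated as raw bytes. This is a sketch of the rule as stated above, with a hypothetical function name; it also rejects the reserved entries "." and "..", which cannot name an ordinary file even though their characters are legal.

```python
def is_legal_unix_name(name: bytes) -> bool:
    """Check one path component against the Unix-level rule: any byte but '/' and NUL.

    Also rejects "", "." and "..", which cannot name an ordinary file.
    """
    if name in (b"", b".", b".."):
        return False
    return b"/" not in name and b"\x00" not in name


print(is_legal_unix_name(b"crazy:name.txt"))  # True
print(is_legal_unix_name(b"back\\slash"))     # True
print(is_legal_unix_name(b"crazy/name.txt"))  # False
```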

The above info is based on what the OS X Help system says, and what I  
can find on Google.  It is not guaranteed to be 100% complete, but I  
am 99.9% sure it is good info.

http://docs.info.apple.com/article.html?path=Mac/10.4/en/mh552.html
http://lists.seas.upenn.edu/pipermail/unison-hackers/2005-March/000006.html

Kevin Horton
Ottawa, Canada





Re: What filename characters does Mac OS X support?

Carsten Lorenz
In reply to this post by Ben Escoto
Ben Escoto wrote:

>I'm asking because I need to figure out exactly when we don't need to
>quote characters.  As long as the source directory is non-empty (and
>has a filename that has a letter in it), then we can tell whether that
>directory is case-sensitive.  But it's harder to tell whether or not a
>source (and thus read-only) directory supports characters like a colon
>or a backslash.
>
There are several filesystems that come with OS X Tiger:
Mac OS Extended, alias HFS+ (with and without journaling)
Mac OS Extended case-sensitive, alias HFSx (with and without journaling)
UNIX File System, alias UFS

I think it can also access all Windows filesystems, but NTFS can
only be read.

The normally used HFS+ filesystem is not case-sensitive, but it is
case-preserving.
The new HFSx filesystem is case-sensitive.

Optimal would be to check source and destination and quote only if the
source is case-sensitive and the destination is only case-preserving.
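
The rule above reduces to a single boolean condition. A sketch, with a hypothetical function name, assuming both capabilities have already been detected:

```python
def needs_case_quoting(source_case_sensitive: bool, dest_case_sensitive: bool) -> bool:
    """Quote only when the source can hold names differing only in case
    and the destination cannot keep such names apart."""
    return source_case_sensitive and not dest_case_sensitive


for src in (True, False):
    for dst in (True, False):
        print(f"source cs={src}, dest cs={dst} -> quote={needs_case_quoting(src, dst)}")
```

Only one of the four combinations (case-sensitive source, case-insensitive destination) requires quoting; an HFS+ to HFS+ backup falls into the "no quoting" case.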

Carsten



Re: What filename characters does Mac OS X support?

Alastair Rankine
In reply to this post by Ben Escoto
On 21/10/2005, at 11:26 AM, Ben Escoto wrote:

> I'm trying to make rdiff-backup a little friendlier on Mac OS X now.
> Does anyone know offhand what characters are allowable in a Mac OS X
> filename?

Technote 1150 describes the HFS Plus volume format in great detail:
http://developer.apple.com/technotes/tn/tn1150.html

In particular, there's a table with a list of illegal characters  
here: http://developer.apple.com/technotes/tn/tn1150table.html

> I'm asking because I need to figure out exactly when we don't need to
> quote characters.  As long as the source directory is non-empty (and
> has a filename that has a letter in it), then we can tell whether that
> directory is case-sensitive.  But it's harder to tell whether or not a
> source (and thus read-only) directory supports characters like a colon
> or a backslash.

Maybe it's because I'm new to rdiff-backup, but I can't understand  
why you need to determine the capabilities of the source file system?




Re: What filename characters does Mac OS X support?

Ben Escoto
In reply to this post by Carsten Lorenz
>>>>> "Carsten Lorenz" <[hidden email]>
>>>>> wrote the following on Fri, 21 Oct 2005 10:30:55 +0200
>
> Optimal would be to check source and destination and quote only if the
> source is case-sensitive and the destination is only case-preserving.

Yes, that is the plan.  I thought there might be a complication
because we might have needed to quote ':' or whatever even if both
systems were case insensitive, but it looks like that's not the case.


--
Ben Escoto


Re: What filename characters does Mac OS X support?

Ben Escoto
In reply to this post by Alastair Rankine
>>>>> Alastair Rankine <[hidden email]>
>>>>> wrote the following on Sat, 22 Oct 2005 08:29:26 +1000

> Technote 1150 describes the HFS Plus volume format in great detail:
> http://developer.apple.com/technotes/tn/tn1150.html
>
> In particular, there's a table with a list of illegal characters  
> here: http://developer.apple.com/technotes/tn/tn1150table.html

Thanks for the references.  Unfortunately they're in unicode and I
don't know enough to translate them to ascii offhand.  Kevin Horton's
message suggests that all the standard unix characters should be fine
though.

If anyone wants to be more precise about this, by looking at
Alastair's table and translating it into normal unix calls (which deal
with C char *'s), let me know what you come up with.

> Maybe it's because I'm new to rdiff-backup, but I can't understand
> why you need to determine the capabilities of the source file
> system?

Under the old system we didn't check the source, just the destination
(as in your scheme).  This worked ok, but led to unnecessary quoting.
For instance in a Mac OS X -> Mac OS X backup, rdiff-backup would
quote all uppercase characters.

If we determine the capabilities of the source too, then we can quote
only what needs to be.


--
Ben Escoto


Re: What filename characters does Mac OS X support?

Alastair Rankine
On 22/10/2005, at 12:35 PM, Ben Escoto wrote:

>> In particular, there's a table with a list of illegal characters
>> here: http://developer.apple.com/technotes/tn/tn1150table.html
>
> Thanks for the references.  Unfortunately they're in unicode and I
> don't know enough to translate them to ascii offhand.  Kevin Horton's
> message suggests that all the standard unix characters should be fine
> though.

Ben, I'm not sure what you mean by "translate [unicode characters] to ascii". That just isn't possible in general, but perhaps you mean encoding these characters as UTF-8 (i.e. a char * in C)? In that case you should look at the "encode" method of Python strings and/or the libiconv C library.

However: After some further investigation I'm not entirely sure you need to worry about that table of illegal unicode characters I quoted earlier. I just ran the following experiment:

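#!/usr/bin/python
# -*- coding: utf-8 -*-
# Create the same name three ways: a literal é, the precomposed
# code point U+00E9, and the decomposed pair e + U+0301.
open( u"é composed char", "w").close()
open( u"\u00e9 escaped composed", "w").close()
open( u"\u0065\u0301 escaped decomposed", "w").close()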

This resulted in the é character being successfully inserted into each of the three output filenames. (I'd include output of "ls" here, but it doesn't seem to be unicode aware). So even though U+00E9 is explicitly designated as an illegal character by the filesystem specification, it looks like the OS is silently taking care of the required decomposition into the U+0065, U+0301 sequence on disk.

So although some Unicode characters have to be decomposed *on disk*, in practice it doesn't seem to make any difference: the OS takes care of the correct on-disk representation. Interestingly, when reading the names back, the OS returns all three in the decomposed form, however they were created:

>>> os.listdir(u".")
[u'e\u0301 composed char', u'e\u0301 escaped composed', u'e\u0301 escaped decomposed']

This is not important for rdiff-backup, just an interesting aside.

Anyway, it seems that any character in the Unicode character set is usable in Mac OS X filenames.
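
Python's unicodedata module can reproduce the composed/decomposed distinction from the experiment. This sketch uses standard NFD normalization as an approximation: HFS+ uses its own variant of NFD that leaves some character ranges undecomposed, so the two can differ for some characters. The same_filename helper is hypothetical.

```python
import unicodedata

composed = "\u00e9"          # é as one code point, U+00E9
decomposed = "\u0065\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == composed)  # True


def same_filename(a: str, b: str) -> bool:
    """Compare two names after decomposing both, roughly as HFS+ stores them."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```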

>> Maybe it's because I'm new to rdiff-backup, but I can't understand
>> why you need to determine the capabilities of the source file
>> system?
>
> Under the old system we didn't check the source, just the destination
> (as in your scheme).  This worked ok, but led to unnecessary quoting.
> For instance in a Mac OS X -> Mac OS X backup, rdiff-backup would
> quote all uppercase characters.

I'm sorry I still don't get it. If the destination filesystem is case *preserving* (which in this case it is), surely this removes the need for unnecessary quoting?


Re: What filename characters does Mac OS X support?

Kevin Horton
On 23 Oct 2005, at 01:56, Alastair Rankine wrote:

> On 22/10/2005, at 12:35 PM, Ben Escoto wrote:
>> Under the old system we didn't check the source, just the destination
>> (as in your scheme).  This worked ok, but led to unnecessary quoting.
>> For instance in a Mac OS X -> Mac OS X backup, rdiff-backup would
>> quote all uppercase characters.
>
> I'm sorry I still don't get it. If the destination filesystem is  
> case *preserving* (which in this case it is), surely this removes  
> the need for unnecessary quoting?

The HFS+ file system is case preserving, but case insensitive.  For example,
a file named "SomeFile" will overwrite a file named "somefile", so
these two filenames cannot coexist.
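
Which source names would collide on such a volume can be found by grouping them case-insensitively. A sketch: the function is hypothetical, and str.lower() is only an approximation of the case-folding tables HFS+ actually uses.

```python
from collections import defaultdict
from typing import Dict, List


def case_collisions(names: List[str]) -> Dict[str, List[str]]:
    """Group names that would refer to the same file on a case-insensitive,
    case-preserving volume such as default HFS+."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only groups where two or more distinct names would clash.
    return {key: group for key, group in groups.items() if len(group) > 1}


print(case_collisions(["SomeFile", "somefile", "Other.txt"]))
# {'somefile': ['SomeFile', 'somefile']}
```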

Imagine that the source has a file system that is case sensitive, so  
it could have both those file names.  If the user does a backup onto  
an HFS+ volume we have a problem unless we somehow deal with the case  
issue.  Rdiff-backup deals with this by quoting the upper case  
characters.  However, it does it in some situations where it is not  
necessary, i.e. if both the source and destination are HFS+.
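
The uppercase quoting described here can be sketched as an escape scheme: each character that needs quoting becomes a quoting character followed by its three-digit decimal code, and the quoting character is itself quoted so the mapping stays reversible. The ';' character, the three-digit format, and the function names are assumptions of this sketch, not a description of rdiff-backup's internals.

```python
import re

QUOTING_CHAR = ";"  # assumption: the escape character used in this sketch
# Quote uppercase ASCII letters and the quoting character itself, so that
# every QUOTING_CHAR in a quoted name starts an escape sequence.
CHARS_TO_QUOTE = re.compile("[A-Z" + re.escape(QUOTING_CHAR) + "]")
ESCAPE = re.compile(re.escape(QUOTING_CHAR) + r"(\d{3})")


def quote(name: str) -> str:
    """Replace each character to quote with QUOTING_CHAR plus its 3-digit code."""
    return CHARS_TO_QUOTE.sub(lambda m: "%s%03d" % (QUOTING_CHAR, ord(m.group())), name)


def unquote(name: str) -> str:
    """Reverse quote()."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1))), name)


print(quote("SomeFile"))  # ;083ome;070ile
print(quote("somefile"))  # somefile
```

After quoting, "SomeFile" and "somefile" map to different all-lowercase names, so both can coexist on a case-insensitive destination.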

Kevin Horton
Ottawa, Canada





Re: What filename characters does Mac OS X support?

Carsten Lorenz
In reply to this post by Ben Escoto
Ben Escoto wrote:
> Alastair Rankine [hidden email]
> wrote the following on Sat, 22 Oct 2005 08:29:26 +1000
>
>> Technote 1150 describes the HFS Plus volume format in great detail:
>> http://developer.apple.com/technotes/tn/tn1150.html
>>
>> In particular, there's a table with a list of illegal characters
>> here: http://developer.apple.com/technotes/tn/tn1150table.html
>
> Thanks for the references.  Unfortunately they're in unicode and I
> don't know enough to translate them to ascii offhand.  Kevin Horton's
> message suggests that all the standard unix characters should be fine
> though.
>
> If anyone wants to be more precise about this, by looking at
> Alastair's table and translating it into normal unix calls (which deal
> with C char *'s), let me know what you come up with.

The table is a list of characters that have more than one Unicode representation; HFS+ only accepts the decomposed one.

Just some tests:

The lowest character mentioned in this table is 0x00C0, which is À. If you enter "touch \300", the answer is "touch: À: Invalid argument".
It must be replaced by the other Unicode encoding, 0x0041 followed by 0x0300, which is A and a combining `.

Next test:
0x00B5, which is µ, isn't mentioned, so it isn't replaced.

Since this is all 16-bit, a function presumably converts it to UTF-8 to get 8-bit units (C chars).
Then À is represented by 0x41 0xCC 0x80, and µ becomes 0xC2 0xB5.

The encodings of ASCII, Unicode and UTF-8 are identical for characters below 0x80. These characters aren't converted (except that Unicode on HFS+ is 16-bit).
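
These byte sequences can be checked with Python's unicodedata and str.encode. A sketch, using standard NFD as an approximation of HFS+'s own decomposition; hfs_like_bytes is a hypothetical helper. Note also that "\300" in the touch command is the single byte 0xC0, which never occurs in valid UTF-8; the precomposed À encoded as UTF-8 is 0xC3 0x80.

```python
import unicodedata

A_GRAVE = "\u00c0"  # À: listed in TN1150's table, so it gets decomposed
MICRO = "\u00b5"    # µ: not listed, so it is stored as-is


def hfs_like_bytes(name: str) -> bytes:
    """Approximate the bytes a C char * sees: NFD-decompose, then encode as UTF-8."""
    return unicodedata.normalize("NFD", name).encode("utf-8")


print(hfs_like_bytes(A_GRAVE))  # b'A\xcc\x80'
print(hfs_like_bytes(MICRO))    # b'\xc2\xb5'
print(A_GRAVE.encode("utf-8"))  # b'\xc3\x80' (precomposed, before decomposition)

# "touch \300" passes the single Latin-1 byte 0xC0, which is not valid UTF-8.
try:
    b"\300".decode("utf-8")
except UnicodeDecodeError:
    print("0xC0 alone is not valid UTF-8")
```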

Carsten


Re: What filename characters does Mac OS X support?

Ben Escoto
In reply to this post by Alastair Rankine
>>>>> Alastair Rankine <[hidden email]>
>>>>> wrote the following on Sun, 23 Oct 2005 15:56:53 +1000
>
> Ben, I don't know what you mean by "translate [unicode characters] to  
> ascii"? This just isn't possible, but perhaps you mean translate  
> these characters to UTF-8 (ie char * in C)? In which case you should  
> look at the "encode" python string methods, and/or the libiconv C  
> library.

Well I was just hoping to remain ignorant of unicode.  Regardless of
what the unicode descriptions of the files are, if the files can be
processed with standard unix functions like open(char *, int) then
their filenames get represented as a collection of bytes (which I
improperly called ascii).

So I was hoping to deal with filenames just as byte arrays, and not
worry what they represent and if they are unicode or whatever.
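
In Python, passing a bytes path to os.listdir gives exactly that view: names come back as raw byte strings, much as a C program sees them from readdir(). A sketch; list_raw_names is a hypothetical helper, and the exact bytes returned for a non-ASCII name depend on the filesystem (HFS+, for instance, hands back the decomposed form).

```python
import os
import tempfile


def list_raw_names(dirpath: str) -> list:
    """Return directory entries as bytes, without decoding them."""
    return os.listdir(os.fsencode(dirpath))


with tempfile.TemporaryDirectory() as d:
    # Create "café.txt" by its UTF-8 bytes, sidestepping any decoding step.
    raw_name = "caf\u00e9.txt".encode("utf-8")
    with open(os.path.join(os.fsencode(d), raw_name), "wb"):
        pass
    print(list_raw_names(d))  # raw bytes; the exact form depends on the filesystem
```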


--
Ben Escoto


Re: What filename characters does Mac OS X support?

Ben Escoto
In reply to this post by Carsten Lorenz
>>>>> "Carsten Lorenz" <[hidden email]>
>>>>> wrote the following on Mon, 24 Oct 2005 15:40:30 +0200
>
> The lowest mention character in this table is 0x00C0 which is an À. And
> if you enter "touch \300" the answer is "touch: À: Invalid argument".
> It must be replaced by the other Unicode coding 0x0041 and 0x0300 which
> is A and `.

Hmm so the quoting behavior in 1.1.0 isn't foolproof, in that there
could be some filesystem that was case insensitive, yet still
supported a filename of À ("\300").  If you tried to back this
filesystem up to a HFS+ system, then rdiff-backup wouldn't quote at
all, and would barf on the À files.

I don't think there is a read-only way to test for this stuff though,
and in practice I don't know if this will ever come up.
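
The filesystem's capabilities can't be probed read-only, but the names actually present in the source can be scanned without writing anything. A sketch of one such partial check, assuming (as the "touch \300" test suggests) that HFS+ rejects names whose bytes are not valid UTF-8. The function names are hypothetical, and this only flags names that exist, not everything the source filesystem could hold.

```python
import os
from typing import Iterator


def is_valid_utf8(name: bytes) -> bool:
    """True if the raw name decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_paths(root: str) -> Iterator[bytes]:
    """Walk root read-only and yield paths whose last component is not valid UTF-8."""
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                yield os.path.join(dirpath, name)


print(is_valid_utf8("\u00c0".encode("utf-8")))  # True  (b'\xc3\x80')
print(is_valid_utf8(b"\300"))                   # False (Latin-1 À, as in "touch \300")
```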


--
Ben Escoto


Re: What filename characters does Mac OS X support?

Alastair Rankine
In reply to this post by Ben Escoto
On 25/10/2005, at 2:29 PM, Ben Escoto wrote:

> So I was hoping to deal with filenames just as byte arrays, and not
> worry what they represent and if they are unicode or whatever.

UTF-8 lets you do that, as long as you don't assume that one
byte == one character. ASCII characters take one byte, but
characters from the more esoteric parts of the Unicode character
set can take up to four bytes each (the original UTF-8 design
allowed up to six).
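
A quick way to see the byte/character difference is to encode a few characters and count. A sketch; utf8_lengths is a hypothetical helper, and the sample characters are arbitrary.

```python
def utf8_lengths(text: str) -> list:
    """Number of UTF-8 bytes used by each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]


sample = "A\u00e9\u20ac\U0001F600"  # A, é, €, and an emoji
print(utf8_lengths(sample))          # [1, 2, 3, 4]
print(len(sample), len(sample.encode("utf-8")))  # 4 10

# Cutting the byte string in the middle of a character breaks it:
raw = "caf\u00e9".encode("utf-8")    # b'caf\xc3\xa9': 4 characters, 5 bytes
try:
    raw[:4].decode("utf-8")
except UnicodeDecodeError:
    print("truncated in the middle of a multi-byte character")
```

Treating names as opaque byte arrays is safe; only code that slices, truncates, or counts them has to respect character boundaries.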


