lethargic_man: (Default)
[personal profile] lethargic_man
Somewhere back in the history of MS-DOS, or probably in the pre-Micros~1 Q-DOS stage, backslash was adopted as the directory separator character. I don't know who was responsible for this, but things would have been a lot simpler if the forward slash had been chosen instead, as used in Unix (and URLs), because backslashes are typically used to encode control characters, to escape things, and in regular expressions.

The reason I bring this up is that I have a Windows filename, C:\Documents and Settings\me\Local Settings\Temp\myfile.txt, which I am trying to interpolate into another string, in Java. It should be a simple case of myString.replaceFirst ("file_here", filename). But that won't work, because backslashes have a special meaning in the replacement string of a regular-expression substitution. So we need to escape the backslashes. And the way we do that is to use a backslash. So, we have to replace every \ in filename with \\.

The problem with this is that all three of the above backslashes need escaping, so as to ensure they are interpreted as literal backslashes and not anything else. (This wouldn't be a problem if Java had a non-interpolating string format, like Perl's single-quoted strings.) Hence what we actually have to specify is to replace \\ in filename with \\\\.

Only that still won't work, because although \\ evaluates to \, this would be interpreted as a single backslash introducing another character, to create the likes of "\n", so we need to escape these backslashes to make sure that doesn't happen. And the way to do that is to use backslashes. And these, of course, have to be escaped.

In summary, to interpolate filename into myString and get exactly the same filename in the resulting string as you had in your original one, you have to use myString.replaceFirst ("file_here", filename.replaceAll ("\\\\", "\\\\\\\\")).
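For anyone who wants to watch it happen, here is a minimal sketch of the above. The class name BackslashDemo, the helper names and the template string are made up for illustration; only replaceFirst and replaceAll come from the real code.

```java
class BackslashDemo {
    // Double every backslash so that a later replaceFirst/replaceAll
    // emits it literally instead of treating it as an escape.
    static String escapeBackslashes(String s) {
        // Literal "\\\\" is the regex \\ which matches one backslash.
        // Literal "\\\\\\\\" is the replacement \\\\ which emits two backslashes.
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Substitute the filename for the (hypothetical) file_here placeholder.
    static String interpolate(String template, String filename) {
        return template.replaceFirst("file_here", escapeBackslashes(filename));
    }

    public static void main(String[] args) {
        // In a Java literal, each real backslash has to be written twice.
        String filename =
            "C:\\Documents and Settings\\me\\Local Settings\\Temp\\myfile.txt";
        String myString = "Opening file_here now";
        System.out.println(interpolate(myString, filename));
    }
}
```

Run as is, this should print the sentence with the filename's backslashes all present and correct.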

As I said, you couldn't make this up.

(Of course, I could avoid this problem by not using String.replaceAll(), but it is the simplest way, and, like Mt Everest, it's there...)
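For the record, a sketch of two ways of avoiding the problem, assuming Java 5 or later. Matcher.quoteReplacement and String.replace(CharSequence, CharSequence) are standard library calls; the class name LiteralReplaceDemo, the method names and the template string are invented for the example.

```java
import java.util.regex.Matcher;

class LiteralReplaceDemo {
    // Keep the regex-based replaceFirst, but let the library escape
    // the backslashes (and dollar signs) in the replacement string.
    static String viaQuoteReplacement(String template, String filename) {
        return template.replaceFirst("file_here",
                                     Matcher.quoteReplacement(filename));
    }

    // Skip regexes altogether: String.replace treats both arguments
    // literally. Note that it replaces every occurrence, not just the first.
    static String viaLiteralReplace(String template, String filename) {
        return template.replace("file_here", filename);
    }

    public static void main(String[] args) {
        String filename = "C:\\Temp\\myfile.txt";
        String myString = "Opening file_here now";
        System.out.println(viaQuoteReplacement(myString, filename));
        System.out.println(viaLiteralReplace(myString, filename));
    }
}
```

Neither version needs more than the two backslashes per separator that the string literal itself demands.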

Date: 2007-04-02 11:16 am (UTC)
From: [identity profile] pseudomonas.livejournal.com
I don't know about Java, but in Perl you can use forward slashes regardless of the OS and they'll be automagically fixed. So open SYS, 'C:/Windows/system/foo.txt' will do what you mean. I'm not sure what this does to MS filenames that *contain* forward slashes as part of the name.

Date: 2007-04-02 11:20 am (UTC)
From: [identity profile] lethargic-man.livejournal.com
I strongly suspect the same applies to Java, but that's no help when you're munging user-inputted data containing backslashes.

Date: 2007-04-02 11:22 am (UTC)
From: [identity profile] pseudomonas.livejournal.com
Just an idea, but can you refer to the slashes using character codes rather than escaped sequences?

Date: 2007-04-02 11:28 am (UTC)
From: [identity profile] lethargic-man.livejournal.com
Possibly, but I'm sure it would work out with an even longer piece of code. As I say, this is a solved problem; I'm not looking for help here; I'm just blogging it for the rest of you to boggle at. :o)

Date: 2007-04-02 11:32 am (UTC)
From: [identity profile] ewx.livejournal.com
Yes for the string literals but not for the replacement string. So you'd end up with \ -> \\ -> \xNN\xNN instead of \ -> \\ -> \\\\, and annoying though the latter is at least you don't have to go to a character table to find out what it means, just realize that there's two levels of quoting going on and so divide by four.

Date: 2007-04-02 11:36 am (UTC)
From: [identity profile] pseudomonas.livejournal.com
I think in this case I'd probably try and stay sane by creating a doubleEscapedBackSlash string at the top of the code so I know what I'm using it for.

Date: 2007-04-02 12:13 pm (UTC)
From: [identity profile] ewx.livejournal.com
In *this* case I'd cobble together whatever the eventual string was in some other way entirely rather than trying to do a regexp substitution into a pattern. snprintf in C or % in Python, for instance. And that's assuming that a string is the right answer; if the aim is to synthesize a command to be executed then in many interfaces you want some kind of list, so you don't have to worry about shell quoting rules.

Date: 2007-04-02 11:34 am (UTC)
From: [identity profile] ewx.livejournal.com
I believe that it was in DOS 2, and that the reason was that they'd already used / as an option character (that being the convention in the CP/M world they were rooted in). It was, indeed, a horrendous decision, and has caused much more pain since than changing the option character to (for instance) the UNIX convention of '-' would have done.
