Lethargic Man (anag.) (lethargic_man) wrote 2007-04-02 11:24 am

You couldn't make this up...

Somewhere back in the history of MS-DOS, or probably in the pre-Micros~1 Q-DOS stage, backslash was adopted as the directory separator character. I don't know who was responsible for this, but things would have been a lot simpler if the forward slash had been chosen instead, as used in Unix (and URLs), because backslashes are typically used to encode control characters, to escape things, and in regular expressions.

The reason I bring this up is that I have a Windows filename, C:\Documents and Settings\me\Local Settings\Temp\myfile.txt, which I am trying to interpolate into another string in Java. It should be a simple case of myString.replaceFirst ("file_here", filename). But that won't work, because the second argument to replaceFirst is a regular-expression replacement string, in which backslashes have a special meaning: a backslash escapes the character after it, so \D comes out as just D. So we need to escape the backslashes. And the way we do that is to use a backslash. So, we have to replace every \ in filename with \\.
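A minimal sketch of the failure, assuming a made-up template "Opening file_here now" (the template text and the class name NaiveReplace are hypothetical, for illustration only). Each backslash in the replacement escapes the letter after it, so the directory separators simply vanish:

```java
// Shows why a raw Windows path cannot be passed straight to replaceFirst.
class NaiveReplace {
    // The second argument of replaceFirst is a regex *replacement string*:
    // a backslash in it escapes the next character, so "\D" yields "D".
    static String naive(String template, String filename) {
        return template.replaceFirst("file_here", filename);
    }

    public static void main(String[] args) {
        // In Java source every literal backslash is written as \\ .
        String filename = "C:\\Documents and Settings\\me\\Local Settings\\Temp\\myfile.txt";
        // Hypothetical template containing the placeholder.
        System.out.println(naive("Opening file_here now", filename));
        // Prints: Opening C:Documents and SettingsmeLocal SettingsTempmyfile.txt now
    }
}
```

A filename ending in a backslash would instead make replaceFirst throw an IllegalArgumentException, since that final backslash has nothing left to escape.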

The problem with this is that, in Java source code, all three of the above backslashes need escaping, so as to ensure they are interpreted as literal backslashes and not anything else. (This wouldn't be a problem if Java had a non-interpolating string format, like Perl's single-quoted strings.) Hence what we actually have to specify is to replace \\ in filename with \\\\.

Only that still won't work, because although \\ evaluates to \, the regular expression engine would treat that single backslash as introducing an escape sequence, along the lines of "\n", so we need to escape these backslashes as well to make sure that doesn't happen. And the way to do that is to use backslashes. And these, of course, have to be escaped.
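To see the two levels separately, here is a small sketch (the class and method names are hypothetical) that checks what each literal looks like once Java has processed it, and what the regex engine then does with it:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Separates the two levels of quoting: Java string literals first,
// then regex pattern and replacement syntax.
class TwoLevels {
    // Java level: "\\\\" is 2 characters and "\\\\\\\\" is 4 characters.
    static int patternLength() { return "\\\\".length(); }
    static int replacementLength() { return "\\\\\\\\".length(); }

    // Regex level: the pattern \\ matches one backslash, and the
    // replacement \\\\ emits two backslashes for each match.
    static String doubleBackslashes(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // With only one level of escaping, "\\" reaches the regex engine as a
    // lone backslash with nothing to escape, and compiling it fails.
    static boolean singleLevelPatternRejected() {
        try {
            Pattern.compile("\\");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(patternLength());              // 2
        System.out.println(replacementLength());          // 4
        System.out.println(doubleBackslashes("a\\b"));    // a\\b
        System.out.println(singleLevelPatternRejected()); // true
    }
}
```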

In summary, in order to interpolate filename into myString and get exactly the same filename in the resulting string you had in your original one, you have to use myString.replaceFirst ("file_here", filename.replaceAll ("\\\\", "\\\\\\\\")).
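A sketch wrapping that one-liner in a method, next to a library alternative. Matcher.quoteReplacement (available since Java 5) escapes a string for use as a literal replacement, covering dollar signs as well as backslashes; a $ in a path such as C:\$Recycle.Bin would otherwise be read as a group reference and make replaceFirst throw. The template and class name are made up for illustration.

```java
import java.util.regex.Matcher;

class InterpolatePath {
    // The post's one-liner: double every backslash so that the replacement
    // string produces literal backslashes. Dollar signs are not escaped.
    static String withManualEscaping(String template, String filename) {
        return template.replaceFirst("file_here", filename.replaceAll("\\\\", "\\\\\\\\"));
    }

    // Library alternative: quoteReplacement escapes both \ and $ so the
    // filename is inserted literally.
    static String withQuoteReplacement(String template, String filename) {
        return template.replaceFirst("file_here", Matcher.quoteReplacement(filename));
    }

    public static void main(String[] args) {
        String filename = "C:\\Documents and Settings\\me\\Local Settings\\Temp\\myfile.txt";
        // Both print: Opening C:\Documents and Settings\me\Local Settings\Temp\myfile.txt now
        System.out.println(withManualEscaping("Opening file_here now", filename));
        System.out.println(withQuoteReplacement("Opening file_here now", filename));
    }
}
```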

As I said, you couldn't make this up.

(Of course, I could avoid this problem by not using String.replaceAll(), but it is the simplest way, and, like Mt Everest, it's there...)
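For comparison, a sketch of the regex-free route, using a hypothetical helper replaceFirstLiteral built on indexOf and substring. (String.replace(CharSequence, CharSequence) is also literal, but it replaces every occurrence rather than just the first.)

```java
class LiteralReplace {
    // Hypothetical helper: replaces the first occurrence of target with
    // replacement, treating both as plain text, so no escaping is needed.
    static String replaceFirstLiteral(String s, String target, String replacement) {
        int i = s.indexOf(target);
        if (i < 0) {
            return s; // placeholder not found: leave the string unchanged
        }
        return s.substring(0, i) + replacement + s.substring(i + target.length());
    }

    public static void main(String[] args) {
        String filename = "C:\\Documents and Settings\\me\\Local Settings\\Temp\\myfile.txt";
        // Prints: Opening C:\Documents and Settings\me\Local Settings\Temp\myfile.txt now
        System.out.println(replaceFirstLiteral("Opening file_here now", "file_here", filename));
    }
}
```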

pseudomonas.livejournal.com 2007-04-02 11:22 am (UTC)
Just an idea, but can you refer to the slashes using character codes rather than escaped sequences?

lethargic-man.livejournal.com 2007-04-02 11:28 am (UTC)
Possibly, but I'm sure it would work out with an even longer piece of code. As I say, this is a solved problem; I'm not looking for help here; I'm just blogging it for the rest of you to boggle at. :o)

ewx.livejournal.com 2007-04-02 11:32 am (UTC)
Yes for the string literals but not for the replacement string. So you'd end up with \ -> \\ -> \xNN\xNN instead of \ -> \\ -> \\\\, and annoying though the latter is, at least you don't have to go to a character table to find out what it means; just realize that there are two levels of quoting going on and so divide by four.
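A sketch of what the character-code route might look like in Java, assuming the goal is still to double every backslash (the class name CharCodes is hypothetical). Java regexes accept \xhh, so the pattern can name the backslash by its code, 5c; the replacement string has no such escape, so it still needs real backslash characters. Java string literals have no \x escape either, so this sketch uses the octal escape \134 to supply them.

```java
class CharCodes {
    // Pattern level: Java regexes understand \xhh, so the Java literal
    // "\\x5c" (the regex text \x5c) matches a backslash by its code.
    static final String PATTERN = "\\x5c";

    // Replacement level: replacement strings have no \xhh escape, so real
    // backslash characters are still needed. The octal escape \134 in a
    // Java literal is one backslash; four of them (\\\\ at the replacement
    // level) emit two backslashes per match.
    static final String REPLACEMENT = "\134\134\134\134";

    static String doubleBackslashes(String s) {
        return s.replaceAll(PATTERN, REPLACEMENT);
    }

    public static void main(String[] args) {
        // Prints: C:\\Temp\\myfile.txt
        System.out.println(doubleBackslashes("C:\\Temp\\myfile.txt"));
    }
}
```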

[identity profile] pseudomonas.livejournal.com 2007-04-02 11:36 am (UTC)(link)
I think in this case I'd probably try and stay sane by creating a doubleEscapedBackSlash string at the top of the code so I know what I'm using it for.
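A sketch of that idea with named constants, so the escaping is explained once where it is defined (the constant and method names are hypothetical):

```java
class EscapeConstants {
    // Regex text \\ : matches one literal backslash.
    static final String BACKSLASH_PATTERN = "\\\\";

    // Replacement text \\\\ : each pair emits one literal backslash, so
    // every match is replaced by two backslashes.
    static final String DOUBLE_ESCAPED_BACKSLASH = "\\\\\\\\";

    // Escapes a filename for use as a replaceFirst/replaceAll replacement
    // (backslashes only; dollar signs are left alone).
    static String escapeForReplacement(String filename) {
        return filename.replaceAll(BACKSLASH_PATTERN, DOUBLE_ESCAPED_BACKSLASH);
    }
}
```

Usage then reads as myString.replaceFirst("file_here", EscapeConstants.escapeForReplacement(filename)).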

ewx.livejournal.com 2007-04-02 12:13 pm (UTC)
In *this* case I'd cobble together whatever the eventual string was in some other way entirely rather than trying to do a regexp substitution into a pattern: snprintf in C or % in Python, for instance. And that's assuming that a string is the right answer; if the aim is to synthesize a command to be executed, then in many interfaces you want some kind of list, so you don't have to worry about shell quoting rules.
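Rough Java equivalents of those suggestions, as a sketch: String.format plays the role of snprintf or %, and ProcessBuilder accepts a list of arguments. The program name notepad.exe and the class name are only examples, and List.of needs Java 9 or later.

```java
import java.util.List;

class BuildWithoutRegex {
    // Analogue of snprintf / Python's %: the filename is an ordinary
    // argument, so its backslashes need no escaping at all.
    static String describe(String filename) {
        return String.format("Opening %s now", filename);
    }

    // To run a command, give ProcessBuilder a list of arguments instead of
    // one command string. The builder is only constructed here, not started.
    static ProcessBuilder openIn(String program, String filename) {
        return new ProcessBuilder(List.of(program, filename));
    }

    public static void main(String[] args) {
        String filename = "C:\\Documents and Settings\\me\\Local Settings\\Temp\\myfile.txt";
        System.out.println(describe(filename));
        System.out.println(openIn("notepad.exe", filename).command());
    }
}
```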