case-insensitive scanner

Nov 27, 2009 at 10:57 PM

I saw the following on the gplex documentation page: "There is a feature request for simple ways to generate case-insensitive scanners." The page also says this will be addressed in Q4 2009. Do you have a date for that? Are you still on target?

 

In the meantime, how would I make a case-insensitive scanner? My language spec requires it, and I can't figure out any way to do it with gplex.

Any workaround would be greatly appreciated. Thousands of old scripts depend on case insensitivity, and I'm willing to hack around this.

So far gplex and gppg have been great tools for my language design. Thanks for a great, useful tool.

Coordinator
Dec 7, 2009 at 12:05 AM
CodeMoniker wrote:

I saw the following on the gplex documentation page: "There is a feature request for simple ways to generate case-insensitive scanners." [...] Do you have a date for that? Are you still on target? In the meantime, how would I make a case-insensitive scanner?

I am still expecting to release a case-insensitive option in December, and I will post experimental changes to the code repository within the next few days. There is no quick hack that fixes it, short of making your lex file case-agnostic, that is, writing every literal so that it matches both cases (for example [Bb][Ee][Gg][Ii][Nn] for "begin"). For non-ASCII specs that is a horrible task.
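For anyone who needs the case-agnostic workaround before the option ships, the rewriting can be automated. The following is only a rough sketch in Python, not part of gplex: the helper name case_agnostic is made up for illustration, and it assumes keywords consist solely of ASCII letters, digits, and underscores, so no lex metacharacters need escaping.

```python
def case_agnostic(keyword: str) -> str:
    """Return a lex-style pattern matching `keyword` in any letter case.

    Hypothetical helper: only ASCII letters, digits, and '_' are accepted,
    so nothing in the output needs escaping for a lex specification.
    """
    parts = []
    for ch in keyword:
        if not (ch.isascii() and (ch.isalnum() or ch == "_")):
            raise ValueError(f"unsupported character in keyword: {ch!r}")
        if ch.isalpha():
            # ASCII letter: emit a two-member character class, e.g. [Bb].
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            # Digits and underscores have no case; emit them unchanged.
            parts.append(ch)
    return "".join(parts)


if __name__ == "__main__":
    for kw in ("begin", "end", "goto2"):
        print(f"{kw:8} -> {case_agnostic(kw)}")
```

The generated patterns can be pasted into the rules section of the lex file in place of the plain keywords. This only scales for ASCII keywords, which is exactly the limitation described above.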

Coordinator
Dec 13, 2009 at 5:50 PM

OK. The code repository now has experimental code for case insensitivity. The way it works is: you keep your existing lex file and pass the /caseInsensitive command-line flag, or set the equivalent option inside the source file.

GPLEX then constructs an automaton that is case-insensitive. However, yytext still contains the actual, untransformed characters that were read from the input.
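The same split between matching and reported text can be seen in an ordinary regex engine. The snippet below is an analogy in Python, not gplex code: the pattern and the scan function are illustrative assumptions. Matching ignores case, but the matched text comes back exactly as it appeared in the input, which is the behaviour described for yytext.

```python
import re

# Case-insensitive keyword matcher; stands in for the generated automaton.
KEYWORD = re.compile(r"begin|end", re.IGNORECASE)


def scan(text: str) -> list[str]:
    """Return each keyword occurrence exactly as written in the input."""
    return [m.group(0) for m in KEYWORD.finditer(text)]


if __name__ == "__main__":
    print(scan("BEGIN x End"))
```

If the rest of the front end needs a canonical spelling (for a symbol table, say), it has to normalise the text itself, since the scanner does not do so.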

What are the drawbacks? Case insensitivity is, at its heart, not a well-defined problem in general: in some languages a single character in one case transforms into two or more characters in the other case, or has different forms depending on where the character sits in the word. The rule that GPLEX currently uses is that the notions of ToUpper and ToLower are defined by the culture setting of the MACHINE ON WHICH GPLEX RUNS. This may not be the same as the culture setting of the machine on which the generated scanner runs.
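A few concrete cases make the problem visible. The sketch below uses Python's built-in string methods, which apply the default Unicode case mappings rather than a machine culture setting, so it illustrates only the one-to-many and position-dependent cases, not the culture dependence. The helper name is_simple_case_pair is an assumption made for illustration.

```python
def is_simple_case_pair(ch: str) -> bool:
    """True if `ch` maps to exactly one code point in both cases."""
    return len(ch.upper()) == 1 and len(ch.lower()) == 1


if __name__ == "__main__":
    # One character becomes two: German sharp s upper-cases to "SS".
    print("ß".upper(), len("ß".upper()))

    # Capital I with dot above lower-cases to "i" plus a combining dot.
    print([hex(ord(c)) for c in "İ".lower()])

    # Position-dependent: Greek capital sigma has a distinct word-final
    # lower-case form.
    print("ΣΑΣ".lower())

    print(is_simple_case_pair("a"), is_simple_case_pair("ß"))
```

The culture dependence comes on top of this: under a Turkish culture setting, for example, .NET's culture-sensitive ToUpper maps "i" to "İ" rather than "I", so the same lex file can yield different automata on different machines.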

However, if you want to create a scanner for an ASCII character set or simple cases of Latin languages then it should work fine.

As usual, post any bugs to the issues tab.