Quoted strings are not scanned correctly with /caseinsensitive

Mar 19, 2012 at 4:18 PM
Edited Mar 19, 2012 at 4:19 PM

Hi all,

With the example below, the generated scanner does not match the quoted text when the caseinsensitive option is enabled. The text is scanned correctly when I disable the option. Is this expected behavior?

 

%namespace LexScanner
%option noparser nofiles caseinsensitive

string	\"[^\"]*\"

%%
{string}	Console.WriteLine("string: " + yytext);

%%

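    public static void Main(string[] argp) {
        // Input includes the surrounding double quotes that {string} should match.
        var foo = "\"a quoted string\"";

        Console.WriteLine("Scanning '" + foo + "'");

        // Scan the in-memory string starting at offset 0.
        Scanner scnr = new Scanner();
        scnr.SetSource(foo, 0);
        scnr.yylex();
    }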

 

I'm running GPLEX version 1.1.6.

Regards

Coordinator
Mar 21, 2012 at 10:46 AM

Hi swiebertje.  Yes, it looks like a bug.  I will check it out and get back when I have a fix.

John

Coordinator
Mar 22, 2012 at 4:19 AM

swiebertje

Ok, so this was a problem with the processing of inverted sets in the /caseinsensitive case. A couple of other small issues were fixed at the same time, and the updated sources are in the repository.  I will generate a new distribution within the next few days.

There were actually two problems. First, character sets were always broken if all of the following were true: the /caseinsensitive option was set, the set was inverted ([^...]), and the character set was 8-bit (the /nounicode option). This is why your example is broken even though there is no alphabetic character in \"[^\"]*\". Second, the conversion of any such set to its case-insensitive form must be performed *before* the set inversion is processed.
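
To illustrate why the order matters in the second problem, here is a minimal standalone sketch. It is not the GPLEX code, only a small C# program over a 7-bit alphabet, and FoldCase and Invert are hypothetical helpers made up for the illustration. For the set [^a], folding before inverting excludes both 'a' and 'A', while inverting first and folding afterwards pulls 'a' back in by way of 'A'.

```csharp
using System;
using System.Collections.Generic;

static class CaseFoldOrderDemo
{
    // Assumption: a 7-bit alphabet keeps the example small; GPLEX itself
    // works with 8-bit or Unicode alphabets.
    const int AlphabetSize = 128;

    // Hypothetical helper: return the set plus the other-case form
    // of every character in it.
    static HashSet<char> FoldCase(HashSet<char> set)
    {
        var result = new HashSet<char>();
        foreach (char c in set)
        {
            result.Add(c);
            result.Add(char.ToLowerInvariant(c));
            result.Add(char.ToUpperInvariant(c));
        }
        return result;
    }

    // Hypothetical helper: complement of the set within the alphabet.
    static HashSet<char> Invert(HashSet<char> set)
    {
        var result = new HashSet<char>();
        for (int i = 0; i < AlphabetSize; i++)
        {
            char c = (char)i;
            if (!set.Contains(c))
            {
                result.Add(c);
            }
        }
        return result;
    }

    static void Main()
    {
        // The members listed inside [^a], before inversion.
        var listed = new HashSet<char> { 'a' };

        // Fold first, then invert: neither 'a' nor 'A' is in the result.
        var foldThenInvert = Invert(FoldCase(listed));

        // Invert first, then fold: 'A' survives the inversion, and folding
        // 'A' adds 'a' back, so the set ends up matching everything.
        var invertThenFold = FoldCase(Invert(listed));

        Console.WriteLine("fold then invert contains 'a': " + foldThenInvert.Contains('a'));
        Console.WriteLine("fold then invert contains 'A': " + foldThenInvert.Contains('A'));
        Console.WriteLine("invert then fold contains 'a': " + invertThenFold.Contains('a'));
    }
}
```

The first two lines print False and the third prints True, which shows that folding after inversion makes [^a] accept the very character it was meant to exclude.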

Thank you very much for giving me a nice short example to beat on!

John

Mar 22, 2012 at 2:09 PM

Hi John.

Thanks for the fast fix and the effort you're putting in this nice project!

Regards