JFlex - Frequently Asked Questions
-
Can I use my old JLex specifications with JFlex?
Yes. You can usually use them unchanged. See the section "Porting from JLex"
of the manual for more information on that topic.
-
Where can I get the latest version of JFlex?
-
What platforms does JFlex support?
JFlex is written with Sun's JDK 1.1 and produces JDK 1.1 compatible code.
It should run on any platform that supports JRE/JDK 1.1 or above.
-
Can I use CUP and JFlex together?
-
Can I use the generated code of my JFlex specification commercially?
You can use your generated code without restriction. See the file copyright for more
information.
-
I want my scanner to read from a network byte stream or from interactive stdin. Can I do this with JFlex?
This actually depends on the syntax of the input you want to read. The problem is that for some
expressions the scanner needs one character of lookahead to decide which action to execute.
For interactive use and network streams this is very inconvenient, because the stream
doesn't send an EOF (or any other data) when the user stops typing, while the scanner just waits
for the next character and doesn't return a symbol. Since version 1.1.1 of JFlex, this problem
can be avoided thanks to some additional analysis at generation time. Take a look at these three
rules:
1. a
2. a*
3. ";"
When the scanner has read one a, an additional input character is needed to decide
whether this matches rule 1 (just one a) or rule 2 (when the next character is another
a). With input aaa the scanner also has to read one additional character,
because it is supposed to return the longest match (so if another a follows, the match
is aaaa and not only aaa). But when the scanner reads a ";",
it does not need an additional character and can immediately execute the action for rule
3. This is the case for all rules that are not prefixes of any other rule in the
specification and that end with a fixed-length part (so (a* b) is OK but (a b*) is not).
For your application this means: if all commands (or whatever units of input you have) are
terminated by some delimiter (for instance ";", LF, or "end"), reading from a network byte stream or an interactive stream works fine with JFlex.
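To make the lookahead argument concrete, here is a minimal hand-written Java sketch of a longest-match scanner for just the three rules above. It is not JFlex-generated code; the class LookaheadDemo and the method firstToken are hypothetical names. Given the characters received so far, it either commits to the first token or returns null when it has to wait for one more character.

```java
/**
 * Hypothetical hand-written stand-in (not JFlex output) for a scanner with
 *   1. a   2. a*   3. ";"
 * using longest match, where the earlier rule wins a tie.
 */
final class LookaheadDemo {

    /**
     * Decides the first token from the characters received so far.
     * Returns null if the decision needs at least one more character.
     */
    static String firstToken(String received) {
        if (received.isEmpty()) {
            return null;                          // nothing read yet
        }
        char c = received.charAt(0);
        if (c == ';') {
            return "rule 3: ;";                   // ";" is no prefix of another rule: commit now
        }
        if (c != 'a') {
            throw new IllegalArgumentException("no non-empty match for '" + c + "'");
        }
        int end = 0;
        while (end < received.length() && received.charAt(end) == 'a') {
            end++;
        }
        if (end == received.length()) {
            return null;                          // the run of a's might continue: must wait
        }
        // A non-'a' character ended the run; it served as the one character of lookahead.
        return end == 1 ? "rule 1: a" : "rule 2: " + received.substring(0, end);
    }

    public static void main(String[] args) {
        String[] inputs = { ";", "a", "aaa", "a;", "aaa;" };
        for (String in : inputs) {
            System.out.println('"' + in + "\" -> " + firstToken(in));
        }
    }
}
```

With only ";" received the sketch commits to rule 3 at once, while "a" and "aaa" yield null until another character arrives; once the ";" delimiter has arrived, the preceding run of a's can be decided as well.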
-
How can I let my scanner read its input from a string?
// Scanner stands for the class JFlex generated from your specification
String myString = "some input";
Scanner myScanner = new Scanner( new java.io.StringReader(myString) );
-
Why do standalone scanners have a different standard return type (int instead of Yytoken)?
That's because int is a predefined type in Java and Yytoken is not. If a scanner
generated with the %standalone option had return type Yytoken, you would have
to provide this class for every standalone scanner you write. In most cases you don't want to do that,
because the scanner wouldn't really be standalone then.
The standard Yytoken for non-standalone
cases stems from JLex and is only kept for compatibility (it's rarely used anyway).
If you still really want Yytoken as return type in a standalone scanner, you can always
specify it explicitly with %type Yytoken. If you just want to test your scanner and see what
it does without a parser attached, use %debug instead of %standalone.
-
The expression ![a] seems to match "aa". Is negation broken?
The semantics of the negation of an expression r is literally
everything not matched by r. The expression
[a] matches strings that contain exactly one character
(namely "a"). The string "aa" is not matched by this
expression, hence it should be matched by ![a].
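The set semantics can be mimicked with plain Java predicates on strings. In this sketch (the class NegationDemo is a hypothetical name), [a] is modelled as "the string is exactly a" and ![a] as its complement; it models the meaning of the operator, not how JFlex implements it.

```java
import java.util.function.Predicate;

final class NegationDemo {
    // [a]: matches exactly the one-character string "a"
    static final Predicate<String> A = s -> s.equals("a");
    // ![a]: matches every string that [a] does not match
    static final Predicate<String> NOT_A = A.negate();

    public static void main(String[] args) {
        for (String s : new String[] { "a", "aa", "b", "" }) {
            System.out.println('"' + s + "\" matched by ![a]: " + NOT_A.test(s));
        }
    }
}
```

Only "a" is rejected; "aa", "b", and even the empty string all belong to the complement.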
-
Negation doesn't seem to be awfully useful then, does it?
That depends on what you want to specify, of course. Pure negation
is not used very often, but the derived concept "set difference" can be
expressed by negation (in combination with "or") and is occasionally
useful. Example: you want to match every one-character sequence that is
neither a letter nor a digit. The character class syntax doesn't currently
allow you to write something like [^[:letter:][:digit:]], but you
can help yourself with negation:
LD = [:letter:] | [:digit:]
NLD = !(![^] | {LD})
Algebraically this is equivalent to [^] & !{LD}
and [^] - {LD}, i.e., everything that is a one-character match
([^]) and neither a letter nor a digit (there is no "and"
or "minus" operator in JFlex, hence the double-negation form).
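The equivalence can be checked on a few sample strings with Java predicates. This is a minimal sketch under two assumptions: [^] is modelled as "exactly one UTF-16 char", and [:letter:] / [:digit:] are assumed to follow Character.isLetter and Character.isDigit. The class SetDifferenceDemo is a hypothetical name.

```java
import java.util.function.Predicate;

final class SetDifferenceDemo {
    // [^]: any single character (one UTF-16 char in this sketch)
    static final Predicate<String> ANY_ONE = s -> s.length() == 1;
    // LD = [:letter:] | [:digit:], assumed to follow Character.isLetter / isDigit
    static final Predicate<String> LD = s -> s.length() == 1
            && (Character.isLetter(s.charAt(0)) || Character.isDigit(s.charAt(0)));
    // NLD = !(![^] | {LD}): the double-negation form
    static final Predicate<String> NLD = ANY_ONE.negate().or(LD).negate();
    // [^] & !{LD}: the "and" form NLD should be equivalent to
    static final Predicate<String> DIFF = ANY_ONE.and(LD.negate());

    public static void main(String[] args) {
        for (String s : new String[] { "x", "7", "-", " ", "xy", "" }) {
            System.out.println('"' + s + "\" NLD=" + NLD.test(s) + " DIFF=" + DIFF.test(s));
        }
    }
}
```

By De Morgan's law, !(!A | B) equals A & !B, so the two columns agree on every input: "-" and " " are matched, while letters, digits, multi-character strings, and the empty string are not.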
-
I use %8bit and get an Exception, but I know my platform only uses 8 bit. Is %8bit broken?
Short answer: not broken, use %unicode. Long answer:
Most probably this is an encoding problem. Java uses Unicode internally and converts
the bytes it reads from files (or from elsewhere) to Unicode first. A character that is 8 bit
on your platform may no longer be 8 bit when converted to Unicode. On many Windows systems,
for instance, Cp1252 (Windows-Latin-1) is used as the standard encoding, and there the
character "single right quotation mark" has code \x92, but after conversion to
Unicode it's \u2019, which is not 8 bit any more. See also the section
"Encodings, platforms, and Unicode" of the JFlex
manual for more information.
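The conversion described above can be reproduced directly in Java. This sketch (the class Cp1252Demo and its decode helper are hypothetical names) decodes the single byte 0x92 with "windows-1252", the canonical Java name for Cp1252, and shows that the resulting char no longer fits into 8 bits.

```java
import java.nio.charset.Charset;

final class Cp1252Demo {
    /** Decodes a single byte with the named charset and returns the resulting char. */
    static char decode(byte b, String charsetName) {
        return new String(new byte[] { b }, Charset.forName(charsetName)).charAt(0);
    }

    public static void main(String[] args) {
        byte quote = (byte) 0x92;                 // "single right quotation mark" in Cp1252
        char c = decode(quote, "windows-1252");
        System.out.printf("U+%04X%n", (int) c);   // prints U+2019
        System.out.println("fits in 8 bit: " + (c <= 0xFF)); // prints false
    }
}
```

Decoding the same byte with ISO-8859-1 instead would give U+0092, which is why the result of a scanner run depends on the encoding in effect, not only on the bytes in the file.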
-
My scanner needs to read a file that is not in my platform's standard encoding, but in encoding XYZ. How?
Since the scanner reads Java Unicode characters, it is independent
of the actual character encoding a file or a string uses. The
transformation from byte stream to Java characters for files
usually happens in the java.io.InputStreamReader
object connected with the input stream. Class
java.io.FileReader uses the platform's default
encoding automatically. If you would like to specify
another encoding explicitly, for instance UTF-8, you
could do something like
Reader r = new InputStreamReader(new FileInputStream(file), "UTF8");
Now you have a Reader r that can be passed to the
scanner's constructor in the usual way.
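On a modern JDK (Java 7 or later, unlike the JDK 1.1 target mentioned above), the same idea can be sketched as a small self-contained program; the class EncodingDemo and the helper readAll are hypothetical names. It writes a temporary file in UTF-8 and reads it back through an InputStreamReader with an explicit charset, once correctly and once with ISO-8859-1 to show what a wrong encoding does. In a real application, the Reader would be handed to the scanner's constructor instead of being drained by readAll.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class EncodingDemo {
    /** Reads a whole file as characters, decoding its bytes with the given charset. */
    static String readAll(File file, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new FileInputStream(file), cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("encoding-demo", ".txt");
        tmp.deleteOnExit();
        // e-acute, t, e-acute: 3 chars, but 5 bytes in UTF-8
        Files.write(tmp.toPath(), "\u00e9t\u00e9".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(tmp, StandardCharsets.UTF_8).length());      // 3
        System.out.println(readAll(tmp, StandardCharsets.ISO_8859_1).length()); // 5 (wrong decoding)
    }
}
```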
For more information on encodings, see also Sun's JDK documentation,
especially the item Internationalization in the Guide to Features - Java Platform,
and there the FAQ and Supported Encodings.