BNF Syntax.

Introduction

A BNF specification or schema is a set of derivation rules, written as:

rulename ::= __expression__

where rulename is a nonterminal, and the __expression__ consists of one or more sequences of rulenames or terminals. This is an example BNF specification of a postal address:

S              ::= ( [#x20] | [#x9] | [#xD] | [#xA] )+        # Space characters
last_name      ::= [A-Za-z]+
first_name     ::= [A-Za-z]+
house_num      ::= [0-9]+
apt_num        ::= '#' <S>? [0-9]+
street_name    ::= ([A-Za-z0-9] | ' ' | '.')+
town_name      ::= ([A-Za-z] | ' ')+
ZIP_code       ::= [0-9][0-9][0-9][0-9][0-9]
state_code     ::= [A-Z][A-Z]
initial        ::= [A-Z]
name_part      ::= <first_name> <S> (<initial> "."? <S> )? <last_name> (<S> <jr>)?
street_address ::= <house_num> <S> <street_name> <apt_num>?
zip_part       ::= <town_name> <S>? "," <S>? <state_code> <S>? <ZIP_code>
jr             ::= "Sr." | "Jr."
postal_address ::= <name_part> <S> <street_address> <S> <zip_part>

RPA Directives

All directives start with '#!'. The following directives are supported:

Occurrence operators

 ? - Optional (zero or one occurrence)
 + - Multiple (one or more occurrences)
 * - Optional/Multiple (zero or more occurrences)
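
The occurrence operators read like the quantifiers of ordinary regular expressions. The following Python sketch assumes that correspondence and uses Python's re module only as a stand-in; it is not the RPA engine, and the helper matched_len is a hypothetical name introduced for illustration.

```python
import re

# Assumed correspondence between the occurrence operators and regex quantifiers.
optional = re.compile(r"a?")      # ? : zero or one 'a'
multiple = re.compile(r"a+")      # + : one or more 'a'
opt_multiple = re.compile(r"a*")  # * : zero or more 'a'


def matched_len(pattern, text):
    """Return the number of characters matched at the start of text, or None."""
    m = pattern.match(text)
    return None if m is None else m.end()


print(matched_len(optional, "aaa"))      # 1
print(matched_len(multiple, "aaa"))      # 3
print(matched_len(multiple, "bbb"))      # None
print(matched_len(opt_multiple, "bbb"))  # 0
```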

Char Range

Example:

v::=[a-zA-Z12345-9]
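
The range above mixes letter ranges, single characters and a digit range. As a rough illustration, assuming the class behaves like a regular-expression character class, the same set can be checked in Python (re is used here only as a stand-in for the RPA engine):

```python
import re

# Same character class as the rule above: a-z, A-Z, the single
# characters 1 2 3 4, and the range 5-9. Note that '0' is not included.
char_class = re.compile(r"[a-zA-Z12345-9]")


def in_class(ch):
    """Return True if the single character ch belongs to the class."""
    return char_class.fullmatch(ch) is not None


print(in_class("q"))  # True
print(in_class("7"))  # True
print(in_class("0"))  # False
```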

Unicode Range

Use square brackets '[' ']' to specify a single Unicode character or a range. Each character code is written after a '#' and can be given in either decimal or hex format. To specify hex format, prefix the code with 'x', '0x', 'X' or '0X'.

Examples:

# specify space in hex unicode
space ::= [#x20]

# specify space in dec unicode
space ::= [#32]

# specify chars from a to z
atoz ::= [#x61-#x7a]

# specify chars from a to z using hex '0x'
atoz ::= [#0x61-#0x7a]
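
To make the notation concrete, the sketch below converts such codes to code points in Python. The function parse_code is a hypothetical helper written for this illustration; it is not part of RPA.

```python
def parse_code(token):
    """Convert a code such as '#x20', '#0X20' or '#32' to an integer code point."""
    body = token.lstrip("#")
    lowered = body.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)   # hex with '0x' / '0X' prefix
    if lowered.startswith("x"):
        return int(lowered[1:], 16)   # hex with 'x' / 'X' prefix
    return int(body, 10)              # plain decimal


# Both notations for the space character give the same code point.
print(parse_code("#x20"), parse_code("#32"))  # 32 32

# The range [#x61-#x7a] covers the letters a to z.
lo, hi = parse_code("#0x61"), parse_code("#0x7a")
print("".join(chr(c) for c in range(lo, hi + 1)))  # abcdefghijklmnopqrstuvwxyz
```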

Strings

Single quoted or double quoted strings. Example:

v::='This is a single quoted string.'
v::="This is a double quoted string."

Brackets '(' ')'

Use brackets to change the default precedence or to group expressions. Examples:

v::= (abc)+ | [a-z]+
w::= (abc|xyz)+ - abcxyz

OR operator '|'

One of the choices specified with the operator OR('|') must match for the whole OR expression to match.

Examples:

v ::= abc | xyz
w ::= dev | <v> | [a-z]

The first of the choices that matches will be used and the rest of the list is skipped. That means you have to be careful how you order the list. Example:

v ::= abc | abcd | abcde

Choices 'abcd' and 'abcde' will never match, because they are masked by 'abc': any input that starts with 'abcd' or 'abcde' also starts with 'abc'. The BNF schema is compiled to byte code and then run in a virtual machine (similar to the way Java works). In pseudo code this would look like:

if (match(ptr, 'abc')) {
        ADJUST(ptr, 3);
        return SUCCESS;
} else if (match(ptr, 'abcd')) {
        ADJUST(ptr, 4);
        return SUCCESS;
} else if (match(ptr, 'abcde')) {
        ADJUST(ptr, 5);
        return SUCCESS;
}
return FAILED;

If your input is 'abcde', the very first match(ptr, 'abc') matches the first three letters and ptr is adjusted to point to the last two letters 'de'. The other two choices in the if statement, match(ptr, 'abcd') and match(ptr, 'abcde'), will not even be tried. Reordering the choices so that the longest comes first fixes the masking problem:

v ::= abcde | abcd | abc
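
The effect of the ordering can be reproduced with a small Python sketch of first-match-wins choice. The functions match_literal and match_choice are hypothetical names used only for this illustration; they model the behavior described above and are not the RPA byte code.

```python
def match_literal(text, pos, literal):
    """Return the length of literal if it occurs at pos, otherwise None."""
    return len(literal) if text.startswith(literal, pos) else None


def match_choice(text, pos, choices):
    """Try each choice in order and return the length of the first that matches."""
    for literal in choices:
        n = match_literal(text, pos, literal)
        if n is not None:
            return n          # first match wins, the rest is skipped
    return None


# Shortest first: 'abc' masks the longer choices.
print(match_choice("abcde", 0, ["abc", "abcd", "abcde"]))  # 3

# Longest first: the whole input is consumed.
print(match_choice("abcde", 0, ["abcde", "abcd", "abc"]))  # 5
```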

MINUS operator '-'

For the whole expression to match, the first expression must match and none of the expressions after the operator MINUS ('-') may match. Examples:

word ::= [a-zA-Z]+
v ::= <word> - "Sun" - "Mon" - "Tue"
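
A minimal Python sketch of this rule follows. It assumes that each subtracted expression is tested at the same starting position as the first expression; that detail is an assumption of the sketch, and match_word and match_minus are hypothetical helpers.

```python
import re

WORD = re.compile(r"[a-zA-Z]+")


def match_word(text, pos):
    """Model of word ::= [a-zA-Z]+ ; return the matched length or None."""
    m = WORD.match(text, pos)
    return None if m is None else m.end() - pos


def match_minus(text, pos, first, excluded):
    """Match `first` unless one of the excluded literals matches at pos."""
    n = first(text, pos)
    if n is None:
        return None
    for literal in excluded:
        # Assumption: the excluded expression is tried at the same position.
        if text.startswith(literal, pos):
            return None
    return n


days = ["Sun", "Mon", "Tue"]
print(match_minus("Wed", 0, match_word, days))  # 3
print(match_minus("Mon", 0, match_word, days))  # None
```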

NOT Operator '^'

The operator NOT '^' will match one char if the expression after the operator does not match.

v ::= ^a

is the same as:

v ::= . - a

Examples:

v ::= ^a
v ::= ^[a-z]
v ::= ^[#0x20]
v ::= ^(abc | def)
w ::= ^<v>
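
The equivalence between '^a' and '. - a' can be checked with a few small matcher functions in Python. This is only a sketch: lit, any_char, not_ and minus are hypothetical helpers, and the sketch assumes that NOT fails at the end of the input because there is no char left to consume.

```python
def lit(s):
    """Matcher for a literal string; returns the matched length or None."""
    return lambda text, pos: len(s) if text.startswith(s, pos) else None


def any_char(text, pos):
    """Model of '.': match exactly one char if one is available."""
    return 1 if pos < len(text) else None


def not_(inner):
    """Model of '^': consume one char if inner does not match at pos."""
    def run(text, pos):
        if pos >= len(text):          # assumption: nothing left to consume
            return None
        return None if inner(text, pos) is not None else 1
    return run


def minus(first, excluded):
    """Model of '-': first must match and excluded must not match at pos."""
    def run(text, pos):
        n = first(text, pos)
        if n is None or excluded(text, pos) is not None:
            return None
        return n
    return run


not_a = not_(lit("a"))                   # v ::= ^a
dot_minus_a = minus(any_char, lit("a"))  # v ::= . - a

for sample in ("a", "b", ""):
    print(repr(sample), not_a(sample, 0), dot_minus_a(sample, 0))
# 'a' None None
# 'b' 1 1
# '' None None
```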

Operator AND

There is no operator AND, but its semantics can be implemented by combining the operator MINUS '-' and the operator NOT '^'.

Examples:

v ::= (abc) -^ (abcde)

'abc' will match only if it is followed by 'de'.
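
The reasoning behind this combination can be traced step by step in Python. The sketch below assumes that both sides of MINUS are tested at the same starting position; the function names are hypothetical and exist only for this illustration.

```python
def match_abc(text, pos):
    """Left side of MINUS: the literal 'abc'."""
    return 3 if text.startswith("abc", pos) else None


def match_not_abcde(text, pos):
    """Right side of MINUS, ^(abcde): one char if 'abcde' does NOT match."""
    if pos >= len(text):
        return None
    return None if text.startswith("abcde", pos) else 1


def match_v(text, pos):
    """Model of v ::= (abc) -^ (abcde)."""
    n = match_abc(text, pos)
    if n is None:
        return None                      # 'abc' itself must match
    if match_not_abcde(text, pos) is not None:
        return None                      # 'abcde' is absent, so MINUS rejects
    return n                             # 'abc' and 'abcde' both match here


print(match_v("abcde", 0))  # 3
print(match_v("abcxy", 0))  # None
print(match_v("abc", 0))    # None
```

The double negation is what produces the AND: MINUS rejects whenever '^(abcde)' succeeds, and '^(abcde)' succeeds exactly when 'abcde' is not present.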

Operator precedence

Listed from highest to lowest:

^ Operator NOT
- Operator MINUS
| Operator OR

Special Characters

 SPACE, TAB - These characters are ignored if used outside a quoted string.
 . - Any char. The dot can be used to match any char. If used inside a quoted string it is treated as a normal character.

Examples:

v ::= a b c | d e f

is exactly the same as:

v ::= abc|def

Use quotes to include a space in a rule:

v ::= "a b c" | "d e f"
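
The whitespace rule can be illustrated with a short Python function that drops spaces and tabs outside quoted strings. It only demonstrates the rule as stated above; strip_unquoted_whitespace is a hypothetical helper and says nothing about how RPA actually tokenizes its input.

```python
def strip_unquoted_whitespace(rule):
    """Remove spaces and tabs that appear outside single or double quotes."""
    out = []
    quote = None                      # the quote char we are inside, if any
    for ch in rule:
        if quote is not None:
            out.append(ch)            # inside quotes everything is kept
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            out.append(ch)
        elif ch in " \t":
            continue                  # ignored outside quotes
        else:
            out.append(ch)
    return "".join(out)


print(strip_unquoted_whitespace("v ::= a b c | d e f"))
# v::=abc|def
print(strip_unquoted_whitespace('v ::= "a b c" | "d e f"'))
# v::="a b c"|"d e f"
```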