Scanner

class Scanner(*args, **kwargs)

GScanner provides a general-purpose lexical scanner.

You should set input_name after creating the scanner, since it is used by the default message handler when displaying warnings and errors. If you are scanning a file, the filename would be a good choice.

The user_data and max_parse_errors fields are not used by the scanner itself. If you need to associate extra data with the scanner, you can store it in these fields.

If you want to use your own message handler you can set the msg_handler field. The type of the message handler function is declared by ScannerMsgFunc.
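As a rough sketch of what a custom message handler might look like, the function below takes a scanner, a message string, and an error flag, which is the shape ScannerMsgFunc has in the C API. FakeScanner and collecting_handler are hypothetical names introduced only so the example runs without GLib; how the binding actually invokes a Python handler is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScanner:
    """Hypothetical stand-in exposing only the fields the handler reads."""
    input_name: str = "<unnamed>"
    line: int = 1
    position: int = 0
    log: list = field(default_factory=list)


def collecting_handler(scanner, message, is_error):
    """Format a diagnostic as a custom msg_handler might, and record it."""
    severity = "error" if is_error else "warning"
    entry = (
        f"{scanner.input_name}:{scanner.line}:{scanner.position}: "
        f"{severity}: {message}"
    )
    scanner.log.append(entry)
    return entry


scanner = FakeScanner(input_name="config.ini", line=3, position=7)
report = collecting_handler(scanner, "unexpected '='", True)
```

Prefixing each message with input_name is the reason the text above recommends setting that field early: without it, diagnostics cannot say which input they refer to.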

Methods

class Scanner

cur_line() → int: Returns the current line in the input stream (counting from 1). This is the line of the last token parsed via get_next_token().

cur_position() → int: Returns the current position in the current line (counting from 0). This is the position of the last token parsed via get_next_token().
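The counting conventions (lines from 1, positions from 0) can be illustrated with a small standalone helper. line_and_position is a hypothetical function, not part of the Scanner API, and it only demonstrates the numbering; which character of a token the scanner actually reports is not specified in this section.

```python
def line_and_position(text, offset):
    """Return (line, position) for a character offset.

    Lines are counted from 1 and positions within a line from 0,
    matching the conventions of cur_line() and cur_position().
    """
    consumed = text[:offset]
    line = consumed.count("\n") + 1
    # rfind returns -1 when there is no newline, so the first line starts at 0.
    position = offset - (consumed.rfind("\n") + 1)
    return line, position


sample = "alpha = 1\nbeta = 2\n"
first = line_and_position(sample, 0)
beta = line_and_position(sample, sample.index("beta"))
equals = line_and_position(sample, sample.index("= 2"))
```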

cur_token() → TokenType: Gets the current token type. This is simply the token field in the Scanner structure.

destroy() → None: Frees all memory used by the Scanner.

eof() → bool: Returns True if the scanner has reached the end of the file or text buffer.

get_next_token() → TokenType: Parses the next token just like peek_next_token() and also removes it from the input stream. The token data is placed in the token, value, line, and position fields of the Scanner structure.

input_file(input_fd: int) → None

Prepares to scan a file.

Parameters: input_fd – a file descriptor

input_text(text: str, text_len: int) → None

Prepares to scan a text buffer.

Parameters:

text – the text buffer to scan
text_len – the length of the text buffer

lookup_symbol(symbol: str) → None

Looks up a symbol in the current scope and returns its value. If the symbol is not bound in the current scope, None is returned.

Parameters: symbol – the symbol to look up

peek_next_token() → TokenType

Parses the next token, without removing it from the input stream. The token data is placed in the next_token, next_value, next_line, and next_position fields of the Scanner structure.

Note that, while the token is not removed from the input stream (i.e. the next call to get_next_token() will return the same token), it will not be reevaluated. This can lead to surprising results when changing scope or the scanner configuration after peeking the next token. Getting the next token after switching the scope or configuration will return whatever was peeked before, regardless of any symbols that may have been added or removed in the new scope.
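The caching pitfall described above can be reproduced with a toy model. ToyScanner is a hypothetical pure-Python class, not GLib's implementation: it splits nothing and simply classifies pre-split words as SYMBOL or IDENTIFIER depending on the scope active at the moment the word is evaluated.

```python
class ToyScanner:
    """Hypothetical model of the peek/get caching behaviour; not GLib.Scanner."""

    def __init__(self, words, scopes):
        self._words = list(words)
        self._scopes = scopes      # scope_id -> {symbol: value}
        self.scope_id = 0
        self._peeked = None        # cached result of the last peek

    def _classify(self, word):
        # A word is a SYMBOL only if it is bound in the scope active right now.
        if word in self._scopes.get(self.scope_id, {}):
            return ("SYMBOL", word)
        return ("IDENTIFIER", word)

    def peek_next_token(self):
        if self._peeked is None and self._words:
            self._peeked = self._classify(self._words[0])
        return self._peeked if self._peeked is not None else ("EOF", None)

    def get_next_token(self):
        token = self.peek_next_token()  # reuses a cached peek, no re-evaluation
        if self._words:
            self._words.pop(0)
        self._peeked = None
        return token

    def set_scope(self, scope_id):
        old, self.scope_id = self.scope_id, scope_id
        return old


toy = ToyScanner(["begin", "begin"], {0: {}, 1: {"begin": 42}})
peeked = toy.peek_next_token()   # evaluated in scope 0
toy.set_scope(1)                 # "begin" is a symbol in scope 1
stale = toy.get_next_token()     # still the token peeked in scope 0
fresh = toy.get_next_token()     # evaluated in scope 1
```

In this model the first word is reported as an identifier even though the scope changed before it was consumed, while the second, identical word is reported as a symbol. Switching scope before peeking avoids the mismatch.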

scope_add_symbol(scope_id: int, symbol: str, value: None) → None

Adds a symbol to the given scope.

Parameters:

scope_id – the scope id
symbol – the symbol to add
value – the value of the symbol

scope_foreach_symbol(scope_id: int, func: Callable[[...], None], *user_data: Any) → None

Calls the given function for each of the symbol/value pairs in the given scope of the Scanner. The function is passed the symbol and value of each pair, and the given user_data parameter.

Parameters:

scope_id – the scope id
func – the function to call for each symbol/value pair
user_data – user data to pass to the function

scope_lookup_symbol(scope_id: int, symbol: str) → None

Looks up a symbol in a scope and returns its value. If the symbol is not bound in the scope, None is returned.

Parameters:

scope_id – the scope id
symbol – the symbol to look up

scope_remove_symbol(scope_id: int, symbol: str) → None

Removes a symbol from a scope.

Parameters:

scope_id – the scope id
symbol – the symbol to remove

set_scope(scope_id: int) → int

Sets the current scope and returns the old scope id.

Parameters: scope_id – the new scope id
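To make the relationship between the scope methods concrete, here is a minimal dict-of-dicts model. ScopedSymbols is a hypothetical class that borrows the method names from this page but ignores any configuration-dependent behaviour of the real scanner; returning the previous scope id from set_scope is how this sketch interprets the int return value.

```python
class ScopedSymbols:
    """Hypothetical dict-of-dicts model of per-scope symbol tables."""

    def __init__(self):
        self._tables = {}      # scope_id -> {symbol: value}
        self.scope_id = 0

    def scope_add_symbol(self, scope_id, symbol, value):
        self._tables.setdefault(scope_id, {})[symbol] = value

    def scope_lookup_symbol(self, scope_id, symbol):
        # Unbound symbols yield None, as described for the real methods.
        return self._tables.get(scope_id, {}).get(symbol)

    def scope_remove_symbol(self, scope_id, symbol):
        self._tables.get(scope_id, {}).pop(symbol, None)

    def scope_foreach_symbol(self, scope_id, func, *user_data):
        # Iterate over a copy so the callback may modify the table.
        for symbol, value in list(self._tables.get(scope_id, {}).items()):
            func(symbol, value, *user_data)

    def lookup_symbol(self, symbol):
        return self.scope_lookup_symbol(self.scope_id, symbol)

    def set_scope(self, scope_id):
        old, self.scope_id = self.scope_id, scope_id
        return old


table = ScopedSymbols()
table.scope_add_symbol(0, "true", 1)
table.scope_add_symbol(1, "begin", 10)
table.scope_add_symbol(1, "end", 11)

seen = []
table.scope_foreach_symbol(1, lambda sym, val, tag: seen.append((tag, sym, val)), "kw")

previous = table.set_scope(1)
found = table.lookup_symbol("begin")
table.scope_remove_symbol(1, "begin")
missing = table.lookup_symbol("begin")
```

Keeping the previous scope id lets a caller restore it after temporarily switching scopes, for example while parsing a nested block with its own keywords.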

sync_file_offset() → None: Rewinds the file descriptor to the current buffer position and discards the file read-ahead buffer. This is useful for third-party code that uses the scanner's file descriptor and needs to pick up at the current scanning position.
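The reason such a rewind is needed can be seen with any buffered reader over a raw file descriptor. The sketch below uses only the standard os and tempfile modules; ReadAheadReader is a hypothetical class that mimics the idea and is not how GLib implements it.

```python
import os
import tempfile


class ReadAheadReader:
    """Hypothetical buffered reader over a raw file descriptor."""

    def __init__(self, fd, chunk_size=4096):
        self.fd = fd
        self.chunk_size = chunk_size
        self._buffer = b""
        self._consumed = 0  # bytes of the current buffer already handed out

    def read_byte(self):
        if self._consumed == len(self._buffer):
            # Refill: this moves the descriptor far past the logical position.
            self._buffer = os.read(self.fd, self.chunk_size)
            self._consumed = 0
            if not self._buffer:
                return b""
        byte = self._buffer[self._consumed:self._consumed + 1]
        self._consumed += 1
        return byte

    def sync_file_offset(self):
        # Move the descriptor back over the unread bytes, then drop the buffer.
        unread = len(self._buffer) - self._consumed
        if unread:
            os.lseek(self.fd, -unread, os.SEEK_CUR)
        self._buffer = b""
        self._consumed = 0


with tempfile.TemporaryFile() as tmp:
    tmp.write(b"headerPAYLOAD")
    tmp.flush()
    fd = tmp.fileno()
    os.lseek(fd, 0, os.SEEK_SET)

    reader = ReadAheadReader(fd)
    header = b"".join(reader.read_byte() for _ in range(6))
    offset_before = os.lseek(fd, 0, os.SEEK_CUR)  # read-ahead took the whole file
    reader.sync_file_offset()
    offset_after = os.lseek(fd, 0, os.SEEK_CUR)   # now at the logical position
    payload = os.read(fd, 7)                      # other code reads from there
```

Without the sync step, code reading directly from the descriptor would start after the read-ahead buffer and miss the bytes the reader had buffered but not yet consumed.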

unexp_token(expected_token: TokenType, identifier_spec: str, symbol_spec: str, symbol_name: str, message: str, is_error: int) → None

Outputs a message through the scanner’s msg_handler, resulting from an unexpected token in the input stream. Note that you should not call peek_next_token() followed by unexp_token() without an intermediate call to get_next_token(), as unexp_token() evaluates the scanner’s current token (not the peeked token) to construct part of the message.

Parameters:

expected_token – the expected token
identifier_spec – a string describing how the scanner’s user refers to identifiers (None defaults to “identifier”). This is used if expected_token is IDENTIFIER or IDENTIFIER_NULL.
symbol_spec – a string describing how the scanner’s user refers to symbols (None defaults to “symbol”). This is used if expected_token is SYMBOL or any token value greater than G_TOKEN_LAST.
symbol_name – the name of the symbol, if the scanner’s current token is a symbol.
message – a message string to output at the end of the warning/error, or None.
is_error – if True it is output as an error. If False it is output as a warning.

Fields

class Scanner

buffer

config: Link into the scanner configuration

input_fd

input_name: Name of input stream, featured by the default message handler

line: Line number of the last token from get_next_token()

max_parse_errors: Unused

msg_handler: Handler function for _warn and _error

next_line: Line number of the last token from peek_next_token()

next_position: Char number of the last token from peek_next_token()

next_token: Token parsed by the last peek_next_token()

next_value: Value of the last token from peek_next_token()

parse_errors: error() increments this field

position: Char number of the last token from get_next_token()

qdata: Quarked data

scope_id

symbol_table

text

text_end

token: Token parsed by the last get_next_token()

user_data: Unused

value: Value of the last token from get_next_token()