Python Backport Compiler Utilities
Utility library for the Python bpc backport compiler.
Currently, the three individual tools (f2format, poseur, walrus) depend on this repo. The bpc compiler is a work in progress.
Module contents
Utility library for the Python bpc backport compiler.
- exception bpc_utils.BPCInternalError(message, context)
Bases: RuntimeError
Internal bug happened in BPC tools.
Initialize BPCInternalError.
- exception bpc_utils.BPCRecoveryError
Bases: RuntimeError
Error during file recovery.
- exception bpc_utils.BPCSyntaxError
Bases: SyntaxError
Syntax error detected when parsing code.
- class bpc_utils.BaseContext(node, config, *, indent_level=0, raw=False)
Bases: ABC
Abstract base class for general conversion context.
Initialize BaseContext.
- Parameters:
node (NodeOrLeaf) – parso AST
config (Config) – conversion configurations
indent_level (int) – current indentation level
raw (bool) – raw processing flag
- final __iadd__(code)
Support of the += operator.
If self._prefix_or_suffix is True, then the code will be appended to self._prefix; else it will be appended to self._suffix.
- Parameters:
code (str) – code string
- Returns:
self
- Return type:
BaseContext
- final __str__()
Returns a stripped version of self._buffer.
- Return type:
str
- final _process(node)
Recursively process parso AST.
All processing methods for a specific node type are defined as _process_{type}. This method first checks whether such a processing method exists. If so, it calls that method on the node; otherwise it traverses all children of node and applies the same logic to each child.
- Parameters:
node (NodeOrLeaf) – parso AST
- Return type:
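The dispatch rule described for _process can be illustrated without parso. In this hypothetical sketch, ToyNode stands in for a parso node (only its type, children and value are modelled) and ToyContext is not part of bpc_utils; it only shows a getattr-based lookup of _process_{type} methods with a fallback to recursing into the children.

```python
from typing import List, Optional


class ToyNode:
    """Hypothetical stand-in for a parso node: a type name, children and a value."""

    def __init__(self, type_: str, children: Optional[List['ToyNode']] = None, value: str = '') -> None:
        self.type = type_
        self.children = children or []
        self.value = value


class ToyContext:
    """Illustrates the _process dispatch rule; not the real BaseContext."""

    def __init__(self) -> None:
        self.visited: List[str] = []

    def _process(self, node: ToyNode) -> None:
        handler = getattr(self, f'_process_{node.type}', None)
        if handler is not None:
            handler(node)  # a dedicated handler exists for this node type
            return
        for child in node.children:  # no handler: apply the same logic to each child
            self._process(child)

    def _process_name(self, node: ToyNode) -> None:
        self.visited.append(node.value)


tree = ToyNode('file_input', [
    ToyNode('expr_stmt', [ToyNode('name', value='x'), ToyNode('operator', value='=')]),
    ToyNode('name', value='y'),
])
ctx = ToyContext()
ctx._process(tree)
print(ctx.visited)  # ['x', 'y']
```

Only name nodes have a handler here, so every other node type is traversed transparently.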
- final _walk(node)
Start traversing the AST module.
The method traverses all children of node. For each child, it first checks whether the child has the target expression. If so, it toggles self._prefix_or_suffix (sets it to False) and saves the preceding child as self._node_before_expr. Then it processes the child with self._process.
- Parameters:
node (NodeOrLeaf) – parso AST
- Return type:
- final static extract_whitespaces(code)
Extract the preceding and succeeding whitespace from the given code.
- abstract has_expr(node)
Check if node has the target expression.
- Parameters:
node (NodeOrLeaf) – parso AST
- Return type:
bool
- Returns:
whether node has the target expression
- final classmethod mangle(cls_name, var_name)
Mangle variable names.
This method mangles variable names as described in the Python documentation on name mangling, and further normalizes the mangled variable name through normalize().
- final static missing_newlines(prefix, suffix, expected, linesep)
Count the blank lines missing for code insertion, given the surrounding code.
- final static normalize(name)
Normalize variable names.
This method normalizes variable names as described in the Python documentation about identifiers and PEP 3131.
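A minimal sketch of the two rules referenced above, following the Python language reference: a private name (at least two leading underscores, not ending in two underscores) gets _ClassName prepended, with the leading underscores of the class name stripped and no mangling when the class name consists only of underscores; identifiers are normalized to NFKC per PEP 3131. The names mangle_sketch and normalize_sketch are hypothetical, and the real methods may cover further edge cases.

```python
import unicodedata


def normalize_sketch(name: str) -> str:
    # PEP 3131: identifiers are converted to NFKC normal form
    return unicodedata.normalize('NFKC', name)


def mangle_sketch(cls_name: str, var_name: str) -> str:
    # Only private names are mangled: two or more leading underscores,
    # and not ending with two underscores (dunder names such as __init__ are left alone)
    if not var_name.startswith('__') or var_name.endswith('__'):
        return normalize_sketch(var_name)
    stripped = cls_name.lstrip('_')
    if not stripped:  # a class name made only of underscores disables mangling
        return normalize_sketch(var_name)
    return normalize_sketch(f'_{stripped}{var_name}')


print(mangle_sketch('_Spam', '__eggs'))   # _Spam__eggs
print(mangle_sketch('Spam', '__init__'))  # __init__
print(normalize_sketch('\ufb01le'))       # file
```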
- final static split_comments(code, linesep)
Separate prefixing comments from code.
This method separates prefixing comments from the code that follows them. This is useful when inserted code might otherwise break the shebang line, encoding cookies (PEP 263), etc.
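To see why the split matters: a shebang must stay on the first line and an encoding cookie on the first or second line (PEP 263), so new code has to be inserted after any leading comments. The following is a simplified, hypothetical sketch that treats leading lines that are blank or start with # as the comment prefix; its (prefix, suffix) return value is an assumption and is not claimed to match the real split_comments.

```python
from typing import Tuple


def split_comments_sketch(code: str, linesep: str = '\n') -> Tuple[str, str]:
    """Hypothetical: split leading comment/blank lines from the rest of the code."""
    lines = code.split(linesep)
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped and not stripped.startswith('#'):
            break  # first line of real code
    else:
        index = len(lines)  # the whole input is comments and blank lines
    prefix = linesep.join(lines[:index])
    suffix = linesep.join(lines[index:])
    return prefix, suffix


source = '#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nprint("hi")\n'
prefix, suffix = split_comments_sketch(source)
```

Inserting generated code between prefix and suffix keeps the shebang and the encoding cookie on their required lines. A real implementation would also need to handle strings that merely look like comments.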
- _buffer
Final converted result.
- _indent_level
Current indentation level.
- _indentation
Indentation sequence.
- _node_before_expr
Node preceding the one with the target expression, i.e. the insertion point.
- _prefix
Code before the insertion point.
- _prefix_or_suffix
Flag indicating whether the buffer currently being written is self._prefix.
- _root
Root node given by the node parameter.
- _suffix
Code after the insertion point.
- _uuid_gen
UUID generator.
- config
Internal configurations.
- property string
Returns the conversion buffer (self._buffer).
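The interplay of __iadd__, _prefix_or_suffix, _prefix and _suffix can be illustrated with a toy class. BufferSketch is hypothetical, and the assumption that _buffer is simply _prefix followed by _suffix is ours. In BaseContext the flag is flipped by _walk when it reaches the insertion point; here it is flipped by hand.

```python
class BufferSketch:
    """Hypothetical model of BaseContext's prefix/suffix buffering."""

    def __init__(self) -> None:
        self._prefix_or_suffix = True  # True: append to _prefix; False: append to _suffix
        self._prefix = ''
        self._suffix = ''

    def __iadd__(self, code: str) -> 'BufferSketch':
        if self._prefix_or_suffix:
            self._prefix += code
        else:
            self._suffix += code
        return self  # += rebinds the name to this return value, so return self

    @property
    def _buffer(self) -> str:
        # assumption: the final result is the prefix followed by the suffix
        return self._prefix + self._suffix

    def __str__(self) -> str:
        return self._buffer.strip()


ctx = BufferSketch()
ctx += '#!/usr/bin/env python\n'
ctx._prefix_or_suffix = False  # insertion point reached
ctx += 'print(x)\n'
```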
- class bpc_utils.Config(**kwargs)
Bases: MutableMapping[str, object]
Configuration namespace.
This class is inspired by argparse.Namespace for storing internal attributes and/or configuration variables.
>>> config = Config(foo='var', bar=True)
>>> config.foo
'var'
>>> config['bar']
True
>>> config.bar = 'boo'
>>> del config['foo']
>>> config
Config(bar='boo')
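The mapping-plus-attribute behaviour shown in the doctest can be reproduced with collections.abc.MutableMapping by storing every entry in the instance __dict__. SimpleConfig is a hypothetical re-creation for illustration, not the bpc_utils source. Implementing the five abstract methods is enough for MutableMapping to supply get, update, pop, keys and the rest.

```python
from collections.abc import MutableMapping
from typing import Any, Iterator


class SimpleConfig(MutableMapping):
    """Hypothetical namespace supporting both attribute and item access."""

    def __init__(self, **kwargs: Any) -> None:
        self.__dict__.update(kwargs)  # entries live in the instance __dict__

    def __getitem__(self, key: str) -> Any:
        return self.__dict__[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self.__dict__[key] = value

    def __delitem__(self, key: str) -> None:
        del self.__dict__[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self.__dict__)

    def __len__(self) -> int:
        return len(self.__dict__)

    def __repr__(self) -> str:
        items = ', '.join(f'{key}={value!r}' for key, value in self.__dict__.items())
        return f'{type(self).__name__}({items})'


config = SimpleConfig(foo='var', bar=True)
config.bar = 'boo'   # attribute assignment and item access share one storage
del config['foo']
print(config)        # SimpleConfig(bar='boo')
```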
- class bpc_utils.Placeholder(name)
Bases: object
Placeholder for string interpolation.
Placeholder objects can be concatenated with str, other Placeholder objects and StringInterpolation objects via the ‘+’ operator.
Placeholder objects should be regarded as immutable. Please do not modify the _name internal attribute. Build new objects instead.
Initialize Placeholder.
- property name
Returns the name of this placeholder.
- class bpc_utils.StringInterpolation(*args)
Bases: object
A string with placeholders to be filled in.
This looks like an object-oriented format string, but it makes sure that string literals are always interpreted literally (so there is no need to escape them manually). The boundaries between string literals and placeholders are very clear. Filling in a placeholder will never inject a new placeholder, which protects string integrity during multiple-round interpolation.
>>> s1 = '%(injected)s'
>>> s2 = 'hello'
>>> s = StringInterpolation('prefix ', Placeholder('q1'), ' infix ', Placeholder('q2'), ' suffix')
>>> str(s % {'q1': s1} % {'q2': s2})
'prefix %(injected)s infix hello suffix'
(This can be regarded as an improved version of string.Template.safe_substitute().)
Multiple-round interpolation is tricky to do with a traditional format string. To do it correctly and avoid format string injection vulnerabilities, you need to perform escapes very carefully.
>>> fs = 'prefix %(q1)s infix %(q2)s suffix'
>>> fs % {'q1': s1} % {'q2': s2}
Traceback (most recent call last):
...
KeyError: 'q2'
>>> fs = 'prefix %(q1)s infix %%(q2)s suffix'
>>> fs % {'q1': s1} % {'q2': s2}
Traceback (most recent call last):
...
KeyError: 'injected'
>>> fs % {'q1': s1.replace('%', '%%')} % {'q2': s2}
'prefix %(injected)s infix hello suffix'
StringInterpolation objects can be concatenated with str, Placeholder objects and other StringInterpolation objects via the ‘+’ operator.
StringInterpolation objects should be regarded as immutable. Please do not modify the _literals and _placeholders internal attributes. Build new objects instead.
Initialize StringInterpolation.
args will be concatenated to construct a StringInterpolation object.
>>> StringInterpolation('prefix', Placeholder('data'), 'suffix')
StringInterpolation('prefix', Placeholder('data'), 'suffix')
- Parameters:
args (Union[str, Placeholder, StringInterpolation]) – the components to construct a StringInterpolation object
- __mod__(substitutions)
Substitute the placeholders in this StringInterpolation object with string values (if possible) according to the substitutions mapping.
>>> StringInterpolation('prefix ', Placeholder('data'), ' suffix') % {'data': 'hello'}
StringInterpolation('prefix hello suffix')
- Parameters:
substitutions (Mapping[str, object]) – a mapping from placeholder names to the values to be filled in; all values are converted into str
- Return type:
StringInterpolation
- Returns:
a new StringInterpolation object with as many placeholders substituted as possible
- __str__()
Returns the fully-substituted string interpolation result.
>>> str(StringInterpolation('prefix hello suffix'))
'prefix hello suffix'
- Return type:
str
- Returns:
the fully-substituted string interpolation result
- Raises:
ValueError – if there are still unsubstituted placeholders in this StringInterpolation object
- classmethod from_components(literals, placeholders)
Construct a StringInterpolation object from literals and placeholders components. This method is more efficient than the StringInterpolation() constructor, but it is mainly intended for internal use.
>>> StringInterpolation.from_components(
...     ('prefix', 'infix', 'suffix'),
...     (Placeholder('data1'), Placeholder('data2'))
... )
StringInterpolation('prefix', Placeholder('data1'), 'infix', Placeholder('data2'), 'suffix')
- Parameters:
literals (Iterable[str]) – the literal components in order
placeholders (Iterable[Placeholder]) – the Placeholder components in order
- Return type:
StringInterpolation
- Returns:
the constructed StringInterpolation object
- Raises:
TypeError – if literals is str; if literals contains non-str values; if placeholders contains non-Placeholder values
ValueError – if the length of literals is not exactly one more than the length of placeholders
- iter_components()
Generator to iterate all components of this StringInterpolation object in order.
>>> list(StringInterpolation('prefix', Placeholder('data'), 'suffix').iter_components())
['prefix', Placeholder('data'), 'suffix']
- Yields:
the components of this StringInterpolation object in order
- property literals
Returns the literal components in this StringInterpolation object.
- property placeholders
Returns the Placeholder components in this StringInterpolation object.
- property result
Alias of StringInterpolation.__str__() to get the fully-substituted string interpolation result.
>>> StringInterpolation('prefix hello suffix').result
'prefix hello suffix'
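The core idea, keeping literals and placeholders in separate sequences so that substituted values are only ever stored as literal text, can be modelled in a few lines. HoleSketch and InterpolationSketch are hypothetical names; the sketch omits the + operator, from_components and the type checks of the real classes.

```python
from typing import List, Mapping, Tuple, Union


class HoleSketch:
    """Hypothetical stand-in for Placeholder: a named slot."""

    def __init__(self, name: str) -> None:
        self.name = name


class InterpolationSketch:
    """Hypothetical minimal model of StringInterpolation (literals + placeholders)."""

    def __init__(self, *args: Union[str, HoleSketch]) -> None:
        literals: List[str] = ['']
        holes: List[HoleSketch] = []
        for arg in args:
            if isinstance(arg, HoleSketch):
                holes.append(arg)
                literals.append('')  # every placeholder is followed by a (possibly empty) literal
            else:
                literals[-1] += arg  # adjacent strings merge into one literal
        # invariant: len(literals) == len(holes) + 1
        self.literals: Tuple[str, ...] = tuple(literals)
        self.holes: Tuple[HoleSketch, ...] = tuple(holes)

    def __mod__(self, substitutions: Mapping[str, object]) -> 'InterpolationSketch':
        parts: List[Union[str, HoleSketch]] = [self.literals[0]]
        for hole, literal in zip(self.holes, self.literals[1:]):
            if hole.name in substitutions:
                # the value becomes plain literal text, so it can never introduce a new placeholder
                parts.append(str(substitutions[hole.name]))
            else:
                parts.append(hole)  # unfilled placeholders survive for a later round
            parts.append(literal)
        return InterpolationSketch(*parts)

    def __str__(self) -> str:
        if self.holes:
            names = ', '.join(hole.name for hole in self.holes)
            raise ValueError(f'unsubstituted placeholders: {names}')
        return self.literals[0]


s = InterpolationSketch('prefix ', HoleSketch('q1'), ' infix ', HoleSketch('q2'), ' suffix')
print(str(s % {'q1': '%(injected)s'} % {'q2': 'hello'}))  # prefix %(injected)s infix hello suffix
```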
- class bpc_utils.UUID4Generator(dash=True)
Bases: object
UUID 4 generator wrapper to prevent UUID collisions.
Constructor of UUID 4 generator wrapper.
- Parameters:
dash (bool) – whether the generated UUID string has dashes or not
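A collision-avoiding wrapper can remember every string it has handed out and retry on a repeat. The class UUID4GeneratorSketch and its gen() method are hypothetical names used only for this sketch; the dash flag chooses between the standard 8-4-4-4-12 form (str(uuid)) and the 32-digit uuid.hex form.

```python
import uuid
from typing import Set


class UUID4GeneratorSketch:
    """Hypothetical wrapper that never hands out the same UUID string twice."""

    def __init__(self, dash: bool = True) -> None:
        self.dash = dash
        self._issued: Set[str] = set()

    def gen(self) -> str:
        while True:
            value = uuid.uuid4()
            text = str(value) if self.dash else value.hex  # dashed form vs 32 hex digits
            if text not in self._issued:  # retry on the (astronomically unlikely) collision
                self._issued.add(text)
                return text


print(UUID4GeneratorSketch(dash=False).gen())
```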
- bpc_utils.TaskLock()
Function that returns a lock for possibly concurrent tasks.
- Return type:
- Returns:
a lock for possibly concurrent tasks
- bpc_utils.detect_encoding(code)
Detect the encoding of Python source code as specified in PEP 263.
- Parameters:
code (bytes) – the code whose encoding is to be detected
- Return type:
str
- Returns:
the detected encoding, or the default encoding (utf-8)
- Raises:
SyntaxError – if both a BOM and a cookie are present, but disagree
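The standard library implements the same PEP 263 rules in tokenize.detect_encoding(), which reads at most two lines and raises SyntaxError when a UTF-8 BOM and a conflicting cookie are both present. The wrapper below only illustrates those rules and is not necessarily how bpc_utils is implemented. Note that the standard library reports 'utf-8-sig' for BOM-prefixed input and normalizes names such as latin-1 to 'iso-8859-1'.

```python
import io
import tokenize


def detect_encoding_sketch(code: bytes) -> str:
    # tokenize.detect_encoding() takes a readline callable and returns (encoding, lines_read)
    encoding, _ = tokenize.detect_encoding(io.BytesIO(code).readline)
    return encoding


print(detect_encoding_sketch(b'x = 1\n'))                             # utf-8
print(detect_encoding_sketch(b'# -*- coding: latin-1 -*-\nx = 1\n'))  # iso-8859-1
```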
- bpc_utils.detect_files(files)
Get a list of Python files to be processed according to user input.
This will perform glob expansion on Windows, make all paths absolute, resolve symbolic links and remove duplicates.
- Parameters:
files (Iterable[str]) – a list of files and directories to process (usually provided by users on the command line)
- Return type:
- Returns:
a list of Python files to be processed
See also
See expand_glob_iter() for more information.
- bpc_utils.detect_indentation(code)
Detect the indentation of Python source code.
- Parameters:
code (Union[str, bytes, TextIO, NodeOrLeaf]) – the code whose indentation is to be detected
- Return type:
str
- Returns:
the detected indentation sequence
- Raises:
TokenError – if the source code cannot be tokenized in certain cases; see the documentation of TokenError for more details
Notes
In case of mixed indentation, the result is decided by voting on the number of occurrences of each indentation value (spaces and tabs).
When there is a tie between spaces and tabs, 4 spaces are preferred, following PEP 8.
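A simplified voting scheme can be built on the standard tokenize module: each INDENT token contributes the whitespace it adds relative to the enclosing block, the most frequent unit wins, and a tie falls back to 4 spaces. This is a hypothetical sketch that accepts only str input and treats every tie (not only spaces versus tabs) as a reason to prefer 4 spaces; like the documented function, it lets tokenize.TokenError propagate for input that cannot be tokenized.

```python
import io
import tokenize
from collections import Counter


def detect_indentation_sketch(code: str) -> str:
    """Hypothetical: vote on the indentation units found in INDENT tokens."""
    votes: Counter = Counter()
    stack = ['']  # leading whitespace of the enclosing blocks
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.INDENT:
            votes[tok.string[len(stack[-1]):]] += 1  # count only the newly added indentation
            stack.append(tok.string)
        elif tok.type == tokenize.DEDENT:
            stack.pop()
    if not votes:
        return ' ' * 4  # nothing indented: fall back to PEP 8
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ' ' * 4  # tie: prefer 4 spaces (PEP 8)
    return ranked[0][0]


print(repr(detect_indentation_sketch('if x:\n\ty = 1\n')))  # '\t'
```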
- bpc_utils.detect_linesep(code)
Detect the linesep of Python source code.
- Parameters:
code (Union[str, bytes, TextIO, NodeOrLeaf]) – the code whose linesep is to be detected
- Returns:
the detected linesep (one of '\n', '\r\n' and '\r')
- Return type:
Linesep
Notes
In case of mixed linesep, the result is decided by voting on the number of occurrences of each linesep value.
When there is a tie, LF is preferred to CRLF, and CRLF is preferred to CR.
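The voting rule with its tie-break order can be expressed compactly for str input: count CRLF pairs first, subtract them from the bare LF and CR counts, and let max() pick the winner. Because max() returns the first of several equal maxima, listing the candidates in the order LF, CRLF, CR encodes the documented preference. This is a hypothetical sketch, not the library's code.

```python
def detect_linesep_sketch(code: str) -> str:
    crlf = code.count('\r\n')
    counts = {
        '\n': code.count('\n') - crlf,   # bare LF only
        '\r\n': crlf,
        '\r': code.count('\r') - crlf,   # bare CR only
    }
    # dict order is LF, CRLF, CR, and max() keeps the first of equal maxima
    return max(counts, key=counts.__getitem__)


print(repr(detect_linesep_sketch('a\r\nb\r\nc\n')))  # '\r\n'
```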
- bpc_utils.first_non_none(*args)
Return the first non-None value from a list of values.
- Parameters:
*args –
variable length argument list
If one positional argument is provided, it should be an iterable of the values.
If two or more positional arguments are provided, then the value list is the positional argument list.
- Returns:
the first non-None value; None if all values are None or the sequence is empty
- Raises:
TypeError – if no arguments are provided
- bpc_utils.first_truthy(*args)
Return the first truthy value from a list of values.
- Parameters:
*args –
variable length argument list
If one positional argument is provided, it should be an iterable of the values.
If two or more positional arguments are provided, then the value list is the positional argument list.
- Returns:
the first truthy value; None if no truthy value is found or the sequence is empty
- Raises:
TypeError – if no arguments are provided
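Both functions share the same argument convention, which the following hypothetical sketch factors into a helper. It illustrates the documented behaviour; the helper and function names are not part of bpc_utils.

```python
from typing import Any, Iterable, Optional, Tuple


def _value_list(args: Tuple[Any, ...]) -> Iterable[Any]:
    if not args:
        raise TypeError('no arguments provided')
    # one argument: it is the iterable of values; two or more: the arguments themselves
    return args[0] if len(args) == 1 else args


def first_non_none_sketch(*args: Any) -> Optional[Any]:
    return next((value for value in _value_list(args) if value is not None), None)


def first_truthy_sketch(*args: Any) -> Optional[Any]:
    return next((value for value in _value_list(args) if value), None)


print(first_non_none_sketch(None, 0, 'a'))  # 0
print(first_truthy_sketch(None, 0, 'a'))    # a
print(first_non_none_sketch([]))            # None
```

The outermost iterable of a generator expression is evaluated immediately, so the TypeError for a missing argument is raised at call time rather than lazily.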
- bpc_utils.get_parso_grammar_versions(minimum=None)
Get the Python grammar versions that parso supports for parsing.
- bpc_utils.map_tasks(func, iterable, posargs=None, kwargs=None, *, processes=None, chunksize=None)
Execute tasks in parallel if multiprocessing is available, otherwise execute them sequentially.
- Parameters:
func (Callable[..., TypeVar(T)]) – the task function to execute
posargs (Optional[Iterable[object]]) – additional positional arguments to pass to func
kwargs (Optional[Mapping[str, object]]) – keyword arguments to pass to func
processes (Optional[int]) – the number of worker processes (default: auto determine)
- Return type:
- Returns:
the return values of the task function applied on the input items and additional arguments
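A sketch of the parallel-or-sequential pattern using multiprocessing.Pool.map(). The calling convention func(item, *posargs, **kwargs) is an assumption, as is treating processes=1 as a request for sequential execution; map_tasks_sketch and _apply are hypothetical names.

```python
import functools
from typing import Callable, Iterable, List, Mapping, Optional, TypeVar

try:
    import multiprocessing
except ImportError:  # treat a missing module as "no parallelism available"
    multiprocessing = None  # type: ignore[assignment]

T = TypeVar('T')


def _apply(func: Callable[..., T], posargs: tuple, kwargs: dict, item: object) -> T:
    # assumption: each item is passed as the first positional argument
    return func(item, *posargs, **kwargs)


def map_tasks_sketch(func: Callable[..., T], iterable: Iterable[object],
                     posargs: Optional[Iterable[object]] = None,
                     kwargs: Optional[Mapping[str, object]] = None, *,
                     processes: Optional[int] = None,
                     chunksize: Optional[int] = None) -> List[T]:
    task = functools.partial(_apply, func, tuple(posargs or ()), dict(kwargs or {}))
    items = list(iterable)
    if multiprocessing is None or processes == 1:
        return [task(item) for item in items]  # sequential fallback
    # processes=None lets Pool start os.cpu_count() workers
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(task, items, chunksize)


print(map_tasks_sketch(pow, [1, 2, 3], posargs=[2], processes=1))  # [1, 4, 9]
```

In the parallel branch, func must be picklable (for example, defined at module top level), and on platforms that start workers with spawn the calling script needs an `if __name__ == '__main__':` guard.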
- bpc_utils.parse_boolean_state(s)
Parse a boolean state from a string representation.
These values are regarded as True: '1', 'yes', 'y', 'true', 'on'
These values are regarded as False: '0', 'no', 'n', 'false', 'off'
Value matching is case insensitive.
- Parameters:
s (Optional[str]) – string representation of a boolean state
- Return type:
- Returns:
- Raises:
ValueError – if s is an invalid boolean state value
See also
See _boolean_state_lookup for default lookup mapping values.
- bpc_utils.parse_indentation(s)
Parse indentation from a string representation.
If an integer or a string of a positive integer n is specified, then the indentation is n spaces.
If 't' or 'tab' is specified, then the indentation is a tab.
If '\t' (the tab character itself) or a string consisting only of the space character (U+0020) is specified, it is returned directly.
Value matching is case insensitive.
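The rules above, checked in order, might look like the following. Returning None for None or the empty string, and raising ValueError for anything else that does not match, are assumptions of this hypothetical sketch rather than documented behaviour.

```python
from typing import Optional, Union


def parse_indentation_sketch(s: Union[str, int, None]) -> Optional[str]:
    if s is None or s == '':
        return None  # assumption: no value means "not specified"
    if isinstance(s, int):
        if s <= 0:
            raise ValueError(f'invalid indentation value {s!r}')
        return ' ' * s
    if s.lower() in ('t', 'tab'):  # case-insensitive keyword match
        return '\t'
    if s == '\t' or set(s) == {' '}:  # literal indentation is returned as-is
        return s
    try:
        n = int(s)
    except ValueError:
        raise ValueError(f'invalid indentation value {s!r}') from None
    if n <= 0:
        raise ValueError(f'invalid indentation value {s!r}')
    return ' ' * n


print(repr(parse_indentation_sketch('TAB')))  # '\t'
```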
- bpc_utils.parse_linesep(s)
Parse linesep from a string representation.
These values are regarded as '\n': '\n', 'lf'
These values are regarded as '\r\n': '\r\n', 'crlf'
These values are regarded as '\r': '\r', 'cr'
Value matching is case insensitive.
- Parameters:
- Returns:
the parsed linesep result; None if the input is None or an empty string
- Return type:
Optional[Linesep]
- Raises:
ValueError – if s is an invalid linesep value
See also
See _linesep_lookup for default lookup mapping values.
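Since the parsing is a plain case-insensitive lookup, a sketch needs little more than a dictionary. LINESEP_LOOKUP mirrors the values listed above but is a hypothetical stand-in for _linesep_lookup; the None/empty-string result and the ValueError follow the documentation.

```python
from typing import Dict, Optional

LINESEP_LOOKUP: Dict[str, str] = {
    '\n': '\n', 'lf': '\n',
    '\r\n': '\r\n', 'crlf': '\r\n',
    '\r': '\r', 'cr': '\r',
}


def parse_linesep_sketch(s: Optional[str]) -> Optional[str]:
    if not s:  # None or empty string
        return None
    try:
        return LINESEP_LOOKUP[s.lower()]  # case-insensitive match
    except KeyError:
        raise ValueError(f'invalid linesep value {s!r}') from None


print(repr(parse_linesep_sketch('CrLf')))  # '\r\n'
```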
- bpc_utils.parse_positive_integer(s)
Parse a positive integer from a string representation.
- bpc_utils.parso_parse(code, filename=None, *, version=None)
Parse Python source code with parso.
- Parameters:
- Return type:
- Returns:
parso AST
- Raises:
BPCSyntaxError – when source code contains syntax errors
- bpc_utils.recover_files(archive_file_or_dir, *, rr=False, rs=False)
Recover files from a tar archive, optionally removing the archive file and archive directory after recovery.
This function supports three modes:
- Normal mode (when rr and rs are both False): Recover from the archive file specified by archive_file_or_dir.
- Recover and remove (when rr is True): Recover from the archive file specified by archive_file_or_dir, and remove this archive file after recovery.
- Recover from the only file in the archive directory (when rs is True): If the directory specified by archive_file_or_dir contains exactly one (regular) file, recover from that file and remove the archive directory.
Specifying both rr and rs as True is not accepted.
- Parameters:
- Raises:
ValueError – when rr and rs are both True
BPCRecoveryError – when rs is True, and the directory specified by archive_file_or_dir is empty, contains more than one item, or contains a non-regular file
- Return type:
- bpc_utils.Linesep
Type alias for Literal['\n', '\r\n', '\r'].
Internal utilities
- bpc_utils.argparse._boolean_state_lookup
A mapping from string representation to boolean states. The values are used for parse_boolean_state().
- bpc_utils.argparse._linesep_lookup
- Type:
Final[Dict[str, Linesep]]
A mapping from string representation to linesep. The values are used for parse_linesep().
- bpc_utils.fileprocessing.LOOKUP_TABLE = '_lookup_table.json'
File name for the lookup table in the archive file.
- Type:
Final[str]
- bpc_utils.fileprocessing.is_python_filename(filename)
Determine whether a file is a Python source file by its extension.
- bpc_utils.fileprocessing.expand_glob_iter(pattern)
Wrapper function to perform glob expansion.
- class bpc_utils.logging.BPCLogHandler
Bases: StreamHandler
Handler used to format BPC logging records.
Initialize BPCLogHandler.
- format(record)
Format the specified record based on log level.
The record will be formatted based on its log level as follows:
DEBUG – [%(levelname)s] %(asctime)s %(message)s
INFO – %(message)s
WARNING – Warning: %(message)s
ERROR – Error: %(message)s
CRITICAL – Error: %(message)s
- format_templates = {'CRITICAL': 'Error: %(message)s', 'DEBUG': '[%(levelname)s] %(asctime)s %(message)s', 'ERROR': 'Error: %(message)s', 'INFO': '%(message)s', 'WARNING': 'Warning: %(message)s'}
- time_format = '%Y-%m-%d %H:%M:%S.%f%z'
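Per-level templates like these can be applied by overriding format() on a logging.StreamHandler subclass and delegating to a fresh logging.Formatter. LevelTemplateHandler is a hypothetical sketch. In particular, it uses logging's default asctime formatting rather than time_format, whose %f directive is a datetime.strftime() feature that the time.strftime() call used by logging.Formatter does not provide.

```python
import logging
import sys


class LevelTemplateHandler(logging.StreamHandler):
    """Hypothetical handler choosing a format template by log level."""

    format_templates = {
        'DEBUG': '[%(levelname)s] %(asctime)s %(message)s',
        'INFO': '%(message)s',
        'WARNING': 'Warning: %(message)s',
        'ERROR': 'Error: %(message)s',
        'CRITICAL': 'Error: %(message)s',
    }

    def format(self, record: logging.LogRecord) -> str:
        # unknown levels fall back to the bare message
        template = self.format_templates.get(record.levelname, '%(message)s')
        return logging.Formatter(template).format(record)


logger = logging.getLogger('bpc-sketch')
logger.addHandler(LevelTemplateHandler(sys.stderr))
logger.setLevel(logging.DEBUG)
logger.warning('file %s skipped', 'a.py')  # stderr: Warning: file a.py skipped
```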
- bpc_utils.misc.current_time_with_tzinfo()
Get the current time with local time zone information.
- Return type:
datetime.datetime
- Returns:
datetime object representing the current time with local time zone information
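One standard-library way to obtain such a value, shown as a sketch rather than the library's implementation: calling astimezone() with no argument on the naive result of datetime.now() attaches the system's local time zone (Python 3.6+).

```python
import datetime

now = datetime.datetime.now().astimezone()  # naive local time -> aware local time
print(now.tzinfo is not None)  # True
```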
- class bpc_utils.misc.MakeTextIO(obj)
Bases: object
Context wrapper class to handle str and file objects together.
- Variables:
Initialize context.
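A plausible shape for such a wrapper, written as a hypothetical sketch because the Variables list above is empty: a str is wrapped in io.StringIO, while a file object is rewound on entry and has its original position restored on exit rather than being closed. These behaviours are assumptions, not documented facts about MakeTextIO.

```python
import io
from types import TracebackType
from typing import Optional, TextIO, Type, Union


class MakeTextIOSketch:
    """Hypothetical context manager giving a readable TextIO for str or file input."""

    def __init__(self, obj: Union[str, TextIO]) -> None:
        self.obj = obj
        self.sio: Optional[TextIO] = None
        self.pos = 0

    def __enter__(self) -> TextIO:
        if isinstance(self.obj, str):
            # newline='' keeps '\r\n' and '\r' untranslated, which matters for linesep detection
            self.sio = io.StringIO(self.obj, newline='')
        else:
            self.pos = self.obj.tell()  # remember the caller's position
            self.obj.seek(0)
            self.sio = self.obj
        return self.sio

    def __exit__(self, exc_type: Optional[Type[BaseException]],
                 exc_value: Optional[BaseException],
                 traceback: Optional[TracebackType]) -> None:
        if isinstance(self.obj, str):
            self.sio.close()  # we created the StringIO, so we close it
        else:
            self.obj.seek(self.pos)  # restore the caller's position instead of closing


with MakeTextIOSketch('a\r\nb') as f:
    print(repr(f.read()))  # 'a\r\nb'
```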
- bpc_utils.multiprocessing.mp
- Type:
Optional[ModuleType]
- Value:
<module ‘multiprocessing’>
An alias of the Python standard library multiprocessing module, if available.
- bpc_utils.multiprocessing._mp_map_wrapper(args)
Map wrapper function for multiprocessing.
- bpc_utils.multiprocessing._mp_init_lock(lock)
Initialize lock for multiprocessing.
- Parameters:
lock (ContextManager[None]) – the lock to be shared among tasks
- Return type:
- bpc_utils.multiprocessing.task_lock
- Type:
ContextManager[None]
A lock for possibly concurrent tasks.
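The pairing of _mp_init_lock and task_lock resembles a common multiprocessing idiom: a module-level lock that defaults to a no-op and is replaced in each worker by a Pool initializer. The names below are hypothetical and the sketch only demonstrates the pattern. The executable part installs a threading.Lock so that it runs without starting worker processes.

```python
import contextlib
import threading
from typing import Any, ContextManager

# Default for sequential runs: a no-op context manager
task_lock_sketch: ContextManager[Any] = contextlib.nullcontext()


def init_lock_sketch(lock: ContextManager[Any]) -> None:
    """Pool initializer: install the shared lock as this process's task lock."""
    global task_lock_sketch
    task_lock_sketch = lock


def report(message: str) -> None:
    # output that must not interleave between concurrent tasks goes inside the lock
    with task_lock_sketch:
        print(message)


# With multiprocessing, each worker would receive a shared lock via:
#     multiprocessing.Pool(initializer=init_lock_sketch, initargs=(multiprocessing.Lock(),))
report('sequential run, no real lock needed')
init_lock_sketch(threading.Lock())  # any lock object works as a context manager
report('now guarded by a real lock')
```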