501 lines
18 KiB
Python
"""rewrite.py - Post rewriting for financial reports

Introduction
------------

Some kinds of posting metadata are too impractical to write when you enter
postings in the books. For example, the ``expense-type`` of employee payroll
is usually determined by the employee's records or an estimate at the end of
the year. It isn't known when payroll expenses are posted throughout the year,
and by then there are too many postings to go back and code them manually.

Rewrite rules solve this problem. They provide a mechanism to make safe, bulk
transformations to postings just after they're loaded and before they're
reported. They let you fill in the gaps between the data in the books and
different reporting requirements.

Most reporting tools load rewrite rules written in YAML, so the examples in
this documentation are written that way. (If you're developing reporting tools,
note ``RewriteRule`` accepts a native Python dictionary.) One typical rule looks
like::

    if:
      - SUBJECT OP VALUE
      [- SUBJECT2 OP2 VALUE2
       - …]
    action1:
      - SUBJECT OP VALUE
      [- SUBJECT2 OP2 VALUE2
       - …]
    [action2:
      - …
     …]

A ruleset, as in a YAML file, is just an array of hashes like this.

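Because ``RewriteRule`` accepts a native Python dictionary, the same structure
can be sketched in Python. This is only an illustration of the shape of the
data: the account name, the metadata key, and the action names are made up for
the example, and the snippet does not import this module.

```python
# A hypothetical rule as a plain dictionary: one `if` key plus named actions.
# Every value is a list of condition or action strings.
rule_source = {
    'if': [
        '.account in Expenses:Payroll',
        '.date >= 2020-01-01',
    ],
    'program': [
        'expense-type = program',
        '.number *= .8',
    ],
    'management': [
        'expense-type = management',
        '.number *= .2',
    ],
}

# A ruleset is a list of such dictionaries.
ruleset_source = [rule_source]

# Every key other than `if` names an action.
action_names = [key for key in rule_source if key != 'if']
```

A reporting tool that imports this module would pass a dictionary shaped like
``rule_source`` to ``RewriteRule``.
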
Conditions and Actions
----------------------

The hash must have at least two keys. One of them must be ``if``, and its value
is an array of condition strings. A posting matches only when every condition
is true. The rest of the keys can have any name you like and are actions. Each
action transforms an original posting that matched the ``if`` conditions and
yields a new posting from it. The value of each action is an array of action
strings. Conditions and actions are written the same way; conditions just use
test operators, while actions use assignment operators.

Subjects
--------

There are two kinds of subjects, attributes and metadata.

Attributes start with a ``.`` and access data directly on the posting line,
or from the parent transaction line. You can use these attributes:

================ =======================================================
Name             Description
================ =======================================================
``.account``     The name of the account on the posting line
---------------- -------------------------------------------------------
``.date``        The date of the posting's transaction. When you work on
                 a date, write the value in ISO ``YYYY-MM-DD`` format.
---------------- -------------------------------------------------------
``.number``      The number part of the posting's position;
                 i.e., the amount without the currency.
================ =======================================================

Any other string is a metadata key. As usual, if a condition tries to read
metadata that does not exist on the posting, it will fall back to checking the
transaction. Metadata values are always treated as strings. NOTE: This means
comparisons against non-string metadata values, like dates and amounts, might
not work the way you want.

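The following standalone sketch shows why string comparison can surprise you.
It only demonstrates how Python orders strings versus numbers and dates; it
does not exercise this module, and the values are invented for the example.

```python
import datetime
from decimal import Decimal

# Strings are ordered character by character, not by numeric value.
as_strings = '10' < '9'                     # True: '1' sorts before '9'
as_numbers = Decimal('10') < Decimal('9')   # False: numeric comparison

# Date strings only sort in calendar order when both sides are zero-padded
# ISO dates.
padded = '2020-02-01' < '2020-10-01'        # True, and matches the calendar
unpadded = '2020-10-01' < '2020-2-01'       # True, but October follows February
as_dates = datetime.date(2020, 10, 1) < datetime.date(2020, 2, 1)  # False
```
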
Condition Operators
-------------------

Conditions can always use Python's basic comparison operators:
``== != < <= > >=``. You can also use the following:

================ =======================================================
Name             Description
================ =======================================================
``.account in``  The value is parsed as a space-separated list of
                 account names. The condition matches when the posting's
                 account is any of those named accounts, or any of their
                 respective subaccounts.
================ =======================================================

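As a rough illustration of ``.account in`` matching, here is a minimal sketch
in plain Python. The helper ``account_is_under`` is hypothetical and written
only for this example; it is a stand-in for the account lookup this module
relies on, not a copy of it.

```python
def account_is_under(account, *names):
    # Hypothetical helper: return the first listed name that is the account
    # itself or one of its ancestors, or None when nothing matches.
    for name in names:
        if account == name or account.startswith(name + ':'):
            return name
    return None

# The operand of `.account in` is split on whitespace into account names.
operand = 'Expenses:Payroll Expenses:Benefits'
names = operand.split()

exact = account_is_under('Expenses:Benefits', *names)
subaccount = account_is_under('Expenses:Payroll:Salaries', *names)
# A name that merely shares a prefix is not a subaccount.
lookalike = account_is_under('Expenses:PayrollTaxes', *names)
```
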
Action Operators
----------------

You can set ``.account`` and any metadata with ``=``. Values are always treated
as strings.

You can also transform the posting's number using ``.number *= NUMBER``. This
is mainly used to divide the posting's amount across multiple actions in one
rule.

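To make the arithmetic concrete, the sketch below divides one amount across
two actions the way ``.number *= .8`` and ``.number *= .2`` would. The amount
and the action names are invented for the example, and the code uses only the
standard ``decimal`` module rather than this module's classes.

```python
from decimal import Decimal

original = Decimal('1250.00')

# One multiplier per action, as in `.number *= .8` and `.number *= .2`.
multipliers = {
    'program': Decimal('.8'),
    'management': Decimal('.2'),
}

# Each action yields a new posting whose number is the original times its
# multiplier.
shares = {name: original * m for name, m in multipliers.items()}

# Because the multipliers add up to 1, the shares add up to the original.
total = sum(shares.values())
```
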
Execution
---------

When rewrite rules are applied to postings, the first rule whose condition
matches "wins." When a source posting matches a rule's conditions, its actions
are applied, and the transformed posting(s) replace the source posting.
No more rewrite rules are considered for either the original source posting
or the transformed posting(s).

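The first-match-wins behavior can be sketched with plain functions and
dictionaries. In this illustration postings are simple dicts and each rule is
a hypothetical ``(matches, rewrite)`` pair; these are stand-ins chosen for the
example, not this module's classes.

```python
def apply_rules(postings, rules):
    # For each posting, apply only the first rule that matches it.
    for post in postings:
        for matches, rewrite in rules:
            if matches(post):
                yield from rewrite(post)
                break
        else:
            # No rule matched, so the posting passes through unchanged.
            yield post

rules = [
    # Rule 1: code every Expenses posting as a program expense.
    (lambda p: p['account'].startswith('Expenses:'),
     lambda p: [dict(p, **{'expense-type': 'program'})]),
    # Rule 2: mark postings that are already coded as program expenses.
    (lambda p: p.get('expense-type') == 'program',
     lambda p: [dict(p, reviewed='yes')]),
]

postings = [
    {'account': 'Expenses:Travel'},
    {'account': 'Income:Grants', 'expense-type': 'program'},
    {'account': 'Assets:Cash'},
]

results = list(apply_rules(postings, rules))
```

The first posting is rewritten by rule 1 and is not offered to rule 2
afterward, even though its new metadata would match. The second posting skips
rule 1 and is handled by rule 2. The third matches nothing and is yielded
as-is.
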
Validations
-----------

Rewrite rules are validated to help ensure that you don't break the fundamental
accounting equation, Equity = Assets - Liabilities.

* If an action assigns to ``.account``, there must also be a condition to check
  that the ``.account`` is in the same category, using ``==`` or ``in``.
  You cannot change an Asset into a Liability or Equity, and so on.

* All actions in a rewrite rule must multiply ``.number`` by a total of 1.
  (Actions that don't explicitly multiply the number are understood to
  multiply it by 1.) For example, a rewrite rule can have two actions that
  each multiply the number by .5, or one by .8 and the other by .2. It
  cannot have two actions that each multiply the number by 1, or .3,
  etc. Otherwise, the different postings of a transaction would not balance.

* You cannot assign to ``.date`` at all. Otherwise, you might separate postings
  of the same transaction in time, and the accounting equation would not hold
  during the time gap.

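The first two checks can be approximated in a few lines of standalone Python.
The helpers ``total_multiplier`` and ``root_category`` are hypothetical names
used only in this sketch. ``root_category`` assumes that every root other than
Assets and Liabilities falls on the Equity side of the equation.

```python
from decimal import Decimal

def total_multiplier(actions):
    # Sum one multiplier per action. An action without `.number *= N`
    # counts as multiplying by 1.
    total = Decimal(0)
    for action_strings in actions.values():
        multiplier = Decimal(1)
        for s in action_strings:
            subject, op, value = s.partition('*=')
            if op and subject.strip() == '.number':
                multiplier = Decimal(value.strip())
        total += multiplier
    return total

def root_category(account):
    # Group an account by its side of the accounting equation.
    root = account.partition(':')[0]
    return root if root in ('Assets', 'Liabilities') else 'Equity'

balanced = total_multiplier({
    'program': ['expense-type = program', '.number *= .8'],
    'management': ['expense-type = management', '.number *= .2'],
})
doubled = total_multiplier({
    'first': ['expense-type = program'],
    'second': ['expense-type = management'],
})
short = total_multiplier({
    'first': ['.number *= .3'],
    'second': ['.number *= .3'],
})
```

Only ``balanced`` equals 1, so only that rule would pass the multiplier check.
Using ``Decimal`` rather than ``float`` keeps sums such as .1 + .2 exact.
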
"""
# Copyright © 2020 Brett Smith
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.

import abc
import datetime
import decimal
import enum
import logging
import operator as opmod
import re

from typing import (
    Callable,
    Dict,
    Generic,
    IO,
    Iterable,
    Iterator,
    List,
    Mapping,
    Optional,
    Pattern,
    Sequence,
    Set,
    Tuple,
    Type,
    TypeVar,
    Union,
)
from ..beancount_types import (
    Meta,
    MetaKey,
    MetaValue,
)

from pathlib import Path

import yaml

from .. import data

Decimal = decimal.Decimal
T = TypeVar('T')
TestCallable = Callable[[T, T], bool]

CMP_OPS: Mapping[str, TestCallable] = {
    '==': opmod.eq,
    '>=': opmod.ge,
    '>': opmod.gt,
    '<=': opmod.le,
    '<': opmod.lt,
    '!=': opmod.ne,
}

# First half of this regexp is pseudo-attribute access.
# Second half is metadata keys, per the Beancount syntax docs.
SUBJECT_PAT = r'((?:\.\w+)+|[a-z][-\w]*)\b\s*'

logger = logging.getLogger('conservancy_beancount.reports.rewrite')

class _Registry(Generic[T]):
    # Parses a condition or action string and dispatches on its subject.
    # Subjects that start with `.` are looked up in the registry. Any other
    # subject is a metadata key and is handled by the default class.
    def __init__(self,
                 description: str,
                 parser: Union[str, Pattern],
                 default: Type[T],
                 *others: Tuple[str, Type[T]],
    ) -> None:
        if isinstance(parser, str):
            parser = re.compile(parser)
        self.description = description
        self.parser = parser
        self.default = default
        self.registry: Mapping[str, Type[T]] = dict(others)

    def parse(self, s: str) -> T:
        match = self.parser.match(s)
        if match is None:
            raise ValueError(f"could not parse {self.description} {s!r}")
        subject = match.group(1)
        operator = match.group(2)
        # Everything after the operator is the operand.
        operand = s[match.end():].strip()
        if not subject.startswith('.'):
            # FIXME: To avoid this type ignore, I would have to define a common
            # superclass for Tester and Setter that provides a useful signature
            # for __init__, including the versions that deal with Metadata,
            # and then use that as the bound for our type variable.
            # Not a priority right now.
            return self.default(subject, operator, operand)  # type:ignore[call-arg]
        try:
            retclass = self.registry[subject]
        except KeyError:
            raise ValueError(f"unknown subject in {self.description} {subject!r}") from None
        else:
            return retclass(operator, operand)  # type:ignore[call-arg]


class Tester(Generic[T], metaclass=abc.ABCMeta):
    # Base class for conditions. Subclasses say how to parse the operand
    # and how to read the value to compare from a posting.
    OPS: Mapping[str, TestCallable] = CMP_OPS

    def __init__(self, operator: str, operand: str) -> None:
        try:
            self.op_func = self.OPS[operator]
        except KeyError:
            raise ValueError(f"unsupported operator {operator!r}") from None
        self.operand = self.parse_operand(operand)

    @staticmethod
    @abc.abstractmethod
    def parse_operand(operand: str) -> T: ...

    @abc.abstractmethod
    def post_get(self, post: data.Posting) -> T: ...

    def __call__(self, post: data.Posting) -> bool:
        return self.op_func(self.post_get(post), self.operand)


class AccountTest(Tester[str]):
    def __init__(self, operator: str, operand: str) -> None:
        if operator == 'in':
            # `in` takes a space-separated list of account names.
            # Validate each name; parse_operand raises on a bad one.
            self.under_args = operand.split()
            for name in self.under_args:
                self.parse_operand(name)
        else:
            super().__init__(operator, operand)

    @staticmethod
    def parse_operand(operand: str) -> str:
        # Appending a component lets bare root names like `Expenses` validate.
        if data.Account.is_account(f'{operand}:RootsOK'):
            return operand
        else:
            raise ValueError(f"invalid account name {operand!r}")

    def post_get(self, post: data.Posting) -> str:
        return post.account

    def __call__(self, post: data.Posting) -> bool:
        try:
            return post.account.is_under(*self.under_args) is not None
        except AttributeError:
            # No under_args means this test uses a comparison operator.
            return super().__call__(post)


class DateTest(Tester[datetime.date]):
    @staticmethod
    def parse_operand(operand: str) -> datetime.date:
        return datetime.datetime.strptime(operand, '%Y-%m-%d').date()

    def post_get(self, post: data.Posting) -> datetime.date:
        return post.meta.date


class MetadataTest(Tester[Optional[MetaValue]]):
    def __init__(self, key: MetaKey, operator: str, operand: str) -> None:
        super().__init__(operator, operand)
        self.key = key

    @staticmethod
    def parse_operand(operand: str) -> str:
        return operand

    def post_get(self, post: data.Posting) -> Optional[MetaValue]:
        return post.meta.get(self.key)


class NumberTest(Tester[Decimal]):
    @staticmethod
    def parse_operand(operand: str) -> Decimal:
        try:
            return Decimal(operand)
        except decimal.DecimalException:
            raise ValueError(f"could not parse decimal {operand!r}") from None

    def post_get(self, post: data.Posting) -> Decimal:
        return post.units.number


TestRegistry: _Registry[Tester] = _Registry(
    'condition',
    '^{}{}'.format(
        SUBJECT_PAT,
        r'({}|in)'.format('|'.join(re.escape(s) for s in Tester.OPS)),
    ),
    MetadataTest,
    ('.account', AccountTest),
    ('.date', DateTest),
    ('.number', NumberTest),
)

class Setter(Generic[T], metaclass=abc.ABCMeta):
    # Base class for actions. Calling a setter with a posting returns a
    # (name, new value) pair for the rewritten posting.
    _regparser = re.compile(r'^{}{}'.format(
        SUBJECT_PAT,
        r'',
    ))
    _regtype = 'setter'

    @abc.abstractmethod
    def __call__(self, post: data.Posting) -> Tuple[str, T]: ...


class AccountSet(Setter[data.Account]):
    def __init__(self, operator: str, value: str) -> None:
        if operator != '=':
            raise ValueError(f"unsupported operator for account {operator!r}")
        self.value = data.Account(AccountTest.parse_operand(value))

    def __call__(self, post: data.Posting) -> Tuple[str, data.Account]:
        return ('account', self.value)


class MetadataSet(Setter[str]):
    def __init__(self, key: str, operator: str, value: str) -> None:
        if operator != '=':
            raise ValueError(f"unsupported operator for metadata {operator!r}")
        self.key = key
        self.value = value

    def __call__(self, post: data.Posting) -> Tuple[str, str]:
        return (self.key, self.value)


class NumberSet(Setter[data.Amount]):
    def __init__(self, operator: str, value: str) -> None:
        if operator != '*=':
            raise ValueError(f"unsupported operator for number {operator!r}")
        self.value = NumberTest.parse_operand(value)

    def __call__(self, post: data.Posting) -> Tuple[str, data.Amount]:
        number = post.units.number * self.value
        return ('units', post.units._replace(number=number))


SetRegistry: _Registry[Setter] = _Registry(
    'action',
    rf'^{SUBJECT_PAT}([-+/*]?=)',
    MetadataSet,
    ('.account', AccountSet),
    ('.number', NumberSet),
)

class _RootAccount(enum.Enum):
    Assets = 'Assets'
    Liabilities = 'Liabilities'
    Equity = 'Equity'

    @classmethod
    def from_account(cls, name: str) -> '_RootAccount':
        root, _, _ = name.partition(':')
        try:
            return cls[root]
        except KeyError:
            # Any other root account is grouped with Equity.
            return cls.Equity


class RewriteRule:
    def __init__(self, source: Mapping[str, List[str]]) -> None:
        self.new_meta: List[Sequence[MetadataSet]] = []
        self.rewrites: List[Sequence[Setter]] = []
        for key, rules in source.items():
            if key == 'if':
                self.tests = [TestRegistry.parse(rule) for rule in rules]
            else:
                # Every other key is an action. Keep its metadata setters
                # separate from setters that change posting attributes.
                new_meta: List[MetadataSet] = []
                rewrites: List[Setter] = []
                for rule_s in rules:
                    setter = SetRegistry.parse(rule_s)
                    if isinstance(setter, MetadataSet):
                        new_meta.append(setter)
                    elif any(isinstance(t, type(setter)) for t in rewrites):
                        raise ValueError(f"rule conflicts with earlier action: {rule_s!r}")
                    else:
                        rewrites.append(setter)
                self.new_meta.append(new_meta)
                self.rewrites.append(rewrites)

        try:
            if_ok = any(self.tests)
        except AttributeError:
            if_ok = False
        if not if_ok:
            raise ValueError("no `if` condition in rule") from None

        # Collect the root categories named by the account conditions.
        account_conditions: Set[_RootAccount] = set()
        for test in self.tests:
            if isinstance(test, AccountTest):
                try:
                    operands = test.under_args
                except AttributeError:
                    operands = [test.operand]
                account_conditions.update(_RootAccount.from_account(s) for s in operands)
        if len(account_conditions) == 1:
            account_condition: Optional[_RootAccount] = account_conditions.pop()
        else:
            account_condition = None

        # Add up one multiplier per action. An action that does not set
        # `.number` counts as multiplying by 1.
        number_reallocation = Decimal()
        for rewrite in self.rewrites:
            rewrite_number = Decimal(1)
            for rule in rewrite:
                if isinstance(rule, AccountSet):
                    new_root = _RootAccount.from_account(rule.value)
                    if new_root is not account_condition:
                        raise ValueError(
                            f"cannot assign {new_root} account "
                            f"when `if` checks for {account_condition}",
                        )
                elif isinstance(rule, NumberSet):
                    rewrite_number = rule.value
            number_reallocation += rewrite_number

        if not number_reallocation:
            raise ValueError("no rewrite actions in rule")
        elif number_reallocation != 1:
            raise ValueError(f"rule multiplies number by {number_reallocation}")

    def match(self, post: data.Posting) -> bool:
        return all(test(post) for test in self.tests)

    def rewrite(self, post: data.Posting) -> Iterator[data.Posting]:
        # Yield one new posting per action in the rule.
        for rewrite, new_meta in zip(self.rewrites, self.new_meta):
            kwargs = dict(setter(post) for setter in rewrite)
            if new_meta:
                meta = post.meta.detached()
                meta.update(meta_setter(post) for meta_setter in new_meta)
                kwargs['meta'] = meta
            yield post._replace(**kwargs)


class RewriteRuleset:
    def __init__(self, rules: Iterable[RewriteRule]) -> None:
        self.rules = list(rules)

    def rewrite(self, posts: Iterable[data.Posting]) -> Iterator[data.Posting]:
        for post in posts:
            # The first matching rule wins; later rules are not considered.
            for rule in self.rules:
                if rule.match(post):
                    yield from rule.rewrite(post)
                    break
            else:
                # No rule matched, so pass the posting through unchanged.
                yield post

    @classmethod
    def from_yaml(cls, source: Union[str, IO, Path]) -> 'RewriteRuleset':
        if isinstance(source, Path):
            with source.open() as source_file:
                return cls.from_yaml(source_file)
        doc = yaml.safe_load(source)
        if not isinstance(doc, list):
            raise ValueError("YAML root element is not a list")
        for number, item in enumerate(doc, 1):
            if not isinstance(item, Mapping):
                raise ValueError(f"YAML item {number} is not a rule hash")
            for key, value in item.items():
                if not isinstance(value, list):
                    raise ValueError(f"YAML item {number} {key!r} value is not a list")
                elif not all(isinstance(s, str) for s in value):
                    raise ValueError(f"YAML item {number} {key!r} value is not all strings")
        try:
            logger.debug("loaded %s rewrite rules from YAML", number)
        except NameError:
            # `number` is never bound when the document is an empty list.
            logger.warning("YAML source is empty; no rewrite rules loaded")
        return cls(RewriteRule(src) for src in doc)