conservancy_beancount/conservancy_beancount/reports/rewrite.py
2020-09-10 15:16:49 -04:00


"""rewrite.py - Post rewriting for financial reports

Introduction
------------

Some kinds of posting metadata are too impractical to write when you enter
postings in the books. For example, the ``expense-type`` of employee payroll
is usually determined by the employee's records or estimate at the end of the
year. It isn't known when payroll expenses are posted throughout the year, and
by then there are too many of them to go back and code manually.

Rewrite rules solve this problem. They provide a mechanism to make safe, bulk
transformations to postings just after they're loaded and before they're
reported. They let you fill in the gaps between the data in the books and
different reporting requirements.

Most reporting tools load rewrite rules written in YAML, so the examples in
this documentation are written that way. (If you're developing reporting tools,
note RewriteRule accepts a native Python dictionary.) One typical rule looks
like::

    if:
      - SUBJECT OP VALUE
      [- SUBJECT2 OP2 VALUE2
       - …]
    action1:
      - SUBJECT OP VALUE
      [- SUBJECT2 OP2 VALUE2
       - …]
    [action2:
      - …
    …]

A ruleset, as in a YAML file, is just an array of hashes like this.
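
As a minimal sketch of that shape, the following writes one rule as a native
Python dictionary and wraps it in a list to form a ruleset. The account name,
the ``expense-type`` values, and the action names ``program`` and
``management`` are all made up for illustration; only the ``if`` key is
special.

```python
# Hypothetical rule: split payroll postings 80/20 between two expense types.
rule = {
    'if': [
        '.account in Expenses:Payroll',
        '.date >= 2020-01-01',
    ],
    # Action names are arbitrary; these two are illustrative only.
    'program': [
        '.number *= .8',
        'expense-type = program',
    ],
    'management': [
        '.number *= .2',
        'expense-type = management',
    ],
}
ruleset = [rule]

# Every key maps to a list of strings. This mirrors the shape check that
# from_yaml performs before it builds any rules.
shape_ok = all(
    isinstance(value, list) and all(isinstance(s, str) for s in value)
    for item in ruleset
    for value in item.values()
)
```

The same structure written in YAML would be a one-item sequence whose item is
a mapping with the keys ``if``, ``program``, and ``management``.
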
Conditions and Actions
----------------------

The hash must have at least two keys. One of them must be ``if``, and its value
is an array of condition strings. Every other key is an action, and can have
any name you like. Each action transforms an original posting that matched the
``if`` conditions and yields a new posting from it. An action's value is an
array of action strings. Conditions and actions are written the same way;
conditions just use test operators, while actions use assignment operators.
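
The shared ``SUBJECT OP VALUE`` form can be sketched with one regular
expression. This is a simplified stand-in, not the module's own parser:
``RULE_PAT`` and ``split_rule`` are hypothetical names, and the pattern
accepts a narrower set of subjects than the real one.

```python
import re

# Simplified, illustrative pattern. It is written without backslash escapes
# so it can sit inside a plain (non-raw) docstring.
RULE_PAT = re.compile(
    '([.][a-z]+|[a-z][-a-z0-9_]*) *'   # subject: .attribute or metadata key
    '(==|!=|<=|>=|<|>|in|[*]=|=)'      # test or assignment operator
    ' *(.*)'                           # value: the rest of the string
)

def split_rule(s):
    '''Return (subject, operator, value), or raise ValueError.'''
    match = RULE_PAT.fullmatch(s)
    if match is None:
        raise ValueError(f'could not parse {s!r}')
    subject, operator, value = match.groups()
    return subject, operator, value.strip()

condition = split_rule('.date >= 2020-01-01')
action = split_rule('expense-type = payroll')
```

Here ``condition`` holds a test operator and ``action`` an assignment
operator, but both strings split the same way.
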
Subjects
--------

There are two kinds of subjects, attributes and metadata.

Attributes start with a ``.`` and access data directly on the posting line,
or from the parent transaction line. You can use these attributes:

================ =======================================================
Name             Description
================ =======================================================
``.account``     The name of the account on the posting line
---------------- -------------------------------------------------------
``.date``        The date of the posting's transaction. When you work on
                 a date, write the value in ISO ``YYYY-MM-DD`` format.
---------------- -------------------------------------------------------
``.number``      The number part of the posting's position;
                 i.e., the amount without the currency.
================ =======================================================

Any other string is a metadata key. As usual, if a condition tries to read
metadata that does not exist on the posting, it will fall back to checking the
transaction. Metadata values are always treated as strings. NOTE: This means
comparisons against non-string metadata values, like dates and amounts, might
not work the way you want.
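
The note about non-string metadata can be illustrated with plain Python
values. This is a small sketch, assuming the value written in a metadata
condition stays a plain string while the metadata itself keeps whatever type
it was parsed as; the values below are hypothetical.

```python
import datetime

# A condition value is kept as the string written in the rule.
operand = '2020-01-01'

# Hypothetical metadata value that was parsed as a date.
meta_value = datetime.date(2020, 1, 1)
same_object_value = (meta_value == operand)        # date versus string
same_text = (meta_value.isoformat() == operand)    # string versus string

# String metadata compares character by character, not numerically.
nine_after_ten = '9' > '10'
iso_dates_sort = '2020-02-01' < '2020-10-01'
```

The date never equals the string even though the text matches, and ``'9'``
sorts after ``'10'``. ISO dates stored as strings still order correctly.
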
Condition Operators
-------------------

Conditions can always use Python's basic comparison operators:
``== != < <= > >=``. You can also use the following:

================ =======================================================
Name             Description
================ =======================================================
``.account in``  The value is parsed as a space-separated list of
                 account names. The condition matches when the posting's
                 account is any of those named accounts, or any of their
                 respective subaccounts.
================ =======================================================
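
The matching described for ``.account in`` can be sketched as follows. The
``is_under`` function here is a hypothetical stand-in written for this
example; the module relies on the account class from the ``data`` module
instead, whose implementation is not shown in this file.

```python
from typing import Optional

def is_under(account: str, *roots: str) -> Optional[str]:
    '''Return the first root that names account or one of its ancestors.'''
    for root in roots:
        # Compare whole components so Expenses:Payroll does not match
        # Expenses:PayrollTaxes.
        if account == root or account.startswith(root + ':'):
            return root
    return None

# The condition value is split on whitespace into account names.
roots = 'Expenses:Payroll Expenses:Benefits'.split()
sub_match = is_under('Expenses:Payroll:Salary', *roots)
exact_match = is_under('Expenses:Benefits', *roots)
no_match = is_under('Expenses:PayrollTaxes', *roots)
```
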
Action Operators
----------------

You can set ``.account`` and any metadata with ``=``. Values are always treated
as strings.

You can also transform the posting's number using ``.number *= NUMBER``. This
is mainly used to divide the posting's amount across multiple actions in one
rule.
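
As a worked example of dividing an amount across actions, the arithmetic
below splits one number 80/20 with ``Decimal``, the type the module uses for
numbers. The amount and multipliers are hypothetical.

```python
from decimal import Decimal

amount = Decimal('1000.00')                   # hypothetical posting number
multipliers = [Decimal('.8'), Decimal('.2')]  # one multiplier per action

# Each action yields its own posting with a share of the original number.
parts = [amount * m for m in multipliers]
total = sum(parts)
```

The two parts are 800 and 200, and they add back up to the original amount,
so the transaction still balances.
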
Execution
---------

When rewrite rules are applied to postings, the first rule whose conditions
all match "wins." When a source posting matches a rule's conditions, its
actions are applied, and the transformed posting(s) replace the source posting.
No more rewrite rules are considered for either the original source posting
or the transformed posting(s). A posting that matches no rule passes through
unchanged.
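
The first-match-wins behavior can be sketched with plain dictionaries
standing in for postings. Everything here (``apply_rules``, the
``(matches, rewrite)`` pairs, the ``tag`` key) is hypothetical and only
illustrates the control flow.

```python
def apply_rules(posts, rules):
    '''Yield rewritten posts; rules is a list of (matches, rewrite) pairs.'''
    for post in posts:
        for matches, rewrite in rules:
            if matches(post):
                # The first matching rule wins; later rules are skipped.
                yield from rewrite(post)
                break
        else:
            # No rule matched, so the post passes through unchanged.
            yield post

rules = [
    (lambda p: p['account'].startswith('Expenses:'),
     lambda p: [dict(p, tag='first')]),
    # Never reached for Expenses:Travel, because the rule above matches it.
    (lambda p: p['account'] == 'Expenses:Travel',
     lambda p: [dict(p, tag='second')]),
]
posts = [{'account': 'Expenses:Travel'}, {'account': 'Assets:Cash'}]
result = list(apply_rules(posts, rules))
```
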
Validations
-----------

Rewrite rules are validated to help ensure that you don't break the fundamental
accounting equation, Equity = Assets - Liabilities.

* If an action assigns to ``.account``, there must also be a condition to check
  that the ``.account`` is in the same category, using ``==`` or ``in``.
  You cannot change an Asset into a Liability or Equity, and so on.

* All actions in a rewrite rule must multiply ``.number`` by a total of 1.
  (Actions that don't explicitly multiply the number are understood to
  multiply it by 1.) For example, a rewrite rule can have two actions that
  each multiply the number by .5, or one by .8 and the other by .2. It
  cannot have two actions that each multiply the number by 1, or .3,
  etc. Otherwise, the different postings of a transaction would not balance.

* You cannot assign to ``.date`` at all. Otherwise, you might separate postings
  of the same transaction in time, and the accounting equation would not hold
  during the time gap.
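
The multiplier rule can be sketched as a standalone check. This is an
illustration under stated assumptions, not the validation code itself:
``check_multipliers`` is a hypothetical helper that takes one entry per
action, either a multiplier string or ``None`` when the action does not
multiply the number.

```python
from decimal import Decimal

def check_multipliers(multipliers):
    '''Raise ValueError unless the actions' multipliers total exactly 1.'''
    total = sum(
        # An action without an explicit multiplier counts as 1.
        (Decimal(1) if m is None else Decimal(m) for m in multipliers),
        Decimal(0),
    )
    if total == 0:
        raise ValueError('no rewrite actions in rule')
    if total != 1:
        raise ValueError(f'rule multiplies number by {total}')

def is_valid(multipliers):
    try:
        check_multipliers(multipliers)
    except ValueError:
        return False
    return True

halves_ok = is_valid(['.5', '.5'])
split_ok = is_valid(['.8', '.2'])
doubled_ok = is_valid([None, None])    # two actions that each multiply by 1
short_ok = is_valid(['.3', '.3'])
```

Using ``Decimal`` rather than ``float`` keeps sums like .8 + .2 exact, so the
comparison against 1 is reliable.
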
"""
# Copyright © 2020 Brett Smith
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
import abc
import datetime
import decimal
import enum
import logging
import operator as opmod
import re

from typing import (
    Callable,
    Dict,
    Generic,
    IO,
    Iterable,
    Iterator,
    List,
    Mapping,
    Optional,
    Pattern,
    Sequence,
    Set,
    Tuple,
    Type,
    TypeVar,
    Union,
)
from ..beancount_types import (
    Meta,
    MetaKey,
    MetaValue,
)

from pathlib import Path

import yaml

from .. import data

Decimal = decimal.Decimal

T = TypeVar('T')
TestCallable = Callable[[T, T], bool]

CMP_OPS: Mapping[str, TestCallable] = {
    '==': opmod.eq,
    '>=': opmod.ge,
    '>': opmod.gt,
    '<=': opmod.le,
    '<': opmod.lt,
    '!=': opmod.ne,
}

# First half of this regexp is pseudo-attribute access.
# Second half is metadata keys, per the Beancount syntax docs.
SUBJECT_PAT = r'((?:\.\w+)+|[a-z][-\w]*)\b\s*'

logger = logging.getLogger('conservancy_beancount.reports.rewrite')

class _Registry(Generic[T]):
    def __init__(self,
                 description: str,
                 parser: Union[str, Pattern],
                 default: Type[T],
                 *others: Tuple[str, Type[T]],
    ) -> None:
        if isinstance(parser, str):
            parser = re.compile(parser)
        self.description = description
        self.parser = parser
        self.default = default
        self.registry: Mapping[str, Type[T]] = dict(others)

    def parse(self, s: str) -> T:
        match = self.parser.match(s)
        if match is None:
            raise ValueError(f"could not parse {self.description} {s!r}")
        subject = match.group(1)
        operator = match.group(2)
        operand = s[match.end():].strip()
        if not subject.startswith('.'):
            # FIXME: To avoid this type ignore, I would have to define a common
            # superclass for Tester and Setter that provides a useful signature
            # for __init__, including the versions that deal with Metadata,
            # and then use that as the bound for our type variable.
            # Not a priority right now.
            return self.default(subject, operator, operand)  # type:ignore[call-arg]
        try:
            retclass = self.registry[subject]
        except KeyError:
            raise ValueError(f"unknown subject in {self.description} {subject!r}") from None
        else:
            return retclass(operator, operand)  # type:ignore[call-arg]


class Tester(Generic[T], metaclass=abc.ABCMeta):
    OPS: Mapping[str, TestCallable] = CMP_OPS

    def __init__(self, operator: str, operand: str) -> None:
        try:
            self.op_func = self.OPS[operator]
        except KeyError:
            raise ValueError(f"unsupported operator {operator!r}") from None
        self.operand = self.parse_operand(operand)

    @staticmethod
    @abc.abstractmethod
    def parse_operand(operand: str) -> T: ...

    @abc.abstractmethod
    def post_get(self, post: data.Posting) -> T: ...

    def __call__(self, post: data.Posting) -> bool:
        return self.op_func(self.post_get(post), self.operand)


class AccountTest(Tester[str]):
    def __init__(self, operator: str, operand: str) -> None:
        if operator == 'in':
            self.under_args = operand.split()
            for name in self.under_args:
                self.parse_operand(name)
        else:
            super().__init__(operator, operand)

    @staticmethod
    def parse_operand(operand: str) -> str:
        if data.Account.is_account(f'{operand}:RootsOK'):
            return operand
        else:
            raise ValueError(f"invalid account name {operand!r}")

    def post_get(self, post: data.Posting) -> str:
        return post.account

    def __call__(self, post: data.Posting) -> bool:
        try:
            return post.account.is_under(*self.under_args) is not None
        except AttributeError:
            return super().__call__(post)


class DateTest(Tester[datetime.date]):
    @staticmethod
    def parse_operand(operand: str) -> datetime.date:
        return datetime.datetime.strptime(operand, '%Y-%m-%d').date()

    def post_get(self, post: data.Posting) -> datetime.date:
        return post.meta.date


class MetadataTest(Tester[Optional[MetaValue]]):
    def __init__(self, key: MetaKey, operator: str, operand: str) -> None:
        super().__init__(operator, operand)
        self.key = key

    @staticmethod
    def parse_operand(operand: str) -> str:
        return operand

    def post_get(self, post: data.Posting) -> Optional[MetaValue]:
        return post.meta.get(self.key)


class NumberTest(Tester[Decimal]):
    @staticmethod
    def parse_operand(operand: str) -> Decimal:
        try:
            return Decimal(operand)
        except decimal.DecimalException:
            raise ValueError(f"could not parse decimal {operand!r}")

    def post_get(self, post: data.Posting) -> Decimal:
        return post.units.number


TestRegistry: _Registry[Tester] = _Registry(
    'condition',
    '^{}{}'.format(
        SUBJECT_PAT,
        r'({}|in)'.format('|'.join(re.escape(s) for s in Tester.OPS)),
    ),
    MetadataTest,
    ('.account', AccountTest),
    ('.date', DateTest),
    ('.number', NumberTest),
)


class Setter(Generic[T], metaclass=abc.ABCMeta):
    _regparser = re.compile(r'^{}{}'.format(
        SUBJECT_PAT,
        r'',
    ))
    _regtype = 'setter'

    @abc.abstractmethod
    def __call__(self, post: data.Posting) -> Tuple[str, T]: ...


class AccountSet(Setter[data.Account]):
    def __init__(self, operator: str, value: str) -> None:
        if operator != '=':
            raise ValueError(f"unsupported operator for account {operator!r}")
        self.value = data.Account(AccountTest.parse_operand(value))

    def __call__(self, post: data.Posting) -> Tuple[str, data.Account]:
        return ('account', self.value)


class MetadataSet(Setter[str]):
    def __init__(self, key: str, operator: str, value: str) -> None:
        if operator != '=':
            raise ValueError(f"unsupported operator for metadata {operator!r}")
        self.key = key
        self.value = value

    def __call__(self, post: data.Posting) -> Tuple[str, str]:
        return (self.key, self.value)


class NumberSet(Setter[data.Amount]):
    def __init__(self, operator: str, value: str) -> None:
        if operator != '*=':
            raise ValueError(f"unsupported operator for number {operator!r}")
        self.value = NumberTest.parse_operand(value)

    def __call__(self, post: data.Posting) -> Tuple[str, data.Amount]:
        number = post.units.number * self.value
        return ('units', post.units._replace(number=number))


SetRegistry: _Registry[Setter] = _Registry(
    'action',
    rf'^{SUBJECT_PAT}([-+/*]?=)',
    MetadataSet,
    ('.account', AccountSet),
    ('.number', NumberSet),
)


class _RootAccount(enum.Enum):
    Assets = 'Assets'
    Liabilities = 'Liabilities'
    Equity = 'Equity'

    @classmethod
    def from_account(cls, name: str) -> '_RootAccount':
        root, _, _ = name.partition(':')
        try:
            return cls[root]
        except KeyError:
            return cls.Equity


class RewriteRule:
    def __init__(self, source: Mapping[str, List[str]]) -> None:
        self.new_meta: List[Sequence[MetadataSet]] = []
        self.rewrites: List[Sequence[Setter]] = []
        for key, rules in source.items():
            if key == 'if':
                self.tests = [TestRegistry.parse(rule) for rule in rules]
            else:
                new_meta: List[MetadataSet] = []
                rewrites: List[Setter] = []
                for rule_s in rules:
                    setter = SetRegistry.parse(rule_s)
                    if isinstance(setter, MetadataSet):
                        new_meta.append(setter)
                    elif any(isinstance(t, type(setter)) for t in rewrites):
                        raise ValueError(f"rule conflicts with earlier action: {rule_s!r}")
                    else:
                        rewrites.append(setter)
                self.new_meta.append(new_meta)
                self.rewrites.append(rewrites)

        try:
            if_ok = any(self.tests)
        except AttributeError:
            if_ok = False
        if not if_ok:
            raise ValueError("no `if` condition in rule") from None

        account_conditions: Set[_RootAccount] = set()
        for test in self.tests:
            if isinstance(test, AccountTest):
                try:
                    operands = test.under_args
                except AttributeError:
                    operands = [test.operand]
                account_conditions.update(_RootAccount.from_account(s) for s in operands)
        if len(account_conditions) == 1:
            account_condition: Optional[_RootAccount] = account_conditions.pop()
        else:
            account_condition = None

        number_reallocation = Decimal()
        for rewrite in self.rewrites:
            rewrite_number = Decimal(1)
            for rule in rewrite:
                if isinstance(rule, AccountSet):
                    new_root = _RootAccount.from_account(rule.value)
                    if new_root is not account_condition:
                        raise ValueError(
                            f"cannot assign {new_root} account "
                            f"when `if` checks for {account_condition}",
                        )
                elif isinstance(rule, NumberSet):
                    rewrite_number = rule.value
            number_reallocation += rewrite_number

        if not number_reallocation:
            raise ValueError("no rewrite actions in rule")
        elif number_reallocation != 1:
            raise ValueError(f"rule multiplies number by {number_reallocation}")

    def match(self, post: data.Posting) -> bool:
        return all(test(post) for test in self.tests)

    def rewrite(self, post: data.Posting) -> Iterator[data.Posting]:
        for rewrite, new_meta in zip(self.rewrites, self.new_meta):
            kwargs = dict(setter(post) for setter in rewrite)
            if new_meta:
                meta = post.meta.detached()
                meta.update(meta_setter(post) for meta_setter in new_meta)
                kwargs['meta'] = meta
            yield post._replace(**kwargs)


class RewriteRuleset:
    def __init__(self, rules: Iterable[RewriteRule]) -> None:
        self.rules = list(rules)

    def rewrite(self, posts: Iterable[data.Posting]) -> Iterator[data.Posting]:
        for post in posts:
            for rule in self.rules:
                if rule.match(post):
                    yield from rule.rewrite(post)
                    break
            else:
                yield post

    @classmethod
    def from_yaml(cls, source: Union[str, IO, Path]) -> 'RewriteRuleset':
        if isinstance(source, Path):
            with source.open() as source_file:
                return cls.from_yaml(source_file)
        doc = yaml.safe_load(source)
        if not isinstance(doc, list):
            raise ValueError("YAML root element is not a list")
        for number, item in enumerate(doc, 1):
            if not isinstance(item, Mapping):
                raise ValueError(f"YAML item {number} is not a rule hash")
            for key, value in item.items():
                if not isinstance(value, list):
                    raise ValueError(f"YAML item {number} {key!r} value is not a list")
                elif not all(isinstance(s, str) for s in value):
                    raise ValueError(f"YAML item {number} {key!r} value is not all strings")
        try:
            # `number` is only bound if the loop above ran at least once.
            logger.debug("loaded %s rewrite rules from YAML", number)
        except NameError:
            logger.warning("YAML source is empty; no rewrite rules loaded")
        return cls(RewriteRule(src) for src in doc)