summary refs log tree commit diff stats
path: root/src/lib.rs
blob: 8fa727f8574b201f4454c729401d5f79d6608694 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
// Copyright (C) 2021-2022 Soni L.
// SPDX-License-Identifier: MIT OR Apache-2.0

//! Datafu is a regex-inspired query language. It was primarily
//! designed for processing object trees parsed from configuration files, but
//! can be used with anything that supports serde.
//!
//! # Languge Reference
//!
//! Datafu expressions have the ability to iterate, index, validate and filter
//! data structures, through the use of the syntax elements below.
//!
//! ## Syntax Elements of Datafu Expressions
//!
//! An arrow is `->` and indicates indexing/iteration. Whether indexing or
//! iteration is used is defined by the elements that follow, with iteration
//! being used by default.
//!
//! A variable is a sequence of alphanumeric characters, not starting with
//! a digit. The value of the matched element will be identified by this name.
//!
//! A literal is a sequence of characters delimited by `'`, optionally
//! followed by `?`, with `%` as the escape character, and defines a
//! string-keyed indexing operation. A literal can contain any character,
//! except unescaped `%` or `'` symbols, which must be escaped as
//! `%%` and `%'`, respectively. The sequence of characters defined by
//! a literal is used as the string object in the indexing operation.
//!
//! A parameter is `$`, optionally followed by `?`, followed by a
//! sequence of alphanumeric characters, not starting with a digit, and
//! defines an object-keyed indexing operation. The sequence of characters
//! defined by a parameter is used to retrieve, from the pattern's
//! definitions, the object to be used in the indexing operation.
//!
//! A regex is a sequence of characters delimited by `/`, optionally
//! followed by `?`, with `%` as the escape character. A regex can
//! contain any character, except unescaped `%` or `/` symbols, which
//! must be escaped as `%%` and `%/`, respectively. The sequence of
//! characters defined by a regex is passed to the `regex` crate, which
//! may apply further restrictions on the characters used, and is used to
//! accept the respective keys processed by the iterator.
//!
//! A predicate is `:`, optionally followed by `?`, followed by an
//! `$` and a sequence of alphanumeric characters, not starting with a
//! digit, and is used to accept values to be processed based on an
//! external [`Predicate`].
//!
//! A key match is a datafu expression (including, but not limited to, the
//! empty datafu expression) enclosed within `[` and `]`, optionally
//! prefixed with an identifier and zero or more predicates, and applies the
//! enclosed predicates and datafu expression to the key (or index) being
//! processed. A key match enables additional validation of keys and/or
//! extraction of values from keys, and accepts a key if and only if the
//! enclosed predicates accept the key and the enclosed expression matches the
//! key. The matched key is stored in the identifier.
//!
//! A subvalue is a datafu expression (including, but not limited to, the
//! empty datafu expression) enclosed within `(` and `)`, and applies
//! the enclosed datafu expression to the value (or index) being processed.
//! A subvalue enables the ability to match multiple values on the same
//! object, and accepts a value if and only the enclosed expression
//! matches the value. A subvalue can be made optional by the presence of
//! a `?` after the subvalue - in case of no match, it will just omit
//! the relevant keys in the result. Optional subvalues are unrelated to
//! non-validating syntax elements (see below), they just use the same
//! syntax.
//!
//! Some syntax elements can be validating or non-validating. Validating
//! syntax elements will return a [`errors::MatchError::ValidationError`]
//! whenever a non-accepted element is encountered, whereas non-validating
//! ones will skip them. Whether an element is validating is determined by
//! the absence of an optional `?` in the documented position. Note that
//! it is possible for a validating syntax element to still yield results
//! before returning a [`errors::MatchError::ValidationError`], so one
//! needs to be careful when writing code where such behaviour could
//! result in a security vulnerability.
//!
//! The empty pattern matches anything, but only does so once.
//!
//! ## Syntax of Datafu Expressions
//!
//! Datafu Expressions follow the given syntax, in (pseudo-)extended BNF:
//!
//! ```text
//! expression ::= {arrow tag} {subvalue}
//! tag ::= identifier [arg] {predicate} | arg {predicate}
//! arg ::= parameter | literal | regex | keymatch
//!
//! arrow ::= '->'
//! keymatch ::= '[' {tag} {predicate} expression ']'
//! subvalue ::= '(' {predicate} expression ')' ['?']
//! ```
//!
//! For a description of the terminals "parameter", "literal", "regex" and
//! "predicate", see "Syntax Elements of Datafu Expressions" above.
//!
//! # Examples
//!
//! <!-- TODO -->

pub mod errors;
mod parser;
mod pattern;
mod vm;

pub use pattern::Pattern;

/// A predicate.
pub type Predicate = dyn (for<'x, 'de, 'a> Fn(
    &'x (dyn 'a + erased_serde::Deserializer<'de>)
) -> bool) + Send + Sync;

/// Helper to build predicates because HRTB inference is the worst.
pub fn pred<F>(f: F) -> Box<Predicate>
where
    F: (for<'x, 'de, 'a> Fn(
        &'x (dyn 'a + erased_serde::Deserializer<'de>)
    ) -> bool) +  Send + Sync + 'static,
{
    Box::new(f)
}