path: root/src/lib.rs
// Copyright (C) 2021-2022 Soni L.
// SPDX-License-Identifier: MIT OR Apache-2.0

#![warn(elided_lifetimes_in_paths)]

//! Datafu is a regex-inspired query language. It was primarily
//! designed for processing object trees parsed from configuration files, but
//! can be used with anything that supports serde.
//!
//! # Language Reference
//!
//! Datafu expressions have the ability to iterate, index, validate and filter
//! data structures, through the use of the syntax elements below.
//!
//! ## Syntax Elements of Datafu Expressions
//!
//! An arrow is `->` and indicates indexing/iteration. Whether indexing or
//! iteration is used is defined by the elements that follow, with iteration
//! being used by default.
//!
//! A variable is a sequence of alphanumeric characters, not starting with
//! a digit. The value of the matched element will be identified by this name.
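//!
//! The naming rule above can be sketched as a small check. This is only an
//! illustration: `is_variable_name` is a hypothetical helper, not part of
//! datafu, and it assumes that "alphanumeric" means ASCII letters and digits
//! and that the name must be non-empty.
//!
//! ```rust
//! // Hypothetical helper, not part of datafu. Assumes ASCII-only names.
//! fn is_variable_name(s: &str) -> bool {
//!     let mut chars = s.chars();
//!     match chars.next() {
//!         // The first character must be a letter, which rules out both
//!         // the empty string and a leading digit.
//!         Some(first) if first.is_ascii_alphabetic() => {}
//!         _ => return false,
//!     }
//!     // Every remaining character must be a letter or a digit.
//!     chars.all(|c| c.is_ascii_alphanumeric())
//! }
//! ```
//!
//! Under these assumptions, `foo1` is accepted, while `1foo`, `a-b` and the
//! empty string are rejected.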
//!
//! A literal is a sequence of characters delimited by `'`, optionally
//! followed by `?`, with `%` as the escape character, and defines a
//! string-keyed indexing operation. A literal can contain any character,
//! except unescaped `%` or `'` symbols, which must be escaped as
//! `%%` and `%'`, respectively. The sequence of characters defined by
//! a literal is used as the string object in the indexing operation.
//!
//! A parameter is `$`, optionally followed by `?`, followed by a
//! sequence of alphanumeric characters, not starting with a digit, and
//! defines an object-keyed indexing operation. The sequence of characters
//! defined by a parameter is used to retrieve, from the pattern's
//! definitions, the object to be used in the indexing operation.
//!
//! A regex is a sequence of characters delimited by `/`, optionally
//! followed by `?`, with `%` as the escape character. A regex can
//! contain any character, except unescaped `%` or `/` symbols, which
//! must be escaped as `%%` and `%/`, respectively. The sequence of
//! characters defined by a regex is passed to the `regex` crate, which
//! may apply further restrictions on the characters used, and is used to
//! accept the respective keys processed by the iterator.
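//!
//! Literals and regexes share the same `%` escaping scheme, so building one
//! from arbitrary text could look like the sketch below. The helpers
//! `escape_delimited`, `literal` and `regex` are hypothetical names used for
//! illustration; they are not part of datafu, and they do not check whether
//! the result is a valid regex for the `regex` crate.
//!
//! ```rust
//! // Hypothetical helper, not part of datafu: escapes `raw` so it can be
//! // placed between two `delim` characters, with `%` as the escape character.
//! fn escape_delimited(raw: &str, delim: char) -> String {
//!     let mut out = String::with_capacity(raw.len());
//!     for c in raw.chars() {
//!         // Both the escape character and the delimiter need a `%` prefix.
//!         if c == '%' || c == delim {
//!             out.push('%');
//!         }
//!         out.push(c);
//!     }
//!     out
//! }
//!
//! // Wraps `raw` as a validating literal: '...'
//! fn literal(raw: &str) -> String {
//!     format!("'{}'", escape_delimited(raw, '\''))
//! }
//!
//! // Wraps `raw` as a validating regex: /.../
//! fn regex(raw: &str) -> String {
//!     format!("/{}/", escape_delimited(raw, '/'))
//! }
//! ```
//!
//! For instance, `literal("it's 100%")` produces `'it%'s 100%%'`, and
//! `regex("a/b")` produces `/a%/b/`.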
//!
//! A predicate is `:`, optionally followed by `?`, followed by a
//! `$` and a sequence of alphanumeric characters, not starting with a
//! digit, and is used to accept values to be processed based on an
//! external [`Predicate`].
//!
//! A key match is a datafu expression (including, but not limited to, the
//! empty datafu expression) enclosed within `[` and `]`, optionally
//! prefixed with an identifier and zero or more predicates, and applies the
//! enclosed predicates and datafu expression to the key (or index) being
//! processed. A key match enables additional validation of keys and/or
//! extraction of values from keys, and accepts a key if and only if the
//! enclosed predicates accept the key and the enclosed expression matches the
//! key. The matched key is stored in the identifier.
//!
//! A subvalue is a datafu expression (including, but not limited to, the
//! empty datafu expression) enclosed within `(` and `)`, and applies
//! the enclosed datafu expression to the value (or index) being processed.
//! A subvalue makes it possible to match multiple values on the same
//! object, and accepts a value if and only if the enclosed expression
//! matches the value. A subvalue can be made optional by the presence of
//! a `?` after the subvalue: in case of no match, it will just omit
//! the relevant keys in the result. Optional subvalues are unrelated to
//! non-validating syntax elements (see below); they merely share the same
//! syntax.
//!
//! Some syntax elements can be validating or non-validating. Validating
//! syntax elements will return a [`errors::MatchError::ValidationError`]
//! whenever a non-accepted element is encountered, whereas non-validating
//! ones will skip them. Whether an element is validating is determined by
//! the absence of an optional `?` in the documented position. Note that
//! it is possible for a validating syntax element to still yield results
//! before returning a [`errors::MatchError::ValidationError`], so one
//! needs to be careful when writing code where such behaviour could
//! result in a security vulnerability.
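//!
//! One way to guard against acting on partial results is to gather every
//! match before using any of them. The sketch below assumes that matching
//! produces a sequence of `Result` items; the function name and the generic
//! item and error types are placeholders, not datafu's actual API.
//!
//! ```rust
//! // Placeholder for consuming a fallible match sequence; `T` and `E` stand
//! // in for whatever item and error types the real matcher yields.
//! fn collect_all_or_nothing<T, E>(
//!     matches: impl IntoIterator<Item = Result<T, E>>,
//! ) -> Result<Vec<T>, E> {
//!     // Collecting into `Result` stops at the first `Err` and drops the
//!     // items gathered so far, so no partial results reach the caller.
//!     matches.into_iter().collect()
//! }
//! ```
//!
//! With this shape, a validation error that arrives after some successful
//! matches still causes the whole batch to be rejected.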
//!
//! The empty pattern matches anything, but only does so once.
//!
//! ## Syntax of Datafu Expressions
//!
//! Datafu Expressions follow the given syntax, in (pseudo-)extended BNF:
//!
//! ```text
//! expression ::= [type] [predicate] {arrow tag} {subvalue}
//! tag ::= identifier [arg] [predicate] | arg [predicate]
//! arg ::= parameter | literal | regex | keymatch
//!
//! arrow ::= '->'
//! keymatch ::= '[' [name] expression ']'
//! subvalue ::= '(' expression ')' ['?']
//! ```
//!
//! For a description of the terminals "parameter", "literal", "regex" and
//! "predicate", see "Syntax Elements of Datafu Expressions" above.
//!
//! # Examples
//!
//! <!-- TODO -->

pub mod errors;
//pub mod type_tree;
mod parser;
mod pattern;
mod vm;

pub use pattern::Pattern;
pub use pattern::PatternBuilder;

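/// A predicate: a thread-safe function that inspects a value through a
/// type-erased deserializer and returns whether the value is accepted.
pub type Predicate = dyn (Fn(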
    &mut dyn erased_serde::Deserializer<'_>
) -> bool) + Send + Sync;

/// Helper to build predicates because closure inference is the worst.
///
/// # Examples
///
/// This doesn't work:
///
/// ```rust compile_fail
/// use serde::Deserialize;
/// use datafu::Predicate;
///
/// let x = Box::new(|v| String::deserialize(v).is_ok()) as Box<Predicate>;
/// ```
///
/// But this does:
///
/// ```rust
/// use serde::Deserialize;
///
/// let x = datafu::pred(|v| String::deserialize(v).is_ok());
/// ```
pub fn pred<F>(f: F) -> Box<Predicate>
where
    F: (Fn(
        &mut dyn erased_serde::Deserializer<'_>
    ) -> bool) + Send + Sync + 'static,
{
    Box::new(f)
}