Mycroft — A Predicate Logic Language — Overview / Post-mortem

A few years ago, I designed a programming language & wrote a reference implementation for it. This language is the end product of about five years of thinking about planners, distributed computing, and what it means to make logical inferences about real-world problems. It’s primarily inspired by Prolog, but has a couple ideas imported from Church and from DHT systems like Chord.

Being a university undergraduate through most of my time designing this system (and not being in mathematics), what I designed doesn’t take advantage of potentially powerful ideas that are common in automated theorem provers, simply because I lacked the background to integrate those ideas. Nevertheless, I feel like Mycroft presents a couple ideas that (while not totally novel) are underrepresented in other planner-based languages, and I have done my best to foreground those.

My implementation looks fairly polished, and is still usable; however, early design choices with regard to how parsing was done have led to major bugs that severely limit the utility of this implementation for the purposes I intended, and these bugs have proven very difficult to track down and solve. Of course, a language with exactly one implementation is of limited value, and I never intended this to be anything more than a proof of concept — I think the ideas presented are valuable enough to warrant a clear writeup, regardless of problems in an unrelated portion of the implementation. I would be interested in seeing some features I integrated into Mycroft appear in more popular languages like Coq, or in planner libraries for non-planer-based languages, even if no independent implementation of Mycroft is ever written.

Problems with Prolog

Most of the design decisions I made in Mycroft are direct reactions to problems I see in Prolog. Some of these problems are almost fatal in terms of potential real-world use. Others are purely conceptual — places where I felt that Prolog’s design subtly failed to reflect some important aspect of problems it could otherwise be applied to.

Prolog is famously slow. This is not to say that Prolog cannot be optimized. Instead, it’s more accurate to say that a big draw of Prolog is the ability to write programs in the form of an intuitively-understandable set of constraints, without considering execution flow, but that programs so structured will almost inevitably be much slower than similarly naive implementations in imperative languages. Writing efficient Prolog involves tracing execution flow and rearranging the order in which paths are attempted in order to “fail fast” — and while Prolog execution order is fairly easy to understand, highly-optimized Prolog code (replete with cuts and un-idiomatic structures) can be highly unintuitive.

Naive code in Prolog is compact, but its execution is highly repetitive. Most of this repetition would not be necessary, had the planner access to certain out-of-band information. (Of course, since Prolog is standardized, no planner can make unsafe inferences or other ‘intelligent’ optimizations, and the heavy use of planner-execution-order-based optimizations in real Prolog code makes experimenting with optimizations difficult, since messing with the planner is almost guaranteed to break existing programs.)

Prolog freely mixes determinate (i.e., “pure prolog”) predicates and non-determinate predicates (of the type that would be hidden behind a monad in a modern functional language), which means it’s up to the developer to keep these straight and structure things so debugging is straightforward. This failure to distinguish determinate predicates from those that contain externalities also has implications for potential optimizations.

Furthermore, Prolog’s truth values are wholly binary. A Yes/No value makes sense in a toy planner, but for an expert system (theoretically expected to grow to incorporate new heuristics and provide meaningful advice to a practitioner in a real-world situation) a Yes/No value is of limited utility. Consider the use case of diagnostic aid in medicine (something expert systems have been used in for quite some time): a system that provides an un-annotated list of potential diagnoses is almost useless, unless it has an extremely limited scope; a system that presents a list of potential diagnoses with associated likelihoods based on a combination of base likelihoods, symptoms, and symptom scarcity is much more useful, and one that can proceed to re-weigh the list based on severity and on the risks involved in certain tests and treatments is even more so. Getting such numbers out of Prolog is possible but involves threading probablistic truth and confidence values through the system (possibly many at a time) and implementing the calculation of probabilities yourself, while almost totally circumventing Prolog’s existing system for returning truth values!

Prolog is also limited in the extent to which its structure deals with the expectation of incremental expansion. In part because of the use of binary truth values, there’s no way to express that something is uncomputable as distinct from false (and in fact, the normal way to negate a predicate within another predicate definition is semantically equivalent to ‘has no true expansion’ rather than ‘is false’). So, we’re stuck with a ‘closed world’ assumption. Closed world assumptions have a couple benefits — we can expand the entire herbrand universe and try every possible input to try to satisfy a constraint — but the assumption that our program completely models everything about a problem is, at a fundamental level, never true outside of fairly trivial or purely synthetic situations. We can model all of mathematics starting from first principles, but modeling real-world situations (like disease diagnosis or finding subterranian oil) necessarily means making lossy assumptions and operating on potentially-faulty information. Expert systems are typically loose webs of heuristics, all slightly dubious, and a system based on such heuristics that presents its conclusions as a simple Yes or No value is lying. We should have a little more humility about our conclusions, and so should our machines.

Getting more LIPS (Logical Inferences Per Second)

Speed is one of the big reasons Prolog is so rarely used. Even when it was popular, it was common to prototype an expert system in Prolog and port it by hand to C++ when it was debugged. Keeping compatibility with Prolog (as noted above) severely limits the set of possible optimizations.

Because Prolog is optimized based on an understanding of traversal order, we can’t really do what Miranda and Erlang do and evaluate multiple paths simultaneously. Anyhow, those languages perform this operation using guards — and the conditions in those guards are essentially predicates themselves. Most of the time, if a guard makes sense and is cheap enough to compute, it becomes the first term in a pred’s definition.

If a single easily-computed condition isn’t enough to determine the success of the entire operation, then don’t we want to execute all branches? Furthermore, if we’re trying to find out how likely something is to be true and using fuzzy truth values of any type, don’t we want to combine the truth values of all branches?

Today, multitasking is a lot easier than it was in the 70s — our CPUs have multiple cores and we often have access to whole clusters of machines. So, if we have multiple paths to choose from, it makes sense to choose all of them. To this end, Mycroft has a mechanism built into the planner itself which distributes jobs to peers. In the normal case, a predicate with several branches will have each branch sent to a peer — though we implement a cut operation specifically in order to make more traditional guards possible (and this cut operation will force operations to be run in a particular order).

We also have a lot more ram than we used to, and access to good hashing algorithms. Since determinate predicates will always have the same output with the same input, we can memoize those results and return them without recomputing. Since hostnames and pieces of code will hash to essentially similar formats, we can save memory by controlling our redudancy in a manner likely to produce a pretty even distribution of cached results: we store the result not only in the node that computed it but also in the node whose hostname’s hash is most similar to the hash of the piece of code we’ve run. (We can store it in the N machines with the closest hash, too, in case a machine goes down unexpectedly.)

The result is that a complicated predicate with N distinct branches can run on N machines within the time it takes to run its most difficult branch, and that difficult branches become easier as more of their solution space is memoized.

(Now, internally, I decompose big predicates into pairs of trees — one tree containing zero, one, or two predicates ANDed together, and the other ORed together. These synthetic predicates are callable and can be memoized as long as their children are determinate. So, in practice this means that rather than memoizing every top-level input to a big function, we are memoizing every intermediate operation. We store a lot of values really quickly, but for each one we have replaced a subtree of unknown complexity with a constant-time lookup. In other words, over sufficient iterations, all fully-determinate predicates approach O(1), and this happens much faster than it would if we were just memoizing the top of the tree.)

Non-determinate predicates, because they might write something or otherwise modify state, cannot be skipped unless a cut is present. So, best practice is to limit your use of them. Any determinate predicate cannot run a non-determinate predicate.

Opening up the universe

As I mentioned in ‘Problems with Prolog’, a closed universe is implied by two-valued logic, and neither two-value logic nor a closed universe is accurate when it comes to modeling the real world. (Two-valued logic isn’t even technically sufficient to represent the domain of proofs in pure math, as Godel pointed out. But, we aren’t representing this either, except incidentally.)

In order to address this, I use a composite truth value (CTV). This is a concept I have taken from Probablistic Logic Networks, but the CTVs I use operate differently from theirs. (If you want to use CTVs in a real system & you have a proper math background, I recommend reading Probablistic Logic Networks and using their system, rather than modeling your implementation on mine.) I have simplified the math as far as I can while keeping this more useful than pure fuzzy values for real decision-making purposes, but anybody familiar with statistics or probability will immediately see room for improvement.

I represent CTVs (in an ode to bra-ket notation) as an ordered pair of numbers encased in angle brackets. Each number is between 0 and 1 inclusive. The first number represents ‘truth’, and the second ‘confidence’.

There are a couple ‘special’ CTVs:

<1, 1> is YES (100% true, with 100% confidence)
<0, 1> is NO (100% false, with 100% confidence)
<*, 0> is NC (“No Confidence”, or “No Clue” — any value with a zero confidence value is NC)
Identities:
For any X,
YES AND X => X
NO OR X => X
NC OR X => X
YES OR X => YES
NO AND X => NO
NC AND X => NC
(These follow from the definitions and the math, but they are also used to optimize execution, letting us skip steps when they contain only determinate predicates.)
Otherwise:
<A,B> AND <C,D> => <A*C,B*D>
<A,B> OR <C,D> => <A*B+C*D-A*C,min(B,D)>

NC essentially means that there isn’t enough information (or not enough confidence) to make a determination. Any undefined variable is NC; any undefined predicate returns NC; a predicate, when it is asked to evaluate something that doesn’t make sense, should return NC rather than NO.

We can lossily ‘flatten’ a CTV into a fuzzy truth value by multiplying the two components.

By supporting NC, we have performed half of the work of integrating the open-world assumption into Mycroft. The other half is our handling of unification — which is to say, not handling it.

Rather than unification (which involves keeping a variable’s value in limbo until all the constraints that affect it are solved, then allowing it to collapse into the appropriate value), we simply make all variables immutable unless unassigned and set the semantics of predicatess such that they return True if their constraint is met on all parameters or if currently-assigned parameters can have a value set that will meet the constraints.

This is different from Prolog since no variable can remain in limbo for very long — if two predicates are in an AND relationship and the first value that the first predicate assigns causes the second predicate to fail, we do not automatically backtrack and try a second possible value — the whole conjunction fails (even if some value that would cause it to succeed exists).

If we want this backtracking behavior, we instead use a function that takes the set of valid inputs for each parameter and iterates over all permutations of all valid inputs, and unifies some parameter to either the first set of values to return a non-NO non-NC value or a list of all sets of values that do so.

Implementation details

I’ve spent most of this article describing features and attributes of Mycroft that I think would be valuable in planner-based languages generally, and I have tried to keep the discussion mostly relevant to such languages, which are probably implemented very differently from mine. However, I have some pride in parts of Mycroft’s design that are not so portable, so I’d like to briefly mention them here.

Mycroft implements a REPL, and the REPL has a built-in help system. The help system is really just an associative array between predicate names and short descriptions. This isn’t a hard thing to implement, but I consider it very important — every REPL should have a built-in help system containing information about all built-in functions and any documentation imported from modules, since being able to look at documentation during interactive development is extremely useful. Not only are all of my built-in functions documented here, but all of the language’s documentation is embedded here too, including information about internals and a language tutorial. The man page for the interpreter is actually generated using the internal help system.

The mechanism for work distribution is round-robin-based, and proceeds to send to the next item in the list if the predicate isn’t answered within a timeout period or the host gets NC as a reply. There’s a similar distribution model for predicate definitions, and for memoized results (which are stored as facts — predicate definitions that, because they have particular parameter values associated with them, have exactly one known return value). Upon getting a shutdown request, a node will use this method to give its internally-stored definitions and memoized results to its peers. When ‘directed mode” is enabled, we perform a sort by hash similarity on this host list before the round-robin mechanism for requests is used, so that no node is overloaded and a result can generally be expected from the first node. The number of nodes to try for a request before giving up is essentially equivalent to the replication factor because of this shared code.

Mycroft internals: networking and message passing

Mycroft polyfills a table called mycnet with the following settings:
mycnet.port=1960
mycnet.directedMode=true
mycnet.backlog=512
The following values:
mycnet.peers={} -- each element is an array consisting of a hostname followed by a port
mycnet.peerHashed={} -- an associative array of the elements of the peers table keyed by the sha256 hash of the serialized elements
mycnet.mailbox={} -- a set of requests we have recieved from our peers
mycnet.pptr=1 -- the index into the peers table pointing to the 'current' selected peer
mycnet.forwardedLines={} -- a list of facts we've already forwarded, to prevent infinite loops
And, the following methods:
mycnet.getPeers(world) -- get a list of peers
mycnet.getCurrentPeer(world) 
mycnet.getNextPeer(world) -- increment mycnet.pptr and return getCurrentPeer
mycnet.forwardRequest(world, c) -- send a line of code to next peer
mycnet.forwardFact(world, c) -- send a line of code to all peers, if it has not already been sent
mycnet.checkMailbox(world) -- get a list of requests from peers
mycnet.yield(world) -- process a request from the mailbox, if one exists
mycnet.restartServer()
mycnet.hashPeers(world, force) -- hash the list of peers. If 'force' is set, then rehash all peers; otherwise, hash only if the number of peers has changed
If directedMode is set to false, a request will be sent to the next peer in the peers table, in round-robin fashion.
If directedMode is set to true, then for any given request, the order in which it is sent will be defined by the comparison of some consistent hash (in this case sha256) of the signature of the first predicate with the hash of the entire serialized representation of the peer — we send to the peer with the most similar hash first, and send to all others in order of more or less decreasing similarity. This method of routing is similar to Chord’s DHT routing.
The rationale behind directedMode is that routing will (in the normal case) require many fewer hops and much less replication will be necessary: even though the routing is arbitrary, it represents a consistent ordering and is more resistant to split-brain problems in a non-fully-connected network than round robin routing in a similarly connected network outside of pathological cases. However, this comes at the cost of making sybil attacks on particular known requests easier: an attacker with the ability to set his own node name can guarantee control over that predicate for as long as he is on the network by ensuring he is always in control of the node name with the closest hash to the predicate (so if the attacker can precompute hash collisions — which may not be so difficult since for performance reasons on embedded hardware we aren’t using a cryptographic hash — he can avoid needing to control most of the network).
Because of the request ordering in directedMode, even if the node that already has the solution for a particular determinate query is not (or no longer) the best candidate in terms of hash similarity, it will as a result of responding to the request distribute the response to be memozied by the most-matching node in its peer list first (meaning that over time the routing improves).
Memoized results should eventually be put into a priority queue, and a redistribution protocol of memoized results is planned as follows: results for which the current node is the closest hash match should never be discarded by the gc even if they are very stale, and results at the bottom of the queue should be forwarded to the best-matching node that responds within timeout period before being discarded by the gc. A node that is going offline should send its entire ‘world’ to every node, in the order defined by directedMode. (Currently there is no distinction between memoized results and user-defined facts that have been distributed by a peer; however, since only facts are memoized, and facts are small, this may be sufficient.)
On NodeMCU (i.e., on an ESP8622), a device should determine whether or not an AP exists with some special name and connect to it if it does, but switch into AP mode and create it if it does not. This is not yet impemented.
Depending upon whether or not a node is in daemon mode (along with other circumstances), timeouts are changed in order to ensure reliability. Currently, the default timeout for daemon mode for server listen in 300 seconds and the current default timeout outside of daemon mode is one tenth of a second. Timeouts on connections to other peers during request forwarding are set to one tenth of a second. This is hard-coded but subject to future tuning.

Mycroft has a fairly conventional error reporting system (similar to that found in Python or Java) for runtime errors, and user code can throw and catch arbitrary errors. This is missing from Prolog, where NO often is overloaded to mean error.

Mycroft gets rid of Prolog’s awkward syntax for lists in favor of using parentheses and commas, and uses a couple built-in list manipulation predicates to slice and dice lists into other lists, append lists, or get values. (Of course, lists are also immutable, and these list manipulation functions double as conditionals).

Mycroft has a ‘paranoid’ and ‘non-paranoid’ mode. In non-paranoid mode, nodes can be added to or removed from the cluster, filesystem operations are possible, modules can be imported, the state of the world can be exported, functions written in Lua can be added or removed, and the interactive interpreter can be spawned. The paranoid mode is essentially sandboxed. One can start a cluster and then immediately set paranoid mode on all nodes other than the one running the interactive interpreter, if one wants to be safe. (Or, you can set paranoid mode on the interactive interpreter too, though that makes it difficult to perform some operations.)

I’m proud of Mycroft’s syntax for preserving everything I like about Prolog’s syntax and getting rid of everything I dislike:

Mycroft syntax

Mycroft has a prolog-like syntax, consisting of predicate definitions and queries.
A predicate definition is of the form:
det predicateName(Arg1, Arg2...) :- predicateBody.
where predicateBody consists of a list of references to other predicates and truth values.
A predicate that is marked ‘det’ is determinate — if it ever evaluates to a different value than it has cached, an error is thrown. An indeterminate predicate can be marked ‘nondet’ instead, meaning that its return value can change and so results from it will not be memoized.
A predicate body is of the form:
item1, item2... ; itemA, itemB...
where each item is either a reference to a predicate or a truth value. Sections separated by ‘;’ are ORed together, while items separated by ‘,’ are ANDed together.
A reference to a predicate is of the form predicateName(Arg1, Arg2…)
A truth value (or composite truth value, or CTV) is of the form
<Truth, Confidence>
where both Truth and Confidence are floating point values between 0 and 1. <X| is syntactic sugar for <X, 1.0>; |X> is syntactic sugar for <1.0, X>; YES is syntactic sugar for <1.0, 1.0>; NO is syntactic sugar for <0.0, 1.0>; and, NC is syntactic sugar for < X, 0.0> regardless of the value of X.
A query is of the form:
?- predicateBody.
The result of a query will be printed to standard output.
Comments begin with a hash mark:
# This is a "comment".
Variables begin with a capital letter, and are immutable:
?- set(X, "Hello"), print(X). # prints "Hello"
?- set(X, "Hello"), set(X, "Goodbye"). # fails
?- set(x, "Hello"). # also fails
Strings are surrounded by double quotes, however, words containing only letters, numbers, and underscores that do not begin with an uppercase letter will be evaluated as strings. Unbound variables will also evaluate as the string value of their names:
?- equal(hello, "hello"). # true
?- equal(X, "X"). # also true
For some simple examples, see the test suite.

More by A pig in a cage on antibiotics

Topics of interest

More Related Stories