Friday, 13 March 2015

How to implement transactions for a triple store in Prolog

For the current research project I'm working on, we're using and RDF triple-store and need to be able to store a large numbers of triples, and assure that after any update a number of constraints are satisfied on the structure of the triples according to our ontology.

In order to implement inserts, updates and deletes, they need to be packaged up inside of a transaction that only commits if the constraints are satisfied in the final image of the triple store.  This means we need a way to do a post-condition driven transaction.

One easy way to implement this is to simply copy the entire database, run the insert/update/delete statements and see if we satisfy the transaction.  If we are changing enormous numbers of triples, it might indeed be the fastest and most reasonable approach.   If we are only changing one triple however, this seems a totally mad procedure.

There is another approach and I'll describe it as follows.

Let's assume that our constraint predicate 'constraint/0' is already defined.   What we want to do is keep two alternative layers over the actual triple store, one negative, and one positive.  We then carry out something along the lines of the painters algorithm.  How this works will become more obvious as we write our predicates.  If the constraint C is satisfied on the final version, we then "commit" by deleting everything from the negative graph, and insert everything from the positive graph.

 :- module(transactionGraph,[rdf1/4, insert/4, update/5, delete/4]).  
 :- use_module(library(semweb/rdf_db)).  
 % How to implement post-condition transactions  
 neg(A1,A2) :- atom_concat('neg-', A1, A2).   
 pos(A1,A2) :- atom_concat('pos-', A1, A2).  

What we are going to do is name two new graphs that are derived from our current graphs.  To do this properly we'd want to use a gensym/uuid (which most prologs provide).

 :- rdf_meta xrdf(r,r,t,?).
xrdf(X,Y,Z,G) :- (pos(G,GP), rdf_db:rdf(X,Y,Z,GP)    % If in positive graph, return results
                  *-> true                           % 
    ; (neg(G,GN), rdf_db:rdf(X,Y,Z,GN) % If it's not negative
              *-> false                       % If it *is* negative, fail
       ; rdf_db:rdf(X,Y,Z,G))).        % otherwise bind the rdf result.

This is the meat of the idea.   Our rdf triple store keeps triples in the predicate rdf/4, along with the name of the graph.  We are going to mimic this predicate with some additional logic.  You can safely ignore the 'rdf_meta' clause as it's nothing to do with the logic.

What we want is to check if our triple is in the positive graph.   If not we check to see if it is in the negative graph - negation will be seen as failure.  Notice, if it's in the negative graph we perform a cut using the ! symbol.  This means that we are not going to search any other clauses in rdf1/4 and this keeps us from using the underlying database.

 % you can safely ignore rdf_meta for understanding this programme  
 % it only affects namespace prefix handling.
:- rdf_meta insert(r,r,t,?).
insert(X,Y,Z,G) :- 
    pos(G,G2),
    % positive pos graph
    rdf_assert(X,Y,Z,G2), 
    % retract from the negative graph, if it exists.
    (neg(G,G3), rdf_db:rdf(X,Y,Z,G3), rdf_retractall(X,Y,Z,G3) ; true).

Here we have our insert statement.  What we do is assert ourselves into the positive graph and make sure we retract any deletes which may have gone into the negative graph.


:- rdf_meta delete(r,r,t,?).
delete(X,Y,Z,G) :- 
    pos(G,G2), % delete from pos graph
    rdf_db:rdf(X,Y,Z,G2),
    rdf_retractall(X,Y,Z,G2), 
    false.
delete(X,Y,Z,G) :- % assert negative
    rdf_db:rdf(X,Y,Z,G),
    neg(G, G2),
    % write('Asserting negative fact: '),
    % write([X,Y,Z,G2]),nl,
    rdf_assert(X,Y,Z,G2).

Here we have the delete statement, which is a little unusual.  First, we delete from the positive graph if there is anything there.  We then fail, after side-effecting the database, because we have to do something further, and we want to run another clause.

We then check to see if we are in the underlying graph, and if we are, we need to insert a negative statement to ensure that the triple from the underlying database will not be visible.

 new_triple(_,Y,Z,subject(X2),X2,Y,Z).  
 new_triple(X,_,Z,predicate(Y2),X,Y2,Z).  
 new_triple(X,Y,_,object(Z2),X,Y,Z2).
:- rdf_meta update(r,r,o,o,?).
update(X,Y,Z,G,_) :-  
    rdf_db:rdf(X,Y,Z,G), 
    neg(G,G3),
    rdf_assert(X,Y,Z,G3), fail. % delete previous subject if it exists and try update
update(X,Y,Z,G,Action) :- 
    pos(G,G2),
    (rdf_db:rdf(X,Y,Z,G2) -> rdf_update(X,Y,Z,G2,Action) % exists in pos graph 
     ; new_triple(X,Y,Z,Action,X2,Y2,Z2),
       rdf_assert(X2,Y2,Z2,G2)). % doesn't yet exist in pos graph (insert). 

Finally, we get to the update.  Updates take an action to define which part of the triple they want to impact.  You can see the function symbol in new_triple/7.  This predicate has to remove the old value from the underlying graph by asserting a negative - it then 'fails', similarly to the delete statement in order to see if we need to update the triple in the positive graph.  If it is not in the positive graph, we need to insert it there as the old value has been negated from the underlying graph.

Now we can query rdf1/4 and we will get behaviour that mimics the actual inserts and updates happening on the underlying store.  We can run our constraints on this rdf1 instead of rdf.  If it succeeds we bulk delete from the negative graph and bulk insert from the positive graph.

Now, if you're clever you might see how this trick could be used to implement nested transactions with post conditions!

No comments:

Post a Comment