While Vars ensure safe use of mutable mutable storage locations via thread isolation, transactional references (Refs) ensure safe shared use of mutable storage locations via a software transactional memory (STM) system. Refs are bound to a single storage location for their lifetime, and only allow mutation of that location to occur within a transaction.
Clojure transactions should be easy to understand if you've ever used database transactions - they ensure that all actions on Refs are atomic and isolated. Atomic means that every change to Refs made within a transaction occurs or none do. Isolated means that no transaction sees the effects of any other transaction while it is running. Another feature common to STMs is that, should a transaction have a conflict while running, it is automatically retried.
There are many ways to do STMs (locking/pessimistic, lock-free/optimistic and hybrids) and it is still a research problem. The Clojure STM is, I think, novel in its use of multiversion concurrency control with adaptive history queues for snapshot isolation, and providing a distinct commute operation.
In practice, this means:
All reads of Refs will see a consistent snapshot of the 'Ref world' as of the starting point of the transaction (its 'read point'). The transaction will see any changes it has made. This is called the in-transaction-value.
All changes made to Refs during a transaction (via
set,alterorcommute) will appear to occur at a single point in the 'Ref world' timeline (its 'write point').No changes will have been made by any other transactions to any Refs that have been
set/altered/ensuredby this transaction.Changes may have been made by other transactions to any Refs that have been
commuted by this transaction. That should be okay since the function applied bycommuteshould be commutative.Readers and commuters will never block writers, commuters, or other readers.
Writers will never block commuters, or readers.
I/O and other activities with side-effects should be avoided in transactions, since transactions will be retried.
If a constraint on the validity of a value of a Ref that is being changed depends upon the simultaneous value of a Ref that is not being changed, that second Ref can be protected from modification by calling
ensure. Refs 'ensured' this way will be protected (item #3), but don't change the world (item #2).The Clojure MVCC STM is designed to work with the persistent collections, and it is strongly recommended that you use the Clojure collections as the values of your Refs. Since all work done in an STM transaction is speculative, it is imperative that there be a low cost to making copies and modifications. Persistent collections have free copies (just use the original, it can't be changed), and 'modifications' share structure efficiently. In any case:
The values placed in Refs must be, or be considered, immutable!! Otherwise, Clojure can't help you.
(ref init-val)
Creates and returns a Ref with an initial value of init-val.
(dosync exprs*) - Macro
Runs the exprs (in an implicit do) in a transaction that encompasses exprs and any nested calls. Starts a transaction if none is already running on this thread. Any uncaught exception will abort the transaction and flow out of dosync. The exprs may be run more than once, but any effects on Refs will be atomic.
(deref ref)
@ref
Within a transaction, returns the in-transaction-value of ref, else returns the most-recently-committed value of ref.
(ensure ref)
Must be called in a transaction. Protects the ref from modification by other transactions. Returns the in-transaction-value of ref. Allows for more concurrency than (set ref @ref)
(alter ref fun args*)
Must be called in a transaction. Sets the in-transaction-value of ref to:
(apply fun in-transaction-value-of-ref args)
(ref-set ref val)
Must be called in a transaction. Sets the value of ref. Returns val.
(commute ref fun args*)
Must be called in a transaction. Sets the in-transaction-value of ref to:
(apply fun in-transaction-value-of-ref args)
At the commit point of the transaction, sets the value of ref to be:
(apply fun most-recently-committed-value-of-ref args)
Thus fun should be commutative, or, failing that, you must accept last-one-in-wins behavior. commute allows for more concurrency than set.
Here's a non-trivial example, a parallel map function. Producer threads and a consumer thread coordinate through a shared todo list and outstanding job count. There is no explicit locking of these, nor any lock object created to govern access to them. Instead, they are kept in refs and accessed in tranactions. Note the use of Java library threading, and blocking queue classes:
(import '(java.util.concurrent Executors LinkedBlockingQueue))
(defn my-pmap [f coll]
(let [nthreads (.. Runtime (getRuntime)
(availableProcessors))
exec (. Executors (newFixedThreadPool nthreads))
todo (ref (seq coll))
out (ref 0)
q (new LinkedBlockingQueue)
produce #(let [job (dosync
(when @todo
(let [item (first @todo)]
(alter todo rest)
(commute out inc)
(list item))))]
(when job
(. q (put (f (first job))))
(recur)))
tasks (doseq dnu (map #(. exec (submit %))
(replicate nthreads produce)))
consume (fn thisfn []
(if (dosync (and (or @todo (pos? @out))
(commute out dec)))
(fnseq (. q (take)) thisfn)
(do
(. exec (shutdown))
(doseq x tasks)
nil)))]
(consume)))
Now we create a compute-intensive function and time it with normal and parallel map (on a quad-proc box):
(defn ack [m n]
(cond
(zero? m) (inc n)
(zero? n) (recur (dec m) 1)
(pos? n) (recur (dec m) (ack m (dec n)))))
user=> (time (reduce + (map (partial ack 3) (replicate 1000 6))))
user=> "Elapsed time: 14939.482 msecs"
509000
user=> (time (reduce + (my-pmap (partial ack 3) (replicate 1000 6))))
"Elapsed time: 6138.683 msecs"
509000