Clojure

The Reader

Clojure is a homoiconic language, which is a fancy term describing the fact that Clojure programs are represented by Clojure data structures. This is a very important difference between Clojure (and Common Lisp) and most other programming languages - Clojure is defined in terms of the evaluation of data structures and not in terms of the syntax of character streams/files. It is quite common, and easy, for Clojure programs to manipulate, transform and produce other Clojure programs.

That said, most Clojure programs begin life as text files, and it is the task of the reader to parse the text and produce the data structure the compiler will see. This is not merely a phase of the compiler. The reader, and the Clojure data representations, have utility on their own in many of the same contexts one might use XML or JSON etc.

One might say the reader has syntax defined in terms of characters, and the Clojure language has syntax defined in terms of symbols, lists, vectors, maps etc. The reader is represented by the function read, which reads the next form (not character) from a stream, and returns the object represented by that form.

Since we have to start somewhere, this reference starts where evaluation starts, with the reader forms. This will inevitably entail talking about data structures whose descriptive details, and interpretation by the compiler, will follow.

Reader forms

Symbols

  • Symbols begin with a non-numeric character and can contain alphanumeric characters and *, +, !, -, _, ', ?, <, > and = (other characters may be allowed eventually).

  • '/' has special meaning, it can be used once in the middle of a symbol to separate the namespace from the name, e.g. my-namespace/foo. '/' by itself names the division function.

  • '.' has special meaning - it can be used one or more times in the middle of a symbol to designate a fully-qualified class name, e.g. java.util.BitSet, or in namespace names. Symbols beginning or ending with '.' are reserved by Clojure. Symbols containing / or . are said to be 'qualified'.

  • Symbols beginning or ending with ':' are reserved by Clojure. A symbol can contain one or more non-repeating ':'s.

Literals

  • Strings - Enclosed in "double quotes". May span multiple lines. Standard Java escape characters are supported.

  • Numbers - generally represented as per Java

    • Integers can be indefinitely long and will be read as Longs when in range and clojure.lang.BigInts otherwise. Integers with an N suffix are always read as BigInts. Octal notation is allowed with a 0 prefix, and hexadecimal notation is allowed with a 0x prefix. When possible, they can be specified in any base with radix from 2 to 36 (see Long.parseLong()); for example 2r101010, 052, 8r52, 0x2a, 36r16, and 42 are all the same Long.

    • Floating point numbers are read as Doubles; with M suffix they are read as BigDecimals.

    • Ratios are supported, e.g. 22/7.

  • Characters - preceded by a backslash: \c. \newline, \space, \tab, \formfeed, \backspace, and \return yield the corresponding characters. Unicode characters are represented with \uNNNN as in Java. Octals are represented with \oNNN.

  • nil Means 'nothing/no-value'- represents Java null and tests logical false

  • Booleans - true and false

  • Symbolic values - ##Inf, ##-Inf, and ##NaN

  • Keywords - Keywords are like symbols, except:

    • They can and must begin with a colon, e.g. :fred.

    • They cannot contain '.' in the name part, or name classes.

    • Like symbols, they can contain a namespace, :person/name, which may contain '.'s.

    • A keyword that begins with two colons is auto-resolved in the current namespace to a qualified keyword:

      • If the keyword is unqualified, the namespace will be the current namespace. In user, ::rect is read as :user/rect.

      • If the keyword is qualified, the namespace will be resolved using aliases in the current namespace. In a namespace where x is aliased to example, ::x/foo resolves to :example/foo.

Lists

Lists are zero or more forms enclosed in parentheses: (a b c)

Vectors

Vectors are zero or more forms enclosed in square brackets: [1 2 3]

Maps

  • Maps are zero or more key/value pairs enclosed in braces: {:a 1 :b 2}

  • Commas are considered whitespace, and can be used to organize the pairs: {:a 1, :b 2}

  • Keys and values can be any forms.

Map namespace syntax

Added in Clojure 1.9

Map literals can optionally specify a default namespace context for keys in the map using a #:ns prefix, where ns is the name of a namespace and the prefix precedes the opening brace { of the map. Additionally, #:: can be used to auto-resolve namespaces with the same semantics as auto-resolved keywords.

A map literal with namespace syntax is read with the following differences from a map without:

  • Keys

    • Keys that are keywords or symbols without a namespace are read with the default namespace.

    • Keys that are keywords or symbols with a namespace are not affected except for the special namespace _, which is removed during read. This allows for the specification of keywords or symbols without namespaces as keys in a map literal with namespace syntax.

    • Keys that are not symbols or keywords are not affected.

  • Values

    • Values are not affected.

    • Nested map literal keys are not affected.

For example, the following map literal with namespace syntax:

#:person{:first "Han"
         :last "Solo"
         :ship #:ship{:name "Millennium Falcon"
                      :model "YT-1300f light freighter"}}

is read as:

{:person/first "Han"
 :person/last "Solo"
 :person/ship {:ship/name "Millennium Falcon"
               :ship/model "YT-1300f light freighter"}}

Sets

Sets are zero or more forms enclosed in braces preceded by #: #{:a :b :c}

deftype, defrecord, and constructor calls (version 1.3 and later):

  • Calls to Java class, deftype, and defrecord constructors can be called using their fully qualified class name preceded by # and followed by a vector: #my.klass_or_type_or_record[:a :b :c]

  • The elements in the vector part are passed unevaluated to the relevant constructor. defrecord instances can also be created with a similar form that takes a map instead: #my.record{:a 1, :b 2}

  • The keyed values in the map are assigned unevaluated to the relevant fields in the defrecord. Any defrecord fields without corresponding entries in the literal map are assigned nil as their value. Any extra keyed values in the map literal are added to the resulting defrecord instance.

Macro characters

The behavior of the reader is driven by a combination of built-in constructs and an extension system called the read table. Entries in the read table provide mappings from certain characters, called macro characters, to specific reading behavior, called reader macros. Unless indicated otherwise, macro characters cannot be used in user symbols.

Quote (')

'form(quote form)

Character (\)

As per above, yields a character literal. Example character literals are: \a \b \c.

The following special character literals can be used for common characters: \newline, \space, \tab, \formfeed, \backspace, and \return.

Unicode support follows Java conventions with support corresponding to the underlying Java version. A Unicode literal is of the form \uNNNN, for example \u03A9 is the literal for Ω.

Comment (;)

Single-line comment, causes the reader to ignore everything from the semicolon to the end-of-line.

Deref (@)

@form ⇒ (deref form)

Metadata (^)

Metadata is a map associated with some kinds of objects: Symbols, Lists, Vector, Sets, Maps, tagged literals returning an IMeta, and record, type, and constructor calls. The metadata reader macro first reads the metadata and attaches it to the next form read (see with-meta to attach meta to an object):
^{:a 1 :b 2} [1 2 3] yields the vector [1 2 3] with a metadata map of {:a 1 :b 2}.

A shorthand version allows the metadata to be a simple symbol or string, in which case it is treated as a single entry map with a key of :tag and a value of the (resolved) symbol or string, e.g.:
^String x is the same as ^{:tag java.lang.String} x

Such tags can be used to convey type information to the compiler.

Another shorthand version allows the metadata to be a keyword, in which case it is treated as a single entry map with a key of the keyword and a value of true, e.g.:
^:dynamic x is the same as ^{:dynamic true} x

Metadata can be chained in which case they are merged from right to left.

Dispatch (#)

The dispatch macro causes the reader to use a reader macro from another table, indexed by the character following

  • #{} - see Sets above

  • Regex patterns (#"pattern")

    A regex pattern is read and compiled at read time. The resulting object is of type java.util.regex.Pattern. Regex strings do not follow the same escape character rules as strings. Specifically, backslashes in the pattern are treated as themselves (and do not need to be escaped with an additional backslash). For example, (re-pattern "\\s*\\d+") can be written more concisely as #"\s*\d+".

  • Var-quote (#')

    #'x(var x)

  • Anonymous function literal (#())

    #(…​)(fn [args] (…​))
    where args are determined by the presence of argument literals taking the form %, %n or %&. % is a synonym for %1, %n designates the nth arg (1-based), and %& designates a rest arg. This is not a replacement for fn - idiomatic use would be for very short one-off mapping/filter fns and the like. #() forms cannot be nested.

  • Ignore next form (#_)

    The form following #_ is completely skipped by the reader. (This is a more complete removal than the comment macro which yields nil).

Syntax-quote (`, note, the "backquote" character), Unquote (~) and Unquote-splicing (~@)

For all forms other than Symbols, Lists, Vectors, Sets and Maps, `x is the same as 'x.

For Symbols, syntax-quote resolves the symbol in the current context, yielding a fully-qualified symbol (i.e. namespace/name or fully.qualified.Classname). If a symbol is non-namespace-qualified and ends with '#', it is resolved to a generated symbol with the same name to which '_' and a unique id have been appended. e.g. x# will resolve to x_123. All references to that symbol within a syntax-quoted expression resolve to the same generated symbol.

For Lists/Vectors/Sets/Maps, syntax-quote establishes a template of the corresponding data structure. Within the template, unqualified forms behave as if recursively syntax-quoted, but forms can be exempted from such recursive quoting by qualifying them with unquote or unquote-splicing, in which case they will be treated as expressions and be replaced in the template by their value, or sequence of values, respectively.

For example:

user=> (def x 5)
user=> (def lst '(a b c))
user=> `(fred x ~x lst ~@lst 7 8 :nine)
(user/fred user/x 5 user/lst a b c 7 8 :nine)

The read table is currently not accessible to user programs.

extensible data notation (edn)

Clojure’s reader supports a superset of extensible data notation (edn). The edn specification is under active development, and complements this document by defining a subset of Clojure data syntax in a language-neutral way.

Tagged Literals

Tagged literals are Clojure’s implementation of edn tagged elements.

When Clojure starts, it searches for files named data_readers.clj or data_readers.cljc at the root of the classpath. Each such file must contain a Clojure map of symbols, like this:

{foo/bar my.project.foo/bar
 foo/baz my.project/baz}

The key in each pair is a tag that will be recognized by the Clojure reader. The value in the pair is the fully-qualified name of a Var which will be invoked by the reader to parse the form following the tag. For example, given the data_readers.clj file above, the Clojure reader would parse this form:

#foo/bar [1 2 3]

by invoking the Var #'my.project.foo/bar on the vector [1 2 3]. The data reader function is invoked on the form AFTER it has been read as a normal Clojure data structure by the reader. For your own data reader functions, you should report errors by throwing instances of RuntimeException with messages providing error information.

Reader tags without namespace qualifiers are reserved for Clojure. Default reader tags are defined in default-data-readers but may be overridden in data_readers.clj / data_readers.cljc or by rebinding *data-readers*. If no data reader is found for a tag, the function bound in *default-data-reader-fn* will be invoked with the tag and value to produce a value. If *default-data-reader-fn* is nil (the default), a RuntimeException will be thrown.

If a data_readers.cljc is provided, it is read with the same semantics as any other cljc source file with reader conditionals.

Built-in tagged literals

Clojure 1.4 introduced the instant and UUID tagged literals. Instants have the format #inst "yyyy-mm-ddThh:mm:ss.fff+hh:mm". NOTE: Some of the elements of this format are optional. See the code for details. The default reader will parse the supplied string into a java.util.Date by default. For example:

(def instant #inst "2018-03-28T10:48:00.000")
(= java.util.Date (class instant))
;=> true

Since *data-readers* is a dynamic var that can be bound, you can replace the default reader with a different one. For example, clojure.instant/read-instant-calendar will parse the literal into a java.util.Calendar, while clojure.instant/read-instant-timestamp will parse it into a java.util.Timestamp:

(binding [*data-readers* {'inst read-instant-calendar}]
  (= java.util.Calendar (class (read-string (pr-str instant)))))
;=> true

(binding [*data-readers* {'inst read-instant-timestamp}]
  (= java.util.Timestamp (class (read-string (pr-str instant)))))
;=> true

The #uuid tagged literal will be parsed into a java.util.UUID:

(= java.util.UUID (class (read-string "#uuid \"3b8a31ed-fd89-4f1b-a00f-42e3d60cf5ce\"")))
;=> true

Default data reader function

If no data reader is found when reading a tagged literal, the *default-data-reader-fn* is invoked. You can set your own default data reader function and the provided tagged-literal function can be used to build an object that can store an unhandled literal. The object returned by tagged-literal supports keyword lookup of the :tag and :form:

(set! *default-data-reader-fn* tagged-literal)

;; read #object as a generic TaggedLiteral object
(def x #object[clojure.lang.Namespace 0x23bff419 "user"])

[(:tag x) (:form x)]
;=> [object [clojure.lang.Namespace 599782425 "user"]]

Reader Conditionals

Clojure 1.7 introduced a new extension (.cljc) for portable files that can be loaded by multiple Clojure platforms. The primary mechanism for managing platform-specific code is to isolate that code into a minimal set of namespaces, and then provide platform-specific versions (.clj/.class or .cljs) of those namespaces.

In cases where is not feasible to isolate the varying parts of the code, or where the code is mostly portable with only small platform-specific parts, 1.7 also introduced reader conditionals, which are supported only in cljc files and at the default REPL. Reader conditionals should be used sparingly and only when necessary.

Reader conditionals are a new reader dispatch form starting with #? or #?@. Both consist of a series of alternating features and expressions, similar to cond. Every Clojure platform has a well-known "platform feature" - :clj, :cljs, :cljr. Each condition in a reader conditional is checked in order until a feature matching the platform feature is found. The reader conditional will read and return that feature’s expression. The expression on each non-selected branch will be read but skipped. A well-known :default feature will always match and can be used to provide a default. If no branches match, no form will be read (as if no reader conditional expression was present).

Implementors of non-official Clojure platforms should use a qualified keyword for their platform feature to avoid name collisions. Unqualified platform features are reserved for official platforms.

The following example will read as Double/NaN in Clojure, js/NaN in ClojureScript, and nil in any other platform:

#?(:clj     Double/NaN
   :cljs    js/NaN
   :default nil)

The syntax for #?@ is exactly the same but the expression is expected to return a collection that can be spliced into the surrounding context, similar to unquote-splicing in syntax quote. Use of reader conditional splicing at the top level is not supported and will throw an exception. An example:

[1 2 #?@(:clj [3 4] :cljs [5 6])]
;; in clj =>        [1 2 3 4]
;; in cljs =>       [1 2 5 6]
;; anywhere else => [1 2]

The read and read-string functions optionally take a map of options as a first argument. The current feature set and reader conditional behavior can be set in the options map with these keys and values:

  :read-cond - :allow to process reader conditionals, or
               :preserve to keep all branches
  :features - persistent set of feature keywords that are active

An example of how to test ClojureScript reader conditionals from Clojure:

(read-string
  {:read-cond :allow
   :features #{:cljs}}
  "#?(:cljs :works! :default :boo)")
;; :works!

However, note that the Clojure reader will always inject the platform feature :clj as well. For platform-agnostic reading, see tools.reader.

If the reader is invoked with {:read-cond :preserve}, the reader conditional and non-executed branches will be preserved, as data, in the returned form. The reader-conditional will be returned as a type that supports keyword retrieval for keys with :form and a :splicing? flag. Read but skipped tagged literals will be returned as a type that supports keyword retrieval for keys with :form and :tag keys.

(read-string
  {:read-cond :preserve}
  "[1 2 #?@(:clj [3 4] :cljs [5 6])]")
;; [1 2 #?@(:clj [3 4] :cljs [5 6])]

The following functions can also be used as predicates or constructors for these types:
reader-conditional? reader-conditional tagged-literal? tagged-literal