Tuplanolla Sampsa Kiiskinen

2014

I am designing a programming language for education and entertainment. Currently that means my own entertainment and the education of no one.

Note: this document is a collection of notes for myself. It merely outlines the original plan with some rambling. Some things have already changed (like <- being renamed to =).

Eigenlanguage

It scales without deforming!
— A bewildered bystander

Eigenlanguage is a programming language prototype. It is a purely functional, dynamically (although maybe eventually with optionally pluggable static types) typed and lazily evaluated one. Its features are a mix of the most distinct features of Haskell, Scheme and J. The goal is to make it as minimal as possible and focus on pedagogical aspects. Symmetries similar to those in physics will also be emphasized heavily. It is by no means a serious project, but might one day turn out to be.

I also secretly want to write a book about building a complex system, like this programming language right here. It is not usually documented what the underlying thought process and key decisions of a large project were and how they shaped the outcome.

Table of Contents

1   The Gist of It
2   Examples
3   Something Is Missing
4   Literals
5   Static Types
6   Modules
7   Representation
8   Package Management
9   Doctrine of Features
10   Progress
11   Footnotes
12   Source Code
13   Summary
14   Notes

1.1   Syntax

The idea is that programs are applications of single parameter functions.

a (b c) d

The code above applies b to c and then a to their result and d. In addition to function application, there are three special forms. Here are some outdated points about them.
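In Python terms, juxtaposition associates to the left. A small sketch (nothing like the actual parser) of how a (b c) d pairs up into nested applications:

```python
# Sketch: juxtaposition pairs up leftward, so "a (b c) d"
# becomes (("a", ("b", "c")), "d").

def parse(tokens):
    def expr(i):
        terms = []
        while i < len(tokens) and tokens[i] != ")":
            if tokens[i] == "(":
                sub, i = expr(i + 1)
                terms.append(sub)
            else:
                terms.append(tokens[i])
                i += 1
        node = terms[0]
        for term in terms[1:]:  # left fold over juxtaposed terms
            node = (node, term)
        return node, i + 1  # skip ")" (harmlessly past the end at top level)

    return expr(0)[0]

print(parse(list("a(bc)d")))  # (('a', ('b', 'c')), 'd')
```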

The first form is equivalence definition or binding.

<- (a b
    c d
    ...
    e f)
   g

The code above reads “let a = b, c = d, … and e = f in g”. It allows b to depend on c and d to depend on a simultaneously. It also makes top level definitions look nice: files start with <- and contain nothing but name-definition pairs.

<- (
a b
c d
) ()

I hope it does not cause problems for people with dyslexia. Accessibility is important.
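A sketch of how such simultaneous bindings can work: every right-hand side is delayed, so the definitions may refer to one another in any order. This is an illustration, not the prototype's implementation.

```python
# Sketch: simultaneous, mutually recursive bindings via thunks.

def let(make_bindings, body):
    """make_bindings(ref) returns a dict of name -> thunk;
    ref(name) forces a thunk at most once."""
    cache = {}
    defs = {}

    def ref(name):
        if name not in cache:
            cache[name] = defs[name]()
        return cache[name]

    defs.update(make_bindings(ref))
    return body(ref)

# "let even = ..., odd = ... in even 10"; the definitions refer to
# each other, and the delay makes their order irrelevant.
result = let(
    lambda ref: {
        "even": lambda: lambda n: n == 0 or ref("odd")(n - 1),
        "odd":  lambda: lambda n: n != 0 and ref("even")(n - 1),
    },
    lambda ref: ref("even")(10),
)
print(result)  # True
```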

The second form is function definition.

-> (a b ... c)
   d

The code above reads “a function from a, b, … and c to d”. It is essential for writing programs.
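Since functions only have one parameter, the multi-parameter form can desugar into nesting: -> (a b c) d is -> a (-> b (-> c d)). A Python sketch of the idea:

```python
# Sketch: a three-parameter function as nested single-parameter ones.

def curry3(f):
    return lambda a: lambda b: lambda c: f(a, b, c)

add3 = curry3(lambda a, b, c: a + b + c)

# Application is then plain juxtaposition, one argument at a time,
# and partial application falls out for free.
print(add3(1)(2)(3))   # 6
partial = add3(1)(2)
print(partial(10))     # 13
```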

The third form is data definition.

[a b (c d) ... e]

The code above reads “a list consisting of a, b, c d, … and e”. It makes treating code as data and data as code possible, but not as convenient as in Scheme. There is a problem with the Schemely quote-unquote mechanism using ` and its inverse ,; it is explained in the footnote section. The direct way to express data requires knowing its size beforehand, because functions only have one parameter. In Scheme terms:

(list a b (c d) ... e)
(cons a (cons b (cons (c d) ... (cons e ()) ... )))

One more special not-quite-form is (). It is an instance of a type with only one value: nothing. The form can be written as   or as (), but it is distinct from the empty list []!
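A sketch of the list form in terms of pairs, mirroring the Scheme expansion above (the Python names are made up):

```python
# Sketch: [a b c] as cons(a, cons(b, cons(c, ()))).

NOTHING = ()  # stands in for the unit value; a real system would
              # keep it distinct from the empty list

def cons(head, tail):
    return (head, tail)

def from_list(items):
    out = NOTHING
    for item in reversed(items):
        out = cons(item, out)
    return out

print(from_list(["a", "b", "c"]))  # ('a', ('b', ('c', ())))
```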

One or two more forms are needed for first class modules.

Some primitives and derived definitions are needed in addition to special forms.

always - * + if < evaluate ...

The symbols above are used in the example section.

There are also comments and literals for convenience.

% This is a (free form) line comment.
%% This is
   a (free form) block comment. :) %%
%This_is_a_(syntactical)_comment.
%(This is
  another (syntactical) comment.)
2 % This is a literal integer.
'3' % This is a literal character.
"four" % This is a literal string.

Their syntax has not been considered much yet, but maybe # is a bad character since it clashes with CPP.

1.2   Execution

The plan is to compute all pure terms at compile-time and only compute directly used values lazily at run-time, so that the resulting repeatable work is minimized.

Ideally compilation should always terminate, but encoding such a totality restriction without an explicit static type system might not be viable.

Pluggable optional (and ideally dependent) static types sound interesting, as described in a related article by Gilad Bracha.

For example, C++ has constexpr, while Coq provides totality at the cost of a complicated type system.
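The run-time half of that plan is ordinary call-by-need. A Python sketch of a memoizing thunk, which does the repeatable work at most once and only on demand:

```python
# Sketch: a thunk that computes its value at most once.

class Thunk:
    def __init__(self, compute):
        self.compute = compute
        self.done = False

    def force(self):
        if not self.done:
            self.value = self.compute()  # repeatable work happens once
            self.done = True
            self.compute = None          # drop the closure afterwards
        return self.value

calls = []
t = Thunk(lambda: calls.append("work") or 42)
t.force()
t.force()
print(t.force(), len(calls))  # 42 1
```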

2   Examples

All of these examples evaluate to 13. They worked in the prototype that was slapped together in two days without prior experience working with parsers, interpreters or compilers. Maybe that is a good thing considering the design goals.

2.1   Factorials

<- f (-> n
         (if (< n 2)
             n
             (* n (f (- n 1)))))
   (- (f 4) 11)

2.2   Fibonacci Numbers

<- f (-> n
         (if (< n 2)
             n
             (+ (f (- n 1))
                (f (- n 2)))))
   (f 7)

2.3   Data as Code

+ (evaluate [+ 2 (always 3 4)]) 7

2.4   Laziness In Action

if (< 2 3)
   13
   (<- f (-> x (f x)) (f f))
always 13 (<- f (-> x (f x)) (f f))
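Why these terminate can be sketched in Python by passing the dangerous branch as a thunk; the unused branch is simply never forced:

```python
# Sketch: lazy branches as thunks; the losing branch is never entered.

def if_lazy(condition, then_thunk, else_thunk):
    return then_thunk() if condition else else_thunk()

def loop():
    return loop()  # (<- f (-> x (f x)) (f f)) in spirit: never returns

print(if_lazy(2 < 3, lambda: 13, loop))  # 13
print((lambda x, _thunk: x)(13, loop))   # "always" never touches its second argument
```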

3   Something Is Missing

Something that is usually shown to demonstrate a language has not been shown yet.

<- main
   (-> io
       (<- io' (print "Hello," io)
           (print "world!" io')))
<- main
   (-> io
       (print "world!"
              (print "Hello,"
                     io)))
<- main
   (o (print "world!") (print "Hello,"))
<- main
   ((flip o) (print "Hello,") (print "world!"))
<- main
   (flip (fold print) ["Hello," "world!"])

A uniqueness type is passed to main or any other effectful computation by the runtime. It is not even awkward with the right higher order functions.
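A Python sketch of the token threading (print_line and the integer tokens are made up; a real system would enforce uniqueness in the type system):

```python
# Sketch: every effectful function consumes one token and returns the
# next, which forces a single sequential order on the effects.

def print_line(text, io):
    print(text)
    return io + 1  # hand back a fresh token

def main(io):
    io2 = print_line("Hello,", io)
    return print_line("world!", io2)

main(0)

# With the right higher order functions the plumbing disappears,
# as in the later variants above.
def compose(g, f):
    return lambda io: g(f(io))

main2 = compose(lambda io: print_line("world!", io),
                lambda io: print_line("Hello,", io))
main2(0)
```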

4   Literals

There are some syntactic extensions for convenience. Characters are like 'e' or 'newline'; strings are like "example" and may span multiple lines unless spliced together with trailing \s; rational complex numbers are like -13.d2^9_16+20/7^3i (being negative 19.82 billion in base 16 plus imaginary 2.857 thousand in base 10). How to come up with sane syntax for them? That is a difficult question with no right answer.
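For what it is worth, the example literal can be checked with exact rationals. This Python sketch reproduces the number that reappears in the footnote about numeric types:

```python
# Sketch: the value of -13.d2^9_16+20/7^3i, computed exactly.
from fractions import Fraction

real = -Fraction(0x13D2, 16**2) * 16**9  # digits 13.d2 in base 16, shifted by 9
imag = Fraction(20, 7) * 10**3           # 20/7 in base 10, shifted by 3

print(real)  # -1362041503744
print(imag)  # 20000/7
```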

4.1   Reserved Things

Certain characters or patterns in names are reserved for special uses. Some reservations are

  • ( and ) for groups,
  • [ and ] for data,
  • / for name qualification,
  • leading # for compiler or interpreter symbols (like #-2 for a previous result),
  • leading " for strings,
  • leading ' for characters,
  • possibly leading + and - with 0-9 (and their friends) for numbers,
  • leading % for comments,
  • certain Unicode things for sanity (say hi to U+0000 NULL and our friends from the U+2000-space like ZERO WIDTH JOINER and RIGHT-TO-LEFT MARK)

and for future use

  • {} for some purpose,
  • leading ! for strictness control,
  • leading , and ` for data as code stuff and
  • leading anything for occasional whims,
  • FULLY CAPITALIZED THINGS FOR SOME REASON (PERHAPS TYPE ANNOTATIONS).

Notably x'+n! is fine, as is a#4%m.

5   Static Types

It should be possible to bolt on a type inference system without making it mandatory. Types could be specified with : expression TYPE or such. The convention could be to write types in capital letters. However, restricting symbol names to lowercase is not a good idea, because writing mathematical expressions or such would be suffering.

6   Modules

This can of worms has no right answer either. There is the question whether to import modules statically or dynamically and whether to use constant or parametrized modules. One thing is clear though: modules require a special form to bind names.

6.1   Qualified Names

It should obviously be possible to import (name alias) to rename something from name to alias/something.

6.2   Static and Dynamic Imports

If importing is static, it is impossible to load new modules without closing the program. That is especially unfortunate for the interactive interpreter that should be omnipotent, but improves the performance of compiled modules a lot. If importing is dynamic, there are still a few options for how it can be done. One could have

  • import name context as a pure computation that reads symbols from name and makes them available in context or terminates the runtime if the module is not found;
  • import name consequent alternative as a similar pure computation, but this time failures are handled by alternative or
  • the very same thing as an effectful computation.

The first option is not useful at all as the dynamic nature is thrown away. The second option does not seem reasonable, because importing modules is essentially an effectful computation, so modules can be installed and uninstalled while the program is running. The third option is the semantically correct one, but very cumbersome. Especially problematic is that upon compiling a module with dynamic loading, it is impossible to resolve dependencies and compile them.

Contrast these.

(import stuff % This is static and pure; stuff is compiled first.
 main
 (-> io
     (stuff/display "Success!" io)))
main % This is dynamic and pure (and probably wrong).
(-> io
    (import-now stuff
     (stuff/display "Success!" io)
     (put-line "Failure!" io)))
main % This is dynamic and impure; it fails if stuff has not been compiled.
(-> io
    (import-with io stuff % This has to be a special form.
     (-> io' (stuff/display "Success!" io'))
     (-> io' (put-line "Failure!" io'))))

main % This is the same thing in better form.
(-> io
    (import-with io stuff
     (stuff/display "Success!")
     (put-line "Failure!")))

6.3   Import Parametrization

Standard ML has module functors that basically allow one to give modules to modules as they are imported. Now, can that be generalized? Could modules be lambda terms? The answer is yes, but — always but — only dynamic imports could make use of such a system. Static imports have to appear at top level, so they only have compile-time constants and other modules available to them and therefore have to work like Standard ML’s functors. How curious.

6.4   Dynamic Exports

Modules need to be defined in places where they can be found. A balance between restricting definitions and searching for them has to be established. Say that we define the stuff module in a file called not-stuff.eigen.

(export stuff
 display
 (-> s (put-line (string-join "*** " (string-join s " ***")))))

How do we know where to look when we encounter import stuff now? If we looked in every source file, we could encounter malformed or incomplete ones and choke on them. It is a better idea to follow a naming convention and only look for stuff.eigen in the standard search paths.

6.5   The Half-Conclusion

It is best to have both static and dynamic imports with static exports.

6.6   Source Files

What should files contain?

  • If a value, evaluating it does nothing, which is useless.
  • If a function, it has to be main, so it can be compiled, but nothing else.
  • If a bunch of functions with names, they can be built together, but get lost without a namespace.
  • If a module, then everything is dandy.

Modules.

6.7   Modular Plans

A module with no imports.

% m.eigen
module m
y1 x1
y2 x2
...
yn xn

A parametrized module with some imports and some aliased imports.

% m.eigen
module (m p1 p2 ... pn)
(i1 (i2 a2) i3 (i4 a4) i5 ... (in an))
y1 x1 % You could use p2 in x1, which is now t/f, which is [1 2].
y2 x2 % You could also use p1, which is now w, which is 3.
...
yn xn

% does not matter _.eigen
module _
(t (m w t/f)) % Both w and _/w refer to 3 here.
w 3
% meanwhile in t.eigen
module t
f [1 2]

An alternative SMLish parametrization where parameters are in fact modules.

% m.eigen
module (m p1 p2 ... pn)
(i1 (i2 a2) i3 (i4 a4) i5 ... (in an))
y1 x1 % This time you could use p2/f in x1, which is now t/f.
y2 x2 % You still have p1, which is w, which is 3.
...
yn xn

% does not matter _.eigen
module _
(t (m w t))
w 3

What a mess. Build the rest yourself. Here are the combinations.

Module Parameters E-Aliases Exports Imports Arguments I-Aliases Bindings
x             x
x       x     x
x       x   x x
x       x x   x
x       x x x x
x x           x
x x     x     x
x x     x   x x
x x     x x   x
x x     x x x x
x     x       x
x     x x     x
x     x x   x x
x     x x x   x
x     x x x x x
x x   x       x
x x   x x     x
x x   x x   x x
x x   x x x   x
x x   x x x x x
x   x x       x
x   x x x     x
x   x x x   x x
x   x x x x   x
x   x x x x x x
x x x x       x
x x x x x     x
x x x x x   x x
x x x x x x   x
x x x x x x x x

Notice how a new special form m/x that works like hash-table-get m `x pops up.

6.8   More Modular Plans

That was just warmup. Dynamic imports are where the fun begins.

6.9   Syntax

Modules are so flexible that it is difficult to build a well-behaving special form for them. If they are to play nicely with other special forms, they must have a fixed arity.

An informal and annotated example follows.

module (m p p') (
export ((alias ea e)
        e')
import ((alias ia (i a a'))
        (alias ia' i')
        (i'' a'' a''')
        i''')
y x
y' x'
)

Questions follow.

  • Without import, nothing is imported?
  • Without export, nothing is exported?
  • Should keywords be replaced by special character spaghetti?
  • Should alias order be swapped?
  • Should brackets be used instead of parentheses?
  • Could export lists obey set operations?
  • Could spurious parentheses be trashed?
  • Should y be usable in a?
  • How about m/y if y is exported as y?
  • Could you feed ia' to i as a?

6.9.1   Bad Idea

The only useful set operation might be negation,

% both export [a c e] of [a b c d e f]
-> (a c e)
-> (! b d f)

but allowing such a thing and keeping it symmetric imposes some serious requirements on imports.

% imports every library in the system
<- (!)

6.9.2   Good Idea

Here is one approach to formalizing the previous idea.

It is at least possible to reuse existing forms in module context to construct a wonderful symmetry and avoid reserving any more symbols.

Here the arity of <-> is 2 and the arities of ->, <- and = are 1. Notably all of them have singleton and list forms as their first parameters and have one less parameter than the equivalent ordinary forms.

<-> (m p p') (
-> ((= (ea e))
    e')
<- ((= (ia (i a a')
        ia' i'))
    (i'' a'' a''')
    i''')
y x
y' x'
)

Now by replacing list forms with actual list syntax, other forms become -> [x y] (+ x y) and = [x 2 y 3] (+ x y) and…

<-> (m p p') [
-> [(= [ea e])
    e']
<- [(= [ia (i a a')
        ia' i'])
    (i'' a'' a''')
    i''']
y x
y' x'
]

The full power of modules with parametrization, aliases, bindings and everything — it is all there.

6.10   Looking for Patterns

The general module form looks like

<-> (m p_1 ...) {
-> [e_1
    (= [b_2 e_2
        b_3 e_3
        ... ...]
     ...)
    ...]
<- [i_1
    (i_2 a_2_1 ...)
    (= [c_3 i_3
        c_4 (i_4 a_4_1 ...)
        ... ...]
     ...)
    ...]
y_1 x_1
... ...
}

where { and } denote a pair list, but

<-> (m [p_1 ...]) {
-> [e_1
    (= {b_2 e_2
        ...

is another option. That leads to the idea of pseudonymous functions

-> x (+ x 2)
-> [x y] (+ x y)
-> (f x) (+ (f x 2) 3)
-> (f [x y]) (+ (f x 2) y)

and in turn into anonymous modules.

<-> [p_1 ...] {
-> [e_1
    (= {b_2 e_2
        ...

What?

7   Representation

There are things like line numbers, indentation, comments and colors that are typically stripped from the source code and not put into the syntax tree. That is a shame for tools that need to process such details and requires implementing a separate parser for the purpose.

I want to do things differently and attempt to include tags into the syntax tree. It might turn out horribly wrong, but I suspect it is the most reasonable plan regarding the entire ecosystem.

Basically there is a two-tier hierarchy in the syntax representation.

data Expression = ESymbol Name
                | EPair Expression Expression -- Function application and data.
                | ENothing
                -- ...
                | ETag Tag Expression

data Tag = TRecursive Direction
         -- ...
         | TIndentation Integer
         | TColor Integer

8   Package Management

Programming language implementations typically come with a package manager. In the cases of GHC and CHICKEN they are actually not-package-managers, because managing packages is a satisfiability problem in disguise.

Cabal for Haskell has version constraints, but offers no way to find the optimal ones. If every package had a list of interface changes for each version, the not-package-manager could

  • scan a project,
  • list all the uses of certain parts of an interface and
  • find the oldest version of the package that shares compatible (untouched) parts of the interface.

That would solve the forwards compatibility problem. The backwards compatibility problem could also be partially solved with this approach, but only up to the newest released version. One cannot predict the future and would still have to rely on strict semantic versioning and good faith for the rest.
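A sketch of that search, with made-up version data: walk backwards from the newest version and stop at the first step that touched a name the project uses.

```python
# Sketch: find the oldest version whose view of the used interface
# names matches the newest version's. All names here are hypothetical.

def oldest_compatible(versions, changed_in, used):
    """versions oldest to newest; changed_in[v] = names whose
    interface changed when v was released."""
    oldest = versions[-1]
    for i in range(len(versions) - 2, -1, -1):
        # stepping one version older is safe only if the release in
        # between did not touch anything the project uses
        if changed_in[versions[i + 1]] & used:
            break
        oldest = versions[i]
    return oldest

versions = ["1.0", "1.1", "1.2", "1.3"]
changed_in = {"1.0": set(), "1.1": {"sort"}, "1.2": {"log"}, "1.3": set()}
print(oldest_compatible(versions, changed_in, {"sort"}))  # 1.1
```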

9   Doctrine of Features

Implicit conversions make reasoning about programs difficult, so they shall not be built into the language. That is a feature.

Unsafe operations shall be prefixed with fuck or something more inappropriate, so that using them will get you fired. That is also a feature.

The only valid use case of floating point numbers is deriving approximations in high performance computing, so support for them shall not be in the core, but in a special purpose module instead. That, feature.

Building macro systems that make code unreadable or translation units difficult to manage shall also be impossible. Feature.

Security shall not be an afterthought, but performance can be. It is fine to sacrifice speed for correctness. Proper sandboxing shall also be built into the core language. Special purpose languages are more useful than languages without a purpose.

Convention shall not come in the way of progress. If fixing a broken design decision requires breaking every program in existence, so be it.

Comments shall be a part of the parse tree, so that pretty printers and other lexical analyzers can preserve them.

The toolchain shall be built so that everything one might want to do can be done through the interpreter and without an internet connection. That includes reading and searching the documentation, controlling installed and loaded modules and issuing shell commands.

The name of the language and its tools shall be chosen so that searching for them on the Internet is easy. Names that are terrible include C, Scheme and Axiom.

10   Progress

Done means “kind of works in the black box sense, but crashes on error”.

10.1   By Component

These names are carefully picked so that they have different initials.

Not Done Getting Done Done Well Done Task Purpose
    x   Lexer Split code into tokens
    x   Parser Make sense of tokens
    x   Formatter Print code nicely
  x/√2 x/√2   Tester Find things that break
    x   Evaluator Run code
    x   Interpreter Interact with the user
  x     Utilities Work like a standard library
x       Compiler Produce machine code
x       Optimizer Improve code production
x       Manager Manage modules and packages
x       Analyzer Warn about problems
x       Scrutinizer Complain about style
x       Generator Extract information
x       Debugger Observe and control execution

10.2   By Functionality

Not Done Getting Done Done Well Done Concept
      x Trendy logo
    x   Lazy evaluation
x       Strict evaluation
x       Unsafe operations
x       Mutable references
x       Sane error messages
x       Proper bootstrapping
  x     Modularity and encapsulation
  x     Test or example coverage
x       Thesis about the subject
  x     Endless procrastination

11   Footnotes

These ideas are in the footnotes, because they are not entirely serious. Not that any of my ideas are, but I digress.

11.1   Abbreviations

Being able to replace symbols with unambiguous abbreviations is horrible for source code, but excellent for interactive use.

11.2   Logo

Having a logo is trendy and cool. Eigenlanguage is obviously both trendy and cool. Therefore Eigenlanguage has a logo.

Figuring out what it represents is left as an exercise for the reader.

11.3   Numeric Types

The most advanced numeric type built into the language should cover the cyclotomic subfield of complex numbers. It is not built into any other language, but seems fun. It allows exact representations of all rational numbers, complex sums of rational numbers, square roots of rational numbers, sines and cosines of rational multiples of rotations and then some.

Nonbelievers should look at GAP or Haskell, both of which have libraries for them. This is the number from section 4.

please -((1*16+3+13/16+2/16^2)*16^9)+((20/7)*10^3)*i
-1362041503744 + 20000/7*e(4) -- e(n) = exp(i * cyc / n)
please imag it
20000/7
please it ^ 3
8000000000000/343
please sqrtRat <$> toRat it
Just 2000000/49*e(56)^5 + 2000000/49*e(56)^13 - 2000000/49*e(56)^15 - 2000000/49*e(56)^23 - 2000000/49*e(56)^29 + 2000000/49*e(56)^31 - 2000000/49*e(56)^37 - 2000000/49*e(56)^39 + 2000000/49*e(56)^45 + 2000000/49*e(56)^47 - 2000000/49*e(56)^53 + 2000000/49*e(56)^55
please (/ 5) <$> it
Just 400000/49*e(56)^5 + 400000/49*e(56)^13 - 400000/49*e(56)^15 - 400000/49*e(56)^23 - 400000/49*e(56)^29 + 400000/49*e(56)^31 - 400000/49*e(56)^37 - 400000/49*e(56)^39 + 400000/49*e(56)^45 + 400000/49*e(56)^47 - 400000/49*e(56)^53 + 400000/49*e(56)^55
please (*) <$> it <*> it
Just 320000000000/343

11.4   Special Syntax

An alternative syntax where (a b c d ... e) and {b a c x d ... e} for all x are equal is kind of a reasonable idea.

11.5   Super Special Forms

Here is a function a with the parameters b, c and d resulting in e. The result is another function that is applied to f and g. I don’t know which is a better shorthand for multiple parameters.

(-> a b c d
    e) f g
-> (a b c d)
   e f g

The former stands out as a special form, while the latter is uniform with other functions and easier to parse. The single parameter form would be equal for both, but the latter would have two valid forms: a symbol and a singleton.

Similarly for bindings.

(<- a b
    c d
    e) f g
<- (a b
    c d)
    e f g

The former saves parentheses at the top level, while the latter is uniform and easier to parse. Top level definitions with either should work fine for modules and their export lists.

<-
a b
c d
'(a c)
<- (
a b
c d
) '(a c)

Alternatively modules could get their own little special form, but that may not be necessary.

11.6   Wrapping Special Forms

Some strange things can be done with the uniform special form syntax if it is combined with special form data parametrization. Assuming such a combination, we can define a donger.

<- -->
   (-> 'p (-> 'e
          (-> p e)))
(--> 'x
     '(+ x 2)) 3

(-> 'p (-> 'e
       (-> p e)) 'x
                 '(+ x 2)) 3

((-> 'e (-> 'x e)) '(+ x 2)) 3

(-> 'x '(+ x 2)) 3

3 + 2

5

This reminds one of vau-calculus and hygiene problems.

<- -->
   (-> 'p (-> 'e
          (-> p (pair 'x e))))

Alas the only safe definition for the donger — without macro systems, that is — would be ->, making it useless.

11.7   Macros Out of Nowhere

Consider this contrived Scheme example.

(eval `(begin (sleep 1)
              (eval `(print ,,(begin (print "One!")
                                     "Two!")))))

With one comma, both are printed at once; with two commas, they are not. Can we live with just symmetric quotation?

evaluate `(evaluate `(a b ,c d ... e))

11.8   Problems from Recursive Order

The grammar is obviously left recursive, since a b c is paired up as (a b) c. However lists are typically right recursive, because otherwise they would grow backwards, which is strange. Unifying them requires that

  • function application order is reversed,
  • intuition about lists is thrown away,
  • evaluation is coupled with restructuring or
  • the duality is embraced and facilities for both are written.

The fifth option is to throw away the most direct form of “code as data”. Consider the following informal notation. Deciding that

(a b c d) = ((((() a) b) c) d)
[a b c d] = (a (b (c (d ()))))

makes list construction trivial.

[x y] = (: x (: y ())).

The problem with code as data is that nested structures are lost. There is no way to construct a one-to-one mapping between identical programs and lists. A counterexample to sketch a proof:

(f x y) = ((f x) y)
`(f x y) = (: f (: x y)) /= (: (: f x) y) = `((f x) y).

This leads to the realization that the homoiconicity of Lisp forces multiparametricity on it. Single parameter functions merely denote pairs. Similarly the strict evaluation of Lisp leads to macros. I personally think the way small decisions shape the language is curious, but you may disagree.
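The counterexample can be made concrete. In this Python sketch app builds left-nested application pairs and quoted builds a right-nested list (here with a terminal ()), and only the former identifies the two programs:

```python
# Sketch: (f x y) = ((f x) y) as applications, yet their quoted
# list forms differ, so code-to-data cannot be one-to-one.

def app(*terms):  # left-nested application pairs
    node = terms[0]
    for term in terms[1:]:
        node = (node, term)
    return node

def quoted(*terms):  # right-nested cons list
    node = ()
    for term in reversed(terms):
        node = (term, node)
    return node

print(app("f", "x", "y") == app(app("f", "x"), "y"))          # True
print(quoted("f", "x", "y") == quoted(quoted("f", "x"), "y"))  # False
```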

11.8.1   Resulting Things

Now the singleton () would be the monoidal identity and disappear conveniently for lists, x (: y ()) = [x y], and functions, (: () f) x = f x. The deeper you dig, the more structure you find from essentially nothing.

11.8.2   Strange Idea

What if (, ), [ and ] merely controlled the recursive order and ` and , the mode (code or data) of their contents? Further, what if ` and , just changed the immediate mode and `` and ,, made the mode persistent?

% three arguments for f
(f x y z)
% x through three functions
[f g h x]
% the structure (: 1 (: 5 (: 4 ()))) into f
f `[1 (+ 2 3) 4]
% the structure (: 1 (: x (: 4 ()))) into f
% where x is (: (: (: () +) 2) 3)
f ``[1 (+ 2 3) 4]

11.8.3   Another Strange Idea

What if there were similar !, !!, ? and ?? for controlling lazy and strict evaluation? Would such a thing work at all?

11.9   The Difficulty of Choices

What really concerns me is arbitrariness. Choosing the default recursive order, packing method (using ` and ,) or how to modularize the standard library is hard and the cost of failure is enormous. Just look at disasters like null references, the direction of electric current, the pi-or-two-pi manifesto or the discrepancy between typographic and mathematical coordinate systems.

11.10   Unicode

There is no doubt that Unicode should be encoded as UTF-8 and used everywhere. Alas it brings its own problems. What to do with nonbreaking spaces in source code for example? One must either

  • forbid their use with a set of rules that are guaranteed to be complex, arbitrary and unmaintainable;
  • treat them like any other spaces and forget the problems their unintentional use causes, ignoring clean code, line wrapping, compatibility with other tools and more, or
  • allow using them in identifiers, which is a good way to create pitfalls for unsuspecting programmers.

All of the options are equally bad.

11.11   Escape Character

The model used by C is archaic and complicated. It is difficult to write parsers for and teach to infants. After discussing the matter with Internet people, the guy behind More Magic suggested a brilliantly simple solution.

If you’re designing a new language, you should probably accept something like $123$ or so, and have $$ mean a literal $.

However % is already used for special treatment of wrapped code, so it shall be the escape character. What goes between the escape characters could be the code point, in hex, or a name for a commonly used code point, like n for 0a (line break).
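A sketch of such a decoder (the names, like n for a line break, are placeholders, not a fixed design):

```python
# Sketch: between two % signs is either a hex code point or a short
# name; %% is a literal %.

NAMES = {"n": "\n", "t": "\t"}

def unescape(text):
    out = []
    parts = text.split("%")
    # even-indexed parts are literal text, odd-indexed ones are escapes
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(part)
        elif part == "":
            out.append("%")  # "%%" means a literal %
        elif part in NAMES:
            out.append(NAMES[part])
        else:
            out.append(chr(int(part, 16)))
    return "".join(out)

print(unescape("100%% done%n%%61%bc"))  # 100% done
                                        # abc
```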

11.12   Indentation

Smart tabs are a cool idea, but not a practical one. Ordinary tabs are semantic, but do not work well with line wrapping. Spaces are difficult to maintain, but the only option left. Everything is hard.

11.13   Pretty Printing

Pretty printing should convert a parse tree into a form that is as readable as possible. It features formatting, spacing, indentation and line wrapping. Restructuring and spell checking are out of its scope. Prettiness and readability are difficult to define, but based on the gathered data, it seems that pretty code follows certain key points. Assuming there is no format to begin with,

  • regions without nesting should be separated into paragraphs,
  • long lines should be split,
  • splits should favor special forms and boundaries like “) (”,
  • indentation should go by nested structures,
  • common forms like if should be given special treatment,
  • common types like strings should also be treated with care,
  • the effect of name lengths on indentation should be minimized,
  • excessively long indents should be crushed and
  • all-encompassing top level parentheses should be left unindented.

Some ambiguity is in indenting

(+ 1
   2 3)

or

(+ 1
 2 3)

depending on parameter uniformity.

11.14   Library Structure

A collection of common functions is provided with a single name for convenience. Only special forms are imported by default. The standard library structure is something like the following. Alas this tree just demonstrates the idea and is not prescriptive by any means.

logic
logic/control
  if p c a
  unless p c a % flipped if
logic/primitive
  not
  and
  or
arithmetic
arithmetic/computation
  + x y
  - x y
arithmetic/comparison
  < x y
  =< x y
list
list/construct
  pair x y
  list n x y z ...
list/query
  empty xs
  length xs
  length-beyond n xs
list/access
  head xs % first element
  tail xs % all but first
  get n xs
list/modify
  join xs ys
  reverse xs
list/higher-order
  map f xs % apply function to all
  fold f y xs % as in Haskell (b -> a -> b) -> b -> ([a] -> b)
  fold-flipped f y xs % fold (flip f) etc
  fold-reverse f y xs % the wrong fold
  fold-something f xs % the initial value fold
  fold-indexed f ... % some strange fold
effect/input
  get-line io
effect/output
  put-line s io
% and then some abstract nonsense

11.15   Partial Special Forms

Would it make any sense?

map (-> x) [1 x (* x x)]

No.

map (-> e (e x)) [f g h]

= ($ (-> (x f) (f x)))
  (map ($ x) [f g h])

11.16   Array Indexing

It is generally a good idea to allow indexing arrays with arbitrary elements. When that is not possible, it is probably a good idea to start from one, considering this is a language without indexed loops, overflows, pointer arithmetic or direct machine instruction mapping.

11.17   Exceptions

Since computers can fail anywhere, anytime and often do so in mysterious ways, exceptions are unavoidable. There is no way to avoid things like running out of memory or being interrupted by the operating system.

Experience also tells that there should be a single exception system with a uniform and predictable message format.

The plan is simple: anything can throw an exception, but catching requires a uniqueness type to guarantee sequential dispatch. Each exception may contain one arbitrary object.

11.18   Intermission

Here you have a line: -. Put an arrow at the end: ->.

Here you have another line: -. Double it: =.

Here you have one more line: -. Put an arrow at the end and double it: =>>.

11.19   One More Note

Tail calls must not leak space.

11.20   Some Thoughts

Implementing your ideas has two beneficial consequences. It shows the problems that were missed during the planning, but also solves some of the problems that were postponed. Some things are natural to implement in a certain way, which is usually — in retrospect — the way they should have been planned as well.

11.21   On Function Irrelevance

Symbolic algebra systems traditionally work with expressions. Programming languages traditionally work with functions. Both are bad ideas.

What is so bad about expressions?

Consider a function that enumerates all possible symbols and replaces them with x in a given expression. It doesn’t conserve the structure of expressions (all variables become x), but doesn’t affect lambda terms, where symbols are bound (and always distinct from x).
— drunk rambling by me

So expressions are bad in terms of hygiene.

How about functions then? Functions defined as mere relations between sets are unfit for the massive symbolic manipulation framework we have. It is not possible to differentiate “that one relation”, just like it is not possible to read the source code of “that one executable”.

Some information must be kept, which calls for the unification of expressions and functions.

Functions with different definitions, but same relations between the domain and the image should not be considered equal in every case. It is possible to define a weak equality of a sort. Such a concept already exists for proofs (proof irrelevance) and an analogy for functions would be beneficial for both computation and analytical reasoning.

11.22   Name

It appears Eigen is already the name of some language, so Eigenlanguage is not only long but also confusing. The extension .el is also reserved for Emacs Lisp.

Other fun name ideas include Qud, which could be used as an adjective for code: instead of “this is very Pythonic” one would say “this is very Qud”.

One silly idea is SPiLQ for Solving Problems in Large Quantities, but there is no way in hell people will type that correctly.

12   Source Code

That was the plan. The reality is on GitHub.

13   Summary

13.1   Application

For all functions f and arbitrary x; f x applies f to x.

13.2   Function Definition

For all symbols x and arbitrary y; -> x y defines an anonymous function f, so that the application f z binds x to z in y.

For all symbols x1, x2, …, xn and arbitrary y; -> (x1 x2 ... xn) y is equivalent to -> x1 (-> x2 ... (-> xn y)).
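
The multi-parameter form is thus mere sugar for nesting. A Python analogue of this currying (illustrative; the names are made up):

```python
# -> (x y) (+ x y) desugars into nested one-parameter functions,
# like these curried Python lambdas.
add_curried = lambda x: lambda y: x + y

# Application then happens one argument at a time, which gives
# partial application for free.
add_one = add_curried(1)
print(add_one(2))  # 3
```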

13.3   Binding

For all symbols x and arbitrary y and z; = (x y) z binds x to y in y and z.

For all symbols x1, x2, …, xn and arbitrary y1, y2, …, yn and z; = (x1 y1 x2 y2 ... xn yn) z binds each xi to yi in all of y1, y2, …, yn and z, so the definitions may refer to one another and to themselves.
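
This simultaneous scoping is what makes mutually recursive bindings work. A rough Python analogue (illustrative; top-level function bodies look names up late, so mutual recursion comes for free, and the even/odd example is made up):

```python
# In = (is_even ... is_odd ...) body, each definition may refer to
# the other; Python's late name lookup inside function bodies gives
# the same effect.
def is_even(n):
    return True if n == 0 else is_odd(n - 1)

def is_odd(n):
    return False if n == 0 else is_even(n - 1)

print(is_even(10))  # True
```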

13.4   List Definition

For all arbitrary x1, x2, …, xn; [x1 x2 ... xn] is equivalent to : x1 (: x2 ... (: xn ())).
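
In Python, with pairs standing in for the cons cells (: x y) and None standing in for (), the sugar unfolds like this (an illustration, not the implementation):

```python
# [x1 x2 ... xn] is sugar for right-nested cons cells.
def cons(head, tail):
    return (head, tail)

nil = None  # stands in for ()

sugar = cons(1, cons(2, cons(3, nil)))  # [1 2 3]

# Walking the spine recovers the elements in order.
def to_python_list(cells):
    out = []
    while cells is not None:
        head, cells = cells
        out.append(head)
    return out

print(to_python_list(sugar))  # [1, 2, 3]
```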

13.5   Module Definition

Can’t be bothered. It’s pretty obvious.

14   Notes

List library must support find-and-delete; consider checking for a full house in a Yahtzee deal.
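
A sketch of why find-and-delete earns its place, in Python (the helper names are made up): a full house check becomes a matter of peeling matched dice off the list.

```python
def find_and_delete(xs, x):
    # Remove the first occurrence of x; None signals "not found".
    ys = list(xs)
    if x in ys:
        ys.remove(x)
        return ys
    return None

def is_full_house(dice):
    # Assumes five dice: three of one value and a pair of another.
    for value in set(dice):
        rest = dice
        # Peel off three copies of value with find-and-delete.
        for _ in range(3):
            rest = find_and_delete(rest, value)
            if rest is None:
                break
        if rest is not None and len(set(rest)) == 1 and rest[0] != value:
            return True
    return False

print(is_full_house([3, 3, 3, 5, 5]))  # True
```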

Numbers with a negative radix sound fun.
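
They are also only a small tweak on ordinary base conversion; a Python sketch (negabinary as the example, names made up):

```python
def to_negative_base(n, base):
    # base is negative, e.g. -2 for negabinary; digits stay in
    # 0 .. |base| - 1, so every integer gets a sign-free expansion.
    if n == 0:
        return "0"
    digits = []
    while n != 0:
        n, r = divmod(n, base)
        if r < 0:
            # Pull the remainder back into range and compensate.
            r -= base
            n += 1
        digits.append(str(r))
    return "".join(reversed(digits))

print(to_negative_base(6, -2))  # "11010": 16 - 8 - 2 = 6
```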

Haskell can do zip <*> tail, which is a delightful way to stencil a list.
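
That idiom uses the function Applicative: zip <*> tail means \xs -> zip xs (tail xs), pairing each element with its successor. The same stencil in Python, for illustration:

```python
def stencil(xs):
    # Pair each element with its successor, like zip <*> tail,
    # which is \xs -> zip xs (tail xs) in Haskell.
    return list(zip(xs, xs[1:]))

print(stencil([1, 2, 3, 4]))  # [(1, 2), (2, 3), (3, 4)]
```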

Something to unify fold and zip should exist (such as merge).

Haskell has this fun problem, because string literals are not recognized inside comments: the -} in the string below terminates the comment early.

{-
str = "-}"
-}

However, this is fine, because block comments nest.

{-
{-
-}
-}

14.1   Why Make a Small Language

Because there are fewer edge cases. Here is an example of such a case.

Can you immediately tell which of the following Python statements are correct and why?

[] = []
[] = ()
[] = ""
() = []
() = ()
() = ""
"" = []
"" = ()
"" = ""

15   Actual Footnotes

Look, it kind of works!

$ make deep-clean run
rm -f *.hi *.o
rm -f Main
rm -f Lexer.hs Parser.hs
alex Lexer.x
happy Parser.y
ghc Main.hs
[1 of 9] Compiling Hack             ( Hack.hs, Hack.o )
[2 of 9] Compiling Common           ( Common.hs, Common.o )
[3 of 9] Compiling Formatter        ( Formatter.hs, Formatter.o )
[4 of 9] Compiling Lexer            ( Lexer.hs, Lexer.o )
[5 of 9] Compiling Parser           ( Parser.hs, Parser.o )
[6 of 9] Compiling Evaluator        ( Evaluator.hs, Evaluator.o )
[7 of 9] Compiling Tester           ( Tester.hs, Tester.o )
[8 of 9] Compiling Interpreter      ( Interpreter.hs, Interpreter.o )
[9 of 9] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
Eigenlanguage Interpreter 0.0.0/2014-09-28 (approximately)
%#1 % This is a comment.
%#2 + 1 2 % Warmup arithmetic.
3
%#3 * # 2 % Previous results are bound to # and friends.
6
%#4 < #3 #2 % These are the friends.
false
%#5 if # unbound-identifier 42 % Lazy evaluation only does what is needed.
42
%#6 `(- 43 ,#) % Code is data...
`(- 43 42)
%#7 evaluate # % ...and data is code.
1
%#8 print-character '?' io % Input and output make use of a uniqueness type.
IO
%#8 #perform % The result is merely a description of effects that can be run.
?
%#9 <- (square (-> x
%..                (* x x))
%..     meaning #5) (square meaning) % Those were the three special forms.
1764
%#10 - # %(This is a syntactic comment.) 1024
740
%#11 always 13 unbound-identifier % This is an ordinary function, like if.
13
%#12 <- (f (-> n
%...           (if (< n 2)
%...               n
%...               (+ (f (- n 1))
%...                  (f (- n 2)))))) (f #) % Recursive bindings work fine.
233
%#13 <- (f (-> n
%...           (if (< n 1)
%...               #
%...               (f (- n 1))))) (f 1048576) % Tail calls are optimized.
233
%#14 head [1 (+ 2 3) 4] % In addition to code as data, there is structured data.
1
%#15 #quit