Case in point: the syntax of module-level definitions (types, functions, and the module itself) grows out of control. This blog post describes the components that major programming languages let you specify for modules and the symbols defined in them.
The goal is to have a convenient cheat sheet for people starting their own programming language. Look at the list, mark the components you want to see in your language, and decide how you’re going to fit them all together.
Module
- Module name. In Python, the module name is defined by the file name. In Go, the module name is specified explicitly in each file, and there should be only one module per directory (which complicates scripting). In Elixir, a module is a language construct, defined much like a class in OOP languages. Rust uses a mixed approach: the module name is the file name, and you can explicitly define nested modules inside it.
- Visibility. In theory, it’s enough to be able to mark specific symbols as public or private. In practice, it’s very convenient to be able to do so for a whole module at once. Rust treats modules as language constructs and lets you apply the same powerful visibility rules to them as to any type or function. Go doesn’t have visibility rules for packages, but if you name a module “internal”, everything inside it (including nested modules) can be imported only by code under the parent directory of “internal”, which usually means the same project.
- Imports
  - Import path. The import path usually contains both the library name and the path to a specific module in that library. In Go, it also includes the library’s major version number, starting from v2.
  - Imported symbols. Most languages let you import specific names and use them unqualified. Go doesn’t have this and people don’t complain, so maybe it’s not necessary.
  - “Import all” flag. I personally don’t use “import all” that often (it’s very convenient for unit tests in Rust, though), but maybe it’s only fair to let users write their own “prelude” packages instead of keeping that privilege for the stdlib.
  - Alias. Name conflicts are common (both with other packages and with local names), so it should be possible to rename imported modules and symbols.
- Types
- Constants
- Global variables
- Documentation
- License notice. The Apache 2.0 license suggests (or even requires?) adding a license notice to every text file in the project. It’s not documentation, and it’s not a particularly useful comment. In all the languages I know, it goes into comments. Maybe it’s time to give it its own section? The section might also be used to specify code owners for the file, which could be useful for multi-team projects.
- Compilation flags
- Metadata
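To make the import components from the list above concrete, here is how they look in one existing language. This is only a sketch in Python using standard-library modules; the variable names are made up for illustration.

```python
# Import path only: the module is then used through its qualified name.
import os.path

# Imported symbols: bring a specific name into scope unqualified.
from collections import defaultdict

# Alias: rename a module and a symbol to avoid name conflicts.
import json as js
from math import sqrt as square_root

# "Import all" would be `from string import *`.
# Python only allows that form at the top level of a module.

file_name = os.path.basename(os.path.join("src", "main.py"))
counts = defaultdict(int)
counts["a"] += 1
root = square_root(16)
encoded = js.dumps({"ok": True})
```

Each of the four forms is a separate piece of syntax here, which is exactly the kind of growth the cheat sheet is meant to make visible.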
Type
- Name
- Visibility
- Documentation
- Invariants
- Fields
  - Name.
  - Type.
  - Default value.
  - Documentation.
  - Descriptors. You might want to allow specifying custom logic or values for accessing fields on the type (not on an instance). That’s especially useful for ORMs. For example, descriptors let you build a DSL for constructing SQL queries, like in LINQ:

        name = select(User.name).from(User).exec()

    Python has descriptors, and many ORMs and validation libraries use them for specifying fields.
  - Metadata. Even if you don’t add descriptors, it should be possible to specify arbitrary metadata. For example, Go’s json library uses it to map JSON field names to struct fields, and Rust’s clap library relies heavily on field metadata to know which CLI flags are supported, how they map to struct fields, what help text to show for each flag, and everything else you’d need to build a powerful CLI.
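As a rough illustration of the descriptor idea, here is a minimal Python sketch. `Field`, `select`, and `User` are hypothetical names invented for this example, not the API of any real ORM, and the "query" is just a string.

```python
class Field:
    """Hypothetical ORM field implemented with the descriptor protocol."""

    def __set_name__(self, owner, name):
        # Called once, when the owning class is created.
        self.table = owner.__name__.lower()
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the type (User.name): return the field itself,
            # so it can be used to build queries.
            return self
        # Accessed on an instance: return the stored value.
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


def select(*fields):
    """Build a toy SQL string from fields of a single table."""
    columns = ", ".join(field.name for field in fields)
    return f"SELECT {columns} FROM {fields[0].table}"


class User:
    name = Field()
    email = Field()

    def __init__(self, name, email):
        self.name = name
        self.email = email


query = select(User.name, User.email)
user = User("alice", "alice@example.com")
print(query)
print(user.name)
```

The same attribute, `name`, means "a column" on the type and "a value" on an instance, which is what makes the query DSL possible.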
Function
- Name
- Visibility
- Documentation
- Examples. Most languages (like Python, Elixir, Rust) tell you to put code examples right into the function docs using a special syntax (usually mimicking REPL output) and then provide tools to parse, run, and check such examples. The problem, however, is that such examples almost never get the same IDE assistance as regular code: no autocomplete, no syntax highlighting, no code formatting, no linting. Go does it a bit better and lets you define “testable examples”, which are almost like regular tests but are included in the documentation. However, you won’t see them in your IDE tooltips or when you “go to definition”. I think we need to give examples special treatment and take the best of both worlds: get them out of docstrings like in Go but keep them next to the code like in Rust.
- Tests. If a function is pure, it’s easy to write unit tests for it. And I believe that such tests should live next to the function. That’s why it’s a common practice in Rust to place tests right into the module where the tested code is defined.
- Metadata
- Decorators
- Method receiver
- Exceptions. Do you know what errors your function may raise (or return, if your language is functional)? Rust lets you declare specific error types, but in practice people don’t want to bother and reach for anyhow, which makes all errors the same type. Still, I believe that the language should let users specify what errors a function can raise, and then a dedicated type checker should verify that the errors are handled correctly. This specification might be optional for people who just want to “ship it”, but it could be very helpful for library designers, both for API safety and for documentation. There is nothing like this in Python (except third-party solutions like deal), and that often leads to unexpected exceptions occurring in unexpected places. Even “no exception” languages like Go or Rust might panic unexpectedly.
- Markers. If we track exceptions and how they propagate, why not let users specify their own markers? For example, we can say that a function uses IO and then recursively mark every function calling it as using IO as well. It’s similar to the IO monad in Haskell, except less tedious and without any effect on the runtime.
- Type variables
  - Name
  - Constraint
- Function variants. Some languages, like Erlang, Elixir, and Julia, allow function overloading (multiple dispatch), based either on argument types or on arbitrary conditions on the arguments. In such languages, a function with one name might have multiple bodies and even multiple signatures.
  - Arguments
    - Name or pattern
    - Type
    - Documentation
    - Metadata
  - Guards
  - Post-conditions
  - Body
  - Documentation
- Arguments
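The marker idea from the list above (a function that calls an IO function is itself an IO function) boils down to propagating markers up the call graph. Below is a minimal sketch, assuming the compiler already knows which function calls which; `propagate_markers` and the sample call graph are hypothetical.

```python
def propagate_markers(calls, direct):
    """Compute the full set of markers for every function.

    calls:  function name -> set of names of the functions it calls
    direct: function name -> set of markers declared on it explicitly
    """
    names = set(calls) | set(direct)
    markers = {name: set(direct.get(name, ())) for name in names}

    # Repeat until nothing changes, so that long call chains
    # and recursive functions are handled too.
    changed = True
    while changed:
        changed = False
        for caller, callees in calls.items():
            for callee in callees:
                inherited = markers.get(callee, set()) - markers[caller]
                if inherited:
                    markers[caller] |= inherited
                    changed = True
    return markers


calls = {
    "read_file": set(),
    "load_config": {"read_file"},
    "add": set(),
    "main": {"load_config", "add"},
}
direct = {"read_file": {"io"}}

markers = propagate_markers(calls, direct)
print(sorted(name for name, found in markers.items() if "io" in found))
```

Only `read_file` is marked by hand; `load_config` and `main` pick up the marker through their calls, and `add` stays unmarked. All of this happens at analysis time, so nothing changes at runtime.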
Better syntax
Perhaps we should stop trying to invent a separate place and syntax for each of these components. Maybe we should learn from LISP and treat them all the same: make a hierarchical structure and let everything be defined in it. For example, here is a YAML-based module definition:
    name: main
    funcs:
      add:
        doc: Add together two positive integers.
        examples:
          - add(3, 4) == 7
        args:
          left: {type: int}
          right: {type: int}
        returns: {type: int}
        guards:
          - left > 0
          - right > 0
        decorators:
          - lru_cache
        body: |
          return left + right
For comparison, here is the same in Python:
    from functools import lru_cache

    @lru_cache
    def add(left: int, right: int) -> int:
        """Add together two positive integers.

        Example:
        >>> add(3, 4)
        7
        """
        assert left > 0
        assert right > 0
        return left + right
The YAML syntax has several benefits:
- It’s more flexible. You can easily support simple one-line examples as well as more complex multiline ones with setup, teardown, a title, and maybe a description.
- It’s extensible. Adding new features to the language is easy and doesn’t affect existing code in any way: simply add new fields to the structure.
- It’s easier for newcomers to answer questions like “what is the return type of this function?”
- Since everything has a name, newcomers can search for answers online much more easily. It’s easier to find answers for “what is a decorator in LANGUAGE_NAME” than for “what is @ in Python”.
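To show that such a structure is enough to get a working function, here is a rough Python sketch that turns a definition into a callable. It assumes the YAML has already been loaded into a plain dict, with type names and decorators resolved to Python objects; `definition` and `build` are hypothetical names, and a real language would not go through `exec`.

```python
import textwrap
from functools import lru_cache

# Hypothetical structured definition, mirroring the YAML example as a dict.
definition = {
    "name": "add",
    "doc": "Add together two positive integers.",
    "examples": ["add(3, 4) == 7"],
    "args": {"left": int, "right": int},
    "returns": int,
    "guards": ["left > 0", "right > 0"],
    "decorators": [lru_cache],
    "body": "return left + right",
}


def build(defn):
    """Turn a structured definition into a callable (illustrative only)."""
    params = ", ".join(defn["args"])
    # Guards become assertions that run before the body.
    guards = "".join(f"    assert {guard}\n" for guard in defn["guards"])
    body = textwrap.indent(defn["body"], "    ")
    source = f"def {defn['name']}({params}):\n{guards}{body}\n"

    namespace = {}
    exec(source, namespace)
    func = namespace[defn["name"]]
    func.__doc__ = defn["doc"]
    func.__annotations__ = {**defn["args"], "return": defn["returns"]}

    # Apply decorators bottom-up, like stacked @decorator lines.
    for decorator in reversed(defn["decorators"]):
        func = decorator(func)

    # Every example is a boolean expression that must hold.
    for example in defn["examples"]:
        assert eval(example, {defn["name"]: func}), example
    return func


add = build(definition)
print(add(3, 4))  # prints 7
```

Supporting a new component, say post-conditions, would mean reading one more key in `build`; definitions that don’t use that key stay valid as they are.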
However, that syntax is much more verbose and, without good IDE assistance, harder to read when you’re looking for something specific. But what might “good IDE assistance” look like?
Some people say that the future belongs to visual programming languages, like Enso or Node-RED. Some say to always bet on text. I say we need to find a balance. And the balance, as I see it, is to let function bodies stay text but give modules and the definitions in them a better representation. Let’s use tables, graphs, and icons; a structured YAML-like representation will be a perfect fit for that, and readability will get even better than in any text-only language. Plus, readability is subjective, and we should let people configure the best code representation (tables, lists, graphs, text, or mixed) on the IDE side.