A Slightly Longer Introduction to F#

A Simple F# Program

Let’s look at a very simple F# program in source code form.

[<EntryPoint>]
let main argv =
	printfn "Hello, %s!" argv.[0]
	0

Type this program into an editor and save it with the name helloworld.fs. I recommend typing the program instead of copying-and-pasting it because retyping it will force you to notice important details about the program.

Hopefully it’s not too much of a stretch to figure out what this program does! We will look at this program line-by-line to understand what its parts are, but first, let’s understand how to run this program.

The F# Compiler

As with C, your computer cannot understand an F# program in source code form. It must be translated into another form. Unlike C, however, we do not translate F# into an executable binary. Instead, the F# compiler, called fsharpc, converts F# source code into an architecture independent form; architecture independence means that the resulting compiled program can be run on any computer: a personal computer, a cellphone, a supercomputer, a watch, or even an embedded computer (like the kind in “smart lightbulbs”).

How is this independence achieved? By using a virtual machine†. As with C’s simplified model of a computer, a virtual machine provides a simple abstraction that hides many of the quirks present in specific hardware. Unlike C, however, a virtual machine goes much further: it looks like a new kind of simple hardware, with its own instruction set and simplified semantics. Virtual machines even allow languages to be type safe at the virtual hardware layer, which means that many of the security vulnerabilities commonly exploited in languages like C simply are not possible.

The task of building a virtual machine for your platform is up to the language implementors, who have typically have a much better understanding of how to address code portability concerns than your typical programmer. This design is essentially the same taken by the Java programming language: javac produces Java byte code, which is then interpreted by the Java Virtual Machine (JVM). This allows you to write your code once and run it anywhere. “Write once, run anywhere” was even the slogan used by Sun Microsystems when they originally marketed the Java programming language in the mid 1990’s††.

Here is a (snippet of the) translation of the above program into virtual machine byte code, which is the virtual equivalent to machine code. This byte code, called the Common Intermediate Language (CIL), is specifically for Microsoft’s VM, the Common Language Runtime (CLR).

0000000 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00
0000010 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000030 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00
0000040 0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21 54 68
0000050 69 73 20 70 72 6f 67 72 61 6d 20 63 61 6e 6e 6f
0000060 74 20 62 65 20 72 75 6e 20 69 6e 20 44 4f 53 20
0000070 6d 6f 64 65 2e 0d 0d 0a 24 00 00 00 00 00 00 00
0000080 50 45 00 00 4c 01 03 00 da 06 ca 5b 00 00 00 00
0000090 00 00 00 00 e0 00 0e 01 0b 01 08 00 00 08 00 00
...

Although it looks similar from our perspective to x86 machine instructions, the format is quite different. Here’s the same information using CIL’s instruction mnemonics (i.e., “assembly”):

.class public abstract sealed auto ansi 
  Program
    extends [mscorlib]System.Object
{
  .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) 
    = (01 00 07 00 00 00 00 00 ) // ........
    // int32(7) // 0x00000007

  .method public static int32 
    main(
      string[] argv
    ) cil managed 
  {
    .entrypoint
    .custom instance void [FSharp.Core]Microsoft.FSharp.Core.EntryPointAttribute::.ctor() 
      = (01 00 00 00 )
    .maxstack 4
    .locals init (
      [0] class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<string, class [FSharp.Core]Microsoft.FSharp.Core.Unit> V_0,
      [1] string V_1
    )

    // [3 5 - 3 25]
    IL_0000: ldstr        "Hello, %s!"
    IL_0005: newobj       instance void class [FSharp.Core]Microsoft.FSharp.Core.PrintfFormat`5<class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<string, class [FSharp.Core]Microsoft.FSharp.Core.Unit>, class [mscorlib]System.IO.TextWriter, class [FSharp.Core]Microsoft.FSharp.Core.Unit, class [FSharp.Core]Microsoft.FSharp.Core.Unit, string>::.ctor(string)
    IL_000a: call         !!0/*class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<string, class [FSharp.Core]Microsoft.FSharp.Core.Unit>*/ [FSharp.Core]Microsoft.FSharp.Core.ExtraTopLevelOperators::PrintFormatLine<class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<string, class [FSharp.Core]Microsoft.FSharp.Core.Unit>>(class [FSharp.Core]Microsoft.FSharp.Core.PrintfFormat`4<!!0/*class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<string, class [FSharp.Core]Microsoft.FSharp.Core.Unit>*/, class [mscorlib]System.IO.TextWriter, class [FSharp.Core]Microsoft.FSharp.Core.Unit, class [FSharp.Core]Microsoft.FSharp.Core.Unit>)
    IL_000f: stloc.0      // V_0
    IL_0010: ldarg.0      // argv
    IL_0011: ldc.i4.0     
    IL_0012: ldelem       [mscorlib]System.String
    IL_0017: stloc.1      // V_1
    IL_0018: ldloc.0      // V_0
    IL_0019: ldloc.1      // V_1
    IL_001a: callvirt     instance !1/*class [FSharp.Core]Microsoft.FSharp.Core.Unit*/ class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<string, class [FSharp.Core]Microsoft.FSharp.Core.Unit>::Invoke(!0/*string*/)
    IL_001f: pop          

    // [4 5 - 4 6]
    IL_0020: ldc.i4.0     
    IL_0021: ret          

  } // end of method Program::main
} // end of class Program

As you can see, there is quite a lot of “high-level” stuff baked into this bytecode. The CLR knows about data types, classes, methods, namespaces, etc.

Note that you do not need to know CIL in this class. I’m just showing this to you to give you some perspective.

† Specifically, a process virtual machine, which is a virtual machine specifically designed to host a language. Such VMs are simpler than fully virtual machines, which are intended to mimic an actual processor, along with all its quirks, in order to host an entire operating system.

†† It should be noted that neither Microsoft nor Sun Microsystems invented the idea of portable bytecode. That honor appears to go to Martin Richards, who came up with O-code to make it easier to port the BCPL language in the mid-1960’s.

History

F# was developed by a research group at Microsoft Research, led by Don Syme. Although F# has some novel features, particularly the ways in which it interoperates with other .NET code, its syntax and semantcs are largely inspired by the ML family of programming languages. Syntactically, F# added whitespace-sensitivity (like Python) and “lightweight” refinements of older ML syntax (like the incredibly annoying let ... in syntax from Standard ML) that, in my opinion, makes it very pleasant to use. If you like Python, you’ll probably like F#.

ML was designed by researchers at the University of Edinburgh, most notably Robin Milner, around 1973. ML stands for “meta language,” because it was originally designed to be a meta language for writing “proof tactics” (you can think of these as search procedures) for the LCF automated theorem prover. Although ML was heavily inspired by mathematical logic and early functional programming languages like LISP (the first version of ML was written in LISP!), its authors made a concerted effort early on to create something “elegant.” But what makes ML especially interesting is that the language design was not static. Milner was inspired by other programming language research happening concurrently at Edinburgh, notably the HOPE programming language. ML borrowed many ideas from these other languages when it made the language feel simpler or more elegant to its authors. For example, pattern matching originally came from HOPE.

I enjoy reading ML’s early design documents, because it is clear that the most important thing was to build a “simple and well-understood” language. ML was also one of the first languages to have a complete formal specification. Nonetheless, ML has a strong pragmatic streak that makes it—in my opinion—a lot more fun to program in than other programming languages.

ML quickly outgrew its origins in the LCF project and was used widely among academics starting in the late 1970’s. “Standard ML” (SML) arose in the 1980’s out of a desire to allow for many implementations of ML. One of the most popular versions of SML is the “Standard ML of New Jersey” (SML/NJ) implementation that was jointly developed by Bell Labs and Princeton University, both of which are in New Jersey. Our lab machines have smlnj interpreters installed on them, so if you’re curious about the differences between F# and SML, have a look.

Compiling using `fsharpc`

For this class, we will be using the fsharpc compiler. If you’ve programmed on Windows before, you may have used the Visual Studio IDE (confusingly, this is a totally different product than “Visual Studio Code”). You are welcome to use Visual Studio if you have it, but as it is quite expensive, I do not require it for this class.

To compile helloworld.fs, type:

$ fsharpc helloworld.fs

If you made no mistakes when you typed in your program, fsharpc will print

Microsoft (R) F# Compiler version 4.1
Copyright (c) Microsoft Corporation. All Rights Reserved.

and nothing else.

If fsharpc prints an error message, go back and look carefully at your program to find your mistake and try again.

Once you have successfully compiled your program, you should see a helloworld.exe file in your working directory. The following command lists the current directory and shows that I now have a helloworld.exe file.

$ ls -l
total 5160
-rwxr-xr-x  1 dbarowy  staff  2632192 Mar 21  2018 FSharp.Core.dll
-rwxr-xr-x  1 dbarowy  staff     3584 Oct 19 16:31 helloworld.exe
-rw-r--r--  1 dbarowy  staff       84 Oct 19 12:31 helloworld.fs

You will probably also see a FSharp.Core.dll file here. This is the F# standard library, and it must accompany your .exe file. On Windows, you may not see this file because it is instead installed in a system-wide location.

Running the program

Note that all the F# compiler does is convert the program into CIL byte code. It does not actually run the program. If you are on Linux (lab environment) or the MacOS to run the program, type

$ mono helloworld.exe Dan

If you are on Windows, typing

C:\mydir>helloworld.exe Dan

should suffice.

You should be rewarded with the text Hello Dan! printed on screen. Try it using your name as an argument.

MSBuild

Manually managing the fsharpc compiler can get a little annoying in the same way that managing clang can be annoying. Therefore, this class asks you to produce code as a part of an MSBuild project. MSBuild is Microsoft’s (much more sophisticated) equivalent to Makefiles. You should have first encountered dotnet in the Brief Introduction to F#.

Because MSBuild projects are written in XML, we will mostly create and manage our projects using a command line tool called dotnet. This tool automatically generates and edits MSBuild files for you. However, since the dotnet tool is still in its infancy (it was first released in late 2016), we will occasionally need to modify MSBuild files by hand.

New projects

We create new projects using the dotnet new command. Typing this command without arguments will show you a set of project templates that you can use to create a new project. For this class, we will mostly use the command dotnet new console -lang F# which creates a new project for a command-line program using F# as the language.

Note that dotnet new creates a new project in the current directory. Be careful if you are putting your code in a location that already has code as the effect may not be what you intend.

The default console project is a helloworld program, which is quite convenient.

Building

To build a project, cd to the directory containing your project files and type

$ dotnet build

Running

To run a project, cd to the directory containing your project files and type

$ dotnet run

Adding a new file to a project

By default, your project contains only a single file called Program.fs. Unlike Java, F# does not care where you put your code. It could all go into a single source code file.

However, as your projects grow in size, you will find it beneficial to organize your code across multiple files. I like to organize my code according to “responsibilities.” For example, maybe I have a program that reads input, does some processing, builds a data structure, computes some values, and then prints out the result. In this case, I might have a file called io.fs for input and output processing, utils.fs to handle data manipulation (like converting data from arrays into hash tables), and algorithms.fs for the core computation. I personally like to keep very little in the Program.fs file, which mostly just contains the main function.

However, use whatever system of organization makes sense to you.

To add a new file to your project, you need to do two things. Suppose we create a new file called io.fs and we want to call its code from the Program.fs file. Look for a .fsproj file in your project directory. This is your project specification. Open it up with your favorite code editor. You should see something like

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <Compile Include="Program.fs" />
  </ItemGroup>

</Project>

We need to add a Compile tag just above the Compile tag for Program.fs so that MSBuild will compile io.fs first. Here’s what my .fsproj file looks like after I make the change:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <Compile Include="io.fs" />
    <Compile Include="Program.fs" />
  </ItemGroup>

</Project>

Running dotnet build should now allow Program.fs to refer to code stored in io.fs. Note that if you’re following along at home, you will likely see the following error when you try the above.

error FS0222: Files in libraries or multiple-file applications must begin with a namespace or module declaration

What’s the deal? In short, in F#, you must place all library code inside either a module or a namespace. Both of these two things are a form of code organization. We’ll stick to modules for this course, as namespaces are only slightly different and are not really necessary unless you are mixing F# and C# code (C# has no notion of modules, only namespaces).

Put a module declaration at the top of your module to make this error message go away. For example, in io.fs, put:

module IO

and in your Program.fs, write

open IO

“This means something. This is important.” Understanding code you wrote.

Please watch Close Encounters of the Third Kind. Thank you.

(If you don’t get the reference above, your homework is to gather your friends together and watch Close Encounters of the Third Kind.)

You may recall my previous admonishment against using code that you don’t understand. Let’s understand the helloworld.fs program we just typed in, line by line.

Entry points

The first line,

[<EntryPoint>]

marks the function as the entry point to the program. The entry point is the location in the program where computation begins. In C, the main function is always the entry point. F# gives you a bit more flexibility: it can be any one function, as long as that one function is labeled with the [<EntryPoint>] annotation.

Function definitions

The next line,

let main argv =

denotes the start of a function definition. Unlike C and Java, F# is whitespace-sensitive, like Python. In F# code must be indented using spaces (not tabs– F# is an opinionated language, meaning that code style is enforced in the language). The body of the function definition begins at the = character and extends until the end of the indented region below.

Note that, unlike most languages you’ve studied so far, F# functions are true functions, meaning that they must always return a value. While side-effecting functions are possible in F#, e.g., like the swap function we’ve studied from C, they are strongly discouraged. Thus you are encouraged to write pure functions. In this class, you should assume that we will be writing pure functions unless otherwise specified.

This function definition looks fairly sparse, doesn’t it? In fact, despite the fact that F# is a statically typed programming language, there are no type annotations in the above. That’s because F# can usually infer your type annotations without your help.

In F#, declarations of all kinds start with the keyword let. You may recall that let can be constructed as a form of syntactic sugar, meaning that it is made out of other primitives. This is indeed how let works in F#. In general, you should assume that let simply means that we should bind the expression to the right of the = to the name on the left of the =.

You might be thinking, “wait, variables and functions are declared the same way in F#?” Indeed they are. So

let main argv = ...

declares a function called main with a single argument called argv, bound to the expression on the right, and

let x = 1

declares a variable called x bound to the value 1. The way that F# knows the difference between a function and a variable is that the expression on the right is parameterized by the arguments (e.g., argv) to the left of the equals sign.

So in this case, we are declaring a function main that has a single argument, argv. Since you’re new to F#, you may be thinking “how am I supposed to know what type argv is?” This is admittedly one of the downsides of type inference– that information is not obvious in the program text. That said, unlike a dynamically typed language like Python, if you get the type of argv wrong, fsharpc will tell you. Suppose for a moment that my main function was:

let main argv =
    argv + 1

Then F# will report,

error FS0001: The type 'int' does not match the type 'string []'

and I will not be able to run the program. Since the type check fails—argv is a string[], not an int—compilation also fails. This is a feature of a statically-typed language, because it enables you to find bugs in your program before you run it.†

You can also add type annotations yourself, and if you are at all unsure what the types of various things are, I encourage you to write them. Let’s rewrite our main function with types.

let main(argv: string[]) : int =
    printfn "Hello, %s!" argv.[0]
	0

The syntax of a typed function in F# is the following:

let <function name> (<arg_1>: <type_1>) ... (<arg_n>: <type_n>) : <return type> =
	<expression>

It’s up to you how you want to write your programs. F# doesn’t care, and neither do I. I encourage you to try out the parens-less syntax, however, as once you are accustomed to it, you will find F# programs very easy to read.

† An interesting consequence of static types is that the halting problem is decidable (i.e., computable) for well-typed programs (i.e., programs whose types are valid and consistent). In fact, this is an easy thing to compute, because well-typed programs always halt. This property was demonstrated by Alonzo Church with his simply typed lambda calculus. The pure, functional subset of ML was the very first language that had this property. The tradeoff is that some programs are not expressible in F#, such as the following program, which we originally encountered when looking at lambda desugarings of Python:

let f g = g g

[<EntryPoint>]
let main argv =
    f f

This program yields the following compiler error (which is a good thing!):

error FS0001: Type mismatch. Expecting a    ''a'    but given a    ''a -> 'b'    The types ''a' and ''a -> 'b' cannot be unified.

Function body

The meat of our main function is the following:

	printfn "Hello, %s!" argv.[0]
	0

Note that this code is indented from main. This is how we know that the code is a part of the main function definition. My personal convention is to use 4 spaces. Others use 2. Again, choose what you like, but note that the F# compiler will not let you use tabs.

The first line calls the printfn function. Function calls in F# work exactly the same way as application in the lambda calculus. In other words, an expression of the form

a b

means that we should apply the argument b to the function a. The above is actually valid F#. Here’s an example where that might make more sense to you:

let b = 1
let a x = x + 1
a b

which returns 2.

Remember our “parentheses axiom”? <expr> = (<expr>)? Well, that applies here, too. So we could rewrite main as:

let main(argv: string[]) : int =
    printfn("Hello, %s!")(argv.[0])
	0

and this is exactly the same program. “BUT WAIT,” you say, “WHY DOES printfn HAVE TWO PARENS????”

Ah. That’s because, in F#, just as in the lambda calculus, function calls are curried.

Curried functions definitions, function types, and function application (aka “function calls”)

Recall that the lambda calculus has no notation for functions of multiple arguments. It doesn’t have them because they are not necessary. For example:

λx.λy.x y

Is a function of two arguments. More precisely, it is a function that takes an argument x and returns a function that takes a y.

The equivalent F# function (aside from the fact that we name the function f) is:

let f x y = x y

The type of the function tells us the same thing. The -> type notation tells us that a type is actually a function. So, for instance,

let a x = x + 1

has type

int -> int

because it is a function that takes an int and returns an int.

The type for let f x y = x y is:

('a -> 'b) -> 'a -> 'b

which may make your mind melt a little the first time you see something like it. But these things are actually easy to read with a little practice. First a tick variable is a polymorphic type, meaning that any type can be used there. So x, our first variable, must be a function from any type 'a to any other type 'b. Since the types of 'a and 'b need not be the same, we have two different letters (a and b).

If we partially apply the function f by giving it a function, it then returns a function from 'a to 'b. Here’s a concrete example:

let f x y = x y
let a x = x + 1
let g = f a

The type of g is int -> int. The polymorphic types 'a and 'b went away because the function a deals with integers (remember, it’s an int -> int) so their types are no longer unknown.

You may find it hard to envision why we would ever want partially applied functions. I did too, when I first learned F#! And, in fact, I went years without explicitly constructing any curried functions, which seems to suggest that they are not necessary. In fact, they are not necessary. Java, for example, does not have syntax for partial function application or curried functions and many useful programs are written in Java. Nonetheless, when I discovered their first “killer application”, parsing, it changed the way that I thought about them. I now write much more concise, readable code than the code I wrote before. Part of the reason is that I curry my functions when it makes sense.

Let’s look at the type of the printfn function. It is:

TextWriterFormat<'a> -> 'a

which is, perhaps, a little puzzling until you recall that F# is designed to interoperate with other .NET code. Most of Microsoft’s .NET code is written for C#, and as C# has historically been a knock-off† of Java, it is perhaps not surprising that C# has generics. This means that F# programs can have both polymorphic types like 'a and generic types! For the most part, F# will handle the hairy details of converting between these for you, and you can see that in the above type declaration we use both: TextWriterFormat is a C# generic class, but we can give it a polymorphic type 'a.

Anyway, with the above type declaration, we can see that printfn takes a TextWriterFormat<'a> and returns an 'a. Hang on… isn’t printfn a curried function? Actually, no— and the reason is that TextWriterFormat<'a> secretly is a function. For example, when used in printfn, the string %s causes F# to infer that you need a function string -> unit, and so the type of printfn becomes:

(string -> unit) -> string -> unit

and you’d call it like:

printfn "%s" "heya"

If we use the format string %s %d, then printfn becomes:

(string -> int -> unit) -> string -> int -> unit

and you’d call it like:

printfn "%s %d" "hello" 1

This is how printfn is able to “magically” determine how many parameters to take depending on the given format string. Cool, huh? Java cannot do this, and in fact, Java’s String.format method cannot be statically type checked (it checks dynamically and throws an exception when you mess up).

†My friends at Microsoft would probably be a little offended by this statement. Although the JVM is much more sophisticated than the CLR, the C# language has evolved well past the Java language at this point, and for me, C# is a much more pleasant language, with far fewer ugly corners, than Java.

Return value

The last line in our main program is 0. In F#, the last expression in a function definition is the return value. If you recall from our discussion of C, returning 0 tells the operating system that “everything ran OK.” Any other value signals an error.

In this case, the return value is

The rest of F#

As mentioned before, the ML tradition favors pragmatism over mathematical purity. Therefore, it allows a programmer great flexibility to wriggle out of tough situations using mutable variables, side effects, imperative code, and casts. I strongly discourage you from using these features in this class. In fact, for this class, use of mutability, side effects, imperative code, and casts will be penalized. Why? Because I want you to learn functional programming. When you’re all grown up and you leave Williams College, feel free to use those other features. I myself do this in some circumstances, particularly when it is important that my code be fast. By the end of this semester, you will have an appreciation for the tradeoffs that these little “escape hatches” entail, which are substantial.

Expressions

First, like LISP, everything in F# is an expression. Using the fsharpi read-eval-print-loop (REPL) program,

> 1;;
val it : int = 1

we can immediately see that everything we type in returns a value.

There are no statements in F#, although there are functions that look similar. Remember printfn from above? You may recall that when we called it like

printfn "Hello, %s!" "Dan"

it returned unit. What is unit? Well, the purpose of F# is essentially to produce a side effect. It does not return anything. But in F# everything is an expression, so something must be returned. In this case, since we don’t expect anything back, the special value () is returned, which means “nothing” and has type unit. Let’s see for ourselves in fsharpi.

> printfn "Hello, %s!" "Dan";;
Hello, Dan!
val it : unit = ()

Lambda expressions

I claimed earlier that F# function calls are essentially the same as function application from the lambda calculus. Because F# has named functions, this may not have been immediately apparent. Lambda expressions are a feature of F# that allow us to create functions “anonymously,” and using this feature, the similarity should be much clearer.

The following is a lambda expression that computes the identity function:

fun x -> x

Observe the similarity to the following lambda expression:

λx.x

Let’s apply our lambda expression.

(λx.x)y

Using beta-reduction, you know that the result is y. We can write exactly the same program in F#.

let y = 1
(fun x -> x) y

which evaluates to 1.

Lambda expressions are very useful in F#, and we use them widely, particularly in map and fold functions, which we will get to later.

Types

F# is a statically typed programming language, which means that every variable and datum in the language must have an associated type label, and that all operations on data must type check, meaning that those operations are logically consistent.

F# has a small set of primitive data types. These types represent fundamental categories of data representation in the underlying CLR virtual machine. The complete list is here. Primitive types are written in lowercase in F#.

F# also allows you to create user-defined types. Note that the convention in F# is to write variables in all lowercase and user-defined types in upper camel case.

Algebraic data types and pattern matching

Algebraic data types (ADTs) and pattern matching are two features first widely used in the Hope programming language, which was developed concurrently with ML. ADTs and pattern matching are like chocolate and peanut butter: better together (this commercial is totally ridiculous and worth the 30 seconds of your time.)

Algebraic data types are a way of concisely express hierarchies of types without inheritance. Inheritance is a feature from object-oriented programming, which we will discuss later in the semester. If you already happen to know what inheritance is, you can think of ADTs as its functional equivalent. As you will see, they are in many ways much more elegant and easy to reason about, although there is a big tradeoff when it comes to large-scale software engineering projects.

Let’s create a data type that represents a small set of animals. We’d like, at various points in our program, to be able to work generically with animals, and then at other points, to be able to work specifically with specific animals, like ducks and geese.

type Animal =
| Duck
| Goose
| Cow
| Human

The above is an algebraic data type. In F#, we call this kind of type definition a discriminated union. In other ML variants, these are sometimes called union types or sum types. These terms mean the same thing.

What is the meaning of Duck or Cow? They are, in fact, constructors. So if we want a Duck, we type Duck. Let’s try it in fsharpi:

> let donald = Duck;;
val donald : Animal = Duck

Likewise, if we want a Cow, we type:

> let ephelia = Cow;;
val ephelia : Animal = Cow

Notice that the types of donald and ephelia are, quite specifically, Animals.

Let’s now make a function that takes an Animal and does the “right thing” depending on the kind of animal.

let squeeze a =
    match a with
    | Duck -> "quack!"
    | Goose -> "honk!"
    | Cow -> "meeeeph!"
    | Human -> "WHY ARE YOU SQUEEZING MEEEEEE????" 

The match ... with expression tells F# that we want to perform the appropriate thing using pattern matching. Pattern matching is a feature of many functional programming languages that let you concisely expression conditional logic.

So if we squeeze ephelia, the function does the right thing:

> squeeze ephelia;;
val it : string = "meeeeph!"

Note that pattern matching is type-safe. Suppose we forgot to put in the case for Human. fsharpc will report that we missed a case.

    match a with
  --------^

warning FS0025: Incomplete pattern matches on this expression. For example, the value 'Human' may indicate a case not covered by the pattern(s).

Pattern matching also lets us deal concisely with cases that don’t matter to us. For example, imagine that all we really care about is squeezing Cows. We could write:

let squeeze a =
  match a with
  | Cow -> "meeeeph!"
  | _ -> "complaint"

ADTs that store data

The ADTs we’ve seen so for are limited in their usefulness because they don’t store any data. We can extend them to store any data that we want.

type Expr =
| Variable of char
| Abstraction of char * Expr
| Application of Expr * Expr

The above is the complete type hierarchy needed to represent the untyped lambda calculus. Expressions of the form 'a * 'b are tuples. So a char * Expr is a 2-tuple that stores a char on the left and an Expr on the right. You can have tuples of any arity in F#.

Also, observe that, we can also write types recursively. Application stores two instances of Expr!

Let’s write a prettyprint function that, when given an Expr, prints it out all pretty-like.

let rec prettyprint e =
  match e with
  | Variable(c) -> c.ToString()
  | Abstraction(v,e') -> "(λ" + v.ToString() + "." + (prettyprint e') + ")"
  | Application(e', e'') -> (prettyprint e') + (prettyprint e'')

Each case, or pattern guard, ensures that e has the form specified on the left side before binding the variables to the given names and then executing the code on the right side. For example, Abstraction(v, e') checks that e is an Expr of the form Abstraction, which contains a 2-tuple. If so, the left side of the tuple is bound to v and the right side of the tuple is bound to e'.

Pattern guards are composed from deconstructors, which are have the same syntax and are essentially the inverse of constructors. For example, you can construct a tuple and deconstruct it using the same syntax.

> let tup = (1, "hi");;
val tup : int * string = (1, "hi")

> let (a,b) = tup;;
val b : string = "hi"
val a : int = 1

Anyway, let’s try out our prettyprint function:

- let e = Variable('x');;
val e : Expr = Variable 'x'

> prettyprint e;;
val it : string = "x"

> let e2 = Abstraction('x', e);;
val e2 : Expr = Abstraction ('x',Variable 'x')

> prettyprint e2;;
val it : string = "(λx.x)"

> let e3 = Application(e2, Variable('y'));;
val e3 : Expr = Application (Abstraction ('x',Variable 'x'),Variable 'y')

> prettyprint e3;;
val it : string = "(λx.x)y"

I personally find this an amazing way to build software. Many concepts in computer science are simple and elegant in theory but difficult to implement in code in practice. F# and other ML languages give us the tools to express simple concepts simply.

Limitations

There are two things to note about the above definitions. First, note that I was not able to write the following definition:

type Expr =
| Variable of char
| Abstraction of Variable * Expr
| Application of Expr * Expr

This definition produces the following error:

  | Abstraction of Variable * Expr
  -----------------^^^^^^^^

error FS0039: The type 'Variable' is not defined.

This is because, although the entire type expression is recursive, individual cases are not.

Secondly, we had to use the rec keyword in our function definition.

let rec prettyprint e =

Leaving out the rec produces the following error:

    | Abstraction(v,e') -> "(λ" + v.ToString() + "." + (prettyprint e') + ")"
  ------------------------------------------------------^^^^^^^^^^^

error FS0039: The value or constructor 'prettyprint' is not defined.

This is because F# strictly adheres to the lambda desugaring approach to interpreting let. Recall that let z = U in V desugars to the expression (λz.V)U. Well, if z is a function name, it is not available to use in the expression U because U is outside the lambda abstraction. Adding the rec keyword tells F# that the name of the function should be available inside the function body.

Equality and type checking

Type checking is the process of ensuring that the use of values is logical an consistent. Checking for equality is one of these cases where we can check at compile time whether two values will ever be equal without having to check values. We can do this because, most of the time, when the left hand side and the right hand side of an equality expression have different types, they will never be equal in value.

F#’s type system is a little different than the kind you’ve seen before in Java and C. Those languages use a form of types called a nominal type system. In essence, a nominal type is simply a label. Nominal type checking, therefore, boils down to ensuring that these labels match. For example, the following C program fails to type check:

int i = 1;
char *c = i;

because i is an int and c is a char *, although because C is weakly typed, this is only a warning and not a fatal compilation error:

program.c:3:9: warning: incompatible integer to pointer conversion initializing 'char *' with an expression of type 'int'
      [-Wint-conversion]
  char *c = i;
        ^   ~

Java adds extra expressiveness to C’s nominal type system which allows it to express subtypes (i.e., inheritance), which is why we say that it has a nominal type system with subtyping. Subtyping essentially means that, under some circumstances, some labels can be substituted for others. Java also enforces types strongly, so the Java equivalent to the above C program is a fatal compilation error.

F# uses a system of structural typing. In this case, equality is more than just checking labels, although labels are at the “bottom” of the type system. Structural type checking means that the type checker must prove, inductively, that two expressions are equivalent because they yield the same structural type.

An example helps to clarify. For example,

> let a = 1;;
val a : int = 1

> let b = 1;;
val b : int = 1

> a = b;;
val it : bool = true

In this case, not only are the types for a and b equal, so are the values (note that F# denotes equality using the = symbol, not the == symbol; “not equal” is denoted with <>).

Neither a nor b have any “structure” and so the base case for structural type checking is nominal: we simply check that the labels match (int equals int).

But how do we check the equality of something more complicated?

The following checks inductively:

> let c = (1,"hi");;
val c : int * string = (1, "hi")

> let d = (1,"hi");;
val d : int * string = (1, "hi")

> c = d;;
val it : bool = true

To see that (1,"hi") equals (1,"hi"), we need to know the type of each expression. Both are 2-tuples. Thus, we know for two 2-tuples to be equal, we must check that both left sides are equal and that both right sides are equal (inductive step). Both left sides are int and 1 equals 1 (base case). Both right sides are string and "hi" equals "hi" (base case). Therefore, both 2-tuples are equal. Therefore (1,"hi") equals (1,"hi").

Structrual type systems make equality comparisons very easy, and they extend to what are normally “opaque” types in other languages, like lists and arrays.

Lists

Lists are frequently utilized in functional programming. As we saw with LISP, lists can be used as a fundamental unit of composition for many more complicated data structures. F# also allows you to use lists in this manner, and because they are so important and widely-used, they are even easier to manipulate than in LISP.

For example, the following lets us define a list using list literal syntax in F#:

let xs = [1; 2; 3; 4;]

We can also perform a variety of operations on lists easily:

> let xs = [1; 2; 3; 4;];;
val xs : int list = [1; 2; 3; 4]

> List.head xs;;
val it : int = 1

> List.tail xs;;
val it : int list = [2; 3; 4]

> let xs' = 0 :: xs;;
val xs' : int list = [0; 1; 2; 3; 4]

> List.length xs';;
val it : int = 5

> List.append xs' xs';;
val it : int list = [0; 1; 2; 3; 4; 0; 1; 2; 3; 4]

> List.rev xs';;
val it : int list = [4; 3; 2; 1; 0]

> List.sum xs;;
val it : int = 10

> List.map (fun x -> x - x) ;;
val it : (int list -> int list) = <fun:it@60-6>

> List.fold (fun acc x -> acc + x) 0 xs;;  // this is fold left
val it : int = 10

> List.foldBack (fun x acc -> acc + x.ToString()) xs "";; // this is fold right; note that we change the type
val it : string = "4321"

> xs' = xs';;
val it : bool = true

> xs = xs';;
val it : bool = false

> let ys = [0; 1; 2; 3; 4];;
val ys : int list = [0; 1; 2; 3; 4]

> xs' = ys;;
val it : bool = true

Notice that we were able to compare two lists simply, using =. This is possible because of F#’s structural type system.

We can even pattern-match on lists, which is incredibly useful for functions that recurse on lists:

let rec list_length xs =
  match xs with
  | []     -> 0
  | x::xs' -> 1 + list_length xs'

where [] or nil represents the empty list and :: means cons.

> list_length xs;;
val it : int = 4

You can find the complete documentation on F# lists here.

Nonetheless, F# is a pragmatic language, and as a matter of pragmatism, we should probably recognize that lists have undesirable performance properties for many applications, particularly numerical computing. For many applications, richer data types are desirable. F# has these too.

Arrays

Lists have O(n) performance for random-access. In other words, in the worst case, the cost of accessing an element is the cost of traversing the entire list. Arrays have much better performance for random-access: O(1). In other words, the worst case performance for arrays is a constant time, or the amount of time it takes to execute a single operation.

The following lets us define an array using array literal syntax in F#:

let arr = [|1; 2; 3; 4;|]

We can also perform a variety of operations on arrays easily:

> let arr = [|1; 2; 3; 4;|];;
val arr : int [] = [|1; 2; 3; 4|]

> let i = arr.[3];;
val i : int = 4

> Array.length arr;;
val it : int = 4

> Array.rev arr;;
val it : int [] = [|4; 3; 2; 1|]

> Array.filter (fun x -> x > 2) arr;;
val it : int [] = [|3; 4|]

> arr.[0..1];;
val it : int [] = [|1; 2|]

> arr.[2..];; 
val it : int [] = [|3; 4|]

> arr.[..2];;
val it : int [] = [|1; 2; 3|]

> Array.init 10 (fun i -> i * i);;
val it : int [] = [|0; 1; 4; 9; 16; 25; 36; 49; 64; 81|]

> Array.sum (Array.init 10 (fun i -> i * i));;
val it : int = 285

> Array.map (fun x -> x + 1) arr;;
val it : int [] = [|2; 3; 4; 5|]

> Array.fold (fun acc x -> acc + x) 0 arr;;  // this is fold left
val it : int = 10

> Array.foldBack (fun x acc -> acc + x) arr 0;; // this is fold right
val it : int = 10

> let arr2 =  [|1; 2; 3; 4;|];;
val arr2 : int [] = [|1; 2; 3; 4|]

> arr = arr2;;
val it : bool = true

Notice that we were able to compare two arrays simply, using =. This is possible because of F#’s structural type system.

You can find the complete documentation on F# arrays here.

Other types

F# has many other types, including Tuple, Sequence, Map, Option, classes, interfaces, and abstract classes, and many others. The latter three types are object oriented features of F#. Please do not use classes, interfaces, or abstract classes unless I ask you to do so.

You can find the complete F# language reference here. I encourage you to refer to it often as it is thorough and quite easy to read.

Conditionals

if/else expressions look a bit like their counterparts in Python:

if x > 0 then
  1
else
  2

Of course, since conditionals are expressions in F#, you can also do neat tricks like use them to conditionally assign values, much like how you might use the ternary operator (a ? b : c) in C:

let y = if x > 0 then
          1
        else
          2

Indentation is important for conditionals. Note that the body of the true and false clauses must be indented past the start of the if expression.

Loops

While F# has looping constructs, for and while, you should not use them in this class. Instead, you should use map, fold, and recursion instead.

Here’s a handy guide for deciding which construct to use in F# when your brain tells you that you need to use a loop.

Problem	Java	F#
You want to do something until a condition is met	`while`	Write a recursive function and make the condition a base case.
You want to do something for a bounded number of times	`for`	Write a recursive function and make the bound a base case.
You want to convert every element of a collection into something else	`for` or `foreach`	`map`
You want to accumulate a value	`for`, `foreach`, or `while`	`fold`
You want to convert a data structure into another data structure	Recursive function	Recursive function or `fold`

Recursion instead of `while`

Let’s start with recursion to solve problems. One nice thing about a while loop is that you don’t need to know how many times your program needs to repeat itself until it is done computing a value. For example, when doing a membership query in an unsorted linked list in C (e.g., “is 4 in the list?”), you can use a while loop to simply continue getting the next element until it is found.

bool contains(listnode *xs, int x) {
  while (xs != null) {
    if (xs->head == x) {
      return true;
    } else {
      xs = xs->tail;
    }
  }
  return false;
}

What characterizes problems like this is that there is no obvious bound on the loop before we find the element we’re looking for. We solve this class of problems in functional programming with recursion.

let rec contains xs x =
  match xs with
  | [] -> false
  | y::ys ->
    if x = y then
	  true
    else
	  contains ys x

(Remember to put rec in your function definition for recursive functions.)

This particular solution pattern matches on the list to deconstruct it into a head (y) and a tail (ys), but you could also explicitly call List.head and List.tail if you wanted.

You might argue that this is a silly example because we know that there can’t be any more comparisons than the length of the string. That’s true. But there are other problems where you actually don’t know at all, and the same pattern applies. For example, what if we wanted to run a little experiment: how many times do we need to flip a coin before a heads comes up? The outcome is determined by the laws of probability. It probably won’t take many coin flips for heads to come up—in fact, we know that we have a 50% chance on the very first try—but it could take a very long time. There’s a nonzero probability that it could take a trillion coin flips.

let rec num_tries_until_match(r: System.Random)(n: int)(i: int)(e: int) =
  let rn = r.Next(n)
  if rn = e then 
    i
  else
    num_tries_until_match r n (i + 1) e

Where n is the number of “sides” of our coin (or die, or whatever), i is the count so far, and e is the value we’re looking for. Let’s say that, when n = 2, then when e = 0 that’s heads and when e = 1 that’s tails. If we call this a few times, you can see that the answer can vary quite a bit:

> let r = System.Random();;
val r : System.Random

> num_tries_until_match r 2 1 0;;
val it : int = 3

> num_tries_until_match r 2 1 0;;
val it : int = 1

> num_tries_until_match r 2 1 0;;
val it : int = 2

> num_tries_until_match r 2 1 0;;
val it : int = 4

> num_tries_until_match r 2 1 0;;
val it : int = 5

Map

On the other hand, many problems do exhibit bounded iteration. For example, if you want to convert every number in a list into a string, you know exactly how many you need to do: whatever the length of the list is. This is what map is for.

> List.map (fun x -> x.ToString()) [11;22;33;44];;
val it : string list = ["11"; "22"; "33"; "44"]

Note that the type of the output need not be the type of the input. Here, we convert int values into string values.

Let’s look at the function definition for List.map:

('a -> 'b) -> 'a list -> 'b list

We’ll step through, bit by bit. The parens tell us that the first argument is a 'a -> 'b. Right away, because it contains an ->, we know that it is a function. What kind of function? A function from some unknown type 'a to some other unknown type 'b. Now, it is entirely plausible that 'a and 'b are the same type (e.g., int). But what this type definition tells us is that they don’t have to be. We call this first parameter the mapping function. It “converts” or “maps” a given value to another value.

Next parameter, 'a list. This parameter is the set of inputs, specifically a list in this case.

Finally, the last parameter, 'b list. This last parameter is the type of the output. This should make intuitive sense, because if you call the mapping function on each element of a list of 'as, you will get a get a list of 'bs.

Map has a few interesting properties. For starters, if the mapping function 'a -> 'b always terminates, then map always terminates. For anybody who has ever accidentally written a loop that doesn’t terminate, this is a nice feature! Second, notice that each call of the mapping function only depends on one of the values in the list. In fact, each call of the function is totally independent from every other call. Problems that have this structure are called embarassingly parallel, because it takes no work at all to actually do the computation in parallel. In fact, F# makes this almost absurdly easy to do, although we cannot do it with lists (for hopefully obvious reasons… think about this a bit if you don’t know why):

> Array.map (fun x -> x.ToString()) [|11;22;33;44|];;
val it : string [] = [|"11"; "22"; "33"; "44"|]

> Array.Parallel.map (fun x -> x.ToString()) [|11;22;33;44|];;
val it : string [] = [|"11"; "22"; "33"; "44"|]

Note that it is not always faster to do things in parallel! In this case, because our list is small, the serial version is actually faster.

> let timeit (f: int[] -> string[])(input: int[]) =
-   let sw = System.Diagnostics.Stopwatch.StartNew()
-   f input |> ignore
-   sw.ElapsedTicks
- ;;
val timeit : f:(int [] -> string []) -> input:int [] -> int64

> let input = [|11;22;33;44|];;
val input : int [] = [|11; 22; 33; 44|]

> timeit (fun xs -> Array.map (fun x -> x.ToString()) xs) input;;
val it : int64 = 5657L

> timeit (fun xs -> Array.Parallel.map (fun x -> x.ToString()) xs) input;;
val it : int64 = 7296L

Fold

Folding is useful anytime you want to accumulate a value. This is, of course, useful in scenarios like the following. Here, we multiply the numbers 1 through 7 together.

> let product = List.fold (fun acc x -> acc * x) 1 [1;2;3;4;5;6;7];;
val product : int = 5040

Let’s look at fold’s type signature:

('a -> 'b -> 'a) -> 'a -> 'b list -> 'a

The first parameter is a function of two values, 'a and 'b, which returns a 'a. In our fold above, that function, which we call the folding function, multiplies those two values together. The thing to remember is that the first parameter to our folding function is the accumulator: it’s where we store the result of the operation from one iteration of fold to the next. The second parameter to the folding function is the element, which is taken from the list (we’ll get to that in a second).

The second parameter to fold is the initial value of the accumulator. Since we want to multiply, in this case, we set it to 1.

The last parameter to fold is the list. This is where the folding function’s 'b parameter comes from.

Finally, the return value is a 'a, because that’s the type of our accumulator. When there are no more elements in the list, the accumulator is simply returned.

Here’s another example, where we convert strings into numbers and sum them:

List.fold (fun acc x -> acc + int x) 0 ["1";"2";"3";"4"];;
val it : int = 10

fold is an incredibly powerful function. It is not limited to lists. You can fold arrays, trees, graphs, etc. In fact, you can implement map using fold (try challenging yourself to figure it out). But a consequence of this power, which is the ability to make each step of a computation interdependent, is that it is no longer embarassingly parallel. A great deal of research in computer science has been devoted toward finding just enough structure in certain problems to parallelize special cases of fold. In general, it’s impossible.

The particular fold we discuss here is technically “fold left.” The term “left” refers to the side of the input sequence from which we take elements. Folding “right” takes elements from the end instead of the beginning.

By the way, in LISP, fold is called reduce. When you pair mapping and folding together, you get a form of computation called map-reduce. This is where the term came from that inspired Google’s MapReduce framework, which is a platform for fault-tolerant, massively parallel computation.

Other important features

F# has essentially every feature that a modern language like Java has, so there are too many features to discuss in this reading. However, there are a few that are worth a mention.

Raising Exceptions

You can define an exception in F# like:

exception MyError of string

And you can throw it like:

raise (MyError("Error message"))

F# also has a “lightweight” syntax that I use frequently:

failwith "something bad happened!"

Runtime exceptions are more useful in F# than they are in ordinary languages. Because F# is functional and strongly-typed, sometimes the type checker gets in the way. One very useful trick that you can use as your “stub out” a method is to use failwith to make the type checker temporarily go away.

let rec prettyprint e =
  match e with
  | Variable(c) -> c.ToString()
  | Abstraction(v,e') -> failwith "TODO1"
  | Application(e', e'') ->  failwith "TODO2"

I use this trick frequently, although you should be aware that doing so trades a compile-time error for a runtime one!

Catching Exceptions

You can catch exceptionsing using F#’s try ... with syntax.

try 
  prettyprint (Abstraction('x', Variable('x')))
with
| MyError(msg) ->  "oops: " + msg

Which returns the string value “oops: TODO1”.

Option types

Like Java and C#, you can store null in values.

let x = null

However, the use of null is now widely regarded as a mistake in computer science. Tony Hoare (winner of the Turing Award), and inventor of null, called it a “billion-dollar mistake”:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

Instead, in F#, we prefer the type-safe option type. Let’s write a function that uses option.

let onedivx x =
  if x = 0.0 then
    None
  else
    Some (1.0 / x) 

Now when we use onedivx, we get back an option type:

> onedivx 0.0;;
val it : float option = None

> onedivx 1.1;;
val it : float option = Some 0.9090909091

option gives us a way to signal the failure of a computation in a type-safe manner. In fact, the type-checker forces us to deal with the error:

> (onedivx 0.0) + 2.2;;            

  (onedivx 0.0) + 2.2;;
  ----------------^^^

error FS0001: The type 'float' does not match the type 'float option'

To use the return value, we must “unwrap” it first, using pattern matching.

> match onedivx 0.0 with
- | Some res -> printfn "%f" res     
- | None     -> printfn "Oh, dear!"
- ;;
Oh, dear!
val it : unit = ()

Forward pipe

Forward pipe, |>, is my single favorite feature of F#. It allows you to build sophisticated data-processing pipelines easily.

|> passes the result of the left side to the function on the right side. For example, instead of writing:

List.map (fun x -> x + 1) [1;2;3;4]

you can write:

[1;2;3;4] |> List.map (fun x -> x + 1)

I find the latter easier to read. But the benefit really becomes apparent when you need to do multiple operations.

> [1;2;3;4]
- |> List.map (fun x -> x + 1)
- |> List.zip [5;6;7;8]
- |> List.filter (fun (_,y) -> y > 3)
- |> List.fold (fun acc (x,y) -> acc + x * y) 0
- ;;
val it : int = 68

Cool, huh?