The goal of this post is to show how to refactor Elm code from nested pattern matching into a point-free style, using the idioms of the language and common libraries.
In order to illustrate the unnecessary complexity that can arise from nested pattern matching, let us pretend we are presented with the following – slightly contrived – problem:
Given a string input \(s\) corresponding to an integer \(n\), determine whether the integer \(n\) and \(42\) are coprime.
One way to determine whether two integers are coprime is to check whether their greatest common divisor (GCD) is \(1\). In order to implement this as a function in Elm we would have to cobble together the following functions:
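The building blocks might be sketched as follows, where the gcd helper is our own (returning a Maybe Int as explained in the footnote) and the others are from the standard library:

```elm
String.toInt : String -> Maybe Int

-- our own helper; returns Nothing for gcd 0 0
gcd : Int -> Int -> Maybe Int

(==) : a -> a -> Bool
```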
Such that we first parse the string into an integer, then compute its GCD^{1} with 42, and finally check whether the result is equal to 1.
Implementing our isCoprimeWith42
function in a straightforward manner where
we call each function in sequence and pattern match on its result, gives us the
following definition:
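A sketch of such a definition, assuming the gcd helper described above:

```elm
isCoprimeWith42 : String -> Bool
isCoprimeWith42 s =
    case String.toInt s of
        Nothing ->
            False

        Just n ->
            case gcd n 42 of
                Nothing ->
                    False

                Just m ->
                    1 == m
```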
Aside from taking up several lines of code, the above implementation also
includes some redundancies in the form of the two Nothing -> False
cases, and
we are forced to come up with new variable names, n
and m
, for each time we
case on a Maybe
.
To simplify our function above, we introduce the following set of helpers from the standard library:
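The signatures of these helpers from the Maybe module are:

```elm
Maybe.andThen : (a -> Maybe b) -> Maybe a -> Maybe b

Maybe.map : (a -> b) -> Maybe a -> Maybe b

Maybe.withDefault : a -> Maybe a -> a
```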
Now, we want to use the pipe operator |>
to elegantly tie everything together
by transforming the function into a linear sequence of function calls, where the
output of each function is piped into the next function as input:
Looking at String.toInt
, which takes a String
and produces Maybe Int
,
and how we may combine it with gcd
, which takes an Int
and also produces a
Maybe Int
, we turn to Maybe.andThen
. With this function, we can connect the
two by wrapping the call to gcd
in a lambda
expression and partially
applying it to
Maybe.andThen
. This produces a new function that takes a Maybe Int
, which it
gets from String.toInt
, and produces a Maybe Int
:
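A sketch of this first part of the pipeline, assuming our gcd helper:

```elm
String.toInt s
    |> Maybe.andThen (\n -> gcd n 42)
```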
With the Maybe Int
that we got from the snippet above, we can now use
Maybe.map
to perform the final computation on our number, Maybe.map (\n -> 1
== n),
determining whether the number is indeed coprime with \(42\), giving us a
Maybe Bool
. Unfortunately, our original problem didn’t say anything about
returning a Maybe
so we have to use the Maybe.withDefault
function to return
a default value, False
, for the cases where s
could not be parsed into an
Int
or the gcd
function could not produce a result:
Piecing all of the above together, we get our new definition:
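The pipeline version might look as follows, again assuming the gcd helper from earlier:

```elm
isCoprimeWith42 : String -> Bool
isCoprimeWith42 s =
    String.toInt s
        |> Maybe.andThen (\n -> gcd n 42)
        |> Maybe.map (\n -> 1 == n)
        |> Maybe.withDefault False
```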
While this implementation is perfectly fine and much more declarative than our original definition, we will use the final section of this post to show how it can be made even more concise by slightly shifting our way of thinking about functions.
An alternative to using the pipe operator |>
, where we take our argument s
and first pass it to String.toInt
and then pass the result to
Maybe.andThen
, is to use the function
composition operator >>
which has the type:
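From the Basics module:

```elm
(>>) : (a -> b) -> (b -> c) -> (a -> c)
```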
This works similarly to the function composition we know from math, where \(h = g \circ f\) defines a new function \(h\) that is equivalent to \(h(x) = g(f(x))\). In Elm, this means that the function:
is equivalent to
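With placeholder functions f and g, the two forms look like:

```elm
-- with a named argument
h x = g (f x)

-- point-free, via composition
h = f >> g
```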
Notice how the function argument x is absent in the second definition. This is
because the argument and return types of the composition operator >>
because the argument and return types of the composition operator >>
are all
functions, meaning we are operating on the functions themselves rather than the
arguments of the functions, in contrast to the return type of the pipe operator
|>
which is a plain value:
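For comparison, the pipe operator's type from the Basics module:

```elm
(|>) : a -> (a -> b) -> b
```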
This style of programming where the argument names are kept implicit is called
point-free style and comes
with a slightly different perspective where focus is on function composition in
contrast to function application. Since these are all equivalent, we could even
have defined h
in a third way as:
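For instance, with placeholder functions f and g:

```elm
h = \x -> (f >> g) x
```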
Where we explicitly name the argument to be given to our newly composed function and apply it.
With this idea of function composition in mind, looking at our code it would be
tempting to define a function that composes Maybe.map
and Maybe.withDefault
such that we get a new function that applies a function f
if the given Maybe
is a Just and, if it is Nothing,
returns a default value. Fortunately, we do not
need to reinvent the functional wheel as the elm-community
package
maybe-extra
has implemented this and similar helper functions:
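The relevant helper is unwrap, which combines Maybe.withDefault and Maybe.map into one function:

```elm
Maybe.Extra.unwrap : b -> (a -> b) -> Maybe a -> b
```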
Thus, we can transform the following lines from our previous definition:
into
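That is, the two calls collapse into a single one:

```elm
    |> Maybe.Extra.unwrap False (\n -> 1 == n)
```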
and taking some inspiration from our previous observations on partial
application and point-free style programming, we can even rewrite our lambda
expression (\n -> 1 == n)
as ((==) 1)
by partially applying 1
to the
equality function (==)
function like so:
Combining the lessons above we get the following implementation:
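A point-free sketch, assuming the gcd helper and the maybe-extra package:

```elm
isCoprimeWith42 : String -> Bool
isCoprimeWith42 =
    String.toInt
        >> Maybe.andThen (\n -> gcd n 42)
        >> Maybe.Extra.unwrap False ((==) 1)
```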
which is semantically equivalent to the isCoprimeWith42
function from the
previous section that used pipes and named arguments.
Finally, we can do some parameterization of our function and generalize the
\(42\) to any integer \(n\) and create our isCoprimeWith42
by partially
applying 42
to isCoprimeWith
:
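A sketch of the generalized function and its partial application:

```elm
isCoprimeWith : Int -> String -> Bool
isCoprimeWith n =
    String.toInt
        >> Maybe.andThen (\m -> gcd m n)
        >> Maybe.Extra.unwrap False ((==) 1)

isCoprimeWith42 : String -> Bool
isCoprimeWith42 =
    isCoprimeWith 42
```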
As the final definition shows, the ideas introduced in this post can be used interchangeably when implementing any kind of function in Elm, and so the most important quality to aim for with these different techniques should be readability rather than brevity.
In this blog post, we have shown how to transform a function that uses nested pattern matching and gradually refine the function to become increasingly declarative in its definition while explaining the underlying principles, resulting in a function utilizing a point-free style where appropriate along with existing functions from the standard and community libraries.
For illustrative purposes we have defined the GCD function such that it
returns a Maybe Int
to account for the case gcd(0, 0)
whose value some
might consider to be undefined. ↩
The goal of this blog post is to present a set of practices that significantly improves the reliability and reproducibility of software builds. The common theme of these practices is to introduce proper versioning of all dependencies, from packages to tools and compilers.
The post is structured as follows. In Section 2 we present the problem of builds becoming unreliable over time through some common scenarios. In Section 3 we discuss how to lock the version of your package dependencies, then in Section 4 we discuss how to make your build scripts self-contained, and finally we look at how to make the compiler and runtime explicit in Section 5. We conclude the post in Section 6.
As a software developer, there is a certain class of bugs which tends to slowly sneak their way into your code base when working on software that has one or more of these characteristics:
While most bugs – both syntactic and semantic ones – are usually caused by the code written by you or your colleagues, this other class of bugs cannot immediately be caught by your compiler or a test case. Specifically, the class of bugs we are talking about occurs when a code base is compiled, tested, or built on different machines or at different points in time, where either:
have slowly drifted from the versions originally used to build and test the project. A bug of this kind usually manifests itself as either:
To illustrate this problem, we use a – slightly exaggerated – real-world example in which we actually managed to experience all three types of bugs at different points in time in the same codebase. The concrete project was a web app that was:
The cause and subsequent effect of each of the three types of bugs played out as follows:
1. Our AngularJS dependencies were specified with a loosely defined version, ^1.2, rather than a specific version number, 1.2.3, or at least a bounded minor version number, ~1.2, in the bower.json file. The effect was that over time these dependencies started to cause bugs at runtime, such as the classic undefined is not a function, or failing tests, as these AngularJS dependencies started to introduce (unintended) breaking changes in their newer minor versions, which would then be installed the next time someone – like our build server – would build the project from scratch.
2. The grunt CLI, grunt-cli, introduced breaking changes in its newest version, as it was still running version 0.x.y, and it subsequently turned out that each developer had installed their own version of the grunt CLI locally on their machine with npm -g install grunt-cli, rather than use the one specified in the project's package.json. Given that we used grunt for building, testing, and packaging our code, among a few other things, it was no surprise that the immediate effect was that our grunt tasks started breaking left and right, making us unable to build our AngularJS web app.
3. node.js introduced a breaking change in its runtime API, on which one of our dependencies in package.json depended. The effect was that when we tried to run our build script, after having updated the node.js runtime, the script printed the not so helpful error message function not defined in nodejs version 8.0,^{1} with no indication as to which function it was trying to invoke.

In each of the three cases above, the inevitable solution was to spend a non-trivial amount of time doing detective work trying to figure out which package, tool, and runtime version, respectively, could successfully build and test the project.^{2}
The causes of the three bugs above can be boiled down to:
These problems are all symptoms of the "works on my machine" lifestyle and can have a frustrating impact on your colleagues, your build server, and your future self, as they steal a non-trivial amount of time when they appear at some random point in the future – usually close to a deadline – where you may not have that time to spend tracking down the cause.
In the next three sections we look at concrete practices that can help avoid this class of bugs from happening while also discussing how different languages handle this issue.
As mentioned in the previous section, the first – and most obvious – cause of
dependency related bugs is "relying on a loosely defined version of a
package". In the AngularJS case, the problem manifested itself when we specified our
dependencies in the bower.json file with just a caret version, which only bounds the major version:
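A fragment of such a bower.json might have looked like this (package names and versions are illustrative):

```json
{
  "dependencies": {
    "angular": "^1.4",
    "angular-route": "^1.4"
  }
}
```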
which would then break the application when the angular
version 1.5
was
released with breaking changes. Fixing the issue is trivial by simply specifying
a concrete version, 1.4.0
, or a bounded minor version,^{3} ~1.4
, of each
dependency:
Unfortunately, this does not protect us against the case where one of our dependencies has defined one of its own dependency versions too loosely.
To illustrate the problem of locking down nested dependencies, we shift our
focus from the bower
package manager and over to the package.json
file and
the npm
package manager. Going back to the AngularJS project, we now specify
our dependencies with a concrete version number like so:
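For example (versions are illustrative):

```json
{
  "devDependencies": {
    "jasmine-reporters": "2.2.0"
  }
}
```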
Unfortunately, it turns out that the jasmine-reporters
dependency also uses
^
when specifying some of its dependencies:
Fortunately, to help lock down the versions of nested
dependencies, newer versions of npm have introduced the concept of a lock
file, package-lock.json, which records the exact versions of all dependencies,
both direct and nested, when
running npm install:
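An abbreviated sketch of the resulting lock file entry (versions and nested packages are illustrative):

```json
{
  "dependencies": {
    "jasmine-reporters": {
      "version": "2.2.0",
      "dependencies": {
        "mkdirp": {
          "version": "0.5.1"
        }
      }
    }
  }
}
```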
This also makes it less risky to use the ~
notation when specifying
dependencies:
as the package-lock.json
file will contain information about the exact
versions actually installed, as seen above.
Not only does specifying exact versions and using lock files make your builds
more reliable, it also makes it easier to update versions as you can see the
whole diff of changed (nested) versions in the package-lock.json
file.
Likewise, it also makes your builds more secure, as you can pinpoint exactly
which versions you are running on your server(s) in the rare case where an npm
package becomes compromised.
Lock files are not just a JavaScript-specific concept, but can be found in many
other languages, such as Elixir, where dependencies are specified in a mix.exs
file:
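An illustrative deps section of a mix.exs file, with one exact and one bounded dependency:

```elixir
defp deps do
  [
    {:plug, "1.6.0"},
    {:cowboy, "~> 1.1"}
  ]
end
```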
where a dependency can either be specific or bounded, ~>, and is then resolved
to a specific version in the accompanying mix.lock
lock file:
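An abbreviated, illustrative mix.lock entry (checksums and build metadata omitted):

```elixir
%{
  "cowboy": {:hex, :cowboy, "1.1.2"},
  "plug": {:hex, :plug, "1.6.0"}
}
```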
Finally, some languages take it a step further, like Elm, where the package
file, elm.json
, requires exact versions of all dependencies:
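For example, an elm.json dependency section (versions illustrative), where even indirect dependencies are pinned exactly:

```json
{
  "dependencies": {
    "direct": {
      "elm/core": "1.0.5",
      "elm/html": "1.0.0"
    },
    "indirect": {
      "elm/virtual-dom": "1.0.2"
    }
  }
}
```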
and the package manager enforces semantic versioning of packages, such that a breaking API change always forces a new major version of the package.
As demonstrated in this section we can achieve a higher level of confidence in our builds by diligently specifying the exact version of all of our package dependencies, thus ensuring that we will not be caught off guard by breaking changes in a dependency. In the next section we examine how to make our builds more reliable by making build and test scripts self-contained.
Having learned how to properly version all of our dependencies, the next step is to make sure that our test and build scripts are not “relying on a globally installed version of a tool” but on the properly versioned tools that we have specified, i.e. making sure that everything needed to build the source code and run the tests on another machine is properly specified.
If we return to our AngularJS example, we originally had the following set of
scripts defined in our package.json
file:
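The scripts might have looked roughly like this, calling the globally installed tools:

```json
{
  "scripts": {
    "postinstall": "bower install",
    "test": "grunt test"
  }
}
```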
which exhibit the second cause of dependency related bugs, as we are referring
to the globally installed versions of both bower
and grunt
– and
technically also npm
, but we address that in the Section
5 – which resulted in tests
failing and the build pipeline breaking.
In order to fix the issue, we do the following:
1. add bower and grunt to our list of dependencies in the package.json file, and
2. change our scripts to call the versions installed locally in the node_modules folder of the project:

thus ensuring that whenever npm test is executed – be it on our own development laptop or the company build server – the same versions of bower and grunt are used.
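A sketch of the fixed package.json (versions illustrative):

```json
{
  "devDependencies": {
    "bower": "1.8.4",
    "grunt-cli": "1.2.0"
  },
  "scripts": {
    "postinstall": "./node_modules/.bin/bower install",
    "test": "./node_modules/.bin/grunt test"
  }
}
```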
While this issue of having to use 3rd party tools for building and testing is
especially prevalent for JavaScript projects, it can also occur for
non-JavaScript projects like Elm, where a project might still rely on tools
installed via npm, like elm-test, in which case it is also worth
adding proper versioning of any tool dependencies and making the test scripts
self-contained, e.g.
As demonstrated in this section, we can achieve an even greater level of confidence in our builds by combining the lesson from the previous section, of versioning our dependencies, with the practice of locally installing all needed 3rd party tools in our current project folder, thus making our build and test scripts self-contained. In the next section we take the final step and look at how to lock down the compiler and runtime environment to further improve the reliability and reproducibility of our builds.
The final step on our road towards more reliable and reproducible builds is to avoid “relying on a globally installed version of a compiler or runtime environment” by making these explicit in each of our projects.
As mentioned in Section
2, one of the problems
we faced in the AngularJS project, was that our build script would all of a
sudden print the error message function not defined in nodejs version 8.0
and
exit without further explanation. While the error message did indicate that the
error had been introduced by changing the version of the nodejs
runtime, it
did not indicate which function it was trying to invoke that was now gone.
Fortunately, this issue can also be fixed by introducing proper versioning to
our project. Specifically, we introduced the
asdf tool, which is a “CLI tool that can
manage multiple language runtime versions on a per-project basis” similar to
what nvm
does for nodejs
, gvm
does for go
, and rbenv
does for ruby
.
Thus, using asdf
we could make sure that all of our different projects across
all different machines, both development and CI, would be using the exact same
versions of compilers and runtime environments when running our build and test
scripts.
Without going into the practical details of how to set up asdf, the general
idea of asdf
is to add a .tool-versions
file to your project that contains
the version number(s) of the runtime(s) and compiler(s) needed to start the
project. This is then enforced by installing a
shim in the user’s favorite
shell that picks the proper runtime
or compiler based on the content of the .tool-versions
file, whenever the user
makes a call to such a runtime or compiler. In the specific case of our
AngularJS project, the .tool-versions
file simply contains the needed nodejs
version:
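For example (version illustrative):

```
nodejs 8.11.3
```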
which is then installed onto the user’s machine when running asdf install
before running npm install
or similar.
Besides the AngularJS project, we have also used this .tool-versions technique
technique
to specify the versions of erlang
and elixir
in some of our microservices:
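For example (versions illustrative):

```
erlang 21.0
elixir 1.7.3
```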
and nodejs
and elm
for some of our newer frontends:
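For example (versions illustrative):

```
nodejs 10.13.0
elm 0.19.0
```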
as asdf
supports a simple plugin system
that makes it easy to support new languages and cmd line tools using the
.tool-versions
file.
In this section, we have shown how to properly version our compiler and runtime
dependencies using a .tool-versions
file and making these versions enforceable
by using asdf
. In the next section, we conclude this post.
In this blog post we presented a set of practices for significantly improving the reliability and reproducibility of software builds. These practices focused on:

1. locking down the versions of all package dependencies, both direct and nested,
2. making build and test scripts self-contained by using locally installed tools, and
3. making compiler and runtime versions explicit in a .tool-versions file and enforceable by asdf.

A final note: making builds truly reliable is not a trivial task and therefore the above principles are not an exhaustive list of what can be done to achieve this goal. Most notably, there is also the large topic of running scripts and servers in containers such as docker, which we have not covered.
The specific wording of the error message escapes me but it was about as helpful as the example message above. ↩
Note that the time spent doing detective work can dramatically increase if more than one of the three types of bugs occurs simultaneously. ↩
This presumes that your dependencies do not introduce breaking changes in their patch versions. ↩
“We are our choices.”
– Jean-Paul Sartre
The goal of this blog post is to define the concept of sum types and compare the implementation of sum types in three different functional programming languages: Kotlin, Elixir, and Elm.
The post is structured as follows. In Section 2, we define the concept of sum types. Then, in Sections 3, 4, and 5 we look at concrete implementations of sum types in Kotlin, Elixir, and Elm, respectively. The post is concluded in Section 6.
In this section, we define the concept of sum types.
In our
post on enum types,
we defined an enum type as a “data type consisting of a set of named
values which we call the members of the type”, e.g. we defined shape
as:
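In our ML-like syntax, the enum version of shape looked like:

```
datatype shape
  = Rectangle
  | Circle
  | Triangle
```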
In the case of a sum type (sometimes called a tagged union), we may look at it as a generalization of the enum type, where each member of a sum type may take its own set of arguments. Conversely, we may also look at enum types as the subset of sum types for which each member is a unit type, i.e. each member’s type constructor takes zero arguments.
In our
post on product types
we implemented three different types of shapes: rectangle, circle, and
triangle. Now, with the above definition of sum types in mind, we want to define
a shape
type that can be either a Rectangle
, a Circle
, or a Triangle
. In
our ML-like syntax, we could express our shape
type and its members as:
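For instance:

```
datatype shape
  = Rectangle of float * float  (* height and width *)
  | Circle of float             (* radius *)
  | Triangle of float * float   (* base and height *)
```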
where we declare a datatype
with the name shape
and three type constructors,
Rectangle
, Circle
, and Triangle
, where the first type constructor,
Rectangle
, takes two float
values, the height
and width
, the second type
constructor, Circle
, takes a single float
value, the radius
, and the third
type constructor, Triangle
, takes two float
values, the base
and height
.
This scenario reminds us of the issue with tuple types we discussed in the
previous post, i.e. it is not clear which of the arguments has which semantic
meaning. Fortunately, we can substitute the float * float
arguments to the
Rectangle
type constructor, with the rectangle
product type we defined in
the previous post:
and likewise for the Circle
type constructor:
and Triangle
type constructor:
Combining all this, we get the following definition of the shape
sum type:
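Using the record-style product types from the previous post, the combined definition becomes:

```
datatype rectangle = { height : float, width : float }
datatype circle = { radius : float }
datatype triangle = { base : float, height : float }

datatype shape
  = Rectangle of rectangle
  | Circle of circle
  | Triangle of triangle
```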
where the semantic meaning of the arguments to each of the type constructors is now more obvious. Note that being able to create these kinds of types, as compositions of both sum and product types, is what we usually refer to as algebraic data types.
As in the case of enum types, we can pattern match on sum types. This provides
the opportunity for us to merge the three different area functions of the
previous post into one area
function that takes an instance of the shape
sum
type as argument, pattern matches on its type constructor and calculates the
area of that type of shape. We express this area
function in our ML-like
syntax as:
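A sketch of the area function in our ML-like syntax:

```
fun area (s : shape) : float =
  case s of
      Rectangle r -> r.height * r.width
    | Circle c -> pi * c.radius * c.radius
    | Triangle t -> 0.5 * t.base * t.height
```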
We see that the area
function pattern matches on the type constructor of the
shape
argument, i.e. Rectangle
, Circle
and Triangle
, and in doing so
also unwraps the arguments of each type constructor and binds these to suitable
variable names, thereby allowing us to calculate the area of the shape in the
matched clause. As in the previous post, we can also destructure the arguments
and directly access their fields in each of the clauses:
thus removing the need to qualify the use of the different field values in the body expression of each of the clauses.
In the following three sections, we look at how to express the above shape
sum
type, along with the area
example function, in each of our three programming
languages.
In this section, we implement the shape
sum type and area
function in
Kotlin.
If we look at the definition of the enum type we defined in the previous post:
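The enum version looked roughly like (member naming follows Kotlin convention):

```kotlin
enum class Shape {
    RECTANGLE, CIRCLE, TRIANGLE
}
```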
we might expect that we could simply add the needed set of arguments to each of
the defined members in order to obtain the desired sum type. Unfortunately,
while an enum class
is actually able to take a set of arguments, these are
declared for the whole class and not for the individual member, which is too
constrained to fit with our definition above of sum types. Luckily, Kotlin has
introduced the concept of
a sealed class,
which allows us to define "restricted class hierarchies, when a value can
have one of the types from a limited set, but cannot have any other type", which
sounds a lot like our definition of a sum type. Thus, in order to define our
custom sum type we declare our new type as sealed class Shape
followed by a
class declaration for each of the members of the sum type, Rectangle
,
Circle
, and Triangle
, each of which then has to be declared as a subclass of
Shape
:
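A sketch of the sealed class hierarchy:

```kotlin
sealed class Shape

data class Rectangle(val height: Double, val width: Double) : Shape()
data class Circle(val radius: Double) : Shape()
data class Triangle(val base: Double, val height: Double) : Shape()
```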
Note that in contrast to our ML-like example, we do not explicitly list each of
the members of our Shape
sum type when declaring it, but instead do it
implicitly as we define each of the actual member types and declare a member
type to be a subclass of Shape
.
Just as we could pattern match on instances of an enum class using a when
(<var>) {...} expression, the same holds for instances of a sealed
class. Thus, we define our area
function in Kotlin as:
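A sketch of the area function:

```kotlin
fun area(shape: Shape): Double = when (shape) {
    is Rectangle -> shape.height * shape.width
    is Circle -> Math.PI * shape.radius * shape.radius
    is Triangle -> 0.5 * shape.base * shape.height
}
```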
A few details worth noting:

1. we use the is keyword in each of the matching clauses, as we are matching on a subclass type and not a specific value, and
2. the compiler automatically casts the shape variable into its correct member type, e.g. we do not have to cast shape as a Rectangle in order to access shape.height once we are inside the body expression of the is Rectangle clause.

Finally, we can run the above code by implementing the main function, instantiating a variable of type Shape and passing it to the area function:
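For example:

```kotlin
fun main() {
    val shape: Shape = Rectangle(height = 2.0, width = 3.0)
    println(area(shape)) // prints 6.0
}
```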
Having implemented our shape
sum type and area
function in Kotlin, we move
on to repeat the exercise in Elixir.
In this section, we implement the shape
sum type and area
function in
Elixir.
As in the case of the shape
enum type, we create a module named Shape
and
use the @type
directive to define a type named t
, which is either a
Rectangle.t
, Circle.t
or Triangle.t
type:
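A sketch of the Shape module:

```elixir
defmodule Shape do
  @type t :: Rectangle.t() | Circle.t() | Triangle.t()
end
```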
where Rectangle.t
, Circle.t
and Triangle.t
correspond to the product types
we defined in our previous post:
Having defined our Shape.t
type and its members, Rectangle.t
, Circle.t
and
Triangle.t
, we can now define our area
function which takes an argument of
type Shape.t
and calculates the area of the shape by pattern matching on the
concrete member of the shape
sum type:
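A sketch of the area function:

```elixir
def area(shape) do
  case shape do
    %Rectangle{height: height, width: width} -> height * width
    %Circle{radius: radius} -> :math.pi() * radius * radius
    %Triangle{base: base, height: height} -> 0.5 * base * height
  end
end
```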
While the case <var> do ... expression is similar to the one we used for
enum types, we note that – as in the Kotlin case – the arguments of the
matching member/type constructor are automatically unwrapped and bound to
suitable variable names.
Finally, we can test the above code by instantiating a value of type Shape.t
and pass it to the area
function:
Having implemented our shape
sum type and area
function in both Kotlin and
Elixir, we move on to our final language example, Elm.
In this section, we implement the shape
sum type and area
function in Elm.
In the case of Elm, we once again return to the ML-like syntax we saw at the
beginning of this post, where we define our sum type, Shape
, using the type
keyword followed by listing each of the members of the type, Rectangle
,
Circle
, and Triangle
:
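A sketch, with the record definitions inlined into the constructors:

```elm
type Shape
    = Rectangle { height : Float, width : Float }
    | Circle { radius : Float }
    | Triangle { base : Float, height : Float }
```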
Here, we simply inline the definition of Rectangle
, Circle
, Triangle
from
our previous post into their corresponding clauses in the Shape
sum
type. Alternatively, we would have to change the names of the clauses or
argument types in order to avoid name clashes, e.g.
which in this case is less aesthetic than the former definition.
It is worth appreciating that in order to go from an enum type to a sum type in Elm, all we had to do was add arguments to the members / type constructors of the type. Unsurprisingly, Elm does not make an actual distinction between enum and sum types, but sees the former as a subset of the latter, as we also discussed in Section 2.
The similarity to our ML-like syntax also holds in the case of pattern matching
in the area
function:
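A sketch of the area function:

```elm
area : Shape -> Float
area shape =
    case shape of
        Rectangle { height, width } ->
            height * width

        Circle { radius } ->
            pi * radius * radius

        Triangle { base, height } ->
            0.5 * base * height
```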
where the differences are minor. Finally, we can run the above code snippets by
implementing the main
function, where we instantiate a value of type Circle
,
pass it to the area
function and print it as a text
DOM element:
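For example:

```elm
import Html exposing (Html, text)

main : Html msg
main =
    text (String.fromFloat (area (Circle { radius = 1.0 })))
```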
In this blog post, we have defined the concept of sum types, and compared the implementation of sum types in the three different programming languages: Kotlin, Elixir, and Elm.
While all three languages supported the concept of sum types, it is noticeable that Elm required the least introduction of new syntax, as it does not really make a distinction between enum types and sum types, as the former can be expressed in terms of the latter.
“All for one and one for all,
united we stand divided we fall.”
– Alexandre Dumas, The Three Musketeers
The goal of this blog post is to define the concept of product types and compare the implementation of product types in three different functional programming languages: Kotlin, Elixir, and Elm.
The post is structured as follows. In Section 2, we define the concept of product types. Then, in Sections 3, 4, and 5 we look at concrete implementations of product types in Kotlin, Elixir, and Elm, respectively. The post is concluded in Section 6.
In this section, we define the concept of product types.
A product type is a composite data type that compounds two or more types in a fixed order; we call these compounded types the fields of the product type. A common example of a product type is the point type, which compounds two float types, corresponding to an x- and a y-coordinate, into a new type. We can express this point type in our ML-like syntax as:
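In our ML-like syntax:

```
(* the x- and y-coordinate of a point *)
datatype point = float * float
```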
where we declare point
as a datatype
consisting of two float values, the x-
and y-coordinate, separated by a *
(not to be confused with the multiplication
operator). Any instance of the point
type is then
a tuple of two floats, e.g. (3, 2)
. We
can access the fields of such a tuple using pattern matching - sometimes also
called destructuring:
Here, we construct a point
named p
as the tuple (3, 2)
, then assign its
fields to x
and y
, by pattern matching on the structure of the tuple, and
finally add them together. Product types defined in terms of tuples may also be
called tuple types.
Unfortunately, defining product types as tuples has the downside that it is not
clear from the actual definition of a type what the semantic meaning of each of
its fields is, e.g. without the comment in the definition of the point type
type
above, it is not clear which float
corresponds to x
and which one
corresponds to y
. However, we can improve the situation by requiring that each
field of a product type has to be assigned a name, which gives us the following
new definition of the point
type:
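In our ML-like syntax:

```
datatype point = { x : float, y : float }
```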
Any instance of the point
type is now a tuple with named fields, e.g. (x = 3,
y = 2)
. Product types defined in terms of named fields are also called record
types or structs.
Accessing the named fields of a product type, without having to pattern match on
its whole structure, is straightforward, as seen in the following example, where
we compute
the Euclidean distance
between two points, p
and q
:
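A sketch, where sqrt and pow are assumed primitives:

```
fun distance (p : point) (q : point) : float =
  sqrt (pow (q.x - p.x) 2 + pow (q.y - p.y) 2)
```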
Here, we access an individual field using the common <var>.<field>
expression.
In each of the following sections, we return to the shape
example from
our
previous post,
and implement product type versions of each of the different shapes:
rectangle
, circle
, and triangle
. Specifically, we define each of the
shapes in terms of their corresponding mathematical definition, i.e. a rectangle
has a height and width, a circle has a radius, and a triangle has a base
and a height. In our ML-like syntax, we express this as follows:
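For instance:

```
datatype rectangle = { height : float, width : float }
datatype circle = { radius : float }
datatype triangle = { base : float, height : float }
```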
For our example function, we want to compute the area of each of these three different shapes, so we have to implement corresponding area functions:
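Sketched with destructuring in the function headers:

```
fun rectangle_area { height, width } = height * width
fun circle_area { radius } = pi * radius * radius
fun triangle_area { base, height } = 0.5 * base * height
```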
Note that in our reference implementation above, we use destructuring on each of the different types directly in the header of their corresponding area function, in order to make the definitions more concise. Alternatively, we could have chosen to access the fields of the product types without destructuring, e.g.
In the next section, we implement the rectangle
, circle
, and triangle
product types along with their corresponding area example functions in Kotlin.
In this section, we implement the rectangle
, circle
, and triangle
product
types along with their area functions in Kotlin.
As discussed in the previous post, Kotlin is heavily influenced by Java which means that all non-primitive data types are defined in terms of classes, and product types are no exception. Likewise, we also discussed that we prefer to separate data and logic, and thus would like to avoid defining our product types as plain old classes, e.g.
Instead, we would like to signal to the Kotlin compiler - and other developers -
that we are defining product types, which should not do much beyond store some
data. Fortunately, Kotlin introduces the concept
of a data class, which
does exactly this while also automatically deriving reasonable implementations
of equals
, toString
, and copy
. Defining our product types, Rectangle
,
Circle
, and Triangle
, as data classes is now straightforward, as we just
need to add the data
keyword before the class
keyword:
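A sketch of the three data classes:

```kotlin
data class Rectangle(val height: Double, val width: Double)
data class Circle(val radius: Double)
data class Triangle(val base: Double, val height: Double)
```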
Note also the conciseness Kotlin brings when specifying a class, Rectangle
,
and its fields, height
and width
, compared to a traditional Java class.
Implementing our three area functions is also rather straightforward, as each function takes an argument of their expected shape type and returns the calculated area of that type:
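A sketch of the three area functions:

```kotlin
fun rectangle_area(r: Rectangle): Double = r.height * r.width
fun circle_area(c: Circle): Double = Math.PI * c.radius * c.radius
fun triangle_area(t: Triangle): Double = 0.5 * t.base * t.height
```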
If we wanted to pattern match on the fields of each of the types, as demonstrated in the previous section, we could instead use Kotlin’s destructuring declarations to do just that:
However, in the case of our area functions, it would not do much in terms of making the code more elegant.
Finally, in order to test our code, we implement the main
function which
instantiates a variable of type Rectangle
and prints the result of calling
rectangle_area
on it:
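A self-contained sketch of such a main function (the instantiation values are assumptions):

```kotlin
data class Rectangle(val height: Double, val width: Double)

fun rectangle_area(r: Rectangle): Double = r.height * r.width

fun main() {
    val rectangle = Rectangle(2.0, 3.0)
    println(rectangle_area(rectangle))  // prints 6.0
}
```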
Having implemented our product types, rectangle
, circle
, and triangle
,
along with their area functions in Kotlin, we move on to repeat the exercise in
Elixir.
In this section, we implement the rectangle
, circle
, and triangle
product
types along with their area functions in Elixir.
In order to define our different shape types in Elixir, we take a slightly different approach than in the case of the enum type, by encapsulating each of our types in a module named after the corresponding type:
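Following the description below, the Rectangle module might look like this:

```elixir
defmodule Rectangle do
  # Declares the type Rectangle.t as a struct with two float properties.
  @type t :: %__MODULE__{height: float, width: float}

  # Defines the actual struct, with default value 0.0 for both properties.
  defstruct [height: 0.0, width: 0.0]
end
```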
Breaking down the above definition, we first look at the @type
declaration of
t
, where __MODULE__
refers to the name of the enclosing module, Rectangle
,
and the %<name>{<property_name>: <property_type>, ...}
construct declares a
struct
type called <name>
and with a set of <property_name>: <property_type>
pairs. While the @type
directive declares the Rectangle.t
type, the
defstruct
keyword defines the actual data structure of a Rectangle
, by
taking a list of [<property_name>: <default_value>]
as its arguments,
corresponding to the properties declared in our type declaration. In this case,
we define the type Rectangle
to have two properties, height
and width
,
both of type float
and both with default value 0.0
.
We define Circle
and Triangle
in a similar manner:
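A sketch of the two remaining modules (the Circle field radius and the Triangle fields are assumptions):

```elixir
defmodule Circle do
  @type t :: %__MODULE__{radius: float}
  defstruct [radius: 0.0]
end

defmodule Triangle do
  @type t :: %__MODULE__{height: float, width: float}
  defstruct [height: 0.0, width: 0.0]
end
```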
We can now refer to the three product types as Rectangle.t
, Circle.t
, and
Triangle.t
respectively, allowing us to define our three area functions, which
given an argument of the corresponding shape type, returns the computed area of
that shape:
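A sketch of the rectangle case, with the fields pattern matched directly in the function head; the enclosing module name, Area, is an assumption, and the Rectangle module is repeated for self-containment:

```elixir
defmodule Rectangle do
  @type t :: %__MODULE__{height: float, width: float}
  defstruct [height: 0.0, width: 0.0]
end

defmodule Area do
  # Destructure height and width directly in the function head.
  @spec rectangle_area(Rectangle.t) :: float
  def rectangle_area(%Rectangle{height: height, width: width}) do
    height * width
  end
end
```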
Note that Elixir allows us to pattern match not just on the type but also directly on its fields at the same time, making them readily available in the body of the function declaration.
We test the code by instantiating a value of type Rectangle.t and passing it to its area function:
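For example, assuming the modules sketched above (the instantiation values are assumptions):

```elixir
defmodule Rectangle do
  @type t :: %__MODULE__{height: float, width: float}
  defstruct [height: 0.0, width: 0.0]
end

defmodule Area do
  def rectangle_area(%Rectangle{height: height, width: width}), do: height * width
end

rectangle = %Rectangle{height: 2.0, width: 3.0}
IO.puts(Area.rectangle_area(rectangle))  # prints 6.0
```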
While the Kotlin and Elixir implementations are quite similar in many ways, it is noteworthy that the concept of pattern matching on the structure of types is a more natural feature of the Elixir language compared to Kotlin.
Having implemented our rectangle
, circle
, and triangle
product types in
Kotlin and Elixir, we move on to our final language example, Elm.
In this section, we implement the rectangle
, circle
, and triangle
product
types along with their area functions in Elm.
In order to implement our product types, rectangle
, circle
, and triangle
,
in Elm, we can use a syntax similar to what we saw in Section 2. We specify a
product type using the type alias
keywords followed by listing each of the
fields of the type, e.g. height
and width
, separated by ,
and encapsulated
by {...}
:
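A sketch of the three type aliases (the Circle and Triangle fields are assumptions, as before):

```elm
type alias Rectangle = { height : Float, width : Float }
type alias Circle = { radius : Float }
type alias Triangle = { height : Float, width : Float }
```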
As in the Elixir case, we can pattern match (or destructure) our product type arguments directly in the header of our function declarations:
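For the rectangle case, such a destructured definition might look as follows:

```elm
type alias Rectangle = { height : Float, width : Float }

rectangleArea : Rectangle -> Float
rectangleArea { height, width } =
    height * width
```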
thus making our code more concise. Besides a few syntactic differences, there is not much difference between the ML-like reference example and our actual Elm implementation.
Once again, we implement the main
function, in which we instantiate a value of
type Rectangle
, pass it to the rectangleArea
function, and print it as a
text DOM element:
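A self-contained sketch of such a main function, assuming Elm 0.19 (where String.fromFloat replaces the older toString) and assumed instantiation values:

```elm
module Main exposing (main)

import Html exposing (text)

type alias Rectangle = { height : Float, width : Float }

rectangleArea : Rectangle -> Float
rectangleArea { height, width } = height * width

main =
    text (String.fromFloat (rectangleArea { height = 2.0, width = 3.0 }))
```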
Having implemented our rectangle
, circle
, and triangle
product types in
Elm, we are ready to conclude this post in the next section.
In this blog post, we have defined the concept of product types, and compared the implementation of product types in the three different programming languages: Kotlin, Elixir, and Elm.
While all three languages support product types on a language level, we note that pattern matching on the structure of types in general is a fundamental part of programming in Elixir, and thus it shines a bit brighter here than the other languages.
The goal of this blog post is to define the concept of enum types and compare the implementation of enum types in three different functional programming languages: Kotlin, Elixir, and Elm.
The post is structured as follows. In Section 2, we define the concept of enum types. Then, in Sections 3, 4, and 5 we look at concrete implementations of enum types in Kotlin, Elixir, and Elm, respectively. The post is concluded in Section 6.
In this section, we define the concept of enum types.
An enum - short for enumerated - type is a data type consisting of a set of named values which we call the members of the type. One of the most common enum types in modern programming languages is the boolean type, which has exactly two members: true and false. We can express this boolean type in an ML-like syntax as:
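In the ML-like syntax used as a reference throughout this post, the declaration might read:

```sml
datatype boolean = True | False
```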
where we declare a datatype
with the name boolean
that has
two type constructors, True
and False
(separated by a |
), corresponding to the two members of the
enum type. Consequently, any instance of the type boolean
can only have
the value of either True
or False
, which allows us to do
exhaustive pattern matching
like so:
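A sketch of such a function, where the two result values are hypothetical placeholders:

```sml
fun foo bool =
  case bool of
      True  => "yes"
    | False => "no"
```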
where we define a function, foo
, that takes an argument, bool
, of type boolean
and cases on its members/type constructors, True
and False
, using a case
<var> of ...
expression.
For practical reasons, a boolean type is usually included by default in most
modern programming languages, so in the following three sections we instead look
at how to express a shape
enum type, with members Rectangle
, Circle
,
and Triangle
:
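In the ML-like reference syntax, the declaration might read:

```sml
datatype shape = Rectangle | Circle | Triangle
```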
in each of our three programming languages of choice. Furthermore, in order to
see how each language handles pattern matching, we also implement an example
function, edges
, which takes an argument of type shape
and returns the
number of edges of the given shape:
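A sketch of such a function; the edge counts for a rectangle and a triangle are 4 and 3, while treating a circle as having a single edge is a convention we assume here:

```sml
fun edges shape =
  case shape of
      Rectangle => 4
    | Circle    => 1
    | Triangle  => 3
```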
In the next section, we implement shape
and edges
in Kotlin.
In this section, we implement the shape
enum type and edges
example function
in Kotlin.
Given Java’s strong
influence on Kotlin, it is no surprise that Kotlin has inherited
Java’s
class-oriented paradigm,
where all non-primitive data types are defined in terms of classes. Furthermore,
Kotlin has also inherited the enum
keyword from Java, which - as the name
suggests - is used for defining enum types (or classes). Thus, in order to
define our custom enum type we declare our new type as enum class Shape
followed by listing each of the members of the enum type, Rectangle
, Circle
,
and Triangle
:
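Following that description, the declaration might look like:

```kotlin
enum class Shape {
    Rectangle, Circle, Triangle;
}
```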
each separated by a ,
and terminated with a ;
.
Kotlin also allows us to do pattern matching on enums, as demonstrated below
where we define the edges
function, which takes an argument of type Shape
and returns its number of edges:
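A self-contained sketch of the function (the specific edge counts, in particular treating a circle as having one edge, are our assumption):

```kotlin
enum class Shape { Rectangle, Circle, Triangle }

// Exhaustive 'when' expression: no 'else' clause needed for an enum.
fun edges(shape: Shape): Int = when (shape) {
    Shape.Rectangle -> 4
    Shape.Circle -> 1
    Shape.Triangle -> 3
}
```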
Instead of a case <var> of ...
expression, Kotlin uses a when (var) {...}
expression for pattern matching and as in our reference example in the previous
section, the body of the expression includes a clause for each of the members of
the enum type (class).
Before we move on, there are a few things worth noting about the above code snippets:

- we are able to separate the data, Shape, and the logic, edges, such that we do not have to introduce the concept of edges into our enum class definition as a method, but instead we can define a separate function, edges, somewhere else in the source code, resulting in lower coupling,
- we do not need an else clause in the body of the when expression, as the pattern matching of shape in the when expression is exhaustive, and lastly,
- since edges takes an argument of type Shape rather than Shape?, the type system enforces the constraint that edges cannot be called with a null reference, which helps make Kotlin code easier to reason about than traditional Java code, as it reduces the number of needed null checks.

Finally, in order to test the above code, we write a main
function which
instantiates a variable of type Shape
and prints the result of calling edges
on it:
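A self-contained sketch of the test (the chosen member is an assumption):

```kotlin
enum class Shape { Rectangle, Circle, Triangle }

fun edges(shape: Shape): Int = when (shape) {
    Shape.Rectangle -> 4
    Shape.Circle -> 1
    Shape.Triangle -> 3
}

fun main() {
    val shape = Shape.Rectangle
    println(edges(shape))  // prints 4
}
```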
Having implemented our shape
enum type and edges
example function in Kotlin,
while demonstrating how to pattern match on its members, we move on to repeat
the exercise in Elixir.
In this section, we implement the shape
enum type and edges
example function
in Elixir.
Unlike Kotlin, Elixir does not have a dedicated keyword or construct for
defining an enum type as part of the language, so instead we have to use the
@type
directive to declare our own enum types. The @type
directive allows us
to combine existing types, and instances of types, into new custom types. These
custom types can then be enforced by a static analysis tool
like dialyxir, which is used for type
checking Elixir source code.
In order to define our shape
enum in Elixir, we create a module named Shape
and declare a custom @type
named t
inside of it, where the members of t
are the atoms :rectangle
,
:circle
, and :triangle
:
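Following that description, the module might look like:

```elixir
defmodule Shape do
  # Shape.t is the union of the three atoms, i.e. the members of the enum.
  @type t :: :rectangle | :circle | :triangle
end
```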
Here, ::
separates the name of the type, on the left, from its definition, on
the right, while |
separates each of the members of the type, and finally :
is used for constructing each of the atoms.
Having defined the above module, we can then refer to the shape
enum type as
Shape.t
, in the same way as we would refer to the String.t
type.
Similar to the Kotlin case, we define an example function, edges
, which given
an argument of type Shape.t
, returns the number of edges of the matched
Shape.t
member, via pattern matching:
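A sketch of such a function; the enclosing module name, Example, is an assumption, as are the circle's edge count:

```elixir
defmodule Shape do
  @type t :: :rectangle | :circle | :triangle
end

defmodule Example do
  @spec edges(Shape.t) :: integer
  def edges(shape) do
    case shape do
      :rectangle -> 4
      :circle -> 1
      :triangle -> 3
    end
  end
end
```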
Here, the case expression used in Elixir, case <var> do ...
, is very
similar to the case expression used in Section 2, case
<var> of ...
, and likewise for the actual clauses for each of the members.
Once again, we notice a few things about the code snippets above:

- the data, the Shape enum type, is separated from the logic, the edges function; this is always the case in Elixir, as it does not include constructs for combining data and logic as classes, and
- just as we use the @type directive to define our custom type, we use the @spec directive to state that the edges function takes as input a value of type Shape.t and returns a value of type integer.

Again, we can test the above code by instantiating a value of type Shape.t and passing it to edges:
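For example, assuming the Example module sketched above:

```elixir
defmodule Example do
  def edges(shape) do
    case shape do
      :rectangle -> 4
      :circle -> 1
      :triangle -> 3
    end
  end
end

shape = :rectangle
IO.puts(Example.edges(shape))  # prints 4
```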
While the function call and definition above looks very similar to the Kotlin
version, there is one distinctive difference: because of Elixir’s dynamic type
checking, we cannot fully guarantee that edges
is never given an argument
that is not of type Shape
at runtime, which may result in a runtime error if
we don’t include an else
-clause in the case
expression. However, by
specifying proper type signatures of our functions combined with Elixir’s
excellent type inference engine and tools like dialyxir, all of which we have
discussed above, we can do much to reduce this risk without scattering
else
-clauses in our code.
Finally, we note that we could also have written edges
in a slightly more
idiomatic Elixir way:
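That is, with the pattern match inlined as one function clause per member (module name assumed as before):

```elixir
defmodule Shape do
  @type t :: :rectangle | :circle | :triangle
end

defmodule Example do
  # One clause per enum member instead of a case expression.
  @spec edges(Shape.t) :: integer
  def edges(:rectangle), do: 4
  def edges(:circle), do: 1
  def edges(:triangle), do: 3
end
```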
where we inline the pattern matching in the function declaration. In this
particular case, we chose the former style with the case
expression, as it more closely resembles the other example snippets.^{1}
Having implemented our shape
enum type and edges
example function in both
Kotlin and Elixir, we move on to our final language example, Elm.
In this section, we implement the shape
enum type and edges
example function
in Elm.
In the case of Elm, we return to an ML-like syntax similar to what we saw at the
beginning of this post, where we define our type, Shape
, using the type
keyword followed by listing each of the members of the type, Rectangle
, Circle
, and
Triangle
, separated by |
:
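Following that description, the declaration might look like:

```elm
type Shape
    = Rectangle
    | Circle
    | Triangle
```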
As in the Kotlin case, we can do exhaustive pattern matching without any
else
-clause in our case
expression, as the Shape
type can only be
constructed using the three listed members:
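A sketch of the function (edge counts assumed as in the earlier sections):

```elm
type Shape = Rectangle | Circle | Triangle

edges : Shape -> Int
edges shape =
    case shape of
        Rectangle -> 4
        Circle -> 1
        Triangle -> 3
```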
While the above snippets are very similar to the original examples in
Section 2, there is the added function declaration,
edges : Shape -> Int
, which states that edges
takes a value of type
Shape
and returns a value of type Int
.
Note also that, unlike Kotlin and Elixir, we do not even have to think about the
possibility of passing a null
or nil
reference to edges
, as these concepts
are not even part of the Elm language.
Finally, in order to run the above code, we implement the main
function, where
we instantiate a value of type Shape
, pass it to the edges
function, and
print it as a text DOM element:
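A self-contained sketch of such a main function, assuming Elm 0.19 and an assumed choice of member:

```elm
module Main exposing (main)

import Html exposing (text)

type Shape = Rectangle | Circle | Triangle

edges : Shape -> Int
edges shape =
    case shape of
        Rectangle -> 4
        Circle -> 1
        Triangle -> 3

main =
    text (String.fromInt (edges Rectangle))
```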
Having implemented our shape
enum type and edges
example function in our
third and final language, Elm, we conclude this post in the next section.
In this blog post, we have defined the concept of enum types, and compared the implementation of enum types in the three different programming languages: Kotlin, Elixir, and Elm.
Across the three implementations, we notice that Elixir is the only language which does not have a dedicated keyword or construct for defining enum types while Elm has the highest level of type safety by default, as it does not include the concept of a null reference.
“One chord is fine.
Two chords are pushing it.
Three chords and you’re into jazz.”
– Lou Reed
The goal of this blog post is to state Moessner’s idealized theorem, Long’s idealized theorem, and conjecture a further generalization.
The post is structured as follows. In Section 2 we start by adapting Moessner’s theorem to the dual sieve in order to obtain Moessner’s idealized theorem. Then, in Section 3, we repeat the process for Long’s theorem, which motivates the statement of Long’s idealized theorem along with Long’s weak theorem. Finally, we conjecture a further generalization of Long’s idealized theorem in Section 4. The post is concluded in Section 5.
In order to obtain Moessner’s idealized theorem, we start from Moessner’s original theorem, generalize it and adapt it to the dual sieve.
Theorem 1 (Moessner’s theorem). Given an initial sequence of positive natural numbers,
\[\begin{equation*} 1, 2, 3, \dots, \end{equation*}\]and a natural number \(k \ge 2\), we obtain the result sequence of successive powers,
\[\begin{equation*} 1^k, 2^k, 3^k, \dots, \end{equation*}\]when performing the following procedure:
The above procedure is repeated if \(k>1\) and stops if \(k=1\). We call the procedure above Moessner’s sieve and call \(k\) the rank of the sieve.
As we saw in the first post on Moessner’s theorem we can generalize the initial sequence and still obtain the same result sequence,
Theorem 2 (Moessner’s generalized theorem). Given the initial sequence of \(1\) followed by \(0\)s,
\[\begin{equation*} 1, 0, 0, \dots, \end{equation*}\]and a rank \(k + 2\), we obtain the result sequence of successive powers,
\[\begin{equation*} 1^k, 2^k, 3^k, \dots, \end{equation*}\]when applying Moessner’s sieve on the initial sequence.
So, if we let the rank \(k = 4\), an example application of Theorem 2 becomes,
\[\begin{equation*} \begin{array}{*{19}{r}} 1 & 0 & 0 & 0 & 0 & \textbf{0} & 0 & 0 & 0 & 0 & 0 & \textbf{0} & 0 & 0 & 0 & 0 & 0 & \textbf{0} & \dots \\ 1 & 1 & 1 & 1 & \textbf{1} & & 1 & 1 & 1 & 1 & \textbf{1} & & 1 & 1 & 1 & 1 & \textbf{1} & & \dots \\ 1 & 2 & 3 & \textbf{4} & & & 5 & 6 & 7 & \textbf{8} & & & 9 & 10 & 11 & \textbf{12} & & & \dots \\ 1 & 3 & \textbf{6} & & & & 11 & 17 & \textbf{24} & & & & 33 & 43 & \textbf{54} & & & & \dots \\ 1 & \textbf{4} & & & & & 15 & \textbf{32} & & & & & 65 & \textbf{108} & & & & & \dots \\ 1 & & & & & & 16 & & & & & & 81 & & & & & & \dots \end{array} \end{equation*}\]where we still obtain a result sequence of values to the fourth power. If we compare the above example to where we left off in the post on the dual of Moessner’s sieve,
\[\begin{equation*} \begin{array}{*{8}{r}} & & 0 & 0 & 0 & 0 & 0 & 0 \\\\ 1 & & 1 & 1 & 1 & 1 & 1 & \\ 0 & & 1 & 2 & 3 & 4 & & \\ 0 & & 1 & 3 & 6 & & & \\ 0 & & 1 & 4 & & & & \\ 0 & & 1 & & & & & \\ 0 & & & & & & & \end{array} \begin{array}{*{8}{r}} & & 0 & 0 & 0 & 0 & 0 & 0 \\\\ 1 & & 1 & 1 & 1 & 1 & 1 & \\ 4 & & 5 & 6 & 7 & 8 & & \\ 6 & & 11 & 17 & 24 & & & \\ 4 & & 15 & 32 & & & & \\ 1 & & 16 & & & & & \\ 0 & & & & & & & \end{array} \begin{array}{*{8}{r}} & & 0 & 0 & 0 & 0 & 0 & 0 \\\\ 1 & & 1 & 1 & 1 & 1 & 1 & \\ 8 & & 9 & 10 & 11 & 12 & & \\ 24 & & 33 & 43 & 54 & & & \\ 32 & & 65 & 108 & & & & \\ 16 & & 81 & & & & & \\ 0 & & & & & & & \end{array} \end{equation*}\]where we introduced the concept of an initial configuration of a sieve along with the concept of seed tuples, we notice that instead of an initial sequence of \(1\) followed by \(0\)s, we have an initial configuration of two seed tuples, where the horizontal seed tuple is filled with \(0\)s and the vertical seed tuple is filled with a \(1\) followed by \(0\)s. Likewise, instead of a result sequence of values to the fourth power, we obtain a sequence of Moessner triangles, whose bottom-most elements enumerate the same result sequence of values to the fourth power. If we apply these differences to the statement of Theorem 2, we obtain Moessner’s idealized theorem,
Theorem 3 (Moessner’s idealized theorem). Given an initial configuration of two seed tuples of length \(k + 2\),
\[\begin{equation*} (0,0,0,\dots,0) \text{ and } (1,0,0,\dots,0), \end{equation*}\]we obtain the sequence of Moessner triangles of rank \(k\), where the bottom-most elements enumerate the sequence,
\[\begin{equation*} 1^k, 2^k, 3^k, \dots, \end{equation*}\]when applying the dual of Moessner’s sieve on the initial configuration.
Having gone from Moessner’s original theorem to Moessner’s idealized theorem, which reflects the structure of the dual sieve, we proceed to apply similar transformations to Long’s theorem.
In order to state Long’s idealized theorem, we follow the same process as established in the previous section, and repeat the traditional definition of Long’s theorem and afterwards adapt it to the dual sieve.
Theorem 4 (Long’s theorem). Given an initial sequence, which can be described as an arithmetic progression,
\[\begin{equation*} c, c + d, c + 2d, c + 3d, \dots, \end{equation*}\]we obtain the result sequence,
\[\begin{equation*} c \cdot 1^{k - 1}, (c + d) \cdot 2^{k - 1}, (c + 2d) \cdot 3^{k - 1}, \dots, \end{equation*}\]when applying Moessner’s sieve of rank \(k\) on the initial sequence.
We can visualize Long’s theorem as the sieve,
\[\begin{equation*} \begin{array}{*{11}{r}} c & c+d & c+2d & c+3d & & c+4d & c+5d & c+6d & c+7d & & \dots \\ c & 2c+d & 3c+3d & & & 4c+7d & 5c+12d & 6c+18d & & & \dots \\ c & 3c+d & & & & 7c+8d & 12c+20d & & & & \dots \\ c & & & & & 8c+8d & & & & & \dots \end{array} \end{equation*}\]for which Long^{1} also noted that we can generalize the sieve by adding a row of \(d\)s,
\[\begin{equation*} \begin{array}{*{11}{r}} d & d & d & d & d & d & d & d & d & d & \\ c & c+d & c+2d & c+3d & & c+4d & c+5d & c+6d & c+7d & & \\ c & 2c+d & 3c+3d & & & 4c+7d & 5c+12d & 6c+18d & & & \\ c & 3c+d & & & & 7c+8d & 12c+20d & & & & \\ c & & & & & 8c+8d & & & & & \end{array} \end{equation*}\]However, this unfortunately gives us an inconsistent initial column, since it contains a \(d\) at the top but \(c\)s in the remaining entries. So, we take the liberty of adjusting the initial configuration of the sieve to better suit our dual sieve, by ridding ourselves of the above inconsistency. Thus, we move the \(c\)s of the initial column into the vertical seed tuple, and at the same time generalize to a single seed value \(c\), while putting the \(d\)s into the horizontal seed tuples, yielding the following sieve,^{2}
\[\begin{equation*} \begin{array}{r r : *{5}{r:} r *{4}{r:} r} & & d & d & d & d & d & & d & d & d & d & d \\ & & & & & & & & & & & & \\ c & & c+d & c+2d & c+3d & c+4d & & & c+5d & c+6d & c+7d & c+8d & \\ 0 & & c+d & 2c+3d & 3c+6d & & & & 4c+11d & 5c+17d & 6c+24d & & \\ 0 & & c+d & 3c+4d & & & & & 7c+15d & 12c+32d & & & \\ 0 & & c+d & & & & & & 8c+16d & & & & \\ 0 & & & & & & & & & & & & \end{array} \end{equation*}\]While moving the \(c\)s has changed the coefficients of the \(d\)s in the sieve, we now have a more consistent initial configuration, which we believe to be in the spirit of Long’s original theorem, with one constant, \(c\), in the vertical seed tuple and the horizontal seed tuples filled with the constant \(d\). As it turns out, we can perform a further generalization of the initial configuration by replacing the sequence of \(d\)s with a \(d\) followed by \(0\)s, while putting it in the vertical seed tuple, as we did with the sequence of \(1\)s when we defined the dual of Moessner’s sieve,
\[\begin{equation*} \begin{array}{r r : *{6}{r:} r *{5}{r:} r} & & 0 & 0 & 0 & 0 & 0 & 0 & & 0 & 0 & 0 & 0 & 0 & 0 \\ & & & & & & & & & & & & & & \\ d & & d & d & d & d & d & & & d & d & d & d & d & \\ c & & c+d & c+2d & c+3d & c+4d & & & & c+5d & c+6d & c+7d & c+8d & & \\ 0 & & c+d & 2c+3d & 3c+6d & & & & & 4c+11d & 5c+17d & 6c+24d & & & \\ 0 & & c+d & 3c+4d & & & & & & 7c+15d & 12c+32d & & & & \\ 0 & & c+d & & & & & & & 8c+16d & & & & & \\ 0 & & & & & & & & & & & & & & \end{array} \end{equation*}\]This results in a minimal initial configuration consisting of a vertical seed tuple containing a constant \(d\) and a constant \(c\) followed by \(0\)s. We can now state Long’s idealized theorem as follows,
Theorem 5 (Long’s idealized theorem). Given an initial configuration of two seed tuples of length \(k + 2\),
\[\begin{equation*} (0,0,0,\dots,0) \text{ and } (d,c,0,\dots,0), \end{equation*}\]we obtain the sequence of Moessner triangles of rank \(k\), where the bottom-most elements enumerate the sequence,
\[\begin{equation}\label{eq:long-result-stream} d \cdot {(1 + t)}^{k} + c \cdot {(1 + t)}^{k-1}, \end{equation}\]for values of \(t \ge 0\), when applying the dual of Moessner’s sieve on the initial configuration.
As a result of the transformations made above, we now notice that the coefficients of the \(c\)s correspond to the values of Moessner triangles at rank \(k\) while the coefficients of the \(d\)s now correspond to the values of Moessner triangles at rank \(k - 1\). This observation suggests that we can view the above sieve as the composition of two sieves, one creating Moessner triangles of rank \(3\) filled with \(c\)s,
\[\begin{equation*} \begin{array}{*{14}{r}} && 0 & 0 & 0 & 0 & 0 & && 0 & 0 & 0 & 0 & 0 \\\\ c && c & c & c & c & & c && c & c & c & c & \\ 0 && c & 2c & 3c & & & 3c && 4c & 5c & 6c & & \\ 0 && c & 3c & & & & 3c && 7c & 12c & & & \\ 0 && c & & & & & c && 8c & & & & \\ 0 && & & & & & 0 && & & & & \end{array} \end{equation*}\]and one creating Moessner triangles of rank \(4\) filled with \(d\)s,
\[\begin{equation*} \begin{array}{*{16}{r}} & & 0 & 0 & 0 & 0 & 0 & 0 & && 0 & 0 & 0 & 0 & 0 & 0 \\\\ d & & d & d & d & d & d & & d && d & d & d & d & d & \\ 0 & & d & 2d & 1d & 4d & & & 4d && 5d & 6d & 7d & 8d & & \\ 0 & & d & 3d & 6d & & & & 6d && 11d & 17d & 24d & & & \\ 0 & & d & 4d & & & & & 4d && 15d & 32d & & & & \\ 0 & & d & & & & & & d && 16d & & & & & \\ 0 & & & & & & & & 0 && & & & & & \end{array} \end{equation*}\]This suggests that there is an additional step between Moessner’s idealized theorem and Long’s idealized theorem, where we generalize the seed value of \(1\) in Moessner’s idealized theorem to a constant, \(c\), and obtain Long’s weak theorem,
Theorem 6 (Long’s weak theorem). Given an initial configuration of two seed tuples of length \(k + 2\),
\[\begin{equation*} (0,0,0,\dots,0) \text{ and } (c,0,0,\dots,0), \end{equation*}\]we obtain the sequence of Moessner triangles of rank \(k\), where the bottom-most elements enumerate the sequence,
\[\begin{equation*} c \cdot {(1 + t)}^{k}, \end{equation*}\]for values of \(t \ge 0\), when applying the dual of Moessner’s sieve on the initial configuration.
Having defined Long’s idealized theorem and Long’s weak theorem, we try to look beyond Long’s theorem in the next section.
Since Long’s idealized theorem describes the result sequence generated by Moessner’s sieve, when starting from a seed tuple of two constants, \(c\) and \(d\),
\[\begin{equation*} \begin{array}{r r : *{6}{r:} r *{5}{r:} r } & & 0 & 0 & 0 & 0 & 0 & 0 & & 0 & 0 & 0 & 0 & 0 & 0 \\ & & & & & & & & & & & & & & \\ d & & d & d & d & d & d & & & d & d & d & d & d & \\ c & & c+d & c+2d & c+3d & c+4d & & & & c+5d & c+6d & c+7d & c+8d & & \\ 0 & & c+d & 2c+3d & 3c+6d & & & & & 4c+11d & 5c+17d & 6c+24d & & & \\ 0 & & c+d & 3c+4d & & & & & & 7c+15d & 12c+32d & & & & \\ 0 & & c+d & & & & & & & 8c+16d & & & & & \\ 0 & & & & & & & & & & & & & & \end{array} \end{equation*}\]we now ask the obvious question of what happens if we start from a seed tuple of \(3\) or even \(n\) values? Looking at the result sequence of the above sieve, we know that it enumerates the values of the binomial, \(c \cdot {(1+t)}^3 + d \cdot {(1+t)}^4\), which gives us the idea to label \(c = a_3\) and \(d = a_4\), and fill the rest of the seed tuple with \(a_i\),
\[\begin{equation*} \begin{array}{r : r *{5}{r:} r} & & 0 & 0 & 0 & 0 & 0 & 0 \\\\ a_4 & & a_4 & a_4 & a_4 & a_4 & a_4 & \\ a_3 & & a_3+a_4 & a_3+2a_4 & a_3+3a_4 & a_3+4a_4 & & \\ a_2 & & a_2+a_3+a_4 & a_2+2a_3+3a_4 & a_2+3a_3+6a_4 & & & \\ a_1 & & a_1+a_2+a_3+a_4 & a_1+2a_2+3a_3+4a_4 & & & & \\ a_0 & & a_0+a_1+a_2+a_3+a_4 & & & & & \\ 0 & & & & & & & \end{array} \end{equation*}\]yielding the above Moessner triangle. Now, if we examine the entries of the hypotenuse in this triangle,
\[\begin{equation*} \begin{array}{*{9}{r}} & & & & & & & & a_4\\ & & & & & & a_3 & + & 4a_4\\ & & & & a_2 & + & 3a_3 & + & 6a_4\\ & & a_1 & + & 2a_2 & + & 3a_3 & + & 4a_4\\ a_0 & + & a_1 & + & a_2 & + & a_3 & + & a_4 \end{array} \end{equation*}\]we notice that we can rearrange them into the following Pascal-like triangle,
\[\begin{equation*} \begin{array}{*{9}{c}} & & & & a_0 & & & & \\ & & & a_1 & & a_1 & & & \\ & & a_2 & & 2a_2 & & a_2 & & \\ & a_3 & & 3a_3 & & 3a_3 & & a_3 & \\ a_4 & & 4a_4 & & 6a_4 & & 4a_4 & & a_4 \end{array} \end{equation*}\]where the sum of the entries yields the following result,
\[\begin{equation*} a_0 + 2a_1 + 4a_2 + 8a_3 + 16a_4, \end{equation*}\]located at the bottom of the first column of the second triangle of the sieve, which we can restate as,
\[\begin{equation*} 2^0 \cdot a_0 + 2^1 \cdot a_1 + 2^2 \cdot a_2 + 2^3 \cdot a_3 + 2^4 \cdot a_4. \end{equation*}\]Likewise, if we calculated the next triangle and the subsequent first column, we would obtain the values,
\[\begin{equation*} a_0 + 3a_1 + 9a_2 + 27a_3 + 81a_4, \end{equation*}\]which we can once again restate as,
\[\begin{equation*} 3^0 \cdot a_0 + 3^1 \cdot a_1 + 3^2 \cdot a_2 + 3^3 \cdot a_3 + 3^4 \cdot a_4. \end{equation*}\]This observation suggests that the application of Moessner’s sieve on an initial configuration with the vertical seed tuple,
\[\begin{equation*} a_4, a_3, a_2, a_1, a_0, \end{equation*}\]yields a sequence of Moessner triangles whose bottom-most elements enumerate the values of the polynomial,
\[\begin{equation*} p(t) = \sum_{i=0}^4 a_i \cdot {(1 + t)}^i, \end{equation*}\]where \(t\) is the triangle index. Thus, we conjecture that applying the dual sieve on an initial configuration where the vertical seed tuple consists of the constants,
\[\begin{equation*} a_n, a_{n-1}, \dots, a_1, a_0, \end{equation*}\]yields a sequence of Moessner triangles where the bottom-most elements, comprising the result sequence, enumerate the values of the polynomial,
\[\begin{equation*} p(t) = \sum_{i=0}^n a_i \cdot {(1 + t)}^i. \end{equation*}\]The reason the above statement is a conjecture and not a theorem is that the previous theorems have all been formalized and proved in my Master’s thesis, while this last statement has not. However, there are strong indications of its correctness, as we can decompose any seed tuple into a sum of two tuples, and have proved that the conjecture holds for the binomial \(a_{i+1} \cdot (1 + t)^{i+1} + a_i \cdot (1 + t)^i\), as stated by Long’s idealized theorem.
In this blog post, we have introduced idealized versions of Moessner’s theorem and Long’s theorem – along with Long’s weak theorem – stated in terms of the dual of Moessner’s sieve. Furthermore, we have conjectured a new generalization of Long’s theorem that connects it to polynomial evaluation.
This post is the final excerpt from my Master’s thesis, in which I formalize and prove all the statements mentioned in this and the previous posts in the Coq proof assistant.
“The trick, William Potter,
is not minding that it hurts.”
– Robert Bolt, Lawrence of Arabia (1962)
The goal of this blog post is to introduce a new combinatorial property which connects Moessner triangles of different rank but with the same triangle index, thus acting as a dual to the existing connection between Moessner triangles of the same rank but different triangle index. This duality proposes the view of Moessner’s sieve as generating a 2-dimensional grid of triangles instead of just a 1-dimensional sequence of triangles.
The post is structured as follows. In Section 2, we motivate the idea of viewing Moessner’s sieve as generating a grid of triangles, and introduce a rank upgrading procedure, which takes a seed tuple of a Moessner triangle of rank \(r\) and returns the seed tuple of the same Moessner triangle of rank \(r + 1\). As a dual to the first section, we introduce a set of rank decomposition rules in Section 3, which allows us to describe any entry of a Moessner triangle of rank \(r + 1\) as a sum of entries in the same Moessner triangle of rank \(r\). The post is concluded in Section 4.
In this section, we propose the idea of viewing the output of Moessner’s sieve
as a grid of triangles by first observing a connection between the seed tuples
of the \(t\)th Moessner triangle of different rank, \(r\) and \(r + 1\). Using
this observation, we introduce a rank upgrading procedure, upgradeSeedTuple
,
which takes a seed tuple of a Moessner triangle of rank \(r\) and returns the
seed tuple of the same Moessner triangle of rank \(r + 1\). Finally, we
demonstrate its application.
In order to motivate the idea of Moessner’s sieve generating a grid of triangles, we start by examining the first three Moessner triangles of rank \(3\) and \(4\), along with their respective seed tuples,
\[\begin{equation*} \begin{array}{*{8}{r}} & & 0 & 0 & 0 & 0 & 0 & \\\\ 1 & & 1 & 1 & 1 & 1 & & \\ 0 & & 1 & 2 & 3 & & & \\ 0 & & 1 & 3 & & & & \\ 0 & & 1 & & & & & \\ 0 & & & & & & & \\ \\\\ & & 0 & 0 & 0 & 0 & 0 & 0 \\\\ 1 & & 1 & 1 & 1 & 1 & 1 & \\ 0 & & 1 & 2 & 3 & 4 & & \\ 0 & & 1 & 3 & 6 & & & \\ 0 & & 1 & 4 & & & & \\ 0 & & 1 & & & & & \\ 0 & & & & & & & \end{array} \quad \begin{array}{*{8}{r}} & & 0 & 0 & 0 & 0 & 0 & \\\\ 1 & & 1 & 1 & 1 & 1 & & \\ 3 & & 4 & 5 & 6 & & & \\ 3 & & 7 & 12 & & & & \\ 1 & & 8 & & & & & \\ 0 & & & & & & & \\ \\\\ & & 0 & 0 & 0 & 0 & 0 & 0 \\\\ 1 & & 1 & 1 & 1 & 1 & 1 & \\ 4 & & 5 & 6 & 7 & 8 & & \\ 6 & & 11 & 17 & 24 & & & \\ 4 & & 15 & 32 & & & & \\ 1 & & 16 & & & & & \\ 0 & & & & & & & \end{array} \quad \begin{array}{*{8}{r}} & & 0 & 0 & 0 & 0 & 0 & \\\\ 1 & & 1 & 1 & 1 & 1 & & \\ 6 & & 7 & 8 & 9 & & & \\ 12 & & 19 & 27 & & & & \\ 8 & & 27 & & & & & \\ 0 & & & & & & & \\ \\\\ & & 0 & 0 & 0 & 0 & 0 & 0 \\\\ 1 & & 1 & 1 & 1 & 1 & 1 & \\ 8 & & 9 & 10 & 11 & 12 & & \\ 24 & & 33 & 43 & 54 & & & \\ 32 & & 65 & 108 & & & & \\ 16 & & 81 & & & & & \\ 0 & & & & & & & \end{array} \end{equation*}\]Now, for both sieves we know that we can move from left to right, i.e., increase the index of the triangles, but we do not know if we can move from top to bottom, i.e., increase the rank of the triangles. However, if we remember that we can characterize each seed tuple as an instance of the binomial expansion \({(1 + t)}^r\), where \(r\) is the rank of the Moessner triangle and \(t\) is the triangle index, we search for a combinatorial property that allows us to go from the seed tuple corresponding to the binomial expansion where \(r = r'\) to the seed tuple corresponding to the binomial expansion where \(r = r' + 1\), thus obtaining the needed vertical movement in the grid of triangles.
If we examine the two seed tuples generated by the first Moessner triangles, \((1, 3, 3, 1)\) and \((1 ,4, 6, 4, 1)\), we observe that we can obtain the second seed tuple from the first using the following scheme,
\[\begin{align} \tag{1}\label{eq:substitute-moessner-triangle-one} 1 &= 1 + 0\\ 4 &= 3 + 1\\ 6 &= 3 + 3\\ 4 &= 1 + 3\\ 1 &= 0 + 1, \end{align}\]where we obtain the \((i + 1)\)th element of rank \(r + 1\) by adding the \((i + 1)\)th element of rank \(r\) plus the value of an accumulator which contains the value of the \(i\)th element of rank \(r\) – coincidentally this calculation is also equivalent to an application of Pascal’s rule in Pascal’s triangle for these values. However, when we examine the next pair of seed tuples, \((1, 6, 12, 8)\) and \((1, 8, 24, 32, 16)\), we realize that the above scheme is insufficient for calculating the second tuple from the first. Fortunately, we receive a hint from the fact that the last elements of the two tuples are equal to \(2^3\) and \(2^4\), respectively, which means that we can obtain the latter by multiplying the former by \(2\). With this in mind, we change the scheme accordingly and get,
\[\begin{align} \tag{2}\label{eq:substitute-moessner-triangle-two} 16 &= 2 \cdot 8 + 0\\ 32 &= 2 \cdot 12 + 8\\ 24 &= 2 \cdot 6 + 12\\ 8 &= 2 \cdot 1 + 6\\ 1 &= 2 \cdot 0 + 1, \end{align}\]which now yields the desired result. It turns out that this Pascal-like property, of adding the two nearest entries of the seed tuple of rank \(r'\), holds in general if we substitute the \(2\) with \((1 + t)\). For example, if we look at the hypotenuses of the third pair of triangles, where \(t = 2\), we get the following calculations,
\[\begin{align} \tag{3}\label{eq:substitute-moessner-triangle-three} 81 &= (1 + 2) \cdot 27 + 0\\ 108 &= (1 + 2) \cdot 27 + 27\\ 54 &= (1 + 2) \cdot 9 + 27\\ 12 &= (1 + 2) \cdot 1 + 9\\ 1 &= (1 + 2) \cdot 0 + 1, \end{align}\]which confirm the correctness of the formula – this property can also be seen from the multiplicative property, \({(1 + t)}^{r + 1} = (1 + t) \cdot {(1 + t)}^r\), of the binomial expansion. Thus, we have now demonstrated how to obtain the seed tuple of rank \(r + 1\), when given the seed tuple of rank \(r\), which means that we can now move in a vertical direction as well as a horizontal direction in the grid of triangles shown at the beginning of this section.
Having covered the motivation for perceiving Moessner’s sieve as generating a grid of triangles, rather than a sequence of triangles, we move on to construct a rank upgrading procedure, which given a seed tuple of rank \(r\) returns the corresponding seed tuple of rank \(r + 1\), thus implementing the vertical direction discussed above.
When taking the description of the rank upgrading procedure in the previous section and translating it into Haskell, we initially note that the procedure should take a seed tuple, xs, an accumulator, a, and a triangle index, t, as inputs. Furthermore, we want to pattern match on the structure of the seed tuple, xs, as the procedure works by traversing the tuple and operating on its elements. Lastly, we observe that for the base case of the pattern matching, xs = [], we simply return a list containing just the accumulator, while in the inductive case of the pattern matching, xs = x : xs', we add the accumulator, a, to (t + 1) * x and cons the intermediate result with the result of the recursive call on xs'. Putting these pieces together we get the procedure, for which we also define a wrapper function that initializes the accumulator to 0,
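A possible reconstruction of these two definitions in Haskell follows. Since the original code is not shown in this excerpt, the argument order and the helper name upgradeSeedTupleAcc are assumptions, and seed tuples are represented here in reverse of the order displayed above, so the tuple \((1, 6, 12, 8)\) becomes the list [8, 12, 6, 1]:

```haskell
-- Hypothetical reconstruction of the rank upgrading procedure described
-- above.  Base case: return a list containing just the accumulator.
-- Inductive case: add the accumulator to (t + 1) * x and cons the result
-- onto the recursive call, passing x along as the new accumulator.
upgradeSeedTupleAcc :: Int -> Int -> [Int] -> [Int]
upgradeSeedTupleAcc _ a []        = [a]
upgradeSeedTupleAcc t a (x : xs') =
  ((t + 1) * x + a) : upgradeSeedTupleAcc t x xs'

-- Wrapper that initializes the accumulator to 0.
upgradeSeedTuple :: Int -> [Int] -> [Int]
upgradeSeedTuple t = upgradeSeedTupleAcc t 0
```

With this sketch, upgradeSeedTuple 0 [1, 3, 3, 1] gives [1, 4, 6, 4, 1], and upgradeSeedTuple 1 [8, 12, 6, 1] gives [16, 32, 24, 8, 1], matching the calculations in Formulas (1) and (2) above.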
such that the three examples in Formulas \ref{eq:substitute-moessner-triangle-one}-\ref{eq:substitute-moessner-triangle-three} can be expressed as the propositions,
Having defined upgradeSeedTuple and demonstrated its use, we take a step back and investigate the dual of this section. Specifically, our next step is to show how to decompose the entries of the \(t\)th Moessner triangle of rank \(r + 1\) in terms of the same Moessner triangle of rank \(r\).
In this section, we take the dual approach of the previous section by first motivating the introduction of a series of rank decomposition rules, which allows us to describe the entries of a Moessner triangle of rank \(r + 1\) in terms of the same Moessner triangle of rank \(r\).
Starting from the same example as in the previous section, we examine the first three Moessner triangles of rank \(3\) and \(4\),
\[\begin{equation*} \begin{array}{*{8}{r}} & & 0 & 0 & 0 & 0 & 0 & \\\\ 1 & & 1 & 1 & 1 & 1 & & \\ 0 & & 1 & 2 & 3 & & & \\ 0 & & 1 & 3 & & & & \\ 0 & & 1 & & & & & \\ 0 & & & & & & & \\ \\\\ & & 0 & 0 & 0 & 0 & 0 & 0 \\\\ 1 & & 1 & 1 & 1 & 1 & 1 & \\ 0 & & 1 & 2 & 3 & 4 & & \\ 0 & & 1 & 3 & 6 & & & \\ 0 & & 1 & 4 & & & & \\ 0 & & 1 & & & & & \\ 0 & & & & & & & \end{array} \quad \begin{array}{*{8}{r}} & & 0 & 0 & 0 & 0 & 0 & \\\\ 1 & & 1 & 1 & 1 & 1 & & \\ 3 & & 4 & 5 & 6 & & & \\ 3 & & 7 & 12 & & & & \\ 1 & & 8 & & & & & \\ 0 & & & & & & & \\ \\\\ & & 0 & 0 & 0 & 0 & 0 & 0 \\\\ 1 & & 1 & 1 & 1 & 1 & 1 & \\ 4 & & 5 & 6 & 7 & 8 & & \\ 6 & & 11 & 17 & 24 & & & \\ 4 & & 15 & 32 & & & & \\ 1 & & 16 & & & & & \\ 0 & & & & & & & \end{array} \quad \begin{array}{*{8}{r}} & & 0 & 0 & 0 & 0 & 0 & \\\\ 1 & & 1 & 1 & 1 & 1 & & \\ 6 & & 7 & 8 & 9 & & & \\ 12 & & 19 & 27 & & & & \\ 8 & & 27 & & & & & \\ 0 & & & & & & & \\ \\\\ & & 0 & 0 & 0 & 0 & 0 & 0 \\\\ 1 & & 1 & 1 & 1 & 1 & 1 & \\ 8 & & 9 & 10 & 11 & 12 & & \\ 24 & & 33 & 43 & 54 & & & \\ 32 & & 65 & 108 & & & & \\ 16 & & 81 & & & & & \\ 0 & & & & & & & \end{array} \end{equation*}\]and use the knowledge we have gathered so far to drive our motivation. Instead of looking at the calculations in Formula \ref{eq:substitute-moessner-triangle-two} and \ref{eq:substitute-moessner-triangle-three} as the upgrading of a seed tuple, we flip the perspective and see it as an example of decomposing the hypotenuse in terms of the Moessner triangle of lower rank. Taking this idea one step further, we propose the idea that there exists a set of rank decomposition rules which work for all entries of a triangle and not just the hypotenuse/seed tuple. With this idea in mind, we focus on the first column of the second and third pair of Moessner triangles and try to apply the same scheme as before, except that we make two minor adjustments,
which gives us the following calculations, for the second and third triangles,
\[\begin{equation*} \begin{aligned} 16 &= 1 \cdot 8 + 8\\ 15 &= 1 \cdot 7 + 8\\ 11 &= 1 \cdot 4 + 7\\ 5 &= 1 \cdot 1 + 4\\ 1 &= 1 \cdot 0 + 1, \end{aligned} \qquad\text{and}\qquad \begin{aligned} 81 &= 2 \cdot 27 + 27\\ 65 &= 2 \cdot 19 + 27\\ 33 &= 2 \cdot 7 + 19\\ 9 &= 2 \cdot 1 + 7\\ 1 &= 2 \cdot 0 + 1, \end{aligned} \end{equation*}\]demonstrating that the property also holds for the initial column of every Moessner triangle. Remembering that the different Moessner triangles are constructed using Pascal’s rule, we restate the calculations in the formula above as,
\[\begin{equation*} \begin{aligned} 16 &= 2 \cdot 8 + 0\\ 15 &= 2 \cdot 7 + 1\\ 11 &= 2 \cdot 4 + 3\\ 5 &= 2 \cdot 1 + 3\\ 1 &= 2 \cdot 0 + 1, \end{aligned} \qquad\text{and}\qquad \begin{aligned} 81 &= 3 \cdot 27 + 0\\ 65 &= 3 \cdot 19 + 8\\ 33 &= 3 \cdot 7 + 12\\ 9 &= 3 \cdot 1 + 6\\ 1 &= 3 \cdot 0 + 1, \end{aligned} \end{equation*}\]by realizing that each of the values used for accumulators is actually the sum of one of the values in the seed tuple (western neighbor) and the entry which we have already multiplied by \(t\) (northern neighbor),
\[\begin{equation*} \begin{aligned} 16 &= 1 \cdot 8 + (8 + 0)\\ 15 &= 1 \cdot 7 + (7 + 1)\\ 11 &= 1 \cdot 4 + (4 + 3)\\ 5 &= 1 \cdot 1 + (1 + 3)\\ 1 &= 1 \cdot 0 + (0 + 1), \end{aligned} \qquad\text{and}\qquad \begin{aligned} 81 &= 2 \cdot 27 + (27 + 0)\\ 65 &= 2 \cdot 19 + (19 + 8)\\ 33 &= 2 \cdot 7 + (7 + 12)\\ 9 &= 2 \cdot 1 + (1 + 6)\\ 1 &= 2 \cdot 0 + (0 + 1). \end{aligned} \end{equation*}\]Thus, we get \((1 + t)\) times the entry above the desired entry (northern neighbor) and a value of the seed tuple/hypotenuse of the previous triangle (western neighbor).
Noting that we now have a Pascal-like rule which works across ranks, we examine whether it also holds true for the subsequent columns of the Moessner triangles. As such, we try to calculate the second column of the second and third pair of triangles using the first columns for accumulator values, instead of the seed tuples,
\[\begin{equation} \begin{aligned} 32 &= 2 \cdot 12 + 8\\ 17 &= 2 \cdot 5 + 7\\ 6 &= 2 \cdot 1 + 4\\ 1 &= 2 \cdot 0 + 1, \end{aligned} \qquad\text{and}\qquad \begin{aligned} 108 &= 3 \cdot 27 + 27\\ 43 &= 3 \cdot 8 + 19\\ 10 &= 3 \cdot 1 + 7\\ 1 &= 3 \cdot 0 + 1. \end{aligned} \end{equation}\]Again, we obtain the desired results, which demonstrates a consistent Pascal-like property across ranks and triangles. Thus, we have now shown how it is possible to state an entry of a Moessner triangle of rank \(r + 1\) as a sum of entries in the same Moessner triangle of rank \(r\).
Next, we transform our motivating examples into concrete rank decomposition rules.
A subtle point lies in the fact that while the Moessner triangles have a finite number of entries in each column, this is not the case for our characteristic function rotatedMoessnerEntry, as the gray values above are the results of computing entries outside of the Moessner triangles using our characteristic function. Thus, we obtain a more general, and easier to state, set of rank decomposition rules by stating them in terms of our characteristic function, rotatedMoessnerEntry, rather than directly on the triangle creation procedure, createTriangleVertically.
In the previous section, we demonstrated two Pascal-like properties that could be merged into one simpler property, expressing an entry of a Moessner triangle of rank \(r + 1\) in terms of the same entry in the triangle of rank \(r\) along with the entry above it (northern neighbor), which works for all columns of a Moessner triangle. We can define this decomposition rule in the following way,
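Writing \(M_n(t, r, c)\) for the entry in the \(r\)th row and \(c\)th column of the \(t\)th Moessner triangle of rank \(n\) – a notational shorthand introduced only for this sketch – the rule can be stated as,

\[\begin{equation*} M_{n + 1}(t, r + 1, c) = t \cdot M_{n}(t, r, c) + M_{n}(t, r + 1, c), \end{equation*}\]

as exemplified by the calculation \(81 = 2 \cdot 27 + 27\) in the third pair of triangles above.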
which states that the entry in the \((r + 1)\)th row and \(c\)th column of a Moessner triangle of rank \(n + 1\) is the sum of \(t\) times the entry at the \(r\)th row and \(c\)th column of rank \(n\) and the entry at the \((r + 1)\)th row and \(c\)th column of rank \(n\). This rule captures the examples we have shown above, and it can be proved by nested induction on the row and column indices, r and c. From this rule follow two Pascal-like rules,
and
which capture the two cases where the column index, c, is either 0 or greater than 0.
Combining the above rules and the procedure of the previous section, we have now shown a new property of Moessner’s sieve that creates a vertical connection between the seed tuples and entries of two Moessner triangles with the same triangle index, \(t\), but different ranks, \(r\) and \(r + 1\), thus acting as a dual to the existing properties which horizontally connect two triangles with different triangle indices, \(t\), but the same rank, \(r\), in this implicit grid of triangles.
In this post, we have introduced a new combinatorial property which connects Moessner triangles of different rank but with the same triangle index, thus acting as a dual to the existing connection between Moessner triangles of the same rank but different triangle index. This duality implies a 2-dimensional grid of Moessner triangles, where the triangle index is increasing as we go along the horizontal axis, from left to right, while the rank is increasing when going along the vertical axis, from top to bottom. These grid properties have been introduced as a rank upgrading procedure, which takes a seed tuple of the \(t\)th Moessner triangle of rank \(r\) and returns the seed tuple of the \(t\)th Moessner triangle of rank \(r + 1\), and several rank decomposition rules, which describe an entry of the \(t\)th Moessner triangle of rank \(r + 1\) as a sum of entries in the \(t\)th Moessner triangle of rank \(r\).
The rank upgrading procedure, upgradeSeedTuple, was the result of the observation that we could obtain the seed tuple of the Moessner triangle of rank \(r + 1\) by adding pairs of entries in the seed tuple of the Moessner triangle of rank \(r\), where one was multiplied by the triangle index.
Conversely, the rank decomposition rules were the result of exploring whether the decomposition rule only applied for the seed tuples or if it persisted into the entries of the Moessner triangles.
This post was a small excerpt from my Master’s thesis, in which I also prove the correctness of the decomposition rules stated above, and relate them to the actual triangle creation procedures of the dual sieve.
The goal of this post is to derive Moessner’s sieve from Horner’s method for polynomial division, thus concluding this three-part series on Horner’s method.
The post is structured as follows. In Section 2, we introduce and formalize Horner blocks in the context of Taylor polynomials. Having defined Horner blocks, we transform them into Moessner triangles in Section 3 and in the process obtain an alternative formalization of Moessner’s sieve. In Section 4, we state an equivalence relation between Horner’s method and Moessner’s sieve. The post is concluded in Section 5.
In this section, we introduce and formalize the concept of Horner blocks and show how they relate to Taylor polynomials.
We start this section by picking up from where we left off in the previous blog post and represent the repeated application of Horner’s method for polynomial division in a tabular format. Given the polynomial \(p(x) = 2x^3 + 4x^2 + 11x + 3\), we can restate its repeated division with the binomial \(d(x) = x - 2\) (captured in Formulas 7-9 in the previous blog post) by stacking the calculations on top of each other,
\[\begin{equation} \tag{1}\label{eq:horner-block-example} \begin{array}{ c | c c c c } 2 & 2 & 4 & 11 & 3 \\ & & 4 & 16 & 54 \\ & 2 & 8 & 27 & \textbf{57}\\ \\ & & 4 & 24 \\ & 2 & 12 & \textbf{51}\\ \\ & & 4 \\ & 2 & \textbf{16} \\ \\ & \textbf{2}. & \end{array} \end{equation}\]In this way, the three calculations are merged into a triangular array, such that the hypotenuse of the triangle, highlighted in boldface, enumerates the coefficients of the resulting Taylor polynomial,
\[\begin{equation*} P_{3,2}(x) = 2 {(x - 2)}^3 + 16 {(x - 2)}^2 + 51 (x - 2) + 57. \end{equation*}\]We call this construction a Horner block and formalize it by first defining a
Block to be a List of Polynomials, which we then use in the definition of the following procedure, which performs the repeated application of Horner’s method for polynomial division, hornersPolyDiv, while removing the last entry of each intermediate result – which is what init does. As in the previous posts, cs corresponds to the coefficients of a polynomial \(p\), while x corresponds to the point \(k\), and the final argument n specifies the number of divisions to be made. However, we note that there exists an extra base case in Formula \ref{eq:horner-block-example}, as no value is dropped from the initial Polynomial in the Horner block. Hence, we define a wrapper function, which performs a single division without removing the last entry, followed by a call to createHornerBlockAcc. Thus, we can obtain the Taylor polynomial of a polynomial \(p\) at a point \(k\) by reading the hypotenuse of the Block returned by createHornerBlock when given a list of \(p\)’s coefficients and a value of \(k\).
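Since the concrete definitions are not shown in this excerpt, the following is a hypothetical reconstruction of the procedure just described; in particular, the body of hornersPolyDiv (here a scanl1 over the coefficients) and the exact argument order are assumptions:

```haskell
type Polynomial = [Int]
type Block = [Polynomial]

-- Hypothetical reconstruction: synthetic division of cs by (x - k);
-- the last entry of the result is the remainder.
hornersPolyDiv :: Polynomial -> Int -> Polynomial
hornersPolyDiv cs k = scanl1 (\a c -> a * k + c) cs

-- Repeatedly divide, dropping the last entry (init) before each division.
createHornerBlockAcc :: Polynomial -> Int -> Int -> Block
createHornerBlockAcc _  _ 0 = []
createHornerBlockAcc cs k n =
  let cs' = hornersPolyDiv (init cs) k
  in cs' : createHornerBlockAcc cs' k (n - 1)

-- Wrapper: one full division first, keeping the remainder.
createHornerBlock :: Polynomial -> Int -> Int -> Block
createHornerBlock cs k n =
  let row = hornersPolyDiv cs k
  in row : createHornerBlockAcc row k n
```

Under these assumptions, createHornerBlock [2, 4, 11, 3] 2 3 reproduces the rows of Formula \ref{eq:horner-block-example}, [[2,8,27,57],[2,12,51],[2,16],[2]], whose last elements, 57, 51, 16 and 2, are the coefficients of \(P_{3,2}\).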
In this section, we transform Horner blocks into Moessner triangles and in the process derive an alternative formalization of Moessner’s sieve.
If we let \(p(x) = x^3\) and want to obtain the Taylor polynomial \(P_{3,3}\), we can do so in two ways:
From the above calculations, we first observe that the two result hypotenuses – highlighted in boldface – are identical. Secondly, we observe that the first remainder calculated in each of the three triangles in Formula \ref{eq:horner-x-3-three-divs}, \((1, 8, 27)\), are equal to the powers of \(3\), \((1^3, 2^3, 3^3)\), and thus equal to the values of \(p(1)\), \(p(2)\) and \(p(3)\), which demonstrates that Horner’s method can be used to enumerate the values of \(p\) for the set of positive natural numbers. Lastly, upon closer examination of the procedure used above, we note that given a polynomial,
\[\begin{equation*} p(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0, \end{equation*}\]the repeated division of \(p\) with \(x - 1\) has the following structure,
\[\begin{equation*} \begin{array}{ c c c c } a_3 & a_2 & a_1 & a_0 \\ & a_3 & a_3 + a_2 & a_3 + a_2 + a_1 \\ a_3 & a_3 + a_2 & a_3 + a_2 + a_1 & \mathbf{a_3 + a_2 + a_1 + a_0} \\ \\ & a_3 & 2a_3 + a_2 & \\ a_3 & 2a_3 + a_2 & \mathbf{3a_3 + 2a_2 + a_1} & \\ \\ & a_3 & & \\ a_3 & \mathbf{3a_3 + a_2} & & \\ \\ \mathbf{a_3} & & & \end{array} \end{equation*}\]where every non-shifted row, starting with the first row,
\[\begin{equation} \tag{3}\label{eq:horner-block-stripped} \begin{array}{ c c c c } a_3 & a_2 & a_1 & a_0 \\ a_3 & a_3 + a_2 & a_3 + a_2 + a_1 & \mathbf{a_3 + a_2 + a_1 + a_0} \\ a_3 & 2a_3 + a_2 & \mathbf{3a_3 + 2a_2 + a_1} & \\ a_3 & \mathbf{3a_3 + a_2} & & \\ \mathbf{a_3} & & & \end{array} \end{equation}\]is the partial sum of the former non-shifted row. If we take the results of Formula \ref{eq:horner-x-3-three-divs} and strip away the left-most column and the top row, containing the value of \(k\) and the exponents, we get the following three Horner blocks,
\[\begin{equation} \tag{4}\label{eq:horner-x-3-three-divs-blocks} \begin{array}{ c c c c } 1 & 0 & 0 & 0 \\ & 1 & 1 & 1 \\ 1 & 1 & 1 & \textbf{1} \\ & 1 & 2 & \\ 1 & 2 & \textbf{3} & \\ & 1 & & \\ 1 & \textbf{3} \\ \textbf{1} & & & \end{array} \qquad \begin{array}{ c c c c } 1 & 3 & 3 & 1 \\ & 1 & 4 & 7 \\ 1 & 4 & 7 & \textbf{8} \\ & 1 & 5 & \\ 1 & 5 & \textbf{12} & \\ & 1 \\ 1 & \textbf{6} & & \\ \textbf{1} \end{array} \qquad \begin{array}{ c c c c } 1 & 6 & 12 & 8 \\ & 1 & 7 & 19 \\ 1 & 7 & 19 & \textbf{27} \\ & 1 & 8 & \\ 1 & 8 & \textbf{27} & \\ & 1 & & \\ 1 & \textbf{9} & & \\ \textbf{1} & & & \end{array} \end{equation}\]Here, we note the regular structure of the blocks where every block is created from the hypotenuse of the previous block. Next, we perform the same transformation on Formula \ref{eq:horner-x-3-three-divs-blocks} as seen in Formula \ref{eq:horner-block-stripped}, where every shifted row is removed in order to expose the partial summation pattern between each intermediate result,
\[\begin{equation} \tag{5}\label{eq:horner-x-3-three-divs-blocks-stripped} \begin{array}{ c c c c } 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & \textbf{1} \\ 1 & 2 & \textbf{3} & \\ 1 & \textbf{3} \\ \textbf{1} & & & \end{array} \qquad \begin{array}{ c c c c } 1 & 3 & 3 & 1 \\ 1 & 4 & 7 & \textbf{8} \\ 1 & 5 & \textbf{12} & \\ 1 & \textbf{6} & & \\ \textbf{1} \end{array} \qquad \begin{array}{ c c c c } 1 & 6 & 12 & 8 \\ 1 & 7 & 19 & \textbf{27} \\ 1 & 8 & \textbf{27} & \\ 1 & \textbf{9} & & \\ \textbf{1} & & & \end{array} \end{equation}\]Lastly, we remove the redundant rows, which appear as both the hypotenuse of one block and the initial row of the subsequent block, e.g., \((1,3,3,1)\) is both the hypotenuse of the first Horner block and also the first row of the second Horner block. Furthermore, we pile the blocks on top of each other,
\[\begin{equation} \tag{6}\label{eq:horner-moessner-x-3} \begin{array}{ r r r r } 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & \textbf{1} \\ 1 & 2 & \textbf{3} & \\ 1 & \textbf{3} \\ \textbf{1} & & & \\ 1 & 4 & 7 & \textbf{8} \\ 1 & 5 & \textbf{12} & \\ 1 & \textbf{6} & & \\ \textbf{1} \\ 1 & 7 & 19 & \textbf{27} \\ 1 & 8 & \textbf{27} & \\ 1 & \textbf{9} & & \\ \textbf{1} & & & \end{array} \end{equation}\]resulting in a rotated mirror image of Moessner’s sieve,
\[\begin{equation*} \begin{array}{*{12}{r}} 1 & 1 & 1 & \textbf{1} & 1 & 1 & 1 & \textbf{1} & 1 & 1 & 1 & \textbf{1}\\ 1 & 2 & \textbf{3} & & 4 & 5 & \textbf{6} & & 7 & 8 & \textbf{9} & \\ 1 & \textbf{3} & & & 7 & \textbf{12} & & & 19 & \textbf{27} & & \\ \textbf{1} & & & & \textbf{8} & & & & \textbf{27} & & & \end{array} \end{equation*}\]where the right-most column enumerates the successive powers of \(x^3\), which is the statement of Moessner’s theorem for \(n = 3\).
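For readers who want to experiment, Moessner’s sieve itself can be sketched as a stream transformation in a few lines of Haskell. This is a minimal, assumed formulation (not the createTrianglesVertically formalization used in these posts): starting from the sequence of \(1\)s, repeatedly drop every \(k\)th element and take partial sums.

```haskell
-- Minimal sketch of Moessner's sieve: from the stream of 1s, repeatedly
-- drop every k-th element and take partial sums.  For rank 4 this yields
-- the successive cubes, i.e., Moessner's theorem for n = 3.
moessner :: Int -> [Integer]
moessner rank = go rank (repeat 1)
  where
    go 1 xs = xs
    go k xs = go (k - 1) (scanl1 (+) (dropEvery k xs))

    -- drop every k-th element of a stream
    dropEvery :: Int -> [Integer] -> [Integer]
    dropEvery k = map snd . filter (\(i, _) -> i `mod` k /= 0) . zip [1 ..]
```

With this sketch, take 4 (moessner 4) yields [1, 8, 27, 64], the cubes enumerated by the right-most column above.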
Now, if we examine Formula \ref{eq:horner-moessner-x-3} from the perspective of our dual sieve, we observe that the rows of the Horner-based sieve do indeed enumerate the columns of the traditional sieve, just like our triangle creation procedure, createTriangleVertically. Furthermore, we observe that the Horner-based sieve collects the values of the hypotenuse of the previous block in order to create the next, as made explicit in Formula \ref{eq:horner-moessner-x-3}, just like the dual sieve, createTrianglesVertically. Together, these observations suggest that we can state an equivalence relation between createTriangleVertically and createHornerBlock, where we use the prefix of streams, (take (S r) σ), instead of lists to directly capture the relation between the length of the input tuples/polynomial and the number of divisions to be made. The above statement can be proved by induction on the prefix length, r, thus showing that createTriangleVertically and createHornerBlockAcc have a simple-to-state equivalence relation, which captures the fact that the alternative sieve described above is actually emulating createTrianglesVertically, which accounts for it being the “rotated mirror image” of Moessner’s sieve.
In this post, we have derived Moessner’s sieve from Horner’s method.
In order to derive Moessner’s sieve, we transformed the successive calculations of Taylor polynomials, called Horner blocks, into a rotated mirror image of Moessner’s sieve, which we then showed had an equivalence relation to the dual of Moessner’s sieve.
This post, like the previous two, was a small excerpt from my Master’s thesis, in which I also formalize, in Coq, all the concepts discussed in these three blog posts, and prove the above equivalence between Moessner’s sieve and Horner’s method for polynomial division.^{1}
The observation that Moessner’s sieve can be derived from Horner’s method was first made - but not proved - by Jan van Yzeren in the paper “A Note on An Additive Property of Natural Numbers” (1959). ↩
The goal of this post is to derive Taylor polynomials using Horner’s method for polynomial division.
The post is structured as follows. In Section 2, we introduce the concept of Taylor polynomials and Taylor’s theorem. In Section 3, we derive a procedure for obtaining Taylor polynomials using Horner’s method for polynomial division. The post is concluded in Section 4.
In this section, we first state the polynomial remainder theorem followed by the definitions of Taylor series and Taylor polynomials, which we use to finally state Taylor’s theorem.
As pointed out in the previous blog post, if we divide a polynomial, \(p\), with a binomial, \(x - k\), the remainder of the division is equal to \(p(k)\), which is captured by the polynomial remainder theorem.
Theorem 1 (Polynomial remainder theorem). Given a polynomial,
\[\begin{equation*} p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \end{equation*}\]where \(a_0, \dots, a_n \in \mathbb{N}\), and a binomial,
\[\begin{equation*} d(x) = x - k, \end{equation*}\]where \(k \in \mathbb{N}\), the remainder of dividing \(p\) with \(d\), denoted \(r\), is equal to \(p(k)\). Furthermore, \(d\) divides \(p\) if and only if \(p(k) = 0\).
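As a quick sanity check, \(p(k)\) can be computed with Horner’s rule, and by the theorem this value equals the remainder of dividing \(p\) by \(x - k\). A minimal sketch in Haskell (evalPoly is an assumed name, not from the post; coefficients run from the highest power down):

```haskell
-- Evaluate a polynomial at k using Horner's rule; by the polynomial
-- remainder theorem this value is the remainder of dividing p by (x - k).
-- Coefficients are listed from the highest power down,
-- e.g. 2x^3 + 4x^2 + 11x + 3 is [2, 4, 11, 3].
evalPoly :: [Int] -> Int -> Int
evalPoly cs k = foldl1 (\a c -> a * k + c) cs
```

For example, evalPoly [2, 4, 11, 3] 2 gives 57, the remainder computed later in this post.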
Next, we define Taylor series and Taylor polynomials in order to state Taylor’s theorem.
A Taylor series is the representation of a function as an infinite sum of terms, calculated from the values of the function’s derivatives at a specific point.
Definition 1 (Taylor series). Given a function \(p\) and a natural number \(k\), the Taylor series of \(p\) is,
\[\begin{equation*} \frac{p(k)}{0!} {(x - k)}^0 + \frac{p'(k)}{1!} {(x - k)}^1 + \frac{p''(k)}{2!} {(x - k)}^2 + \frac{p^{(3)}(k)}{3!} {(x - k)}^3 + \cdots, \end{equation*}\]which can be written as,
\[\begin{equation*} \sum_{i=0}^{\infty} \frac{p^{(i)}(k)}{i!} {(x - k)}^i. \end{equation*}\]A Taylor series with a finite number of terms, \(n \in \mathbb{N}\), is called a Taylor polynomial and written,
\[\begin{equation*} \sum_{i=0}^{n} \frac{p^{(i)}(k)}{i!} {(x - k)}^i. \end{equation*}\]Since we are working solely with polynomials, and not other types of functions, we are able to restate any polynomial as a Taylor polynomial, calculating the exact same values. This brings us to the following simplified version of Taylor’s theorem - without the error function - defined over polynomials and natural numbers.
Theorem 2 (Taylor’s theorem) Given a polynomial, \(p\), and two natural numbers, \(n\) and \(k\), the \(n\)-th order Taylor polynomial of \(p\), \(P_{n,k}\), at the point \(k\) is,
\[\begin{equation} \tag{1}\label{eq:taylor-s-theorem} P_{n,k}(x) = \sum_{i=0}^n \frac{p^{(i)}(k)}{i!} {(x - k)}^i. \end{equation}\]Having stated the above definitions and theorems, we now show how to obtain Taylor polynomials using Horner’s method.
From Theorem 2, we know that given a polynomial,
\[\begin{equation*} p(x) = \sum_{i=0}^n a_i x^i, \end{equation*}\]where \(a_0, \dots, a_n \in \mathbb{N}\), and a \(k \in \mathbb{N}\), the Taylor polynomial of \(p\) at the point \(k\) is,
\[\begin{equation*} P_{n,k}(x) = \sum_{i=0}^n \frac{p^{(i)}(k)}{i!} {(x - k)}^i, \end{equation*}\]where every occurrence of the variable \(x\) has been substituted with \(x - k\) and every coefficient \(a_i\) has been substituted with \(\frac{p^{(i)}(k)}{i!}\). Thus, we need a way to compute these new values using Horner’s method.
If we let \(p(x) = 2x^3 + 4x^2 + 11x + 3\) and \(k = 2\), we can calculate the coefficients of \(P_{3,2}\) – without the use of Horner’s method – by evaluating \(p\) and its first three derivatives for \(x = 2\),
\[\begin{align} \frac{p(2)}{0!} &= \frac{2 \cdot 2^3 + 4 \cdot 2^2 + 11 \cdot 2 + 3}{0!} = \frac{57}{0!} = 57\tag{2}\label{eq:taylor-poly-ex-p-2}\\ \frac{p'(2)}{1!} &= \frac{6 \cdot 2^2 + 8 \cdot 2 + 11}{1!} = \frac{51}{1!} = 51\tag{3}\label{eq:taylor-poly-ex-pp-2}\\ \frac{p''(2)}{2!} &= \frac{12 \cdot 2 + 8}{2!} = \frac{32}{2!} = 16\tag{4}\label{eq:taylor-poly-ex-ppp-2}\\ \frac{p^{(3)}(2)}{3!} &= \frac{12}{3!} = 2\tag{5}\label{eq:taylor_poly_ex_pppp-2}, \end{align}\]which yields the \(3\)-rd order Taylor polynomial of \(p\) at point \(2\),
\[\begin{align} \tag{6}\label{eq:taylor-poly-ex-p-2-result} P_{3,2}(x) &= \frac{p(2)}{0!} {(x - 2)}^0 + \frac{p'(2)}{1!} {(x - 2)}^1\\ &+ \frac{p''(2)}{2!} {(x - 2)}^2 + \frac{p^{(3)}(2)}{3!} {(x - 2)}^3\\ P_{3,2}(x) &= 57 {(x - 2)}^0 + 51 {(x - 2)}^1 + 16{(x - 2)}^2 + 2{(x - 2)}^3\\ P_{3,2}(x) &= 2 {(x - 2)}^3 + 16 {(x - 2)}^2 + 51 (x - 2) + 57. \end{align}\]Looking at the calculations above, we do not only have to evaluate four polynomials and divide each of them with a factorial, but we also have to take the repeated derivative of \(p\). It would be useful if we could calculate these values using our existing definitions. From Theorem 1, we know that dividing \(p\) with a binomial, \(d(x) = x - 2\),^{1}
\[\begin{equation} \tag{7} \begin{array}{ c | c c c c } & x^3 & x^2 & x^1 & x^0 \\ & 2 & 4 & 11 & 3 \\ 2 & & 4 & 16 & 54 \\ \hline & 2 & 8 & 27 & 57 \end{array} \end{equation}\]yields the quotient \(q_0(x) = 2x^2 + 8x + 27\) and remainder \(r_0 = p(2)\), which is also equal to \(\frac{p(2)}{0!}\), since \(0! = 1\). This corresponds to the result of Formula \ref{eq:taylor-poly-ex-p-2}, which is also why we have subscripted the remainder with a \(0\), since it is the value of the coefficient of \(P_{3,2}\) with index \(i = 0\),
\[\begin{equation*} r_0 = \frac{p(2)}{0!} = 57. \end{equation*}\]Furthermore, it turns out that if we keep dividing the obtained quotient, a pattern emerges that connects the remainders of the subsequent divisions with the remaining coefficients of \(P_{3,2}\). If we divide the quotient of the first division, \(q_0(x) = 2x^2 + 8x + 27\), with the same binomial as before, \(d(x) = x - 2\),
\[\begin{equation} \tag{8} \begin{array}{ c | c c c } & x^2 & x^1 & x^0 \\ & 2 & 8 & 27 \\ 2 & & 4 & 24 \\ \hline & 2 & 12 & 51 \end{array} \end{equation}\]we get the quotient \(q_1(x) = 2x + 12\) and remainder \(r_1 = 51\). In line with the previous result, we notice that the remainder, \(r_1\), is equal to the result of Formula \ref{eq:taylor-poly-ex-pp-2}, i.e., the value of the coefficient of \(P_{3,2}\) with index \(i = 1\),
\[\begin{equation*} r_1 = \frac{p'(2)}{1!} = 51. \end{equation*}\]If we repeat this procedure once more with the quotient \(q_1(x) = 2x + 12\),
\[\begin{equation} \tag{9} \begin{array}{ c | c c } & x^1 & x^0 \\ & 2 & 12 \\ 2 & & 4 \\ \hline & 2 & 16 \end{array} \end{equation}\]we get the remainder \(r_2 = 16\), which matches the coefficient with index \(i = 2\) in Formula \ref{eq:taylor-poly-ex-ppp-2},
\[\begin{equation*} r_2 = \frac{p''(2)}{2!} = 16, \end{equation*}\]and the quotient \(q_2 = 2\), which is also equal to the last remainder, \(r_3\), since \(q_2\) is constant, and therefore it is also equal to the coefficient with index \(i = 3\) in Formula \ref{eq:taylor_poly_ex_pppp-2},
\[\begin{equation*} q_2 = r_3 = \frac{p^{(3)}(2)}{3!} = 2. \end{equation*}\]Now, with the following coefficients in hand,
\[\begin{align*} r_3 &= \frac{p'''(2)}{3!} = 2\\ r_2 &= \frac{p''(2)}{2!} = 16\\ r_1 &= \frac{p'(2)}{1!} = 51\\ r_0 &= \frac{p(2)}{0!} = 57, \end{align*}\]the \(3\)-rd order Taylor polynomial of \(p\) at point \(2\) becomes,
\[\begin{align*} P_{3,2}(x) &= \frac{p(2)}{0!} {(x - 2)}^0 + \frac{p'(2)}{1!} {(x - 2)}^1 + \frac{p''(2)}{2!} {(x - 2)}^2 + \frac{p^{(3)}(2)}{3!} {(x - 2)}^3\\ P_{3,2}(x) &= r_0 {(x - 2)}^0 + r_1 {(x - 2)}^1 + r_2 {(x - 2)}^2 + r_3 {(x - 2)}^3\\ P_{3,2}(x) &= 57 {(x - 2)}^0 + 51 {(x - 2)}^1 + 16 {(x - 2)}^2 + 2 {(x - 2)}^3\\ P_{3,2}(x) &= 2 {(x - 2)}^3 + 16 {(x - 2)}^2 + 51 (x - 2) + 57, \end{align*}\]which is equal to the last Taylor polynomial in Formula \ref{eq:taylor-poly-ex-p-2-result}. Thus, we have demonstrated how to obtain the Taylor polynomial of a polynomial \(p\) at a point \(k\), by repeatedly dividing the resulting quotient polynomials with a binomial, \(x - k\), using Horner’s method, where \(p\) is the initial polynomial to be divided.^{2}
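The repeated-division procedure above can be summarized in a short Haskell sketch; taylorCoeffs and hornerDiv are assumed names, and only the method itself comes from the post. Coefficients run from the highest power down, so \(p(x) = 2x^3 + 4x^2 + 11x + 3\) is [2, 4, 11, 3]:

```haskell
-- One synthetic division of p by (x - k); the last entry is the remainder.
hornerDiv :: [Int] -> Int -> [Int]
hornerDiv cs k = scanl1 (\a c -> a * k + c) cs

-- Repeatedly divide the successive quotients by (x - k), collecting the
-- remainders r_0, r_1, ...; these are the Taylor coefficients of p at k.
taylorCoeffs :: [Int] -> Int -> [Int]
taylorCoeffs [c] _ = [c]  -- a constant quotient is the final remainder
taylorCoeffs cs  k =
  let row = hornerDiv cs k
  in last row : taylorCoeffs (init row) k
```

Here taylorCoeffs [2, 4, 11, 3] 2 gives [57, 51, 16, 2], i.e., the coefficients \(r_0, r_1, r_2, r_3\) from the calculations above.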
In this post, we have shown how to obtain Taylor polynomials with Horner’s method for polynomial division.
In our next post, we use what we have learned from this – and the previous – blog post to derive Moessner’s sieve^{3} from Horner’s method.
Note that we use the tabular representation for polynomial division, which we introduced in Formula 7 of the previous blog post. ↩
See “The wonder of Horner’s method” (2003) by Alex Pathan and Tony Collyer. ↩
Moessner’s sieve is the procedure described in “Eine Bemerkung über die Potenzen der natürlichen Zahlen” (1951) by Alfred Moessner, and the term Moessner’s sieve was first coined by Olivier Danvy in the paper “A Characterization of Moessner’s sieve” (2014). ↩
“It might be worth-while to point out
that the purpose of abstracting is not to be vague,
but to create a new semantic level
in which one can be absolutely precise”
– Edsger W. Dijkstra, 1972 (EWD340)
The goal of this post is to introduce a characteristic function of Moessner’s sieve, which computes the entries of a given Moessner triangle without needing to compute the prefix of the sieve.
The post is structured as follows. In Section 2, we derive the operational description of a characteristic function of Moessner’s sieve, which we then formalize in Section 3 as the two Haskell functions moessnerEntry and rotatedMoessnerEntry. The post is concluded in Section 4.
In order to come up with a characteristic function of the triangles generated by Moessner’s sieve, we first have to uncover the patterns by which they are constructed. Hence, let us examine the first three Moessner triangles created by applying Moessner’s sieve of rank \(5\) – yielding successive powers of \(4\) – on the sequence of \(1\)s,
\[\begin{equation} \tag{1}\label{char-eq:sieve-rank-six-three-triangles} \begin{array}{*{5}{r}} 1 & 1 & 1 & 1 & \textbf{1} \\ 1 & 2 & 3 & \textbf{4} & \\ 1 & 3 & \textbf{6} & & \\ 1 & \textbf{4} & & & \\ \textbf{1} & & & & \end{array} \begin{array}{*{5}{r}} 1 & 1 & 1 & 1 & \textbf{1} \\ 5 & 6 & 7 & \textbf{8} & \\ 11 & 17 & \textbf{24} & & \\ 15 & \textbf{32} & & & \\ \textbf{16} & & & & \end{array} \begin{array}{*{5}{r}} 1 & 1 & 1 & 1 & \textbf{1} \\ 9 & 10 & 11 & \textbf{12} & \\ 33 & 43 & \textbf{54} & & \\ 65 & \textbf{108} & & & \\ \textbf{81} & & & & \end{array} \end{equation}\]and see if we can discover any properties that help us characterize the triangles. As previously pointed out in this series of posts and by Hinze,^{1} we can observe that the initial triangle generated by Moessner’s sieve is always equal to the rotated Pascal’s triangle, having a depth equal to the rank of the Moessner triangle plus one. Furthermore, we also notice that the subsequent Moessner triangles exhibit Pascal-like properties, i.e., Pascal’s rule holds for all triangles, as every entry is the sum of its immediate western and northern neighbors, as previously illustrated and originally noted by Long.^{2} Knowing that the Moessner triangles behave in a Pascal-like fashion, hints at a possible binomial coefficient-like characteristic function, parameterized over the first row and column of a given Moessner triangle. If we again focus on Figure \ref{char-eq:sieve-rank-six-three-triangles}, it is trivial to see that the first row of every Moessner triangle is filled with \(1\)s, while we need to discover a new property in order to characterize the first column of every triangle.
Returning to the initial Moessner triangle, we know from the equivalence between Pascal’s triangle and the binomial coefficient, that the hypotenuse of the triangle will always enumerate the coefficients of the monomials of the binomial expansion of \((1 + t)^r\), where \(r\) is equal to the rank of the triangle, and \(t\) is a variable. Using the initial Moessner triangle of Figure \ref{char-eq:sieve-rank-six-three-triangles} as an example, we get the binomial expansion,
\[\begin{equation*} \color{black} (1 + t)^4 \color{lightgray} = \color{black} 1 \color{lightgray} \cdot t^4 + \color{black} 4 \color{lightgray} \cdot t^3 + \color{black} 6 \color{lightgray} \cdot t^2 + \color{black} 4 \color{lightgray} \cdot t^1 + \color{black} 1 \color{lightgray}\cdot t^0 \color{black}, \end{equation*}\]where the values of the hypotenuse, \((1,4,6,4,1)\), do indeed enumerate the binomial coefficients of the expansion. Incidentally, the hypotenuse also enumerates the actual terms of the binomial expansion when \(t = 1\),
\[\begin{align*} \color{black} (1 + 1)^4 &= \color{black} 1 \cdot 1^4 \color{lightgray} + \color{black} 4 \cdot 1^3 \color{lightgray} + \color{black} 6 \cdot 1^2 \color{lightgray} + \color{black} 4 \cdot 1^1 \color{lightgray} + \color{black} 1 \cdot 1^0 \\ \color{black} &= \color{black} 1 \color{lightgray} + \color{black} 4 \color{lightgray} + \color{black} 6 \color{lightgray} + \color{black} 4 \color{lightgray} + \color{black} 1, \end{align*}\]which raises the question of what happens if we let \(t\) denote the triangle index, starting from \(t = 1\). As it turns out, letting \(t = 2\),
\[\begin{align} \tag{2}\label{eq:binomial-expansion-example} \color{black} (1 + 2)^4 &= \color{black} 1 \cdot 2^4 \color{lightgray} + \color{black} 4 \cdot 2^3 \color{lightgray} + \color{black} 6 \cdot 2^2 \color{lightgray} + \color{black} 4 \cdot 2^1 \color{lightgray} + \color{black} 1 \cdot 2^0\\ \color{black} &= \color{black} 16 \color{lightgray} + \color{black} 32 \color{lightgray} + \color{black} 24 \color{lightgray} + \color{black} 8 \color{lightgray} + \color{black} 1 \end{align}\]results in the terms of the binomial expansion to be equal to the values found in the hypotenuse of the second Moessner triangle, \((16,32,24,8,1)\), in Figure \ref{char-eq:sieve-rank-six-three-triangles}. We can observe that this property holds for all triangles,
\[\begin{align*} \color{black} (1 + 3)^4 &= \color{black} 1 \cdot 3^4 \color{lightgray} + \color{black} 4 \cdot 3^3 \color{lightgray} + \color{black} 6 \cdot 3^2 \color{lightgray} + \color{black} 4 \cdot 3^1 \color{lightgray} + \color{black} 1 \cdot 3^0\\ \color{black} &= \color{black} 81 \color{lightgray} + \color{black} 108 \color{lightgray} + \color{black} 54 \color{lightgray} + \color{black} 12 \color{lightgray} + \color{black} 1, \end{align*}\]as seen here for \(t = 3\), and was recently pointed out by Danvy et al.^{3} as a characterization of the values dropped in the individual triangles of Moessner’s sieve. Combining this observation with the fact that the entries of a Moessner triangle are created using Pascal’s rule, leads us to the realization that the first column of the \((1 + t)\)th Moessner triangle enumerates the partial sums of the monomials of the binomial expansion \({(1 + t)}^r\),
\[\begin{equation} \tag{3}\label{eq:partial-sums-hpyotenuses} \color{lightgray} \begin{array}{*{5}{r}} 1 & 1 & 1 & 1 & \color{black}{1} \\ 1 & 2 & 3 & \color{black}{4} & \\ 1 & 3 & \color{black}{6} & & \\ 1 & \color{black}{4} & & & \\ \color{black}{1} & & & & \end{array} \color{black}{\Rightarrow} \color{lightgray} \begin{array}{*{5}{r}} \color{black}{1} & 1 & 1 & 1 & \color{black}{1} \\ \color{black}{5} & 6 & 7 & \color{black}{8} & \\ \color{black}{11} & 17 & \color{black}{24} & & \\ \color{black}{15} & \color{black}{32} & & & \\ \color{black}{16} & & & & \end{array} \color{black}{\Rightarrow} \color{lightgray} \begin{array}{*{5}{r}} \color{black}{1} & 1 & 1 & 1 & 1 \\ \color{black}{9} & 10 & 11 & 12 & \\ \color{black}{33} & 43 & 54 & & \\ \color{black}{65} & 108 & & & \\ \color{black}{81} & & & & \end{array} \color{black} \end{equation}\]as seen in Figure \ref{eq:partial-sums-hpyotenuses}, where \((1,4,6,4,1)\) partially sums to \((1,5,11,15,16)\), and \((1,8,24,32,16)\) partially sums to \((1,9,33,65,81)\).
Having characterized how every Moessner triangle is constructed using Pascal’s rule, where the first row of a triangle is a sequence of \(1\)s and the first column is a partial sum parameterized over the binomial expansion, we are ready to formalize the characteristic function in Haskell.
Synthesizing the observations made in the previous section, we now present two characteristic functions of Moessner’s sieve. First, we define a characteristic function, moessnerEntry, that is analogous to our existing binomialCoefficient function, followed by a rotated version, rotatedMoessnerEntry, which is defined both as its own formalization and in terms of moessnerEntry, similar to what we did when we defined rotatedBinomialCoefficient.
In order to translate the informal description of the characteristic function made in Section 2 into some Haskell code, we first define it in an analogous way to the existing binomialCoefficient function. As such, we rotate the first two Moessner triangles of Figure \ref{char-eq:sieve-rank-six-three-triangles} into a Pascal-like configuration, where we use the same row-and-entry indexing scheme, n and k, as in the case of the binomialCoefficient function,
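as shown below. The configuration is reconstructed here from the entries of Figure \ref{char-eq:sieve-rank-six-three-triangles}, by mirroring each triangle so that its hypotenuse becomes the bottom row,
\[\begin{equation} \tag{4}\label{char-eq:moessner-triangles-pascal-like} \begin{array}{*{5}{r}} 1 & & & & \\ 1 & 1 & & & \\ 1 & 2 & 1 & & \\ 1 & 3 & 3 & 1 & \\ 1 & 4 & 6 & 4 & 1 \end{array} \qquad \begin{array}{*{5}{r}} 1 & & & & \\ 5 & 1 & & & \\ 11 & 6 & 1 & & \\ 15 & 17 & 7 & 1 & \\ 16 & 32 & 24 & 8 & 1 \end{array} \end{equation}\]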
Just like the binomialCoefficient function, we have four combinations of n and k being either equal to 0 or greater than 0, where the only difference from the binomialCoefficient function is the case where n > 0 and k == 0, corresponding to the first column of a rotated Moessner triangle as discussed in the previous section. While we simply return 1 in that case of the binomialCoefficient function, we instead have to add the appropriate monomial of the last row of the previous triangle. For example, in Figure \ref{char-eq:moessner-triangles-pascal-like} the value 11 in the third row of the second triangle is obtained by adding 5, located immediately above it, and the value 6, located at the third entry of the last row of the previous triangle. This is the exact same behavior as we saw in Figure \ref{eq:partial-sums-hpyotenuses}, but for the rotated Moessner triangles. Combining the logic of the four cases of n and k yields the following binomial coefficient-like characteristic function of the Pascal-like Moessner triangle,
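sketched below. The definitions are a reconstruction from the description above: binomialCoefficient is restated from the earlier post in this series (its exact formulation there is assumed), and the monomial helper is the one described in the next paragraph.

```haskell
-- Restated (and assumed) from the earlier post in this series:
-- the binomial coefficient via Pascal's rule.
binomialCoefficient :: Integer -> Integer -> Integer
binomialCoefficient n k
  | k == 0    = 1
  | n == 0    = 0
  | otherwise = binomialCoefficient (n - 1) (k - 1)
                + binomialCoefficient (n - 1) k

-- The n-th monomial of the binomial expansion (1 + t)^r.
monomial :: Integer -> Integer -> Integer -> Integer
monomial r t n = binomialCoefficient r n * t ^ n

-- Entry (n, k) of the Pascal-like Moessner triangle of rank r and
-- triangle index t: the four cases mirror binomialCoefficient, except
-- that the first column adds a monomial of the previous triangle.
moessnerEntry :: Integer -> Integer -> Integer -> Integer -> Integer
moessnerEntry r t n k
  | n == 0 && k == 0 = 1
  | n == 0           = 0
  | k == 0           = moessnerEntry r t (n - 1) 0 + monomial r t n
  | otherwise        = moessnerEntry r t (n - 1) (k - 1)
                       + moessnerEntry r t (n - 1) k
```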
indexed using the row and column indices n and k, where r denotes the rank of the triangle and t the triangle index. The monomial function, used in the new case of n > 0 && k == 0, is defined as,
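sketched here, with the binomialCoefficient definition restated (and assumed) from the earlier post so that the snippet is self-contained:

```haskell
-- Restated (and assumed) from the earlier post in this series.
binomialCoefficient :: Integer -> Integer -> Integer
binomialCoefficient n k
  | k == 0    = 1
  | n == 0    = 0
  | otherwise = binomialCoefficient (n - 1) (k - 1)
                + binomialCoefficient (n - 1) k

-- The n-th monomial of the binomial expansion (1 + t)^r,
-- i.e., binomialCoefficient r n multiplied by t to the n-th power.
monomial :: Integer -> Integer -> Integer -> Integer
monomial r t n = binomialCoefficient r n * t ^ n
```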
and computes a single monomial of the binomial expansion \({(1 + t)}^r\), when
given a rank, r
, a triangle index, t
, and an index, n
, of a monomial in
the expansion.
To illustrate this, we compute the fourth element from the right in the binomial expansion of Formula \ref{eq:binomial-expansion-example}, \(4 \cdot 2^3\), by letting \(r = 4\), \(t = 2\), and \(n = 3\), yielding the expected result:
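In GHCi, this computation could read as follows (an illustrative session, with monomial as described above):

```
> monomial 4 2 3
32
```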
Likewise, if we want to compute the third entry of the fourth row of the second triangle in Figure \ref{char-eq:moessner-triangles-pascal-like}, having value \(7\), we let \(r = 4\), \(t = 1\), \(n = 3\), and \(k = 2\), and pass them to moessnerEntry:
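Again as an illustrative GHCi session:

```
> moessnerEntry 4 1 3 2
7
```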
Thus, the above results demonstrate the correctness of our first formalization of a characteristic function of Moessner’s sieve.
Having defined a binomial coefficient-like characteristic function of Moessner’s sieve, we move on to define its rotated counterpart, which provides a more appropriate indexing scheme when describing the entries of the triangles generated by Moessner’s sieve.
In order to rotate the characteristic function moessnerEntry in an analogous fashion to binomialCoefficient, we observe once again that binomialCoefficient and moessnerEntry exhibit the same triangular structure, which means the relation between moessnerEntry and its rotated counterpart, rotatedMoessnerEntry, is identical to the existing relation between binomialCoefficient and rotatedBinomialCoefficient,
Thus, we can simply define the rotated version of moessnerEntry using the same transformation as above,
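as in the following sketch, where the earlier definitions are restated so that the snippet runs on its own:

```haskell
-- Restated (and assumed) from the earlier post in this series.
binomialCoefficient :: Integer -> Integer -> Integer
binomialCoefficient n k
  | k == 0    = 1
  | n == 0    = 0
  | otherwise = binomialCoefficient (n - 1) (k - 1)
                + binomialCoefficient (n - 1) k

-- The n-th monomial of the binomial expansion (1 + t)^r.
monomial :: Integer -> Integer -> Integer -> Integer
monomial r t n = binomialCoefficient r n * t ^ n

-- Entry (n, k) of the Pascal-like Moessner triangle, as sketched above.
moessnerEntry :: Integer -> Integer -> Integer -> Integer -> Integer
moessnerEntry r t n k
  | n == 0 && k == 0 = 1
  | n == 0           = 0
  | k == 0           = moessnerEntry r t (n - 1) 0 + monomial r t n
  | otherwise        = moessnerEntry r t (n - 1) (k - 1)
                       + moessnerEntry r t (n - 1) k

-- Rotated version: row r and column c of a triangle, as generated by
-- Moessner's sieve, correspond to row r + c and entry c of the
-- Pascal-like triangle.
rotatedMoessnerEntry :: Integer -> Integer -> Integer -> Integer -> Integer
rotatedMoessnerEntry n t r c = moessnerEntry n t (r + c) c
```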
where n denotes the rank, t the triangle index, r the row index, and c the column index.
Similarly, we can also define rotatedMoessnerEntry by reusing the formalization we lifted from the rotated Pascal’s triangle, and observe that the only case that has changed is the case where c == 0, corresponding to the case n > 0 && k == 0 we discussed in the previous section, i.e. the first column of each Moessner triangle. This brings us to the following formalization of rotatedMoessnerEntry,
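sketched below as a direct recursion, in which only the c == 0 case differs from the rotated Pascal’s triangle recursion; the helpers are restated (and assumed) from the earlier post for self-containment:

```haskell
-- Restated (and assumed) from the earlier post in this series.
binomialCoefficient :: Integer -> Integer -> Integer
binomialCoefficient n k
  | k == 0    = 1
  | n == 0    = 0
  | otherwise = binomialCoefficient (n - 1) (k - 1)
                + binomialCoefficient (n - 1) k

-- The n-th monomial of the binomial expansion (1 + t)^r.
monomial :: Integer -> Integer -> Integer -> Integer
monomial r t n = binomialCoefficient r n * t ^ n

-- Entry at row r and column c of the Moessner triangle of rank n and
-- triangle index t: the first row is all 1s, the first column partially
-- sums the monomials of (1 + t)^n, and every other entry is the sum of
-- its western and northern neighbors (Pascal's rule).
rotatedMoessnerEntry :: Integer -> Integer -> Integer -> Integer -> Integer
rotatedMoessnerEntry n t r c
  | r == 0    = 1
  | c == 0    = rotatedMoessnerEntry n t (r - 1) 0 + monomial n t r
  | otherwise = rotatedMoessnerEntry n t r (c - 1)
                + rotatedMoessnerEntry n t (r - 1) c
```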
which we can use to calculate the entries of the triangles in Figure \ref{char-eq:sieve-rank-six-three-triangles}, without first having to compute the whole prefix of Moessner’s sieve.
To illustrate the application of our final formalization, we want to calculate the entry, \(108\), located in the second column of the fourth row in the third triangle in Figure \ref{char-eq:sieve-rank-six-three-triangles}, which we do by passing the following values \(n = 4\), \(t = 2\), \(r = 3\), and \(c = 1\) to our characteristic function of Moessner’s sieve,
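As an illustrative GHCi session:

```
> rotatedMoessnerEntry 4 2 3 1
108
```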
and obtain the expected result of \(108\).
Now that we have defined our two characteristic functions of Moessner’s sieve, moessnerEntry and rotatedMoessnerEntry, we are ready to conclude this post.
In this post, we have introduced two characteristic functions of Moessner’s sieve, moessnerEntry and rotatedMoessnerEntry, which compute the entries of a given Moessner triangle without having to compute the prefix of Moessner’s sieve.
The characteristic functions were derived by observing that every Moessner triangle behaves in a Pascal-like way combined with the fact that the values dropped in the traditional Moessner’s sieve enumerate the monomials of the binomial expansion.
This post was a small excerpt from my Master’s thesis, in which I also prove the correctness of the above characteristic functions, and use the characteristic function in my proof of Moessner’s idealized theorem.