What I like about R is that it takes its inspiration from Scheme. Being a functional programming language, R indeed supports the idea of closures. What are they? This is my attempt to unravel the mystery about closures.

First, what closures are not. Some writers have confused this notion with algebraic closure, that is, the closure property of algebraic structures. Functional programming closures have nothing to do with it. Don’t believe it if you are told the idea came from Group Theory. It did not. In algebra, closure is a property of an algebraic structure: if $a, b \in G$ and $\bullet$ is an operation on $G$, then $a \bullet b \in G$. In other words, the result of applying the operation to members of $G$ is also a member of $G$. We then say that $G$ is closed under the operation $\bullet$, whatever that operation may be.

Functional programming closures concern how the language handles free variables in functions. Since functional programming takes its cue from lambda calculus, let us define what free variables are in that computational system. This is best done by example:

Consider $\lambda x.yx$. In this expression, $y$ is free as it is not controlled by the lambda symbol, but $x$ is bound because $\lambda$ encloses that $x$, as we read from left to right. This is where functional programming (FP) closures come from. Closures tell us how the language handles those free variables when they occur in a function definition. FP explains this in terms of environments, and R has those too. Simply put, when a function is defined, the system remembers the environment in which it was defined. In other words, it remembers the names visible at that point, such as the variables and nested functions declared in the enclosing function, and that is where its free variables get their values.

I think this is best illustrated by the way FP closures mimic classes in OOP languages. Look at this code.
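As a minimal sketch of the free/bound distinction in R (the names `make_adder` and `add5` are made up for illustration and use `+` in place of the application in $\lambda x.yx$): inside the inner function, `x` is bound because it is a parameter, while `y` is free and is looked up in the environment where the inner function was defined.

```r
# make_adder and add5 are hypothetical names used only for this illustration.
make_adder <- function(y) {
  # x is bound (it is a parameter of the inner function);
  # y is free here and is resolved in the defining environment.
  function(x) y + x
}

add5 <- make_adder(5)
print(add5(10))               # the free variable y is remembered as 5
print(environment(add5)$y)    # the remembered binding lives in the closure's environment
```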

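make_bal <- function() {
  val <- 0                        # the balance, private to this closure
  bal <- function(method) {
    add_method <- function(x) {
      val <<- val + x             # updates the val defined in line 2
    }
    get_method <- function() {
      val
    }
    if (method == "dep") {
      add_method
    } else {
      get_method
    }
  }
  bal
}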

In R, the keyword “function” acts as our lambda declaration. In line 2, val is defined; where it is used inside add_method (line 5) it is a free variable, because it is not an argument of that function. In line 4, x is bound: it is a parameter in the function definition of add_method. However, val is not accessible outside make_bal because it is defined inside it, and so we could say it is “closed” to the outside world. In line 5 we use the <<- operator, similar to set! in Scheme. That operator hunts for val up the environment chain and assigns to the first one it finds, which is the one in line 2.

> account<-make_bal()
> account("get")()
[1] 0
> account("dep")(50)
> account("get")()
[1] 50
> account("dep")(50)
> account("get")()
[1] 100
> withdraw<-account("dep")
> withdraw(-50)
> account("get")()
[1] 50



In line 1, we call make_bal and name the function it returns account. In line 2, we check that val starts at 0. In line 4, we make a deposit of 50. Each time we make a deposit, our balance increases, as seen in lines 8 and 9. Finally, we can define a function called withdraw, provided the amount we pass is always negative. After we call it in line 11, line 13 shows that the balance has been reduced by the amount we took out.

One can see that in an FP language like R, the notion of classes will not be missed because closures handle that need.
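To make the comparison with classes a little more concrete, here is a small sketch (a compact variant of make_bal; the names `alice` and `bob` are made up) showing that each call to make_bal creates its own private val, much as each object of a class has its own fields.

```r
# Compact variant of make_bal; alice and bob are hypothetical account names.
make_bal <- function() {
  val <- 0
  function(method) {
    if (method == "dep") {
      function(x) val <<- val + x   # deposit: modifies this closure's val
    } else {
      function() val                # any other method: report the balance
    }
  }
}

alice <- make_bal()
bob <- make_bal()
alice("dep")(50)
bob("dep")(20)
print(alice("get")())   # alice's balance only
print(bob("get")())     # bob's balance is independent of alice's
```

Each account carries its own environment, so a deposit into one never touches the other.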

As usually happens in computer science, computer scientists borrow notions from mathematics, and this use of the term “closures” is one of them. As also usually happens, the borrowing has negative consequences: rather than clarifying, it sometimes confuses people. It would have been better had computer scientists chosen a different term.

Credits: Andy Balaam’s Scheme Closure.

I am one of those who often need to flip between Excel and R depending on the needs of the client. Calling Excel from R is not that trivial, so here are some notes on how to do it. Let’s say that you have an Excel VBA program that you want to take over after R has done its job.
You use the system2 command in R like so
> system2("/Applications/Microsoft Excel.app/Contents/MacOS/Microsoft Excel", c("/Users/Extranosky/Temp/VehicleRepair.xlsm"))
The quotes must be plain straight quotes, and the spaces in the command path need no backslashes, because system2 quotes the command itself. The second argument should not have spaces in its file or directory name. I have not had success on this platform in passing Excel a file name whose directory contained spaces.
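One thing that may be worth trying for paths with spaces, as an untested sketch rather than a confirmed fix: according to the system2 help page, the command is always quoted with shQuote, but the args are pasted into the shell line as they are, so an argument containing spaces would need quoting by hand. The workbook path below is hypothetical, and I have not verified how Excel itself reacts to it.

```r
# Hypothetical paths for illustration only; substitute your own.
excel <- "/Applications/Microsoft Excel.app/Contents/MacOS/Microsoft Excel"
workbook <- "/Users/Extranosky/My Temp/VehicleRepair.xlsm"

# Quote the argument for a POSIX shell so the space survives as part of the path.
args <- shQuote(workbook, type = "sh")
print(args)

# Only attempt the call when both files actually exist on this machine.
if (file.exists(excel) && file.exists(workbook)) {
  system2(excel, args, wait = FALSE)
}
```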

This command will start Excel and open the file given as the argument. Hopefully you have an open event in that VBA so that things can be done seamlessly and automatically.

Image: symbols of infinity, from Wikipedia.

In this post I will show an example of how mathematics can be very mysterious. I showed this to what might be classified as “first year in maths” students and they came out of class perplexed, as if their mind had to hit the reset button and reboot.

Theorem: Let $\mathbf{N}$ be the set of natural numbers and $\mathbf{Z}$ the set of integers. We have $\mathbf{N} \subseteq \mathbf{Z}$ and $|\mathbf{N}| = |\mathbf{Z}|$.

This means that though $\mathbf{N}$ is a subset of $\mathbf{Z}$, the size of $\mathbf{N}$ is equal to the size of $\mathbf{Z}$. In technical language, the cardinality of the natural numbers is equal to the cardinality of the integers. What duh?

Proof:

a.) $\mathbf{N} \subseteq \mathbf{Z}$. This is trivial because the natural numbers $\mathbf{N}$ are just the positive integers in $\mathbf{Z}$. So every element of $\mathbf{N}$ is found in $\mathbf{Z}$.

b.) In order to show that the sizes of the two sets are equal, we need to establish a bijective function from one set to the other, that is, a function which is both surjective and injective. Another way of saying this is that we need a function from one set to the other that is onto and at the same time one-to-one. Obtaining such a function proves that both sets have the same size.

We will just use the one suggested by Wikipedia. We let
$f : \mathbf{Z} \rightarrow \mathbf{N}$ with $f(x) = 2|x|$ when $x < 0$,

and $f(x) = 2x + 1$ when $x \geq 0$.

1.) $f$ is one-to-one, i.e. injective. Let $f(x) = f(y)$. Since $2|x|$ is always even and $2x+1$ is always odd, $x$ and $y$ must come from the same branch of the definition, so we have two cases: either $2|x| = 2|y|$ with $x, y < 0$, or $2x+1 = 2y+1$ with $x, y \geq 0$. In the first case $|x| = |y|$, and since $x$ and $y$ are both negative, $x = y$. In the second case $2x+1 = 2y+1 \Rightarrow x = y$.

2.) $f$ is onto, i.e. surjective. Let $b \in \mathbf{N}$; then $b$ is positive and either even or odd. If $b$ is even, then $b/2$ is a positive integer, so choose $a = -b/2$. Then $a \in \mathbf{Z}$, $a < 0$ and $|a| = b/2$, so that $f(a) = 2|a| = 2(b/2) = b$. On the other hand, if $b$ is odd, then $b = 2a + 1$ for some integer $a$ (the property of odd numbers) $\Rightarrow a = (b - 1)/2$, and $a \geq 0$ since $b \geq 1$, so we have $f(a) = 2a + 1 = 2(b - 1)/2 + 1 = b$. So in both cases, for every $b \in \mathbf{N}$ we have found a matching $a \in \mathbf{Z}$. $\blacksquare$
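A finite sanity check of this pairing can be sketched in R. It is not a proof, only an illustration on the integers from $-n$ to $n$ (the choice $n = 1000$ is arbitrary): $f$ should hit every natural number from 1 to $2n + 1$ exactly once.

```r
# f as defined in the proof: negatives go to the evens, non-negatives to the odds.
f <- function(x) ifelse(x < 0, 2 * abs(x), 2 * x + 1)

n <- 1000                 # arbitrary cut-off for the finite check
xs <- -n:n
ys <- f(xs)

no_repeats <- anyDuplicated(ys) == 0                  # injective on this range
hits_all <- all(sort(ys) == 1:(2 * n + 1))            # onto 1, 2, ..., 2n + 1
print(c(no_repeats = no_repeats, hits_all = hits_all))
```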

How can a subset have the same size as its superset? If this does not boggle your mind, perhaps you missed the point. The reason is that we have here two infinite sets, and this mystery only happens when infinity is involved. Now some mathematicians are not happy with this, which is why they do not believe in infinite sets. It seems infinity is just a concept that has no matching physical reality, and we can be indifferent to it. I suggest the concept of infinity is a metaphysical concept. So can one reason that it is a concept that exists just in our mind and is not "real", just like unicorns or fairies? I do not think so. There is no reason for us to believe in unicorns, and in fact not all civilisations believe in the mythical horse. However, the concept of infinity is different, because the concept per se is a necessity. The mind requires it when presented with the nature of numbers. For is it not true that the set of natural numbers is infinite? We can conceive it and by force of nature admit it. It is a necessary truth, so in that sense infinite sets are real and transcend material physicality.

Have you ever encountered mathematicians who do not believe in “real numbers”? Well, there are some, mainly those who come from a computer science ideology. I am starting to understand why they do not think real numbers are real or useful as a concept. Firstly, what do we mean by a real number? It comes from looking at the number line as a continuum, that is, treating the number line as consisting of an infinite number of points. For example, between 0 and 1 there are infinitely many “real numbers”.

Take as an example the so-called real number $\pi$. It is written as 3.141592653589793… Notice the ellipsis in the number. It is there to say that the decimals after the last printed 3 go on forever. So people think that a real number can be represented by that dot dot dot, and that real numbers have unending decimal expansions. In actuality, the symbol $\pi$ stands for the limit of that sequence of ever longer decimals.

Now I can appreciate why A/Prof. N. Wildberger insists on the need for something to be written down, and we will explain the reason why shortly. If you think for a moment, we are not capable of writing a real number down. Those dot dot dots are a semantic device to signify to us that the digits go on in an infinite series. That is not really writing a number down. Why do we need to be able to write something down with finality?

Well, it is because we can put the process of writing into an algorithm. We can put it into a function. So imagine $\pi$ again. The fact that we cannot write the number down completely means we cannot put the generation of its digits into an algorithm that will stop. It cannot stop, and we won’t let it, precisely because the number of decimals in the tail is infinite. So we get an algorithm that goes into an infinite loop, which makes it useless. Since we cannot locate precisely where $\pi$ is on the continuum line, we cannot even have a function to compute it. In a sense, the digits that follow are not decidable.

From a computer science point of view, the algorithm must terminate, and if it does not, then the function is undefined at that point. The problem stems from the idea of an infinity of points on the number line. Yet in practice we cannot really even locate real numbers on the number line. You can have the function stop at the 100th position of $\pi$, but that is not $\pi$ itself. This is the best we can do, but that number is not exactly $\pi$; rather, it is “something like or close to $\pi$”.
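The point about stopping early can be illustrated with a small R sketch. It uses the Leibniz series $4(1 - 1/3 + 1/5 - \dots)$, which is my choice for illustration and not a digit-by-digit generator, and it compares against R's built-in `pi`, which is itself only a finite floating-point approximation. However many terms we take, the function terminates with something close to $\pi$, never $\pi$ itself.

```r
# Partial sum of the Leibniz series; n_terms is where we force the algorithm to stop.
leibniz_pi <- function(n_terms) {
  k <- 0:(n_terms - 1)
  4 * sum((-1)^k / (2 * k + 1))
}

terms <- c(10, 1000, 100000)
errs <- sapply(terms, function(n) abs(leibniz_pi(n) - pi))

# The gap to R's (already approximate) pi shrinks but does not vanish.
print(data.frame(terms = terms, error = errs))
```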

So real numbers are unreal, man.

This year I began doing private maths tutoring and I have been learning a lot about the deficiencies in mathematics education that are encountered by our high school students. I am very skeptical about this “new maths” approach. For one thing, the students are not taught to use pen and paper to write out their reasoning and calculation. For another, they make the student rely heavily on intuition. Sometimes intuition helps but other times, intuition can mislead.

Let me illustrate this problem, not original to me.

Assume we have 3 cards with two faces each. One card is black on both sides, another is white on both sides, and the last has black on one side and white on the other. Let us drop the cards in a hat and then choose a card at random, and when we get a card, we also choose at random which side to look at. Question: If the side we see is black, what is the probability that the other side is black also? Did you answer 1/2? Your intuition has misled you. You probably thought that, given this data, we can dismiss the card with both sides white (the second card in our description) and just deal with the first and last cards, each equally likely. This is not the true situation.

Here is our analysis. The sample space $\Omega = \{ BB, WW,BW \}$ describes the possible color combination of our cards, e.g. BB means one side is black and the other side is black also, etc.  Let $\beta_s =$  “the side we see is black”, $\beta_o =$ “the other side is black”.

So the question is: what is $P(\beta_o | \beta_s)$?

$P(\beta_o | \beta_s) = \frac {P(\beta_o \cap \beta_s)} {P(\beta_s)}= \frac {1/3} {3/6}= 2/3$

$P(\beta_s) = 3/6$ because there are 3 ways of getting a black face out of 6 equally likely ways of getting a face. Likewise, $\beta_o \cap \beta_s$ is tantamount to getting the first card in our description, so $P(\beta_o \cap \beta_s)$ is 1 out of 3.
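The same count can be checked in R, as a sketch under the assumption that each of the 3 cards and each of its 2 sides is equally likely. The first part enumerates the 6 faces exactly; the second part is a simulation (the seed and the number of draws are arbitrary choices).

```r
# Rows are cards, columns are the two sides of each card.
sides <- rbind(BB = c("B", "B"), WW = c("W", "W"), BW = c("B", "W"))

# Exact enumeration of the 6 equally likely (card, side facing us) outcomes.
grid <- expand.grid(card = 1:3, up = 1:2)
seen <- sides[cbind(grid$card, grid$up)]
other <- sides[cbind(grid$card, 3 - grid$up)]
p_exact <- sum(seen == "B" & other == "B") / sum(seen == "B")

# Simulation of the same experiment.
set.seed(1)
n <- 100000
card <- sample(1:3, n, replace = TRUE)
up <- sample(1:2, n, replace = TRUE)
seen_sim <- sides[cbind(card, up)]
other_sim <- sides[cbind(card, 3 - up)]
p_sim <- mean(other_sim[seen_sim == "B"] == "B")

print(c(exact = p_exact, simulated = p_sim))
```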

So the moral of the story is that intuition can not be a substitute for formalism. Formalism actually yields a more accurate result. Our intuition is trumped by the formal analysis, which is a better way of approaching the problem.

It is a common question in data science or data analysis forums whether one should use Python or R for one’s data work. So far, I myself have managed not to learn Python. I have managed to ward off the urge to do so. Now I have learned plenty of programming languages and have actual work experience in the following: C, C++, Java, Perl, PHP, Tcl, VBA. If I look further back, I should mention COBOL, Fortran, Assembler, Algol and BPL, the ancient Burroughs programming language based on Algol. In fact, I should name Scheme/Lisp and OCaml (see older posts) among the languages I can program in. Currently, I am playing around with Clojure. I could really learn Python if I wanted to. However, I don’t.

Oh please, not another language to learn!

Why? Because for statistical work R is enough. Yes, I can even use R for data cleansing and munging, where Python could probably help. For that type of task, R has so many functions I can avail of without touching Python. Anyway, which one is closer to statistics, Python or R? It is R, and if I want to do any general purpose computation I can do it all in R because of those functions. Lastly, the nice thing is that R takes some functional programming insights into its philosophy, since it took its inspiration from Scheme.
