Now please do not get upset. I am using the word “intelligent” the way computer scientists used the word some 60 years ago. According to the late John McCarthy, the man who coined the term “artificial intelligence”, something is intelligent if the ‘thing’ has common sense.

We shall therefore say that a program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows — John McCarthy.

The fact that in statistics we have to get a sample of size $n$, and that $n$ has to be as large as we can afford with our resources, makes statistics not so intelligent if we follow McCarthy’s definition. The Central Limit Theorem (CLT) is the saving grace of statistics, at least the classical kind. In classical statistics, before we can proceed, we must first know the probability distribution $f(X)$ that governs the values of the random variable $X$ under study. This is usually not known, but the CLT says: do not lose hope. Forget $X$; just get lots and lots of samples of $X$ and form the sample mean $\bar{X}$, with at least $n > 30$, because we know that the distribution of $\bar{X}$ is going to be approximately Normal. However, when someone has common sense, you do not have to give them tons and tons of data before they understand what you are saying, so by computer science standards, statistics is not intelligent 😉
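To see the CLT’s promise in action, here is a minimal simulation sketch, in Python for brevity (the choice of distribution, sample size, and trial count are my own arbitrary picks): we average $n = 30$ draws from a decidedly non-Normal distribution and watch the sample means pile up into a bell shape.

```python
import random
import statistics

random.seed(0)

# Draw from a clearly non-Normal distribution: Exponential(1),
# which has mean 1 and standard deviation 1. The CLT says the
# sample mean of n draws is approximately Normal(1, 1/sqrt(n)).
n = 30          # the classic rule-of-thumb sample size
trials = 20000  # number of sample means to collect

means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

mu = statistics.fmean(means)
sd = statistics.stdev(means)
print(mu, sd)  # mu should be near 1, sd near 1/sqrt(30) ~ 0.183

# Rough normality check: about 68% of the sample means should fall
# within one standard deviation of the mean if the bell shape holds.
within = sum(abs(m - mu) < sd for m in means) / trials
print(within)
```

Even though each individual draw is heavily skewed, the averages behave almost exactly as the Normal approximation predicts.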

I think Cloud-Based Machine Learning (CBML) is the way of the future, because it combines the need for big data with machine learning processing: you have it all on one platform. The more I experiment with a CBML platform, the more I appreciate the convenience it affords. The nice thing, too, is that there is a graphical workflow one can use to conceptualize the steps in an experiment. Very nice.

This is my own version of MAGA. To appear at the IJCCI 2018, Seville, Spain.

Last month I took up the exercise of tracking the history of the term AI. In the process I found myself meandering through some of the history of the maths that governs AI’s present state of the art.

Most scientists believe that mathematics is part of the Sciences. I do not hold such a view; rather, I consider it a part of Philosophy, and so still part of the Humanities. In going through the history of the mathematical structures present in computer science, I am struck by the thought of how mathematics is a deeply human enterprise. I know this is obvious to some, and though this truth was somewhat present in a small way at the back of my mind, it is only now that I am dealing with it personally. This exercise brought this truth home to me in a profound way, and I am delighted in that discovery.

I am amazed at how the use of history can inform how one does his or her maths. In my examination, history can tell us what a mathematician or computer scientist went through. We can learn the trials and errors they endured and their tenacity in “keeping the faith”. One example worth studying is the struggle Geoffrey Hinton went through in becoming the “godfather of deep learning”. It is not only the story of the maths of neural networks but also the story of how one person’s passion and faith in an idea is benefitting society today.

Just playing around with two-layer Artificial Neural Networks (ANNs) in which the hidden activation function is a sigmoid of the form below

$\sigma(a_{j}) = \frac {1}{1 + e^{-a_{j}}}$

The final output will be of the form
$a_{k}= \sum_{j=1}^{M}w_{kj}^{(2)}\sigma(a_{j}) + w_{k0}^{(2)}$
where $a_{j}$ is the pre-activation of the first layer and has the form
$a_{j}= \sum_{i=1}^{D}w_{ji}^{(1)}x_{i} + w_{j0}^{(1)}$

We will now show that there is an equivalent network that computes the same thing but uses the hyperbolic tangent

$\tanh(a) = \frac {e^{a}-e^{-a}}{e^{a} + e^{-a}}$

as the hidden activation function.
Proof:
It is known that $\tanh(a) = 2 \sigma(2a) - 1$
$\Rightarrow$
$\tanh(a/2) = 2 \sigma(a) - 1 \Rightarrow \sigma(a) = \frac{1}{2} \left[\tanh\left(\frac{a}{2}\right) + 1\right]$
We now use this form of $\sigma(a)$ in $a_{k}$: the constant $\frac{1}{2}$ and the added $1$ can be absorbed into the second-layer weights and biases, so the same outputs can be computed using $\tanh$ hidden units. Indeed, looking at the plots of these functions, it is not hard to see that each is an affine transformation of the other. $\blacksquare$
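As a numeric sanity check of the equivalence, here is a sketch in Python (the network sizes and random weights are arbitrary choices of mine): halve the first-layer weights and biases, halve the second-layer weights, and fold the leftover constant into the output biases; the $\tanh$ network then reproduces the sigmoid network’s outputs.

```python
import math
import random

random.seed(2)

def forward(x, W1, b1, W2, b2, h):
    """Two-layer network: hidden pre-activations a_j, then outputs a_k."""
    a = [sum(w * xi for w, xi in zip(row, x)) + b0 for row, b0 in zip(W1, b1)]
    z = [h(aj) for aj in a]
    return [sum(w * zj for w, zj in zip(row, z)) + b0 for row, b0 in zip(W2, b2)]

sigmoid = lambda a: 1.0 / (1.0 + math.exp(-a))

# Arbitrary sizes: D = 3 inputs, M = 4 hidden units, K = 2 outputs
D, M, K = 3, 4, 2
W1 = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(M)]
b1 = [random.uniform(-1, 1) for _ in range(M)]
W2 = [[random.uniform(-1, 1) for _ in range(M)] for _ in range(K)]
b2 = [random.uniform(-1, 1) for _ in range(K)]

# sigma(a) = (1 + tanh(a/2)) / 2, so: halve the first-layer weights and
# biases (to get a_j/2), halve the second-layer weights (the factor 1/2),
# and fold the constant sum_j w_kj / 2 into the output biases.
W1t = [[w / 2 for w in row] for row in W1]
b1t = [b0 / 2 for b0 in b1]
W2t = [[w / 2 for w in row] for row in W2]
b2t = [b0 + sum(row) / 2 for row, b0 in zip(W2, b2)]

x = [0.3, -0.7, 1.2]
y_sigmoid = forward(x, W1, b1, W2, b2, sigmoid)
y_tanh = forward(x, W1t, b1t, W2t, b2t, math.tanh)
print(y_sigmoid)
print(y_tanh)  # agrees with the sigmoid network up to rounding
```

The two output vectors agree to floating-point precision, which is exactly the claim of the proof.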

Note: Notation and ideas inspired by C. Bishop, Pattern Recognition and Machine Learning, 2006.

I remember how it was 20+ years ago. Back then, if you told anyone in IT, especially those in IT management, that you were working in the field of AI, you would be given a skeptical look. They would not want to know you or what you do. Why? Because AI, 20 years ago, was a failed promise. I remember an agent-oriented company in the USA that shied away from ever mentioning on its website that it used agent technology in its product, despite being a thoroughgoing supplier of an agent platform! AI used to be an irrelevant idea.

So after working for more than 20 years doing agent-related research, I sit in wonder and ask, what happened? Now the AI label is being thrown around everywhere. Suddenly it is in vogue. People love it and praise it, when 20 years ago it was despised. What changed? That is, until you take a closer look…

When people talk about a product having “AI”, what they mean is that they have a product that can predict. Prediction is indeed a part of AI, but a very small part; it is not the sum of it all. In fact, it is not even the crucial part.

Will the real “AI” please stand up?

The widely cited classic AI textbook by Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, lists four approaches to AI:

• Acting Humanly
• Thinking Humanly
• Thinking Rationally
• Acting Rationally

By the Russell-Norvig definition, something has AI if it can operate autonomously, perceive its environment, persist over a prolonged period of time, adapt to change, and create and pursue goals. In other words, Russell and Norvig go for a rational agent. A rational agent acts in such a way that it achieves the best outcome, or the best expected outcome, based on its goals. The ability to predict is definitely a component of this rationality, but it is not the sum total of the agent’s being. By this definition, then, having the ability to predict is not necessarily AI. Only when that predictive power lies inside an agent may we say there is AI there.
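To make that last point concrete, here is a toy sketch in Python (the Thermostat class and its methods are my own invention, not from Russell and Norvig): a minimal agent that perceives its environment, holds a goal, and picks the action with the best expected outcome. The prediction (“what will each action do to the temperature?”) is just one ingredient inside the agent.

```python
class Thermostat:
    """Toy rational agent: goal is to keep the room at a target temperature."""

    def __init__(self, target):
        self.target = target   # the agent's goal
        self.temp = None       # internal state, updated by percepts

    def perceive(self, temp_reading):
        # The agent updates its state from a percept of the environment.
        self.temp = temp_reading

    def act(self):
        # Predicted outcome of each action (a crude one-step model),
        # then choose the action whose outcome is closest to the goal.
        outcomes = {"heat": self.temp + 1,
                    "cool": self.temp - 1,
                    "idle": self.temp}
        return min(outcomes, key=lambda a: abs(outcomes[a] - self.target))

agent = Thermostat(target=21)
agent.perceive(18)
print(agent.act())  # "heat"
agent.perceive(24)
print(agent.act())  # "cool"
```

The prediction model here is laughably simple, but the point stands: it only becomes “AI” in the Russell-Norvig sense once it sits inside a perceive-decide-act loop directed at a goal.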

What I like about R is that it takes its inspiration from Scheme. Being a functional programming language, R indeed supports the idea of closures. What are they? This is my attempt to unravel the mystery about closures.

First, what closures are not. Some writers have confused this notion with that of algebraic closure. Functional programming closures have nothing to do with algebraic closures. Do not believe it if you are told the term came from Group Theory. It did not. Algebraic closure is a property of an algebraic structure: if $a, b \in G$ and $\bullet$ is an operation on $G$, then $a \bullet b \in G$. In other words, the result of applying the operation to members of $G$ is also a member of $G$. We say that $G$ is closed under the operation $\bullet$, whatever that operation may be.
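To make the algebraic notion concrete, here is a quick check in Python (the sets and the operation are toy examples of my own choosing): the even numbers behave as a set closed under addition, while the odd numbers do not.

```python
# Toy illustration of algebraic closure: is a . b always back in the set?
evens = set(range(0, 20, 2))
odds = set(range(1, 20, 2))

# Even + even is always even, so the evens are closed under addition
# (checked here parity-wise, over a finite toy range)...
evens_closed = all((a + b) % 2 == 0 for a in evens for b in evens)

# ...while odd + odd is even, so the odds are NOT closed: 1 + 3 = 4.
odds_closed = all((a + b) % 2 == 1 for a in odds for b in odds)

print(evens_closed, odds_closed)  # True False
```

None of this machinery appears anywhere in a functional programming closure, which is the point of the contrast.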

Functional programming closures have to do with how the language handles free variables in relation to functions. Since functional programming takes its cue from the lambda calculus, let us define what free variables are in that computational system. This is best done with the example below:

Consider $\lambda x.yx$. In this expression, $y$ is free, as it is not controlled by the lambda symbol, but $x$ is bound, because the $\lambda$ encloses that $x$ as we read from left to right. This is where functional programming (FP) closures come in. FP closures tell us how the language will handle free variables when they occur in a function definition. The way this is explained in FP is in terms of environments, and R has those too. Simply put, when a function is defined, the system remembers the state of the environment in which that function was defined. In other words, it remembers the variables and nested functions declared in the enclosing function, even after that function has returned.

The best way I think this is illustrated is in the way FP closures mimic classes in OOP languages. Look at this code.

make_bal <- function() {
  val <- 0
  bal <- function(method) {
    dep_method <- function(x) {
      val <<- val + x
    }
    get_method <- function() {
      val
    }
    if (method == "dep") {
      dep_method
    } else {
      get_method
    }
  }
  bal
}

In R, the keyword function acts as our lambda declaration. In line 2, val is a free variable with respect to the inner functions, because it is not defined as an argument of any of them. In line 4, x is bound: it is defined as a parameter of dep_method. However, val is not accessible outside make_bal, because it is defined inside it, so we could say it is “closed” to the outside world. In line 5 we use the <<- operator, similar to set! in Scheme. That operator hunts for val up the environment chain and rebinds the first one it finds, which is the one in line 2.

> account<-make_bal()
> account("get")()
[1] 0
> account("dep")(50)
> account("get")()
[1] 50
> account("dep")(50)
> account("get")()
[1] 100
> withdraw<-account("dep")
> withdraw(-50)
> account("get")()
[1] 50



In line 1, we call make_bal and name the resulting function account. In line 2, we check that val starts at 0. In line 4, we make a deposit of 50; each time we make a deposit, our balance increases, as seen in line 9. Finally, we can define a function called withdraw, provided the amount we pass is always negative; as can be seen when we check the balance in line 13, it has been reduced by the amount we took out.

One can see that in an FP language like R, the notion of classes will not be missed, because closures handle that need.
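For comparison, the same pattern can be sketched in Python, whose closures behave the same way; this is my own translation of the example above, not from the credited source. Here nonlocal plays the role of R’s <<-.

```python
def make_bal():
    val = 0  # free variable, captured by the closures below

    def bal(method):
        def dep_method(x):
            nonlocal val   # plays the role of R's <<-
            val += x
        def get_method():
            return val
        # Return one of the inner functions, each closing over val.
        return dep_method if method == "dep" else get_method

    return bal

account = make_bal()
print(account("get")())  # 0
account("dep")(50)
print(account("get")())  # 50
account("dep")(-50)
print(account("get")())  # 0
```

As in the R version, val is invisible from the outside; the only way to touch it is through the functions that closed over it, which is exactly what private state in a class gives you.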

As usually happens in computer science, computer scientists borrow notions from mathematics, and this use of the term “closure” is one of them. Then, as also usually happens, it produces negative consequences: rather than clarify, it sometimes confuses people in the process. It would have been better had computer scientists chosen a different term.

Credits: Andy Balaam’s Scheme Closure.