Java: The Interesting Parts
A friend of mine who studies data science has lately been asking me a lot about concepts in programming languages and computer systems. Since he hadn’t studied much programming or computing before, Python, as it was introduced to him in his data science courses, was the only language he knew. Among the curious learners out there like my friend, there are probably people who have experience only with a dynamically-typed scripting language like Python, Ruby, or JavaScript. They’re missing out on some intellectually interesting concepts! Now, it wouldn’t be a surprise if such concepts were found in languages that are very different from Python, e.g. C or Haskell; however, even in a language like Java, which has a reputation for being dull and mundane, there are language features that are treasures for curious minds.
I’m going to assume you already know how to program using simple imperative constructs like variables and assignment, and that you know some basic object-oriented programming concepts like classes, objects, and methods.
Concepts I’m going to discuss in this article:
- compilation
- types and subtypes
- generics
Compilation
Programming languages are often categorised into compiled languages and interpreted languages. When you run a Python program, you start a process running the Python interpreter and point it at your source file. Java, on the other hand, has the somewhat unusual property of being both a compiled language and an interpreted language, because there are two steps. First, you run the Java compiler to compile the source code into JVM bytecode; then, you run the Java VM, which interprets the bytecode you’ve just compiled.
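To make the two steps concrete, here is a minimal sketch. The file and class name HelloBytecode are made up for this illustration; javac and java are the standard JDK commands for compiling and running.

```java
// HelloBytecode.java (hypothetical file name chosen for this sketch)
//
// Step 1, compile the source to bytecode:
//     javac HelloBytecode.java      (produces HelloBytecode.class)
// Step 2, run the bytecode on the Java VM:
//     java HelloBytecode
class HelloBytecode {
    // Kept as a separate method so the message is easy to check on its own.
    static String greeting() {
        return "Hello from the JVM";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

Note that the second command is given the class name, not the .class file name.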
Types and subtypes
In Java, variables, and not only values, have types. The type of a given variable is fixed by its declaration and remains the same throughout the program. We can’t assign a value to a variable if their types are not compatible; the code won’t compile.
If the value is of a type that is a subtype of the type of the variable, the assignment is legal.
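As a small sketch of these rules, assume two classes named Animal and Goat. The legCount method and the class AssignmentDemo are my own additions for illustration.

```java
class Animal {
    // Placeholder default; the value 0 is an assumption for this sketch.
    int legCount() {
        return 0;
    }
}

class Goat extends Animal {
    @Override
    int legCount() {
        return 4;
    }
}

class AssignmentDemo {
    static int demo() {
        Animal animal = new Goat();    // legal: Goat is a subtype of Animal
        // Goat goat = new Animal();   // would not compile: Animal is not a subtype of Goat
        // String text = 42;           // would not compile: incompatible types
        return animal.legCount();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```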
Here, Animal is called the “base” class or the “parent” class; Goat is called the “derived” class or the “child” class. We say a Goat “is-a” Animal. We say that Goat is a “subtype” of the type Animal, and Animal is a “supertype” of Goat. As we will see, when we focus on their behaviour when we assign values around, we really care more about Animal being a type than a class.
There are two or three ways in which types can be put into a subtype relationship:
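A sketch of what such declarations might look like. The names P1, C1, P2, C2, Animal and Goat follow the discussion; treating the abstract-class case as the possible “third” way is my assumption.

```java
// 1. A class extends another class.
class P1 {}
class C1 extends P1 {}

// 2. A class implements an interface.
interface P2 {}
class C2 implements P2 {}

// 3. A class extends an abstract class. This is arguably a variant of the
//    first way, which may be why the count is "two or three" (an assumption).
abstract class Animal {
    abstract int legCount();
}

class Goat extends Animal {
    @Override
    int legCount() {
        return 4;
    }
}
```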
In the above, C1 is a subtype of P1, C2 is a subtype of P2, and Goat is a subtype of Animal. As such, the following usage of the above types would be valid:
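A sketch of such usage, with the types redeclared so that the example stands on its own. The method probablyNotABird and its body are a guess at what such a check could look like, and SubtypeUsage is a made-up class name.

```java
class P1 {}
class C1 extends P1 {}

interface P2 {}
class C2 implements P2 {}

abstract class Animal {
    abstract int legCount();
}

class Goat extends Animal {
    @Override
    int legCount() {
        return 4;
    }
}

class SubtypeUsage {
    // Accepts any Animal, so a Goat can be passed in as well.
    static boolean probablyNotABird(Animal animal) {
        return animal.legCount() != 2;
    }

    public static void main(String[] args) {
        P1 p1 = new C1();           // a C1 can be stored in a P1 variable
        P2 p2 = new C2();           // a C2 can be stored in a P2 variable
        Animal animal = new Goat(); // a Goat can be stored in an Animal variable
        System.out.println(probablyNotABird(animal));
    }
}
```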
Abstractly speaking, C should be a subtype of P if and only if every instance of type C could be used anywhere an instance of P would be. This idea was first formulated as the Liskov substitution principle (LSP). People nowadays just say “a C is-a P” in a more casual tone.
As a side note, notice that polymorphism is demonstrated by the call to legCount inside probablyNotABird. Although the variable is of type Animal, the code that was executed really came from the class Goat (hence 4 was returned). Dynamic dispatch is another term (which contrasts with “static dispatch”) that refers to how this decision of “which method’s code to execute” is made. It’s “dynamic” because there are cases where it’s impossible to know which method’s code is going to be executed until we run the program (e.g. if the user enters 1 to choose a Goat, and 2 to choose a Bird). This dispatching behaviour is what object-oriented programmers mean when they talk about “polymorphism”. Since there are other kinds of polymorphism in the world of programming languages, the kind shown above is called subtype polymorphism to distinguish it from the others, one of which I will discuss in the next section on “generics”.
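The “user enters 1 or 2” scenario can be sketched as follows. The Bird class, the choose method and the use of a command-line argument as “user input” are assumptions made for this illustration.

```java
abstract class Animal {
    abstract int legCount();
}

class Goat extends Animal {
    @Override
    int legCount() {
        return 4;
    }
}

class Bird extends Animal {
    @Override
    int legCount() {
        return 2;
    }
}

class DispatchDemo {
    // The concrete class depends on a value only known at run time.
    static Animal choose(int choice) {
        if (choice == 1) {
            return new Goat();
        }
        return new Bird();
    }

    public static void main(String[] args) {
        // Stand-in for user input: first command-line argument, defaulting to 1.
        int choice = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        Animal animal = choose(choice);
        // The compiler only knows this is an Animal; the JVM picks
        // Goat.legCount or Bird.legCount when the call actually happens.
        System.out.println(animal.legCount());
    }
}
```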
Generics
“Generics”, “generic types”, “parameterised types” are all talking about the same concept related to syntax that looks like this: List<T>.
Once upon a time, before Java 1.5 (also known as Java 5), Java had no generics. A List was just a List, containing Objects, like Python’s lists do.
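Java still accepts this old style, so here is a sketch of what working with a List of plain Objects looks like. RawListDemo is a made-up class name, and the annotation only silences the warnings a modern compiler gives for this style.

```java
import java.util.ArrayList;
import java.util.List;

class RawListDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String firstAsString() {
        List items = new ArrayList();   // no element type: everything is an Object
        items.add("hello");
        items.add(42);                  // nothing stops us from mixing types
        // get() returns Object, so we must cast; a wrong cast only fails at run time.
        return (String) items.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstAsString());
    }
}
```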
The definition and behaviour of a list doesn’t actually depend on the type of the elements it contains, yet we’d like to know, and have the compiler enforce, that a given list only contains elements of a single type. Generics were introduced to Java to do exactly that. It’s a great feature, and it’s used everywhere by now.
The idea is to not think of List as a type, but rather something that can produce a type, if you give it another type as a parameter. The syntax is like List<T>, where List is called a “raw type”, List<T> is called a “parameterised type”, and T is called a “type parameter”. When we want to “instantiate a generic type” to get a parameterised type back, we pass a “type argument” to the type parameter. I think the terms are unnecessarily complicated, but at least the ideas are coherent. If you are interested in rigorously using the correct terms, refer to Oracle’s Java tutorial on generics.
The syntax for defining a generic type looks like this:
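A minimal sketch, using a hypothetical Box class that is not part of any library:

```java
// T is the type parameter; it stands for "whatever type the user picks".
class Box<T> {
    private final T value;

    Box(T value) {
        this.value = value;
    }

    T get() {
        return value;
    }
}

class BoxDemo {
    public static void main(String[] args) {
        Box<String> name = new Box<>("goat");  // T is String here
        Box<Integer> legs = new Box<>(4);      // T is Integer here
        String text = name.get();              // no cast needed
        int count = legs.get();
        System.out.println(text + " has " + count + " legs");
    }
}
```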
It’s also possible to parameterise just a single method inside a non-generic class over a type, rather than the entire class. The syntax looks like this:
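A sketch with two generic methods that work on lists. The class ListUtils and the methods firstOf and reversed are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// An ordinary, non-generic class.
class ListUtils {
    // The <T> before the return type declares the method's own type parameter.
    static <T> T firstOf(List<T> list) {
        return list.get(0);
    }

    // Returns a new list with the same elements in reverse order.
    static <T> List<T> reversed(List<T> list) {
        List<T> result = new ArrayList<>();
        for (int i = list.size() - 1; i >= 0; i--) {
            result.add(list.get(i));
        }
        return result;
    }
}
```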
These two methods above are examples of parametric polymorphism, in the sense that these methods can deal with all kinds of lists: lists of integers, lists of strings, lists of animals, etc. We can say that these methods are “polymorphic”. We might also say that generic types are, much like generic methods, “polymorphic”. Parametric polymorphism happens to be a very popular concept in functional programming, to the point where if a functional programmer were talking to you about “polymorphism”, parametric polymorphism is probably what they meant.
Notice that we won’t “lose precision” when using generics. The type that comes out is exactly the type that goes in, instead of some supertype or subtype of it.
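To illustrate, here is a sketch in which the result of a generic method is used at its exact type without any cast. PrecisionDemo and firstOf are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

class PrecisionDemo {
    static <T> T firstOf(List<T> list) {
        return list.get(0);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("goat", "bird");
        List<Integer> counts = Arrays.asList(4, 2);

        String name = firstOf(names);    // a String goes in, a String comes out
        Integer count = firstOf(counts); // an Integer goes in, an Integer comes out

        // String-specific methods are available right away, with no cast.
        System.out.println(name.length() + count);
    }
}
```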
If we have a type parameter that can’t be inferred from the type of the method arguments, Java will try using the return type, by inspecting the type of the variable we’re trying to assign the method’s return value to.
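A sketch of this situation, where T appears only in the return type. InferenceDemo and its emptyList method are made up for this example (the standard library has a similar Collections.emptyList).

```java
import java.util.ArrayList;
import java.util.List;

class InferenceDemo {
    // No argument mentions T, so the arguments cannot tell Java what T is.
    static <T> List<T> emptyList() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<String> names = emptyList();    // T is inferred as String
        List<Integer> counts = emptyList();  // T is inferred as Integer
        names.add("goat");
        counts.add(4);
        System.out.println(names + " " + counts);
    }
}
```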
In case Java can’t infer the type parameter (be it due to the program’s ambiguity or the compiler’s incompetence), we will need to use a rather uncommon syntax to explicitly pass the type argument. We can also choose to use the syntax even when it’s not necessary.
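The explicit syntax puts the type argument in angle brackets right before the method name. In this sketch, ExplicitTypeArgDemo and its emptyList method are hypothetical; Collections.emptyList is the real standard-library method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ExplicitTypeArgDemo {
    static <T> List<T> emptyList() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // Explicit type argument; a qualifier such as the class name is required.
        List<String> names = ExplicitTypeArgDemo.<String>emptyList();

        // The same syntax works with library methods.
        List<Integer> counts = Collections.<Integer>emptyList();

        System.out.println(names.size() + counts.size());
    }
}
```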
When subtyping meets generics
Interesting things happen when we use generics and subtypes together. For convenience, let me use C < P to denote that C is a subtype of P (think “the child is smaller than the parent”).
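One might expect that Goat < Animal implies List<Goat> < List<Animal>, but Java does not accept that. Here is a sketch of the kind of program it rejects; the offending lines are commented out so that the rest compiles, and InvarianceDemo and countAnimals are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

class Animal {}
class Goat extends Animal {}

class InvarianceDemo {
    static int countAnimals(List<Animal> animals) {
        return animals.size();
    }

    public static void main(String[] args) {
        List<Goat> goats = new ArrayList<>();
        goats.add(new Goat());

        Animal first = goats.get(0);          // fine: Goat < Animal
        // List<Animal> animals = goats;      // rejected: incompatible types
        // countAnimals(goats);               // rejected for the same reason

        List<Animal> animals = new ArrayList<>();
        animals.add(first);
        System.out.println(countAnimals(animals));
    }
}
```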
Let me convince you that it is sensible for Java to reject the above programs. Let’s look at something subtle you could do if Java had let you compile the code above:
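We can simulate “Java letting it compile” by forcing the assignment through an unchecked cast. This sketch is my own construction, not a feature you should rely on, and the names UnsoundDemo and snakeAmongGoats are made up.

```java
import java.util.ArrayList;
import java.util.List;

class Animal {}
class Goat extends Animal {}
class Snake extends Animal {}

class UnsoundDemo {
    // Returns true if reading from the list of goats blew up at run time.
    @SuppressWarnings("unchecked")
    static boolean snakeAmongGoats() {
        List<Goat> goats = new ArrayList<>();

        // Force the assignment that Java normally rejects.
        List<Animal> animals = (List<Animal>) (List<?>) goats;

        animals.add(new Snake());   // looks harmless: a Snake is an Animal

        try {
            Goat goat = goats.get(0);   // but element 0 is really a Snake
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(snakeAmongGoats());
    }
}
```

Both variables refer to the same list object, which is why adding through one is visible through the other.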
We clearly have gotten ourselves into an absurd situation. We know at compile-time that goats has clearly always been a list of Goats, with none of that forceful and unsafe type-casting business involved. When we retrieved the 0-th element from it though, we still ended up having something that has no way of being a Goat, since it might actually be a Snake!
To deal with these issues, Java has a language feature called “bounded generics”, which uses the syntax List<? extends T> or List<? super T>. Here we can see the two keywords extends and super being given new usages on top of their more traditional role in the class hierarchy. “Constrained generics” and “bounded type parameters” are also recognisable ways to refer to closely related features. The generics tutorial from Oracle that I mentioned earlier uses the term “bounded type parameters” for declarations of the form <T extends SomeType>, and calls the two forms above upper bounded and lower bounded wildcards.
Bounded generics is based on a more rudimentary feature called “wildcard generics” (also mentioned in the tutorial), which uses the simpler syntax List<?>. With wildcard and bounded generics, we can do things like the following:
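A sketch using Iterable and Consumer. The Animal and Goat classes and the methods totalLegs, feedAll and size are invented for this illustration.

```java
import java.util.List;
import java.util.function.Consumer;

class Animal {
    int legCount() {
        return 0;  // placeholder default, an assumption for this sketch
    }
}

class Goat extends Animal {
    @Override
    int legCount() {
        return 4;
    }
}

class WildcardDemo {
    // Accepts an Iterable of Animal or of any subtype, e.g. List<Goat>.
    static int totalLegs(Iterable<? extends Animal> animals) {
        int total = 0;
        for (Animal animal : animals) {
            total += animal.legCount();
        }
        return total;
    }

    // Accepts any consumer that can handle a Goat: Consumer<Goat>,
    // Consumer<Animal> or Consumer<Object> all qualify.
    static void feedAll(Iterable<? extends Goat> goats, Consumer<? super Goat> feeder) {
        for (Goat goat : goats) {
            feeder.accept(goat);
        }
    }

    // List<?> is a list of some unknown element type; its size is still available.
    static int size(List<?> list) {
        return list.size();
    }
}
```

With these signatures, a List<Goat> can be passed to totalLegs, and a Consumer<Animal> can be passed to feedAll, both of which plain List<Animal> and Consumer<Goat> parameters would have ruled out.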
The types Iterable and Consumer are not hypothetical; both are part of the Java standard library (java.lang.Iterable and java.util.function.Consumer).
There are a lot more intricacies at the intersection of subtype polymorphism and parametric polymorphism. The two scary-sounding concepts in C#, “covariance” and “contravariance”, are there exactly to deal with those intricacies.
That’s all!
The above is my pick of some of the most interesting features in Java, from a programming language point of view. As we can see, when you start looking into it, even a language that is “dull” and “boring” might have features that are quite intellectually sophisticated. I believe the features I’ve mentioned above simply have no analogue at all in the world of dynamically-typed scripting languages like Python. Learning languages is great fun, and there’s usually something special and worth learning in each language out there. I hope that this article has given you some new ideas to think about, and perhaps a bit of motivation and curiosity to look for other interesting features in various languages. Until next time!