The following Scala Easter egg is the one of the most dangerous "features" of the language.
I know there might be flaming involved, but be sure its not my intention. I like Scala a lot and I'm happily using it in production code. This post is a result of the following discussions: Seq repeated execution unexpected behavior and Strange behavior of map function depending on first argument
The following code is using a method that returns a Seq[Int] containing random numbers and acts on them (prints the first one).
object SeqTest {
def main(args: Array[String]) {
val randomInts : Seq[Int] = mkRandomInts()
println(randomInts.first)
println(randomInts.first)
val randomIntsList = randomInts.toList
println(randomIntsList.first)
println(randomIntsList.first)
def mkRandomInts() = {
val randInts = for {
i <- 1 to 3
val rand = i + (new Random).nextInt
} yield rand
randInts
}
}As you can see from the output blow the sequence is reevaluated each time it is accessed, returning different random number. Once the code transforms the sequence into a List the results are stable.
-1867060800
312920158
-133186413
-133186413
Decompiling the code shows that mkRandomInts returns a
scala.RandomAccessSeq.Projection which is a result of the
Range we created using
i <- 1 to 3
As Jorge’s explained it:
the "gotcha" is that, because of the way Scala's collections work, Range's laziness is "contagious" when you use functional for-comprehensions (for ... yield ...) and a Range is the first thing in the comprehension.I'm not sure the Scaladoc explains that so well.
Martin and others explained that it comes to conserve memory since we don’t really want to have a list in size of 100000 when doing
(0 to 100000). This is all nice and true, but it is not a good reason to have a repeated execution each time the code access a data structure derived from iteration on a range.
The example above is not about clean/nice/efficient code its about the
principle of least surprise (POLS). This behavior will cause (in my case did cause) very hard to track bugs. It is also an undocumented behavior, in spite of it existing in a very common pattern equivalent to java’s
for (int i = 0; i < 100000; i++)
The implicit side effects of such optimization may be unacceptable. For example, the code executed in the for loop might change state in DB or file system or be CPU intensive. In the latter case, it would be very hard to understand why the application is so slow when repeatedly accessing elements of a sequence.
If Seq would have force() on it then a protective act would know to call it in such cases, but it does not have it (only RandomAccessSeq.Projection got it).
For example, every java InputStream has close() and java programmers are accustomed to close any stream they use in a finally "just in case", even if in some of them (e.g. StringBufferInputStream) close does nothing. But if we'll educate programmers to force a Seq anywhere we see it we create more boilerplate mess and we don’t want it in Scala :-)
Having such case forces the programmer to know about it and be ready => more potential bugs that
will happen => learning curve is even higher.
To conclude, the problem is two fold:
First calling a chunk of code that (without explicit instruction) gets executed only when its derived output is accessed.
Second even if lazy is cool and expected, repeated execution is not. We have lazy data structures all around but they usually cache the data once they fetch it or in the case of lazy iterators (like in jdbc), you need to explicitly recreate them.