The Difference Between Functions and Methods in Scala

This post uses examples to show how functions and methods differ in Scala.

1. Introduction

When developing with Spark, we often define UDFs to transform the columns of a DataFrame, for example:

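import org.apache.spark.sql.functions.udf
import spark.implicits._ // enables the $"colName" syntax (auto-imported in spark-shell)
val thres = 3.0
val log10Udf = udf { x: Long => math.min(math.log10(x + 2.0), thres) }
df.withColumn("newColumn", log10Udf($"oldColumn")).show // df: an existing DataFrame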

So what exactly is log10Udf? The definition of udf shows that it returns a UserDefinedFunction object:

def udf[RT: TypeTag, A1: TypeTag](f: Function1[A1, RT]): UserDefinedFunction = {
  val inputTypes = Try(ScalaReflection.schemaFor(typeTag[A1]).dataType :: Nil).toOption
  UserDefinedFunction(f, ScalaReflection.schemaFor(typeTag[RT]).dataType, inputTypes)
}

We can confirm this in the REPL:

scala> log10Udf.getClass
res1: Class[_ <: org.apache.spark.sql.expressions.UserDefinedFunction] = class org.apache.spark.sql.expressions.UserDefinedFunction

scala> log10Udf.isInstanceOf[java.lang.Object]
res2: Boolean = true
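The following sketch shows the same pattern in plain Scala, without Spark. udf takes a Function1 value and wraps it in an ordinary object. The names MyUdf and myUdf are hypothetical stand-ins, not Spark APIs. The real UserDefinedFunction also records the Spark SQL return and input types.

```scala
// Hypothetical names: MyUdf and myUdf are illustrations, not Spark APIs.
final case class MyUdf[A, R](f: A => R) {
  // Applying the wrapper forwards the call to the wrapped function value
  def apply(a: A): R = f(a)
}

object UdfSketch {
  // Same shape as Spark's udf: takes a Function1, returns a wrapper object
  def myUdf[A, R](f: Function1[A, R]): MyUdf[A, R] = MyUdf(f)

  val thres = 3.0
  // The same lambda as in the Spark example
  val log10Udf: MyUdf[Long, Double] =
    myUdf((x: Long) => math.min(math.log10(x + 2.0), thres))

  def main(args: Array[String]): Unit = {
    println(log10Udf.isInstanceOf[java.lang.Object])   // true: an ordinary object
    println(log10Udf.f.isInstanceOf[java.lang.Object]) // true: so is the wrapped function
    println(log10Udf(98L))                             // min(log10(100.0), 3.0) = 2.0
  }
}
```

Both the wrapper and the lambda it holds are values. That is why log10Udf can be stored in a val and passed around.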

This raises a small question: is something we define with def also an object? Let's try:

scala> def inc(x:Int) = x + 1
inc: (x: Int)Int

scala> inc.isInstanceOf[java.lang.Object]
<console>:13: error: missing argument list for method inc
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `inc _` or `inc(_)` instead of `inc`.
inc.isInstanceOf[java.lang.Object]
^

From this error message we can draw two conclusions:

  1. What def defines is not an object (a value) but a method.
  2. A trailing _ (eta-expansion) converts a method into a function.

Let's confirm:

scala> (inc _).isInstanceOf[java.lang.Object]
res3: Boolean = true

scala> (inc _).getClass
res4: Class[_ <: Int => Int] = class $$Lambda$1421/1772552470
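The error message above lists the ways to turn inc into a function. The sketch below puts them side by side and assumes Scala 2.12 or later. The type ascription in inc(_: Int) is our own addition: it lets the placeholder form compile without relying on parameter-type inference.

```scala
object EtaExpansion {
  // A method defined with def: a member of this object, not a value on its own
  def inc(x: Int): Int = x + 1

  // 1. Trailing underscore: explicit eta-expansion
  val f1 = inc _
  // 2. Placeholder syntax, shorthand for (x: Int) => inc(x)
  val f2 = inc(_: Int)
  // 3. No underscore: converted automatically because Int => Int is expected
  val f3: Int => Int = inc

  def main(args: Array[String]): Unit = {
    // Each result is a Function1 object, so it can be inspected like any value
    println(f1.isInstanceOf[java.lang.Object]) // true
    println(List(f1(1), f2(1), f3(1)))         // List(2, 2, 2)
  }
}
```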

2. method vs. function

A natural question follows: what is the difference between a method and a function? We can look up the keyword function in the official documentation:

A function can be invoked with a list of arguments to produce a result. A function has a parameter list, a body, and a result type. Functions that are members of a class, trait, or singleton object are called methods. Functions defined inside other functions are called local functions. Functions with the result type of Unit are called procedures. Anonymous functions in source code are called function literals. At run time, function literals are instantiated into objects called function values.

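Put simply, a method is a function defined in a class, trait, or object. An anonymous function, by contrast, is instantiated at run time into an object (a function value). An anonymous function is, well, a function without a name (-_-!). For example, (x: Int) => x + 2 is an anonymous function, and we can assign it to a variable: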

val inc2 = (x: Int) => x + 2
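A function literal evaluates to an ordinary object, so it can be used anywhere a value can. It can be stored in a collection, passed as an argument, or returned from a method. Here is a small sketch; the names pipeline, mapped and adder are illustrative.

```scala
object FunctionValues {
  // A function literal assigned to a val: inc2 holds a Function1[Int, Int] object
  val inc2 = (x: Int) => x + 2

  // Function values can be stored in collections...
  val pipeline: List[Int => Int] = List(inc2, (x: Int) => x * 10)

  // ...passed as arguments...
  val mapped: List[Int] = List(1, 2, 3).map(inc2)

  // ...and returned from methods
  def adder(n: Int): Int => Int = (x: Int) => x + n

  def main(args: Array[String]): Unit = {
    println(mapped)                                   // List(3, 4, 5)
    println(pipeline.foldLeft(1)((acc, f) => f(acc))) // (1 + 2) * 10 = 30
    println(adder(5)(1))                              // 6
  }
}
```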

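If we put this line into a class definition, compile it, and inspect the generated class (for example with javap), we get the definition of the anonymous function: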

public final class test$$anonfun$1 extends java.lang.Object implements scala.Function1,scala.ScalaObject

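The class name test$$anonfun$1 and the scala.ScalaObject interface show that this output comes from an older Scala 2 compiler. Since Scala 2.12, function literals are compiled to Java 8 lambdas, so no separate anonfun class file is generated. That is why the REPL sessions in this post print class names like $$Lambda$1421/1772552470. Either way, the key point is that the anonymous function implements the trait scala.Function1. In fact, the scala package defines Function0 through Function22 for functions that take 0 to 22 arguments. The code of Function1 is as follows: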

@annotation.implicitNotFound(msg = "No implicit view available from ${T1} => ${R}.")
trait Function1[@specialized(scala.Int, scala.Long, scala.Float, scala.Double) -T1, @specialized(scala.Unit, scala.Boolean, scala.Int, scala.Float, scala.Long, scala.Double) +R] extends AnyRef { self =>
  /** Apply the body of this function to the argument.
   *  @return the result of function application.
   */
  def apply(v1: T1): R
  @annotation.unspecialized def compose[A](g: A => T1): A => R = { x => apply(g(x)) }
  @annotation.unspecialized def andThen[A](g: R => A): T1 => A = { x => g(apply(x)) }
  override def toString() = "<function1>"
}
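Function1 is only one member of the family. Other function types are sugar for the corresponding FunctionN trait, and each trait provides apply plus a few helpers of its own. A short sketch, assuming Scala 2.12 or later (the names answer, add and sameAdd are illustrative):

```scala
object Arities {
  // () => Int is sugar for Function0[Int]
  val answer: () => Int = () => 42
  // (Int, Int) => Int is sugar for Function2[Int, Int, Int]
  val add: (Int, Int) => Int = (a, b) => a + b

  // The sugar and the trait name denote the same type
  val sameAdd: Function2[Int, Int, Int] = add

  def main(args: Array[String]): Unit = {
    println(answer())           // 42, same as answer.apply()
    println(add(1, 2))          // 3, same as add.apply(1, 2)
    // Function2 carries its own helpers, e.g. curried and tupled
    println(add.curried(1)(2))  // 3
    println(add.tupled((1, 2))) // 3
  }
}
```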

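The doc comment on apply in Function1 suggests that an anonymous function essentially just defines the apply method. When we call inc2(3), we are really calling that object's apply method, i.e. inc2.apply(3). Let's verify: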

// A function literal...
val inc2 = (x: Int) => x + 2
// ...behaves like an anonymous class that implements Function1.apply
val anonfun = new Function1[Int, Int] {
  def apply(x: Int): Int = x + 2
}
assert(inc2(0) == anonfun(0))
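inc2 is an instance of Function1, so it also inherits the other members of the trait, such as compose and andThen. The directions follow the source of Function1: f compose g is x => f(g(x)), and f andThen g is x => g(f(x)). A minimal sketch (times2 and the other names are illustrative):

```scala
object ComposeAndThen {
  val inc2 = (x: Int) => x + 2
  val times2 = (x: Int) => x * 2

  // andThen: apply inc2 first, then times2, i.e. times2(inc2(x))
  val incThenDouble: Int => Int = inc2 andThen times2
  // compose: apply times2 first, then inc2, i.e. inc2(times2(x))
  val doubleThenInc: Int => Int = inc2 compose times2

  def main(args: Array[String]): Unit = {
    println(incThenDouble(3))         // (3 + 2) * 2 = 10
    println(doubleThenInc(3))         // 3 * 2 + 2 = 8
    // Calling the value directly is the same as calling its apply method
    println(inc2(3) == inc2.apply(3)) // true
  }
}
```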

Good, now everything fits together. Let's tie it all up with one more example:

scala> def inc(x: Int) = x + 1
inc: (x: Int)Int

scala> val inc2 = (x: Int) => x + 2
inc2: Int => Int = $$Lambda$1035/948692477@8f2e3e6

scala> val inc3 = inc _
inc3: Int => Int = $$Lambda$1042/1739111611@1bb96449

scala> inc2.toString
res0: String = $$Lambda$1035/948692477@8f2e3e6

scala> inc.toString
<console>:13: error: missing argument list for method inc
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `inc _` or `inc(_)` instead of `inc`.
inc.toString
^

scala> val inc4 = inc
<console>:12: error: missing argument list for method inc
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `inc _` or `inc(_)` instead of `inc`.
val inc4 = inc
^

3. Take-aways

This short post compared two easily overlooked concepts, method and function, looking at how they relate and how they subtly differ:

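  1. A method is a function defined in a class, trait, or object;
  2. At run time, an anonymous function is instantiated as an object. It can therefore be assigned to a variable, and calling it really calls that object's apply method;
  3. A method by itself is just a piece of code. It is not instantiated, and it cannot be assigned or otherwise used as a value. A trailing _ (eta-expansion) converts it into a function value, and the compiler does this automatically wherever a function type is expected.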
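Here is a final sketch of point 3. A method cannot be assigned as is, but wherever a function type is expected the compiler performs the conversion for us, for example when inc is passed to map. The REPL errors earlier in this post come from Scala 2. Scala 3 eta-expands methods automatically, so val inc4 = inc compiles there. The names below are illustrative:

```scala
object MethodToFunction {
  // A method defined with def
  def inc(x: Int): Int = x + 1

  // map expects a function, so inc is eta-expanded automatically
  val viaExpectedType: List[Int] = List(1, 2, 3).map(inc)

  // Explicit conversion with _ yields a value that can be stored and reused
  val incFn: Int => Int = inc _

  // A method that takes a function value as a parameter
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  def main(args: Array[String]): Unit = {
    println(viaExpectedType)      // List(2, 3, 4)
    println(applyTwice(incFn, 0)) // 2
    println(applyTwice(inc, 0))   // 2: inc is eta-expanded here too
  }
}
```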