| 1 | // Copyright 2013 The Go Authors. All rights reserved. |
|---|---|
| 2 | // Use of this source code is governed by a BSD-style |
| 3 | // license that can be found in the LICENSE file. |
| 4 | |
| 5 | /* |
| 6 | Package pointer implements Andersen's analysis, an inclusion-based |
| 7 | pointer analysis algorithm first described in (Andersen, 1994). |
| 8 | |
| 9 | A pointer analysis relates every pointer expression in a whole program |
| 10 | to the set of memory locations to which it might point. This |
| 11 | information can be used to construct a call graph of the program that |
| 12 | precisely represents the destinations of dynamic function and method |
| 13 | calls. It can also be used to determine, for example, which pairs of |
| 14 | channel operations operate on the same channel. |
| 15 | |
| 16 | The package allows the client to request a set of expressions of |
| 17 | interest for which the points-to information will be returned once the |
| 18 | analysis is complete. In addition, the client may request that a |
| 19 | callgraph is constructed. The example program in example_test.go |
| 20 | demonstrates both of these features. Clients should not request more |
| 21 | information than they need since it may increase the cost of the |
| 22 | analysis significantly. |
| 23 | |
| 24 | # CLASSIFICATION |
| 25 | |
| 26 | Our algorithm is INCLUSION-BASED: the points-to sets for x and y will |
| 27 | be related by pts(y) ⊇ pts(x) if the program contains the statement |
| 28 | y = x. |
| 29 | |
| 30 | It is FLOW-INSENSITIVE: it ignores all control flow constructs and the |
| 31 | order of statements in a program. It is therefore a "MAY ALIAS" |
| 32 | analysis: its facts are of the form "P may/may not point to L", |
| 33 | not "P must point to L". |
| 34 | |
| 35 | It is FIELD-SENSITIVE: it builds separate points-to sets for distinct |
| 36 | fields, such as x and y in struct { x, y *int }. |
| 37 | |
| 38 | It is mostly CONTEXT-INSENSITIVE: most functions are analyzed once, |
| 39 | so values can flow in at one call to the function and return out at |
| 40 | another. Only some smaller functions are analyzed with consideration |
| 41 | of their calling context. |
| 42 | |
| 43 | It has a CONTEXT-SENSITIVE HEAP: objects are named by both allocation |
| 44 | site and context, so the objects returned by two distinct calls to f: |
| 45 | |
| 46 | func f() *T { return new(T) } |
| 47 | |
| 48 | are distinguished up to the limits of the calling context. |
| 49 | |
| 50 | It is a WHOLE PROGRAM analysis: it requires SSA-form IR for the |
| 51 | complete Go program and summaries for native code. |
| 52 | |
| 53 | See the (Hind, PASTE'01) survey paper for an explanation of these terms. |
| 54 | |
| 55 | # SOUNDNESS |
| 56 | |
| 57 | The analysis is fully sound when invoked on pure Go programs that do not |
| 58 | use reflection or unsafe.Pointer conversions. In other words, if there |
| 59 | is any possible execution of the program in which pointer P may point to |
| 60 | object O, the analysis will report that fact. |
| 61 | |
| 62 | # REFLECTION |
| 63 | |
| 64 | By default, the "reflect" library is ignored by the analysis, as if all |
| 65 | its functions were no-ops, but if the client enables the Reflection flag, |
| 66 | the analysis will make a reasonable attempt to model the effects of |
| 67 | calls into this library. However, this comes at a significant |
| 68 | performance cost, and not all features of that library are yet |
| 69 | implemented. In addition, some simplifying approximations must be made |
| 70 | to ensure that the analysis terminates; for example, reflection can be |
| 71 | used to construct an infinite set of types and values of those types, |
| 72 | but the analysis arbitrarily bounds the depth of such types. |
| 73 | |
| 74 | Most but not all reflection operations are supported. |
| 75 | In particular, addressable reflect.Values are not yet implemented, so |
| 76 | operations such as (reflect.Value).Set have no analytic effect. |
| 77 | |
| 78 | # UNSAFE POINTER CONVERSIONS |
| 79 | |
| 80 | The pointer analysis makes no attempt to understand aliasing between the |
| 81 | operand x and result y of an unsafe.Pointer conversion: |
| 82 | |
| 83 | y = (*T)(unsafe.Pointer(x)) |
| 84 | |
| 85 | It is as if the conversion allocated an entirely new object: |
| 86 | |
| 87 | y = new(T) |
| 88 | |
| 89 | # NATIVE CODE |
| 90 | |
| 91 | The analysis cannot model the aliasing effects of functions written in |
| 92 | languages other than Go, such as runtime intrinsics in C or assembly, or |
| 93 | code accessed via cgo. The result is as if such functions are no-ops. |
| 94 | However, various important intrinsics are understood by the analysis, |
| 95 | along with built-ins such as append. |
| 96 | |
| 97 | The analysis currently provides no way for users to specify the aliasing |
| 98 | effects of native code. |
| 99 | |
| 100 | ------------------------------------------------------------------------ |
| 101 | |
| 102 | # IMPLEMENTATION |
| 103 | |
| 104 | The remaining documentation is intended for package maintainers and |
| 105 | pointer analysis specialists. Maintainers should have a solid |
| 106 | understanding of the referenced papers (especially those by H&L and PKH) |
| 107 | before making making significant changes. |
| 108 | |
| 109 | The implementation is similar to that described in (Pearce et al, |
| 110 | PASTE'04). Unlike many algorithms which interleave constraint |
| 111 | generation and solving, constructing the callgraph as they go, this |
| 112 | implementation for the most part observes a phase ordering (generation |
| 113 | before solving), with only simple (copy) constraints being generated |
| 114 | during solving. (The exception is reflection, which creates various |
| 115 | constraints during solving as new types flow to reflect.Value |
| 116 | operations.) This improves the traction of presolver optimisations, |
| 117 | but imposes certain restrictions, e.g. potential context sensitivity |
| 118 | is limited since all variants must be created a priori. |
| 119 | |
| 120 | # TERMINOLOGY |
| 121 | |
| 122 | A type is said to be "pointer-like" if it is a reference to an object. |
| 123 | Pointer-like types include pointers and also interfaces, maps, channels, |
| 124 | functions and slices. |
| 125 | |
| 126 | We occasionally use C's x->f notation to distinguish the case where x |
| 127 | is a struct pointer from x.f where is a struct value. |
| 128 | |
| 129 | Pointer analysis literature (and our comments) often uses the notation |
| 130 | dst=*src+offset to mean something different than what it means in Go. |
| 131 | It means: for each node index p in pts(src), the node index p+offset is |
| 132 | in pts(dst). Similarly *dst+offset=src is used for store constraints |
| 133 | and dst=src+offset for offset-address constraints. |
| 134 | |
| 135 | # NODES |
| 136 | |
| 137 | Nodes are the key datastructure of the analysis, and have a dual role: |
| 138 | they represent both constraint variables (equivalence classes of |
| 139 | pointers) and members of points-to sets (things that can be pointed |
| 140 | at, i.e. "labels"). |
| 141 | |
| 142 | Nodes are naturally numbered. The numbering enables compact |
| 143 | representations of sets of nodes such as bitvectors (or BDDs); and the |
| 144 | ordering enables a very cheap way to group related nodes together. For |
| 145 | example, passing n parameters consists of generating n parallel |
| 146 | constraints from caller+i to callee+i for 0<=i<n. |
| 147 | |
| 148 | The zero nodeid means "not a pointer". For simplicity, we generate flow |
| 149 | constraints even for non-pointer types such as int. The pointer |
| 150 | equivalence (PE) presolver optimization detects which variables cannot |
| 151 | point to anything; this includes not only all variables of non-pointer |
| 152 | types (such as int) but also variables of pointer-like types if they are |
| 153 | always nil, or are parameters to a function that is never called. |
| 154 | |
| 155 | Each node represents a scalar part of a value or object. |
| 156 | Aggregate types (structs, tuples, arrays) are recursively flattened |
| 157 | out into a sequential list of scalar component types, and all the |
| 158 | elements of an array are represented by a single node. (The |
| 159 | flattening of a basic type is a list containing a single node.) |
| 160 | |
| 161 | Nodes are connected into a graph with various kinds of labelled edges: |
| 162 | simple edges (or copy constraints) represent value flow. Complex |
| 163 | edges (load, store, etc) trigger the creation of new simple edges |
| 164 | during the solving phase. |
| 165 | |
| 166 | # OBJECTS |
| 167 | |
| 168 | Conceptually, an "object" is a contiguous sequence of nodes denoting |
| 169 | an addressable location: something that a pointer can point to. The |
| 170 | first node of an object has a non-nil obj field containing information |
| 171 | about the allocation: its size, context, and ssa.Value. |
| 172 | |
| 173 | Objects include: |
| 174 | - functions and globals; |
| 175 | - variable allocations in the stack frame or heap; |
| 176 | - maps, channels and slices created by calls to make(); |
| 177 | - allocations to construct an interface; |
| 178 | - allocations caused by conversions, e.g. []byte(str). |
| 179 | - arrays allocated by calls to append(); |
| 180 | |
| 181 | Many objects have no Go types. For example, the func, map and chan type |
| 182 | kinds in Go are all varieties of pointers, but their respective objects |
| 183 | are actual functions (executable code), maps (hash tables), and channels |
| 184 | (synchronized queues). Given the way we model interfaces, they too are |
| 185 | pointers to "tagged" objects with no Go type. And an *ssa.Global denotes |
| 186 | the address of a global variable, but the object for a Global is the |
| 187 | actual data. So, the types of an ssa.Value that creates an object is |
| 188 | "off by one indirection": a pointer to the object. |
| 189 | |
| 190 | The individual nodes of an object are sometimes referred to as "labels". |
| 191 | |
| 192 | For uniformity, all objects have a non-zero number of fields, even those |
| 193 | of the empty type struct{}. (All arrays are treated as if of length 1, |
| 194 | so there are no empty arrays. The empty tuple is never address-taken, |
| 195 | so is never an object.) |
| 196 | |
| 197 | # TAGGED OBJECTS |
| 198 | |
| 199 | An tagged object has the following layout: |
| 200 | |
| 201 | T -- obj.flags ⊇ {otTagged} |
| 202 | v |
| 203 | ... |
| 204 | |
| 205 | The T node's typ field is the dynamic type of the "payload": the value |
| 206 | v which follows, flattened out. The T node's obj has the otTagged |
| 207 | flag. |
| 208 | |
| 209 | Tagged objects are needed when generalizing across types: interfaces, |
| 210 | reflect.Values, reflect.Types. Each of these three types is modelled |
| 211 | as a pointer that exclusively points to tagged objects. |
| 212 | |
| 213 | Tagged objects may be indirect (obj.flags ⊇ {otIndirect}) meaning that |
| 214 | the value v is not of type T but *T; this is used only for |
| 215 | reflect.Values that represent lvalues. (These are not implemented yet.) |
| 216 | |
| 217 | # ANALYSIS ABSTRACTION OF EACH TYPE |
| 218 | |
| 219 | Variables of the following "scalar" types may be represented by a |
| 220 | single node: basic types, pointers, channels, maps, slices, 'func' |
| 221 | pointers, interfaces. |
| 222 | |
| 223 | Pointers: |
| 224 | |
| 225 | Nothing to say here, oddly. |
| 226 | |
| 227 | Basic types (bool, string, numbers, unsafe.Pointer): |
| 228 | |
| 229 | Currently all fields in the flattening of a type, including |
| 230 | non-pointer basic types such as int, are represented in objects and |
| 231 | values. Though non-pointer nodes within values are uninteresting, |
| 232 | non-pointer nodes in objects may be useful (if address-taken) |
| 233 | because they permit the analysis to deduce, in this example, |
| 234 | |
| 235 | var s struct{ ...; x int; ... } |
| 236 | p := &s.x |
| 237 | |
| 238 | that p points to s.x. If we ignored such object fields, we could only |
| 239 | say that p points somewhere within s. |
| 240 | |
| 241 | All other basic types are ignored. Expressions of these types have |
| 242 | zero nodeid, and fields of these types within aggregate other types |
| 243 | are omitted. |
| 244 | |
| 245 | unsafe.Pointers are not modelled as pointers, so a conversion of an |
| 246 | unsafe.Pointer to *T is (unsoundly) treated equivalent to new(T). |
| 247 | |
| 248 | Channels: |
| 249 | |
| 250 | An expression of type 'chan T' is a kind of pointer that points |
| 251 | exclusively to channel objects, i.e. objects created by MakeChan (or |
| 252 | reflection). |
| 253 | |
| 254 | 'chan T' is treated like *T. |
| 255 | *ssa.MakeChan is treated as equivalent to new(T). |
| 256 | *ssa.Send and receive (*ssa.UnOp(ARROW)) and are equivalent to store |
| 257 | |
| 258 | and load. |
| 259 | |
| 260 | Maps: |
| 261 | |
| 262 | An expression of type 'map[K]V' is a kind of pointer that points |
| 263 | exclusively to map objects, i.e. objects created by MakeMap (or |
| 264 | reflection). |
| 265 | |
| 266 | map K[V] is treated like *M where M = struct{k K; v V}. |
| 267 | *ssa.MakeMap is equivalent to new(M). |
| 268 | *ssa.MapUpdate is equivalent to *y=x where *y and x have type M. |
| 269 | *ssa.Lookup is equivalent to y=x.v where x has type *M. |
| 270 | |
| 271 | Slices: |
| 272 | |
| 273 | A slice []T, which dynamically resembles a struct{array *T, len, cap int}, |
| 274 | is treated as if it were just a *T pointer; the len and cap fields are |
| 275 | ignored. |
| 276 | |
| 277 | *ssa.MakeSlice is treated like new([1]T): an allocation of a |
| 278 | |
| 279 | singleton array. |
| 280 | |
| 281 | *ssa.Index on a slice is equivalent to a load. |
| 282 | *ssa.IndexAddr on a slice returns the address of the sole element of the |
| 283 | slice, i.e. the same address. |
| 284 | *ssa.Slice is treated as a simple copy. |
| 285 | |
| 286 | Functions: |
| 287 | |
| 288 | An expression of type 'func...' is a kind of pointer that points |
| 289 | exclusively to function objects. |
| 290 | |
| 291 | A function object has the following layout: |
| 292 | |
| 293 | identity -- typ:*types.Signature; obj.flags ⊇ {otFunction} |
| 294 | params_0 -- (the receiver, if a method) |
| 295 | ... |
| 296 | params_n-1 |
| 297 | results_0 |
| 298 | ... |
| 299 | results_m-1 |
| 300 | |
| 301 | There may be multiple function objects for the same *ssa.Function |
| 302 | due to context-sensitive treatment of some functions. |
| 303 | |
| 304 | The first node is the function's identity node. |
| 305 | Associated with every callsite is a special "targets" variable, |
| 306 | whose pts() contains the identity node of each function to which |
| 307 | the call may dispatch. Identity words are not otherwise used during |
| 308 | the analysis, but we construct the call graph from the pts() |
| 309 | solution for such nodes. |
| 310 | |
| 311 | The following block of contiguous nodes represents the flattened-out |
| 312 | types of the parameters ("P-block") and results ("R-block") of the |
| 313 | function object. |
| 314 | |
| 315 | The treatment of free variables of closures (*ssa.FreeVar) is like |
| 316 | that of global variables; it is not context-sensitive. |
| 317 | *ssa.MakeClosure instructions create copy edges to Captures. |
| 318 | |
| 319 | A Go value of type 'func' (i.e. a pointer to one or more functions) |
| 320 | is a pointer whose pts() contains function objects. The valueNode() |
| 321 | for an *ssa.Function returns a singleton for that function. |
| 322 | |
| 323 | Interfaces: |
| 324 | |
| 325 | An expression of type 'interface{...}' is a kind of pointer that |
| 326 | points exclusively to tagged objects. All tagged objects pointed to |
| 327 | by an interface are direct (the otIndirect flag is clear) and |
| 328 | concrete (the tag type T is not itself an interface type). The |
| 329 | associated ssa.Value for an interface's tagged objects may be an |
| 330 | *ssa.MakeInterface instruction, or nil if the tagged object was |
| 331 | created by an instrinsic (e.g. reflection). |
| 332 | |
| 333 | Constructing an interface value causes generation of constraints for |
| 334 | all of the concrete type's methods; we can't tell a priori which |
| 335 | ones may be called. |
| 336 | |
| 337 | TypeAssert y = x.(T) is implemented by a dynamic constraint |
| 338 | triggered by each tagged object O added to pts(x): a typeFilter |
| 339 | constraint if T is an interface type, or an untag constraint if T is |
| 340 | a concrete type. A typeFilter tests whether O.typ implements T; if |
| 341 | so, O is added to pts(y). An untagFilter tests whether O.typ is |
| 342 | assignable to T,and if so, a copy edge O.v -> y is added. |
| 343 | |
| 344 | ChangeInterface is a simple copy because the representation of |
| 345 | tagged objects is independent of the interface type (in contrast |
| 346 | to the "method tables" approach used by the gc runtime). |
| 347 | |
| 348 | y := Invoke x.m(...) is implemented by allocating contiguous P/R |
| 349 | blocks for the callsite and adding a dynamic rule triggered by each |
| 350 | tagged object added to pts(x). The rule adds param/results copy |
| 351 | edges to/from each discovered concrete method. |
| 352 | |
| 353 | (Q. Why do we model an interface as a pointer to a pair of type and |
| 354 | value, rather than as a pair of a pointer to type and a pointer to |
| 355 | value? |
| 356 | A. Control-flow joins would merge interfaces ({T1}, {V1}) and ({T2}, |
| 357 | {V2}) to make ({T1,T2}, {V1,V2}), leading to the infeasible and |
| 358 | type-unsafe combination (T1,V2). Treating the value and its concrete |
| 359 | type as inseparable makes the analysis type-safe.) |
| 360 | |
| 361 | Type parameters: |
| 362 | |
| 363 | Type parameters are not directly supported by the analysis. |
| 364 | Calls to generic functions will be left as if they had empty bodies. |
| 365 | Users of the package are expected to use the ssa.InstantiateGenerics |
| 366 | builder mode when building code that uses or depends on code |
| 367 | containing generics. |
| 368 | |
| 369 | reflect.Value: |
| 370 | |
| 371 | A reflect.Value is modelled very similar to an interface{}, i.e. as |
| 372 | a pointer exclusively to tagged objects, but with two generalizations. |
| 373 | |
| 374 | 1. a reflect.Value that represents an lvalue points to an indirect |
| 375 | (obj.flags ⊇ {otIndirect}) tagged object, which has a similar |
| 376 | layout to an tagged object except that the value is a pointer to |
| 377 | the dynamic type. Indirect tagged objects preserve the correct |
| 378 | aliasing so that mutations made by (reflect.Value).Set can be |
| 379 | observed. |
| 380 | |
| 381 | Indirect objects only arise when an lvalue is derived from an |
| 382 | rvalue by indirection, e.g. the following code: |
| 383 | |
| 384 | type S struct { X T } |
| 385 | var s S |
| 386 | var i interface{} = &s // i points to a *S-tagged object (from MakeInterface) |
| 387 | v1 := reflect.ValueOf(i) // v1 points to same *S-tagged object as i |
| 388 | v2 := v1.Elem() // v2 points to an indirect S-tagged object, pointing to s |
| 389 | v3 := v2.FieldByName("X") // v3 points to an indirect int-tagged object, pointing to s.X |
| 390 | v3.Set(y) // pts(s.X) ⊇ pts(y) |
| 391 | |
| 392 | Whether indirect or not, the concrete type of the tagged object |
| 393 | corresponds to the user-visible dynamic type, and the existence |
| 394 | of a pointer is an implementation detail. |
| 395 | |
| 396 | (NB: indirect tagged objects are not yet implemented) |
| 397 | |
| 398 | 2. The dynamic type tag of a tagged object pointed to by a |
| 399 | reflect.Value may be an interface type; it need not be concrete. |
| 400 | |
| 401 | This arises in code such as this: |
| 402 | |
| 403 | tEface := reflect.TypeOf(new(interface{}).Elem() // interface{} |
| 404 | eface := reflect.Zero(tEface) |
| 405 | |
| 406 | pts(eface) is a singleton containing an interface{}-tagged |
| 407 | object. That tagged object's payload is an interface{} value, |
| 408 | i.e. the pts of the payload contains only concrete-tagged |
| 409 | objects, although in this example it's the zero interface{} value, |
| 410 | so its pts is empty. |
| 411 | |
| 412 | reflect.Type: |
| 413 | |
| 414 | Just as in the real "reflect" library, we represent a reflect.Type |
| 415 | as an interface whose sole implementation is the concrete type, |
| 416 | *reflect.rtype. (This choice is forced on us by go/types: clients |
| 417 | cannot fabricate types with arbitrary method sets.) |
| 418 | |
| 419 | rtype instances are canonical: there is at most one per dynamic |
| 420 | type. (rtypes are in fact large structs but since identity is all |
| 421 | that matters, we represent them by a single node.) |
| 422 | |
| 423 | The payload of each *rtype-tagged object is an *rtype pointer that |
| 424 | points to exactly one such canonical rtype object. We exploit this |
| 425 | by setting the node.typ of the payload to the dynamic type, not |
| 426 | '*rtype'. This saves us an indirection in each resolution rule. As |
| 427 | an optimisation, *rtype-tagged objects are canonicalized too. |
| 428 | |
| 429 | Aggregate types: |
| 430 | |
| 431 | Aggregate types are treated as if all directly contained |
| 432 | aggregates are recursively flattened out. |
| 433 | |
| 434 | Structs: |
| 435 | |
| 436 | *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset. |
| 437 | |
| 438 | *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create |
| 439 | |
| 440 | simple edges for each struct discovered in pts(x). |
| 441 | |
| 442 | The nodes of a struct consist of a special 'identity' node (whose |
| 443 | type is that of the struct itself), followed by the nodes for all |
| 444 | the struct's fields, recursively flattened out. A pointer to the |
| 445 | struct is a pointer to its identity node. That node allows us to |
| 446 | distinguish a pointer to a struct from a pointer to its first field. |
| 447 | |
| 448 | Field offsets are logical field offsets (plus one for the identity |
| 449 | node), so the sizes of the fields can be ignored by the analysis. |
| 450 | |
| 451 | (The identity node is non-traditional but enables the distinction |
| 452 | described above, which is valuable for code comprehension tools. |
| 453 | Typical pointer analyses for C, whose purpose is compiler |
| 454 | optimization, must soundly model unsafe.Pointer (void*) conversions, |
| 455 | and this requires fidelity to the actual memory layout using physical |
| 456 | field offsets.) |
| 457 | |
| 458 | *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset. |
| 459 | |
| 460 | *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create |
| 461 | |
| 462 | simple edges for each struct discovered in pts(x). |
| 463 | |
| 464 | Arrays: |
| 465 | |
| 466 | We model an array by an identity node (whose type is that of the |
| 467 | array itself) followed by a node representing all the elements of |
| 468 | the array; the analysis does not distinguish elements with different |
| 469 | indices. Effectively, an array is treated like struct{elem T}, a |
| 470 | load y=x[i] like y=x.elem, and a store x[i]=y like x.elem=y; the |
| 471 | index i is ignored. |
| 472 | |
| 473 | A pointer to an array is pointer to its identity node. (A slice is |
| 474 | also a pointer to an array's identity node.) The identity node |
| 475 | allows us to distinguish a pointer to an array from a pointer to one |
| 476 | of its elements, but it is rather costly because it introduces more |
| 477 | offset constraints into the system. Furthermore, sound treatment of |
| 478 | unsafe.Pointer would require us to dispense with this node. |
| 479 | |
| 480 | Arrays may be allocated by Alloc, by make([]T), by calls to append, |
| 481 | and via reflection. |
| 482 | |
| 483 | Tuples (T, ...): |
| 484 | |
| 485 | Tuples are treated like structs with naturally numbered fields. |
| 486 | *ssa.Extract is analogous to *ssa.Field. |
| 487 | |
| 488 | However, tuples have no identity field since by construction, they |
| 489 | cannot be address-taken. |
| 490 | |
| 491 | # FUNCTION CALLS |
| 492 | |
| 493 | There are three kinds of function call: |
| 494 | 1. static "call"-mode calls of functions. |
| 495 | 2. dynamic "call"-mode calls of functions. |
| 496 | 3. dynamic "invoke"-mode calls of interface methods. |
| 497 | |
| 498 | Cases 1 and 2 apply equally to methods and standalone functions. |
| 499 | |
| 500 | Static calls: |
| 501 | |
| 502 | A static call consists three steps: |
| 503 | - finding the function object of the callee; |
| 504 | - creating copy edges from the actual parameter value nodes to the |
| 505 | P-block in the function object (this includes the receiver if |
| 506 | the callee is a method); |
| 507 | - creating copy edges from the R-block in the function object to |
| 508 | the value nodes for the result of the call. |
| 509 | |
| 510 | A static function call is little more than two struct value copies |
| 511 | between the P/R blocks of caller and callee: |
| 512 | |
| 513 | callee.P = caller.P |
| 514 | caller.R = callee.R |
| 515 | |
| 516 | Context sensitivity: Static calls (alone) may be treated context sensitively, |
| 517 | i.e. each callsite may cause a distinct re-analysis of the |
| 518 | callee, improving precision. Our current context-sensitivity |
| 519 | policy treats all intrinsics and getter/setter methods in this |
| 520 | manner since such functions are small and seem like an obvious |
| 521 | source of spurious confluences, though this has not yet been |
| 522 | evaluated. |
| 523 | |
| 524 | Dynamic function calls: |
| 525 | |
| 526 | Dynamic calls work in a similar manner except that the creation of |
| 527 | copy edges occurs dynamically, in a similar fashion to a pair of |
| 528 | struct copies in which the callee is indirect: |
| 529 | |
| 530 | callee->P = caller.P |
| 531 | caller.R = callee->R |
| 532 | |
| 533 | (Recall that the function object's P- and R-blocks are contiguous.) |
| 534 | |
| 535 | Interface method invocation: |
| 536 | |
| 537 | For invoke-mode calls, we create a params/results block for the |
| 538 | callsite and attach a dynamic closure rule to the interface. For |
| 539 | each new tagged object that flows to the interface, we look up |
| 540 | the concrete method, find its function object, and connect its P/R |
| 541 | blocks to the callsite's P/R blocks, adding copy edges to the graph |
| 542 | during solving. |
| 543 | |
| 544 | Recording call targets: |
| 545 | |
| 546 | The analysis notifies its clients of each callsite it encounters, |
| 547 | passing a CallSite interface. Among other things, the CallSite |
| 548 | contains a synthetic constraint variable ("targets") whose |
| 549 | points-to solution includes the set of all function objects to |
| 550 | which the call may dispatch. |
| 551 | |
| 552 | It is via this mechanism that the callgraph is made available. |
| 553 | Clients may also elect to be notified of callgraph edges directly; |
| 554 | internally this just iterates all "targets" variables' pts(·)s. |
| 555 | |
| 556 | # PRESOLVER |
| 557 | |
| 558 | We implement Hash-Value Numbering (HVN), a pre-solver constraint |
| 559 | optimization described in Hardekopf & Lin, SAS'07. This is documented |
| 560 | in more detail in hvn.go. We intend to add its cousins HR and HU in |
| 561 | future. |
| 562 | |
| 563 | # SOLVER |
| 564 | |
| 565 | The solver is currently a naive Andersen-style implementation; it does |
| 566 | not perform online cycle detection, though we plan to add solver |
| 567 | optimisations such as Hybrid- and Lazy- Cycle Detection from (Hardekopf |
| 568 | & Lin, PLDI'07). |
| 569 | |
| 570 | It uses difference propagation (Pearce et al, SQC'04) to avoid |
| 571 | redundant re-triggering of closure rules for values already seen. |
| 572 | |
| 573 | Points-to sets are represented using sparse bit vectors (similar to |
| 574 | those used in LLVM and gcc), which are more space- and time-efficient |
| 575 | than sets based on Go's built-in map type or dense bit vectors. |
| 576 | |
| 577 | Nodes are permuted prior to solving so that object nodes (which may |
| 578 | appear in points-to sets) are lower numbered than non-object (var) |
| 579 | nodes. This improves the density of the set over which the PTSs |
| 580 | range, and thus the efficiency of the representation. |
| 581 | |
| 582 | Partly thanks to avoiding map iteration, the execution of the solver is |
| 583 | 100% deterministic, a great help during debugging. |
| 584 | |
| 585 | # FURTHER READING |
| 586 | |
| 587 | Andersen, L. O. 1994. Program analysis and specialization for the C |
| 588 | programming language. Ph.D. dissertation. DIKU, University of |
| 589 | Copenhagen. |
| 590 | |
| 591 | David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Efficient |
| 592 | field-sensitive pointer analysis for C. In Proceedings of the 5th ACM |
| 593 | SIGPLAN-SIGSOFT workshop on Program analysis for software tools and |
| 594 | engineering (PASTE '04). ACM, New York, NY, USA, 37-42. |
| 595 | http://doi.acm.org/10.1145/996821.996835 |
| 596 | |
| 597 | David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Online |
| 598 | Cycle Detection and Difference Propagation: Applications to Pointer |
| 599 | Analysis. Software Quality Control 12, 4 (December 2004), 311-337. |
| 600 | http://dx.doi.org/10.1023/B:SQJO.0000039791.93071.a2 |
| 601 | |
| 602 | David Grove and Craig Chambers. 2001. A framework for call graph |
| 603 | construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6 |
| 604 | (November 2001), 685-746. |
| 605 | http://doi.acm.org/10.1145/506315.506316 |
| 606 | |
| 607 | Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: fast |
| 608 | and accurate pointer analysis for millions of lines of code. In |
| 609 | Proceedings of the 2007 ACM SIGPLAN conference on Programming language |
| 610 | design and implementation (PLDI '07). ACM, New York, NY, USA, 290-299. |
| 611 | http://doi.acm.org/10.1145/1250734.1250767 |
| 612 | |
| 613 | Ben Hardekopf and Calvin Lin. 2007. Exploiting pointer and location |
| 614 | equivalence to optimize pointer analysis. In Proceedings of the 14th |
| 615 | international conference on Static Analysis (SAS'07), Hanne Riis |
| 616 | Nielson and Gilberto Filé (Eds.). Springer-Verlag, Berlin, Heidelberg, |
| 617 | 265-280. |
| 618 | |
| 619 | Atanas Rountev and Satish Chandra. 2000. Off-line variable substitution |
| 620 | for scaling points-to analysis. In Proceedings of the ACM SIGPLAN 2000 |
| 621 | conference on Programming language design and implementation (PLDI '00). |
| 622 | ACM, New York, NY, USA, 47-56. DOI=10.1145/349299.349310 |
| 623 | http://doi.acm.org/10.1145/349299.349310 |
| 624 | */ |
| 625 | package pointer // import "golang.org/x/tools/go/pointer" |
| 626 |
Members