=======
ThinLTO
=======

.. contents::
   :local:

Introduction
============

*ThinLTO* compilation is a new type of LTO that is both scalable and
incremental. *LTO* (Link Time Optimization) achieves better
runtime performance through whole-program analysis and cross-module
optimization. However, monolithic LTO implements this by merging all
input into a single module, which is not scalable
in time or memory, and also prevents fast incremental compiles.

In ThinLTO mode, as with regular LTO, clang emits LLVM bitcode after the
compile phase. The ThinLTO bitcode is augmented with a compact summary
of the module. During the link step, only the summaries are read and
merged into a combined summary index, which includes an index of function
locations for later cross-module function importing. Fast and efficient
whole-program analysis is then performed on the combined summary index.

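To see the per-module summary described above, one option is to compile a
file with ``-flto=thin`` and dump the resulting bitcode with
``llvm-bcanalyzer``. This is a sketch that assumes ``llvm-bcanalyzer`` from
the same LLVM build is on your ``PATH``. The ``grep`` filter is only
illustrative, and the exact names of the summary blocks in the dump may
differ between releases.

.. code-block:: console

  % clang -flto=thin -O2 -c file1.c -o file1.o
  % llvm-bcanalyzer -dump file1.o | grep -i summary
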
However, all transformations, including function importing, occur
later when the modules are optimized in fully parallel backends.
By default, linkers_ that support ThinLTO are set up to launch
the ThinLTO backends in threads. The usage model is therefore unaffected,
because the split between the fast serial thin link step and the
parallel backends is transparent to the user.

For more information on the ThinLTO design and current performance,
see the LLVM blog post `ThinLTO: Scalable and Incremental LTO
<http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html>`_.
While tuning is still in progress, results in the blog post show that
ThinLTO already performs well compared to LTO, in many cases matching
LTO's performance improvement.

Current Status
==============

Clang/LLVM
----------
.. _compiler:

The 3.9 release of clang includes ThinLTO support. However, ThinLTO
is under active development, and new features, improvements and bugfixes
are being added for the next release. For the latest ThinLTO support,
`build a recent version of clang and LLVM
<https://llvm.org/docs/CMake.html>`_.

Linkers
-------
.. _linkers:
.. _linker:

ThinLTO is currently supported for the following linkers:

- **gold (via the gold-plugin)**:
  Similar to monolithic LTO, this requires using
  a `gold linker configured with plugins enabled
  <https://llvm.org/docs/GoldPlugin.html>`_.
- **ld64**:
  Starting with `Xcode 8 <https://developer.apple.com/xcode/>`_.
- **lld**:
  Starting with r284050 for ELF, r298942 for COFF.

Usage
=====

Basic
-----

To use ThinLTO, add the ``-flto=thin`` option to both the compile and the
link steps. For example:

.. code-block:: console

  % clang -flto=thin -O2 file1.c file2.c -c
  % clang -flto=thin -O2 file1.o file2.o -o a.out

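The link step above uses clang's default linker for the target. One way to
choose one of the ThinLTO-capable linkers_ explicitly is clang's
``-fuse-ld=`` option on the link line. This sketch assumes that the chosen
linker (and, for gold, the LLVM gold plugin) is installed where clang can
find it.

.. code-block:: console

  % clang -flto=thin -O2 -fuse-ld=gold file1.o file2.o -o a.out
  % clang -flto=thin -O2 -fuse-ld=lld file1.o file2.o -o a.out
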
When using lld-link, the ``-flto`` option needs to be added only to the compile step:

.. code-block:: console

  % clang-cl -flto=thin -O2 -c file1.c file2.c
  % lld-link /out:a.exe file1.obj file2.obj

As mentioned earlier, by default the linkers will launch the ThinLTO backend
threads in parallel, passing the resulting native object files back to the
linker for the final native link. As such, the usage model is the same as
for non-LTO builds.

With gold, if you see an error during the link of the form:

.. code-block:: console

  /usr/bin/ld: error: /path/to/clang/bin/../lib/LLVMgold.so: could not load plugin library: /path/to/clang/bin/../lib/LLVMgold.so: cannot open shared object file: No such file or directory

then one of two things is wrong. Either gold was not configured with plugins
enabled, or clang was not built with ``-DLLVM_BINUTILS_INCDIR`` set properly.
See the instructions for the
`LLVM gold plugin <https://llvm.org/docs/GoldPlugin.html#how-to-build-it>`_.

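You can check for both causes from the command line. The sketch below follows
the approach described in the LLVM gold plugin documentation. Running gold with
a bare ``-plugin`` option should produce a "missing argument" complaint if
plugin support is present. An "unknown option" error suggests that plugin
support is missing. The binary name ``ld.gold``, the plugin location, and the
binutils include directory (the one containing ``plugin-api.h``) are
placeholders that you should adjust for your system.

.. code-block:: console

  % ld.gold -plugin
  % ls /path/to/clang/lib/LLVMgold.so
  % cmake -DLLVM_BINUTILS_INCDIR=/path/to/binutils/include /path/to/llvm-project/llvm

If the plugin file is missing, reconfiguring the LLVM build with
``LLVM_BINUTILS_INCDIR`` as in the last command and then rebuilding should
produce it. The last command omits any other CMake options your build uses.
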
Controlling Backend Parallelism
-------------------------------
.. _parallelism:

By default, the ThinLTO link step will launch as many
threads in parallel as there are cores. If the number of
cores can't be computed for the architecture, then it will launch
``std::thread::hardware_concurrency()`` threads in parallel.
For machines with hyper-threading, this is the total number of
virtual cores. For some applications and machine configurations this
may be too aggressive, in which case the amount of parallelism can
be reduced to ``N`` via:

- gold:
  ``-Wl,-plugin-opt,jobs=N``
- ld64:
  ``-Wl,-mllvm,-threads=N``
- lld:
  ``-Wl,--thinlto-jobs=N``
- lld-link:
  ``/opt:lldltojobs=N``

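For example, limiting the backends to four threads might look like the
following. The value 4 is arbitrary. Selecting gold or lld with clang's
``-fuse-ld=`` option is an assumption about how your build chooses its linker.

.. code-block:: console

  % clang -flto=thin -O2 -fuse-ld=gold -Wl,-plugin-opt,jobs=4 file1.o file2.o -o a.out
  % clang -flto=thin -O2 -fuse-ld=lld -Wl,--thinlto-jobs=4 file1.o file2.o -o a.out
  % lld-link /opt:lldltojobs=4 /out:a.exe file1.obj file2.obj
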
Incremental
-----------
.. _incremental:

ThinLTO supports fast incremental builds through the use of a cache,
which currently must be enabled through a linker option.

- gold (as of LLVM 4.0):
  ``-Wl,-plugin-opt,cache-dir=/path/to/cache``
- ld64 (support in clang 3.9 and Xcode 8):
  ``-Wl,-cache_path_lto,/path/to/cache``
- ELF lld (as of LLVM 5.0):
  ``-Wl,--thinlto-cache-dir=/path/to/cache``
- COFF lld-link (as of LLVM 6.0):
  ``/lldltocache:/path/to/cache``

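A typical incremental workflow with ELF lld might look like the following
sketch. It assumes that lld is selected with ``-fuse-ld=lld`` and that
``/path/to/cache`` is a writable directory. After ``file2.c`` is edited and
recompiled, the second link can reuse cached backend output for modules whose
ThinLTO inputs are unchanged. Modules that import functions from ``file2.c``
may still need to be re-optimized.

.. code-block:: console

  % clang -flto=thin -O2 -c file1.c file2.c
  % clang -flto=thin -O2 -fuse-ld=lld -Wl,--thinlto-cache-dir=/path/to/cache file1.o file2.o -o a.out
  % clang -flto=thin -O2 -c file2.c
  % clang -flto=thin -O2 -fuse-ld=lld -Wl,--thinlto-cache-dir=/path/to/cache file1.o file2.o -o a.out
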
Cache Pruning
-------------

To help keep the size of the cache under control, ThinLTO supports cache
pruning. Cache pruning is supported with gold, ld64, and ELF and COFF lld, but
currently only gold, ELF lld and COFF lld allow you to control the policy with
a policy string. The cache policy must be specified with a linker option.

- gold (as of LLVM 6.0):
  ``-Wl,-plugin-opt,cache-policy=POLICY``
- ELF lld (as of LLVM 5.0):
  ``-Wl,--thinlto-cache-policy,POLICY``
- COFF lld-link (as of LLVM 6.0):
  ``/lldltocachepolicy:POLICY``

A policy string is a series of key-value pairs separated by ``:`` characters.
Possible key-value pairs are:

- ``cache_size=X%``: The maximum size for the cache directory is ``X`` percent
  of the available space on the disk. Set it to 100 to indicate no limit, or
  to 50 to indicate that the cache size is not allowed to exceed half the
  available disk space. A value over 100 is invalid. A value of 0 disables the
  percentage size-based pruning. The default is 75%.

- ``cache_size_bytes=X``, ``cache_size_bytes=Xk``, ``cache_size_bytes=Xm``,
  ``cache_size_bytes=Xg``:
  Sets the maximum size for the cache directory to ``X`` bytes (or KB, MB,
  GB respectively). A value over the amount of available space on the disk
  will be reduced to the amount of available space. A value of 0 disables
  the byte size-based pruning. The default is no byte size-based pruning.

  Note that ThinLTO will apply both size-based pruning policies simultaneously,
  and changing one does not affect the other. Whichever limit is smaller takes
  effect. For example, a policy of ``cache_size_bytes=1g`` on its own will
  cause both the 1GB and the default 75% policies to be applied unless the
  default ``cache_size`` is overridden.

- ``cache_size_files=X``:
  Sets the maximum number of files in the cache directory. Set it to 0 to
  indicate no limit. The default is 1000000 files.

- ``prune_after=Xs``, ``prune_after=Xm``, ``prune_after=Xh``: Sets the
  expiration time for cache files to ``X`` seconds (or minutes, hours
  respectively). When a file has not been accessed for longer than the
  ``prune_after`` duration, it is removed from the cache. A value of 0
  disables the expiration-based pruning. The default is 1 week.

- ``prune_interval=Xs``, ``prune_interval=Xm``, ``prune_interval=Xh``:
  Sets the pruning interval to ``X`` seconds (or minutes, hours
  respectively). This is intended to avoid scanning the cache directory
  too often. It does not affect the decision of which files to prune. A
  value of 0 forces the scan to always occur. The default is every 20 minutes.

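As an illustration, the following ELF lld link limits the cache to 10% of the
available disk space. It also expires entries that have not been accessed for
24 hours. The example selects lld with ``-fuse-ld=lld``, which is an assumption
about your setup, and the values are arbitrary. Keys that are not mentioned
keep their defaults: byte-based pruning stays off, the file limit stays at
1000000, and the directory is scanned at most every 20 minutes.

.. code-block:: console

  % clang -flto=thin -O2 -fuse-ld=lld file1.o file2.o -o a.out \
      -Wl,--thinlto-cache-dir=/path/to/cache \
      -Wl,--thinlto-cache-policy,cache_size=10%:prune_after=24h
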
Clang Bootstrap
---------------

To bootstrap clang/LLVM with ThinLTO, follow these steps:

1. The host compiler_ must be a version of clang that supports ThinLTO.
#. The host linker_ must support ThinLTO (and in the case of gold, must be
   `configured with plugins enabled <https://llvm.org/docs/GoldPlugin.html>`_).
#. Use the following additional `CMake variables
   <https://llvm.org/docs/CMake.html#options-and-variables>`_
   when configuring the bootstrap compiler build:

   * ``-DLLVM_ENABLE_LTO=Thin``
   * ``-DCMAKE_C_COMPILER=/path/to/host/clang``
   * ``-DCMAKE_CXX_COMPILER=/path/to/host/clang++``
   * ``-DCMAKE_RANLIB=/path/to/host/llvm-ranlib``
   * ``-DCMAKE_AR=/path/to/host/llvm-ar``

   Or, on Windows:

   * ``-DLLVM_ENABLE_LTO=Thin``
   * ``-DCMAKE_C_COMPILER=/path/to/host/clang-cl.exe``
   * ``-DCMAKE_CXX_COMPILER=/path/to/host/clang-cl.exe``
   * ``-DCMAKE_LINKER=/path/to/host/lld-link.exe``
   * ``-DCMAKE_RANLIB=/path/to/host/llvm-ranlib.exe``
   * ``-DCMAKE_AR=/path/to/host/llvm-ar.exe``

#. To use additional linker arguments for controlling the backend
   parallelism_ or enabling incremental_ builds of the bootstrap compiler,
   first configure the build. Then modify the resulting CMakeCache.txt file in
   the build directory, and specify any additional linker options after
   ``CMAKE_EXE_LINKER_FLAGS:STRING=``. Note that the configure step may fail
   if linker plugin options are instead specified directly in the previous step.

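Putting these steps together on a Linux host, a configure-and-edit sequence
might look like the sketch below. Several parts are assumptions: the Ninja
generator, the source path ``/path/to/llvm-project/llvm``, the use of
``-DLLVM_USE_LINKER=lld`` to make the host lld the linker, the job count of 8,
and GNU ``sed -i`` syntax. The ``sed`` command prepends the extra linker option
to whatever value CMake wrote after ``CMAKE_EXE_LINKER_FLAGS:STRING=``. You can
then build as usual.

.. code-block:: console

  % cmake -G Ninja /path/to/llvm-project/llvm \
      -DLLVM_ENABLE_LTO=Thin \
      -DLLVM_USE_LINKER=lld \
      -DCMAKE_C_COMPILER=/path/to/host/clang \
      -DCMAKE_CXX_COMPILER=/path/to/host/clang++ \
      -DCMAKE_RANLIB=/path/to/host/llvm-ranlib \
      -DCMAKE_AR=/path/to/host/llvm-ar
  % sed -i 's|^CMAKE_EXE_LINKER_FLAGS:STRING=|&-Wl,--thinlto-jobs=8 |' CMakeCache.txt
  % ninja
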
More Information
================

* From LLVM project blog:
  `ThinLTO: Scalable and Incremental LTO
  <http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html>`_
