| <html lang="en"> |
| <head> |
| <title>i386 and x86-64 Options - Using the GNU Compiler Collection (GCC)</title> |
| <meta http-equiv="Content-Type" content="text/html"> |
| <meta name="description" content="Using the GNU Compiler Collection (GCC)"> |
| <meta name="generator" content="makeinfo 4.13"> |
| <link title="Top" rel="start" href="index.html#Top"> |
| <link rel="up" href="Submodel-Options.html#Submodel-Options" title="Submodel Options"> |
| <link rel="prev" href="HPPA-Options.html#HPPA-Options" title="HPPA Options"> |
| <link rel="next" href="i386-and-x86_002d64-Windows-Options.html#i386-and-x86_002d64-Windows-Options" title="i386 and x86-64 Windows Options"> |
| <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage"> |
| <!-- |
| Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, |
| 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, |
| 2008 Free Software Foundation, Inc. |
| |
| Permission is granted to copy, distribute and/or modify this document |
| under the terms of the GNU Free Documentation License, Version 1.2 or |
| any later version published by the Free Software Foundation; with the |
| Invariant Sections being ``Funding Free Software'', the Front-Cover |
| Texts being (a) (see below), and with the Back-Cover Texts being (b) |
| (see below). A copy of the license is included in the section entitled |
| ``GNU Free Documentation License''. |
| |
| (a) The FSF's Front-Cover Text is: |
| |
| A GNU Manual |
| |
| (b) The FSF's Back-Cover Text is: |
| |
| You have freedom to copy and modify this GNU Manual, like GNU |
| software. Copies published by the Free Software Foundation raise |
| funds for GNU development.--> |
| <meta http-equiv="Content-Style-Type" content="text/css"> |
| <style type="text/css"><!-- |
| pre.display { font-family:inherit } |
| pre.format { font-family:inherit } |
| pre.smalldisplay { font-family:inherit; font-size:smaller } |
| pre.smallformat { font-family:inherit; font-size:smaller } |
| pre.smallexample { font-size:smaller } |
| pre.smalllisp { font-size:smaller } |
| span.sc { font-variant:small-caps } |
| span.roman { font-family:serif; font-weight:normal; } |
| span.sansserif { font-family:sans-serif; font-weight:normal; } |
| --></style> |
| <link rel="stylesheet" type="text/css" href="../cs.css"> |
| </head> |
| <body> |
| <div class="node"> |
| <a name="i386-and-x86-64-Options"></a> |
| <a name="i386-and-x86_002d64-Options"></a> |
| <p> |
| Next: <a rel="next" accesskey="n" href="i386-and-x86_002d64-Windows-Options.html#i386-and-x86_002d64-Windows-Options">i386 and x86-64 Windows Options</a>, |
| Previous: <a rel="previous" accesskey="p" href="HPPA-Options.html#HPPA-Options">HPPA Options</a>, |
| Up: <a rel="up" accesskey="u" href="Submodel-Options.html#Submodel-Options">Submodel Options</a> |
| <hr> |
| </div> |
| |
| <h4 class="subsection">3.17.15 Intel 386 and AMD x86-64 Options</h4> |
| |
| <p><a name="index-i386-Options-1297"></a><a name="index-x86_002d64-Options-1298"></a><a name="index-Intel-386-Options-1299"></a><a name="index-AMD-x86_002d64-Options-1300"></a> |
| These ‘<samp><span class="samp">-m</span></samp>’ options are defined for the i386 and x86-64 family of |
| computers: |
| |
| <dl> |
| <dt><code>-mtune=</code><var>cpu-type</var><dd><a name="index-mtune-1301"></a>Tune to <var>cpu-type</var> everything applicable about the generated code, except |
| for the ABI and the set of available instructions. The choices for |
| <var>cpu-type</var> are: |
| <dl> |
| <dt><em>generic</em><dd>Produce code optimized for the most common IA32/AMD64/EM64T processors. |
| If you know the CPU on which your code will run, then you should use |
| the corresponding <samp><span class="option">-mtune</span></samp> option instead of |
| <samp><span class="option">-mtune=generic</span></samp>. But, if you do not know exactly what CPU users |
| of your application will have, then you should use this option. |
| |
| <p>As new processors are deployed in the marketplace, the behavior of this |
| option will change. Therefore, if you upgrade to a newer version of |
| GCC, the code generated by this option will change to reflect the processors |
| that were most common when that version of GCC was released. |
| |
| <p>There is no <samp><span class="option">-march=generic</span></samp> option because <samp><span class="option">-march</span></samp> |
| indicates the instruction set the compiler can use, and there is no |
| generic instruction set applicable to all processors. In contrast, |
| <samp><span class="option">-mtune</span></samp> indicates the processor (or, in this case, collection of |
| processors) for which the code is optimized. |
| <br><dt><em>native</em><dd>This selects the CPU to tune for at compilation time by determining |
| the processor type of the compiling machine. Using <samp><span class="option">-mtune=native</span></samp> |
| will produce code optimized for the local machine under the constraints |
| of the selected instruction set. Using <samp><span class="option">-march=native</span></samp> will |
| enable all instruction subsets supported by the local machine (hence |
| the result might not run on different machines). |
| <br><dt><em>i386</em><dd>Original Intel i386 CPU. |
| <br><dt><em>i486</em><dd>Intel's i486 CPU. (No scheduling is implemented for this chip.) |
| <br><dt><em>i586, pentium</em><dd>Intel Pentium CPU with no MMX support. |
| <br><dt><em>pentium-mmx</em><dd>Intel PentiumMMX CPU based on Pentium core with MMX instruction set support. |
| <br><dt><em>pentiumpro</em><dd>Intel PentiumPro CPU. |
| <br><dt><em>i686</em><dd>Same as <code>generic</code>, but when used as <code>march</code> option, PentiumPro |
| instruction set will be used, so the code will run on all i686 family chips. |
| <br><dt><em>pentium2</em><dd>Intel Pentium2 CPU based on PentiumPro core with MMX instruction set support. |
| <br><dt><em>pentium3, pentium3m</em><dd>Intel Pentium3 CPU based on PentiumPro core with MMX and SSE instruction set |
| support. |
| <br><dt><em>pentium-m</em><dd>Low power version of Intel Pentium3 CPU with MMX, SSE and SSE2 instruction set |
| support. Used by Centrino notebooks. |
| <br><dt><em>pentium4, pentium4m</em><dd>Intel Pentium4 CPU with MMX, SSE and SSE2 instruction set support. |
| <br><dt><em>prescott</em><dd>Improved version of Intel Pentium4 CPU with MMX, SSE, SSE2 and SSE3 instruction |
| set support. |
| <br><dt><em>nocona</em><dd>Improved version of Intel Pentium4 CPU with 64-bit extensions, MMX, SSE, |
| SSE2 and SSE3 instruction set support. |
| <br><dt><em>core2</em><dd>Intel Core2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 |
| instruction set support. |
| <br><dt><em>atom</em><dd>Intel Atom CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 |
| instruction set support. |
| <br><dt><em>k6</em><dd>AMD K6 CPU with MMX instruction set support. |
| <br><dt><em>k6-2, k6-3</em><dd>Improved versions of AMD K6 CPU with MMX and 3DNow! instruction set support. |
| <br><dt><em>athlon, athlon-tbird</em><dd>AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow! and SSE prefetch instruction |
| support. |
| <br><dt><em>athlon-4, athlon-xp, athlon-mp</em><dd>Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow! and full SSE |
| instruction set support. |
| <br><dt><em>k8, opteron, athlon64, athlon-fx</em><dd>AMD K8 core based CPUs with x86-64 instruction set support. (This supersets |
| MMX, SSE, SSE2, 3DNow!, enhanced 3DNow! and 64-bit instruction set extensions.) |
| <br><dt><em>k8-sse3, opteron-sse3, athlon64-sse3</em><dd>Improved versions of k8, opteron and athlon64 with SSE3 instruction set support. |
| <br><dt><em>amdfam10, barcelona</em><dd>AMD Family 10h core based CPUs with x86-64 instruction set support. (This |
| supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit |
| instruction set extensions.) |
| <br><dt><em>winchip-c6</em><dd>IDT Winchip C6 CPU, treated the same way as i486 with additional MMX instruction |
| set support. |
| <br><dt><em>winchip2</em><dd>IDT Winchip2 CPU, treated the same way as i486 with additional MMX and 3DNow! |
| instruction set support. |
| <br><dt><em>c3</em><dd>Via C3 CPU with MMX and 3DNow! instruction set support. (No scheduling is |
| implemented for this chip.) |
| <br><dt><em>c3-2</em><dd>Via C3-2 CPU with MMX and SSE instruction set support. (No scheduling is |
| implemented for this chip.) |
| <br><dt><em>geode</em><dd>Embedded AMD CPU with MMX and 3DNow! instruction set support. |
| </dl> |
| |
| <p>While picking a specific <var>cpu-type</var> will schedule things appropriately |
| for that particular chip, the compiler will not generate any code that |
| does not run on the i386 without the <samp><span class="option">-march=</span><var>cpu-type</var></samp> option |
| being used. |
| |
| <br><dt><code>-march=</code><var>cpu-type</var><dd><a name="index-march-1302"></a>Generate instructions for the machine type <var>cpu-type</var>. The choices |
| for <var>cpu-type</var> are the same as for <samp><span class="option">-mtune</span></samp>. Moreover, |
| specifying <samp><span class="option">-march=</span><var>cpu-type</var></samp> implies <samp><span class="option">-mtune=</span><var>cpu-type</var></samp>. |
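| <p>The effect of the selected instruction set can be observed from source code. The following is a minimal sketch, assuming GCC's predefined macros <code>__MMX__</code>, <code>__SSE__</code>, <code>__SSE2__</code> and <code>__AVX__</code> (they are not described in this section); the helper name <code>enabled_simd_level</code> is hypothetical. |

```c
/* Hypothetical helper: report the widest SIMD extension that the
   compiler was allowed to use, based on GCC's predefined macros.
   Which macros are defined depends on -march and the -m<ext> switches. */
const char *enabled_simd_level(void)
{
#if defined(__AVX__)
    return "avx";
#elif defined(__SSE2__)
    return "sse2";
#elif defined(__SSE__)
    return "sse";
#elif defined(__MMX__)
    return "mmx";
#else
    return "none";
#endif
}
```

| <p>Because <samp><span class="option">-mtune</span></samp> alone does not change the set of available instructions, the result of this helper is expected to change with <samp><span class="option">-march</span></samp> but not with <samp><span class="option">-mtune</span></samp>. |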
| |
| <br><dt><code>-mcpu=</code><var>cpu-type</var><dd><a name="index-mcpu-1303"></a>A deprecated synonym for <samp><span class="option">-mtune</span></samp>. |
| |
| <br><dt><code>-mfpmath=</code><var>unit</var><dd><a name="index-mfpmath-1304"></a>Generate floating-point arithmetic for the selected unit <var>unit</var>. The choices |
| for <var>unit</var> are: |
| |
| <dl> |
| <dt>‘<samp><span class="samp">387</span></samp>’<dd>Use the standard 387 floating-point coprocessor, which is present on the majority of chips and |
| emulated otherwise. Code compiled with this option will run almost everywhere. |
| Temporary results are computed in 80-bit precision instead of the precision |
| specified by the type, which gives slightly different results compared to most |
| other chips. See <samp><span class="option">-ffloat-store</span></samp> for a more detailed description. |
| |
| <p>This is the default choice for the i386 compiler. |
| |
| <br><dt>‘<samp><span class="samp">sse</span></samp>’<dd>Use scalar floating-point instructions present in the SSE instruction set. |
| This instruction set is supported by Pentium3 and newer chips, and in the AMD line |
| by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of the SSE |
| instruction set supports only single-precision arithmetic, so double and |
| extended-precision arithmetic is still done using the 387. A later version, present |
| only in Pentium4 and AMD x86-64 chips, supports double-precision |
| arithmetic too. |
| |
| <p>For the i386 compiler, you need to use <samp><span class="option">-march=</span><var>cpu-type</var></samp>, <samp><span class="option">-msse</span></samp> |
| or <samp><span class="option">-msse2</span></samp> switches to enable SSE extensions and make this option |
| effective. For the x86-64 compiler, these extensions are enabled by default. |
| |
| <p>The resulting code should be considerably faster in the majority of cases and avoid |
| the numerical instability problems of 387 code, but may break some existing |
| code that expects temporaries to be 80-bit. |
| |
| <p>This is the default choice for the x86-64 compiler. |
| |
| <br><dt>‘<samp><span class="samp">sse,387</span></samp>’<dt>‘<samp><span class="samp">sse+387</span></samp>’<dt>‘<samp><span class="samp">both</span></samp>’<dd>Attempt to utilize both instruction sets at once. This effectively doubles the |
| number of available registers and, on chips with separate execution units for |
| 387 and SSE, the execution resources too. Use this option with care, as it is |
| still experimental: the GCC register allocator does not model separate |
| functional units well, which results in unstable performance. |
| </dl> |
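| <p>One way to see which evaluation precision a given <samp><span class="option">-mfpmath</span></samp> setting produces is the standard C99 macro <code>FLT_EVAL_METHOD</code> from <code>&lt;float.h&gt;</code>. This is only a sketch: the macro is not mentioned in this section, and the helper name <code>float_eval_method</code> is hypothetical. |

```c
#include <float.h>

/* Returns the C99 FLT_EVAL_METHOD value reported by the compiler:
     0  operations are evaluated in the precision of their type
        (what one would expect with SSE arithmetic),
     2  operations are evaluated in long double precision
        (what one would expect with 387 arithmetic),
    -1  the evaluation method is indeterminate. */
int float_eval_method(void)
{
    return FLT_EVAL_METHOD;
}
```

| <p>Comparing the value returned under different <samp><span class="option">-mfpmath</span></samp> settings shows whether temporaries may carry the 80-bit excess precision described above. |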
| |
| <br><dt><code>-masm=</code><var>dialect</var><dd><a name="index-masm_003d_0040var_007bdialect_007d-1305"></a>Output asm instructions using selected <var>dialect</var>. Supported |
| choices are ‘<samp><span class="samp">intel</span></samp>’ or ‘<samp><span class="samp">att</span></samp>’ (the default one). Darwin does |
| not support ‘<samp><span class="samp">intel</span></samp>’. |
| |
| <br><dt><code>-mieee-fp</code><dt><code>-mno-ieee-fp</code><dd><a name="index-mieee_002dfp-1306"></a><a name="index-mno_002dieee_002dfp-1307"></a>Control whether or not the compiler uses IEEE floating point |
| comparisons. These handle correctly the case where the result of a |
| comparison is unordered. |
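| <p>A comparison is unordered when at least one operand is a NaN. The sketch below shows the behavior that IEEE comparisons are expected to have, using the standard C99 macros <code>isunordered</code> and <code>NAN</code> from <code>&lt;math.h&gt;</code>; the function names are hypothetical. |

```c
#include <math.h>

/* 1 if comparing a with b is unordered, i.e. at least one is a NaN. */
int comparison_is_unordered(double a, double b)
{
    return isunordered(a, b) ? 1 : 0;
}

/* Under IEEE rules, an ordered relational operator such as '<'
   yields false when the operands are unordered. */
int ieee_less(double a, double b)
{
    return a < b;
}
```

| <p>With correct IEEE handling, <code>ieee_less(NAN, 1.0)</code> and <code>ieee_less(1.0, NAN)</code> are both false. |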
| |
| <br><dt><code>-msoft-float</code><dd><a name="index-msoft_002dfloat-1308"></a>Generate output containing library calls for floating point. |
| <strong>Warning:</strong> the requisite libraries are not part of GCC. |
| Normally the facilities of the machine's usual C compiler are used, but |
| this can't be done directly in cross-compilation. You must make your |
| own arrangements to provide suitable library functions for |
| cross-compilation. |
| |
| <p>On machines where a function returns floating point results in the 80387 |
| register stack, some floating point opcodes may be emitted even if |
| <samp><span class="option">-msoft-float</span></samp> is used. |
| |
| <br><dt><code>-mno-fp-ret-in-387</code><dd><a name="index-mno_002dfp_002dret_002din_002d387-1309"></a>Do not use the FPU registers for return values of functions. |
| |
| <p>The usual calling convention has functions return values of types |
| <code>float</code> and <code>double</code> in an FPU register, even if there |
| is no FPU. The idea is that the operating system should emulate |
| an FPU. |
| |
| <p>The option <samp><span class="option">-mno-fp-ret-in-387</span></samp> causes such values to be returned |
| in ordinary CPU registers instead. |
| |
| <br><dt><code>-mno-fancy-math-387</code><dd><a name="index-mno_002dfancy_002dmath_002d387-1310"></a>Some 387 emulators do not support the <code>sin</code>, <code>cos</code> and |
| <code>sqrt</code> instructions for the 387. Specify this option to avoid |
| generating those instructions. This option is the default on FreeBSD, |
| OpenBSD and NetBSD. This option is overridden when <samp><span class="option">-march</span></samp> |
| indicates that the target cpu will always have an FPU and so the |
| instruction will not need emulation. As of revision 2.6.1, these |
| instructions are not generated unless you also use the |
| <samp><span class="option">-funsafe-math-optimizations</span></samp> switch. |
| |
| <br><dt><code>-malign-double</code><dt><code>-mno-align-double</code><dd><a name="index-malign_002ddouble-1311"></a><a name="index-mno_002dalign_002ddouble-1312"></a>Control whether GCC aligns <code>double</code>, <code>long double</code>, and |
| <code>long long</code> variables on a two word boundary or a one word |
| boundary. Aligning <code>double</code> variables on a two word boundary will |
| produce code that runs somewhat faster on a ‘<samp><span class="samp">Pentium</span></samp>’ at the |
| expense of more memory. |
| |
| <p>On x86-64, <samp><span class="option">-malign-double</span></samp> is enabled by default. |
| |
| <p><strong>Warning:</strong> if you use the <samp><span class="option">-malign-double</span></samp> switch, |
| structures containing the above types will be aligned differently than |
| the published application binary interface specifications for the 386 |
| and will not be binary compatible with structures in code compiled |
| without that switch. |
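| <p>The layout change behind this warning can be made visible with <code>offsetof</code>. The structure and function names below are hypothetical, and the offsets mentioned afterwards assume a 4-byte word on 32-bit x86. |

```c
#include <stddef.h>

/* A structure whose layout depends on the alignment of double. */
struct sample {
    char   tag;
    double value;
};

/* Byte offset of the double member within struct sample. */
size_t double_member_offset(void)
{
    return offsetof(struct sample, value);
}
```

| <p>On 32-bit x86 the offset is expected to be 4 with <samp><span class="option">-mno-align-double</span></samp> and 8 with <samp><span class="option">-malign-double</span></samp>, which is why objects compiled with different settings cannot share such structures. |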
| |
| <br><dt><code>-m96bit-long-double</code><dt><code>-m128bit-long-double</code><dd><a name="index-m96bit_002dlong_002ddouble-1313"></a><a name="index-m128bit_002dlong_002ddouble-1314"></a>These switches control the size of <code>long double</code> type. The i386 |
| application binary interface specifies the size to be 96 bits, |
| so <samp><span class="option">-m96bit-long-double</span></samp> is the default in 32 bit mode. |
| |
| <p>Modern architectures (Pentium and newer) would prefer <code>long double</code> |
| to be aligned to an 8 or 16 byte boundary. In arrays or structures |
| conforming to the ABI, this would not be possible. So specifying a |
| <samp><span class="option">-m128bit-long-double</span></samp> will align <code>long double</code> |
| to a 16 byte boundary by padding the <code>long double</code> with an additional |
| 32 bit zero. |
| |
| <p>In the x86-64 compiler, <samp><span class="option">-m128bit-long-double</span></samp> is the default choice as |
| its ABI specifies that <code>long double</code> is to be aligned on 16 byte boundary. |
| |
| <p>Notice that neither of these options enables any extra precision over the x87 |
| standard of 80 bits for a <code>long double</code>. |
| |
| <p><strong>Warning:</strong> if you override the default value for your target ABI, |
| structures and arrays containing <code>long double</code> variables will change |
| size, and the calling convention for functions taking |
| <code>long double</code> will be modified. Hence they will not be binary |
| compatible with arrays or structures in code compiled without that switch. |
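| <p>The difference between storage size and precision can be checked directly. This is a minimal sketch with hypothetical function names; it relies only on <code>sizeof</code>, <code>CHAR_BIT</code> and the standard macro <code>LDBL_MANT_DIG</code>. |

```c
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Storage occupied by a long double, in bits (padding included). */
size_t long_double_storage_bits(void)
{
    return sizeof(long double) * CHAR_BIT;
}

/* Number of significand bits the long double type provides. */
int long_double_precision_bits(void)
{
    return LDBL_MANT_DIG;
}
```

| <p>On x86 the storage size is expected to be 96 bits with <samp><span class="option">-m96bit-long-double</span></samp> and 128 bits with <samp><span class="option">-m128bit-long-double</span></samp>, while the significand width stays the same in both cases. |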
| |
| <br><dt><code>-mlarge-data-threshold=</code><var>number</var><dd><a name="index-mlarge_002ddata_002dthreshold_003d_0040var_007bnumber_007d-1315"></a>When <samp><span class="option">-mcmodel=medium</span></samp> is specified, data objects larger than |
| <var>number</var> are placed in the large data section. This value must be the |
| same across all objects linked into the binary and defaults to 65535. |
| |
| <br><dt><code>-mrtd</code><dd><a name="index-mrtd-1316"></a>Use a different function-calling convention, in which functions that |
| take a fixed number of arguments return with the <code>ret</code> <var>num</var> |
| instruction, which pops their arguments while returning. This saves one |
| instruction in the caller since there is no need to pop the arguments |
| there. |
| |
| <p>You can specify that an individual function is called with this calling |
| sequence with the function attribute ‘<samp><span class="samp">stdcall</span></samp>’. You can also |
| override the <samp><span class="option">-mrtd</span></samp> option by using the function attribute |
| ‘<samp><span class="samp">cdecl</span></samp>’. See <a href="Function-Attributes.html#Function-Attributes">Function Attributes</a>. |
| |
| <p><strong>Warning:</strong> this calling convention is incompatible with the one |
| normally used on Unix, so you cannot use it if you need to call |
| libraries compiled with the Unix compiler. |
| |
| <p>Also, you must provide function prototypes for all functions that |
| take variable numbers of arguments (including <code>printf</code>); |
| otherwise incorrect code will be generated for calls to those |
| functions. |
| |
| <p>In addition, seriously incorrect code will result if you call a |
| function with too many arguments. (Normally, extra arguments are |
| harmlessly ignored.) |
| |
| <br><dt><code>-mregparm=</code><var>num</var><dd><a name="index-mregparm-1317"></a>Control how many registers are used to pass integer arguments. By |
| default, no registers are used to pass arguments, and at most 3 |
| registers can be used. You can control this behavior for a specific |
| function by using the function attribute ‘<samp><span class="samp">regparm</span></samp>’. |
| See <a href="Function-Attributes.html#Function-Attributes">Function Attributes</a>. |
| |
| <p><strong>Warning:</strong> if you use this switch, and |
| <var>num</var> is nonzero, then you must build all modules with the same |
| value, including any libraries. This includes the system libraries and |
| startup modules. |
| |
| <br><dt><code>-msseregparm</code><dd><a name="index-msseregparm-1318"></a>Use SSE register passing conventions for float and double arguments |
| and return values. You can control this behavior for a specific |
| function by using the function attribute ‘<samp><span class="samp">sseregparm</span></samp>’. |
| See <a href="Function-Attributes.html#Function-Attributes">Function Attributes</a>. |
| |
| <p><strong>Warning:</strong> if you use this switch then you must build all |
| modules with the same value, including any libraries. This includes |
| the system libraries and startup modules. |
| |
| <br><dt><code>-mpc32</code><dt><code>-mpc64</code><dt><code>-mpc80</code><dd><a name="index-mpc32-1319"></a><a name="index-mpc64-1320"></a><a name="index-mpc80-1321"></a> |
| Set 80387 floating-point precision to 32, 64 or 80 bits. When <samp><span class="option">-mpc32</span></samp> |
| is specified, the significands of results of floating-point operations are |
| rounded to 24 bits (single precision); <samp><span class="option">-mpc64</span></samp> rounds the |
| significands of results of floating-point operations to 53 bits (double |
| precision) and <samp><span class="option">-mpc80</span></samp> rounds the significands of results of |
| floating-point operations to 64 bits (extended double precision), which is |
| the default. When this option is used, floating-point operations in higher |
| precisions are not available to the programmer without setting the FPU |
| control word explicitly. |
| |
| <p>Setting the rounding of floating-point operations to less than the default |
| 80 bits can speed some programs by 2% or more. Note that some mathematical |
| libraries assume that extended precision (80 bit) floating-point operations |
| are enabled by default; routines in such libraries could suffer significant |
| loss of accuracy, typically through so-called "catastrophic cancellation", |
| when this option is used to set the precision to less than extended precision. |
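| <p>The effective significand width can be measured at run time. The sketch below uses a hypothetical function name and assumes round-to-nearest mode: it halves an increment until adding it to 1 no longer changes the result. |

```c
/* Counts how many significand bits long double arithmetic actually
   delivers. The volatile qualifiers force every intermediate result
   to be stored, so the compiler cannot fold the loop at compile time. */
int measured_long_double_bits(void)
{
    volatile long double one = 1.0L;
    volatile long double eps = 1.0L;
    int bits = 0;

    for (;;) {
        volatile long double sum = one + eps;
        if (sum == one)
            break;              /* eps fell below the last significand bit */
        eps = eps / 2.0L;
        bits++;
    }
    return bits;
}
```

| <p>With the default <samp><span class="option">-mpc80</span></samp> setting the count is expected to match the 64-bit significand described above; with <samp><span class="option">-mpc64</span></samp> or <samp><span class="option">-mpc32</span></samp> it is expected to drop accordingly, which is the loss of precision that the libraries mentioned above do not anticipate. |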
| |
| <br><dt><code>-mstackrealign</code><dd><a name="index-mstackrealign-1322"></a>Realign the stack at entry. On the Intel x86, the <samp><span class="option">-mstackrealign</span></samp> |
| option will generate an alternate prologue and epilogue that realigns the |
| runtime stack if necessary. This supports mixing legacy code that keeps |
| a 4-byte aligned stack with modern code that keeps a 16-byte aligned stack for |
| SSE compatibility. See also the attribute <code>force_align_arg_pointer</code>, |
| applicable to individual functions. |
| |
| <br><dt><code>-mpreferred-stack-boundary=</code><var>num</var><dd><a name="index-mpreferred_002dstack_002dboundary-1323"></a>Attempt to keep the stack boundary aligned to a 2 raised to <var>num</var> |
| byte boundary. If <samp><span class="option">-mpreferred-stack-boundary</span></samp> is not specified, |
| the default is 4 (16 bytes or 128 bits). |
| |
| <br><dt><code>-mincoming-stack-boundary=</code><var>num</var><dd><a name="index-mincoming_002dstack_002dboundary-1324"></a>Assume the incoming stack is aligned to a 2 raised to <var>num</var> byte |
| boundary. If <samp><span class="option">-mincoming-stack-boundary</span></samp> is not specified, |
| the one specified by <samp><span class="option">-mpreferred-stack-boundary</span></samp> will be used. |
| |
| <p>On Pentium and PentiumPro, <code>double</code> and <code>long double</code> values |
| should be aligned to an 8 byte boundary (see <samp><span class="option">-malign-double</span></samp>) or |
| suffer significant run time performance penalties. On Pentium III, the |
| Streaming SIMD Extension (SSE) data type <code>__m128</code> may not work |
| properly if it is not 16 byte aligned. |
| |
| <p>To ensure proper alignment of these values on the stack, the stack boundary |
| must be as aligned as that required by any value stored on the stack. |
| Further, every function must be generated such that it keeps the stack |
| aligned. Thus calling a function compiled with a higher preferred |
| stack boundary from a function compiled with a lower preferred stack |
| boundary will most likely misalign the stack. It is recommended that |
| libraries that use callbacks always use the default setting. |
| |
| <p>This extra alignment does consume extra stack space, and generally |
| increases code size. Code that is sensitive to stack space usage, such |
| as embedded systems and operating system kernels, may want to reduce the |
| preferred alignment to <samp><span class="option">-mpreferred-stack-boundary=2</span></samp>. |
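| <p>The <var>num</var> argument of <samp><span class="option">-mpreferred-stack-boundary</span></samp> and <samp><span class="option">-mincoming-stack-boundary</span></samp> is an exponent, not a byte count. The arithmetic can be sketched as follows; the function name is hypothetical. |

```c
#include <assert.h>

/* Converts the exponent given to -mpreferred-stack-boundary=num
   into the alignment in bytes, i.e. 2 raised to num. */
unsigned long stack_boundary_bytes(unsigned int num)
{
    assert(num < 32);           /* keep the shift well defined */
    return 1UL << num;
}
```

| <p>For example, the default value of 4 corresponds to 16 bytes, and <samp><span class="option">-mpreferred-stack-boundary=2</span></samp> corresponds to 4 bytes. |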
| |
| <br><dt><code>-mmmx</code><dt><code>-mno-mmx</code><dt><code>-msse</code><dt><code>-mno-sse</code><dt><code>-msse2</code><dt><code>-mno-sse2</code><dt><code>-msse3</code><dt><code>-mno-sse3</code><dt><code>-mssse3</code><dt><code>-mno-ssse3</code><dt><code>-msse4.1</code><dt><code>-mno-sse4.1</code><dt><code>-msse4.2</code><dt><code>-mno-sse4.2</code><dt><code>-msse4</code><dt><code>-mno-sse4</code><dt><code>-mavx</code><dt><code>-mno-avx</code><dt><code>-maes</code><dt><code>-mno-aes</code><dt><code>-mpclmul</code><dt><code>-mno-pclmul</code><dt><code>-msse4a</code><dt><code>-mno-sse4a</code><dt><code>-mfma4</code><dt><code>-mno-fma4</code><dt><code>-mxop</code><dt><code>-mno-xop</code><dt><code>-mlwp</code><dt><code>-mno-lwp</code><dt><code>-m3dnow</code><dt><code>-mno-3dnow</code><dt><code>-mpopcnt</code><dt><code>-mno-popcnt</code><dt><code>-mabm</code><dt><code>-mno-abm</code><dd><a name="index-mmmx-1325"></a><a name="index-mno_002dmmx-1326"></a><a name="index-msse-1327"></a><a name="index-mno_002dsse-1328"></a><a name="index-m3dnow-1329"></a><a name="index-mno_002d3dnow-1330"></a>These switches enable or disable the use of instructions in the MMX, |
| SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, SSE4A, FMA4, XOP, |
| LWP, ABM or 3DNow! extended instruction sets. |
| These extensions are also available as built-in functions: see |
| <a href="X86-Built_002din-Functions.html#X86-Built_002din-Functions">X86 Built-in Functions</a>, for details of the functions enabled and |
| disabled by these switches. |
| |
| <p>To have SSE/SSE2 instructions generated automatically from floating-point |
| code (as opposed to 387 instructions), see <samp><span class="option">-mfpmath=sse</span></samp>. |
| |
| <p>GCC does not emit legacy SSEx instructions when <samp><span class="option">-mavx</span></samp> is used. Instead, it |
| generates new AVX instructions or AVX equivalents for all SSEx instructions |
| when needed. |
| |
| <p>These options will enable GCC to use these extended instructions in |
| generated code, even without <samp><span class="option">-mfpmath=sse</span></samp>. Applications which |
| perform runtime CPU detection must compile separate files for each |
| supported architecture, using the appropriate flags. In particular, |
| the file containing the CPU detection code should be compiled without |
| these options. |
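| <p>The runtime detection scheme described above can be sketched as follows. The sketch assumes the <code>__get_cpuid</code> helper from GCC's <code>&lt;cpuid.h&gt;</code> header, which is not covered in this section; all other names are hypothetical. In a real project <code>add_sse2_build</code> would live in its own file compiled with <samp><span class="option">-msse2</span></samp>, while the detection code is compiled without such options. |

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Returns 1 if the running CPU reports SSE2 support, 0 otherwise. */
int cpu_has_sse2(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 26) & 1u);   /* CPUID leaf 1, EDX bit 26: SSE2 */
#else
    return 0;
#endif
}

typedef void (*add_fn)(const float *, const float *, float *, int);

/* Baseline version: belongs in a file compiled without -msse2. */
static void add_baseline(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Stand-in for the version that would be compiled with -msse2
   in a separate file; here it is plain C so the sketch stays
   self-contained. */
static void add_sse2_build(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Picks an implementation once, based on the detected CPU. */
add_fn select_add(void)
{
    return cpu_has_sse2() ? add_sse2_build : add_baseline;
}
```

| <p>Keeping the detection code free of the extension switches matters because otherwise the compiler could use the extended instructions before the check has run. |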
| |
| <br><dt><code>-mfused-madd</code><dt><code>-mno-fused-madd</code><dd><a name="index-mfused_002dmadd-1331"></a><a name="index-mno_002dfused_002dmadd-1332"></a>Do (don't) generate code that uses the fused multiply/add or multiply/subtract |
| instructions. The default is to use these instructions. |
| |
| <br><dt><code>-mcld</code><dd><a name="index-mcld-1333"></a>This option instructs GCC to emit a <code>cld</code> instruction in the prologue |
| of functions that use string instructions. String instructions depend on |
| the DF flag to select between autoincrement or autodecrement mode. While the |
| ABI specifies the DF flag to be cleared on function entry, some operating |
| systems violate this specification by not clearing the DF flag in their |
| exception dispatchers. The exception handler can be invoked with the DF flag |
| set which leads to wrong direction mode, when string instructions are used. |
| This option can be enabled by default on 32-bit x86 targets by configuring |
| GCC with the <samp><span class="option">--enable-cld</span></samp> configure option. Generation of <code>cld</code> |
| instructions can be suppressed with the <samp><span class="option">-mno-cld</span></samp> compiler option |
| in this case. |
| |
| <br><dt><code>-mcx16</code><dd><a name="index-mcx16-1334"></a>This option will enable GCC to use CMPXCHG16B instruction in generated code. |
| CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword) |
| data types. This is useful for high resolution counters that could be updated |
| by multiple processors (or cores). This instruction is generated as part of |
| atomic built-in functions: see <a href="Atomic-Builtins.html#Atomic-Builtins">Atomic Builtins</a> for details. |
| |
| <br><dt><code>-msahf</code><dd><a name="index-msahf-1335"></a>This option will enable GCC to use the SAHF instruction in generated 64-bit code. |
| Early Intel CPUs with Intel 64 lacked the LAHF and SAHF instructions supported |
| by AMD64 until the introduction of the Pentium 4 G1 step in December 2005. LAHF and |
| SAHF are load and store instructions, respectively, for certain status flags. |
| In 64-bit mode, the SAHF instruction is used to optimize the <code>fmod</code>, <code>drem</code> |
| or <code>remainder</code> built-in functions: see <a href="Other-Builtins.html#Other-Builtins">Other Builtins</a> for details. |
| |
| <br><dt><code>-mmovbe</code><dd><a name="index-mmovbe-1336"></a>This option will enable GCC to use movbe instruction to implement |
| <code>__builtin_bswap32</code> and <code>__builtin_bswap64</code>. |
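| <p>A short sketch of the two built-ins named above; the wrapper names are hypothetical. Whether the compiler implements them with movbe depends on <samp><span class="option">-mmovbe</span></samp> and on how the value is loaded or stored. |

```c
#include <stdint.h>

/* Reverses the byte order of a 32-bit value. */
uint32_t swap_bytes32(uint32_t v)
{
    return __builtin_bswap32(v);
}

/* Reverses the byte order of a 64-bit value. */
uint64_t swap_bytes64(uint64_t v)
{
    return __builtin_bswap64(v);
}
```

| <p>For example, <code>swap_bytes32(0x11223344)</code> yields <code>0x44332211</code>. |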
| |
| <br><dt><code>-mcrc32</code><dd><a name="index-mcrc32-1337"></a>This option will enable built-in functions, <code>__builtin_ia32_crc32qi</code>, |
| <code>__builtin_ia32_crc32hi</code>, <code>__builtin_ia32_crc32si</code> and |
| <code>__builtin_ia32_crc32di</code> to generate the crc32 machine instruction. |
| |
| <br><dt><code>-mrecip</code><dd><a name="index-mrecip-1338"></a>This option will enable GCC to use RCPSS and RSQRTSS instructions (and their |
| vectorized variants RCPPS and RSQRTPS) with an additional Newton-Raphson step |
| to increase precision instead of DIVSS and SQRTSS (and their vectorized |
| variants) for single precision floating point arguments. These instructions |
| are generated only when <samp><span class="option">-funsafe-math-optimizations</span></samp> is enabled |
| together with <samp><span class="option">-finite-math-only</span></samp> and <samp><span class="option">-fno-trapping-math</span></samp>. |
| Note that while the throughput of the sequence is higher than the throughput |
| of the non-reciprocal instruction, the precision of the sequence can be |
| decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994). |
| |
| <p>Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) |
| already with <samp><span class="option">-ffast-math</span></samp> (or the above option combination), and |
| doesn't need <samp><span class="option">-mrecip</span></samp>. |
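| <p>The refinement mentioned above can be illustrated with the textbook Newton-Raphson updates for a reciprocal and a reciprocal square root. This is only a sketch of the mathematics with hypothetical function names; it is not necessarily the exact instruction sequence that GCC emits. |

```c
/* One Newton-Raphson step for 1/a, starting from the estimate x0
   (the role played by the RCPSS result): x1 = x0 * (2 - a * x0). */
float refine_reciprocal(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}

/* One Newton-Raphson step for 1/sqrt(a), starting from the estimate y0
   (the role played by the RSQRTSS result):
   y1 = y0 * (1.5 - 0.5 * a * y0 * y0). */
float refine_rsqrt(float a, float y0)
{
    return y0 * (1.5f - 0.5f * a * y0 * y0);
}
```

| <p>Starting from the rough estimate 0.4 for 1/2, one reciprocal step gives about 0.48, which is closer to the exact value 0.5. Each step roughly doubles the number of correct bits, but the result can still differ from a true division in the last bits. |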
| |
| <br><dt><code>-mveclibabi=</code><var>type</var><dd><a name="index-mveclibabi-1339"></a>Specifies the ABI type to use for vectorizing intrinsics using an |
| external library. Supported types are <code>svml</code> for the Intel short |
| vector math library and <code>acml</code> for the AMD math core library style |
| of interfacing. GCC will currently emit calls to <code>vmldExp2</code>, |
| <code>vmldLn2</code>, <code>vmldLog102</code>, <code>vmldPow2</code>, |
| <code>vmldTanh2</code>, <code>vmldTan2</code>, <code>vmldAtan2</code>, <code>vmldAtanh2</code>, |
| <code>vmldCbrt2</code>, <code>vmldSinh2</code>, <code>vmldSin2</code>, <code>vmldAsinh2</code>, |
| <code>vmldAsin2</code>, <code>vmldCosh2</code>, <code>vmldCos2</code>, <code>vmldAcosh2</code>, |
| <code>vmldAcos2</code>, <code>vmlsExp4</code>, <code>vmlsLn4</code>, <code>vmlsLog104</code>, |
| <code>vmlsPow4</code>, <code>vmlsTanh4</code>, <code>vmlsTan4</code>, |
| <code>vmlsAtan4</code>, <code>vmlsAtanh4</code>, <code>vmlsCbrt4</code>, <code>vmlsSinh4</code>, |
| <code>vmlsSin4</code>, <code>vmlsAsinh4</code>, <code>vmlsAsin4</code>, <code>vmlsCosh4</code>, |
| <code>vmlsCos4</code>, <code>vmlsAcosh4</code> and <code>vmlsAcos4</code> for the corresponding
| function type when <samp><span class="option">-mveclibabi=svml</span></samp> is used and <code>__vrd2_sin</code>, |
| <code>__vrd2_cos</code>, <code>__vrd2_exp</code>, <code>__vrd2_log</code>, <code>__vrd2_log2</code>, |
| <code>__vrd2_log10</code>, <code>__vrs4_sinf</code>, <code>__vrs4_cosf</code>, |
| <code>__vrs4_expf</code>, <code>__vrs4_logf</code>, <code>__vrs4_log2f</code>, |
| <code>__vrs4_log10f</code> and <code>__vrs4_powf</code> for the corresponding function type
| when <samp><span class="option">-mveclibabi=acml</span></samp> is used. Both <samp><span class="option">-ftree-vectorize</span></samp> and
| <samp><span class="option">-funsafe-math-optimizations</span></samp> have to be enabled. A library compatible
| with the SVML or ACML ABI must be specified at link time.
| |
| <br><dt><code>-mabi=</code><var>name</var><dd><a name="index-mabi-1340"></a>Generate code for the specified calling convention. Permissible values |
| are: ‘<samp><span class="samp">sysv</span></samp>’ for the ABI used on GNU/Linux and other systems and |
| ‘<samp><span class="samp">ms</span></samp>’ for the Microsoft ABI. The default is to use the Microsoft |
| ABI when targeting Windows. On all other systems, the default is the |
| SYSV ABI. You can control this behavior for a specific function by |
| using the function attribute ‘<samp><span class="samp">ms_abi</span></samp>’/‘<samp><span class="samp">sysv_abi</span></samp>’. |
| See <a href="Function-Attributes.html#Function-Attributes">Function Attributes</a>. |
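| <p>A minimal sketch of the per-function override follows. The macro <code>MS_ABI</code> and the function names are hypothetical; the guard assumes an x86-64 target and a compiler that understands the GNU attribute syntax, and the attribute is simply dropped elsewhere.

```c
/* Apply the ms_abi attribute only where it is expected to be
   available; on other targets the macro expands to nothing. */
#if defined(__x86_64__) && defined(__GNUC__)
#define MS_ABI __attribute__((ms_abi))
#else
#define MS_ABI
#endif

/* Uses the Microsoft calling convention regardless of -mabi. */
MS_ABI long add_ms(long a, long b)
{
    return a + b;
}

/* Uses whatever convention -mabi (or the target default) selects. */
long add_default(long a, long b)
{
    return a + b;
}

/* The compiler handles the convention switch at each call site. */
long call_both(long a, long b)
{
    return add_ms(a, b) + add_default(a, b);
}
```

| <p>Callers need no special handling as long as the attribute is visible in the declaration they see.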
| |
| <br><dt><code>-mpush-args</code><dt><code>-mno-push-args</code><dd><a name="index-mpush_002dargs-1341"></a><a name="index-mno_002dpush_002dargs-1342"></a>Use PUSH operations to store outgoing parameters. This method is shorter |
| and usually as fast as the method using SUB/MOV operations, and it is enabled
| by default. In some cases disabling it may improve performance because of |
| improved scheduling and reduced dependencies. |
| |
| <br><dt><code>-maccumulate-outgoing-args</code><dd><a name="index-maccumulate_002doutgoing_002dargs-1343"></a>If enabled, the maximum amount of space required for outgoing arguments will be |
| computed in the function prologue. This is faster on most modern CPUs |
| because of reduced dependencies, improved scheduling and reduced stack usage |
| when the preferred stack boundary is not equal to 2. The drawback is a notable
| increase in code size. This switch implies <samp><span class="option">-mno-push-args</span></samp>. |
| |
| <br><dt><code>-mthreads</code><dd><a name="index-mthreads-1344"></a>Support thread-safe exception handling on ‘<samp><span class="samp">Mingw32</span></samp>’. Code that relies |
| on thread-safe exception handling must compile and link all code with the |
| <samp><span class="option">-mthreads</span></samp> option. When compiling, <samp><span class="option">-mthreads</span></samp> defines |
| <samp><span class="option">-D_MT</span></samp>; when linking, it links in a special thread helper library |
| <samp><span class="option">-lmingwthrd</span></samp>, which cleans up per-thread exception handling data.
| |
| <br><dt><code>-mno-align-stringops</code><dd><a name="index-mno_002dalign_002dstringops-1345"></a>Do not align the destination of inlined string operations. This switch reduces
| code size and improves performance when the destination is already aligned
| but GCC doesn't know about it.
| |
| <br><dt><code>-minline-all-stringops</code><dd><a name="index-minline_002dall_002dstringops-1346"></a>By default GCC inlines string operations only when the destination is known to be
| aligned to at least a 4-byte boundary. This option enables more inlining and increases code
| size, but may improve performance of code that depends on fast memcpy, strlen
| and memset for short lengths.
| |
| <br><dt><code>-minline-stringops-dynamically</code><dd><a name="index-minline_002dstringops_002ddynamically-1347"></a>For string operations of unknown size, inline runtime checks so that inline
| code is used for small blocks, while a library call is used for large blocks.
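| <p>The sketch below shows a copy whose size is unknown at compile time, which is the case this option addresses. The wrapper name <code>copy_block</code> is hypothetical, and the expansion GCC chooses depends on the target and on the string operation options in effect.

```c
#include <string.h>

/* Hypothetical wrapper: n is only known at run time, so GCC cannot
   pick a fixed inline expansion for this memcpy. With
   -minline-stringops-dynamically it may emit a size check that
   selects inline code for small n and a library call otherwise. */
void copy_block(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}
```

| <p>The observable behavior of the copy is the same either way; only the generated code differs.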
| |
| <br><dt><code>-mstringop-strategy=</code><var>alg</var><dd><a name="index-mstringop_002dstrategy_003d_0040var_007balg_007d-1348"></a>Override the internal decision heuristic for the particular algorithm used to inline
| string operations. The allowed values are <code>rep_byte</code>,
| <code>rep_4byte</code>, <code>rep_8byte</code> for expanding using the i386 <code>rep</code> prefix
| of the specified size, <code>byte_loop</code>, <code>loop</code>, <code>unrolled_loop</code> for
| expanding an inline loop, and <code>libcall</code> for always expanding a library call.
| |
| <br><dt><code>-momit-leaf-frame-pointer</code><dd><a name="index-momit_002dleaf_002dframe_002dpointer-1349"></a>Don't keep the frame pointer in a register for leaf functions. This |
| avoids the instructions to save, set up and restore frame pointers and |
| makes an extra register available in leaf functions. The option |
| <samp><span class="option">-fomit-frame-pointer</span></samp> removes the frame pointer for all functions,
| which might make debugging harder.
| |
| <br><dt><code>-mtls-direct-seg-refs</code><dt><code>-mno-tls-direct-seg-refs</code><dd><a name="index-mtls_002ddirect_002dseg_002drefs-1350"></a>Controls whether TLS variables may be accessed with offsets from the |
| TLS segment register (<code>%gs</code> for 32-bit, <code>%fs</code> for 64-bit), |
| or whether the thread base pointer must be added. Whether or not this |
| is legal depends on the operating system, and whether it maps the |
| segment to cover the entire TLS area. |
| |
| <p>For systems that use GNU libc, the default is on. |
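| <p>As an illustration, the sketch below declares a thread-local variable using the GNU <code>__thread</code> storage class. The names <code>call_count</code> and <code>bump_call_count</code> are hypothetical. Accesses of this kind are the ones whose addressing this option controls.

```c
/* Hypothetical per-thread counter; each thread gets its own copy. */
static __thread int call_count;

/* Increment and return the calling thread's counter. With direct
   segment references enabled, the access may be encoded as an
   offset from %fs (64-bit) or %gs (32-bit) without first loading
   the thread base pointer. */
int bump_call_count(void)
{
    return ++call_count;
}
```

| <p>Comparing the assembly produced with <samp><span class="option">-mtls-direct-seg-refs</span></samp> and <samp><span class="option">-mno-tls-direct-seg-refs</span></samp> shows the difference in addressing.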
| |
| <br><dt><code>-msse2avx</code><dt><code>-mno-sse2avx</code><dd><a name="index-msse2avx-1351"></a>Specify that the assembler should encode SSE instructions with the VEX
| prefix. The option <samp><span class="option">-mavx</span></samp> turns this on by default.
| </dl> |
| |
| <p>These ‘<samp><span class="samp">-m</span></samp>’ switches are supported in addition to the above |
| on AMD x86-64 processors in 64-bit environments. |
| |
| <dl> |
| <dt><code>-m32</code><dt><code>-m64</code><dd><a name="index-m32-1352"></a><a name="index-m64-1353"></a>Generate code for a 32-bit or 64-bit environment. |
| The 32-bit environment sets int, long and pointer to 32 bits and |
| generates code that runs on any i386 system. |
| The 64-bit environment sets int to 32 bits and long and pointer |
| to 64 bits and generates code for AMD's x86-64 architecture. For
| Darwin only, the <samp><span class="option">-m64</span></samp> option turns off the <samp><span class="option">-fno-pic</span></samp> and
| <samp><span class="option">-mdynamic-no-pic</span></samp> options. |
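| <p>The type sizes described above can be checked with a small sketch such as the following. The helper names are hypothetical, and the expected values assume the environments described in this entry.

```c
#include <limits.h>

/* Width of int in bits: 32 under both -m32 and -m64. */
int int_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Width of long in bits: 32 under -m32, 64 under -m64. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Width of a data pointer in bits: 32 under -m32, 64 under -m64. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

| <p>Building the same source once with <samp><span class="option">-m32</span></samp> and once with <samp><span class="option">-m64</span></samp> (where both environments are installed) and printing these values shows the difference.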
| |
| <br><dt><code>-mno-red-zone</code><dd><a name="index-mno_002dred_002dzone-1354"></a>Do not use a so-called red zone for x86-64 code. The red zone is mandated
| by the x86-64 ABI; it is a 128-byte area beyond the location of the
| stack pointer that will not be modified by signal or interrupt handlers |
| and therefore can be used for temporary data without adjusting the stack |
| pointer. The flag <samp><span class="option">-mno-red-zone</span></samp> disables this red zone. |
| |
| <br><dt><code>-mcmodel=small</code><dd><a name="index-mcmodel_003dsmall-1355"></a>Generate code for the small code model: the program and its symbols must |
| be linked in the lower 2 GB of the address space. Pointers are 64 bits. |
| Programs can be statically or dynamically linked. This is the default |
| code model. |
| |
| <br><dt><code>-mcmodel=kernel</code><dd><a name="index-mcmodel_003dkernel-1356"></a>Generate code for the kernel code model. The kernel runs in the |
| negative 2 GB of the address space. |
| This model has to be used for Linux kernel code. |
| |
| <br><dt><code>-mcmodel=medium</code><dd><a name="index-mcmodel_003dmedium-1357"></a>Generate code for the medium model: The program is linked in the lower 2 |
| GB of the address space. Small symbols are also placed there. Symbols |
| with sizes larger than <samp><span class="option">-mlarge-data-threshold</span></samp> are put into |
| large data or bss sections and can be located above 2 GB. Programs can
| be statically or dynamically linked. |
| |
| <br><dt><code>-mcmodel=large</code><dd><a name="index-mcmodel_003dlarge-1358"></a>Generate code for the large model: This model makes no assumptions |
| about addresses and sizes of sections. |
| </dl> |
| |
| </body></html> |
| |