From Wikipedia, the free encyclopedia
Key features:
The compiler supports the OpenMP 3.0 standard for writing parallel programs. It also includes a modification of OpenMP called Cluster OpenMP, which allows applications written for OpenMP to run on clusters over MPI. The Intel C++ Compiler uses a frontend (the part of the compiler that parses the program being compiled) from the Edison Design Group; the same frontend is used by the SGI MIPSpro, Comeau C++, and Portland Group compilers. The compiler is widely used for compiling SPEC CPU benchmarks. Intel offers four product series that include the compiler:
The disadvantages of the Linux version of the compiler include partial incompatibility with the GNU extensions of the C language (supported by GCC), which can cause problems when compiling some programs.
Experimental variants
The following experimental versions of the compiler have been published:
Main flags
Introduction
In late 2003, Intel introduced version 8.0 of its compiler collection. The new compilers are designed to improve the performance of applications running on servers, desktops, and mobile systems (laptops, mobile phones, and PDAs) based on Intel processors. We are pleased to note that this product was created with the active participation of the Nizhny Novgorod Intel Software Development Center and Intel specialists from Sarov.
The new series includes Intel compilers for C++ and Fortran for Windows and Linux, as well as Intel compilers for C++ for Windows CE .NET. The compilers target systems based on the following Intel processors: Intel Itanium 2, Intel Xeon, Intel Pentium 4, Intel Personal Internet Client Architecture processors for mobile phones and PDAs, and the Intel Pentium M processor for mobile PCs (a component of Intel Centrino mobile technology).
The Intel Visual Fortran Compiler for Windows provides next-generation compilation technologies for high-performance computing. It combines the language functionality of Compaq Visual Fortran (CVF) with the performance improvements made possible by Intel's compilation and code-generation optimization technologies, and it simplifies the task of porting source code developed with CVF to the Intel Visual Fortran environment. This compiler is the first to implement CVF language features both on 32-bit Intel systems and on systems based on the Intel Itanium processor family running Windows. In addition, the compiler makes CVF language features available on Linux systems based on 32-bit Intel processors and the Intel Itanium processor family. In 2004 an extended version is planned: the Intel Visual Fortran Compiler Professional Edition for Windows, which will include the IMSL Fortran 5.0 Library developed by Visual Numerics, Inc.
"The new compilers also support Intel's upcoming processor, code-named Prescott, which provides new instructions to improve graphics and video performance, as well as other performance enhancements. They also support the new Mobile MMX(tm) technology, which similarly improves the performance of graphics, sound, and video applications on mobile phones and PDAs," said Alexei Odinokov, co-director of the Intel Software Development Center in Nizhny Novgorod. "These compilers provide application developers with a single set of tools for building new applications for wireless networks based on Intel architecture. The new Intel compilers also support Intel's Hyper-Threading Technology and the OpenMP 2.0 industry specification, which defines the use of high-level directives to control the flow of instructions in applications."
Among the new tools included with the compilers are Intel Code Coverage and Intel Test Prioritization. Together, these tools help accelerate application development and improve application quality by improving the software testing process.
The Code Coverage tool, when testing an application, provides complete details about the use of the application's logic and the location of the exercised areas in the application's source code. If changes are made to the application, or if a test does not cover the part of the application the developer is interested in, the Test Prioritization tool allows the selected area of program code to be checked.
The new Intel compilers come in a variety of configurations ranging from $399 to $1,499. They can be purchased today from Intel Corporation or from resellers around the world, a list of which is available at http://www.intel.com/software/products/reseller.htm#Russia.
Support for Prescott processors
Support for the Intel Pentium 4 (Prescott) processor in the eighth version of the compiler is as follows:
1. Support for the SSE3 instructions (also known as PNI, Prescott New Instructions). Three levels of support can be distinguished here:
a. Inline assembly. For example, the compiler will recognize the following use of an SSE3 instruction: _asm { addsubpd xmm0, xmm1 }. Users interested in low-level optimization can thus access the assembler instructions directly.
b. In the C/C++ compiler, the new instructions are available at a higher level than assembler inserts, namely through built-in (intrinsic) functions:
Built-in functions

Built-in function | Generated instruction
---|---
_mm_addsub_ps | addsubps
_mm_hadd_ps | haddps
_mm_hsub_ps | hsubps
_mm_moveldup_ps | movsldup
_mm_movehdup_ps | movshdup
_mm_addsub_pd | addsubpd
_mm_hadd_pd | haddpd
_mm_hsub_pd | hsubpd
_mm_loaddup_pd | movddup xmm, m64
_mm_movedup_pd | movddup reg, reg
_mm_lddqu_si128 | lddqu
The table shows the built-in functions and the corresponding assembler instructions from the SSE3 set. The same support exists for instructions from the MMX/SSE/SSE2 sets. This allows the programmer to perform low-level code optimization without resorting to assembly language programming: the compiler itself takes care of mapping the built-in functions to the corresponding processor instructions and of using registers optimally. The programmer can concentrate on creating an algorithm that makes effective use of the new instruction sets.
c. Automatic generation of the new instructions by the compiler. The previous two methods involve explicit use of the new instructions by the programmer, but the compiler is also able (with the appropriate options; see section 3 below) to generate SSE3 instructions automatically for C/C++ and Fortran code. For example, the optimized unaligned-load instruction (lddqu) can yield a performance gain of up to 40% (for example, in video and audio encoding tasks). Other SSE3 instructions give significant speedups in 3D graphics or in computational tasks that use complex numbers. For example, the graph in section 3.1 below shows that for the 168.wupwise application from the SPEC CPU2000 FP suite, the speedup from automatic generation of SSE3 instructions was about 25%; the performance of this application depends heavily on the speed of complex-number arithmetic.
2. Use of the microarchitectural advantages of the Prescott processor. When generating code, the compiler takes into account microarchitectural changes in the new processor. For example, some operations (such as integer shifts, integer multiplications, or conversions between different floating-point formats in SSE2) are faster on the new processor than on its predecessors (say, an integer shift now takes one processor cycle versus four on the previous Intel Pentium 4 core). More intensive use of such instructions yields significant application speedups.
Another example of a microarchitectural change is the improved store-forwarding mechanism (fast loading of data recently stored to memory): the store need not even reach the cache, since the data is served from an intermediate store buffer, which allows very fast subsequent access. This architectural feature makes it possible, for example, to perform more aggressive automatic vectorization of program code.
The compiler also takes into account the increased first- and second-level cache sizes.
3. Improved support for Hyper-Threading technology. This item is closely related to the previous one (microarchitectural changes and their use by the compiler). For example, the runtime library that supports the OpenMP industry specification has been optimized for the new processor.
Performance
Using the compilers is an easy and efficient way to take advantage of Intel processor architectures. Below, two ways of using the compilers are (very roughly) distinguished: a) recompiling programs, possibly with changed compiler settings; b) recompiling with changes both to compiler settings and to the source code, using compiler diagnostics to guide the optimizations and possibly employing other software tools (for example, profilers).

1.1 Optimizing programs by recompiling and changing compiler settings
Often, the first step in migrating to a new optimizing compiler is to use it with the default settings. The next logical step is to use options for more aggressive optimization. Figures 1, 2, 3, and 4 show the effect of switching to Intel compiler version 8.0 compared with other industry-leading products (-O2: default compiler settings; base: settings for maximum performance). The comparison is made on the 32-bit and 64-bit Intel architectures, using applications from SPEC CPU2000 as the test set.
Figure 1
Figure 2
Figure 3
Figure 4
Some of the options supported by the Intel compiler are listed below. (Hereinafter the options are given for the Windows family of operating systems; for the Linux family there are options with the same effect but possibly different names: for example, -Od and -QxK on Windows correspond to -O0 and -xK on Linux, respectively. More detailed information can be found in the compiler manual.)
Optimization level control: the options -Od (no optimizations; used for debugging programs), -O1 (maximum speed while minimizing code size), -O2 (optimization for execution speed; the default), and -O3 (the most aggressive speed optimizations; in some cases these can have the opposite effect, i.e. a slowdown; note that on IA-64 the use of -O3 leads to a speedup in most cases, while the positive effect on IA-32 is less pronounced). Examples of optimizations enabled by -O3 are loop interchange, loop fusion, loop distribution (the inverse of loop fusion), and software prefetching of data. A slowdown under -O3 is possible because the compiler chooses aggressive optimizations heuristically for a specific case without sufficient information about the program (for example, it may generate prefetch instructions for data used in a loop on the assumption that the loop executes many times, when in fact it has only a few iterations). Interprocedural optimization, profile-guided optimization, and various programmer "hints" (see section 3.2) can help in this situation.
Interprocedural optimization: -Qip (within a single file) and -Qipo (across several or all project files). This includes optimizations such as inlining of frequently used code (reducing the cost of function/procedure calls). It also supplies information to other optimization stages: for example, information about the upper bound of a loop (say, a compile-time constant defined in one file but used in many), or about data alignment in memory (many MMX/SSE/SSE2/SSE3 instructions work faster if their operands are aligned in memory on an 8- or 16-byte boundary). The analysis of memory allocation procedures (implemented or called in one of the project files) is propagated to the functions/procedures where that memory is used; this can let the compiler drop the conservative assumption that the data is not properly aligned (the assumption must be conservative when no additional information is available). Disambiguation, or data-aliasing analysis, is another example: in the absence of additional information, when the absence of overlap cannot be proven, the compiler starts from the conservative assumption that overlaps exist. Such a decision can hurt the quality of optimizations such as automatic vectorization on IA-32 or software pipelining (SWP) on IA-64. Interprocedural optimization can help the analysis prove the absence of memory overlaps.
Profile-guided optimization: includes three stages. 1) Generating instrumented code with the -Qprof_gen option. 2) Running the resulting code on representative data; during the run, information is collected about various characteristics of execution (for example, branch probabilities or the typical number of loop iterations). 3) Recompiling with the -Qprof_use option, which lets the compiler use the information collected in the previous step. Thus the compiler can rely not only on static estimates of important program characteristics but also on data obtained during a real run of the program. This can guide the subsequent choice of optimizations (for example, a more efficient placement in memory of different program branches, based on how often each branch was taken; or loop optimizations based on the typical number of iterations). Profile-guided optimization is especially useful when a small but representative data set can be selected for step 2 that illustrates the most typical future uses of the program well. In some subject areas the choice of such a representative set is quite feasible; for example, profile-guided optimization is used by DBMS developers.
The optimizations listed above are generic: the generated code will work on all processors of the family (in the case of the 32-bit architecture, on all of the following: Intel Pentium III, Pentium 4 including the Prescott core, and Intel Pentium M). There are also optimizations for a specific processor.
Processor-specific optimizations: -QxK (Pentium III; use of SSE instructions and microarchitecture specifics), -QxW and -QxN (Pentium 4; SSE and SSE2), -QxB (Pentium M; SSE and SSE2), and -QxP (Prescott; SSE, SSE2, and SSE3). Code generated with these options may not work on other members of the processor family (for example, -QxW code may execute an invalid instruction on a system based on an Intel Pentium III), or may not run with maximum efficiency (for example, -QxB code on a Pentium 4, due to microarchitectural differences). With these options it is also possible to use runtime libraries optimized for the specific processor and its instruction set. To verify that the code is actually running on the target processor, a dispatch mechanism (cpu-dispatch) is implemented: the processor is checked during program execution. Depending on the situation, this mechanism may or may not be activated. Dispatch is always used with the -Qax(K,W,N,P) option variations: two versions of the code are generated, one optimized for the specific processor and one generic, and the choice is made at run time. Thus, at the cost of increased code size, the program runs on all processors of the line and optimally on the target processor. Another option is to optimize the code for an earlier member of the line and use it on that and subsequent processors; for example, -QxN code can run on a Pentium 4 with either a Northwood or a Prescott core, with no increase in code size.
With this approach you get good, though not optimal, performance on a Prescott system (since SSE3 is not used and microarchitectural differences are not taken into account), along with optimal performance on Northwood. Similar options exist for IA-64 processors; at the moment there are two: -G1 (Itanium) and -G2 (Itanium 2; the default).
The graph below (Figure 5) shows the speedup (1 = no speedup) from using some of the optimizations listed above (namely, -O3 -Qipo -Qprof_use -Qx(N,P)) on a Prescott processor, compared with the default settings (-O2). In some cases -QxP gives an additional speedup over -QxN. The greatest speedup is achieved on the 168.wupwise application already mentioned in the previous section (due to intensive optimization of complex arithmetic with SSE3 instructions).
Figure 5
Figure 6 below shows how many times faster code with optimal settings runs compared with completely unoptimized code (-Od) on Pentium 4 and Itanium 2 processors. It can be seen that Itanium 2 depends much more on the quality of optimization. This is especially pronounced for floating-point (FP) computations, where the ratio is about 36x. Floating-point computation is a strong point of the IA-64 architecture, but care must be taken to use the most efficient compiler settings. The resulting performance gain repays the effort spent finding them.
Figure 6. Speedup from using the best optimization options, SPEC CPU2000
Intel compilers support the OpenMP industry specification for building multithreaded applications. Explicit parallelization (the -Qopenmp option) and automatic parallelization (-Qparallel) are supported. In the explicit mode, the programmer is responsible for the correct and efficient use of the OpenMP standard. With automatic parallelization, the compiler carries the additional burden of analyzing the program code; for this reason, automatic parallelization currently works effectively only on fairly simple code.
The graph in Figure 7 shows the speedup from explicit parallelization on an engineering (pre-production) sample system based on an Intel Pentium 4 (Prescott) processor with Hyper-Threading support: 2.8 GHz, 2 GB RAM, 8 KB L1 cache, 512 KB L2 cache. SPEC OMPM2001 was used as the test suite; it targets small and medium SMP systems, with memory consumption up to two gigabytes. The applications were compiled with Intel 8.0 C/C++ and Fortran using two sets of options, -Qopenmp -Qipo -O3 -QxN and -Qopenmp -Qipo -O3 -QxP, and with each set the applications were run with Hyper-Threading enabled and disabled. The speedup values in the graph are normalized to the performance of the single-threaded version with Hyper-Threading disabled.
Figure 7: Applications from the SPEC OMPM2001 suite on a Prescott processor
It can be seen that in 9 out of 11 cases, explicit parallelization with OpenMP gives a performance boost when Hyper-Threading technology is enabled. One application (312.swim) slows down; it is well known that this application depends heavily on memory bandwidth. As with SPEC CPU2000, wupwise benefits greatly from the Prescott optimizations (-QxP).
1.2 Optimizing programs with changes to the source code and using compiler diagnostics
In the previous sections we considered the influence of the compiler (and its settings) on code execution speed. Intel compilers, however, provide more opportunities for code optimization than just changing settings. In particular, the compilers let the programmer place "hints" in the program code that allow more efficient code to be generated. Below are some examples for C/C++ (similar facilities exist for Fortran, differing only in syntax).
#pragma ivdep (ivdep stands for "ignore vector dependencies") is placed before a loop to tell the compiler that there are no data dependencies inside it. This hint works when the compiler conservatively assumes, based on its analysis, that such dependencies may exist, while the author of the code knows they cannot arise; if the compiler can prove that a dependency does exist, the hint has no effect. With this hint the compiler can generate more efficient code: automatic vectorization on IA-32 (using vector instructions from the MMX/SSE/SSE2/SSE3 sets for C/C++ and Fortran loops; the technique is described, for example, in an article in the Intel Technology Journal) or software pipelining (SWP) on IA-64.
#pragma vector always forces the compiler to override its decision that vectorizing a loop is inefficient (both for automatic vectorization on IA-32 and for SWP on IA-64), a decision based on an analysis of the quantitative and qualitative characteristics of the work done at each iteration.
#pragma novector does the opposite of #pragma vector always.
#pragma vector aligned tells the compiler that the data used in the loop is aligned on a 16-byte boundary. This allows more efficient and/or more compact code to be generated (thanks to the absence of runtime checks).
#pragma vector unaligned does the opposite of #pragma vector aligned. It is difficult to expect a performance gain in this case, but you can count on more compact code.
#pragma distribute point is placed inside a loop so that the compiler can split (distribute) the loop at that point into several smaller ones. Such a hint can be used, for example, when the compiler fails to vectorize the original loop automatically (say, because of a data dependency that cannot be ignored even with #pragma ivdep), while each (or some) of the newly formed loops can be vectorized efficiently.
#pragma loop count (N) tells the compiler that the most likely number of iterations of the loop is N. This information helps it choose the most effective optimizations for the loop (for example, whether to unroll, whether to apply SWP or automatic vectorization, whether to use software prefetch instructions, and so on).
The "hint" __assume_aligned(p, base) tells the compiler that the memory region referenced by pointer p is aligned on a base = 2^n byte boundary.
This is far from a complete list of the various "hints" to the compiler that can significantly affect the efficiency of the generated code. The question arises: how do you determine that the compiler needs a hint?
First, you can use compiler diagnostics in the form of the reports it provides to the programmer. For example, with the -Qvec_reportN option (where N ranges from 0 to 3 and sets the level of detail) you can get an automatic vectorization report. The programmer can see which loops were vectorized and which were not; for the latter, the compiler reports the reasons why vectorization failed. Suppose the cause was a conservatively assumed data dependency: in that case, if the programmer is sure the dependency cannot occur, #pragma ivdep can be used. The compiler provides similar capabilities on IA-64 (comparable to -Qvec_reportN on IA-32) for monitoring the presence and effectiveness of SWP. In general, Intel compilers provide ample opportunities for diagnosing optimizations.
Second, other software products (such as the Intel VTune profiler) can be used to find performance bottlenecks in the code. The results of the analysis can help the programmer make the necessary changes.
You can also use the assembler code listing generated by the compiler for analysis.
Figure 8
Figure 8 above shows the step-by-step process of optimizing an application with the Intel compiler (and other Intel software products), in Fortran, for the IA-64 architecture. The example is the non-adiabatic 48-hour regional forecast scheme of the Russian Hydrometeorological Center (described, for example, in this article; the article mentions a computation time of about 25 minutes, but significant changes have occurred since it was written). The performance of the code on a Cray Y-MP system is taken as the starting point. The unmodified code with default compiler options (-O2) showed a 20% performance gain on a 4-way system based on 900 MHz Intel Itanium 2 processors. Applying more aggressive optimization (-O3) yielded a ~2.5x speedup without code changes, mainly due to SWP and data prefetching. Analysis using compiler diagnostics and the Intel VTune profiler revealed several bottlenecks. For example, the compiler did not software-pipeline some performance-critical loops, stating in its report that it assumed a data dependency; small changes to the code (the ivdep directive) helped achieve effective pipelining. Using the VTune profiler it was found (and the compiler report confirmed) that the compiler did not interchange nested loops for more efficient use of the cache, again because of conservative assumptions about data dependencies, and changes were made in the source code. As a result, a 4x speedup over the initial version was achieved. Explicit parallelization with OpenMP directives, followed by a move to a system with a higher clock frequency, reduced the computation time to under 8 minutes, more than a 16x speedup over the initial version.
Intel Visual Fortran
Intel Visual Fortran 8.0 uses a front end (the part of the compiler that converts the program from source text into the compiler's internal representation, which is largely independent of both the programming language and the target machine) based on CVF compiler technology, together with the Intel compiler components responsible for optimization and code generation.
Figure 9
Figure 10
Figures 9 and 10 show graphs comparing the performance of Intel Visual Fortran 8.0 with the previous version, Intel Fortran 7.1, and with other well-known compilers for this language running under the Windows and Linux families of operating systems. The comparison used benchmarks whose source code, conforming to the F77 and F90 standards, is available at http://www.polyhedron.com/. More detailed compiler-performance comparisons are available on the same site (Win32 Compiler Comparisons -> Fortran (77, 90) Execution Time Benchmarks and Linux Compiler Comparisons -> Fortran (77, 90) Execution Time Benchmarks): more compilers are shown there, and the geometric mean is given together with the individual results for each test.
In the previous issue of the magazine we discussed the products of the Intel VTune Performance Analyzer family: performance analysis tools that are deservedly popular with application developers. They make it possible to find the sections of application code that consume too much processor time, giving developers the opportunity to identify and eliminate the potential bottlenecks associated with such code and thereby speed up application development. Note, however, that application performance also depends largely on how effective the compilers used to build the application are and which hardware features they exploit when generating machine code.
The latest Intel C++ and Intel Fortran compilers for Windows and Linux provide up to 40% higher application performance on systems based on Intel Itanium 2, Intel Xeon, and Intel Pentium 4 processors, compared with existing compilers from other vendors, by exploiting features of these processors such as Hyper-Threading technology.
The code optimizations characteristic of this compiler family include: use of the stack for floating-point operations; interprocedural optimization (IPO); profile-guided optimization (PGO); data prefetching into the cache, which hides memory-access latency; support for processor-specific features of Intel processors (for example, the Intel Streaming SIMD Extensions 2 instructions specific to the Intel Pentium 4); automatic parallelization of code; generation of applications that run on several different processor types while being optimized for one of them; branch prediction; and extended support for working with execution threads.
Note that Intel compilers are used by such well-known companies as Alias/Wavefront, Oracle, Fujitsu Siemens, ABAQUS, Silicon Graphics, and IBM. According to independent testing by a number of companies, Intel compilers significantly outperform compilers from other manufacturers (see, for example, http://intel.com/software/products/compilers/techtopics/compiler_gnu_perf.pdf).
Below we look at some features of the latest versions of the Intel compilers for desktop and server operating systems.
Compilers for the Microsoft Windows platform
Intel C++ Compiler 7.1 for Windows
Intel C++ Compiler 7.1, released earlier this year, achieves a high degree of code optimization for the Intel Itanium, Intel Itanium 2, Intel Pentium 4, and Intel Xeon processors, as well as for the Intel Pentium M processor, which is part of Intel Centrino technology and is designed for mobile devices.
This compiler is fully compatible with the Microsoft Visual C++ 6.0 and Microsoft Visual Studio .NET development tools: it can be integrated into the corresponding development environments.
This compiler supports ANSI and ISO C/C++ standards.
Intel Fortran Compiler 7.1 for Windows
The Intel Fortran Compiler 7.1 for Windows, also released earlier this year, generates optimized code for Intel Itanium, Intel Itanium 2, Intel Pentium 4, Intel Xeon, and Intel Pentium M processors.
This compiler is fully compatible with the Microsoft Visual C++ 6.0 and Microsoft Visual Studio .NET development tools, that is, it can be integrated into the corresponding development environments. In addition, it allows 64-bit applications for operating systems running on Itanium/Itanium 2 processors to be developed in Microsoft Visual Studio on a 32-bit Pentium system, using the 64-bit Intel Fortran Compiler. When debugging code, the compiler allows the debugger for the Microsoft .NET platform to be used.
If you have Compaq Visual Fortran 6.6 installed, you can use the Intel Fortran Compiler 7.1 in its place, since the two compilers are compatible at the source-code level.
The Intel Fortran Compiler 7.1 for Windows fully complies with the ISO Fortran 95 standard and supports building and debugging mixed-language applications in C and Fortran.
Compilers for the Linux platform
Intel C++ Compiler 7.1 for Linux
Another compiler released at the beginning of the year, the Intel C++ Compiler 7.1 for Linux, achieves a high degree of code optimization for Intel Itanium, Intel Itanium 2, Intel Pentium 4, and Intel Pentium M processors. This compiler is fully compatible with the GNU C compiler at the source-code and object-module level, so applications built with GNU C (including under other operating systems such as SCO and early versions of Sun Solaris) can be migrated to it at no extra cost; it is also fully compatible with the gcc 3.2 compiler at the binary level. Finally, with the Intel C++ Compiler 7.1 for Linux you can even recompile the Linux kernel, after a few minor changes to its source code.
Intel Fortran Compiler 7.1 for Linux
The Intel Fortran Compiler 7.1 for Linux generates optimized code for Intel Itanium, Intel Itanium 2, Intel Pentium 4, and Intel Pentium M processors. It is fully compatible with the Compaq Visual Fortran 6.6 compiler at the source-code level, so applications created with Compaq Visual Fortran can be recompiled with it, improving their performance.
In addition, the compiler works with common developer utilities such as the emacs editor, the gdb debugger, and the make build utility.
Like its Windows counterpart, the Intel Fortran Compiler 7.1 for Linux fully complies with the ISO Fortran 95 standard and supports building and debugging mixed-language applications containing C and Fortran code.
It should be emphasized that a significant contribution to the creation of the Intel compilers listed above was made by specialists at Intel's Russian software development center in Nizhny Novgorod. More information about Intel compilers can be found on the Intel Web site at www.intel.com/software/products/.
The second part of this article will be devoted to Intel compilers that create applications for mobile devices.