Some articles

A series of interesting articles...


Fortran vs. C++ - good reading about aliasing

(original available at oon-list.archiv)

>> Forum: comp.lang.fortran
>> Thread: Fortran vs. C++ - good reading
>> Message 1 of 124
Subject: Fortran vs. C++ - good reading
Date: 03/23/2000
Author: John Molitor <molitor@stat.missouri.edu>

Hello,
 
For those who are interested, this post from Object Oriented Numerics List
discusses Fortran vs. C++.  The Pooma and Blitz libraries (previously discussed on the Fortran news group) are discussed and compared with Fortran. Note that these posts are written by authors of C++ libraries, so they may be slanted in favor of C++.  Nevertheless, it is conceded that Fortran has some definite advantages.
 
John
-------------------------------------------------------------
oon-digest          Thursday, March 23 2000          Volume 01 : Number 057
 
 
 
* In this issue:
OON: Comparison between Fortran & C++
RE: OON: Comparison between Fortran & C++
Re: OON: Comparison between Fortran & C++
Re: OON: Comparison between Fortran & C++
RE: OON: Comparison between Fortran & C++
Re: OON: Comparison between Fortran & C++
RE: OON: Comparison between Fortran & C++
Re: OON: Comparison between Fortran & C++
Re: OON: Comparison between Fortran & C++
RE: OON: Comparison between Fortran & C++
Re: OON: Aliasing (was: Comparison between Fortran & C++)
 
----------------------------------------------------------------------
 
Date: Wed, 22 Mar 2000 09:40:13 -0700 (MST)
From: Steve Karmesin <karmesin@lanl.gov>
Subject: OON: Comparison between Fortran & C++
 
This is the holy war of holy wars.  Let me try and start it off rationally.
 
I've spent the last few years working on the POOMA framework, using every trick in the book -- and many that aren't in any books -- to build general purpose scientific class libraries for high performance and parallelism.  We're using expression templates, lazy evaluation, cache optimizations, loop reorderings, the whole nine yards. My team and I have worked with other folks, like Todd Veldhuizen of Blitz++ and Jeremy Siek of MTL, and while I obviously can't speak for them, the opinions below reflect conversations with them and I'm pretty confident of them.
 
The basic deal is that the languages are different.  They were conceived with different goals in mind and have different capabilities as a result.  Modern C++ can do a lot of the things that a decade ago only made sense in FORTRAN, and modern FORTRAN can do some things that a decade ago didn't make any sense at all in FORTRAN.
 
There are many syntax differences.  FORTRAN has many ugly aspects, and C++ (particularly with templates) has its warts.  To my eye C++ has fewer ugly syntax things and is more convenient to do complex things in.  To some degree that is a matter of taste and the task you have to do.
 
There are more substantive differences.  Given the forum I'll hit the things FORTRAN does well first since I think those are less well understood.
 
If the only data structure you need is flat multidimensional arrays, FORTRAN does a very good job out of the box.  With FORTRAN 90 you get array syntax (i.e. you can say A=B+C*D for whole arrays), which is very convenient and very difficult to do well in C++ (trust me on this).  It is easy to match FORTRAN 77 arrays in C++, but FORTRAN 90 arrays are difficult to match.  Not completely impossible, but it really is a Don't Try This At Home Kids(tm) kind of thing.  At this point though you can pick up POOMA, Blitz or the MTL to do this stuff for you in C++, depending on the mix of features you're looking for.
 
There is a semantic difference in passing arrays to subroutines in FORTRAN compared to C++: In FORTRAN you are guaranteeing that they are not aliased.  This can be a really big deal for the optimizer when doing numerically intensive work, and is the primary remaining factor that makes many FORTRAN programs faster than the equivalent C++ programs.  There are proposals to improve this situation in C/C++ (primarily the restrict keyword at this point), but they are spottily implemented.  The alternative to something like restrict is a godlike global optimizer that can figure out the aliasing, but that is way in the future for anything other than toy problems at this point.  The bottom line, sadly enough, is that given comparably smart optimizers, under circumstances that are pretty common in scientific computing, the C/C++ equivalent to a FORTRAN program will often not run as fast.  Depending on the architecture it may not be a big difference, but it can be up to a factor of two on conventional workstations, and much larger than that on vector machines.
 
That is something FORTRAN does well, and we need to be rational about it and not be bigots.
 
If we grant that modern FORTRAN does arrays better, we then have to go to the things that C++ does better.
 
C++ is vastly easier to use for making extensible libraries of all kinds.  That includes inheritance hierarchies (which you really can't do in F90) and generic libraries (templates, which you can't do at all in F90).  This is a really big deal, particularly for large projects.  You can't do anything like the STL in FORTRAN.
 
With F77, if it does what you need out of the box, you're golden.  If you need 5% more, you're dead.  There is a cliff at the edge of arrays and it is a long way down.
 
With F90, if you need more, then in some cases you can put together the data structures to do it.  Those data structures are nowhere near as flexible or powerful as in C++, but you have more tools available than with F77, and the arrays are convenient out of the box.
 
C++ is a Swiss Army Knife(tm) with a vast number of tools in it.  By picking the appropriate tools you can do everything from device drivers to databases to nuclear weapon simulations.  It is also true that if you pick inappropriate tools you will cut your hands off before you know what happened.  And there are so many tools and so many completely different programming styles possible in C++ that it is difficult to understand the tools well enough to keep from cutting yourself.
 
Personally, I'll take that risk instead of the risk of there not being any feasible way to do what I want, which is more likely for many many problems with a language like FORTRAN.
 
-Steve Karmesin
 
p.s. Please folks, no jihads.
 
Cesare Padovani writes:
> Could someone tell me where I can find a comparison between Fortran & C++.
>
> Best regards. Cesare Padovani.
 
------------------------------
 
Date: Thu, 23 Mar 2000 09:00:34 -0800 (PST)
From: Geoffrey Furnish <furnish@actel.com>
Subject: RE: OON: Comparison between Fortran & C++
 
One tiny point of clarification:
 
Steve Karmesin writes:
> There is a semantic difference in passing arrays to subroutines in
> FORTRAN compared to C++: In FORTRAN you are guaranteeing that they are
> not aliased.
 
Please someone correct me if what I say next is wrong, but I don't think this is really the right way to say it.  My understanding is that the language /assumes/ array parameters to subroutines are unaliased, and allows the compiler to optimize accordingly.  It is actually a fairly common practice in FORTRAN to do purposeful aliasing in which you effectively redimension an array across a subroutine call site.  For example, you may have a 3-dimensional array in the caller, and pass one element of it to the subroutine, which receives it as a 2-d array, thus taking a slice.  Sometimes people actually do this stunt with overlapping regions, and think they'll be safe because they know how the subroutine walks the data.  But I have a good friend who lost about 2 weeks once, tracking down a bug that turned out to be a case where Cray FORTRAN made very effective use of the presumption of nonaliasing, and was able to reorder the subroutine's code so that it would be faster, but visited the elements in a manner that was different than he expected, and so the computed output was tragically wrong, as a result of "lying" about the non-aliasing.  In other words, FORTRAN doesn't guarantee that arrays are unaliased, it assumes it.  You'd better make sure you don't violate that assumption, and yet there are common programming styles which do so routinely.
 
--
Geoffrey Furnish           Actel Corporation           furnish@actel.com
Senior Staff Engineer      955 East Arques Ave         voice: 408-522-7528
Placement & Routing        Sunnyvale, CA  94086-4533   fax:   408-522-8041
 
"... because only those who write the code truly control the project."
 
------------------------------
 
Date: Thu, 23 Mar 2000 11:26:36 -0600 (CST)
From: Gary Kedziora <kedziora@chem.nwu.edu>
Subject: Re: OON: Comparison between Fortran & C++
 
Steve Karmesin writes:
> There is a semantic difference in passing arrays to subroutines in
> FORTRAN compared to C++: In FORTRAN you are guaranteeing that they are
> not aliased.  This can be a really big deal for the optimizer when doing
> numerically intensive work, and is the primary remaining factor that makes
> many FORTRAN programs faster than the equivalent C++ programs.  There
> are proposals to improve this situation in C/C++
...
 
In principle C++ should be just as fast as FORTRAN (or just as good at optimizing), right?  Which C++ vendors are interested in this?
 
I would like to understand Steve Karmesin's point quoted above better. I've seen similar statements about aliasing in the OON literature but have not seen an in-depth discussion or references to an in-depth discussion.  Can someone point out some references?
 
Why does C++ use this aliasing?  What is hard about removing it?
 
 
____
Gary Kedziora // Department of Chemistry // Northwestern University
email: kedziora@chem.nwu.edu // ph: (847)467-4857 // fax: (847)491-7713
 
------------------------------
 
Date: Thu, 23 Mar 2000 11:53:38 -0600
From: Jim Phillips <jim@ks.uiuc.edu>
Subject: Re: OON: Comparison between Fortran & C++
 
Geoffrey Furnish wrote:
>
> One tiny point of clarification:
>
> Steve Karmesin writes:
>  > There is a semantic difference in passing arrays to subroutines in
>  > FORTRAN compared to C++: In FORTRAN you are guaranteeing that they are
>  > not aliased.
>
> Please someone correct me if what I say next is wrong, but I don't
> think this is really the right way to say it.  [...]  In other
> words, FORTRAN doesn't guarantee that arrays are unaliased, it assumes
> it.  You'd better make sure you don't violate that assumption, and yet
> there are common programming styles which do so routinely.
 
I think Steve's comment should be interpreted as "In FORTRAN you (the programmer) are guaranteeing (to the compiler) that they (arrays) are not aliased (and hence the compiler will make this assumption when optimizing)."
 
-Jim
 
------------------------------
 
Date: Thu, 23 Mar 2000 11:24:00 -0700
From: ferrell@cpca.com
Subject: RE: OON: Comparison between Fortran & C++
 
Geoffrey Furnish writes:
> One tiny point of clarification:
>
> Steve Karmesin writes:
>  > There is a semantic difference in passing arrays to subroutines in
>  > FORTRAN compared to C++: In FORTRAN you are guaranteeing that they are
>  > not aliased.
>
> Please someone correct me if what I say next is wrong, but I don't
> think this is really the right way to say it.  My understanding is
> that the language /assumes/ array parameters to subroutines are
> unaliased, and allows the compiler to optimize accordingly.
 
The FORTRAN standard is clear that aliasing is not allowed.  The standard also does not allow implicit reshaping of arrays across subroutine boundaries.  The way to alias data is through use of pointers.  The way to reshape arrays is through intrinsics such as RESHAPE.  Use of pointers inhibits some optimizations, as it should.  RESHAPE may or may not be efficient (e.g. logical reshaping, while leaving the data unchanged in memory).

Like all other languages, users are not required to write standard-conforming code.  In general, the standard does not describe the behavior of non-conforming programs.  I'm not sure what you are driving at by saying that "FORTRAN doesn't guarantee that arrays are unaliased, it assumes it".  The compiler doesn't guarantee it, but the standard does.  The relevance to optimizing compilers is that FORTRAN compilers may make assumptions about aliasing, just as they make other assumptions about FORTRAN code.  What is lacking in C++ is a standard way to tell the compiler that its worries about aliasing are unfounded.  Of course, use of such a pragma would not (need not) cause the compiler to guarantee that arrays weren't aliased.  As always, users are free to make mistakes and write non-conforming code.  Even though FORTRAN users guarantee there is no aliasing, sometimes they (unintentionally?) lie.
 
-robert
 
------------------------------
 
Date: Thu, 23 Mar 2000 13:39:33 -0500 (EST)
From: jsiek@lsc.nd.edu
Subject: Re: OON: Comparison between Fortran & C++
 
To summarize, C++ is safe but slower (due to aliasing) and Fortran is dangerous and fast.  They are both sub-optimal.  Neither language allows the user to specify the aliasing behaviour; instead they make assumptions.  Something like C's restrict keyword is a step in the right direction; however, the restrict pointer is not a terribly good match for modern C++ and generic libraries, since they don't deal with pointers, but instead with the more abstract iterator.  Last fall I began working on several different notions of how to abstract the idea of a restrict pointer...  then things here at SGI got a little crazy :(
 
 
Cheers,
 
Jeremy
 
P.S. Note that in C++ you can regain most of the lost performance by doing
some of the optimizations yourself (like loop unrolling), but that of course is a pain.
 
------------------------------
 
Date: Thu, 23 Mar 2000 13:52:57 -0500 (EST)
From: jsiek@lsc.nd.edu
Subject: RE: OON: Comparison between Fortran & C++
 
A standard-sanctioned assumption is still an assumption ;)  Some assumptions are worse than others... assumptions that cannot be checked at compile time are in the "worse" category.
 
ferrell@cpca.com writes:
> The FORTRAN standard is clear that aliasing is not allowed.  The
> standard also does not allow implicit reshaping of arrays across
> subroutine boundaries.  The way to alias data is through use of
> pointers.  The way to reshape arrays is through intrinsics such as
> RESHAPE.  Use of pointers inhibits some optimizations, as it should.
> RESHAPE may or may not be efficient (e.g. logical reshaping, while
> leaving the data unchanged in memory).
>
> Like all other languages, users are not required to write standard
> conforming code.  In general, the standard does not describe the
> behavior of non-conforming programs.  I'm not sure what you are
> driving at by saying that "FORTRAN doesn't guarantee that arrays are
> unaliased, it assumes it".  The compiler doesn't guarantee it, but the
> standard does.  The relevance to optimizing compilers is that FORTRAN
> compilers may make assumptions about aliasing, just as they make other
> assumptions about FORTRAN code.  What is lacking in C++ is a standard
> way to tell the compiler that its worries about aliasing are
> unfounded.  Of course, use of such a pragma would not (need not) cause
> the compiler to guarantee that arrays weren't aliased.  As always,
> users are free to make mistakes and write non-conforming code.  Even
> though FORTRAN users guarantee there is no aliasing, sometimes they
> (unintentionally?) lie.
 
------------------------------
 
Date: Thu, 23 Mar 2000 11:54:44 -0700
From: ferrell@cpca.com
Subject: Re: OON: Comparison between Fortran & C++
 
Gary Kedziora writes:
>
> Steve Karmesin writes:
> > There is a semantic difference in passing arrays to subroutines in
> > FORTRAN compared to C++: In FORTRAN you are guaranteeing that they are
> > not aliased.  This can be a really big deal for the optimizer when doing
> > numerically intensive work, and is the primary remaining factor that makes
> > many FORTRAN programs faster than the equivalent C++ programs.  There
> > are proposals to improve this situation in C/C++
> ...
>
> In principle C++ should be just as fast as FORTRAN (or just as
> good at optimizing), right?  Which C++ vendors are interested in this?
>
> I would like to understand Steve Karmesin's point quoted above better.
> I've seen similar statements about aliasing in the OON literature
> but have not seen an in-depth discussion or references to an in-depth
> discussion.  Can someone point out some references?
>
> Why does C++ use this aliasing?  What is hard about removing it?
 
Aliasing shows up in ANSI C, and is not intrinsically associated with OO programming.  It shows up because it is damn useful.
 
Here's an example:
 
        float a[100], *b;
        b = &a[1];
        do_the_nasty_math_thing(a, b, 99);

  void do_the_nasty_math_thing(float *a, float *b, int n)
  {
      int i;
      for (i = 0; i < n; i++) {
          b[i] = b[i] + a[i];
      }
      return;
  }
 
Given how b was assigned it would be quite incorrect of the compiler to assume that the computation of iteration i is independent of the computation of iteration i+1.  But an optimizing compiler quite likely would like to do that.
 
The FORTRAN standard dictates that arguments which refer to the same thing (e.g. &b[0] == &a[1]) may not be assigned in the subroutine. Thus a FORTRAN compiler could optimize the for() loop, and it would be a user error to call it with overlapping arguments, as I did above. There is no similar restriction on C users, so certain optimizations may not be performed by C (or C++) compilers.
 
- -robert
 
------------------------------
 
Date: Thu, 23 Mar 2000 19:24:19 GMT
From: Jacek Generowicz <jmg@ecs.soton.ac.uk>
Subject: Re: OON: Comparison between Fortran & C++
 
> Why does C++ use this aliasing?
 
Because it admits pointers. As soon as you have done that, you cannot guarantee that any given chunk of data isn't being referred to by some pointer or other, which has the potential to mess with those data.
 
> What is hard about removing it?
 
If you remove pointers from C, it would no longer be C (it would be much like FORTRAN with different syntax) and fans of the language might object . . .
 
Jacek
 
------------------------------
 
Date: Thu, 23 Mar 2000 12:54:44 -0700
From: ferrell@cpca.com
Subject: RE: OON: Comparison between Fortran & C++
 
jsiek@lsc.nd.edu writes:
>
> A standard sanctioned assumption is still an assumption ;) Some
> assumptions are worse than others... assumptions that can not be
> checked at compile time are in the "worse" category.
 
FORTRAN does not make any assumptions about programs.  The standard defines the behavior of conforming programs.  It does not define the behavior for non-conforming programs (except for a limited requirement to identify syntax errors and a few other conformance requirements).  That is an important and deliberate feature of the language.  For instance, some FORTRAN compilers have a flag which indicates that the code to be compiled uses array aliasing.  Such a compiler is still FORTRAN compliant: even though it also produces correct results for non-compliant programs, it produces correct results for standard-conforming programs.
 
Thus, it is worth realizing that FORTRAN is different from languages which do make assumptions about user code.  The standard is deliberately permissive, for better and for worse.  That is very much in accordance with standard mathematical terminology: 0/x is defined for x /= 0.  There is no assumption that x /= 0; the expression is simply not defined for that case.
 
-robert
 
------------------------------
 
Date: Thu, 23 Mar 2000 15:01:55 -0500 (EST)
From: Todd Veldhuizen <tveldhui@extreme.indiana.edu>
Subject: Re: OON: Aliasing (was: Comparison between Fortran & C++)
 
> I would like to understand Steve Karmesin's point quoted above better.
> I've seen similar statements about aliasing in the OON literature
> but have not seen an in-depth discussion or references to an in-depth
> discussion.  Can someone point out some references?
>
> Why does C++ use this aliasing?  What is hard about removing it?
 
Oh boy!  An essay question!  :-)
 
I am still figuring aliasing out, so consider all of this material under a "I may yet be ignorant"-type disclaimer.
 
Outline:

1.  What is aliasing
2.  Examples of optimizations which are hard/impossible because of aliasing
    (a) Reordering loads/stores
    (b) Loop-invariant code motion/PRE
    (c) Constant/copy propagation through the store
3.  Remedies
    (a) Alias analysis
    (b) Command-line options
    (c) restrict
    (d) valarray
    (e) Hardware solutions
 
 
1. What is aliasing
 
Variables are aliases if they refer to overlapping storage locations.  A simple example is:
 
int x = 0;
int* p = &x;
*p = 10;
int y = x + 1;
 
Both x and *p refer to the same storage location.
A naive compiler which did constant propagation without worrying about aliasing would see
 
int x = 0;
int y = x + 1;
 
and wrongly conclude that y == 1.
 
Aliasing is not unique to C++: it is also a problem in languages such as Java, Scheme, and Fortran 90.  Fortran 77 is actually the oddball.
 
Any compiler which wants to optimize C++ has to worry about aliasing. There are two types of aliasing information compilers gather:
 
  may-alias:  x and *p might refer to the same storage location(s)
  must-alias: x and *p definitely DO refer to the same storage location(s)
 
Performance problems are caused by may-alias information.  If the compiler cannot prove that pointers are not aliased, then it has to assume the worst.  Must-alias information is needed if the compiler wants to do e.g. constant/copy propagation through aliased pointers.
 
Looking at the above example, with may-alias information the compiler would say, x and *p might alias.  I don't know for certain that they DO alias, so the assignment to *p invalidates anything I might know about x.  The code after optimization would be:
 
int x = 0;
int* p = &x;
*p = 10;
int y = x + 1;
 
i.e. no changes, because the compiler doesn't know whether x and p actually alias.
 
With must-alias information, the compiler would say, x and *p definitely DO refer to the same store location.  Then it can do constant propagation despite the aliasing:
 
int x = 0;
int* p = &x;
*p = 10;
int y = 11;
 
 
2.  Examples of optimizations which are hard/impossible because of aliasing
 
Here are some examples of optimizations which are difficult/impossible due to possible aliasing:
 
 
(a) Reordering loads/stores
 
Compilers get good performance on numerical kernels by "software pipelining".  The basic idea is to exploit parallelism by partially unrolling the inner loop.  An example:
 
      void daxpy(int n, double a, double* x, double* y)
      {
          for (int i=0; i < n; ++i)
            y[i] = y[i] + a * x[i];
      }
 
After unrolling this 4 times we get:

      void daxpy(int n, double a, double* x, double* y)
      {
          for (int i = 0; i < n; i += 4)
          {
              y[i]   = y[i]   + a * x[i];
              y[i+1] = y[i+1] + a * x[i+1];
              y[i+2] = y[i+2] + a * x[i+2];
              y[i+3] = y[i+3] + a * x[i+3];
          }

          // cleanup code for (n % 4) != 0 here
      }
 
If there is no aliasing between x and y, the compiler can rearrange the loads and stores to get a good schedule.  Here's a hand-faked version to give the idea:
 
      // fake pipelined DAXPY loop
      double* xt = x;
      double* yt = y;
      for (int i = 0; i < n; i += 4)
      {
          double t0 = yt[0] + a * xt[0];
          double t1 = yt[1] + a * xt[1];
          yt[0] = t0;
          double t2 = yt[2] + a * xt[2];
          yt[1] = t1;
          double t3 = yt[3] + a * xt[3];
          xt += 4;
          yt[2] = t2;
          yt[3] = t3;
          yt += 4;
      }
 
This version overlaps the load/stores with floating-point computations, doing the same amount of work in fewer clock cycles.
 
BUT: if x and y might alias, then the compiler cannot rearrange the loads and stores, because writing y[0] could change the value of x[1] or x[2] or x[3].
 
In C++ the compiler has to assume that x and y might alias.  Hence the loads and stores cannot be rearranged.
 
The resulting performance loss is primarily a problem for in-cache problems: you can see a loss of 20-50%.  Aliasing tends to be less of a problem for out-of-cache problems: memory latency tends to overwhelm the benefits of software pipelining.
 
 
(b) Partial redundancy elimination
 
Partial redundancy elimination combines common subexpression elimination, loop-invariant code motion, and other stuff.

Here's a typical application: suppose we want to calculate the rank-1 update of a matrix:  A <- A + x x'  (where x' = x transpose).  We could code this in C++ as:
 
      void rank1Update(Matrix& A, const Vector& x)
      {
          for (int i=0; i < A.rows(); ++i)
            for (int j=0; j < A.cols(); ++j)
              A(i,j) += x(i) * x(j);
      }
 
For good performance, the compiler needs to keep x(i) in a register instead of reloading it constantly in the inner loop. However, the compiler can't be sure that the data in the matrix A doesn't overlap the data in the vector x -- there is a possibility that writing to A(i,j) could change the value of x(i) partway through. This might be impossible given the design of the Matrix and Vector classes, but the compiler doesn't know that. The remote possibility of aliasing forces the compiler to generate inefficient code.
 
 
(c) Constant and copy propagation through the store (heap/stack)
 
In C++ we often have lots of objects kicking around.  This leads to the well-known "abstraction penalty": if you work with objects (Iterator, Complex) instead of scalars (double) and pointers (double*), performance suffers.  Getting rid of this "abstraction penalty" requires doing constant and copy propagation through the store, to e.g. convert Iterators back into pointers, and eliminate the temporary objects.
 
Analyzing the store is greatly complicated by aliasing.  Every time you want to propagate something out of the store, you have to ask: could this region of the store have been modified through an aliased pointer?  Far too often the answer is yes, and you can't do any optimization.  For example:
 
void f(int* x)
{
  x[0] = 5;
  foo();
  double y = 3.0 * x[0];
}
 
Can I assume that y = 15.0?
Well, that depends on what happens in foo, and whether it could have access to the store locations pointed to by x.
 
3.  Remedies
 
(a) Alias analysis
 
In alias analysis, compilers automatically determine may-alias
and/or must-alias information.
 
Some forms of aliasing are ruled out by the ANSI C standard.
Type-based aliasing rules stipulate (for example) that float* and double* pointers will not alias.
 
Alias analysis is useful but not perfect:
- precise alias analysis is NP-hard.  This is true even for the
  crudest form of alias analysis, namely flow-insensitive local
  alias analysis (i.e. not interprocedural).  (See [3] below.)  So
  alias analysis is ALWAYS going to be approximate, not precise.
- ruling out aliasing of function arguments requires interprocedural,
  closed-program analysis, i.e. it is NOT compatible with the separate
  compilation model followed by most C++ compilers.  (For example, to
  solve the daxpy(..) aliasing problem described above, one would have
  to analyze a complete program and prove that there are no calls to
  daxpy(..) for which x and y alias.  This requires being able to
  examine the whole program at once.)  Some compilers, e.g. SGI, will
  do interprocedural alias analysis (see -IPA options).
- pointers frequently "escape" to where the analysis cannot follow.
  Example: invoking virtual functions, function pointers, or external
  functions requires making a worst-case assumption about aliasing.
  Also, in C++ you have to make aliasing decisions like: does x.y.w
  alias w.e.f[4].g?  Putting things into containers (e.g. list<T>) and
  pulling them out also kills any useful alias information you might
  have had about a particular T.
 
 
(b) Command-line options
 
Some compilers have options like "assume no aliasing of function parameters" which duplicate the behaviour of Fortran no-aliasing restrictions.
 
These can really blow up (as Geoffrey Furnish described) if your code happens to rely on aliasing being handled correctly.
 
 
(c) restrict
 
NCEG (Numerical C Extensions Group) designed a keyword restrict which is recognized by some C++ compilers (KAI, Cray, SGI, Intel, gcc).  See: http://www.lysator.liu.se/c/restrict.html
This has been accepted into the C9x standard, so the chances of it appearing in the next C++ standard are good, assuming people can agree on what it means in C++.
 
restrict is a modifier which tells the compiler there are no aliases for a store location.  For example:
 
  double* restrict a;
 
This means that there are no aliases for any store locations accessed through a.  Hence the compiler is free to rearrange load/store orderings and do other optimizations.
 
Unfortunately restrict does not extend well to C++ idioms such as iterators.  For example, I cannot say:
 
Iterator restrict iter = x.begin();
 
to mean that the data referred to by iter is not aliased by any other iterators.  Instead I would have to choose between:
  (a) making the pointer inside Iterator "restrict", meaning that I can
      never use two Iterators which refer to the same container
      simultaneously;
  (b) providing two versions of Iterator, one of which is alias-free
      and one of which is not, for example: Iterator and
      AliasFreeIterator.

Neither of these options is very appealing.  There is also some weirdness around restrict not being an "official" part of the type.  So Vector<double*> and Vector<double* restrict> would be the same template instance, as far as I understand.  I guess this would get sorted out by the standards committee.
 
(d) valarray
 
Restrict never made it into the ISO/ANSI C++ standard.  There is however this statement about valarray:
 
  2 The valarray array classes are defined to be free of
    certain forms  of aliasing, thus allowing operations
    on these classes to be optimized.
 
I don't know of any compiler which makes use of this information.
 
(e) Hardware solutions
 
The new IA-64 processor architecture has "data speculative loads" which may solve aliasing performance problems in hardware.  The idea is to allow loads to be optimistically moved before stores, despite possible aliasing.  The processor determines on the fly if a subsequent store invalidates the load, and cancels it.
 
Here's an example from the specs.  Code before optimization:
 
st8  [r4] = r12         // Store contents of r12 at location [r4]
ld8  r6 = [r8];;        // Load from location [r8] to r6
add  r5 = r6, r7;;
st8  [r18] = r5         // Store r5 at [r18]
 
IA-64's "data speculation" lets you move the load before the store. The load is split into two instructions: ld8.a, an "advanced load", and "ld8.c.clr" the "check load + clear":
 
ld8.a  r6 = [r8];;      // advanced load
st8    [r4] = r12
ld8.c.clr r6 = [r8]    // check load
add    r5 = r6, r7;;
st8    [r18] = r5
 
The load starts when the ld8.a instruction is reached.  When the check instruction ld8.c.clr is reached, if the store clobbered the advanced load, then the load is done a second time.  Otherwise it is a nop. The IA-64 has a special hardware table where it checks stores against advanced loads.
 
So it might solve the problem we have now of software pipelining loops in the presence of aliasing.  If there is aliasing, your code will "magically" run slower because of the check loads failing and re-issuing the loads.  If there isn't aliasing, the check loads will be no-ops.  I don't know whether the check loads introduce latency or muck up the scheduling.
 
 
More info:
[1] Steven S. Muchnick, Advanced Compiler Design & Implementation, Chapter 10 (Alias Analysis)
[2] NCEG restrict proposal: http://www.lysator.liu.se/c/restrict.html
[3] Susan Horwitz, "Precise Flow-Insensitive May-Alias Analysis is NP-Hard", http://www.acm.org/pubs/citations/journals/toplas/1997-19-1/p1-horwitz/
[4] restrict in C9x: http://www.accu.org/events/public/c_news.htm
[5] IA-64 Application Developer's Architecture Guide, Section 4.4.5: Data Speculation, http://developer.intel.com/design/ia-64/downloads/adag.htm
 
- --
Todd Veldhuizen              tveldhui@acm.org
Indiana Univ. Comp. Sci.      http://extreme.indiana.edu/~tveldhui/
 
------------------------------
 
End of oon-digest V1 #57
************************
 
--------------------- Object Oriented Numerics List
--------------------------
* To subscribe/unsubscribe: use the handy web form at
http://oonumerics.org/oon/
* If this doesn't work, please send a note to
owner-oon-list@oonumerics.org


Advice about floating point by Herman D. Knoble

(original post)

From: Herman D. Knoble <hdk@psu.edu>
Newsgroups: comp.lang.fortran
Subject: Re: ? How to have a FORTRAN function emulate flops for desired digits
Date: Thu, 07 Sep 2000 14:30:13 -0400
Organization: Penn State University Center for Academic Computing
Lines: 138
Message-ID: <e8jfrs4q76ofoa3qdcpppt9s7i09pfelg6@4ax.com>
References: <39B50161.C0D18525@yahoo.com>
Reply-To: hdk@psu.edu
NNTP-Posting-Host: hdknt.cac.psu.edu
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Authinfo-User: hdk@psu.edu
X-Newsreader: Forte Agent 1.8/32.548

One technique to illustrate to students how significant digits can affect
computations is to use examples which break down very quickly
and drastically. (Worse yet are ones which break down more subtly!)
 
Back in '93 John R. Ehrman (then at IBM Santa Teresa Labs) and
I gave back-to-back presentations at Share 75 (see the Proceedings of
Share 75) on this topic. John presented an excellent "Floating-point
Numbers and Arithmetic" tutorial showing the various models and
implementations and how arithmetic really breaks down. My presentation
there was "Reality of REAL Arithmetic's Really Wrong Results". The
following example is from that presentation and has been, for us, a
fairly good teaching example: you don't need to limit how many digits
participate in the model to make a point.

!
!=== Example 1: Numeric breakdown for IEEE Arithmetic
! (or worse yet for IBM 360 Floating-point arithmetic).
!    For example try: A=1000000., X=0.0000001
!    and see P computed as: 4534650. in Single Precision
!                       or: 4768372. in Double Precision.
!    Note that P is algebraically equal to 1.0 for all A and X<>0.
!
!  Penn State University Center for Academic Computing
!  H. D. Knoble, August 1979.
       REAL (KIND=1) P,A,X
       DO I=1,99999
         WRITE(6,*) 'Please enter A and X (or press Enter to Quit):'
         WRITE(6,*) '(Try values 10000., 0.0001, then 0.0001, 10000.)'
         READ(5,*,END=99) A,X
         IF (A.EQ.0 .OR. X.EQ.0) GOTO 99
         P=((A+X)**2 - A**2 - 2.*A*X)/X**2
         WRITE(6,100) P,A,X
100      FORMAT(' P=',G14.7,' A=',G14.7,' X=',G14.7)
       ENDDO
99     WRITE(6,*) '** Example 1 Ended **'
       PAUSE
       STOP
       END                                         

Interestingly enough, the bigger the "spread" between A and X, in general the
bigger the "wrong" answer. However, an essentially decent result is computed
even with a very large spread between A and X if the roles of A and X are
reversed (X large, A small). The problem here, of course, is subtracting nearly
equal quantities which are formed by converting decimal numbers that have
infinite binary expansions; when subtracting nearly equal quantities, the
most significant digits cancel during the subtraction, thus shifting the "noise"
toward the most significant digits. Then dividing by X**2 is like multiplying
the "noise" by a very large number (scaling it up).  "Really wrong results"
here illustrate, with very little arithmetic, so-called "significance loss".
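The same cancellation is easy to reproduce in C.  The sketch below (the function name p_of is mine, and the inputs are hard-coded rather than read from the terminal) evaluates the same P in single precision:

```c
#include <math.h>

/* P = ((A+X)^2 - A^2 - 2AX) / X^2 is algebraically 1 for any A and
 * nonzero X.  In single precision, however, X can be absorbed entirely
 * when added to a much larger A, the leading digits of (A+X)^2 and A^2
 * cancel, and the surviving rounding noise is scaled up enormously by
 * the division by X*X. */
float p_of(float a, float x)
{
    float s = a + x;                 /* x may vanish here entirely */
    return (s * s - a * a - 2.0f * a * x) / (x * x);
}
```

With A=10000. and X=0.0001 the returned value is many orders of magnitude away from 1.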

A second part to a student exercise studying such an example (like the one
above or one using the so-called "calculator formula" for Standard Deviation
which subtracts two sums that will be nearly equal for a small Standard Deviation)
is to make a set of "recommendations" to help avoid numerical computational
difficulties. My undergrad students, working as a team,  back in 1993 came up with:

1) Use proven (refereed or commercial) library programs that are as robust
as possible (give the right result or an error message if not). For example,
the IMSL and NAG mathematical and statistical libraries. Avoid blindly taking
numerical code from textbooks and magazines unless it is refereed or presented
by a proven expert in numerical computations.

2) Study (know) the possible domain of values that each program input 
quantity will take on and test each algorithm thoroughly, especially on
the limits of that domain.  (This of course is an art in itself).

3) Use at least Double Precision Arithmetic. (In more critical cases use
multiple or quadruple precision arithmetic; see http://www.nersc.gov/~dhbailey/
and  http://www.ozemail.com.au/~milleraj/ respectively).

4) For a given mantissa precision (computer) do not input or output numbers
in ASCII (or EBCDIC) format with more decimal digits than can be handled by the
conversion mapping algorithms for that precision (computer) unless you
document that that's what you are doing.

5) Use relative comparisons in place of absolute comparisons of real
quantities. (Also note that you can also use "fuzzy" comparisons. See
http://ftp.cac.psu.edu/pub/ger/fortran/hdk/eps.f90 and extfloor.for )

6) Avoid computations which are algebraically undefined (or singular) or which
develop results whose magnitudes approach or exceed machine limits. E.g.,
blindly setting results of underflows to zero can yield "Really Wrong Results".

7) Avoid mixing Fortran INTEGER and REAL quantities without knowing how they
will be compiled. E.g., X=X+I**(-J) does not increment REAL typed X except when
INTEGER typed I=1 or INTEGER typed J=0 and I<0.

8) Do not program a numerical model beyond your competence to either avoid
significance loss or monitor it. Be especially careful when less than a pre-specified
number of mantissa digits participate in a sum or when all but a few mantissa 
digits cancel during a subtraction. You are responsible for disasters caused
by other people using your code!

I'm sure your students can come up with more and better here, especially
along the ethics implied by the last sentence in (8) above.

                        Good luck with it.
                        Skip Knoble, Penn State


On Tue, 05 Sep 2000 22:21:22 +0800, Cheng Cosine <acosine@yahoo.com> wrote:

-|Hi ALL:
-|
-|  In FORTRAN, there are two types of real floating-point numbers:
-|single and double precision. Single precision has 6 significant
-|digits while double precision has 16 (when represented in base 10).
-|  From what I learned from some numerical analysis books, the
-|floating-point system is machine dependent. Say there are other
-|systems used on Crays and IBM mainframes that are other than the
-|IEEE SP/DP system.
-|  Now how can I write FORTRAN90 code that works with fewer
-|significant digits? Say it only works for 3, 4, or 10 digits?
-|  Thanks,
-|
-| by Cheng Cosine
-|   Sep/05/2k UT
-|


   Herman D. (Skip) Knoble, Research Associate
   Mailto:hdk@psu.edu
   Web: http://www.personal.psu.edu/hdk
   Center for Academic Computing
   Penn State University
     214C Computer Building
     University Park, PA 16802-2101
   Phone:+1 814 865-0818   Fax:+1 814 863-7049

The Munro post: how to trap FP exceptions

In July 1999, David H. Munro posted this terrific document. Note that the code reproduced below has been superseded; an up-to-date version can be found in the source code of yorick. However, as I have updated the copy below, it may still provide some useful information.

From: munro@icf.llnl.gov (David H. Munro)
Subject: SIGFPE delivery
Date: 28 Jul 1999 00:00:00 GMT
Message-ID: <873dy88du3.fsf@yorick.llnl.gov>
Content-Type: text/plain; charset=US-ASCII
X-Complaints-To: abuse@lll-winken.llnl.gov
X-Trace: lll-winken.llnl.gov 933269651 18025 128.115.36.163 (29 Jul 1999 17:34:11 GMT)
Organization: Lawrence Livermore National Laboratory
Mime-Version: 1.0 (generated by tm-edit 7.106)
Reply-To: munro@icf.llnl.gov
NNTP-Posting-Date: 29 Jul 1999 17:34:11 GMT
Newsgroups: comp.os.linux.development.system


Hello,

Like many other numerical code developers, I need to get SIGFPE
delivered to my programs as promptly as the hardware can do so.  This
is a crucial aid in debugging all numerical software, but in the case
of my yorick interpreter there are serious code correctness or
performance problems without SIGFPE delivery.

Unfortunately, there is a bug built into the ANSI C signal() and POSIX
sigaction() mechanisms on Linux and nearly every other operating
system: Setting a signal handler for SIGFPE has no effect because when
the system starts every program, it sets (in crt0) the exception mask
bits in the FPU control register so that SIGFPE will never be
delivered to the program.

For years I have wondered why the signal() and sigaction() functions
do not reset the FPE mask bits to deliver SIGFPE, since I cannot
imagine that any program that does not need SIGFPE delivered would
call signal() or sigaction() to set up a SIGFPE signal handler!
Perhaps the excuse for this misbehavior is that most FPUs require
detailed bits to be set to tell precisely which FPEs will generate
signals: The possibilities are zero divide, invalid operation,
overflow, underflow, inexact result, and (for some FPUs) denormalized
operand.  However, the vast majority of numerical applications care
about the first three and want the others ignored, so delivery of zero
divide, invalid operation, and overflow, and non-delivery of all the
rest is a very obvious default behavior for programs registering a
SIGFPE handler.

Here's my immediate problem:

A recent GNU libc release withdrew the __setfpucw function (or, more
correctly, made it a static function only callable by their own crt0
code!).  This reduces anyone who wants SIGFPE delivered to duplicating
the assembly code used by __setfpucw -- making Linux the only OS
requiring assembly code to perform this important function.  (The only
exception is the alpha Linux platform, which has adopted Digital's
ieee_set_fp_control() function.  See fpuset.c below.)

I was told by Cygnus support that the new fenv.h interface replaces
the __setfpucw functionality.  I have read the C9X draft standard
describing fenv.h very carefully, and it provides no means for setting
the FPE mask bits -- you can use the fenv.h interface only to set,
query, and test the bits which determine whether an exception
condition has occurred.  Numerical software requires that the FPU
actually generate an interrupt; it doesn't do any good to know
something happened long afterward.  I also spoke with a Livermore
representative on the C9X committee, who told me that delivery or
non-delivery of any signal is not regarded as a C language issue (by
way of an excuse for the fenv.h interface not providing this crucial
functionality).  Unless GNU libc extends the fenv.h interface, it is
useless.

Note: Most notably thanks to David Munro, glibc 2.2 and later extends 
      C99 with feenableexcept, fedisableexcept and fegetexcept declared
      in <fenv.h>.
      Previously, _FPU_SETCW declared in <fpu_control.h> could be used 
      to perform the same task.
  -- Arnaud Desitter (August 2001)
I suspect that the reason __setfpucw was withdrawn has to do with the
proliferation of Linux architectures.  Unfortunately, the absence of
any standard to replace it makes my programming job substantially
harder: I only have access to Intel Linux machines, so I have a hard
time testing code for my favorite OS anywhere else.  If any of you can
help me by building, testing, and correcting the enclosed test program
on non-Intel Linux platforms, I would be extremely grateful.

Any immediate help with the test program, or hope for a more rational
interface in the future will be deeply appreciated.

Dave Munro

------------------------------------------------------------------------

Here are two files: fpuset.c defines a function u_fpu_setup() which
turns on SIGFPE delivery on every platform where I have a guess about
how to do it.  fputest.c is a test driver for u_fpu_setup().  It works
by the unusual tactic of #including the fpuset.c code, so you only
need to compile fputest.c and run the resulting program to perform the
test.  The following lines should work:

gcc -g -O -I. fputest.c -o fputest
./fputest

The program should print

SIGFPE handling works properly

Any other message indicates some sort of serious problem for numerical
software.  For example (the default behavior under Linux/Intel):

$ gcc -g -O -I. -Ulinux -DFPU_IGNORE fputest.c -o fputest
$ ./fputest
SIGFPE handling failed on pass 1, 1./0.= inf
$ 

------8><---------fputest.c------8><-------fputest.c------8><-----------

/*
 * fputest.c -- test for SIGFPE delivery
 */
#ifdef linux
# if defined(i386)
#define FPU_GCC_I86
# elif defined(alpha)
#define FPU_ALPHA_LINUX
# elif defined(sparc)
#define FPU_GCC_SPARC
# elif defined(powerpc)
#define FPU_GCC_POWERPC
# elif defined(m68k)
#define FPU_GCC_M68K
# elif defined(arm)
#define FPU_GCC_ARM
# endif
#endif

#ifndef NO_FPUSET
#include "fpuset.c"
#endif

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

#include <setjmp.h>
static jmp_buf u_jmp_target;

extern void u_sigfpe(int sig);
extern double reciprocal(double x);
extern double quotient(double x, double y);
extern double powpow(double x, int n);
extern int decrement(int *ip);
static int always_true= 0;

int main(int argc, char *argv[])
{
  int i= 5;
  double zero= (double)(argc>1000);
  double huge= (argc>1000)? 2.0 : 1.e20;
  always_true= !(argc>1000);

  /* signal *ought* to be enough to get SIGFPE delivered
   * -- but it never is -- see README.fpu */
  signal(SIGFPE, &u_sigfpe);
#ifndef NO_FPUSET
  u_fpu_setup();
#endif

  setjmp(u_jmp_target);
  /* need to make sure that loop index i actually decrements
   * despite interrupt */
  while (decrement(&i)) {
    printf("SIGFPE handling failed on pass %d, 1./0.= %g\n", 5-i,
           reciprocal(zero));
    if (i) break;
  }
  if (!i) {
    i= 5;
    setjmp(u_jmp_target);
    while (decrement(&i)) {
      printf("SIGFPE handling failed on pass %d, 0./0.= %g\n", 5-i,
             quotient(zero, zero));
      if (i) break;
    }
    if (!i) {
      i= 5;
      setjmp(u_jmp_target);
      while (decrement(&i)) {
        printf("SIGFPE handling failed on pass %d, 10.^20480= %g\n", 5-i,
               powpow(huge, 10));
        if (i) break;
      }
      if (!i) {
        if (setjmp(u_jmp_target)) {
          puts("SIGFPE improperly generated on underflow");
          i= 11;
        } else {
          double x= powpow(1./huge, 10);
          if (x != 0.0)
            printf("SIGFPE handling works, but 10.^-20480= %g\n", x);
          else
            puts("SIGFPE handling works properly");
        }
      }
    }
  }
  exit(i? 2 : 0);
}

void u_sigfpe(int sig)
{
  if (sig==SIGFPE) {
    signal(SIGFPE, &u_sigfpe);
#ifndef NO_FPUSET
    u_fpu_setup();
#endif
    longjmp(u_jmp_target, 1);
  } else {
    puts("u_sigfpe called, but with bad parameter");
  }
  exit(1);
}

int decrement(int *ip)
{
  int i= *ip;
  if (always_true) i--;
  else i-= 2;
  *ip= i;
  return i>0;
}

double reciprocal(double x)
{
  return 1./x;
}

double quotient(double x, double y)
{
  return x/y;
}

extern double powpow(double x, int n)
{
  double y= x;
  while (n--) y*= y;
  return y;
}

------8><---------fpuset.c-------8><-------fpuset.c-------8><-----------

/*
 * fpuset.c -- set up FPU to trap floating point exceptions
 * - this is very non-portable, not covered by ANSI C, POSIX, or even C9X
 * - if you port to a new platform (eg- Ultrix) please contact the author
 */

extern void u_fpu_setup(void);

#if defined(FPU_DIGITAL) || defined(FPU_ALPHA_LINUX)

/* man pages: exception_intro, ieee */
# ifdef FPU_DIGITAL
#  include <machine/fpu.h>
# else
   extern void ieee_set_fp_control(long);
#  define IEEE_TRAP_ENABLE_INV 0x000002
#  define IEEE_TRAP_ENABLE_DZE 0x000004
#  define IEEE_TRAP_ENABLE_OVF 0x000008
# endif
void u_fpu_setup(void)
{
  ieee_set_fp_control(IEEE_TRAP_ENABLE_INV | IEEE_TRAP_ENABLE_DZE |
                      IEEE_TRAP_ENABLE_OVF);
}

#elif defined(FPU_GCC_I86)

/* see also: fpu_control.h or i386/fpu_control.h, __setfpucw function */
void u_fpu_setup(void)
{
  unsigned int fpucw= 0x1372;
  __asm__ ("fldcw %0" : : "m" (fpucw));
}
  /*
    Note: Starting glibc 2.2, one should use:
    #define _GNU_SOURCE 1
    #include <fenv.h>
    feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW);
      -- Arnaud Desitter (August 2001)
  */
#elif defined(FPU_GCC_POWERPC)

void u_fpu_setup(void)
{
  unsigned int tmp[2] __attribute__ ((__aligned__(8)));
  tmp[0] = 0xFFF80000; /* More-or-less arbitrary; this is a QNaN. */
  tmp[1] = 0xd0;
  __asm__ ("lfd 0,%0; mtfsf 255,0" : : "m" (*tmp) : "fr0");
}

#elif defined(FPU_GCC_SPARC)

void u_fpu_setup(void)
{
  unsigned int fpucw= 0xd400000;  /* the 4 is nonstandard arithmetic bit */
  __asm__ ("ld %0,%%fsr" : : "m" (fpucw));
}

#elif defined(FPU_GCC_M68K)

/* works on NeXT as well as m68k Linux */
void u_fpu_setup(void)
{
  unsigned int fpucw= 0x7400;
  __asm__ volatile ("fmove%.l %0, %!" : : "dm" (fpucw));
  /* includes bit to trap on signalling NaN (may affect libm behavior) */
}

#elif defined(FPU_GCC_ARM)

void u_fpu_setup(void)
{
  unsigned int fpucw= 0x70200;
  __asm__ ("wfs %0" : : "r" (fpucw));
  /* includes bit to trap on signalling NaN (may affect libm behavior) */
}

#elif defined(FPU_AIX)

/* man pages: fp_trap, fp_enable */
#include <fptrap.h>
void u_fpu_setup(void)
{
  fp_trap(FP_TRAP_FASTMODE);
  fp_enable(TRP_INVALID | TRP_DIV_BY_ZERO | TRP_OVERFLOW);
}

#elif defined(FPU_HPUX)

/* man pages: fpsetmask
 * library: -lm */
/* HPUX turns off FP_X_* without this (_INCLUDE_HPUX_SOURCE) */
#ifndef _HPUX_SOURCE
#define _HPUX_SOURCE 1
#endif
#include <math.h>
void u_fpu_setup(void)
{
  fpsetmask(FP_X_INV | FP_X_DZ | FP_X_OFL);  /* 0x1c */
  fpsetfastmode(1);    /* fast underflows */
}
  /*
    Note: Starting HP-UX 11.0, one should use:
    #ifndef _HPUX_SOURCE
    # define _HPUX_SOURCE 1
    #endif
    #include <fenv.h>
    fesettrapenable(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW);
      -- Arnaud Desitter (August 2001)
  */
#elif defined(FPU_IRIX)

/* man pages: handle_sigfpes, note lethal TRAP_FPE environment variable
 * library: -lfpe
 * note: earlier versions used get_fpc_csr/set_fpc_csr?, sys/fpu.h */
#include <sigfpe.h>
static int already_set= 0;
void u_fpu_setup(void)
{
  if (!already_set) {
    extern void u_sigfpe(int sig);  /* from handler.c (or fputest.c) */
    handle_sigfpes(_ON, _EN_OVERFL|_EN_DIVZERO|_EN_INVALID,
                   (void (*)())0, _REPLACE_HANDLER_ON_ERROR, &u_sigfpe);
    already_set= 1;
  }
}
  /*
    Another way to set the fp mask (without linking with -lfpe):
    #include <sys/fpu.h>
    union fpc_csr exp;
    exp.fc_word = 0;
    exp.fc_struct.en_invalid = 1;
    exp.fc_struct.en_divide0 = 1;
    exp.fc_struct.en_overflow = 1;
    set_fpc_csr(exp.fc_word);
      -- Arnaud Desitter (August 2001)
  */
#elif defined(FPU_SOLARIS)

/* man pages: fpsetmask
 *    Sun's -fnonstd compiler switch switches between __fnonstd.o
 *      and __fstd.o under Solaris, as far as I can tell.  Use FPU_IGNORE
 *        if you do this.  */
#include <ieeefp.h>
void u_fpu_setup(void)
{
  fpsetmask(FP_X_INV | FP_X_DZ | FP_X_OFL);
  /* this doesn't set the "nonstandard arithmetic" bit, which prevents
   * software emulation of IEEE gradual underflow
   * -- apparently no way to do this in libc (see FPU_GCC_SPARC) */
}
  /*
    Note: Sun offers an extension to C99: fex_set_handling.
      -- Arnaud Desitter (August 2001)
  */
#elif defined(FPU_SUN4)

/* man pages: ieee_handler
 *               nonstandard_arithmetic is undocumented, but rumored
 *               to be important to get rapid underflows
 * library: -lsunmath (under /usr/lang hierarchy)
 *          may also be in -lm (standard libm)?
 *   note: libsunmath.a is provided by Sun only if you purchase their
 *         compilers; if you are trying to compile with gcc on a SPARC
 *         architecture, try FPU_GCC_SPARC
 *    Sun's -fnonstd compiler switch buggers crt1.o under SunOS 4,
 *      as far as I can tell.  Use FPU_IGNORE if you do this
 *      (not possible with gcc?).  */
static int already_set= 0;
void u_fpu_setup(void)
{
  if (!already_set) {
    extern void u_sigfpe(int sig);  /* from handler.c (or fputest.c) */
    nonstandard_arithmetic();
    ieee_handler("set","common", &u_sigfpe);
    already_set= 1;
  }
}

#elif defined(FPU_UNICOS)

/* delivers SIGFPE by default, this just arranges to trap on
 * libm errors as well */
static int already_set= 0;
void u_fpu_setup(void)
{
  if (!already_set) {
    int flag= -1;
    libmset(&flag);
    already_set= 1;
  }
}

#elif defined(FPU_MSX86)

#include <float.h>
#include <signal.h>

void u_fpu_setup(void)
{
  _control87(EM_DENORMAL | EM_UNDERFLOW | EM_INEXACT, MCW_EM);
  /* With MS VC++, compiling and linking with -Zi will permit */
  /* clicking to invoke the MS C++ debugger, which will show */
  /* the point of error -- provided SIGFPE is SIG_DFL. */
  signal(SIGFPE, SIG_DFL);
}
  /*
  From f2c source.
    -- Arnaud Desitter (August 2001)
  */
#elif defined(FPU_IGNORE)

void u_fpu_setup(void)
{
}

#else

#error one of the FPU_* symbols must be defined

#endif

------8><------------------------8><----------------------8><-----------

$Id: Articles.html,v 1.32 2005/08/26 10:01:55 adesitter Exp $

Maintained by Arnaud Desitter.
© Arnaud Desitter 2000, 2002, 2003, 2005