Do a late CSE pass
author Simon Peyton Jones <simonpj@microsoft.com>
Fri, 1 Jun 2018 11:53:41 +0000 (12:53 +0100)
committer Simon Peyton Jones <simonpj@microsoft.com>
Mon, 4 Jun 2018 09:35:34 +0000 (10:35 +0100)
While investigating something else I found that a condition
was being re-evaluated in wheel-sieve1.  Why, when CSE should
find it?  Because the opportunity only showed up after
LiberateCase.

This patch adds a late CSE pass. Rather than give it an extra
flag, I do it when (cse && (spec_constr || liberate_case)), so
roughly speaking it happens with -O2.
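
As a rough illustration of the guard described above, here is a small standalone model. It is only a sketch: ToyPass, Flags and lateStages are hypothetical names invented for this example and are not GHC's types, and runWhen is re-defined locally on the assumption that it keeps the pass when the condition holds and substitutes a no-op otherwise.

```haskell
module Main where

-- Toy stand-in for a compiler pass; every name here is hypothetical.
data ToyPass = ToyCSE | ToySimpl String | ToyNothing
  deriving (Eq, Show)

-- Keep the pass when the condition holds, otherwise substitute a no-op.
runWhen :: Bool -> ToyPass -> ToyPass
runWhen True  pass = pass
runWhen False _    = ToyNothing

-- The three flags the guard looks at, modelled as plain booleans.
data Flags = Flags
  { cse          :: Bool
  , liberateCase :: Bool
  , specConstr   :: Bool
  }

-- Tail of the pipeline: an optional late CSE, then the final clean-up.
lateStages :: Flags -> [ToyPass]
lateStages fl =
  [ runWhen ((liberateCase fl || specConstr fl) && cse fl) ToyCSE
  , ToySimpl "final"
  ]

main :: IO ()
main = do
  -- All three flags on: the late CSE slot is filled.
  print (lateStages (Flags True True True))
  -- CSE on, but neither LiberateCase nor SpecConstr: the slot is a no-op.
  print (lateStages (Flags True False False))
```

Tying the late pass to flags that already exist means there is no new flag to document, and the pass costs nothing when neither LiberateCase nor SpecConstr has run.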

In any case, CSE is very cheap.

Nofib results are minor but in the right direction:

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
           anna          -0.1%     -0.0%     0.163     0.163      0.0%
          eliza          -0.1%     -0.4%     0.001     0.001      0.0%
           fft2          -0.1%      0.0%     0.087     0.087      0.0%
           mate          -0.0%     -1.3%     -0.8%     -0.8%      0.0%
      paraffins          -0.0%     -0.1%     +0.9%     +0.9%      0.0%
            pic          -0.0%     -0.1%     0.009     0.009      0.0%
   wheel-sieve1          -0.2%     -0.0%     -0.1%     -0.1%      0.0%
--------------------------------------------------------------------------------
            Min          -0.6%     -1.3%     -2.4%     -2.4%      0.0%
            Max          +0.0%     +0.0%     +3.8%     +3.8%    +23.8%
 Geometric Mean          -0.0%     -0.0%     +0.2%     +0.2%     +0.2%

compiler/simplCore/SimplCore.hs

index 8884636..d461b99 100644
@@ -321,6 +321,12 @@ getCoreToDo dflags
           (CoreDoPasses [ CoreDoSpecialising
                         , simpl_phase 0 ["post-late-spec"] max_iter]),
 
+        -- LiberateCase can yield new CSE opportunities because it peels
+        -- off one layer of a recursive function (concretely, I saw this
+        -- in wheel-sieve1), and I'm guessing that SpecConstr can too
+        -- And CSE is a very cheap pass. So it seems worth doing here.
+        runWhen ((liberate_case || spec_constr) && cse) CoreCSE,
+
         -- Final clean-up simplification:
         simpl_phase 0 ["final"] max_iter,