Author: David Max MA DPhil (2020-05-07)
The Sieve of Eratosthenes
R is extremely powerful but sometimes the speed of compiled languages like Fortran is needed.
Routines written in Fortran and contained in a dynamic-link library or shared object library can be accessed from R scripts.
Described here is a very simple example of this approach.
Eratosthenes sieve: how the algorithm works
The Sieve of Eratosthenes is a famous method for finding prime numbers.
To find the primes in a series of integers 1..k, the method works by striking out every second number, then every third, every fifth, and so on, up to k.
The numbers left at the end are the primes.
Comparison between R and Fortran routines
The R-only script provided (see below) runs R code to search for primes up to 108.
A Fortran routine called from R
Another script does the same work but calls a Fortran routine FINDP5 (contained in the library ACRO.DLL).
Alternative versions of FINDP5 (FINDP3, FINDP4 etc) were also tried that differed in the way the inner loop (DO 30 ...) is exited.
Although the results are not shown here, these alternative routines were markedly slower than FINDP5.
A key requirement in the inner DO loop is to avoid the upper bound NVBOOL of two big vectors being exceeded. (R would almost certainly then crash). Also the index itself must not overflow its storage limit (4 bytes).
FINDP3 and FINDP4 used GOTO statements to avoid these pitfalls, perhaps interfering with gcc's loop optimisation code.
Overall, the results show how significant the effects of just one or two lines of code can be on performance.
Results
Results agree with Table 24.9 in Abramowitz & Stegun, which shows primes up to 105.
On 32-bit Windows it was tricky to get the scripts to complete because R struggled to allocate the large vectors required in this implementation.
Even with 4GB of RAM, whether or not the main loop ran to completion seemed to depend on what else (other than R) was also running.
(The explanation for this behaviour seems to be that Windows places some presumably system-related items right in the middle of the available physical address space, approximately halving the maximum size of arrays that can be put there by application programs).
The Fortran version is slightly slower than the R version over the first few cycles of the for() loop. Presumably there is some overhead associated with passing data to the library routine.
With larger kmax values, the Fortran version is much quicker.
The scripts can easily be modified to save the primes to a disk file if required, but note that creating the file is slow (filesize is around 50MB).
kmax | number of primes | times (elapsed) | |
---|---|---|---|
R | FINDP5 (F77) | ||
100 | 25 | 0.03 | 0.66 |
1000 | 168 | 0.00 | 0.64 |
104 | 1229 | 0.02 | 0.64 |
105 | 9592 | 0.14 | 0.64 |
106 | 78498 | 1.61 | 0.74 |
107 | 664579 | 17.22 | 1.79 |
108 | 5761455 | 181.89 | 13.36 |
Tests on a Linux workstation
The scripts were also run on a more modern Linux machine with i7 processor.
Results from the interpreted R code were first compared with those using R's byte-code compiler.
The Fortran routine FINDP5 was compiled into a shared object library.
Results are shown below.
kmax | number of primes | times (elapsed) | ||
---|---|---|---|---|
R | R, compiled | FINDP5 (F77) | ||
100 | 25 | 0.000 | 0.000 | 0.20 |
1000 | 168 | 0.002 | 0.000 | 0.18 |
104 | 1229 | 0.024 | 0.002 | 0.18 |
105 | 9592 | 0.28 | 0.02 | 0.18 |
106 | 78498 | 2.86 | 0.22 | 0.19 |
107 | 664579 | 29.71 | 2.16 | 0.46 |
108 | 5761455 | 312.54 | 21.98 | 3.31 |
Surprisingly, the interpreted R script ran appreciably slower than on the 3GHz Pentium (time to complete the primes search up to 108 was 312.54 seconds).
The byte-code compiler dramatically reduced execution to 21.98 seconds (7% of the raw script time).
Finally, on the Linux machine, the Fortran routine took just 3.31 seconds to complete the final cycle through the loop, a lot quicker than the equivalent time on the old Pentium (as expected).
The R scripts
1. R, interpreted-only
2. R, using byte code compiler
3. Calling the Fortran routine FINDP5
The Fortran routine FINDP5
References
[2] Daintith, J. & Nelson, R.D. 1989. THE PENGUIN DICTIONARY OF MATHEMATICS. Penguin Group, London.
[3] Abramowitz, M. & Stegun, I.A. 1965. Handbook of Mathematical Functions. Dover, New York.
[4] Matloff, N. 2011. THE ART OF R PROGRAMMING. No Starch Press, San Francisco.