2018-06-09, 13:21  #1
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×3^{2}×7×47 Posts 
Mlucas-specific reference thread
This thread is intended to hold only reference material specifically for Mlucas.
(Suggestions are welcome. Discussion posts in this thread are not encouraged. Please use the reference material discussion thread http://www.mersenneforum.org/showthread.php?t=23383. Off-topic posts may be moved or removed, to keep the reference threads clean, tidy, and useful.)
Download and setup information for Mlucas is located at http://www.mersenneforum.org/mayer/README.html
For Windows 10 or above, install WSL and a Linux distribution for WSL, add build-essential to the Linux distro, then in WSL Linux download the Mlucas source, extract the files, compile Mlucas, run self-tests, etc.
Windows 8.x or below is not supported: there is no build method for Mlucas V20.x. Earlier versions may be compiled using msys2 and then run as native Windows (single-threaded) applications, on Windows 7 for example.
For Linux, see the readme.
For MacOS or Android, see the readme.
Additional help may be found in the Mlucas subforum.
Table of contents
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2021-11-17 at 12:41 Reason: added Optimizing core count for fastest iteration time
2018-06-09, 13:49  #2
Save file format as described by ewmayer
As posted at http://www.mersenneforum.org/showpos...4&postcount=36 by ewmayer, except as updated in bold; first describing the V17.x format, then the planned V18 format additions:
Quote:
I note that the exponent p itself is in the file name, not in the contents. Also note that the Mlucas V19 PRP implementation is type 1 residues only.
Save file names are p<exponent>, q<exponent>, p<exponent>.10M, etc. For example, for M332220523: p332220523, q332220523, p332220523.10M, and eventually p332220523.20M and so on.
File sizes derived from the preceding are:
(No data yet on V19.1, V20, etc.)
Last fiddled with by kriesel on 2021-01-20 at 17:10 Reason: update for V19 PRP type etc
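The naming pattern described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 10M spacing of the periodic files is an assumption inferred from the .10M and .20M examples, not something confirmed from the Mlucas source.

```python
def savefile_names(exponent, checkpoints_millions=(10, 20)):
    """Build the save-file names described in the post for one exponent.

    checkpoints_millions is an assumption: the post only shows .10M and
    .20M as examples of the periodic files.
    """
    primary = f"p{exponent}"      # main save file
    secondary = f"q{exponent}"    # second save file
    periodic = [f"p{exponent}.{m}M" for m in checkpoints_millions]
    return [primary, secondary, *periodic]


def exponent_from_name(name):
    """Recover the exponent from a save-file name (it is not stored in the contents)."""
    stem = name.split(".", 1)[0]
    if stem[:1] not in ("p", "q") or not stem[1:].isdigit():
        raise ValueError(f"not a recognised save-file name: {name!r}")
    return int(stem[1:])


print(savefile_names(332220523))
```

Because the exponent lives only in the file name, renaming a save file effectively relabels the run, which is worth keeping in mind when copying files between machines.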

2019-08-12, 17:07  #3
Mlucas v17.1 -h help output
Code:
    Mlucas 17.1

    http://hogranch.com/mayer/README.html

INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 8.2.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: testing FFT radix tables...
For the full list of command line options, run the program with the -h flag.

 Mlucas command line options:

 Symbol and abbreviation key:
       <CR> :  carriage return
        |   :  separator for one-of-the-following multiple-choice menus
       []   :  encloses optional arguments
       {}   :  denotes user-supplied numerical arguments of the type noted.
               ({int} means nonnegative integer, {+int} = positive int, {float} = float.)
  -argument :  Vertical stacking indicates argument short 'nickname' options,
  -arg      :  e.g. in this example '-arg' can be used in place of '-argument'.

 Supported arguments:

 <CR>        Default mode: looks for a worktodo.ini file in the local
             directory; if none found, prompts for manual keyboard entry

 Help submenus by topic. No additional arguments may follow the displayed ones:
 -s            Post-build self-testing for various FFT-length rnages.
 -fftlen       FFT-length setting.
 -radset       FFT radix-set specification.
 -m[ersenne]   Mersenne-number primality testing.
 -f[ermat]     Fermat-number primality testing.
 -iters        Iteration-number setting.
 -nthread|cpu  Setting threadcount and CPU core affinity.

 *** NOTE: *** The following self-test options will cause an mlucas.cfg file containing
     the optimal FFT radix set for the runlength(s) tested to be created (if one did not
     exist previously) or appended (if one did) with new timing data. Such a file-write is
     triggered by each complete set of FFT radices available at a given FFT length being
     tested, i.e. by a self-test without a user-specified -radset argument.
     (A user-specific Mersenne exponent may be supplied via the -m flag; if none is specified,
     the program will use the largest permissible exponent for the given FFT length, based on
     its internal length-setting algorithm). The user must specify the number of iterations for
     the self-test via the -iters flag; while it is not required, it is strongly recommended to
     stick to one of the standard timing-test values of -iters = [100,1000,10000], with the larger
     values being preferred for multithreaded timing tests, in order to assure a decently large
     slice of CPU time. Similarly, it is recommended to not use the -m flag for such tests, unless
     roundoff error levels on a given compute platform are such that the default exponent at one or
     more FFT lengths of interest prevents a reasonable sampling of available radix sets at same.
        If the user lets the program set the exponent and uses one of the aforementioned standard
     self-test iteration counts, the resulting best-timing FFT radix set will only be written to the
     resulting mlucas.cfg file if the timing-test result matches the internally stored precomputed
     one for the given default exponent at the iteration count in question, with eligible radix sets
     consisting of those for which the roundoff error remains below an acceptable threshold.
        If the user instead specifies the exponent (only allowed for a single-FFT-length timing test)****************
     and/or a non-default iteration number, the resulting best-timing FFT radix set will only be written to the
     resulting mlucas.cfg file if the timing-test results match each other? ********* check logic here *******
     This is important for tuning code parameters to your particular platform.

     FOR BEST RESULTS, RUN ANY SELF-TESTS UNDER ZERO- OR CONSTANT-LOAD CONDITIONS

 -s {...}    Self-test, user must also supply exponent [via -m or -f] and/or FFT length to use.

 -s tiny     Runs 100-iteration self-tests on set of 32 Mersenne exponents, ranging from 173431 to 2455003
 -s t        This will take around 1 minute on a fast CPU..

 -s small    Runs 100-iteration self-tests on set of 24 Mersenne exponents, ranging from 173431 to 1245877
 -s s        This will take around 10 minutes on a fast CPU..

 **** THIS IS THE ONLY SELF-TEST ORDINARY USERS ARE RECOMMENDED TO DO: ******
 *                                                                          *
 * -s medium   Runs set of 24 Mersenne exponents, ranging from 1327099 to 9530803
 * -s m        This will take around an hour on a fast CPU.                 *
 *                                                                          *
 ****************************************************************************

 -s large    Runs set of 24 Mersenne exponents, ranging from 10151971 to 72851621
 -s l        This will take around an hour on a fast CPU.

 -s huge     Runs set of 16 Mersenne exponents, ranging from 77597293 to 282508657
 -s h        This will take a couple of hours on a fast CPU.

 -s all      Runs 100-iteration self-tests of all test Mersenne exponents and all FFT radix sets.
 -s a        This will take several hours on a fast CPU.

 -fftlen {+int}   If {+int} is one of the available FFT lengths (in Kilodoubles), runs all
             all available FFT radices available at that length, unless the -radset flag is
             invoked (see below for details). If -fftlen is invoked without the -iters flag,
             it is assumed the user wishes to do a production run with a non-default FFT length,
             In this case the program requires a valid worktodo.ini-file entry with exponent
             not more than 5% larger than the default maximum for that FFT length.
                  If -fftlen is invoked with a user-supplied value of -iters but without a
             user-supplied exponent, the program will do the specified number of iterations
             using the default self-test Mersenne or Fermat exponent for that FFT length.
                  If -fftlen is invoked with a user-supplied value of -iters and either the
             -m or -f flag and a user-supplied exponent, the program will do the specified
             number of iterations of either the Lucas-Lehmer test with starting value 4 (-m)
             or the Pe'pin test with starting value 3 (-f) on the user-specified modulus.
                  In either of the latter 2 cases, the program will produce a cfg-file entry based
             on the timing results, assuming at least one radix set ran the specified #iters to
             completion without suffering a fatal error of some kind.
             Use this to find the optimal radix set for a single FFT length on your hardware.

             NOTE: IF YOU USE OTHER THAN THE DEFAULT MODULUS OR #ITERS FOR SUCH A SINGLE-FFT-LENGTH
             TIMING TEST, IT IS UP TO YOU TO MANUALLY VERIFY THAT THE RESIDUES OUTPUT MATCH
             FOR ALL FFT RADIX COMBINATIONS AND THE ROUNDOFF ERRORS ARE REASONABLE!

 -radset {int}    Specific index of a set of complex FFT radices to use, based on the big
             select table in the function get_fft_radices(). Requires a supported value of
             -fftlen to also be specified, as well as a value of -iters for the timing test.

 -m [{+int}] Performs a Lucas-Lehmer primality test of the Mersenne number M(int) = 2^int - 1,
             where int must be an odd prime. If -iters is also invoked, this indicates a timing test.
             and requires suitable added arguments (-fftlen and, optionally, -radset) to be supplied.
                  If the -fftlen option (and optionally -radset) is also invoked but -iters is not, the
             program first checks the first line of the worktodo.ini file to see if the assignment
             specified there is a Lucas-Lehmer test with the same exponent as specified via the -m
             argument. If so, the -fftlen argument is treated as a user override of the default FFT
             length for the exponent. If -radset is also invoked, this is similarly treated as a user
             specified radix set for the user-set FFT length; otherwise the program will use the cfg file
             to select the radix set to be used for the user-forced FFT length.
                  If the worktodo.ini file entry does not match the -m value, a set of timing self-tests is
             run on the user-specified Mersenne number using all sets of FFT radices available at the
             specified FFT length.
                  If the -fftlen option is not invoked, the self-tests use all sets of
             FFT radices available at that exponent's default FFT length.
                  Use this to find the optimal radix set for a single given Mersenne number
             exponent on your hardware, similarly to the -fftlen option.
                  Performs as many iterations as specified via the -iters flag [required].

 -f {int}    Performs a base-3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1.
                  If desired this can be invoked together with the -fftlen option.
             as for the Mersenne-number self-tests (see notes about the -m flag;
             note that not all FFT lengths supported for -m are available for -f).
             Optimal radix sets and timings are written to a fermat.cfg file.
                  Performs as many iterations as specified via the -iters flag [required].

 -iters {int}   Do {int} self-test iterations of the type determined by the
             modulus-related options (-s/-m = Lucas-Lehmer test iterations with
             initial seed 4, -f = Pe'pin-test squarings with initial seed 3.
Last fiddled with by kriesel on 2019-11-18 at 14:34
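The two rigorous tests named in the help text (Lucas-Lehmer with starting value 4, Pe'pin with starting value 3) can be illustrated for small inputs. This is a minimal textbook sketch using Python big integers, not how Mlucas performs the arithmetic: Mlucas does the modular squarings with floating-point FFTs. The function names are made up for this example.

```python
def lucas_lehmer(p):
    """Lucas-Lehmer test: True if M(p) = 2**p - 1 is prime. p must be an odd prime."""
    m = (1 << p) - 1
    s = 4                          # starting value 4, as in the help text
    for _ in range(p - 2):         # p - 2 iterations of s -> s^2 - 2 (mod M(p))
        s = (s * s - 2) % m
    return s == 0


def pepin(n):
    """Base-3 Pe'pin test: True if F(n) = 2**(2**n) + 1 is prime. Requires n >= 1."""
    f = (1 << (1 << n)) + 1
    x = 3                          # starting value 3, as in the help text
    for _ in range((1 << n) - 1):  # 2**n - 1 squarings give 3**((F(n) - 1) / 2)
        x = (x * x) % f
    return x == f - 1              # prime exactly when the result is -1 (mod F(n))


print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if lucas_lehmer(p)])
print([n for n in range(1, 8) if pepin(n)])
```

Each loop pass is one modular squaring of a number about as large as the modulus, which is why the iteration count passed to -iters translates directly into run time.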
2020-05-20, 19:07  #4
Mlucas install script for Linux
Haven't tried it myself, but there's a post about one at https://mersenneforum.org/showpost.p...9&postcount=34
A second version of that is at https://mersenneforum.org/showpost.php?p=569920&postcount=83 and a third version is at https://mersenneforum.org/showpost.p...0&postcount=89
Last fiddled with by kriesel on 2021-02-01 at 19:46
2020-11-19, 22:32  #5
Mlucas builds for Linux (or for running on WSL on Windows)
How I built Mlucas (v19) in WSL / Ubuntu 18.04 for multiple processor types (rename the executable between builds to identify the flavor). Note these are mostly untested.
Basic x86-64, and presumably the best bet for Knight's Corner Xeon Phi:
Code:
gcc -c -O3 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
Code:
gcc -c -O3 -DUSE_SSE2 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
Code:
gcc -c -O3 -DUSE_AVX2 -mavx2 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
Code:
gcc -c -O3 -DUSE_AVX512 -march=knl -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
Code:
gcc -c -O3 -DUSE_AVX512 -march=skylake-avx512 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
https://www.mersenneforum.org/mayer/README.html
Attachments are Mlucas v19 builds intended for Linux and were built on Ubuntu v18.04 running on WSL / Win10 on an i7-8750H.
Last fiddled with by kriesel on 2021-08-31 at 23:52 Reason: minor edits
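Choosing among the build variants above can be sketched as a small selection function. This is a hedged illustration only: the helper names are hypothetical, and it assumes the Linux /proc/cpuinfo flag names avx512f, avx512er, avx2 and sse2, with avx512er taken as the marker of a Knights Landing part. The compile flags themselves are the ones listed in the post.

```python
def mlucas_compile_command(cpu_flags):
    """Return the compile step for the most capable variant the CPU flags allow.

    cpu_flags: iterable of flag names as they appear in /proc/cpuinfo
    (assumed names: 'avx512f', 'avx512er', 'avx2', 'sse2').
    """
    flags = set(cpu_flags)
    if "avx512f" in flags:
        # Assumption: avx512er is reported by Knights Landing, not by Skylake-X.
        march = "-march=knl" if "avx512er" in flags else "-march=skylake-avx512"
        simd = ["-DUSE_AVX512", march]
    elif "avx2" in flags:
        simd = ["-DUSE_AVX2", "-mavx2"]
    elif "sse2" in flags:
        simd = ["-DUSE_SSE2"]
    else:
        simd = []                  # basic scalar build
    return ["gcc", "-c", "-O3", *simd, "-DUSE_THREADS", "../src/*.c"]


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the first 'flags' line from a Linux cpuinfo file (empty list if none)."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("flags"):
                return line.split(":", 1)[1].split()
    return []


print(" ".join(mlucas_compile_command(["sse2", "avx", "avx2", "fma"])))
```

On a Linux or WSL system, `mlucas_compile_command(read_cpu_flags())` would pick the variant for the local CPU. The link step is the same for every variant, so only the compile step is generated here.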
2020-11-27, 15:01  #6
Mlucas builds for Windows
Building Mlucas v19 for Windows in msys2 is similar to building for Linux or WSL, except: remove -DUSE_THREADS and -lpthread for Windows single-threaded end use.
How I built or attempted in msys2 for Windows single-threaded environments:
SSE2 such as Xeon X5650, E5645, E5-26xx
Code:
gcc -c -O3 -DUSE_SSE2 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-sse2 *.o -lm -lrt
Code:
gcc -c -O3 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-x86 *.o -lm -lrt
Code:
gcc -c -O3 -DUSE_AVX2 -mavx2 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-fma3 *.o -lm -lrt
Code:
gcc -c -O3 -DUSE_AVX512 -march=skylake-avx512 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-avx512 *.o -lm -lrt
Attachments are single-threaded Mlucas v19 builds intended for Windows 7 or higher, and were built in msys2 running on Windows 7 Pro 64-bit on a dual-Xeon-E5645 HP Z600.
(Note: because of changes in the software requirements, Mlucas v20.x will no longer build with this method. So there is currently no documented path to producing actual Windows executables for Mlucas v20.x or, presumably, mfactor v20.x.)
Last fiddled with by kriesel on 2021-09-01 at 00:17 Reason: update for versions & limits
2021-01-20, 17:26  #7
V17.0 apparently normal run
From the beginning of a .stat file (V17.0 I think):
Code:
INFO: no restart file found...starting run from scratch.
M332220523: using FFT length 18432K = 18874368 8-byte floats.
 this gives an average 17.601676676008438 bits per digit
Using complex FFT radices 36 16 16 32 32
[Jul 22 13:31:39] M332220523 Iter# = 10000 [ 0.00% complete] clocks = 02:16:04.515 [816.4516 sec/iter] Res64: 1A313D709BFA6663. AvgMaxErr = 0.171224865. MaxErr = 0.250000000.
[Jul 22 15:47:01] M332220523 Iter# = 20000 [ 0.01% complete] clocks = 02:15:19.019 [811.9019 sec/iter] Res64: 73DC7A5C8B839081. AvgMaxErr = 0.171704563. MaxErr = 0.234375000.
[Jul 22 18:02:29] M332220523 Iter# = 30000 [ 0.01% complete] clocks = 02:15:25.632 [812.5633 sec/iter] Res64: B928CD22434EEC7C. AvgMaxErr = 0.171970012. MaxErr = 0.234375000.
[Jul 22 20:17:54] M332220523 Iter# = 40000 [ 0.01% complete] clocks = 02:15:22.185 [812.2186 sec/iter] Res64: 307ECB47139AEB31. AvgMaxErr = 0.172004342. MaxErr = 0.250000000.
[Jul 22 22:33:29] M332220523 Iter# = 50000 [ 0.02% complete] clocks = 02:15:32.470 [813.2471 sec/iter] Res64: 3F64ED9E01C13B1D. AvgMaxErr = 0.171687719. MaxErr = 0.218750000.
[Jul 23 00:49:15] M332220523 Iter# = 60000 [ 0.02% complete] clocks = 02:15:43.121 [814.3121 sec/iter] Res64: B238D7DE50AFACED. AvgMaxErr = 0.171868494. MaxErr = 0.281250000.
[Jul 23 03:04:36] M332220523 Iter# = 70000 [ 0.02% complete] clocks = 02:15:18.738 [811.8738 sec/iter] Res64: 892C20B5F5C4776C. AvgMaxErr = 0.171980529. MaxErr = 0.234375000.
[Jul 23 05:20:00] M332220523 Iter# = 80000 [ 0.02% complete] clocks = 02:15:20.844 [812.0844 sec/iter] Res64: 6374CC678224D058. AvgMaxErr = 0.171895016. MaxErr = 0.250000000.
[Jul 23 07:35:13] M332220523 Iter# = 90000 [ 0.03% complete] clocks = 02:15:10.622 [811.0622 sec/iter] Res64: 393DCD2788664405. AvgMaxErr = 0.172066525. MaxErr = 0.250000000.
[Jul 23 09:51:21] M332220523 Iter# = 100000 [ 0.03% complete] clocks = 02:16:05.131 [816.5132 sec/iter] Res64: 91B688264B5B3F39. AvgMaxErr = 0.171926060. MaxErr = 0.250000000.
Last fiddled with by kriesel on 2021-09-01 at 00:18
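For anyone wanting to check a .stat file like the one above mechanically, here is a minimal parsing sketch. The regular expression and field names are assumptions fitted to the lines shown in this post; other Mlucas versions may format the line differently.

```python
import re

# Pattern fitted to the V17.0 lines shown above (an assumption, not an official format spec).
STAT_LINE = re.compile(
    r"Iter# = (?P<iter>\d+)\s.*?clocks = (?P<h>\d+):(?P<m>\d+):(?P<s>[\d.]+)"
    r".*?Res64: (?P<res64>[0-9A-F]{16})\.\s+AvgMaxErr = (?P<avg>[\d.]+)\.\s+MaxErr = (?P<max>[\d.]+)\."
)


def parse_stat_line(line):
    """Extract iteration, interval wall time, Res64 and roundoff errors from one line."""
    m = STAT_LINE.search(line)
    if m is None:
        return None
    seconds = int(m["h"]) * 3600 + int(m["m"]) * 60 + float(m["s"])
    return {
        "iter": int(m["iter"]),
        "seconds": seconds,
        "res64": m["res64"],
        "avg_err": float(m["avg"]),
        "max_err": float(m["max"]),
    }


def bits_per_digit(exponent, fft_len_k):
    """Average bits per FFT word: exponent divided by the FFT length in doubles."""
    return exponent / (fft_len_k * 1024)


sample = ("[Jul 22 13:31:39] M332220523 Iter# = 10000 [ 0.00% complete] "
          "clocks = 02:16:04.515 [816.4516 sec/iter] Res64: 1A313D709BFA6663. "
          "AvgMaxErr = 0.171224865. MaxErr = 0.250000000.")
record = parse_stat_line(sample)
print(record["seconds"] / 10000)          # wall seconds per iteration for this interval
print(bits_per_digit(332220523, 18432))   # compare with the value printed in the log
```

The first interval took 8164.515 s of wall time for 10000 iterations, about 0.816 s per iteration, so the bracketed 816.4516 figure appears to be in milliseconds despite the sec/iter label.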
2021-01-20, 17:47  #8
What it may look like when something is not working correctly
See https://mersenneforum.org/showpost.p...8&postcount=76
Any of the following is reason to view the interim or final results with suspicion. The original poster was correct to doubt the accuracy of the run. Some of these will also apply to other software.
For comparison, brief alternate runs in gpuowl on the same exponent follow. Note that the res64 values match at 200K and 310K, and that no GEC errors were logged, on these independent runs, indicating high reliability.
Code:
2021-01-20 12:19:01 gpuowl v6.11-380-g79ea0cc
2021-01-20 12:19:01 config: -user kriesel -cpu asr2/radeonvii3 -d 3 -use NO_ASM -maxAlloc 13000 -log 10000
2021-01-20 12:19:01 device 3, unique id ''
2021-01-20 12:19:01 asr2/radeonvii3 110899639 FFT: 6M 1K:12:256 (17.63 bpw)
2021-01-20 12:19:01 asr2/radeonvii3 Expected maximum carry32: 39160000
2021-01-20 12:19:02 asr2/radeonvii3 OpenCL args "-DEXP=110899639u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x9.70d2e6d4d6eb8p-5 -DIWEIGHT_STEP_MINUS_1=-0xe.947b562a8bfep-6 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-01-20 12:19:07 asr2/radeonvii3 OpenCL compilation in 4.48 s
2021-01-20 12:19:08 asr2/radeonvii3 110899639 OK 0 loaded: blockSize 400, 0000000000000003
2021-01-20 12:19:08 asr2/radeonvii3 validating proof residues for power 8
2021-01-20 12:19:08 asr2/radeonvii3 Proof using power 8
2021-01-20 12:19:09 asr2/radeonvii3 110899639 OK 800 0.00%; 881 us/it; ETA 1d 03:09; 6191b4b775c8edca (check 0.59s)
2021-01-20 12:19:18 asr2/radeonvii3 110899639 OK 10000 0.01%; 882 us/it; ETA 1d 03:11; 59d707dfd3e8a6e5 (check 0.59s)
2021-01-20 12:19:27 asr2/radeonvii3 110899639 OK 20000 0.02%; 884 us/it; ETA 1d 03:13; 59d112f7284edbb4 (check 0.59s)
2021-01-20 12:19:37 asr2/radeonvii3 110899639 OK 30000 0.03%; 883 us/it; ETA 1d 03:12; b9114934905a8443 (check 0.59s)
2021-01-20 12:19:46 asr2/radeonvii3 110899639 OK 40000 0.04%; 882 us/it; ETA 1d 03:09; f5e1840cc2c9ae6f (check 0.59s)
2021-01-20 12:19:56 asr2/radeonvii3 110899639 OK 50000 0.05%; 881 us/it; ETA 1d 03:08; 3a6a9896d868f34e (check 0.60s)
2021-01-20 12:20:05 asr2/radeonvii3 110899639 OK 60000 0.05%; 882 us/it; ETA 1d 03:09; 581477e4ea2f2fd5 (check 0.62s)
2021-01-20 12:20:15 asr2/radeonvii3 110899639 OK 70000 0.06%; 897 us/it; ETA 1d 03:37; 76171fa52b081f88 (check 0.60s)
2021-01-20 12:20:24 asr2/radeonvii3 110899639 OK 80000 0.07%; 885 us/it; ETA 1d 03:15; b87b7c28d301446e (check 0.61s)
2021-01-20 12:20:34 asr2/radeonvii3 110899639 OK 90000 0.08%; 886 us/it; ETA 1d 03:16; 084955167e9c1678 (check 0.62s)
2021-01-20 12:20:43 asr2/radeonvii3 110899639 OK 100000 0.09%; 886 us/it; ETA 1d 03:17; 6866029ebdf6f42f (check 0.60s)
2021-01-20 12:20:53 asr2/radeonvii3 110899639 OK 110000 0.10%; 884 us/it; ETA 1d 03:13; 00fb4982ad9a9ac6 (check 0.65s)
2021-01-20 12:21:02 asr2/radeonvii3 110899639 OK 120000 0.11%; 887 us/it; ETA 1d 03:19; f2480bc5b17f8151 (check 0.60s)
2021-01-20 12:21:12 asr2/radeonvii3 110899639 OK 130000 0.12%; 884 us/it; ETA 1d 03:13; f1eb30b6262e11ba (check 0.59s)
2021-01-20 12:21:21 asr2/radeonvii3 110899639 OK 140000 0.13%; 880 us/it; ETA 1d 03:05; e334d2375c872f0f (check 0.59s)
2021-01-20 12:21:30 asr2/radeonvii3 110899639 OK 150000 0.14%; 881 us/it; ETA 1d 03:06; 951a0e26b6da9927 (check 0.59s)
2021-01-20 12:21:40 asr2/radeonvii3 110899639 OK 160000 0.14%; 881 us/it; ETA 1d 03:05; ff557ebe567e1f0d (check 0.59s)
2021-01-20 12:21:49 asr2/radeonvii3 110899639 OK 170000 0.15%; 882 us/it; ETA 1d 03:07; 3d30664ec2bf6118 (check 0.59s)
2021-01-20 12:21:59 asr2/radeonvii3 110899639 OK 180000 0.16%; 881 us/it; ETA 1d 03:06; 472b05a96d9ecf1a (check 0.59s)
2021-01-20 12:22:08 asr2/radeonvii3 110899639 OK 190000 0.17%; 880 us/it; ETA 1d 03:04; 12cd354415712251 (check 0.59s)
2021-01-20 12:22:17 asr2/radeonvii3 110899639 OK 200000 0.18%; 882 us/it; ETA 1d 03:07; b56e64d2ec39cd4d (check 0.59s)
2021-01-20 12:22:27 asr2/radeonvii3 110899639 OK 210000 0.19%; 881 us/it; ETA 1d 03:05; f84002c6841db007 (check 0.59s)
2021-01-20 12:22:36 asr2/radeonvii3 110899639 OK 220000 0.20%; 881 us/it; ETA 1d 03:05; 57cdfa904d0b3cda (check 0.59s)
2021-01-20 12:22:46 asr2/radeonvii3 110899639 OK 230000 0.21%; 881 us/it; ETA 1d 03:06; 0307b1331c567a43 (check 0.59s)
2021-01-20 12:22:55 asr2/radeonvii3 110899639 OK 240000 0.22%; 881 us/it; ETA 1d 03:05; c9b34a5047ba285b (check 0.59s)
2021-01-20 12:23:05 asr2/radeonvii3 110899639 OK 250000 0.23%; 882 us/it; ETA 1d 03:06; 3f17202d73f429ee (check 0.60s)
2021-01-20 12:23:14 asr2/radeonvii3 110899639 OK 260000 0.23%; 881 us/it; ETA 1d 03:05; 93d688231b646b99 (check 0.60s)
2021-01-20 12:23:23 asr2/radeonvii3 110899639 OK 270000 0.24%; 881 us/it; ETA 1d 03:04; 0369c93e11d7a67c (check 0.59s)
2021-01-20 12:23:33 asr2/radeonvii3 110899639 OK 280000 0.25%; 881 us/it; ETA 1d 03:04; e2688fd986ab2216 (check 0.59s)
2021-01-20 12:23:42 asr2/radeonvii3 110899639 OK 290000 0.26%; 880 us/it; ETA 1d 03:03; 8040fd8a9dfcf9eb (check 0.59s)
2021-01-20 12:23:52 asr2/radeonvii3 110899639 OK 300000 0.27%; 881 us/it; ETA 1d 03:04; 198cbf452a3e452b (check 0.59s)
2021-01-20 12:24:01 asr2/radeonvii3 110899639 OK 310000 0.28%; 882 us/it; ETA 1d 03:07; 12f9b4443a1ca408 (check 0.59s)
2021-01-20 12:24:03 asr2/radeonvii3 Stopping, please wait..
Code:
2021-01-20 12:10:14 asr2/radeonvii3 110899639 FFT: 6M 1K:12:256 (17.63 bpw)
2021-01-20 12:10:14 asr2/radeonvii3 Expected maximum carry32: 39160000
2021-01-20 12:10:16 asr2/radeonvii3 OpenCL args "-DEXP=110899639u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x9.70d2e6d4d6eb8p-5 -DIWEIGHT_STEP_MINUS_1=-0xe.947b562a8bfep-6 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-01-20 12:10:20 asr2/radeonvii3 OpenCL compilation in 4.42 s
2021-01-20 12:10:21 asr2/radeonvii3 110899639 OK 0 loaded: blockSize 400, 0000000000000003
2021-01-20 12:10:21 asr2/radeonvii3 validating proof residues for power 8
2021-01-20 12:10:21 asr2/radeonvii3 Proof using power 8
2021-01-20 12:10:23 asr2/radeonvii3 110899639 OK 800 0.00%; 884 us/it; ETA 1d 03:13; 6191b4b775c8edca (check 0.59s)
2021-01-20 12:13:19 asr2/radeonvii3 110899639 OK 200000 0.18%; 884 us/it; ETA 1d 03:10; b56e64d2ec39cd4d (check 0.59s)
2021-01-20 12:14:55 asr2/radeonvii3 Stopping, please wait..
2021-01-20 12:14:55 asr2/radeonvii3 110899639 OK 308400 0.28%; 882 us/it; ETA 1d 03:05; 9e394f7f61bb85d7 (check 0.60s)
2021-01-20 12:14:56 asr2/radeonvii3 Exiting because "stop requested"
2021-01-20 12:14:56 asr2/radeonvii3 Bye
2021-01-20 12:15:23 config: -user kriesel -cpu asr2/radeonvii3 -d 3 -use NO_ASM -maxAlloc 13000 -log 10000
2021-01-20 12:15:23 device 3, unique id ''
2021-01-20 12:15:23 asr2/radeonvii3 110899639 FFT: 6M 1K:12:256 (17.63 bpw)
2021-01-20 12:15:23 asr2/radeonvii3 Expected maximum carry32: 39160000
2021-01-20 12:15:24 asr2/radeonvii3 OpenCL args "-DEXP=110899639u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x9.70d2e6d4d6eb8p-5 -DIWEIGHT_STEP_MINUS_1=-0xe.947b562a8bfep-6 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-01-20 12:15:29 asr2/radeonvii3 OpenCL compilation in 4.55 s
2021-01-20 12:15:30 asr2/radeonvii3 110899639 OK 308400 loaded: blockSize 400, 9e394f7f61bb85d7
2021-01-20 12:15:30 asr2/radeonvii3 validating proof residues for power 8
2021-01-20 12:15:30 asr2/radeonvii3 Proof using power 8
2021-01-20 12:15:31 asr2/radeonvii3 110899639 OK 309200 0.28%; 895 us/it; ETA 1d 03:29; f60084731d7963cc (check 0.59s)
2021-01-20 12:15:32 asr2/radeonvii3 110899639 OK 310000 0.28%; 885 us/it; ETA 1d 03:11; 12f9b4443a1ca408 (check 0.59s)
2021-01-20 12:15:42 asr2/radeonvii3 110899639 OK 320000 0.29%; 886 us/it; ETA 1d 03:12; f2d36a6ab5abc361 (check 0.59s)
2021-01-20 12:15:51 asr2/radeonvii3 110899639 OK 330000 0.30%; 884 us/it; ETA 1d 03:09; 44b44f2c3550f717 (check 0.59s)
2021-01-20 12:16:01 asr2/radeonvii3 110899639 OK 340000 0.31%; 886 us/it; ETA 1d 03:13; b30686ce36dcf10c (check 0.61s)
2021-01-20 12:16:10 asr2/radeonvii3 110899639 OK 350000 0.32%; 883 us/it; ETA 1d 03:08; b04af45d28e73cc9 (check 0.61s)
2021-01-20 12:16:20 asr2/radeonvii3 110899639 OK 360000 0.32%; 882 us/it; ETA 1d 03:04; fe9ea80343df78e1 (check 0.59s)
2021-01-20 12:16:29 asr2/radeonvii3 110899639 OK 370000 0.33%; 882 us/it; ETA 1d 03:04; 6afc77809e993a9f (check 0.59s)
2021-01-20 12:16:39 asr2/radeonvii3 110899639 OK 380000 0.34%; 881 us/it; ETA 1d 03:03; 280d5489847a1cec (check 0.59s)
2021-01-20 12:16:48 asr2/radeonvii3 110899639 OK 390000 0.35%; 882 us/it; ETA 1d 03:05; a46e45e9343c52a9 (check 0.59s)
2021-01-20 12:16:58 asr2/radeonvii3 110899639 OK 400000 0.36%; 882 us/it; ETA 1d 03:05; 9e6c6fb76ef72a19 (check 0.59s)
2021-01-20 12:17:07 asr2/radeonvii3 110899639 OK 410000 0.37%; 881 us/it; ETA 1d 03:03; f861f42a6182792e (check 0.60s)
2021-01-20 12:17:16 asr2/radeonvii3 110899639 OK 420000 0.38%; 881 us/it; ETA 1d 03:02; a63fbc909d404859 (check 0.60s)
2021-01-20 12:17:26 asr2/radeonvii3 110899639 OK 430000 0.39%; 882 us/it; ETA 1d 03:04; 496ebdb31daffa63 (check 0.62s)
2021-01-20 12:17:36 asr2/radeonvii3 110899639 OK 440000 0.40%; 914 us/it; ETA 1d 04:02; f1d3e4d3f9bff432 (check 0.59s)
2021-01-20 12:17:45 asr2/radeonvii3 110899639 OK 450000 0.41%; 881 us/it; ETA 1d 03:02; fbf71e75373c1a72 (check 0.59s)
2021-01-20 12:17:54 asr2/radeonvii3 110899639 OK 460000 0.41%; 881 us/it; ETA 1d 03:02; 42efdc145cda3529 (check 0.61s)
2021-01-20 12:18:04 asr2/radeonvii3 110899639 OK 470000 0.42%; 890 us/it; ETA 1d 03:19; 93134f128a91d38e (check 0.66s)
2021-01-20 12:18:13 asr2/radeonvii3 110899639 OK 480000 0.43%; 884 us/it; ETA 1d 03:07; fd1c5887489a268f (check 0.61s)
2021-01-20 12:18:23 asr2/radeonvii3 110899639 OK 490000 0.44%; 883 us/it; ETA 1d 03:05; 1f58ebc4c56caa98 (check 0.59s)
2021-01-20 12:18:32 asr2/radeonvii3 110899639 OK 500000 0.45%; 886 us/it; ETA 1d 03:10; a5e9c983cefcd245 (check 0.59s)
Last fiddled with by kriesel on 2021-01-24 at 19:45
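The res64 comparison between the two gpuowl runs can be done mechanically rather than by eye. The following is a rough sketch: the regular expression is an assumption fitted to the progress lines shown in this post, and only a few of those lines are embedded as sample data.

```python
import re

# Matches gpuowl progress lines such as "... OK 200000 0.18%; ...; b56e64d2ec39cd4d (check 0.59s)".
# "OK 0 loaded: ..." lines carry no percentage and are skipped on purpose.
PROGRESS = re.compile(r"\bOK\s+(\d+)\s+[\d.]+%;.*;\s*([0-9a-f]{16})\s+\(check")


def residues(log_text):
    """Map iteration count -> res64 for every progress line in a log."""
    found = {}
    for line in log_text.splitlines():
        m = PROGRESS.search(line)
        if m:
            found[int(m.group(1))] = m.group(2)
    return found


def compare_runs(log_a, log_b):
    """For iterations present in both logs, report whether the res64 values agree."""
    a, b = residues(log_a), residues(log_b)
    return {it: a[it] == b[it] for it in sorted(a.keys() & b.keys())}


run_a = """\
2021-01-20 12:19:09 asr2/radeonvii3 110899639 OK 800 0.00%; 881 us/it; ETA 1d 03:09; 6191b4b775c8edca (check 0.59s)
2021-01-20 12:22:17 asr2/radeonvii3 110899639 OK 200000 0.18%; 882 us/it; ETA 1d 03:07; b56e64d2ec39cd4d (check 0.59s)
2021-01-20 12:24:01 asr2/radeonvii3 110899639 OK 310000 0.28%; 882 us/it; ETA 1d 03:07; 12f9b4443a1ca408 (check 0.59s)
"""
run_b = """\
2021-01-20 12:10:23 asr2/radeonvii3 110899639 OK 800 0.00%; 884 us/it; ETA 1d 03:13; 6191b4b775c8edca (check 0.59s)
2021-01-20 12:13:19 asr2/radeonvii3 110899639 OK 200000 0.18%; 884 us/it; ETA 1d 03:10; b56e64d2ec39cd4d (check 0.59s)
2021-01-20 12:14:55 asr2/radeonvii3 110899639 OK 308400 0.28%; 882 us/it; ETA 1d 03:05; 9e394f7f61bb85d7 (check 0.60s)
2021-01-20 12:15:32 asr2/radeonvii3 110899639 OK 310000 0.28%; 885 us/it; ETA 1d 03:11; 12f9b4443a1ca408 (check 0.59s)
"""
print(compare_runs(run_a, run_b))
```

Only iterations logged by both runs can be compared, so the 308400 entry (written when the second run was stopped) is ignored. A single False in the result would be grounds to distrust at least one of the runs.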
2021-07-02, 21:35  #9
Mlucas v19.0 -h help output
As generated by the program:
Code:
    Mlucas 19.0

    http://www.mersenneforum.org/mayer/README.html

INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing FFT radix tables...
For the full list of command line options, run the program with the -h flag.
For a list of command-line options grouped by type, run the program with the -topic flag.

 Mlucas command line options:

 Symbol and abbreviation key:
       <CR> :  carriage return
        |   :  separator for one-of-the-following multiple-choice menus
       []   :  encloses optional arguments
       {}   :  denotes user-supplied numerical arguments of the type noted.
               ({int} means nonnegative integer, {+int} = positive int, {float} = float.)
  -argument :  Vertical stacking indicates argument short 'nickname' options,
  -arg      :  e.g. in this example '-arg' can be used in place of '-argument'.

 Supported arguments:

 <CR>        Default mode: looks for a worktodo.ini file in the local
             directory; if none found, prompts for manual keyboard entry

 Help submenus by topic. No additional arguments may follow the displayed ones:
 -s            Post-build self-testing for various FFT-length rnages.
 -fftlen       FFT-length setting.
 -radset       FFT radix-set specification.
 -m[ersenne]   Mersenne-number primality testing.
 -f[ermat]     Fermat-number primality testing.
 -shift        ***SIMD builds only*** Number of bits by which to shift the initial seed (= iteration-0 residue).
 -prp          Probable-primality testing mode.
 -iters        Iteration-number setting.
 -nthread|cpu  Setting threadcount and CPU core affinity.

 *** NOTE: *** The following self-test options will cause an mlucas.cfg file containing
     the optimal FFT radix set for the runlength(s) tested to be created (if one did not
     exist previously) or appended (if one did) with new timing data. Such a file-write is
     triggered by each complete set of FFT radices available at a given FFT length being
     tested, i.e. by a self-test without a user-specified -radset argument.
     (A user-specific Mersenne exponent may be supplied via the -m flag; if none is specified,
     the program will use the largest permissible exponent for the given FFT length, based on
     its internal length-setting algorithm). The user must specify the number of iterations for
     the self-test via the -iters flag; while it is not required, it is strongly recommended to
     stick to one of the standard timing-test values of -iters = [100,1000,10000], with the larger
     values being preferred for multithreaded timing tests, in order to assure a decently large
     slice of CPU time. Similarly, it is recommended to not use the -m flag for such tests, unless
     roundoff error levels on a given compute platform are such that the default exponent at one or
     more FFT lengths of interest prevents a reasonable sampling of available radix sets at same.
        If the user lets the program set the exponent and uses one of the aforementioned standard
     self-test iteration counts, the resulting best-timing FFT radix set will only be written to the
     resulting mlucas.cfg file if the timing-test result matches the internally stored precomputed
     one for the given default exponent at the iteration count in question, with eligible radix sets
     consisting of those for which the roundoff error remains below an acceptable threshold.
        If the user instead specifies the exponent (only allowed for a single-FFT-length timing test)****************
     and/or a non-default iteration number, the resulting best-timing FFT radix set will only be written to the
     resulting mlucas.cfg file if the timing-test results match each other? ********* check logic here
     This is important for tuning code parameters to your particular platform.

     FOR BEST RESULTS, RUN ANY SELF-TESTS UNDER ZERO- OR CONSTANT-LOAD CONDITIONS

 -s {...}    Self-test, user must also supply exponent [via -m or -f] and/or FFT length to use.

 -s tiny     Runs 100-iteration self-tests on set of 32 Mersenne exponents, ranging from 173431 to 2455003
 -s t        This will take around 1 minute on a fast CPU..

 -s small    Runs 100-iteration self-tests on set of 32 Mersenne exponents, ranging from 173431 to 2455003
 -s s        This will take around 10 minutes on a fast CPU..

 **** THIS IS THE ONLY SELF-TEST ORDINARY USERS ARE RECOMMENDED TO DO: ******
 *                                                                          *
 * -s medium   Runs set of 16 Mersenne exponents, ranging from 2614999 to 9530803
 * -s m        This will take around an hour on a fast CPU.                 *
 *                                                                          *
 ****************************************************************************

 -s large    Runs set of 24 Mersenne exponents, ranging from 10151971 to 72123137
 -s l        This will take around an hour on a fast CPU.

 -s huge     Runs set of 16 Mersenne exponents, ranging from 76821337 to 282508657
 -s h        This will take a couple of hours on a fast CPU.

 -s all      Runs 100-iteration self-tests of all test Mersenne exponents and all FFT radix sets.
 -s a        This will take several hours on a fast CPU.

 -fftlen {+int}   If {+int} is one of the available FFT lengths (in Kilodoubles), runs all
             all available FFT radices available at that length, unless the -radset flag is
             invoked (see below for details). If -fftlen is invoked without the -iters flag,
             it is assumed the user wishes to do a production run with a non-default FFT length,
             In this case the program requires a valid worktodo.ini-file entry with exponent
             not more than 5% larger than the default maximum for that FFT length.
                  If -fftlen is invoked with a user-supplied value of -iters but without a
             user-supplied exponent, the program will do the specified number of iterations
             using the default self-test Mersenne or Fermat exponent for that FFT length.
                  If -fftlen is invoked with a user-supplied value of -iters and either the
             -m or -f flag and a user-supplied exponent, the program will do the specified
             number of iterations of either the Lucas-Lehmer test with starting value 4 (-m)
             or the Pe'pin test with starting value 3 (-f) on the user-specified modulus.
                  In either of the latter 2 cases, the program will produce a cfg-file entry based
             on the timing results, assuming at least one radix set ran the specified #iters to
             completion without suffering a fatal error of some kind.
             Use this to find the optimal radix set for a single FFT length on your hardware.

             NOTE: IF YOU USE OTHER THAN THE DEFAULT MODULUS OR #ITERS FOR SUCH A SINGLE-FFT-LENGTH
             TIMING TEST, IT IS UP TO YOU TO MANUALLY VERIFY THAT THE RESIDUES OUTPUT MATCH
             FOR ALL FFT RADIX COMBINATIONS AND THE ROUNDOFF ERRORS ARE REASONABLE!

 -radset {int}    Specific index of a set of complex FFT radices to use, based on the big
             select table in the function get_fft_radices(). Requires a supported value of
             -fftlen to also be specified, as well as a value of -iters for the timing test.

 -m [{+int}] Performs a Lucas-Lehmer primality test of the Mersenne number M(int) = 2^int - 1,
             where int must be an odd prime. If -iters is also invoked, this indicates a timing test.
             and requires suitable added arguments (-fftlen and, optionally, -radset) to be supplied.
                  If the -fftlen option (and optionally -radset) is also invoked but -iters is not, the
             program first checks the first line of the worktodo.ini file to see if the assignment
             specified there is a Lucas-Lehmer test with the same exponent as specified via the -m
             argument. If so, the -fftlen argument is treated as a user override of the default FFT
             length for the exponent. If -radset is also invoked, this is similarly treated as a user
             specified radix set for the user-set FFT length; otherwise the program will use the cfg file
             to select the radix set to be used for the user-forced FFT length.
                  If the worktodo.ini file entry does not match the -m value, a set of timing self-tests is
             run on the user-specified Mersenne number using all sets of FFT radices available at the
             specified FFT length.
                  If the -fftlen option is not invoked, the self-tests use all sets of
             FFT radices available at that exponent's default FFT length.
                  Use this to find the optimal radix set for a single given Mersenne number
             exponent on your hardware, similarly to the -fftlen option.
                  Performs as many iterations as specified via the -iters flag [required].

 -f {int}    Performs a base-3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1.
                  If desired this can be invoked together with the -fftlen option.
             as for the Mersenne-number self-tests (see notes about the -m flag;
             note that not all FFT lengths supported for -m are available for -f).
             Optimal radix sets and timings are written to a fermat.cfg file.
                  Performs as many iterations as specified via the -iters flag [required].

 -shift      ***SIMD builds only*** Bits by which to circular-left-shift the initial seed.
             This shift count is doubled (modulo the number of bits of the modulus being tested)
             each iteration. Savefile residues are rightward-shifted by the current shift count
             before being written to the file; thus savefiles contain the unshifted residue, and
             separately the current shift count, which the program uses to leftward-shift the
             savefile residue when the program is restarted from interrupt.
             The shift count is a 64-bit unsigned int (e.g. to accommodate Fermat numbers > F32).

 -prp {int}  Instead of running the rigorous primality test defined for the modulus type
             in question (Lucas-Lehmer test for Mersenne numbers, Pe'pin test for Fermat numbers
             do a probably-primality test to the specified integer base b = {int}.
                  For a Mersenne number M(p), starting with initial seed x = b (which must not = 2
             or a power of 2), this means do a Fermat-PRP test, consisting of (p-2) iterations of
             form x = b*x^2 (mod M(p)) plus a final mod-squaring x = x^2 (mod M(p)), with M(p) being
             a probable-prime to base b if the result == 1.
                  For a Fermat number F(m), starting with initial seed x = b (which must not = 2
             or a power of 2), this means do an Euler-PRP test (referred to as a Pe'pin test for these
             moduli), i.e.
do 2^m-1 iterations of form x = b*x^2 (mod M(p)), with M(p) being not merely a probable prime but in fact deterministically a prime if the result == 1. The reason we still use the -prp flag in the Fermat case is for legacy-code compatibility: All pre-v18 Mlucas versions supported only Pe'pin testing to base b = 3; now the user can use the -prp flag with a suitable base-value to override this default choice of base.
-iters {int} Do {int} self-test iterations of the type determined by the modulus-related options (-s/-m = Lucas-Lehmer test iterations with initial seed 4, -f = Pe'pin-test squarings with initial seed 3.
Last fiddled with by kriesel on 2021-07-02 at 21:41
2021-08-12, 14:32  #10
Mlucas v19.1 h help output
Info portion will vary depending on the system it is run upon.
Code:
~/mlucas_v19.1/mlucas_v19.1$ ./Mlucasavx2 h Mlucas 19.1 http://www.mersenneforum.org/mayer/README.html INFO: testing qfloat routines... CPU Family = x86_64, OS = Linux, 64bit Version, compiled with Gnu C [or other compatible], Version 7.4.0. INFO: Build uses AVX2 instruction set. INFO: Using inlinemacro form of MUL_LOHI64. INFO: Using FMADDbased 100bit modmul routines for factoring. INFO: MLUCAS_PATH is set to "" INFO: using 64bitsignificand form of floatingdouble rounding constant for scalarmode DNINT emulation. Setting DAT_BITS = 10, PAD_BITS = 2 INFO: testing IMUL routines... INFO: System has 12 available processor cores. INFO: testing FFT radix tables... For the full list of command line options, run the program with the h flag. For a list of commandline options grouped by type, run the program with the topic flag. Mlucas command line options: Symbol and abbreviation key: <CR> : carriage return  : separator for oneofthefollowing multiplechoice menus [] : encloses optional arguments {} : denotes usersupplied numerical arguments of the type noted. ({int} means nonnegative integer, {+int} = positive int, {float} = float.) argument : Vertical stacking indicates argument short 'nickname' options, arg : e.g. in this example 'arg' can be used in place of 'argument'. Supported arguments: <CR> Default mode: looks for a worktodo.ini file in the local directory; if none found, prompts for manual keyboard entry Help submenus by topic. No additional arguments may follow the displayed ones: s Postbuild selftesting for various FFTlength rnages. fftlen FFTlength setting. radset FFT radixset specification. m[ersenne] Mersennenumber primality testing. f[ermat] Fermatnumber primality testing. shift ***SIMD builds only*** Number of bits by which to shift the initial seed (= iteration0 residue). prp Probableprimality testing mode. iters Iterationnumber setting. nthreadcpu Setting threadcount and CPU core affinity. 
*** NOTE: *** The following selftest options will cause an mlucas.cfg file containing the optimal FFT radix set for the runlength(s) tested to be created (if one did not exist previously) or appended (if one did) with new timing data. Such a filewrite is triggered by each complete set of FFT radices available at a given FFT length being tested, i.e. by a selftest without a userspecified radset argument. (A userspecific Mersenne exponent may be supplied via the m flag; if none is specified, the program will use the largest permissible exponent for the given FFT length, based on its internal lengthsetting algorithm). The user must specify the number of iterations for the selftest via the iters flag; while it is not required, it is strongly recommended to stick to one of the standard timingtest values of iters = [100,1000,10000], with the larger values being preferred for multithreaded timing tests, in order to assure a decently large slice of CPU time. Similarly, it is recommended to not use the m flag for such tests, unless roundoff error levels on a given compute platform are such that the default exponent at one or more FFT lengths of interest prevents a reasonable sampling of available radix sets at same. If the user lets the program set the exponent and uses one of the aforementioned standard selftest iteration counts, the resulting besttiming FFT radix set will only be written to the resulting mlucas.cfg file if the timingtest result matches the internally stored precomputed one for the given default exponent at the iteration count in question, with eligible radix sets consisting of those for which the roundoff error remains below an acceptable threshold. If the user instead specifies the exponent (only allowed for a singleFFTlength timing test)**************** and/or a nondefault iteration number, the resulting besttiming FFT radix set will only be written to the resulting mlucas.cfg file if the timingtest results match each other? 
********* check logic here ******* This is important for tuning code parameters to your particular platform. FOR BEST RESULTS, RUN ANY SELFTESTS UNDER ZERO OR CONSTANTLOAD CONDITIONS s {...} Selftest, user must also supply exponent [via m or f] and/or FFT length to use. s tiny Runs 100iteration selftests on set of 32 Mersenne exponents, ranging from 173431 to 2455003 s t This will take around 1 minute on a fast CPU.. s small Runs 100iteration selftests on set of 32 Mersenne exponents, ranging from 173431 to 2455003 s s This will take around 10 minutes on a fast CPU.. **** THIS IS THE ONLY SELFTEST ORDINARY USERS ARE RECOMMENDED TO DO: ****** * * * s medium Runs set of 16 Mersenne exponents, ranging from 2614999 to 9530803 * s m This will take around an hour on a fast CPU. * * * **************************************************************************** s large Runs set of 24 Mersenne exponents, ranging from 10151971 to 72123137 s l This will take around an hour on a fast CPU. s huge Runs set of 16 Mersenne exponents, ranging from 76821337 to 282508657 s h This will take a couple of hours on a fast CPU. s all Runs 100iteration selftests of all test Mersenne exponents and all FFT radix sets. s a This will take several hours on a fast CPU. fftlen {+int} If {+int} is one of the available FFT lengths (in Kilodoubles), runs all all available FFT radices available at that length, unless the radset flag is invoked (see below for details). If fftlen is invoked without the iters flag, it is assumed the user wishes to do a production run with a nondefault FFT length, In this case the program requires a valid worktodo.inifile entry with exponent not more than 5% larger than the default maximum for that FFT length. If fftlen is invoked with a usersupplied value of iters but without a usersupplied exponent, the program will do the specified number of iterations using the default selftest Mersenne or Fermat exponent for that FFT length. 
If fftlen is invoked with a usersupplied value of iters and either the m or f flag and a usersupplied exponent, the program will do the specified number of iterations of either the LucasLehmer test with starting value 4 (m) or the Pe'pin test with starting value 3 (f) on the userspecified modulus. In either of the latter 2 cases, the program will produce a cfgfile entry based on the timing results, assuming at least one radix set ran the specified #iters to completion without suffering a fatal error of some kind. Use this to find the optimal radix set for a single FFT length on your hardware. NOTE: IF YOU USE OTHER THAN THE DEFAULT MODULUS OR #ITERS FOR SUCH A SINGLEFFT LENGTH TIMING TEST, IT IS UP TO YOU TO MANUALLY VERIFY THAT THE RESIDUES OUTPUT MATCH FOR ALL FFT RADIX COMBINATIONS AND THE ROUNDOFF ERRORS ARE REASONABLE! radset {int} Specific index of a set of complex FFT radices to use, based on the big select table in the function get_fft_radices(). Requires a supported value of fftlen to also be specified, as well as a value of iters for the timing test. m [{+int}] Performs a LucasLehmer primality test of the Mersenne number M(int) = 2^int  1, where int must be an odd prime. If iters is also invoked, this indicates a timing test. and requires suitable added arguments (fftlen and, optionally, radset) to be supplied. If the fftlen option (and optionally radset) is also invoked but iters is not, the program first checks the first line of the worktodo.ini file to see if the assignment specified there is a LucasLehmer test with the same exponent as specified via the m argument. If so, the fftlen argument is treated as a user override of the default FFT length for the exponent. If radset is also invoked, this is similarly treated as a user specified radix set for the userset FFT length; otherwise the program will use the cfg file to select the radix set to be used for the userforced FFT length. 
If the worktodo.ini file entry does not match the m value, a set of timing selftests is run on the userspecified Mersenne number using all sets of FFT radices available at the specified FFT length. If the fftlen option is not invoked, the selftests use all sets of FFT radices available at that exponent's default FFT length. Use this to find the optimal radix set for a single given Mersenne number exponent on your hardware, similarly to the fftlen option. Performs as many iterations as specified via the iters flag [required]. f {int} Performs a base3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1. If desired this can be invoked together with the fftlen option. as for the Mersennenumber selftests (see notes about the m flag; note that not all FFT lengths supported for m are available for f). Optimal radix sets and timings are written to a fermat.cfg file. Performs as many iterations as specified via the iters flag [required]. shift ***SIMD builds only*** Bits by which to circularleftshift the initial seed. This shift count is doubled (modulo the number of bits of the modulus being tested) each iteration. Savefile residues are rightwardshifted by the current shift count before being written to the file; thus savefiles contain the unshifted residue, and separately the current shift count, which the program uses to leftwardshift the savefile residue when the program is restarted from interrupt. The shift count is a 64bit unsigned int (e.g. to accommodate Fermat numbers > F32). prp {int} Instead of running the rigorous primality test defined for the modulus type in question (LucasLehmer test for Mersenne numbers, Pe'pin test for Fermat numbers do a probablyprimality test to the specified integer base b = {int}. 
For a Mersenne number M(p), starting with initial seed x = b (which must not = 2 or a power of 2), this means do a Fermat-PRP test, consisting of (p-2) iterations of form x = b*x^2 (mod M(p)) plus a final mod-squaring x = x^2 (mod M(p)), with M(p) being a probable-prime to base b if the result == 1.
For a Fermat number F(m), starting with initial seed x = b (which must not = 2 or a power of 2), this means do an Euler-PRP test (referred to as a Pe'pin test for these moduli), i.e. do 2^m-1 iterations of form x = b*x^2 (mod M(p)), with M(p) being not merely a probable prime but in fact deterministically a prime if the result == 1. The reason we still use the -prp flag in the Fermat case is for legacy-code compatibility: All pre-v18 Mlucas versions supported only Pe'pin testing to base b = 3; now the user can use the -prp flag with a suitable base-value to override this default choice of base.
-iters {int} Do {int} self-test iterations of the type determined by the modulus-related options (-s/-m = Lucas-Lehmer test iterations with initial seed 4, -f = Pe'pin-test squarings with initial seed 3.
-nthread {int} For multithread-enabled builds, run with this many threads. If the user does not specify a thread count, the default is to run single-threaded with that thread's affinity set to logical core 0.
AFFINITY: The code will attempt to set the affinity of the resulting threads 0:n-1 to the same-indexed processor cores - whether this means distinct physical cores is entirely up to the CPU vendor - E.g. Intel uses such a numbering scheme but AMD does not. For this reason as of v17 this option is deprecated in favor of the -cpu flag, whose usage is detailed below, with the online README page providing guidance for the core-numbering schemes of popular CPU vendors.
If n exceeds the available number of logical processor cores (call it #cpu), the program will halt with an error message.
For greater control over affinity setting, use the -cpu option, which supports two distinct core-specification syntaxes (which may be mixed together), as follows:
-cpu {lo[:hi[:incr]]} (All args {int} here) Set thread/CPU affinity. NOTE: This flag and -nthread are mutually exclusive: If -cpu is used, the threadcount is inferred from the numeric-argument-triplet which follows. If only the 'lo' argument of the triplet is supplied, this means 'run single-threaded with affinity to CPU {lo}.' If the increment (third) argument of the triplet is omitted, it is taken as incr = 1. The CPU set encoded by the integer-triplet argument to -cpu corresponds to the values of the integer loop index i in the C-loop for(i = lo; i <= hi; i += incr), excluding the loop-exit value of i. Thus '-cpu 0:3' and '-cpu 0:3:1' are both exactly equivalent to '-nthread 4', whereas '-cpu 0:6:2' and '-cpu 0:7:2' both specify affinity setting to cores 0,2,4,6, assuming said cores exist. Lastly, note that no whitespace is permitted within the colon-separated numeric field.
-cpu {triplet0[,triplet1,...]} This is simply an extended version of the above affinity-setting syntax in which each of the comma-separated 'triplet' subfields is in the above form and, analogously to the one-triplet-only version, no whitespace is permitted within the colon-and-comma-separated numeric field. Thus '-cpu 0:3,8:11' and '-cpu 0:3:1,8:11:1' both specify an 8-threaded run with affinity set to the core quartets 0-3 and 8-11, whereas '-cpu 0:3:2,8:11:2' means run 4-threaded on cores 0,2,8,10. As described for the -nthread option, it is an error for any core index to exceed the available number of logical processor cores.
Last fiddled with by kriesel on 2021-08-13 at 14:58 Reason: added "top of reference tree" link footer
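The core-specification syntax described for the cpu option above can be illustrated with a short Python sketch. This is not Mlucas code: the function name `parse_cpu_spec` and its error handling are assumptions made for illustration, and it only mirrors the loop `for(i = lo; i <= hi; i += incr)` quoted in the help text.

```python
from typing import List


def parse_cpu_spec(spec: str) -> List[int]:
    """Expand a 'lo[:hi[:incr]]' triplet list (comma-separated) into core indices.

    Hypothetical helper for illustration; it is not part of Mlucas.
    """
    # The help text states that no whitespace is permitted in the field.
    if any(ch.isspace() for ch in spec):
        raise ValueError("no whitespace is permitted in the core specification")
    cores: List[int] = []
    for triplet in spec.split(","):
        fields = [int(f) for f in triplet.split(":")]  # int("") raises ValueError
        if len(fields) > 3:
            raise ValueError("expected lo[:hi[:incr]], got %r" % triplet)
        lo = fields[0]
        hi = fields[1] if len(fields) > 1 else lo    # only 'lo' given: one core
        incr = fields[2] if len(fields) > 2 else 1   # omitted increment means 1
        if lo < 0 or hi < lo or incr < 1:
            raise ValueError("invalid triplet %r" % triplet)
        # Same index set as the C loop: for (i = lo; i <= hi; i += incr)
        cores.extend(range(lo, hi + 1, incr))
    return cores


if __name__ == "__main__":
    for example in ("0:3", "0:7:2", "0:3,8:11", "0:3:2,8:11:2"):
        print(example, "->", parse_cpu_spec(example))
```

The thread count implied by a specification is simply the length of the returned list. A real run would additionally have to reject any index at or above the number of logical cores, which this sketch does not check.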
2021-08-13, 16:24  #11
Mlucas V20.0 h help output
./Mlucas h produces reduced output that includes an error message. As a workaround, use ./Mlucas h printall
The info portion will vary depending on the system it is run on. There does not appear to be any P-1-specific help output available at this time. Code:
~/mlucas_v20/obj$ ./Mlucas h printall Mlucas 20.0 http://www.mersenneforum.org/mayer/README.html INFO: testing qfloat routines... System total RAM = 16243, free RAM = 287 INFO: 287 MB of free system RAM detected; will use up to 90% = 258 MB of that, unless user specifies a lower fraction via maxalloc. CPU Family = x86_64, OS = Linux, 64bit Version, compiled with Gnu C [or other compatible], Version 7.4.0. INFO: Build uses AVX2 instruction set. INFO: Using inlinemacro form of MUL_LOHI64. INFO: Using FMADDbased 100bit modmul routines for factoring. INFO: MLUCAS_PATH is set to "" INFO: using 64bitsignificand form of floatingdouble rounding constant for scalarmode DNINT emulation. Setting DAT_BITS = 10, PAD_BITS = 2 INFO: testing IMUL routines... INFO: System has 12 available processor cores. INFO: testing FFT radix tables... For the full list of command line options, run the program with the h flag. For a list of commandline options grouped by type, run the program with the topic flag. Mlucas command line options: Symbol and abbreviation key: <CR> : carriage return  : separator for oneofthefollowing multiplechoice menus [] : encloses optional arguments {} : denotes usersupplied numerical arguments of the type noted. ({int} means nonnegative integer, {+int} = positive int, {float} = float.) argument : Vertical stacking indicates argument short 'nickname' options, arg : e.g. in this example 'arg' can be used in place of 'argument'. Supported arguments: <CR> Default mode: looks for a worktodo.ini file in the local directory; if none found, prompts for manual keyboard entry Help submenus by topic. No additional arguments may follow the displayed ones: s Postbuild selftesting for various FFTlength rnages. fft[len] FFTlength setting. radset FFT radixset specification. m[ersenne] Mersennenumber primality testing. f[ermat] Fermatnumber primality testing. shift ***SIMD builds only*** Number of bits by which to shift the initial seed (= iteration0 residue). 
prp Probableprimality testing mode. iters Iterationnumber setting. nthreadcpu Setting threadcount and CPU core affinity. maxalloc Setting maximumpercentage of available system RAM to use per instance. *** NOTE: *** The following selftest options will cause an mlucas.cfg file containing the optimal FFT radix set for the runlength(s) tested to be created (if one did not exist previously) or appended (if one did) with new timing data. Such a filewrite is triggered by each complete set of FFT radices available at a given FFT length being tested, i.e. by a selftest without a userspecified radset argument. (A userspecific Mersenne exponent may be supplied via the m flag; if none is specified, the program will use the largest permissible exponent for the given FFT length, based on its internal lengthsetting algorithm). The user must specify the number of iterations for the selftest via the iters flag; while it is not required, it is strongly recommended to stick to one of the standard timingtest values of iters = [100,1000,10000], with the larger values being preferred for multithreaded timing tests, in order to assure a decently large slice of CPU time. Similarly, it is recommended to not use the m flag for such tests, unless roundoff error levels on a given compute platform are such that the default exponent at one or more FFT lengths of interest prevents a reasonable sampling of available radix sets at same. If the user lets the program set the exponent and uses one of the aforementioned standard selftest iteration counts, the resulting besttiming FFT radix set will only be written to the resulting mlucas.cfg file if the timingtest result matches the internally stored precomputed one for the given default exponent at the iteration count in question, with eligible radix sets consisting of those for which the roundoff error remains below an acceptable threshold. 
If the user instead specifies the exponent (only allowed for a singleFFTlength timing test)**************** and/or a nondefault iteration number, the resulting besttiming FFT radix set will only be written to the resulting mlucas.cfg file if the timingtest results match each other? ********* check logic here ******* This is important for tuning code parameters to your particular platform. FOR BEST RESULTS, RUN ANY SELFTESTS UNDER ZERO OR CONSTANTLOAD CONDITIONS s {...} Selftest, user must also supply exponent [via m or f] and/or FFT length to use. s tiny Runs 100iteration selftests on set of 32 Mersenne exponents, ranging from 173431 to 2455003 s t This will take around 1 minute on a fast CPU.. s small Runs 100iteration selftests on set of 32 Mersenne exponents, ranging from 173431 to 2455003 s s This will take around 10 minutes on a fast CPU.. **** THIS IS THE ONLY SELFTEST ORDINARY USERS ARE RECOMMENDED TO DO: ****** * * * s medium Runs set of 16 Mersenne exponents, ranging from 2614999 to 9530803 * s m This will take around an hour on a fast CPU. * * * **************************************************************************** s large Runs set of 24 Mersenne exponents, ranging from 10151971 to 72123137 s l This will take around an hour on a fast CPU. s huge Runs set of 16 Mersenne exponents, ranging from 76821337 to 282508657 s h This will take a couple of hours on a fast CPU. s all Runs 100iteration selftests of all test Mersenne exponents and all FFT radix sets. s a This will take several hours on a fast CPU. fft[len] {+int} If {+int} is one of the available FFT lengths (in Kilodoubles), runs all all available FFT radices available at that length, unless the radset flag is invoked (see below for details). 
If fft is invoked without the iters flag, it is assumed the user wishes to do a production run with a nondefault FFT length, In this case the program requires a valid worktodo.inifile entry with exponent not more than 5% larger than the default maximum for that FFT length. If fft is invoked with a usersupplied value of iters but without a usersupplied exponent, the program will do the specified number of iterations using the default selftest Mersenne or Fermat exponent for that FFT length. If fft is invoked with a usersupplied value of iters and either the m or f flag and a usersupplied exponent, the program will do the specified number of iterations of either the LucasLehmer test with starting value 4 (m) or the Pe'pin test with starting value 3 (f) on the userspecified modulus. In either of the latter 2 cases, the program will produce a cfgfile entry based on the timing results, assuming at least one radix set ran the specified #iters to completion without suffering a fatal error of some kind. Use this to find the optimal radix set for a single FFT length on your hardware. NOTE: IF YOU USE OTHER THAN THE DEFAULT MODULUS OR #ITERS FOR SUCH A SINGLEFFT LENGTH TIMING TEST, IT IS UP TO YOU TO MANUALLY VERIFY THAT THE RESIDUES OUTPUT MATCH FOR ALL FFT RADIX COMBINATIONS AND THE ROUNDOFF ERRORS ARE REASONABLE! radset {int} Specific index of a set of complex FFT radices to use, based on the big select table in the function get_fft_radices(). Requires a supported value of fft to also be specified, as well as a value of iters for the timing test. m [{+int}] Performs a LucasLehmer primality test of the Mersenne number M(int) = 2^int  1, where int must be an odd prime. If iters is also invoked, this indicates a timing test. and requires suitable added arguments (fft and, optionally, radset) to be supplied. 
If the fft option (and optionally radset) is also invoked but iters is not, the program first checks the first line of the worktodo.ini file to see if the assignment specified there is a LucasLehmer test with the same exponent as specified via the m argument. If so, the fft argument is treated as a user override of the default FFT length for the exponent. If radset is also invoked, this is similarly treated as a user specified radix set for the userset FFT length; otherwise the program will use the cfg file to select the radix set to be used for the userforced FFT length. If the worktodo.ini file entry does not match the m value, a set of timing selftests is run on the userspecified Mersenne number using all sets of FFT radices available at the specified FFT length. If the fft option is not invoked, the selftests use all sets of FFT radices available at that exponent's default FFT length. Use this to find the optimal radix set for a single given Mersenne number exponent on your hardware, similarly to the fft option. Performs as many iterations as specified via the iters flag [required]. f {int} Performs a base3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1. If desired this can be invoked together with the fft option. as for the Mersennenumber selftests (see notes about the m flag; note that not all FFT lengths supported for m are available for f). Optimal radix sets and timings are written to a fermat.cfg file. Performs as many iterations as specified via the iters flag [required]. shift ***SIMD builds only*** Bits by which to circularleftshift the initial seed. This shift count is doubled (modulo the number of bits of the modulus being tested) each iteration. Savefile residues are rightwardshifted by the current shift count before being written to the file; thus savefiles contain the unshifted residue, and separately the current shift count, which the program uses to leftwardshift the savefile residue when the program is restarted from interrupt. 
The shift count is a 64-bit unsigned int (e.g. to accommodate Fermat numbers > F32).
-prp {int} Instead of running the rigorous primality test defined for the modulus type in question (Lucas-Lehmer test for Mersenne numbers, Pe'pin test for Fermat numbers do a probably-primality test to the specified integer base b = {int}.
For a Mersenne number M(p), starting with initial seed x = b (which must not = 2 or a power of 2), this means do a Fermat-PRP test, consisting of (p-2) iterations of form x = b*x^2 (mod M(p)) plus a final mod-squaring x = x^2 (mod M(p)), with M(p) being a probable-prime to base b if the result == 1.
For a Fermat number F(m), starting with initial seed x = b (which must not = 2 or a power of 2), this means do an Euler-PRP test (referred to as a Pe'pin test for these moduli), i.e. do 2^m-1 iterations of form x = b*x^2 (mod F(m)), with F(m) being not merely a probable prime but in fact deterministically a prime if the result == 1. The reason we still use the -prp flag in the Fermat case is for legacy-code compatibility: All pre-v18 Mlucas versions supported only Pe'pin testing to base b = 3; now the user can use the -prp flag with a suitable base-value to override this default choice of base.
-iters {int} Do {int} self-test iterations of the type determined by the modulus-related options (-s/-m = Lucas-Lehmer test iterations with initial seed 4, -f = Pe'pin-test squarings with initial seed 3.
-maxalloc {int} Maximum-percentage of available system RAM to use per instance. Must be in [10,90], default = 90.
-nthread {int} For multithread-enabled builds, run with this many threads. If the user does not specify a thread count, the default is to run single-threaded with that thread's affinity set to logical core 0.
AFFINITY: The code will attempt to set the affinity of the resulting threads 0:n-1 to the same-indexed processor cores - whether this means distinct physical cores is entirely up to the CPU vendor - E.g. Intel uses such a numbering scheme but AMD does not.
For this reason as of v17 this option is deprecated in favor of the -cpu flag, whose usage is detailed below, with the online README page providing guidance for the core-numbering schemes of popular CPU vendors.
If n exceeds the available number of logical processor cores (call it #cpu), the program will halt with an error message.
For greater control over affinity setting, use the -cpu option, which supports two distinct core-specification syntaxes (which may be mixed together), as follows:
-cpu {lo[:hi[:incr]]} (All args {int} here) Set thread/CPU affinity. NOTE: This flag and -nthread are mutually exclusive: If -cpu is used, the threadcount is inferred from the numeric-argument-triplet which follows. If only the 'lo' argument of the triplet is supplied, this means 'run single-threaded with affinity to CPU {lo}.' If the increment (third) argument of the triplet is omitted, it is taken as incr = 1. The CPU set encoded by the integer-triplet argument to -cpu corresponds to the values of the integer loop index i in the C-loop for(i = lo; i <= hi; i += incr), excluding the loop-exit value of i. Thus '-cpu 0:3' and '-cpu 0:3:1' are both exactly equivalent to '-nthread 4', whereas '-cpu 0:6:2' and '-cpu 0:7:2' both specify affinity setting to cores 0,2,4,6, assuming said cores exist. Lastly, note that no whitespace is permitted within the colon-separated numeric field.
-cpu {triplet0[,triplet1,...]} This is simply an extended version of the above affinity-setting syntax in which each of the comma-separated 'triplet' subfields is in the above form and, analogously to the one-triplet-only version, no whitespace is permitted within the colon-and-comma-separated numeric field. Thus '-cpu 0:3,8:11' and '-cpu 0:3:1,8:11:1' both specify an 8-threaded run with affinity set to the core quartets 0-3 and 8-11, whereas '-cpu 0:3:2,8:11:2' means run 4-threaded on cores 0,2,8,10. As described for the -nthread option, it is an error for any core index to exceed the available number of logical processor cores.
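The Fermat-PRP recipe that the help text gives for Mersenne numbers (p-2 iterations of x = b*x^2 mod M(p), then one final squaring, with a result of 1 indicating a probable prime) can be checked on small exponents with plain Python integers. This is only a sketch of the arithmetic: the function name `mersenne_fermat_prp` is an assumption, and Mlucas itself uses FFT-based multiplication rather than anything like this.

```python
def mersenne_fermat_prp(p: int, b: int = 3) -> bool:
    """Fermat-PRP test of M(p) = 2^p - 1 to base b, following the help text.

    Hypothetical illustration only; suitable for small p, not production use.
    """
    if p < 3:
        raise ValueError("this sketch expects an odd prime exponent p >= 3")
    # The help text requires that the base is not 2 or a power of 2.
    if b < 3 or (b & (b - 1)) == 0:
        raise ValueError("base must not be 2 or a power of 2")
    m = (1 << p) - 1
    x = b % m                      # initial seed x = b
    for _ in range(p - 2):         # (p-2) iterations of x = b*x^2 (mod M(p))
        x = (b * x * x) % m
    x = (x * x) % m                # final mod-squaring
    # x now equals b^(2^p - 2) = b^(M(p) - 1) (mod M(p))
    return x == 1


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 31):
        print(p, mersenne_fermat_prp(p))
```

After k iterations the residue is b^(2^(k+1) - 1), so the p-2 iterations plus the final squaring yield b^(M(p) - 1), which is the usual Fermat probable-prime condition.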
For the medium self-test (s m), the help text above gives an exponent range of 2614999 to 9530803, but what appears in the self-test log file is 39,003,229 to 142,037,359, with FFT lengths 2048(K) to 7680(K) in mlucas.cfg. Apparently Ernst has adjusted the meaning of the m (medium) class etc. over time to keep up with a moving wavefront, without keeping the program's help text output in sync. The Mlucas.c V20.0 source code appears consistent with the self-test: Code:
class   fftlo(K)  ffthi(K)  plow        phigh
tiny    8         120       173431      2455003
small   128       1920      2614999     36617407
medium  2048      7680      39003229    142037359   (includes DC and first test wavefronts now)
large   8192      61440     152816047   1094833457  (exceeds mersenne.org p < 10^9 limit)
huge    65536     245760    1154422469  4197433843  (up to ~0.98 * 2^32)
/* Larger require 64-bit exponent support */
Last fiddled with by kriesel on 2021-08-13 at 20:07
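To make the table above easier to use, the following Python sketch looks up which self-test class covers a given FFT length. The numbers are copied from the table; the names `SELFTEST_CLASSES` and `class_for_fft_length` are assumptions for illustration and do not come from Mlucas.c.

```python
from typing import Optional

# (class, fftlo in K, ffthi in K, plow, phigh), copied from the table above.
SELFTEST_CLASSES = [
    ("tiny", 8, 120, 173431, 2455003),
    ("small", 128, 1920, 2614999, 36617407),
    ("medium", 2048, 7680, 39003229, 142037359),
    ("large", 8192, 61440, 152816047, 1094833457),
    ("huge", 65536, 245760, 1154422469, 4197433843),
]


def class_for_fft_length(fft_k: int) -> Optional[str]:
    """Return the self-test class whose FFT-length range (in K) contains fft_k.

    Returns None when fft_k lies outside every listed range. Hypothetical helper.
    """
    for name, fft_lo, fft_hi, _plow, _phigh in SELFTEST_CLASSES:
        if fft_lo <= fft_k <= fft_hi:
            return name
    return None


if __name__ == "__main__":
    for fft_k in (96, 2048, 7680, 8192, 245760):
        print(fft_k, "->", class_for_fft_length(fft_k))
```

This only tests range membership; whether a particular length inside a range is actually a supported FFT length is not something the table states.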