new.txt		15-Feb-2022

                           What's New in XPL0

Since the last release (ver 3.0) the most significant change is adding
support for multiprocessing.

Multiprocessing provides a whole new way to solve programming problems.
It enables a number of processes -- essentially independent programs --
to run simultaneously. It can make existing programs run up to four times
faster by distributing the load onto the four cores available in most
Raspberry Pi models beyond the RPi 1 (the single-core Pi Zero and Zero W
are exceptions).

Five new intrinsics have been added to make multiprocessing easy.

105: process:= Fork(number). This intrinsic starts the specified number
of child processes and returns the currently active process number. The
parent process is number 0, and the child processes are 1, 2, 3 ... up to
number.

After Fork is called, there will be number+1 instances of the program
running simultaneously. The returned process number is used to select
different tasks for each of those simultaneous programs, called
processes.

The first four processes (parent plus three child processes) can be
loaded into four cores, and they actually run simultaneously, thus
providing up to a four-times speed increase. Additional processes only
appear to run simultaneously because they rapidly alternate using time
slicing.
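
As a minimal sketch of how the returned process number selects a task,
the following program starts two child processes and has every process
show its own number. It assumes only the behavior described here, plus
Join (described next) and the standard Text, IntOut and CrLf intrinsics.
Because the processes run at the same time, the order of the output
lines is not predictable, and they may even be interleaved.

    int  Process;
    [Process:= Fork(2);     \start 2 child processes; 3 processes now run
    if Process = 0 then Text(0, "Parent is process ")
    else Text(0, "Child is process ");
    IntOut(0, Process);     \each process shows its own number: 0, 1 or 2
    CrLf(0);
    Join(Process);          \wait for the child processes to finish
    ]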

106: Join(process). This intrinsic is passed the process number that was
returned by Fork. A child process ends when it reaches Join, while the
parent process waits there until all the child processes started by Fork
have finished, and only then continues. Every call to Fork should have a
corresponding call to Join (otherwise child processes continue to run
even when the main parent process ends and the main program appears to
have finished).

107: address:= SharedMem(bytes). This intrinsic allocates memory that can
be shared among all processes. Ordinarily, variables (even global
variables) are completely separate for each process. This provides a
memory area where variables can be shared between processes, thus
enabling processes to communicate with each other.
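
The difference between an ordinary variable and shared memory might be
sketched as follows. This is only an illustration, assuming Fork and Join
behave as described above; the names Shared and Copy are made up for the
example. The child process stores 123 in both places, but the parent is
expected to see the change only in the shared memory.

    int  Shared, Copy, Process;
    [Shared:= SharedMem(4);     \one 4-byte integer visible to all processes
    Shared(0):= 0;
    Copy:= 0;                   \ordinary global: each process has its own
    Process:= Fork(1);          \start 1 child process
    if Process = 1 then         \only the child stores new values
        [Shared(0):= 123;
        Copy:= 123;
        ];
    Join(Process);              \parent waits here for the child to finish
    IntOut(0, Shared(0));       \expected to show 123, written by the child
    CrLf(0);
    IntOut(0, Copy);            \expected to show 0, the parent's own copy
    CrLf(0);
    ]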

108: Lock(boolean). This intrinsic is used to prevent processes that
modify variables in shared memory from interfering with each other,
possibly corrupting those variables. This waits until the boolean (in
shared memory) is false (0). It then sets the boolean true (-1), which
forces other processes to wait when they encounter this Lock intrinsic.
(Lock is a binary semaphore or mutex.)

109: Unlock(boolean). This intrinsic releases a Lock, which allows other
processes to gain access to shared variables protected by Lock.
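
A small sketch of Lock and Unlock protecting a shared total is shown
here. It is an illustration under the assumptions above, not the only
way to use them; the names Total, Key and Sum are made up for the
example. Each of four processes adds up its own block of 1000 numbers in
an ordinary (separate) variable, then adds that partial sum to the shared
total while holding the lock, so the lock is taken only once per process.

    int  Total, Key, Process, N, Sum;
    [Total:= SharedMem(4);  \shared running total (takes 4 bytes)
    Total(0):= 0;
    Key:= SharedMem(4);     \the lock must also be in shared memory
    Key(0):= false;         \initialize it unlocked
    Process:= Fork(3);      \start 3 child processes: 1, 2, 3
    Sum:= 0;                \separate copy of Sum in each process
    for N:= Process*1000+1 to (Process+1)*1000 do
        Sum:= Sum+N;
    Lock(Key);              \only one process at a time updates Total
    Total(0):= Total(0)+Sum;
    Unlock(Key);
    Join(Process);          \wait for the child processes to finish
    IntOut(0, Total(0));    \1+2+...+4000 = 8002000
    CrLf(0);
    ]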

Multiprocessing can be confusing. It might be best understood by an
example showing how these five intrinsics can be used. This program
distributes the task of counting prime numbers between 1 and 10 million
among four cores, thus making it run up to four times faster.

    func IsPrime(N);        \Return 'true' if N is a prime number
    int  N, I;
    [if N <= 2 then return N = 2;
    if (N&1) = 0 then return false;
    for I:= 3 to sqrt(N) do
        [if rem(N/I) = 0 then return false;
        I:= I+1;
        ];
    return true;
    ];

    int Start(4), Stop(4), Counter, Key, Process, N;
    [\Divide 10 million into ranges that run in roughly equal time
    Start(0):= 1;           Stop(0):= 4_000_000;
    Start(1):= 4_000_001;   Stop(1):= 6_500_000;
    Start(2):= 6_500_001;   Stop(2):= 8_500_000;
    Start(3):= 8_500_001;   Stop(3):=10_000_000;

    Counter:= SharedMem(4); \Counter must be in shared memory (takes 4 bytes)
    Counter(0):= 0;         \reset counter to zero
    Key:= SharedMem(4);     \required to prevent Counter from being corrupted
    Key(0):= false;         \initialize it unlocked
    
    Process:= Fork(3);      \start 3 additional (child) processes: 1, 2, 3
    for N:= Start(Process) to Stop(Process) do \divide ranges among processes
        if IsPrime(N) then  \increment Counter
            [Lock(Key);     \must only allow one process at a time
            Counter(0):= Counter(0)+1;
            Unlock(Key);
            ];
    Join(Process);          \wait for child processes to finish and end them
    IntOut(0, Counter(0));  \should show 664579
    CrLf(0);
    ]


MINOR CHANGES

A newline (CR+LF) is no longer automatically added at the end of a
program. The Linux command-line prompt can be set to start at the
beginning of a new line by adding the following line to ~/.bash_profile:

    PS1='\n'$PS1

The optimizing compiler (xx) now generates code that does integer
division many times faster than before. It detects whether it's running
on an RPi that has a divide instruction (by running xplpimodel) and, if
so, uses that instruction. A divide-by-zero error (if it's untrapped) now
returns 0 instead of 2147483647.

Some warning messages provided by the -w switch in the optimizing
compiler have been eliminated. Automatically included 'code' declarations
are no longer shown as not being used, and neither are variables used in
inline assembly code ('asm'). Also, inconsistent numbers of subscripts on
arrays are no longer shown. These warnings were very rarely useful.

$8_0000_0000 is now detected as a number out of range.

Lowercase sqrt(-10) now gives compile error 74: math error in constant
expression.

Inline assembly code now handles variables with offsets up to 16 MB.
Previously, the unoptimized compiler (x) was limited to offsets of 64K.
The optimizing compiler (xx) did not have this restriction. Blank lines
in inline assembly code ('asm') no longer cause errors.

A couple of bugs in the optimizing compiler (xx) that were unlikely to be
encountered have been fixed. A constant array assigned to a subscripted
array variable was not always compiled correctly, for instance:
A(0):= [100.0]; Also, a register array variable used as the control
variable in a 'for' loop was not always compiled correctly.