Task: Input N Output: The smallest prime triplet whose primes are no less than N Efficient algorithm: isPrime() is called once for each prime number after N ---- m1, m2, m3 = 2, 2, 2 cur = n while True: if isPrime(cur): m1,m2,m3 = m2, m3, cur if (m3 - m1 == 6) find the prime triplet, exit cur = cur + 1 ---- The complexity is in the isPrime function. Generic isPrime function like the one below performs a lot of redundant mod operations - a lot of i and i+2 are not prime numbers. This is dominating the run time. So to have good performance, we need to have an isPrime function that performs the least amount of mod operations. Consider complexity, what is the complexity of the generic isPrime function? The prime number theory says that the number of prime numbers up to N is approximately N/log(N). ---- def is_prime(n): """Check if a number is a prime.""" if n <= 1: return False if n <= 3: return True if n % 2 == 0 or n % 3 == 0: return False i = 5 while i * i <= n: if n % i == 0 or n % (i + 2) == 0: return False i += 6 return True ---- Idea for improvement: 1) Build a list of primes up to sqrt(N+100000) -- assume that the prime triplet will be no bigger than N + 100000 2) Use a isPrime(x) function that only check whether x is divisible by prime numbers only - by definition, this is the least number of mod operations that any isPrime() function needs to perform. Final algorithm (in assignment3_provided.py): ------------- def isPrime(n): i=0 b = int(math.sqrt(n)) + 1 pLen = len(primes) while i < pLen and (primes[i] < b): if n % primes[i] == 0: return False i = i + 1 return True largestPrimeNeeded = int (math.sqrt(n + 100000)) makePrimeList(largestPrimeNeeded) m1, m2, m3 = 2, 2, 2 cur = n while True: if isPrime(cur): m1,m2,m3 = m2, m3, cur if (m3 - m1 == 6) find the prime triplet, exit cur = cur + 1 ------------- Parallelizing strategy: 1. Find out where the time is spent, both the makePrimeList() and the loop to search for the prime triplet takes a lot of time -- both need to be parallelized with multiprocessing 2. We discussed how to parallelize makePrimeList() in the class 3. Parallelize the search? The sequence algorithm find one new primes number, and check prime triplet parallel algorithm can use multiple workers to find all primes numbers in a range with each worker finding all primes in a range of a fixed size such as 100: worker 0 find primes from in [cur, cur+100), worker 1 find primes in [cur+100, cur+200) and so on. In one round all primes in [cur, cur + nprocs * 100). The main thread then check if prime triplet can be found with this set of primes (sort them if it makes easier). If a prime triplet is not found in a round, cur = cur + nprocs * 100 and repeat. ---- start nprocs workers m1, m2, m3 = 2, 2, 2 cur = n while True: 1. Give each worker process a range of X (X=100 or other value) numbers to find prime numbers among them, nprocs workers will find all primes in [cur, cur + nprocs * X) 2. Get all primes found, sort them. 3. consider all the primes found to see if the smallest prime triplet is found 4. If yes, output and done 5. If not, cur = cur + nprocs * X, back to the while loop. ----