ThreadScope Tour/Consolidate

From HaskellWiki

Objectives

Gain a clearer understanding of performance behaviour by separating sequential and parallel parts of the computation

Steps

  1. Build sudoku3 and examine the ThreadScope profile.

    rm sudoku3
    ghc -O2 sudoku3.hs -threaded -rtsopts -eventlog
    ./sudoku3 sudoku17.1000.txt +RTS -N2 -l
    threadscope sudoku3.eventlog
  2. Zoom in on the beginning of the run, where there appears to be very little activity.

    ThreadScope-sudoku3-io.png

  3. Notice the activity spikes: we alternate between using both cores and waiting for garbage collection. Parallel and Concurrent Programming in Haskell explains: "In fact, what we are seeing here is the program reading the input file (lazily) and dividing it into lines, driven by the demands of parMap, which traverses the whole list of lines".

  4. See what happens when we add evaluate (length grids), which forces the whole input file to be read and split into lines in one go, before any parallel work starts. sudoku4 adds this line. Build sudoku4, run it, and examine the ThreadScope profile.

    rm sudoku4
    ghc -O2 sudoku4.hs -threaded -rtsopts -eventlog
    ./sudoku4 sudoku17.1000.txt +RTS -N2 -l
    threadscope sudoku4.eventlog
  5. Zoom in on the beginning of the run as before. It may help to have the sudoku3 and sudoku4 eventlogs open side by side.

    ThreadScope-sudoku3-io.png

    ThreadScope-sudoku4-io.png
  6. Notice that although the sudoku4 profile looks similar to the sudoku3 one, it differs in one crucial way: there is no parallelism at the beginning, while the file is being read.

  7. Now that the sequential and parallel parts are cleanly separated, zoom back out and determine what portion of the runtime is sequential.

    ThreadScope-sudoku4-io-out.png

    Here we can say that the first 15 ms of the run is sequential. No amount of parallelism can make this part of the program any faster.

  8. Read Parallel and Concurrent Programming in Haskell to learn how to apply Amdahl's law to estimate an upper bound on the speedup parallelism can deliver.
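The change made in step 4 can be sketched in Haskell as below. This is not the source of sudoku3.hs or sudoku4.hs: solve is a hypothetical stand-in for the Sudoku solver, parMapSpark and run are hypothetical helpers, and, to depend only on base, the sketch creates sparks with par and pseq from GHC.Conc rather than with the Eval monad from the parallel package. The evaluate (length grids) line is what separates the sequential file reading from the parallel phase; without it, the list traversal done by parMapSpark would itself drive the lazy read, which is the interleaving seen in the sudoku3 profile.

```haskell
-- Sketch of the sudoku3 -> sudoku4 change (NOT the tour's actual source).
-- Assumptions: `solve` is a hypothetical stand-in for the Sudoku solver,
-- and sparks come from par/pseq in GHC.Conc (base) instead of the Eval
-- monad from the parallel package.
import Control.Exception (evaluate)
import Data.List (foldl')
import GHC.Conc (par, pseq)
import System.Environment (getArgs)

-- Hypothetical stand-in for the solver: a cheap hash of one input line.
solve :: String -> Int
solve = foldl' (\acc c -> (acc * 31 + fromEnum c) `mod` 1000003) 7

-- Spark `f x` for every element while forcing the spine of the result.
parMapSpark :: (a -> b) -> [a] -> [b]
parMapSpark _ [] = []
parMapSpark f (x:xs) = y `par` (ys `pseq` (y : ys))
  where
    y  = f x
    ys = parMapSpark f xs

-- Read the file lazily, force the list of lines, then solve in parallel.
run :: FilePath -> IO Int
run path = do
  file <- readFile path
  let grids = lines file
  -- The sudoku4 change: computing the length walks the whole list of
  -- lines, so the lazy file read happens here, sequentially, before
  -- any sparks are created. Delete this line to get sudoku3-style
  -- interleaving of reading and sparking.
  _ <- evaluate (length grids)
  let solutions = parMapSpark solve grids
  evaluate (sum solutions)

main :: IO ()
main = do
  [path] <- getArgs
  run path >>= print
```

It can be compiled and run with the same ghc flags and +RTS -N2 -l options used in steps 1 and 4, and its eventlog opened in ThreadScope the same way.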
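For step 8, a minimal worked form of Amdahl's law, assuming T_1 denotes the program's total runtime on one core (a symbol introduced here, not a measurement from the tour), T_s = 15 ms is the sequential part found in step 7, and the remaining T_1 - T_s divides perfectly across N cores, with P the parallel fraction:

```latex
P = \frac{T_1 - T_s}{T_1}, \qquad
S(N) = \frac{1}{(1 - P) + P/N}
     = \frac{T_1}{T_s + (T_1 - T_s)/N}
     \le \frac{T_1}{T_s} = \frac{T_1}{15\,\mathrm{ms}}
```

Under these assumptions, whatever T_1 turns out to be, no number of cores can bring the runtime below 15 ms, so the speedup can never exceed T_1 / 15 ms.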