Reading (and enjoying) Primož Gabrijelčič's book Delphi High Performance, I've come across this paragraph in Chapter 7: Exploring Parallel Practices:
"There's not much to say about Join, except that in current Delphi it doesn't work correctly. A bug in the 10.1 Berlin and 10.2 Tokyo implementations causes Join to not start enough threads. For example, if you pass in two tasks, it will only create one thread and execute tasks one after another. If you pass in three tasks, it will create two threads and execute two tasks in one and one in another."
The code accompanying the book is available on GitHub: PacktPublishing/Delphi-High-Performance
TParallel.Join does not create enough threads (Embarcadero's Quality Portal, RSP-19557)
The author offers a simple workaround by starting a dummy task (which does nothing) first; you can see an example here.
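A minimal sketch of what such a workaround might look like, assuming the System.Threading overload of TParallel.Join that takes an array of TProc. This is not the book's code: the names NullTask, Task1 and Task2 are purely illustrative, and the Sleep calls merely stand in for real work.

```delphi
program JoinWorkaround;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Threading;

var
  NullTask, Task1, Task2: TProc;
begin
  // Dummy task that does nothing. It only pads the task list so that
  // Join ends up creating one more thread than it otherwise would.
  NullTask :=
    procedure
    begin
    end;

  // Placeholder workloads (assumption: real code would do actual work here).
  Task1 :=
    procedure
    begin
      Sleep(2000);
    end;

  Task2 :=
    procedure
    begin
      Sleep(2000);
    end;

  // Join returns an ITask for the composite task; Wait blocks until
  // every procedure in the list has finished.
  TParallel.Join([NullTask, Task1, Task2]).Wait;
end.
```

On an affected version, timing this with and without NullTask in the list should show whether the two real tasks run side by side. How the tasks get distributed across threads is up to the library, so treat this as something to measure rather than a guarantee.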
Some time ago (using Delphi XE7), I wrote a library with my own TThreadPool class that encapsulates Windows I/O completion ports (IOCP). The result turned out well; it was rock-solid. Reading about bugs like this in the latest-and-greatest Delphi version makes me glad I chose to write my own implementation from scratch. It saved me a lot of time I would otherwise have spent hunting for bugs like this in Delphi's runtime library (in addition to my own).