- Properly catch the exception thrown by getDevices() and, if it is a
"no devices found" error, just return an empty vector.
- Replace C macro for getPlatforms() with a proper function
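
A minimal sketch of the catch-and-return-empty behaviour described above. To
stay buildable without an OpenCL SDK it uses stand-in types: ClError, Device,
Platform and kDeviceNotFound are hypothetical names, playing the roles of
cl::Error, cl::Device, cl::Platform and CL_DEVICE_NOT_FOUND from the OpenCL
C++ bindings. The real function may differ in signature and details.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Stand-ins so the sketch builds without an OpenCL SDK. With the real C++
// bindings these roles are played by cl::Error, cl::Device and cl::Platform.
constexpr int kDeviceNotFound = -1;  // numeric value of CL_DEVICE_NOT_FOUND

class ClError : public std::runtime_error
{
public:
    ClError(int code, const char* what) : std::runtime_error(what), m_code(code) {}
    int err() const { return m_code; }

private:
    int m_code;
};

struct Device
{
    std::string name;
};

struct Platform
{
    std::vector<Device> attached;

    // Mimics the throwing behaviour: an empty platform raises "not found".
    void getDevices(std::vector<Device>& out) const
    {
        if (attached.empty())
            throw ClError(kDeviceNotFound, "clGetDeviceIDs");
        out = attached;
    }
};

// Returns the platform's devices, or an empty vector when there are none.
std::vector<Device> getDevices(const Platform& platform)
{
    std::vector<Device> devices;
    try
    {
        platform.getDevices(devices);
    }
    catch (const ClError& e)
    {
        if (e.err() != kDeviceNotFound)
            throw;  // any other failure is still reported to the caller
        devices.clear();
    }
    return devices;
}
```

Only the "not found" code is swallowed, so callers can treat an empty vector
as "nothing to mine on" while genuine driver errors still propagate.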
Use a dichotomic algorithm to discover the optimal m_globalWorkSize:
- m_wayWorkSizeAdjust is the direction in which steps are taken (-1 or +1)
- m_stepWorkSizeAdjust is the adjustment step (added to or subtracted
from m_globalWorkSize)
- when a change of direction is needed, step is divided by 2
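
The adjustment loop above can be sketched as follows. This is an
illustration under assumptions, not the actual miner code: the WorkSizeTuner
struct, the adjust() method and the m_floor lower bound are hypothetical, and
it assumes the goal is to bring the measured batch time close to a target
(msPerBatch). Only the three m_* names listed above come from the change
itself.

```cpp
#include <cstdint>

struct WorkSizeTuner
{
    std::int64_t m_globalWorkSize;      // current number of work items per batch
    std::int64_t m_stepWorkSizeAdjust;  // current step, halved on each reversal
    int m_wayWorkSizeAdjust;            // +1 = growing, -1 = shrinking
    std::int64_t m_floor;               // hypothetical lower bound for the size

    // Call once per finished batch with its measured duration.
    void adjust(double batchMs, double msPerBatch)
    {
        // A target of 0 disables tuning; a step of 0 means we have converged.
        if (msPerBatch <= 0 || m_stepWorkSizeAdjust == 0)
            return;

        // Batch too short: grow the work size. Too long: shrink it.
        const int wanted = (batchMs < msPerBatch) ? +1 : -1;
        if (wanted != m_wayWorkSizeAdjust)
        {
            m_wayWorkSizeAdjust = wanted;
            m_stepWorkSizeAdjust /= 2;  // change of direction: halve the step
        }

        m_globalWorkSize += m_wayWorkSizeAdjust * m_stepWorkSizeAdjust;
        if (m_globalWorkSize < m_floor)
            m_globalWorkSize = m_floor;
    }
};
```

Because the step is halved on every reversal, the search stops moving once
the step reaches 0. A real implementation would probably also keep the result
a multiple of the local work size.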
The default work size used to be 64 (local size) * 4096 (global
multiplier). Miners reported much better results with those old defaults,
so we are bringing them back.
- Now the user can also set the local work size (workgroup size)
- In addition, the global work size is specified on the command line only
as a multiplier of the local work size.
- Adding an argument to specify OpenCL global work size.
- Adding an argument to specify milliseconds per global work
size (msPerBatch). If this is 0 then no adjustment of the global work
size happens.
- Giving names to the variables that properly reflect the API
- Making sure that the limitations that are stated in
clEnqueueNDRangeKernel() documentation are adhered to
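
As a rough illustration of how a multiplier-based global work size can be
kept within the clEnqueueNDRangeKernel() rules: the helper names below are
hypothetical, and the checks assume the OpenCL 1.x constraints that the local
work size must not exceed the device's maximum work-group size and that the
global work size must be evenly divisible by the local work size.

```cpp
#include <cstddef>
#include <stdexcept>

// Global work size expressed as "local size * multiplier", e.g. 64 * 4096.
// maxWorkGroupSize stands for the value the device reports as its limit.
std::size_t globalWorkSize(std::size_t localWorkSize, std::size_t multiplier,
    std::size_t maxWorkGroupSize)
{
    if (localWorkSize == 0 || localWorkSize > maxWorkGroupSize)
        throw std::invalid_argument("local work size out of range");
    if (multiplier == 0)
        throw std::invalid_argument("multiplier must be positive");
    return localWorkSize * multiplier;  // divisible by construction
}

// Rounds an arbitrary size (for instance one produced by runtime tuning)
// down to a multiple of localWorkSize, never returning less than one group.
// localWorkSize must be non-zero.
std::size_t alignToLocal(std::size_t global, std::size_t localWorkSize)
{
    const std::size_t aligned = global - global % localWorkSize;
    return aligned == 0 ? localWorkSize : aligned;
}
```

With the defaults mentioned in this log, globalWorkSize(64, 4096, 256)
yields 262144 work items.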
- Removed the `--force-single-chunk` option
- Always attempt to create a single chunk DAG buffer in the GPU. If that
fails, then and only then switch to multiple chunks.
This change is motivated by the fact that many GPUs appear to be able to
allocate a lot more than what CL_DEVICE_MAX_MEM_ALLOC_SIZE returns, which
shows that the value the CL API reports for this limit basically can't be
trusted.
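
The "try one chunk first, fall back to several" strategy could look roughly
like this. It is a sketch only: allocateDag, TryAlloc and chunkCount are
hypothetical names, and the callback stands in for an actual buffer creation
call succeeding or failing, so no real GPU memory is involved.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical callback: returns true if a buffer of `bytes` can be created.
using TryAlloc = std::function<bool(std::size_t)>;

// Returns the sizes of the buffers that hold the DAG: a single entry when
// the one-chunk attempt succeeds, several entries after falling back, or an
// empty vector when even the chunked allocation fails.
std::vector<std::size_t> allocateDag(
    std::size_t dagBytes, std::size_t chunkCount, const TryAlloc& tryAlloc)
{
    // Do not consult the reported maximum allocation size; just try it.
    if (tryAlloc(dagBytes))
        return {dagBytes};

    std::vector<std::size_t> chunks;
    if (chunkCount == 0)
        return chunks;

    // Fallback: split into chunkCount pieces, rounding the piece size up.
    const std::size_t chunkBytes = (dagBytes + chunkCount - 1) / chunkCount;
    std::size_t remaining = dagBytes;
    while (remaining > 0)
    {
        const std::size_t size = std::min(chunkBytes, remaining);
        if (!tryAlloc(size))
            return {};  // real code would also release the earlier chunks
        chunks.push_back(size);
        remaining -= size;
    }
    return chunks;
}
```

The point of the design is that the allocation attempt itself, not the
queried limit, decides whether chunking is needed.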
- The script to turn the source into a bytearray header is no longer a
function but is instead the body of a script so that it's callable as an
external cmake command
- Spaces -> Tabs in the touched cmake files
The OpenCL kernel gets parsed and copied into a byte array accessible
by a specific header during the cmake configuration step.
We are now adding a special command "make clbin2h" which generates
this header byte array on demand.
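
For readers unfamiliar with the idea, here is what turning a kernel source
into a byte array header amounts to. The real conversion is done by a cmake
script; the C++ function below is only an illustration of the concept, and
its name, the generated symbol names and the exact header layout are
assumptions rather than the project's actual output.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Builds the text of a header that embeds `source` as an array of bytes
// named `symbol`, plus a `<symbol>_size` constant with its length.
// `source` is expected to be non-empty, as a kernel source always is.
std::string toByteArrayHeader(const std::string& symbol, const std::string& source)
{
    std::ostringstream out;
    out << "static const unsigned char " << symbol << "[] = {";
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < source.size(); ++i)
    {
        // Twelve bytes per line, indented with a tab.
        out << (i % 12 == 0 ? "\n\t" : " ");
        const unsigned value = static_cast<unsigned char>(source[i]);
        out << "0x" << std::setw(2) << value << ",";
    }
    out << std::dec << "\n};\n";
    out << "static const unsigned int " << symbol << "_size = " << source.size() << ";\n";
    return out.str();
}
```

Writing the returned string to a file gives a header that the host code can
include to access the kernel source without reading it from disk at runtime.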