- Properly catch the exception thrown by getDevices() and, if it is a
"no devices found" error, just return an empty vector.
- Replace C macro for getPlatforms() with a proper function
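
The error handling described above could look roughly like the sketch below. It does not use the real OpenCL headers: `ClError`, `Device`, `kDeviceNotFound` and `getDevicesOrEmpty` are hypothetical stand-ins. `ClError` imitates an exception that exposes a numeric code through `err()`, and the value -1 mirrors `CL_DEVICE_NOT_FOUND`.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the OpenCL error code CL_DEVICE_NOT_FOUND (-1).
constexpr int kDeviceNotFound = -1;

// Hypothetical exception type that carries a numeric error code,
// imitating what an OpenCL C++ wrapper would throw.
class ClError : public std::runtime_error {
public:
    ClError(int code, const std::string& what)
        : std::runtime_error(what), m_code(code) {}
    int err() const { return m_code; }
private:
    int m_code;
};

// Hypothetical device record, only here to keep the sketch self-contained.
struct Device { std::string name; };

// Runs the given device query and maps "no devices found" to an empty
// vector. Every other error is rethrown unchanged.
template <class Query>
std::vector<Device> getDevicesOrEmpty(Query&& queryDevices)
{
    try {
        return queryDevices();
    } catch (const ClError& e) {
        if (e.err() == kDeviceNotFound)
            return {};  // an empty platform is not a failure for the caller
        throw;          // anything else is still a real error
    }
}
```

Callers can then iterate over the result without a special case for platforms that expose no devices.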
Use a dichotomic algorithm to discover the optimal m_globalWorkSize:
- m_wayWorkSizeAdjust is the direction in which steps are taken (-1 or +1)
- m_stepWorkSizeAdjust is the size of each adjustment (added to or
subtracted from m_globalWorkSize)
- when a change of direction is needed, the step is divided by 2
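
A minimal sketch of this adjustment loop follows. Only the three `m_*` names come from the description above. The `WorkSizeTuner` struct, the `adjust()` signature and the rule "shrink when a batch took longer than the target, grow otherwise" are assumptions made for illustration.

```cpp
#include <algorithm>

// Hypothetical holder for the three tuning variables.
struct WorkSizeTuner {
    unsigned m_globalWorkSize;      // current number of work items
    unsigned m_stepWorkSizeAdjust;  // current step size
    int m_wayWorkSizeAdjust;        // -1 = shrinking, +1 = growing

    // Assumed policy: a batch slower than the target asks for a smaller
    // global work size, anything else asks for a larger one.
    void adjust(unsigned batchMs, unsigned targetMs)
    {
        const int wanted = batchMs > targetMs ? -1 : +1;
        if (wanted != m_wayWorkSizeAdjust) {
            // Direction change: halve the step, but never go below 1.
            m_stepWorkSizeAdjust = std::max(1u, m_stepWorkSizeAdjust / 2);
            m_wayWorkSizeAdjust = wanted;
        }
        if (wanted > 0)
            m_globalWorkSize += m_stepWorkSizeAdjust;
        else if (m_globalWorkSize > m_stepWorkSizeAdjust)
            m_globalWorkSize -= m_stepWorkSizeAdjust;
        // Otherwise keep the size as it is to avoid unsigned underflow.
    }
};
```

Because the step is halved on every reversal, the work size oscillates around the target in ever smaller moves until the step reaches 1.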
The defaults used to be 64 (local size) * 4096 (global multiplier). Miners
reported much better results with those old defaults, so we are
bringing them back.
- Now the user can also set the local work size (workgroup size)
- In addition, the global work size is specified on the command line only
as a multiplier of the local work size.
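
The relation between the two values is plain multiplication, as the sketch below shows. The helper name `globalWorkSize` is hypothetical. The values 64 and 4096 are the defaults quoted in this log.

```cpp
#include <cstddef>

// Hypothetical helper: the user supplies a multiplier, and the global
// work size is derived from it, so it is always a whole number of
// work groups.
constexpr std::size_t globalWorkSize(std::size_t localWorkSize,
                                     std::size_t globalMultiplier)
{
    return localWorkSize * globalMultiplier;
}

// With the defaults quoted in this log: 64 * 4096 work items.
static_assert(globalWorkSize(64, 4096) == 262144, "64 * 4096");
```

Deriving the global size this way means it can never fail to be a multiple of the local size, whatever the user passes.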
- Adding an argument to specify the OpenCL global work size.
- Adding an argument to specify milliseconds per global work
size (msPerBatch). If this is 0, no adjustment of the global work
size happens.
- Giving names to the variables that properly reflect the API
- Making sure that the limitations that are stated in
clEnqueueNDRangeKernel() documentation are adhered to
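
As an illustration, the sketch below enforces two constraints from that documentation. The local size must not exceed the device's maximum work-group size. In OpenCL 1.x, the global size must also be evenly divisible by the local size. Which limitations the change actually covers is an assumption here, and `NDRange` and `fitToDevice` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical pair of sizes as they would be passed to the enqueue call.
struct NDRange {
    std::size_t global;
    std::size_t local;
};

// maxWorkGroupSize stands in for the value the device reports for
// CL_DEVICE_MAX_WORK_GROUP_SIZE.
inline NDRange fitToDevice(std::size_t requestedGlobal,
                           std::size_t requestedLocal,
                           std::size_t maxWorkGroupSize)
{
    const std::size_t one = 1;
    // Keep the local size within [1, maxWorkGroupSize].
    const std::size_t local = std::min(std::max(requestedLocal, one),
                                       std::max(maxWorkGroupSize, one));
    // Round the global size up to the next multiple of the local size.
    const std::size_t wanted = std::max(requestedGlobal, one);
    const std::size_t global = ((wanted + local - 1) / local) * local;
    return {global, local};
}
```

Rounding up rather than down is a design choice in this sketch. It never processes fewer work items than requested, at the cost of a few extra ones.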
- Removed the `--force-single-chunk` option
- Always attempt to create a single chunk DAG buffer in the GPU. If that
fails, then and only then switch to multiple chunks.
This change is motivated by the fact that many GPUs appear to be able to
allocate a lot more than what CL_DEVICE_MAX_MEM_ALLOC_SIZE returns, which
shows that the result of querying the CL API for this limit can't be
trusted.
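
The "single chunk first, multiple chunks only on failure" policy could be sketched as follows. The `TryAlloc` callback is a hypothetical stand-in for whatever actually creates the device buffer. The chunk count and the equal-size split are also assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical callback: returns true if a device buffer of the given
// size could be created.
using TryAlloc = std::function<bool(std::size_t bytes)>;

// Returns the sizes of the buffers that were allocated, or an empty
// vector if neither strategy worked.
inline std::vector<std::size_t> allocateDag(std::size_t dagBytes,
                                            std::size_t chunkCount,
                                            const TryAlloc& tryAlloc)
{
    // First choice: one buffer for the whole DAG, regardless of what the
    // device claims its maximum allocation size to be.
    if (tryAlloc(dagBytes))
        return {dagBytes};

    // Fallback: split the DAG into chunkCount pieces of equal size
    // (the last piece may be smaller).
    if (chunkCount == 0)
        return {};
    std::vector<std::size_t> chunks;
    const std::size_t chunkBytes = (dagBytes + chunkCount - 1) / chunkCount;
    for (std::size_t offset = 0; offset < dagBytes; offset += chunkBytes) {
        const std::size_t bytes = std::min(chunkBytes, dagBytes - offset);
        if (!tryAlloc(bytes))
            return {};
        chunks.push_back(bytes);
    }
    return chunks;
}
```

In this sketch the advertised limit is never consulted. The attempt itself decides which path is taken.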
- The script to turn the source into a bytearray header is no longer a
function but is instead the body of a script so that it's callable as an
external cmake command
- Spaces -> Tabs in the touched cmake files
The OpenCL kernel gets parsed and copied into a byte array accessible
by a specific header during the cmake configuration step.
We are now adding a special command "make clbin2h" which generates
this header byte array on demand.
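
To give an idea of what such a header byte array might look like, here is a C++ sketch of the conversion. It is not the cmake script. The function name `toByteArrayHeader`, the array naming scheme and the exact output layout are all assumptions.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical generator: turns a source text into the text of a header
// that defines a byte array plus its size. Assumes a non-empty source.
inline std::string toByteArrayHeader(const std::string& arrayName,
                                     const std::string& source)
{
    std::string out = "static const unsigned char " + arrayName + "[] = {";
    char buf[8];
    for (std::size_t i = 0; i < source.size(); ++i) {
        // Emit every byte as a two-digit hex literal.
        const unsigned byte = static_cast<unsigned char>(source[i]);
        std::snprintf(buf, sizeof buf, "0x%02x", byte);
        out += (i ? ", " : " ");
        out += buf;
    }
    out += " };\n";
    out += "static const unsigned long " + arrayName + "_size = "
         + std::to_string(source.size()) + ";\n";
    return out;
}
```

Embedding the kernel this way lets the program build the kernel from memory at run time, with no separate kernel file to ship.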
It seems that the OpenCL implementation on Mac OS X needs `static` on
function implementations that have no corresponding declaration, as can
be seen in this report: https://github.com/ethereum/cpp-ethereum/issues/2172
A new argument is added. --force-single-chunk allows the user to
override auto chunk detection and force DAG uploading in a single
chunk. This should only be used if the user is 100% certain that their
card can actually enqueue for writing a DAG bigger than
MAX_MEM_ALLOC_SIZE.
OpenCL says this is undefined behaviour so use at your own risk. Still,
some cards seem to be able to upload the DAG in a single chunk even if
OpenCL thinks they can't, thus the decision to add this option.