Use a dichotomic algorithm to find the optimal m_globalWorkSize:
- m_wayWorkSizeAdjust is the direction in which steps are taken (-1 or +1)
- m_stepWorkSizeAdjust is the size of each adjustment step (added to or
subtracted from m_globalWorkSize)
- when a change of direction is needed, step is divided by 2
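
The adjustment described above could be sketched roughly as follows. This
is a minimal illustration, not the miner's actual code: the WorkSizeTuner
type, its update() method, the min/max bounds and the plain
"slower or faster than target" comparison are all assumptions; only the
three variable names come from this message.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper. Field names mirror the commit message (without m_);
// the bounds and the comparison rule are assumptions of this sketch.
struct WorkSizeTuner
{
    unsigned globalWorkSize;      // m_globalWorkSize
    unsigned stepWorkSizeAdjust;  // m_stepWorkSizeAdjust
    int wayWorkSizeAdjust = 0;    // m_wayWorkSizeAdjust: -1 or +1 (0 before the first step)
    unsigned minWorkSize;         // assumed lower bound
    unsigned maxWorkSize;         // assumed upper bound

    WorkSizeTuner(unsigned initial, unsigned step, unsigned minimum, unsigned maximum)
      : globalWorkSize(initial), stepWorkSizeAdjust(step),
        minWorkSize(minimum), maxWorkSize(maximum)
    {
        assert(step > 0);
        assert(minimum <= initial && initial <= maximum);
    }

    // Feed the measured batch time and the target time, both in milliseconds.
    void update(unsigned batchMs, unsigned targetMs)
    {
        if (batchMs == targetMs)
            return;  // on target, nothing to adjust
        int const way = (batchMs > targetMs) ? -1 : +1;

        // A change of direction halves the step (never below 1).
        if (wayWorkSizeAdjust != 0 && way != wayWorkSizeAdjust)
            stepWorkSizeAdjust = std::max(1u, stepWorkSizeAdjust / 2);
        wayWorkSizeAdjust = way;

        if (way < 0)
        {
            // Shrink, clamping at the lower bound without unsigned underflow.
            unsigned const room = globalWorkSize - minWorkSize;
            globalWorkSize = (room > stepWorkSizeAdjust) ? globalWorkSize - stepWorkSizeAdjust
                                                         : minWorkSize;
        }
        else
        {
            // Grow, clamping at the upper bound without unsigned overflow.
            unsigned const room = maxWorkSize - globalWorkSize;
            globalWorkSize = (room > stepWorkSizeAdjust) ? globalWorkSize + stepWorkSizeAdjust
                                                         : maxWorkSize;
        }
    }
};
```

Because the step only shrinks on a reversal, the work size moves quickly
while it is far from the target and settles into small oscillations once
it is close.
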
- Now the user can also set the local work size (workgroup size)
- In addition, the global work size is specified on the command line
only as a multiplier of the local work size.
- Adding an argument to specify OpenCL global work size.
- Adding an argument to specify milliseconds per global work
size (msPerBatch). If this is 0 then no adjustment of the global work
size happens.
- Giving names to the variables that properly reflect the API
- Making sure that the limitations that are stated in
clEnqueueNDRangeKernel() documentation are adhered to
- Removed the `--force-single-chunk` option
- Always attempt to create a single chunk DAG buffer in the GPU. If that
fails then and only then switch to multiple chunks.
This change is motivated by the fact that many GPUs appear to be able to
allocate a lot more than CL_DEVICE_MAX_MEM_ALLOC_SIZE reports, which
suggests that the value returned by the CL API can't be trusted here.
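
The "single chunk first, chunks only on failure" policy could look
roughly like the sketch below. It is an illustration only: allocateDag,
the TryAlloc callback and the way the remainder is put into the last
chunk are assumptions, and the callback stands in for whatever buffer
creation call the miner really uses (presumably a wrapper around
clCreateBuffer).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical callback: returns true if a device buffer of the given
// size could be created.
using TryAlloc = std::function<bool(std::size_t)>;

// Returns the sizes of the buffers that were allocated: one entry for a
// single-chunk DAG, fallbackChunks entries otherwise, empty if both fail.
std::vector<std::size_t> allocateDag(std::size_t dagSize,
                                     unsigned fallbackChunks,
                                     const TryAlloc& tryAlloc)
{
    assert(fallbackChunks > 0);

    // Always attempt the single big buffer first, whatever the device
    // claims its maximum allocation size is.
    if (tryAlloc(dagSize))
        return {dagSize};

    // Only after that attempt fails, switch to multiple chunks.
    std::vector<std::size_t> chunks;
    std::size_t const base = dagSize / fallbackChunks;
    for (unsigned i = 0; i < fallbackChunks; ++i)
    {
        // The last chunk absorbs the remainder (an assumption of this sketch).
        bool const last = (i + 1 == fallbackChunks);
        std::size_t const size = last ? dagSize - base * i : base;
        if (!tryAlloc(size))
            return {};
        chunks.push_back(size);
    }
    return chunks;
}
```
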
A new argument is added. --force-single-chunk allows the user to
override auto chunk detection and force DAG uploading in a single
chunk. This should only be used if the user is 100% certain that their
card can actually enqueue a DAG for writing that is bigger than
MAX_MEM_ALLOC_SIZE.
OpenCL says this is undefined behaviour so use at your own risk. Still,
some cards seem to be able to upload the DAG in a single chunk even if
OpenCL thinks they can't, thus the decision to add this option.
- Added new option --cl-extragpumem with which you can let the OpenCL
miner know how much GPU memory you believe your system would need for
miscellaneous stuff like windowing system rendering etc. The default
is 350 MB.
- Added new option --current-block with which you can let the miner know
the current block during the configuration phase and as such help it
provide a much more accurate calculation of how much GPU memory is
required.
- Added help (documentation) for some arguments that did not have any
- No need for many different functions to set each single option for the
miner. First we set all options and then we act on them. This way
--list-devices will give different results with --allow-opencl-cpu and
without it, as it should.
- If the user has no GPU with sufficient memory we no longer default to
CPU. It's better to throw an error and let the user remove the
argument. It's easy to miss the defaulting to CPU log message.
By default, CPU is now not considered an OpenCL device.
Also added a new argument, --allow-opencl-cpu, that allows OpenCL to
include CPU devices if the user's OpenCL implementation caters for it.
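
Taken together, the device rules above (no CPU unless allowed, enough
memory for the DAG plus the extra reserve, an error instead of a silent
CPU fallback) could be sketched like this. DeviceInfo and selectDevices
are hypothetical names, and applying the extra memory reserve to every
device type is an assumption; a real miner would fill the struct from
clGetDeviceInfo().

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical device description used only by this sketch.
struct DeviceInfo
{
    std::string name;
    bool isCpu;
    std::uint64_t globalMemBytes;
};

// Returns the indices of usable devices. Throws instead of falling back
// to the CPU when nothing qualifies.
std::vector<std::size_t> selectDevices(const std::vector<DeviceInfo>& devices,
                                       std::uint64_t dagBytes,
                                       std::uint64_t extraGpuMemBytes,
                                       bool allowOpenclCpu)
{
    std::vector<std::size_t> usable;
    for (std::size_t i = 0; i < devices.size(); ++i)
    {
        const DeviceInfo& d = devices[i];
        if (d.isCpu && !allowOpenclCpu)
            continue;  // CPU is not an OpenCL device by default
        if (d.globalMemBytes < dagBytes + extraGpuMemBytes)
            continue;  // not enough memory for DAG plus reserve
        usable.push_back(i);
    }
    if (usable.empty())
        throw std::runtime_error("no OpenCL device with sufficient memory");
    return usable;
}
```

Throwing here makes the failure impossible to miss, which is the point of
dropping the silent CPU fallback.
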
- Removing --use-chunks argument. The decision to use chunks or not is
now made by the implementation, depending on
CL_DEVICE_MAX_MEM_ALLOC_SIZE
- Refactored the code a bit in ethash_cl_miner, abstracting out some of
the device iteration into its own functions.
Providing the --use-chunks argument sets dagChunks to 4. The default is
1 big chunk. A future improvement could be to support an arbitrary number
of chunks.