Thursday, March 14, 2013

Nexus 4 is running with OpenCL programs using aparapi

After the initial OpenCL programs
  clinfo
  helloworld
  helloworld2

on my Nexus 4 phone, I am trying to use some existing Java-OpenCL frameworks (like JOCL, JavCL, OpenCL4Java  and aparapi) for the Android based OpenCL developement.

So aftersome initial studies, fixed on AMD's aparapi and started porting the X86/X86_64 OpenCL-Java framework to any Android-based phone with OpenCL implementation (on GPU).

As part of helloworld2, I have mapped all OpenCL API methods using generic way irrespective of the underlying OpenCL SDK vendor.

Now I ventured to port aparapi from X86/X86_64 to ARM (on android ). To simplify the porting exercise, I took svn release version 68 as starting point for aparapi project.

Need to hack following of the code to accommodate Qualcomm OpenCL SDK on the Android on ARM:
  • Need to use reflection to get sun.misc.Unsafe.
    • Not all Unsafe methods are available on Android version
      • So no float, double, boolean and their corresponding array versions
    • Apache Unsafe internal field names are different than Sun's Unsafe
  • Removed some hard codings on JNI code, so that it works for QUALCOMM SDK also
  • QualComm SDK crashes the Dalvik-VM, if the OpenCL-Kernel is in inner class (and field names contains '$')
  • Current AMD OpenCL code generation depends on class-files availability on classpath. In Android it is dex-file, so need to package the original classpath also into apk file and access it via raw resource

So after all these major changes, the OpenCL program totally written in java worked on my Nexus 4. Once I get time, I will cleanup code and submit my code to my site

Here is the log from my Nexus 4 for this whole exercise (with generated OpenCL in logs):

I/System.out( 2379): getClassBytes Needed for :org/aopencl/aparapi/MyKernel.class , res-id:7f040000
I/System.out( 2379): getClassBytes:java.util.zip.ZipEntry, getting bytes for:org/aopencl/aparapi/MyKernel.class
D/dalvikvm( 2379): GC_CONCURRENT freed 394K, 5% free 9008K/9444K, paused 2ms+3ms, total 16ms
I/System.out( 2379): Got :org/aopencl/aparapi/MyKernel.class, size:-1
I/System.out( 2379): extractBytes:710
D/aopencl ( 2379): Trying shared libraries at following locations:
D/aopencl ( 2379): /system/lib/libOpenCL.so
D/aopencl ( 2379): /system/vendor/lib/egl/libGLES_mali.so
D/aopencl ( 2379): /system/lib/libllvm-a3xx.so
D/aopencl ( 2379): Using the Shared library:/system/lib/libOpenCL.so
E/aparapi ( 2379): platform 0 QUALCOMM
E/aparapi ( 2379): CL_DEVICE_MAX_COMPUTE_UNITS 4
E/aparapi ( 2379): CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS 3
E/aparapi ( 2379): CL_DEVICE_MAX_WORK_GROUP_SIZE 256
E/aparapi ( 2379): CL_DEVICE_GLOBAL_MEM_SIZE 1910623136
E/aparapi ( 2379): CL_DEVICE_LOCAL_MEM_SIZE 1910623168
E/aparapi ( 2379): device[0x722d6d50]: Type:
E/aparapi ( 2379): GPU
E/aparapi ( 2379):
E/aparapi ( 2379):
E/aparapi ( 2379): JNIContext 7289b008
E/aparapi ( 2379): JNIContext valid:1
W/aparapi ( 2379): getExtensions JNIContext :7289b008
W/aparapi ( 2379): getExtensions2 JNIContext :cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_qcom_default_relaxed_math cl_qcom_perf_monitor
D/KernelRunner( 2379): Capabilities initialized to :[cl_qcom_perf_monitor, cl_khr_byte_addressable_store, cl_qcom_default_relaxed_math, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics, cl_khr_fp16]
D/KernelRunner( 2379): passed requiresDoublePragma
D/KernelRunner( 2379): passed requiresByteAddressableStorePragma
D/KernelRunner( 2379): passed requiresAtomic32Pragma
D/KernelRunner( 2379): Gen-OpenCL:typedef struct This_s{
D/KernelRunner( 2379): __global int *squares;
D/KernelRunner( 2379): __global int *values;
D/KernelRunner( 2379): int passid;
D/KernelRunner( 2379): }This;
D/KernelRunner( 2379): int get_pass_id(This *this){
D/KernelRunner( 2379): return this->passid;
D/KernelRunner( 2379): }
D/KernelRunner( 2379): __kernel void run(
D/KernelRunner( 2379): __global int *squares,
D/KernelRunner( 2379): __global int *values,
D/KernelRunner( 2379): int passid
D/KernelRunner( 2379): ){
D/KernelRunner( 2379): This thisStruct;
D/KernelRunner( 2379): This* this=&thisStruct;
D/KernelRunner( 2379): this->squares = squares;
D/KernelRunner( 2379): this->values = values;
D/KernelRunner( 2379): this->passid = passid;
D/KernelRunner( 2379): {
D/KernelRunner( 2379): int gid = get_global_id(0);
D/KernelRunner( 2379): this->squares[gid] = this->values[gid] + (this->values[gid] / 2);
D/KernelRunner( 2379): return;
D/KernelRunner( 2379): }
D/KernelRunner( 2379): }
E/aparapi ( 2379): clBuildProgram start clBuildProgram-ptr:722c533d, cl-prgm-ptr:735e4b38, dev-idc:1, devs:735d6ac8
E/aparapi ( 2379): clBuildProgram end:0
D/KernelRunner( 2379): passed buildProgramJNI
D/KernelRunner( 2379): arg 0, squares, type=1688, primitiveSize=4
D/KernelRunner( 2379): arg 1, values, type=1288, primitiveSize=4
E/aparapi ( 2379): in setArgs arg 0 squares type 00001688
E/aparapi ( 2379): in setArgs arg 1 values type 00001288
D/KernelRunner( 2379): saw newArrayRef for squares = [I@4248aa40, newArrayLen = 32
D/KernelRunner( 2379): saw newArrayRef for values = [I@4248a9a8, newArrayLen = 32
E/aparapi ( 2379): for globalSize=32, stepping localSize from 32, returning localSize=32
D/KernelRunner( 2379): Need to resync arrays on org.aopencl.aparapi.MyKernel
E/aparapi ( 2379): got type for squares: 00001688
E/aparapi ( 2379): testing for Resync javaArray squares: old=0x0, new=0x2e900005
E/aparapi ( 2379): Resync javaArray for squares: 0x2e900005 0x0
E/aparapi ( 2379): NewWeakGlobalRef for squares, set to 0x1d600003
E/aparapi ( 2379): updateKernel, args[0].sizeInBytes=128
E/aparapi ( 2379): got type for values: 00001288
E/aparapi ( 2379): testing for Resync javaArray values: old=0x0, new=0x1e300009
E/aparapi ( 2379): Resync javaArray for values: 0x1e300009 0x0
E/aparapi ( 2379): NewWeakGlobalRef for values, set to 0x1d600007
E/aparapi ( 2379): updateKernel, args[1].sizeInBytes=128
E/aparapi ( 2379): back from updateKernel
E/aparapi ( 2379): got type for arg 0, squares, type=00001688
E/aparapi ( 2379): runKernel: arrayOrBuf ref 0x1d600003, oldAddr=0x0, newAddr=0x4248aa50, ref.mem=0x0, isArray=1
E/aparapi ( 2379): at memory addr 0x4248aa50, contents:
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379):
D/aparapi ( 2379): squares 0 clCreateBuffer(context, CL_MEM_USE_HOST_PTR|CL_MEM_READ_WRITE, size=00000080 bytes, address=4248aa50, &status)
E/aparapi ( 2379): writing buffer 0 squares
E/aparapi ( 2379): got type for arg 1, values, type=00001288
E/aparapi ( 2379): runKernel: arrayOrBuf ref 0x1d600007, oldAddr=0x0, newAddr=0x4248a9b8, ref.mem=0x0, isArray=1
E/aparapi ( 2379): at memory addr 0x4248a9b8, contents:
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 01
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379): 00
E/aparapi ( 2379):
D/aparapi ( 2379): values 1 clCreateBuffer(context, CL_MEM_USE_HOST_PTR|CL_MEM_READ_ONLY, size=00000080 bytes, address=4248a9b8, &status)
E/aparapi ( 2379): writing buffer 1 values
E/aparapi ( 2379): reading buffer 0 squares
D/KernelRunner( 2379): executeOpenCL completed. _globalSize=32
I/System.out( 2379): Execution mode=GPU

5 comments:

  1. awesome work! I have been working just using JNI and this is a major pain!

    ReplyDelete
  2. Nice work. I added a link from the Aparapi home page.

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. Can someone confirm that they can run this program multiple times on a Nexus4 (please report the ROM version) without a reboot of the phone / errors in logcat. It seems that the OpenCL program is executed correctly, but only the context cleanup is a problem. If I put a loop in the main of 2000, I get consistent errors on a Nexus4. The same program runs perfect on a Nexus10 (linked to a different lib of course)

    I get error messages in my log file like : "(W/Adreno200-GSL( 160): : ioctl code 0xc0140910 (IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS) failed: errno 35)." just after cleanup of the context.

    ReplyDelete
  5. Edwin,
    Thanks for trying this program.

    I am sorry , I could not see your comment early -- I am busy in my official work. I will look into this loop testing on my Nexus 4 and let you know once I am done.

    ReplyDelete