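TensorRT and CUDA Self-Study Notes 004: Kernel Functions and Two Simple Kernel Demos
A kernel function is a function that can be executed concurrently by many threads on the GPU.
Function qualifiers in CUDA programs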
Qualifier | Runs on | Callable from | Notes |
---|---|---|---|
`__global__` | Device | Host, or device code on GPUs of compute capability 3.x and above that support dynamic parallelism | Return type must be void |
`__device__` | Device | Device only | |
`__host__` | Host | Host only | `__host__` can be omitted; it is the default qualifier |
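As a rough illustration of the table above, the sketch below puts the three qualifiers side by side in one .cu file. The names `square`, `halve`, `squareKernel`, and `report` are hypothetical and do not come from the original notes; combining `__host__ __device__` on one function is a standard CUDA feature that the table does not list.

```cuda
#include <cstdio>

// __device__: runs on the GPU and can only be called from device code.
__device__ float square(float x) { return x * x; }

// __host__ __device__: compiled for both sides, callable from CPU and GPU code.
__host__ __device__ float halve(float x) { return 0.5f * x; }

// __global__: a kernel. Runs on the GPU, is launched from the host, returns void.
__global__ void squareKernel(float x) {
    printf("GPU: square(%.1f) = %.1f, halve(%.1f) = %.1f\n",
           x, square(x), x, halve(x));
}

// No qualifier is the same as __host__: an ordinary CPU function.
void report(float x) {
    printf("CPU: halve(%.1f) = %.1f\n", x, halve(x));
}

int main() {
    report(3.0f);
    squareKernel<<<1, 1>>>(3.0f);  // one block containing one thread
    cudaDeviceSynchronize();       // wait for the kernel so its printf output is flushed
    return 0;
}
```

Calling `square` from `report` would fail to compile, because a `__device__` function is not visible to host code.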
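Characteristics of kernel functions
- Can only access GPU (device) memory
- Must return void
- Cannot take a variable number of arguments, cannot use static variables, and cannot use function pointers
- Asynchronous: once a kernel is launched, the host program keeps running and can execute in parallel with it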
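The asynchronous behaviour in the last point can be observed with a small sketch like the one below. It assumes a working CUDA device; `asyncKernel` and `checkCuda` are hypothetical names introduced only for this illustration. The launch returns right away, so the host has to call `cudaDeviceSynchronize()` when it needs the kernel to have finished.

```cuda
#include <cstdio>

__global__ void asyncKernel() {
    printf("GPU: kernel running in thread %d\n", threadIdx.x);
}

// Hypothetical helper: report a CUDA error and return whether the call succeeded.
bool checkCuda(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        printf("%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    asyncKernel<<<1, 2>>>();  // returns immediately; the kernel is only queued

    // Errors in the launch itself (e.g. an invalid configuration) show up here.
    if (!checkCuda(cudaGetLastError(), "kernel launch")) return 1;

    // The host is free to do other work while the GPU runs the kernel.
    printf("CPU: this line typically appears before the GPU output\n");

    // Block the host until the kernel has finished; errors raised while
    // the kernel was executing are returned by this call.
    if (!checkCuda(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

    printf("CPU: kernel finished\n");
    return 0;
}
```

Without a synchronizing call before the program exits, the process may end before the kernel's output is flushed.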
demo1: a kernel that runs on the GPU and prints a message
Code
#include <stdio.h>

__global__ void HellofromGPU() {
    printf("Hello from GPU!\n");
}

int main() {
    printf("Hello from CPU!\n");
    HellofromGPU<<<1, 6>>>();  // <<<grid, block>>>: 1 block of 6 threads
    cudaDeviceReset();         // release GPU resources
    return 0;
}
Note: the file extension must be .cu
nvcc ./main.cu -o main.exe
Then run the executable produced by nvcc
Output
luke@ubuntu:~/workspace/TensorRT_course/02_kenel_demo$ ./main.exe
Hello from CPU!
Hello from GPU!
Hello from GPU!
Hello from GPU!
Hello from GPU!
Hello from GPU!
Hello from GPU!
demo2: a kernel that runs on the GPU and prints each thread's index and thread ID
Code
#include <stdio.h>
#include <iostream>

__global__ void hellowfromGPU() {
    printf("Hello from block(%d,%d,%d) thread(%d,%d,%d)\t thread ID is %d \n",
           blockIdx.x, blockIdx.y, blockIdx.z,
           threadIdx.x, threadIdx.y, threadIdx.z,
           threadIdx.x + threadIdx.y * blockDim.x + threadIdx.z * blockDim.x * blockDim.y);
}

int main() {
    std::cout << "Hello from CPU!" << std::endl;
    // A grid contains blocks
    dim3 grid1(2, 2, 1);   // the grid named grid1 contains 2x2x1 = 4 blocks
    dim3 block1(2, 2, 2);  // the block named block1 contains 2x2x2 = 8 threads
    // printf("Launching kernel with gridDim:%d %d %d blockDim:%d %d %d ", gridDim.x, gridDim.y, gridDim.z, blockDim.x, blockDim.y, blockDim.z);
    // In testing, gridDim and blockDim could not be accessed directly on the CPU side
    std::cout << "Launching kernel..." << std::endl;
    hellowfromGPU<<<grid1, block1>>>();  // launches 4x8 = 32 threads in parallel
    // std::cout << "Work done!" << std::endl;
    cudaDeviceReset();
    std::cout << "Work done!" << std::endl;
    return 0;
}
Note: the file extension must be .cu
nvcc ./main.cu -o main.exe
Then run the executable produced by nvcc
Output
luke@ubuntu:~/workspace/TensorRT_course/03_cuda_idx$ ./main.exe
Hello from CPU!
Launching kernel...
Hello from block(1,1,0) thread(0,0,0) thread ID is 0
Hello from block(1,1,0) thread(1,0,0) thread ID is 1
Hello from block(1,1,0) thread(0,1,0) thread ID is 2
Hello from block(1,1,0) thread(1,1,0) thread ID is 3
Hello from block(1,1,0) thread(0,0,1) thread ID is 4
Hello from block(1,1,0) thread(1,0,1) thread ID is 5
Hello from block(1,1,0) thread(0,1,1) thread ID is 6
Hello from block(1,1,0) thread(1,1,1) thread ID is 7
Hello from block(1,0,0) thread(0,0,0) thread ID is 0
Hello from block(1,0,0) thread(1,0,0) thread ID is 1
Hello from block(1,0,0) thread(0,1,0) thread ID is 2
Hello from block(1,0,0) thread(1,1,0) thread ID is 3
Hello from block(1,0,0) thread(0,0,1) thread ID is 4
Hello from block(1,0,0) thread(1,0,1) thread ID is 5
Hello from block(1,0,0) thread(0,1,1) thread ID is 6
Hello from block(1,0,0) thread(1,1,1) thread ID is 7
Hello from block(0,1,0) thread(0,0,0) thread ID is 0
Hello from block(0,1,0) thread(1,0,0) thread ID is 1
Hello from block(0,1,0) thread(0,1,0) thread ID is 2
Hello from block(0,1,0) thread(1,1,0) thread ID is 3
Hello from block(0,1,0) thread(0,0,1) thread ID is 4
Hello from block(0,1,0) thread(1,0,1) thread ID is 5
Hello from block(0,1,0) thread(0,1,1) thread ID is 6
Hello from block(0,1,0) thread(1,1,1) thread ID is 7
Hello from block(0,0,0) thread(0,0,0) thread ID is 0
Hello from block(0,0,0) thread(1,0,0) thread ID is 1
Hello from block(0,0,0) thread(0,1,0) thread ID is 2
Hello from block(0,0,0) thread(1,1,0) thread ID is 3
Hello from block(0,0,0) thread(0,0,1) thread ID is 4
Hello from block(0,0,0) thread(1,0,1) thread ID is 5
Hello from block(0,0,0) thread(0,1,1) thread ID is 6
Hello from block(0,0,0) thread(1,1,1) thread ID is 7
Work done!
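In the output above the thread ID repeats 0 to 7 in every block, because the formula in demo2 uses only threadIdx and blockDim. A minimal sketch of one common way to get an ID that is unique across the whole grid follows; `flatten` and `globalIdKernel` are hypothetical names, not part of the original demo. It also prints the launch configuration on the host from the `dim3` variables themselves, since `gridDim` and `blockDim` are only available inside device code.

```cuda
#include <cstdio>

// Linear index of a 3D coordinate (x, y, z) inside an extent whose
// first two dimensions are dimX and dimY; x varies fastest.
__host__ __device__ unsigned int flatten(unsigned int x, unsigned int y, unsigned int z,
                                         unsigned int dimX, unsigned int dimY) {
    return x + y * dimX + z * dimX * dimY;
}

__global__ void globalIdKernel() {
    // Same formula as demo2: position of the thread inside its block.
    unsigned int threadInBlock =
        flatten(threadIdx.x, threadIdx.y, threadIdx.z, blockDim.x, blockDim.y);
    // Same idea one level up: position of the block inside the grid.
    unsigned int blockInGrid =
        flatten(blockIdx.x, blockIdx.y, blockIdx.z, gridDim.x, gridDim.y);
    unsigned int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    unsigned int globalId = blockInGrid * threadsPerBlock + threadInBlock;
    printf("block %u thread %u -> global thread ID %u\n",
           blockInGrid, threadInBlock, globalId);
}

int main() {
    dim3 grid1(2, 2, 1);   // 4 blocks, as in demo2
    dim3 block1(2, 2, 2);  // 8 threads per block, as in demo2

    // On the host, read the configuration from the dim3 variables.
    printf("Launching kernel with grid (%u,%u,%u) block (%u,%u,%u)\n",
           grid1.x, grid1.y, grid1.z, block1.x, block1.y, block1.z);

    globalIdKernel<<<grid1, block1>>>();
    cudaDeviceSynchronize();  // wait for the kernel and flush its output
    return 0;
}
```

With this configuration the global IDs cover 0 to 31, one per thread, although the order in which the lines are printed is not guaranteed.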