OpenML

sgemm_gpu_kernel_performance

Status: active. Format: ARFF. License: CC BY 4.0. Visibility: public. Uploaded 22-12-2022 by Shirley.

Data Description

This data set measures the running time of a matrix-matrix product A\*B = C, where all matrices have size 2048 x 2048, using a parameterizable SGEMM GPU kernel. Out of 1327104 total parameter combinations, only 241600 are feasible (due to various kernel constraints), and this data set contains the results for all of these feasible combinations. For each tested combination, 4 runs were performed, and their results are reported in the last 4 columns. All times are measured in milliseconds.

There are 14 parameters: the first 10 are ordinal, each taking one of at most 4 power-of-two values, and the last 4 are binary.

The experiment was run on a desktop workstation running Ubuntu 16.04 Linux with an Intel Core i5 (3.5GHz), 16GB RAM, and an NVidia Geforce GTX 680 4GB GF580 GTX-1.5GB GPU. We use the 'gemm_fast' kernel from the automatic OpenCL kernel tuning library 'CLTune'.

Attribute Description

1-2. *MWG*, *NWG* - per-matrix 2D tiling at workgroup level: {16, 32, 64, 128} (integer)
3. *KWG* - inner dimension of 2D tiling at workgroup level: {16, 32} (integer)
4-5. *MDIMC*, *NDIMC* - local workgroup size: {8, 16, 32} (integer)
6-7. *MDIMA*, *NDIMB* - local memory shape: {8, 16, 32} (integer)
8. *KWI* - kernel loop unrolling factor: {2, 8} (integer)
9-10. *VWM*, *VWN* - per-matrix vector widths for loading and storing: {1, 2, 4, 8} (integer)
11-12. *STRM*, *STRN* - enable stride for accessing off-chip memory within a single thread: {0, 1} (categorical)
13-14. *SA*, *SB* - per-matrix manual caching of the 2D workgroup tile: {0, 1} (categorical)
15-18. *Run1*, *Run2*, *Run3*, *Run4* - performance times in milliseconds for 4 independent runs using the same parameters, ranging between 13.25 and 3397.08.
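The total of 1327104 combinations can be checked from the value sets listed in the attribute description. The following is a minimal sketch in Python; the name `PARAM_VALUES` is introduced here for illustration and is not part of the data set or of CLTune. It only counts the unconstrained grid and does not reproduce the kernel constraints that reduce it to the 241600 feasible combinations.

```python
from math import prod

# Candidate values per tunable parameter, copied from the attribute description.
# The dictionary name PARAM_VALUES is an illustrative assumption.
PARAM_VALUES = {
    "MWG": (16, 32, 64, 128),
    "NWG": (16, 32, 64, 128),
    "KWG": (16, 32),
    "MDIMC": (8, 16, 32),
    "NDIMC": (8, 16, 32),
    "MDIMA": (8, 16, 32),
    "NDIMB": (8, 16, 32),
    "KWI": (2, 8),
    "VWM": (1, 2, 4, 8),
    "VWN": (1, 2, 4, 8),
    "STRM": (0, 1),
    "STRN": (0, 1),
    "SA": (0, 1),
    "SB": (0, 1),
}

# Size of the full (unconstrained) grid: product of the number of values per parameter.
total = prod(len(values) for values in PARAM_VALUES.values())

# Number of feasible combinations, as stated in the data description.
feasible = 241600

print(total)             # 1327104
print(feasible / total)  # fraction of the grid that is feasible, roughly 18%
```

The product is 4 x 4 x 2 x 3 x 3 x 3 x 3 x 2 x 4 x 4 x 2 x 2 x 2 x 2 = 1327104, which matches the figure in the description.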

15 features

Run1 (target)	numeric	58161 unique values 0 missing
VWN	numeric	4 unique values 0 missing
Run4 (ignore)	numeric	58154 unique values 0 missing
Run3 (ignore)	numeric	58264 unique values 0 missing
Run2 (ignore)	numeric	58269 unique values 0 missing
SB	numeric	2 unique values 0 missing
SA	numeric	2 unique values 0 missing
STRN	numeric	2 unique values 0 missing
STRM	numeric	2 unique values 0 missing
MWG	numeric	4 unique values 0 missing
VWM	numeric	4 unique values 0 missing
KWI	numeric	2 unique values 0 missing
NDIMB	numeric	3 unique values 0 missing
MDIMA	numeric	3 unique values 0 missing
NDIMC	numeric	3 unique values 0 missing
MDIMC	numeric	3 unique values 0 missing
KWG	numeric	2 unique values 0 missing
NWG	numeric	4 unique values 0 missing
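The feature list marks Run1 as the target and Run2 to Run4 as ignored, which leaves the 14 kernel parameters as inputs. A minimal sketch of that split for a single record follows, using only the Python standard library. The helper `split_row` and the values in `example_row` are hypothetical: the parameter settings are valid values from the attribute description, but the four run times are made-up placeholders, not measurements from the data set.

```python
from statistics import mean

TARGET = "Run1"                     # marked "(target)" in the feature list
IGNORED = ("Run2", "Run3", "Run4")  # marked "(ignore)" in the feature list


def split_row(row):
    """Split one record into (input features, target value, ignored run times)."""
    features = {k: v for k, v in row.items() if k != TARGET and k not in IGNORED}
    return features, row[TARGET], [row[k] for k in IGNORED]


# Hypothetical record: run times below are placeholders, not real measurements.
example_row = {
    "MWG": 16, "NWG": 16, "KWG": 16,
    "MDIMC": 8, "NDIMC": 8, "MDIMA": 8, "NDIMB": 8,
    "KWI": 2, "VWM": 1, "VWN": 1,
    "STRM": 0, "STRN": 0, "SA": 0, "SB": 0,
    "Run1": 100.0, "Run2": 101.0, "Run3": 99.0, "Run4": 100.0,
}

features, target, ignored = split_row(example_row)

# One possible alternative target: the mean of all four runs.
# This is a modelling choice made here, not something the data set prescribes.
mean_runtime = mean([target, *ignored])

print(len(features), target, mean_runtime)
```

Averaging the four runs is only one option; a model could equally be fit to Run1 alone, as the page's target marking suggests.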