From 8994af1364a4fff4427f3f217b87ae6c79a80f44 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Hugo=20M=C3=A5rdbrink?=
Date: Sat, 30 Mar 2024 19:39:54 +0100
Subject: [PATCH] Add first iteration of main process documentation

---
 Design_and_analysis.md | 129 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 129 insertions(+)
 create mode 100644 Design_and_analysis.md

diff --git a/Design_and_analysis.md b/Design_and_analysis.md
new file mode 100644
index 0000000..c1c5637
--- /dev/null
+++ b/Design_and_analysis.md
@@ -0,0 +1,129 @@
+# 2D DCT-II algorithm hardware-software codesign for image compression in RISC-V satellite processor
+
+Hugo Mårdbrink, 2024
+
+## Introduction
+
+This project aims to design and implement a 2D DCT-II algorithm in hardware and software for image compression in a RISC-V satellite processor.
+RISC-V processors are currently on the rise, [notably in the space industry](https://gaisler.com/index.php/products/processors/noel-v#DOC).
+The 2D DCT-II algorithm transforms blocks of pixels into blocks of frequency coefficients. This concentrates most of the image information in a few coefficients, which exposes the spatial redundancy so that it can be removed during compression.
+Thus, the algorithm is used in spacecraft to decrease the amount of image data that needs to be transmitted back to Earth.
+
+## Goals
+
+Since resources on a spacecraft are limited, the design needs to focus on energy efficiency and a small hardware area.
+This shifts the focus of the codesign towards energy efficiency rather than throughput or execution time.
+However, fast execution is still highly relevant, so a good balance between the two needs to be found.
+
+## Method
+
+### Development and evaluation
+
+The software will be written in C and compiled with the RISC-V GCC toolchain. For vectorisation, RVV vector intrinsics from a C header file will be used.
+For parallelisation, [OpenMP](https://www.openmp.org/) will be used.
+To test and evaluate the software implementation, it will run in the gem5 simulator. The hardware configuration is also specified in gem5 configuration files.
+The mock image data will be generated in C with arbitrary values. The values themselves do not matter, since they do not affect the run time.
+When measuring performance, the sequential time spent generating mock data and freeing memory will be deducted so that the result reflects the DCT computation only.
+
+### Building
+
+To build the software, the following base command is used:
+
+```bash
+riscv64-unknown-elf-gcc -march=rv64imafcv -mabi=lp64d main.c -o dct2d_riscv.out
+```
+
+The following flags are used, with the first three added only when the corresponding functionality is needed:
+
+- `-lm` to link the math library
+- `-fopenmp` to enable OpenMP
+- `-O[level]` to select the optimisation level
+- `-march=rv64imafcv` to select the target RISC-V ISA
+- `-mabi=lp64d` to select the RISC-V ABI
+
+### Simulating
+gem5 is used to simulate the software on different hardware configurations.
+gem5 allows different hardware configurations to be described in a Python script.
+The Python script is tailored to this project and, for ease of use, exposes five custom parameters:
+
+- `--l1i` for the L1 instruction cache size
+- `--l1d` for the L1 data cache size
+- `--l2` for the L2 cache size
+- `--vlen` for the vector register length (VLEN), in bits
+- `--elen` for the maximum vector element width (ELEN), in bits
+
+To run the simulation and output the result, the following command is used:
+
+```bash
+../gem5/build/RISCV/gem5.opt -d stats/ ./riscv_hw.py --l1i 16kB --l1d 64kB --l2 256kB --vlen 256 --elen 64
+```
+
+## Implementation
+
+### Constants and definitions
+Throughout the code, several constants and type definitions make it easy to try different configurations:
+- `DCT_SIZE` is the side length of the square DCT block.
+- `TOTAL_DCT_BLOCKS` is the total number of DCT blocks and, thus, the problem size.
+- `NUM_THREADS` is the number of threads to use for parallelisation.
+- `element_t` is the data type of the elements in the DCT block.
+- `real_t` is the data type of the real-valued variables of the algorithm.
+
+### Mock data generation
+
+To start testing the algorithm, we need a way to generate data that gives reliable performance results.
+This is done by allocating the DCT blocks on the heap and filling them with data.
+It is important to actually generate all the data, rather than reuse the same matrices, in order to get realistic cache hits and misses.
+The memory allocation is done in the following way:
+
+```c
+element_t ***mock_matrices = (element_t ***) malloc(TOTAL_DCT_BLOCKS * sizeof(element_t**));
+for (long i = 0; i < TOTAL_DCT_BLOCKS; i++) {
+    mock_matrices[i] = (element_t **) malloc(DCT_SIZE * sizeof(element_t*)); /* row pointers of block i */
+    for (int j = 0; j < DCT_SIZE; j++) {
+        mock_matrices[i][j] = (element_t *) malloc(DCT_SIZE * sizeof(element_t)); /* one row of block i */
+    }
+}
+```
+
+And the data is generated using the following code:
+
+```c
+for (long i = 0; i < TOTAL_DCT_BLOCKS; i++) {
+    for (int j = 0; j < DCT_SIZE; j++) {
+        for (int k = 0; k < DCT_SIZE; k++) {
+            mock_matrices[i][j][k] = j + k; /* arbitrary value, only the memory traffic matters */
+        }
+    }
+}
+```
+
+### Initial hardware configuration
+
+### Naive
+
+The first iteration of the code (see below) uses a naive implementation of the 2D DCT-II without any optimisations. It relies on `sqrt` and `cos` from the math library and on a `PI` constant that is not shown here.
+
+```c
+void dct_2d(element_t** matrix_in, element_t** matrix_out) {
+    real_t cu, cv, sum;
+    int u, v, i, j;
+
+    for (u = 0; u < DCT_SIZE; u++) {
+        for (v = 0; v < DCT_SIZE; v++) {
+            cu = u == 0 ? 1 / sqrt(DCT_SIZE) : sqrt(2) / sqrt(DCT_SIZE); /* normalisation factor for frequency index u */
+            cv = v == 0 ? 1 / sqrt(DCT_SIZE) : sqrt(2) / sqrt(DCT_SIZE); /* normalisation factor for frequency index v */
+
+            sum = 0;
+            for (i = 0; i < DCT_SIZE; i++) {
+                for (j = 0; j < DCT_SIZE; j++) {
+                    sum += matrix_in[i][j] * cos((2 * i + 1) * u * PI / (2 * DCT_SIZE)) * cos((2 * j + 1) * v * PI / (2 * DCT_SIZE));
+                }
+            }
+            matrix_out[u][v] = cu * cv * sum; /* scaled coefficient for frequency pair (u, v) */
+        }
+    }
+}
+```
+
+
+
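+The loops in `dct_2d` above appear to evaluate the orthonormal 2D DCT-II directly from its definition. As a reference for checking the code, and assuming $N$ stands for `DCT_SIZE`, $x$ for `matrix_in` and $X$ for `matrix_out` (symbols introduced here for illustration only), the transform can be written as:
+
+$$
+X(u,v) = c(u)\,c(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x(i,j)\, \cos\left[\frac{(2i+1)u\pi}{2N}\right] \cos\left[\frac{(2j+1)v\pi}{2N}\right],
+\qquad
+c(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k > 0 \end{cases}
+$$
+
+Each of the $N^2$ output coefficients sums over all $N^2$ inputs, so one block costs on the order of $N^4$ multiply-accumulate operations and $2N^4$ cosine evaluations in this form.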