1. Embedded Programs
(1) What “Embedded Programs” Mean
Embedded programs are “bare metal” programs, i.e. they run without an underlying operating system (OS). You have full control over the hardware of the embedded system, but also full responsibility: no convenience functions are available.
(2) Retrace Build Steps
- Cross Compiler
arm-none-eabi-gcc compiles main.c, together with the included header files and libraries, and generates *.o files. The -I options specify directories to search for header files to be included. Here these are the device header XMC4500.h and the header files for the GPIO driver.
*.o files are compiled but unlinked versions of source files and are not human-readable.
- Cross Linker
arm-none-eabi-gcc links all *.o files into the main.elf file. The -T option gives the linker description file *.ld.
*.elf files (Executable and Linkable Format) are compiled and linked programs, ready to execute on the architecture they are built for.
That means the ELF format is used in two ways: the linker reads it as an input that can be linked with other objects, and the loader interprets it as an executable program.
Important ELF sections
- objcopy
arm-none-eabi-objcopy creates *.hex files, which contain pure machine code together with information about instruction addresses; they are technically human-readable.
- objdump
arm-none-eabi-objdump creates *.lst files, which are a human-readable copy of parts of the *.elf file. What goes into them is selected by options of objdump, but usually it is:
– Section headers (where .data, .bss, etc. are located and how large they are)
– Disassembly of the .text section interleaved with the C instructions it was compiled from.
(3) Difference from Computer Programs
- Cross-compiler instead of compiler
- Device header and device linker file needed
- Often additional libraries and drivers necessary
- Programming onto the microcontroller (µC) as an additional final step
2. XMC4500 Board
(1) Functional Blocks
CPU, memory, clock and reset, timers/counters, communications, analogs, GPIOs
(2) Peripherals
- CCU4
Provides several counters, e.g. for PWM generation and for counting external events
- ADC
Measures analog signals
(3) Accessing Peripherals
Memory-mapped: I/O devices are accessed through memory addresses, just like normal memory. The processor accesses them by reading or writing to those addresses. This is generally more flexible and easier to use, because the processor can use the same instructions and addressing modes as for normal memory. BUT the memory bus has to connect to each and every peripheral, and a longer bus reduces the maximal clock frequency, especially when going off-chip.
Port-mapped: I/O devices are accessed through dedicated I/O ports. Each I/O device is assigned a unique port address, and the processor accesses the device by reading or writing to that port address via special instructions IN, OUT instead of LD, ST. This is generally faster and more efficient than memory-mapped I/O, as it requires fewer bus transactions and can be implemented using a simpler addressing scheme. BUT specialized instructions and addressing modes are needed for different devices, which leads to additional complexity.
3. Cortex M4
(1) Functional Blocks
The Cortex-M4 has a three-stage pipeline (Fetch, Decode, Execute) with the following functional blocks.
(2) Registers
1. Registers for Function Arguments
2. Caller/Callee-saved Registers
Registers are special memory locations in the processor that are used to store data temporarily while a program is running.
Caller-saved registers (R0 – R3, R12) are registers that are not guaranteed to be preserved by a called function. This means that the calling function (the “caller”) is responsible for saving the values of these registers before making the function call, and restoring them after the function returns, if it still needs them.
Callee-saved registers (R4 – R11), on the other hand, are registers that are expected to be preserved by the called function, and are guaranteed to retain their values after the function returns. This means that the called function (the “callee”) is responsible for saving the values of these registers before modifying them, and restoring them before returning control to the calling function. To do this, the callee must either leave them unchanged or push them on the stack in the beginning and pop them back before return.
3. Special Registers
Registers R13 – R15 are not classified as caller- or callee-saved.
R13 (Stack Pointer): The SP determines the border between allocated and unallocated memory on the stack. If a function requires stack space, it allocates it by decreasing the SP.
R14 (Link Register): The LR can be seen as a kind of hidden argument register that tells a callee the return address.
R15 (Program Counter)
4. PSR (Program status registers)
ApplicationPSR: N, Z, C, V (flags)
ExecutionPSR: IT (if-then instruction status bits), T (thumb state, always 1 for Cortex-M)
InterruptPSR: EN (exception number)
4. Assembler
(1) Instructions
- Data Processing
ADD r3, r4, r5;
Add the contents of r4 and r5 and store the result in r3.
ADC r0, r1, r2;
Add the contents of r1 and r2 and store the result in r0, taking into account the carry flag.
SUB r3, r4, r5;
Subtract the contents of r5 from r4 and store the result in r3.
NEG r12, r13;
Negate the contents of r13 and store the result in r12.
- Data Move
MOV r0, #0;
Move the value 0 into register r0. The # symbol indicates that the value is an immediate value, rather than a register or memory location.
LDR r1, [r2];
Load the value at the memory location pointed to by r2 into r1. The [] symbol indicates that the operand is a memory location, rather than a register or immediate value.
STR r6, [r7, #4];
Store the contents of r6 at the memory location pointed to by r7 + 4.
- Control Flow
B foo;
Jump to the label foo. This instruction adds a delta to the current PC.
BX r0;
Jump to the address stored in r0 and change the execution mode if required. This branch writes a new address value into the PC.
BL foo; (function call)
Call the function foo and store the return address in the link register. Before the branch is executed, the address of the next instruction is copied into the link register (LR).
pop {r4,r5,r7,pc}; (function return)
Restore r4, r5 and r7 from the stack and load the saved return address into the PC.
(2) RISC vs CISC
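RISC: Apart from load/store, no instructions access memory. Instructions have a fixed length and a program needs many of them, but the CPU is simple and the clock frequency is high. By reducing the number of addressing modes, a RISC computer achieves less complexity and higher clock frequencies.
Distinct load and store instructions, the lack of memory addressing modes for data processing instructions (e.g. ADD), and fixed-length instructions all indicate RISC.
CISC: load/store is integrated into various instructions. Instructions have variable length and a program needs fewer of them, but the CPU is more complex.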
(3) Thumb Mode
- Thumb or ARM Mode
The LSB of the address written to the PC is used for detection. An even address is treated as ARM code, an odd address as Thumb code.
- Operands
The result counts as an operand of the opcode, thus Cortex-M opcodes have three operands: result, operand1, operand2.
- Suffix “S”
The suffix “S” tells the CPU to update the conditional execution flags depending on the result of this operation, i.e. ADDS is ADD with the S suffix, and only ADDS updates the APSR flags.
- 32-bit Literal
As instructions are at most 32 bits long and a few bits are needed to encode the opcode, a 32-bit literal cannot be placed as an immediate in the instruction. Use MOV for the lower 16 bits and MOVT for the upper 16 bits.
Alternatively, we can use the so-called literal pool with LDR r0,=0x12345678. The literal is placed into the text section right after the current function and is loaded from there using the PC with an offset automatically calculated by the assembler.
- Function call
Function calls and jumps are done using B or BX. Note that for a function call the LR needs to be updated to the address of the next instruction after the call. BL or BLX do that automatically.
5. GNU Debugger
(1) Comparisons of Debug Methods
(2) GDB Overview
GDB can be directly attached to any PC program.
(3) Cheatsheet
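- Stop execution when function foo() is reached – break foo
- Execute a single line of source code / a single assembly instruction – step (s) / stepi (si)
- Stop execution when the variable Bytes is changed – watch Bytes
- Resume execution – continue (c)
- Delete breakpoint number 2 – delete 2
- Change the layout to show source code and assembly at the same time – layout split
- Move the focus to the command window, so that the arrow keys scan the command history instead of scrolling the source – focus cmd
- Print the value of the variable counter / of register r3 – print counter / print $r3
- Set the variable counter to 7 – set counter = 7
- Print the 32-bit value at address 0x08000000 in hexadecimal – x /1wx 0x08000000
- Show the value of Bytes every time execution stops – display Bytes
- Show the local variables of the current function – info locals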
6. Memory Organization Vulnerabilities
(1) Sections in a regular OS-based system
- BSS: uninitialized global data
uint32_t bla;
- data: initialized global data
uint32_t bla2 = 0xFEFE;
- heap: dynamically allocated data
long *foo = calloc(a, sizeof(long));
- stack: local variables
uint8_t bla3 = 5;
(2) Section Locations
Virtual Memory Address & Load Memory Address
The virtual memory address (VMA) is used by the processor to access a particular location in virtual memory, i.e. to access the data in virtual memory, regardless of whether it is currently stored in RAM or on disk. In a virtual memory system, each program file is assigned a virtual address space, which is a range of memory addresses that are used for the program to access the data. Virtual memory addresses are used in a similar way to physical memory addresses, with the main difference being that the operating system is responsible for mapping virtual addresses to physical addresses when the program or data is accessed.
The load memory address (LMA) is a location in the physical main memory where a particular piece of code or data is loaded. This address is typically specified in the program or data file that the operating system or other software uses to load the program or data into the appropriate location in memory.
1. SRAM
SRAM is volatile, which means that it requires a constant power supply to retain its stored data. The data stored in SRAM will be lost if the power supply is interrupted.
For the data section, the VMA is in SRAM, because the program needs to be able to modify the data.
- maximum size of stack
The main stack currently occupies 0x10000000 through 0x10000800, so 2 KiB, which is the maximum size during runtime. It can be made larger in the linker description file; the upper limit is then the size of PSRAM, 64 KiB.
- maximum size of heap
The actual size limit of the heap is defined in the linker description file. Of course, the limit defined there must be small enough that the heap and all the other sections, e.g. data and bss, together fit into DSRAM1. Note that this size limit cannot be used entirely for heap storage, because each chunk consumes an additional 4 B for its header.
2. FLASH
Flash memory is non-volatile and based on electrically-erasable programmable read-only memory (EEPROM) technology. Non-volatile memory retains its stored data even when the power supply is interrupted.
For the data section, the LMA is in FLASH, because the initialization values need to be in some non-volatile memory. Startup code in the boot routine copies the initialization values from FLASH to SRAM and clears BSS.
(3) Address space
The address space for stack, data and BSS can be read from the *.lst file. The location of the heap cannot be read from it but has to be determined experimentally using a debugger and some calloc calls.
The stack cannot crash into the heap, but may run out of memory, because on many platforms the heap and stack are allocated in different pages and will never meet.
(4) Stack Frame
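If arguments are passed on the stack, e.g. if the function has more than four arguments, they are part of the caller’s stack frame, not the callee’s stack frame. Therefore, in the figure above, the function arguments are briefly marked as part of the “previous frame”.
Apart from that, the compiler may place a copy of an argument into the area reserved for local variables. For example, if the first argument, i.e. the one passed in R0, is needed at the end of the function, but another function with other arguments has to be called before that, then registers R0-R3 must be freed, because the other function may clobber them. So the compiler has to save our first argument, (probably) on the stack.
If a longer string is written into the local variable area, the return address above the local variable area may be overwritten. This can change the program flow, because the value of the return address is put into the program counter when the current function returns.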
(5) Buffer Overflow Attack
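See Section 2.3.3, page 2-22 of the reference manual for the default access permissions. The code, SRAM and external RAM regions are all executable by default. The stack is in PSRAM, which lies in the code memory region of the Cortex-M4, ranging from 0x00000000 to 0x1FFFFFFF. So the stack is executable by default.
- Use info frame to determine the locations of the buffer and the return address, and how long the exploit must be to overwrite the return address:
As shown in the figure, the buffer is at &buf = 0x100007c0. The return address, which GDB calls lr, is located at 0x100007e4. There are 36 bytes between the two, so our exploit needs to be 40 bytes long to overwrite the return address.
- Design an exploit:
Since the given code is only 20 B long, we need to add 16 B of padding, followed by the new return address, which points to the exploit code. The exploit code is located at the beginning of buf, i.e. at 0x100007c0. In Thumb mode, the new return address is 0x100007c0 + 1 = 0x100007c1.
One possible exploit is (each byte is written as two hex digits):
FD 46 48 F2 01 12 C4 F6 02 02 80 21 D1 73 C9 09 D1 70 FE E7 (exploit instructions)
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF (padding)
C1 07 00 10 (new return address)
We can also put the padding first and change the return address to 0x100007d1. Then the exploit is: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff fd 46 48 f2 01 12 c4 f6 02 02 80 21 d1 73 c9 09 d1 70 fe e7 d1 07 00 10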
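- Little Endian:
Before sending this file to the board, it needs to be converted to its binary representation.
The bytes stored from 0x100007c0 to 0x100007c4 are FD 46 48 F2; read as a 32-bit value this is 0xF24846FD.
Taking the word a stored at 0x100007c0 to 0x100007c4 as an example: with little endian the MSByte is obtained with *((uint8_t *)&a+3), whereas with big endian it is obtained with *((uint8_t *)&a).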
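- Drawbacks of strcpy():
If we look at the exploits above, they all contain a 00 byte in the new return address. This causes strcpy() to terminate at this point without writing the last byte, which leaves the new return address pointing to 0x080007c1, and that is not where our exploit is located. So in this particular example, a buffer overflow attack on the stack cannot be carried out with strcpy().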
- Find Buffer Overflow Vulnerability
The string givenPW is allocated with length 21, but then in line 8, up to 0x21 = 33 characters are allowed to be written.
Pro tip: Use a macro or a const variable with a name to hold the size and always use this variable instead of the plain number, a.k.a. magic number. That not only avoids such vulnerabilities but also makes your code much more comprehensible.
7. Exceptions and Interrupts
(1) Use Cases
- Reaction to events outside of the CPU, e.g. ADC conversion finished
Reaction to outside events is also possible via polling.
- Multi-Tasking with termination of hung-up tasks
Multi-Tasking may work without interrupts if all tasks periodically call a context-switching function, but this is highly impractical. In real scenarios, and also if it comes to terminating hung-up tasks, multi-tasking cannot be realized without interrupts. Usually, the SysTickTimer is used to switch context and hand over the CPU to the next task in the queue.
- Power Saving
Power saving requires IRQs (interrupt requests) to wake up the CPU after sleep. So this cannot work without interrupts (reset is often considered just a special case of interrupt or exception).
(2) Interrupts vs. Polling
1. According to the problem, a context switch happens once per second. The CPU load for the context switch is . It is obvious that 30 µs + 5 µs < 50 µs.
No flags have to be polled in case of interrupts, but we have to save the current CPU context on the stack and restore it after the ISR (interrupt service routine) finishes. Such a context switch only happens if the IRQ is pending, i.e. after the outside event occurred.
2. The flag needs to be polled at least once every 18 µs, i.e. with at least 1 / 18 µs ≈ 55.6 kHz. The CPU load for the polling is 2 µs / 18 µs ≈ 11.1 %.
We must have finished the subroutine within 50 µs after the event. Considering that processing the subroutine takes 30 µs, the subroutine has to start not more than 20 µs after the event. As the poll itself takes 2 µs, the start of a poll must thus be not more than 18 µs after the last poll started.
Since the flag is not guaranteed to be set within 2 µs after the event happens, we have to assume that a poll is not guaranteed to be successful when the event happens.
3. The CPU load for the subroutine, which runs once per second, is 30 µs / 1 s = 0.003 %.
Therefore, the interrupt-based implementation has an overall CPU load of
And the polling-based implementation has an overall CPU load of
1. Due to the interrupt latency (5 µs) and context switching (4 µs), interrupts cannot be used.
2. Since we poll the GPIO line, the deciding factor is whether the system can poll fast enough (polling frequency). Polling can meet the requirements.
Although in this case, the polling loop requires 100% CPU load, it does so only for a very short period of time after the SW initiates a mode change. Thus the average CPU load is not much affected by the polling loop.
(3) ISR (interrupt service routine)
- Transparency
In general, an ISR should be transparent to other code, which means that it should not interfere with the normal operation of the system. I.e. except for the variables intended to be changed by the ISR, everything – including all registers and special registers such as APSR – must be restored to its original value.
- Context saving
The registers that are saved automatically upon an IRQ for the XMC4500 include PC, PSR, R0, R1, R2, R3, R12, LR.
- Number of arguments
None, because there is no caller that could set the arguments to some meaningful value. But this is not true for exceptions in general.
- Access of IRQ
For performance optimization, the compiler might keep a local copy of a variable in a register for repeated access. If an ISR updates the original variable in SRAM, the software will continue to use the old value. If this happens in a wait loop that postpones code execution until, e.g., a certain number of bytes have been received by the UART, the system will hang forever.
The keyword volatile can be given to a variable to avoid this issue. It tells the compiler that it must not optimize accesses to the variable, because the value of the variable may change unexpectedly.
8. Memory Protection Unit
MPU defines regions in memory and specifies attributes for them:
r-- r--: read-only in privileged and unprivileged mode
rw- rw-: read & write, never execute
rw- r--: read always, but write only in privileged mode
r-x r-x: read & execute, never write
(1) Access to Memory Sections
The text (code) section is a region of memory that is used to store the executable instructions of a program. For these instructions to be executed, the text section must be marked as executable.
(2) MPU Configuration
- XMC4500
Up to 8 regions, each of a size between 32 B and 4 GB, distinguished by priority, i.e. only one region per priority level.
Background region for privileged level with the lowest priority.
- Define MPUconfig_t
enum MPUeasyPermissions { MPUeasy_None_None = 0, MPUeasy_RW_None = 1, MPUeasy_RW_R = 2, MPUeasy_RW_RW = 3, MPUeasy_R_None = 5, MPUeasy_R_R = 6 };
#define MPUeasyXN (0x1<<4)
#define MPUeasyENABLEREGION (0x1<<7)
typedef struct {
    void * baseAddress;
    int permissions;
    uint8_t size;
    uint8_t priority;
} MPUconfig_t;
uint8_t size is given as a power of 2, so e.g. 10 = 1 KiB, 20 = 1 MiB.
“#define” directive is used to define a macro. A macro is a fragment of code that is replaced with a different fragment of code when the program is compiled. Macros can be used to simplify complex code, to improve readability, or to provide a convenient way to reuse code.
- Define regions
The proper MPU configuration for the sections mentioned in the previous question looks like that:
MPUconfig_t FLASH = {.baseAddress = (void *) 0x08000000, .size = 27, .priority = 0, .permissions = MPUeasyENABLEREGION | MPUeasy_R_R};
// 10000000 | 110 = 10000110
MPUconfig_t PSRAM = {.baseAddress = (void *) 0x10000000, .size = 16, .priority = 1, .permissions = MPUeasyENABLEREGION | MPUeasy_RW_RW | MPUeasyXN};
// 10000000 | 11 | 10000 = 10010011
MPUconfig_t DSRAM1 = {.baseAddress = (void *) 0x20000000, .size = 16, .priority = 2, .permissions = MPUeasyENABLEREGION | MPUeasy_RW_RW | MPUeasyXN};
// 10000000 | 11 | 10000 = 10010011
MPUconfig_t Pheriperals = {.baseAddress = (void *) 0x40000000, .size = 29, .priority = 3, .permissions = MPUeasyENABLEREGION | MPUeasy_RW_RW | MPUeasyXN};
// 10000000 | 11 | 10000 = 10010011
The size for the FLASH region is 27 and not 20 as one would expect for 1 MiB, because the cached access to the FLASH runs via addresses 0x0C000000 up to 0x0C0FFFFF and we want to capture both cached and uncached access to the FLASH.
The “|” operator is the bitwise OR operator. The “.” operator is used to access the members of a structure; in a designated initializer such as .baseAddress = ... it names the member that is initialized while the variable is created.
Although the regions for PSRAM, DSRAM1, and peripherals share the same access permissions, we have to define separate regions for them for two reasons:
First, a single region ranging from 0x10000000 to 0x5FFFFFFF would have size 2^30.3219 B, which is not an integer power of 2.
Second, a common region for PSRAM and DSRAM1 with size 2^29 B would have a feasible size, but region base addresses have to be aligned to the size of the region. A region of size 2^29 B would need to start at an address that has its 29 lowermost bits equal to zero (i.e. a multiple of 0x20000000), which is not the case for 0x10000000.
- Calling configMPU()
After defining the appropriate regions as MPUconfig_t, we have to program them into the MPU by calling configMPU() on each one. Then we can enable the MPU and drop our privileges. If the program continues to run, we have set up the MPU correctly.
The functions to check the current privilege level and drop privileges are provided by another set of small helper functions in privilege.c.
Note that the Private Peripheral Bus (PPB) is always accessible in privileged mode even if there is no region defined for it and with a disabled background region.
- Example
Now a credential store is added to the system in the uppermost 1 KiB of DSRAM1, which the unprivileged task should only be able to read. We change the configuration like this:
MPUconfig_t Secret = {.baseAddress = (void *) SECRETSTORE, .size = 10, .priority = 4, .permissions = MPUeasyENABLEREGION | MPUeasy_RW_R | MPUeasyXN};
// 10000000 | 10 | 10000 = 10010010
MPUconfig_t DSRAM1 = {.baseAddress = (void *) 0x20000000, .size = 16, .priority = 2, .permissions = MPUeasyENABLEREGION | MPUeasy_RW_RW | MPUeasyXN};
// 10000000 | 11 | 10000 = 10010011
You do not have to exclude the uppermost 1 KiB for the secret store from the DSRAM1 region, because the higher priority of the Secret region will override the permissions of this part of the DSRAM1 region. The priority for Secret can be any priority (0-7) that is yet unused (4-7) and larger than the priority of the DSRAM1 region (>2).
9. Manual Canary
(1) Secure below function using canaries
- Using struct
We use a struct to prohibit the compiler from reordering the local variables during alignment.
This is a hypothetical example. In practice, you would not add the canary yourself in a real program but use the -fstack-protector option of your compiler, and then the compiler decides whether it spends the extra effort to protect exactly the array boundaries or only the control flow information, i.e. the return address.
- Using Buffer
Another smart solution is to increase the size of the buffer and make the canary part of the buffer itself. Then no struct is required to prohibit reordering, but it gets somewhat complicated to access the canary.
(2) Properties of Canaries
If the value of the canary can be guessed or tried out by an attacker, she can overwrite the canary with its original value, such that it is not changed. This would render an attack unnoticeable and must thus be made infeasible. The value should therefore fulfill the following properties:
- Unpredictable and not readable for the attacker in any way
- Large enough to avoid trying out all possible values (brute-force)
- Ideally, change upon each program invocation (a change upon each function call would make the program terribly slow)
10. Other Software Attacks
(1) Heap Based Buffer Overflow
- Example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
    char *buf;
    // Allocate memory on the heap
    buf = malloc(10);
    if (buf == NULL) {
        perror("malloc failed");
        return 1;
    }
    // Read input from the user (vulnerable: 20 is larger than the buffer)
    printf("Enter a string: ");
    fgets(buf, 20, stdin);
    // Print the input back to the user
    printf("You entered: %s\n", buf);
    // Free the allocated memory
    free(buf);
    return 0;
}
The “fgets()” function is called with a buffer size of 20, which is larger than the size of the allocated buffer (10). This means that the program may write more data to the buffer than it can hold, which causes a buffer overflow. In this example, the actual size of the buffer should be passed as the second argument to “fgets()” rather than a hardcoded, mismatching constant.
This attack can overwrite the backward and forward pointers that the allocator keeps in the neighboring heap chunk.
- Use-after-free Bug
Consequence: The memory locations might already be allocated for a different purpose. Reading from them may cause the function to perform unexpected and possibly exploitable actions. Writing to it clobbers data of the other function that the memory locations are now allocated to and may cause this code to malfunction.
- Double-free Bug
Countermeasure: The easiest way to avoid a double free is to always set the pointer to NULL when it is freed, just as malloc and calloc return a NULL pointer when the allocation fails. According to the C standard, freeing a NULL pointer does no harm.
(2) Format String Attacks
- Read password
In line 7, an attacker-controlled string is used as the format string of printf(). An attacker may thus add conversions to this string to read correctPW from memory.
We know that r0 to r3 contain the first four arguments of a function, and the return value is placed in r0. Tracing back the code, the last use of r1 is to pass correctPW to strcmp(), so correctPW is still in r1 when printf() is called.
The “printf()” function in C takes a format string as its first argument (r0) and a variable number of additional arguments (r1, r2, r3, stack ...) that are used to fill in placeholders in the format string. It is thus sufficient for an attacker to provide %s as the givenPW, because printf() will interpret the conversion and print the string pointed to by r1 (second argument), which is the correctPW.
- Read location of stack
In line 8, a user-controlled string is used as the format string. The “sprintf()” function takes a string buffer as its first argument, a format string as its second argument, and a variable number of additional arguments that are used to fill in placeholders in the format string.
A sufficient number of %x or %p conversions will print the values of r2, r3 and the contents of the stack, including the previous stack frame pointer (under the return address), into the debugString, which will eventually be displayed on the screen.
(3) Integer Underflows
for(uint8_t i = 42; i >= 0; --i);
The programmer intended to loop 43 times by decrementing the variable uint8_t i from 42 to 0. The loop should stop as soon as i becomes negative. The actual behavior is an endless loop, because i is declared as uint8_t, i.e. as an unsigned 8-bit variable with range 0 to 255, so the condition i >= 0 is always true. If i = 0 is decremented by one, the result is 255. This issue is called integer underflow.
(4) SQL Injection
Code injection requires that data is treated as code, e.g. because the code is assembled from strings that contain variables.
userpass = sqlInt.execute("SELECT password FROM users WHERE username = '" + userName + "';");
The variable userName is taken directly from the input of the login form. The SQL statement retrieves the password corresponding to the entered username, and the result is stored in userpass for the password check.
The conditions after the WHERE keyword filter what is retrieved from the database. So we need to disable the filter or make it always true. Entering ' OR '1'='1 as the username turns the statement into:
SELECT password FROM users WHERE username = '' OR '1'='1';
The '1'='1' condition is always true.
(5) Cross Site Scripting(XSS)
XSS requires that the server stores user data and displays it on its webpages to others, e.g. comment fields in an online shop, forum entries, etc., and that the server does not check the data for statements interpreted by a browser.
A popular, relatively harmless example is an alert box created with JavaScript.
JavaScript can also do other things, like stealing a session cookie and sending it to the attacker, who can then impersonate the victim.
When is there a risk of code injection?
Whenever code and data is only weakly separated, e.g. in von Neumann architectures or scripting languages.
11. Security and Cryptography
(1) Security Objectives (CIA)
Objectives | Measure |
Confidentiality | Access Control / Encryption |
Integrity | Write Protection / Crypto Signature |
Availability | Redundancy |
Accountability | Logging |
Authenticity | Password / Crypto Signature |
Privacy | Data Minimization / Pseudonyms |
(2) Crypto Algorithm Overview
- Symmetric Cryptography
Block Ciphers | Description | Block Size / bit | Key Size / bit |
DES | proven weak | 64 | 56 |
IDEA | international data encryption algorithm | 64 | 128 |
AES | Advanced Encryption Standard | 128 | 128 / 192 / 256 |
SPECK | linear cipher, light weight | 32 – 128 | 64 – 256 |
Stream Cipher | Key Size / bit |
RC4(weak) | 8 – 2048 |
Salsa20 | 256 |
- Asymmetric Cryptography
Cryptography | Key Size |
RSA | 1024(weak), 4096 |
DSA(digital signature) | 1024 |
ECDSA | 160 |
- Cryptographic Hash Functions
Hash Functions | Output Size / bit |
MD5(weak) | 128 |
SHA-1(weak) | 160 |
SHA-2 | 224 – 512 |
SHA-3 | 224 – 512 |
(3) Confusion & Diffusion
- Confusion of mapping
The relation between plaintext and ciphertext shall be highly complex, known pairs of plaintext and ciphertext shall not allow recovering the key.
- Diffusion of entropy
Every bit of input to the cryptographic algorithm shall affect all bits of the output.
(4) Attacks
- Block Cipher
Instead of a brute-force attack, which is infeasible, one can build up a look-up table and perform a search-and-replace attack.
- Ideal Stream Cipher
A brute-force attack is not possible, because all keys are equally probable and a given ciphertext could equally probably represent any plaintext.
- One-time Pad
The problem with OTP is that it requires a secure method for generating and distributing the keys, as well as securely storing them. The key must be kept secret and be as long as the plaintext. This is a difficult task to accomplish in practice, especially for large amounts of data or for long-term storage.
Another issue with OTP is that it is not very efficient: the amount of key material equals the amount of plaintext, which leads to large key sizes and can slow down encryption and decryption.
- Asymmetric Cryptography
An IoT node communicates with multiple webservers via TLS, a hybrid cryptography protocol. If the IoT node wants to communicate, it first exchanges public keys with the respective server, then establishes a signed and encrypted channel with it.
If an attacker redirects the entire traffic to his servers, the IoT node will not notice and will still accept the attacker’s public key. To avoid such attacks, the IoT node could hold a list of public keys of the trusted servers.
More generally, real-world asymmetric cryptography never uses raw public keys, but certificates, which are basically just signed public keys for a certain URL, email address, etc. Whether to trust the public key can then be decided depending on who signed it.
The web browser has a built-in list of so-called certificate authorities (CAs), whose business model is to sign public keys for money. If you now browse a website that uses TLS, your browser will check whether the certificate returned by the server contains a signature of one of these trusted CAs (often via a couple of intermediate certificates) and only then accept the public key. In this problem’s scenario, the public key returned by the attacker’s server will not be accompanied by a signature of one of these CAs, and thus the browser will display a warning that the public key could not be authenticated and that you are likely under attack.
12. Side Channel Attacks
(1) Types
A side channel is a physical quantity that is related to the operation of a cryptosystem but not intended to carry information.
- Time
- Power
- Electromagnetic emanations
- Acoustic emanations
- Temperature
- Light
(2) Timing of Password Check
strcmp terminates upon the first character that differs. So by observing how long it takes to check the password, one can determine how many characters – at the beginning of the string – are correct.
The cryptographic library sodium provides a more secure function for comparing data: sodium_memcmp
(3) Power Trace of Square-Multiply Algorithm
13. Embedded Communication
(1) Embedded Communication Standards
- CAN
- RS-232
- I2C(Inter-Integrated Circuit)
- SPI(Serial Peripheral Interface)
It is a full-duplex communication protocol, which means that data can be transmitted and received simultaneously on separate lines. It is used for communication between integrated circuits.
- Ethernet
A widely used local area network (LAN) protocol that supports data rates from 10 Mbps to 100 Gbps. It is a standard for connecting computers and other devices in a LAN and is also used in wide area network (WAN) connections. Ethernet is based on the use of a shared medium, typically a wired cable, to transmit data between devices.
- UART(Universal Asynchronous Receiver Transmitter)
It allows devices to transmit and receive data serially (i.e., one bit at a time) over a single communication line or channel. UARTs are commonly used in embedded systems, such as microcontrollers, to communicate with other devices, such as sensors, memory, and other peripherals.
(2) USB (Universal Serial Bus)
It is a standard for connecting devices to a computer or other host. It is a serial bus that provides a standard interface for connecting a wide variety of peripherals, such as keyboard, mouse, cameras, printers, and external hard drives.
- USB connection sequence
- Attach: The host detects the connection and sends a reset signal to the device.
- Read Device Descriptor: The host queries the device for its identity and configuration information. The device responds with its device descriptor, which contains information such as its vendor ID, product ID, and supported USB version.
- Assign Address: The host assigns a unique address to the device and the device uses that address for all subsequent communications with the host.
- Configuration: The host selects a configuration for the device and sends a configuration request. The device responds by setting its configuration and sending a configuration descriptor, which contains information such as the device’s power requirements and the number of interfaces.
- Read Interface Descriptor: The host selects an interface and sets up the endpoints. The device responds by setting up the interface and endpoints and sending an endpoint descriptor, which contains information such as the endpoint’s maximum packet size and transfer type.
- Load Driver: A USB driver is a software component that allows a computer’s operating system to communicate with a USB device. Drivers act as a translator between the operating system and the device, allowing the operating system to recognize and control the device.