How To Store Address In A Register X86
TL;DR
This blog post explains how Linux programs call functions in the Linux kernel.
It will outline several different methods of making system calls, how to handcraft your own assembly to make system calls (examples included), kernel entry points into system calls, kernel exit points from system calls, glibc wrappers, bugs, and much, much more.
What is a system call?
When you run a program which calls open, fork, read, write (and many others) you are making a system call.
System calls are how a program enters the kernel to perform some task. Programs use system calls to perform a variety of operations such as: creating processes, doing network and file IO, and much more.
You can find a list of system calls by checking the man page for syscalls(2).
There are several different ways for user programs to make system calls and the low-level instructions for making a system call vary among CPU architectures.
As an application developer, you don't typically need to think about how exactly a system call is made. You simply include the appropriate header file and make the call as if it were a normal function.
glibc provides wrapper code which abstracts you away from the underlying code which arranges the arguments you've passed and enters the kernel.
Before we can dive into the details of how system calls are made, we'll need to define some terms and examine some core ideas that will appear later.
Prerequisite information
Hardware and software
This blog post makes the following assumptions:
- You are using a 32-bit or 64-bit Intel or AMD CPU. The discussion about the methods may be useful for people using other systems, but the code samples below contain CPU-specific code.
- You are interested in the Linux kernel, version 3.13.0. Other kernel versions will be similar, but the exact line numbers, organization of code, and file paths will vary. Links to the 3.13.0 kernel source tree on GitHub are provided.
- You are interested in glibc or glibc derived libc implementations (e.g., eglibc).
x86-64 in this blog post will refer to 64bit Intel and AMD CPUs that are based on the x86 architecture.
User programs, the kernel, and CPU privilege levels
User programs (like your editor, terminal, ssh daemon, etc) need to interact with the Linux kernel so that the kernel can perform a set of operations on behalf of your user programs that they can't perform themselves.
For example, if a user program needs to do some sort of IO (open, read, write, etc) or modify its address space (mmap, sbrk, etc) it must trigger the kernel to run to complete those actions on its behalf.
What prevents user programs from performing these actions themselves?
It turns out that the x86-64 CPUs have a concept called privilege levels. Privilege levels are a complex topic suitable for their own blog post. For the purposes of this post, we can (greatly) simplify the concept of privilege levels by saying:
- Privilege levels are a means of access control. The current privilege level determines which CPU instructions and IO may be performed.
- The kernel runs at the most privileged level, called "Ring 0". User programs run at a lesser level, typically "Ring 3".
In order for a user program to perform some privileged operation, it must cause a privilege level change (from "Ring 3" to "Ring 0") so that the kernel can execute.
There are several ways to cause a privilege level change and trigger the kernel to perform some action.
Let's start with a common way to cause the kernel to execute: interrupts.
Interrupts
You can think of an interrupt as an event that is generated (or "raised") by hardware or software.
A hardware interrupt is raised by a hardware device to notify the kernel that a particular event has occurred. A common example of this type of interrupt is an interrupt generated when a NIC receives a packet.
A software interrupt is raised by executing a piece of code. On x86-64 systems, a software interrupt can be raised by executing the int instruction.
Interrupts usually have numbers assigned to them. Some of these interrupt numbers have a special meaning.
You can imagine an array that lives in memory at a location the CPU has been told about. Each entry in this array maps to an interrupt number. Each entry contains the address of a function that the CPU will begin executing when that interrupt is received along with some options, like what privilege level the interrupt handler function should be executed in.
Here's a diagram from the Intel CPU manual showing the layout of an entry in this array:
If you look closely at the diagram, you can see a 2-bit field labeled DPL (Descriptor Privilege Level). For software interrupts, the value in this field determines the least privileged level from which the int instruction is allowed to raise the interrupt; the level the handler itself runs at comes from the code segment named in the same entry.
This is how the CPU knows which address it should execute when a particular type of event is received and what privilege level the handler for that event should execute in.
In practice, there are lots of different ways to deal with interrupts on x86-64 systems. If you are interested in learning more, read about the 8259 Programmable Interrupt Controller, Advanced Interrupt Controllers, and IO Advanced Interrupt Controllers.
There are other complexities involved with dealing with both hardware and software interrupts, such as interrupt number collisions and remapping.
We don't need to concern ourselves with these details for this discussion about system calls.
Model Specific Registers (MSRs)
Model Specific Registers (also known as MSRs) are control registers that have a specific purpose to control certain features of the CPU. The CPU documentation lists the addresses of each of the MSRs.
You can use the CPU instructions rdmsr and wrmsr to read and write MSRs, respectively.
There are also command line tools which allow you to read and write MSRs, but doing this is not recommended as changing these values (especially while a system is running) is dangerous unless you are really careful.
If you don't mind potentially destabilizing your system or irreversibly corrupting your data, you can read and write MSRs by installing msr-tools and loading the msr kernel module:
Some of the system call methods we'll see later make use of MSRs.
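If you would rather poke at an MSR from C than from the command line, here is a minimal read-only sketch. It assumes the msr kernel module is loaded and exposes each CPU's registers as /dev/cpu/<n>/msr, with the MSR address used as the file offset, and that off_t is 64 bits wide. The helper name read_msr is made up for this post; it is not part of msr-tools.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* read_msr() is a hypothetical helper, not part of msr-tools.
 * It reads the 8-byte MSR `reg` on CPU `cpu` through /dev/cpu/<cpu>/msr.
 * Returns 0 on success and -1 if the device file cannot be opened or read
 * (for example: msr module not loaded, or not running as root). */
int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
    char path[64];
    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The MSR address is used as the file offset. */
    ssize_t n = pread(fd, value, sizeof(*value), (off_t)reg);
    close(fd);

    return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

Calling read_msr(0, 0x176, &v) as root would try to read the MSR at address 0x176 on CPU 0. The sketch only reads; writing MSRs is where the real danger lies.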
Calling system calls with assembly is a bad idea
It's not a great idea to call system calls by writing your own assembly code.
One big reason for this is that some system calls have additional code that runs in glibc before or after the system call runs.
In the examples below, we'll be using the exit system call. It turns out that you can register functions to run when exit is called by a program by using atexit.
Those functions are called from glibc, not the kernel. So, if you write your own assembly to call exit as we show below, your registered handler functions won't be executed since you are bypassing glibc.
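To make the atexit point concrete, here is a small sketch. It forks a child that registers a handler and then terminates either through glibc's exit or through the raw exit system call (issued with glibc's generic syscall function, so that no assembly is needed yet). The handler reports back over a pipe. The names report_exit_handler and atexit_handler_ran are invented for this illustration.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;

/* atexit handler: tell the parent that glibc's exit path ran. */
static void report_exit_handler(void)
{
    ssize_t n = write(report_fd, "x", 1);
    (void)n;
}

/* Fork a child that registers an atexit handler and then terminates either
 * through glibc's exit() (bypass_glibc == 0) or through the raw exit system
 * call (bypass_glibc != 0).
 * Returns 1 if the handler ran, 0 if it did not, -1 on setup failure. */
int atexit_handler_ran(int bypass_glibc)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    fflush(NULL);               /* avoid duplicating buffered output */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        atexit(report_exit_handler);
        if (bypass_glibc)
            syscall(SYS_exit, 0);   /* straight into the kernel */
        exit(0);                    /* glibc runs the handlers first */
    }

    close(fds[1]);
    char byte;
    ssize_t n = read(fds[0], &byte, 1);  /* 0 bytes means: handler never ran */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : 0;
}
```

With this sketch, atexit_handler_ran(0) should report that the handler ran, while atexit_handler_ran(1) should report that it did not.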
Nevertheless, manually making system calls with assembly is a good learning experience.
Legacy system calls
Using our prerequisite knowledge we know two things:
- We know that we can trigger the kernel to execute by generating a software interrupt.
- We can generate a software interrupt with the int assembly instruction.
Combining these two concepts leads us to the legacy system call interface on Linux.
The Linux kernel sets aside a specific software interrupt number that can be used by user space programs to enter the kernel and execute a system call.
The Linux kernel registers an interrupt handler named ia32_syscall for the interrupt number: 128 (0x80). Let's take a look at the code that actually does this.
From the trap_init function in the kernel 3.13.0 source in arch/x86/kernel/traps.c:
Where IA32_SYSCALL_VECTOR is defined as 0x80 in arch/x86/include/asm/irq_vectors.h.
But, if the kernel reserves a single software interrupt that userland programs can raise to trigger the kernel, how does the kernel know which of the many system calls it should execute?
The userland program is expected to put the system call number in the eax register. The arguments for the syscall itself are to be placed in the remaining general purpose registers.
One place this is documented is in a comment in arch/x86/ia32/ia32entry.S:
Now that we know how to make a system call and where the arguments should live, let's try to make one by writing some inline assembly.
Using legacy system calls with your own assembly
To make a legacy system call, you can write a small bit of inline assembly. While this is interesting from a learning perspective, I encourage readers to never make system calls by crafting their own assembly.
In this example, we'll try calling the exit system call, which takes a single argument: the exit status.
First, we need to find the system call number for exit. The Linux kernel includes a file which lists each system call in a table. This file is processed by various scripts at build time to generate header files which can be used by user programs.
Let's look at the table found in arch/x86/syscalls/syscall_32.tbl:
The exit syscall is number 1. According to the interface described above, we just need to move the syscall number into the eax register and the first argument (the exit status) into ebx.
Here's a piece of C code with some inline assembly that does this. Let's set the exit status to "42":
(This example can be simplified, but I thought it would be interesting to make it a bit more wordy than necessary so that anyone who hasn't seen GCC inline assembly before can use this as an example or reference.)
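As a minimal sketch of the interface just described, the function below loads 1 into eax and the exit status into ebx, then raises interrupt 0x80. One addition that is not part of the interface: the call is wrapped in a fork so that a calling program can observe the exit status instead of terminating itself. The names legacy_exit and legacy_exit_status are made up here. On a 64-bit kernel built without 32-bit emulation the interrupt is expected to kill the child, in which case the wrapper returns -1.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* exit(status) through the legacy interface: system call number 1 in eax,
 * first argument in ebx, then int $0x80. Does nothing on non-x86 CPUs. */
static void legacy_exit(int status)
{
#if defined(__i386__) || defined(__x86_64__)
    int nr = 1;                       /* exit in the 32-bit syscall table */
    __asm__ volatile ("int $0x80"
                      : "+a" (nr)     /* eax: number in, return value out */
                      : "b" (status)  /* ebx: first argument */
                      : "memory");
#else
    (void)status;
#endif
}

/* Run legacy_exit() in a child process and return the exit status the
 * parent observes, or -1 if the child did not exit normally. */
int legacy_exit_status(int status)
{
    fflush(NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        legacy_exit(status);
        raise(SIGKILL);               /* only reached if the call failed */
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Here the compiler's register constraints ("a" and "b") do the job of the explicit mov instructions, which keeps the assembly down to the single int instruction.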
Next, compile, execute, and check the exit status:
Success! We called the exit system call using the legacy system call method by raising a software interrupt.
Kernel-side: int $0x80 entry point
So now that we've seen how to trigger a system call from a userland program, let's see how the kernel uses the system call number to execute the system call code.
Recall from the previous section that the kernel registered a syscall handler function called ia32_syscall.
This function is implemented in assembly in arch/x86/ia32/ia32entry.S and we can see several things happening in this function, the most important of which is the call to the actual syscall itself:
IA32_ARG_FIXUP is a macro which rearranges the legacy arguments so that they may be properly understood by the current system call layer.
The ia32_sys_call_table identifier refers to a table which is defined in arch/x86/ia32/syscall_ia32.c. Note the #include line toward the end of the code:
Recall earlier we saw the syscall table defined in arch/x86/syscalls/syscall_32.tbl.
There are a few scripts which run at compile time which take this table and generate the syscalls_32.h file from it. The generated header file is composed of valid C code, which is simply inserted with the #include shown above to fill in ia32_sys_call_table with function addresses indexed by system call number.
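The "generated list of entries fills in a table of function pointers" idea can be sketched in a few lines of plain C. This is a toy model, not kernel code: every name (toy_call_table, TOY_SYSCALL, toy_dispatch and so on) is invented, the entry list is written inline instead of being generated and pulled in with #include, and the handlers just return a value.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*toy_syscall_fn)(long arg);

/* Toy handlers standing in for real system call implementations. */
static long toy_exit(long status) { return status; }
static long toy_getpid(long unused) { (void)unused; return 1234; }

/* Stand-in for the generated header: one TOY_SYSCALL(nr, fn) per entry. */
#define TOY_SYSCALL_ENTRIES \
    TOY_SYSCALL(1, toy_exit) \
    TOY_SYSCALL(20, toy_getpid)

#define TOY_NR_MAX 31

/* Expand each entry into a designated initializer, "[nr] = fn,". */
#define TOY_SYSCALL(nr, fn) [nr] = fn,
static const toy_syscall_fn toy_call_table[TOY_NR_MAX + 1] = {
    TOY_SYSCALL_ENTRIES
};
#undef TOY_SYSCALL

/* Index the table with the system call number, the way the entry code
 * uses the value it finds in eax. Unknown numbers yield -ENOSYS. */
long toy_dispatch(unsigned long nr, long arg)
{
    if (nr > TOY_NR_MAX || toy_call_table[nr] == NULL)
        return -ENOSYS;
    return toy_call_table[nr](arg);
}
```

For example, toy_dispatch(1, 42) runs toy_exit and returns 42, while a number with no entry in the list falls through to -ENOSYS.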
And this is how you enter the kernel via a legacy system call.
Returning from a legacy system call with iret
We've seen how to enter the kernel with a software interrupt, but how does the kernel return back to the user program and drop the privilege level after it has finished running?
If we turn to the (warning: large PDF) Intel Software Developer's Manual we can find a helpful diagram that illustrates how the program stack will be arranged when a privilege level change occurs.
Let's take a look:
When execution is transferred to the kernel function ia32_syscall via the execution of a software interrupt from a user program, a privilege level change occurs. The result is that the stack when ia32_syscall is entered will look like the diagram above.
This means that the return address, the CPU flags which encode the privilege level (and other stuff), and more are all saved on the program stack before ia32_syscall executes.
So, in order to resume execution the kernel just needs to copy these values from the program stack back into the registers where they belong and execution will resume back in userland.
OK, so how do you do that?
There are a few ways to do that, but one of the easiest ways is to use the iret instruction.
The Intel instruction set manual explains that the iret instruction pops the return address and saved register values from the stack in the order they were prepared:
As with a real-address mode interrupt return, the IRET instruction pops the return instruction pointer, return code segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted program or procedure.
Finding this code in the Linux kernel is a bit difficult as it is hidden beneath several macros and there is extensive care taken to deal with things like signals and ptrace system call exit tracking.
Eventually all the macros in the assembly stubs in the kernel reveal the iret which returns from a system call back to a user program.
From irq_return in arch/x86/kernel/entry_64.S:
Where INTERRUPT_RETURN is defined in arch/x86/include/asm/irqflags.h as iretq.
And now you know how legacy system calls work.
Fast system calls
The legacy method seems pretty reasonable, but there are newer ways to trigger a system call which don't involve a software interrupt and are much faster than using a software interrupt.
Each of the two faster methods is comprised of two instructions: one to enter the kernel and one to leave. Both methods are described in the Intel CPU documentation as "Fast System Call".
Unfortunately, Intel and AMD implementations have some disagreement on which method is valid when a CPU is in 32bit or 64bit mode.
In order to maximize compatibility across both Intel and AMD CPUs:
- On 32bit systems use: sysenter and sysexit.
- On 64bit systems use: syscall and sysret.
32-bit fast system calls
sysenter/sysexit
Using sysenter to make a system call is more complicated than using the legacy interrupt method and involves more coordination between the user program (via glibc) and the kernel.
Let's take it one step at a time and sort out the details. First, let's see what the documentation in the Intel Instruction Set Reference (warning: very large PDF) says about sysenter and how to use it.
Let's have a look:
Prior to executing the SYSENTER instruction, software must specify the privilege level 0 code segment and code entry point, and the privilege level 0 stack segment and stack pointer by writing values to the following MSRs:
• IA32_SYSENTER_CS (MSR address 174H) — The lower 16 bits of this MSR are the segment selector for the privilege level 0 code segment. This value is also used to determine the segment selector of the privilege level 0 stack segment (see the Operation section). This value cannot indicate a null selector.
• IA32_SYSENTER_EIP (MSR address 176H) — The value of this MSR is loaded into RIP (thus, this value references the first instruction of the selected operating procedure or routine). In protected mode, only bits 31:0 are loaded.
• IA32_SYSENTER_ESP (MSR address 175H) — The value of this MSR is loaded into RSP (thus, this value contains the stack pointer for the privilege level 0 stack). This value cannot represent a non-canonical address. In protected mode, only bits 31:0 are loaded.
In other words: in order for the kernel to receive incoming system calls with sysenter, the kernel must set 3 Model Specific Registers (MSRs). The most interesting MSR in our case is IA32_SYSENTER_EIP (which has the address 0x176). This MSR is where the kernel should specify the address of the function that will execute when a sysenter instruction is executed by a user program.
We can find the code in the Linux kernel which writes to the MSR in arch/x86/vdso/vdso32-setup.c:
Where MSR_IA32_SYSENTER_EIP is defined as 0x00000176 in arch/x86/include/uapi/asm/msr-index.h.
Much like the legacy software interrupt syscalls, there is a defined convention for making system calls with sysenter.
One place this is documented is in a comment in arch/x86/ia32/ia32entry.S:
Recall that the legacy system call method includes a mechanism for returning back to the userland program which was interrupted: the iret instruction.
Capturing the logic needed to make sysenter work properly is complicated because unlike software interrupts, sysenter does not store the return address.
How, exactly, the kernel does this and other bookkeeping prior to executing a sysenter instruction can change over time (and it has changed, as you will see in the Bugs section below).
In order to protect against future changes, user programs are intended to use a function called __kernel_vsyscall which is implemented in the kernel, but mapped into each user process when the process is started.
This is a bit odd; it's code that comes with the kernel, but runs in userland.
It turns out that __kernel_vsyscall is part of something called a virtual Dynamic Shared Object (vDSO) which exists to allow programs to execute kernel code in userland.
We'll examine what the vDSO is, what it does, and how it works in depth later.
For now, let's examine the __kernel_vsyscall internals.
__kernel_vsyscall internals
The __kernel_vsyscall function that encapsulates the sysenter calling convention can be found in arch/x86/vdso/vdso32/sysenter.S:
Since __kernel_vsyscall is part of a Dynamic Shared Object (also known as a shared library), how does a user program locate the address of that function at runtime?
The address of the __kernel_vsyscall function is written into an ELF auxiliary vector where a user program or library (typically glibc) can find it and use it.
There are a few methods for searching ELF auxiliary vectors:
- By using getauxval with the AT_SYSINFO argument.
- By iterating to the end of the environment variables and parsing them from memory.
Option 1 is the simplest option, but does not exist on glibc prior to 2.16. The example code shown below illustrates option 2.
As the __kernel_vsyscall source shows, it does some bookkeeping before executing sysenter.
So, all we need to do to manually enter the kernel with sysenter is:
- Search the ELF auxiliary vectors for AT_SYSINFO where the address of __kernel_vsyscall is written.
- Put the system call number and arguments into the registers as we would normally for legacy system calls
- Call the __kernel_vsyscall function
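The first step, searching the auxiliary vector, can be sketched on its own. The helper below follows option 2: it walks past the environment strings, steps over their terminating NULL, and then scans the auxiliary vector entries that follow. It assumes environ still points at the original block set up by the kernel (that is, nothing has called setenv yet). The names find_auxv_entry and find_auxv_entry_glibc are invented; the second one is simply option 1 for comparison.

```c
#include <elf.h>
#include <link.h>
#include <sys/auxv.h>
#include <unistd.h>

extern char **environ;

/* Option 2: walk past the environment block to reach the auxiliary vector
 * and return the value stored for `type`, or 0 if there is no such entry.
 * Assumes environ still points at the block the kernel set up. */
unsigned long find_auxv_entry(unsigned long type)
{
    char **p = environ;
    while (*p != NULL)      /* skip every environment string */
        p++;
    p++;                    /* step over the terminating NULL */

    for (const ElfW(auxv_t) *aux = (const ElfW(auxv_t) *)p;
         aux->a_type != AT_NULL; aux++) {
        if (aux->a_type == type)
            return aux->a_un.a_val;
    }
    return 0;
}

/* Option 1: let glibc (2.16 or later) do the search. */
unsigned long find_auxv_entry_glibc(unsigned long type)
{
    return getauxval(type);
}
```

In a 32-bit process, find_auxv_entry(AT_SYSINFO) would be the address of __kernel_vsyscall. In a 64-bit process that entry is normally absent and both helpers return 0.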
You should absolutely never write your own sysenter wrapper function as the convention the kernel uses to enter and leave system calls with sysenter can change and your code will break.
You should always start a sysenter system call by calling through __kernel_vsyscall.
So, let's do that.
Using sysenter system calls with your own assembly
Keeping with our legacy system call example from earlier, we'll call exit with an exit status of 42.
The exit syscall is number 1. According to the interface described above, we just need to move the syscall number into the eax register and the first argument (the exit status) into ebx.
(This example can be simplified, but I thought it would be interesting to make it a bit more wordy than necessary so that anyone who hasn't seen GCC inline assembly before can use this as an example or reference.)
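Here is a minimal sketch of those steps. It uses getauxval (option 1) for brevity, so it assumes glibc 2.16 or later, and it only does real work when compiled as a 32-bit x86 program; in any other build vsyscall_exit simply returns. As with the legacy example, the call is wrapped in a fork so the exit status can be observed, and the function names are made up for this post.

```c
#include <elf.h>
#include <signal.h>
#include <stdio.h>
#include <sys/auxv.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* exit(status) by calling through __kernel_vsyscall.
 * Only meaningful in a 32-bit x86 process; elsewhere it just returns. */
static void vsyscall_exit(int status)
{
#if defined(__i386__)
    unsigned long entry = getauxval(AT_SYSINFO);
    if (entry == 0)
        return;                       /* no __kernel_vsyscall advertised */

    int nr = 1;                       /* exit in the 32-bit syscall table */
    __asm__ volatile ("call *%2"
                      : "+a" (nr)     /* eax: number in, return value out */
                      : "b" (status), /* ebx: first argument */
                        "r" (entry)   /* address of __kernel_vsyscall */
                      : "memory", "cc");
#else
    (void)status;
#endif
}

/* Run vsyscall_exit() in a child process and return the exit status the
 * parent observes, or -1 if the child did not exit normally. */
int vsyscall_exit_status(int status)
{
    fflush(NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        vsyscall_exit(status);
        raise(SIGKILL);               /* only reached if the call failed */
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Note that the sketch never executes sysenter itself. It only sets up the registers and calls the function the kernel provides, which is the whole point of __kernel_vsyscall.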
Next, compile, execute, and check the exit status:
Success! We called the exit system call using the sysenter method without raising a software interrupt.
Kernel-side: sysenter entry point
So now that we've seen how to trigger a system call from a userland program with sysenter via __kernel_vsyscall, let's see how the kernel uses the system call number to execute the system call code.
Recall from the previous section that the kernel registered a syscall handler function called ia32_sysenter_target.
This function is implemented in assembly in arch/x86/ia32/ia32entry.S. Let's take a look at where the value in the eax register is used to execute the system call:
This is identical to the code we saw in the legacy system call mode: a table named ia32_sys_call_table which is indexed into with the system call number.
After all the needed bookkeeping is done both the legacy system call model and the sysenter system call model use the same mechanism and system call table for dispatching system calls.
Refer to the int $0x80 entry point section to learn where the ia32_sys_call_table is defined and how it is constructed.
And this is how you enter the kernel via a sysenter system call.
Returning from a sysenter system call with sysexit
The kernel can use the sysexit instruction to resume execution back to the user program.
Using this instruction is not as straightforward as using iret. The caller is expected to put the address to return to into the rdx register, and to put the pointer to the program stack to use in the rcx register.
This means that your software must compute the address where execution should be resumed, preserve that value, and restore it prior to calling sysexit.
We can find the code which does this in arch/x86/ia32/ia32entry.S:
ENABLE_INTERRUPTS_SYSEXIT32 is a macro which is defined in arch/x86/include/asm/irqflags.h which contains the sysexit instruction.
And now you know how 32-bit fast system calls work.
64-bit fast system calls
Next up on our journey are 64-bit fast system calls. These system calls use the instructions syscall and sysret to enter and return from a system call, respectively.
syscall/sysret
The documentation in the Intel Instruction Set Reference (very large PDF) explains how the syscall instruction works:
SYSCALL invokes an OS system-call handler at privilege level 0. It does so by loading RIP from the IA32_LSTAR MSR (after saving the address of the instruction following SYSCALL into RCX).
In other words: for the kernel to receive incoming system calls, it must register the address of the code that will execute when a system call occurs by writing its address to the IA32_LSTAR MSR.
We can find that code in the kernel in arch/x86/kernel/cpu/common.c:
Where MSR_LSTAR is defined as 0xc0000082 in arch/x86/include/uapi/asm/msr-index.h.
Much like the legacy software interrupt syscalls, there is a defined convention for making system calls with syscall.
The userland program is expected to put the system call number in the rax register. The arguments to the syscall are expected to be placed in a subset of the general purpose registers.
This is documented in the x86-64 ABI in section A.2.1:
- User-level applications use as integer registers for passing the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.
- A system-call is done via the syscall instruction. The kernel destroys registers %rcx and %r11.
- The number of the syscall has to be passed in register %rax.
- System-calls are limited to six arguments, no argument is passed directly on the stack.
- Returning from the syscall, register %rax contains the result of the system-call. A value in the range between -4095 and -1 indicates an error, it is -errno.
- Only values of class INTEGER or class MEMORY are passed to the kernel.
This is also documented in a comment in arch/x86/kernel/entry_64.S.
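The return value rule in that list is easy to express in C. The two helpers below are hypothetical (the names are made up), and they only sketch the general idea behind what a libc wrapper has to do with the raw value left in %rax: detect the error range and turn it into the familiar "return -1 and set errno" convention.

```c
#include <errno.h>

/* True when a raw return value from the kernel encodes an error,
 * i.e. falls in the range [-4095, -1]. */
int raw_result_is_error(long raw)
{
    return raw >= -4095L && raw <= -1L;
}

/* Translate a raw kernel return value into the userland convention:
 * on error, store the positive error code in errno and return -1;
 * otherwise pass the value through unchanged. */
long decode_raw_result(long raw)
{
    if (raw_result_is_error(raw)) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

So a raw result of -EBADF becomes a return value of -1 with errno set to EBADF, while a raw result such as 7 (say, a file descriptor) is returned as is.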
Now that we know how to make a system call and where the arguments should live, let's try to make one by writing some inline assembly.
Using syscall system calls with your own assembly
Building on the previous example, let's build a small C program with inline assembly which executes the exit system call passing the exit status of 42.
First, we need to find the system call number for exit. In this case we need to read the table found in arch/x86/syscalls/syscall_64.tbl:
The exit syscall is number 60. According to the interface described above, we simply need to move 60 into the rax register and the first argument (the exit status) into rdi.
Here's a piece of C code with some inline assembly that does this. Like the previous example, this example is more wordy than necessary in the interest of clarity:
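A minimal sketch along those lines follows. The helper raw_syscall1 places the number in rax and the first argument in rdi, executes syscall, and lists rcx and r11 as clobbered, as the ABI excerpt requires. Two things here are assumptions of this sketch rather than part of the interface: on non-x86-64 builds it falls back to glibc's syscall function so that it still compiles, and the exit call is wrapped in a fork so the status can be observed. SYS_exit comes from sys/syscall.h and is 60 on x86-64.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Issue a one-argument system call with the x86-64 syscall instruction.
 * On other CPUs this sketch falls back to glibc's syscall() wrapper. */
static long raw_syscall1(long nr, long arg1)
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)              /* rax: result */
                      : "a" (nr), "D" (arg1)    /* rax: number, rdi: arg 1 */
                      : "rcx", "r11", "memory"); /* destroyed by the kernel */
    return ret;
#else
    return syscall(nr, arg1);
#endif
}

/* Call exit(status) in a child process via raw_syscall1() and return the
 * exit status the parent observes, or -1 if the child did not exit normally. */
int syscall_exit_status(int status)
{
    fflush(NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raw_syscall1(SYS_exit, status);
        raise(SIGKILL);               /* only reached if the call failed */
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Calling syscall_exit_status(42) should hand back 42, the status the child passed to the kernel.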
Next, compile, execute, and check the exit status:
Success! We called the exit system call using the syscall system call method. We avoided raising a software interrupt and (if we were timing a micro-benchmark) it executes much faster.
Kernel-side: syscall entry point
Now we've seen how to trigger a system call from a userland program, let's see how the kernel uses the system call number to execute the system call code.
Recall from the previous section we saw the address of a function named system_call get written to the LSTAR MSR.
Let's take a look at the code for this function and see how it uses rax to actually hand off execution to the system call, from arch/x86/kernel/entry_64.S:
Much like the legacy system call method, sys_call_table is a table defined in a C file that uses #include to pull in C code generated by a script.
From arch/x86/kernel/syscall_64.c, note the #include at the bottom:
Earlier we saw the syscall table defined in arch/x86/syscalls/syscall_64.tbl. Exactly like the legacy interrupt mode, a script runs at kernel compile time and generates the syscalls_64.h file from the table in syscall_64.tbl.
The code above simply includes the generated C code producing an array of function pointers indexed by system call number.
And this is how you enter the kernel via a syscall system call.
Returning from a syscall system call with sysret
The kernel can use the sysret instruction to resume execution back to where execution left off when the user program used syscall.
sysret is simpler than sysexit because the address where execution should be resumed is copied into the rcx register when syscall is used.
As long as you preserve that value somewhere and restore it to rcx before calling sysret, execution will resume where it left off before the call to syscall.
This is convenient because sysenter requires that you compute this address yourself in addition to clobbering an additional register.
We can find the code which does this in arch/x86/kernel/entry_64.S:
USERGS_SYSRET64 is a macro which is defined in arch/x86/include/asm/irqflags.h which contains the sysret instruction.
And now you know how 64-bit fast system calls work.
Calling a syscall semi-manually with syscall(2)
Great, we've seen how to call system calls manually by crafting assembly for a few different system call methods.
Usually, you don't need to write your own assembly. Wrapper functions are provided by glibc that handle all of the assembly code for you.
There are some system calls, however, for which no glibc wrapper exists. One example of a system call like this is futex, the fast userspace locking system call.
But, wait, why does no system call wrapper exist for futex?
futex is intended only to be called by libraries, not application code, and thus in order to call futex you must do it by:
- Generating assembly stubs for every platform you want to support
- Using the syscall wrapper provided by glibc
If you find yourself in the situation of needing to call a system call for which no wrapper exists, you should definitely choose option 2: use the function syscall from glibc.
Let's use syscall from glibc to call exit with exit status of 42:
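A minimal sketch of that call is below. It uses the SYS_exit and SYS_getpid constants from sys/syscall.h rather than hard-coded numbers, so it does not depend on one architecture's table. The exit call is again wrapped in a fork (an addition of this sketch) so the status can be checked by the caller, and pid_via_syscall is included as a second, side-effect-free illustration. Both function names are made up.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask the kernel for our process ID through glibc's generic syscall(). */
long pid_via_syscall(void)
{
    return syscall(SYS_getpid);
}

/* Call exit(status) in a child process through glibc's syscall() and
 * return the exit status the parent observes, or -1 on failure. */
int glibc_syscall_exit_status(long status)
{
    fflush(NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        syscall(SYS_exit, status);
        raise(SIGKILL);               /* only reached if the call failed */
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Compared with the inline assembly versions, nothing here is specific to one CPU: glibc picks the right entry method and register layout for the platform.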
Next, compile, execute, and check the exit status:
Success! We called the exit system call using the syscall wrapper from glibc.
glibc syscall wrapper internals
Let's have a look at the syscall wrapper function we used in the previous example to see how it works in glibc.
From sysdeps/unix/sysv/linux/x86_64/syscall.S:
Earlier we showed an excerpt from the x86_64 ABI document that describes both userland and kernel calling conventions.
This assembly stub is cool because it shows both calling conventions. The arguments passed into this function follow the userland calling convention, but are then moved to a different set of registers to obey the kernel calling convention prior to entering the kernel with syscall.
This is how the glibc syscall wrapper works when you use it to call system calls that do not come with a wrapper by default.
Virtual system calls
We've now covered all the methods of making a system call by entering the kernel and shown how you can make those calls manually (or semi-manually) to transition the system from userland to the kernel.
What if programs could call certain system calls without entering the kernel at all?
That's precisely why the Linux virtual Dynamic Shared Object (vDSO) exists. The Linux vDSO is a set of code that is part of the kernel, but is mapped into the address space of a user program to be run in userland.
The idea is that some system calls can be used without entering the kernel. One such call is: gettimeofday.
Programs calling the gettimeofday system call do not actually enter the kernel. They instead make a simple function call to a piece of code that was provided by the kernel, but is run in userland.
No software interrupt is raised, no complicated sysenter or syscall bookkeeping is required. gettimeofday is simply a normal function call.
You can see the vDSO listed as the first entry when you use ldd:
Let's see how the vDSO is set up in the kernel.
vDSO in the kernel
You can find the vDSO source in arch/x86/vdso/. There are a few assembly and C source files along with a linker script.
The linker script is a cool thing to take a look at.
From arch/x86/vdso/vdso.lds.S:
Linker scripts are pretty useful, but not particularly well known. This linker script arranges the symbols that are going to be exported in the vDSO.
We can see that vDSO exports 4 different functions, each with two names. You can find the source for these functions in the C files in this directory.
For example, the source for gettimeofday found in arch/x86/vdso/vclock_gettime.c:
This is defining gettimeofday to be a weak alias for __vdso_gettimeofday.
The __vdso_gettimeofday function in the same file contains the actual source which will be executed in user land when a user program calls the gettimeofday system call.
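If you haven't run into weak aliases before, here is a toy sketch of the mechanism using the GCC/Clang alias attribute on an ELF target. The names toy_impl_answer and toy_answer are invented and have nothing to do with the vDSO source; the point is only that one function body ends up reachable under two symbol names.

```c
/* The real implementation, under its "internal" name. */
int toy_impl_answer(void)
{
    return 42;
}

/* A second, weak name for the same code. Calling toy_answer() runs
 * toy_impl_answer(); being weak, this name can be overridden by a
 * strong definition of toy_answer elsewhere at link time. */
int toy_answer(void) __attribute__((weak, alias("toy_impl_answer")));
```

Calling toy_answer() executes the body of toy_impl_answer, in the same way the two names of each exported vDSO function lead to the same code.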
Locating the vDSO in memory
Due toaddress space layout randomizationthe vDSO volition be loaded at a random accost when a program is started.
How can user programs discover the vDSO if its loaded at a random address?
If you recall before when examining the sysenter
arrangement call method we saw that user programs should call __kernel_vsyscall
instead of writing their own sysenter
assembly lawmaking themselves.
This function is part of the vDSO, too.
The sample code provided located __kernel_vsyscall by searching the ELF auxiliary headers to find a header with type AT_SYSINFO which contained the address of __kernel_vsyscall.
Similarly, to locate the vDSO, a user program can search for an ELF auxiliary header of type AT_SYSINFO_EHDR. It will contain the address of the start of the ELF header for the vDSO that was generated by a linker script.
In both cases, the kernel writes the address into the ELF auxiliary header when the program is loaded. That's how the correct addresses always end up in AT_SYSINFO_EHDR and AT_SYSINFO.
Once that header is located, user programs can parse the ELF object (perhaps using libelf) and call the functions in the ELF object as needed.
This is nice because it means that the vDSO can take advantage of some useful ELF features like symbol versioning.
An example of parsing and calling functions in the vDSO is provided in the kernel documentation in Documentation/vDSO/.
vDSO in glibc
Most of the time, people access the vDSO without knowing it because glibc abstracts this away from them by using the interface described in the previous section.
When a program is loaded, the dynamic linker and loader loads the DSOs that the program depends on, including the vDSO.
glibc stores some data about the location of the vDSO when it parses the ELF headers of the program that is being loaded. It also includes short stub functions that will search the vDSO for a symbol name prior to making an actual system call.
For example, the gettimeofday function in glibc, from sysdeps/unix/sysv/linux/x86_64/gettimeofday.c, searches the vDSO for the gettimeofday function and returns the address. This is wrapped up nicely with an indirect function.
That's how programs calling gettimeofday pass through glibc and hit the vDSO, all without switching into kernel mode, incurring a privilege level change, or raising a software interrupt.
And, that concludes the showcase of every single system call method available on Linux for 32-bit and 64-bit Intel and AMD CPUs.
glibc system call wrappers
While we're talking about system calls ;) it makes sense to briefly mention how glibc deals with system calls.
For many system calls, glibc simply needs a wrapper function where it moves arguments into the proper registers and then executes the syscall or int $0x80 instructions, or calls __kernel_vsyscall.
It does this by using a series of tables defined in text files that are processed with scripts and output C code.
For example, the sysdeps/unix/syscalls.list file describes some common system calls. To learn more about each column, check the comments in the script which processes this file: sysdeps/unix/make-syscalls.sh.
More complex system calls, like exit which invokes handlers, have actual implementations in C or assembly code and will not be found in a templated text file like this.
Future blog posts will explore the implementation in glibc and the Linux kernel for interesting system calls.
It would be unfortunate not to take this opportunity to mention two fabulous bugs related to system calls in Linux.
So, let's take a look!
CVE-2010-3301
This security exploit allows local users to gain root access.
The cause is a small bug in the assembly code which allows user programs to make legacy system calls on x86-64 systems.
The exploit code is pretty clever: it generates a region of memory with mmap at a particular address and uses an integer overflow to cause the system call table dispatch code (remember it from the legacy interrupts section above?) to hand execution off to an arbitrary address which runs as kernel code and can escalate the running process to root.
Android sysenter ABI breakage
Remember the part about not hardcoding the sysenter ABI in your application code?
Unfortunately, the android-x86 folks made this mistake. The kernel ABI changed and suddenly android-x86 stopped working.
The kernel folks ended up restoring the old sysenter ABI to avoid breaking the Android devices in the wild with stale hardcoded sysenter sequences.
Here's the fix that was added to the Linux kernel. You can find a link to the offending commit in the Android source in the commit message.
Remember: never write your own sysenter assembly code. If you have to implement it directly for some reason, use a piece of code like the example above and go through __kernel_vsyscall at the very least.
Conclusion
The system call infrastructure in the Linux kernel is incredibly complex. There are many different methods for making system calls, each with their own advantages and disadvantages.
Calling system calls by crafting your own assembly is generally a bad idea as the ABI may break underneath you. Your kernel and libc implementation will (probably) choose the fastest method for making system calls on your system.
If you can't use the glibc provided wrappers (or if one doesn't exist), you should at the very least use the syscall wrapper function, or try to go through the vDSO provided __kernel_vsyscall.
Stay tuned for future blog posts investigating individual system calls and their implementations.
Source: https://blog.packagecloud.io/the-definitive-guide-to-linux-system-calls/
Posted by: johnsonrunt1953.blogspot.com