OK, so. We want to allocate a large block of memory that is contiguous in physical memory. That means allocating physical memory in the kernel (as with kmalloc), and then later providing it to userspace software, presumably by mapping it into virtual memory with mmap on /dev/mem, although we may be doing something different for reasons which aren't relevant here.
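For the record, here is a minimal sketch of the mmap-on-/dev/mem idea in userspace C. It assumes the driver has somehow told us the physical address and length of the block. The names map_phys and page_base are made up for illustration, and I haven't run this on our board.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t, so high physical addresses fit on 32-bit boards */
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a physical address down to the start of its page.
 * page_size must be a power of two. */
uint64_t page_base(uint64_t phys, uint64_t page_size)
{
    return phys & ~(page_size - 1);
}

/* Map len bytes of physical memory starting at phys into this process.
 * Returns a pointer to the byte at phys, or NULL on failure.
 * mmap offsets must be page aligned, so we map from the page base and
 * then step forward by the leftover offset. */
void *map_phys(uint64_t phys, size_t len)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t base = page_base(phys, page_size);
    size_t slack = (size_t)(phys - base);

    int fd = open("/dev/mem", O_RDWR);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len + slack, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, (off_t)base);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    return (uint8_t *)p + slack;
}
```

This needs root, and a kernel built with CONFIG_STRICT_DEVMEM may refuse to map ordinary RAM through /dev/mem at all. The sketch also ignores caching and coherency questions.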
We happen to have a kernel driver already for other experiments with our specific hardware, so we have somewhere convenient to put this kernel code as needed.
This is running on a hardware board dedicated to a single task, so we have a few advantages. We would prefer to allocate one large chunk at start-up, and we have complete control over which programs use it. We don't need to dynamically manage unknown drivers competing for this memory, we never intend to free it, and the board will only be used for this, so we don't need to make sure other programs run OK. There's also no restriction on addresses: DMA and other relevant peripherals can access the entire memory map, so unlike x86 we don't need to specifically reserve *low* memory.
There are several different related approaches, and I went through a few rabbit holes figuring out what worked.
Option 1: __memblock_alloc_base()
From research and helpful friends, I found some relevant instructions online. One was from "Linux Device Drivers, 3rd edition", the section entitled "Obtaining Large Buffers", about using alloc_bootmem_low to grab kernel pages during boot. I'm not sure, but I think this was correct at the time, and the kernel has since started using memblock instead of bootmem as its start-up allocator.
From the code in the contiguous memory allocator (search the kernel source for "cma"), I learned that possibly I should be using memblock functions as well. I didn't understand the different options, but I used the same one as the contiguous memory allocator code, __memblock_alloc_base, and it seemed to work: I tried large powers of 2 and could allocate half of physical memory in one go. I haven't fully tested this, though.
There are several related functions, and I don't know for sure what is correct, except that what the cma code did worked.
This code is currently in a kernel driver init function. The driver must be compiled statically into the kernel; you can't load it as a module later. You could put the code in architecture-specific boot-up code instead.
Option 2: cma=
fanf found a link to some kernel patches which tried to make a systematic way of doing this, based on some early inconsistently-maintained patch, which later turned into code which was taken up by the kernel. Google for "contiguous memory allocator". There's an article about it from the time and some comments on the kernel commit.
It's a driver which can be configured to grab a large swath of contiguous memory at startup, and then hand that out to any other driver which needs it.
You specify the size with "cma=64M" (or whatever size you need) on the kernel command line. (I believe a default can also be set in the .config file via "make menuconfig", as CONFIG_CMA_SIZE_MBYTES.) You need to do this because the memory is reserved at start-up, and the kernel can't otherwise know whether you want it or how much.
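To make the size syntax concrete, here is a userspace sketch of how I understand a value like 64M gets interpreted: a number followed by an optional K, M or G suffix, each a power of 1024. parse_size is a made-up name for illustration, not the kernel's own function, and the kernel documentation for the cma= parameter is the thing to trust.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a size such as "64M" or "0x100000" into bytes.
 * If end is not NULL, *end is set to the first character that was not
 * consumed, so the caller can check for trailing junk. */
uint64_t parse_size(const char *s, const char **end)
{
    char *p;
    uint64_t value = strtoull(s, &p, 0);   /* base 0: decimal, 0x hex or 0 octal */
    unsigned shift = 0;

    switch (*p) {
    case 'K': case 'k': shift = 10; break;
    case 'M': case 'm': shift = 20; break;
    case 'G': case 'g': shift = 30; break;
    default: break;                        /* no suffix: plain bytes */
    }
    if (shift) {
        value <<= shift;
        p++;                               /* consume the suffix letter */
    }
    if (end)
        *end = p;
    return value;
}
```

With this reading, "64M" is 67108864 bytes, and in something like "64MB" the trailing "B" is left unparsed, which is why I'd stick to the bare M or G suffix on the command line.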
It then hands this memory out in response to normal calls to "dma_alloc_coherent", which is designed to allocate physically contiguous memory but doesn't normally allocate such big blocks. I haven't tested this approach, because I didn't need any specific part of memory and so had been looking at kmalloc rather than "dma_alloc_coherent", but a colleague working on a related problem said it worked on their kernel.
It may also do clever things: as I understand it, the reserved region stays available to normal allocations in the meantime, and whatever is there gets moved elsewhere (rather than paged out to disk) to free it up when a driver needs it. I'm not sure of the details (?)
I was looking at the source code for this and borrowed the technique to allocate memory just for our driver. We may either stay with that (since we don't need any further dynamic allocation, one chunk of memory is fine), or switch to using the CMA itself later, since it's already in the kernel.
I went down a blind alley because it looked like it wasn't enabled on my architecture. But I think that was because I ran "make menuconfig" without specifying the architecture, and it is actually available. Look for instructions on cross-compiling the kernel if you don't already have that incorporated in your build process.
Option 3: CONFIG_FORCE_MAX_ZONEORDER
This kernel option in .config apparently increases the largest amount of memory you can allocate in one go with kmalloc (or dma_alloc_coherent?). We haven't explored this further because the other option seemed to work, and I had some difficulties with building .config, so I don't know quite how it works.
I found the name hard to remember at first. For the record, it means: force the maximum order of block which can be allocated from a memory zone to this value, where the order is a power of two (in pages). I believe the value is actually 1 higher than the largest allowed order; double check the documentation if you're not sure.
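As a worked example of that arithmetic, here is a tiny sketch. It assumes the off-by-one reading above is right, so the largest block is 2 to the power (value - 1) pages. max_block_bytes is a made-up helper, and the convention may differ between kernel versions, so treat it as an illustration only.

```c
#include <stdint.h>

/* Largest single block in bytes, assuming the configured value is one
 * more than the largest permitted order, so the biggest block is
 * 2^(zoneorder - 1) pages. Returns 0 for a nonsensical value of 0. */
uint64_t max_block_bytes(unsigned zoneorder, uint64_t page_size)
{
    if (zoneorder == 0)
        return 0;
    return page_size << (zoneorder - 1);
}
```

Under that assumption, a value of 11 with 4 KiB pages gives 4 MiB, and raising it to 14 gives 32 MiB.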
Further options
There are several further approaches that are not really appropriate here, but may be useful under related circumstances.
* On many architectures, DMA does scatter-gather specifically to read or write from non-contiguous memory, so you shouldn't need this in the first place.
* Ensure the hardware can write to several non-contiguous addresses.
* Allocate several blocks of the largest size kmalloc can allocate, and check whether they do in fact turn out to be contiguous, since kernel boot-up probably hasn't fragmented the majority of memory.
* Ditto, but just allocate one or several large blocks of virtual memory with malloc, and check that most of it turns out to be allocated from contiguous physical memory because that's what was available. This is a weird approach, but if you have to do it in userspace entirely, it's the only option you could take.
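For the userspace variant, one way to do the "check that it turned out contiguous" step is to read /proc/self/pagemap, which has one 64-bit entry per virtual page: bit 63 says whether the page is present, and bits 0-54 hold the physical frame number. A sketch, with made-up function names; on recent kernels the frame numbers read as zero unless you are root, and the pages need to be touched first so that they are actually backed by physical memory.

```c
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)          /* page is present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1)    /* bits 0-54: page frame number */

/* Frame number from one pagemap entry, or 0 if the page is not present. */
uint64_t pagemap_pfn(uint64_t entry)
{
    return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

/* Length in pages of the longest run of consecutive frame numbers.
 * A frame number of 0 (not present or hidden) breaks the run. */
size_t longest_contiguous_run(const uint64_t *pfns, size_t n)
{
    size_t best = 0, cur = 0;
    for (size_t i = 0; i < n; i++) {
        if (pfns[i] == 0)
            cur = 0;
        else if (cur > 0 && pfns[i] == pfns[i - 1] + 1)
            cur++;
        else
            cur = 1;
        if (cur > best)
            best = cur;
    }
    return best;
}

/* Fill pfns[0..n) for the n pages starting at page-aligned addr.
 * Returns 0 on success, -1 on error. */
int read_pfns(const void *addr, size_t n, uint64_t *pfns)
{
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    off_t start = (off_t)((uintptr_t)addr / page_size) * (off_t)sizeof(uint64_t);
    int rc = 0;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        uint64_t entry;
        off_t off = start + (off_t)(i * sizeof entry);
        if (pread(fd, &entry, sizeof entry, off) != (ssize_t)sizeof entry) {
            rc = -1;
            break;
        }
        pfns[i] = pagemap_pfn(entry);
    }
    close(fd);
    return rc;
}
```

The idea would be to allocate a big page-aligned buffer, write to every page, call read_pfns on it, and then see whether longest_contiguous_run covers enough pages to be useful. Nothing here stops the kernel from moving or swapping those pages afterwards unless they are locked in place, which is part of why this is a weird approach.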