Generating Log of Guest Physical Addresses from a Kernel Function and Perform Analysis at Runtime

Sahibzada Irfanullah irfan.gomalian at gmail.com
Wed Sep 25 03:00:08 EDT 2019


I am sorry if I am bothering you.
I have read this article
<http://amsekharkernel.blogspot.com/2012/01/what-are-ways-of-communication-bw-user.html>,
and I would like to ask whether a Netlink socket will work for my task:
storing the contents of the "gpa" variable (which is present in the
handle_ept_violation() function in vmx.c) into a file.
Thank you.

On Wed, 25 Sep 2019 at 11:44, Sahibzada Irfanullah <irfan.gomalian at gmail.com>
wrote:

> 1 > Have you tried that today?  I doubt you need any kernel changes at all
> to get this information directly from the kernel to userspace.
> I also feel the same. I have to write this information to a
> file, and also read it back from the file, in the same kernel function, i.e.,
> handle_ept_violation(). I want the file to store text (e.g., CSV), not
> bytes, so that I can also open it with OpenOffice etc.
> Thanks. I will try ftrace and debugfs. I am not sure about ftrace, but
> debugfs may help somewhat.
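One way to get a text (CSV) log without writing files from inside the kernel is to let a tracing facility emit the address and convert it in user space. The following is a minimal user-space sketch of that conversion step only. It assumes each relevant trace line contains the text "kvm_page_fault: address <hex> error_code <hex>"; that layout is an assumption about the running kernel's trace output and may differ between versions. The function names parse_fault_line and trace_to_csv are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the faulting address and error code from one line of trace text.
 * ASSUMPTION: the line contains
 *   "kvm_page_fault: address <hex> error_code <hex>"
 * Returns 1 if the line matched, 0 otherwise.
 */
int parse_fault_line(const char *line, unsigned long long *gpa,
                     unsigned int *error_code)
{
    static const char tag[] = "kvm_page_fault: address ";
    const char *p;

    assert(line != NULL && gpa != NULL && error_code != NULL);

    p = strstr(line, tag);
    if (p == NULL)
        return 0;
    p += sizeof(tag) - 1;   /* skip past the tag to the hex address */

    return sscanf(p, "%llx error_code %x", gpa, error_code) == 2;
}

/*
 * Read trace text from `in` and write one CSV row per matching line to
 * `out`. Non-matching lines are skipped. Returns the number of rows written.
 */
long trace_to_csv(FILE *in, FILE *out)
{
    char line[512];
    unsigned long long gpa;
    unsigned int code;
    long rows = 0;

    assert(in != NULL && out != NULL);

    fprintf(out, "gpa,error_code\n");
    while (fgets(line, sizeof(line), in) != NULL) {
        if (parse_fault_line(line, &gpa, &code)) {
            fprintf(out, "0x%llx,0x%x\n", gpa, code);
            rows++;
        }
    }
    return rows;
}
```

A caller could pass a FILE opened on the trace output (for example the trace_pipe file under the tracefs mount, assuming the event is enabled there) as `in` and a newly created .csv file as `out`. The resulting file is plain text and opens in a spreadsheet.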
>
> 2 > For starters, you can get those tools to give you things like stack
> tracebacks so you know who is asking for a page, and who is *releasing* a
> page, and so on
> At the start, my goal is to generate a log of the physical addresses involved
> in page faults. Later, I will extend my program to store other
> information in the file, such as, as you said, which process is
> requesting/releasing the page, which instruction address referred to
> which memory reference that was not present in memory, how many times
> an address was involved in a page fault, etc.
>
> 3 > So what "some type of analysis" are you trying to do? What question(s)
> are you trying to answer?
>    For now, I want to perform the simple analysis mentioned under
> question 2 above. Moreover, this analysis will provide details about the
> instruction address that is responsible for the page fault, along with the
> memory reference that is not present, the application that generated the
> page fault, how many times a page fault occurred for a single address, etc.
> By unique and non-unique, I meant the list of addresses in the log
> without duplication. For example, suppose we have the log of addresses
> [1,2,2,3,3,4,3,3,4,4,4,1,4]. In this list the unique addresses are 1, 2, 3, 4,
> and the frequency of each address is 2, 2, 4, 5 respectively. At this stage I
> want to keep things very simple by ignoring details like the size of the
> RAM, the size of the kernel, the size of loaded modules, etc. Briefly, I want
> to generate a log of the guest physical addresses involved in page faults,
> the corresponding instruction address, the corresponding logical address,
> along with the corresponding application.
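The unique/frequency analysis described above can be sketched in a few lines of user-space C. This is only an illustration: the names addr_count and count_addresses are hypothetical, and the linear search is chosen for brevity (a hash table would suit a large log better).

```c
#include <assert.h>
#include <stddef.h>

/* One entry per distinct address seen in the log. */
struct addr_count {
    unsigned long long addr;
    unsigned long count;
};

/*
 * Tally the n addresses in `log` into `table` (capacity `cap`).
 * Entries appear in first-seen order. Returns the number of unique
 * addresses, or (size_t)-1 if `table` is too small.
 * Cost is O(n * unique) because of the linear search.
 */
size_t count_addresses(const unsigned long long *log, size_t n,
                       struct addr_count *table, size_t cap)
{
    size_t unique = 0;

    assert(log != NULL && table != NULL);

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* Look for an existing entry for this address. */
        for (j = 0; j < unique; j++) {
            if (table[j].addr == log[i]) {
                table[j].count++;
                break;
            }
        }
        /* Not found: append a new entry if there is room. */
        if (j == unique) {
            if (unique == cap)
                return (size_t)-1;
            table[unique].addr = log[i];
            table[unique].count = 1;
            unique++;
        }
    }
    return unique;
}
```

For the 13-entry log in the example, this yields 4 unique addresses with counts 2, 2, 4 and 5 for addresses 1, 2, 3 and 4; the total number of addresses is simply n.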
>
> As a first stage, I am trying to develop an application that provides
> some basic functionality (i.e., instruction instrumentation) of Pin Tool
> <https://software.intel.com/sites/landingpage/pintool/docs/71313/Pin/html/index.html#EXAMPLES> for
> just guest physical addresses, by tracing instruction addresses and memory
> references and saving them to a file. The file should not only be
> accessible from within the kernel, but should also open in any word
> processing application, e.g., as a .csv or .txt file.
> Thank you very much for the help.
>
>
> On Wed, 25 Sep 2019 at 03:55, Valdis Klētnieks <valdis.kletnieks at vt.edu>
> wrote:
>
>> On Tue, 24 Sep 2019 20:26:36 +0900, Sahibzada Irfanullah said:
>>
>> > After having a reasonable amount  of log data,
>>
>> If you're trying to figure out how the kernel memory manager is working,
>> you're probably better off using 'perf' or one of the other tracing tools
>> already in the kernel to track the kernel memory manager. For starters,
>> you can get those tools to give you things like stack tracebacks so you
>> know who is asking for a page, and who is *releasing* a page, and so on.
>>
>> Of course, which of these tools to use depends on what data you need to
>> answer the question - but simply knowing what physical address was
>> involved in a page fault is almost certainly not going to be sufficient.
>>
>> > I want to perform some type of analysis at run time, e.g., no. of unique
>> > addresses, total no. of addresses, frequency of occurrences of each
>> > address, etc.
>>
>> So what "some type of analysis" are you trying to do? What question(s)
>> are you trying to answer?
>>
>> The number of unique physical addresses in your system is dictated by how
>> much RAM you have installed. Similarly for total number of addresses,
>> although I'm not sure why you list both - that would mean that there is
>> some number of non-unique addresses.  What would that even mean?
>>
>> The number of pages actually available for paging and caching depends on
>> other things as well - the architecture of the system, how much RAM (if
>> any) is reserved for use by your video card, the size of the kernel, the
>> size of loaded modules, space taken up by kmalloc allocations, page
>> tables, whether any processes have called mlock() on a large chunk of
>> space, whether the pages are locked by the kernel because there's I/O
>> going on, and then there's things like mmap(), and so on.
>>
>> The kernel provides /proc/meminfo and /proc/slabinfo - you're going to
>> want to understand all that stuff before you can make sense of anything.
>>
>> Simply looking at the frequency of occurrences of each address is
>> probably not going to tell you much of anything, because you need to know
>> things like the total working and resident set sizes for the process and
>> other context.
>>
>> For example - you do the analysis, and find that there are 8 gigabytes of
>> pages that are constantly being re-used.  But that doesn't tell you if
>> there are two processes that are thrashing against each other because
>> each is doing heavy repeated referencing of 6 gigabytes of data, or if
>> one process is wildly referencing many pages because some programmer has
>> a multi-dimensional array and is walking across the array with the
>> indices in the wrong order:
>>
>> i_max = 4095; j_max = 4095;
>> for (i = 0; i < i_max; i++) for (j = 0; j < j_max; j++) { sum += foo[i][j]; }
>>
>> If somebody is doing foo[j][i] instead, things can get ugly.  And if you're
>> mixing with Fortran code, where the semantics of array references are
>> reversed and you *want* to use 'foo[j][i]' for efficient memory access,
>> it's a bullet loaded in the chamber and waiting for somebody to pull the
>> trigger.
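The index-order point above can be made concrete with a small sketch. C stores two-dimensional arrays in row-major order, so foo[i][j] and foo[i][j+1] are adjacent in memory, while foo[i][j] and foo[i+1][j] are a whole row apart. The two functions below compute the same sum and differ only in access pattern. The array dimensions and function names are illustrative assumptions, far smaller than the 4095 x 4095 case in the message.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 64
#define COLS 64

/*
 * Row-major walk: the inner loop varies the last index, so successive
 * accesses are sizeof(int) bytes apart and stay within the same pages.
 */
long sum_row_major(int foo[ROWS][COLS])
{
    long sum = 0;

    assert(foo != NULL);
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += foo[i][j];
    return sum;
}

/*
 * Column-major walk over the same C array: the inner loop varies the
 * first index, so successive accesses are COLS * sizeof(int) bytes apart.
 * With large rows, each access can land on a different page.
 */
long sum_col_major(int foo[ROWS][COLS])
{
    long sum = 0;

    assert(foo != NULL);
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += foo[i][j];
    return sum;
}
```

Both functions return the same value; only the order in which memory is touched changes. That order is what decides how many distinct pages are referenced in a short time window once the array is larger than available RAM.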
>>
>> Not that I've ever seen *that* particular error happen with a programmer
>> processing 2 terabytes of arrays on a machine that only had 1.5 terabytes
>> of RAM.  But I did tease the person involved about it, because they
>> *really* should have known better. :)
>>
>> So again:  What question(s) are you trying to get answers to?
>>
>>
>
> --
> Regards,
>
> *Mr. Irfanullah*
>
>

-- 
Regards,

*Mr. Irfanullah*

More information about the Kernelnewbies mailing list