<div dir="ltr"><div><font color="#741b47">1 > Have you tried that today? I doubt you need any kernel changes at all to get this information directly from the kernel to userspace. </font><font color="#c27ba0"> </font><br></div><div>I also feel the same, because I have to write this information to the file, as well as read it back from the file, in the same kernel function, i.e., handle_ept_violation(). I want the file to store text (e.g., CSV) rather than raw bytes, so that I can also open it with OpenOffice etc. </div><div>Thanks. I will try ftrace and debugfs. I am not sure about ftrace, but debugfs may help somewhat.</div><div><br></div><div><font color="#741b47">2 > For starters, you can get those tools to give you things like stack tracebacks so you know who is asking for a page, and who is *releasing* a page, and so on</font> </div><div>To start with, my goal is to generate a log of the physical addresses involved in page faults. Later, I will extend my program to store other information in the file: as you said, which process is requesting/releasing the page, which instruction address referred to which memory location that was not present in memory, how many times an address was involved in a page fault, etc.</div><div><br></div><div><font color="#741b47">3 > So what "some type of analysis" are you trying to do? What question(s) are you trying to answer?</font></div><div> For now, I want to perform the simple analysis mentioned in my answer to question 2 above. That is, the analysis will report the instruction address responsible for a page fault, the memory reference that was not present, the application that generated the page fault, how many times a page fault occurred for a single address, etc. </div><div>By unique and non-unique, I meant the following: the unique addresses are the addresses in the log with duplicates removed, while the total count includes every entry. For example, take the log of addresses [1, 2, 2, 3, 3, 4, 3, 3, 4, 4, 4, 1, 4]. 
In this list, the unique addresses are 1, 2, 3, and 4, and their frequencies are 2, 2, 4, and 5, respectively (13 addresses in total). At this stage, I want to keep things very simple by ignoring details such as the size of the RAM, the size of the kernel, and the size of loaded modules. In short, I want to log the guest physical addresses involved in page faults, together with the corresponding instruction address, the corresponding logical address, and the corresponding application. <br></div><div><br></div><div>As a first step, I am trying to develop an application that provides some basic functionality of <a href="https://software.intel.com/sites/landingpage/pintool/docs/71313/Pin/html/index.html#EXAMPLES">Pin Tool</a> (i.e., instruction instrumentation), restricted to guest physical addresses: it traces instruction addresses and memory references and saves them to a file. The file should not only be accessible from within the kernel, but also be openable with any word processing application, i.e., it should be a .csv or .txt file.<br></div><div>Thank you very much for the help.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 25 Sep 2019 at 03:55, Valdis Klētnieks <<a href="mailto:valdis.kletnieks@vt.edu">valdis.kletnieks@vt.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Tue, 24 Sep 2019 20:26:36 +0900, Sahibzada Irfanullah said:<br>
<br>
> After having a reasonable amount of log data,<br>
<br>
If you're trying to figure out how the kernel memory manager is working, you're<br>
probably better off using 'perf' or one of the other tracing tools already in<br>
the kernel to track the kernel memory manager. For starters, you can get those<br>
tools to give you things like stack tracebacks so you know who is asking for a<br>
page, and who is *releasing* a page, and so on.<br>
<br>
Of course, which of these tools to use depends on what data you need to answer<br>
the question - but simply knowing what physical address was involved in a page<br>
fault is almost certainly not going to be sufficient.<br>
<br>
> I want to perform some type of analysis at run time, e.g., no. of unique<br>
> addresses, total no. of addresses, frequency of occurrences of each address<br>
> etc.<br>
<br>
So what "some type of analysis" are you trying to do? What question(s)<br>
are you trying to answer? <br>
<br>
The number of unique physical addresses in your system is dictated by how much<br>
RAM you have installed. Similarly for total number of addresses, although I'm<br>
not sure why you list both - that would mean that there is some number of<br>
non-unique addresses. What would that even mean?<br>
<br>
The number of pages actually available for paging and caching depends on other<br>
things as well - the architecture of the system, how much RAM (if any) is<br>
reserved for use by your video card, the size of the kernel, the size of loaded<br>
modules, space taken up by kmalloc allocations, page tables, whether any<br>
processes have called mlock() on a large chunk of space, whether the pages are<br>
locked by the kernel because there's I/O going on, and then there's things like<br>
mmap(), and so on.<br>
<br>
The kernel provides /proc/meminfo and /proc/slabinfo - you're going to want<br>
to understand all that stuff before you can make sense of anything.<br>
<br>
Simply looking at the frequency of occurrences of each address is probably not<br>
going to tell you much of anything, because you need to know things like<br>
the total working and resident set sizes for the process and other context.<br>
<br>
For example - you do the analysis, and find that there are 8 gigabytes of pages<br>
that are constantly being re-used. But that doesn't tell you if there are two<br>
processes that are thrashing against each other because each is doing heavy<br>
repeated referencing of 6 gigabytes of data, or if one process is wildly referencing<br>
many pages because some programmer has a multi-dimensional array and is<br>
walking across the array with the indices in the wrong order. In C, the efficient order is:<br>
<br>
i_max = 4095; j_max = 4095;<br>
for (i = 0; i < i_max; i++) for (j = 0; j < j_max; j++) { sum += foo[i][j]; }<br>
<br>
If somebody is doing foo[j][i] instead, things can get ugly. And if you're<br>
mixing with Fortran code, where the semantics of array references are reversed<br>
and you *want* to use 'foo[j][i]' for efficient memory access, it's a bullet loaded<br>
in the chamber, waiting for somebody to pull the trigger.<br>
<br>
Not that I've ever seen *that* particular error happen with a programmer<br>
processing 2 terabytes of arrays on a machine that only had 1.5 terabytes of<br>
RAM. But I did tease the person involved about it, because they *really*<br>
should have known better. :)<br>
<br>
So again: What question(s) are you trying to get answers to?<br>
<br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><span style="font-size:12.8px">Regards,</span><div style="font-size:12.8px"><br><span style="font-size:12.8px;color:rgb(80,0,80)"><b>Mr. Irfanullah</b></span></div><div style="font-size:12.8px"><div style="font-size:12.8px"><br></div></div></div></div></div></div></div></div></div></div></div></div></div></div>