Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds a stub for realloc #1035

Merged
merged 11 commits into from
Feb 26, 2020

Conversation

gitoleg
Copy link
Contributor

@gitoleg gitoleg commented Jan 22, 2020

We add a stub for realloc - surprisingly many allocations are done by passing a null pointer to realloc, and an analysis that is based on a tracing of allocation/freeing memory won't notice such call.

We also dropped unresolved_call linking from the promiscuous mode as it was interfering with our stack handling routines (it was preventing them from being triggered).

Also, we add a new behavior of the free call processing in memcheck-malloc: now, if the pointer being freed in unknown, i.e. there wasn't any memory allocation for it in the current primus machine, then we allocate one-byte length region and then release it. The justification is simple: by doing so, we are able to find more bugs like use-after-free or double-free, e.g. in some library, where any path we take won't necessarily contain the chain of calls like malloc->free->memory-access/free because such path exists only in the user code but not in the library itself.
ivg: however so far this feature produced only false positives, probably due to some unwanted aliasing between the pointer and pointee. Will keep in master for some time and see if it really a good idea.

We add a stub for realloc - surprisingly many allocations are done by
passing a null pointer to realloc, and an analysis that is based on a
tracing of allocation/freeing memory won't notice such call.

Also, this PR fixes a minor bug in the method name in: there is no
signal enter-jmp, so we can't depend on it.

Also, we add a new behavior of the `free` call processing in
memcheck-malloc:  now, if the pointer being freed in unknown, i.e. there
wasn't any memory allocation for it in the current primus machine, then
we allocate one-byte length region and then release it. The
justification is simple: by doing so, we are able to find more bugs like
use-after-free or double-free, e.g. in some library, where any path we
take won't necessarily contain the chain of calls like
`malloc->free->memory-access/free` because such path exists only in the
user code but not in the library itself.
Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Program semantics shall be defined by the state of the program and program itself. No magic dictionaries, please.

plugins/primus_lisp/lisp/simple-memory-allocator.lisp Outdated Show resolved Hide resolved
Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the inline comments, please add documentation in the simple-memory-allocator.lisp of the chunk format.


(defun word-size () (/ (word-width) 8))

(defun encode-memory-length(ptr len)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lisp style issue, please add a space between the function name and the list of arguments.

(memory-allocate ptr len *malloc-initial-value*)
(memory-allocate ptr len)))

(defun word-size () (/ (word-width) 8))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding this function to the public interface will introduce confusion, between word-size and word-width so, please either remove it or mangle correspondingly.

(defun decode-memory-length(ptr)
(read-word ptr_t ptr))

(defun decode-memory-length'(ptr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt that we need these two functions (again in the public interface).

  1. mangle the function name, e.g., malloc/get-chunk-length
  2. collapse them into one function.


(defun malloc (n)
"allocates a memory region of size N"
(declare (external "malloc"))
(if (= n 0) *malloc-zero-sentinel*
(if (malloc-will-reach-limit n) 0
(let ((n (+ n (* 2 *malloc-guard-edges*)))
(let ((width (word-size))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a) we always use the word width for the size in bits. And in general, unless specified otherwise, we prefer bits to bytes. In that case, you maximize the astonishment when you first define a function word-size that translates word width into size in bytes and the bind it to a variable that is named width. This code will break on the next update...

b) please choose variable names that bear some semantics, instead of using generic width, array, etc. In this particular case, I suggest header-size, e.g.,

(let ((header-size (word-width / 8)) ...)


(defun malloc (n)
"allocates a memory region of size N"
(declare (external "malloc"))
(if (= n 0) *malloc-zero-sentinel*
(if (malloc-will-reach-limit n) 0
(let ((n (+ n (* 2 *malloc-guard-edges*)))
(let ((width (word-size))
(full (+ n (* 2 *malloc-guard-edges*) width))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename full to chunk-size

(set brk (+ brk full))
(malloc/fill-edges ptr full)
(set ptr (+ ptr *malloc-guard-edges*))
(encode-memory-length ptr n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename encode/decode to get/put or load/store or read/write. You choice.

(encode-memory-length ptr n)
(+ ptr width))))))

(defun realloc (ptr new)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, rewrite this function to conform with POSIX. In particular,

  1. when ptr is NULL it shall behave as malloc
  2. when len is NULL it shall behave as free
  3. when len is unchanged nothing should be done
  4. when len is smaller the pointer shall not change, but see a not below.
  5. only when len is greater than the old-len we may return a new pointer (and we don't have other choices with our current allocator).
  6. if a new chunk is allocated by realloc the old chunk, pointed by ptr shall be deallocated.
    The following code (untested) implements this semantics. Note, how (or (and ..) (and ..) ...) is used in place where you will be using pattern matching in OCaml. This is much easier to read than a bunch of nested if/then/elses.
(defun realloc (ptr len)
  (or (and (not ptr) (malloc ptr len))
      (and (not len) (free ptr))
      (let ((old-len (malloc/get-chunk-size ptr)))
        (if (<= old-len len)
            (prog (malloc/put-chunk-size ptr len) ptr)
          (free ptr)
          (let (new-ptr (malloc ptr len))
            (if new-ptr (memcpy new-ptr ptr old-len)
              new-ptr))))))

Note, if realloc is used to reduce the size of a chunk (e.g., for shrinking a dynamic vector), we shall return the same pointer, but mark the region ptr+len,ptr+old-len as invalid. Therefore, we need to update the region-analysis (where malloc is hooked into it, and track calls to realloc which shrink the regions and try to mark them as freed). If it would be too hard, we can ignore it right now and think about it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This realloc is also buggy, by the way, the trick (or (and <cond> <body>) ...) doesn't work here, since <body> in our case could also evaluate to false so the control flow will fall down, i.e., both malloc and free could return false, in fact free always returns false, so we will fall after free to realloc. It won't break a lot, i.e., we will now hit the (<= old-len len) guard, and just write 0 to the length field. But still it is more a bug than a feature.

@@ -3,6 +3,9 @@
(defmethod call (name ptr)
(when (and ptr (= name 'free)
(not (= ptr *malloc-zero-sentinel*)))
(when (and ptr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ptr is already non-NULL here, see the guard above. Also for when (not cond) there is unless until cond. Ooops, it should be, of course, unless, not until and not even unit (as it is documented) :) Could you please add unless in addition to (the now deprecated) until to the init.lisp? And, please, update the docs correspondingly, and mark until as deprecated.

So the code will be now much more readable:

(defmethod call (name ptr)
  (when (and ptr (= name 'free)
             (not (= ptr *malloc-zero-sentinel*)))
    (unless (memcheck-is-allocated 'malloc ptr)
      (memcheck-acquire 'malloc ptr 1))
    (memcheck-release 'malloc ptr)))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I was really confused yesterday :) unless is not until. unless is a dual for when, where until is dual while. So here are the definitions that we should have,

(defmacro when (cnd body)
  "(when CND BODY) if CND evaluates to true, then BODY is evaluated and
   the value of the last expression in BODY becomes the value of the
   whole expression. Otherwise, if CND evaluates to false, nil is returned."
  (if cnd (prog body) ()))

(defmacro unless (cnd body)
  "(unless CND BODY) if CND evaluates to false, then BODY is evaluated and
   the value of the last expression in BODY becomes the value of the
   whole expression. Otherwise, if CND evaluates to true, nil is returned."
  (if (not cnd) () body))

(defmacro until (c b)
  "(until COND BODY) if COND evaluates to true, then the whole expression
   evaluates to nil and BODY is not evaluated. Otherwise, if COND evaluates
   to false, then BODY is evaluated until COND evaluates to true and the value
   of the last evaluation of BODY becomes the value of the whole expression."
  (while (not c) b))

@@ -24,6 +24,12 @@
(require incident)

;; Public Interface

(defun memcheck-is-allocated (heap ptr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would name it memcheck-is/was-allocated or even better, memcheck-is-tracked to prevent future confusion.

@gitoleg gitoleg requested a review from ivg January 31, 2020 21:33
(+ ptr header-size))))))


(defun realloc/update-chunk-size (ptr len)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename it to realloc/shrink-chunk

Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I want #1047 to be merged before this PR, so that the new lisp codes could be type-checked. There is also quite a few intersections, so it will become clear after #1047 is merged.

@gitoleg gitoleg force-pushed the few-primus-lisp-updates branch 2 times, most recently from 0ab0b9b to c7a82d5 Compare February 12, 2020 18:28
@gitoleg gitoleg force-pushed the few-primus-lisp-updates branch from c7a82d5 to d8080fe Compare February 12, 2020 18:30
@ivg ivg changed the title few primus-lisp updates adds a stub for realloc Feb 26, 2020
@ivg ivg merged commit cd024cf into BinaryAnalysisPlatform:master Feb 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants