Etienne Kneuss

Insane

I'm currently working on a Scala compiler plugin that performs interprocedural, compositional effect and pointer analysis on arbitrary Scala programs!

Feel free to check out Insane on GitHub!

Phantm

In short, Phantm is a type checker for PHP applications, but it also does much more!

PHP

I have been sporadically contributing to the core of PHP for quite a while now; see my patch repository.

This stupid website

Mon, 2 April 2012

I completely rewrote the system that renders this website. For fun I implemented the following features that strive for simplicity:

  • File based, no database, tracked by git
  • Online interface to edit
  • Renders markdown content
  • Post-update hook so that I can edit my website locally and publish on push

The system is also publicly available on GitHub!

PHANTM Continued

Wed, 18 August 2010

Seven months ago, I mentioned PHANTM, a tool that statically analyzes PHP code in order to detect and report type mismatches. I've actively continued working on it since then, mainly as a research project for EPFL but also as a fun way to occupy my free time.

Why check types? This is PHP after all!

As you may have read recently on the internals mailing list, types are basically being accused of treason against the spirit of PHP. Erm. Some of the original minds behind PHP argued that strictly checking for types wouldn't have its place in the PHP-boat, or even that it would sink it.

PHANTM, statically checking your types since 2010

So, what's the point of checking types? Contrary to what people might want to believe, PHP is used for more than manipulating strings out of files, databases, and forms. In a real application, it is in fact rare to see a case where an implicit type conversion is actually wanted (concatenation or string interpolation aside). Even though PHP will handle them gracefully in most cases, such a conversion is usually an indication of either bad input handling or simple laziness. Arguably, code without implicit type conversions is easier to understand and leaves less room for unexpected behavior and bugs.
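
To make the distinction concrete, here is a minimal sketch of an implicit conversion next to its explicit counterpart. The `$input` array and its keys are hypothetical stand-ins for form data, not something taken from PHANTM itself:

```php
<?php
// Hypothetical form input: values coming from a form arrive as strings.
$input = ['quantity' => '3', 'price' => '4.50'];

// Implicit conversion: PHP silently turns both strings into numbers.
$implicitTotal = $input['quantity'] * $input['price'];

// Explicit conversion: the intended types are visible where the values enter.
$quantity = (int) $input['quantity'];
$price = (float) $input['price'];
$explicitTotal = $quantity * $price;

// Both computations agree here; only the second one states its intent.
var_dump($implicitTotal === $explicitTotal);
```

Both versions compute the same total on well-formed input. The difference is that the second one makes the expected types obvious to a reader, and to a tool.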

That aside, how can PHANTM help you write better code? PHANTM will statically check that you manipulate values correctly in your PHP code. Statically means that it does so without actually running any of your code, simply by looking at the source. This has the advantage of checking all code paths without you having to write a test for each, but it comes at a cost: the analysis can only deal with an abstract representation of each program point. Yes, you will get false positives, that is, reports about perfectly valid and safe operations.

Features

Here is a non-exhaustive list of what PHANTM supports:

  • PHP 5.3 grammar
  • Most extensions, functions, and classes shipped with PHP, as well as selected PECL extensions
  • Flow-sensitive type analysis
  • Inter-procedural analysis through selective inlining
  • Pure statements check
  • Runtime instrumentation
  • phpDoc annotations usage/check
  • Includes resolution
  • ST/AST/CFG generation in dot format
  • Function call graph generation
  • External annotations
  • API generation
  • ...

Check it out

The official website of PHANTM is http://lara.epfl.ch/dokuwiki/phantm, where you can download releases, read documentation, and find more details on the features PHANTM provides. You can also find it on GitHub at http://github.com/colder/phantm.

Interested in seeing some of its features in action? Check out the online demo here: http://project.colder.ch/demo/

Comments? Contributions? Problems? I'd be happy to hear about them!

Dataflow Type Analysis for PHP

Wed, 13 January 2010

I've spent some time lately working on a project involving data-flow analysis for PHP.

This analyzer basically models your code as control flow graphs, in which it assigns types and lets them flow through control structures. When reaching stability, it checks that the operations performed on the values are sound type-wise. It also does some structural checks.
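
As a rough illustration of what "letting types flow until stability" means, here is a toy worklist iteration over a hand-written control flow graph. This is a sketch of the general technique, not the analyzer's actual implementation: the graph encoding, the `analyze()` function, and the type names are all made up for the example, and it assumes PHP 7 or later.

```php
<?php
// Toy control flow graph for:
//   $x = 1;                 (node 0)
//   if (...) { $x = "a"; }  (node 1)
//   echo $x + 1;            (node 2)
// Each node lists the variable it assigns (with its type) and its successors.
$cfg = [
    0 => ['assign' => ['x' => 'int'],    'succ' => [1, 2]],
    1 => ['assign' => ['x' => 'string'], 'succ' => [2]],
    2 => ['assign' => [],                'succ' => []],
];

// Returns, for each node, the set of types each variable may have on entry.
function analyze(array $cfg)
{
    $in = array_fill_keys(array_keys($cfg), []);
    $worklist = array_keys($cfg);
    while ($worklist) {
        $n = array_shift($worklist);
        // Transfer: an assignment overwrites the incoming type set.
        $out = $in[$n];
        foreach ($cfg[$n]['assign'] as $var => $type) {
            $out[$var] = [$type => true];
        }
        // Join: merge the outgoing types into each successor's entry state.
        foreach ($cfg[$n]['succ'] as $s) {
            $merged = $in[$s];
            foreach ($out as $var => $types) {
                $merged[$var] = ($merged[$var] ?? []) + $types;
            }
            // Revisit the successor only if its entry state changed.
            if ($merged != $in[$s]) {
                $in[$s] = $merged;
                $worklist[] = $s;
            }
        }
    }
    return $in;
}

$types = analyze($cfg);
// At node 2, $x may be an int or a string, so "$x + 1" deserves a report.
print_r(array_keys($types[2]['x']));
```

The loop stops once no entry state changes anymore, which is the "stability" mentioned above. A real analysis would then check each operation against the types computed for its operands.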

You can find a presentation I gave recently about it: presentation12.01.10.pdf. This project is at an early stage, a prototype, but it already gives some results!

You will find the github project page here: http://github.com/colder/phantm

ps: I just missed the 1 year "no-news anniversary", damn!

There has been some discussion recently about the different ways to uniquely identify objects in PHP. In fact, a function has been sitting in SPL unnoticed for quite some time, and while people came across it, some got frustrated. I'm of course talking about spl_object_hash(). To summarize: in PHP, you basically need two things to safely identify an object: an object index (the handle) and the class handlers, which define how the object reacts internally. This set of handlers is actually a pointer, and since disclosing valid pointers is not something that should be done, spl_object_hash() simply provides an MD5 hash of those two values concatenated. Two problems come from this MD5 hash:

  • It's quite slow
  • It may generate collisions

One usage of this hash that comes to mind is an object dictionary (or map) that attaches information to instances, for example:

$dict = array();

$dict[spl_object_hash($obj1)] = $info1;
$dict[spl_object_hash($obj2)] = $info2;
// and so on.

Sadly, since PHP arrays are themselves hash tables, the hash gets hashed one more time, which is a waste of time.

Another example could be to mark nodes in a graph traversal algorithm, using a set of visited nodes.

SPL thankfully provides a class (as of PHP 5.3) that can quite easily implement both examples without the problems stated above: SplObjectStorage. Since an example is worth a thousand words, here is a demonstration:

// Map
$dict = new SplObjectStorage;

$dict[$obj1] = $info1;
$dict[$obj2] = $info2;

var_dump($dict[$obj1]); // $info1

// Set
$set = new SplObjectStorage;
$set->attach($obj1);

var_dump($set->contains($obj1)); // True

SplObjectStorage directly uses the unique identifier without pre-hashing it, so you spare time and you will be safe against collisions!
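
The graph traversal case mentioned earlier can be sketched the same way. The `Node` class and the `reachable()` function below are hypothetical and only serve the illustration; the visited set uses the array-style access shown in the map example:

```php
<?php
// Hypothetical node class, used only for this illustration.
class Node
{
    public $name;
    public $neighbors = [];

    public function __construct($name)
    {
        $this->name = $name;
    }
}

// Depth-first traversal that returns the name of every node reachable from $start.
function reachable(Node $start)
{
    $visited = new SplObjectStorage; // set of nodes already seen
    $stack = [$start];
    $names = [];
    while ($stack) {
        $node = array_pop($stack);
        if (isset($visited[$node])) {
            continue; // already handled, this is what breaks cycles
        }
        $visited[$node] = true;
        $names[] = $node->name;
        foreach ($node->neighbors as $next) {
            $stack[] = $next;
        }
    }
    return $names;
}

$a = new Node('a');
$b = new Node('b');
$c = new Node('c');
$a->neighbors = [$b, $c];
$b->neighbors = [$a, $c]; // cycle back to $a
$c->neighbors = [$a];

$names = reachable($a);
var_dump(count($names)); // each node is visited once despite the cycles
```

No hash strings are built along the way: the objects themselves are the keys.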

Antony got the idea to implement a C-like array wrapper in SPL: SplFixedArray. The main advantage of that class is performance: it is indeed faster than PHP arrays. How so? No free lunch: the speedup comes from the fact that non-numeric indexes are not allowed and that the array has a fixed size (which means no hashing and contiguous memory storage). That said, here is a quick example:

$a = new SplFixedArray(4); // takes the initial size as first argument

$a[0] = "foo";
$a[2] = "gee";
$a[3] = "plop";

$a->setSize(3);
$a->setSize(5); // increase the size

foreach($a as $k=>$v) {
    var_dump($v);
}
/* Output:
 * string(3) "foo"
 * NULL
 * string(3) "gee"
 * NULL
 * NULL
 */
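
The fixed size is worth a second small sketch: unlike a PHP array, an SplFixedArray does not grow on its own when you write past its end. The variable names here are made up for the example:

```php
<?php
$a = new SplFixedArray(2);
$a[0] = 'foo';
$a[1] = 'bar';

try {
    $a[2] = 'baz'; // index 2 is outside the fixed size of 2
    $grew = true;
} catch (RuntimeException $e) {
    $grew = false; // the write is rejected instead of growing the array
}

// Growing has to be requested explicitly.
$a->setSize(3);
$a[2] = 'baz';
$size = $a->getSize();

var_dump($grew, $size);
```

The out-of-range write raises a RuntimeException, so resizing is always an explicit setSize() call.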

The speedup seems to vary between environments, but so far multiple benchmarks have shown SplFixedArray to be 10-30% faster than standard PHP arrays.