MD4C PHP Extension
Original post is here: eklausmeier.goip.de/blog/2024/02-24-md4c-php-extension.
This blog uses MD4C to convert Markdown to HTML. So far I used PHP:FFI to link PHP with the MD4C C library. PHP:FFI, the "Foreign Function Interface" in PHP, allows calling C functions from PHP without writing a PHP extension. Using FFI is very easy.
Previous profiling measurements with XHProf and phpspy indicated that handling the return value from MD4C via FFI::string() takes some time. So I replaced FFI with a "real" PHP extension and measured again. Result: no difference between FFI and the PHP extension. So the profiling measurements were misleading.
Also, the following claim in the PHP manual is downright false:
it makes no sense to use the FFI extension for speed; however, it may make sense to use it to reduce memory consumption.
Nevertheless, writing a PHP extension was a good exercise to keep my knowledge of the PHP development ecosystem up to date. I had already written a COBOL-to-PHP and an IMS/DC-to-PHP extension.
Literature on writing PHP extensions:
- Sara Golemon: Extending and Embedding PHP, Sams Publishing, 2006, xx+410 p.
- PHP Internals: Zend extensions
- https://github.com/dstogov/php-extension
The PHP extension code is on GitHub: php-md4c.
1. Walk through the C code. For this simple extension there is no need for a separate header file.
The extension starts with basic includes for PHP, for phpinfo(), and for MD4C:
// MD4C extension for PHP: Markdown to HTML conversion
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <php.h>
#include <ext/standard/info.h>
#include <md4c-html.h>
The following code is directly from the FFI part php_md4c_toHtml.c:
struct membuffer {
char* data;
size_t asize; // allocated size = max usable size
size_t size; // current size
};
The following routines are almost the same as in the FFI case, except that memory is allocated with safe_pemalloc() instead of native malloc(). In our case this doesn't make any difference.
static void membuf_init(struct membuffer* buf, MD_SIZE new_asize) {
buf->size = 0;
buf->asize = new_asize;
if ((buf->data = safe_pemalloc(buf->asize,sizeof(char),0,1)) == NULL)
php_error_docref(NULL, E_ERROR, "php-md4c.c: membuf_init: safe_pemalloc() failed with asize=%ld.\n",(long)buf->asize);
}
The next routine uses safe_perealloc() instead of realloc().
static void membuf_grow(struct membuffer* buf, size_t new_asize) {
buf->data = safe_perealloc(buf->data, new_asize, sizeof(char), 0, 1); // new_asize elements of one byte each, persistent
if (buf->data == NULL)
php_error_docref(NULL, E_ERROR, "php-md4c.c: membuf_grow: safe_perealloc() failed, new_asize=%ld.\n",(long)new_asize);
buf->asize = new_asize;
}
The rest is identical to FFI.
static void membuf_append(struct membuffer* buf, const char* data, MD_SIZE size) {
if (buf->asize < buf->size + size)
membuf_grow(buf, buf->size + buf->size / 2 + size);
memcpy(buf->data + buf->size, data, size);
buf->size += size;
}
static void process_output(const MD_CHAR* text, MD_SIZE size, void* userdata) {
membuf_append((struct membuffer*) userdata, text, size);
}
static struct membuffer mbuf = { NULL, 0, 0 };
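The buffer logic does not depend on PHP, so it can be compiled and tested on its own. The following is a minimal plain-C sketch using malloc()/realloc(), as the FFI variant does according to the text above. Assumptions for this sketch: size_t replaces MD_SIZE, allocation failures print a message and exit, and membuf_free() is a hypothetical helper (the extension frees the buffer in its shutdown function instead). It makes the growth rule concrete: when full, the buffer grows to 1.5 times its current fill plus the incoming chunk.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the extension's struct membuffer. */
struct membuffer {
    char *data;
    size_t asize; /* allocated size = max usable size */
    size_t size;  /* current size */
};

/* Allocate new_asize bytes; print and exit on failure (sketch only). */
static void membuf_init(struct membuffer *buf, size_t new_asize) {
    buf->size = 0;
    buf->asize = new_asize;
    buf->data = malloc(new_asize);
    if (buf->data == NULL) {
        fprintf(stderr, "membuf_init: malloc(%zu) failed\n", new_asize);
        exit(EXIT_FAILURE);
    }
}

/* Resize via a temporary so the old block is not lost if realloc() fails. */
static void membuf_grow(struct membuffer *buf, size_t new_asize) {
    char *p = realloc(buf->data, new_asize);
    if (p == NULL) {
        fprintf(stderr, "membuf_grow: realloc(%zu) failed\n", new_asize);
        exit(EXIT_FAILURE);
    }
    buf->data = p;
    buf->asize = new_asize;
}

/* Same growth rule as the extension: 1.5 x current size plus the new chunk. */
static void membuf_append(struct membuffer *buf, const char *data, size_t size) {
    if (buf->asize < buf->size + size)
        membuf_grow(buf, buf->size + buf->size / 2 + size);
    memcpy(buf->data + buf->size, data, size);
    buf->size += size;
    assert(buf->size <= buf->asize);
}

/* Hypothetical helper: release the buffer and reset the bookkeeping. */
static void membuf_free(struct membuffer *buf) {
    free(buf->data);
    buf->data = NULL;
    buf->asize = buf->size = 0;
}
```

Starting from membuf_init(&b, 4), appending "<p>", "Hi" and "</p>" grows the allocation from 4 to 6 to 11 bytes. The 16 MB initial size used in the extension means that, for typical blog posts, the grow path is rarely taken.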
Now we come to something PHP-specific. The C function is wrapped in PHP_FUNCTION(). Its arguments are parsed with ZEND_PARSE_PARAMETERS_START(1, 2): the function requires at least one argument (the Markdown string) and accepts an optional second one (the flag). That is what (1, 2) means. The resulting HTML is copied from the static buffer into a new PHP string when the return value is set. In the FFI case we just return a pointer to a string.
/* {{{ string md4c_toHtml( string $markdown, [ int $flag ] )
*/
PHP_FUNCTION(md4c_toHtml) { // return HTML string
char *markdown;
size_t markdown_len;
int ret;
zend_long flag = MD_DIALECT_GITHUB | MD_FLAG_NOINDENTEDCODEBLOCKS; // Z_PARAM_LONG() expects a zend_long
ZEND_PARSE_PARAMETERS_START(1, 2)
Z_PARAM_STRING(markdown, markdown_len)
Z_PARAM_OPTIONAL Z_PARAM_LONG(flag)
ZEND_PARSE_PARAMETERS_END();
if (mbuf.asize == 0) membuf_init(&mbuf,16777216); // =16MB, allocated once and reused
mbuf.size = 0; // prepare for next call
ret = md_html(markdown, (MD_SIZE)markdown_len, process_output,
&mbuf, (unsigned)flag, 0);
if (ret < 0) {
RETVAL_STRING("<br>- - - Error in Markdown - - -<br>\n"); // length computed via strlen(), no trailing NUL in the PHP string
} else {
RETVAL_STRINGL(mbuf.data, mbuf.size); // copy exactly mbuf.size bytes into a new PHP string
}
}
/* }}}*/
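Why a copy has to be made somewhere: mbuf is static and is reset with mbuf.size = 0 on every call, so anything that still points into it is overwritten by the next conversion. The following plain-C sketch contrasts returning a pointer into the shared buffer with returning a copy. Assumptions: fake_render() is a hypothetical stand-in for md_html() (MD4C itself is not needed), and size_t replaces MD_SIZE. On the FFI side the copy is made in PHP by FFI::string(); in the extension it is made when the return value is set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct membuffer { char *data; size_t asize; size_t size; };

/* Growth rule as in the extension; abort() on allocation failure. */
static void membuf_append(struct membuffer *buf, const char *data, size_t size) {
    if (buf->asize < buf->size + size) {
        size_t new_asize = buf->size + buf->size / 2 + size;
        char *p = realloc(buf->data, new_asize);
        if (p == NULL)
            abort();
        buf->data = p;
        buf->asize = new_asize;
    }
    memcpy(buf->data + buf->size, data, size);
    buf->size += size;
}

/* Callback with the same shape as process_output(). */
static void process_output(const char *text, size_t size, void *userdata) {
    membuf_append((struct membuffer *)userdata, text, size);
}

/* Hypothetical stand-in for md_html(): emits "<p>", the text, "</p>\n". */
static void fake_render(const char *text, size_t len,
                        void (*out)(const char *, size_t, void *), void *userdata) {
    out("<p>", 3, userdata);
    out(text, len, userdata);
    out("</p>\n", 5, userdata);
}

/* One buffer shared by all calls, like the static mbuf in the extension. */
static struct membuffer mbuf = { NULL, 0, 0 };

/* Return a pointer into the shared buffer: valid only until the next call. */
static const char *render_view(const char *markdown) {
    mbuf.size = 0;                        /* reuse the buffer, as in md4c_toHtml() */
    fake_render(markdown, strlen(markdown), process_output, &mbuf);
    membuf_append(&mbuf, "", 1);          /* terminating NUL */
    return mbuf.data;
}

/* Return an independent heap copy that later calls cannot overwrite. */
static char *render_copy(const char *markdown) {
    const char *view = render_view(markdown);
    size_t len = mbuf.size - 1;           /* length without the NUL */
    assert(view[len] == '\0');
    char *copy = malloc(len + 1);
    if (copy == NULL)
        abort();
    memcpy(copy, view, len + 1);          /* caller must free() it */
    return copy;
}
```

After v = render_view("x") followed by render_view("y"), v already reads "<p>y</p>\n", while strings returned by render_copy() stay intact. Had the buffer been reallocated in between, v would even dangle.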
The following two extension-specific functions handle module initialization and shutdown. The following diagram from PHP internals shows the sequence of initialization and shutdown.
Init: Do nothing.
/* {{{ PHP_MINIT_FUNCTION
*/
PHP_MINIT_FUNCTION(md4c) { // module initialization
//REGISTER_INI_ENTRIES();
//php_printf("In PHP_MINIT_FUNCTION(md4c): module initialization\n");
return SUCCESS;
}
/* }}} */
Shutdown: free the persistent output buffer.
/* {{{ PHP_MSHUTDOWN_FUNCTION
*/
PHP_MSHUTDOWN_FUNCTION(md4c) { // module shutdown
if (mbuf.data) pefree(mbuf.data,1);
return SUCCESS;
}
/* }}} */
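The order of these hooks can be modeled in a few lines of C: MINIT runs once per process, RINIT and RSHUTDOWN run around every request, and MSHUTDOWN runs once at the end. The sketch below is a hypothetical simplification (the real hooks take arguments such as the module type and number and return SUCCESS or FAILURE); struct module_hooks, convert() and run_process() are illustrative names, not Zend API. It shows why the output buffer, allocated lazily on the first call to md4c_toHtml() and freed only in MSHUTDOWN, survives across requests.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the four lifecycle hooks; NULL means "no hook". */
struct module_hooks {
    void (*minit)(void);
    void (*rinit)(void);
    void (*rshutdown)(void);
    void (*mshutdown)(void);
};

static char trace[128];                  /* order in which things ran */
static void step(const char *s) {
    strncat(trace, s, sizeof trace - strlen(trace) - 1);
}

static char *buffer = NULL;              /* stands in for mbuf.data */
static int allocations = 0;

/* Stand-in for md4c_toHtml(): allocate the buffer lazily on first use. */
static void convert(void) {
    step("c");
    if (buffer == NULL) {
        buffer = malloc(16);
        if (buffer == NULL)
            abort();
        allocations++;
    }
}

/* Stand-in for PHP_MSHUTDOWN(md4c): free the persistent buffer. */
static void md4c_mshutdown(void) {
    step("X");
    free(buffer);
    buffer = NULL;
}

/* One process lifetime serving nrequests requests. */
static void run_process(const struct module_hooks *m, int nrequests) {
    assert(nrequests >= 0);
    trace[0] = '\0';
    if (m->minit) m->minit();
    for (int i = 0; i < nrequests; i++) {
        if (m->rinit) m->rinit();
        convert();                       /* the script calls md4c_toHtml() */
        if (m->rshutdown) m->rshutdown();
    }
    if (m->mshutdown) m->mshutdown();
}

/* Traced hooks to make the full order visible. */
static void t_minit(void)     { step("M"); }
static void t_rinit(void)     { step("R"); }
static void t_rshutdown(void) { step("r"); }
static const struct module_hooks traced_hooks = { t_minit, t_rinit, t_rshutdown, md4c_mshutdown };

/* Mirrors md4c_module_entry: only MSHUTDOWN is registered. */
static const struct module_hooks md4c_hooks = { NULL, NULL, NULL, md4c_mshutdown };
```

Running traced_hooks for two requests records "MRcrRcrX"; running md4c_hooks for three requests records "cccX". Either way the buffer is allocated once per process and released at shutdown.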
The following function prints information when called via phpinfo():
/* {{{ PHP_MINFO_FUNCTION
*/
PHP_MINFO_FUNCTION(md4c) {
php_info_print_table_start();
php_info_print_table_row(2, "MD4C", "enabled");
php_info_print_table_row(2, "PHP-MD4C version", "1.0");
php_info_print_table_row(2, "MD4C version", "0.5.2");
php_info_print_table_end();
}
/* }}} */
The output looks like this:
The following describes the argument lists.
/* {{{ arginfo
*/
ZEND_BEGIN_ARG_INFO(arginfo_md4c_test, 0)
ZEND_END_ARG_INFO()
ZEND_BEGIN_ARG_INFO_EX(arginfo_md4c_toHtml, 0, 0, 1) // return by value, one required argument; flag is optional
ZEND_ARG_INFO(0, str)
ZEND_ARG_INFO_WITH_DEFAULT_VALUE(0, flag, "MD_DIALECT_GITHUB | MD_FLAG_NOINDENTEDCODEBLOCKS")
ZEND_END_ARG_INFO()
/* }}} */
/* {{{ php_md4c_functions[]
*/
static const zend_function_entry php_md4c_functions[] = {
PHP_FE(md4c_toHtml, arginfo_md4c_toHtml)
PHP_FE_END
};
/* }}} */
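The table php_md4c_functions[] is an array terminated by PHP_FE_END, and PHP walks it until that end marker to register each function under its name. A hypothetical, stripped-down version of the pattern follows; struct function_entry and find_function() are illustrative, not Zend's actual types, and unlike this sketch PHP matches function names case-insensitively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*handler_fn)(const char *arg);

/* Simplified counterpart of zend_function_entry. */
struct function_entry {
    const char *name;
    handler_fn handler;
    unsigned required_args;
};

/* Placeholder handler; a real one would convert Markdown to HTML. */
static const char *to_html_stub(const char *arg) { return arg; }

static const struct function_entry functions[] = {
    { "md4c_toHtml", to_html_stub, 1 },  /* one mandatory argument: $markdown */
    { NULL, NULL, 0 }                    /* end marker, like PHP_FE_END */
};

/* Walk the table until the end marker; return the matching entry or NULL. */
static const struct function_entry *find_function(const struct function_entry *table,
                                                  const char *name) {
    assert(table != NULL && name != NULL);
    for (const struct function_entry *e = table; e->name != NULL; e++)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}
```

The all-zero end marker is what lets the table be declared as a plain array without storing its length separately.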
The zend_module_entry is fairly standard. Everything defined above is wired together here.
/* {{{ md4c_module_entry
*/
zend_module_entry md4c_module_entry = {
STANDARD_MODULE_HEADER,
"md4c", // Extension name
php_md4c_functions, // zend_function_entry
NULL, //PHP_MINIT(md4c), // PHP_MINIT - Module initialization
PHP_MSHUTDOWN(md4c), // PHP_MSHUTDOWN - Module shutdown
NULL, // PHP_RINIT - Request initialization
NULL, // PHP_RSHUTDOWN - Request shutdown
PHP_MINFO(md4c), // PHP_MINFO - Module info
"1.0", // Version
STANDARD_MODULE_PROPERTIES
};
/* }}} */
This innocent-looking code at the end of the file is important: without it you will get "PHP Startup: Unable to load dynamic library".
#ifdef COMPILE_DL_MD4C // defined in config.h when md4c is built as a shared extension
# ifdef ZTS
ZEND_TSRMLS_CACHE_DEFINE()
# endif
#endif
ZEND_GET_MODULE(md4c)
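ZEND_GET_MODULE(md4c) defines the exported function get_module(), which PHP looks up by name after loading the shared object (it also appears as T get_module in the nm listing in step 3). The lookup can be reproduced with dlopen()/dlsym(). This is a sketch under assumptions: has_get_module() is a hypothetical helper, and because the extension references PHP engine functions such as _emalloc, loading it from a plain C program relies on lazy binding and may still fail on some platforms.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Returns 1 if the shared object exports get_module(), 0 if it does not,
   and -1 if the file cannot be loaded at all. */
static int has_get_module(const char *path) {
    /* RTLD_LAZY: do not resolve PHP engine functions until they are called. */
    void *handle = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    dlerror();                            /* clear any stale error state */
    void *sym = dlsym(handle, "get_module");
    int found = (sym != NULL);
    dlclose(handle);
    return found;
}
```

After make, has_get_module("modules/md4c.so") should return 1 if the macro is present. On older glibc versions the program may need to be linked with -ldl.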
2. M4 config file. The PHP extension requires a config.m4 file.
dnl config.m4 for php-md4c extension
PHP_ARG_WITH(md4c, [whether to enable MD4C support],
[ --with-md4c[[=DIR]] Enable MD4C support.
DIR is the path to MD4C install prefix])
if test "$PHP_MD4C" != "no"; then
AC_MSG_CHECKING([for md4c headers])
for i in "$PHP_MD4C" "$prefix" /usr /usr/local; do
if test -r "$i/include/md4c-html.h"; then
PHP_MD4C_DIR=$i
AC_MSG_RESULT([found in $i])
break
fi
done
if test -z "$PHP_MD4C_DIR"; then
AC_MSG_RESULT([not found])
AC_MSG_ERROR([Please install md4c])
fi
PHP_ADD_INCLUDE($PHP_MD4C_DIR/include)
dnl recommended flags for compilation with gcc
dnl CFLAGS="$CFLAGS -Wall -fno-strict-aliasing"
export OLD_CPPFLAGS="$CPPFLAGS"
export CPPFLAGS="$CPPFLAGS $INCLUDES -DHAVE_MD4C"
AC_CHECK_HEADERS([md4c.h md4c-html.h], [], [AC_MSG_ERROR([md4c headers not found])])
dnl AC_CHECK_HEADER([md4c-html.h], [], [AC_MSG_ERROR(['md4c-html.h' header not found])])
PHP_SUBST(MD4C_SHARED_LIBADD)
PHP_ADD_LIBRARY_WITH_PATH(md4c, $PHP_MD4C_DIR/$PHP_LIBDIR, MD4C_SHARED_LIBADD)
PHP_ADD_LIBRARY_WITH_PATH(md4c-html, $PHP_MD4C_DIR/$PHP_LIBDIR, MD4C_SHARED_LIBADD)
export CPPFLAGS="$OLD_CPPFLAGS"
PHP_SUBST(MD4C_SHARED_LIBADD)
AC_DEFINE(HAVE_MD4C, 1, [ ])
PHP_NEW_EXTENSION(md4c, md4c.c, $ext_shared)
fi
3. Compiling. Run
phpize
./configure
make
The resulting library contains the following symbols:
$ nm md4c.so
0000000000002160 r arginfo_md4c_test
0000000000003d00 d arginfo_md4c_toHtml
w __cxa_finalize@GLIBC_2.2.5
00000000000040a0 d __dso_handle
0000000000003dc0 d _DYNAMIC
U _emalloc
U _emalloc_64
U _estrndup
00000000000016c8 t _fini
U free@GLIBC_2.2.5
00000000000016c0 T get_module
0000000000003fe8 d _GLOBAL_OFFSET_TABLE_
w __gmon_start__
00000000000021c8 r __GNU_EH_FRAME_HDR
0000000000001000 t _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
0000000000004180 b mbuf
00000000000040c0 D md4c_module_entry
U md_html
U memcpy@GLIBC_2.14
U php_error_docref
U php_info_print_table_end
U php_info_print_table_row
U php_info_print_table_start
0000000000003d60 d php_md4c_functions
U php_printf
0000000000001640 t process_output
0000000000001234 t process_output.cold
U _safe_malloc
U _safe_realloc
U __stack_chk_fail@GLIBC_2.4
U strlen@GLIBC_2.2.5
0000000000004168 d __TMC_END__
U zend_parse_arg_long_slow
U zend_parse_arg_str_slow
U zend_wrong_parameter_error
U zend_wrong_parameters_count_error
U zend_wrong_parameters_none_error
. . .
0000000000001380 T zif_md4c_toHtml
00000000000011cf t zif_md4c_toHtml.cold
0000000000001175 T zm_info_md4c
0000000000001350 T zm_shutdown_md4c
00000000000016b0 T zm_startup_md4c
4. Installing on Arch Linux. Copy the md4c.so library to /usr/lib/php/modules as root:
cp modules/md4c.so /usr/lib/php/modules
Finally, activate the extension in php.ini:
extension=md4c
5. Notes on Windows. On Linux we use the installed MD4C library. As noted in Installing Simplified Saaze on Windows 10 #2, it is advisable to amalgamate all MD4C source files into a single file for easier compilation.