Category Archives: python

Python style decorators in Erlang

Alan Perlis said that “A language that doesnt change how you think about programming isnt worth knowing”.
My favourite languages are Erlang, Python and C++ (and Lua, but these days Python is taking more of that market share). I like them all for very different things, and they all have vastly different strengths,weaknesses and feature sets.
I also like thinking about how/if you can apply idioms that are common in one language to another language, and if theres any benefit in doing so. Today Im going to look at Python style decorators implemented in Erlang.

The example code is here

Decorators overview

Firstly: Python decorators are unrelated to the GoF Decorator pattern.
Decorators are syntactic sugar in python for applying a function to another.

@memoize
def some_expensive_func(n):
   return some_expensive_computation(n)

is the same as

def some_expensive_func(n):
   return some_expensive_computation(n)
some_expensive_func = memoize(some_expensive_func)

This works because Pythons functions are first class values.

Decorators are a common idiom in Python and one that I really like.
I personally think that when used appropriately they make functions easier to read, by moving small details out of the main logic.

You can read more about the origin of the decorator syntax here.
You can see some of the neat things people do with decorators here.
Decorators are also very common in one of my favourite web frameworks; Django.

Decorators in Erlang

Following the goals laid down in the PEP I decided to set the following goals:

* Allow decorators to run code before and after a call to the function they are decorating. Eg to commit db transactions, cache values, trace or time execution etc.
* Allow decorators to alter the argument list that gets passed to the function they are decorating.
* Have a simple, non intrusive syntax. (Within the confines of erlang syntax rules.)
* Support multiple decorators on a single function
* Support passing custom arguments to the decorator.

Erlang does not support reassignment of variables, so no monkey business (monkey patching) here folks.
It also doesnt support executing code at module scope. Though on_load can often achieve similar results to module scope.
This complicates the implementation a little.

Id been intending to play with parse transforms for a while, and thought they could help.

The Plan

transform something like

-decorate( {decorator_module, decorator_function} ).
foo(N) -> N.

to

foo(N) -> foo_arity0_1([N]).
foo_arity0_1(Args) ->
   F = decorator_module:decorator_function( fun foo_arity0_0/1, Args),
   F().
foo_arity0_0([N]) -> foo_original___(N).
foo_original___(N) -> N.

This would let each decorator do whatever it wanted. It could call a completely different function or charge the arguments.

Our simple first test case

As a fan of TDD I wrote a very basic first test that I could use to start development

-module(decorator_test).
-include_lib("eunit/include/eunit.hrl").
-export([replace_return_value_decorator/2]).
 
-compile([{parse_transform, decorators}]).
 
% example decorator that replaces the return value with the atom 'replaced'
replace_return_value_decorator(F,Args)->
	fun() -> 
		_R = apply(F, [Args] ),
		replaced
	end.
 
-decorate({ ?MODULE, replace_return_value_decorator }).
replace_ret_val_decorated() -> ok.
 
replace_ret_value_test()->
	?assertEqual(replaced, replace_ret_val_decorated() ).

Getting the abstract forms

I did the simplest thing possible and printed the forms that were passed to the parse transform.

parse_transform(Ast,_Options)->
	io:format("~p~n=======~n",[ Ast ]),
	Ast.

We get the following info out.
You can read a bit about the format here

[{attribute,1,file,{"test/decorator_test.erl",1}},
 {attribute,1,module,decorator_test},
 {attribute,0,export,[{test,0},{replace_ret_value_test,0}]},
 {attribute,1,file,
  {"c:/PROGRA~2/ERL57~1.5/lib/eunit-2.1.5/include/eunit.hrl",1}},
 {attribute,3,file,{"test/decorator_test.erl",3}},
 {attribute,3,export,[{replace_return_value_decorator,2}]},
 {attribute,5,compile,[]},
 {function,10,replace_return_value_decorator,2,
  [{clause,10,
    [{var,10,'F'},{var,10,'Args'}],
    [],
    [{'fun',11,
      {clauses,
       [{clause,11,[],[],
         [{match,12,
           {var,12,'_R'},
           {call,12,
            {atom,12,apply},
            [{var,12,'F'},{cons,12,{var,12,'Args'},{nil,12}}]}},
          {atom,13,replaced}]}]}}]}]},
 {attribute,16,decorate,{decorator_test,replace_return_value_decorator}},
 {function,17,replace_ret_val_decorated,0,[{clause,17,[],[],[{atom,17,ok}]}]},
 {function,19,replace_ret_value_test,0,
  [{clause,19,[],[],
    [{call,20,
      {'fun',20,
       {clauses,
        [{clause,20,
          [{var,20,'__X'}],
          [],
          [{'case',20,
            {call,20,{atom,20,replace_ret_val_decorated},[]},
            [{clause,20,[{var,20,'__X'}],[],[{atom,20,ok}]},
             {clause,20,
              [{var,20,'__V'}],
              [],
              [{call,20,
                {remote,20,
                 {record_field,20,{atom,20,''},{atom,20,erlang}},
                 {atom,20,error}},
                [{tuple,20,
                  [{atom,20,assertEqual_failed},
                   {cons,20,
                    {tuple,20,[{atom,20,module},{atom,20,decorator_test}]},
                    {cons,20,
                     {tuple,20,[{atom,20,line},{integer,20,20}]},
                     {cons,20,
                      {tuple,20,
                       [{atom,20,expression},
                        {string,20,"replace_ret_val_decorated ( )"}]},
                      {cons,20,
                       {tuple,20,[{atom,20,expected},{var,20,'__X'}]},
                       {cons,20,
                        {tuple,20,[{atom,20,value},{var,20,'__V'}]},
                        {nil,20}}}}}}]}]}]}]}]}]}},
      [{atom,20,replaced}]}]}]},
 {eof,22},
 {function,0,test,0,
  [{clause,0,[],[],
    [{call,0,
      {remote,0,{record_field,0,{atom,0,''},{atom,0,eunit}},{atom,0,test}},
      [{atom,0,decorator_test}]}]}]}]

Pretty Printing

I think the abstract forms are quite easy to follow, but it would be nicer to be able to see what we are producing as if it was real Erlang code.
Luckily erl_pp has this.
If you are writing your own parse transforms this will be very useful for debugging.

pretty_print(Ast) -> lists:flatten([erl_pp:form(N) || N<-Ast]).

for the untransformed code we see

-file("test/decorator_test.erl", 1).
-module(decorator_test).
-export([test/0,replace_ret_value_test/0]).
-file("c:/PROGRA~2/ERL57~1.5/lib/eunit-2.1.5/include/eunit.hrl", 1).
-file("test/decorator_test.erl", 3).
-export([replace_return_value_decorator/2]).
-compile([]).
replace_return_value_decorator(F, Args) ->
    fun() ->
           _R = apply(F, [Args]),
           replaced
    end.
-decorate({decorator_test,replace_return_value_decorator}).
replace_ret_val_decorated() ->
    ok.
replace_ret_value_test() ->
    fun(__X) ->
           case replace_ret_val_decorated() of
               __X ->
                   ok;
               __V ->
                   .erlang:error({assertEqual_failed,
                                  [{module,decorator_test},
                                   {line,20},
                                   {expression,
                                    "replace_ret_val_decorated ( )"},
                                   {expected,__X},
                                   {value,__V}]})
           end
    end(replaced).
 
test() ->
    .eunit:test(decorator_test).

The transform

We need to walk the forms collecting any decorators, and apply collected decorators to a function when we meet it.
For the application step I figured it would be easy to output a nested list for when 1 form (the original function) expands to many, and clean it up later.
We also need to remove the decorator attributes, as you really are only meant to have attributes before all functions.

parse_transform(Ast,_Options)->
	%io:format("~p~n=======~n",[Ast]),
	%io:format("~s~n=======~n",[pretty_print(Ast)]),
	{ExtendedAst2, RogueDecorators} = lists:mapfoldl(fun transform_node/2, [], Ast),
	Ast2 = lists:flatten(lists:filter(fun(Node)-> Node =/= nil end, ExtendedAst2)),
	%io:format("~p~n<<<<~n",[Ast2]),
	%io:format("~s~n>>>>~n",[pretty_print(Ast2)]),
	Ast2.	
 
% transforms module level nodes
% see http://www.erlang.org/doc/apps/erts/absform.html
% outputs nil (to swallow the node), a single node, or a list of nodes.
% nil nodes are removed in a subsequent pass and the lists flattened
transform_node(Node={attribute, _Line, decorate, _Decorator}, DecoratorList) ->
	% keep a list of decorators but dont emit them in the code.
	% this is important as you arent meant to have attributes after functions in a module
	{nil, [Node|DecoratorList]};
transform_node(Node={function, _Line, _FuncName, _Arity, _Clauses}, []) ->
	% pass through decoratorless functions
	{Node, []};
transform_node(Node={function, _Line, _FuncName, _Arity, _Clauses}, DecoratorList) ->
	% apply decorators to this function and reset decorator list
	{apply_decorators(Node,DecoratorList), []};
transform_node(Node, DecoratorList) ->
	% some other form. (the only other valid forms are other attributes)
	{Node, DecoratorList}.
 
apply_decorators(Node={function, Line, FuncName, Arity, _Clauses}, DecoratorList) when length(DecoratorList) > 0 ->
	[
		% output the original function renamed
		function_form_original(Node),
		% output a trampoline into our decorator chain
		function_form_trampoline(Line, FuncName, Arity, DecoratorList),
		% output the funname_arityn_0 function to unpack the arg list and forward to the original 
		function_form_unpacker(Line,FuncName,Arity)
		% output our decorator chain
		| function_forms_decorator_chain(Line, FuncName, Arity, DecoratorList)
	].

I wont go to much into the details of what each of these do, because frankly its pretty simple.
They just fill in various absform templates.. very basic code gen..
eg

function_form_decorator_chain(Line,FuncName,Arity, {DecMod, DecFun}, DecoratorIndex) ->
	NextFuncName = generated_func_name({decorator_wrapper, FuncName, Arity, DecoratorIndex-1}),
	{function, Line, 
		generated_func_name({decorator_wrapper, FuncName,Arity, DecoratorIndex}), % name
		1, % arity
		[{ clause, Line,
			emit_arguments(Line, ['ArgList'] ),
			emit_guards(Line, []),
			[
				% F = DecMod:DecFun( fun NextFun/1, ArgList),
				emit_decorated_fun(Line, 'F', {DecMod, DecFun},	NextFuncName, 'ArgList'),
				% call 'F'
				{call, Line,{var,Line,'F'},[]}
			]
		}]
	}.

Sure theres some emit_* funcs in there to make it easier to read, but anyone that even glances as the absform docs and the absform for some sample functions could piece this together in short time.

I often find myself thinking “I cant believe how nice Erlang is”. Python too.

custom warnings and errors

Oh yeah, I also wanted to warn if there was any decorators at the end of the file that werent associated with a function.
To do that I just add warning nodes to the end of the file for each element in RogueDecorators.

parse_transform(Ast,_Options)->
	{ExtendedAst2, RogueDecorators} = lists:mapfoldl(fun transform_node/2, [], Ast),
	Ast2 = lists:flatten(lists:filter(fun(Node)-> Node =/= nil end, ExtendedAst2))
		++ emit_errors_for_rogue_decorators(RogueDecorators),
	Ast2.	
 
emit_errors_for_rogue_decorators(DecoratorList)->
	[{error,{Line,erl_parse,["rogue decorator ", io_lib:format("~p",[D]) ]}} || {attribute, Line, decorate, D} <- DecoratorList].

We also need to do it when we hit the eof node. Otherwise the decorators would get applied to the function that eunit generates after the eof.
So I added the following clause to transform_node

transform_node(Node={eof,_Line}, DecoratorList) ->
	{[Node| emit_errors_for_rogue_decorators(DecoratorList) ], []};

adding

-decorate({f,f}).

to the end of the file makes it emit the following

Ender@PROSPERO /j/git/erlang_decorators
$ ./rebar compile eunit
==> erlang_decorators (compile)
==> erlang_decorators (eunit)
test/decorator_test.erl:22: rogue decorator {f,f}

the result

The code is here.
It could be developed more and cleaned up a little.

Heres the code after transformation (ie pretty_print(Ast2))

-file("test/decorator_test.erl", 1).
-module(decorator_test).
-export([test/0,replace_ret_value_test/0]).
-file("c:/PROGRA~2/ERL57~1.5/lib/eunit-2.1.5/include/eunit.hrl", 1).
-file("test/decorator_test.erl", 3).
-export([replace_return_value_decorator/2]).
-compile([]).
replace_return_value_decorator(F, Args) ->
    fun() ->
           _R = apply(F, [Args]),
           replaced
    end.
replace_ret_val_decorated_original___() ->
    ok.
replace_ret_val_decorated() ->
    replace_ret_val_decorated_arity0_1([]).
replace_ret_val_decorated_arity0_0([]) ->
    replace_ret_val_decorated_original___().
replace_ret_val_decorated_arity0_1(ArgList) ->
    F = decorator_test:replace_return_value_decorator(fun replace_ret_val_decora
ted_arity0_0/1,
                                                      ArgList),
    F().
replace_ret_value_test() ->
    fun(__X) ->
           case replace_ret_val_decorated() of
               __X ->
                   ok;
               __V ->
                   .erlang:error({assertEqual_failed,
                                  [{module,decorator_test},
                                   {line,20},
                                   {expression,
                                    "replace_ret_val_decorated ( )"},
                                   {expected,__X},
                                   {value,__V}]})
           end
    end(replaced).
 
test() ->
    .eunit:test(decorator_test).

As we can see it did the transform that we planned.

and the result

$ ./rebar compile eunit
==> erlang_decorators (compile)
Compiled src/decorators.erl
Compiled src/decorators_app.erl
Compiled src/decorators_sup.erl
==> erlang_decorators (eunit)
Compiled src/decorators.erl
Compiled src/decorators_app.erl
Compiled src/decorators_sup.erl
Compiled test/decorator_test.erl
  All 4 tests passed.

Thats enough for now.

Is it useful?

Its just syntactic sugar (The same is true in python).
If you are wondering where decorators are useful take a look at django.

But It was interesting to play with parse transforms, and it does provide an interesting syntax option.
I personally like this form of decorator, but beauty (especially in programming languages) is in the eye of the beholder.
I feel that the things that decorators *feel right* for is things that are a bit of a cross cutting concern, so its nicer to not need to pollute the code in the function.

After this Im thinking of doing a more general purpose attribute system.
I think it could be neat to eg automatically expose certain functions as RPC methods based on this.
Pretty atypical Erlang, but its interesting to see how you can apply constructs and idioms from other programming communities to achieve goals and save time.

Limitations and future extensions

I didnt end up implementing passing arguments to the decorator, but its a relatively simple extension.

Erlang is a functional language, with single assignment. Unless you use something like meck, you cannot reassign a function.
So these decorators don’t execute any code when the module is loaded, so we cant use this to eg track test coverage of certain annotated functions.
We could extend this in future to support generating on_loaded function, or to allow us to query a module for its absforms at runtime without requiring debug_info compiler option. Both of these approaches would go someway to addressing this limitation.

We are still bound by Erlang syntax rules. So eg you cant put decorations on individual clauses

bar(1) -> ok;
-decorate({x,y}).
bar(N) -> bar(N-1).

If I was serious about using this Id also like to extend it to make it easier/prettier to define custom decorators in a header, which could use a simplified syntax. And of course to support passing extra information to the decorator.
something like

-module(foo).
-include("logger_decorator.hrl").
 
-log_entry_exit( [{logger, logger_proc }] ).
foo() -> 1.

Hiding IO latency in generators by async prefetching

Tricks with generators

Hiding IO latency by async prefetching

Overview

Its very common for interesting problems to not fit into memory.
In cases like this being able to process a subset of the problem at once is necessary to solving the problem.
Streams present a nice way of processing a limited subset. They let you program as if you were working with the full list, but only work with a single element in memory at a time. Alex recently wrote about streams.
You may already be used to implementing streams in this fashion with generators (python), iterators(C++), coroutines (Lua).
Ill use the term “Generator”, because ill be using Python.

If the time to produce/fetch an element is high, you can easily find yourself alternating between working with an element, and fetching an element.
Often fetching an element is a high latency operation, so your program ends up waiting on IO a lot of the time.
There are a number of ways of addressing this issue. Some that spring to mind are

  1. For certain tasks, use a library that supports non blocking operation. eg eventlet
  2. Explicitely parallelize the processing code (actors, worker thread pools, multiprocessing.Pool)
  3. Fetch elements in the background while processing the current element.

All 3 techniques are suited to specific situations.
I will discuss the third one in this article.

Its possible to wrap any generator such that it prefetches from the source generater in the background.
This has the benefit of overlapping the fetching of elements with normal execution.
This works best when the cost of processing the item is high, and the cost of fetching is not quite as high.
This will ensure that there is always an element ready for us when we finish processing one.

However this will not overlap fetching of individual elements.
It cannot do this with generators in the general case as they are stateful.
So this technique is not as helpful when the time it takes to process an element is tiny compared to the time to fetch.
For example, Ive seen FQL queries take over 5 seconds.

The Code

 
import functools
import Queue
import threading
 
def async_prefetch_wrapper(iterable, buffer=10):
	"""
	wraps an iterater such that it produces items in the background
	uses a bounded queue to limit memory consumption
	"""
	done = object()
	def worker(q,it):
		for item in it:
			q.put(item)
		q.put(done)
	# launch a thread to fetch the items in the background
	queue = Queue.Queue(buffer)
	it = iter(iterable)
	thread = threading.Thread(target=worker, args=(queue, it))
	thread.daemon = True
	thread.start()
	# pull the items of the queue as requested
	while True:
		item = queue.get()
		if item == done:
			return
		else:
			yield item
 
def async_prefetch(func):
	"""
	decorator to make generator functions fetch items in the background
	"""
	@functools.wraps(func)
	def wrapper(*args, **kwds):
		return async_prefetch_wrapper( func(*args, **kwds) )
	return wrapper

Pretty straight forward.

Results

So does it work?

First Ill do a simple test that creates a bunch of files and md5s them
heres the code

 
 
def test_setup():
	files = []
	lines = 1000000
	for i in xrange(100):
		filename = "tempfile%d.txt"%i
		files.append(filename)
		with open(filename, "w") as f:
			f.write( ("%d\n"%i)*lines )
	return files
 
def test_cleanup():
	for f in glob("tempfile*.txt"):
		os.unlink(f)
 
 
def contents(iterable):
	for filename in iterable:
		with open(filename, "rb") as f:
			contents = f.read()
		yield contents
 
def test():
	files = test_setup()
	for c in contents(files):
		hashlib.md5(c).digest()
	test_cleanup()
 
from timeit import Timer
t = Timer("test()", "from __main__ import test; gc.enable()")
print t.repeat(5, 1)

To use the async prefetch we change the contents generator to the following:

@async_prefetch
def contents(iterable):
	for filename in iterable:
		with open(filename, "rb") as f:
			contents = f.read()
		yield contents

Here are the results

without async prefetch 11.282730626378491 5.430316841944997 3.947590615567062 4.129772568860009 4.102539568576454
with async prefetch 6.155376451476375 3.790340392424177 3.384881807604039 3.436732252283459 3.415469144821479
without async prefetch and no md5 6.315582636899382 3.342140062493976 3.197938983865267 3.102182470118409 3.2230784219782134

Not a spectacular improvement, but not bad.
The best speedup we can hope for is reducing the total runtime by N*min(fetch time, work time).
You can see from the 3rd row that the total runtime is dominated by IO still. Ie this is a case with tiny workload vs slow IO.

Lets try it by replacing md5 with zlib compression.

without async prefetch. zlib compress 12.757559200959898 12.518354357886267 16.015608687696343 12.331753337365505 12.05284226067839
with async prefetch. zlib compress 10.578236130569667 9.156545245275586 12.359309772610764 9.072958714026505 8.881738391331858

not bad.

So this technique can give you a speedup when you have producer style generators that take time to produce elements.

Links