This is a summary
The soft deadline has passed and the hard deadline is not far away. Soon, what is likely to be my last GSoC will be over. And it was great! These three years I’ve had the summer job of my dreams. I worked on projects I was passionate about, using tools I liked and with people I liked.
I’ve learned a lot about many aspects of software development. Before GSoC, I had never worked with any version control system. I had never made a software release. I had never written any reasonably large program. I had never had to work with existing code, and make compromises for backwards compatibility or stability. I had never contributed patches, or otherwise coordinated with others.
I made many friends.
Missing a beat
For this project, I ended up having less time than I’d thought I would initially. This was partly my fault, and partly the fault of “the elements”, like GSoC midway being late, UK uni ending too late and me not having any other source of income. The end result was that for the past few weeks, I had another job for 2 days/week, with the rest left for GSoC.
I chose to use an existing Python parser (the ‘ast’ module). That was a great time-saver. I managed to write a very basic compiler in a day.
I chose Winxed for the bits I couldn’t write in Python. Winxed is by far the best low-level Parrot language. Even outside the context of Parrot, it’s a decent language.
I chose to focus on the object system, partly because it’s what Pynie’s missing, partly because it was more challenging and interesting. I wrote a guest object system in Winxed based on Parrot’s Object/Class that behaves just like CPython’s. There are a few differences (mainly that my object system is more flexible), but they wouldn’t affect Python’s semantics. It is also, obviously, very incomplete. For example, while I have full support for classes and metaclasses, you can’t subtract numbers (you can only add them).
I never had the delusion that I would be able to implement a significant part of Python. All throughout the project, I worked on a prototype, a subset, an incomplete implementation. Having realistic expectations helped.
Parrot works beautifully on ARM(el). I’ve been developing on an Efika MX Netbook for the past few weeks, and I’ve yet to encounter any ARM-specific issues.
I only used a ready-made parser. I should have forked an entire compiler even if not quite pure-python. PyPy’s compiler would’ve been a good choice, since RPython is a strict subset of Python.
I should have targeted Python 2. There would’ve been more compilers to choose from (in particular PyPy’s). Switching to Python 3 afterwards wouldn’t have been hard, and I wouldn’t bother implementing old-style classes anyway.
Perhaps I should have used 6model. From what I’ve seen of it so far, it’d be much more suitable. It’s much, much more extensible than Object/Class. The reason I didn’t use it was the lack of documentation, so perhaps I should have pestered Jonathan more about that.
Perhaps I didn’t focus enough on the compiler. I still don’t think it’s particularly important to write it now (since I consider the object system much more important), but it would most certainly have been a more impressive demo than a bunch of unit tests.
Parrot is a little slow. I know I’m the last person to worry about speed, but developing on an 800MHz ARM machine makes one wish for more. A JIT would be welcome, but almost any optimisations done before that are entirely irrelevant.
Some of this I’ve said before. Parrot’s Object/Class is pretty bad, and the duality with PMCs isn’t helpful. IMCC is terrible. It’s the main reason I gave up on adding features to the compiler. I still don’t know how I could fully control the namespacing mechanism without explicit namespaces/hashes, or the exception system without putting everything in one huge try.
This I’ve also said before: I didn’t work on the project as much as I would’ve wanted. In particular, the project timeline was backloaded: the second half implied much more work than the first half. I hadn’t realised this until quite late.
Parrot has no native, pervasive bool type. This is very annoying and can only promote ugly hacks, like what Winxed and Rosella.Test do, or like I did in my get_integer override. This particular issue also introduced at least one ugly, hard to find, bug in my code.
In general, I’m disillusioned with Parrot. I thought it’d be better, but in my opinion the deprecation policy has really held it back.
I didn’t blog enough. I really didn’t, and it was entirely my fault. Sorry.
I hope to keep working on puffin in my spare time. It was fun and interesting and challenging. I would like to get a better compiler and complete the object system. I think it would pay to switch to 6model, especially if NQP becomes optional at some point.
I would like to work on Parrot in the future. M0 and 6model seem well designed. I’d like to try writing an M0 interpreter in Python with PyPy, so I could get a JIT for free.
Thank you whiteknight, NotFound, allison, benabik, jnthn, dukeleto, cotto and anyone else I’m forgetting. You were a real help and I’m happy to have met you.
Did you really think I’d forget to post the quota of puffin pics? Here’s a lovely bunch of puffins!
I’ve been doing a lot less work than I had planned to. It’s mostly my own fault, but such is life.
Basically, I ran out of money. Uni ended late, GSoC started early and the GSoC midterm was late this year. I had to borrow some and get a temporary full-time job for a couple of weeks. Now I have a part-time job (2 days a week), and I have the rest of the time free for GSoC work.
But not all is lost! Here’s some puffins to make us all feel better:
This rant is long overdue.
I may have said this before, but Parrot’s Object/Class is unsuited to implementing a wide range of modern dynamic languages. It makes a ton of assumptions and manages to be worse for this purpose than Java’s object system in most ways. It’s just bad. The PMC/Object duality isn’t helping either, although it is often the only way to implement particular behaviour. I’m still having some trouble with method calls and MRO.
6model is better. It’s an actual MOP for a change. I’m not clear on the details, and the current dependency on NQP is not entirely to my liking, but it’s much cleaner and appears to support the kinds of behaviour I need “natively”. I’m happy that 6model (or similar) will move into parrot core at some point, but it’s not an option for me right now.
PIR kinda sucks. It tries to be both a compilation target and reasonable to write by hand, and achieves neither very well. Some of its features help make one’s compiler simpler (subs), but in general, it makes assumptions that I have to work around (.lexical or w/e it’s called is plain wrong for Python). But you knew that.
IMCC is terrible: “syntax error … somewhere”. Really? I thought it was an easter egg when I first saw it. It isn’t even capable of telling you where syntax errors are. Or it is if you indent your PIR just so. It must die, fast. But you knew that too.
NameSpaces and their associated instructions (get_*) are at the very least confusing. I can’t figure out how to explicitly put things in the local/global namespaces at specific points in my program, and nowhere else. Either I’m missing something, or NameSpaces aren’t right and I’ll have to keep using plain hashes.
The tools aren’t there yet. Rosella.Test is very useful, but it’s not trivially integrated in a setup script. I think that’s still not quite working in mine. Distutils itself isn’t great. So much so, that I still have a setup.py script that compiles winxed and runs tests. A setup script should be small, plain and boring, just a bit of boilerplate, and that’s simply not possible with parrot’s Distutils at the moment. Perhaps I’ve been spoilt by more mature languages/ecosystems, but these issues have been significant obstacles for me.
It’s not the end of the world, though. I wrote my own meta-object, improved the generated PIR and used hashes for scoping. I almost finished fixing the build/test system.
Next, I’ll start implementing some of the harder bits in the compiler & object system. I’d rather get those out of the way first and focus on completeness later. For example, right now you can add numbers, but you can’t subtract them. It would be trivial to implement, but it would take some time to implement all similar tiny little things.
Back to work now.
Double puffin this week! And that’s to (partly) make up for my short 3 day vacation to Scotland, starting tomorrow.
I’ve been working on the compiler, getting it to cooperate and use the object system. Getting it to use my objects wasn’t too hard, since I designed the objects for that use in the first place, but many other issues surfaced.
First, my object system isn’t just incomplete, but also buggy. I found a few minor bugs and a couple of not so minor bugs, but nothing huge. It appears that overall, it’s sound.
Second, PIR is not very nice. Sometimes it’s too low-level, often it’s too high-level. Almost always it’s surprising, largely because it does in fact have semantics of its own that can’t be (easily) overridden. And I’ve yet to complain about IMCC, which is just dreadful.
Third, it’s not entirely clear what namespaces/scopes/frames should be implemented as in parrot. I can’t use PIR’s .local/.lex since the semantics are wrong for Python. I can’t use the NameSpace PMC because it doesn’t really work. What I’m doing now is using Python dicts to implement frames, just like CPython. This means a new dict (basically a Hash boxed in a Python.instance) gets created for every module import, function/class declaration, function call, class instantiation, etc. Also, every time any name (symbol) is looked up, at least one Hash lookup is performed.
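Here’s a rough sketch of the dicts-as-frames idea, written in plain Python rather than Winxed. The names Frame and lookup are made up for illustration and this is not puffin’s actual code. It also ignores closures and the fast-locals trick CPython uses inside functions.

```python
import builtins


class Frame:
    """Hypothetical frame: one dict per scope, as described above."""

    def __init__(self, f_globals, f_back=None):
        self.f_locals = {}          # created afresh for every call
        self.f_globals = f_globals  # the module's dict
        self.f_back = f_back        # the caller's frame, if any

    def lookup(self, name):
        # Every name lookup costs at least one dict (Hash) lookup.
        for scope in (self.f_locals, self.f_globals, vars(builtins)):
            if name in scope:
                return scope[name]
        raise NameError(name)


# One frame for the module, one more for a function call inside it.
module_globals = {"x": 1}
module_frame = Frame(module_globals)
call_frame = Frame(module_globals, f_back=module_frame)
call_frame.f_locals["y"] = 2
```

Looking up "y" hits the local dict straight away, "x" falls through to the module dict, and "len" falls through to the builtins.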
This is very, very bad for performance. A significant amount of PyPy’s JIT’s performance comes from just virtualising frames (not creating them unless really needed). Theoretically this should be possible on Parrot too, but it would require (1) Parrot having a good JIT and MOP or (2) writing big, ugly C extensions to Parrot especially for puffin. But right now, I don’t care at all. So I’ll likely provide CPython’s sys._getframe() without any performance degradation. Instead, there will be a constant performance degradation. Yay!
It’s not all bad news, though. AFAICT, I will be able to implement correct Python semantics. There’s a lot of work to be done to achieve this, and I doubt I’ll be able to do all of it during GSoC, but I see no major obstacles that cannot be at least worked around.
See you on Friday!
Isn’t the puffin just great? Hopefully he’ll be enough to distract you from the rest of the post.
I’ve done some more work on the object system. It now has amazing features, such as attribute retrieval and descriptors. I’ve refactored much of it, and of course I broke many tests. With NotFound and whiteknight’s help I added support for better syntax for attribute retrieval (obj.attr instead of obj.get(‘attr’)) and fixed some long-standing issues. My thanks to everyone else that gave me a hand while stumbling about, debugging.
Attribute retrieval in Python isn’t as simple as you might think. When retrieving an attribute on an object, first the object’s __dict__ is searched. If the attribute isn’t there, the object’s __class__ is checked for the attribute (which in turn will check __class__.__dict__). If that fails, the object’s class’ bases (parents) are checked.
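The lookup order above can be checked in ordinary Python. This is only a simplified sketch: the helper where_found is a made-up name, and it ignores descriptors (data descriptors on the class actually win over the instance’s __dict__).

```python
class Base:
    inherited = "from Base"


class Child(Base):
    on_class = "from Child"


obj = Child()
obj.on_instance = "from obj.__dict__"


def where_found(obj, name):
    """Report which namespace supplies a plain (non-descriptor) attribute."""
    # 1. The object's own __dict__.
    if name in obj.__dict__:
        return "instance"
    # 2. The class, then its bases, in method resolution order.
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            return klass.__name__
    raise AttributeError(name)
```

Here where_found(obj, "on_instance") reports "instance", "on_class" is found on Child, and "inherited" is only found once the search reaches Base.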
That isn’t the end of the story, though. This logic is implemented in the __getattribute__ special method. ‘object’, the parent of all objects, implements __getattribute__. Hence, it can be overridden in your own objects (usually a bad idea, though). But this isn’t the end of the story either.
In case you don’t know, everything after the dot in Python is an attribute (obj.attr_name), including callables. To support something that looks like methods, Python uses descriptors to create bound methods, which are essentially closures over ‘self’. Descriptors are a general concept however, and they’re also used to implement properties. They are implemented in object.__getattribute__.
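A small sketch of both uses of descriptors, in standard Python. Greeter, Constant and Config are made-up names for illustration. Functions are descriptors themselves: calling their __get__ by hand produces the same bound method that the dot syntax gives you.

```python
class Greeter:
    def greet(self):
        return "hello from " + type(self).__name__


g = Greeter()

# The class __dict__ holds a plain function...
plain_function = Greeter.__dict__["greet"]
# ...and its __get__ turns it into a method bound to g.
bound_method = plain_function.__get__(g, Greeter)


class Constant:
    """A minimal read-only descriptor, similar in spirit to a property."""

    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner=None):
        return self.value


class Config:
    answer = Constant(42)
```

bound_method() gives the same result as g.greet(), and Config().answer goes through Constant.__get__ instead of returning the Constant object itself.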
So if you went through the trouble of reading those links to that awesome book, you are likely to share my opinion: that for its features, Python’s object system is simple, straightforward and with much of its implementation exposed to applications.
I have implemented a proto-instance, ‘object’, ‘type’, ‘function’, and partially implemented ‘tuple’ and ‘int’. All of these have tests that largely pass; my biggest culprit is __getattribute__. Also, my function takes in a parrot Sub as its second parameter, so it should be easy to access all of parrot’s features directly from python later on.
I believe I’ve implemented enough of the object system for it to be a useful target to the compiler, so I’ll start retargeting it from PIR types to this object system. While the object system isn’t entirely correct yet and still very incomplete, I believe both these issues can be fixed later, during implementation.
Finally done with exams, yesterday was my last. Ever, hopefully. So I had time to do some work. You can find it at either http://bitbucket.org/lucian1900/puffin or http://github.com/lucian1900/puffin. I pull from bitbucket, but push to both.
At the moment I am focused entirely on correctness and completeness. I care little about interop with other parrot languages and not at all about performance. I don’t want to waste time on issues that don’t have established solutions on parrot anyway. Instead, I want a correct python implementation on parrot. Interop and performance can be fixed later.
I started writing a compiler using python3’s ast module, generating PIR. I focused on a subset of python that supports int literals, int addition, assignment and printing. I wrote some tests, to check both the PIR output and its execution. This was straightforward, the ast module is very good.
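To give an idea of how little code that subset needs, here is a sketch of such a compiler. It is not puffin’s code and it does not emit PIR: the tuples it produces are a made-up stack-machine format, and compile_source is a hypothetical name. It assumes Python 3.8 or later, where literals are ast.Constant nodes.

```python
import ast


def compile_source(source):
    """Translate int literals, +, assignment and print() into
    a list of made-up stack instructions (not real PIR)."""
    out = []

    def expr(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, int)
                and not isinstance(node.value, bool)):
            out.append(("push", node.value))
        elif isinstance(node, ast.Name):
            out.append(("load", node.id))
        elif isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            expr(node.left)
            expr(node.right)
            out.append(("add",))
        elif (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"
                and len(node.args) == 1
                and not node.keywords):
            expr(node.args[0])
            out.append(("print",))
        else:
            raise NotImplementedError(type(node).__name__)

    for stmt in ast.parse(source).body:
        if (isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            expr(stmt.value)
            out.append(("store", stmt.targets[0].id))
        elif isinstance(stmt, ast.Expr):
            expr(stmt.value)
        else:
            raise NotImplementedError(type(stmt).__name__)
    return out


program = compile_source("x = 1 + 2\nprint(x)")
```

Anything outside the subset (subtraction, for instance) raises NotImplementedError, which makes the missing features easy to spot in tests.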
Python’s semantics are almost entirely defined by its object system, so I decided to start implementing it. I looked at whiteknight and NotFound’s experiments with prototype objects (found in rosella/unstable/prototype). I found them unsuitable for directly supporting Python’s object system, but a useful source of inspiration.
I don’t want to write assembly and I have almost no experience with Perl, so I decided to use Winxed to implement an object system, on top of Object/Class. Objects are backed by a Hash, for __dict__, which contains all attributes of the object. Both types (classes) and objects (instances) are instances of ‘instance’, a parrot Class. ‘type’ inherits from ‘instance’, since types are also objects in python.
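The layout is easier to see in code. This is a sketch in Python, not the actual Winxed: Instance and Type stand in for the parrot-level ‘instance’ and ‘type’, and the attrs dict plays the role of the Hash behind __dict__. All the names are assumptions for illustration.

```python
class Instance:
    """Hypothetical stand-in for the parrot Class called 'instance'."""

    def __init__(self, cls=None):
        # Plays the role of the Hash that backs __dict__.
        self.attrs = {"__class__": cls}


class Type(Instance):
    """Types are objects too, so Type inherits from Instance."""

    def __init__(self, name, bases=(), cls=None):
        super().__init__(cls)
        self.attrs["__name__"] = name
        self.attrs["__bases__"] = bases


# The circular core: 'type' is an instance of itself,
# and 'object' is an instance of 'type'.
type_ = Type("type")
type_.attrs["__class__"] = type_
object_ = Type("object", cls=type_)
type_.attrs["__bases__"] = (object_,)

# An ordinary instance of 'object' with one attribute.
obj = Instance(object_)
obj.attrs["answer"] = 42
```

The circular part (type being its own class) has to be patched in after creation, since the object doesn’t exist yet when its constructor runs.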
Python objects can be interacted with from Winxed/PIR similarly to how CPython does: “foo.bar” becomes “foo.__class__.__dict__[‘__getattribute__’](foo, ‘bar’)”.
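The same translation can be tried out in CPython itself. One caveat: an ordinary class usually doesn’t have __getattribute__ in its own __dict__ (it inherits it from object), so this sketch walks the MRO to find it. The helper get_attr and the classes Foo and Shouty are made-up names.

```python
class Foo:
    def __init__(self):
        self.bar = 42


class Shouty:
    # Overriding __getattribute__ changes what every lookup returns.
    def __getattribute__(self, name):
        return name.upper()


def get_attr(obj, name):
    """Spell out obj.name as an explicit __getattribute__ call."""
    for klass in type(obj).__mro__:
        if "__getattribute__" in klass.__dict__:
            return klass.__dict__["__getattribute__"](obj, name)
    raise AttributeError(name)


foo = Foo()
```

get_attr(foo, "bar") ends up calling object’s implementation and returns the same value as foo.bar, while get_attr(Shouty(), "bar") picks up the override.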
The object system bootstraps itself to the point where there is a significant subset of builtins, with almost entirely correct pythonic behaviour: type, object, int, tuple, function, BaseException etc. From there on, in theory, everything can be implemented in pure python.
The compiler doesn’t yet generate code for this object system, since there are a few vital bits missing, such as correctly working methods and metaclasses. Also, I have few tests for the object system, and I’d like to concentrate on improving that situation.
I had initially also considered 6model, but its unfortunate lack of documentation prevented me from properly evaluating it. After some chats with jnthn and his recent (very useful) docs, I have a much better understanding of 6model: it is in fact quite similar to my object system (but more general). Since Python’s interaction with its objects follows a very clearly defined interface, I believe it’ll be very easy to rewrite the object system from under the compiler, using whatever ends up being the recommended method. Since I’ve already built much of what 6model would offer me, I’ll stick with this (I know, it’s a bit NIH) until 6model gets integrated into parrot better.
Another issue is building & packaging. Packaging for Python(3) is well established, and Parrot has its own distutils. However, since I have code in both Python and a parrot language, the interaction is a bit tricky. The same goes for testing, in fact. I see two possible solutions: 1) write build scripts in python/distutils for building winxed & pir, likely based on allison’s work in pynie/setup.py or 2) write setup.py for python code and setup.winxed for parrot code, and add a command to setup.py that calls setup.winxed. I’m inclined to prefer the second option.
After I’ve tested the builtins in my object system, I’ll start targeting the compiler to use them. Afterwards, I’ll look at supporting more ‘exotic’ features, such as I/O or module importing.
Sadly, exams have proven to take up more time than expected. Today’s exam was ok, easier than I expected. Also easier than previous exams from this lecturer. At least my last exam is on Wednesday, so after that I’m free.
I have however managed to do some investigative work.
I looked at pynie, to determine whether there’s anything worth reusing. As it happens, pynie hasn’t been working for quite a while now. I tried to bring it out of bitrot, but I didn’t have enough knowledge of NQP/PCT and didn’t bother to do more than make it build. Furthermore, PCT isn’t particularly friendly towards python developers.
Allison suggested that I might reuse the tests, but the rest is of little use. The bootstrap tests can’t easily use a python testing tool, since during bootstrap Python code can’t run yet. Py.test might help with that, but I’m not sure I should bother using it (as opposed to unittest).
I had planned to decide this week between using 6model or building my own object model over Parrot’s existing objects. While 6model appears to be able to fully support Python, I’m reluctant to jump in and use it while it’s still an external dependency. I don’t particularly want to pioneer in the usage of 6model on parrot, I’d much rather someone with more 6model experience did that instead.
I’ve started playing with the ‘ast’ module in Python3. I’ve figured out how to walk the AST, now I have to decide what to do with it. One option might be trying to transform it to PAST and letting parrot generate the code, but afaik PAST doesn’t have a textual form that I could target. The other option is linearising the AST and generating PIR, which is likely the one I’ll take unless things change or someone tells me it’s a stupid idea.
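For the curious, walking the AST takes only a few lines. This sketch uses ast.NodeVisitor to visit children before their parent (post-order), which is roughly the order a code generator wants when linearising. PostOrder is a made-up class name, and the node names assume Python 3.8 or later.

```python
import ast


class PostOrder(ast.NodeVisitor):
    """Record node type names, children before their parent."""

    def __init__(self):
        self.order = []

    def generic_visit(self, node):
        # Visit all child nodes first...
        super().generic_visit(node)
        # ...then note the node itself.
        self.order.append(type(node).__name__)


walker = PostOrder()
walker.visit(ast.parse("1 + 2"))
# walker.order is now:
# ['Constant', 'Add', 'Constant', 'BinOp', 'Expr', 'Module']
```

Both operands show up before the BinOp that combines them, so emitting one instruction per node in this order gives code that computes operands before using them.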
Sorry I’m late. I’ll put my coat and hat away and be right with you. Here’s a puffin while you wait.
I’m Lucian Branescu Mihaila, and I like Python and Parrot.
I was lucky to be accepted for GSoC, doing Python3 on Parrot. Here’s my proposal. There’s a schedule in there and an explanation of what I plan to do.
It turns out that UK universities end quite late, especially in 3rd year. I demonstrated my dissertation only yesterday, I still have one coursework deadline tomorrow, and I have exams between the 26th and the 1st. At least exams shouldn’t be a time sink: there’s only so much that one can revise.
So I should be mostly on schedule with starting actual work.