I started using Python way back at the end of 1996 or early 1997. I was working on my PhD, for which the first project involved writing some simulations in FORTRAN. Originally I was using FORTRAN90, but then I needed to move my project to a server that had only FORTRAN77, so I was stuck with something that looked (at least to me) really ugly. While I was looking for alternatives (I used Mathematica, Matlab, SAS, Python and ASReml in my PhD), I stumbled on an article by Konrad Hinsen discussing the use of Python to glue FORTRAN programs together. Intrigued, I downloaded Python and ordered a copy of Mark Lutz’s Programming Python (the October 1996 first edition). After reading the book for a while I was hooked on the language.
I used Python on and off for small projects, and later, around 2001, dropped almost all programming that was not stats. I had missed it quite a bit until yesterday, when I was working with a list of words that Orlando is using. We had typed around 450 words in Spanish (and he uses around the same number in English) and I wanted to check whether we had repeated words. I downloaded Python, wrote a few lines and presto! We did have around 20 repeated words, and it was so nice to be able to write something in Python again.
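The few lines themselves are not shown. Here is a minimal sketch of the same check in Python. It assumes the words sit one per line in a UTF-8 text file and that two entries count as repeated when they match after trimming whitespace and ignoring case. NFC normalisation makes accented Spanish letters compare consistently. The script name repeated.py is hypothetical.

```python
import sys
import unicodedata
from collections import Counter


def normalise(word: str) -> str:
    """Strip whitespace, unify accented characters (NFC) and ignore case."""
    return unicodedata.normalize("NFC", word.strip()).casefold()


def find_repeated(words):
    """Return the normalised words that appear more than once, sorted."""
    # Blank lines are skipped so they are not reported as "repeated"
    counts = Counter(normalise(w) for w in words if w.strip())
    return sorted(w for w, n in counts.items() if n > 1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python repeated.py words.txt (one word per line)
    with open(sys.argv[1], encoding="utf-8") as handle:
        for word in find_repeated(handle):
            print(word)
```

Counting with Counter keeps the whole check to a single pass over the list, which is plenty for a few hundred words.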
After that I checked a few web pages and realised that the language has evolved quite nicely (although I rarely use the object-oriented stuff), and there are at least two books that I will be browsing soon:
Both books are available as free downloads in a variety of formats, as well as on real old-fashioned paper. I will certainly buy a paper copy of the nicer one.
I forgot to mention that one of the great things about Python was the existence of an excellent set of libraries for matrix operations (at the time it was Numpy), which has grown into a great set of resources for scientific computing called SciPy.
It is very easy to get the HTML version of Markdown text using any of the online processors (e.g., the original Perl Markdown Dingus and the PHP Markdown Dingus). However, a lot of the time I am not online, and I do not want to have a web server running on my machine just to run these scripts. I think that the easiest way to work is simply to run a script from the command line to transform the text.
The options were:
I downloaded and installed PHP, and then downloaded markdown.php and smartypants.php (both from the PHP Markdown web site; SmartyPants adds support for em-dashes and curly quotes). Then I wrote my first ever PHP script to read a text file, convert it using Markdown and SmartyPants, and finally add body and html tags to obtain a full HTML document rather than an HTML snippet.
The glorious—crappy, I know—code is below:
    <?php
    include_once "markdown.php";
    include_once "smartypants.php";

    // Command-line arguments: $argv[0] is the script name,
    // $argv[1] the input file and $argv[2] the output file
    if ($argc < 3) {
        fwrite(STDERR, "Usage: php myscript.php infile.txt outfile.html\n");
        exit(1);
    }
    $infile_name = $argv[1];
    $outfile_name = $argv[2];

    // Reading input file in Markdown markup
    $in_handler = fopen($infile_name, "r");
    $markdown_text = fread($in_handler, filesize($infile_name));
    fclose($in_handler);

    // Transforming Markdown to HTML and then using SmartyPants
    // to change quotes and dashes
    $html_text = Markdown($markdown_text);
    $html_text = SmartyPants($html_text);

    // Adding html and body tags to Markdown output
    // to get a complete html page
    $top = "<html>\n<body>\n";
    $bottom = "\n</body>\n</html>";
    $full_html = $top . $html_text . $bottom;

    // Saving output file using HTML markup
    $out_handler = fopen($outfile_name, "w");
    fwrite($out_handler, $full_html);
    fclose($out_handler);
    ?>
The script runs fine, and I should extend it so that it accepts a single file name (say file.txt) and automatically writes the output to file.html. I will also add recognition of meta tags at the beginning of the file (e.g., author and title), enabled by a flag when calling the script. Writing this small PHP script was very easy, even starting with zero PHP knowledge.
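The two planned extensions can be sketched briefly. The sketch below is in Python rather than PHP, and everything in it is an assumption: the helper names (output_name, split_meta, wrap_html), the --meta flag, and the header format (plain "Key: value" lines at the top of the file, ended by a blank line). It also leaves out the Markdown and SmartyPants conversion that the PHP script performs, using an escaped placeholder instead.

```python
import argparse
import html
import sys
from pathlib import Path


def output_name(infile: str) -> str:
    """Derive the output file name from the input one: file.txt -> file.html."""
    return str(Path(infile).with_suffix(".html"))


def split_meta(text: str):
    """Split an assumed header of 'Key: value' lines, ended by a blank line.

    Returns (meta, body) with lower-cased keys; if the file does not start
    with such a header, meta is empty and the whole text is the body.
    """
    meta = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if not line.strip():
            if meta:  # a blank line closes a non-empty header
                return meta, "\n".join(lines[i + 1:])
            break
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not key or " " in key:
            break  # not a header line, so treat everything as body
        meta[key.lower()] = value.strip()
    return {}, text


def wrap_html(body_html: str, meta: dict) -> str:
    """Add html, head and body tags, using title/author metadata if present."""
    head = []
    if "title" in meta:
        head.append(f"<title>{html.escape(meta['title'])}</title>")
    if "author" in meta:
        head.append(f'<meta name="author" content="{html.escape(meta["author"])}">')
    return ("<html>\n<head>\n" + "\n".join(head) + "\n</head>\n<body>\n"
            + body_html + "\n</body>\n</html>")


def main() -> None:
    parser = argparse.ArgumentParser(description="Turn file.txt into file.html")
    parser.add_argument("infile")
    parser.add_argument("--meta", action="store_true",
                        help="read a 'Key: value' header at the top of the file")
    args = parser.parse_args()
    text = Path(args.infile).read_text(encoding="utf-8")
    meta, body = split_meta(text) if args.meta else ({}, text)
    # Placeholder for the Markdown + SmartyPants step: the body is only
    # escaped and shown as preformatted text here.
    body_html = "<pre>" + html.escape(body) + "</pre>"
    Path(output_name(args.infile)).write_text(wrap_html(body_html, meta),
                                              encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

With a hypothetical script name, python md2html.py --meta file.txt would write file.html, taking the page title and author from the header lines. Requiring a blank line to close the header, and rejecting keys that contain spaces, keeps ordinary prose that happens to contain a colon from being mistaken for metadata.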
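Currently I run the script using php.exe myscript.php infile.txt outfile.html, where php.exe is the console (not web) version of the interpreter. In a Unix-style shell it would be possible to run it as myscript.php infile.txt outfile.html if one includes #!c:\php\php.exe as the first line of the script (the PHP command-line interpreter skips such a line). The Windows command prompt ignores that line, however, and instead relies on the .php extension being associated with php.exe.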
PS. A few minutes later: I had quite a few problems posting this PHP code, mainly because Textpattern insisted, first, on parsing the code with Textile to ‘beautify it’ and, second, on processing it itself! Workarounds: first, indent the code lines by two spaces so that Textile leaves them alone and, second, use the HTML entities for less than (&lt;) and greater than (&gt;) to mark the start and the end of the script.
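The entity trick can be automated before pasting code into the blog. Below is a minimal Python sketch of the workaround described above, assuming the standard library's html.escape with quote=False, which replaces &, < and > by &amp;, &lt; and &gt;. Each line is then indented by two spaces. It escapes every angle bracket, not only the ones in the PHP delimiters, which should still display correctly. The function and script names are hypothetical.

```python
import html
import sys


def prepare_for_blog(code: str, indent: str = "  ") -> str:
    """Escape &, < and > as HTML entities and indent every line.

    quote=False leaves quotation marks untouched, since only the
    angle brackets (and the ampersand) confuse the blog engine.
    """
    escaped = html.escape(code, quote=False)
    return "\n".join(indent + line for line in escaped.splitlines())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python escape_code.py myscript.php > snippet.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(prepare_for_blog(handle.read()))
```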
There is a lot of project information, ideas and loosely structured material that I find hard to store. As I work on projects with people overseas (with whom, in many cases, I have no direct contact), the idea of an easily updatable site where we can put my (and their) current brain dump is very appealing. I am playing with the idea of transforming my ASReml cookbook into a Wiki site that can be corrected and improved by other ASReml users.
There are plenty of Wikis to choose from, and I have been playing with PmWiki, a Wiki clone written in PHP. I would prefer a Wiki clone written in Python (so I can tinker with it), like MoinMoin, but I do not seem to have the administration privileges to set it up properly on my web server.
Wikis are collaborative sites by definition, but I want to limit access to people really knowledgeable in ASReml, so I will need to password protect the site (to avoid annoying modifications by spammers).
Checking the server logs, I have discovered that many people who arrive at my posts on calling VB from R are, in fact, looking for the reverse. I have never written any code calling R from VB; however, while I was looking for COM clients for R I also found information on COM servers. Omegahat lists RDCOMServer as a package that exports S (or R) objects as COM objects in Windows, and it provides examples of using VB, Python and Perl to call R code.
Another option is Thomas Baier’s R(D)COM Server, which comes with examples in the same languages as RDCOMServer.
Highly productive matrix languages like Matlab and S+, or their open-source siblings Scilab and R, are a joy to use. I wrote programs in Matlab during my PhD and I can still go back to the code and understand perfectly what is going on there. Now I am writing a lot of S+ and R code in which a few lines perform complex operations.
A good programmer can certainly produce a better-performing program (in terms of speed and memory requirements) using a low(ish)-level language like C, C++ or FORTRAN. However, I am not such a good programmer, and it would take me ages to do some of my work if I had to write it in those languages. Most of the time, execution speed and memory usage are not the limiting factors, and speed of development rules.
I am extremely happy using R now, and I am playing with the idea of using it as a statistics server for a few small applications. Omegahat seems to be a very valuable resource for all things ‘connecting R to other software’.
Around 2001 I wrote a ‘temporary quickie’ to compare new Eucalyptus samples to already identified haplotypes. I did that in a few lines of VBA in MS Excel, which was the software used as a repository for these haplotypes. At the time I said ‘this is a quick fix and it would be a good idea to develop a proper database’, and suggested a structure allowing for user roles, web access, etc. I was told that ‘this is not a priority’ and ‘we are happy with the spreadsheet’.
Yesterday I was having lunch with the owner of this spreadsheet, who told me that (a) it is still being used after four years! and (b) they were having some problems because they had slightly changed the structure for storing the haplotypes. I offered to help fix the problem, but I was told that ‘one of my students will try to fix it, because the problem has to be something very simple’.
I thought that the comment was a bit dismissive: if it was so easy, why hadn’t they fixed it in over a month? Granted, the code is extremely simple, but they do not have any programming experience whatsoever.
VBA is a fine scripting language, which allows people to write short and useful programs. However, I would question whether, in this case, an Excel spreadsheet is the best option for storing molecular genetics information.
In general, scripting languages (like Matlab or R) feel like a better fit for me. Python, my all-time favourite language, feels much more productive than any other language I have ever used. In addition, combining Python with the Numerical Python library produces an excellent general-purpose and matrix programming language. This can be used for prototyping and, if one is happy with the performance, transformed into a standalone program using a utility like py2exe.