Saturday, March 3, 2012

Comparing Matlab and Python for large image processing problems

Matlab is a popular scripting language for numerical computing.  It is popular and powerful due to its various toolbox.  Since Matlab is developed commercially, dedicated programmers are working in adding new features and enhancing the existing ones all the time.  Large image data sets created using latest imaging modalities typically take a long time to process.  The typical approach is to run the processing in parallel on many cores either on a desktop, a cluster or a supercomputer.  Commercial software like Matlab require dedicated license such as Distributed Computing Server for running such parallel jobs, which has an added cost.

Graphical Processing Unit (GPU) programming is becoming popular and easier than ever.  The license cost for GPU programming in Matlab is typically lower than that for parallel programming.  GPU programming is useful, if there is large amount of computation and fewer data transfer.  This limit exists because of smaller bandwidth between the CPU and GPU, the typical path that data has to take for processing in the GPU.  Large data set image processing typically involves large amount of I/O and data transfer between CPU and GPU and hence may not provide enough scalability.

Advantage Python

Python is a free and open source scripting language.  It can be scaled to large number of processors.  It has been shown that there is not a significant difference in computational time for a program written in Matlab vs the one written in python using numpy.   The author found that numpy run time is in the same order of magnitude as Fortan and C programs with optimization enabled.  Although, the processing time are for slightly older versions of the various languages, my experience has shown that the range of processing time remains similar.  The processing time will also vary depending on the nature of the algorithm and the expertise of the programmer.

Python has parallel processing modules like mpi4py, that allow scaling the application to large number of cores. In a supercomputing environment with large number of cores, python can be run on many of the cores.  Matlab on the other hand can only scale to the extent of the number of license. 

Python also has GPU programming capability through pycuda, in case if GPU programming suits the application.

Python is a more general purpose language compared to Matlab.  Hence integrating various databases, servers, file handling, string handling in to the program is easy.

Disadvantage Python

Since Python is an open-source package, it does not have a wide variety of canned functions and toolboxes as Matlab.  Hence, the user has to work on developing some of them.  I hope that over time this issue will be resolved.

Friday, February 17, 2012

Under-graduate education in Image processing

SURVEY URL:  http://goo.gl/ORDDz

Image acquisition and processing have become a standard method for qualifying and quantifying experimental measurements in various Science Technology Engineering and Mathematics (STEM) disciplines.  Discoveries have been made possible in medical sciences by advances in diagnostic imaging such as x-ray based computed tomography (CT) and magnetic resonance imaging (MRI).  Biological and cellular functions have been revealed with new imaging techniques in light based microscopy.  Advancements in material sciences have been aided by electron microscopy analysis of nanoparticles.    All these examples and many more require both knowledge of the physical methods to obtain images and the analytical processing methods to understand the science behind the images. 

Imaging technology continues to advance with new modalities and methods available to students and researchers in STEM disciplines.  Thus, a course in image acquisition and processing would have broad appeal across the STEM disciplines and be useful for transforming undergraduate and graduate curriculum to better prepare students for their future.

Image analysis is an extraordinarily practical technique that need not be limited to highly analytical individuals with a math and engineering background.  Since researchers in biology, medicine, and chemistry along with students and scientists from mathematics, physics and various engineering fields use these techniques regularly; there is a need for a course that provides a gradual introduction to both acquisition and processing. 

Such a course will prepare students in the common image acquisition techniques like CT, MRI, light  microscope and electron microscope.   It will also introduce the practical aspects of image acquisition such as noise, resolution etc.  The students will also be programming image processing using Python as a part of the curriculum.  They will be introduced to python modules such as numpy and scipy.   They will learn the various image processing operations like segmentation, morphological operations, measurements and visualization.

We wanted our understanding of this need with the data collected from surveying people interested in image processing or people who wish that they had such a course during their senior year in under-graduate or in graduate school.   We created a survey to obtain your feedback.    It will take only a minute of your time.   We request that you fill as much information as you can.  Please forward this URL or this blogpost to your friends as well.


Sunday, February 12, 2012

Python modules for scientific image processing



Recently my friend, Nick Labello working at University of Chicago performed a large scale, large data image processing.  It took one week to process the data acquired over 12 hours.  The program was written in Matlab and was run on a desktop. If the processing time needs to be reduced, the best course of action is to parallelize the program and run it on multiple cores / nodes.  Matlab parallelization can be expensive as it is closed-source commercial software.  Python on the other hand is free and open-source and hence can be parallelized to large number of cores.  Python also has parallelization modules like mpi4py that eases the task.  With a clear choice of programming language, Nick worked on evaluating the various python modules.  The chosen python module was used to rewrite the Matlab program.   Specifically, he reviewed the following

  1. Numpy
  2. Scipy
  3. Pymorph
  4. Mahotas
  5. Scikits-image
  6. Python Imaging Library (PIL)
Nick’s view on the various modules along with some of my view is given below.  

Numpy

It doesn't actually give you any image processing capabilities, but all of the image processing libraries rely on Numpy arrays to store the image data.  As a happy result, it is trivial to bounce between all of the different image processing libraries in the code because they all read and write the same datatype. 

It provides a stable, solid, easy to use very basic functionality (erosion, dilation, etc..).    It is missing a lot of the more advanced functions you might find in Matlab such as functions to find endpoints, perimeter pixels, etc.  These must be pieced together from elementary operations.  The documentation for NDImage is NOT very good for new users.  I had a hard time with it at first.  

It has lots of image processing functions that do not rely on anything but Numpy.  It does not depend on Scipy.  It provides a python only library.  A second interesting thing about PyMorph is that it has quite a lot of functionality compared to the other libraries.  Unfortunately, since it is written in Python, it is hundreds of times slower than Scipy and the other libraries.  This will become an issue for advanced image processing users.


Mahotas  

It provides only the most basic functions, but blazing fast, and, in my experience, twice as fast as the equivalent functions in Scipy.

Scikits-Image  

It picks up where Scipy leaves off and offers quite a few functions not found in Scipy.

Python Imaging Library (PIL)  

It is free to use but is not open-source.  It has very few image processing algorithms and not necessarily useful for scientific imaging.


Enthought  

It is __not__ an image processing library.  It is a python distribution ready for scientific programming.  It has over 100 scientific packages pre-installed, and is compiled against fast MKL/BLAS.  My tests were a little bit faster in Enthought-python than the regular Python by 2-3%.  Your mileage may vary.  The big advantage, though, is that it has everything I need except Mahotas and PyMorph.  Also, Mahotas installs easily on Enthought, whereas it was difficult on the original python installation, due to various dependencies. Enthought is free for students and employees of Universities and Colleges. Others need to pay for the service.

Ravi's view: Personally, I do not have much problem installing python modules to regular python interpreter. The only time it becomes difficult is installing scientific packages like Scipy that have numerous dependencies like Boost libraries. Readymade binary packages do eliminate the installation steps but are not necessarily well tuned for optimal performance. Enthought is a great alternative that is as easy to install and yet optimized for performance.