Saturday, January 26, 2008

On importing MATLAB figures into papers and presentations

[2008.02.01: I've had to substantially re-write this post, after I realized that my first attempt at it was horrendously convoluted and confusing. Which, in all fairness, is exactly what my own state of mind was at the time, after the hours – days! – of wrestling with the subject matter of this post.]

As a [graduate] student in engineering, I have, of course, had to write innumerable papers and reports, and deliver just as many presentations. Since most --- almost all, in fact --- of the work described in those papers and presentations was performed using MATLAB, the issue of how best to import figures created in that software package into a different software suite, meant for creating said papers and presentations, is a very important one. One that, thanks to the different ways in which MATLAB, Microsoft Office (the most-prevalent choice for creating documents today) and Adobe's PDF standard handle image data (not to mention all the various `standalone' image formats --- [Windows] bitmap, [Windows] Enhanced Metafile (EMF), Compuserve GIF, JPEG, TIFF, PNG... the list could go on forever), usually results in hours upon hours of exasperating, infuriating, hair-rending, cerebro-circuit-overloading trial and error. (MS Word's own quirks when handling non-text objects and/or relatively complex documents only add to the pain, which is why I, like many others, have switched to using LaTeX --- but that's another story.) It sure did for me, and I know it has for many others too. (See, for example, "Matlab copy-and-paste: Still broken after all these years" by Frinkytown, "How do I import MATLAB graphics into Microsoft Office 97?" by the MathWorks, this Matlab script to create a Word document from a Matlab figure, by James Lynch, and yet more search results returned from googling "matlab figures Word".) So, to save any others the ordeal of re-inventing my wheel, here is what I have learned:

(Note: The following discussion applies to figures with `line' plots. For figures with shaded plot elements, like those created with surf, mesh or pcolor, see the last paragraph of this post. Also, my system is WinXP Home / SP2, MS Office 2000 and MATLAB v6.5 (R13).)

First off, MATLAB’s own display format: figures on-screen are created at a resolution of 96 dpi, at a default size of 560x420 pixels (5.833”x4.375”). I’m not quite sure how this happens, since my laptop screen itself seems to have a resolution of 117 dpi (1400x1050 pixels, 12”x9”). Maximized, the figure becomes 1400x944 pixels, but still at 96 dpi.
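The numbers above are simply pixel counts divided by a nominal resolution. A minimal sketch of that arithmetic (the variable names are mine, not MATLAB's); the last line asks MATLAB what resolution it assumes for the screen, which will vary from system to system:

```matlab
% Physical size implied by a pixel size and a nominal resolution.
fig_px = [560 420];            % default figure size: width x height, in pixels
dpi    = 96;                   % resolution MATLAB assumes for the screen
fig_in = fig_px / dpi;         % width x height in inches: [5.8333 4.3750]

% Actual pixel density of the laptop panel described above.
panel_px  = [1400 1050];       % pixels
panel_in  = [12 9];            % inches
panel_dpi = panel_px ./ panel_in;   % about 116.7 dpi in both directions

% What MATLAB itself reports for the screen (system-dependent).
matlab_dpi = get(0, 'ScreenPixelsPerInch');
```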

If the figure is exported as a png file – or any other bitmapped (raster) format, I’m assuming – the resolution gets increased to 150 dpi. The size of the image in inches is set by the Page Setup settings of the figure window (default: 8”x6”). The size in pixels adjusts accordingly (default: 1200x900). Similarly, other Page Setup settings also apply, such as Recomputing vs. Keeping screen limits and ticks. Linewidths remain about the same, but the sizes of the text fonts and the line markers become smaller in proportion to the rest of the image. The dotted gridlines (which are actually very tiny dashes in the original MATLAB figure, with a default linewidth of 0.5) become true dotted lines – which is nice.
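The same export can be scripted. This is only a sketch, assuming that the Page Setup size corresponds to the figure's PaperPosition property and that print accepts a figure handle as its first argument (older releases may need the '-f' option instead). The file name is arbitrary, and the pixel size actually written may differ by a pixel or so depending on the release:

```matlab
% Export a line plot as a PNG whose size is set by PaperPosition.
fig = figure('Visible', 'off');
plot(1:10, (1:10).^2, 'o-');
grid on;

paper_in = [8 6];      % Page Setup default: 8" x 6"
dpi      = 150;        % default resolution for bitmap export
set(fig, 'PaperUnits', 'inches', ...
         'PaperPositionMode', 'manual', ...
         'PaperPosition', [0.25 2.5 paper_in]);   % [left bottom width height]

expected_px = paper_in * dpi;                     % [1200 900]

png_file = fullfile(tempdir, 'export_test.png');  % arbitrary file name
print(fig, '-dpng', sprintf('-r%d', dpi), png_file);

info = imfinfo(png_file);   % info.Width and info.Height hold the actual size
close(fig);
```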

The results are probably the same with other bitmap-format filetypes, I’d imagine. With a tif it’s the same, except that the filesizes (in bytes) are an order of magnitude larger (two orders of magnitude larger for an uncompressed tif), with no improvement in quality (that I can observe). Jpgs are about five times larger than the pngs, again with no improvement in quality (possibly even a slight deterioration, depending on the degree of `lossiness’ chosen). Gifs are the same as pngs, and we’re not supposed to use those any more. So, in fact, with the exception of eps files (and I’ll come back to those later), pngs really are the smallest filesize format out there, smaller even than the emfs.

If the figure is exported as an emf file (which is a vector format, I think), the [relative] sizes of all the figure objects remain the same, so the image looks *exactly* like the original figure in MATLAB. The size of the image in pixels is the same as the size of the original figure on the screen at the time it was exported. However, the resolution increases to 120 dpi (so the quality improves), and the size in inches therefore gets proportionately reduced. Strangely enough, if this emf file is imported/inserted into MS Word, the resolution of the inserted image falls back to 96 dpi (and the size in inches therefore increases back to that of the original MATLAB figure). Similarly, and similarly strangely, if the emf image is converted into the png format (which is easily done using a program like Irfanview), the resolution falls to 94 dpi. And, for some still further strangeness: if this emf2png file is inserted/imported into MS Word, the resolution once again returns to the original 96 dpi!
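For reference, the emf export can also be done from the command line with the -dmeta print device, which is only available on Windows. A sketch, assuming that setting PaperPositionMode to 'auto' is what makes the exported size follow the on-screen size (the file name is arbitrary):

```matlab
% Export the current on-screen figure as an Enhanced Metafile (Windows only).
emf_file = fullfile(tempdir, 'export_test.emf');   % arbitrary file name

if ispc
    fig = figure;
    plot(1:10, rand(1, 10), 's-');
    set(fig, 'PaperPositionMode', 'auto');   % follow the on-screen size
    print(fig, '-dmeta', emf_file);
    close(fig);
else
    disp('The -dmeta device is only available on Windows.');
end
```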

Coming now to the quality of the various image formats when imported into the various document formats:

In a MS Word .doc file:
png: image quality is good (still at 150 dpi?), but fontsizes and markersizes need to be increased to look the same as the original MATLAB figure. The true-dotted gridlines are almost invisible – which is nice.
emf2png: quality is not as good (96 dpi) as the imported pngs, although the sizes of the objects are `authentic’. Pretty sorry-looking, in fact – everything in the image is pixelated and `choppy’ – but just about passable in a pinch.
emf: quality is the best, even though the resolution is only at 96 dpi, and the image looks exactly like the original figure. Much better than the emf2png, and also slightly better than a png even when it’s had the font/markersizes increased. _This_ is the best of the three choices, when working with Word documents.
Coloured lines in imported pngs are slightly faded (when printed out on a B/W printer), but those in imported emfs and emf2pngs are still quite visible.
In addition to importing an external image file, when working with Word you also have the option of copying-and-pasting the figure image directly from MATLAB, via the [Windows] Clipboard. Using this method, a figure that is copied in metafile-format (“Preserve information; metafile if possible”), and with the “match screen-size” option, results in an image that has an increased resolution of 120 dpi (like an emf file before it is imported), and looks exactly like the imported emf-file image (96 dpi). If the image is copied in bitmap-format, it comes in at 96 dpi and looks just like the imported emf2png image.
One final word (no, pun not intended): “direct-copying” in metafile format is sometimes preferable to importing an emf file, even though they look the same, because in some versions of Word (Word 2000, for example, but not Word 2003), the latter method can result in the individual letters of the y-axis-label --- but not the entire label as a single unit --- getting rotated 90deg clockwise, resulting in a vertical stack of horizontally-aligned letters.
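The clipboard copy can be triggered from a script as well. As far as I know, calling print with -dmeta or -dbitmap and no file name sends the figure to the Windows clipboard instead of a file; newer releases also offer a '-clipboard' option, which I have not used here. Treat this as a sketch rather than a tested recipe:

```matlab
% Copy the current figure to the Windows clipboard.
copied = false;
if ispc
    fig = figure;
    plot(1:10, rand(1, 10), 'd-');
    set(fig, 'PaperPositionMode', 'auto');   % "match screen size"

    print(fig, '-dmeta');       % metafile copy (vector)
    % print(fig, '-dbitmap');   % bitmap copy, if that is what you want
    copied = true;
end
```

After this, Edit > Paste in Word should insert the figure.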

.doc --> .pdf, using the Adobe pdfmaker macro:
pngs: look terrible, whether font/markersizes have been increased or not. The quality deteriorates very badly in the conversion from doc to pdf.
emf2png: quality deteriorates slightly, not as much as the `true’ pngs; so they look a little better, but not by very much. They’re passable, but just barely so.
emfs (and metafile-format direct-copies): look the best, although the little dashes of the gridlines get converted into big dashes, which is not very pretty at all. (Increasing the linewidth of the gridlines in the original figure from 0.5 to 1.0 may help – I’m not sure; I haven’t tried it yet.)
For all three formats, coloured lines are almost completely invisible when printed out on a B/W printer.
(All this is with the default pdfmaker settings. Maybe tweaking those will help; I've never tried it, so I don't know. Not sure if using Distiller instead of pdfmaker makes any difference, either.)

.tex --> .pdf:
pdflatex only recognizes images in jpg, png and pdf formats, so now it’s not possible to import those good-quality emfs or even directly copy from MATLAB via the Clipboard. So, among the bitmap formats, we’re stuck with pngs. On the other hand, we can now use figures that have been exported as EPS – Encapsulated PostScript – files, and which can be converted to pdf, either on the fly during `texification’ by Heiko Oberdiek’s epstopdf package, or beforehand by some other method.
png: looks decent, like it looks in a doc before it’s converted to pdf.
emf2png: also looks like it does in a doc before conversion to pdf.
Therefore, of the two, png is better. Font- and markersizes need to be increased, though, of course.
eps: the best of the lot. Quality is better than that of png. (I don’t know what the resolution of these images is – it seems to be between 110 and 120 dpi.) Here too, though, font/markersizes need to be increased prior to exporting from MATLAB, by possibly the same amount as for the pngs. The gridlines are true dotted lines. Also, the postscript `bounding box' is drawn tighter around the axes/plot, so that, for the same overall figure width, the graph/image is larger --- unless you specify the -loose option while using the MATLAB print command.
Coloured lines in all three formats are faded as much as, or even more than, in doc2pdfs.

The -loose option tells the postscript driver to use the figure's PaperPosition property value as the Bounding Box. This is important because, when multiple figures are to be aligned side-by-side in the final document, it is the bounding boxes that get aligned, and, if the different figures have axis labels with different sizes or positions, then not using the -loose option results in differently-sized bounding boxes, with the result that the axes of the different graphs don't line up with each other.
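To see the difference for yourself, export the same figure twice, with and without -loose, and compare the %%BoundingBox line near the top of each file in a text editor. A sketch (the file names are arbitrary, and print is assumed to accept a figure handle as its first argument):

```matlab
% Export one figure as EPS with a tight and a loose bounding box.
fig = figure('Visible', 'off');
plot(1:10, (1:10).^2, 'o-');
xlabel('x');
ylabel('a rather long y-axis label');
set(fig, 'PaperPositionMode', 'auto');

tight_file = fullfile(tempdir, 'bbox_tight.eps');
loose_file = fullfile(tempdir, 'bbox_loose.eps');

print(fig, '-depsc2', tight_file);             % box hugs the axes and labels
print(fig, '-depsc2', '-loose', loose_file);   % box taken from PaperPosition
close(fig);
```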

Thus, in shorthand, for each of the three document formats, the order of preference for the image formats would be:

doc: emf /direct-copy > png (sizes increased) > emf2png
doc2pdf: emf/direct-copy > emf2png > png (whatever the size)
tex2pdf: eps2pdf > png (sizes increased) > emf2png

Finally, coming to the scaling of the various image objects (and the images themselves), these are the settings that I have found to work for me:

For a presentation (Powerpoint), using emfs:
Resize figure window to 600 x 450 for 2 graphs per slide
(For 1 graph per slide, magnify --- in Ppt --- to 125%)
Line thickness - 3
Arrow thickness - 2
Arrow text - 16, bold
Axes labels - 14 or 16, bold
Tick labels - 14, bold
Marker size - 10 (15 for x)

For a paper (doc, doc2pdf), using emfs:
Resize (in Matlab) to 600 x 450
Resize (in Word) to 65% (width = 3.17") for double-column size
Inline with text (not floating over)
Line thickness - 2
Legend font - 10 point, normal (default)
Axes labels - 14, bold
Tick labels - 12, normal
Marker size - 6 (10 for asterisks, pentagrams and hexagrams)

For a paper (tex2pdf), using pngs:
(To fit two images side-by-side on a single-column page)
Either don't resize in Matlab (default = 560 x 420), and then scale the width in TeX to 0.5\linewidth, or resize in Matlab to 600x450 and then scale in TeX to 0.45\linewidth.
Line thickness - 1
Legend font - 14 point, normal
Axes labels - 18, normal
Tick labels - 14, normal
Marker size - 8 (12 for asterisks, pentagrams and hexagrams)

For a paper (tex2pdf), using eps2pdfs:
Same figure scaling as above, and:
Line thickness - 1
Legend font - 10 point, normal
Axes labels - 14, normal
Tick labels - 11, normal
Marker size - 6 (10 for asterisks, pentagrams and hexagrams)
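These lists are easier to apply consistently from a script. The sketch below stores a subset of the settings above in a struct array and applies one preset to a plot. The preset and field names are my own invention, the presentation preset uses 14 for the axes labels (the list allows 14 or 16), and the legend and arrow settings are left out for brevity:

```matlab
% Presets taken from the lists above (sizes in points).
presets = struct( ...
    'name',         {'ppt_emf', 'doc_emf', 'tex_png', 'tex_eps'}, ...
    'line_width',   {3, 2, 1, 1}, ...
    'label_size',   {14, 14, 18, 14}, ...
    'label_weight', {'bold', 'bold', 'normal', 'normal'}, ...
    'tick_size',    {14, 12, 14, 11}, ...
    'tick_weight',  {'bold', 'normal', 'normal', 'normal'}, ...
    'marker_size',  {10, 6, 8, 6});

% Pick one preset by name.
p = presets(strcmp({presets.name}, 'tex_eps'));

fig = figure('Visible', 'off');
h = plot(1:10, rand(1, 10), 'o-');
xlabel('x');
ylabel('y');

% Lines and markers.
set(h, 'LineWidth', p.line_width, 'MarkerSize', p.marker_size);

% Tick labels first, then the axis labels, so the label settings win.
set(gca, 'FontSize', p.tick_size, 'FontWeight', p.tick_weight);
set([get(gca, 'XLabel'), get(gca, 'YLabel')], ...
    'FontSize', p.label_size, 'FontWeight', p.label_weight);
```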

For figures involving shading (pcolor, surf, mesh plots, etc.), there are still more things to consider, like the way in which MATLAB renders the image (Painters vs. Zbuffer vs. OpenGL), the colour of the background (transparent -- which doesn't always work out that way -- vs. white vs. `figure-color'), and, again, the method for copying/exporting the image (bitmap vs. metafile)... Oh, and the version of Word or Powerpoint that you're importing into. And each choice interacts with each other choice in completely unpredictable ways. (Of course.) I'll get to these another day.

------------------------------------------------------------------

2008.08.15 (Another Day):

For figures without shading, MATLAB's default renderer is Painters, which draws the figure as vector graphics. For figures with shading, the default renderer seems to be either ZBuffer or OpenGL, which are bitmap/raster renderers. (On my computer, it's ZBuffer.)

According to MATLAB, when exporting figures to image files (this includes using the "print" command), the default output resolution of the image is:
- 150 dpi for figures in image formats, and when using the ZBuffer or OpenGL renderers
- screen resolution for metafiles
- 864 dpi otherwise (eg: eps figures and using the Painters renderer)

Since figures, whether shaded or not, come out at 150 dpi when exported as a png/bitmap-file, this would imply that MATLAB always uses the ZBuffer/OpenGL renderer instead of Painters for this export option.
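Both the resolution and the renderer can be overridden on the print command line, with -r for the resolution and a renderer flag such as -painters. A sketch, with arbitrary file names; which renderer flags are accepted depends on the release (I believe -zbuffer has been dropped from recent ones), so check the print documentation for your version:

```matlab
% A shaded figure, exported once as a raster image and once as vector EPS.
fig = figure('Visible', 'off');
surf(peaks(30));
shading interp;

default_renderer = get(fig, 'Renderer');   % whatever MATLAB picked by itself

% Raster export at 300 dpi instead of the default 150 dpi.
png_file = fullfile(tempdir, 'shaded_300dpi.png');
print(fig, '-dpng', '-r300', png_file);

% EPS export with the Painters renderer forced.
eps_file = fullfile(tempdir, 'shaded_painters.eps');
print(fig, '-depsc2', '-painters', eps_file);
close(fig);
```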

Now, the method I've been using for my tex-->pdf documents, for images with shading, is to export the images as png files, with all the default options/settings. This means, in the Page Setup options, to:
- Use manual size and position (8"x6" => 1200x900 pixels @ 150dpi)
- Force white background
- Use the default figure rendering method (which is ZBuffer on my computer)

But:
- Keep screen limits and ticks (instead of the default "Re-compute")
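These Page Setup choices appear to have command-line counterparts. The mapping below is my own assumption, not something I have confirmed against the dialog: PaperPosition for the manual size, InvertHardcopy for the white background, and the axes' limit and tick modes set to 'manual' for "Keep screen limits and ticks". The file name is arbitrary:

```matlab
% Scripted version of the Page Setup options listed above (assumed mapping).
fig = figure('Visible', 'off');
pcolor(peaks(40));
shading interp;
ax = gca;

% "Use manual size and position": 8" x 6"
set(fig, 'PaperUnits', 'inches', ...
         'PaperPositionMode', 'manual', ...
         'PaperPosition', [0.25 2.5 8 6]);

% "Force white background"
set(fig, 'InvertHardcopy', 'on');

% "Keep screen limits and ticks": freeze them so print does not recompute.
set(ax, 'XLimMode',  'manual', 'YLimMode',  'manual', 'ZLimMode',  'manual', ...
        'XTickMode', 'manual', 'YTickMode', 'manual', 'ZTickMode', 'manual');

png_file = fullfile(tempdir, 'shaded_default.png');
print(fig, '-dpng', '-r150', png_file);
close(fig);
```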

The same method can also be used for importing the figures into MS Word (and on to pdf), but now there's also the option of directly copying and pasting from MATLAB into Word.

If the figure is copied in metafile format, or "Preserve information (metafile if possible)" (found under Copy Options), the text comes out looking nice, but the shading becomes blocky --- i.e., the "facecolor" property of the shaded surface gets set to the default "flat" (faceted) before the copy occurs. (This can also be set by Matlab's "shading" command. I usually use the "interp" --- for interpolated --- option, because it gives a smoother, nicer look.) This can be remedied, to an extent, by decreasing the step size of the surface matrix.
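The effect of the step size is easy to try out: with flat shading each grid cell gets a single colour, so a finer grid gives smaller blocks. A sketch using the built-in peaks function, with grid sizes chosen only for illustration:

```matlab
% Flat shading on a coarse and a fine grid, side by side.
zc = peaks(13);     % coarse grid: 13 x 13 points
zf = peaks(61);     % fine grid:   61 x 61 points

fig = figure('Visible', 'off');

subplot(1, 2, 1);
hc = surf(zc);
shading flat;       % one colour per face: visibly blocky
title('coarse, flat');

subplot(1, 2, 2);
hf = surf(zf);
shading flat;       % still flat, but the blocks are much smaller
title('fine, flat');

% For comparison, "shading interp" sets FaceColor to 'interp' instead:
% shading interp;
face_mode = get(hf, 'FaceColor');   % 'flat'
```

The finer surface has many more faces, so expect larger metafiles and slower redraws.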

If the image is copied in bitmap format, then the interpolation of the shading, if set, is retained, but now the text gets pixelated and choppy. This can be fixed, to an extent, by increasing the text fontsize and changing the fontweight to "bold".

The third option --- importing a 150-dpi png image into Word --- seems to be the best compromise solution for this trade-off: interpolated shading is retained, and the text, while not as flawless as in the pure-vectorized metafile (or emf), is still better than in the bitmap-format copy.

When the .doc file is converted to .pdf, all three formats suffer a slight, and roughly equal, deterioration. The imported-png image remains the best option, in my opinion.

A fourth option (for tex-->pdf users), is exporting the shaded figure as an eps file, but this results in ugliness, and is not an option I would consider.

14 comments:

J said...

Thought you're a Mac-user...

JasonP said...

What made you think that?

J said...

Some of MY stereotypical impressions of an engineer are: Mac-user, MATLAB-lover, open-source software-lover, and MS-software-despiser... :)

Plus, I heard (not sure if it is correct) that Mac OS is good for graphic design.

JasonP said...

J: Correct, except that we're more *nix-users than Mac-users (although there are plenty of those out there, too). Anything with a command prompt, basically. We do use Windows machines a lot, too --- simply because they're so ubiquitous.

J said...

Ahha. I think that I was trying to say "*nix-users". Thanks for the clarification. To be more generalized, it should be "command-line lovers". :)

So what about you? (I mean the OS you use on your own laptop instead of the fancy "super-computer" at your workplace.)

Florist said...

I don't think EPS technically has a resolution. It is just a text file that has Postscript commands, and therefore can be rendered in any resolution. But I might be wrong here.

I find that fiddling with the properties of the gcf handle gives you much better control over the matlab figure and can be done via scripts, much easier than having to tweak individual figures. For example, to render plots in two column mode:

set(gcf,'PaperPositionMode','manual');
set(gcf,'PaperUnits','inches');
set(gcf,'PaperPosition',[0 0 3 7.5]);

Anonymous said...

I find the best method to output high quality files is to print to a file. I use png for Word and eps (-deps or -depsc) for latex. You can do it in a script, too.

e.g. to save figure 1 at 600 dpi to a png:
figure(1)
plot(rand(1,10))
print('-f1', '-r600', '-dpng', 'test.png');

Anonymous said...

I have been asked to help someone (they were desperate) to create a graph that uses shading and to paste it into a Word document so that the Word document can then also be converted to pdf, with both the word and the pdf suitable for display on the net. Is this possible?
I have access to Mathematica and Matlab, and I regularly use LaTeX (with PSTricks if needed).
I am happy to do some reading but I am unsure of what to read. Can you help?

JasonP said...

Ken:

I've updated this post with what I've found about exporting (importing) figures with shading in them. You can find the new material at the end of the post. Let me know if this helps, or if you need more information.

Florist:

I agree, about using scripts to modify the property values of the handle graphics directly. Also, there's no GUI way that I've found to specify the -loose option when printing/exporting a figure to an eps file. On the other hand, I don't know of a command-line way to specify the "Keep screen limits and ticks" option, so I'm forced to go the GUI route for that (FigureWindow > File > Page Setup > Axes and Figure). So, the method I've settled on for exporting as png/eps, but with matching screensize and with loose bounding box, is:

% ----------------------------------------------- %
% Select ONE of the following ways of collecting the axes to modify,
% and leave the other two commented out.

% Option 1: the current axes only.
% axobj = gca;

% Option 2: every axes object. Need to do this if figure windows have
% subplots (or multiple axes). This includes legends as well.
axobj = findobj('type', 'axes');

% Option 3: the current axes of every open figure.
% figobj = findobj('type', 'figure');
% axobj = get(figobj, 'CurrentAxes');
% if iscell(axobj), axobj = [axobj{:}]; end
% ------------------------------------------------ %

for i = 1:length(axobj)
    cur_axes = axobj(i);

    % Handles of the text objects attached to these axes.
    x_lab = get(cur_axes, 'XLabel');
    y_lab = get(cur_axes, 'YLabel');
    z_lab = get(cur_axes, 'ZLabel');
    titl = get(cur_axes, 'Title');

    % Axes box, minor ticks and tick-label font.
    set(cur_axes, 'linewidth', 0.5);
    set(cur_axes, 'yminortick', 'on', 'xminortick', 'on')
    set(cur_axes, 'FontUnits', 'points', 'Fontsize', 11, 'Fontweight', 'normal');

    % Axis labels and title.
    set(x_lab, 'FontUnits', 'points', 'Fontsize', 14, 'Fontweight', 'normal');
    set(y_lab, 'FontUnits', 'points', 'Fontsize', 14, 'Fontweight', 'normal');
    set(z_lab, 'FontUnits', 'points', 'Fontsize', 14, 'Fontweight', 'normal');
    set(titl, 'FontUnits', 'points', 'Fontsize', 11, 'Fontweight', 'normal');
end

% Legends get their own font size.
lobj = findobj('tag', 'legend');
set(lobj, 'fontsize', 10)


Then go the GUI route to specify the "Keep screen limits and ticks" and "Use screen size" options (although the latter is equivalent to setting the figure's PaperPositionMode property value to "auto").


And then either GUI-export the figure as a png, or use the command:
print -deps2 -loose [filename]

Anonymous said...

I am interested in making several figures using the same unit (i.e. 1 unit = 1 cm on the screen), before exporting them, but I don't know how. Maybe you have an idea that I might try.

Opher Donchin said...

Basically, you want to work in centimeters, both on the screen and on paper:

set(gcf, 'Units', 'centimeters', 'PaperUnits', 'centimeters');

and then put the figure to whatever size you want in centimeters and set it to print at that size as well:

set(gcf, 'Position', [10 10 XX YY], 'PaperPosition', [30 30 XX YY]);

Now, put your axes in specific places also with a specific size and set the axes limits to match the size of the axes

axes('Units', 'centimeter', 'Position', [0.5 0.5 XXa YYa], 'XLim', [0 XXa], 'YLim', [0 YYa]);

Finally, draw a 1 cm line

line([1 2], [1 1]);

I think that should do you.

Opher

Unknown said...

I use print -dpng -r90. This makes plots which have readable fonts when inserted into Word and PowerPoint. -- John

Unknown said...

I mostly use LaTeX to create reports, papers or presentations (beamer).
So naturally, my favorite image formats for Matlab figures are eps (compiling with the latex command) and pdf (compiling with the pdflatex command).
I always output my figures to eps files and use epstopdf if needed.
If I have to do Word-like documents, I still use eps files under OpenOffice.org and things are almost ok (it may eat a lot of cpu at times).

On the Matlab side, as far as I've tested things, I do the following:
1) generate/display a figure (always scripts) on screen,

2) resize it,
It can be done manually, or automatically using:
pos=get(fid, 'position');
set(fid, 'Position', [pos(1:2), width, height]);

3) set the figure's "paper position mode" to auto in order to get an eps file that will look like what is displayed on screen (but this is not really accurate...):
set(gcf,'paperpositionmode','auto');

4) call a homebrew function called "fig_print". Schematically (when args parsed & string to eval constructed) it does the following:
eval(['print -f', num2str(fid), ' -d', lower(format), ' -noui -painters -adobecset -r', num2str(resolution), ' ', filename]);
For example, for figure #3, 'eps' format, 300dpi resolution and filename '/tmp/fig_print-test.eps' (default settings), it gives the following:
print -f3 -deps -noui -painters -adobecset -r300 /tmp/fig_print-test.eps
Note that I don't use the -loose option since it uselessly increases the eps bounding box (adds extra margins that I don't wish to get in my documents).
The resolution shouldn't have any effect since eps is a vector format, but it does indeed, and modifies, for example, the radius of plot dots (I assume their radius is set as a number of dots, so that's quite logical...)
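
The eval is not strictly needed: print can also be called as a function, with each option passed as a separate string. A sketch of the same call under that assumption. The variable fmt stands in for the "format" argument (so as not to shadow MATLAB's own format command), the file goes to the temporary directory instead of /tmp, and -adobecset is left out because I believe newer releases no longer accept it:

```matlab
% Function-call form of the print command built by fig_print.
fid = figure('Visible', 'off');
plot(1:10, rand(1, 10), '.');

fmt        = 'eps';
resolution = 300;
filename   = fullfile(tempdir, 'fig_print-test.eps');

set(fid, 'PaperPositionMode', 'auto');
print(fid, ['-d' lower(fmt)], '-noui', '-painters', ...
      sprintf('-r%d', resolution), filename);
close(fid);
```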


Pros and cons of this method:
+ that's a way to control graphics output,
+ when all is well tuned, eps figures are really nice for both TeX and Word-like docs,
+ things can be almost automated (even resizing is called from my figures scripts),

- I sometimes have issues with legends, whose text can exceed the legend box,
- I have to make the figure smaller in order to increase the font size in the resulting eps figures,
- axes ticks on screen won't match those I will obtain in the eps file,
- it's painful to redo figures with a different plot/text sizes ratio (playing with figure size, cf. 2)...)