User:Quibik/Cleaning up SVG files manually

From Wikimedia Commons, the free media repository

Written as a response to User:Ahnode.

Editing SVG files in a text editor gives greater control over the format and contents of the file. The major benefits are smaller file sizes, simpler and more concise code, and the possibility of making the code compliant with the SVG standard. In this article, I try to introduce a few useful techniques for accomplishing this.

About text editing

A good text editor is a must-have. It must support undo and redo, search and replace (preferably with regular expression support), syntax highlighting, and automatic indentation. Visual Studio Code with the "SVG" plugin by "jock" supports all of this and also provides a live preview (press F1 and issue the "Preview SVG" command). It runs on Windows, macOS and Linux.

Other alternatives include Notepad++ on Windows, and Kate or Geany on Linux. There are many other alternatives too.

Code validation

An SVG file can be checked for standards compliance with the W3C validator: http://validator.w3.org/. Commons has ValidSVG and InvalidSVG tags that can be added to an image to indicate either status.

Please never use the valid/invalid tags on their own! They mainly sort files into heavily overcrowded categories. The better option is to use the script support, which not only checks the validity of the file but also proposes parameters for transcluding {{Image generation}}; in most cases these proposals can be accepted.

Before editing

A good amount of work in later editing can be avoided by simply using the correct settings when saving the file with the vector image editor. Try re-saving a problematic file with recommended settings before starting manual editing.

Inkscape

In Inkscape, the File→Vacuum Defs command (in later versions: File→Clean up document) should be used before saving. It removes unused definitions from the file, reducing its size. Next, the image should be saved as 'Plain SVG' rather than 'Inkscape SVG'. This avoids saving Inkscape-specific metadata, which can be useful when editing the same file later but is useless when the image is only displayed. Since Inkscape 0.47 (2009), all of this and much more can be done with File→Save As…→Optimized SVG.
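To give an idea of what "Inkscape-specific metadata" looks like, here is a small hand-written illustration. It is not taken from any real file: the id, zoom value, layer name and rectangle are made up. Saving as Plain SVG should drop the sodipodi:namedview element, the inkscape: and sodipodi: attributes and their namespace declarations, although exactly what is removed may vary between Inkscape versions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hand-written illustration of typical Inkscape SVG extras, not a real file -->
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
     xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
     width="100" height="100">
  <!-- Editor state (page colour, zoom level): useless for display -->
  <sodipodi:namedview id="namedview1" pagecolor="#ffffff" inkscape:zoom="4"/>
  <!-- Layer information that only Inkscape makes use of -->
  <g inkscape:groupmode="layer" inkscape:label="Layer 1">
    <rect x="10" y="10" width="80" height="80" fill="#FCB034"/>
  </g>
</svg>
```

Without the marked items, only the <g> and the <rect> are needed to display this image.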

Adobe Illustrator

With the correct settings in Adobe Illustrator CS3, many issues can be avoided altogether. When saving the SVG image for use on Commons, the following settings should be used:

  • SVG profile: 'SVG 1.1'
  • Fonts: 'SVG' or 'Convert to outline' if the fonts used aren't supported by Wikimedia software (see meta:SVG fonts).
  • Images: 'Embed'
  • Preserve Illustrator editing capabilities: off. When this option is enabled, a lot of non-standards-compliant code is generated, and hundreds of kilobytes of metadata are added to the file. Unchecking this option cures most of the problems with the SVG file.

Editing

I will be using the first revision of File:MR conditional sign.svg, uploaded by User:Ahnode, as an example in this tutorial (link to the revision used). The file is excessively large: 297 KB (304,105 bytes). Considering the content (just six objects*), it should not be larger than a few kilobytes. Looking at the code, we can see that the image was saved with Adobe Illustrator CS3. Most issues could be solved by re-saving the file with the correct settings, but for the purposes of this tutorial we will ignore that option.

*) More precisely: three objects (an orange triangle, a black triangle frame, and two letters of text). The two triangles are very simple!

Next, I will go through the steps needed to produce a clean and well-formatted file. The most important idea to keep in mind when editing SVG files is: remove everything you can, but no more. Image editing programs often add a lot of generic or otherwise unnecessary information that isn't needed to display the image.
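As an illustration of the "but no more" part, the hypothetical file below contains items that can look removable at first glance but are not. The xmlns attribute tells XML-based viewers such as web browsers that the elements are SVG at all, and xmlns:xlink is required as long as any xlink:href attribute is used; deleting either breaks the file. The shape and the id "tri" are made up for this example.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Both namespace declarations are needed: xmlns makes the elements SVG,
     and xmlns:xlink binds the xlink: prefix used by the <use> element below -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
     width="100" height="100">
  <defs>
    <!-- Hypothetical shape, defined once and referenced by its id -->
    <path id="tri" d="M7.646,88.909H91.662L49.654,15.621"/>
  </defs>
  <!-- This reference is why neither the <defs> block nor xmlns:xlink
       may be deleted, even though they draw nothing by themselves -->
  <use xlink:href="#tri" fill="#FCB034"/>
</svg>
```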

Header

Original Header:

<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 13.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 14948)  -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd" [
	<!ENTITY ns_extend "http://ns.adobe.com/Extensibility/1.0/">
	<!ENTITY ns_ai "http://ns.adobe.com/AdobeIllustrator/10.0/">
	<!ENTITY ns_graphs "http://ns.adobe.com/Graphs/1.0/">
	<!ENTITY ns_vars "http://ns.adobe.com/Variables/1.0/">
	<!ENTITY ns_imrep "http://ns.adobe.com/ImageReplacement/1.0/">
	<!ENTITY ns_sfw "http://ns.adobe.com/SaveForWeb/1.0/">
	<!ENTITY ns_custom "http://ns.adobe.com/GenericCustomNamespace/1.0/">
	<!ENTITY ns_adobe_xpath "http://ns.adobe.com/XPath/1.0/">
]>
<svg version="1.0" id="Layer_2" xmlns:x="&ns_extend;" xmlns:i="&ns_ai;" xmlns:graph="&ns_graphs;"
	 xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" width="100px" height="100px"
	 viewBox="0 0 100 100" enable-background="new 0 0 100 100" xml:space="preserve">
...

The header usually consists of the XML declaration (<?xml ?>), a comment indicating the image generator, the DOCTYPE (which can be left out), and the <svg> start tag. Adobe Illustrator adds some non-standard entity definitions to the DOCTYPE, which can be removed without much consideration. I personally leave the generator comment intact, since it is helpful to other editors. The header remains pretty much the same regardless of the image, so most of the time a generic header can be used:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"[
	<!ENTITY ns_sfw "http://ns.adobe.com/SaveForWeb/1.0/">
	<!ENTITY ns_flows "http://ns.adobe.com/Flows/1.0/">
	<!ENTITY ns_imrep "http://ns.adobe.com/ImageReplacement/1.0/">
]>
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:i="http://ns.adobe.com/AdobeIllustrator/10.0/"
	width="XXXpx" height="YYYpx" xml:space="preserve">

Only the width and height attributes (the XXX and YYY placeholders) need to be filled in.

Removing unnecessary content

The first object to remove from this file is obviously the large block of image metadata added by Illustrator, beginning on line 983:

...
<i:pgf  id="adobe_illustrator_pgf">
	<![CDATA[
	eJzsvW2vJMlxZvm9gf4Pdz8IIIFlKd5fuIsB7quGO6JEkNSMFsKAKHWXqB51VxHFbmm5v37dzJ9j
7hZ5m2qKHEmrYQW6ujwyMzIz3OOEmYWfjD/5337ysx/cf/7hb9/9YH4z3H36yZ/8yePHd2+//vDx
[snip]
xyzboLO92eZEe2vuo7X//d/B9puuCaAauh5sbhYbLbU8bLS7ZM/RGjX+Qw02erD8jbH6BR8FW0N1
NO4P1eDovf83KSE3mTfADqYAwdj/AneFH2w=
	]]>
</i:pgf>
</svg>

Remove everything between (and including) the <i:pgf>...</i:pgf> tags. This removes about 200 KB from the file. With about 100 KB remaining, the file is still much too large, so something else must be bloating it.

Looking further through the file, we find a <pattern id="Polka_Dot_Pattern">...</pattern> block taking up a massive 939 lines! Glancing at the image, we see no polka-dot patterns, or indeed any patterns at all, so this block can be safely removed.
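Looking at the image is a good first check, but a more reliable one is to search the file for the pattern's id before deleting it. The hypothetical example below (its dot and sizes are made up; it is not the 939-line block from the real file) shows what a reference looks like: anything that uses the pattern refers to it as url(#Polka_Dot_Pattern), or as #Polka_Dot_Pattern in an xlink:href, so a search for "#Polka_Dot_Pattern" should find every use. If it finds nothing, the block is unused.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <defs>
    <!-- Hypothetical stand-in for the real 939-line block -->
    <pattern id="Polka_Dot_Pattern" width="10" height="10" patternUnits="userSpaceOnUse">
      <circle cx="5" cy="5" r="2" fill="#000"/>
    </pattern>
  </defs>
  <!-- A pattern is only drawn where something references it like this;
       with no such reference anywhere, the whole <pattern> can go -->
  <rect width="100" height="100" fill="url(#Polka_Dot_Pattern)"/>
</svg>
```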

The file is now small enough to be shown here.

<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 13.0.0, SVG Export Plug-In . SVG Version: 6.00 Build 14948)  -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"[
	<!ENTITY ns_sfw "http://ns.adobe.com/SaveForWeb/1.0/">
	<!ENTITY ns_flows "http://ns.adobe.com/Flows/1.0/">
	<!ENTITY ns_imrep "http://ns.adobe.com/ImageReplacement/1.0/">
]>
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:i="http://ns.adobe.com/AdobeIllustrator/10.0/"
	 width="100px" height="100px" xml:space="preserve">
<switch>
	<foreignObject requiredExtensions="&ns_ai;" x="0" y="0" width="1" height="1">
		<i:pgfRef  xlink:href="#adobe_illustrator_pgf">
		</i:pgfRef>
	</foreignObject>
	<g i:extraneous="self">
		<g>
			<g>
				<polygon fill="#FCB034" points="91.662,88.909 49.654,88.909 7.646,88.909 49.654,15.621 91.662,88.909 				"/>
				<path d="M49.653,10.591l46.324,80.818H49.653H3.329L49.653,10.591 M49.653,20.646L11.958,86.409h37.695h37.695L49.653,20.646
					L49.653,20.646z"/>
				<path fill="#FCB034" d="M49.653,10.591l46.324,80.818H49.653H3.329L49.653,10.591 M49.653,6.569l-1.735,3.027L1.593,90.415
					l-1.716,2.994h3.452h46.325h46.324h3.451l-1.716-2.994L51.389,9.597L49.653,6.569L49.653,6.569z"/>
			</g>
			<g>
				<path d="M29.3,82.314V61.556h6.273l3.766,14.16l3.725-14.16h6.287v20.759h-3.895V65.974l-4.12,16.341H37.3l-4.106-16.341v16.341
					H29.3z"/>
				<path d="M53.527,82.314V61.556h8.821c2.219,0,3.831,0.187,4.836,0.56c1.006,0.373,1.811,1.036,2.414,1.989
					c0.604,0.953,0.906,2.044,0.906,3.271c0,1.559-0.457,2.845-1.373,3.859s-2.284,1.654-4.106,1.918
					c0.906,0.529,1.654,1.109,2.244,1.742s1.386,1.756,2.386,3.37l2.535,4.05h-5.013l-3.03-4.518
					c-1.076-1.613-1.813-2.631-2.209-3.051s-0.816-0.708-1.26-0.864c-0.444-0.155-1.147-0.233-2.11-0.233h-0.85v8.666H53.527z
					 M57.719,70.335h3.101c2.012,0,3.267-0.085,3.768-0.255c0.5-0.17,0.892-0.463,1.175-0.878s0.425-0.935,0.425-1.558
					c0-0.698-0.187-1.263-0.56-1.692s-0.899-0.7-1.579-0.813c-0.34-0.048-1.359-0.071-3.059-0.071h-3.271V70.335z"/>
			</g>
		</g>
	</g>
</switch>
</svg>

Next, a couple of unnecessary items remain: the <switch>...</switch> tags (only the tags themselves, not the content between them) and the <foreignObject>...</foreignObject> element (including everything inside it). The foreignObject branch is only meant for Illustrator: its i:pgfRef points to the i:pgf block we have already deleted, and its requiredExtensions="&ns_ai;" attribute refers to an entity that the new header no longer declares. Without it, the switch has nothing left to choose between, so both can be removed. The <g> group tag carries another non-standard attribute, i:extraneous="self", which can also be safely removed. I find the two nested levels of group tags unnecessary anyway, so I removed them both. All that is left to do is to move the indentation back to the beginning of the line, and we are done!
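The following trimmed-down skeleton of the file above may help show why this construct can go. The drawing is reduced to a single triangle, and &ns_ai; is written out as the namespace URI it stood for in the original DOCTYPE. A <switch> renders only its first child whose conditions are met, and viewers other than Illustrator do not recognize the extension named in requiredExtensions, so they skip to the <g>.

```xml
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns:i="http://ns.adobe.com/AdobeIllustrator/10.0/" width="100" height="100">
  <!-- switch renders only the FIRST child whose conditions are met -->
  <switch>
    <!-- Only a viewer supporting this extension (Illustrator) takes this branch;
         it points at the i:pgf block that has already been deleted -->
    <foreignObject requiredExtensions="http://ns.adobe.com/AdobeIllustrator/10.0/"
                   x="0" y="0" width="1" height="1">
      <i:pgfRef xlink:href="#adobe_illustrator_pgf"/>
    </foreignObject>
    <!-- Every other viewer falls through to the actual drawing -->
    <g i:extraneous="self">
      <polygon fill="#FCB034" points="91.662,88.909 7.646,88.909 49.654,15.621"/>
    </g>
  </switch>
</svg>
```

Removing the <switch> tags and the <foreignObject> branch leaves only the <g>, which is what every viewer other than Illustrator was drawing anyway.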

Resulting file:

<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 13.0.0, SVG Export Plug-In . SVG Version: 6.00 Build 14948)  -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"[
	<!ENTITY ns_sfw "http://ns.adobe.com/SaveForWeb/1.0/">
	<!ENTITY ns_flows "http://ns.adobe.com/Flows/1.0/">
	<!ENTITY ns_imrep "http://ns.adobe.com/ImageReplacement/1.0/">
]>
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:i="http://ns.adobe.com/AdobeIllustrator/10.0/"
	 width="100px" height="100px" xml:space="preserve">
<g>
    <polygon fill="#FCB034" points="91.662,88.909 49.654,88.909 7.646,88.909 49.654,15.621 91.662,88.909"/>
    <path d="M49.653,10.591l46.324,80.818H49.653H3.329L49.653,10.591 M49.653,20.646L11.958,86.409h37.695h37.695L49.653,20.646 L49.653,20.646z"/>
    <path fill="#FCB034" d="M49.653,10.591l46.324,80.818H49.653H3.329L49.653,10.591 M49.653,6.569l-1.735,3.027L1.593,90.415
        l-1.716,2.994h3.452h46.325h46.324h3.451l-1.716-2.994L51.389,9.597L49.653,6.569L49.653,6.569z"/>
</g>
<g>
    <path d="M29.3,82.314V61.556h6.273l3.766,14.16l3.725-14.16h6.287v20.759h-3.895V65.974l-4.12,16.341H37.3l-4.106-16.341v16.341 H29.3z"/>
    <path d="M53.527,82.314V61.556h8.821c2.219,0,3.831,0.187,4.836,0.56c1.006,0.373,1.811,1.036,2.414,1.989
        c0.604,0.953,0.906,2.044,0.906,3.271c0,1.559-0.457,2.845-1.373,3.859s-2.284,1.654-4.106,1.918
        c0.906,0.529,1.654,1.109,2.244,1.742s1.386,1.756,2.386,3.37l2.535,4.05h-5.013l-3.03-4.518
        c-1.076-1.613-1.813-2.631-2.209-3.051s-0.816-0.708-1.26-0.864c-0.444-0.155-1.147-0.233-2.11-0.233h-0.85v8.666H53.527z
         M57.719,70.335h3.101c2.012,0,3.267-0.085,3.768-0.255c0.5-0.17,0.892-0.463,1.175-0.878s0.425-0.935,0.425-1.558
        c0-0.698-0.187-1.263-0.56-1.692s-0.899-0.7-1.579-0.813c-0.34-0.048-1.359-0.071-3.059-0.071h-3.271V70.335z"/>
</g>
</svg>

The code is now fully standards-compliant and has been reduced from 297 KB to 1.7 KB in only a few simple steps.

It is possible to strip the code to its bare essentials:

<?xml version="1.0" encoding="utf-8"?>
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
<polygon fill="#FCB034" points="91.662,88.909 49.654,88.909 7.646,88.909 49.654,15.621 91.662,88.909"/>
<path d="M49.653,10.591l46.324,80.818H49.653H3.329L49.653,10.591 M49.653,20.646L11.958,86.409h37.695h37.695L49.653,20.646 L49.653,20.646z"/>
<path fill="#FCB034" d="M49.653,10.591l46.324,80.818H49.653H3.329L49.653,10.591M49.653,6.569l-1.735,3.027L1.593,90.415 l-1.716,2.994h3.452h46.325h46.324h3.451l-1.716-2.994L51.389,9.597L49.653,6.569L49.653,6.569z"/>
<path d="M29.3,82.314V61.556h6.273l3.766,14.16l3.725-14.16h6.287v20.759h-3.895V65.974l-4.12,16.341H37.3l-4.106-16.341v16.341H29.3z M53.527,82.314V61.556h8.821c2.219,0,3.831,0.187,4.836,0.56c1.006,0.373,1.811,1.036,2.414,1.989 c0.604,0.953,0.906,2.044,0.906,3.271c0,1.559-0.457,2.845-1.373,3.859s-2.284,1.654-4.106,1.918 c0.906,0.529,1.654,1.109,2.244,1.742s1.386,1.756,2.386,3.37l2.535,4.05h-5.013l-3.03-4.518 c-1.076-1.613-1.813-2.631-2.209-3.051s-0.816-0.708-1.26-0.864c-0.444-0.155-1.147-0.233-2.11-0.233h-0.85v8.666H53.527z M57.719,70.335h3.101c2.012,0,3.267-0.085,3.768-0.255c0.5-0.17,0.892-0.463,1.175-0.878s0.425-0.935,0.425-1.558 c0-0.698-0.187-1.263-0.56-1.692s-0.899-0.7-1.579-0.813c-0.34-0.048-1.359-0.071-3.059-0.071h-3.271V70.335z"/>
</svg>

It is even possible to remove the manual line breaks, so that the SVG file is a single, very long line.

Further simplification is possible, but it requires intimate knowledge of the SVG language.

More considerations

With intimate knowledge of the simple path commands (in this case just M, H and L), it is possible to replace the awkwardly written polygon

<polygon fill="#FCB034" points="91.662,88.909 49.654,88.909 7.646,88.909 49.654,15.621 91.662,88.909"/>

with a much simpler path

<path fill="#FCB034" d="M7.646,88.909H91.662L49.654,15.621"/>

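which draws exactly the same triangle through the bottom-left, bottom-right and top corners. The polygon lists five points, but the second lies on the straight bottom edge and the fifth repeats the first, so only the three corners are needed; a closing z can be left out as well, because a fill always closes an open subpath implicitly.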
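For reference, here is the triangle on its own as a complete file, with the three path commands spelled out in a comment. Upper-case commands take absolute coordinates; the lower-case forms (m, h, l) would take offsets from the current point instead.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <!-- M7.646,88.909  move to the bottom-left corner
       H91.662        horizontal line to the bottom-right corner (y stays 88.909)
       L49.654,15.621 straight line to the top corner
       No closing z: filling an open subpath closes it implicitly.
       Equivalent polygon with only the three corners:
       points="7.646,88.909 91.662,88.909 49.654,15.621" -->
  <path fill="#FCB034" d="M7.646,88.909H91.662L49.654,15.621"/>
</svg>
```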
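The second orange shape, a thin rim around the outside of the black frame, is redundant: a single, slightly larger orange triangle covers both the rim and the inner fill, and the black frame is drawn on top of it.
The black triangle frame and the letters can be combined into a single black path. With the "Adobe" comment and the DOCTYPE declaration also stripped, the file looks like this: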

<?xml version="1.0"?>
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
<path fill="#FCB034" d="M49.65,6.5 0,93.4h99.5"/>
<path d="M29.3,82.314V61.556h6.273l3.766,14.16l3.725-14.16h6.287v20.759h-3.895V65.974l-4.12,16.341
H37.3l-4.106-16.341v16.341H29.3zM53.527,82.314V61.556h8.821c2.219,0,3.831,0.187,4.836,0.56
c1.006,0.373,1.811,1.036,2.414,1.989c0.604,0.953,0.906,2.044,0.906,3.271
c0,1.559-0.457,2.845-1.373,3.859s-2.284,1.654-4.106,1.918c0.906,0.529,1.654,1.109,2.244,1.742
s1.386,1.756,2.386,3.37l2.535,4.05h-5.013l-3.03-4.518c-1.076-1.613-1.813-2.631-2.209-3.051
s-0.816-0.708-1.26-0.864c-.444-0.155-1.147-0.233-2.11-0.233h-.85v8.666H53.527zM57.719,70.335h3.101
c2.012,0,3.267-0.085,3.768-0.255c0.5-0.17,0.892-0.463,1.175-0.878s0.425-0.935,0.425-1.558
c0-0.698-0.187-1.263-0.56-1.692s-0.899-0.7-1.579-0.813c-0.34-0.048-1.359-0.071-3.059-0.071
h-3.271V70.335zM49.653,10.591l46.324,80.818H3.329zm0,10.055L11.958,86.409h75.39L49.653,20.646z"/>
</svg>

This file, now just under 1 KB (line feeds kept), shows practically the same image as before we started. The only complicated part is the lettering, whose outlines require cubic Béziers. If the exact appearance of the letters is not essential, and it would not matter if they looked a bit different, using real SVG text instead of outlined letters would make this a very simple drawing.
The drawing now needs only about 0.3% (roughly three per mille) of the original 304,105 bytes. The file is well simplified at this point, but it can be minimized still further.
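As a sketch of that very simple drawing, the letters could be written as an SVG <text> element instead of outlines, while the frame stays one compound path. The font family, size and baseline position are guesses meant to roughly match the original letters, and the result depends on which fonts the renderer has; DejaVu Sans is assumed only because it is widely installed. The comments also explain how the zm sequence in the frame path works.

```xml
<?xml version="1.0"?>
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <!-- One orange triangle covers both the inner fill and the outer rim -->
  <path fill="#FCB034" d="M49.65,6.5 0,93.4h99.5"/>
  <!-- Frame as one compound path (default fill-rule: nonzero).
       Outer triangle: top, bottom-right, bottom-left, then z.
       After z the current point is back at the top (49.653,10.591), so
       m0,10.055 moves to 49.653,20.646, the top of the inner triangle.
       The inner triangle runs top, bottom-left, bottom-right, i.e. in the
       opposite direction, which is what makes it a hole. -->
  <path d="M49.653,10.591l46.324,80.818H3.329zm0,10.055L11.958,86.409h75.39z"/>
  <!-- Assumed font and size; the original letters are about 21 units tall -->
  <text x="49.65" y="82.3" font-family="DejaVu Sans" font-weight="bold"
        font-size="28" text-anchor="middle">MR</text>
</svg>
```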


One possibility is to draw everything, including the lettering, with integer coordinates. This makes the image look a little different, but the file shrinks to only 336 bytes (0.11%, about one per mille):

<?xml version="1.0"?>
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
<path fill="#FCB034" d="M50,6 1,93h98"/>
<path d="M29,82V61h6l4,14 4-14h6v21H45V66l-4,16H37l-4-16v16
M58,82H53V61h12a6,6 0 1,1 0,12H58V82V70h5.5a2,2 0 1,0 0-5H58v5 M53,72h9q5,1 6,4l4,6H67l-4-6q-2-3-3-2H58V82
M50,10l46,81H4zm0,10-38,66h76z"/>
</svg>

It contains three independent black objects, the M, the R and the triangle frame, each coded on its own line.
The half circles in the upper part of the "R" are drawn with the a (elliptical arc) command; the diagonal leg is drawn with quadratic Bézier curves (q), which are sufficient for such a simple drawing.
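To make the a and q commands easier to follow, here is the "R" from the file above on its own, with the parameters explained in comments. The coordinates are unchanged; the descriptions of the flags follow the SVG 1.1 path specification.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <!-- a6,6 0 1,1 0,12: radii 6 and 6, no rotation, large-arc flag 1,
       sweep flag 1 (clockwise on screen), end point 12 units straight down.
       The end points are one diameter apart, so this is exactly a half
       circle, bulging to the right: the outside of the bowl.
       a2,2 0 1,0 0-5: the end points are 5 apart, more than the diameter 4,
       so the renderer scales the radius up to 2.5; sweep flag 0 runs it
       counter-clockwise on screen, again bulging to the right. The loop it
       belongs to runs opposite to the outline, so under the default nonzero
       fill rule it becomes the hole of the bowl.
       q5,1 6,4: quadratic Bézier with control point (+5,+1) and end point
       (+6,+4), both relative to the current point; it curves from the
       bottom of the bowl into the leg. -->
  <path d="M58,82H53V61h12a6,6 0 1,1 0,12H58V82V70h5.5a2,2 0 1,0 0-5H58v5
           M53,72h9q5,1 6,4l4,6H67l-4-6q-2-3-3-2H58V82"/>
</svg>
```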