Plugins/Plugin XML Format

The Plugin XML Format

Plugins are saved in a simple XML format. Name-value pairs are stored in <entry> elements like:

 <entry>
   <string>plugin_identifier</string>
   <string>edu.emory.library.multimedia.iln</string>
 </entry>

 <entry>
   <string>plugin_version</string>
   <string>3</string>
 </entry>

 <entry>
   <string>au_def_new_content_crawl</string>
   <long>86400000</long>
 </entry>


Most name-value pairs are easy to interpret and easy to match with form options in the plugintool. The au_def_new_content_crawl value defines the time between content recrawls used by the LOCKSS daemon. Times are given in milliseconds, so that:

   31449600000  == 364 days == 52 weeks 
   15724800000  == 182 days == 26 weeks 
    7776000000  == 90 days  == quarter of a year
    1209600000  == 14 days
     604800000  == 7 days   == 1 week 
      86400000  == 1 day
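
The values in this table are plain arithmetic on the number of milliseconds in a day. As a minimal sketch (not part of the LOCKSS tooling; the helper names days_to_ms and weeks_to_ms are made up for illustration), they can be recomputed like this:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86400000 milliseconds in one day


def days_to_ms(days):
    """Return the number of milliseconds in the given number of days."""
    return days * MS_PER_DAY


def weeks_to_ms(weeks):
    """Return the number of milliseconds in the given number of weeks."""
    return days_to_ms(7 * weeks)


# Print the same rows as the table above.
for days in (364, 182, 90, 14, 7, 1):
    print(f"{days_to_ms(days):>12}  == {days} days")
```

The result is the number to place in the <long> element of au_def_new_content_crawl.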

Lists such as rule lists and configuration parameters are formatted as shown below. Both are best viewed and edited with the plugintool, because they use internal constants and complicated formatting in rule patterns and parameter definitions.

 <entry>
   <string>au_crawlrules</string>
   <list>
     <string>4,"^%s", base_url</string>
     <string>1,"^%s.*", base_url</string>
   </list>
 </entry>

and

 <entry>
   <string>plugin_config_props</string>
   <list>
     <org.lockss.daemon.ConfigParamDescr>
       <key>base_url</key>
       <displayName>Base URL</displayName>
       <description>Usually of the form http://&lt;journal-name&gt;.com/</description>
       <type>3</type>
       <size>40</size>
       <definitional>true</definitional>
       <defaultOnly>false</defaultOnly>
     </org.lockss.daemon.ConfigParamDescr>
   </list>
 </entry>


Technical Description

Here, we will document the LOCKSS Plugin XML format in more detail.

Basic Structure

The XML document should consist of a root <map> element containing several <entry> elements. Individual <entry> elements are named according to their purpose by the first <string> child within them.
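
To illustrate this structure, here is a minimal sketch that reads a plugin document with Python's standard xml.etree.ElementTree module and indexes each <entry> by its name. The sample document is assembled from the examples on this page and is not a complete plugin; the function name read_entries is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, incomplete plugin document built from examples on this page.
PLUGIN_XML = """
<map>
  <entry>
    <string>plugin_identifier</string>
    <string>org.metaarchive.AllFromStart</string>
  </entry>
  <entry>
    <string>plugin_version</string>
    <string>1</string>
  </entry>
  <entry>
    <string>au_def_new_content_crawl</string>
    <long>86400000</long>
  </entry>
  <entry>
    <string>au_crawlrules</string>
    <list>
      <string>4,"^%s", base_url</string>
      <string>1,"^%s.*", base_url</string>
    </list>
  </entry>
</map>
"""


def read_entries(xml_text):
    """Map each entry's name (its first <string> child) to its value element."""
    root = ET.fromstring(xml_text.strip())
    entries = {}
    for entry in root.findall("entry"):
        # The first child names the entry; the second child holds its value.
        name_elem, value_elem = list(entry)[:2]
        entries[name_elem.text] = value_elem
    return entries


entries = read_entries(PLUGIN_XML)
for name, value in entries.items():
    print(name, "->", value.tag)
```

The value element's tag (string, long, int, list) tells you how to interpret its content.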

Entry: plugin_config_props

Contains a <list> whose children represent plugin parameters belonging to the plugin, represented by <org.lockss.daemon.ConfigParamDescr> elements. Example:

 <entry>
   <string>plugin_config_props</string>
   <list>
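     <org.lockss.daemon.ConfigParamDescr>...</org.lockss.daemon.ConfigParamDescr>
     <org.lockss.daemon.ConfigParamDescr>...</org.lockss.daemon.ConfigParamDescr>
     <org.lockss.daemon.ConfigParamDescr>...</org.lockss.daemon.ConfigParamDescr>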
   </list>
 </entry>

Element: org.lockss.daemon.ConfigParamDescr

Represents a parameter accepted by the plugin. For example:

 <org.lockss.daemon.ConfigParamDescr>
   <key>base_url</key>
   <displayName>Base URL</displayName>
   <description>Usually of the form http://&lt;journal-name&gt;.com/</description>
   <type>3</type>
   <size>40</size>
   <definitional>true</definitional>
   <defaultOnly>false</defaultOnly>
 </org.lockss.daemon.ConfigParamDescr>
  • key: Machine-readable name for the parameter
  • displayName: Human-readable name for the parameter
  • description: Human-readable description of the parameter, generally used to help AU creators
  • type: The type of data value allowed for the parameter, referring to the following:
    • 1: String
    • 2: Integer
    • 3: URL
    • 4: Year
    • 5: Boolean
    • 6: Positive Integer
    • 7: Range
    • 8: Numeric Range
    • 9: Set
    • 10: User:passwd
    • 11: Long
  • size: The maximum allowed size of the value, presumably in characters. How this is applied to the various data types is currently unknown.
  • definitional: "true" if this parameter is integral to the identity of an AU, "false" otherwise.
  • defaultOnly: Unknown. This is set to "false" in all of our plugins.
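
The field list above can be exercised with a short sketch that parses one parameter element into a Python dictionary and translates the numeric type code into its name. This is an illustration only, not how the LOCKSS daemon reads plugins; the names PARAM_TYPES and parse_param are hypothetical, and the angle brackets in the description are escaped so that the fragment parses as XML.

```python
import xml.etree.ElementTree as ET

# Type codes as listed on this page.
PARAM_TYPES = {
    1: "String", 2: "Integer", 3: "URL", 4: "Year", 5: "Boolean",
    6: "Positive Integer", 7: "Range", 8: "Numeric Range", 9: "Set",
    10: "User:passwd", 11: "Long",
}

PARAM_XML = """
<org.lockss.daemon.ConfigParamDescr>
  <key>base_url</key>
  <displayName>Base URL</displayName>
  <description>Usually of the form http://&lt;journal-name&gt;.com/</description>
  <type>3</type>
  <size>40</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>
"""


def parse_param(elem):
    """Convert a ConfigParamDescr element into a plain dict."""
    text = {child.tag: child.text for child in elem}
    return {
        "key": text["key"],
        "displayName": text["displayName"],
        "description": text["description"],
        "type": PARAM_TYPES[int(text["type"])],
        "size": int(text["size"]),
        "definitional": text["definitional"] == "true",
        "defaultOnly": text["defaultOnly"] == "true",
    }


param = parse_param(ET.fromstring(PARAM_XML.strip()))
print(param["key"], "is a", param["type"], "parameter")
```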

Entry: plugin_version

Contains a <string> made up of a numeric version number for the plugin, generally an integer starting with 1. Example:

 <entry>
   <string>plugin_version</string>
   <string>1</string>
 </entry>

Entry: au_name

Contains a <string> made up of a format string that should be used to represent individual AU definitions in a human-readable way. Example:

 <entry>
   <string>au_name</string>
   <string>"All reachable from BaseUrl/%s/%s", path, start</string>
 </entry>

Entry: au_start_url

Contains a <string> made up of a format string giving the URL at which a crawl of the AU starts. Example:

 <entry>
   <string>au_start_url</string>
   <string>"%s/%s/%s", base_url, path, start</string>
 </entry>
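
The au_name and au_start_url values pair a printf-style format string with a list of parameter keys. The following sketch shows one way such a template could be expanded. It assumes that each %s is replaced, in order, by the value of the named parameter; the function name expand_template and the parameter values are hypothetical, and the daemon's actual expansion logic may differ.

```python
import re


def expand_template(template, params):
    """Expand a '"format", name1, name2' template using AU parameter values."""
    match = re.fullmatch(r'\s*"(.*)"\s*((?:,\s*\w+\s*)*)', template)
    if match is None:
        raise ValueError(f"unrecognised template: {template!r}")
    fmt, names_part = match.groups()
    # Parameter keys follow the quoted format string, separated by commas.
    names = [name.strip() for name in names_part.split(",") if name.strip()]
    # Assumption: every placeholder is %s and values are substituted in order.
    return fmt % tuple(params[name] for name in names)


# Hypothetical AU parameter values, for illustration only.
au_params = {
    "base_url": "http://example.org",
    "path": "journals",
    "start": "index.html",
}

print(expand_template('"%s/%s/%s", base_url, path, start', au_params))
print(expand_template('"All reachable from BaseUrl/%s/%s", path, start', au_params))
```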

Entry: au_manifest

Contains a <string> that indicates where LOCKSS should look for a collection manifest. Example:

 <entry>
   <string>au_manifest</string>
   <string>1,"%s/permission.html", base_url</string>
 </entry>

Entry: au_crawl_depth

Contains an <int> specifying the maximum recursion depth for crawling. We often use 999 to effectively remove the limit, but be careful of circular symbolic links on the server being crawled in this case.

Example:

 <entry>
   <string>au_crawl_depth</string>
   <int>999</int>
 </entry>

Entry: au_def_new_content_crawl

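Contains a <long> giving the time between content recrawls, in milliseconds. The value in the example below corresponds to 84 days (12 weeks). Example: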

 <entry>
   <string>au_def_new_content_crawl</string>
   <long>7257600000</long>
 </entry>

Entry: au_def_pause_time

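Contains a <long>, presumably the pause between successive fetches during a crawl, in milliseconds. Example: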

 <entry>
   <string>au_def_pause_time</string>
   <long>6000</long>
 </entry>

Entry: plugin_name

Contains a <string> with a human-readable name for the plugin. Example:

 <entry>
   <string>plugin_name</string>
   <string>MA All reachable From BaseUrl/Start/Path</string>
 </entry>

Entry: plugin_identifier

Contains a <string> with a fully-qualified, unique name for the plugin. Example:

 <entry>
   <string>plugin_identifier</string>
   <string>org.metaarchive.AllFromStart</string>
 </entry>

Entry: au_crawlrules

Contains a <list> of <string> elements containing crawl rule strings, documented elsewhere.

The meaning of the first number in each crawl rule string is as follows:

  • 1: Include
  • 2: Exclude
  • 3: Include No Match
  • 4: Exclude No Match
  • 5: Include Match, Else Exclude
  • 6: Exclude Match, Else Include
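
To make the action codes concrete, the sketch below evaluates a rule list against a URL. It rests on several assumptions that this page does not confirm: rules are tried in order and the first one that yields a decision wins, a URL that no rule decides is excluded, and parameter values are regex-escaped before being substituted into the pattern. All function names and the base_url value are hypothetical.

```python
import re

INCLUDE, EXCLUDE = "include", "exclude"


def parse_rule(rule, params):
    """Split '<action>,"<pattern>", names...' and fill in parameter values."""
    action, rest = rule.split(",", 1)
    match = re.fullmatch(r'\s*"(.*)"\s*((?:,\s*\w+\s*)*)', rest)
    if match is None:
        raise ValueError(f"unrecognised rule: {rule!r}")
    fmt, names_part = match.groups()
    names = [name.strip() for name in names_part.split(",") if name.strip()]
    # Assumption: values are escaped so that they match literally.
    pattern = fmt % tuple(re.escape(params[name]) for name in names)
    return int(action), re.compile(pattern)


def apply_rule(action, pattern, url):
    """Return INCLUDE, EXCLUDE, or None when the rule makes no decision."""
    matched = pattern.search(url) is not None
    if action == 1:  # Include
        return INCLUDE if matched else None
    if action == 2:  # Exclude
        return EXCLUDE if matched else None
    if action == 3:  # Include No Match
        return None if matched else INCLUDE
    if action == 4:  # Exclude No Match
        return None if matched else EXCLUDE
    if action == 5:  # Include Match, Else Exclude
        return INCLUDE if matched else EXCLUDE
    if action == 6:  # Exclude Match, Else Include
        return EXCLUDE if matched else INCLUDE
    raise ValueError(f"unknown action code: {action}")


def decide(rules, params, url):
    """Apply rules in order; the first decision wins (assumed behaviour)."""
    for rule in rules:
        action, pattern = parse_rule(rule, params)
        result = apply_rule(action, pattern, url)
        if result is not None:
            return result
    return EXCLUDE  # Assumption: undecided URLs are not crawled.


rules = ['4,"^%s", base_url', '1,"^%s.*", base_url']
params = {"base_url": "http://example.org/"}  # hypothetical value

print(decide(rules, params, "http://example.org/issue1/index.html"))
print(decide(rules, params, "http://elsewhere.example.com/page.html"))
```

With the two rules from the au_crawlrules example near the top of this page, the first rule rejects anything outside base_url and the second accepts everything under it.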