Extensions

Pig

Since pig jobs are so common, azkaban comes with an extension to:

  • run pig scripts directly from the command line (and view the output logs from your terminal) with the azkabanpig command. Under the hood, it packages your script along with an appropriately generated job file and uploads the result to Azkaban. Running azkabanpig --help displays the list of available options (using UDFs, substituting parameters, running several scripts in order, etc.).

  • integrate pig jobs easily into your project configuration via the PigJob class, which automatically sets the job type and adds the corresponding script file to the project:

    from azkaban import PigJob
    
    project.add_job('baz', PigJob({'pig.script': 'baz.pig'}))
    

The full API for the PigJob class is below.

class azkaban.ext.pig.PigJob(*options)

Bases: azkaban.job.Job

Convenience job class for running pig scripts.

Parameters: options – Tuple of options (cf. Job). These options must specify a 'pig.script' key. The corresponding file will then automatically be included in the project archive.

This class allows you to specify JVM args as a dictionary by correctly converting these to the format used by Azkaban when building the job options. For example: {'jvm.args': {'foo': 1, 'bar': 2}} will be converted to jvm.args=-Dfoo=1 -Dbar=2. Note that this enables JVM args to behave like all other Job options when defined multiple times (latest values taking precedence).
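The conversion described above can be illustrated with a minimal standalone sketch. It is not the library's implementation: the helper names format_jvm_args and merge_jvm_args are hypothetical and exist only for this example. It shows a dictionary of JVM args being rendered in the -Dkey=value format, with later definitions taking precedence.

```python
def format_jvm_args(jvm_args):
    """Render a dict of JVM properties as a '-Dkey=value ...' string.

    Hypothetical helper: it mimics the conversion described in the text,
    not the code actually used by PigJob.
    """
    return ' '.join('-D%s=%s' % (key, value) for key, value in jvm_args.items())


def merge_jvm_args(*definitions):
    """Merge several JVM arg dicts, later ones overriding earlier ones."""
    merged = {}
    for definition in definitions:
        merged.update(definition)
    return merged


# A single definition, as in the example from the text.
single = 'jvm.args=' + format_jvm_args({'foo': 1, 'bar': 2})

# Two definitions of the same option: the latest value of 'bar' wins.
combined = 'jvm.args=' + format_jvm_args(
    merge_jvm_args({'foo': 1, 'bar': 2}, {'bar': 3})
)

print(single)
print(combined)
```

Keeping the args as a dictionary until the last moment is what makes the override behaviour possible: merging two pre-formatted strings would require parsing them first.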

Finally, by default the job type will be set automatically to 'pig'. You can also specify a custom job type for all PigJob instances in the azkabanpig section of the ~/.azkabanrc configuration file via the default.type option.
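As an illustration of that configuration option, the sketch below parses a sample configuration fragment with Python's standard configparser module. It assumes the file uses INI syntax; the value mypig is a made-up job type, and default_pig_type is a hypothetical helper for this example, not a function of the library.

```python
import configparser

# Sample contents for ~/.azkabanrc (assumed INI syntax; 'mypig' is a
# made-up custom job type used only for illustration).
SAMPLE_RC = """
[azkabanpig]
default.type = mypig
"""


def default_pig_type(rc_text, fallback='pig'):
    """Return the configured default job type, or 'pig' if none is set.

    Hypothetical helper: it only illustrates the lookup described in the
    text and is not how the library itself reads its configuration.
    """
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    return parser.get('azkabanpig', 'default.type', fallback=fallback)


print(default_pig_type(SAMPLE_RC))  # custom type from the sample fragment
print(default_pig_type(''))         # no configuration: falls back to 'pig'
```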

on_add(project, name, **kwargs)

This handler adds the corresponding script file to the project.