GeoKettle

What is GeoKettle?

GeoKettle is a "spatially-enabled" version of Pentaho Data Integration (Kettle). Pentaho Data Integration (Kettle) is a powerful, metadata-driven ETL (Extract, Transform and Load) tool dedicated to the integration of different data sources for building data warehouses. It is part of the open source BI (Business Intelligence) software suite designed by Pentaho.

This special distribution of Kettle includes extensions that enable the use of geospatial (GIS) data. Like Kettle, GeoKettle is released under the GNU Lesser General Public License (LGPL).

GeoKettle is developed by the GeoSOA research group (headed by Prof. Thierry Badard, http://geosoa.scg.ulaval.ca) of the Department of Geomatics Sciences at Laval University, Quebec City, Quebec, Canada.

The GeoKettle development team is composed of:

What is geospatial data?

Geospatial data is used to locate geographic features on a map. It is used mainly in Geographic Information Systems (GIS) to create maps and perform spatial analysis. Geospatial data can be classified into two main categories: raster data, which is composed of bitmap images covering an area of the earth's surface (e.g. satellite or aerial imagery), and vector data, in which individual features are represented by vector-based geometric primitives (such as points, lines and polygons). For example, a road can be represented on a map as a series of connected line segments (often called a "LineString").
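To make the vector model concrete, here is a minimal Python sketch of a road stored as a LineString, that is, an ordered list of vertices. It is an illustration only and not GeoKettle code: the coordinates are made up, the function names are hypothetical, and the text output uses the Well-Known Text (WKT) notation commonly used to exchange vector geometries.

```python
from math import hypot

# Hypothetical road: (x, y) vertices in an arbitrary planar coordinate system.
road = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]


def linestring_length(coords):
    """Sum the lengths of the consecutive segments of a LineString."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )


def to_wkt(coords):
    """Serialize the vertices as a Well-Known Text (WKT) LINESTRING."""
    return "LINESTRING (" + ", ".join(f"{x:g} {y:g}" for x, y in coords) + ")"


print(to_wkt(road))             # LINESTRING (0 0, 3 4, 6 8)
print(linestring_length(road))  # 10.0
```

The road above has two segments of length 5 each, hence a total length of 10 in the units of the coordinate system.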

You may have to deal with geospatial data if your organization uses GIS software (e.g. ESRI ArcGIS or MapInfo) or has to handle spatial data in a GIS file format (e.g. Shapefile), an XML encoding (e.g. GML, KML) or a spatial DBMS (e.g. PostGIS, Oracle Spatial). From an ETL perspective, you may want to automate the transformation and loading of geospatial data from heterogeneous sources into a database. Increasingly, business intelligence applications also rely on geospatial data to enhance the user experience (e.g. map displays, end-user software such as Spatial OLAP) and to provide location-aware analysis. This creates a need for spatially-enabled ETL tools that support the extraction of geospatial data from various sources, the transformation of this data (including spatial analysis functions that operate on the geometry of geographic features) and its loading into a spatially-enabled data warehouse.

GeoKettle aims to fulfill these requirements. It offers the full range of functionality of Pentaho Data Integration (Kettle), and extends it with a new "Geometry" data type for geospatial vector data. It also features input/output support for GIS file formats and spatial DBMS, spatial analysis functions (e.g. topological predicates) and scripting support (with JavaScript) for Geometry objects.
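As an illustration of what a topological predicate computes, the following Python sketch tests whether a point lies inside a polygon using the classic ray-casting technique. It is not GeoKettle's implementation and does not use its Geometry type; the function name and the sample coordinates are hypothetical, and points lying exactly on the polygon boundary are not treated specially.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the polygon ring.

    `ring` is a list of (x, y) vertices; the last vertex is implicitly
    connected back to the first one.
    """
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square parcel and two test locations.
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon((2.0, 2.0), parcel))  # True
print(point_in_polygon((5.0, 2.0), parcel))  # False
```

In an ETL flow, a predicate of this kind is typically used as a filter or join condition, for example to keep only the records whose location falls within a given area.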

Using GeoKettle

GeoKettle can be used in exactly the same way as Pentaho Data Integration (PDI). Please refer to the PDI user documentation included in this distribution.

Demo transformations showing the use of the geospatial extensions are also included in this distribution, in the samples/transformations/geokettle directory. The new features for geospatial data are documented here:


Documentation

Documentation is included in both the binary and source packages of GeoKettle. After extracting the archive, see the docs subdirectory.

You can also watch a demo video that illustrates GeoKettle's capabilities. The video is available for download in Flash and WMV formats.

Finally, all material presented and used during the GeoKettle lab/tutorial session held at OGRS 2009 in Nantes, France, on July 9, 2009 is available below:

What's new?


Since release 3.1.0-20081103:
Since release 2.5.2-20080531:

Upcoming features

The following features are not yet supported in GeoKettle, but are planned for future releases:


License and copyright

Like Pentaho Data Integration, GeoKettle is distributed under the GNU Lesser General Public License (LGPL). Included libraries (GeoTools, JTS, PostGIS driver wrapper) are also licensed under the LGPL (or a compatible license). Some other libraries (JDBC drivers, Oracle SDOAPI) are closed source but are included in binary form in accordance with their respective end-user licenses. Please refer to the included LICENSE.txt file for details.

The GeoKettle extensions are Copyright (c) 2007-2009, GeoSOA research group, Department of Geomatics Sciences, Laval University, Quebec, Canada.

Pentaho Data Integration (Kettle) is Copyright (c) 2007-2009, Pentaho Corporation.

Contact and mailing lists

For future releases and more information, visit us at http://www.geokettle.org.

All comments or questions about GeoKettle are welcome! Two mailing lists are available:
To subscribe to or unsubscribe from the lists, please visit:

https://lists.sourceforge.net/lists/listinfo/geokettle-users
or
https://lists.sourceforge.net/lists/listinfo/geokettle-devel

How to get involved?

There is a lot of work to do on a project like GeoKettle, and your help will be greatly appreciated. We gladly welcome contributions to its development and implementation, as well as feedback on its use.

Nevertheless, it is often hard for new developers or users to work out where they can help. To begin with, we suggest that you subscribe to the mailing lists. Listen in for a while to learn how others make contributions.

You can get a local working copy of the latest code. Review the todo list and choose a task, or pick something you have noticed that needs to be corrected. Make the changes, test them, generate a patch, and post it to the devel mailing list.

Documentation writers and translators are usually the most sought-after contributors, so if you would like to help but are not familiar with the innermost technical details, don't worry: we have work for you! ;-)

Download

Below is a list of different versions released by the GeoKettle project. You can either download a binary version for direct use of GeoKettle or the source package in order to develop new steps or extend GeoKettle.

SNAPSHOT versions are intermediate releases. They precede an official release of GeoKettle and are available only on the project page (not on SourceForge). A SNAPSHOT version includes important bug fixes and is made available for the convenience of the community. The r### tag in the filename refers to the revision in the Subversion repository on which this version is based.

NOTE: GeoKettle is still in beta. Some features are incomplete or not fully tested, and it is not recommended for use in a production environment. By using this software, you acknowledge and assume all the risks involved. The authors are not responsible for any consequences or losses related to the use of this software.

SVN repository and issue tracking system

The GeoKettle project is now available on SourceForge at:

http://sourceforge.net/projects/geokettle

A Subversion repository has been set up for direct access to the latest version of the GeoKettle sources. It can be checked out with the following command:

svn co https://geokettle.svn.sourceforge.net/svnroot/geokettle geokettle

This is a generic Subversion checkout command that pulls all modules, tags and branches of the project. To check out only the trunk (the main development line), append '/trunk' to the HTTPS URL above.

You can also browse the SVN repository directly at:

http://geokettle.svn.sourceforge.net/viewvc/geokettle

An issue tracking system is also available online at:

http://sourceforge.net/tracker/?group_id=262140&atid=1127707

Acknowledgments

We would like to recognize the contributions to GeoKettle from the following organizations and people:

Community

GeoKettle is now used worldwide! Below is a short list of companies, governmental bodies, not-for-profit organisations, labs and universities that use GeoKettle or that develop, or plan to develop, packaged solutions based on it.

This list is under construction! If you are not on the list and you want to be added, please do not hesitate to contact us.