Pentaho Data Integration 4 Cookbook

Overview of this book

Pentaho Data Integration (PDI, also called Kettle), one of the leading data integration tools, is broadly used for all kinds of data manipulation, such as migrating data between applications or databases, exporting data from databases to flat files, data cleansing, and much more. Do you need quick solutions to the problems you face while using Kettle? Pentaho Data Integration 4 Cookbook explains Kettle features in detail through clear and practical recipes that you can quickly apply to your solutions. The recipes cover a broad range of topics, including processing files, working with databases, understanding XML structures, integrating with the Pentaho BI Suite, and more. The initial chapters explain the details of working with databases, files, and XML structures. Then you will see different ways of searching data, executing and reusing jobs and transformations, and manipulating streams. Further, you will learn the available options for integrating Kettle with other Pentaho tools. The book has plenty of recipes with easy step-by-step instructions to accomplish specific tasks, along with examples and code that are ready for adaptation to individual needs.

Introduction


Databases are broadly used by organizations to store and administer transactional data such as customer service history, bank transactions, purchases and sales, and so on. They also constitute the storage method for data warehouses, the repositories used in Business Intelligence solutions.

In this chapter, you will learn to deal with databases in Kettle. The first recipe tells you how to connect to a database, which is a prerequisite for all the other recipes. The rest of the chapter teaches you how to perform different operations and can be read in any order according to your needs.

Note

The focus of this chapter is on relational databases (RDBMS). Thus, the term database is used as a synonym for relational database throughout the recipes.

Sample databases

Throughout the chapter you will use a couple of sample databases. You can create and load them by running the scripts available at the book's website. The scripts are ready to run under MySQL.

Note

If you work with a different DBMS you may have to modify the scripts slightly.
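As a rough illustration of the kind of modification this note refers to, the following Python sketch strips two MySQL-specific constructs from a DDL statement: backtick-quoted identifiers and table options such as ENGINE. The function name make_portable and the authors table are hypothetical, made up for this example and not taken from the book's scripts; a real script may need other changes (data types, auto-increment columns) depending on the target DBMS.

```python
import re


def make_portable(ddl: str) -> str:
    """Strip a few MySQL-only constructs from a DDL string (illustrative only)."""
    # MySQL quotes identifiers with backticks; standard SQL uses double quotes.
    ddl = re.sub(r"`([^`]*)`", r'"\1"', ddl)
    # Table options such as ENGINE=InnoDB or DEFAULT CHARSET=utf8 are MySQL-specific.
    ddl = re.sub(
        r"\s*\b(ENGINE|DEFAULT CHARSET)\s*=\s*\w+", "", ddl, flags=re.IGNORECASE
    )
    return ddl


# Hypothetical statement, assumed for the example; not from the book's scripts.
sample = (
    "CREATE TABLE `authors` (`id` INT, `name` VARCHAR(50)) "
    "ENGINE=InnoDB DEFAULT CHARSET=utf8;"
)
portable = make_portable(sample)
print(portable)
```

Treat this as a starting point only: always review the converted script against the target DBMS documentation before running it.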

For more information about the structure of the sample databases and the meaning of the tables and fields, please refer to Appendix, Data Structures. Feel free to adapt the recipes to different databases. You could try some well-known databases, for example, Foodmart (available as part of the Mondrian distribution at http://sourceforge.net/projects/mondrian/) or the MySQL sample databases (available at http://dev.mysql.com/doc/index-other.html).

Pentaho BI platform databases

The sample databases used in this chapter include the Pentaho BI Platform Demo databases. The Pentaho BI Platform Demo is a pre-configured installation that lets you explore the capabilities of the Pentaho platform. It relies on the following databases:

Database name    Description
hibernate        Administrative information, including user authentication and authorization data.
quartz           Repository for Quartz, the scheduler used by Pentaho.
sampledata       Data for Steel Wheels, a fictional company that sells all kinds of scale replicas of vehicles.

By default, all those databases are stored in Hypersonic (HSQLDB). The script for creating the databases in HSQLDB can be found at http://sourceforge.net/projects/pentaho/files. Under Business Intelligence Server | 1.7.1-stable look for pentaho_sample_data-1.7.1.zip.

It can also be found at svn://ci.pentaho.com/view/Platform/job/bi-platform-sample-data/.

These databases can be stored in other DBMSs as well. Scripts for creating and loading them in other popular DBMSs, such as MySQL or Oracle, can be found in Prashant Raju's blog, at http://www.prashantraju.com/projects/pentaho/.

Besides the scripts, you will find instructions for creating and loading the databases.

Tip

Prashant Raju, an expert Pentaho developer, provides several excellent tutorials related to the Pentaho platform. If you are interested in knowing more about Pentaho, it's worth taking a look at his blog.
