In-Memory Analytics with Apache Arrow

By: Matthew Topol

Overview of this book

Apache Arrow is designed to accelerate analytics and make it easy to exchange data across big data systems. In-Memory Analytics with Apache Arrow begins with a quick overview of the Apache Arrow format, then helps you understand Arrow’s versatility and benefits as you walk through a variety of real-world use cases. You'll cover key tasks such as enhancing data science workflows with Arrow, using Arrow and Apache Parquet with Apache Spark and Jupyter for better performance and hassle-free data translation, and working with Perspective, an open-source interactive graphical and tabular analysis tool for browsers. As you advance, you'll explore the different data interchange and storage formats and become well-versed in the relationships between Arrow, Parquet, Feather, Protobuf, FlatBuffers, JSON, and CSV. In addition to understanding the basic structure of the Arrow Flight and Flight SQL protocols, you'll learn how Dremio uses Apache Arrow to enhance SQL analytics and discover how Arrow can be used in web-based browser apps. Finally, you'll get to grips with the upcoming features of Arrow to help you stay ahead of the curve. By the end of this book, you will have all the building blocks to create useful, efficient, and powerful analytical services and utilities with Apache Arrow.
Table of Contents (16 chapters)
  • Section 1: Overview of What Arrow Is, its Capabilities, Benefits, and Goals
  • Section 2: Interoperability with Arrow: pandas, Parquet, Flight, and Datasets
  • Section 3: Real-World Examples, Use Cases, and Future Development

Arrow format versioning and stability

To give users confidence that updating the version of the Arrow library in use won't break their applications, and to support the long-term stability of the Arrow project, two versions are used to describe each release of the project: the format version and the library version. Different library implementations and releases can have different library versions, but each one always implements a specific format version. From version 1.0.0 onward, releases use semantic versioning.

Provided the major version of the format is the same between two libraries, a newer library is backward-compatible with any older library: it can read data and metadata produced by the older library. An increase in the minor version of the format, such as from version 1.0.0 to version 1.1.0, indicates that new features were added. As long as these new features (such as new logical types or physical layouts) are not used, older libraries will be able to read data and metadata produced by newer versions of the libraries.
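The compatibility rules in the preceding paragraph can be sketched as a small decision function. This is only an illustration of the rules as stated here, not part of any Arrow library: the names FormatVersion, parse_version, and read_is_guaranteed are hypothetical, and the uses_new_features flag is an assumption standing in for "the data relies on a logical type or physical layout added in the newer minor version".

```python
from typing import NamedTuple


class FormatVersion(NamedTuple):
    """Hypothetical (major, minor, patch) triple for an Arrow format version."""
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> FormatVersion:
    """Parse a semantic version string such as '1.1.0'."""
    major, minor, patch = (int(part) for part in text.split("."))
    return FormatVersion(major, minor, patch)


def read_is_guaranteed(reader: FormatVersion,
                       writer: FormatVersion,
                       uses_new_features: bool = False) -> bool:
    """Return True if the rules above guarantee the reader can read the data.

    reader: format version implemented by the library reading the data.
    writer: format version implemented by the library that produced it.
    uses_new_features: assumption for this sketch; True if the data uses
        features introduced after the reader's minor version.
    """
    # The guarantees only hold within the same major format version.
    if reader.major != writer.major:
        return False
    # Backward compatibility: a newer (or equal) reader can read older data.
    if reader.minor >= writer.minor:
        return True
    # Forward compatibility: an older reader can read newer data only
    # when none of the newer features were used.
    return not uses_new_features


if __name__ == "__main__":
    old, new = parse_version("1.0.0"), parse_version("1.1.0")
    print(read_is_guaranteed(reader=new, writer=old))
    print(read_is_guaranteed(reader=old, writer=new))
    print(read_is_guaranteed(reader=old, writer=new, uses_new_features=True))
```

Note that a False result here means only that the stated guarantees do not apply, not that reading will necessarily fail.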

As for the long-term stability of the format and libraries, only an increase in the major version of the format would indicate a break in the compatibility guarantees described above. The Arrow project states that it does not expect this to happen frequently; it would be an exceptional event, and such a release should be deployed with caution. As a result of these compatibility guarantees, it is safe and simple to maintain backward and forward compatibility when using the Arrow libraries and format.
