
Mastering Java Machine Learning
By :

To learn from data, we must be able to understand and manage data in all forms. Data originates from many different sources, and consequently, datasets may differ widely in structure or have little or no structure at all. In this section, we present a high-level classification of datasets with commonly occurring examples.
Based on their structure, or the lack thereof, datasets may be classified as containing the following:
Financial card transactional data with labels of fraud
Market dataset for items bought from grocery store
Sample text data, with no discernible structure, hence unstructured. Separating spam from normal messages (ham) is a binary classification problem. Here true positives (spam) and true negatives (ham) are distinguished by their labels, the second token in each instance of data. SMS Spam Collection Dataset (UCI Machine Learning Repository), source: Tiago A. Almeida from the Federal University of Sao Carlos.
Time series from sensor data
Three genomic sequences are taken into consideration to show the repetition of the sequences CGGGT
and TTGAAAGTGGTG
in all three genomic sequences:
Genomic sequences of DNA as a sequence of symbols.
Insurance claim data, converted into a graph structure showing the relationship between vehicles, drivers, policies, and addresses