The structure of data


The structure of the data is a crucial factor in deciding the choice and design of a database. There are three widely recognized categories:

  • Structured
  • Semi-structured
  • Unstructured

Structured data

This type of data is typically composed of rows and columns; rows are entities or records and columns are attributes. Structured data is organized in such a way that you can be sure that the data structure will be consistent for the most part throughout the life cycle of that data, except for the possible addition or removal of some attributes altogether. This kind of data is mostly transactional or analytical.

Semi-structured data

Semi-structured data does not follow a fixed tabular format – that is, a column-row structure. Instead, it stores schema attributes along with data. The attributes for semi-structured data could vary for each record. The major differentiating factor for each kind of semi-structured data is the way they are accessed.

Some common examples and sources of semi-structured data include XML data, JSON, email data, files, and web pages. Here is an example of semi-structured data:

Unstructured data

Unstructured data includes images, audio files, and so on. Unstructured data does not have a definite schema or data model. The amount of unstructured data is much larger than that of structured data. So, the methods by which we store such data are more important than ever. Here are some examples of unstructured data:

  • Text
  • Audio
  • Video
  • Images
  • Other binary large objects (BLOBs)