Structuring Data

Structured Data

Easier to analyze, but harder to scale because a schema must be defined up front

  • Follows a schema
  • Defined data types & relationships
  • e.g., tables in a relational (SQL) database
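
A minimal sketch of what "follows a schema" means in practice, using Python's built-in sqlite3 module. The table and column names (customers, orders) are made up for illustration. The point is that types and relationships are declared before any data arrives, and the database rejects rows that break them.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Data types and constraints are fixed before any row is stored.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    # The REFERENCES clause declares the relationship between the tables.
    conn.execute(
        """CREATE TABLE orders (
               id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customers(id),
               total REAL NOT NULL
           )"""
    )
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (1, 1, 19.99)")
    return conn


def try_orphan_order(conn: sqlite3.Connection) -> bool:
    """Try to insert an order for a customer that does not exist."""
    try:
        conn.execute(
            "INSERT INTO orders (id, customer_id, total) VALUES (2, 999, 5.0)"
        )
        return True
    except sqlite3.IntegrityError:
        # The schema rejected the row: customer 999 is not in customers.
        return False
```

The up-front constraints are what make the data easy to analyze later, and also what make the schema costly to change.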

Unstructured Data

  • Schemaless
  • Makes up most of the world's data
  • e.g., photos, chat logs, MP3 files

Semi-structured Data

  • Does not follow a rigid, overarching schema
  • Self-describing structure
  • e.g., XML, JSON, documents in NoSQL stores
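
A small illustration of "self-describing", assuming two made-up JSON records. Each record carries its own field names, so two records in the same collection can have different fields without any shared schema being declared.

```python
import json

# Hypothetical records: same collection, different fields.
RAW_RECORDS = [
    '{"id": 1, "name": "Ada", "email": "ada@example.com"}',
    '{"id": 2, "name": "Lin", "tags": ["vip", "beta"]}',
]


def field_names(raw):
    """Return the field names a single JSON record declares about itself."""
    return sorted(json.loads(raw).keys())


if __name__ == "__main__":
    for raw in RAW_RECORDS:
        print(field_names(raw))
```

Nothing outside the records says which fields exist. The structure is discovered by reading each record.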

Databases

Traditional Databases

  • OLTP (Online Transaction Processing)

Data Warehouses

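  • OLAP (Online Analytical Processing)
  • Effectively read-only for users (loaded in batches, then queried)
  • ETL (Extract, Transform, Load)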
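
A rough sketch of the ETL order of operations, using only the standard library. The CSV content, the sales_by_region table, and the function names are assumptions made for illustration, not a real pipeline. The data is cleaned and aggregated before it reaches the warehouse table.

```python
import csv
import io
import sqlite3

# Hypothetical export from an operational (OLTP) system.
SOURCE_CSV = "order_id,region,amount\n1,north, 10.50\n2,south,4.00\n3,north,2.25\n"


def extract(text):
    """Extract: read raw rows out of the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize region names and total the amounts per region."""
    totals = {}
    for row in rows:
        region = row["region"].strip().upper()
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(conn, rows):
    """Load: write the already-shaped rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(SOURCE_CSV)))
    print(warehouse.execute("SELECT * FROM sales_by_region").fetchall())
```

Because the transform step runs before the load, only data that fits the warehouse schema is ever stored.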

Data Marts

Dedicated to a specific topic

Data Lakes

  • Big Data
  • All types of data (structured, semi-structured, unstructured)
  • Schema-on-read
  • ELT (Extract, Load, Transform)
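
A minimal sketch of schema-on-read and the ELT ordering, assuming a list of raw strings stands in for the lake. The field names (user, amount) and the function read_with_schema are hypothetical. Raw data is loaded untouched, and a schema is applied only when someone reads it.

```python
import json

# "Load" has already happened: raw events sit in the lake exactly as received.
RAW_LAKE = [
    '{"user": "ada", "amount": "12.50", "ts": "2024-01-05"}',
    '{"user": "lin", "amount": 3, "note": "refund pending"}',
    "not even json",
]


def read_with_schema(raw_lines):
    """Transform at read time: keep records that fit (user: str, amount: float)."""
    rows = []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unreadable for this schema; the raw line stays in the lake
        if not isinstance(record, dict):
            continue
        if "user" not in record or "amount" not in record:
            continue
        rows.append({"user": str(record["user"]), "amount": float(record["amount"])})
    return rows


if __name__ == "__main__":
    print(read_with_schema(RAW_LAKE))
```

A different reader could apply a different schema to the same raw lines, which is the flexibility ELT trades for doing the cleanup work at query time.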