Structuring Data
Structured Data
- Easier to analyze, but less flexible to scale because a schema must be defined up front
- Follows a schema
- Defined data types & relationships
- e.g., tables in a relational (SQL) database
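As a minimal sketch of what "follows a schema" means in practice, the example below uses Python's built-in sqlite3 module with two hypothetical tables, customers and orders (these names and columns are illustrative assumptions, not part of the notes). The declared types and the foreign-key relationship let the database reject a row that violates the schema.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# The schema declares data types and a relationship between the tables.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    )
    """
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 19.99)")


def try_insert_orphan_order(connection):
    """Try to insert an order for a customer that does not exist."""
    try:
        connection.execute("INSERT INTO orders (customer_id, total) VALUES (999, 5.0)")
        return True
    except sqlite3.IntegrityError:
        # The relationship defined in the schema rejects the row.
        return False


schema_rejected_orphan = not try_insert_orphan_order(conn)
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The rejected insert illustrates the trade-off noted above: the schema makes the data predictable to query, at the cost of having to define it before any data is stored.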
Unstructured Data
- Schemaless
- Makes up most of the world's data
- e.g., photos, chat logs, MP3 audio files
Semi-structured Data
- Does not follow a rigid, overarching schema
- Self-describing structure (field names or tags travel with the data)
- e.g., XML, JSON, documents in NoSQL stores
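To illustrate "self-describing", the sketch below parses two hypothetical JSON records with Python's json module. The field names and values are made up for illustration. Each record carries its own field names, and the two records do not share the same set of fields, which a fixed table schema would not allow without changes.

```python
import json

# Two hypothetical records; the second has fields the first lacks.
raw = """
[
  {"id": 1, "name": "Ada", "email": "ada@example.com"},
  {"id": 2, "name": "Lin", "tags": ["vip", "beta"], "address": {"city": "Oslo"}}
]
"""

records = json.loads(raw)

# The structure is discovered from the data itself, record by record.
field_sets = [sorted(record.keys()) for record in records]

# Readers must tolerate missing fields instead of relying on a fixed schema.
cities = [record.get("address", {}).get("city") for record in records]
```

Here field_sets differs between the two records, and cities contains None for the record that has no address.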
Databases
Traditional Databases
- OLTP (Online Transaction Processing)
Data Warehouses
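- OLAP (Online Analytical Processing)
- Mostly read-only: queried for analysis rather than updated row by row
- ETL (Extract, Transform, Load)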
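The ETL order of operations can be sketched as follows. This is a toy illustration under stated assumptions: the source rows, the fact_orders table, and the extract/transform/load function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse. The point is that data is cleaned and typed before it is loaded.

```python
import sqlite3

# Hypothetical raw rows, as they might arrive from an operational system.
raw_rows = [
    {"order_id": "1", "amount": "19.99", "country": "us"},
    {"order_id": "2", "amount": "5.00", "country": "DE"},
    {"order_id": "3", "amount": "n/a", "country": "de"},  # bad amount
]


def extract():
    """Extract: read rows from the source system (here, a list)."""
    return raw_rows


def transform(rows):
    """Transform: cast types, normalize values, drop rows that fail."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append((int(row["order_id"]), amount, row["country"].upper()))
    return clean


def load(conn, rows):
    """Load: write the already-clean rows into a typed warehouse table."""
    conn.execute(
        "CREATE TABLE fact_orders ("
        "order_id INTEGER PRIMARY KEY, amount REAL NOT NULL, country TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))

# An analytical (OLAP-style) query over the loaded data.
totals = warehouse.execute(
    "SELECT country, ROUND(SUM(amount), 2) FROM fact_orders "
    "GROUP BY country ORDER BY country"
).fetchall()
```

Because the transform step runs first, the row with the non-numeric amount never reaches the warehouse table.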
Data Marts
- Dedicated to a specific topic
Data Lakes
- Big Data
- Holds all types of data: structured, semi-structured, and unstructured
- Schema-on-read
- ELT (Extract, Load, Transform)
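Schema-on-read and ELT can be sketched in the same toy style. In this hypothetical example, a Python list stands in for the lake's storage, and the load_raw and read_orders names are assumptions made for illustration. Raw lines are loaded unchanged, and a schema is applied only when a reader asks for a particular view of the data.

```python
import json

# A list stands in for the data lake's storage (hypothetical).
lake = []


def load_raw(line):
    """Load: store the raw line as-is, with no validation or typing."""
    lake.append(line)


# Extract + Load: everything goes in, whatever its shape.
for line in [
    '{"order_id": "1", "amount": "19.99"}',
    '{"order_id": "2", "amount": "5.00", "coupon": "SPRING"}',
    "sensor=7;temp=21.5",  # not JSON at all, still stored
]:
    load_raw(line)


def read_orders(raw_lines):
    """Transform on read: apply an 'orders' schema to whatever fits it."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # this line does not match the schema being read
        if isinstance(record, dict) and "order_id" in record and "amount" in record:
            yield {
                "order_id": int(record["order_id"]),
                "amount": float(record["amount"]),
            }


orders = list(read_orders(lake))
```

All three lines remain in the lake; only the reader decides that two of them are orders. A different reader could apply a different schema to the same raw data.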