13.01 |Lecture 1 | [Introduction to Big Data Platforms](lecturenotes/pdfs/module0-lecture1-0-intro-v0.2.pdf) | **Linh Truong**
14.01 |Tutorial 1 | [Some industrial and open source big data platforms for your tech radar](lecturenotes/pdfs/tutorial-walkaround-techradar-v0.2.pdf)| **Linh Truong**
20.01 |Lecture 2|Architecting Big Data Platforms| **Linh Truong**
20.01 |Lecture 2|Architecting Big Data Platforms. Additional slides: *[Cloud Infrastructures for Big Data Platforms](lecturenotes/pdfs/module1-lecture2-1-cloudinfrastructuresandservices-v0.2.pdf) and [a Recap on Performance, Dependability, and Fault Tolerance in Distributed Systems](lecturenotes/pdfs/performance-dependability-refresh_Truong.pdf)*| **Linh Truong**
21.01 |Meetup 1 | A taste of Big Data Platforms | **Rohit Raj**, Amanda Chen, Eljon Harlicaj (from the student view)
27.01 |Lecture 3 | Service and Integration Models in Big Data Platforms| **Linh Truong**
28.01 |Meetup 2 | How to succeed on assignments in Big Data Platforms?| **Amanda Chen**, Rohit Raj, Eljon Harlicaj (from the student experience)
@@ -71,6 +71,7 @@ Use nodetool to check Cassandra nodes in different data centers.
'replication_factor' : 3
};
```
>Here you should see the replication_factor set to **3**. This means that each data item (a row in a table) is replicated to **3** nodes. Since our test system has three nodes, every data item is available on all nodes.
5. Let's create a table named _bird1234_ inside this keyspace
```
$CREATE TABLE tutorial12345.bird1234 (
...
...
@@ -105,7 +106,7 @@ Use nodetool to check Cassandra nodes in different data centers.
```
11. Repeat steps 1, 3 and 7 for this container. You should get the same data
>This shows that the data was correctly replicated across all our nodes and configuration was correct. Apache Cassandra has a lot of different configurations that were not covered in this tutorial and these can be found in cassandra's [documentation](https://cassandra.apache.org/doc/latest/configuration/index.html).
>Remember that we have **replication_factor==3**, so a data item is replicated to 3 nodes. This shows that the data was correctly replicated across all our nodes and that the configuration was correct. Apache Cassandra has many configuration options that were not covered in this tutorial; they can be found in Cassandra's [documentation](https://cassandra.apache.org/doc/latest/configuration/index.html).
The goal of this tutorial is to study consistency support in big databases through the case of Cassandra. The focus is on understanding the consistency features provided by the system and those programmed by the developer, and how they influence performance and data accuracy.
>We can only play with simple examples during the tutorial, so you should do further hands-on work to understand this subject.
The consistency level is associated with an operation (e.g. a query). Whether it can be met depends on *the configured replication_factor* (the number of replicas per data item) and *the available nodes* at runtime.
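To make the relationship between the replication factor, the available nodes, and the consistency level concrete, here is a minimal Python sketch. It is not Cassandra code: the helper names `quorum`, `required_replicas`, and `can_satisfy` are hypothetical, and it assumes a single data center and only the levels `ONE`, `TWO`, `THREE`, `QUORUM`, and `ALL`. It uses the usual quorum rule, `floor(replication_factor / 2) + 1`.

```python
# Illustrative sketch only; these helpers are hypothetical and not part of Cassandra.
# Assumption: a single data center, so only one replication_factor is involved.

def quorum(replication_factor: int) -> int:
    """Number of replicas forming a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_replicas(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge an operation at the given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level in this sketch: {level}")


def can_satisfy(level: str, replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas of the data item are alive to meet the level."""
    return live_replicas >= required_replicas(level, replication_factor)


if __name__ == "__main__":
    rf = 3  # matches replication_factor = 3 used in the tutorial keyspace
    for live in (3, 2, 1):
        for level in ("ONE", "QUORUM", "ALL"):
            print(f"RF={rf} live={live} {level}: {can_satisfy(level, rf, live)}")
```

With `replication_factor` 3, losing one replica still leaves `QUORUM` operations possible (2 of 3 replicas), while `ALL` operations would fail until the node returns.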
## 1. Setup Cassandra
The Cassandra cluster under test is set up on Google Cloud Platform with 5 nodes, using [Bitnami Cassandra images](https://docs.bitnami.com/google/infrastructure/cassandra/). In this tutorial, three of the nodes can be accessed from outside the cluster:
...
...
@@ -78,7 +81,7 @@ Using **cqlsh**
mybdp@cqlsh>SELECT * from tutorial12345.bird1234;
```
#### Access data from another Aassandra node
#### Access data from another Cassandra node
Assume that you open a new terminal and connect to the cluster using **Node2** or **Node3**: