This request is used by clients, and I think it is the most important and most frequently used request for them, so it's better to return the data quickly than to make clients wait a long time.
+1 for Availability + Partition tolerance
Let's simplify the task. Let's imagine that only an employee of housing and communal services can add a trashcan. That means we won't have any collisions, and merging the data will be easy.
If the idea is to guarantee availability + partition tolerance, we will have a problem: if one person saves a trashcan to one machine and another person saves one to a different machine, users viewing the nearest trashcans can see inconsistent data until the machines synchronize.
HERE IS THE ISSUE OF INCONSISTENCY
In the API description we define that a trashcan has the following properties:
{
  plastic: boolean,
  glass: boolean,
  paper: boolean
}
Each true value means that the trashcan accepts trash of that type. If a value is false, that type is either full or not available at all.
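To make the meaning of the flags concrete, here is a minimal sketch in TypeScript. The names TrashcanState and acceptedTypes are hypothetical and are not part of the API description; only the three boolean properties come from it.

```typescript
// Hypothetical type mirroring the three properties from the API description.
interface TrashcanState {
  plastic: boolean;
  glass: boolean;
  paper: boolean;
}

type WasteType = keyof TrashcanState;

// Returns the waste types the trashcan currently accepts (the flags set to true).
// A false flag means "full or not available", so that type is left out.
function acceptedTypes(state: TrashcanState): WasteType[] {
  const all: WasteType[] = ["plastic", "glass", "paper"];
  return all.filter((t) => state[t]);
}

const exampleState: TrashcanState = { plastic: true, glass: false, paper: true };
const accepted = acceptedTypes(exampleState); // plastic and paper, but not glass
```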
There are two possible cases:
- 3.1. The request is made by an employee, or the trashcan tracks its state itself
This means there is only one source of this request for a particular trashcan, so there are no conflicts for that trashcan. Moreover, updating one trashcan doesn't affect other trashcans, so we have consistency at the level of individual trashcans.
But if something goes wrong inside the trashcan logic that is responsible for tracking its state, we could decrease availability, since the trashcan is the only one that can update it.
So we could consider the second option.
- 3.2. Allow users to update the trashcan state
When users start to update the trashcan properties, this can cause data discrepancies between machines. The issue is how to merge the data and what to do if the machines lose connection. What will we show users on a GET request?
So the service looks like an Availability + Partition tolerance service.
The data will be partitioned across multiple domains: in our case, one domain is one city/area.
The main characteristics of the domain:
- data inside the single domain is replicated -> availability for each domain;
- we aren't allowed to make cross-domain requests -> that's ok, we don't need this;
- writes are consistent within each domain.
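The "one domain is one city/area" idea can be sketched as a small routing helper. The naming scheme (a trashcans- prefix plus a sanitized area identifier) and the function names are assumptions made for illustration, not something the design above fixes.

```typescript
// Hypothetical naming scheme: one domain per city/area.
function domainForArea(areaId: string): string {
  // Keep only characters assumed to be safe in a domain name:
  // letters, digits, '_', '-' and '.'; everything else becomes '-'.
  const safe = areaId.trim().toLowerCase().replace(/[^a-z0-9_.-]+/g, "-");
  return `trashcans-${safe}`;
}

interface TrashcanRef {
  id: string;
  areaId: string;
}

// Groups trashcans by the domain that owns them. Because cross-domain
// requests are not allowed, a single request only ever touches one group.
function groupByDomain(cans: TrashcanRef[]): Record<string, TrashcanRef[]> {
  const groups: Record<string, TrashcanRef[]> = {};
  for (const can of cans) {
    const domainName = domainForArea(can.areaId);
    if (!groups[domainName]) {
      groups[domainName] = [];
    }
    groups[domainName].push(can);
  }
  return groups;
}
```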
In the unlikely event that one replica fails, SimpleDB can failover to another replica in the system.
SimpleDB supports two read consistency options: eventually consistent read and consistent read.
These are the statements from the documentation:
An eventually consistent read might not reflect the results of a recently completed write. Consistency across all copies of the data is usually reached within a second;
A consistent read returns a result that reflects all writes that received a successful response prior to the read.
So let's review the cases of reading the data:
- For the GET request of a particular trashcan, we could use a consistent read, so that the result reflects all writes that received a successful response prior to the read. This matters because we'd like to make this request before changing the trashcan properties.
- For the GET request of the list of trashcans, we could use an eventually consistent read: reaching the trashcan takes some time anyway, so it's not essential to get fully up-to-date data.
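The two read cases can be sketched as request parameters. This is only an illustration: the field names follow the GetAttributes and Select operations of SimpleDB as I understand them (ConsistentRead is the switch between the two options), while the function names and the domain and item names in the usage lines are hypothetical. Nothing here actually calls the service.

```typescript
// Parameter shapes modelled on the SimpleDB GetAttributes and Select operations.
interface GetAttributesParams {
  DomainName: string;
  ItemName: string;
  ConsistentRead: boolean;
}

interface SelectParams {
  SelectExpression: string;
  ConsistentRead: boolean;
}

// GET of a single trashcan: consistent read, because the caller is about
// to change its properties and should see every acknowledged write.
function singleTrashcanRead(domainName: string, itemName: string): GetAttributesParams {
  return { DomainName: domainName, ItemName: itemName, ConsistentRead: true };
}

// GET of the trashcan list: an eventually consistent read is enough.
function trashcanListRead(domainName: string): SelectParams {
  // Backticks quote the domain name, which may contain '-' or '.'.
  return { SelectExpression: "select * from `" + domainName + "`", ConsistentRead: false };
}

const singleParams = singleTrashcanRead("trashcans-oslo", "can-1");
const listParams = trashcanListRead("trashcans-oslo");
```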
When users update the same entity simultaneously, it's possible to use optimistic concurrency control by maintaining a timestamp attribute as part of an item and performing a conditional update based on the value of the timestamp.
This mechanism is implemented using conditional put.
It means that if User B changes something between User A's read and User A's write, the update of User A will fail.
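Here is a minimal in-memory sketch of that behaviour. It does not call SimpleDB; the class InMemoryDomain, the updatedAt attribute and the item name can-1 are all hypothetical stand-ins that only imitate a conditional put guarded by a timestamp.

```typescript
interface StoredItem {
  plastic: boolean;
  glass: boolean;
  paper: boolean;
  updatedAt: number; // timestamp attribute used as the version check
}

// In-memory stand-in for one domain; not a real database client.
class InMemoryDomain {
  private items: Record<string, StoredItem> = {};

  get(itemName: string): StoredItem | undefined {
    const item = this.items[itemName];
    return item ? { ...item } : undefined;
  }

  put(itemName: string, item: StoredItem): void {
    this.items[itemName] = { ...item };
  }

  // Writes only if the stored timestamp still equals the one the caller read.
  conditionalPut(itemName: string, next: StoredItem, expectedUpdatedAt: number): boolean {
    const current = this.items[itemName];
    if (!current || current.updatedAt !== expectedUpdatedAt) {
      return false; // someone else wrote in between, so this update is rejected
    }
    this.items[itemName] = { ...next };
    return true;
  }
}

const store = new InMemoryDomain();
store.put("can-1", { plastic: false, glass: true, paper: true, updatedAt: 1 });

// Users A and B both read the same version.
const seenByA = store.get("can-1")!;
const seenByB = store.get("can-1")!;

// B writes first and succeeds; A's write then fails because the timestamp moved on.
const bSucceeded = store.conditionalPut("can-1", { ...seenByB, glass: false, updatedAt: 2 }, seenByB.updatedAt);
const aSucceeded = store.conditionalPut("can-1", { ...seenByA, plastic: true, updatedAt: 3 }, seenByA.updatedAt);
```

In this sketch User A would have to read the item again and retry, which is exactly the cost questioned in the next paragraph.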
However, I'm not sure that we really need it because trashcan properties have a boolean type.
If User A wants to change the plastic property from false to true, I don't think it's a problem if someone else changes it to true or false just before A's write: A's write simply overwrites the value, and the result is the same.
So I'm not sure that we should reject users' requests, because this situation does not seem common for this service and users can't set custom values for trashcan properties.
SimpleDB suggests retrying requests on server errors.
This could be problematic for POST requests, since having two database entities for one trashcan could confuse users.
We could think about the following idea:
SimpleDB has an itemName field for each item, which should be unique.
We can generate the itemName value from the coordinates / area identifier, so we won't be able to add two trashcans with the same coordinates / area identifier, and a retried POST can't create a duplicate.
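A minimal sketch of this idea, under stated assumptions: the key format (area identifier plus coordinates rounded to 5 decimal places, roughly a metre) and the names itemNameFor and addTrashcan are hypothetical, and a plain in-memory object stands in for the domain. It also assumes that writing to an existing itemName updates that item rather than creating a second one.

```typescript
// Hypothetical key scheme: area identifier plus coordinates rounded to
// 5 decimal places, so the same location always yields the same name.
function itemNameFor(areaId: string, lat: number, lon: number): string {
  return `${areaId}:${lat.toFixed(5)}:${lon.toFixed(5)}`;
}

// In-memory stand-in for a domain keyed by itemName.
const cansByItemName: Record<string, { lat: number; lon: number }> = {};

// A retried POST for the same location overwrites the same key
// instead of creating a second entity.
function addTrashcan(areaId: string, lat: number, lon: number): string {
  const name = itemNameFor(areaId, lat, lon);
  cansByItemName[name] = { lat, lon };
  return name;
}

const firstAttempt = addTrashcan("oslo", 59.9139, 10.7522);
const retriedAttempt = addTrashcan("oslo", 59.9139, 10.7522); // same key as firstAttempt
```

The rounding precision is a design choice: too coarse and two genuinely different trashcans collide, too fine and tiny GPS differences produce duplicates.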