How does Gluster work - Locating files after changes in cluster


Barak Sason Rofman
Hello everyone,

I'm about to post several threads with questions regarding how Gluster handles different scenarios.
I'm looking for answers at the architecture/design/"this is the idea" level, and not specifically implementation (however, it would be nice to know where the relevant code is).

In this thread I want to focus on the "adding servers/bricks" scenario.
From what I know at this point, every file that's created is given a 32-bit value based on its name, and this hashing function is fixed and independent of any other factors.
Next, there is a function (a routing method), located on the client side, that *is* dependent on outside factors, such as the number of servers (or bricks) in the system, and it determines on which server a particular file is located.

Let's examine the following case:
Assume (for simplicity's sake) that the hashing function assigns values to files in the 1-100 range (instead of 32-bit) and that there are currently 4 servers in the cluster.
In this case, files 1-25 would be located on server 1, 26-50 on server 2 and so on.
Now, if a 5th server is added to the cluster, then the ranges will change: files 1-20 will be located on server 1, 21-40 on server 2 and so on.
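
The simplified example above can be sketched in a few lines of Python. This is only an illustration of the 1-100 model described in this post, not Gluster's actual code; the helper names `build_ranges` and `server_for` are made up for the sketch, and it assumes the range is split into equal contiguous parts.

```python
def build_ranges(num_servers, lo=1, hi=100):
    """Split [lo, hi] into num_servers contiguous, near-equal ranges."""
    total = hi - lo + 1
    ranges = []
    start = lo
    for i in range(num_servers):
        # Spread any remainder over the first few servers.
        size = total // num_servers + (1 if i < total % num_servers else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def server_for(hash_value, ranges):
    """Return the 1-based number of the server whose range holds hash_value."""
    for number, (start, end) in enumerate(ranges, start=1):
        if start <= hash_value <= end:
            return number
    raise ValueError(f"hash value {hash_value} is outside every range")


four = build_ranges(4)   # ranges with 4 servers
five = build_ranges(5)   # ranges after a 5th server is added

# Hash values whose "expected" server changes once the 5th server is added.
moved = [h for h in range(1, 101) if server_for(h, four) != server_for(h, five)]
print(four)
print(five)
print(len(moved), "of 100 hash values map to a different server")
```

In this toy model, files 21-25 are the first ones whose expected server changes (from server 1 to server 2), which is the group the questions below refer to.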

The questions regarding this scenario are as follows:
1 - Do the servers update the clients that an additional server (or brick) has been added to the cluster? If not, how does this happen?
2 - Do the servers also know which files *should* be located on them? If so, do the servers create a link file (which specifies the "real" location of the file) for the files that are supposed to be moved (e.g. files 21-25), or do they actually move the data right away? Maybe this works in a completely different manner?

I have additional questions regarding this, but they are dependent on the answers to these questions.

Thank you all for your help.
--
Barak Sason Rofman

Gluster Storage Development

Red Hat Israel

34 Jerusalem rd. Ra'anana, 43501

bsasonro[hidden email]    T: +972-9-7692304
M: +972-52-4326355


_______________________________________________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
[hidden email]
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: How does Gluster work - Locating files after changes in cluster

Raghavendra Talur-2


On Wed, Sep 4, 2019 at 5:01 AM Barak Sason Rofman <[hidden email]> wrote:
1 - Do the servers update the clients that an additional server (or brick) has been added to the cluster? If not, how does this happen?

Yes. A brick is added through a gluster CLI command, which updates the volume info in glusterd. Glusterd (the instance that updated the config, as well as its peers) then updates the clients about this change.

2 - Do the servers also know which files *should* be located on them? If so, do the servers create a link file (which specifies the "real" location of the file) for the files that are supposed to be moved (e.g. files 21-25), or do they actually move the data right away? Maybe this works in a completely different manner?

The addition of a brick includes a step that updates the xattrs on the bricks, which mark the hash range assigned to each of them. The creation of link files happens lazily. When clients don't find a file on the brick where it is supposed to be (called the hashed brick), they look it up on all bricks; the brick where they find the file is called the cached brick, and a link file is created.
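
As a rough illustration of the lazy lookup described above, here is a toy Python model. It is not Gluster's client code: the `Brick` class and `lookup` function are hypothetical, and the sketch assumes the link file is placed on the hashed brick and simply records the name of the cached brick.

```python
class Brick:
    """Toy brick: maps a file name to ("data", payload) or ("link", brick name)."""

    def __init__(self, name):
        self.name = name
        self.files = {}


def lookup(filename, hashed, bricks):
    """Return the brick holding filename's data, or None if it does not exist.

    hashed is the brick the file *should* be on according to the current ranges.
    """
    entry = hashed.files.get(filename)
    if entry is not None:
        kind, value = entry
        if kind == "data":
            return hashed
        # A link file: follow it to the cached brick it names.
        return next(b for b in bricks if b.name == value)

    # Not on the hashed brick: look the file up on all bricks.
    for brick in bricks:
        found = brick.files.get(filename)
        if found is not None and found[0] == "data":
            # Lazily leave a link file on the hashed brick (assumption of this sketch).
            hashed.files[filename] = ("link", brick.name)
            return brick
    return None


b1, b2 = Brick("brick1"), Brick("brick2")
b1.files["report.txt"] = ("data", b"hello")   # written before the ranges changed

# After the ranges change, suppose report.txt now hashes to brick2.
first = lookup("report.txt", hashed=b2, bricks=[b1, b2])
second = lookup("report.txt", hashed=b2, bricks=[b1, b2])
print(first.name, second.name, b2.files["report.txt"])
```

The first lookup has to search every brick; the second one finds the link file on the hashed brick and goes straight to the cached brick.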

For more information on how clients get updates from glusterd, refer to https://www.youtube.com/watch?v=Gq-yBYq8Gjg


Re: How does Gluster work - Locating files after changes in cluster

Nithya Balachandran


On Thu, 5 Sep 2019 at 18:33, Raghavendra Talur <[hidden email]> wrote:


To add to this, directories that were created before the bricks were added will not include the new bricks in their layout until a rebalance or fix-layout is run. Directories created after the add-brick will include the newly added bricks in their range.
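
The per-directory behaviour can be pictured with a small Python model. This is a sketch under simplifying assumptions, not Gluster's implementation: the `Volume` class and its methods are invented for illustration, the hash space is the 1-100 range from the original question, and each layout is an equal split (the real fix-layout may assign ranges differently).

```python
def split_range(bricks, lo=1, hi=100):
    """Give each brick a contiguous, near-equal slice of [lo, hi]."""
    total = hi - lo + 1
    layout = {}
    start = lo
    for i, brick in enumerate(bricks):
        size = total // len(bricks) + (1 if i < total % len(bricks) else 0)
        layout[brick] = (start, start + size - 1)
        start += size
    return layout


class Volume:
    """Toy volume that stores one layout per directory."""

    def __init__(self, bricks):
        self.bricks = list(bricks)
        self.layouts = {}

    def mkdir(self, path):
        # A new directory gets a layout over the bricks that exist right now.
        self.layouts[path] = split_range(self.bricks)

    def add_brick(self, brick):
        # Existing directory layouts are deliberately left untouched.
        self.bricks.append(brick)

    def fix_layout(self, path):
        # Recompute the directory's layout over the current set of bricks.
        self.layouts[path] = split_range(self.bricks)


vol = Volume(["b1", "b2", "b3", "b4"])
vol.mkdir("/old")
vol.add_brick("b5")
vol.mkdir("/new")
old_before = dict(vol.layouts["/old"])
vol.fix_layout("/old")
print(old_before)
print(vol.layouts["/new"])
print(vol.layouts["/old"])
```

Here `/old` keeps its four-brick layout until `fix_layout` is called on it, while `/new` includes `b5` from the moment it is created.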


 