[RFC] inode table locking contention reduction experiment


[RFC] inode table locking contention reduction experiment

Changwei Ge
Hi,

I am currently working on reducing inode_[un]ref() locking contention by
getting rid of the inode table lock and using only the inode lock to
protect the inode REF count. I have already discussed this over a couple
of rounds with several GlusterFS developers via email and Gerrit, and I
basically understand the major logic involved.

Currently, an inode's REF can drop to ZERO and the inode can then be
reused by raising it back to ONE. This, IMO, is why we have to do so much
work on the inode table during REF/UNREF: it makes inode_[un]ref() and
inode table/dentry (alias) searches hard to run concurrently.
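To make the contention concrete, here is a deliberately simplified model of that coupling (the names and structures are hypothetical, not the actual GlusterFS code): because a REF=0 inode stays in the table and may be revived, every ref/unref has to take the whole-table lock so that the 0<->1 transitions and the active/lru list moves stay consistent:

```c
#include <pthread.h>
#include <stdint.h>

/* Simplified model -- field and function names are illustrative,
 * not the real GlusterFS definitions. */
struct inode {
    uint32_t ref;               /* may drop to 0 and be revived to 1 */
    struct inode *list_next;    /* active or lru list linkage */
};

struct inode_table {
    pthread_mutex_t lock;       /* one big lock: the contention point */
    /* active list, lru list, hash buckets ... */
};

void inode_ref(struct inode_table *t, struct inode *i)
{
    pthread_mutex_lock(&t->lock);   /* table lock, even for a plain ref */
    if (i->ref++ == 0) {
        /* revived: would move inode from lru list back to active list */
    }
    pthread_mutex_unlock(&t->lock);
}

void inode_unref(struct inode_table *t, struct inode *i)
{
    pthread_mutex_lock(&t->lock);
    if (--i->ref == 0) {
        /* retired: would move inode from active list to lru list */
    }
    pthread_mutex_unlock(&t->lock);
}
```

Every caller, even for the common case where the count stays well above zero, serializes on the single table mutex.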

So my question is: in what cases, and how, can we find an inode whose REF is ZERO?

Since GlusterFS stores the inode's memory address in the kernel/FUSE, can
we conclude that only fuse_ino_to_inode() can bring back a REF=0 inode?


Thanks,
Changwei
_______________________________________________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-devel mailing list
[hidden email]
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [RFC] inode table locking contention reduction experiment

Xavi Hernandez
Hi Changwei,

On Tue, Oct 29, 2019 at 7:56 AM Changwei Ge <[hidden email]> wrote:
Hi,

I am recently working on reducing inode_[un]ref() locking contention by
getting rid of inode table lock. Just use inode lock to protect inode
REF. I have already discussed a couple rounds with several Glusterfs
developers via emails and Gerrit and basically get understood on major
logic around.

Currently, inode REF can be ZERO and be reused by increasing it to ONE.
This is IMO why we have to burden so much work for inode table when
REF/UNREF. It makes inode [un]ref() and inode table and dentries(alias)
searching hard to run concurrently.

So my question is in what cases, how can we find a inode whose REF is ZERO?

As Glusterfs store its inode memory address into kernel/fuse, can we
conclude that only fuse_ino_to_inode() can bring back a REF=0 inode?

Yes, when an inode reaches refs = 0, it means that Gluster code is not using it anywhere, so it cannot be referenced again unless the kernel sends new requests on the same inode. Once refs = 0 and nlookup = 0, the inode can be destroyed.

The inode code is quite complex right now and I haven't had time to investigate this further, but I think we could simplify inode management significantly (especially unref) if we add a reference when nlookup becomes > 0 and remove that reference when nlookup drops back to 0. Maybe with this approach we could avoid the inode table lock in many cases. However, we need to make sure we correctly handle the invalidation logic to keep the inode table size under control.
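A hedged sketch of that idea (the names here are illustrative, not the real GlusterFS API): tie one reference to the kernel's lookup count, so that an inode known to the kernel always holds at least one ref and a REF=0 inode never needs to be revived:

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, simplified inode -- not the real GlusterFS structure. */
struct inode {
    pthread_mutex_t lock;
    uint32_t ref;        /* references held by gluster code */
    uint64_t nlookup;    /* lookups the kernel still holds */
};

static void inode_ref(struct inode *i)
{
    pthread_mutex_lock(&i->lock);
    i->ref++;
    pthread_mutex_unlock(&i->lock);
}

static void inode_unref(struct inode *i)
{
    pthread_mutex_lock(&i->lock);
    i->ref--;
    pthread_mutex_unlock(&i->lock);
}

/* Called when FUSE reports a lookup on the inode. */
void inode_lookup(struct inode *i)
{
    int first;

    pthread_mutex_lock(&i->lock);
    first = (i->nlookup++ == 0);
    pthread_mutex_unlock(&i->lock);
    if (first)
        inode_ref(i);   /* nlookup 0 -> 1: pin the inode with one ref */
}

/* Called when FUSE forgets `count` lookups on the inode. */
void inode_forget(struct inode *i, uint64_t count)
{
    int last;

    pthread_mutex_lock(&i->lock);
    i->nlookup -= count;
    last = (i->nlookup == 0);
    pthread_mutex_unlock(&i->lock);
    if (last)
        inode_unref(i); /* nlookup back to 0: drop the pinning ref */
}
```

With this scheme, as long as the kernel knows the inode (nlookup > 0) it always holds at least one ref, so fuse_ino_to_inode() would never hand back a REF=0 inode.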

Regards,

Xavi





Re: [RFC] inode table locking contention reduction experiment

Amar Tumballi-2


On Wed, Oct 30, 2019 at 4:32 PM Xavi Hernandez <[hidden email]> wrote:
Hi Changwei,

On Tue, Oct 29, 2019 at 7:56 AM Changwei Ge <[hidden email]> wrote:
Hi,

I am recently working on reducing inode_[un]ref() locking contention by
getting rid of inode table lock. Just use inode lock to protect inode
REF. I have already discussed a couple rounds with several Glusterfs
developers via emails and Gerrit and basically get understood on major
logic around.

Currently, inode REF can be ZERO and be reused by increasing it to ONE.
This is IMO why we have to burden so much work for inode table when
REF/UNREF. It makes inode [un]ref() and inode table and dentries(alias)
searching hard to run concurrently.

So my question is in what cases, how can we find a inode whose REF is ZERO?

As Glusterfs store its inode memory address into kernel/fuse, can we
conclude that only fuse_ino_to_inode() can bring back a REF=0 inode?

Xavi's answer below provides some insights. At the same time, assuming that only fuse_ino_to_inode() can bring an inode back from the ref=0 state (for now) is a good start.
 

Yes, when an inode gets refs = 0, it means that gluster code is not using it anywhere, so it cannot be referenced again unless kernel sends new requests on the same inode. Once refs=0 and nlookup=0, the inode can be destroyed.

Inode code is quite complex right now and I haven't had time to investigate this further, but I think we could simplify inode management significantly (specially unref) if we add a reference when nlookup becomes > 0, and remove a reference when nlookup becomes 0 again. Maybe with this approach we could avoid inode table lock in many cases. However we need to make sure we correctly handle invalidation logic to keep inode table size under control.


My suggestion is: don't wait for a complete solution before posting the patch. Give us a chance to look at work-in-progress patches so we can discuss the code itself. It would help us reach better solutions sooner.



Re: [RFC] inode table locking contention reduction experiment

Changwei Ge
In reply to this post by Xavi Hernandez
Hi Xavi,


On 2019/10/30 7:01 PM, Xavi Hernandez wrote:

> Hi Changwei,
>
> On Tue, Oct 29, 2019 at 7:56 AM Changwei Ge <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hi,
>
>     I am recently working on reducing inode_[un]ref() locking contention by
>     getting rid of inode table lock. Just use inode lock to protect inode
>     REF. I have already discussed a couple rounds with several Glusterfs
>     developers via emails and Gerrit and basically get understood on major
>     logic around.
>
>     Currently, inode REF can be ZERO and be reused by increasing it to ONE.
>     This is IMO why we have to burden so much work for inode table when
>     REF/UNREF. It makes inode [un]ref() and inode table and dentries(alias)
>     searching hard to run concurrently.
>
>     So my question is in what cases, how can we find a inode whose REF
>     is ZERO?
>
>     As Glusterfs store its inode memory address into kernel/fuse, can we
>     conclude that only fuse_ino_to_inode() can bring back a REF=0 inode?
>
>
> Yes, when an inode gets refs = 0, it means that gluster code is not
> using it anywhere, so it cannot be referenced again unless kernel sends
> new requests on the same inode. Once refs=0 and nlookup=0, the inode can
> be destroyed.
>

Your answer is a quite useful clue for my upcoming work. :-)
Basically, I think I have understood the underlying intention of the
inode REF/UNREF code.

> Inode code is quite complex right now and I haven't had time to
> investigate this further, but I think we could simplify inode management
> significantly (specially unref) if we add a reference when
> nlookup becomes > 0, and remove a reference when nlookup becomes 0
> again. Maybe with this approach we could avoid inode table lock in many
> cases. However we need to make sure we correctly handle invalidation
> logic to keep inode table size under control.
>

Sounds like a good idea. If we can make the inode ref the *only*
condition for deciding whether to free an inode, and make sure we never
use an inode with REF=0 again, inode management will be much simpler. We
have to ensure that after the inode ref drops to ZERO, no code path can
find or reference that inode; then we can free it directly.


Thanks,
Changwei




Re: [RFC] inode table locking contention reduction experiment

Changwei Ge
In reply to this post by Amar Tumballi-2
Hi Amar,

On 2019/10/31 6:30 PM, Amar Tumballi wrote:

> My suggestion is, don't wait for a complete solution for posting the
> patch. Let us get a chance to have a look at WorkInProgress patches, so
> we can have discussions on code itself. It would help to reach better
> solutions sooner.

Agree.

I have almost finished implementing my draft design for this experiment;
the immature code has been pushed to my personal GlusterFS repo [1].

Right now it is a single large patch. I will split it into a patch
series when I push it to Gerrit, for review convenience. If you prefer
that I push it to Gerrit now for early review and discussion, I can do
that :-), but I am still doing some debugging.

My work includes:

1. Move logic unrelated to inode refing/unrefing out of
`__inode_[un]ref()`, thereby reducing their arguments.
2. Add a dedicated 'ref_lock' to the inode to keep ref/unref atomic.
3. Since `inode_table::active_size` is only used for debugging, convert
it to an atomic variable.
4. Factor out inode pruning.
5. To let inode search and grep run concurrently, first take an RDLOCK
and convert it to a WRLOCK only if necessary.
6. The inode table lock is no longer needed for inode ref/unref unless
we have to move the inode between table lists.

etc...
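As a rough sketch of items 2 and 6 (simplified, with hypothetical names; the real code is in the linked repo): a per-inode ref_lock serializes the counter itself, and the itable lock is taken only on the rare 0<->1 transitions that move the inode between lists:

```c
#include <pthread.h>
#include <stdint.h>

/* Simplified stand-ins for the real structures. */
struct inode_table {
    pthread_mutex_t lock;       /* still guards the active/lru lists */
};

struct inode {
    pthread_mutex_t ref_lock;   /* protects only `ref` */
    uint32_t ref;
    struct inode_table *table;
};

void inode_ref(struct inode *i)
{
    int revived;

    pthread_mutex_lock(&i->ref_lock);
    revived = (i->ref++ == 0);
    pthread_mutex_unlock(&i->ref_lock);

    if (revived) {
        /* rare 0 -> 1 transition: only now touch the table */
        pthread_mutex_lock(&i->table->lock);
        /* move inode from lru list to active list */
        pthread_mutex_unlock(&i->table->lock);
    }
}

void inode_unref(struct inode *i)
{
    int retired;

    pthread_mutex_lock(&i->ref_lock);
    retired = (--i->ref == 0);
    pthread_mutex_unlock(&i->ref_lock);

    if (retired) {
        /* rare 1 -> 0 transition: move to lru list / consider pruning */
        pthread_mutex_lock(&i->table->lock);
        pthread_mutex_unlock(&i->table->lock);
    }
}
```

Note that this sketch glosses over the race between releasing ref_lock and acquiring the table lock (another thread may ref or unref the inode again in that window); a real implementation has to re-check the count under the table lock before moving the inode.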

Any comments, ideas, or suggestions are kindly welcomed.

Thanks,
Changwei

[1]:
https://github.com/changweige/glusterfs/commit/d7226d2458281212af19ec8c2ca3d8c8caae1330



Re: [RFC] inode table locking contention reduction experiment

Amar Tumballi-2
Thanks for this; GitHub works for review right now :-)

I am occupied till Wednesday and will review the changes this week. A glance at them looks good to me.

A few tests which can be run for validation are:

tests/bugs/shard/bug-1696136-lru-limit-equals-deletion-rate.t
tests/features/fuse-lru-limit.t
tests/bugs/shard/shard-inode-refcount-test.t

Ideally, run the full regression with `./run-tests.sh -c`.

Regards,
Amar



Re: [RFC] inode table locking contention reduction experiment

Changwei Ge


On 2019/11/4 12:39 PM, Amar Tumballi wrote:

> Thanks for this, github works for review right now :-)
>
> I am occupied till Wednesday, and will review them by this week. A
> glance on the changes looks good to me.
>
> Few tests which can run for validations are :
>
> tests/bugs/shard/bug-1696136-lru-limit-equals-deletion-rate.t
>
> tests/features/fuse-lru-limit.t
>
> tests/bugs/shard/shard-inode-refcount-test.t
>
>
> Ideal is to run the full regression with `./run-tests.sh -c`

Sure, I will run the entire regression.
Actually, there are still some small problems with this large patch. I
am still working on improving it, including adding some debug/trace
methods.

Is there an existing trace/dump method for the GlusterFS FUSE client? I
know there is something like that for the brick process, but I can't
find anything for the client.

I am planning to join the GlusterFS video conference on the 12th of this
month to discuss my idea.

Thanks,
Changwei



Re: [RFC] inode table locking contention reduction experiment

Changwei Ge
Hi Amar and GlusterFS developers,

I updated my code for the inode referencing and searching improvement in
my personal GitHub repo:
https://github.com/changweige/glusterfs/commit/8313f9ccd0fac86abf20df32f5981588c29cd26a

Changes since the last version:
- Cleaned up the patch.
- Passes the suggested regression tests.

I also pushed it to Gerrit. If you prefer a split patch series, I can do
that and re-push it to Gerrit.


Below is its change log, copied for your reference:


```
    inode: reduce inode ref/unref and searching contention

    Pain:
    At present, for inode management, we use the itable to manage all
    inodes of a whole glusterfs volume. Inode reference operations -
    ref and unref - involve the itable lock, so we are very likely to
    hit lock contention when there are a great number of files and CPU
    load is high. Moreover, searching for an inode is a common
    operation used by various xlators.

    Goal:
    1. Reduce inode_[un]ref() lock contention by adding a new
    fine-grained, dedicated ref_lock to protect referencing.
    2. Make inode search/find run concurrently by converting the mutex
    to a rwlock.

    PS: Ideally, the refcount of an inode would be the *only*
    deterministic factor for releasing it; that way, we could get rid
    of the itable lock entirely when refing/unrefing an inode. For
    now, the special cases we still have to handle are when the inode
    ref is 1 or 0 - only then do we take the itable lock to move the
    inode between lists. We should not hit these special cases very
    often, which is why this patch can gain performance.

    Action:
    This patch tries its best to keep the original inode_[un]ref()
    semantics for now, since inode management is crucial
    infrastructure code for Glusterfs. We improve inode management
    step by step; for now, cleaning up the relevant part is the
    highest priority.

    1. Move itable operations out of inode_[un]ref().
    2. Add a new mutex lock to the inode to protect changes to the
    inode ref.
    3. Clean up the inode management logic and make its client code
    explicit.
    4. Reduce inode ref/unref arguments.
    5. Decouple inode ref/unref from inode pruning/linking/unlinking
    logic.
```


Thanks,
Changwei
