[RFC] What if client fuse process crash?


[RFC] What if client fuse process crash?

Changwei Ge
Hi list,

If the glusterfs client fuse process somehow dies, all subsequent file
operations fail with a 'no connection' error.

I am curious: is the only way to recover to umount and mount again?

If so, that means all processes working on top of glusterfs have to
close their files, which is sometimes hard to accept.


Thanks,

Changwei


_______________________________________________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
[hidden email]
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [RFC] What if client fuse process crash?

Ravishankar N
On 05/08/19 3:31 PM, Changwei Ge wrote:
> Hi list,
>
> If the glusterfs client fuse process somehow dies, all subsequent file
> operations fail with a 'no connection' error.
>
> I am curious: is the only way to recover to umount and mount again?
Yes, this is pretty much the case with all FUSE-based file systems. You
can use -o auto_unmount (https://review.gluster.org/#/c/17230/) to clean
up automatically instead of having to unmount manually.
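To make recovery after a crash less manual, auto_unmount could be paired
with a small watchdog. A minimal sketch only; the mountpoint, volume name,
and remount policy below are hypothetical, not something glusterfs ships:

```python
import errno
import os
import subprocess

def mount_is_dead(mountpoint: str) -> bool:
    """stat() on a mountpoint whose fuse daemon has died typically fails
    with ENOTCONN ('Transport endpoint is not connected')."""
    try:
        os.stat(mountpoint)
        return False
    except OSError as e:
        return e.errno == errno.ENOTCONN

def remount(mountpoint: str, volume: str) -> None:
    # Lazy unmount detaches the dead mount even if old file descriptors
    # are still held; new opens then go through the fresh mount.
    subprocess.run(["umount", "-l", mountpoint], check=False)
    subprocess.run(["mount", "-t", "glusterfs", "-o", "auto_unmount",
                    volume, mountpoint], check=True)
```

Note that remounting does not help processes that already hold open files
on the dead mount, which is exactly the limitation raised in the question.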
>
> If so, that means all processes working on top of glusterfs have to
> close their files, which is sometimes hard to accept.

There is
https://research.cs.wisc.edu/wind/Publications/refuse-eurosys11.html,
which claims to provide a framework for transparent failovers, but I
can't find any publicly available code.

Regards,
Ravi



Re: [RFC] What if client fuse process crash?

Changwei Ge
Hi Ravishankar,


Thanks for sharing; it's very useful to me.

I have been setting up a glusterfs storage cluster recently, and the
umount/mount recovery process has bothered me.


I happened to find some patches[1] on the internet aiming to address
this problem, but I have no idea why they never made it into the
glusterfs mainline.

Do you know why?


Thanks,

Changwei


[1]:

https://review.gluster.org/#/c/glusterfs/+/16843/

https://github.com/gluster/glusterfs/issues/242




Re: [RFC] What if client fuse process crash?

Ravishankar N

On 06/08/19 11:44 AM, Changwei Ge wrote:
> Hi Ravishankar,
>
>
> Thanks for sharing; it's very useful to me.
>
> I have been setting up a glusterfs storage cluster recently, and the
> umount/mount recovery process has bothered me.
Hi Changwei,
Why do you need to remount frequently? If your gluster fuse client is
crashing frequently, that should be investigated and fixed. If you have
a reproducer, please raise a bug with all the details, such as the
glusterfs version, core files, and log files.
Regards,
Ravi



Re: [RFC] What if client fuse process crash?

Changwei Ge
On 2019/8/6 2:57 PM, Ravishankar N wrote:

> Hi Changwei,
> Why do you need to remount frequently? If your gluster fuse client is
> crashing frequently, that should be investigated and fixed. If you
> have a reproducer, please raise a bug with all the details, such as
> the glusterfs version, core files, and log files.


Hi Ravi,

Actually, the glusterfs client fuse process ran well in my environment,
but high availability and fault tolerance are also big concerns for me.

So I killed the fuse process to see what would happen. AFAIK, userspace
processes can always be killed or crash somehow, which is not under our
control. :-(

Another scenario is *software upgrade*: we have to upgrade the glusterfs
client version to gain new features and bug fixes, and it would be
friendly to applications if the upgrade were transparent.


Thanks,

Changwei




Re: [RFC] What if client fuse process crash?

Niels de Vos-5
On Tue, Aug 06, 2019 at 03:14:46PM +0800, Changwei Ge wrote:

> Hi Ravi,
>
> Actually, the glusterfs client fuse process ran well in my environment,
> but high availability and fault tolerance are also big concerns for me.
>
> So I killed the fuse process to see what would happen. AFAIK, userspace
> processes can always be killed or crash somehow, which is not under our
> control. :-(
>
> Another scenario is *software upgrade*: we have to upgrade the glusterfs
> client version to gain new features and bug fixes, and it would be
> friendly to applications if the upgrade were transparent.

Open files have state associated with them, and that state is lost when
the fuse process exits. A restarted fuse process would then need to
restore the state of the open files (and caches, and more). This is not
trivial, and I do not think any work on this front has been done yet.

Some users take an alternative route. Mounted filesystems do indeed have
issues with online updating, so maybe you do not need to mount the
filesystem at all. Depending on the needs of your applications, using
glusterfs-coreutils instead of a FUSE (or NFS) mount might be an option
for you. Its short-lived processes connect to the Gluster volume when
needed and do not keep a connection open, and updating userspace tools
is much simpler than updating long-running processes that are hooked
into the kernel.

See https://github.com/gluster/glusterfs-coreutils for details.
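For illustration, driving glusterfs-coreutils from a script might look like
the sketch below. The glfs:// URL shape is my reading of the project README,
so verify it against your installed version; the host and volume names are
made up:

```python
def gf_url(host: str, volume: str, path: str) -> str:
    # glusterfs-coreutils tools address files with glfs:// URLs of roughly
    # this shape; check `gfcat --help` on your version to confirm.
    return f"glfs://{host}/{volume}/{path.lstrip('/')}"

def gfcp_cmd(local_path: str, host: str, volume: str, remote_path: str) -> list:
    # Each invocation connects to the volume, copies, and exits; no
    # long-lived mount is left behind to survive upgrades or crashes.
    return ["gfcp", local_path, gf_url(host, volume, remote_path)]
```

A caller would run, e.g., subprocess.run(gfcp_cmd("backup.tar", "server1",
"myvolume", "/backups/backup.tar"), check=True).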

HTH,
Niels




Re: [RFC] What if client fuse process crash?

Changwei Ge
Hi Niels,

On 2019/8/6 3:50 PM, Niels de Vos wrote:

> Open files have state associated with them, and that state is lost when
> the fuse process exits. A restarted fuse process would then need to
> restore the state of the open files (and caches, and more). This is not
> trivial, and I do not think any work on this front has been done yet.


True, tons of work would have to be done if we want to restore all file
state so that a restarted fuse process continues working as if it had
never been restarted.

I suppose two methods might be feasible:

    one is to fetch file state from the kernel and restore it into the
fuse process;

    the other is to duplicate that state to a standby process, or just
use the Linux shared-memory mechanism.
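The standby/shared-memory idea could be sketched roughly as below. This is
only a toy model: a real fuse client would have to mirror far more than a
JSON-serializable fd table, and the segment name here is invented:

```python
import json
import struct
from multiprocessing import shared_memory

def publish_state(name: str, open_files: dict) -> shared_memory.SharedMemory:
    """Serialize an open-file table and copy it into a named shared-memory
    segment that a standby process can attach to by name."""
    payload = json.dumps(open_files).encode()
    shm = shared_memory.SharedMemory(name=name, create=True,
                                     size=4 + len(payload))
    shm.buf[:4] = struct.pack("<I", len(payload))       # length prefix
    shm.buf[4:4 + len(payload)] = payload
    return shm  # the active process keeps the segment alive

def recover_state(name: str) -> dict:
    """What a restarted/standby process would do: attach and deserialize."""
    shm = shared_memory.SharedMemory(name=name)
    (n,) = struct.unpack("<I", bytes(shm.buf[:4]))
    state = json.loads(bytes(shm.buf[4:4 + n]).decode())
    shm.close()
    return state
```

The hard part is not the transport but deciding what to mirror and keeping
it consistent while the active process serves I/O.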


>
> Some users take an alternative route. Mounted filesystems do indeed
> have issues with online updating, so maybe you do not need to mount the
> filesystem at all. Depending on the needs of your applications, using
> glusterfs-coreutils instead of a FUSE (or NFS) mount might be an option
> for you. Its short-lived processes connect to the Gluster volume when
> needed and do not keep a connection open, and updating userspace tools
> is much simpler than updating long-running processes that are hooked
> into the kernel.
>
> See https://github.com/gluster/glusterfs-coreutils for details.


That's helpful, but I think some POSIX file operations then can't be
performed anymore.


Thanks,

Changwei




Re: [RFC] What if client fuse process crash?

Niels de Vos-5
On Tue, Aug 06, 2019 at 04:47:46PM +0800, Changwei Ge wrote:

> Hi Niels,
>
>
>
> True, tons of work would have to be done if we want to restore all file
> state so that a restarted fuse process continues working as if it had
> never been restarted.
>
> I suppose two methods might be feasible:
>
>     one is to fetch file state from the kernel and restore it into the
> fuse process;
>
>     the other is to duplicate that state to a standby process, or just
> use the Linux shared-memory mechanism.

Restoring the state from the kernel would be my preference; that is
also the view of the storage that the application has. But it may not be
possible to recover all the details that the xlators track, and storing
those in shared memory (or file-backed persistent storage) might not
even be sufficient. With upgrades it is possible for existing xlators to
gain new features that would need a state refresh to pick up the
extensions, and it is even possible for new xlators to be added, which
will need to get the state of the files too.

I think, in the end, it would boil down to getting the state from the
kernel and revalidating each inode through the mountpoint to the server.
This is also what happens on graph switches (a new volume layout or
options pushed from the server to the client). To get this to work, it
needs to be possible for a FUSE service to re-attach itself to a
mountpoint from which the previous FUSE process detached. I do not think
this is possible at the moment; it would require extensions in the FUSE
kernel module (and then re-attaching new state to all inodes).
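One building block such a re-attach scheme would need is handing the open
/dev/fuse descriptor from a supervisor to a restarted daemon. That part is
already possible today via SCM_RIGHTS fd passing over a Unix socket; the
sketch below uses a regular temp file in place of /dev/fuse, since the
missing piece described above is the kernel-side inode-state re-attachment:

```python
import os
import socket
import tempfile

def send_fd(sock: socket.socket, fd: int) -> None:
    # SCM_RIGHTS: the kernel installs a duplicate of the open file
    # description into the receiving process (Python 3.9+ helper).
    socket.send_fds(sock, [b"F"], [fd])

def recv_fd(sock: socket.socket) -> int:
    _, fds, _, _ = socket.recv_fds(sock, 16, 1)
    return fds[0]

def demo_roundtrip() -> bytes:
    """Pass a regular file's fd between two ends of a socketpair; a FUSE
    supervisor would do the same with the /dev/fuse descriptor."""
    parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        send_fd(parent, f.fileno())
        fd = recv_fd(child)
    # the received fd stays valid even after the original file object closes
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 5)
    os.close(fd)
    parent.close()
    child.close()
    return data
```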

>
>
> That's helpful, but I think some POSIX file operations then can't be
> performed anymore.

Indeed, glusterfs-coreutils is more of an object-storage interface than
a POSIX-compliant filesystem.

Niels

