How to recover space from already deleted files

Posted in: Site Reliability Engineering, Technical Track

Wait, what? Deleted files are gone, right? Well, not so if they’re currently in use, with an open file handle by an application. In the Windows world, you just can’t touch it, but under Linux (if you’ve got sufficient permissions), you can!

Often in the Systems Administration, and Site Reliability Engineering world, we will encounter a disk space issue being reported, and there’s very little we can do to recover the space. Everything is critically important! We then check for deleted files and find massive amounts of space consumed when someone has previously deleted Catalina, Tomcat, or Weblogic log files while Java had them in use, and we can’t restart the processes to release the handles due to the critical nature of the service. Conundrum!

Here at Pythian, we Love Your Data, so I thought I’d share some of the ways we deal with situations like this.

How to recover

First, we grab a list of PIDs with files still open, but deleted. Then iterate over the open file handles, and null them.

PIDS=$(lsof | awk '/deleted/ { if ($7 > 0) { print $2 }; }' | uniq)
for PID in $PIDS; do ll /proc/$PID/fd | grep deleted; done

This could be scripted in an automatic nulling of all deleted files, with great care.

Worked example

1. Locating deleted files:

[root@importantserver1 usr]# lsof | head -n 1 ; lsof | grep -i deleted
 COMMAND   PID   USER   FD  TYPE DEVICE SIZE/OFF NODE   NAME
 vmtoolsd  2573  root   7u  REG  253,0  9857     65005  /tmp/vmware-root/appLoader-2573.log (deleted)
 zabbix_ag 3091  zabbix 3wW REG  253,0  4        573271 /var/tmp/zabbix_agentd.pid (deleted)
 zabbix_ag 3093  zabbix 3w  REG  253,0  4        573271 /var/tmp/zabbix_agentd.pid (deleted)
 zabbix_ag 3094  zabbix 3w  REG  253,0  4        573271 /var/tmp/zabbix_agentd.pid (deleted)
 zabbix_ag 3095  zabbix 3w  REG  253,0  4        573271 /var/tmp/zabbix_agentd.pid (deleted)
 zabbix_ag 3096  zabbix 3w  REG  253,0  4        573271 /var/tmp/zabbix_agentd.pid (deleted)
 zabbix_ag 3097  zabbix 3w  REG  253,0  4        573271 /var/tmp/zabbix_agentd.pid (deleted)
 java      23938 tomcat 1w  REG  253,0  0        32155  /opt/log/tomcat/catalina.out (deleted)
 java      23938 tomcat 2w  REG  253,0  45322216 32155  /opt/log/tomcat/catalina.out (deleted)
 java      23938 tomcat 9w  REG  253,0  174      32133  /opt/log/tomcat/catalina.2015-01-17.log (deleted)
 java      23938 tomcat 10w REG  253,0  57408    32154  /opt/log/tomcat/localhost.2016-02-12.log (deleted)
 java      23938 tomcat 11w REG  253,0  0        32156  /opt/log/tomcat/manager.2014-12-09.log (deleted)
 java      23938 tomcat 12w REG  253,0  0        32157  /opt/log/tomcat/host-manager.2014-12-09.log (deleted)
 java      23938 tomcat 65w REG  253,0  363069   638386 /opt/log/archive/athena.log.20160105-09 (deleted)

2. Grab the PIDs:

[root@importantserver1 usr]# lsof | awk '/deleted/ { if ($7 > 0) { print $2 }; }' | uniq
 2573
 3091
 3093
 3094
 3095
 3096
 3097
 23938

Show the deleted files that each process still has open (and is consuming space):

[root@importantserver1 usr]# export PIDS=$(lsof | awk '/deleted/ { if ($7 > 0) { print $2 }; }' | uniq)
[root@importantserver1 usr]# for PID in $PIDS; do ll /proc/$PID/fd | grep deleted; done
 lrwx------ 1 root root 64 Mar 21 21:15 7 -> /tmp/vmware-root/appLoader-2573.log (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
 l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 1 -> /opt/log/tomcat/catalina.out (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 10 -> /opt/log/tomcat/localhost.2016-02-12.log (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 11 -> /opt/log/tomcat/manager.2014-12-09.log (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 12 -> /opt/log/tomcat/host-manager.2014-12-09.log (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 2 -> /opt/log/tomcat/catalina.out (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 65 -> /opt/log/archive/athena.log.20160105-09 (deleted)
 l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 9 -> /opt/log/tomcat/catalina.2015-01-17.log (deleted)

Null the specific files (here, we target the catalina.out file):

[root@importantserver1 usr]# cat /dev/null > /proc/23938/fd/2

Alternative ending

Instead of deleting the contents to recover the space, you might be in the situation where you need to recover the contents of the deleted file. If the application still has the file descriptor open on it, you can then recover the entire file to another one (dd if=/proc/23938/fd/2 of=/tmp/my_new_file.log) – assuming you have the space to do it!

Conclusion

While it’s best not to get in the situation in the first place, you’ll sometimes find yourself cleaning up after someone else’s good intentions. Now, instead of trying to find a window of “least disruption” to the service, you can recover the situation nicely. Or, if the alternative solution is what you’re after, you’ve recovered a file that you thought was long since gone.

email
Want to talk with an expert? Schedule a call with our team to get the conversation started.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *