Wait, what? Deleted files are gone, right? Well, not if they’re currently in use, with a file handle still held open by an application. In the Windows world, you just can’t touch such a file, but under Linux (if you’ve got sufficient permissions), you can!
Often in the Systems Administration and Site Reliability Engineering world, we encounter a reported disk space issue where there’s very little we can do to recover the space. Everything is critically important! We then check for deleted files and find massive amounts of space consumed because someone previously deleted Catalina, Tomcat, or WebLogic log files while Java still had them open, and we can’t restart the processes to release the handles due to the critical nature of the service. Conundrum!
Here at Pythian, we Love Your Data, so I thought I’d share some of the ways we deal with situations like this.
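To see the underlying behaviour for yourself, here is a minimal sketch that deletes a file while a descriptor is still open on it. The scratch file and the choice of descriptor 3 are illustrative assumptions, not part of any real service.

```shell
#!/usr/bin/env bash
# Sketch: unlinking a file does not free it while a descriptor is still open.
# The scratch file and descriptor number 3 are illustrative choices.

tmp=$(mktemp)                      # scratch file standing in for a log
printf 'still here\n' > "$tmp"

exec 3< "$tmp"                     # keep a read descriptor open in this shell
rm -f -- "$tmp"                    # remove the name; the inode survives

# Child processes inherit descriptor 3, so /proc/self/fd/3 resolves for them.
link_target=$(readlink /proc/self/fd/3)
contents=$(cat /proc/self/fd/3)

echo "descriptor 3 -> $link_target"
echo "contents: $contents"

exec 3<&-                          # closing the last descriptor releases the space
```

The link target keeps the original path with " (deleted)" appended, which is the marker the lsof and grep commands in this post search for.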
How to recover
First, we grab a list of PIDs that still hold deleted files open. Then we iterate over each process’s open file handles to find the ones we want to null.
PIDS=$(lsof | awk '/deleted/ { if ($7 > 0) { print $2 }; }' | uniq)
for PID in $PIDS; do ls -l /proc/$PID/fd | grep deleted; done
This could be scripted to null all deleted files automatically, but only with great care.
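As a rough idea of what "with great care" might look like, the sketch below walks /proc directly instead of parsing lsof output, insists on a path glob so that only files you name are touched, and only reports what it would do unless told otherwise. The function names, the glob argument, and the --apply switch are all hypothetical choices made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: find descriptors pointing at deleted files and optionally null them.
# Function names, the path-glob argument, and --apply are hypothetical choices.

# Print "PID FD SIZE PATH (deleted)" for deleted-but-open regular files
# whose original path matches the shell glob given as $1.
list_deleted_fds() {
    local pattern=$1 fd target size pid
    [ -n "$pattern" ] || { echo "usage: list_deleted_fds PATH_GLOB" >&2; return 2; }
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue   # process may have exited
        case "$target" in
            $pattern" (deleted)") ;;      # path matches the glob and is marked deleted
            *) continue ;;
        esac
        [ -f "$fd" ] || continue          # regular files only
        size=$(stat -L -c %s "$fd" 2>/dev/null) || continue
        [ "$size" -gt 0 ] || continue     # nothing to reclaim from empty files
        pid=${fd#/proc/}
        pid=${pid%%/*}
        printf '%s %s %s %s\n' "$pid" "${fd##*/}" "$size" "$target"
    done
}

# Report each matching descriptor; truncate it only when $2 is --apply.
null_deleted_fds() {
    local pattern=$1 mode=${2:-dry-run} pid num size target
    list_deleted_fds "$pattern" | while read -r pid num size target; do
        if [ "$mode" = "--apply" ]; then
            true > "/proc/$pid/fd/$num" &&
                echo "nulled /proc/$pid/fd/$num ($size bytes, $target)"
        else
            echo "would null /proc/$pid/fd/$num ($size bytes, $target)"
        fi
    done
}
```

A cautious run would be null_deleted_fds '/opt/log/tomcat/*' first, reading the report, and only then repeating it with --apply as a second argument. Requiring the glob is deliberate: anything else marked "(deleted)", such as shared-memory or memfd-backed files, is left alone unless you ask for it by path.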
Worked example
1. Locating deleted files:
[[email protected] usr]# lsof | head -n 1 ; lsof | grep -i deleted
COMMAND     PID   USER   FD TYPE DEVICE SIZE/OFF   NODE NAME
vmtoolsd   2573   root   7u  REG  253,0     9857  65005 /tmp/vmware-root/appLoader-2573.log (deleted)
zabbix_ag  3091 zabbix  3wW  REG  253,0        4 573271 /var/tmp/zabbix_agentd.pid (deleted)
zabbix_ag  3093 zabbix   3w  REG  253,0        4 573271 /var/tmp/zabbix_agentd.pid (deleted)
zabbix_ag  3094 zabbix   3w  REG  253,0        4 573271 /var/tmp/zabbix_agentd.pid (deleted)
zabbix_ag  3095 zabbix   3w  REG  253,0        4 573271 /var/tmp/zabbix_agentd.pid (deleted)
zabbix_ag  3096 zabbix   3w  REG  253,0        4 573271 /var/tmp/zabbix_agentd.pid (deleted)
zabbix_ag  3097 zabbix   3w  REG  253,0        4 573271 /var/tmp/zabbix_agentd.pid (deleted)
java      23938 tomcat   1w  REG  253,0        0  32155 /opt/log/tomcat/catalina.out (deleted)
java      23938 tomcat   2w  REG  253,0 45322216  32155 /opt/log/tomcat/catalina.out (deleted)
java      23938 tomcat   9w  REG  253,0      174  32133 /opt/log/tomcat/catalina.2015-01-17.log (deleted)
java      23938 tomcat  10w  REG  253,0    57408  32154 /opt/log/tomcat/localhost.2016-02-12.log (deleted)
java      23938 tomcat  11w  REG  253,0        0  32156 /opt/log/tomcat/manager.2014-12-09.log (deleted)
java      23938 tomcat  12w  REG  253,0        0  32157 /opt/log/tomcat/host-manager.2014-12-09.log (deleted)
java      23938 tomcat  65w  REG  253,0   363069 638386 /opt/log/archive/athena.log.20160105-09 (deleted)
2. Grab the PIDs:
[[email protected] usr]# lsof | awk '/deleted/ { if ($7 > 0) { print $2 }; }' | uniq
2573
3091
3093
3094
3095
3096
3097
23938
3. Show the deleted files that each process still has open (and that are still consuming space):
[[email protected] usr]# export PIDS=$(lsof | awk '/deleted/ { if ($7 > 0) { print $2 }; }' | uniq)
[[email protected] usr]# for PID in $PIDS; do ll /proc/$PID/fd | grep deleted; done
lrwx------ 1 root root 64 Mar 21 21:15 7 -> /tmp/vmware-root/appLoader-2573.log (deleted)
l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
l-wx------ 1 root root 64 Mar 21 21:15 3 -> /var/tmp/zabbix_agentd.pid (deleted)
l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 1 -> /opt/log/tomcat/catalina.out (deleted)
l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 10 -> /opt/log/tomcat/localhost.2016-02-12.log (deleted)
l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 11 -> /opt/log/tomcat/manager.2014-12-09.log (deleted)
l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 12 -> /opt/log/tomcat/host-manager.2014-12-09.log (deleted)
l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 2 -> /opt/log/tomcat/catalina.out (deleted)
l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 65 -> /opt/log/archive/athena.log.20160105-09 (deleted)
l-wx------ 1 tomcat tomcat 64 Mar 21 21:15 9 -> /opt/log/tomcat/catalina.2015-01-17.log (deleted)
4. Null the specific files (here, we target the catalina.out file):
[[email protected] usr]# cat /dev/null > /proc/23938/fd/2
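If you want to convince yourself that nulling through /proc really releases the space and leaves the writer working, the following sketch reproduces the situation on a scratch file. The scratch file, the 1 MiB payload, and descriptor 4 are stand-ins for catalina.out and Tomcat's handle, chosen only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: confirm that nulling a deleted file through /proc empties it.
# A scratch file and descriptor 4 stand in for catalina.out and Tomcat's handle.

log=$(mktemp)
exec 4>> "$log"                         # append-mode writer, as many loggers use
head -c 1048576 /dev/zero >&4           # write 1 MiB through the open descriptor
rm -f -- "$log"                         # "someone" deletes the log

size_before=$(stat -L -c %s /proc/self/fd/4)

: > /proc/self/fd/4                     # same effect as: cat /dev/null > /proc/PID/fd/N

size_after=$(stat -L -c %s /proc/self/fd/4)
echo "before: $size_before bytes, after: $size_after bytes"

echo "next entry" >&4                   # the writer keeps working after truncation
size_new=$(stat -L -c %s /proc/self/fd/4)
echo "after next write: $size_new bytes"

exec 4>&-
```

Because the writer here appends, its next write lands at the start of the now-empty file. A process that opened the file without append mode keeps its old offset instead, so the file can reappear with a large apparent size after the next write; it is then sparse, and the freed blocks stay freed.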
Alternative ending
Instead of deleting the contents to recover the space, you might be in the situation where you need to recover the contents of the deleted file. If the application still has the file descriptor open on it, you can then recover the entire file to another one (dd if=/proc/23938/fd/2 of=/tmp/my_new_file.log) – assuming you have the space to do it!
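The dd command above can be wrapped with a couple of safety checks. This is a minimal sketch, assuming a helper named recover_deleted_fd (a made-up name) that takes a PID, a descriptor number, and a destination path, and that refuses to overwrite an existing file or to start a copy larger than the free space on the target filesystem.

```shell
#!/usr/bin/env bash
# Sketch: copy a deleted-but-open file back onto disk without filling the target.
# recover_deleted_fd is a hypothetical helper; arguments are PID, FD, destination.

recover_deleted_fd() {
    local pid=$1 fd=$2 dest=$3
    local src="/proc/$pid/fd/$fd" size avail_kb

    [ -e "$src" ] || { echo "no such descriptor: $src" >&2; return 1; }
    [ ! -e "$dest" ] || { echo "refusing to overwrite $dest" >&2; return 1; }

    size=$(stat -L -c %s "$src") || return 1
    # Available space, in KiB, on the filesystem that will hold the copy.
    avail_kb=$(df -Pk "$(dirname "$dest")" | awk 'NR == 2 { print $4 }')
    if [ $(( size / 1024 + 1 )) -gt "$avail_kb" ]; then
        echo "not enough space for $size bytes in $(dirname "$dest")" >&2
        return 1
    fi

    # Same copy as in the text: read via the descriptor, write a new file.
    dd if="$src" of="$dest" bs=1M 2>/dev/null
}
```

For the example in this post, the call would be recover_deleted_fd 23938 2 /tmp/my_new_file.log. If the application is still writing, the copy is only a snapshot of the file at the moment it was read.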
Conclusion
While it’s best not to get in the situation in the first place, you’ll sometimes find yourself cleaning up after someone else’s good intentions. Now, instead of trying to find a window of “least disruption” to the service, you can recover the situation nicely. Or, if the alternative solution is what you’re after, you’ve recovered a file that you thought was long since gone.