Some observations on Puppetrun with foreman

Posted in: Open Source, Site Reliability Engineering, Technical Track

After joining Pythian I was introduced to several configuration management systems and Puppet was one of them. Foreman is a system management tool which can be integrated with Puppet to manage puppet modules and to initiate puppet runs on hosts from web interface. This is very useful if you want to configure large number of systems.

Puppet kick, which was previously used to initiate puppet run from foreman is deprecated now.

For initiating puppet run from foreman interface, I used mcollective. Mcollective can be used to execute parallel jobs in remote systems. There are 3 main components,

    Client – Connects to the mcollective Server and send commands.
    Server – Runs on all managed systems and execute commands.
    Middleware – A message broker like activemq.

I used mcollective puppet module from Puppetlabs for my setup.

# puppet module install puppetlabs-mcollective

My setup includes middleware(activemq) and mcollective client in the puppet server and mcollective servers in all managed systems.

After the implementation, I found that Puppet run from foreman web interface is failing for some servers.

I found following in /var/log/foreman-proxy/proxy.log,

D, [2014-04-18T07:20:54.392392 #4256] DEBUG — : about to execute: /usr/bin/sudo /usr/bin/mco puppet runonce -I server.pythian.com
W, [2014-04-18T07:20:56.167627 #4256] WARN — : Non-null exit code when executing ‘/usr/bin/sudo/usr/bin/mcopuppetrunonce-Ireg-app-02.prod.tprweb.net’
E, [2014-04-18T07:20:56.175034 #4256] ERROR — : Failed puppet run: Check Log files

You can see that mco command is trying to execute a puppet run in server.pythian.com and failing. mco command uses several sub commands called ‘applications’ to interact with all systems and ‘puppet’ is one of them.

While running the command in commandline, I received following,

# mco puppet runonce -I server.pythian.com| [ > ] 0 / 1warn 2014/04/11 08:05:34: client.rb:218:in `start_receiver’ Could not receive all responses. Expected : 1. Received : 0

Finished processing 0 / 1 hosts in 22012.79 ms

No response from:

server.pythian.com

I am able to ping the server.

When I ran ‘mco ping’ I found that the server with issue is identified with short hostnames and others with fqdn.

$ mco pingserver time=89.95 ms
server3.pythian.com time=95.26 ms
server2.pythian.com time=96.16 ms

So mcollective is exporting a short hostname when foreman is expecting an FQDN (Fully Qualified Domain Name) from this server.

Foreman takes node name information from puppet certificate name and that is used for filtering while sending mco commands.

Mcollective exports identity differently. From https://docs.puppetlabs.com/mcollective/configure/server.html#facts-identity-and-classes,

identity
The node’s name or identity. This should be unique for each node, but does not need to be.Default: The value of Ruby’s Socket.gethostname method, which is usually the server’s FQDN.
Sample value: web01.example.com
Allowed values: Any string containing only alphanumeric characters, hyphens, and dots — i.e. matching the regular expression /\A[\w\.\-]+\Z/

I passed FQDN as identity in the servers using mcollective module, which resulted in following setting,

# cat /etc/mcollective/server.cfg |grep identity
identity = server.pythian.com

Update: I have pushed a commit to puppet-mcollective module which would automatically set the identity to fqdn.

This allowed the command to run successfully and getting ‘Puppet Run’ from foreman to work.

# mco puppet runonce -I server.pythian.com* [ ============================================================> ] 1 / 1

Now ‘mco ping’ looks good as well.

$ mco pingserver.pythian.com time=91.34 ms
server3.pythian.com time=91.23 ms
server2.pythian.com time=82.16 ms

Now let us check why this was happening.

mcollective identity is exported from ruby function Socket.gethostname.

From ruby source code you can see that Socket.gethostname is getting the value from gethostname().

./ext/socket/socket.c#ifdef HAVE_GETHOSTNAME
/*
* call-seq:
* Socket.gethostname => hostname
*
* Returns the hostname.
*
* p Socket.gethostname #=> “hal”
*
* Note that it is not guaranteed to be able to convert to IP address using gethostbyname, getaddrinfo, etc.
* If you need local IP address, use Socket.ip_address_list.
*/
static VALUE
sock_gethostname(VALUE obj)
{
#if defined(NI_MAXHOST)
# define RUBY_MAX_HOST_NAME_LEN NI_MAXHOST
#elif defined(HOST_NAME_MAX)
# define RUBY_MAX_HOST_NAME_LEN HOST_NAME_MAX
#else
# define RUBY_MAX_HOST_NAME_LEN 1024
#endif

char buf[RUBY_MAX_HOST_NAME_LEN+1];

rb_secure(3);
if (gethostname(buf, (int)sizeof buf – 1) < 0)
rb_sys_fail(“gethostname(3)”);

buf[sizeof buf – 1] = ”;
return rb_str_new2(buf);
}

gethostname is a glibc function which calls uname system call and copy the value from returned nodename.

So when foreman uses the FQDN value which it collects from puppet certificate name, mcollective exports the hostname returned by gethostname().

Now let us see how gethostname() gives different values in different systems.

When passing the complete FQDN in HOSTNAME parameter in /etc/sysconfig/network, we can see that Socket.gethostname is returning FQDN.

[[email protected] ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=centos.pythian.com[[email protected] ~]# hostname -v
gethostname()=`centos.pythian.com’
centos.pythian.com

[[email protected] ~]# irb
1.9.3-p484 :001 > require ‘socket’
=> true
1.9.3-p484 :002 > Socket.gethostname
=> “centos.pythian.com”
1.9.3-p484 :003 >

The system which was having problem was having following configuration.

[[email protected] ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=server[[email protected] ~]# hostname -v
gethostname()=`server’
server

[[email protected] ~]# irb
1.9.3-p484 :001 > require ‘socket’
=> true
1.9.3-p484 :002 > Socket.gethostname
=> “server”
1.9.3-p484 :003 >

Here ruby is only returning the short hostname for Socket.gethostname. But it was having following entry in /etc/hosts.

192.168.122.249 server.pythain.com server

This allowed system to resolve FQDN.

[[email protected] ~]# hostname -f -v
gethostname()=`server’
Resolving `server’ …
Result: h_name=`server.pythain.com’
Result: h_aliases=`server’
Result: h_addr_list=`192.168.122.249′
server.pythain.com

From ‘man hostname’.

The FQDN of the system is the name that the resolver(3) returns for the
host name.Technically: The FQDN is the name gethostbyname(2) returns for the host name returned by gethost-
name(2). The DNS domain name is the part after the first dot.

As the resolver is able to resolve the hostname from /etc/hosts, puppet is able to pick up the fqdn value for certificate which it later used by foreman.
But mcollective exports the short hostname returned by gethostname().

To fix the issue in Red Hat based linux distributions, we can try any of the following,

* Pass an FQDN in /etc/sysconfig/network like below.

# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=server.pythian.com

OR

* Use a short hostname as HOSTNAME but make sure that it would not resolve to an FQDN in /etc/hosts or DNS (not really suggested).

OR

* Pass short hostname or FQDN as HOSTNAME but, make sure that there is an entry like below in /etc/hosts and mcollective is exporting fqdn as identity.

192.168.122.249 server.pythian.com server
email

Author

Want to talk with an expert? Schedule a call with our team to get the conversation started.

About the Author

Devops Engineer
Minto Joseph is an expert in opensource technologies with a deep understanding of Linux. This allows him to troubleshoot issues from kernel to the application layer. He also has extensive experience in debugging Linux performance issues. Minto uses his skills to architect, implement and debug enterprise environments.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *