After covering the hardware components of the Sun Oracle Database Machine in part 1, our grand tour continues with a look at the software side. With the prominent exceptions of the Exadata storage server software and the Oracle database itself, the software stack is based on well-known and widely used open source products.
The database nodes all run Oracle Enterprise Linux 5.3, essentially a clone of Red Hat Enterprise Linux 5.3 with some small performance-related kernel tweaks. Version 220.127.116.11.1 of the Oracle Exadata Storage Server software contains an update to Oracle Enterprise Linux 5.5.
The hardware drivers, notably the OpenFabrics InfiniBand implementation, are GPL-licensed open source. Stacked on top of InfiniBand are the Reliable Datagram Sockets (RDS) protocol and Oracle’s proprietary iDB protocol. This is what Kevin Closson said about it in his interview on our blog:
… we’ve developed and brought to market the lightest—at least in our assessment—the lightest and most adaptable of all of them, which is Reliable Datagram Sockets [RDS]. Sure, you can do IP over InfiniBand, and that starts to chew into some of the value propositions involving InfiniBand, but we’ve done none of the sort. We are fully Remote Direct Memory Access (RDMA) from point to point over RDS.
The open-source Linux kernel is tainted by three proprietary Oracle modules used to support the ASM cluster file system (ACFS), a general-purpose filesystem with ASM on the back end, which isn’t used by default. These kernel modules are installed with any regular Oracle Grid Infrastructure 11gR2 installation:
[root@cellnode ~]# lsmod | grep oracle
oracleacfs 877320 0
oracleadvm 221760 0
oracleoks 276880 2 oracleacfs,oracleadvm
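To check for these modules from a script rather than by eye, the listing above can be parsed directly. The following is a minimal sketch, assuming the usual `lsmod` column layout (name, size, use count, optional comma-separated users). The `KernelModule` type and `parse_lsmod` helper are names made up for this illustration, and the sample text is the output shown above.

```python
from typing import NamedTuple


class KernelModule(NamedTuple):
    name: str
    size: int
    use_count: int
    used_by: tuple


def parse_lsmod(text):
    """Parse lsmod-style lines: name, size, use count, optional users."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header row.
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        used_by = ()
        if len(fields) > 3:
            # Drop empty entries in case the list has a trailing comma.
            used_by = tuple(u for u in fields[3].split(",") if u)
        modules.append(
            KernelModule(fields[0], int(fields[1]), int(fields[2]), used_by)
        )
    return modules


# Sample text copied from the listing above.
SAMPLE = """\
oracleacfs 877320 0
oracleadvm 221760 0
oracleoks 276880 2 oracleacfs,oracleadvm
"""

oracle_modules = [m for m in parse_lsmod(SAMPLE) if m.name.startswith("oracle")]
for module in oracle_modules:
    print(module.name, module.size, module.used_by)
```

On a live system the same function could be fed the output of `subprocess.run(["lsmod"], capture_output=True, text=True).stdout` instead of the sample string.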
Running on top of the OS is Oracle Database 11g Release 2. While it’s a stock install, Oracle 11gR2 includes a substantial amount of code to support Exadata features such as smart scans and columnar compression, and this code is simply unused in non-Exadata deployments. InfiniBand communication on the database nodes using the iDB protocol is handled by the libcell11.so module, which is linked into the Oracle kernel.
Although administrators rarely need to interact with the switch software directly, it runs Linux like the other components. For a switch made by Sun pre-Oracle acquisition (Sun Datacenter InfiniBand Switch 36), this is a bit of a surprise.
The Exadata storage servers are where most of the magic happens. Like the database nodes, they run Oracle Enterprise Linux 5.3, with the same kernel and drivers. The Exadata support agreement places strict limits on what software can be installed on storage servers, permitting only the core OS and the supplied Exadata tools. Even Oracle Enterprise Manager agents are not permitted to run; instead, a special Exadata plug-in makes outgoing SSH connections from a remote Management Agent to the storage servers to monitor them. Rather than using the traditional oracle OS user, a new OS user, celladmin, is available for administrative tasks, with the core background processes running as the root superuser. There is also a new OS user, cellmonitor, with limited privileges for monitoring the status of the storage servers.
As a departure from ASM storage technology that uses a process architecture borrowed from database instances, the storage servers have a brand new set of processes to manage disk I/O. They are:
- RS, the restart service. Performing a similar role to SMON, RS monitors other processes, and automatically restarts them if they fail unexpectedly. RS also handles planned restarts in conjunction with software updates. The main cellrssrm process spawns several helper processes, including cellrsbmt, cellrsomt, and cellrsssmt.
- MS, the management service. MS is the back-end process that handles configuration and monitoring commands. It communicates with cellcli, described in the next section. MS is written in Java, unlike the other background processes, which are distributed in binary form and are likely written in C.
- CELLSRV, the cell service. CELLSRV handles the actual I/O processing of the storage server. It is not uncommon to see heavy usage from CELLSRV process threads during periods of heavy load. Among other things, CELLSRV provides:
  - Communication with database nodes using the iDB/RDS protocols over the InfiniBand network
  - Disk I/O with the underlying cell disks
  - Offload of SQL processing from database nodes
  - I/O resource management, prioritizing I/O requests based on a defined policy
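To make the last point concrete, here is a small sketch of policy-based I/O prioritization. It is not how CELLSRV is implemented, which this article does not describe. It only illustrates the general idea of ordering queued requests by a policy, using strict priorities as a deliberate simplification. The group names, priority values, and the `IoScheduler` class are all hypothetical.

```python
import heapq
import itertools

# Hypothetical policy: a lower number is served first. These group names
# and values are invented for illustration and are not Exadata defaults.
POLICY = {"oltp": 0, "reporting": 1, "batch": 2}


class IoScheduler:
    """Toy scheduler that hands out queued requests in policy order."""

    def __init__(self, policy):
        self._policy = policy
        self._heap = []
        # The sequence number breaks ties so that requests with equal
        # priority come out in the order they were submitted.
        self._seq = itertools.count()

    def submit(self, group, request):
        # Groups missing from the policy get the lowest priority.
        lowest = max(self._policy.values()) + 1
        priority = self._policy.get(group, lowest)
        heapq.heappush(self._heap, (priority, next(self._seq), group, request))

    def next_request(self):
        """Return (group, request) for the next I/O, or None if idle."""
        if not self._heap:
            return None
        _, _, group, request = heapq.heappop(self._heap)
        return group, request


scheduler = IoScheduler(POLICY)
scheduler.submit("batch", "read A")
scheduler.submit("oltp", "read B")
scheduler.submit("reporting", "read C")
scheduler.submit("oltp", "read D")

served = []
while (item := scheduler.next_request()) is not None:
    served.append(item)
print(served)
```

The two `oltp` requests are served first, in submission order, followed by `reporting` and then `batch`. A production scheduler would also need some guard against starving low-priority work, which this sketch leaves out.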
In addition to the core background processes, storage servers have tools for administration and monitoring. They include:
- cellcli, the command-line interface that allows administrators to configure and monitor storage servers with a SQL-like syntax. This tool is unique to the Exadata Storage Server Software.
- dcli, a shell script allowing commands to be run on multiple machines with a single invocation. Since storage servers do not communicate with each other directly, the dcli command is particularly useful to run configuration commands across all storage servers. The dcli tool is also available on database servers, and can similarly perform administrative actions across database nodes.
- OSWatcher, a series of scripts that gather system performance information from tools such as top, vmstat, and iostat at regular intervals. OSWatcher output can be useful in problem diagnosis. This is the same tool that is recommended for database servers, and many DBAs are already familiar with it.
- ORION, Oracle I/O Numbers, an I/O performance benchmarking tool. Keep in mind that ORION, unlike a database instance, does nothing with the data it reads, so it generally achieves slightly higher numbers than a full-blown benchmark that includes the database.
- ADRCI, a command-line interface to the Automatic Diagnostic Repository. This tool can automate the generation of diagnostic reports, typically when working with Oracle support services. Oracle 11g DBAs will already know this tool, which arrived with the new logging and tracing mechanism in Oracle 11g.
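The fan-out pattern behind dcli, running one command on many machines in a single invocation, can be sketched in a few lines. This is not dcli itself and does not reproduce its options; it is an illustration that assumes key-based SSH access is already set up. The host names cell01 and cell02 are hypothetical, and the `run_on_hosts` and `build_ssh_command` helpers are invented for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_command(user, host, remote_command):
    """Return the argv list that runs remote_command on host over SSH."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_command]


def run_on_hosts(hosts, user, remote_command, runner=None):
    """Run one command on every host in parallel.

    Returns a dict mapping each host to (return code, output). The runner
    argument lets callers substitute a fake for testing or dry runs.
    """
    def default_runner(argv):
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=60)
        return proc.returncode, proc.stdout

    runner = runner or default_runner

    def run_one(host):
        return runner(build_ssh_command(user, host, remote_command))

    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        # pool.map preserves input order, so results line up with hosts.
        return dict(zip(hosts, pool.map(run_one, hosts)))


def dry_run(argv):
    """Fake runner: report the command line instead of executing it."""
    return 0, " ".join(argv)


# Hypothetical storage server names, used only for this dry run.
results = run_on_hosts(
    ["cell01", "cell02"], "celladmin", "cellcli -e list cell", runner=dry_run
)
for host, (code, output) in results.items():
    print(host, code, output)
```

Dropping the `runner=dry_run` argument makes the sketch execute the command over SSH for real, so it is worth testing with the dry run first.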
Stay tuned for part 3, where I talk about how Exadata is packaged and sold.