Wednesday, September 12, 2007

PeopleSoft and the Load Balancer

We are doing a PeopleSoft PeopleTools upgrade along with upgrading our server environment to a "high availability" configuration. In PeopleTools, we are upgrading from 8.43 to 8.48. Many "little" things have broken. I'll write more about that in the future. This time I want to discuss a problem we had with the "high availability" configuration.

Our UNIX (Solaris) server setup includes a Cisco CSS Load Balancer, two boxes that host our redundant web servers (WebLogic) and Application Servers, and two boxes that host the database server (Oracle) and two process schedulers. There is an overall network that connects the load balancer and all four machines, and a private network that connects the Application Servers to the database server.

The CSS has a virtual address to connect to the PeopleSoft web application (PIA) and balances the sessions between the two web servers. Each of the web servers is configured to use all four application server domains. This allows quite a bit of flexibility for fault tolerance and maintenance shutdowns. It also allows for horizontal scalability transparent to the users.
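As a rough illustration of what "balances the sessions" means, here is a minimal Python sketch. It assumes sticky round-robin with a health check, which is only one of the ways a load balancer can be set up and is not necessarily how our CSS is configured. The class name, the server names web1 and web2, and the session IDs are all made up for the example.

```python
from itertools import cycle


class Balancer:
    """Toy sticky round-robin balancer (illustrative only, not CSS behavior)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)   # servers currently accepting traffic
        self._ring = cycle(self.servers)   # round-robin order for new sessions
        self.sessions = {}                 # session id -> assigned server

    def route(self, session_id):
        # Keep an existing session on its server while that server is healthy.
        server = self.sessions.get(session_id)
        if server in self.healthy:
            return server
        # Otherwise hand out the next healthy server in round-robin order.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                self.sessions[session_id] = candidate
                return candidate
        raise RuntimeError("no healthy web servers")


lb = Balancer(["web1", "web2"])
first = lb.route("session-a")
second = lb.route("session-b")
lb.healthy.discard("web1")            # simulate a maintenance shutdown
moved = lb.route("session-a")         # session-a is reassigned to web2
```

Taking web1 out of the healthy set stands in for a maintenance shutdown: the users behind the virtual address never have to know which box they land on.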

However, one of the problems we ran into involved the UNIX process schedulers and their ability to post files to the report repository. We have always used the FTP method for posting files. PeopleSoft also offers an HTTP post. As we learned through our troubleshooting, the FTP method has two steps. First, the distribution server FTPs the files to the report repository FTP address. Then it sends an HTTP query to the repository to confirm that the files are there. We initially configured the HTTP address to our load-balanced virtual host name. Our Windows process schedulers were able to successfully post files to the report repository. Our UNIX process schedulers, however, would stay in Posting status and eventually go to Not Posted.

By reviewing the logs, we figured out that the FTP was working fine. The files were actually in the repository, but the HTTP confirmation was failing. After a lot of testing, our network engineer determined that the HTTP request was routed from the process scheduler through the CSS to a web server, but the web server response was routed directly to the database/process scheduler box through the private network. To the process scheduler distribution server, the response appeared to be coming from a different server than the one the request was sent to, so the response was discarded.
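To make the failure easier to picture, here is a simplified Python sketch of the check a client effectively applies to a reply. It is a model of the idea, not of the actual TCP stack or of PeopleSoft code, and every address in it is hypothetical.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str   # address the packet claims to come from
    dst: str   # address the packet is sent to


def accepts_reply(request: Packet, reply: Packet) -> bool:
    """A client only matches a reply to its request if the reply comes
    from the address the request was sent to, addressed back to the client."""
    return reply.src == request.dst and reply.dst == request.src


# Hypothetical addresses, for illustration only.
VIP = "10.0.0.10"          # virtual address on the load balancer
WEB1_REAL = "10.0.0.11"    # real address of one web server
SCHEDULER = "10.0.0.21"    # process scheduler / database box

request = Packet(src=SCHEDULER, dst=VIP)

# Reply that goes back through the load balancer: the source is rewritten to the VIP.
reply_via_css = Packet(src=VIP, dst=SCHEDULER)

# Reply that bypasses the load balancer: the source is still the real web server.
reply_direct = Packet(src=WEB1_REAL, dst=SCHEDULER)

for name, reply in [("via CSS", reply_via_css), ("direct", reply_direct)]:
    print(name, "accepted" if accepts_reply(request, reply) else "discarded")
```

The second case is the one we hit: the request went to the virtual address, but the answer showed up with a different source, so the distribution server had nothing to match it to.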

If we configured the report repository HTTP address to the actual host:port of one of the web servers, the distribution worked fine, but we lost the redundancy of the dual web servers for this function.

We also found that if we disabled the private network between the application servers and the database server, the HTTP response was routed back through the CSS and the distribution server was happy. This was not an ideal solution, though, because PeopleSoft is a very database-intensive application and you want to optimize any communications between the application servers and the database.
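One way to picture why disabling the private network changed the return path is ordinary longest-prefix route selection on the web server box. The Python sketch below assumes the web server has a default route toward the CSS plus a more specific, directly connected route for the private network. I have not confirmed that this is exactly how our boxes are set up, and the networks, addresses, and route labels are hypothetical.

```python
import ipaddress


def pick_route(dest, routes):
    """Return the next hop of the most specific route that contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {dest}")
    # Longest prefix wins: a /24 beats the /0 default route.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical routing table on a web server box.
default_route = (ipaddress.ip_network("0.0.0.0/0"), "via CSS")
private_route = (ipaddress.ip_network("192.168.1.0/24"), "private network NIC")

scheduler_addr = "192.168.1.21"   # hypothetical address of the scheduler box

with_private = pick_route(scheduler_addr, [default_route, private_route])
without_private = pick_route(scheduler_addr, [default_route])

print("private network up:  ", with_private)
print("private network down:", without_private)
```

With the private route present, the reply takes the direct path and skips the CSS. With it removed, the default route is the only match and the reply goes back the way the request came.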

Eventually, the network engineer came up with a solution involving a third NIC on the database server. I don't have details at the moment, but it works. I will post more information when I get it.

In future posts, I will go into other details of our architecture and the PeopleTools upgrade.