Sunday, October 24, 2010

Remote Database Access

There are 2 programmatic interfaces. One approach is to use Dynamic SQL, of which the SQL text is passed from client to server. The other approach is to disguise the remote database access underneath the normal database interface. The database schema indicates that certain tables reside on a remote machine. The database is used by the programmer in a normal way, just as if they are local (except for performance and additional error messages to handle).

Remote database access generated large network overhead because of the sheer amount of data that passed back from the server to the client. This is not good for transaction processing. Most database supports stored procedure. This turns remote database access into a form of RPC with 2 differences:

(1) it is run-time and not a compile-time interface with no IDL equivalent
(2) the procedure is typically written in proprietary language or commonly in Java.

Stored procedure approach is faster than remote database access in general. On the other hand, for ad hoc queries, remote database access is more suitable. The reason was the returned data is too dynamic (number, type etc) for RPC-like call to handle efficiently.

There are many remote database access technology. Microsoft has ODBC (Open Database Connectivity), OLD DB (Object Linking and Embedding Database), ADO (Active Data Objects) amd ADO.NET. In Java environment, there is JDBC and JDO (Java Data Objects). Oracle has Oracle Generic Connectivity and Oracle Transparent Gateway. IBM has DRDA. Each vendor wanted the customer to use their technology and integration engine. Most database however supports ODBC and JDBC.

Remote Procedure Call

Procedure calls are a major feature of most programming language. Therefore, it is logical to extend this to access remote service. The idea is that both client and server program can remain the same, just as if they were on the same machine.

The best-known RPC mechanisms are Open Network Computing (ONC) from SUN Microsystems (now Oracle) and Distributed Computing Environment (DCE) from the Open Software Foundation (OSF), a group formed in late 1980s by IBM, HP and DEC. OSF was to be an alternative to AT&T who owned the UNIX brand name and had forma group, which include UNISYS, called UNIX International.

For C program, one include a header file in the program that contains the moduile's callable procedure declarations (procedure name and parameters) minus the logic. For RPC, instead of a header file, it uses an Interface Definition Language (IDL) file. IDL file is syntactically similar to a header file but it is also used to generate client stubs and server skeletons, which are small chunk fo C code that is compiled and linked to the client and server programs.

The purpose of the stub is to convert parameters into a string of buts and send the message over network. The skeleton does the reverse and call the server. The action of converting parameters to message is called marshalling. The advantage of marshalling is that it handles differing data formats in varions platforms. The newer term for marshalling is called serialization.

The problem with RPCs is multithreading. A client program is blocked when it is calling a remote procedure. just as it would be calling a local procedure. The wait is subject to loss of message in transport, netowrk congestion, server response problem and other unpredictable conditions which render the client to wait forever. If a program read input from user while requesting for data at the backend via RPC, two threads are required.

For server, RPC requires a separate server thread for every client connetion. If threads need to share resource, the programmer must use locks, semaphores or events to synchronize accesses.

Multithreaded programs are hard to write becuase it is extremely difficult to test due to race conditions.