Monday, March 29, 2010

Sqlphobia: The Irrational Fear of SQL

The Phobia List provides an exhaustive list of phobias. Although it is difficult for many of us to understand or relate with many of these phobias, some are all too easy or me to understand. For example, I fear that I suffer at least mild cases of demophobia (fear of crowds) and homilophobia (fear of sermons). However, one fear that I see with a relatively high degree of occurrence in the software development community is the seemingly irrational (to me) fear of SQL.

The fear of SQL is so rampant that Ritesh Shah has named his blog "Fight the Fear of SQL." Other posts/comments indicating the prevalence of this fear of SQL include a Ruby on Rails related comment, a .NET related blog post, and in this Hibernate versus iBatis presentation (Java).

There have been numerous attempts at abstracting away or hiding SQL from the developer. The object-relational mapping (ORM) approach is probably the largest of these efforts in terms of person hours invested. This approach attempts to allow developers to focus on objects and their mappings to relational database structures (such as tables and columns) with the hope that the developer will never (or at least much less often) have to read, write, or maintain SQL.

Early Enterprise JavaBeans (EJB) container-managed persistence [CMP] (and especially container-managed relationships [CMR]) was a disaster by nearly all accounts. Not only was it more difficult (for most of us) to use than using SQL directly, but its (EJB 2's) EJB Query Language looked and felt suspiciously like SQL made worse by embedding it in XML. In fact, it's my contention that the much easier-to-use Java EE 5/6 with JPA is as popular as it could be because of these highly memorable negative experiences with its predecessors.

Another ORM example that was received much better in the Java community is Hibernate. Hibernate and the Spring Framework became a very common (if not THE most common) stack for Java-based applications at least in part because they were perceived as being easier to use than SQL/JDBC and J2EE. However, not everyone loves Hibernate as evidenced with posts such as Hibernate is crazy, iBATIS is awesome.

Overall, I'm a fan of JPA as I hope I made clear in my article Basic JPA Best Practices. I like the ability to relatively easily tie my normal Java objects into the transaction support of the supporting application server. Also, I think its advantageous to have a standardized approach to ORM rather than using a proprietary solution such as pre-JPA Hibernate. With all the positives of JPA, I still find there are times when it either is more than I need or when the impact it has on my object design is too great to justify. Furthermore, I have found the ability to be able to read SQL invaluable in debugging problems with my code that uses JPA implementations. In other words, I still need to know and be comfortable with SQL, even when using JPA implementations.

There are situations where even the best ORM implementation introduces more trouble and requires more effort than using SQL directly (such as via JDBC) or using a less intrusive technology (such as iBATIS). This seems to be especially true for complex data structures or for data designs that do not fit "the norm." Ted Neward has referred to object/relational mapping as the Vietnam of Computer Science. Aldo Cortesi's post A Farewell to ORMs provides excellent coverage of his moving away from use of ORMs in Python.

Regarding ORMs, Tim Bray has written, "ORM Blecch: My relationship with relational databases has always been chilly, but that’s a lot better than my relationship with object-relational mapping, which has never given me anything but pain. The one route to unambiguous success in ORM seems to be the Rails’ tactic of slashing flexibility."

I recall an interesting keynote Tim Bray gave at Colorado Software Summit 2004 where he basically asserted that we're all paid enough, trained enough, and smart enough to be able to work with SQL without need for an ORM. The interesting this is that wasn't even the point he was trying to make. He was actually trying to say the same thing for object-to-XML mapping and was using ORM almost as a given as a thing we don't always need. Mike Keith has provided a more thorough summary of this keynote and some good counter arguments. This keynote caused Mike to start writing a blog!

ORMs are a huge, but not the only, example of trying to avoid using SQL. Other examples include efforts like Java Data Objects (JDO), object databases, and XML databases. JDO had some promise but its future largely ended with what it contributed to JPA. Early on in my career, I used an object-oriented database called Versant that I actually really liked, but in general object-oriented databases have not taken much of the market share from relational databases. Similarly, XML databases already seem to have peaked without coming anywhere near the popularity of the relational database.

A very popular (in the blogosphere) alternative to SQL these days is the NoSQL concept. When I first read this term, my initial reaction was a rolling of the eyes and thinking "Here we go again." The name NoSQL is a little bit of a misnomer. If all it was about was not needing to use SQL, I would dismiss it immediately given the many available alternatives to SQL and to the fact that SQL itself is just not that difficult. I don't know much about NoSQL, but the thing that prevents me from immediately dismissing the notion is the primary cited motive for NoSQL. Despite the poorly chosen name, NoSQL really seems to be about moving away from the relational database rather than simply not using the Structured Query Language. That may or may not be a good idea and the alleged scalability benefits of NoSQL might be debated, but at least it's not grounded in the irrational fear of SQL.

Where does this irrational fear of SQL come from? I actually have experienced this myself. My undergraduate degree was in electrical and computer engineering. Although I took several computer science courses as part of my computer emphasis, there were some classes I did not get to take in the CS department. One general area I did not get any exposure to was SQL. In my first job out of college, I was somewhat intimidated by just the talk of SQL. However, one of my first projects had me working with SQL on a regular basis and I soon learned that it wasn't so bad.

Today, I have spoken with very bright, talented young developers who maintain nasty ORM mapping files with no problem, but fear that they cannot learn SQL/JDBC. I try to explain to them that the SQL is no more difficult than the mappings they are endeavoring to maintain, especially in the cases of "weird" or complex data needs. They fear the unknown.

There are real issues with SQL. Perhaps the biggest for me is the lack of any truly implemented standard. Some might argue that there is a SQL standard, but I have come to believe that a standard is only as good as its implementations. If the major and popular implementations poorly implement the standard or ignore it altogether, most of the value of having a "standard" is forfeited (it's not really standard in that case). Design patterns such as the Data Access Object were created to deal with this, but there is no question that Rails and ORMs that provided migrations capabilities have the upper hand here. In situations where multiple database vendors are used or are likely to be used, the lack of standardization in SQL can make its use truly scary.

Another commonly alleged drawback to SQL is that it requires learning different syntax and semantics than the programming language being used with SQL. This would be a bigger issue to me if basic SQL was as complex as learning the basics of a new programming language. In fact, I have benefited more than once from having the commonality of SQL to lean against while writing a database access piece of code in a new programming language. For example, it was pretty easy to learn how to apply Ruby to my database needs because of my knowledge of SQL.

In a world where polyglot programming is the trend and many believe it wise to be comfortable with at least one statically typed language and one dynamically typed language, it seems obvious to me that it's in the best interest of developers to gain some degree of comfort with SQL as well. It's still one of the most likely languages a developer will actually use in his or her career. The side benefit of becoming more comfortable with SQL is that comfort and familiarity will often drive out the irrational fear.

Software developers are (typically) too well-trained to limit themselves due to fear. The best software developers are able to think critically about their architectures, designs, and implementations and about their choices of languages, products, libraries, and frameworks. JPA implementations have provided exactly what I need in many situations. However, JDBC/SQL implementations (especially with Spring) have also been my best bet in several cases. When one removes irrational fear, one can make the correct choice for the situation. With irrational fear present, one's ability to choose correctly is significantly impeded.

I have tried to come up with a witty name of the phobia describing the irrational "fear of SQL." Sequelphobia seems to already be taken for fear of movie sequels ("The Empire Strikes Back" might cure that) or video game sequels. Because fear of SQL is often correlated with not knowing SQL, I thought about xenophobia, but that has a negative cultural/political emphasis that I don't want to drag into this discussion. For now, I'm going to go with calling it "Sqlphobia." I welcome suggestions for a better name.

Basic SQL is not that difficult. I have seen DBAs and people with far more SQL experience than I have do amazingly tricky and complex things with SQL, but I find that simple SQL covers a large percentage of my needs. JDBC makes using SQL much easier than it used to be with proprietary approaches such as Pro*C/C++ precompiler and Spring+JDBC makes use of SQL even easier. The .NET community and other language/platform ecosystems typically have similarly easy-to-use SQL support as well.

In the end, I don't believe that straight JDBC/SQL is the best choice for every project. Some of the alternatives mentioned here are superior choices given certain conditions. However, I do think that an irrational fear of SQL is demonstrated when one spends much more time shoehorning a non-SQL solution into a place where it doesn't fit well simply to avoid having to use SQL.

23 comments:

Heath said...

The reason I don't like using SQL: Portability. I have too-often discovered that the SQL I wrote for MySQL doesn't work on Postgres or Oracle. Using an ORM largely solves that problem.

@DustinMarx said...

Heath,

Thanks for the comment. Portability is also my biggest concern with use of SQL. On many of my projects, it's not a realistic need to be able to easily switch between multiple databases, but when it is a necessity, the lack of standardization across SQL implementations can be costly if not hidden by ORMs, migrations, or other techniques. That's a very legitimate reason not to use SQL directly.

Dustin

@DustinMarx said...

In Diagnosis: SQLPhobia, Mikhail Opletayev writes, "Not a day passes without people coming up with new ways to avoid SQL, to hide it from their sight, to banish it behind a layer of abstraction."

Popa Adrian Marius said...

Sometimes when i use various frameworks is better to express myself in sql than in the orm like in cakephp or django
And yes i can speed the things by writing stored procedures or triggers directly on the firebird sql server side
Maybe sql language/standard must be revised and made again sexy (these days is transformed in some sort of cobol/java) and that is a pity

Cay Horstmann said...

It's not that I don't like SQL. I just don't like writing SQL. To put a Java entity into a database, or to take it out, takes a large amount of boring boilerplate. If I make any change in the data representation, I get to muck with all that boilerplate again.

I would think that an OR mapper in the hands of someone with SQL phobia would be pretty dangerous.

A good ORM tool should not hide the SQL--it should be easy to see what is being generated on my behalf, so I can see that I am not inadvertently doing something stupid.

Unknown said...

I'm sure glad someone mentioned portability because that was my answer as well. I'm not *afraid* of SQL. Using it merely defeats one of my major goals.

Kevin Farnham said...

I blogged about this entry and featured it on the java.net home page today. I really think it's advantageous for developers who do any work that interacts with a database to have some basic knowledge of database structures and SQL. In too many work environments, it seems like the developers just say "oh, that's something for our DBA to deal with" -- rather than recognizing that the performance of their server application is highly dependent on how database queries are formulated -- whether the queries arrive at the database server via an ORM or through direct client-side SQL statements.

An excellent, thought-provoking post, Dustin!

@DustinMarx said...

All,

Thanks for the comments and feedback. I agree with the sentiments expressed. I often like annotations-based JPA (with the optional XML overriding) for many situations, but it's difficult to be more tedious than Hibernate XML mapping files. I like Spring's virtually non-invasive JDBC support to reduce the tedium of JDBC when I do use it directly. As I stated in the post, portability (or lack of standardization) is my biggest issue with SQL, but I only allow that desire to overcome other concerns if it is truly worth that. If I'm working a project for a shop that's used Oracle forever and plans to do so forever, it's not so significant, especially if they are more likely to change their non-JPA-compliant ORM (or use of optional features) than their database.

In short, I don't think any approach, technique, or language is best for all situations. It sounds like those who have left feedback so far are not the developers I'm talking about in the post. I'm talking about developers who will use an ORM no matter what, even when it's overkill and much more difficult, simply because they lack confidence in their own SQL knowledge or otherwise fear SQL. Graphical tools have helped developers "get away" with no SQL knowledge.

@DustinMarx said...

Kevin,

Thanks for posting a link to this blog post and for the nice words. Getting a link from java.net certainly makes a difference in traffic to the post!

Dustin

@DustinMarx said...

I told a colleague about this post and he said something I think is fairly profound: "Imagine if all the time, energy, creative thinking, and money that has been put into ways to get around using SQL would have been invested in improving and standardizing SQL!"

Unknown said...

I have to agree with the portability issue: SQL is not as portable as it could be, and ORMs alleviate the problem.

But that is not the main reason to use ORMs: I find the JPA query language a bit easier for everyday use, and love the higher abstraction level ORMs provide.

And, yes, in 5% of the scenarios, ORMs might not be the best solution -but, frequently, they are.

An important point: to get the most out of ORMs, you *must* know SQL quite well -sqlphobia should be considered harmful for ORM fans too.

Alex said...

One problem I have seen with the use of ORMs is that time and energy is put in to designing the Java code but the database is left to just 'happen' (magically created by Hibernate).

This results in key words being used as column names, massively wide, sparse tables, no sensible implementation of partitioning etc etc. People then wonder why the database underperforms but are utterly resistent to ever doing any database design.

Robert Boothby said...

Good post! As an experienced Java Developer who started out as an Oracle Developer I get accused of the opposite: Sqlphilia. I find many persistence problems easier to solve using a well judged spot of SQL or HQL.

On the subject of portability there is a seemingly little known part of the JDBC specification which even the most experienced Java developers seem to overlook: Scalar Functions - appendix C of the 4.0 JDBC specification. This addresses a number of portability issues by putting the onus on the driver to translate a lot of the common incompatibilities.

sauron said...

ORM lets us deal in the object world and stops us from having to write our own persistence logic. Hibernate for example integrates easily w/JBoss cache and other tools and a lot of work can be done with distributed caching and querying w/o touching the database.

So we have a more organized and portable app, as well as being able to take advantage of some advanced features. We don't fear SQL at all but we like reusing advanced caching tools.

@DustinMarx said...

sauron,

Thanks for the comment. I agree that features such as caching can make an ORM attractive and are a legitimate reason for using such a tool. In fact, as I stated in the post, I am definitely a fan of JPA as long as it's used for the right reasons, including the one you cited. However, using complicated ORM tools (of which Hibernate certainly qualifies) just to avoid SQL is a phobia and is what I was posting about.

I do think one subtle point is worth pointing out. While an ORM may increase one's database portability, it can (and often does) reduce one's data layer portability when developers stray from the JPA standard and use features specific to the mapping tool. Similarly, the power of tools that are integrated with these ORMs are often themselves not-standard and therefore not portable. It's a matter of exchanging one portability for another.

I'll admit that while I like JPA implementations such as TopLink Essentials and EclipseLink, I've largely been less than impressed with Hibernate. There are many reasons for this, but one of the reasons is not Hibernate's fault: my experience is that people seem more willing to use non-JPA features of Hibernate than they are to use such features in the reference implementations. When people use non-standard ORMs or heavily use non-standard features of ORMs (because they lack discipline or lack awareness of what they're doing or simply justify it because the cost is worth the benefit), they have no more portability than using database-specific SQL. They have only shifted where the lack of portability lies.

Finally, no matter what tools you use, if you don't understand SQL and JDBC, you're in big trouble when the tools ultimately based on SQL/JDBC don't work as they should.

Thanks again for taking the time to leave feedback. Automated or simplified caching is certainly a legitimate reason to consider an ORM or other non-SQL tool in conjunction with a relational database.

@DustinMarx said...

ORM is an anti-pattern is a relatively recent post that has some relation to my post. Of course, a well-known post on this subject with a memorable title is The Vietnam of Computer Science and there is Jeff Atwood's response Object-Relational Mapping is the Vietnam of Computer Science.

@DustinMarx said...

Taking a step back from ORMs is another post somewhat related to the subject of my post.

Dustin

@DustinMarx said...

Bad ORM is Infinitely Worse than Bad SQL is a recent related post.

@DustinMarx said...

Sometimes it’s easier to just write your SQLs is another recent related post.

@DustinMarx said...

Martin Fowler defends ORMs in Ormhate. One of his relevant quotes is, "David Heinemeier Hansson, of Active Record fame, has always argued that if you are writing an application backed by a relational database you should ... know how a relational database works."

@DustinMarx said...

Yakov Fain's The Degradation of Java Developers alludes to "Sqlphobia" (not by name, but by description).

@DustinMarx said...

Lukas Eder's post The Premature Return to SQL references and briefly discusses an online conversation about a "premature return to SQL" from a NoSQL fan perspective and from an SQL fan perspective.

Lukas concludes with an interesting thought: "If a NoSQL or NewSQL feature proves to be innovative and reliable, Oracle et al. will most certainly pick it up and integrate it into SQL. Clever NoSQL vendors thus already prepare for their exits."

Dustin

@DustinMarx said...

Geoff Wozniak has posted What ORMs have taught me: just learn SQL.