Starting out in SQL – 7 things I learned the hard way
As a travel business (or indeed any business), it is a basic necessity to be able to create reports in order to operate and manage the company effectively. In my experience, reporting is usually very bespoke to the business in question. Different companies have different things that make them tick – load factor, profit, exposure, passenger numbers etc, and even greater variation in terms of operational needs and practices.
Recently I have been helping users of the Tiger Bay Tour Operator system to get to know the data they have available to them and create some bespoke reporting for their businesses. The Tiger Bay system includes a reporting suite with a good range of operational, financial and management reports that can be run within the user interface. Further reporting opportunities are provided in the form of partially de-normalised MS-SQL views.
I have been using SQL for around 7 years, working with 4 major travel system databases, as well as other data relating to websites and supporting applications. I am self-taught, borne out of necessity when I worked in a small tour operator with big requirements. The Tiger Bay users that I have been working with are in a similar situation to me all those years ago – big on requirements and new to SQL. Working again with new users has made me focus on the tips and advice I can give to those starting out in SQL – the things I had to learn the hard way! I have shared some of the key points below.
1) Know your entities
This already sounds like a technical word, but the concept is fairly straightforward. An entity is a type of thing, and you can store information about it. For example, a travel booking is an entity, accommodation is another, flights are another. In a normalised database, each type of entity will usually be in its own table, so there will be a table containing all the bookings, another containing all the accommodations etc. Some entities are not so intuitive at first, such as the ‘booked accommodation’, e.g. information relating to the accommodation when it was booked (specific pricing, dates etc of a specific stay in a hotel).
2) Understand relationships
There are different ways entities can relate to each other, which are governed by the real life things that the data represents. You have to be careful how you join the entities/data tables together, as the relationships control how many results you get back, and you could end up with more or less data than you were expecting.
One-to-many – One booking has four passengers. Join the tables together and you will get four results. It could look like you now have four bookings if you simply count the results, or four times the revenue if you sum the results.
One-to-one – Each booking has one and only one booking reference. Each booking reference only relates to one booking. Normally you will find both data in the same table, but not always. If they are separate tables and you join them correctly, it should not alter the number of results you get.
Many-to-many – Each customer has multiple interests (e.g. ‘Skiing’, ‘Walking’ etc). Each interest is related to multiple customers. Join these correctly and you will get one row per combination, just like one-to-many.
3) Learn about joining tables
If you understand the relationships between your entities, then working out how to join tables becomes intuitive. To join tables you need to find the piece of information that exists in both tables. If a relationship is one-to-many, then each of the records on the ‘many’ side should contain the ID (or primary key) of the record on the ‘one’ side. For example, on a booking with four passengers, each passenger record will contain the ID of the booking.
Many-to-many relationships are normally handled through a third ‘bridging’ or ‘junction’ table. So there is the customers table, the interests table, and then a third table where each row contains a relationship between a specific customer and a specific interest.
How you join will affect the number of results you get. If you use an inner join, you only get results that exist in both tables (e.g. you join the client table to the booking table. You get one result for each booking that each client has made. Any client that has not made a booking, and any booking with no client is excluded from the results). If you use a left or right outer join, all the records from one table will show, but only the ones with a matching value in the other table will show (e.g. A list of all clients whether or not they have a booking, showing each booking if there is one. Any bookings without a client do not show.). You can also use a full outer join which shows all records from both tables.
4) Work out the logic on paper first
The hardest part of creating a report is working out what the report is supposed to show, and the logic behind that. It is very easy to start writing the SQL before you fully understand how certain fields or groupings are going to work, and this can end in a muddle. Unless its really straightforward I always try to write down, draw or define on paper exactly what the report is supposed to show and any logic for calculating field values.
5) Test often
If you make several changes to a SQL query before you run it, and then it doesn’t work, it can be a head-scratching exercise to pick apart what you did and find what you did wrong. So run your query regularly as you make changes so the errors or changes in the results appear straight away. It will also help you learn how the changes affect the results, and therefore understand the entities and relationships better.
6) Check your results against something you know is right
It is surprisingly easy to write a report that looks correct but the data is so wrong it would probably very quickly bankrupt your company. Always check the data against something you know is correct, such as looking at the user interface, creating the intended maths in a spreadsheet and cross-checking the result etc. Its often helpful to limit your result set to be as small as possible to do this – e.g. look at only one booking, one day of sales etc.
7) Use the graphical view
A great way to get started is to use a graphical view, such as the View Designer in Microsoft SQL Server Studio. This lets you create the query using a more interactive view of the tables, joins and fields, letting you drag and drop, use dropdown menus etc, whilst the SQL is automatically created underneath. There are some limitations to the amount you can do in this way, but it avoids starting with a blank screen and having to work out what to write yourself.