Two of the most popular file formats used in science settings are Excel spreadsheets and CSV (Comma Separated Value) files. So, it comes as no surprise to hear that they are heavily relied on in the laboratory informatics market as well. Teams using a LIMS frequently collaborate on analyzing test result data, which often involves columns and tables. Formulas to analyze the data can be added via most spreadsheet editing programs, and these calculations may be crucial to seeing patterns in the data.
Laboratory and clinical results, whatever their source, are often converted to spreadsheet format. Excel spreadsheets or tabular CSV files provide a way for a scientist or technician to interact with and share data in a well-established format.
If you’ve decided to migrate your data into your LabVantage LIMS solution, data import systems can be created for Excel and CSV file types, even when they are complicated in structure. However, it’s important to have trained LabVantage administrators create the data import systems to be sure they are well written. When done properly, they can reduce time and errors exponentially. Properly structured data opens up powerful viewing, querying, and reporting possibilities, which can greatly benefit users.
In this blog, we’ll discuss some of the options for importing Excel and CSV files into LabVantage, with special emphasis on Data File Definitions, one of the most powerful and flexible tools included with the application.
Many scientists have been taught that a spreadsheet is a database, while many data experts would disagree, they’re both right at some level. The main difference, in practice, between a database and a spreadsheet is that a database tends to follow rules about data organization that are not required by a spreadsheet.
The keys to success when creating an Excel or CSV file import system are data consistency, organization, and analysis. In a properly designed database, a given piece of data should only appear in one place. For instance, the price of a widget should only exist in one table, no matter how many views it will be displayed in.
This is important because when importing data into LabVantage, organization matters a lot. If input data has duplications, then a decision has to be made about removing it, or establishing which location is definitive.
Data integrity and organization considerations include the following:
If you make the decision to move or import your data from spreadsheets and delimited files into LabVantage LIMS, there are two primary tools you can use for data migration: Data File Definitions (DFDs) and using the Application Programming Interface (API). As using the API is an advanced option requiring programming of multiple systems, this blog will focus on DFDs. DFDs can be understood and used without the programming skills required for using the API. DFDs are exportable and can be copied from one system to another.
Highlights of working with Data File Definitions in LabVantage include the following:
Figure 1. Upon importing a test version of a spreadsheet, the LabVantage user interface allows administrators to map the “shape” of the data correctly
Figure 2. The visual editor allows administrators to design the logic of fine-grained tasks needed during data import. In this example, business logic would determine whether to import a row of data, or send a bulletin that a condition was not met.
Importing complicated data will always be challenging. It can be very difficult to communicate to users that things that make intuitive sense to them are often hard for computers to understand. Incoming data must be consistent, but LabVantage DFDs and other techniques can be made sophisticated enough to allow for variations. If a new version of an import data comes along, it should be relatively easy for trained administrators to copy existing well-made DFDs to account for new types of import data. For initial set up of importing, it is wise to consult experts who understand both LabVantage and data science. A small allocation of time setting up your LabVantage import systems could mean exponentially faster processes and a much lower error and issue rate.
➞ Additional Reading: Migration of Biobank Data from Multiple Systems to LabVantage 8 LIMS
What was the most complicated issue you have encountered during data import? How did you resolve it?
Comments