CSDB usage: advanced features

This section describes additional CSDB user operations available from the Extras and Maintenance sections of the menu. For basic help, refer to the CSDB usage help.

Content:

Extras:
  • RDF feed
  • Structure translation
  • Data submission
  • Feedback

Maintenance:
  • Data export
  • Data validation
  • Data import
  • Database initialization
  • Update of lists
  • Data removal

    RDF feed

    The major CSDB data can be exported in the Resource Description Framework (RDF), a semantic web data model designed for the conceptual description of information. The data are exported as a set of triples (a serialized RDF feed) and use the GlycoRDF ontology, developed during Biohackathon 2012 (Toyama, Japan) and Biohackathon 2013 (Dalian, China). The current version of GlycoRDF is documented in several resources: an online ontology visualization, a list of participants and publications, the ontology in OWL, and the documentation.

    To generate the RDF feed, select an appropriate serialization format (Turtle, RDF/XML, RDF/JSON, N-Triples) from the drop-down list at the bottom of the ID search form and click Make RDF. The web-API to generate the RDF feeds is located at http://csdb.glycoscience.ru/integration/make_rdf.php and accepts the following parameters:

    As an example, open http://csdb.glycoscience.ru/integration/make_rdf.php?id_list=21720&mode=record&clean=0&db=bacterial&format=turtle to view the resulting RDF feed. Please note that the service is designed to keep all URLs in the generated RDF feed valid and is not intended for bulk data generation. If you need the whole of CSDB in RDF, download the RDF dump (a ZIPped TTL file generated on 2020 Sep 20).
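
    For scripted access, you can compose a request from the parameters that appear in the example URL above (id_list, mode, clean, db, format). The Python sketch below only builds the URL. The helper name build_rdf_url and its default values (copied from the example) are illustrative, and this sketch does not cover the full range of values each parameter accepts. Given the note on bulk use, it is meant for fetching individual records, not for looping over the whole database.

```python
from urllib.parse import urlencode

# Base address of the RDF web-API, as given in the text.
MAKE_RDF_URL = "http://csdb.glycoscience.ru/integration/make_rdf.php"


def build_rdf_url(id_list, mode="record", clean=0, db="bacterial", fmt="turtle"):
    """Compose a make_rdf.php request URL (hypothetical helper).

    Parameter names and defaults are taken from the example request;
    other accepted values are an assumption left to the caller.
    """
    params = {
        "id_list": id_list,
        "mode": mode,
        "clean": clean,
        "db": db,
        "format": fmt,
    }
    # urlencode keeps the dict's insertion order, so the query string
    # follows the parameter order of the example URL.
    return MAKE_RDF_URL + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_rdf_url(21720)
    print(url)
    # Fetching the feed needs network access, e.g.:
    # from urllib.request import urlopen
    # with urlopen(url) as response:
    #     turtle_text = response.read().decode("utf-8")
```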

    Structure translation

    The structure translator converts structures in CSDB Linear notation to and from other glycan encoding languages.

    The Translate from GlycoCT section provides a GlycoCT parser and a converter for user input, and explains how to use the web-API for automated conversion. Paste a GlycoCT condensed code into the text box and press Convert to run the translator.

    The Translate from CSDB section translates structures from CSDB Linear notation. Additional conversion controls are displayed depending on the selected destination format. The destination formats are the following:

    The Automated web-service section describes the translator web-API and how to use it for automated conversion.

    Data submission

    Users can submit data to CSDB on the data submission page. Please fill in the fields with published data only. The form briefly explains the meaning of each field. It also provides additional tools that help with author spelling, bibliographic, structural and taxonomic references, structure encoding, and NMR spectra assignment. Mandatory fields are shown in bold. For more details, refer to the Dump format help.

    After submission, the data are automatically checked for errors. The report is returned in a separate window, so you can correct the data and resubmit them. If either the compound or the publication is already present in the database in a different context, cross-references are generated automatically. In this case, follow these references to check how the existing data correlate with the data being submitted. After successful validation, the submitted record is displayed as a CSDB dump file and sent to the CSDB staff for manual curation, approval and upload.

    If you are familiar with the CSDB dump format and wish to submit large blocks of data, the best way is to send your dump file to Philip Toukach and wait for the error report before proceeding.

    Feedback

    The feedback form is intended for contacting the CSDB staff. Please provide your name and e-mail address, and select a contact reason:

    Pressing the Submit feedback button sends your message to the CSDB team. If your feedback requires an answer, please allow ten days for processing.

    Data export

    This password-protected feature is located in the Maintenance subdivision and is intended for exporting CSDB data in a record-oriented form (CSDB text dump file; see Dump format). To export, specify a range of CSDB record IDs and press the Make dump button. Use commas to separate IDs (e.g. 1,2) and hyphens to specify ID ranges (e.g. 1-10). If the Warn... checkbox is checked, warnings about missing IDs are included in the dump. The data are output to the browser, from which you can copy them.
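
    You can expand the ID syntax above (commas between IDs, hyphens for inclusive ranges) on your own machine, for example to preview which records a specification covers. The following Python sketch is a hypothetical helper and is not part of CSDB. Tolerating spaces, merging duplicates and rejecting reversed ranges are assumptions of the sketch. The second function mimics the idea behind the Warn... checkbox: it lists the requested IDs that are absent from a given set.

```python
def expand_id_spec(spec):
    """Expand an ID specification such as "1,2,5-7" into a sorted list of ints.

    Hypothetical client-side helper; whitespace tolerance, duplicate merging
    and the reversed-range error are assumptions, not documented CSDB behaviour.
    """
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty items, e.g. from a trailing comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"reversed range: {part!r}")
            ids.update(range(start, end + 1))  # ranges are inclusive
        else:
            ids.add(int(part))
    return sorted(ids)


def missing_ids(requested, existing):
    """Return the requested IDs that are not in `existing` (cf. Warn...)."""
    existing = set(existing)
    return [record_id for record_id in requested if record_id not in existing]


if __name__ == "__main__":
    print(expand_id_spec("1,2,5-7"))                  # [1, 2, 5, 6, 7]
    print(missing_ids(expand_id_spec("1-5"), {1, 2, 4}))  # [3, 5]
```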

    Please note that our server cannot handle massive exports. To export large slices of the database (or the whole database), request a dump using the feedback form.

    Data validation

    There are three tools to check the data before importing: Test dump for all errors, Test dump for errors in the structure syntax, and Check a single structure.

    The full test is available from the Test dump option in the Maintenance submenu. The dump file you specify (see Dump format) is imported into a special test database that runs in parallel to the main one. The report returned to your browser lists all syntactical and logical errors, as well as any suspicious data that can be detected automatically. The test database is always cleared before import. As a result, the report does not contain errors that depend on previously imported content, such as a contradiction in taxon assignment or repeating frame positioning between a record in the imported dump file and a related record that was imported into the main database from another dump file. The Do not log non-critical warnings checkbox controls whether non-critical warnings are included in the error report.

    Annotators are encouraged to always test their dumps with this feature and to correct them until no errors are reported before submitting them to a curator. Only one dump can be tested at a time, so if you see a message that the test database is busy, please visit the Test dump page later. The Check NMR chemical shifts by comparison to simulated NMR data checkbox starts an additional, time-consuming check using the specified chemical shift tolerance and trust level. Run this validation of the NMR data in dump files at least once while preparing the data for import.
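
    A short Python sketch can illustrate the general idea of a tolerance-based comparison. It is a simplification under stated assumptions and not CSDB's algorithm. Shifts are given as dictionaries keyed by hypothetical atom labels, a single tolerance in ppm is applied, and the trust level is not modelled.

```python
def flag_shift_deviations(experimental, simulated, tolerance):
    """Return atoms whose experimental shift deviates from the simulated one
    by more than `tolerance` ppm (simplified illustration).

    Both inputs map an atom label to a chemical shift in ppm. Atoms without
    a simulated value are skipped. The trust level is not modelled here.
    """
    flagged = {}
    for atom, delta_exp in experimental.items():
        delta_sim = simulated.get(atom)
        if delta_sim is None:
            continue  # nothing to compare against
        deviation = abs(delta_exp - delta_sim)
        if deviation > tolerance:
            flagged[atom] = round(deviation, 2)
    return flagged
```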

    The structure validation tool (Check structure element in the Maintenance submenu) is provided in addition to Test dump because more than half of the errors in user dumps usually relate to structure encoding. Provide a CSDB dump (see Dump format) that contains one or more structures. Select the desired level of verbosity with the checkboxes, and press the Check button to display a report. The validator checks the content of the ST1 and ST2 fields in the dump file for correct syntax, correct spelling of monomer names, and the chemical and topological allowability of the structure.

    If you uncheck the Display unexplained alias errors checkbox, error messages for structures that contain an alias (Subst etc.) not explained after a double slash are excluded from the report. Unless the Display non-critical parsing warnings checkbox is checked, warnings that do not prevent a structure from being parsed are not displayed.

    If a structure contains errors that can be corrected automatically (selection of the main chain, side chain order, etc.), these errors are normalized. A link to a copy of the dump with normalized structures is then provided above the report. Please use this pre-processed dump as the basis for subsequent error corrections. Reports on structure normalization are output along with other messages if the Display structure auto-correction reports checkbox is checked.

    The Check structure syntax tool on the same page checks structures independently of dump files. Type or paste a CSDB Linear code into the Structure field and press the Check button. The tool attempts to parse the CSDB Linear syntax and displays the parsed data array and a normalized structure encoding (CSDB Linear code replay). It also visualizes the structure in Sweet-DB and SNFG formats for easy comparison with figures in the annotated articles. It then checks all SMILES codes provided in Subst, Sug, and other aliases. Finally, it generates the overall SMILES code and a combined structural formula for checking stereochemistry, aglycons, and other structural features.

    To apply the structure checking tool to large sets of structures, provide them as a newline-separated list in an uploaded file (see the form at the bottom of the page).
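
    You can prepare a batch file of this kind with a few lines of Python. In this sketch, the function name write_structure_batch and the UTF-8 encoding are assumptions. The two codes in the demo are illustrative inputs only. The function strips surrounding whitespace, skips empty entries, and rejects codes that would span several lines, because each line is expected to hold one structure.

```python
import tempfile
from pathlib import Path


def write_structure_batch(structures, path):
    """Write CSDB Linear codes to a newline-separated file (hypothetical helper).

    Returns the number of structures written. UTF-8 encoding is an assumption.
    """
    lines = []
    for code in structures:
        code = code.strip()
        if not code:
            continue  # skip empty entries
        if "\n" in code or "\r" in code:
            raise ValueError("each structure must fit on a single line")
        lines.append(code)
    text = "\n".join(lines)
    if lines:
        text += "\n"  # terminate the last line
    Path(path).write_text(text, encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    # Demo with two illustrative codes, written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "structures.txt"
        count = write_structure_batch(["bDGlcp(1-4)bDGlcp", "", "aDGalp(1-3)bDGalp"], out)
        print(count, "structure(s) written")
```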

    Data import

    This password-protected feature is located in the Maintenance subdivision and is intended for data import. Provide the data as a file in a record-oriented form (CSDB text dump file; see Dump format). If the Clear database checkbox is checked, the whole database is cleared before import, except for the tables related to the monomer namespace, aglycons, epitopes, and journals. Use this checkbox only when re-importing a complete CSDB dump.

    Pressing the Import button starts the validation procedure for every record in the provided dump and imports the data if no errors are found. By default, the validator reports results in verbose form, i.e. its output contains critical error messages, non-critical warnings, or successful import reports for every record. The Suppress non-critical warnings checkbox can be used to simplify the report by excluding warnings that do not impede data import. The generated report is output to the browser and can be used for subsequent analysis and correction of errors in the dump file.

    The glycosyltransferase and conformation modules of CSDB rely on their own subdatabases, which are imported separately. To import glycosyltransferases, export UTF-8 text from the Microsoft Excel spreadsheet in which you annotated glycosyltransferase activity. Remove all double quotes, except where they denote a linkage position marked with two apostrophes, and provide the resulting file. Use the Clear the GT database... checkbox to replace the content only if you have a merged dump file for the whole glycosyltransferase subdatabase.
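
    You can script the quote-removal step. The Python sketch below assumes that a double quote directly following a digit (for example 4") is a linkage position written with two apostrophes, and keeps it. It removes every other double quote, including the quotes Excel may add around fields. A quoted field that itself ends in a digit would defeat this rule, so check the output visually. The function names are hypothetical. Reading with utf-8-sig simply tolerates a leading byte-order mark if the export contains one.

```python
import re

# Assumption: a double quote right after a digit (e.g. 4") marks a linkage
# position written with two apostrophes and must be kept; any other double
# quote is treated as stray (e.g. field quoting added by Excel).
_STRAY_QUOTE = re.compile(r'(?<!\d)"')


def clean_gt_export(text):
    """Remove double quotes that are not preceded by a digit."""
    return _STRAY_QUOTE.sub("", text)


def clean_gt_file(src, dst):
    """Read a UTF-8 export, clean it, and write the result as UTF-8."""
    with open(src, encoding="utf-8-sig") as fin:  # also accepts a BOM
        text = fin.read()
    with open(dst, "w", encoding="utf-8") as fout:
        fout.write(clean_gt_export(text))
```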

    To import conformation data, copy the XML and JSON files prepared by molecular dynamics trajectory processor to the conformation data folder on the CSDB server and press the Import button to import this folder into the conformation subdatabase.

    Database initialization

    This password-protected Maintenance feature is intended for database initialization. It rebuilds the database structure and clears its content. If the short initialization mode is selected, the supplementary tables remain intact. Upon full initialization, these supplementary tables are first filled from special text files stored on the CSDB server. Four checkboxes (Add residue information..., Add journal information..., Add aglycon information..., Add epitope information...) control the rebuilding of these tables. To replace the corresponding server-side text files (RESIDUES.TXT, JOURNALS.TXT, AGLYCONS.TXT, EPITOPES.TXT), use the Upload... buttons below the form. After adding new data records to these files, you can refresh the related tables without clearing the rest of the database. To do so, uncheck the uppermost checkbox, re-initialize database from backup.

    Unless instructed otherwise (Initialize topology checkbox), the initialization does not affect the topology-related tables. After initialization and subsequent import of the main database, the non-permanent IDs may be re-assigned. These include the IDs for compounds, articles, organisms, spectra, etc.: everything except CSDB record IDs. As a result, cross-links from the glycosyltransferase and conformation modules become invalid. For this reason, re-import these two subdatabases every time the initialization and import of the main database are finished.

    Update of lists

    The password-protected Update lists feature, available in the Maintenance submenu, forces the database engine to regenerate the service files used by the web front-end. These files include the numbers of structures and publications and the lists of genera, species, strains, organisms, journals, etc. This procedure is called automatically after operations that affect the database content. A manual list update can be used to monitor an import process started in another browser window.

    Data removal

    This password-protected feature is located in the Maintenance submenu and allows removal of specified records from the database. To delete records, specify a range of CSDB record IDs and press the Delete button. If the Warn... checkbox is checked, warnings about missing IDs are included in the report.

    When a record is deleted:

  • If an associated publication is not present in any other records, the publication, corresponding article ID, authors and other related data are also removed from the database.
  • If an associated compound is not present in any other records, the structure, corresponding compound ID, residue instances and rows in the connection table are also removed from the database.
  • If an associated organism is not present in any other records, the organism, corresponding organism ID, genus, species and other related data are also removed from the database.
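
    Under the removal rules above, a related entity is deleted only when no remaining record refers to it. The Python sketch below illustrates this check on an in-memory model. The Record class, which holds exactly one publication, compound and organism per record, is a hypothetical simplification and does not reflect the actual CSDB schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal view of a CSDB record and its linked entities."""
    record_id: int
    publication_id: int
    compound_id: int
    organism_id: int


def delete_record(records, record_id):
    """Remove one record and report linked entities left without any record.

    `records` maps record IDs to Record objects and is not modified.
    Returns (remaining_records, orphans), where `orphans` maps a field name
    to the entity ID that no remaining record refers to.
    """
    target = records[record_id]  # raises KeyError for a missing ID
    remaining = {rid: r for rid, r in records.items() if rid != record_id}
    orphans = {}
    for field in ("publication_id", "compound_id", "organism_id"):
        value = getattr(target, field)
        if not any(getattr(r, field) == value for r in remaining.values()):
            orphans[field] = value  # would be removed together with the record
    return remaining, orphans
```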
