
The long way to magnetic materials design run it fast

7 months 2 weeks ago #521 by jserra
Plenary Session
Friday, December 11
Speaker: Stefano Sanvito
Full Title: The long way to magnetic materials design run it fast

7 months 1 week ago #594 by SofiaAbrunhosa
Session questions:
  1. What is the current status of automated information search in the literature to construct training sets?
  2. Is there a database of molecules that can be used to explore their magnetic properties?
  3. How was the subset of training data chosen? Was it random or were there some criteria for that? For example, in the alloys example, with CoMn, NiRh, FeNi.
  4. How do you estimate the uncertainty of the model for random forests (first part)?

We kindly ask the speakers to reply to any questions left unanswered during the talk.

7 months 1 week ago #600 by Sanvito
> What is the current status of automated information search in the literature to construct training sets?

Yes. As I said, there are currently several attempts at extracting data from the literature using natural language processing (NLP). It is relatively easy to identify paragraphs that discuss a particular topic (say T_C); most of this is done with logistic regression classifiers. Extracting the actual data automatically is much more difficult. Here you have to hard-code semantic rules, which usually gets complicated and very time-consuming. There are algorithms that generate semantic rules, but they are difficult to use and require a lot of training. We are working on these topics, but it will still take a while to have all this done. Other approaches, which have had some success, consist of extracting information from figures and tables. Here the issues are the formats (how a picture is encoded in a PDF) and the lack of standardisation.
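To make the paragraph-identification step concrete, here is a minimal sketch of a bag-of-words logistic regression that flags paragraphs mentioning T_C. It is not the pipeline used in the talk: the training sentences are invented, the function names (`train`, `prob_tc`) are hypothetical, and the intercept term is left out to keep the example short.

```python
import math

# Toy labelled paragraphs, invented for illustration:
# 1 = discusses the Curie temperature, 0 = does not.
TRAIN = [
    ("the curie temperature of the alloy is 650 K", 1),
    ("we measured a curie temperature near 300 K", 1),
    ("ferromagnetic ordering sets in below the curie temperature", 1),
    ("samples were grown by sputtering on silicon substrates", 0),
    ("the authors thank the funding agency for support", 0),
    ("x-ray diffraction confirms the crystal structure", 0),
]


def tokens(text):
    """Binary bag-of-words features: the set of lower-cased words."""
    return set(text.lower().split())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=200, lr=0.5):
    """Fit one weight per word by stochastic gradient descent on the log-loss."""
    weights = {}
    for _ in range(epochs):
        for text, label in data:
            feats = tokens(text)
            p = sigmoid(sum(weights.get(t, 0.0) for t in feats))
            err = p - label  # gradient of the log-loss w.r.t. the score
            for t in feats:
                weights[t] = weights.get(t, 0.0) - lr * err
    return weights


def prob_tc(weights, text):
    """Estimated probability that a paragraph is about T_C."""
    return sigmoid(sum(weights.get(t, 0.0) for t in tokens(text)))


WEIGHTS = train(TRAIN)
print(prob_tc(WEIGHTS, "a curie temperature of 500 kelvin"))
print(prob_tc(WEIGHTS, "films grown by sputtering"))
```

A classifier like this only says which paragraphs are worth reading; pulling the actual number and units out of the text is the harder step described above.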

7 months 1 week ago #601 by Sanvito
> Is there a database of molecules that can be used to explore their magnetic properties?

I cannot think of anything specific to magnetic properties. There are several databases for structural properties, e.g. the Cambridge Structural Database: www.ccdc.cam.ac.uk/products/csd/
There may be something in the Pauling File ( paulingfile.com ), but I think it is mostly for inorganic materials.

7 months 1 week ago #602 by Sanvito
> How was the subset of training data chosen? Was it random or were there some criteria for that? For example, in the alloys example, with CoMn, NiRh, FeNi.

The training set is chosen randomly out of the 2500 data points we have. We use 10-fold cross-validation to train the model over different random selections of the data. In fact, we find that biasing the initial choice of the training set always makes the model less predictive. We also removed data on alloys because there were too many relative to the pure phases and they were biasing the model.
In the example I showed for the three binary systems, we used the model trained on the random selection of data. Note that the model was not trained specifically on that binary space, and exactly the same model predicted all three diagrams.
I think you can probably construct much more accurate models if you restrict yourself to specific compositions for which you have lots of data. For instance, if you have a lot of information on a ternary or quaternary phase diagram, then you can probably make a very accurate model for that material system. The model, however, will work ONLY for that system.
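For readers unfamiliar with the 10-fold cross-validation mentioned above, the following is a rough sketch of the procedure. Everything in it is an assumption made for illustration: the data are synthetic, a least-squares line stands in for the real model, and the names `k_fold_splits`, `fit_line` and `cross_validate` are hypothetical.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx); each sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment to folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b (a stand-in for the real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def cross_validate(xs, ys, k=10, seed=0):
    """Return the mean absolute error on each held-out fold."""
    errors = []
    for train_idx, test_idx in k_fold_splits(len(xs), k, seed):
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        mae = sum(abs(a * xs[i] + b - ys[i]) for i in test_idx) / len(test_idx)
        errors.append(mae)
    return errors


# Synthetic data set: 2500 points, one descriptor x, noisy target y.
rng = random.Random(1)
XS = [rng.uniform(0.0, 1.0) for _ in range(2500)]
YS = [300.0 + 400.0 * x + rng.gauss(0.0, 20.0) for x in XS]

FOLD_ERRORS = cross_validate(XS, YS)
print(sum(FOLD_ERRORS) / len(FOLD_ERRORS))
```

The spread of the per-fold errors gives a feeling for how much the model quality depends on which random subset was used for training.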

7 months 1 week ago #603 by Sanvito
> How do you estimate the uncertainty of the model for random forests (first part)?

Yes, essentially the error is the variance of the predictions made by the individual trees of the random forest model. A relatively standard way to estimate the error in ML is to train multiple models on the same dataset (e.g. multiple neural networks trained on different random portions of your data). Your actual prediction is then the average and your error is the variance.
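The ensemble idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: instead of random-forest trees or neural networks, the ensemble members are least-squares lines fitted to bootstrap resamples of synthetic data, and `train_ensemble` and `predict_with_uncertainty` are hypothetical names.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b (a stand-in for one ensemble member)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def train_ensemble(xs, ys, n_models=50, seed=0):
    """Train each member on a different random (bootstrap) portion of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_with_uncertainty(models, x):
    """Prediction = mean over members; error estimate = variance over members."""
    preds = [a * x + b for a, b in models]
    return statistics.mean(preds), statistics.variance(preds)


# Synthetic data: descriptor x in [0, 1], noisy target y.
rng = random.Random(1)
XS = [rng.uniform(0.0, 1.0) for _ in range(200)]
YS = [300.0 + 400.0 * x + rng.gauss(0.0, 20.0) for x in XS]

MODELS = train_ensemble(XS, YS)
print(predict_with_uncertainty(MODELS, 0.5))  # inside the training range
print(predict_with_uncertainty(MODELS, 5.0))  # far outside the training range
```

In this toy case the members disagree much more when asked to extrapolate far from the training data, which is the behaviour one hopes an uncertainty estimate will show.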

